Installing Doc Finder - a tutorial
How it works
-
Doc Finder is a suite of scripts and programs based on freeWAIS.
-
The basic idea is to create indexes of all the html documents beforehand,
and then to do searches against this index database to locate the html
documents.
-
Nightly, a Unix script, makeindex, first locates all the html files
on the server and hands them over to waisindex, which creates
a full-text index of each document and stores it in a wais database.
-
When a user invokes Doc Finder through Mosaic, a Perl script,
docfind.pl, prompts the user for keywords. The script then passes
the keywords to waissearch, which searches the database for them.
-
The results are then filtered through a C program, jwais-html, which
adds a link to each query result (which is a filename). The output of jwais-html
is what the user sees.
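As a rough illustration of the nightly indexing step described above, here is a minimal shell sketch. It is not the real makeindex script: the function names and the INDEXER variable are made up for this example, and INDEXER defaults to echo so the sketch can be dry-run on a machine without freeWAIS.

```shell
#!/bin/sh
# Sketch of the nightly indexing step; all names here are illustrative.

# Print every .html file found under the given document root, sorted.
list_html_files() {
    find "$1" -type f -name '*.html' -print | sort
}

# Hand the located files to an indexer command (waisindex in the tutorial).
# INDEXER is a hypothetical variable; it defaults to "echo" for a dry run.
index_html_files() {
    list_html_files "$1" | xargs ${INDEXER:-echo}
}
```

With the default setting, index_html_files simply prints the file names that would be handed to the indexer. Note that this simple xargs pipeline assumes file names without spaces.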
Installation
-
Download freeWAIS.
The version I'm using is 0.3.
It is available as a compressed tar archive,
or you can browse the ftp subdirectory.
(If the archive has a .gz extension, you need a tool called
gzip to uncompress it.
gzip version 1.2.2 is available by anonymous ftp from prep.ai.mit.edu
[18.71.0.38] and its mirrors, in the file /pub/gnu/gzip-1.2.2.tar.)
From it, build waisindex and waissearch.
- Download docfinder.tar.
-
Edit jwais-html.c.
You need to change 2 defines:
- MYHOST - This is the prefix of a URL that would reach your
documents on the server, e.g. http://arena.ncsa.uiuc.edu:4000.
-
PATHPREFIX - This is the front portion of the full path to
a document on your server. It corresponds to the DocumentRoot
parameter in the srm.conf file in the HTTP conf directory.
The example in the appendix will make this clear.
-
Compile jwais-html:
cc jwais-html.c -o jwais-html
-
Create 2 subdirectories, bin and db.
-
In bin you should put waissearch, waisindex,
jwais-html and jwais-html.c. Set the modes of the programs to executable.
-
Move docfind.pl to a sub-directory that the HTTP server knows to
contain executables, i.e. set the ScriptAlias parameter
in srm.conf to point to the directory where docfind.pl is.
- Edit docfind.pl.
You need to tell the script the full path to waissearch, jwais-html,
your wais database, and an optional link to information about yourself.
Follow instructions in the script to do this.
- In the other sub-directory, db, place the shell scripts
makeindex and makeall. Set their modes to executable.
This directory will contain your wais index database. The index database
is roughly twice the size of all the html files you will be indexing.
-
Edit makeindex and type in the correct path to waisindex.
You can also change the database name if you like (it's currently
called webdb).
-
Edit makeall and enter the subdirectories containing your
html docs.
-
makeall repeatedly calls makeindex, each time with a different
directory containing html docs. I use it because it's a convenient
way to index only certain directories and leave out others.
Also, waisindex (called from makeindex) can crash if it is passed
too many files, which is why I pass only one sub-directory's worth
of files to makeindex at a time.
- Create your index database by running makeall.
This creates the database (a set of 7 files named webdb.*)
in the db directory.
-
Your Doc Finder is ready! The searching is entirely driven
by the Perl script.
You can test this offline by giving it a keyword as input, e.g.:
docfind.pl jason
to look for "jason" in the database.
Your Doc Finder can then be invoked via Mosaic by a URL of the form
http://your-machine:your-port/bin/docfind.pl
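The makeall step in the list above, which hands makeindex one directory at a time, could be sketched roughly as follows. This is not the script shipped in docfinder.tar: the make_all function and the MAKEINDEX variable are hypothetical, and MAKEINDEX defaults to echo so the loop can be tried without a working makeindex.

```shell
#!/bin/sh
# Sketch of a makeall-style driver: index one directory per makeindex call
# so that no single call receives too many files. Names are illustrative.

# MAKEINDEX is the per-directory indexing command. It defaults to "echo"
# for a dry run; in a real setup it would point at the makeindex script.
MAKEINDEX=${MAKEINDEX:-echo}

make_all() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # One makeindex call per directory; stop on the first failure.
            $MAKEINDEX "$dir" || return 1
        else
            echo "skipping $dir: not a directory" >&2
        fi
    done
}
```

Calling make_all with only the directories you want indexed is also how certain directories get left out of the database.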
Tips
-
You can set up a cron job to update the database periodically, i.e. run
makeall periodically to do a full re-index.
Alternatively, run makeindex whenever you have new or
modified files.
-
Currently, makeindex appends new keywords to the existing
database (using the "-a" option to append). So if you are doing a
complete re-index, remember to delete the old index database files first, or you
will get duplicate keys and also double the database size.
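A complete re-index along the lines of these tips might look like the sketch below. The full_reindex function is hypothetical. It assumes that makeall sits in the db directory as described in the installation steps, and that the database keeps its default name, webdb, unless another name is given.

```shell
#!/bin/sh
# Sketch of a full re-index: remove the old database files, then rebuild.
# The function name and argument layout are assumptions for illustration.
full_reindex() {
    db_dir=$1
    db_name=${2:-webdb}
    # Delete the old index files (e.g. webdb.*) so that appending with -a
    # does not produce duplicate keys or double the database size.
    rm -f "$db_dir/$db_name".*
    # Rebuild from scratch if a makeall script is present in the db directory.
    if [ -f "$db_dir/makeall" ]; then
        (cd "$db_dir" && sh ./makeall)
    fi
}
```

A wrapper like this is the kind of thing a cron job could run nightly in place of calling makeall directly.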
And Finally..
-
This is public domain software from NCSA. If you do use it, please mention
the source of the software and in your service documents, please provide a
link to this tutorial.
-
Send comments, your new Doc Finder URL, and anything you've discovered
about WAIS databases that you'd like to share, to me at jng@ncsa.uiuc.edu.
jason ng
ncsa may 1994
APPENDIX
Changing MYHOST and PATHPREFIX in jwais-html.c - an EXAMPLE:
The PATHPREFIX is that part of the full path specification of a document
that needs to be chopped off to form the URL for that document.
If
-
your document root is set to /usr6/likkai/pub
-
you have a document called /usr6/likkai/pub/projects/atlas.html
-
you have an HTTP server on arena.ncsa.uiuc.edu:7777
Then in order for Doc Finder to create a link to atlas.html, i.e.
http://arena.ncsa.uiuc.edu:7777/projects/atlas.html
you must specify
-
MYHOST as "http://arena.ncsa.uiuc.edu:7777"
-
PATHPREFIX as "/usr6/likkai/pub"
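The mapping in this example can be tried out with a small shell sketch. It only illustrates the idea of chopping PATHPREFIX off the path and putting MYHOST in front; it is not the code in jwais-html.c, and the path_to_url function is a name invented for this example.

```shell
#!/bin/sh
# Values taken from the appendix example.
MYHOST="http://arena.ncsa.uiuc.edu:7777"
PATHPREFIX="/usr6/likkai/pub"

# Turn a full file path into a URL by chopping PATHPREFIX off the front
# and putting MYHOST in its place. Paths outside PATHPREFIX are rejected.
path_to_url() {
    case $1 in
        "$PATHPREFIX"/*)
            rel=${1#"$PATHPREFIX"}
            printf '%s%s\n' "$MYHOST" "$rel"
            ;;
        *)
            return 1
            ;;
    esac
}

# Prints http://arena.ncsa.uiuc.edu:7777/projects/atlas.html
path_to_url /usr6/likkai/pub/projects/atlas.html
```

If PATHPREFIX does not match your DocumentRoot, the leftover part of the path ends up in the URL and the links will not resolve.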