hgcollstat 1.1
Name
hgcollstat - create statistics about WWW-access to HyperWave collections
Synopsis
hgcollstat parameters file(s)
For a short description of the parameters try hgcollstat -h. For
more details see below.
Description
hgcollstat is based on the detailed output of hglogstat (hwlogstat),
which describes all objects that have been requested from a HyperWave
server within a defined period, as well as successful and failed
search requests. From this output, hgcollstat generates access
statistics for individual HyperWave collection hierarchies.
This is achieved by considering the details of only those objects
that are children of the collection to be analyzed.
The result of this analysis is an HTML document which contains the
following information:
- All requested objects, sorted in order of descending number of
requests. To give some kind of user profile, the top domains are
listed along with each object.
(The top two levels of domains are counted.)
- All successful search requests with their scope within the tested
collection.
- All failed search requests with their scope within the tested
collection.
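The first item in the list above can be illustrated with a small sketch. It is not taken from hgcollstat itself (which is a perl script); the record layout, the function names and the sample hosts are assumptions made purely for illustration. It shows one way to reduce a requesting host to its top two domain levels and to rank objects by descending number of requests.

```python
from collections import Counter

def top_domain(host):
    """Return the top two levels of a host name, e.g. 'www.iicm.edu' -> 'iicm.edu'.

    Numeric addresses are not treated specially in this sketch.
    """
    parts = host.lower().rstrip(".").split(".")
    return ".".join(parts[-2:])

def summarize(records, top=100):
    """Rank objects by descending request count, with per-domain counts.

    records is a hypothetical list of (object title, requesting host) pairs;
    the real input format of hgcollstat is not assumed here.
    """
    per_object = {}
    for title, host in records:
        per_object.setdefault(title, Counter())[top_domain(host)] += 1
    ranked = sorted(per_object.items(),
                    key=lambda item: sum(item[1].values()),
                    reverse=True)
    return [(title, sum(domains.values()), domains.most_common())
            for title, domains in ranked[:top]]

# Made-up sample data.
requests = [
    ("Welcome", "www.iicm.edu"),
    ("Welcome", "ftp.iicm.edu"),
    ("Welcome", "proxy.example.com"),
    ("Manual", "host1.example.com"),
]
report = summarize(requests)
```

Each entry of report holds the object title, its total number of requests, and the domains that requested it, most frequent first.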
By default, this HTML document is written to standard output, but the
user may define a collection into which it will immediately be
inserted. This collection may be located at any Hyper-G host.
Moreover, as collections may contain large numbers of subdocuments,
the output may be split into several documents, each containing the
information for a specific data type. That is, there is one document
showing all requested objects of type text, another one for all
images, and so on. These documents are put into a cluster so that
they are easier to access.
Of course, in this mode output cannot be directed to standard output.
Parameters
The parameters may be abbreviated by their first four characters.
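As a rough illustration of this abbreviation rule, the following sketch resolves a command-line word to a full parameter name. It is an assumption about how such matching could work, not the parsing code of hgcollstat: the function name resolve is hypothetical, and the sketch assumes that the four characters are counted without the leading dash.

```python
# Parameter names as documented in this section.
OPTIONS = ["name", "hghost", "pname", "dhost", "sort",
           "cname", "join", "fuzzy", "top", "v"]

def resolve(arg):
    """Map a word such as '-hgho' to its full parameter name, or return None."""
    if not arg.startswith("-"):
        return None          # not a parameter, e.g. an input file name
    word = arg[1:]
    for full in OPTIONS:
        # Accept the full name, or a prefix of at least four characters.
        if word == full or (len(word) >= 4 and full.startswith(word)):
            return full
    return None
```

Under these assumptions, resolve("-hgho") yields "hghost" and resolve("-fuzz") yields "fuzzy", while a word that is shorter than four characters and not a complete name, such as "-nam", is rejected.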
- -name ...
- Name of the collection to analyze.
- -hghost ...
- HyperWave host where the collection given by -name is located.
- -pname ...
- Name of the collection to put the output HTML
document into. If no name is given, output goes to stdout.
- -dhost ...
- HyperWave host where the collection given by -pname is located.
- -sort
- Give a separate report for each type of object
(text, image, ...). The result is a cluster of several HTML documents.
- -cname ...
- Name of the cluster created by -sort.
- -join
- Input to hgcollstat may come from
arbitrarily many files. This way, long term statistics may be
generated, even if hglogstat (hwlogstat) is run weekly
or monthly and produces detailed output for these periods only.
To save space and to put the details into a more compact format, the
information in the input files may be combined and written to a single
file.
The option -join takes as parameter a filename to which the
joined input files will be written.
- -fuzzy
- By default, hgcollstat counts objects only
if the GOid matches the one appearing in the detailed
output. However, these Ids change as objects are modified and
reinserted into the database.
So, sometimes objects should be identified by their titles instead of
their Ids, which is what fuzzy mode does.
This mode must be used carefully, as the results may be very inexact.
The reason is that there may be several objects with the same title on
the server, and there is no way to decide which of them is the one we
are really looking for.
- -top ...
- Defines how many elements shall be displayed;
default is 100.
- -v
- Verbose mode.
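The difference between the default matching and -fuzzy can be sketched as follows. This is an illustrative model only: the dictionaries, the field names goid, title and count, and the sample Ids are hypothetical and do not reflect the actual format of the detailed output or the internals of hgcollstat.

```python
def count_requests(children, log_entries, fuzzy=False):
    """Sum request counts for objects that belong to the analyzed collection.

    children    -- objects currently in the collection (dicts with 'goid', 'title')
    log_entries -- entries from the detailed output (dicts with 'goid', 'title', 'count')
    fuzzy       -- match by title instead of by GOid
    """
    key = "title" if fuzzy else "goid"
    wanted = {child[key] for child in children}
    totals = {}
    for entry in log_entries:
        if entry[key] in wanted:
            totals[entry["title"]] = totals.get(entry["title"], 0) + entry["count"]
    return totals

# Made-up data: the second log entry has the same title but a different Id.
# It may be an older version of the same object, or an unrelated object.
children = [{"goid": "id-1", "title": "Welcome"}]
log_entries = [
    {"goid": "id-1", "title": "Welcome", "count": 5},
    {"goid": "id-9", "title": "Welcome", "count": 3},
]

exact = count_requests(children, log_entries)
loose = count_requests(children, log_entries, fuzzy=True)
```

With this sample data the exact match counts 5 requests and the fuzzy match counts 8. Whether the extra 3 belong to the object in question cannot be decided from the title alone, which is why the fuzzy results may be inexact.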
What is necessary to run hgcollstat?
- hgcollstat is a perl script and takes advantage of
features that are new in perl 5.
So, the first prerequisite is that perl 5 is installed on your
system.
- The collection hierarchy is searched by recursively calling
hginfo.
- The HTML document is inserted into the database using
hginstext or hifimport, respectively.
These, of course, are necessary only if you want the document put
directly into the database.
If all of the above are installed on your system, hgcollstat should
work fine.
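A quick way to check these prerequisites is to look for the programs on the search path. The sketch below is not part of hgcollstat; the function name missing_tools and the split into required and optional tools are assumptions based on the list above. It does not verify that the installed perl is version 5.

```python
import shutil

# Tools named in this section; the grouping is an assumption of this sketch.
REQUIRED = ["perl", "hginfo"]
OPTIONAL = ["hginstext", "hifimport"]   # only needed for direct insertion

def missing_tools(names, which=shutil.which):
    """Return the names that cannot be found on the search path."""
    return [name for name in names if which(name) is None]

if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_tools(names)
        if missing:
            print("missing " + label + " tools: " + ", ".join(missing))
```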
Author
Alfons Schmid (aschmid@iicm.edu) - July 31, 1996