hgcollstat 1.1

Name

hgcollstat - create statistics about WWW-access to HyperWave collections

Synopsis

hgcollstat parameters file(s)
For a short description of the parameters try hgcollstat -h. For more details see below.

Description

hgcollstat generates access statistics for single HyperWave collection hierarchies. It works from the detailed output of hglogstat (hwlogstat), which describes all objects requested from a HyperWave server within a defined period, as well as all successful and failed search requests.
From that output, hgcollstat extracts the details of only those objects that are children of the collection being analyzed.

The result of this analysis is an HTML document which contains the following information:

All requested objects, sorted in order of descending number of requests. To give some kind of user profile, the top domains are listed along with each object.
(The top two levels of domains are counted.)
All successful search requests with their scope within the tested collection.
All failed search requests with their scope within the tested collection.
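
As an illustration of the domain profile mentioned above, the following is a minimal Python sketch (not hgcollstat's actual Perl code) of reducing requesting host names to their top two domain levels and counting them. The function names and the sample host names are made up for this example, and numeric IP addresses are not treated specially.

```python
from collections import Counter

def top_domain(host, levels=2):
    """Return the last `levels` dot-separated labels of a host name."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-levels:])

def domain_profile(hosts, levels=2):
    """Count requests per top domain, most frequent first."""
    counts = Counter(top_domain(h, levels) for h in hosts)
    return counts.most_common()

# Hypothetical requesting hosts for a single object.
hosts = ["info.iicm.edu", "www.iicm.edu", "ftp.tu-graz.ac.at"]
print(domain_profile(hosts))  # [('iicm.edu', 2), ('ac.at', 1)]
```

Note that for country-code domains such as "ac.at" two levels identify only the kind of organization, not the organization itself.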

By default, this HTML document is written to standard output, but the user may define a collection into which it will immediately be inserted. This collection may be located at any Hyper-G host.

Moreover, as collections may contain large numbers of subdocuments, the output may be split into several documents, each containing the information for a specific data type: one document shows all requested objects of type text, another all images, and so on. These documents are put into a cluster so that they are more easily accessible.
Of course, in this mode output cannot be directed to standard output.

Parameters

The parameters may be abbreviated by their first four characters.
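
The abbreviation rule can be pictured with a small Python sketch. It is only an illustration, not the script's own option parser, and it assumes that the leading dash is not counted among the four characters and that a word is accepted only as the full option name or as exactly its first four characters. The function name resolve_option is hypothetical; the option names are the ones listed below.

```python
# Option names as documented in this section (without the leading dash).
OPTIONS = ["name", "hghost", "pname", "dhost", "sort",
           "cname", "join", "fuzzy", "top", "v"]

def resolve_option(arg):
    """Map a command-line word such as '-hgho' to its full option name.

    Returns None if the word is not an option (for example a file name)
    or does not match any known option.
    """
    if not arg.startswith("-"):
        return None
    word = arg[1:]
    for opt in OPTIONS:
        # Accept the full name, or exactly the first four characters.
        if word == opt or (len(word) == 4 and opt.startswith(word)):
            return opt
    return None

print(resolve_option("-hgho"))   # hghost
print(resolve_option("-fuzzy"))  # fuzzy
```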

-name ...: Name of the collection to analyze.
-hghost ...: HyperWave host, where the above collection (-name) is located.
-pname ...: Name of the collection to put the output HTML document into. If no name is given, output goes to stdout.
-dhost ...: HyperWave host, where the above collection (-pname) is located.
-sort: Give a separate report for each type of object (text, image, ...). The result is a cluster of several HTML documents.
-cname ...: Name of the above cluster.
-join: Input to hgcollstat may come from arbitrarily many files. This way, long term statistics may be generated, even if hglogstat (hwlogstat) is run weekly or monthly and produces detailed output for these periods only.
To save space and to put the details into a more compact format, the information in the input files may be combined and written to a single file.
The option -join takes as parameter a filename to which the joined input files will be written.
-fuzzy: By default, hgcollstat counts objects only if the GOid matches the one appearing in the detailed output. However, these Ids change as objects are modified and reinserted into the database.
So, sometimes objects should be identified by their titles instead of their Ids, which is what fuzzy mode does.
This mode must be used carefully, as the results may be very inexact: there may be several objects with the same title on the server, and there is no way to decide which of them is the one actually meant.
-top ...: Defines how many elements shall be displayed; default is 100.
-v: Verbose mode.
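
To make the effect of -join and -fuzzy more concrete, here is a minimal Python sketch under stated assumptions. It does not read hglogstat's real output format; instead it assumes each input has already been parsed into a mapping from (GOid, title) to a request count. The function names, the Id strings and the sample data are all hypothetical.

```python
from collections import Counter

def join_counts(periods):
    """Add up per-object request counts from several reporting periods."""
    total = Counter()
    for counts in periods:
        total.update(counts)  # Counter.update adds counts, it does not replace
    return total

def collection_requests(children, counts, fuzzy=False):
    """Return request counts per title for the objects of one collection.

    children: list of (goid, title) pairs currently in the collection.
    counts:   mapping (goid, title) -> number of requests from the logs.
    """
    ids = {goid for goid, _ in children}
    titles = {title for _, title in children}
    result = Counter()
    for (goid, title), n in counts.items():
        # Strict mode matches on the Id only; fuzzy mode also accepts
        # any logged object that merely shares a title with a child.
        if goid in ids or (fuzzy and title in titles):
            result[title] += n
    return result

# "Home" was modified and reinserted between the two periods, so its Id changed.
week1 = {("id-1", "Home"): 5, ("id-2", "News"): 2}
week2 = {("id-9", "Home"): 3}
joined = join_counts([week1, week2])
children = [("id-9", "Home")]

print(dict(collection_requests(children, joined)))              # {'Home': 3}
print(dict(collection_requests(children, joined, fuzzy=True)))  # {'Home': 8}
```

In strict mode the five earlier requests are lost because they were logged under the old Id. Fuzzy mode recovers them, but it would equally count requests for an unrelated object that happens to be titled "Home", which is the inexactness described for -fuzzy.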

What is necessary to run hgcollstat?

hgcollstat is a Perl script and takes advantage of features new in Perl 5.
So, the first prerequisite is that Perl 5 is installed on your system.
The collection hierarchy is searched by recursively calling hginfo.
The HTML document is inserted into the database using hginstext or hifimport, as appropriate.
That, of course, is necessary only if you want the document directly put into the database.
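
The recursive search of the collection hierarchy can be sketched as follows. This is an illustrative Python outline, not the script's Perl code: the list_children callback stands in for a call to hginfo, whose options and output format are not described here, and the sample hierarchy is invented.

```python
def collect_members(name, list_children, seen=None):
    """Return the names of all objects below the collection `name`.

    list_children(name) must return the direct children of `name`;
    in the real tool this information would come from hginfo.
    """
    if seen is None:
        seen = set()
    for child in list_children(name):
        if child not in seen:  # do not visit the same object twice
            seen.add(child)
            collect_members(child, list_children, seen)
    return seen

# Hypothetical hierarchy; "docs" is reachable from both "root" and "img".
hierarchy = {
    "root": ["docs", "img"],
    "docs": ["a.txt", "b.txt"],
    "img": ["logo.gif", "docs"],
}
members = collect_members("root", lambda n: hierarchy.get(n, []))
print(sorted(members))  # ['a.txt', 'b.txt', 'docs', 'img', 'logo.gif']
```

The set of visited names keeps the recursion from descending into the same subcollection more than once when it is reachable along several paths.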

If these tools are installed on your system, hgcollstat should work without problems.

Author

Alfons Schmid (aschmid@iicm.edu) - July 31, 1996