hglogstat - create statistics about WWW-access to Hyper-G servers
hglogstat parameters
For a short description of the
parameters try hglogstat -h. For more details see below.
Based on the logfiles produced by the WWW-gateway, hglogstat produces access statistics by collecting various kinds of information (see Modes). Depending on the user's selection, the tool does one of three things. It may put the results into an HTML document, which may immediately be inserted into a defined Hyper-G collection, optionally along with some graphic representation. It may write detailed information about requested objects, searches and failed searches to a file. Or it may produce overall statistics, based on information gathered during either of the previous modes. (The first two actions may be combined in a single run; the third one needs an extra run.)
hglogstat may execute in three different modes, the first two of which can be combined into a single run.
In this mode, information of different categories is collected from the logfiles and presented as an HTML document. If the user defines a destination collection for the document (parameter -pname), the document is immediately inserted there; otherwise, the HTML text goes to standard output, and no graphics are produced at all.
Categories:
In this mode, all requested objects, searches and failed searches (all
along with the number of occurrences) are written to a file. From
there, this information may further be processed by other tools.
The script hgcollstat, for example, uses these files to produce
statistics about single collections instead of the whole server.
When hglogstat executes in the first or second mode, it outputs the
number of sessions per day (for each day within the analyzed period)
to the file sessions.log, and the number of requests per day to the
file requests.log. (These files are located in the current
directory.)
From this information, mode three generates daily and monthly
summaries, covering the whole period that has been examined by
hglogstat so far, and produces an HTML document, which will be
inserted into the given collection.
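The step from daily counts to monthly summaries can be sketched as follows. This is only an illustration, not the script's actual code: the line layout of sessions.log and requests.log is not described here, so the sketch assumes one line per day of the hypothetical form "YYYY-MM-DD count", and the function name monthly_totals is made up for the example.

```python
from collections import defaultdict


def monthly_totals(lines):
    """Sum per-day counts into per-month totals.

    Assumes each non-empty line looks like 'YYYY-MM-DD <count>'
    (a hypothetical layout, not the documented file format).
    """
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        day, count = line.split()
        month = day[:7]  # 'YYYY-MM'
        totals[month] += int(count)
    return dict(sorted(totals.items()))


sample = ["1996-08-30 120", "1996-08-31 95", "1996-09-01 143"]
print(monthly_totals(sample))  # {'1996-08': 215, '1996-09': 143}
```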
Most parameters may be abbreviated to their first four
characters. Exceptions are -lastse[ven] and -lastmo[nth], which share
their first four characters and therefore need six.
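The abbreviation rule can be illustrated with a small prefix matcher. This is a sketch under assumptions, not the way hglogstat parses its command line: the table lists only parameter names mentioned in this document, and the function name expand is hypothetical. Names shorter than the required length must be given in full.

```python
# Minimum number of characters (after the dash) needed to identify a parameter.
# Only names that appear in this document are listed; the real script has more.
MIN_LEN = {"h": 4, "rc": 4, "pname": 4, "lastseven": 6, "lastmonth": 6}


def expand(arg):
    """Expand an abbreviated parameter such as '-pnam' to its full name."""
    name = arg.lstrip("-")
    matches = [
        full
        for full, need in MIN_LEN.items()
        # Short names like 'rc' cannot reach four characters, so the
        # requirement is capped at the length of the full name.
        if full.startswith(name) and len(name) >= min(need, len(full))
    ]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous parameter: {arg}")
    return "-" + matches[0]


print(expand("-pnam"))    # -pname
print(expand("-lastse"))  # -lastseven
```

With this table, "-last" is rejected because four characters are not enough to tell -lastseven and -lastmonth apart.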
(<...> defines the type of data required)
It has been mentioned above that unwanted items may be excluded from
the summaries by describing them in an rc-file. By default, the script
looks for hglogstat.rc in the current directory, but an
alternative filename may be defined by the -rc parameter.
The list of items in this file may be divided into several categories,
each headed by a line identifying the type of objects to follow. So
far, requested objects, entry pages and hosts may be skipped; the
corresponding heading lines are _SKIP_OBJECTS_,
_SKIP_ENTRIES_ and _SKIP_HOSTS_.
Lines starting with # are considered to be comments.
The objects may be described in one of two ways:
Although this may be a bit bothersome, this method has the advantage of speed - the lists are kept in structures that can be searched very fast.
Example:
    # unwanted objects
    _SKIP_OBJECTS_
    coll_open.gif
    coll_clos.gif
    # unwanted entry pages
    _SKIP_ENTRIES_
    /
    identify.gif
    text.gif
    info.gif
While whole classes of objects may be described by just a few expressions, performance suffers, since these expressions have to be evaluated for each object discovered in the logfile.
An example:
    # unwanted objects
    _SKIP_OBJECTS_
    \.gif$
    statistics

The first expression describes all objects ending in ".gif"; the second one describes all objects containing "statistics" at any position.
It should be emphasized, however, that the items that appear in the rc-file are excluded from the top-n lists only; they still count as requested objects or entry pages!
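The rc-file handling described above can be sketched as follows. This is a minimal illustration, not the script's implementation: the function names parse_rc and top_n are hypothetical, the request counts are made-up sample data, and every entry is treated as a regular expression because this text does not say how the script tells explicit names from expressions. For explicit names, a set lookup would give the speed advantage mentioned above.

```python
import re
from collections import Counter

HEADINGS = {"_SKIP_OBJECTS_", "_SKIP_ENTRIES_", "_SKIP_HOSTS_"}


def parse_rc(text):
    """Group rc-file entries by the heading line that precedes them."""
    sections = {heading: [] for heading in HEADINGS}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if line in HEADINGS:
            current = line
        elif current is not None:
            sections[current].append(line)
    return sections


def top_n(counts, patterns, n):
    """Return the n most frequent items that match none of the patterns.

    Excluded items stay in `counts`, so overall totals are unaffected.
    """
    regexes = [re.compile(p) for p in patterns]
    kept = [
        (name, count)
        for name, count in counts.most_common()
        if not any(r.search(name) for r in regexes)
    ]
    return kept[:n]


rc_text = "\n".join(["# unwanted objects", "_SKIP_OBJECTS_", r"\.gif$", "statistics"])
# Hypothetical request counts, for illustration only.
counts = Counter({"index.html": 40, "coll_open.gif": 90,
                  "statistics.html": 12, "paper.ps": 7})

skip = parse_rc(rc_text)["_SKIP_OBJECTS_"]
print(top_n(counts, skip, 2))  # [('index.html', 40), ('paper.ps', 7)]
print(sum(counts.values()))    # 149, skipped objects are still counted
```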
hglogstat is a perl script and takes advantage of the
features new in perl 5. So, the first prerequisite is that perl
5 is installed on your system.
The graphics are produced by Gnuplot, which is called by the
script. So, this has to be installed, too. Since Gnuplot does
not produce gif outputs (at least my version 3.5 (pre 3.6) does
not), ppmtogif is called to do the translation. So, this, too,
should be installed.
Finally, insertion of the HTML document into the database is done by
hginstext. If you have this installed on your system, too,
nothing can keep you from working with hglogstat.
Changes since hglogstat 1.12
Changes since hglogstat 1.11
Of course, there are some minor bugs, but none of them is really serious.
POST requests sometimes are the start of a new session, sometimes
they are not. In the logfile, however, they are simply declared as
POST Requests. As a consequence, the exact number of sessions cannot
be determined, and the result diverges slightly from the result obtained
by analyzing the dbserver's logfiles.
In numbers, the deviation within a month is a few hundred, which is
less than 0.5% and usually may be neglected.
To eliminate this bug, the logfile's format must be changed, which
will happen soon anyway.
Gnuplot is a great graphics tool, but sometimes it behaves a bit strangely.
I place two plots on one screen, and although they start at the same
x-position, the second plot is moved one unit to the right - but only
on some architectures. There is a simple remedy to this - forcing a
plot at (0,0) which is invisible - but this produces faulty behaviour
on other architectures.
So far, I have not found an elegant solution.
Alfons Schmid (aschmid@iicm.edu) - September 25, 1996