hwlogstat 1.20
Name
hwlogstat - create statistics about WWW-access to HyperWave servers
Synopsis
hwlogstat parameters
For a short description of the
parameters try hwlogstat -h. For more details see below.
But first of all, you might want to take a look at an example.
Note
With the release of a new HyperWave version in June 1996, the www
logfile format was changed. Accordingly, hwlogstat
(and beyond) parses the new format only.
So, if you still want to analyze old logfiles, you should use
hglogstat instead.
Description
Based on the logfiles written by the WWW-gateway, hwlogstat
produces access statistics by collecting various information (see
Modes). Depending on the user's selection, the tool may present the
results as an HTML document, write detailed information about
requested objects, searches and failed searches to a file, or, as a
third possibility, produce overall statistics based on
information gathered during either of the previous modes. (The first
two actions may be combined in a single run; the third one needs an
extra run.)
Modes
hwlogstat may execute in three different modes, the first two
of which can be combined into a single run.
- Produce Exhaustive Statistics (-html)
In this mode, information of different categories is collected from
the logfiles and presented as an HTML document. If the user defines a
destination collection for the document (parameter -pname), the
document is immediately inserted there; otherwise, the HTML text goes
to standard-output, and no graphics will be produced at all.
Categories:
- General Information lists the total number of sessions,
user requests, robot requests, cache rate, redirected requests, failed
requests, successful search requests, failed search requests,
requesting hosts, bytes transferred etc.
- Details about Requested Objects show how many distinct
objects have been requested, and how often, within the relevant period
of time. A commandline option lets the user add the
most frequent domains to each requested object.
This section also lists which objects have been requested but could
not be delivered; the reason why the request failed is given in
parentheses.
- Details about Searches list the objects that have
successfully been searched for, and also those for which a search did
not yield any result.
For each search term, the activated scope is given in brackets.
- Details about Requests show the most frequently requested
object types (text, image, multimedia etc.) and list the most frequent
reasons why requests failed.
- Details about User Access show the most frequent referring
pages along with their entry points, the most frequent entry points,
user agents and domain names of the requesting hosts.
- Transfer Characteristics list number of bytes transferred,
total transfer time and transfer rate for each country (taking into
account the highest level of the domain names only).
- Time Information finally shows requests and sessions per
hour and day. Since this information does not say too much when
presented in a tabular form, it may also be presented in a graphic
form, which is made accessible via hyperlinks. But still, the user has
the chance to suppress these graphics by adding the -nographics
commandline argument.
(These graphics are created with Gnuplot.)
The parameter -top specifies how many items shall be listed
in each of the reports; the default value is 20.
There is also a possibility to make sure certain items do not appear
in the report. This is achieved by adding them to a list in an rc-file
(for more details, see below.)
- Save Detailed Information
In this mode, all requested objects, searches and failed searches
along with their number of occurrences are written to a file. For
requested objects, the top domains are listed, too.
This information may then be further processed by other tools. The
script hgcollstat, for example, uses these files to produce
statistics about single collections instead of the whole server.
- Produce Overall Statistics
When hwlogstat executes in the first mode, it outputs the
number of sessions per day (for each day within the analyzed period)
to the file sessions.log, the number of requests per day to the
file requests.log. (These files are located in the current
directory.)
From this information, mode three generates daily and
monthly summaries, covering the whole period examined by hwlogstat so
far.
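The step from daily counts to monthly summaries can be sketched as follows. This is only an illustration in Python, not hwlogstat's own code (the tool is a perl script). The layout of sessions.log and requests.log is not described here, so the sketch assumes one 'yy/mm/dd count' pair per line; the function name monthly_totals and the sample figures are made up for the example.

```python
from collections import defaultdict

def monthly_totals(lines):
    """Sum per-day counts into per-month totals.

    ASSUMPTION: each line looks like 'yy/mm/dd <count>'. The real
    layout of sessions.log / requests.log is not documented here.
    """
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        date, count = line.split()
        yy, mm, _dd = date.split("/")
        totals[f"{yy}/{mm}"] += int(count)
    return dict(totals)

# Made-up sample data, purely for illustration.
sample = ["96/09/01 120", "96/09/02 95", "96/10/01 143"]
print(monthly_totals(sample))
```

Keying the totals by 'yy/mm' keeps the months of different years apart, which matters once the analyzed period spans more than twelve months.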
Parameters
Most parameters may be abbreviated to their first four
characters. Exceptions are -lastse[ven] and -lastmo[nth].
(<...> defines the type of data required)
- -html
- Mode 1. The script will produce detailed
statistics, output an HTML document and optionally insert it into the
specified collection.
- -details
- Mode 2. Output all requested objects, search
requests and failed search requests in plain ASCII (for further use).
- -overall
- Mode 3. Produce overall statistics using
results from previous runs.
- -dir <directory>
- Defines the directory the logfiles are stored
in. Only the logfiles in this directory will be examined
(e.g. ~hgsystem/logs).
- -file <filename>
- Specifies the name of the current logfile (as
defined in .db.contr.rc). The default name is wave.log, so
this parameter may be omitted.
Old logfiles are expected to consist of the given filename followed
by a timestamp (e.g. wwwlog.30703723). Optionally, these files may
also be gzipped; in this case, the tool temporarily expands them
(using gzip -c). So, giving 'wwwlog' as filename actually means all
files matching wwwlog[.timestamp[.gz]].
- -hghost <name>
- Name of HyperWave host ...
- -pname <coll>
- ... and name of collection to put the HTML document into.
If this parameter is missing, output goes to stdout.
- -imgcoll <coll>
- Name of collection to put images into (by
default, equals the collection defined by -pname).
- -hname <string>
- Hostname that shall appear in the summary's
title. This option may be used in Mode 1, when an alias name shall be
used instead of the host's domain name within the report.
When the script is executed in Mode 2 only, there is no need to define
-hghost, -pname and -imgcoll, since an ASCII-file
is the only thing that will be output. So -hname may be used to
still give the host a name.
- -from <yy/mm/dd>
- First day to analyze. Should be in the form
yy/mm/dd.
- -to <yy/mm/dd>
- Last day to analyze. By default, yesterday's date
is assumed. Format as above.
- -lastseven
- Analyzes the last seven days (may be used
instead of -from and -to).
- -lastmonth
- Analyzes the last month (may be used
instead of -from and -to).
- -top <number>
- Specifies the top n items to be listed (20
by default).
- -regex
- Tells the script to treat the entries in the
rc-file as regular expressions (in perl fashion);
without this option, the entries are taken to be object titles.
- -domains
- Add the top domains to each requested object.
- -maxdomain <number>
- Specifies the number of domains to be
displayed (5 by default).
- -nographics
- Do not produce any graphics for the time
information.
- -cmd <filename>
- Take the parameters defined in the
given file. These parameters can still be overridden by those given in
the commandline.
- -test
- Output current settings.
- -v
- Verbose mode.
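The filename rule given for -file, wwwlog[.timestamp[.gz]], can be illustrated with a short sketch. It is written in Python for illustration only and does not reproduce the script's actual code. It assumes that the timestamp consists of digits only (as in the example wwwlog.30703723); the helper names and the sample filenames other than wwwlog.30703723 are hypothetical. Python's gzip module stands in for the gzip -c call.

```python
import gzip
import re

def logfile_pattern(basename):
    """Regex for basename[.timestamp[.gz]].

    ASSUMPTION: the timestamp is a run of digits.
    """
    return re.compile(r"^" + re.escape(basename) + r"(\.\d+(\.gz)?)?$")

def select_logfiles(basename, filenames):
    """Return the names that belong to the given logfile, sorted."""
    pattern = logfile_pattern(basename)
    return sorted(name for name in filenames if pattern.match(name))

def open_logfile(path):
    """Open a logfile as text, expanding gzipped files on the fly."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")

# Hypothetical directory listing.
names = ["wwwlog", "wwwlog.30703723", "wwwlog.30703724.gz",
         "wwwlog.bak", "other.log"]
print(select_logfiles("wwwlog", names))
```

Note that a name such as wwwlog.gz is not selected, since the pattern allows the .gz suffix only after a timestamp.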
The rc-file
It has been mentioned above that unwanted items may be excluded from
the summaries by describing them in an rc-file. By default, the script
looks for hwlogstat.rc in the current directory, but an
alternative filename may be defined by the -rc parameter.
The list of items in this file may be divided into several categories,
each headed by a line identifying the type of objects to follow. So
far, requested objects, entry pages and requesting hosts may be
skipped (the corresponding heading lines are _SKIP_OBJECTS_,
_SKIP_ENTRIES_ and _SKIP_HOSTS_).
Lines starting with # are considered to be comments.
The objects may be described in one of two ways:
- Simply list their titles
Although this may be a bit bothersome, this method has the advantage
of speed - the lists are kept in structures that can be searched very
fast.
Example:
# unwanted objects
_SKIP_OBJECTS_
coll_open.gif
coll_clos.gif
# unwanted entry pages
_SKIP_ENTRIES_
/
identify.gif
text.gif
info.gif
- Describe the objects with perl's regular expressions
While whole classes of objects may be described by just a few
expressions, performance is cut down, since these expressions have to
be evaluated for each object discovered in the logfile.
An example:
# unwanted objects
_SKIP_OBJECTS_
\.gif$
statistics
The first expression describes all objects ending in ".gif", the
second one describes all objects containing "statistics" at any
position.
(For perl hackers: the above descriptions are all
combined to a single expression by joining them with "|" and putting
the result between the matching operators. For the above example this
would yield /\.gif$|statistics/)
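The two ways of describing objects can be compared in a small sketch. It uses Python rather than perl and is meant as an illustration only; for the simple expressions shown here, Python's re module behaves like perl's matching operator. The function name build_skip_regex and the list of sample titles are made up for the example.

```python
import re

# Title mode: a set gives fast exact-match lookups.
skip_titles = {"coll_open.gif", "coll_clos.gif"}

def build_skip_regex(expressions):
    """Join the rc-file expressions with "|" into one alternation."""
    return re.compile("|".join(expressions))

# Regex mode: every title has to be tested against the expression.
skip_regex = build_skip_regex([r"\.gif$", "statistics"])

# Hypothetical object titles as they might appear in a logfile.
titles = ["coll_open.gif", "identify.gif",
          "server statistics 1996", "welcome.html"]

by_title = [t for t in titles if t in skip_titles]
by_regex = [t for t in titles if skip_regex.search(t)]

print(skip_regex.pattern)
print(by_title)
print(by_regex)
```

The title list only catches objects named exactly, whereas the two expressions also catch identify.gif and any title containing "statistics".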
It should be emphasized, however, that the items that appear in
hwlogstat.rc are excluded from the top-n lists only; they still
count as requested objects or entry pages!
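How such a file breaks down into categories can be sketched as follows. This is an illustrative Python reading of the format described in this section, not the script's own parser. Two points are assumptions: blank lines are skipped, and entries that appear before the first heading line are ignored. The function name parse_rc is hypothetical.

```python
HEADINGS = {
    "_SKIP_OBJECTS_": "objects",
    "_SKIP_ENTRIES_": "entries",
    "_SKIP_HOSTS_": "hosts",
}

def parse_rc(lines):
    """Split rc-file lines into one skip list per category."""
    skip = {name: [] for name in HEADINGS.values()}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment, or blank line (ASSUMPTION: blanks ignored)
        if line in HEADINGS:
            current = HEADINGS[line]
        elif current is not None:
            skip[current].append(line)
        # ASSUMPTION: entries before the first heading are dropped.
    return skip

rc_text = """\
# unwanted objects
_SKIP_OBJECTS_
coll_open.gif
coll_clos.gif
# unwanted entry pages
_SKIP_ENTRIES_
/
identify.gif
text.gif
info.gif
"""
skip = parse_rc(rc_text.splitlines())
print(skip)
```

Whether the resulting entries are then used as titles or as regular expressions depends on the -regex parameter.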
The command file
When a series of statistics is to be generated with most of the
parameters remaining constant, the user does not have to type all the
command-line parameters again and again. Instead, the
parameters may be put into a command file; calling the script with the
parameter -cmd <file> makes it read the parameters from
that file.
Additional parameters given on the command line still
override those in the file.
Format:
- There must not be more than one parameter per line.
- The parameters must not be abbreviated.
- Lines starting with '#' are ignored.
- Parameters which are not accepted in this file are: -from, -to, -cmd, -test, -v
Example:
# what a command-file might look like
-dir /usr3/users/hgsystem/log
-hname "My own Server"
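The format rules and the override behaviour can be sketched like this. The code is an illustration in Python, not hwlogstat's own implementation. How the script treats quoted values is not stated here, so the sketch assumes shell-like quoting (handled by shlex); the names parse_command_file and merge, and the value 'Mirror', are made up for the example.

```python
import shlex

# Parameters that are not accepted in a command file.
NOT_ALLOWED = {"-from", "-to", "-cmd", "-test", "-v"}

def parse_command_file(lines):
    """Read one parameter per line; '#' lines are ignored."""
    params = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # ASSUMPTION: values are quoted shell-style.
        tokens = shlex.split(line)
        name = tokens[0]
        if name in NOT_ALLOWED:
            raise ValueError(f"{name} is not accepted in a command file")
        # Switches without a value are stored as True.
        params[name] = " ".join(tokens[1:]) or True
    return params

def merge(file_params, cli_params):
    """Command-line parameters override those from the file."""
    merged = dict(file_params)
    merged.update(cli_params)
    return merged

cmd_file = [
    "# what a command-file might look like",
    "-dir /usr3/users/hgsystem/log",
    '-hname "My own Server"',
]
file_params = parse_command_file(cmd_file)
settings = merge(file_params, {"-hname": "Mirror"})
print(file_params)
print(settings)
```

Applying the file first and the command line second is what makes the command-line value win when both define the same parameter.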
What is necessary to run hwlogstat?
- hwlogstat is a perl script and takes advantage of the
features new to perl 5.
So, the first prerequisite is that perl
5 is installed on your system.
- The graphics are produced by Gnuplot, which is called by
the script.
Since Gnuplot does not produce gif output (at least my
version 3.5 (pre 3.6) does not), ppmtogif is called to do the
translation.
Both tools, of course, have to be installed only if graphic output is
desired.
- Finally, insertion of the HTML document into the database is done by
hginstext, the graphics are inserted by hginsdoc.
If you have these installed on your system, too, nothing can keep you
from working with hwlogstat.
History
Changes since 1.10
- Daily stats are skipped if a single day is analyzed.
- Skipping objects, hosts and entry-pages is possible again.
- Domains all in lowercase.
- Transmission sorted by #bytes.
- -rc to define an rc-file
- -cmd for commandfile
- -test to show settings
Changes since 1.00
- HTML output goes to stdout if no parent collection is given.
- Wherever possible, hyperlinks are added to objects within the database.
- Graphics appear with lines instead of impulses.
- Timeinfo headers changed.
- Changed default-logfile name to wave.log
Known Bugs
Of course, there are some minor bugs, but none of them is really
serious.
- Gnuplot
It is a great graphics tool, but sometimes it behaves a bit
strangely.
I place two plots on one screen, and although they start
at the same x-position, the second plot is moved one unit to the right
- but only on some architectures. There is a simple remedy for this -
forcing an invisible plot at (0,0) - but this produces faulty
behaviour on other architectures.
So far, I have not found an
elegant solution.
Author
Alfons Schmid (aschmid@iicm.edu) - September 25, 1996