Anonlog is a program to "anonymize" web server logfiles. This means
that sensitive details are encoded so that you can send your logfiles to
someone else without them being able to see confidential data.
This documentation describes version 0.91beta of the program. See the
anonlog home page for the latest version.
Anonlog is a program from the author of analog.
Anonlog is copyright (c) Stephen R. E. Turner 2000, and is licensed under
version 2 of the GNU General Public License. This licence allows you to modify
and redistribute the program under certain conditions - principally that the
modified or redistributed program is licensed in the same way as the original.
Since the program is free software, it is distributed without any warranty,
even the implied warranties of merchantability or fitness for a particular
purpose.
See the file Licence.txt for the full licence conditions. (If this file is
missing, see http://www.gnu.org/copyleft/gpl.html or get a copy from the
Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307, USA.)
Anonlog anonymizes the following items from the original logfile: filenames
(but preserves the extension), visitors' hostnames, referrers, usernames and
virtual hostnames. (Some of these items, especially the last two, may not be
present in every logfile.)
The following items are left unchanged: date and time of each request, HTTP
status code, file size, processing time and browser name.
Search arguments on files and referrers are deleted, and replaced with an
indication that they were present.
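As an illustration of the filename rules above, here is a minimal Python sketch. It is not anonlog's own code (anonlog is written in Perl). The function name anonymize_path, the placeholder names name1, name2, ... and the marker text "query-removed" are all hypothetical, chosen only to show an extension being preserved and a search argument being replaced by an indication that it was present.

```python
import posixpath

# Hypothetical marker; the text anonlog really writes in place of a search
# argument is not specified in this documentation.
QUERY_MARKER = "query-removed"

def anonymize_path(path, mapping):
    """Rename every path component but keep each file extension.

    `mapping` remembers earlier translations, so the same original name
    always receives the same replacement within one run.
    """
    path, had_query, _query = path.partition("?")
    parts = []
    for component in path.split("/"):
        if component == "":
            parts.append("")            # keep leading and trailing slashes
            continue
        stem, ext = posixpath.splitext(component)
        if stem not in mapping:
            # Placeholder names (an assumption); anonlog uses real words.
            mapping[stem] = f"name{len(mapping) + 1}"
        parts.append(mapping[stem] + ext)
    result = "/".join(parts)
    if had_query:
        result += "?" + QUERY_MARKER    # record only that a query was present
    return result

mapping = {}
print(anonymize_path("/docs/report.html?user=bob", mapping))
print(anonymize_path("/docs/picture.gif", mapping))
```

Both output lines share the same translated directory name, because the mapping is reused between calls.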
Anonlog can read logfiles in several different commonly-used formats. The
anonymized logfile is written to a new file.
The translation uses real words where possible. Furthermore, items are
translated "hierarchically" - for example, if
maths.cam.ac.uk became lemon.bee.to.de then
statslab.cam.ac.uk might become greatest.bee.to.de. (Whether the new
names should be the same length as the old ones is configurable.)
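The hierarchical idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anonlog's actual algorithm: the word pool, the helper make_pool and the function translate_host are hypothetical. Each label is translated according to the whole suffix it ends, working leftwards from the top-level domain, so two hosts that share a suffix also share the translation of that suffix.

```python
import random

# Hypothetical word pool; anonlog draws its words from the dictionary file.
WORDS = ["lemon", "bee", "to", "de", "greatest", "pride", "sister", "letter"]

def make_pool(words=WORDS):
    """Return the words in a random order, so each run translates differently."""
    pool = list(words)
    random.SystemRandom().shuffle(pool)
    return pool

def translate_host(host, table, pool):
    """Translate a hostname label by label, keyed on the suffix so far."""
    labels = host.lower().split(".")
    translated = []
    # Walk from the top-level domain leftwards: uk, ac.uk, cam.ac.uk, ...
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])
        if suffix not in table:
            # Each word is handed out once; raises IndexError when the
            # pool is used up (a real tool would need a fallback).
            table[suffix] = pool.pop()
        translated.append(table[suffix])
    return ".".join(reversed(translated))

table, pool = {}, make_pool()
print(translate_host("maths.cam.ac.uk", table, pool))
print(translate_host("statslab.cam.ac.uk", table, pool))
```

The two printed names differ only in their first label. Because the pool is shuffled afresh on every run, the words chosen change from run to run.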
The key to the translation can be written to a file if you want.
Note that running the program on the same file twice will
not produce the same results. (This is a deliberate security
feature.)
If you run analog on the original and the
anonymized logfiles, the results should be almost exactly analogous (with minor
differences due to different parsing routines) except for analog's Search Word
Report and Search Query Reports, which will be lost, and the Organisation
Report, which will be wrong.
Anonlog is written in Perl. To run it you need at least version 5.004 of Perl.
If you don't have Perl, or your version is too old, download it free of charge
from http://www.perl.org/ or http://www.activestate.com/Products/ActivePerl/ .
Then change settings in the configuration file anonlog.cfg (see below) and
type perl anonlog.pl to run the program.
The configuration file is called anonlog.cfg. In this file, you can
control the behaviour of anonlog.
In the configuration file, anything following a # is a comment.
Other lines follow the format "variable = value".
Here is the full list of variables which you can set. You will want to set at
least the first three.
(There is no reason to declare the same variable more
than once, but if you do, only the last occurrence will take effect.)
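To make the file format concrete, here is a small Python sketch of a reader for "variable = value" lines. It is an illustration only, not anonlog's own parser. The function parse_config and the sample file names are hypothetical, and the sketch assumes that a # never appears inside a value.

```python
def parse_config(text):
    """Parse 'variable = value' lines, ignoring anything after a #."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop the comment, if any
        if not line or "=" not in line:
            continue                            # blank or malformed line
        name, value = line.split("=", 1)
        # Later assignments overwrite earlier ones: the last one wins.
        settings[name.strip()] = value.strip()
    return settings

# Hypothetical example settings, using variable names from the list below.
SAMPLE = """\
# settings for one run
logfile = access.log      # the logfile to be anonymized
newlog = anon.log
matchlength = 0
matchlength = 1           # declared twice: only this one takes effect
"""

print(parse_config(SAMPLE))
```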
- logfile
- The logfile to be anonymized. Unix users might like to set
logfile=- for stdin.
- newlog
- Where to write the translated logfile. Unix users might like to set
newlog=- for stdout.
- servernames
- Names by which your server is known (a comma-separated list). These are
treated specially in the referrer field. For referrers from these
servers, the hostname is left un-anonymized, and the filename is
translated as a local filename.
- logformat
- Anonlog can parse logfiles in several commonly-used formats. Normally it
can detect the format of your logfile, but if it has trouble you can
coerce it with this command. Legal values are common,
combined, extended, ms-extended (a
buggy version of extended in IIS) and iis (IIS native
format).
- dictionary
- A file from which to select words for use in the translated logfile. One
is supplied with the program (it's all the words from Jane Austen's
Pride and Prejudice, in case you're wondering), but any
text file will do.
- translations
- Where to write the key to the translations. Leave blank if you don't
want it to do this.
- unchfiles
- Filenames to leave alone (a comma-separated list). It is convenient to
leave index.html (or equivalent) alone so that
/dir/index.html is still the same as /dir/ after
the anonymization.
- matchlength
- Whether the new names should be the same length as the old ones (1 for
yes, 0 for no). The default is 0 for maximum security. Setting this to 1
tends to lead to shorter and more readable output.
- case_sensitive
- Whether filenames on your server are case-sensitive or not (1 for yes, 0
for no). This is normally 1 on a Unix machine, 0 on a Windows machine.
- usercase_sensitive
- The same for usernames.
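The interaction between unchfiles and case_sensitive can be pictured with the following Python sketch. It is not taken from anonlog. The functions lookup_key, placeholder_name and translate_filename are hypothetical, and it is an assumption of this sketch that the unchfiles comparison follows the same case rule as the filenames.

```python
import itertools

_counter = itertools.count(1)

def placeholder_name():
    """Hypothetical stand-in for choosing a new anonymized name."""
    return f"name{next(_counter)}"

def lookup_key(name, case_sensitive):
    """Fold case when names that differ only in case are the same file."""
    return name if case_sensitive else name.lower()

def translate_filename(name, table, unchfiles, case_sensitive,
                       make_name=placeholder_name):
    key = lookup_key(name, case_sensitive)
    # Names listed in unchfiles pass through untouched.
    if key in {lookup_key(u, case_sensitive) for u in unchfiles}:
        return name
    if key not in table:
        table[key] = make_name()
    return table[key]

table = {}
# With case_sensitive = 0, both spellings receive the same new name.
print(translate_filename("Report.html", table, ["index.html"], False))
print(translate_filename("report.HTML", table, ["index.html"], False))
# index.html is listed in unchfiles, so it is left alone.
print(translate_filename("index.html", table, ["index.html"], False))
```

The same key-folding idea would apply to usernames under usercase_sensitive.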
The default configuration file is set for maximum security. You can trade a
little security for increased functionality by setting translations and
changing matchlength to 1.
The program needs to make sure that no two original names are given the same
anonymized name. This is a memory- and processor-intensive task, so the
program does not run very fast - I'm processing about 10,000 logfile lines per
minute on a 266 MHz chip with 96 MB of RAM.
If you want to increase the speed, you can try unsetting
dictionary (although this will make the output substantially less
readable). If you keep the dictionary, leaving matchlength at 0 may
help.
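The bookkeeping behind unique names can be sketched as follows. This is a rough Python illustration, not anonlog's implementation and not a statement about where its time is actually spent. The functions load_words and new_name are hypothetical, as is the fallback to random letters once the dictionary runs out.

```python
import re
import secrets
import string

def load_words(text):
    """Collect the distinct lowercase words of a text, grouped by length."""
    by_length = {}
    for word in set(re.findall(r"[a-z]+", text.lower())):
        by_length.setdefault(len(word), []).append(word)
    return by_length

def new_name(original, by_length, used, matchlength=False):
    """Return a name that is not yet in `used`, and record it there."""
    if matchlength:
        candidates = by_length.get(len(original), [])
    else:
        candidates = [w for words in by_length.values() for w in words]
    free = [w for w in candidates if w not in used]
    if free:
        name = secrets.choice(free)
    else:
        # Dictionary exhausted: fall back to random letters, lengthening
        # the name if short ones keep colliding.
        length = len(original) if matchlength else 8
        attempts = 0
        while True:
            name = "".join(secrets.choice(string.ascii_lowercase)
                           for _ in range(length))
            if name not in used:
                break
            attempts += 1
            if attempts % 100 == 0:
                length += 1
    used.add(name)
    return name

words = load_words("It is a truth universally acknowledged")
used = set()
print([new_name("index", words, used) for _ in range(8)])
```

Every call has to consult the set of names already handed out, and that set grows with the number of distinct items in the logfile.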
- Version 0.91beta (23-Jun-2000)
- First public version. Only trivial changes.
- Version 0.9beta (30-May-2000)
- First version released to sponsor.
I welcome feedback on this program. Contact me at analog-author@lists.isite.net.
Many thanks to an anonymous sponsor for funding the
development of this program.
Stephen Turner
E-mail: analog-author@lists.isite.net
23-Jun-2000