Readme for analog 6.0

Although analog is free software, its distribution and modification are covered by the terms of the GNU General Public License. You are not required to accept this licence, but nothing else gives you permission to modify or distribute the program. Analog comes with no warranty.
Although analog is free, if you like it, please consider making a donation towards its development. Thank you.
This Readme describes analog 6.0. For the latest version of analog, see the analog home page. For examples of the output see
This is a version of the Readme in one page. If you're reading it online, you might prefer the version on several smaller pages. Beginners should start with the licence, followed by the section on Starting to use analog. There is an index at the end of this document.
You might also find the How-To's helpful; these are descriptions by other authors of how to use analog for particular tasks.
If you log in to your ISP's machine from your home machine, you have two options. If you have the right permissions, you can run analog on your ISP's machine. Otherwise, you can download (e.g., ftp) the logfiles from their machine to yours, and then run analog on your machine.
Once you've downloaded the right version of analog for your computer from the analog home page (or a mirror site), you need to know how to set it up and run it. This is very easy, but the instructions are slightly different depending on which platform you're using.
If you can't manage to set up analog after reading the instructions, send a message to the analog-help mailing list.
LOGFILE logfilename # to set where your logfile lives

The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet. There's a sample logfile supplied with the program.
There are already some configuration commands to get you started in the configuration file, but there are lots of others available. You can find the most common ones in the section on basic commands later in the Readme, and you can read about all of them in the section on customising analog. There are also some sample configuration files in the examples folder.
There is another way to give options, via command line arguments. You'll see these mentioned in this Readme from time to time, but MacOS before MacOS X doesn't have a command line, so ignore these unless you've downloaded the Darwin version of analog.
If you want to compile your own version of analog (it's written in C), or just to read the source code, it's available from the analog home page. (It's the same source code for all versions.)
Here is the really short summary:
There's also a How-To written by Simon Handfield, which explains how to get started in more detail with lots of pictures.
(Some unzip programs are broken, and do not create folders when they should. If you don't have a folder called lang inside the analog folder, create one and put all the files called *.lng and *.tab into it.)
There are two ways of running analog. You can either run it from Windows (by single-clicking or double-clicking on its icon, depending on your setup), or you can run it from the DOS command prompt (under Start-Programs). If you run it from Windows, it will create a DOS window to run in. When it's finished, it will produce an output file called Report.html and some graphics; and a file called errors.txt which contains any errors there might have been. The first time you run it, this will all happen almost instantly. This is not a bug. For help in interpreting the output, see What the results mean.
LOGFILE logfilename # to set where your logfile lives

The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet. There's a sample logfile supplied with the program.
There are already some configuration commands to get you started in the configuration file, but there are lots of others available. You can find the most common ones in the section on basic commands later in the Readme, and you can read about all of them in the section on customising analog. There are also some sample configuration files in the examples folder.
If you run analog from the DOS command prompt, there is another way to give options, via command line arguments, given on the command line after the program name. These are just shortcuts for configuration file commands. You can use the command line arguments if you run analog from a batch file too.
If you want to compile your own version of analog (it's written in C), or just to read the source code, it's available from the analog home page. (It's the same source code for all versions.)
If you're not using one of the platforms for which a precompiled version is available, you'll have to compile your own version from the source. But don't worry -- it's written in standard C throughout, so it will compile out of the box on most platforms. (The source code is the same for all platforms.)
First, change to the src/ directory.
Then look at the file anlghead.h, and see if there's anything you want to edit.
When you have done that, you need to compile the program. How to do that depends on which operating system you're using.
Compiling under Unix. Type

make

within the src/ directory to compile the program. On most systems, that will be sufficient, and the compiled program should appear in the parent directory. If it fails to compile, have a look in the Makefile to see if there's anything that you need to change to suit your configuration, and try again; the Makefile explains what to do. In particular, Solaris 2 (SunOS 5+) users need to change the LIBS= line.
(Experts can pass some arguments in on the make command line instead of by editing anlghead.h: e.g.

make DEFS='-DLANGDIR=\"/usr/etc/apache/analog/lang/\"'

This is useful if you have a script to compile analog.)
If you haven't got gcc, you will need to change the compiler in the Makefile: try acc or cc instead.
Compiling under OpenVMS. You can find OpenVMS build scripts within the src/build directory. Unzip them within the src directory. Then to build Analog interactively from the command line, type
$ @ Build_Analog

or to submit the Build_Analog procedure to a batch queue, type

$ Submit /NoPrint /Keep Batch.com

The command procedure will use MMS (or MMK) if it is available; otherwise it will compile everything from raw command procedures.
Compiling under Acorn RiscOS. The Makefile can be found in the src/build directory, although at this point it has not been updated for version 5 of analog. You will have to make directories called C, H and O, and move the source files into the appropriate directories: e.g., alias.c must be renamed C.alias. And you will find that there are some filenames in the header file anlghead.h that you want to change to fit into the RiscOS directory structure.
Compiling under OS/2. To compile analog for OS/2, you will need the EMX package. You should edit the Makefile to have OS=OS2 and LIBS=-lsocket. Then after editing anlghead.h and running Make, you need to run the command
EMXBIND -b ANALOG

to generate the analog.exe executable.
Then type

analog

to run the program. (Or ./analog if for some reason . isn't in your $PATH.)
You can configure analog by putting commands in the configuration file, which is called analog.cfg by default. Two commands you will need straight away are
LOGFILE logfilename # to set where your logfile lives
OUTFILE outputfile.html # to send the output to a file instead of the screen

The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet. There's a sample logfile supplied with the program. For help in interpreting the output, see What the results mean.
There are already some configuration commands to get you started in the configuration file, but there are lots of others available. You can find the most common ones in the section on basic commands later in the Readme, and you can read about all of them in the section on customising analog. There are also some sample configuration files in the examples directory.
There is one other way to give options to analog, via command line arguments, given on the command line after the program name. These are just shortcuts for configuration file commands.
The following section is a technical (i.e., dull but important) one on the
Then there's documentation on all the configuration commands in the following categories. Analog has over 200 configuration commands and over 40 command line options, so sometimes these sections turn into lists of commands. But here's where you find out everything you can do with analog.

Later there's an index of all the commands and topics, and also a quick reference containing the syntax of all the commands and examples.
LOGFILE my_logfile
OUTFILE output.html

where, of course, you should substitute the names of the files you want to use. The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet, so you may have to fetch it yourself first. You can read several logfiles by giving several logfile commands, or by giving a comma-separated list, or by using wildcards in the logfile name. So, for example, if you use the commands
LOGFILE new1.log,old*.log
LOGFILE new2.log

analog will analyse the logfiles new1.log, new2.log, and all the old logfiles. Analog will recognise logfiles in several different formats. You can read more about this in the section on Choosing a logfile.
HOSTNAME "Spam Widgets Inc."
HOSTURL http://www.spam-widgets.com/
If you have broken images in the output instead of graphs, you need to say in which directory on your server the images are stored. You do this by a command like
IMAGEDIR /analog/images/

(This is just put in the <img> tags in the output page, so it's the URL of a directory, not the name of the directory on your disk. The images are distributed with the program; you will have to move them to whichever directory you choose.)
MONTHLY ON # one line for each month
WEEKLY ON # one line for each week
DAILYREP ON # one line for each day
DAILYSUM ON # one line for each day of the week
HOURLYREP ON # one line for each hour of the day
GENERAL ON # the General Summary at the top
REQUEST ON # which files were requested
FAILURE ON # which files were not found
DIRECTORY ON # Directory Report
HOST ON # which computers requested files
ORGANISATION ON # which organisations they were from
DOMAIN ON # which countries they were in
REFERRER ON # where people followed links from
FAILREF ON # where people followed broken links from
SEARCHQUERY ON # the phrases and words they used...
SEARCHWORD ON # ...to find you from search engines
BROWSERSUM ON # which browser types people were using
OSREP ON # and which operating systems
FILETYPE ON # types of file requested
SIZE ON # sizes of files requested
STATUS ON # number of each type of success and failure

The full list of reports is in the section on Configuring the output. Some reports, for example the Referrer, Browser and Operating System Reports, will only appear if your web server has been configured to record the necessary data in its logfiles.
You can configure lots of other things about each report, such as how many rows are listed, which columns are included, and how the reports are sorted. For example, the command
REQINCLUDE pages

tells analog only to list pages, rather than all files, in the Request Report, and
REQFLOOR 10r

tells analog to include in the Request Report all files with at least 10 requests. You can read a summary of all the reports and the commands which control them in the section on Analog's reports.
LANGUAGE FRENCH

will give you the output in French. The available languages at the moment include ARMENIAN, BASQUE, BULGARIAN, CATALAN, SIMP-CHINESE (GB2312), TRAD-CHINESE (Big5), CZECH, DANISH, DUTCH, ENGLISH, US-ENGLISH, FINNISH, FRENCH, GERMAN, HUNGARIAN, INDONESIAN, ITALIAN, JAPANESE, KOREAN, LATVIAN, NORWEGIAN (Bokmål), NYNORSK, POLISH, PORTUGUESE, BR-PORTUGUESE, RUSSIAN, SERBIAN, SLOVAK, SLOVENE, SPANISH, SWEDISH, TURKISH and UKRAINIAN.
The following languages were available for previous versions of analog, but have not yet been translated for version 5: BOSNIAN, CROATIAN, GREEK, ICELANDIC, LITHUANIAN and ROMANIAN. As and when they are translated, they will be added to the analog home page. See the section on Configuring the output for how to download, or even translate, new languages.
As I said, these are only a few of the commands available. To find out about all the commands, you'll have to read the remaining sections of the Readme, starting with a short section on the syntax of configuration commands.
CONFIGFILE other.cfg

The commands in the other configuration file are read immediately, in order. The program then continues reading the first configuration file where it left off. Note that reading in several configuration files does not produce several output pages, but a single output page based on all the options.
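The reading order can be pictured with a small Python model. It is only an illustration, not analog's parser: the in-memory `files` dictionary stands in for files on disk, and the name `expand_config` is hypothetical. Each CONFIGFILE line is replaced by the commands of the file it names, and reading then carries on where it left off, giving one combined list of commands (and so one output page).

```python
def expand_config(name, files, depth=0):
    """Return the commands of config file `name`, with any CONFIGFILE
    command replaced in place by the commands of the file it names."""
    if depth > 10:  # this sketch's guard against a file that includes itself
        raise RecursionError("CONFIGFILE nesting too deep")
    commands = []
    for line in files[name]:
        keyword, _, argument = line.partition(" ")
        if keyword.upper() == "CONFIGFILE":
            # Read the other file immediately, then continue with this one.
            commands.extend(expand_config(argument.strip(), files, depth + 1))
        else:
            commands.append(line)
    return commands


files = {
    "analog.cfg": ["LOGFILE access.log", "CONFIGFILE other.cfg", "OUTFILE report.html"],
    "other.cfg": ["HOSTNAME (Spam Widgets Inc.)", "MONTHLY ON"],
}
print(expand_config("analog.cfg", files))
# ['LOGFILE access.log', 'HOSTNAME (Spam Widgets Inc.)', 'MONTHLY ON', 'OUTFILE report.html']
```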
You can also include another configuration file from the command line by using a command like +gother.cfg. (Note that there is no space between +g and the filename; this is true of all command line arguments.) But note that reading an alternative configuration file does not stop the default configuration file (usually analog.cfg) being read as well. To do that you have to specify -G as well as the +g command. This is because if you want several different configurations, it's most convenient to put all the common options in analog.cfg, and options specific to each configuration in a separate file. Then the +g command line option will read both those files.
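As a rough model of which configuration files end up being read with +g and -G, here is a Python sketch. The function name `config_files_read` is hypothetical, and the order (default file first, then any +g files) is an assumption of the sketch rather than something stated above.

```python
def config_files_read(args, default="analog.cfg"):
    """List the configuration files read for the given command line arguments.

    +gFILE adds FILE (no space after +g); -G stops the default file being read.
    Assumption of this sketch: the default file is read before any +g files.
    """
    extra = [arg[2:] for arg in args if arg.startswith("+g")]
    use_default = "-G" not in args
    return ([default] if use_default else []) + extra


print(config_files_read(["+gother.cfg"]))        # ['analog.cfg', 'other.cfg']
print(config_files_read(["+gother.cfg", "-G"]))  # ['other.cfg']
```

This is why keeping the common options in analog.cfg and only the differences in each +g file works: both are read unless -G is given.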
If the name of a configuration file given in a CONFIGFILE command doesn't include a directory, it will be looked for wherever analog expects to find its configuration files. (This location is a compile-time option.) For example, in the Windows version it would be in the same folder as the analog executable. This applies to the default and mandatory configuration files as well. But configuration files given with +g are relative to the current directory at the time you run the program.
In the Mac version, you can start up analog with a particular configuration file instead of the default one by dragging the configuration file onto the analog icon. The file must start with "# ".
You can also specify any configuration command on the command line even if it doesn't have a command line abbreviation, by use of the +C command. (NB The C must be upper case.) For example, +C"UNCOMPRESS *.gz gzcat" will include that command.
DAILYSUM OFF # We don't want a Daily Summary
DAILYREP "ON" # We want a full Daily Report instead
HOSTNAME (Spam Widgets Inc.) # Spaces, so quotes or brackets needed
LOGFILE logfile1.log,\
logfile2.log # This line and the previous one are one command

Generally later commands override earlier ones if you can have only one of that thing (e.g., for the OUTFILE), or supplement them if you can have several (e.g., for the LOGFILE, because you can read several logfiles). Apart from that, the order of commands doesn't matter, except that LOGFORMAT and LOGTIMEOFFSET commands must come earlier in the same configuration file than the LOGFILE to which they refer.
analog -settings [other options]

from the command line, or include SETTINGS ON in the configuration commands. Then instead of running normally, analog will just tell you what the values of all the variables will be, based on the defaults in anlghead.h and anlghea2.h, the configuration commands, and the command line options. If you're on Unix or Windows, remember that you can send the output to a file with
analog -settings > file

Also, analog -version will just give the version number.
LOGFILE logfilename

or just to put the logfile name on the command line without any arguments, e.g., analog logfilename. In the Mac version, you can also analyse a particular single logfile by dragging it onto the analog icon. All logfiles must be within your computer's file system (on disk, or at least mounted under Unix, or on a mapped drive under NT) -- analog won't use FTP or HTTP to fetch them from the internet.
A - sign or the word stdin is interpreted as standard input: this is useful on Unix systems for constructing pipes. There is also an optional second argument to the LOGFILE command which is explained below.
You can have several LOGFILE commands. You can include wildcards in the logfile name (but not necessarily in the directory name: this is system-dependent), and you can use a list of logfiles separated by commas (without spaces). So the following commands would tell analog to read logfile1, c:\logs\logfile2, and all files ending in .log:
LOGFILE logfile1,*.log
LOGFILE c:\logs\logfile2

Or if you were on a Mac, you might use something like

LOGFILE "Hard Drive:Internet Applications:Analog:Logs:*"

You can also use the special command

LOGFILE none

to erase the list of logfiles specified so far.
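The way successive LOGFILE commands build up the list can be modelled in a few lines of Python. This is a sketch, not analog's code: `collect_logfiles` is a hypothetical name, and wildcards are matched with Python's `fnmatch` against a supplied list of filenames instead of the real file system. Only the behaviour described above (comma-separated lists, wildcards, and LOGFILE none) is modelled.

```python
from fnmatch import fnmatch


def collect_logfiles(arguments, available):
    """Apply a sequence of LOGFILE arguments and return the resulting list.

    arguments: the argument of each LOGFILE command, in order.
    available: filenames that exist (stands in for the disk in this sketch).
    """
    logfiles = []
    for argument in arguments:
        if argument == "none":
            logfiles = []  # LOGFILE none erases the list so far
            continue
        for name in argument.split(","):  # comma-separated, no spaces
            logfiles.extend(sorted(f for f in available if fnmatch(f, name)))
    return logfiles


disk = ["new1.log", "new2.log", "old1.log", "old2.log", "notes.txt"]
print(collect_logfiles(["new1.log,old*.log", "new2.log"], disk))
# ['new1.log', 'old1.log', 'old2.log', 'new2.log']
```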
If the name of a logfile in a LOGFILE command doesn't include a directory, it will be looked for wherever analog expects to find logfiles. (This location is built in when the program is compiled.) For example, on Windows it would be in the same folder as the analog executable. But logfile names given on the command line are within the current directory.
You can also include the date in the LOGFILE name, by using the following codes.
%D date of month
%m month name, in English
%M month number
%y two-digit year
%Y four-digit year
%H hour
%n minute
%w day of week, in English

So for example,
LOGFILE access_log%Y%M.log

will look for the logfile access_log200109.log, if it's September 2001. The date used is actually the TO date if one was specified, and otherwise the time of the start of the program. So for example, you can look at all of last month's logfiles with the commands
TO -00-0131 # to end of last month
LOGFILE access_log%Y%M??.log # finds access_log200108??.log in Sep 2001
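The date codes can be illustrated with a small Python sketch that expands them for a given date. `expand_date_codes` is a hypothetical name, not part of analog. The two-digit month number follows the access_log200109.log example above; for the other codes the text doesn't say, so this sketch assumes numbers are zero-padded to two digits and that %m and %w give three-letter English abbreviations. Check analog's own output before relying on those assumptions.

```python
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # datetime.weekday() order


def expand_date_codes(pattern, when):
    """Replace analog-style date codes in a logfile name (a sketch).

    Assumption: numbers are two-digit zero-padded, and %m / %w are
    three-letter English abbreviations. Other characters (e.g. ?) pass through.
    """
    codes = {
        "D": f"{when.day:02d}",
        "m": MONTHS[when.month - 1],
        "M": f"{when.month:02d}",
        "y": f"{when.year % 100:02d}",
        "Y": f"{when.year:04d}",
        "H": f"{when.hour:02d}",
        "n": f"{when.minute:02d}",
        "w": DAYS[when.weekday()],
    }
    out, i = [], 0
    while i < len(pattern):
        if pattern[i] == "%" and i + 1 < len(pattern) and pattern[i + 1] in codes:
            out.append(codes[pattern[i + 1]])
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


print(expand_date_codes("access_log%Y%M.log", datetime(2001, 9, 15)))
# access_log200109.log
print(expand_date_codes("access_log%Y%M??.log", datetime(2001, 8, 31)))
# access_log200108??.log
```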
The LOGFILE commands are cumulative, except that any logfiles on the command line or in configuration files specified on the command line override any in the default configuration file or configuration files loaded from there, and are themselves overridden by any in the mandatory configuration file or configuration files loaded from there. Usually you don't need to worry about this, and it will do what you expect! (Actually I should have said "logfiles or cache files" -- but we'll get on to that later).
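The precedence rule can be pictured as three cumulative lists, one per source, where a non-empty list from a higher-priority source replaces the lower ones. Here is a simplified Python sketch with a hypothetical function name; it ignores cache files.

```python
def effective_logfiles(default_cfg, command_line, mandatory_cfg):
    """Pick the logfile list that wins, following the precedence above.

    Each argument is the cumulative list of logfiles from one source:
    the default configuration file (and files loaded from there), the
    command line (and configuration files named there), and the
    mandatory configuration file (and files loaded from there).
    """
    if mandatory_cfg:
        return mandatory_cfg
    if command_line:
        return command_line
    return default_cfg


print(effective_logfiles(["access.log"], [], []))             # ['access.log']
print(effective_logfiles(["access.log"], ["today.log"], []))  # ['today.log']
```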
If your logfile is not in one of the standard formats, you will probably still be OK, because it is possible to tell analog about other formats using a LOGFORMAT command. This is explained in the next section. But most users don't ever need to know about this because they have logfiles in a standard format. So the best thing to do is just to try analysing your logfile and see if analog will understand it. If it does, you don't need to worry about LOGFORMATs.
If analog can't understand your logfile, it will warn you that it can't detect the format, or possibly that it found a lot of corrupt lines. There are basically five reasons why this might happen:
LOGFILE mydomain.log http://www.mydomain.com

would translate a filename /file.html in mydomain.log to http://www.mydomain.com/file.html. (If you only have logfiles from one server, and you just want the prefix so that you can host the output on a different server, then you probably want the BASEURL command instead.)
Note that because this actually changes the name of the file, any FILEINCLUDE, FILEEXCLUDE or FILEALIAS command will have to refer to the new name, including the prefix.
If you are using this command to combine logfiles from several different virtual hosts, then the Virtual Host Report doesn't tell you about the different virtual hosts. The virtual host name has just become part of the filename. So you want to look in the Directory Report instead. (And you will probably want to use the SUBDIR command as well.)
If the logfile contains the name of the virtual host on each line, then the argument can contain a %v, and the name of the virtual host will be inserted at that point. If %v is included and the logfile line doesn't have a virtual host, then that line will be marked as corrupt.
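Here is a minimal Python sketch of the prefixing just described, including the %v substitution. The function name is hypothetical, and returning None stands in for analog marking the line as corrupt.

```python
def apply_prefix(prefix, filename, vhost=None):
    """Prepend a LOGFILE prefix to a requested filename.

    A %v in the prefix is replaced by the line's virtual host; if the
    prefix needs %v but the line has no virtual host, return None
    (the line would be treated as corrupt).
    """
    if "%v" in prefix:
        if not vhost:
            return None
        prefix = prefix.replace("%v", vhost)
    return prefix + filename


print(apply_prefix("http://www.mydomain.com", "/file.html"))
# http://www.mydomain.com/file.html
print(apply_prefix("http://%v", "/file.html", "www.spam-widgets.com"))
# http://www.spam-widgets.com/file.html
```

Note that the result is the new filename, which is why FILEINCLUDE, FILEEXCLUDE and FILEALIAS must then refer to the prefixed name.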
You need to supply the types of file that you want to uncompress in a comma-separated list, together with the name of a command that will uncompress the files to standard output (rather than to a file). For example, on Unix you might use
UNCOMPRESS *.Z "/usr/bin/uncompress -c"

whereas on Windows NT, you might use
UNCOMPRESS *.Z ("c:\Program Files\uncompress\uncompress" -c)
If analog determines that a logfile which it's uncompressing isn't wanted for the analysis, a "broken pipe" error can be reported. This is produced by the uncompressing command and is out of analog's control, but it's harmless.
(Hint: There's nothing to stop you using the UNCOMPRESS command for other types of preprocessing, for example DNS resolution.)
The common logfile format is written by most servers. Its lines look like
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243

(except all on one line). Some versions of Microsoft software have a buggy version of this with an extra quote mark before the HTTP like this:
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ "HTTP/1.0" 200 1243

Analog will understand these, but (as with any two formats) it will reject lines if the format changes half way through.
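To make the common format concrete, here is a Python sketch that splits such a line into its fields with a regular expression. It is only an illustration (analog uses its own parser, not this regex, and `parse_common` is a hypothetical name); the optional quote before the protocol lets it accept the buggy Microsoft variant as well.

```python
import re

# host, identity, user, [date], "method file protocol", status, bytes
COMMON = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<file>\S+) "?(?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_common(line):
    """Return a dict of fields, or None if the line doesn't match."""
    match = COMMON.match(line)
    return match.groupdict() if match else None


good = 'jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243'
buggy = 'jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ "HTTP/1.0" 200 1243'
print(parse_common(good)["file"], parse_common(buggy)["status"])
# /~sret1/ 200
```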
[25/Dec/1998:17:45:35] http://www.site.com/ -> /~sret1/

and the browser (or agent) log looks like
[25/Dec/1998:17:45:35] Mozilla/2.0 (X11; I; HP-UX A.09.05)

In the referrer log, the date can be omitted.
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243 "http://www.site.com/" "Mozilla/2.0 (X11; I; HP-UX A.09.05)"

(except all on one line). If you are using the Apache server, you can generate this with the mod_log_config module, using the Apache command
LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\""
It is usually better to use the combined log than separate logs, because it stores more information in less space.
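The combined format is the common format followed by the quoted referrer and browser, which is why one line carries what would otherwise take three logfiles. A Python sketch of reading it (again an illustration with a regex of our own, not analog's parser):

```python
import re

COMBINED = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

line = ('jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" '
        '200 1243 "http://www.site.com/" "Mozilla/2.0 (X11; I; HP-UX A.09.05)"')
fields = COMBINED.match(line).groupdict()
print(fields["referrer"])  # http://www.site.com/
print(fields["agent"])     # Mozilla/2.0 (X11; I; HP-UX A.09.05)
```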
192.64.25.41, -, 25/12/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10, 2178, 303, 1243, 200, 0, GET, /~sret1/, -,

(except all on one line; and sometimes with four-digit years). However, the format is extremely badly designed, in that the date follows local conventions: in other words, in North America the above example would have the date 12/25/98 instead. Analog will diagnose which form the logfile is in if possible: but if both the date and the month are at most 12, there is no way to tell which format it is. In this case, it will advise you to use the command LOGFORMAT MICROSOFT-NA for North American date format, or LOGFORMAT MICROSOFT-INT for international date format. In some countries, the date will not be in either of these formats, in which case you need to write your own LOGFORMAT command, based on the examples in the next section.
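The reasoning involved can be sketched in Python: look at the first two numbers of every date, and see whether either position ever exceeds 12. This is a simplified illustration with a hypothetical function name, not analog's actual detection code.

```python
def guess_date_order(dates):
    """Classify slash-separated dates as North American (month first),
    international (date first), or ambiguous.

    dates: strings such as "25/12/98" taken from the logfile.
    """
    first_gt_12 = second_gt_12 = False
    for date in dates:
        first, second, _ = (int(part) for part in date.split("/"))
        first_gt_12 |= first > 12
        second_gt_12 |= second > 12
    if first_gt_12 and not second_gt_12:
        return "MICROSOFT-INT"  # the first number must be the date
    if second_gt_12 and not first_gt_12:
        return "MICROSOFT-NA"   # the second number must be the date
    return "ambiguous"          # no way to tell (or the dates are inconsistent)


print(guess_date_order(["25/12/98"]))  # MICROSOFT-INT
print(guess_date_order(["12/25/98"]))  # MICROSOFT-NA
print(guess_date_order(["05/06/98"]))  # ambiguous
```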
There are also various third-party extensions to the Microsoft format to include, for example, the browser and referrer. But they all do it in different ways, so analog can't automatically diagnose them, and again, you need to write a LOGFORMAT command for them.
12/25/98 17:45:35 jay.bird.com host1 Server fred GET /~sret1/ http://www.site.com/ Mozilla/2.0 (X11; I; HP-UX A.09.05) 200 1243 2178

(except all on one line, and with the fields separated by tabs). It suffers from the same problem with ambiguous dates as the IIS logfile (above), so again you might have to use LOGFORMAT WEBSITE-NA or LOGFORMAT WEBSITE-INT, or even have to write your own LOGFORMAT command.
12/25/98 17:45:35 OK jay.bird.com /~sret1/ 1243

with the fields separated by tabs.
If analog finds that the header line is corrupt, it will usually tell you what was wrong with it. The most common problem is that you're not allowed the time without the date or vice versa -- in particular, having the date just at the top of the logfile is not sufficient; you must have it on each line. By default, Microsoft servers produce extended logs with the date only at the top. But if the date changes during the logfile, the server doesn't then write a new date line. This means that missing days or corrupt entries can make analog get a day out in either direction, with no way to rescue or even recognise the situation!
For this reason analog knows that it can't analyse such logfiles safely, so instead it insists that the date should be on every line. There are some programs on the helper applications page to put the date on each line. If you already have such a logfile you might want to use one of these programs, but they have to assume that the date doesn't change during the logfile, so it would be much safer to tell your server to log the date on every line in future.
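For illustration, here is a Python sketch of what such a helper does with an extended log that records the date only in #Date lines. It has exactly the weakness described above: it assumes the date stays the same until the next #Date line. The function name and sample lines are hypothetical; the #Date and #Fields directives are from the W3C extended log format.

```python
def add_date_to_lines(lines):
    """Put a date field on every data line of an extended log whose
    #Fields line has no date, using the most recent #Date directive."""
    current_date = None
    needs_date = False
    out = []
    for line in lines:
        if line.startswith("#Date:"):
            current_date = line.split()[1]  # "#Date: 1998-12-25 17:45:35" -> date part
            out.append(line)
        elif line.startswith("#Fields:"):
            fields = line.split()[1:]
            needs_date = "date" not in fields
            if needs_date:
                fields.insert(0, "date")
            out.append("#Fields: " + " ".join(fields))
        elif line.startswith("#") or not needs_date or current_date is None:
            out.append(line)  # other directives, or nothing to add (yet)
        else:
            # Assumes the date hasn't changed since the last #Date line.
            out.append(current_date + " " + line)
    return out


log = ["#Version: 1.0", "#Date: 1998-12-25 17:45:35",
       "#Fields: time cs-uri", "17:45:35 /~sret1/"]
print(add_date_to_lines(log)[2:])
# ['#Fields: date time cs-uri', '1998-12-25 17:45:35 /~sret1/']
```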
The extended log is described at http://www.w3.org/TR/WD-logfile.html. Its header line looks like
#Fields: date time cs-uri

In the rest of the logfile, the fields can be separated by spaces or tabs. Remember the logfile must contain the date as well as the time on every line -- see above.
There is also Microsoft's attempt at the extended format -- unfortunately they didn't read the spec, so they didn't enclose the browser and referrer in quotes, they replaced spaces in the browser name with +'s, and they put the time taken to serve the request in milliseconds instead of seconds. And there is WebSTAR's attempt which is very nearly right except that they erroneously used the CS-HOST field as the client hostname instead of the server hostname. Analog will understand all of these versions.
Extended logs always record the time in GMT, so you will probably need to use a LOGTIMEOFFSET command to convert to your local timezone.
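The conversion amounts to adding a fixed number of minutes to every timestamp. A Python sketch, assuming the offset is given in minutes as with LOGTIMEOFFSET; `to_local` is a hypothetical name, and a fixed offset like this one can't follow daylight saving changes.

```python
from datetime import datetime, timedelta


def to_local(gmt_stamp, offset_minutes):
    """Shift a 'YYYY-MM-DD HH:MM:SS' GMT timestamp by a fixed number of minutes."""
    gmt = datetime.strptime(gmt_stamp, "%Y-%m-%d %H:%M:%S")
    return gmt + timedelta(minutes=offset_minutes)


print(to_local("1998-12-25 17:45:35", -300))  # 1998-12-25 12:45:35
```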
The WebSTAR format is described at http://www.starnine.com/webstar/docs/ws4manual.3f.html. It has a header line like
!!LOG_FORMAT DATE TIME RESULT URL BYTES_SENT HOSTNAME

In the rest of the logfile, the fields are separated by tabs. The WebSTAR server also records the time in GMT, so again you will probably need to use a LOGTIMEOFFSET command to convert to your local timezone. Some other Mac servers also use the WebSTAR format, or something looking like it. Analog will understand these too.
Finally, the Netscape header line looks like
format=%Ses->client.ip% [%SYSDATE%] "%Req->reqpb.clf-request%" %Req->srvhdrs.clf-status% %Req->srvhdrs.content-length%
The basic command to specify a log format looks like
LOGFORMAT format

We'll discuss what the formats can be in a minute. Or if you are using the Apache server, you will probably find it more convenient to use
APACHELOGFORMAT apacheformat

instead.
The LOGFORMAT and APACHELOGFORMAT commands only apply to logfiles specified with a LOGFILE command later in the same configuration file. So you must put the LOGFORMAT above the LOGFILE to which it refers. If you declare your logfiles on the command line, or drag them onto the app on the Mac, you must use DEFAULTLOGFORMAT or APACHEDEFAULTLOGFORMAT instead. This is so that different logfiles can have different formats, like this:
LOGFILE log0
LOGFORMAT format1
LOGFILE log1
LOGFORMAT format2
LOGFILE log2
LOGFILE log3

In this example, log1 is in format1, log2 and log3 are in format2, and log0 isn't in either format -- analog will try and detect which format it's in.
For example, if your Apache configuration contains

LogFormat "%h %l %u %t %v \"%r\" %>s %b" myformat
CustomLog /var/log/apache/access.log myformat

then your analog.cfg should contain
APACHELOGFORMAT (%h %l %u %t %v \"%r\" %>s %b)
LOGFILE /var/log/apache/access.log

(Use parentheses instead of quotes round the argument if the argument already contains quotes.) Analog understands all Apache log formats, with the exception that it won't parse Apache's "%...{format}t" construction for customised times: if you have this construction, you will have to use ordinary LOGFORMAT instead. (This is because "%...{format}t" is sometimes localised.)
There are format words for all the built-in formats analog knows about. You might need one of these words if your logfile is in a standard format, but analog can't detect which format it's in for some reason; for example, maybe the first line is corrupt; or maybe analog can't tell whether you're using North American or international dates. So for example
LOGFORMAT COMMON

will select common format; you can also have COMBINED, REFERRER, BROWSER, EXTENDED, MICROSOFT-NA (North American date format), MICROSOFT-INT (international date format), WEBSITE-NA, WEBSITE-INT, MS-EXTENDED (Microsoft's attempt at extended format), WEBSTAR-EXTENDED (WebSTAR's version of extended format), MS-COMMON (a buggy version of common format in some versions of Microsoft software), NETSCAPE, WEBSTAR or MACHTTP. All these formats were defined at the end of the previous section. You can also use the special word AUTO to return to automatic detection.
If your logfile is not in one of the recognised formats, you can tell analog about your format using a log format string. You only ever need this if your logfile has lines which are not in one of the standard formats. (And even if it isn't in a standard format, if you're using the Apache web server, you will find APACHELOGFORMAT easier.)
The format string consists of a template for the logfile line, with the various fields and special characters replaced by codes as follows. Please note that these codes are case sensitive -- for example, %b is completely different from %B!
For example, a line like

jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243

(except all on one line) could be represented by the LOGFORMAT command
LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b)

In other words, it's just the sample line but with the hostname replaced by %S, the username by %u etc. (The parentheses are needed because the argument contains spaces.) Or take another example: if you had lines which looked like
Fri 25/12/98 5:45pm, /~sret1/, jay.bird.com, 200, 1243, http://www.site.com, Mozilla/2.0 (X11; I; HP-UX A.09.05)

(all on one line again), you could use the format
LOGFORMAT (%j %d/%m/%y %h:%n%am, %r, %S, %c, %b, %f, %B)

Remember: if you have trouble writing a LOGFORMAT string, you can turn debugging on, and analog will report where each line was corrupt. If you still have trouble, you can write to the analog-help mailing list.
You can also give several formats for the same logfiles, like this:

LOGFORMAT COMMON
LOGFORMAT COMBINED
LOGFILE log1
LOGFORMAT (%j %d/%m/%y %h:%n%am, %r, %S, %c, %b, %f, %B)
LOGFILE log2
LOGFILE log3

Here log1 has lines in both common and combined format, whereas log2 and log3 have lines just in the format in the previous example.
If you specify several formats, analog tries to match each line to the first format first, then if that fails the next, and so on, so the order of the formats is important. Usually you want to specify the most common one first, to minimise the time spent trying to match lines to inappropriate formats.
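Trying the formats in order and keeping the first that matches can be sketched like this in Python. The two regular expressions are simplified stand-ins for common and combined format, not analog's definitions, and `identify` is a hypothetical name.

```python
import re

COMMON = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-)$')
COMBINED = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-) "[^"]*" "[^"]*"$')


def identify(line, formats):
    """Return the name of the first format, in list order, that matches the line."""
    for name, pattern in formats:
        if pattern.match(line):
            return name
    return None  # no format matched: the line would be counted as corrupt


formats = [("COMBINED", COMBINED), ("COMMON", COMMON)]  # most common first
common_line = 'jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243'
combined_line = common_line + ' "http://www.site.com/" "Mozilla/2.0 (X11; I; HP-UX A.09.05)"'
print(identify(combined_line, formats), identify(common_line, formats))
# COMBINED COMMON
```

If most lines are combined, listing COMBINED first means most lines match at the first attempt; a common-format line costs one failed attempt before it matches.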
So let's go back to the first example:
LOGFILE log0
LOGFORMAT format1
LOGFILE log1
LOGFORMAT format2
LOGFILE log2
LOGFILE log3

Here log0 actually gets the default log format. If there are no DEFAULTLOGFORMAT commands, the default will be auto-detection. But if there are DEFAULTLOGFORMAT commands, even in another configuration file, that will be the format of log0.
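The rule for which LOGFORMATs apply to which LOGFILEs can be expressed as a small Python model. It is a sketch inferred from the examples above (consecutive LOGFORMAT commands accumulate, and a LOGFORMAT after a LOGFILE starts a new group) rather than analog's code; the names are hypothetical, and logfiles before any LOGFORMAT get the placeholder 'default'.

```python
def assign_formats(commands):
    """Map each LOGFILE to the list of LOGFORMATs in force for it.

    commands: (keyword, argument) pairs in the order they appear in one
    configuration file.
    """
    assigned = {}
    current = ["default"]   # before any LOGFORMAT: the default format(s)
    after_logfile = True    # so the first LOGFORMAT replaces the default
    for keyword, argument in commands:
        if keyword == "LOGFORMAT":
            if after_logfile:
                current = []  # a new group of formats starts here
            current.append(argument)
            after_logfile = False
        elif keyword == "LOGFILE":
            assigned[argument] = list(current)
            after_logfile = True
    return assigned


example = [("LOGFILE", "log0"), ("LOGFORMAT", "format1"), ("LOGFILE", "log1"),
           ("LOGFORMAT", "format2"), ("LOGFILE", "log2"), ("LOGFILE", "log3")]
print(assign_formats(example))
# {'log0': ['default'], 'log1': ['format1'], 'log2': ['format2'], 'log3': ['format2']}
```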
You need to use DEFAULTLOGFORMAT instead of LOGFORMAT only if you want to change the format of logfiles which aren't given in a LOGFILE command: for example, ones specified on the command line, dragged onto the program icon on a Mac, or compiled in.
The "Unix time", %U, is always recorded in GMT. So you will probably need to use a LOGTIMEOFFSET command to convert to your local timezone. Also, it's just the integer part of the time, so if you have decimals you will have to use %U.%j.
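For example, here is a Python sketch of reading a Unix-time field that has a decimal part: only the integer part is used, and the result is in GMT. The function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_unix_time(field):
    """Convert a Unix-time field such as '914607935.123' to a GMT datetime,
    ignoring the decimal part (as %U.%j would)."""
    seconds = int(field.split(".")[0])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(parse_unix_time("914607935.123"))  # 1998-12-25 17:45:35+00:00
```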
The log formats which analog can handle are those which are known as instantaneously decipherable: in practice, this means that the character which terminates a string can never occur in the string. So for example, in common format, which looks like
LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b)

if the hostname ever contained a space, the line would be marked as corrupt, because analog terminates the host at the first space, not at the first occurrence of space-dash-space, and then the rest of the line wouldn't match. Of course, hostnames should never contain spaces, so this shouldn't be a problem. There are a couple of other restrictions: if there is any date or time information, then the year, month, date, hour and minute must all be present: and the same information may not occur twice in the format (so you can't have both %m and %M, for example, because these both represent the month; make one of them a %j to have it ignored).
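A rough Python model of this kind of matching: each field runs up to the first occurrence of the character that follows it in the format, and the literal text must then match exactly. It is not analog's parser; `tokenize` and `match_format` are hypothetical names, it handles only field codes separated by literal text (no %w or other special codes, and no two codes side by side), and it uses the code letters as dictionary keys. %j fields and fields marked with * are read but not stored.

```python
def tokenize(fmt):
    """Split a format string into ("lit", text) and ("code", letter, ignored) tokens."""
    tokens, i = [], 0
    while i < len(fmt):
        if fmt[i] == "%":
            ignored = fmt[i + 1] == "*"  # %*r: read the field but don't analyse it
            if ignored:
                i += 1
            tokens.append(("code", fmt[i + 1], ignored))
            i += 2
        else:
            j = fmt.find("%", i)
            j = len(fmt) if j < 0 else j
            tokens.append(("lit", fmt[i:j]))
            i = j
    return tokens


def match_format(fmt, line):
    """Return {code: value} for the fields read, or None if the line is corrupt."""
    tokens = tokenize(fmt)
    fields, pos = {}, 0
    for k, token in enumerate(tokens):
        if token[0] == "lit":
            if not line.startswith(token[1], pos):
                return None  # literal text doesn't match
            pos += len(token[1])
            continue
        _, code, ignored = token
        following = tokens[k + 1] if k + 1 < len(tokens) else None
        if following is None:
            end = len(line)  # the last field runs to the end of the line
        elif following[0] == "lit":
            end = line.find(following[1][0], pos)  # stop at the terminating character
            if end < 0:
                return None
        else:
            raise ValueError("adjacent codes are not supported in this sketch")
        if code != "j" and not ignored:
            fields[code] = line[pos:end]
        pos = end
    return fields if pos == len(line) else None


common = '%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b'
line = 'jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243'
print(match_format(common, line)["r"])                                  # /~sret1/
print(match_format(common, line.replace("jay.bird.com", "jay bird.com")))  # None
```

The second call shows the point made above: the host stops at the first space, so the following " - " no longer matches and the line is rejected.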
Sometimes you need to read one of the fields in a logfile, but not analyse it. For example, if you have a separate common log and referrer log, the referrer log might look like
http://guide-p.infoseek.com/Titles -> /~sret1/analog/
But the requests for /~sret1/analog/ would already have been counted when reading the main logfile, so you don't want to count them again now. You get round this by specifying a * in that item in the format string, like this:
LOGFORMAT (%f -> %*r)
A tip: sometimes it is more efficient to specify two or more adjacent fields to ignore with a single %j, as long as the whole group ends with a recognisable character. So common format is more efficiently specified as
LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
Here, in the date and time [25/Dec/1998:17:45:35 +0000], a single %j ignores both the seconds and the timezone, extending up to the close-bracket.
Another tip: %j can also be used to ignore whole lines, rather than just fields analog doesn't use. For example, the extended log format ignores lines beginning with # by using
LOGFORMAT #%j
and the Microsoft format ignores lines corresponding to FTP requests with
LOGFORMAT (%*S, %*u, %m/%d/%y, %h:%n:%j, %j)
If those formats had not been used, the lines would have been incorrectly marked as corrupt.
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%wHTTP%j" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%r" %c %b)
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ "HTTP/1.0" 200 1243
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%w"HTTP%j" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%r" %c %b)
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243 "http://www.site.com/" "Mozilla/2.0 (X11; I; HP-UX A.09.05)"
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%wHTTP%j" %c %b "%f" "%B")
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b "%f" "%B")
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%r" %c %b "%f" "%B")
[25/Dec/1998:17:45:35] http://www.site.com/ -> /~sret1/
or
http://www.site.com/ -> /~sret1/
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %f -> %*r)
LOGFORMAT (%f -> %*r)
[25/Dec/1998:17:45:35] Mozilla/2.0 (X11; I; HP-UX A.09.05)
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %B)
192.64.25.41, -, 12/25/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10, 2178, 303, 1243, 200, 0, GET, /~sret1/, -,
192.64.25.41, -, 12/25/2001, 17:45:35, W3SVC1, HOST1, 192.16.225.10, 2178, 303, 1243, 200, 0, GET, /~sret1/, -,
LOGFORMAT (%S, %u, %m/%d/%Z, %h:%n:%j, W3SVC%j, %j, %v, %T, %j, %b, %c, %j, %j, %r, %q,)
LOGFORMAT (%*S, %*u, %m/%d/%Z, %h:%n:%j, %j)
192.64.25.41, -, 25/12/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10, 2178, 303, 1243, 200, 0, GET, /~sret1/, -,
192.64.25.41, -, 25/12/2001, 17:45:35, W3SVC1, HOST1, 192.16.225.10, 2178, 303, 1243, 200, 0, GET, /~sret1/, -,
LOGFORMAT (%S, %u, %d/%m/%Z, %h:%n:%j, W3SVC%j, %j, %v, %T, %j, %b, %c, %j, %j, %r, %q,)
LOGFORMAT (%*S, %*u, %d/%m/%Z, %h:%n:%j, %j)
12/25/98 17:45:35 jay.bird.com host1 Server fred GET /~sret1/ http://www.site.com/ Mozilla/2.0 (X11; I; HP-UX A.09.05) 200 1243 2178
LOGFORMAT (%m/%d/%y %h:%n:%j\t%S\t%v\t%j\t%u\t%j\t%r\t%f\t%j\t%B\t%c\t%b\t%T)
25/12/98 17:45:35 jay.bird.com host1 Server fred GET /~sret1/ http://www.site.com/ Mozilla/2.0 (X11; I; HP-UX A.09.05) 200 1243 2178
LOGFORMAT (%d/%m/%y %h:%n:%j\t%S\t%v\t%j\t%u\t%j\t%r\t%f\t%j\t%B\t%c\t%b\t%T)
12/25/98 17:45:35 OK jay.bird.com /~sret1/ 1243
LOGFORMAT (%m/%d/%y\t%h:%n:%j \t%C%w%S\t%r\t%b)
CASE INSENSITIVE
CASE SENSITIVE
There are similar commands for usernames, if your logfile records these. By default, usernames are case insensitive, but you can specify
USERCASE SENSITIVE
to override this.
DIRSUFFIX default.htm
(You can only have one DIRSUFFIX.) There are other built-in aliases for other items: for example, hostnames are converted to lower case at this point.
FILEALIAS /football.html /soccer.html
HOSTALIAS lion lion.statslab.cam.ac.uk
There is also the special command FILEALIAS none, which cancels any other file aliases which might have been specified.
The alias commands for the other items are called BROWALIAS, REFALIAS, USERALIAS and VHOSTALIAS. Only one alias is ever applied to any item. So after
FILEALIAS /football.html /soccer.html
FILEALIAS /soccer.html /brazil.html
the file /soccer.html would get translated to /brazil.html, but /football.html would only get translated to /soccer.html and would not see the second alias.
You can also use wildcards in ALIAS commands: ? matches any one character and * matches any number of characters (including none). And on the right-hand side, you can use $1, $2 etc. to represent the parts of the original name matched by the *'s. As a special abbreviation, if there is exactly one * on the left-hand side, then a * on the right-hand side can be used to represent $1. So, for example,
FILEALIAS /*/football/* /soccer/
would translate /sport/football/rules.html to just /soccer/, but either of
FILEALIAS /*/football/* /$1/soccer/$2 # or
FILEALIAS /sport/football/* /sport/soccer/*
would translate /sport/football/rules.html to /sport/soccer/rules.html.
You can use $$ to get an actual $ on the right-hand side. Or you can prefix the right-hand side with "PLAIN:" to treat any $'s and *'s on the right-hand side literally. For example
FILEALIAS /*/football/* PLAIN:/$1/soccer/$2
would translate /sport/football/rules.html to exactly /$1/soccer/$2.
Analog's *'s are un-greedy: if there are two possible ways of matching, the part of the expression on the left matches as little as possible. This is more often what you want. But it contrasts with Perl's regular expressions, for example. (Oh, two consecutive *'s are completely useless, but if you try it they are collapsed into one before counting the $1, $2, etc.)
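You can model the wildcard aliasing rules with Python regular expressions by turning each * into an un-greedy (.*?) group. The functions below are a sketch for illustration only. In particular, translate assumes that the first matching alias in the configuration wins, which this section doesn't actually specify. Case-insensitivity is not modelled.

```python
import re


def wildcard_regex(pattern):
    """Compile an analog-style wildcard: * (un-greedy) and ? (one character)."""
    pattern = re.sub(r"\*+", "*", pattern)     # ** behaves like a single *
    parts = ["(.*?)" if c == "*" else "." if c == "?" else re.escape(c)
             for c in pattern]
    return re.compile("".join(parts), re.DOTALL)


def apply_alias(name, lhs, rhs):
    """Translate name with one ALIAS command; return None if lhs doesn't match."""
    m = wildcard_regex(lhs).fullmatch(name)
    if m is None:
        return None
    if rhs.startswith("PLAIN:"):
        return rhs[len("PLAIN:"):]              # $ and * taken literally
    one_star = m.re.groups == 1

    def substitute(tok):
        text = tok.group(0)
        if text == "$$":
            return "$"
        if text == "*":                         # * means $1 only with one * on the left
            return m.group(1) if one_star else "*"
        return m.group(int(text[1:]))

    return re.sub(r"\$\$|\$\d+|\*", substitute, rhs)


def translate(name, aliases):
    """Apply at most one alias: here, the first one that matches (an assumption)."""
    for lhs, rhs in aliases:
        result = apply_alias(name, lhs, rhs)
        if result is not None:
            return result
    return name
```

Because the groups are un-greedy, apply_alias("a/b/c", "*/*", "$1") gives "a" rather than "a/b".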
The behaviour of FILEALIAS and REFALIAS can be slightly unintuitive if the file has search arguments.
A warning to Unix users: if you put an ALIAS command on the command line with +C, the shell may try to expand $1 etc., which is not what you want. To stop the shell doing this, put the command in single quotes instead of double quotes.
TYPEALIAS .txt ".txt (Plain text files)"
would provide an explanation of that line in the File Type Report.
Normal alias commands and output alias commands are easily confused. For example, what is the difference between FILEALIAS and REQALIAS? In fact, there are several differences, because the two kinds of alias do different jobs. FILEALIAS applies to the files themselves, but REQALIAS only applies to the lines in the Request Report. This means that FILEALIAS also affects the other reports which use the filenames, such as the Directory Report, whereas REQALIAS only affects the Request Report.
Another difference is that REQALIAS applies separately to each line of the Request Report. This means that if two separate files translate to the same thing in a FILEALIAS command, they will become one file for all the reports. But if you were to use the same REQALIAS command, they would still be two files, and would still be listed on separate lines in the Request Report, but with the same name.
So in summary, when should you use each command? FILEALIAS should be used if a single file has two different names; i.e., if your web server returns the same file for two different URLs. REQALIAS, on the other hand, would typically be used to annotate or clarify the Request Report. Sometimes it's useful to use both; first combine some files with FILEALIAS, and then annotate them in the Request Report with REQALIAS.
The full list of output aliases is REQALIAS, REDIRALIAS, FAILALIAS, TYPEALIAS, DIRALIAS, HOSTREPALIAS, REDIRHOSTALIAS, FAILHOSTALIAS, DOMALIAS, ORGALIAS, REFREPALIAS, REFSITEALIAS, REDIRREFALIAS, FAILREFALIAS, BROWREPALIAS, BROWSUMALIAS, OSALIAS, VHOSTREPALIAS, REDIRVHOSTREPALIAS, FAILVHOSTREPALIAS, USERREPALIAS, REDIRUSERALIAS and FAILUSERALIAS.
There is one known bug with the output aliases. The report is sorted before the alias is applied. This means that if the SORTBY for the report is set to ALPHABETICAL, then the report will not be sorted correctly.
You include regular expressions in an ALIAS command by prefixing the left-hand side of the alias with "REGEXP:". Or you can specify a case-insensitive match, like Perl m//i or Unix egrep -i, by using "REGEXPI:". (It's automatically case-insensitive for many items, such as hostnames, or filenames if you have specified CASE INSENSITIVE.)
On the right-hand side of the alias you can use $1, $2 etc. to represent the first, second etc. bracketed expression on the left-hand side, counting in order of the left brackets. (Again, you can't put $1, $2 etc. on the command line unless you put them in single quotes.)
Regular expressions match if they match just part of the string. If you want them to have to match the whole of the string, you have to anchor them to the ends of the string with ^ and $.
For example,
REQALIAS REGEXP:^(/~(.+?)/.*) "[$2] $1"
would translate
/~sret1/backgammon/rules.html
to
[sret1] /~sret1/backgammon/rules.html
in the Request Report. Or
HOSTALIAS REGEXP:^([^.]*)$ $1.mycompany.com
would add .mycompany.com to all hostnames not containing a dot. (See the FAQ for a discussion about whether this is a good idea.)
Regular expressions are greedy: if there are two possible ways of matching, the part of the expression on the left matches as much as possible.
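Python's re module is close enough to the regular expressions analog uses to reproduce the two examples above, although the two are not identical in every detail. This is a model, not analog's code. It assumes the right-hand side replaces the whole name, and that a bracketed expression which didn't take part in the match gives an empty string.

```python
import re


def regexp_alias(name, pattern, rhs, ignore_case=False):
    """Model of an ALIAS with REGEXP: (or REGEXPI: when ignore_case is True).

    Returns None if the pattern doesn't match anywhere in name; otherwise
    returns rhs with $1, $2, ... replaced by the bracketed groups.
    """
    m = re.search(pattern, name, re.IGNORECASE if ignore_case else 0)
    if m is None:
        return None
    return re.sub(r"\$(\d+)", lambda t: m.group(int(t.group(1))) or "", rhs)
```

As a check on greediness, regexp_alias("a/b/c", r"(.*)/", "$1") gives "a/b". By contrast, analog's un-greedy wildcard * would stop at the first slash.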
HOSTEXCLUDE mycomputer.myisp.com
would exclude all requests by that computer from the statistics. (To exclude lines just from one specific report, see below.)
The rule for determining whether an item is included or excluded is as follows. All the INCLUDE and EXCLUDE commands for that item are considered one by one in order, and the item is included or excluded according to the last command it matched. Items which don't match any of the INCLUDE or EXCLUDE commands are included if the first command was an exclusion, and excluded if the first command was an inclusion. For example, the configuration
FILEINCLUDE /~sret1/*
FILEEXCLUDE /~sret1/backgammon/*,/~sret1/analog/*
FILEINCLUDE /~sret1/backgammon/*.gif
would instruct the program to examine only my files, excluding my backgammon and analog files, but including gifs in my backgammon directory. On the other hand,
FILEEXCLUDE /~sret1/*/img/*
would analyse all files, except for images in my various directories. (If you get confused with all the inclusions and exclusions, remember that you can always use SETTINGS ON to see what the options you have specified represent.) Note that inclusions and exclusions can contain any number of wildcards, and can be lists separated by commas (but no spaces).
The full list of these commands is HOSTINCLUDE and HOSTEXCLUDE; FILEINCLUDE and FILEEXCLUDE; BROWINCLUDE and BROWEXCLUDE; REFINCLUDE and REFEXCLUDE; USERINCLUDE and USEREXCLUDE; VHOSTINCLUDE and VHOSTEXCLUDE; and STATUSINCLUDE and STATUSEXCLUDE.
Because the inclusions and exclusions take place after the aliasing, the name you must use is the aliased name. (In the absence of output alias commands, this is the name of the item in the output.)
Sometimes a line doesn't contain a particular sort of item, either because there is no field reserved for it on the line, or because the browser didn't send it for that request, or because it was present but corrupt. You can include or exclude these lines by making a special blank entry in the INCLUDE or EXCLUDE command. For example,
USERINCLUDE jim
USERINCLUDE ""
would include lines from user jim and lines without any user specified.
The behaviour of REQINCLUDE and REFINCLUDE can be slightly unintuitive if the file has search arguments.
You can also use regular expressions for the inclusions and exclusions by prefixing the expression with "REGEXP:" or "REGEXPI:". I've already described this at length in the context of aliases, so you can look there for all the details. A regular expression must be on a line on its own, not within a comma-separated list.
HOSTINCLUDE 131.111.20.18 # simple IP address
HOSTINCLUDE 131.111.20.* # wildcard
HOSTINCLUDE 131.111.20 # the same meaning
HOSTINCLUDE 131.111.20-23 # a range of class C addresses
HOSTINCLUDE 131.111.20.18/23 # subnet mask
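Here is a Python model of these five forms, using the standard ipaddress module for the subnet case. It is a sketch that assumes each form is matched octet by octet, as the comments above describe. It handles only numeric IPv4 addresses.

```python
import ipaddress


def host_matches(addr, spec):
    """Does the dotted IPv4 address addr match one HOSTINCLUDE-style spec?"""
    if "/" in spec:                             # subnet mask, e.g. a.b.c.d/23
        network = ipaddress.ip_network(spec, strict=False)
        return ipaddress.ip_address(addr) in network
    # Compare only as many octets as the spec gives, so a.b.c means a.b.c.*
    for a, s in zip(addr.split("."), spec.split(".")):
        if s == "*":
            continue
        if "-" in s:                            # a range such as 20-23
            lo, hi = (int(x) for x in s.split("-"))
            if not lo <= int(a) <= hi:
                return False
        elif a != s:
            return False
    return True
```

Note that 131.111.20.18/23 covers the whole block 131.111.20.0 to 131.111.21.255, not just addresses near .18.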
STATUSINCLUDE 200-206,304,500-
would tell analog to look only at lines with status codes 200-206, 304 or 500-599.
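Such a list might be parsed as in the following Python sketch. The function names are hypothetical. The sketch assumes an open-ended range like 500- always stops at 599.

```python
def parse_status_list(spec, highest=599):
    """Parse a list such as '200-206,304,500-' into (low, high) pairs."""
    ranges = []
    for item in spec.split(","):
        lo, dash, hi = item.partition("-")
        if not dash:                        # a single code, e.g. 304
            ranges.append((int(lo), int(lo)))
        else:                               # a range; an empty end means open
            ranges.append((int(lo), int(hi) if hi else highest))
    return ranges


def status_included(code, ranges):
    return any(lo <= code <= hi for lo, hi in ranges)
```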
Some people want to exclude status code 304 (Not Modified) to stop those requests appearing in the Request Report. But there is a better solution. By default, analog counts code 304 as a successful request, because it assumes that the cached version of the document is then presented to the user. But you can count it as a redirected request with the command
304ISSUCCESS OFF
For most people this is the wrong option, because code 304 is really the same as code 200 to the user. So again, if you don't understand this, stick with the default.
FROM 990701
TO 000615:1300
Alternatively, each of the components can be preceded by + or - to represent time relative to the time at which the program was invoked. In this case, the date can have more than 2 digits. This allows constructions like
FROM -01-00+01 # from tomorrow last year
TO -00-0131 # to the end of last month (OK even if last month
# didn't have 31 days)
FROM -00-00-112
TO -00-00-01 # statistics for the last 16 weeks
FROM -00-00-00:-06+01 # statistics for the last 6 hours
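The comments in these examples can be reproduced by applying the year and month parts first and then the day part, clamping an absolute day to the length of the month. That order is an assumption made for this Python sketch, since the Readme doesn't spell it out. The sketch also ignores the time part and absolute years.

```python
import calendar
import re
from datetime import date, timedelta

# YY, MM and DD parts, each with an optional + or - sign (date part only).
SPEC = re.compile(r"([+-]?)(\d{2})([+-]?)(\d{2})([+-]?)(\d+)")


def add_months(d, months):
    """Move d by whole months, clamping the day to the new month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last = calendar.monthrange(year, month0 + 1)[1]
    return d.replace(year=year, month=month0 + 1, day=min(d.day, last))


def signed(sign, digits):
    return -int(digits) if sign == "-" else int(digits)


def resolve_date(spec, today):
    m = SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"unrecognised date: {spec!r}")
    ysign, years, msign, months, dsign, days = m.groups()
    if not ysign:
        raise ValueError("absolute years are not handled in this sketch")
    d = add_months(today, 12 * signed(ysign, years))
    if msign:                                   # relative month
        d = add_months(d, signed(msign, months))
    else:                                       # absolute month
        last = calendar.monthrange(d.year, int(months))[1]
        d = d.replace(month=int(months), day=min(d.day, last))
    if dsign:                                   # relative day
        d += timedelta(days=signed(dsign, days))
    else:                                       # absolute day, clamped
        last = calendar.monthrange(d.year, d.month)[1]
        d = d.replace(day=min(int(days), last))
    return d
```

Run on 10 March 2024, this gives 11 March 2023 for -01-00+01 and 29 February 2024 for -00-0131. For -00-00-112 it gives a date exactly 16 weeks earlier.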
There are command line abbreviations +F and +T for the FROM and TO commands; for example, +T-00-00-01:1800 looks at statistics until 6pm yesterday. The arguments -F and -T turn off the FROM and TO, as do FROM OFF and TO OFF.
So, for example, the command
REFREPEXCLUDE http://your.site.com/*
would exclude your internal referrers from the Referrer Report. However, it would not exclude them from the Failed Referrer Report, the Referring Site Report, etc. (you need to use FAILREFEXCLUDE, REFSITEEXCLUDE etc. for that); nor would it prevent other analysis of logfile lines with those referrers, as REFEXCLUDE would.
The full list of these commands is REQINCLUDE and REQEXCLUDE; REDIRINCLUDE and REDIREXCLUDE; FAILINCLUDE and FAILEXCLUDE; TYPEINCLUDE and TYPEEXCLUDE; DIRINCLUDE and DIREXCLUDE; HOSTREPINCLUDE and HOSTREPEXCLUDE; REDIRHOSTINCLUDE and REDIRHOSTEXCLUDE; FAILHOSTINCLUDE and FAILHOSTEXCLUDE; DOMINCLUDE and DOMEXCLUDE; ORGINCLUDE and ORGEXCLUDE; REFREPINCLUDE and REFREPEXCLUDE; REFSITEINCLUDE and REFSITEEXCLUDE; SEARCHQUERYINCLUDE and SEARCHQUERYEXCLUDE; SEARCHWORDINCLUDE and SEARCHWORDEXCLUDE; INTSEARCHQUERYINCLUDE and INTSEARCHQUERYEXCLUDE; INTSEARCHWORDINCLUDE and INTSEARCHWORDEXCLUDE; REDIRREFINCLUDE and REDIRREFEXCLUDE; FAILREFINCLUDE and FAILREFEXCLUDE; BROWSUMINCLUDE and BROWSUMEXCLUDE; BROWREPINCLUDE and BROWREPEXCLUDE; OSINCLUDE and OSEXCLUDE; VHOSTREPINCLUDE and VHOSTREPEXCLUDE; REDIRVHOSTREPINCLUDE and REDIRVHOSTREPEXCLUDE; FAILVHOSTREPINCLUDE and FAILVHOSTREPEXCLUDE; USERREPINCLUDE and USERREPEXCLUDE; REDIRUSERREPINCLUDE and REDIRUSERREPEXCLUDE; and FAILUSERINCLUDE and FAILUSEREXCLUDE.
The inclusion or exclusion applies to the unaliased name, if you are doing any output aliases. (This contrasts with the behaviour of normal INCLUDE and EXCLUDE commands, which apply to the aliased name.)
All directory names end in slashes, so DIRINCLUDE and DIREXCLUDE, and REFSITEINCLUDE and REFSITEEXCLUDE, implicitly add a trailing slash even if you don't give one. This sometimes catches people out in the following situation.
REFSITEEXCLUDE http://my.host.com/* # probably not what you want
means not to list subdirectories of the referring site http://my.host.com/, but to keep the site itself in the list. To exclude the site completely, just use
REFSITEEXCLUDE http://my.host.com/
You can also use the symbolic word pages in suitable INCLUDE and EXCLUDE commands; one very common command is
REQINCLUDE pages
to include only pages in the Request Report.
PAGEINCLUDE *.asp
PAGEEXCLUDE /sret1.html
I.e., *.asp are pages, but /sret1.html isn't. (If the file has search arguments, the PAGEINCLUDE and PAGEEXCLUDE are reckoned just on the part of the filename before the question mark.)
REQLINKINCLUDE pages,*.pdf
would link to pages and PDF files in the Request Report. The full set of these commands is REQLINKINCLUDE and REQLINKEXCLUDE (Request Report), REDIRLINKINCLUDE and REDIRLINKEXCLUDE (Redirection Report), FAILLINKINCLUDE and FAILLINKEXCLUDE (Failure Report), REFLINKINCLUDE and REFLINKEXCLUDE (Referrer Report), REDIRREFLINKINCLUDE and REDIRREFLINKEXCLUDE (Redirected Referrer Report), and FAILREFLINKINCLUDE and FAILREFLINKEXCLUDE (Failed Referrer Report). Note that the target of the links is also affected by the BASEURL command.
ROBOTINCLUDE Googlebot/*
/cgi-bin/script.pl?x=1&y=2
runs the /cgi-bin/script.pl program with arguments x=1 and y=2. (Sometimes the server records these arguments in a separate field in the logfile. If so, you can use the %q field in the LOGFORMAT command, and analog will translate the filename to the above format.)
You can tell analog either to read or to ignore the arguments using the commands ARGSINCLUDE and ARGSEXCLUDE which we'll discuss in a minute. But by default, all arguments are read, and as this is usually what you want, you don't usually need those commands.
You don't always see the arguments in the reports, even if they're being read, because analog doesn't show them if there aren't enough of them. In order to see them, you have to set the corresponding ARGSFLOOR parameter low enough.
Also note that within a report, the search arguments are listed immediately under the file to which they refer. This temporarily interrupts the normal order of the files. It may be clearer if you turn the N column on.
The reason is that, for example, the command
FILEINCLUDE /cgi-bin/script.pl
doesn't match the file /cgi-bin/script.pl?x=1&y=2. To match that, you would have to use something like
FILEINCLUDE /cgi-bin/script.pl*
instead. Similarly,
FILEALIAS /cgi-bin/script.pl /script.pl
will change /cgi-bin/script.pl itself, but not /cgi-bin/script.pl?x=1&y=2. You might want to use something like
FILEALIAS /cgi-bin/script.pl?* /script.pl?$1
as well. (However, PAGEINCLUDE and PAGEEXCLUDE always refer to the part of the filename before the question mark.)
Conversely, because in the Request Report files with arguments are only included if their parent file is included, you can't just
REQINCLUDE /cgi-bin/script.pl?*x=1*
or you will end up with nothing listed. You have to
REQINCLUDE /cgi-bin/script.pl
as well.
ARGSEXCLUDE /cgi-bin/script.pl
were given, analog would ignore the arguments to that file, and so read /cgi-bin/script.pl?x=1&y=2 as just /cgi-bin/script.pl. On the other hand, if
ARGSINCLUDE /cgi-bin/script.pl
were specified, analog would read the arguments, and so treat /cgi-bin/script.pl?x=1&y=2 as a different file from /cgi-bin/script.pl. REFARGSINCLUDE and REFARGSEXCLUDE are the same for referrers.
Technical note: the check for whether the arguments should be included happens before the filename has been subject to either built-in or user-specified aliases. So you have to use the unaliased name, exactly as it occurs in the logfile. For example, ARGSINCLUDE /~sret1/script.pl won't match /%7Esret1/script.pl even though they are really the same file. It also means that you can't use "pages" in the ARGSINCLUDE or ARGSEXCLUDE command, because we don't know whether a file is a page until after it's been aliased.
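Here is a minimal Python sketch of the ARGSEXCLUDE behaviour, including the point about unaliased names. The set of excluded files stands in for the command's argument. It is matched exactly here, without wildcards, which is a simplification.

```python
def apply_argsexclude(raw_request, args_excluded):
    """Drop the arguments of files named in a (hypothetical) ARGSEXCLUDE set.

    The lookup uses the raw name exactly as it appears in the logfile,
    before any aliasing, so /%7Esret1/... is not the same as /~sret1/...
    """
    path, sep, _args = raw_request.partition("?")
    if sep and path in args_excluded:
        return path
    return raw_request
```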
http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=carrot+cake
The search term is in the field q=, so the appropriate SEARCHENGINE command is
SEARCHENGINE http://www.altavista.com/cgi-bin/query q
(or, even better,
SEARCHENGINE http://*altavista.*/* q
to allow for all their mirror sites in different countries.)
The command INTSEARCHENGINE is the same for search engines, or other scripts which take arguments, within your site. For example, you might have requests for files like
/cgi-bin/search?trm=chocolate+cake
in which case you would specify
INTSEARCHENGINE /cgi-bin/search trm
and (assuming you haven't done an ARGSEXCLUDE for that file) "chocolate cake" would then appear in your Internal Search Query Report.
Sometimes a search engine has two or more possible fields for the search term. In that case you can list all of them separated by commas, like this:
SEARCHENGINE http://*webcrawler.*/* search,searchText
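Extracting the search term can be sketched with Python's urllib.parse. The function is hypothetical. It assumes the first argument of SEARCHENGINE or INTSEARCHENGINE is matched against the part of the URL before the question mark, and that the listed fields are tried in order.

```python
import re
from urllib.parse import parse_qs


def wildcard_match(pattern, text):
    """True if text matches an analog-style wildcard (* and ?) exactly."""
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c)
                    for c in pattern)
    return re.fullmatch(regex, text, re.DOTALL) is not None


def search_term(url, engine_pattern, fields):
    """Return the search term found in url, or None if there isn't one."""
    base, _, query = url.partition("?")
    if not wildcard_match(engine_pattern, base):
        return None
    params = parse_qs(query)            # also turns + into a space
    for field in fields.split(","):     # e.g. "search,searchText"
        if params.get(field):
            return params[field][0]
    return None
```

For instance, search_term("http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=carrot+cake", "http://*altavista.*/*", "q") returns "carrot cake".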
I said previously that %7E in a URL is automatically converted to ~, etc. In fact this is only done to the ASCII-printable characters %20-%7E, because these are the only characters that are the same in every character set. (In fact, even that isn't true. Experts might want to know that ?, &, ; and = aren't converted either, to distinguish them from query-string delimiters: an encoded ?, &, ; or = is one that is not intended to be a delimiter. Also % isn't converted, to avoid confusing %25nm with %nm.)
But in the Search Query Report and Search Word Report it is useful to be able to convert non-ASCII characters too, so that you can see the actual words people typed, rather than get the %nm codes in place of all accented letters. So in these reports analog also converts characters %A0-%FF (if you are using an ISO-8859-* character set) or %80-%FF (for most other character sets).
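Here is a Python model of these conversion rules, for illustration only. When search_report is true, it reads %A0-%FF as ISO-8859-1 characters. It keeps ?, &, ;, = and % encoded in both cases. The Readme doesn't say whether the search reports make that exception, so treat that as an assumption.

```python
import re

# Characters that stay encoded: possible query delimiters, and % itself.
KEEP_ENCODED = set("?&;=%")


def decode_url(text, search_report=False):
    """Decode %nm escapes following the rules described above (a model).

    Normally only printable ASCII (%20-%7E) is decoded. With
    search_report=True, %A0-%FF are decoded too, read as ISO-8859-1.
    """
    def replace(m):
        code = int(m.group(1), 16)
        ch = chr(code)
        if ch in KEEP_ENCODED:
            return m.group(0)
        if 0x20 <= code <= 0x7E or (search_report and code >= 0xA0):
            return ch
        return m.group(0)

    return re.sub(r"%([0-9A-Fa-f]{2})", replace, text)
```

Leaving %25 alone means that %2541 stays as it is, rather than turning into %41 and then being mistaken for an A.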
However, there are reasons why you might not want this feature, and you can turn it off with the command
SEARCHCHARCONVERT OFF
These reasons include:
XHTML is the default. It produces web pages in XHTML 1.0. HTML produces web pages in HTML 2.0.
PLAIN produces plain text files, and ASCII is the same as PLAIN except that it uses all ASCII characters (no accents etc.) if possible. (This is because some applications don't understand accented characters).
LATEX produces LaTeX code which can be turned into PDF if you have the pdflatex command installed. (If you want to use the ordinary latex command, specify PDFLATEX OFF.) It's only available with certain European languages (US-ASCII, ISO-8859-1 and ISO-8859-2 character sets). Yes, I know it gives overfull hboxes sometimes.
COMPUTER is a special format suitable for reading by a computer (useful for reading into a spreadsheet, or post-processing with a graphics package, for example). There is a separate section about this format later.
XML produces an XML output which is an alternative format for post-processing. The DTD for the XML output is distributed with the program. You can find more information about the XML style, and an example of a post-processing program, at http://timian.jessen.ch/.
As well as a command like
OUTPUT PLAIN
you can also select PLAIN style with the command line argument +a, and XHTML with the command line argument -a.
You can also specify OUTPUT NONE for no output, if you are producing a cache file.
LANGUAGE FRENCH
will give you the output in French. The available languages at the moment are ARMENIAN, BASQUE, BULGARIAN (Windows-1251), BULGARIAN-MIK (MIK-16), CATALAN, SIMP-CHINESE (GB2312), TRAD-CHINESE (Big5), CZECH (ISO Latin 2), CZECH-1250 (Windows-1250), DANISH, DUTCH, ENGLISH, US-ENGLISH, FINNISH, FRENCH, GERMAN, HUNGARIAN, INDONESIAN, ITALIAN, JAPANESE-EUC (EUC-JP), JAPANESE-JIS (ISO-2022-JP), JAPANESE-SJIS (SJIS), JAPANESE-UTF (UTF-8), KOREAN, LATVIAN, NORWEGIAN (Bokmål), NYNORSK, POLISH, PORTUGUESE, BR-PORTUGUESE, RUSSIAN (KOI8-R), RUSSIAN-1251 (Windows-1251), SERBIAN, SLOVAK (ISO Latin 2), SLOVAK-1250 (Windows-1250), SLOVENE (ISO Latin 2), SLOVENE-1250 (Windows-1250), SPANISH, SWEDISH, SWEDISH-ALT (alternative translation avoiding Anglicisms), TURKISH and UKRAINIAN.
The following languages were available for previous versions of analog, but have not yet been translated for version 5: BOSNIAN, CROATIAN, GREEK, ICELANDIC, LITHUANIAN and ROMANIAN. As and when they are translated, they will be added to the analog home page. If you want to translate any of them (or any other language), I would be delighted! See below.
The other way to specify a language is to use the LANGFILE command. This is useful if you want to download a new language from the analog home page, or if you want to translate one yourself, or even if you want to change some words or phrases or the way the dates and times are formatted in the output. The LANGFILE command tells analog in which file to find the various words and phrases for a new language. For example, the command
LANGFILE guarani.lng # or
LANGFILE /usr/etc/httpd/analog/lang/guarani.lng
would read from that file. If the name of the file doesn't include a directory, it will be looked for wherever analog normally expects to find its language files.
Some languages also have domains files or report descriptions files available. These are normally selected automatically by the LANGUAGE command. But you can tell analog to use different ones with the DOMAINSFILE and DESCFILE commands. Also, some languages have translations of the form interface or configuration file.
If you want to translate another language, I would be delighted! Do contact me first to make sure that no-one else is already translating the same language. The file README.txt in the language directory, and the English language file, contain some brief instructions for translating new languages.
Equally, if you find any mistakes in the output in different languages, please do let me know because I'm not able to check them all myself!
OUTFILE stats.htm
or with a command line argument like +Ostats.htm. If you use the filename - or stdout, the output will go to standard output, which is normally the screen, but Unix users might like to redirect it to another file or even into a pipe. You can also use an absolute path name, like
OUTFILE /usr/bin/httpd/htdocs/stats.html # Unix
OUTFILE "Hard Disk:Server Apps:WebSTAR:Analog:Report.html" # Mac
If the name of the OUTFILE doesn't include a directory, it will be put wherever analog expects to put its output files. (This location is built in when the program is compiled.) For example, on Windows it would be in the same folder as the analog executable. But if you use the +O command line argument, the file is put in the current directory.
You can include date codes in the OUTFILE in exactly the same way as for the LOGFILE. So for example,
OUTFILE stats%y%M%D.html
will produce filenames like stats990501.html. As with the LOGFILE, the date used is the TO date if one was specified, and otherwise the time of the start of the program.
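You can mimic the date expansion in that example in Python. This sketch only knows the three codes used above (%y, %M and %D, each written with two digits). The function name is hypothetical.

```python
import re
from datetime import date


def expand_date_codes(template, when):
    """Expand %y, %M and %D (two-digit year, month and day) in a filename."""
    codes = {"y": f"{when.year % 100:02d}",
             "M": f"{when.month:02d}",
             "D": f"{when.day:02d}"}
    return re.sub(r"%([yMD])", lambda m: codes[m.group(1)], template)
```

With a date of 1 May 1999, expand_date_codes("stats%y%M%D.html", date(1999, 5, 1)) returns "stats990501.html".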
Code  Command         Report
x     GENERAL         General Summary
1     YEARLY          Yearly Report
Q     QUARTERLY       Quarterly Report
m     MONTHLY         Monthly Report
W     WEEKLY          Weekly Report
D     DAILYREP        Daily Report
d     DAILYSUM        Daily Summary
H     HOURLYREP       Hourly Report
h     HOURLYSUM       Hourly Summary
w     WEEKHOUR        Hour of the Week Summary
4     QUARTERREP      Quarter-Hour Report
6     QUARTERSUM      Quarter-Hour Summary
5     FIVEREP         Five-Minute Report
7     FIVESUM         Five-Minute Summary
S     HOST            Host Report
l     REDIRHOST       Host Redirection Report
L     FAILHOST        Host Failure Report
Z     ORGANISATION    Organisation Report
o     DOMAIN          Domain Report
r     REQUEST         Request Report
i     DIRECTORY       Directory Report
t     FILETYPE        File Type Report
z     SIZE            File Size Report
P     PROCTIME        Processing Time Report
E     REDIR           Redirection Report
I     FAILURE         Failure Report
f     REFERRER        Referrer Report
s     REFSITE         Referring Site Report
N     SEARCHQUERY     Search Query Report
n     SEARCHWORD      Search Word Report
Y     INTSEARCHQUERY  Internal Search Query Report
y     INTSEARCHWORD   Internal Search Word Report
k     REDIRREF        Redirected Referrer Report
K     FAILREF         Failed Referrer Report
B     BROWSERREP      Browser Report
b     BROWSERSUM      Browser Summary
p     OSREP           Operating System Report
v     VHOST           Virtual Host Report
R     REDIRVHOST      Virtual Host Redirection Report
M     FAILVHOST       Virtual Host Failure Report
u     USER            User Report
j     REDIRUSER       User Redirection Report
J     FAILUSER        User Failure Report
c     STATUS          Status Code Report
For details on what the various reports mean, and a summary of the commands which control them, see the section on Analog's reports.
You can turn each report on or off with configuration commands like
FIVEREP OFF
REFSITE ON
or by using command line arguments like -5 and +s. You can also turn all reports except the General Summary on or off with the commands ALL ON and ALL OFF, or with the command line arguments +A and -A.
DESCRIPTIONS OFF
Even if DESCRIPTIONS is ON, the descriptions will only appear if analog can find a report descriptions file in your language, or if you specify one using the DESCFILE command: for example,
DESCFILE descriptions.txt
If the name of the descriptions file doesn't include a directory, it will be looked for wherever analog normally expects to find its language files.
You can turn the "Go To" lines in the output off with the command
GOTOS OFF
GOTOS ON turns them on again, and GOTOS FEW puts the "Go To" lines just at the top and bottom. GOTOS OFF can be abbreviated with the -X command line argument, and GOTOS ON with +X.
You can turn off the "Program started at" line at the top of the output, and the "Running Time" line at the bottom, with the command
RUNTIME OFF
and turn them on again with RUNTIME ON.
The figures in parentheses in the General Summary are for the last seven days: either the seven days before the TO time, or if no TO time is given, the seven days before the time of the program start. The figures for the last seven days are normally included if some, but not all, of the requests fall in those seven days; but you can turn them off by means of the command
LASTSEVEN OFF
Of course, LASTSEVEN ON turns them on again.
You can change the order of the reports by means of the REPORTORDER command. You should list the code letters for all possible reports in the order you want them. Non-alphanumeric characters are ignored and so can be used as separators. For example,
REPORTORDER x-1QmdDhHw4567W-cPz-ritEIYy-SlLZo-sNnfKk-ujJ-vMR-bBp
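If you build your own REPORTORDER, a short Python check like this one can confirm that every code letter from the table of reports above appears exactly once. Like analog, it ignores separators. The helper is hypothetical and not part of analog.

```python
from collections import Counter

# Code letters from the table of reports, in table order.
REPORT_CODES = "x1QmWDdHhw4657SlLZoritzPEIfsNnYykKBbpvRMujJc"


def check_report_order(order):
    """Return (missing, unknown, repeated) code letters for a REPORTORDER value."""
    letters = [c for c in order if c.isalnum()]   # separators are ignored
    counts = Counter(letters)
    missing = sorted(set(REPORT_CODES) - set(counts))
    unknown = sorted(set(counts) - set(REPORT_CODES))
    repeated = sorted(c for c, n in counts.items() if n > 1)
    return missing, unknown, repeated
```

For the REPORTORDER shown above, all three lists come back empty.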
GENSUMLINES ALL
meaning all available lines. (You only ever get the ones relevant to your logfile, though.) You can turn lines off using a command like
GENSUMLINES -KL
(to turn off lines K and L) and turn them on again with a command like
GENSUMLINES +K
You can specify the exact set of lines to include with a command like
GENSUMLINES CDFGHM
You now just need to know which lines have which code letters, which is given in the following table.
IMAGEDIR img/ # relative URL: within the same directory as the output
IMAGEDIR /img/ # off the root directory of your server
IMAGEDIR http://www.myother.server.com/img/ # on another server
Some people are confused about the IMAGEDIR. It is simply inserted into the <img> tags in the output; you can see its effect if you look at the HTML source of the output page.
You can use gif images instead of png's for the bar charts by specifying
PNGIMAGES OFF
PNGIMAGES doesn't affect the pie charts, which are always png's: but see the JPEGCHARTS command for something similar.
LOGO picture.gif # for this file
LOGO /images/picture2.gif # a different file
LOGO none # for no logo
The logo is assumed to be inside the IMAGEDIR unless it starts with a slash or contains ://.
The LOGOURL command specifies a URL to link the logo to. If you change the LOGO, you probably want to change the LOGOURL as well. For example,
LOGOURL http://www.mycompany.com/
LOGOURL none # for no link
The LOGOURL command only works with the XHTML output style, not HTML 2.0.
There are commands HOSTNAME and HOSTURL which affect the name and link at the end of the title line. For example, I might specify
HOSTNAME "Stephen Turner"
HOSTURL http://homepage.ntlworld.com/adelie/stephen/
to generate the title "Web Server Statistics for Stephen Turner". Again, you can use none as the HOSTURL to specify no link. Analog will normally translate characters in the hostname to HTML if necessary. So to include literal HTML, such as accented characters, in the output you need to precede them by a backslash, like this:
HOSTNAME "M\üller & S\öhne"
HEADERFILE none
to cancel a previously-specified header file. Again, if the name of the HEADERFILE or FOOTERFILE doesn't include a directory, analog will look in a directory specified when the program was compiled.
STYLESHEET /housestyle.css
STYLESHEET none # to cancel it
In the XHTML output style, if you specify a style sheet, it will replace the default one, so you might prefer to use the default one as a base. You can find it in the directory examples/css, along with some other style sheets contributed by users.
There is a command CSSPREFIX to add a prefix to all the CSS class names used in the XHTML output style. This is useful to avoid clashes with other style sheets: the disadvantage is that it will make your output longer. For example,
CSSPREFIX anlg
CSSPREFIX none # to cancel it
Of course, if you use your own style sheet, you will have to add the CSSPREFIX to all the class names in the style sheet.
SEPCHAR " "
REPSEPCHAR none
DECPOINT ,
to make "three thousand and a quarter" look like "3 000,25" in text and "3000,25" in the reports.
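To see how the separator and decimal point settings combine, here is a small Python sketch. It is an illustration, not analog's implementation: format_number is a hypothetical name, and it assumes a non-negative number and that none corresponds to an empty separator.

```python
def format_number(value: float, sepchar: str, decpoint: str, dp: int = 2) -> str:
    """Format a non-negative number with its integer part grouped in
    thousands by sepchar ('' for none) and decpoint as the decimal
    separator. Illustrative sketch only."""
    whole, _, frac = f"{value:.{dp}f}".partition(".")
    groups = []
    while len(whole) > 3:
        groups.insert(0, whole[-3:])  # peel off three digits from the right
        whole = whole[:-3]
    groups.insert(0, whole)
    grouped = sepchar.join(groups)
    return grouped + decpoint + frac if frac else grouped


if __name__ == "__main__":
    x = 3000.25  # "three thousand and a quarter"
    print(format_number(x, " ", ","))  # 3 000,25  (like SEPCHAR " ")
    print(format_number(x, "", ","))   # 3000,25   (like REPSEPCHAR none)
```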
There is a command called RAWBYTES. Specify RAWBYTES ON if you want the exact number of bytes to be listed, or RAWBYTES OFF if you want the number of kilobytes or megabytes, as appropriate, to be listed instead.
If RAWBYTES is OFF (which is the default), then you can use the BYTESDP command to specify how many decimal places you want the bytes rounded to. The default is 2, which will display numbers like "91.26 kilobytes".
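The following Python sketch shows the kind of rounding described above. It is an illustration under stated assumptions rather than analog's code: it assumes 1024 bytes to a kilobyte (the 1kb DOMFLOOR example later in this section also equates 1 kilobyte with 1024 bytes) and that the next unit up is chosen once the number reaches 1024 of the current one. The function name format_bytes is hypothetical.

```python
def format_bytes(nbytes: int, rawbytes: bool = False, bytesdp: int = 2) -> str:
    """Sketch of RAWBYTES/BYTESDP-style output (assumed thresholds)."""
    units = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes"]
    if rawbytes:
        return f"{nbytes} bytes"  # RAWBYTES ON: the exact figure
    value = float(nbytes)
    unit = 0
    # Move to the next unit while the number is at least 1024 of the current one
    while value >= 1024 and unit < len(units) - 1:
        value /= 1024
        unit += 1
    if unit == 0:
        return f"{nbytes} bytes"  # too small to need rounding
    return f"{value:.{bytesdp}f} {units[unit]}"


if __name__ == "__main__":
    print(format_bytes(93450))             # rounded to 2 decimal places
    print(format_bytes(93450, bytesdp=1))  # rounded to 1 decimal place
```

With these assumptions, 93450 bytes comes out as "91.26 kilobytes", matching the example above.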
You have to be careful using the LOGTIMEOFFSET command. Because daylight saving time comes into effect at different times in different parts of the world, analog does not attempt to convert between time zones. So it's your responsibility to set the right offset for different times of year. For example, if you were in Chicago, but your server was recording time in GMT, you would need to specify two different time offsets: minus five hours for summer and minus six hours for winter. You would need to split your logfiles in the right places and then run commands like
LOGTIMEOFFSET -300
LOGFILE summer*.log
LOGTIMEOFFSET -360
LOGFILE winter*.log
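The offset is just a fixed shift in minutes, which is why you need a different value on each side of the daylight saving change. A minimal Python sketch of the arithmetic follows; apply_offset is a hypothetical helper and the dates are invented for illustration.

```python
from datetime import datetime, timedelta


def apply_offset(logged: datetime, offset_minutes: int) -> datetime:
    """Shift a logged time by a fixed number of minutes. No daylight
    saving rules are applied: the offset is used exactly as given."""
    return logged + timedelta(minutes=offset_minutes)


if __name__ == "__main__":
    # A request logged at 17:00 GMT, converted to Chicago local time.
    summer = apply_offset(datetime(2003, 7, 1, 17, 0), -300)  # GMT-5 in summer
    winter = apply_offset(datetime(2003, 1, 15, 17, 0), -360)  # GMT-6 in winter
    print(summer, winter)
```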
There is also a related command called TIMEOFFSET. This tells analog how much to offset the time of the computer on which it is running (rather than the computer running the server), to get your local time.
By the way, in the following lists, don't get confused between the commands for the Quarterly Report (which begin with QUARTERLY) and those for the Quarter-Hour Report and Quarter-Hour Summary (which begin with QUARTERREP and QUARTERSUM respectively).
HOURSUMCOLS Pb
tells analog to include the number of page requests and percentage of the bytes, in that order, as the columns for the Hourly Summary. The full list of these COLS commands is YEARCOLS, QUARTERLYCOLS, MONTHCOLS, WEEKCOLS, DAYREPCOLS, DAYSUMCOLS, HOURREPCOLS, HOURSUMCOLS, WEEKHOURCOLS, QUARTERREPCOLS, QUARTERSUMCOLS, FIVEREPCOLS and FIVESUMCOLS. There is also a TIMECOLS command, which specifies that all the time reports are to have the specified columns.
DAYREPGRAPH P
tells analog to plot the bar charts in the Daily Report by the number of page requests. This also controls how analog decides which is the busiest time period in the bottom line of the report. Using a lower case letter tells analog to plot the bar charts with ASCII characters instead of the normal red bars. (This produces shorter output, and it is how they appear anyway in PLAIN and ASCII output styles, or when viewed with a non-graphical browser.) So, for example,
DAYREPGRAPH b
would plot the Daily Report by bytes, without using the graphics. The full list of GRAPH commands is YEARGRAPH, QUARTERLYGRAPH, MONTHGRAPH, WEEKGRAPH, DAYREPGRAPH, DAYSUMGRAPH, HOURREPGRAPH, HOURSUMGRAPH, WEEKHOURGRAPH, QUARTERREPGRAPH, QUARTERSUMGRAPH, FIVEREPGRAPH and FIVESUMGRAPH. There's also an ALLGRAPH command to set all of them simultaneously.
BARSTYLE a
BARSTYLE b
BARSTYLE c
BARSTYLE d
BARSTYLE e
BARSTYLE f
BARSTYLE g
BARSTYLE h
BARSTYLE i
BARSTYLE j
The default style is b.
MONTHBACK ON # Monthly Report backwards
WEEKBACK OFF # Weekly Report forwards
The full list of BACK commands is YEARBACK, QUARTERLYBACK, MONTHBACK, WEEKBACK, DAYREPBACK, HOURREPBACK, QUARTERREPBACK and FIVEREPBACK. It tends to be confusing to mix directions (and analog will warn you if you attempt it), so usually you want to use the ALLBACK command, which sets all of them at once.
QUARTERREPROWS 96 # only the last day's worth
MONTHROWS 0 # 0 means no restriction: show all time
The full list of ROWS commands is YEARROWS, QUARTERLYROWS, MONTHROWS, WEEKROWS, DAYREPROWS, HOURREPROWS, QUARTERREPROWS and FIVEREPROWS. Even if a ROWS command is given, the line at the bottom of the report will still show the busiest time period ever, not just the busiest one in that many rows.
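The distinction between the rows shown and the busiest period reported can be sketched in Python. This is a simplified illustration, not analog's code: time_report is a hypothetical function, and it assumes the rows kept are the most recent ones, as the "last day's worth" example suggests.

```python
def time_report(periods, rows):
    """periods: (label, count) pairs, oldest first. rows: the ROWS setting.
    Returns (rows to display, busiest period)."""
    shown = periods if rows == 0 else periods[-rows:]  # 0 = no restriction
    busiest = max(periods, key=lambda p: p[1])  # searched over ALL periods
    return shown, busiest


if __name__ == "__main__":
    data = [("Jan", 50), ("Feb", 80), ("Mar", 30), ("Apr", 40)]
    shown, busiest = time_report(data, 2)
    print(shown, busiest)
```

Here only March and April are displayed, but February is still reported as the busiest month.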
MARKCHAR =
tells analog to use the equals sign.
There is a parameter called MINGRAPHWIDTH which sets the minimum nominal size of the graphs. For example, if you set
MINGRAPHWIDTH 10
then the graph will be allowed to be up to 10 characters wide, even if that would exceed the PAGEWIDTH.
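As a rough illustration of how MARKCHAR and MINGRAPHWIDTH interact, the Python sketch below draws ASCII bars. The way the available width is worked out (PAGEWIDTH minus the width of the other columns) and the proportional scaling are simplifying assumptions, not analog's exact algorithm; ascii_bars and its parameters are hypothetical.

```python
import math


def ascii_bars(counts, pagewidth, other_cols_width, mingraphwidth, markchar):
    """Return one bar string per count, scaled so the largest count fills
    the graph width. The graph never gets narrower than mingraphwidth,
    even if that makes the line longer than pagewidth (assumed model)."""
    graph_width = max(pagewidth - other_cols_width, mingraphwidth)
    biggest = max(counts, default=0)
    bars = []
    for c in counts:
        length = math.ceil(c * graph_width / biggest) if biggest > 0 else 0
        bars.append(markchar * length)
    return bars


if __name__ == "__main__":
    # Only 5 characters are left on the page, but MINGRAPHWIDTH 10 wins.
    for bar in ascii_bars([10, 5, 0], 30, 25, 10, "="):
        print(bar)
```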
There is one more command which affects the time reports. You can specify which day should be counted as the first day of the week. This affects the layout of the Daily Report, Daily Summary, Weekly Report and Hour of the Week Summary. For example, our local student newspaper publishes a new edition on the web every Friday, so they like to specify WEEKBEGINSON FRIDAY for their reports.
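The effect of WEEKBEGINSON on which week a day falls into can be sketched in Python. This is an illustration of the idea, not analog's code; week_start is a hypothetical helper.

```python
from datetime import date, timedelta

DAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
        "FRIDAY", "SATURDAY", "SUNDAY"]  # same order as date.weekday()


def week_start(d: date, weekbeginson: str) -> date:
    """Return the first day of the week containing d, when weeks
    begin on the named day."""
    first = DAYS.index(weekbeginson.upper())
    days_back = (d.weekday() - first) % 7
    return d - timedelta(days=days_back)


if __name__ == "__main__":
    # 2 July 2003 was a Wednesday; with WEEKBEGINSON FRIDAY its week
    # starts on the previous Friday, 27 June 2003.
    print(week_start(date(2003, 7, 2), "FRIDAY"))
```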
In the next section, we'll look at commands relating to the non-time reports.
First, these reports have COLS commands, just like the time reports. (See the section on Time reports for how to use these commands.) But for these reports, several additional columns are available. Here is the full list of columns for the non-time reports
REQCOLS NRSD
counts the files in the Request Report, listing the number of requests for each, the number of requests for each in the last 7 days, and the time when each was last requested. The full list of COLS commands for non-time reports is HOSTCOLS, REDIRHOSTCOLS, FAILHOSTCOLS, ORGCOLS, DOMCOLS, REQCOLS, DIRCOLS, TYPECOLS, SIZECOLS, PROCTIMECOLS, REDIRCOLS, FAILCOLS, REFCOLS, REFSITECOLS, SEARCHQUERYCOLS, SEARCHWORDCOLS, INTSEARCHQUERYCOLS, INTSEARCHWORDCOLS, REDIRREFCOLS, FAILREFCOLS, BROWREPCOLS, BROWSUMCOLS, OSCOLS, VHOSTCOLS, REDIRVHOSTCOLS, FAILVHOSTCOLS, USERCOLS, REDIRUSERCOLS, FAILUSERCOLS and STATUSCOLS. Not every column is allowed in every report, but if you specify an illegal one, analog will warn you about it.
HOSTSORTBY ALPHABETICAL
will sort the Host Report alphabetically. The full list of SORTBY commands is HOSTSORTBY, REDIRHOSTSORTBY, FAILHOSTSORTBY, ORGSORTBY, DOMSORTBY, REQSORTBY, DIRSORTBY, TYPESORTBY, REDIRSORTBY, FAILSORTBY, REFSORTBY, REFSITESORTBY, SEARCHQUERYSORTBY, SEARCHWORDSORTBY, INTSEARCHQUERYSORTBY, INTSEARCHWORDSORTBY, REDIRREFSORTBY, FAILREFSORTBY, BROWREPSORTBY, BROWSUMSORTBY, OSSORTBY, VHOSTSORTBY, REDIRVHOSTSORTBY, FAILVHOSTSORTBY, USERSORTBY, REDIRUSERSORTBY, FAILUSERSORTBY and STATUSSORTBY. Again, not every sort method is possible in every report, but you'll be warned if you choose an illegal one.
There is one known bug concerned with SORTBY ALPHABETICAL. The report is sorted before any output alias is applied. This means that if an output alias has been specified for the report, then the report may appear not to be sorted correctly.
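The effect of that bug is easy to reproduce in miniature. In this Python sketch (the host names and the alias are invented for illustration, and sorted_then_aliased is a hypothetical function) the list is sorted on the original names and only then translated for output, so the aliased entry ends up out of order.

```python
def sorted_then_aliased(names, alias):
    """Sort on the raw names, then apply the output alias: the order
    of operations the known bug describes."""
    ordered = sorted(names)
    return [alias.get(name, name) for name in ordered]


if __name__ == "__main__":
    hosts = ["zebra.example.org", "apple.example.com", "mango.example.net"]
    alias = {"zebra.example.org": "Aardvark Ltd"}  # invented output alias
    print(sorted_then_aliased(hosts, alias))
```

"Aardvark Ltd" is printed last, because the list was sorted while it was still called zebra.example.org.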
DOMFLOOR 1000r # all domains with at least 1000 requests
DOMFLOOR 100s # at least 100 requests within the last 7 days
DOMFLOOR 1000p # at least 1000 requests for pages
DOMFLOOR 100q # at least 100 requests for pages within the last 7 days
DOMFLOOR 1000000b # at least 1,000,000 bytes transferred
DOMFLOOR 1kb # at least 1 kilobyte (1024 bytes)
DOMFLOOR 10.5Mc # at least 10.5 megabytes within the last 7 days
DOMFLOOR 0.5%r # 0.5% of the total requests in the Domain Report
# (ditto %s, %p etc.)
DOMFLOOR 0.5:r # 0.5% of the maximum number of requests for any domain
# (ditto :s, :p etc.)
DOMFLOOR 970701d # last access since 1st July 1997
DOMFLOOR 970701e # first access since 1st July 1997
DOMFLOOR -00-01-00d # last access in last month (see
# documentation on FROM and TO commands)
DOMFLOOR -100r # domains with top 100 number of