
Analog 6.0: Aliases


After analog has read each logfile entry, it applies aliases to each of the items. First, if you have a case insensitive filesystem, analog converts the filename to lower case. Usually analog assumes that Unix and BeOS filesystems are case sensitive and other systems are case insensitive. You might want to override its choice if, for example, you have transferred files from one machine to another and want to keep the convention of the original machine. You can do this with the commands
CASE INSENSITIVE
CASE SENSITIVE
There are similar commands for usernames, if your logfile records these. By default, usernames are always case insensitive, but you can specify
USERCASE SENSITIVE
to override this.
Next it applies built-in aliases to each item. For example, it knows that %7E in a filename or referrer is equivalent to ~ and translates it accordingly. It also strips off the directory suffix from any filenames which have it. This suffix is normally index.html, but you can specify another one instead with a command such as
DIRSUFFIX default.htm
(You can only have one DIRSUFFIX.) There are other built-in aliases for other items: for example, hostnames are converted to lower case at this point.
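
As a rough illustration of the steps described so far, the following Python sketch models case folding and the built-in filename aliases. It is not analog's own code: the helper name `normalise_filename` and its parameters are hypothetical, and it assumes that only the %7E escape is translated and that the directory suffix is stripped only when it directly follows a slash.

```python
def normalise_filename(name, case_insensitive=False, dirsuffix="index.html"):
    """Model of the early alias steps for a filename (hypothetical helper)."""
    if case_insensitive:
        name = name.lower()  # CASE INSENSITIVE: fold the name to lower case
    # Built-in alias: %7E is equivalent to ~ (assumed to hold for %7e too)
    name = name.replace("%7E", "~").replace("%7e", "~")
    # Strip the directory suffix, leaving just the directory name
    if name.endswith("/" + dirsuffix):
        name = name[: -len(dirsuffix)]
    return name


print(normalise_filename("/%7Esret1/index.html"))
print(normalise_filename("/Docs/Default.htm",
                         case_insensitive=True, dirsuffix="default.htm"))
```
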
After this, it applies user-specified aliases to each item. These aliases are useful if, for example, you know that two filenames correspond to the same file, or if you want to translate local hostnames to their internet equivalents. You specify aliases by commands like
FILEALIAS /football.html /soccer.html
HOSTALIAS lion lion.statslab.cam.ac.uk
There is also the special command FILEALIAS none, which cancels any other file aliases which might have been specified.

The alias commands for the other items are called BROWALIAS, REFALIAS, USERALIAS and VHOSTALIAS. Only one alias is ever applied to any item. So after

FILEALIAS /football.html /soccer.html
FILEALIAS /soccer.html /brazil.html
the file /soccer.html would get translated to /brazil.html, but /football.html would only get translated to /soccer.html and would not see the second alias.
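
The "only one alias" rule can be modelled in a few lines of Python. This is a sketch, not analog's implementation: the list `FILE_ALIASES` and the function `apply_one_alias` are hypothetical names, and the sketch assumes that aliases are tried in the order they were given.

```python
# Hypothetical model of the two FILEALIAS commands above, in order.
FILE_ALIASES = [
    ("/football.html", "/soccer.html"),
    ("/soccer.html", "/brazil.html"),
]


def apply_one_alias(name, aliases):
    """Return the name after applying at most one alias."""
    for old, new in aliases:
        if name == old:
            return new  # stop here: the result is never aliased again
    return name


print(apply_one_alias("/football.html", FILE_ALIASES))
print(apply_one_alias("/soccer.html", FILE_ALIASES))
```

The early `return` is the whole point: once an alias has matched, the new name is not passed through the list a second time.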

You can also use wildcards in ALIAS commands: ? matches any one character and * matches any number of characters (including none). And on the right-hand side, you can use $1, $2 etc. to represent the parts of the original name matched by the *'s. As a special abbreviation, if there is exactly one * on the left-hand side, then a * on the right-hand side can be used to represent $1. So, for example,

FILEALIAS /*/football/* /soccer/
would translate /sport/football/rules.html to just /soccer/, but either of
FILEALIAS /*/football/* /$1/soccer/$2         # or
FILEALIAS /sport/football/* /sport/soccer/*
would translate /sport/football/rules.html to /sport/soccer/rules.html.

You can use $$ to get an actual $ on the right-hand side. Or you can prefix the right-hand side with "PLAIN:" to treat any $'s and *'s on the right-hand side literally. For example

FILEALIAS /*/football/* PLAIN:/$1/soccer/$2
would translate /sport/football/rules.html to exactly /$1/soccer/$2

Analog's *'s are un-greedy: if there are two possible ways of matching, the part of the expression on the left matches as little as possible. This is more often what you want. But it contrasts with Perl's regular expressions, for example. (Oh, two consecutive *'s are completely useless, but if you try it they are collapsed into one before counting the $1, $2, etc.)
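
If it helps to see the wildcard rules in one place, here is a Python sketch that models them by translating the left-hand side into a regular expression with un-greedy groups. It is only a model of the behaviour described above, not analog's own matching code: the function names are hypothetical, and it assumes that a wildcard pattern has to match the whole name.

```python
import re


def wildcard_to_regex(pattern):
    """Translate an analog-style wildcard pattern into a Python regex."""
    pattern = re.sub(r"\*+", "*", pattern)  # consecutive *'s collapse into one
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("(.*?)")  # un-greedy; captured as $1, $2, ...
        elif ch == "?":
            parts.append(".")      # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def expand_rhs(rhs, groups):
    """Fill in $1, $2, ..., $$ and the single-* abbreviation."""
    if rhs.startswith("PLAIN:"):
        return rhs[len("PLAIN:"):]  # everything after PLAIN: is literal

    def replace(match):
        token = match.group(0)
        if token == "$$":
            return "$"
        if token == "*":
            # * stands for $1 only if the left-hand side had exactly one *
            return groups[0] if len(groups) == 1 else "*"
        index = int(token[1:])
        return groups[index - 1] if index <= len(groups) else token

    return re.sub(r"\$\$|\$[1-9]|\*", replace, rhs)


def wildcard_alias(name, lhs, rhs):
    """Apply one wildcard alias to name, or return name unchanged."""
    match = wildcard_to_regex(lhs).fullmatch(name)
    if match is None:
        return name
    return expand_rhs(rhs, match.groups())


name = "/sport/football/rules.html"
print(wildcard_alias(name, "/*/football/*", "/$1/soccer/$2"))
print(wildcard_alias(name, "/*/football/*", "PLAIN:/$1/soccer/$2"))
print(wildcard_alias("/a/b/c", "/*/*", "$1|$2"))  # first * takes as little as possible
```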

The behaviour of FILEALIAS and REFALIAS can be slightly unintuitive if the file has search arguments.

A warning to Unix users: if you put an ALIAS command on the command line with +C, the shell may try to expand $1 etc. itself, which is not what you want. To stop the shell doing this, put the command in single quotes instead of double quotes.
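
If you build analog command lines from a script, the quoting can be done for you. The Python sketch below uses the standard `shlex.quote` function, which wraps a string in single quotes when it contains characters that a POSIX shell would otherwise interpret. The helper name `quote_for_shell` is hypothetical, and the sketch assumes that +C is followed directly by the configuration command.

```python
import shlex


def quote_for_shell(config_command):
    """Return a +C argument quoted so a POSIX shell passes it on unchanged."""
    return shlex.quote("+C" + config_command)


print(quote_for_shell("FILEALIAS /*/football/* /$1/soccer/$2"))
```

The printed argument is in single quotes, so neither the $1 and $2 nor the *'s are expanded by the shell.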


There is another set of alias commands, called output aliases. They don't alias items, but individual lines from particular reports (and they never combine lines, even if two lines end up with the same name). For example, the command
TYPEALIAS .txt ".txt (Plain text files)"
would provide an explanation of that line in the File Type Report.

It is easy to confuse some of the normal alias commands with the output alias commands. For example, what is the difference between FILEALIAS and REQALIAS? In fact, there are several differences because of the different things the aliases are doing. FILEALIAS applies to the files themselves, but REQALIAS only applies to the lines in the Request Report. This means that FILEALIAS also affects the other reports which use the filenames, such as the Directory Report, whereas REQALIAS only affects the Request Report.

Another difference is that REQALIAS applies separately to each line of the Request Report. This means that if two separate files translate to the same thing in a FILEALIAS command, they will become one file for all the reports. But if you were to use the same REQALIAS command, they would still be two files, and would still be listed on separate lines in the Request Report, but with the same name.

So in summary, when should you use each command? FILEALIAS should be used if a single file has two different names; i.e., if your web server returns the same file for two different URLs. REQALIAS, on the other hand, would typically be used to annotate or clarify the Request Report. Sometimes it's useful to use both; first combine some files with FILEALIAS, and then annotate them in the Request Report with REQALIAS.
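
The difference between the two kinds of alias can be shown with a toy request count in Python. This is only a model under simple assumptions: the request list, the alias table and both function names are made up for the illustration, and a report is represented as a list of (name, requests) lines.

```python
from collections import Counter

# Hypothetical data: three requests, and one alias between two names.
requests = ["/football.html", "/soccer.html", "/football.html"]
alias = {"/football.html": "/soccer.html"}


def with_filealias(requests, alias):
    """Alias the items first, then count: aliased files become one file."""
    counts = Counter(alias.get(name, name) for name in requests)
    return list(counts.items())


def with_reqalias(requests, alias):
    """Count first, then rename each report line: lines stay separate."""
    counts = Counter(requests)
    return [(alias.get(name, name), n) for name, n in counts.items()]


print(with_filealias(requests, alias))  # one line with all three requests
print(with_reqalias(requests, alias))   # two lines which share a name
```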

The full list of output aliases is REQALIAS, REDIRALIAS, FAILALIAS, TYPEALIAS, DIRALIAS, HOSTREPALIAS, REDIRHOSTALIAS, FAILHOSTALIAS, DOMALIAS, ORGALIAS, REFREPALIAS, REFSITEALIAS, REDIRREFALIAS, FAILREFALIAS, BROWREPALIAS, BROWSUMALIAS, OSALIAS, VHOSTREPALIAS, REDIRVHOSTREPALIAS, FAILVHOSTREPALIAS, USERREPALIAS, REDIRUSERALIAS and FAILUSERALIAS.

There is one known bug with the output aliases. The report is sorted before the alias is applied. This means that if the SORTBY for the report is set to ALPHABETICAL, then the report will not be sorted correctly.
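
The effect of this bug is easy to reproduce in a small Python model. The report lines and the alias below are invented for the illustration, and the two functions are hypothetical: the first mimics the order of operations described above, the second shows what a correctly sorted report would look like.

```python
# Hypothetical report lines and a single output alias.
lines = ["/apple.html", "/banana.html", "/cherry.html"]
alias = {"/apple.html": "Zebra page"}


def sort_then_alias(lines, alias):
    """Order of operations described above: sort first, rename afterwards."""
    return [alias.get(name, name) for name in sorted(lines)]


def alias_then_sort(lines, alias):
    """What an alphabetical sort of the displayed names would give."""
    return sorted(alias.get(name, name) for name in lines)


print(sort_then_alias(lines, alias))  # "Zebra page" stays in first place
print(alias_then_sort(lines, alias))  # "Zebra page" moves to the end
```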


You can also use regular expressions in the ALIAS commands. Sorry, I'm not going to teach you how to use regular expressions here if you don't already know: if you're on Unix try typing man perlre or man regex or man grep. There are lots of implementations of regular expressions. The ones which analog uses are Perl-syntax regular expressions. In general, these are a superset of the extended regular expressions used by Unix egrep or GNU grep -E.

You include regular expressions in an ALIAS command by prefixing the left-hand side of the alias with "REGEXP:". Or you can specify a case-insensitive match, like Perl m//i or Unix egrep -i, by using "REGEXPI:". (It's automatically case-insensitive for many items, such as hostnames, or filenames if you have specified CASE INSENSITIVE.)

On the right-hand side of the alias you can use $1, $2 etc. to represent the first, second etc. bracketed expression on the left-hand side, counting in order of the left brackets. (Again, you can't put $1, $2 etc. on the command line unless you put them in single quotes.)

Regular expressions match if they match just part of the string. If you want them to have to match the whole of the string, you have to anchor them to the ends of the string with ^ and $.

For example,

REQALIAS REGEXP:^(/~(.+?)/.*) "[$2] $1"
would translate
/~sret1/backgammon/rules.html
to
[sret1] /~sret1/backgammon/rules.html
in the Request Report. Or
HOSTALIAS REGEXP:^([^.]*)$ $1.mycompany.com
would add .mycompany.com to all hostnames not containing a dot. (See the FAQ for a discussion about whether this is a good idea.)
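
You can try out the two expressions above without running analog, for example with Python's `re` module, whose syntax agrees with Perl's for these patterns. The sketch below is a model only: `regexp_alias` is a hypothetical helper, it assumes that the whole name is replaced by the right-hand side once the expression matches somewhere, and it only handles $1 to $9.

```python
import re


def regexp_alias(name, pattern, rhs, ignore_case=False):
    """Model of a REGEXP: (or REGEXPI:) alias applied to one name."""
    flags = re.IGNORECASE if ignore_case else 0
    match = re.search(pattern, name, flags)  # matching part of the name is enough
    if match is None:
        return name
    # Replace $1, $2, ... by the corresponding bracketed expression
    return re.sub(r"\$([1-9])",
                  lambda d: match.group(int(d.group(1))) or "",
                  rhs)


print(regexp_alias("/~sret1/backgammon/rules.html",
                   r"^(/~(.+?)/.*)", "[$2] $1"))
print(regexp_alias("lion", r"^([^.]*)$", "$1.mycompany.com"))
print(regexp_alias("lion.statslab.cam.ac.uk",
                   r"^([^.]*)$", "$1.mycompany.com"))  # has a dot: unchanged
```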

Regular expressions are greedy: if there are two possible ways of matching, the part of the expression on the left matches as much as possible.
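
The contrast between greedy regular expressions and un-greedy wildcards can be seen in a short Python sketch. The name /a/b/c is made up for the illustration, and the un-greedy expression is used here only as a stand-in for the way a wildcard pattern such as /*/* is described as matching.

```python
import re

name = "/a/b/c"

# Greedy: the first bracketed expression takes as much as possible.
greedy = re.search(r"^/(.*)/(.*)$", name).groups()

# Un-greedy: the first bracketed expression takes as little as possible.
ungreedy = re.search(r"^/(.*?)/(.*?)$", name).groups()

print(greedy)    # ('a/b', 'c')
print(ungreedy)  # ('a', 'b/c')
```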



Stephen Turner
19 December 2004

Need help with analog? Use the analog-help mailing list.
