 Analog CE 6.0:
Form interface and CGI program
Important: For security reasons, you must not attempt to run analog itself as a CGI program, or even leave it in the directory or folder with your web files or CGI programs. When the form interface runs analog for you, it checks that analog isn't given any dangerous options. Without this check, your system could be vulnerable to attack.
The form interface is suitable for ordinary users to use, but it needs to be set up by a system administrator or other expert. In order to set it up, you have to be running a web server. You need to know what CGI programs are, where they live on your server, and how to set up their permissions properly. You also need to know how to write HTML forms. I shall assume this level of background knowledge for the rest of this section. And you have to be running Perl 5.001 or later: see Technical details below for other system requirements. (Actually, if you're on Windows and don't have Perl, you can download an executable version of the form interface from the helper applications page.)
Please don't try and set up the form until analog has been set up and is running properly on its own. It just adds another level of complexity to troubleshoot. And unlike analog itself, the form interface will not run "out of the box". You have to read the whole of this section to find out how to set it up safely.
Warning: CGI programs can contain security loopholes which allow an unscrupulous user to harm your system. (If you don't know about this, you shouldn't be running CGI programs at all. Read and understand the World Wide Web Security FAQ and the CGI Security FAQ first.) I have tried to make this form interface safe, but I cannot guarantee it. Even the most carefully-designed CGI programs can accidentally have serious security bugs. And I take no responsibility if anything goes wrong: you use it at your own risk. (See the licence.) Furthermore, you should be aware that unless you take special measures like password protection or limiting anlgform.pl to specific hostnames, setting up the form interface implies making analog executable, and your logfiles analysable, by anyone on the internet. It's usually a bad idea to allow this, because it has obvious negative implications both for privacy and for the load on your system: an attacker can run multiple copies of analog causing a denial-of-service attack. There are more notes on security design in this program towards the end of this section.
The form interface consists of two parts: a form (called anlgform.html) to choose the options, and a cgi program (called anlgform.pl) to pass them to the analog program. Both anlgform.html and anlgform.pl must be configured to your system before they will work at all. There are instructions at the top of both files explaining how to do this.
The form which is distributed with the program should only be regarded as an example form. You can find forms in languages other than English in the lang directory. Or you can write your own if you prefer. In fact you don't actually need the form at all: if you want just to create a link to the cgi program, with the arguments passed after a question mark in the URL in the usual way, then that's fine.
Logfile name: <input type=text name="LOGFILE">
or maybe something like
<select name=LOGFILE size=1>
<option value="/var/log/apache/fred"> Fred's logfile
<option value="/var/log/apache/jane"> Jane's logfile
</select>
There are a few commands which you can't specify on the form for security or performance reasons. The full list is *LOGFORMAT, LANGFILE, DESCFILE, HEADERFILE, FOOTERFILE, UNCOMPRESS, OUTFILE, CACHEOUTFILE, ERRFILE, LOCALCHARTDIR, DNS and SETTINGS; and the person setting up the form can add more. See the security notes below for the reasons for these exclusions, and for some more commands you might want to add to the forbidden list. You can, if you prefer, specify the commands which are allowed, rather than those which are forbidden.
Alias this file: <input type=text name="FILEALIAS1"> To this one: <input type=text name="FILEALIAS2">
You can only specify one such pair this way, so there's no way to specify several of the same ALIAS, for example. Only the last COMMAND1 and the last COMMAND2 you specify count.
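The pairing rule can be sketched in a few lines. This is only an illustration in Python, not the code in anlgform.pl: the function name `join_pairs`, the list-of-tuples input and the exact output string are assumptions made for the example.

```python
def join_pairs(fields):
    """Combine NAME1/NAME2 form fields into two-argument commands.

    `fields` is a list of (name, value) tuples in submission order.
    Later values overwrite earlier ones, so only the last NAME1 and
    the last NAME2 count. (Hypothetical helper, for illustration.)
    """
    first, second = {}, {}
    for name, value in fields:
        if name.endswith("1"):
            first[name[:-1]] = value
        elif name.endswith("2"):
            second[name[:-1]] = value
    # Emit a command only when both halves are present.
    return [f"{cmd} {first[cmd]} {second[cmd]}" for cmd in first if cmd in second]
```

Here a second FILEALIAS1 field replaces the first one rather than creating a second alias, which is the limitation described above.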
Then there are FLOOR commands. To avoid users of the form having to know the syntax of these commands, you can if you want specify them in two halves, FLOORA and FLOORB, and they will be stuck together. For example, the form distributed with the program specifies
<br>Include all domains with at least
<input type=TEXT name="DOMFLOORA" maxlength=6 size=6>
<select name="DOMFLOORB">
<option value=r>requests
<option value=p>requests for pages
<option value=b selected>bytes
</select>
If DOMFLOORA contains 5% and DOMFLOORB contains r, then DOMFLOOR 5%r will be sent to the program. (Or DOMFLOORA=5 and DOMFLOORB=%r would work too, if you chose to present the form that way.)
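As a rough sketch of how the two halves are stuck together (in Python, for illustration only; the function name `join_floor` and the decision to emit nothing when FLOORA is empty are assumptions, not taken from anlgform.pl):

```python
def join_floor(fields, report="DOM"):
    """Build a FLOOR command from its A and B halves.

    `fields` maps form field names to values. Returns None when the
    A half is empty (an assumption made for this sketch).
    """
    a = fields.get(report + "FLOORA", "")
    b = fields.get(report + "FLOORB", "")
    if not a:
        return None
    # The halves are simply concatenated, with no separator.
    return f"{report}FLOOR {a}{b}"
```

Both ways of splitting the value give the same command: `{"DOMFLOORA": "5%", "DOMFLOORB": "r"}` and `{"DOMFLOORA": "5", "DOMFLOORB": "%r"}` each produce `DOMFLOOR 5%r`.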
Secondly, you can specify other configuration files to be included at specific times. When analog is called by the CGI program, it first processes the default configuration file as usual. Then it processes any configuration file specified by an option with name cg. Then it processes all the other commands which the CGI program specifies. After that, it processes any configuration file specified by an option with name cm. Finally, it processes the mandatory configuration file as usual. (You may therefore want two copies of analog, one for form use and one for non-form use, with different configuration files compiled in.) Note that the commands in the default and mandatory configuration files will contribute to the configuration: some of them may even override options specified on the form. For example, if the default configuration file contains an INCLUDE command, this may cause INCLUDE and EXCLUDE commands specified on the form to behave unexpectedly.
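The order of processing described above can be written out as a small model. This is a Python sketch for illustration only; the function name and the bracketed labels are made up for the example, and it says nothing about how anlgform.pl actually passes the files to analog.

```python
def processing_order(form_commands, cg=None, cm=None):
    """Return the configuration sources in the order analog reads them."""
    order = ["<default configuration file>"]
    if cg:
        order.append(f"<cg file: {cg}>")   # read before the form's commands
    order.extend(form_commands)            # commands sent from the form
    if cm:
        order.append(f"<cm file: {cm}>")   # read after the form's commands
    order.append("<mandatory configuration file>")
    return order
```

Because later sources are read last, a command in the cm file or the mandatory configuration file can override one sent from the form, while a command in the default or cg file can be overridden by it.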
There are a couple of commands which the form always sets. These may override what you have set elsewhere. First, it sets either DNS READ (if a DNSFILE is set on the form) or DNS NONE (otherwise). Do not attempt to override this -- not only will you get timeout problems, but an attacker can then write to any file by setting DNSFILE.
The second command which the form always sets is WARNINGS FL, so that the less important warnings don't fill up your server's error log. You can override this by sending an explicit WARNINGS command from the form. And thirdly, it sets DEBUG -C to avoid filling up the error log if the LOGFORMAT is incorrectly configured: this can't be overridden from the form, only from the mandatory configuration file.
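The three commands which are always set can be summarised in a short sketch. It is written in Python for illustration and is a simplification: the function name `always_set` is hypothetical, and modelling the WARNINGS override by simply omitting the default is an assumption about the effect, not a description of how anlgform.pl does it.

```python
def always_set(fields):
    """Return the commands the form interface adds on its own.

    `fields` maps form field names to the values submitted.
    """
    commands = []
    # DNS READ only if a DNSFILE was given on the form, otherwise DNS NONE.
    commands.append("DNS READ" if fields.get("DNSFILE") else "DNS NONE")
    # WARNINGS FL is a default; an explicit WARNINGS from the form replaces it.
    if "WARNINGS" not in fields:
        commands.append("WARNINGS FL")
    # DEBUG -C cannot be changed from the form.
    commands.append("DEBUG -C")
    return commands
```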
You won't get pie charts on the form unless you set a CHARTDIR and LOCALCHARTDIR in your default configuration file (LOCALCHARTDIR is disabled from the form for security reasons). And even if you do this, there will be a problem if two users try and run the form interface at the same time, because they will be trying to write the same images, so they may see broken images or each other's charts.
There is one small point about compressed logfiles. For security reasons, when using the form interface you need to specify the full pathname to the uncompression command in the UNCOMPRESS command in your configuration file.
Again for security reasons, analog checks the input from configuration commands more carefully when using the form interface before outputting it. One side-effect of this is that the JAPANESE-JIS character set won't work. Use one of the other Japanese character sets instead.
First, does analog run properly on its own without anlgform?
Next, you can run anlgform.pl from the (DOS or Unix) command line. This is good enough to debug most problems. You can specify options in pairs like this:
anlgform.pl qv=1 LOGFILE=/some/log REQINCLUDE=pages
If you include qv=1 in the argument list as above, you will see what anlgform.pl is trying to send to analog. If you don't include qv=1, anlgform.pl will try and run analog.
If it still doesn't work, check the following points:
First, you should think about who can run the form interface. Unless you take special measures like password protection or limiting anlgform.pl to specific hostnames, adding the form interface to your site implies making analog executable, and your logfiles analysable, by anyone on the internet, as often as they want. It's usually a bad idea to allow this, because of the obvious concerns both about privacy and about the load on your system. Unless you limit the total CPU available to any analog processes, it is easy for an attacker to run multiple copies of analog, causing a denial-of-service attack.
Certain commands are ignored by anlgform.pl and not passed to analog. The list of them can be found at the top of anlgform.pl. Here are the reasons for them. HEADERFILE and FOOTERFILE would place any file on your system within the output. The *LOGFORMAT commands would also allow any file to be read, because someone could designate each line to be a single filename and then just list the filenames. OUTFILE, CACHEOUTFILE, ERRFILE and LOCALCHARTDIR would allow people to write to your filespace; ERRFILE would also divert warnings away from your server's error log. UNCOMPRESS would allow a user to execute any command. DNS is forbidden because setting it higher than READ would normally cause the process to time out, and also because with DNS WRITE, the DNSFILE would be a file to write, not just a file to read. CGI would allow the user to generate syntactically incorrect output. PROGRESSFREQ would allow a user to conduct a denial-of-service attack by filling up your error log really, really fast (and DEBUG C is also disabled for the same reason.)
None of the above should be deleted (unless you are really, really sure that it's completely impossible for anyone other than yourself to run anlgform.pl). There are three other commands which are forbidden by default but which you could consider removing from the forbidden list. SETTINGS is included because it will give away the locations of some files on your system. But it is useful for diagnostic purposes, and you could consider removing it temporarily if you have trouble setting up the form. The other commands which are included are LANGFILE and DESCFILE. They are included because it is possible that another file could be exactly the right number of lines long to be accepted as a language file or report descriptions file, and then parts of it would get into the output. But it would have to be exactly the right number of lines long first. These commands shouldn't really be needed if your copy of analog is installed correctly, because the LANGUAGE command should find the right files. But if you want them, and you're prepared to take the risk described above, you can remove LANGFILE and/or DESCFILE from the list.
There are other commands which you might consider adding to the list. For example, it is theoretically possible (though rather unlikely), that another file on your system could conform sufficiently closely to one of the predefined log formats that analog could be persuaded to analyse it and so reveal some of its contents. If you're worried about this, or even if you want to force only one particular logfile to be analysed from the form, you can add the LOGFILE command to the list of forbidden commands. And you could add DOMAINSFILE for similar reasons. Or if you wanted to stop a user having control over which analog warnings were written to the error log, you could add WARNINGS to the list. (Possible attempted security violations detected by anlgform will always be written.)
You can of course add any command you like to the list. For example, a user can use any configuration file on your system unless you add CONFIGFILE. If you add a command, you must also add any aliases for it. Have a look in the source file globals.c for the same command under different names -- some commands have legacy names which I don't admit to in the documentation.
For more certainty, you can, if you prefer, configure anlgform so that you specify the commands which are allowed, rather than those which are forbidden. See the top of anlgform.pl for how to do this.
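The difference between the two approaches can be sketched as follows. This is an illustrative Python model, not the code in anlgform.pl: the name `is_permitted`, the case-insensitive comparison and the way the lists are stored are all assumptions. The forbidden set below is assembled from the commands named in this section; the authoritative list is the one at the top of anlgform.pl.

```python
# Commands named in this section as forbidden (illustrative copy only).
FORBIDDEN = {
    "LANGFILE", "DESCFILE", "HEADERFILE", "FOOTERFILE", "UNCOMPRESS",
    "OUTFILE", "CACHEOUTFILE", "ERRFILE", "LOCALCHARTDIR", "DNS",
    "SETTINGS", "CGI", "PROGRESSFREQ",
}

def is_permitted(command, allowed=None):
    """Decide whether a command from the form may be passed on.

    With `allowed` given, only commands in that set pass (allow list).
    Otherwise everything passes except the forbidden commands and
    anything ending in LOGFORMAT (the *LOGFORMAT family).
    """
    name = command.upper()  # assumption: names compared case-insensitively
    if allowed is not None:
        return name in allowed
    return name not in FORBIDDEN and not name.endswith("LOGFORMAT")
```

With a forbidden list, any command you forgot to list gets through; with an allowed list, it is refused. That is why the allowed list gives more certainty.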
The arguments to LOGFILE and CACHEFILE commands are checked for containing only certain allowed characters (specifically, letters, digits, /\.:_*? space, and - between two {letter, digit, underscore}'s). This is because they could match an UNCOMPRESS command and thus be passed to the shell when the uncompress command is popen()'ed.
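A check of this kind could look roughly like the following. It is a Python sketch of the rule as stated above, not a transcription of anlgform.pl; the name `filename_ok` is hypothetical, and treating "letters" as ASCII letters only is an assumption.

```python
import re

# Letters, digits, / \ . : _ * ? and space are allowed anywhere.
# A hyphen is allowed only between two characters that are each a
# letter, digit or underscore.
SAFE_FILENAME = re.compile(
    r"(?:[A-Za-z0-9/\\.:_*? ]|(?<=[A-Za-z0-9_])-(?=[A-Za-z0-9_]))*"
)

def filename_ok(argument):
    """Return True if a LOGFILE or CACHEFILE argument passes the check."""
    return SAFE_FILENAME.fullmatch(argument) is not None
```

So `/var/log/access-2024.log` is accepted, but an argument such as `log -x` is refused because the hyphen follows a space, which stops it being read as an option by whatever command the shell runs.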
Apart from that, command names are checked for containing only letters and the digits 1 and 2; and the arguments to commands are checked for not containing control characters (actually characters 0-32 and 127-159; in particular newline characters are prohibited). The length of the commands isn't checked by anlgform.pl, but buffer overflow shouldn't be an issue as configuration commands are checked for length by analog.
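These two general checks are simple enough to sketch. Again this is illustrative Python, not the author's Perl; the function names are made up, and the character ranges are taken literally from the paragraph above (so, as written, they include character 32, the space).

```python
import re

# Command names: letters and the digits 1 and 2 only.
COMMAND_NAME = re.compile(r"[A-Za-z12]+")

def name_ok(name):
    """Return True if a command name contains only permitted characters."""
    return COMMAND_NAME.fullmatch(name) is not None

def argument_ok(argument):
    """Return True if an argument has no characters in 0-32 or 127-159."""
    return not any(ord(c) <= 32 or 127 <= ord(c) <= 159 for c in argument)
```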
By the way, the reason that I advise that analog itself shouldn't be used as a CGI program is that some servers, notably Microsoft IIS, allow users to pass command line arguments into a CGI program. And even if the program doesn't return the proper CGI headers, the output can be sent back to the user. This means that all the above checking of arguments is then thwarted. Of course, on servers on which you can't pass command line arguments to a CGI program, there are not the same security concerns, but then analog isn't very useful as a CGI program because if you can't pass any arguments, you can only get the default output.
On Windows, you have to associate the .pl extension with the Perl executable so that Perl scripts are executed by Perl.
anlgform.pl will understand the GET or POST methods of form submission. The HTML spec says that GET should be used when, as in this case, running the program has no side effects. However, section 15.1.3 of the HTTP spec says that POST should be used if some of the options being passed might be confidential. Also, very long URLs, formed by specifying lots of options, can cause trouble to some older servers. So anlgform.html uses the POST method by default. However, the GET method will also work. For example, you could make a normal link to anlgform.pl with options specified after a question mark in the usual GET way.
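If you want to generate such a link from a program rather than typing it by hand, something like the following would do. It is a minimal Python sketch; the path `/cgi-bin/anlgform.pl` is a hypothetical location, and the option names are just the ones used as examples earlier in this section.

```python
from urllib.parse import urlencode

def form_link(base_url, options):
    """Build a GET link to the CGI program with URL-encoded options.

    `options` is a list of (name, value) pairs.
    """
    return base_url + "?" + urlencode(options)

link = form_link(
    "/cgi-bin/anlgform.pl",  # hypothetical location of the CGI program
    [("LOGFILE", "/var/log/apache/fred"), ("REQINCLUDE", "pages")],
)
```

Remember that anything placed in a link like this is visible in the URL, which is the confidentiality concern that makes POST the default for the form.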
Stephen Turner