Client Installation

Download the distribution file, then gunzip and untar it in a directory owned by a normal user. I recommend /usr/local/topitall on unix and /topitall on Windows NT4. I suggest creating a new user topitall, although an existing non-root administrative logon would also be a good choice. Do not run or install Topitall as the root user. You must also install RRDtool [current version 1.033]. Binaries for Solaris (sparc and x86) and Linux are on the download page, or you can download the source or the win32 binaries from the rrdtool homepage.

For win32 you should install rrdtool in /rrdtool; otherwise you will need to edit the scripts which refer to it. The unix client is called topitall and the win32 client is called win32.

For unix the utilities ps, vmstat, netstat, and df must be in the user's path. For win32 perl must be in the user's path. I developed the win32 version using Indigo Perl, which comes with a pre-configured apache web server and makes things a little easier.

Client Quick Start

Simply run the client agent, topitall [unix] or win32 [NT4], and data collection will start. Data is collected approximately once per minute and is written to rrd files in the tia directory. You can look at the data using the cgi scripts in the server/cgi/ directory. The behaviour of the client depends on three config files, summarised here:

process.cfg	If this file is absent, the agent takes a list of the currently running processes and collects data for each of them. Any new rrd file is then created with a reduced data size of one week. This mode, known as Profiler, is meant to generate a baseline on a system about which not much is initially known. The information gathered should be used to write a process.cfg file tailored to the machine. See the example files in the cfg directory for syntax.
client.cfg	The existence of this file, with entries for a server hostname and port, configures the topitall agent to send its data to a topitall server. The frequency of update and the parameters to be sent are also entered in this file. The list of all parameters currently measured is found in the tia/plot.cfg file.
alert.cfg	The existence of this file configures the topitall agent to process alerts. One alert is defined per line in the file and the normal action is to send a mail. Alerts are also sent to the server at the same frequency as the data. User defined alert handlers can be written; see the two perl files in the alertHandlers directory.

Client Configuration files

The process.cfg file specifies the names of the processes [in fact a regular expression on the Command field given by ps], one per line.
If, for example, you are running the apache httpd daemon then just put httpd on a line in the process.cfg file and restart topitall. One of the
parameters plotted is the number of processes matched by each line in the process.cfg file, and so it can be used to see whether the process, or
application, is running or not. If you wish to monitor a whole application then you may be able to enter a string or pattern to match all the
processes in the application. For example, all your processes may start with ora_ and so will be matched by a line containing ora_. If your
regular expressions are anything like mine then you will need to keep them very simple, otherwise you will get confusing results. You can use the -v 2 option to see what is actually being matched.
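
The matching described above can be tried out away from topitall. The following is only a sketch in Python, not the agent's own code: the function name count_matches and the sample command strings are made up for illustration, and I am assuming that a pattern matches anywhere in the Command field.

```python
import re

def count_matches(patterns, commands):
    """Return {pattern: number of command strings the pattern matches}."""
    counts = {}
    for pattern in patterns:
        regex = re.compile(pattern)
        # Assumption: an unanchored search, so "httpd" matches "/usr/sbin/httpd".
        counts[pattern] = sum(1 for cmd in commands if regex.search(cmd))
    return counts

# Hypothetical sample of the Command column from ps.
commands = [
    "/usr/sbin/httpd -k start",
    "/usr/sbin/httpd -k start",
    "ora_pmon_PROD",
    "ora_smon_PROD",
    "ora_dbw0_PROD",
    "-bash",
]

patterns = ["httpd", "ora_"]  # one entry per line of process.cfg
counts = count_matches(patterns, commands)
for pattern in patterns:
    print(pattern, counts[pattern])
```

A count of zero for a pattern is what you would expect to see plotted when the application is not running.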

The alert.cfg file is a bit more complicated, so you should see the comments in the file for a detailed description and examples, but I will
give a summary here. Each parameter has a category (System, Disk or Process) and a keyword to identify it. The category and the
keyword identify the measured parameter, and then an expression is defined which, when it is satisfied at a 15 minute interval, generates an alert. The last two fields are the email address to send the alert to and a comment to include in the body of the email. The alert subject line is constructed from the expression as defined in the file and will be more or less in plain English. In unix the alert subject line is also logged locally to the messages file using the local6.info syslog facility.

For example, this line will generate an alert when the Load Average is greater than 5 and send an email to the user jbelshaw@jcbhp with
the comment in the body of the email:
System LoadAv >5 jbelshaw@jcbhp # System is getting pretty busy

And this line will generate an alert for the process httpd when the number of processes drops below 4:
httpd N <4 webmaster@localhost # Web server processes have stopped

Here are two more example lines:

netscape Mem >=55 jbelshaw@jcbhp # netscape is using a lot of Memory
Disk /dev/dsk/c0t0d0s0 >=65 jbelshaw@jcbhp # Boot disk getting full

The processes httpd and netscape must be included in the process.cfg file for these process examples to work.
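
To make the field layout of the example lines concrete, here is a rough Python sketch that splits an alert line into its parts and tests a measured value against it. It is not the parser topitall uses. The names parse_alert and is_triggered are hypothetical, and the set of operators accepted (including <=, which does not appear in the examples) is my assumption; the comments in alert.cfg are the real reference.

```python
import operator
import re

# Assumed operator set; only >, < and >= appear in the examples above.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

LINE = re.compile(
    r"^(?P<category>\S+)\s+(?P<keyword>\S+)\s+"
    r"(?P<op>>=|<=|>|<)\s*(?P<threshold>\d+(?:\.\d+)?)\s+"
    r"(?P<email>\S+)\s*(?:#\s*(?P<comment>.*))?$"
)

def parse_alert(line):
    """Split one alert line into category, keyword, op, threshold, email, comment."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised alert line: {line!r}")
    fields = match.groupdict()
    fields["threshold"] = float(fields["threshold"])
    fields["comment"] = fields["comment"] or ""
    return fields

def is_triggered(alert, value):
    """True when the measured value satisfies the alert expression."""
    return OPS[alert["op"]](value, alert["threshold"])

alert = parse_alert("System LoadAv >5 jbelshaw@jcbhp # System is getting pretty busy")
print(alert["category"], alert["keyword"], alert["op"], alert["threshold"])
print(is_triggered(alert, 6.2))
```

For a process line such as the httpd example, the first field holds the process name and the second the per-process keyword (N, Cpu or Mem).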

What time resolutions is the data kept for ?

One day is kept at 1 minute, 28 days at 15 minute and 1 year at 6 hour resolution.
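
These three resolutions correspond to three round robin archives in each rrd file. As a sketch only, the Python below works out the steps and rows each archive would need with a 60 second base step, and prints them in the RRA:CF:xff:steps:rows form that rrdtool create accepts. The AVERAGE consolidation function and the 0.5 xff value are my assumptions for illustration; I have not taken them from the topitall scripts.

```python
STEP_SECONDS = 60  # base sampling interval: one measurement per minute

# (retention in seconds, resolution in seconds), taken from the text above
ARCHIVES = [
    (1 * 24 * 3600, 60),          # one day at 1 minute
    (28 * 24 * 3600, 15 * 60),    # 28 days at 15 minutes
    (365 * 24 * 3600, 6 * 3600),  # one year at 6 hours
]

def rra_specs(cf="AVERAGE", xff=0.5):
    """Build one RRA definition string per archive (cf and xff are assumed)."""
    specs = []
    for retention, resolution in ARCHIVES:
        steps = resolution // STEP_SECONDS  # base steps merged into one row
        rows = retention // resolution      # rows needed to cover the retention
        specs.append(f"RRA:{cf}:{xff}:{steps}:{rows}")
    return specs

for spec in rra_specs():
    print(spec)
```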

How can I configure the data plots ?

The order of the output plots can be changed by copying the plot.cfg file in the tia directory to a new name and then
editing the line containing the list of parameters into the order you desire. If you leave parameters out they will not
be plotted. You then invoke topitall with the options topitall -plot -config filename

How can I stop Topitall collecting data for every parameter ?

Topitall generates the list of parameters dynamically when it is run. You can prevent data being written by creating a read-only file
of zero length in the tia directory.

I have changed the config files but nothing has changed, what is wrong ?

Topitall only reads the files when it is started so you need to kill topitall and restart it.

Topitall Server

The server currently runs only on Unix. It can be run on a host of its own or on a client host. The script name is tiaServer; it accepts a -v option to display debugging information and a -port portnumber option to change the default tcp port. The server listens on the port and forks a separate process to handle incoming requests, and so will be able to handle many clients. The server files live in the server directory and include a cgi directory where the web-scripts are found. This directory will have to be added as a cgi directory in your web server with the ScriptAlias 'tia'. The graph images are generated in the directory server/images and an alias tiaImages should also be defined for this. The permissions on the server/images directory should be set so that your web-server process can write to it when the cgi script runs.

eg. for Apache :
Alias /tiaImages/ "/home/topitall/server/images/"
ScriptAlias /tia/ "/home/topitall/server/cgi/"

cgi scripts

monitor.cgi	Unix Server script : A list of all the hosts which have sent data to the server, colour coded
client.cgi	Unix Server script : A list of all alerts and data parameters per host
tgraph.cgi	Unix Server and Clients : Produces a gif image of the selected data
client.cgi	Unix Clients : Produces a list of all the parameters stored locally on the host
tgraph	Win32 Clients : Produces a gif image of the selected data
client	Win32 Clients : Produces a list of all the parameters stored locally on the host
profiler.cgi	Unix Clients : Produces a large page with one gif picture per parameter stored, the page is stored in tiaImages/profiler.html
profiler	Win32 Clients : Produces a large page with one gif picture per parameter stored, the page is stored in tiaImages/profiler.html

The main server cgi script is monitor.cgi and all the data is accessed from there, ie. http://your-host/tia/monitor.cgi. If the monitor and host.cgi scripts work but you cannot see the graphs then the permissions on the server/images directory are probably not correct.

The server/RRD directory contains RRD files, one per parameter. Each file is approximately 46K bytes in size and this will guide how many clients and how many parameters per host you will configure. The server/Connect directory stores a small file for each client to mark the last update time and store the update frequency.
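
As a rough guide to sizing, the Python sketch below multiplies the figures out. The function name storage_kb and the example of 10 clients with 35 parameters each are hypothetical. The second calculation is only a plausibility check of the 46K figure: it assumes one 8 byte value per stored row and one data source per file, which is my understanding of the RRD format rather than something taken from the topitall scripts, and it ignores the file header.

```python
FILE_KB = 46  # approximate size of one RRD file, from the text above

def storage_kb(clients, params_per_client, file_kb=FILE_KB):
    """Approximate disk space needed in server/RRD, in KB."""
    return clients * params_per_client * file_kb

# Hypothetical site: 10 clients each sending 35 parameters.
print(storage_kb(10, 35), "KB")

# Plausibility check: rows kept for one day at 1 minute, 28 days at
# 15 minutes and one year at 6 hours, at an assumed 8 bytes per row.
rows = 1440 + 2688 + 1460
raw_bytes = rows * 8
print(rows, "rows,", raw_bytes, "bytes before header overhead")
```

So a server collecting from a few dozen clients should only need tens of megabytes for the RRD directory.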

Each client has a client.cfg file in the topitall directory to tell it which machine to connect to and what data to send. Here is an example of this file:

Host jcb
Port 12345
Meas perl.N perl.Cpu perl.Mem top.N top.Cpu top.Mem httpd.N httpd.Cpu httpd.Mem named.N named.Cpu named.Mem gnome.N gnome.Cpu gnome.Mem Nrun MemSwap MemFree MemBuff MemCache SwapIn SwapOut IOin IOout CPUuser CPUsystem CPUidle NProc Users LoadAv Uptime Mem eth0.RXok eth0.TXok _dev_hda5

The first line specifies the name of the server host.
The second line specifies the tcp port on which the server is listening.
The third line lists the parameters which are to be sent to the server; this line must start with the keyword Meas.

The list of all possible parameters can be found in the file tia/plot.cfg which is created by the client process.
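
If you want to check a client.cfg before starting the client, something like the following Python sketch will read the three keywords. It is not how topitall itself reads the file. The function name parse_client_cfg is hypothetical, and I am assuming that fields are separated by whitespace and that unknown lines can be ignored.

```python
def parse_client_cfg(text):
    """Read Host, Port and Meas entries from the text of a client.cfg file."""
    cfg = {"Host": None, "Port": None, "Meas": []}
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        key, values = parts[0], parts[1:]
        if key == "Host" and values:
            cfg["Host"] = values[0]
        elif key == "Port" and values:
            cfg["Port"] = int(values[0])
        elif key == "Meas":
            cfg["Meas"].extend(values)
    return cfg

sample = """Host jcb
Port 12345
Meas httpd.N httpd.Cpu httpd.Mem LoadAv _dev_hda5
"""

cfg = parse_client_cfg(sample)
print(cfg["Host"], cfg["Port"], len(cfg["Meas"]), "parameters")
```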

The client process on each client needs to be started with topitall -daemon -server to tell it to attempt to connect to the server. The topitall client can be started with an additional parameter -update Nminutes which specifies the update frequency. The client measures the parameters every minute but they can be uploaded to the server at a lower frequency; I would suggest something in the range of 2-15 minutes. The default is 5 minutes. The client also processes the alert information every minute, but this is probably a bit too often to receive an email, so Topitall uses the update frequency to send Alerts. Each Alert includes a number specifying how many times it was triggered during the update time, eg. 3/5. This gives information on whether the condition is sporadic or not.
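
The 3/5 style of count can be pictured with a small Python sketch. I am reading 3/5 as "triggered on 3 of the 5 one-minute measurements in the update period"; that reading, the function name summarise_window and the sample load figures are all my own assumptions for illustration.

```python
def summarise_window(samples, is_triggered):
    """Return 'hits/total' for one update period of per-minute samples."""
    hits = sum(1 for value in samples if is_triggered(value))
    return f"{hits}/{len(samples)}"

# Hypothetical load averages measured once a minute over a 5 minute update period.
load_samples = [6.1, 4.2, 5.5, 3.9, 7.0]

# Alert condition corresponding to a line such as: System LoadAv >5 ...
summary = summarise_window(load_samples, lambda value: value > 5)
print(summary)
```

A result like 5/5 suggests a sustained condition, while 1/5 suggests a brief spike.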

That's it. When the server is running and the client is started with topitall -daemon -server, the data should be gathered in the server directories and the cgi scripts will then display meaningful data.

User defined Alert Handlers

The directory alertHandlers contains perl subroutine files which are read when the client is started. You should look at the comments in alertDefault.pl and alertMail.pl to write your own. You write a subroutine whose name is derived from the alert condition and it will then be called when the alert is generated. A typical subroutine name would be EHnamed_N_1 to match the alert named N <1, or EHSystem_LoadAv_2 to match the alert System LoadAv >2. If these subroutines fail then the client will also fail, so write them defensively. If you need to carry out a command as the root user I recommend using a utility like sudo to restrict the possibility of unforeseen destructive actions.
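
The naming rule can be worked out from the two examples above: EH, then the category, the keyword and the threshold joined with underscores, with the comparison operator dropped. The Python sketch below is only my reconstruction from those two names. The function handler_name is hypothetical, and I do not know how topitall treats keywords containing characters such as / or thresholds with a decimal point, so check the comments in the handler files for the real rule.

```python
import re

def handler_name(category, keyword, expression):
    """Derive a handler name such as EHnamed_N_1 from an alert condition."""
    # Drop the leading comparison operator, keeping only the threshold.
    threshold = re.sub(r"^[<>=!]+", "", expression.strip())
    return f"EH{category}_{keyword}_{threshold}"

print(handler_name("named", "N", "<1"))
print(handler_name("System", "LoadAv", ">2"))
```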

Alert database

The server collects all the alerts in the file server/Alert/db.Alert as a flat text file. The last ten alerts per host are stored in the file server/Alert.hostname. It would be quite easy to store the data in a database but this is beyond the current scope of this tool, mainly as it generates a huge amount of work downstream in configuring access to the data in the database.

Caveats

If vmstat, df, or ps hangs whilst being called from topitall then the agent will hang, and will need to be restarted when the problem is fixed.
The alerts generated can only be sent by mail if /usr/lib/sendmail is correctly configured.
The alerts generated will be logged by local6.info and require syslog to be configured.
vmstat generates different output on different platforms, and so the data collected is different. If the platform is not SunOS or Linux then topitall will just use the cryptic names given by vmstat itself as the names of the parameters.
Some older versions of gnuplot do not support the gif terminal type and so will not produce the graphs.
The system commands, vmstat, df, uptime, ps and gnuplot must be in the path of the user running topitall.
The network parameter data is currently in development and is only certain to work on Linux and Solaris.

BUGS

Version 1.x is not designed to run for more than a year, and so the data files will be overwritten.

Appendices

A Unix System parameters measured

**All Platforms**
Name	Description
Mem	Percent Memory used [not very accurate]
MemFree	Memory Free KB
MemSwap	Swap Used KB
Uptime	System uptime
Users	Number of users
LoadAv	15 min Load average
NProc	Total number of processes
CPUuser	Percent CPU used by user processes
CPUsystem	Percent CPU used by system or kernel
CPUidle	Percent CPU not used
Nrun	Number of running processes
Nsleep	Number of sleeping processes
Nswapped	Number of swapped processes
SwapIn	KB/s swapped in
SwapOut	KB/s swapped out
Interrupts	Number of Interrupts/s including clock
ContextSwitches	Number of context switches /s

**Linux Specific**
IOin	Block/s IO in
IOout	Block/s IO out
MemCache	Cache Used

**Solaris Specific**
PageReclaims
MinorFaults
SwapFreed
MemShortfall
ScanRate
disk0
disk1
disk2
disk3

Disks

Percentage Free space for all locally mounted disks, identified by /dev/nodename

Process, per match

Name	Description
Cpu	% CPU used
Mem	% Memory used
N	Number of processes matched
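
The names used in the Meas line of client.cfg, such as httpd.N and _dev_hda5, appear to be built from these keywords. The Python sketch below shows the pattern as I infer it from that example: the process pattern and the keyword joined by a dot, and the disk device path with each / replaced by an underscore. Both helper names are hypothetical and the rule is a guess from the example, so the tia/plot.cfg file remains the authoritative list.

```python
PROCESS_KEYWORDS = ("N", "Cpu", "Mem")  # per-match keywords from the table above

def process_params(pattern):
    """Parameter names for one process.cfg line, e.g. httpd.N, httpd.Cpu, httpd.Mem."""
    return [f"{pattern}.{keyword}" for keyword in PROCESS_KEYWORDS]

def disk_param(device):
    """Parameter name for a disk device, e.g. /dev/hda5 becomes _dev_hda5 (inferred)."""
    return device.replace("/", "_")

print(process_params("httpd"))
print(disk_param("/dev/hda5"))
```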

B Windows NT4 measured parameters

All of the parameters in the windows performance monitor categories TCP, System, Paging File and Memory are gathered, together with the number of processes, memory used and cpu usage per process. Note that the Network monitor service must be installed in the Network configuration for the TCP statistics to be measured.