Client Installation

Download the tgz file, then gunzip and untar it in a normal user's directory. Do not run or install Topitall as root. You must also install RRDtool [current version 1.033]. Binaries for Solaris (SPARC and x86) and for Linux are on the download page, or you can download the source or other binaries from the RRDtool homepage.

The script topitall should be run as topitall -daemon. If your perl is not in /usr/local/bin you will have to edit the first line of the script files. For usage information, run topitall without any arguments. I have included an example rc script to start Topitall on boot, and an example crontab file to generate the html files automagically. The utilities ps, vmstat, netstat and df must be in the user's path. For alerts to work, logger must be in the path, and sendmail should be at /usr/lib/sendmail and configured so that it can send mail.
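
The prerequisites listed above can be checked before the first run. The following is a minimal sketch in Python, not part of Topitall itself; the function name missing_prerequisites is made up for this illustration, and only the tool names and the sendmail location come from the text.

```python
import os
import shutil

# Utilities the text says must be in the user's path.
REQUIRED_TOOLS = ["ps", "vmstat", "netstat", "df", "logger"]
# Location the text gives for sendmail.
SENDMAIL_PATH = "/usr/lib/sendmail"


def missing_prerequisites(tools=REQUIRED_TOOLS, sendmail=SENDMAIL_PATH):
    """Return a list of problems found; an empty list means nothing is missing."""
    problems = [f"{tool} not found in PATH" for tool in tools
                if shutil.which(tool) is None]
    if not os.access(sendmail, os.X_OK):
        problems.append(f"{sendmail} is missing or not executable")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Run it as the user who will run Topitall, since the path may differ between users. It does not check that sendmail is configured to deliver mail.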

Client Quick Start

Simply run topitall -daemon as the user who installed it and data will start to be written in the tia directory [the data directory, one file
per parameter]. With the -q option topitall runs in quick mode, where one data point is gathered every ten seconds with
butchered time stamps, so you can fine tune your config files. To see the results run topitall -plot -day and today's files will be
created in the html directory, including hostname.day.html, which is the one you should look at first.

Client Configuration files

The tiaProcess.cfg file specifies the processes to monitor, one per line. Each line is in fact a regular expression applied to the Command field given by ps.
If for example you are running the apache httpd daemon, just put httpd on a line in the tiaProcess.cfg file and restart topitall. One of the
parameters plotted is the number of processes matched by each line in the tiaProcess.cfg file, so it can be used to see whether the process or
application is running. If you wish to monitor a whole application you may be able to enter a string or pattern that matches all the
processes in the application. For example, all your processes may start with ora_ and so will be matched by a line containing ora_. If your
regular expressions are anything like mine you will need to keep them very simple, otherwise you will get confusing results. You can use
the -v 2 option to see what is actually being matched.
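
To illustrate how one pattern per line turns into a process count, here is a minimal sketch in Python. Topitall itself is written in Perl and its regular expression dialect may differ in detail; the function count_matches and the sample command strings are invented for this example.

```python
import re


def count_matches(patterns, commands):
    """Count, for each pattern, how many command strings it matches."""
    counts = {}
    for pattern in patterns:
        regex = re.compile(pattern)
        # search() matches anywhere in the string, so "httpd" also
        # matches "/usr/sbin/httpd -k start".
        counts[pattern] = sum(1 for cmd in commands if regex.search(cmd))
    return counts


# Hypothetical sample of the Command field from ps.
sample_commands = [
    "/usr/sbin/httpd -k start",
    "/usr/sbin/httpd -k start",
    "ora_pmon_PROD",
    "ora_dbw0_PROD",
    "sshd: user@pts/0",
]

print(count_matches(["httpd", "ora_", "named"], sample_commands))
```

A count of zero for a pattern is what makes the N parameter useful for spotting an application that has stopped.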

The tiaAlert.cfg file is a bit more complicated, so see the comments in the file for a detailed description and examples; I will
give a summary here. Each parameter has a category (System, Disk or Process) and a keyword to identify it. The category and the
keyword identify the measured parameter, and then an expression is defined which, when it is satisfied at a 15 minute interval, generates
an alert. The last two fields are the email address to send the alert to and a comment to include in the body of the email. The alert subject line is
constructed from the expression as defined in the file and will be more or less in plain English. The alert subject line is also logged locally to the
messages file through syslog, using facility local6 at priority info (local6.info).

For example, this line will generate an alert when the Load Average is greater than 5 and send an email to the user jbelshaw@jcbhp with
the comment in the body of the email:
System LoadAv >5 jbelshaw@jcbhp # System is getting pretty busy

And this line will generate an alert for the process httpd when the number of processes drops below 4:
httpd N <4 webmaster@localhost # Web server processes have stopped

Here are two more example lines:

netscape Mem >=55 jbelshaw@jcbhp # netscape is using a lot of Memory
Disk /dev/dsk/c0t0d0s0 >=65 jbelshaw@jcbhp # Boot disk getting full

The processes httpd and netscape must be listed in the tiaProcess.cfg file for these process examples to work.
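
The field layout of the example lines can be made concrete with a small parser. This is a sketch in Python based only on the four example lines above, not Topitall's own parsing code; the names AlertRule, parse_alert_line and is_triggered are invented here, and only the operators that appear in the examples (>, >= and <) are handled.

```python
import operator
import re
from typing import NamedTuple

# Only the comparison operators that appear in the example lines.
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt}

# first field, keyword, expression, email address, optional "# comment"
LINE_RE = re.compile(
    r"^(\S+)\s+(\S+)\s+(>=|>|<)\s*(\d+(?:\.\d+)?)\s+(\S+)\s*(?:#\s*(.*))?$"
)


class AlertRule(NamedTuple):
    category: str     # System, Disk, or a process name such as httpd
    keyword: str      # e.g. LoadAv, N, Mem, or a device node
    op: str
    threshold: float
    email: str
    comment: str


def parse_alert_line(line):
    """Split one alert line into its fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised alert line: {line!r}")
    category, keyword, op, threshold, email, comment = match.groups()
    return AlertRule(category, keyword, op, float(threshold), email, comment or "")


def is_triggered(rule, value):
    """Return True if the measured value satisfies the rule's expression."""
    return OPERATORS[rule.op](value, rule.threshold)


rule = parse_alert_line(
    "System LoadAv >5 jbelshaw@jcbhp # System is getting pretty busy"
)
print(rule.keyword, rule.op, rule.threshold, is_triggered(rule, 6.2))
```

The comments in tiaAlert.cfg remain the authority on the real syntax; this sketch rejects anything that does not look like the examples.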

At what time resolutions is the data kept?

One day is kept at 1 minute resolution, 28 days at 15 minute resolution and 1 year at 6 hour resolution.
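
As a worked example, the number of data points each retention period holds per parameter can be calculated directly. This Python sketch assumes a 365 day year; the names used are for illustration only.

```python
MINUTES_PER_DAY = 24 * 60

# (label, span in days, resolution in minutes), taken from the text above.
ARCHIVES = [
    ("1 day at 1 minute", 1, 1),
    ("28 days at 15 minutes", 28, 15),
    ("1 year at 6 hours", 365, 6 * 60),
]


def rows_per_archive(archives=ARCHIVES):
    """Return the number of stored points for each retention period."""
    return {label: days * MINUTES_PER_DAY // step
            for label, days, step in archives}


rows = rows_per_archive()
for label, count in rows.items():
    print(f"{label}: {count} points")
print("total:", sum(rows.values()))
```

This gives 1440, 2688 and 1460 points, or 5588 points per parameter in total.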

How can I configure the data plots?

The order of the output plots can be changed by copying the plot.cfg file in the tia directory to a new name and then
editing the line containing the list of parameters into the order you desire. Any parameters you leave out will not
be plotted. You then invoke topitall with the options topitall -plot -config filename

How can I stop Topitall collecting data for every parameter?

Topitall generates the list of parameters dynamically when it is run. You can prevent data being written for a parameter by creating a
read-only file of zero length in the tia directory.
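
Creating such a file can be sketched as follows in Python. The text does not say what the file must be called, so the file name in the usage example is a placeholder and not a real Topitall file name; block_parameter is likewise a name invented for this sketch.

```python
import os
import stat


def block_parameter(data_dir, file_name):
    """Create a zero-length, read-only file and return its path.

    file_name is an assumption: use whatever name Topitall expects
    for the parameter you want to stop collecting.
    """
    path = os.path.join(data_dir, file_name)
    # Opening in "w" mode creates the file, or truncates it to zero length.
    with open(path, "w"):
        pass
    # Remove write permission for everyone (mode 0444).
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return path
```

For example, block_parameter("tia", "SOME_PARAMETER_FILE") would create the file in the tia directory. Note that opening an existing file will discard its contents, so only use this on a parameter whose data you do not want.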
 
I have changed the config files but nothing has changed. What is wrong?

Topitall only reads the config files when it is started, so you need to kill topitall and restart it.

Topitall Server

The server can be run on a separate host or on a client host. The script name is tiaServer; it accepts a -v option to display debugging information and a -port portnumber option to change the default tcp port. The server listens on the port and forks a separate process to handle each incoming request, and so is able to handle many clients. The server files live in the server directory, which includes a cgi directory where the web scripts are found. This directory has to be added as a cgi directory in your web server with the ScriptAlias 'tia'. The graph images are generated in the directory server/images, and an alias tiaImages should also be defined for this.

e.g. for Apache:
Alias /tiaImages/   "/home/topitall/server/images/"
ScriptAlias /tia/       "/home/topitall/server/cgi/"

The main cgi script is monitor.cgi and all the data is accessed from there, i.e. http://your-host/tia/monitor.cgi. The server/images directory must be writeable by the userid your web server uses.

The server/RRD directory contains RRD files, one per parameter.  Each file is approximately 46K bytes in size and this will guide how many clients and how many parameters per host you will configure.
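
A rough disk estimate follows from the per-file size. The sketch below, in Python, multiplies clients by parameters by file size; reading "46K bytes" as 46 * 1024 bytes is an assumption, and the client and parameter counts are made-up planning figures.

```python
# "approximately 46K bytes" per RRD file, read here as 46 * 1024 bytes.
RRD_FILE_BYTES = 46 * 1024


def estimated_rrd_bytes(clients, parameters_per_client, file_bytes=RRD_FILE_BYTES):
    """Estimate the space needed by server/RRD, one file per parameter."""
    return clients * parameters_per_client * file_bytes


# Hypothetical deployment: 10 clients sending 35 parameters each.
total = estimated_rrd_bytes(clients=10, parameters_per_client=35)
print(f"{total} bytes, about {total / (1024 * 1024):.1f} MiB")
```

The estimate covers the RRD files only, not the generated images or the alert files.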

Each client has a server.cfg file in the topitall directory. Here is an example of this file:

Host jcb
Port 12345
Meas  perl.N perl.Cpu perl.Mem top.N top.Cpu top.Mem httpd.N httpd.Cpu httpd.Mem named.N named.Cpu named.Mem gnome.N gnome.Cpu gnome.Mem Nrun MemSwap MemFree MemBuff MemCache SwapIn SwapOut IOin IOout CPUuser CPUsystem CPUidle NProc Users LoadAv Uptime Mem eth0.RXok eth0.TXok _dev_hda5

The first line specifies the name of the server host.
The second line specifies the tcp port on which the server is listening.
The third line lists the parameters which are to be sent to the server. This line must start with the keyword Meas.
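
The three-line layout can be read with a few lines of code. This Python sketch is based only on the example file shown above and is not how the Perl client reads it; parse_server_cfg is a name invented here, and treating Host and Port as keywords (rather than relying on line position) is an assumption.

```python
def parse_server_cfg(text):
    """Parse the Host, Port and Meas lines of a server.cfg file."""
    config = {"host": None, "port": None, "meas": []}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        keyword, values = fields[0], fields[1:]
        if keyword == "Host" and values:
            config["host"] = values[0]
        elif keyword == "Port" and values:
            config["port"] = int(values[0])
        elif keyword == "Meas":
            # Everything after the keyword is a parameter name.
            config["meas"] = values
    return config


# Shortened version of the example file; the parameter list is abbreviated.
example = """Host jcb
Port 12345
Meas  httpd.N httpd.Cpu httpd.Mem LoadAv CPUidle _dev_hda5
"""
print(parse_server_cfg(example))
```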

The list of all possible parameters can be found in the file tia/plot.cfg which is created by the client process.

The client process on each client needs to be started with topitall -daemon -server to tell it to attempt to connect to the server. The topitall client can be started with an additional parameter -update Nminutes which specifies the update frequency. The client measures the parameters every minute but they can be uploaded to the server at a lower frequency; I would suggest something in the range of 2-15 minutes. The default is 5 minutes. The client also processes the alert information every minute, but that is probably a bit too often to receive an email, so Topitall uses the update frequency to send alerts. Each alert includes a number specifying how many times it was triggered during the update period, e.g. 3/5. This shows whether the condition is sporadic or not.
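
The 3/5 style count can be illustrated with a short Python sketch. It assumes one measurement per minute over a 5 minute update period and uses a greater-than threshold as the alert condition; summarize_triggers and the sample values are invented for the example.

```python
def summarize_triggers(values, threshold):
    """Return 'triggered/total' for per-minute values checked against a threshold."""
    triggered = sum(1 for value in values if value > threshold)
    return f"{triggered}/{len(values)}"


# Hypothetical load averages sampled once a minute during one update period,
# checked against the condition LoadAv >5.
load_averages = [6.1, 4.2, 5.5, 7.0, 3.9]
print(summarize_triggers(load_averages, threshold=5))
```

A result such as 5/5 suggests a sustained condition, while 1/5 suggests a brief spike.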

That's it. When the server is running and the client is started with topitall -daemon -server, the data should be gathered in the server directories and the cgi script will then display meaningful data.
 

User defined Alert Handlers

The directory alertHandlers contains perl subroutine files which are read when the client is started. You should look at the comments in alertDefault.pl and alertMail.pl to write your own. You write a subroutine whose name is derived from the alert condition and it will then be called when the alert is generated. A typical subroutine name would be EHnamed_N_1 to match the alert named N <1, or EHSystem_LoadAv_2 to match the alert System LoadAv >2. If these subroutines fail then the client will also fail, so write them defensively.
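
The naming pattern in the two examples can be expressed as a small function. This Python sketch only reproduces those two documented names; how Topitall names handlers for other cases, such as device paths or >= expressions, is not stated here, so check the comments in the handler files. handler_name is a name invented for this illustration.

```python
def handler_name(alert):
    """Derive the handler subroutine name for an alert such as 'named N <1'."""
    category, keyword, expression = alert.split()[:3]
    # Drop the comparison operator and keep only the threshold.
    threshold = expression.lstrip("<>=")
    return f"EH{category}_{keyword}_{threshold}"


print(handler_name("named N <1"))
print(handler_name("System LoadAv >2"))
```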

Alert database

The server collects all the alerts in the file server/Alert/db.Alert as a flat text file. The last ten alerts per host are stored in the file server/Alert.hostname. It would be quite easy to store the data in a database but this is beyond the current scope of this tool, mainly because it generates a huge amount of work downstream in configuring access to the data in the database.
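
Keeping only the last ten alerts per host is a simple bounded-queue problem. The Python sketch below shows the idea with in-memory data; it does not read or write the real alert files, whose format is not described here, and the (host, message) pairs are hypothetical.

```python
from collections import defaultdict, deque


def last_alerts_per_host(alerts, keep=10):
    """Return the most recent `keep` alert messages for each host.

    alerts is an iterable of (host, message) pairs in arrival order.
    """
    recent = defaultdict(lambda: deque(maxlen=keep))
    for host, message in alerts:
        # A deque with maxlen silently drops the oldest entry when full.
        recent[host].append(message)
    return {host: list(messages) for host, messages in recent.items()}


# Hypothetical alert stream: twelve alerts from one host, one from another.
stream = [("jcb", f"alert {i}") for i in range(12)] + [("web1", "alert A")]
summary = last_alerts_per_host(stream)
print(len(summary["jcb"]), summary["jcb"][0], summary["web1"])
```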
 

Caveats

BUGS

Version 1.x is not designed to run for more than a year, so after a year the data files will be overwritten.

Appendix

System parameters measured

All Platforms
Name             Description
Mem              Percent Memory used [not very accurate]
MemFree          Memory Free KB
MemSwap          Swap Used KB
Uptime           System uptime
Users            Number of users
LoadAv           15 min Load average
NProc            Total number of processes
CPUuser          Percent CPU used by user processes
CPUsystem        Percent CPU used by system or kernel
CPUidle          Percent CPU not used
Nrun             Number of running processes
Nsleep           Number of sleeping processes
Nswapped         Number of swapped processes
SwapIn           KB/s swapped in
SwapOut          KB/s swapped out
Interrupts       Number of Interrupts/s including clock
ContextSwitches  Number of context switches/s
 
Linux Specific
IOin             Block/s IO in
IOout            Block/s IO out
MemCache         Cache Used
 
Solaris Specific
PageReclaims
MinorFaults
SwapFreed
MemShortfall
ScanRate
disk0
disk1
disk2
disk3

Disks

Percentage free space for all locally mounted disks, identified by /dev/nodename

Process, per match

Name             Description
Cpu              % CPU used
Mem              % Memory used
N                Number of processes matched