If you want to know how your Unix boxes are spending their time while you are not watching, for example when you are off sleeping, eating or sailing to far-off places, then Topitall could be for you.
Topitall can also be easily configured to generate alerts by syslog and email based on any of the collected parameters, and so act as a simple Enterprise System Monitoring tool. For example, Topitall could be used to detect system resource problems such as runaway processes, over- or under-used file system capacity, or a process or application that has exceeded a system resource threshold. What more do you really need?
So what do the graphs look like? Here is an example with just three graphs; a normal page will have over forty graphs, plus three graphs per configured process/application.
Topitall currently runs as a perl script on any system where perl5, df, vmstat, ps and gnuplot are installed. It should be possible to get it to run under NT/win2000 with various NT tools.
Topitall is designed to be trivial to install and, compared to most traditional and commercial Enterprise System Management software, easy to customize for your own particular applications.
Topitall is implemented in JABOPS [Just A Bunch Of Perl Scripts] using the KISS [Keep it Simple, Stupid] paradigm, and so is reliable and easy to customize and extend. The topitall daemon is quite efficient and so will not, in itself, add greatly to the system load.
The Topitall executable has three main functions. The data collection agent, topitall -daemon, is designed to run permanently, gathering data averaged over 15 minutes. Basic system data is collected first; then, for each configured process or group of processes, the number of processes running and their cpu and memory usage are collected. The plotting tool, topitall -plot, takes the data collected by the daemon, produces time series graphs in gif format and writes an html file. The time period plotted can be specified as a day, a week, a four week month or a whole year. The plotting tool does put a significant cpu load on the system while it produces the graphs, but it can be scheduled to run as often or as rarely as you desire. The last function, topitall -prune, reduces the number of data points according to the rules in the prune.cfg file.
Named has stopped, check slave server for dns !
Automatic Mail from Topitall
Just download the tgz file, gunzip it and untar it in a normal
user's directory. I would not recommend running or installing
Topitall as the root user. The script topitall.pl should
be run with perl topitall.pl -options. There is an example rc script
to start Topitall on boot, and an example crontab
file to generate the html files automagically and purge old data using
the prune script. The utilities top, netstat, gnuplot and df must
be in the user's path. For alerts to work, logger must be in the path
and sendmail should be in /usr/lib/sendmail and configured so as to be
able to send mail.
There are three configuration files:
The tiaProcess.cfg file specifies the names of the processes to monitor [in fact each is a regular expression applied to the Command field given by ps], one per line. If, for example, you are running the apache httpd daemon, just put httpd on a line in the tiaProcess.cfg file and restart topitall. One of the parameters plotted is the number of processes matched by each line in the tiaProcess.cfg file, so it can be used to see whether the process or application is running or not. If you wish to monitor a whole application, you may be able to enter a string or pattern that matches all the processes in the application. For example, all your processes may start with ora_ and so will be matched by a line containing ora_. If your regular expressions are anything like mine, you will need to keep them very simple, otherwise you will get confusing results. You can use the -v 2 option to see what is actually being matched.
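To get a feel for how one pattern per line turns into a process count, here is a minimal sketch. It is written in Python purely for illustration and is not Topitall's own Perl code; the function name count_matches and the sample Command values are made up, and Python's re module stands in for Perl regular expressions, which behave the same way for simple patterns like these.

```python
import re

def count_matches(patterns, commands):
    """Return {pattern: number of command strings the pattern matches}."""
    counts = {}
    for pattern in patterns:
        regex = re.compile(pattern)
        # search() matches anywhere in the string, like an unanchored regex
        counts[pattern] = sum(1 for cmd in commands if regex.search(cmd))
    return counts

# Hypothetical Command-field values, as ps might report them.
sample_commands = [
    "/usr/sbin/httpd -k start",
    "/usr/sbin/httpd -k start",
    "ora_pmon_PROD",
    "ora_dbw0_PROD",
    "ora_lgwr_PROD",
    "sshd: jbelshaw",
]

counts = count_matches(["httpd", "ora_", "named"], sample_commands)
print(counts)
```

A count of zero for a pattern, as for named in this sample, is the sort of condition an alert on the number of matched processes is meant to catch.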
The tiaAlert.cfg file is a bit more complicated, so you should see the comments in the file for a detailed description and examples, but I will give a summary here. Each parameter has a category [System, Disk or Process] and a keyword to identify it. Together the category and the keyword identify the measured parameter. They are followed by an expression; whenever the expression is satisfied at a 15 minute interval, an alert is generated. The last two fields are the email address to send the alert to and a comment to include in the body of the email. The alert subject line is constructed from the expression as defined in the file and will be more or less in plain English. The alert subject line is also logged locally to the messages file through syslog as local6.info.
For example, this line will generate an alert when the Load Average is
greater than 5 and send an email to the user jbelshaw@jcbhp with the comment
in the body of the email:
System LoadAv >5 jbelshaw@jcbhp # System is getting pretty busy
And this line will generate an alert for the process httpd when the
number of processes drops below 4:
httpd N <4 webmaster@localhost # Web server processes have stopped
Here are two more example lines:
netscape Mem >=55 jbelshaw@jcbhp # netscape is using a lot of Memory
Disk /dev/hda1 >=65 jbelshaw@jcbhp # Boot disk getting full
The processes httpd and netscape must have been included in the tiaProcess.cfg file for the process examples to work.
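The layout of an alert line can be made concrete with a small parser. This is only a sketch in Python, not how Topitall itself reads the file: the names parse_alert_line and is_triggered are hypothetical, and the field layout is inferred from the example lines above [the comments in tiaAlert.cfg remain the authoritative description]. The operators >, < and >= appear in the examples; <= is an assumption.

```python
import operator
import re

# ">", "<" and ">=" appear in the example lines; "<=" is assumed.
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}

# category keyword expression email # comment
LINE_RE = re.compile(
    r"^(?P<category>\S+)\s+(?P<keyword>\S+)\s+"
    r"(?P<op>>=|<=|>|<)\s*(?P<threshold>\d+(?:\.\d+)?)\s+"
    r"(?P<email>\S+)\s*#\s*(?P<comment>.*)$"
)

def parse_alert_line(line):
    """Split one alert line into its fields; raise ValueError if it does not fit."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognised alert line: %r" % line)
    rule = m.groupdict()
    rule["threshold"] = float(rule["threshold"])
    return rule

def is_triggered(rule, value):
    """True when the measured value satisfies the rule's expression."""
    return OPS[rule["op"]](value, rule["threshold"])

rule = parse_alert_line(
    "httpd N <4 webmaster@localhost # Web server processes have stopped"
)
print(rule["category"], rule["keyword"], rule["op"], rule["threshold"])
print(is_triggered(rule, 3), is_triggered(rule, 4))
```

With the httpd rule, a measured count of 3 satisfies the expression and a count of 4 does not, which matches the "drops below 4" reading given above.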
The prune.cfg file is only used when the prune executable is run. It is designed to reduce the number of data points stored in each data file for data older than a specified number of days. If a parameter is not mentioned in the file then its data is never reduced. I suggest running prune manually when you wish to reduce the amount of data stored; then decide on a prune policy, edit the prune.cfg file and put an entry in the crontab to run it occasionally, say once a week.
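The idea of thinning old data can be sketched as follows. This is an illustration in Python under stated assumptions, not the actual prune rules: the prune.cfg syntax is not described here, so the function prune_points and its parameters max_age_days and keep_every are hypothetical, and "keep every Nth old point" is just one plausible reduction policy.

```python
SECONDS_PER_DAY = 86400

def prune_points(points, now, max_age_days, keep_every):
    """Thin out points older than max_age_days, keeping every keep_every-th one.

    points is a list of (timestamp_in_seconds, value) pairs sorted by time.
    Points newer than the cutoff are returned untouched.
    """
    cutoff = now - max_age_days * SECONDS_PER_DAY
    old = [p for p in points if p[0] < cutoff]
    recent = [p for p in points if p[0] >= cutoff]
    return old[::keep_every] + recent

# Three days of made-up samples at 15 minute intervals.
now = 3 * SECONDS_PER_DAY
points = [(t, 1.0) for t in range(0, now, 900)]

pruned = prune_points(points, now, max_age_days=1, keep_every=4)
print(len(points), len(pruned))
```

Here the two oldest days go from one point every 15 minutes to one point per hour, while the most recent day keeps its full resolution.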
If you need help in rolling out topitall in an enterprise environment I can offer my services to configure your config files for you.
The alerts generated can only be sent by mail if /usr/lib/sendmail is
correctly configured.
The alerts generated will be logged by local6.info and require syslog
to be configured.
vmstat generates different outputs on different platforms, and so the data collected is different.
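One way to cope with differing vmstat layouts is to read values by column name rather than by position. The sketch below is in Python, is not taken from Topitall, and uses a made-up Linux-style sample; parse_vmstat is a hypothetical helper. It returns name/value pairs rather than a dictionary because some platforms repeat a column name in the header.

```python
def parse_vmstat(text):
    """Pair each vmstat column name with the value from the first data row."""
    rows = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    for i, fields in enumerate(rows):
        # The first all-numeric row is data; the row before it holds the names.
        if i > 0 and all(f.lstrip("-").isdigit() for f in fields):
            names = rows[i - 1]
            if len(names) != len(fields):
                raise ValueError("header and data column counts differ")
            return list(zip(names, (int(f) for f in fields)))
    raise ValueError("no data row found")

# Made-up sample in the style of Linux vmstat output.
sample = """
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 801236  95204 612340    0    0    12    25  110  240  3  1 95  1  0
"""

pairs = parse_vmstat(sample)
print(pairs)
```

Columns that a platform does not report are simply absent from the result, so the caller can tell which parameters are unavailable instead of reading the wrong field.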
Some older versions of gnuplot do not support the gif terminal type and so will not produce the graphs.
The commands must be in the path of the user running topitall.
BUGS
Version 1 is not designed to run for more than a year, so after a year
the data files will be overwritten.
Name | Description |
Mem | Percent Memory used [not very accurate] |
MemFree | Memory Free KB |
MemSwap | Swap Used KB |
Uptime | System uptime |
Users | Number of users |
LoadAv | 15 min Load average |
NProc | Total number of processes |
CPUuser | Percent CPU used by user processes |
CPUsystem | Percent CPU used by system or kernel |
CPUidle | Percent CPU not used |
Nrun | Number of running processes |
Nsleep | Number of sleeping processes |
Nswapped | Number of swapped processes |
SwapIn | KB/s swapped in |
SwapOut | KB/s swapped out |
LoadAv | 15 minute load average |
Interrupts | Number of Interrupts/s including clock |
ContextSwitches | Number of context switches/s |
IOin | Block/s IO in |
IOout | Block/s IO out |
MemCache | Cache Used |
PageReclaims | |
MinorFaults | |
SwapFreed | |
MemShortfall | |
ScanRate | |
disk0 | |
disk1 | |
disk2 | |
disk3 | |
Name | Description |
Cpu | % CPU used |
Mem | % Memory used |
N | Number of processes matched |