Topitall

Introduction

Topitall is a Unix system monitoring tool designed to run as an agent, or daemon, that gathers the most useful system resource information and displays it as time-series graphs in HTML. Topitall uses the common Unix system tools vmstat, ps, df and uptime to gather system information every 15 minutes, and uses gnuplot to plot it as a series of time-series graphs. The parameters collected are basic system parameters such as memory, swap and cpu usage, usage per process or application, and file system usage.

If you want to know how your Unix boxes are spending their time whilst you are not watching, for example when you are off sleeping, eating or sailing to far-off places, then Topitall could be for you.

Topitall can also be easily configured to generate alerts by syslog and email based on any of the collected parameters, and so act as a simple Enterprise System Monitoring tool. For example, Topitall could be used to detect system resource problems such as runaway processes, over- or under-used file system capacity, or a process or application exceeding a system resource threshold. What more do you really need?

So what do the graphs look like? Here is an example with just three graphs; a normal page will have over forty graphs, plus three graphs per configured process/application.

Topitall currently runs as a perl script on any system where perl5, df, vmstat, ps and gnuplot are installed. It should be possible to get it to run under NT/win2000 with various NT tools.

Topitall is designed to be trivial to install and, compared to most traditional and commercial Enterprise System Management software, easy to customize for your own particular applications.

Topitall is implemented in JABOPS [Just A Bunch Of Perl Scripts] using the KISS [Keep it Simple, Stupid] paradigm,  and so is reliable and easy to customize and extend.  The topitall daemon is quite efficient and so will not, in itself, add greatly to the system load.

The topitall executable has three main functions. The data collection agent, topitall -daemon, is designed to run permanently, gathering data averaged over 15 minutes. Basic system data is collected first; then, for each configured process or group of processes, the number of processes running and their cpu and memory usage are collected. The plotting tool, topitall -plot, takes the data collected by the daemon, produces time-series graphs in gif format and writes an html file. The time period plotted can be specified as a day, a week, a four-week month or a whole year. The plotting tool does put a significant cpu load on the system while producing the graphs, but it can be scheduled to run as often or as rarely as you desire. The last function, topitall -prune, reduces the number of data points according to the rules in the prune.cfg file.

Here is an example alert email:

Subject: jcbhp named : N <1
Date:    Fri, 28 Dec 2001 08:54:03 GMT
From:    John Belshaw <jbelshaw@localhost.localdomain>

 Named has stopped, check slave server for dns !

Automatic Mail from Topitall

Here is an example syslog message:

Dec 28 08:51:08 jcbhp Topitall: Alert named N <1
 

Installation


Just download the tgz file, then gunzip and untar it in a normal user's directory. I would not recommend running or installing Topitall as root. The script topitall.pl should be run with perl topitall.pl -options. There is an example rc script to start Topitall on boot, and an example crontab file to generate the html files automagically and purge old data using the prune script. The utilities top, netstat, gnuplot and df must be in the user's path. For alerts to work, logger must be in the path, and sendmail should be at /usr/lib/sendmail and configured so that it can send mail.

Download

You can download topitall here.

Quick Start

Simply run topitall -daemon in the background as the user who installed it, and data will start to be written in the tia directory [the data directory, one file per parameter]. If you use the -q option, topitall will run in quick mode, where one data point is gathered every ten seconds with butchered time stamps, so that you can fine tune your config files. To see the results, run plotitall -day; files will be created in the html directory, including the hostname.day.html file, which is the one you should look at first.

There are three configuration files

The tiaProcess.cfg file specifies the names of the processes to monitor [in fact a regular expression on the Command field given by ps], one per line. If, for example, you are running the apache httpd daemon, then just put httpd on a line in the tiaProcess.cfg file and restart topitall. One of the parameters plotted is the number of processes matched by each line in the tiaProcess.cfg file, and so it can be used to see whether the process or application is running or not. If you wish to monitor a whole application, you may be able to enter a string or pattern that matches all the processes in the application. For example, all your processes may start with ora_ and so will be matched by a line containing ora_. If your regular expressions are anything like mine, you will need to keep them very simple, otherwise you will get confusing results. You can use the -v 2 option to see what is actually being matched.
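
The matching described above can be illustrated with a minimal sketch. It is written in Python rather than perl and is not Topitall's own code: the config contents and command strings are made-up samples, and it assumes an unanchored search, so a pattern matches anywhere in the Command field.

```python
import re


def count_matches(patterns, commands):
    """Count, for each pattern, how many command strings it matches."""
    counts = {}
    for pattern in patterns:
        regex = re.compile(pattern)
        # search() is unanchored; this is an assumption about how
        # Topitall applies the pattern to the Command field.
        counts[pattern] = sum(1 for cmd in commands if regex.search(cmd))
    return counts


# Hypothetical tiaProcess.cfg contents, one pattern per line.
cfg_text = "httpd\nora_\n"
patterns = [line.strip() for line in cfg_text.splitlines() if line.strip()]

# Hypothetical Command-field values as ps might report them.
commands = [
    "/usr/sbin/httpd",
    "/usr/sbin/httpd",
    "ora_pmon_PROD",
    "ora_dbw0_PROD",
    "ora_lgwr_PROD",
    "bash",
]

counts = count_matches(patterns, commands)
print(counts)
```

A pattern as short as ora_ will also match any unrelated command that happens to contain that string, which is one reason to check the real matches with -v 2.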

The tiaAlert.cfg file is a bit more complicated, so you should see the comments in the file for a detailed description and examples, but I will give a summary here. Each parameter has a category [System, Disk or Process] and a keyword to identify it. The category and the keyword together identify the measured parameter. They are followed by an expression; when the expression is satisfied at a 15 minute interval, an alert is generated. The last two fields are the email address to send the alert to and a comment to include in the body of the email. The alert subject line is constructed from the expression as defined in the file and will be more or less in plain English. The alert subject line is also logged locally to the messages file using the local6.info syslog facility.

For example, this line will generate an alert when the Load Average is greater than 5 and send an email to the user jbelshaw@jcbhp with the comment in the body of the email:
System LoadAv   >5  jbelshaw@jcbhp # System is getting pretty busy

And this line will generate an alert for the process httpd when the number of processes drops below 4:
httpd N   <4  webmaster@localhost # Web server processes have stopped

Here are two more example lines:
netscape Mem   >=55 jbelshaw@jcbhp # netscape is using a lot of Memory
Disk /dev/hda1  >=65 jbelshaw@jcbhp # Boot disk getting full

The processes httpd and netscape must have been included in the tiaProcess.cfg file for the process examples to work.
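
To make the line format concrete, here is a rough Python sketch that parses lines shaped like the examples above and tests a measured value against the expression. It is an illustration only, not Topitall's parser: the function names are hypothetical, it assumes the expression contains no spaces, and the subject format simply imitates the example alert email shown earlier.

```python
import operator
import re

# ">", "<" and ">=" appear in the examples above;
# "<=" is an assumption added for symmetry.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_alert_line(line):
    """Split 'category keyword expr email # comment' into its fields."""
    rule, _, comment = line.partition("#")
    fields = rule.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields before '#': {line!r}")
    category, keyword, expr, email = fields
    match = re.fullmatch(r"(>=|<=|>|<)(\d+(?:\.\d+)?)", expr)
    if match is None:
        raise ValueError(f"unrecognised expression: {expr!r}")
    return {
        "category": category,
        "keyword": keyword,
        "expr": expr,
        "op": match.group(1),
        "threshold": float(match.group(2)),
        "email": email,
        "comment": comment.strip(),
    }


def is_triggered(alert, value):
    """Return True when the measured value satisfies the expression."""
    return OPS[alert["op"]](value, alert["threshold"])


def subject(alert, hostname):
    """Build a subject line in the style of the example alert email."""
    return f"{hostname} {alert['category']} : {alert['keyword']} {alert['expr']}"


alert = parse_alert_line(
    "httpd N   <4  webmaster@localhost # Web server processes have stopped"
)
print(is_triggered(alert, 3), is_triggered(alert, 4))
print(subject(alert, "jcbhp"))
```

With three httpd processes running the rule fires; with four it does not, because the expression is a strict less-than.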

The prune.cfg file is only used when the prune executable is run. It is designed to reduce the number of data points stored in each data file for data older than a specified number of days. If a parameter is not mentioned in the file then its data is never reduced. I suggest running prune manually when you first wish to reduce the amount of data stored; then decide on a prune policy, edit the prune.cfg file and put an entry in the crontab to run it occasionally, say once a week.
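
The general idea of pruning can be sketched as follows. This is only an illustration of thinning old data, not the actual prune rules: the prune.cfg format is not described here, and the choice of keeping every n-th old point, the function name and its parameters are all assumptions.

```python
SECONDS_PER_DAY = 86400


def prune_points(points, now, older_than_days, keep_every):
    """Thin out (timestamp, value) points older than a cutoff.

    Points are assumed to be sorted by timestamp. Older points are
    reduced to every keep_every-th sample; recent points are kept whole.
    """
    cutoff = now - older_than_days * SECONDS_PER_DAY
    old = [p for p in points if p[0] < cutoff]
    recent = [p for p in points if p[0] >= cutoff]
    return old[::keep_every] + recent


# Three days of hypothetical samples, one every 15 minutes.
points = [(t, 0.5) for t in range(0, 3 * SECONDS_PER_DAY, 900)]
now = 3 * SECONDS_PER_DAY

pruned = prune_points(points, now, older_than_days=1, keep_every=4)
print(len(points), "->", len(pruned))
```

Here the two oldest days drop from one point per 15 minutes to one per hour, while the most recent day keeps its full resolution.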

Performance

The topitall agent is designed to run at a very low load. When run in quick mode on a normal speed laptop under Linux it uses less than 1% of the cpu, so I believe it will average less than 0.01% of cpu when running normally. It has a memory footprint of circa 1.5 MB, 1.2 MB of which is shared memory [probably the perl core].
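
One way to read that estimate, assuming the cost per sample is the same in both modes and the load scales with how often samples are taken, is the following back-of-envelope calculation. The 1% figure is the upper bound quoted above, not a new measurement.

```python
QUICK_INTERVAL_S = 10          # quick mode: one point every ten seconds
NORMAL_INTERVAL_S = 15 * 60    # normal mode: one point every 15 minutes

quick_cpu_percent = 1.0        # upper bound observed in quick mode

# Normal mode samples this many times less often than quick mode.
ratio = NORMAL_INTERVAL_S / QUICK_INTERVAL_S
normal_cpu_percent = quick_cpu_percent / ratio

print(f"{ratio:.0f}x fewer samples, about {normal_cpu_percent:.3f}% cpu")
# prints "90x fewer samples, about 0.011% cpu"
```

Since quick mode stays below 1%, the normal-mode figure comes out at around 0.01% under these assumptions.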

Comments

I wrote version 1 of topitall on my laptop, running RedHat Linux 6.2, whilst bored sailing to the Seychelles, and so I had limited resources to distract me. If you like Topitall, tell all your friends and get them to install it. If you have any comments or constructive criticism, please send them to topitall@eircom.net and I will try to answer them. If you have any success customising the config files for particular applications, I would welcome seeing them and I will include them in future example files with an acknowledgement.

Restrictions

Topitall is distributed under the Artistic License, so you can freely download it, use it and extend it as long as you respect my copyright. If you wish to use it in a commercial environment and wish to receive support, then you should contact the author, John Belshaw, at topitall@eircom.net.

If you need help in rolling out topitall in an enterprise environment I can offer my services to configure your config files for you.

Caveats

If vmstat, df, or ps hangs whilst being called from topitall then the agent will hang, and will need to be restarted when the problem is fixed.

The alerts generated can only be sent by mail if /usr/lib/sendmail is correctly configured.
The alerts generated will be logged by local6.info and require syslog to be configured.
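
To check that syslog is set up for this facility, a test message can be sent by hand with logger. The sketch below only builds the command line; the -p and -t options are standard logger flags, but the Topitall tag and the exact way Topitall itself calls logger are assumptions based on the example syslog message shown earlier.

```python
import shlex


def logger_command(message, priority="local6.info", tag="Topitall"):
    """Build a logger command that sends message at the given priority."""
    return ["logger", "-p", priority, "-t", tag, message]


cmd = logger_command("Alert named N <1")
# shlex.join quotes the message so the line can be pasted into a shell.
print(shlex.join(cmd))
```

If the message does not show up in the messages file after running the printed command, the local6 facility is probably not routed anywhere in the syslog configuration.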

vmstat generates different outputs on different platforms,  and so the data collected is different.

Some older versions of gnuplot do not support the gif terminal type and so will not produce the graphs.

The commands must be in the path of the user running topitall.
 

BUGS

Version 1 is not designed to run for more than a year; after that the data files will be overwritten.
 

Appendix

System parameters measured

All Platforms

Name             Description
Mem              Percent Memory used [not very accurate]
MemFree          Memory Free KB
MemSwap          Swap Used KB
Uptime           System uptime
Users            Number of users
LoadAv           15 minute load average
NProc            Total number of processes
CPUuser          Percent CPU used by user processes
CPUsystem        Percent CPU used by system or kernel
CPUidle          Percent CPU not used
Nrun             Number of running processes
Nsleep           Number of sleeping processes
Nswapped         Number of swapped processes
SwapIn           KB/s swapped in
SwapOut          KB/s swapped out
Interrupts       Number of interrupts/s including clock
ContextSwitches  Number of context switches/s

Linux Specific

Name             Description
IOin             Blocks/s IO in
IOout            Blocks/s IO out
MemCache         Cache Used

Solaris Specific

PageReclaims
MinorFaults
SwapFreed
MemShortfall
ScanRate
disk0
disk1
disk2
disk3

Disks

Percentage free space for all locally mounted disks, identified by /dev/nodename
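
As an illustration of how such figures can be read from df, the Python sketch below parses a made-up sample of df output and keeps only the /dev entries. It is not Topitall's own code. Because df reports percent used, the sketch returns both used and free figures; taking free as 100 minus used is an approximation.

```python
# Hypothetical df output; real output varies between platforms.
SAMPLE_DF = """\
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda1               256667     85000    158415  35% /
/dev/hda5              4032092   2822464   1004804  74% /usr
none                    127452         0    127452   0% /dev/shm
"""


def disk_percentages(df_text):
    """Map each /dev device to a (percent_used, percent_free) pair."""
    result = {}
    for line in df_text.splitlines()[1:]:      # skip the header line
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("/dev/"):
            continue                           # not a local disk device
        used = int(fields[4].rstrip("%"))
        result[fields[0]] = (used, 100 - used)
    return result


disks = disk_percentages(SAMPLE_DF)
print(disks)
```

Long device names can make df wrap an entry over two lines; running df -P, the POSIX output format, keeps each file system on one line.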

Process, per match

Name  Description
Cpu   % CPU used
Mem   % Memory used
N     Number of processes matched