AWK


AWK is a programming language that is mainly used for extracting information from text and data files. Its name is derived from the surnames of its authors, Aho, Weinberger and Kernighan, although it may have been written by one of them, who then roped in two of his friends to furnish a suitable acronym.

The main advantage of AWK is that complicated text-processing tasks can be accomplished with a minimum of instructions.

AWK programs are built from pattern-action pairs: the program reads a document line by line, and whenever a line matches a pattern, it performs the corresponding action.

Say, for instance, you have just received a 10,000-line e-mail, your correspondent being the reincarnation of Edmund Burke. The text of this e-mail resides in your mail directory and is called longmail.txt. Eager to wade through this tome, you could print every line that mentions the word love with the following one-line AWK program:

    awk '/love/ {print $0}' longmail.txt

The pattern here is love, and the action, {print $0}, prints each line that contains it to the screen ($0 stands for the whole current line).
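Since longmail.txt exists only in the story, here is a self-contained sketch that pipes a few made-up lines through the same program. It also shows that the pattern between the slashes is a regular expression matched anywhere in the line, so a line containing "gloves" matches too. The helper name `find_love` and the sample text are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print every input line that contains "love".
# The text between the slashes is a regular expression matched anywhere
# in the line, so "gloves" matches as well as "love".
find_love() {
    awk '/love/ {print $0}' "$@"
}

# Made-up sample text standing in for longmail.txt
printf '%s\n' 'With love, Edmund' 'Bring your gloves' 'It may rain today' | find_love
# Prints:
# With love, Edmund
# Bring your gloves
```

Matching is case-sensitive, so a line beginning "Love conquers all" would not be printed by this program.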

Spurred on by the confirmation of true love, or otherwise, you decide to print out the entire document except for any lines that mention the weather. This could be accomplished with the following one-line AWK program:

    awk '! /weather/ {print $0}' longmail.txt

Here the ! in front of the pattern negates it, so the pattern matches every line that does not contain weather, and the action prints those lines.

Although the syntax can seem complicated at first, it is always concise. In fact, these last two programs could have been written like this:

    awk '/love/' longmail.txt

    awk '! /weather/' longmail.txt

This works because printing the matching line is AWK's default action whenever a pattern is given without an action.
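A quick way to convince yourself is to run the long and short forms side by side on the same text and compare their output. This sketch uses made-up sample lines in place of longmail.txt.

```shell
#!/bin/sh
# Made-up sample text standing in for longmail.txt
sample='My love to you all
The weather here is dreadful
Yours, with love and regards
Regards to your sister'

# Long forms: the action {print $0} is written out
long_love=$(printf '%s\n' "$sample" | awk '/love/ {print $0}')
long_rest=$(printf '%s\n' "$sample" | awk '! /weather/ {print $0}')

# Short forms: the action is omitted, so AWK falls back on printing the line
short_love=$(printf '%s\n' "$sample" | awk '/love/')
short_rest=$(printf '%s\n' "$sample" | awk '! /weather/')

if [ "$long_love" = "$short_love" ] && [ "$long_rest" = "$short_rest" ]; then
    echo "long and short forms agree"
else
    echo "outputs differ"
fi
```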

Finally, suppose you wanted to count the number of lines and the number of words in the e-mail. The following is all that is required:

    awk '{Lines = NR; Words += NF } END{ print Lines, Words }' longmail.txt

This example is given without explanation, mainly to demonstrate how succinct AWK is. That only becomes clear when you try to implement the same function in a real programming language (that is to say C, not Basic).
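For readers who do want the explanation, here is an equivalent program spread over several lines with comments. NR (the number of lines read so far) and NF (the number of whitespace-separated fields, i.e. words, on the current line) are built-in AWK variables. Once all input has been read, NR still holds the total line count, so this sketch prints it directly in the END block instead of copying it into Lines on every line. The helper name `count_lines_words` and the sample text are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper wrapping the line-and-word counter
count_lines_words() {
    awk '
        # No pattern, so this action runs for every input line:
        # NF is the number of whitespace-separated words on the line.
        { Words += NF }

        # END runs once, after the last line has been read.
        # NR now holds the total number of lines; "+ 0" makes an
        # empty input print 0 rather than an empty string.
        END { print NR, Words + 0 }
    ' "$@"
}

# Made-up sample text: 2 lines, 7 words
printf '%s\n' 'I love you' 'the weather is fine' | count_lines_words
# Prints: 2 7
```

Called as `count_lines_words longmail.txt`, it would report the line and word counts for the e-mail itself.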

AWK comes as standard with Unix but is also available under DOS. Two popular free versions are Gawk (GNU's AWK), which is part of the GNU project, and Mawk (Mike's AWK, after its author Mike Brennan). Mawk is generally quicker, but Gawk has more functions and a comprehensive manual.

