AWK
awk text processing language
awk
  Created for Unix by Aho, Weinberger and Kernighan
  Basically an interpreted text processing programming language
  Updated versions:
    NAWK: New awk
    GAWK: Free Software Foundation's version
awk Basics
  Basic form:
    awk options 'selection criteria {action}' file(s)
  Can use regular expressions
  Files are read one line at a time, with the contents split into fields
    Fields are numbered ($1, $2, etc.)
    The entire line is $0
  Can run standalone (program given on the command line)
  Can run as a program (stored in a file)
  Uses a blank (runs of spaces or tabs) as the default separator
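A minimal sketch of the basic form. The two input lines are made up for illustration and are fed in on stdin instead of from a file:

```shell
# Selection criteria: second field equals "two".
# Action: print the first field, then the whole line ($0).
printf 'alpha beta gamma\none two three\n' |
  awk '$2 == "two" { print $1; print $0 }'
```

Only the second input line satisfies the selection criteria, so the action runs once.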
-f Option (stored awk programs)
  awk programs can be stored in a file
    awk -f awkfile datafile
  -f awkfile names the file that contains the awk program
  datafile contains the data
Example
  Find the TAs in the personnel file
  The file is blank separated
    -F defines the delimiter
    Use "\ " to escape the blank (a blank after the \), then a second blank before the program
    Note: the blank is the default separator anyway
  Title is in the 3rd field
    # cat personnel.data
    Tony Kombol Lecturer 800111222 704-687-1111
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    #
    # awk -F\  '$3 == "TA" { print }' personnel.data
Example
  To run an awk program
    personnel.data has the data
    findta.awk is the code
      Looks for TA (3rd field)
      Prints first name and telephone number (1st and 5th fields)
  Note: what small formatting problem is here?
    # awk -F\  -f findta.awk personnel.data
    TAs
    Jinyue704-687-2222
    Hadi704-687-3333
    Done
    # cat personnel.data
    Tony Kombol Lecturer 800111222 704-687-1111
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    # cat findta.awk
    BEGIN { print "TAs"; }
    $3 == "TA" {print $1 $5}
    END { print "Done" }
print and printf
  Output goes to std out
    can be redirected with > or |
    the redirection target must be in quotes:
      print $2, $1 | "sort"
      the output of the print goes to the sort command
  print is unformatted
  printf allows formatting
    %s - string
      %-20s  at least 20 characters wide, left justified (-)
    %d - integer
      %8d  at least 8 characters wide
    %f - floating point
      %4.8f  at least 4 characters wide in total, with 8 digits to the right of the decimal point
  printf needs \n to start a new line
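A short sketch of printf with width and precision. The names and numbers are made-up sample input, and the "|" characters are there only to make the padding visible:

```shell
# %-12s : string, left justified in 12 columns
# %4d   : integer, right justified in 4 columns
# %8.2f : floating point, 8 columns in total, 2 digits after the point
printf 'Kombol 6\nFlintstone 10\n' |
  awk '{ printf "%-12s|%4d|%8.2f\n", $1, $2, $2 / 4 }'
```

Without the trailing \n in the format string, both records would end up on one output line.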
Number processing
  AWK supports basic computation:
    +   addition
    -   subtraction
    *   multiplication
    /   division
    %   modulus
    ^   exponentiation
  Also supports:
    ++  add one to itself (postfix and prefix)
    +=  add and assign to self
    --  subtract one from self (postfix and prefix)
    -=  subtract from self
    *=  multiply self
    /=  divide self
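As an illustration of these operators, the sketch below totals a column of made-up numbers. The variable names sum and n are arbitrary:

```shell
# sum += $1 accumulates the first field; n++ counts the lines.
# END prints the total, the mean, a modulus and a power.
printf '3\n4\n5\n' |
  awk '{ sum += $1; n++ } END { print sum, sum / n, sum % n, 2 ^ n }'
```

For this input the single output line is "12 4 0 8".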
Variables and Expressions
  awk is loosely typed
  do not need to declare variables
    x = 5
  do not need $ to use variables, unlike sed or bash
    print x
  strings are double quoted
    x = "This is a string"
  no string concatenation operator; concatenation is done by context
    x = "string1"; y = "string2"
    print x y
    The space is required
  some conversions are done automatically
    x = "56"; y = 43; z = "abc"
    print x y     # gives 5643: y is converted to a string
    print x + y   # gives 99: + converts x to a number
    print y + z   # gives 43: + converts z to the number 0
Comparison and Logical Operators
  awk supports string and numeric comparisons
    == is the equality operator
      = is for assignment
    < and > can be used on strings
    Beware of conversions when dealing with strings that consist of numbers
  ~ is used for regular expressions
    $2 ~ /[dh]og/
    true when field 2 contains hog or dog
Comparison and Logical Operators
  awk supports boolean operations
    &&  and
    ||  or
    !   not
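A small sketch combining the three boolean operators in one selection. The three-column input (name, title, years) is a simplified, made-up layout, not one of the data files used elsewhere:

```shell
# Select lines whose title is TA or RA and whose years value is not below 3.
printf 'Tony Lecturer 6\nJinyue TA 3\nHadi TA 1\nFred RA 10\n' |
  awk '($2 == "TA" || $2 == "RA") && !($3 < 3) { print $1 }'
```

This prints Jinyue and Fred: Tony fails the title test and Hadi fails the years test.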
Simple comparison
  Field 6 is the number of years with the organization
  Find those with more than 5 years
    # awk '$6 > 5 { print $2 ", " $1 ":" $6}' personnelyears.data
    Kombol, Tony:6
    Flintstone, Fred:10
    #
    # cat personnelyears.data
    Tony Kombol Lecturer 800111222 704-687-1111 6
    Jinyue Xia TA 800111333 704-687-2222 3
    Hadi Hashemi TA 800111444 704-687-3333 1
    Fred Flintstone RA 800123321 704-687-1212 10
    Barney Rubble URA 800112233 704-687-3344 4
    #
Regular Expression comparison example
  Find the TAs and RAs, including the URAs
    # awk '$3 ~ /[RT]A/ {print $1 " " $2 " " $5}' personnel.data
    Jinyue Xia 704-687-2222
    Hadi Hashemi 704-687-3333
    Fred Flintstone 704-687-1212
    Barney Rubble 704-687-3344
    #
    # cat personnel.data
    Tony Kombol Lecturer 800111222 704-687-1111
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    Fred Flintstone RA 800123321 704-687-1212
    Barney Rubble URA 800112233 704-687-3344
BEGIN and END Sections
  Allow for some pre- and post-processing
  Both are optional
  General format:
    BEGIN { action }
    { action }
    END { action }
  BEGIN's actions are done before the processing of the datafile begins
    Good for headers, setup, etc.
  END's actions are done after the processing of the datafile ends
    Good for post processing, notes, etc.
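A minimal sketch of the three-part layout, using made-up input and a counter variable named count (an arbitrary name):

```shell
# BEGIN runs once before any input, the middle action once per line,
# and END once after the last line.
printf 'a\nb\nc\n' |
  awk 'BEGIN { print "start"; count = 0 }
             { count++ }
       END   { print "lines:", count }'
```

The output is "start" followed by "lines: 3".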
Another regular expression
  This is a more complex check using a file for the awk program
  Check to see that the ID is 800……
    That is, 800 followed by 6 characters
    # cat findbadid.awk
    BEGIN { print "List of bad IDs follows"; }
    $4 !~ /^800....../ { print $1 " " $2 " has a bad id:" $4 }
    END { print "End of list" }
    #
    # cat personnelbad.data
    Tony Kombol Lecturer 800111222 704-687-1111 6
    Jinyue Xia TA 800111333 704-687-2222 3
    Hadi Hashemi TA 800111444 704-687-3333 1
    Fred Flintstone RA 800123321 704-687-1212 10
    Barney Rubble URA 800112233 704-687-3344 4
    Bad Id LX 809123456 704-687-8890 0
    # awk -f findbadid.awk personnelbad.data
    List of bad IDs follows
    Bad Id has a bad id:809123456
    End of list
awk file example
    # cat ckgrades.awk
    BEGIN { print "Listing Bs\n" }
    END { print "\nDone" }
    # awk -F: -f ckgrades.awk grades.data
    Listing Bs

    Tara Boomdea: 85:B
    Zorbax Bottlewit:88:B

    Done
    #
    # cat grades.data
    Fred Ziffle:99:A
    Arnold Ziffle: 55: F
    Tara Boomdea: 85:B
    Neo:100:A
    Buffy Summers: 72:C
    Sheldon Cooper:67:D
    Zorbon Prentwist: 88 : B
    Zorbax Bottlewit:88:B
    Bad Grade: 33: A
  Note: ": B" does not get matched
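The ckgrades.awk listing shows only its BEGIN and END sections, not the rule that selects the B lines. As an assumption (this is not taken from the original file), an exact string comparison on the third field would behave the way the output suggests, including the fact that " B" with a leading blank is not matched:

```shell
# Hypothetical selection rule: third colon-separated field is exactly "B".
# The second input line has " B" (leading blank), so it is skipped.
printf 'Tara Boomdea: 85:B\nZorbon Prentwist: 88 : B\nNeo:100:A\n' |
  awk -F: '$3 == "B" { print }'
```

Only the "Tara Boomdea: 85:B" line is printed.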
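Positional Parameters
  $1, $2, etc. are usually used as the fields of each line
  A shell positional parameter can also be passed to the awk program
    Used when awk is run from a shell program
    Must be in single quotes in the awk program, which end the quoted awk text so the shell can expand the parameter
  e.g. instead of
    $4 > 12       4th field in the line is > 12
  use
    $4 > '$2'     4th field in the line is > 2nd parm passed to the program
    prog.awk 50 82     (here the 2nd parm is 82)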
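A sketch of the quoting involved, using a shell function in place of a script file. The function names over and over_v and the sample data are made up; the second variant uses the standard -v option, a different technique from the quote splicing described above that avoids the quoting altogether:

```shell
# over stands in for a shell script; inside it, $1 is the shell
# parameter passed by the caller, not an awk field.
over() {
  # The single quotes close before "$1" so the shell can expand it.
  awk '$2 > '"$1"' { print $1 }'
}

# Same idea with -v, which assigns the shell value to an awk variable.
over_v() {
  awk -v min="$1" '$2 > min { print $1 }'
}

printf 'a 10\nb 60\nc 90\n' | over 50
printf 'a 10\nb 60\nc 90\n' | over_v 50
```

Both calls print b and c, the lines whose second field exceeds 50.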
Arrays
  awk supports arrays
  arrays do not need to be "declared"
    "declared" the minute they are used
  Arrays are associative
    index can be numeric or alphabetic
      thisday["Tue"] = "Tuesday";
      thisday[2] = "Tuesday";
    above are two array elements of the array thisday
    each references a separate string
      printf("thisday[\"Tue\"] is %s\n", thisday["Tue"]) ;
      printf("thisday[2] is %s\n", thisday[2]) ;
    Both will print "Tuesday" for the element referenced
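One common use of associative arrays is counting how often a value occurs. The sketch below assumes made-up single-column input and an array named count:

```shell
# count[$1]++ creates the element on first use and increments it after that.
printf 'TA\nRA\nTA\n' |
  awk '{ count[$1]++ } END { print "TA", count["TA"]; print "RA", count["RA"] }'
```

This prints "TA 2" and then "RA 1".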
Arrays
  ENVIRON[ ]
    an associative array containing all the environment variables
    # awk 'BEGIN{for (env in ENVIRON)print env "=" ENVIRON[env]}'
    SSH_CLIENT=10.23.161.139 59365 22
    HOME=/home/tkombol
    TERM=xterm
    LESSOPEN=| /usr/bin/lesspipe %s
    SHELL=/bin/bash
    USER=tkombol
    _=/usr/bin/awk
    SHLVL=1
    PWD=/home/tkombol
    SSH_CONNECTION=10.23.161.139 59365 152.15.95.103 22
    LANG=en_US.UTF-8
    MAIL=/var/mail/tkombol
    LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:
    HISTCONTROL=ignoredups
    PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
    LESSCLOSE=/usr/bin/lesspipe %s %s
    LOGNAME=tkombol
    SSH_TTY=/dev/pts/2
    #
Built-in Variables
  awk has a set of built-in variables
  Some can be overridden

  Variable   Function                                    Default
  NR         Cumulative # of lines read                  -
  FS         Input field separator                       space
  OFS        Output field separator                      space
  OFMT       Default floating-point output format        %.6g
  RS         Record separator                            newline
  NF         Number of fields in current line            -
  FILENAME   Current input file                          -
  ARGC       Number of arguments in command line         -
  ARGV       Array containing list of arguments          -
  ENVIRON    Assoc. array of all environment variables   -
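A brief sketch of overriding FS and OFS and reading NR and NF, using made-up colon-separated input:

```shell
# FS = ":" splits input on colons; OFS = "-" joins the print arguments.
# $NF is the last field, because NF holds the number of fields.
printf 'a:b:c\nd:e\n' |
  awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $NF }'
```

The two output lines are "1-3-a-c" and "2-2-d-e".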
Functions
  awk has several built-in functions
    () are optional if there are no parms
      but their use is encouraged
  Arithmetic functions
  String functions
Arithmetic Functions
  int(x)    integer part of x (truncated toward zero)
  sqrt(x)   square root of x
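A one-line sketch of the two functions named above, with arbitrary sample values:

```shell
# int() drops the fractional part; sqrt() returns the square root.
awk 'BEGIN { print int(7.9), int(-7.9), sqrt(16) }'
```

This prints "7 -7 4", which shows that int() truncates toward zero instead of rounding.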
String Functions
  length()            length of the complete line
  length(x)           length of x
  tolower(s)          returns s as lower case
  toupper(s)          returns s as upper case
  substr(str,m)       returns the string starting at m to the end of the string
  substr(str,m,n)     returns the string starting at m for n characters
  index(s1,s2)        finds the position of s2 inside s1
  split(str,arr,ch)   splits str into an array; the delimiter is ch
  system("cmd")       executes a system (Linux) command and returns its exit status
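The string functions can be tried out in a BEGIN block with no input at all. The sample string and the array name parts are made up for this sketch:

```shell
awk 'BEGIN {
  s = "Hello World"
  print length(s)              # 11
  print toupper(s)             # HELLO WORLD
  print substr(s, 7)           # World
  print substr(s, 1, 5)        # Hello
  print index(s, "World")      # 7 (positions start at 1)
  n = split("a:b:c", parts, ":")
  print n, parts[1], parts[3]  # 3 a c
}'
```

split() returns the number of pieces, and the pieces are stored starting at index 1.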
if
  Syntax:
    if (cond true) {
      statements
    } else {
      statements
    }
  Notes:
    else is optional
    {} not needed for single statements
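A small sketch of if and else inside an action, applied to made-up numeric input:

```shell
# Label each number as big or small depending on a threshold of 5.
printf '3\n8\n' |
  awk '{ if ($1 > 5) { print $1, "big" } else { print $1, "small" } }'
```

The output is "3 small" followed by "8 big".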
for
  Syntax form 1:
    for ( startval ; condition ; control ) statement
    C-like in form
    Example:
      for ( k=1 ; k<9 ; k++ ) print k
  Syntax form 2:
    for ( var in array ) statement
    Will set var to every index in the array
    Great for associative arrays
      Non-numeric indices
      Gaps in array
    See the ENVIRON example on an earlier slide
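Both forms side by side, as a sketch. The array day holds a single made-up element, which keeps the output independent of the order in which awk visits the indices:

```shell
awk 'BEGIN {
  # Form 1: counted loop, prints 1, 2 and 3.
  for (k = 1; k < 4; k++) print k
  # Form 2: key takes the value of each index in the array.
  day["Tue"] = "Tuesday"
  for (key in day) print key, day[key]
}'
```

With more than one element, the order of a for-in loop is not guaranteed.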
while
  Syntax:
    while (cond is true) {
      statement(s)
    }
continue and break
  continue and break can be used to interrupt any loop
    for
    while
  break stops the loop
  continue stops processing the statements in this iteration
    continues with the next iteration
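A sketch of a while loop that uses both statements. The counter k and the limits 2 and 4 are arbitrary choices for illustration:

```shell
awk 'BEGIN {
  k = 0
  while (1) {
    k++
    if (k == 2) continue   # skip the print for 2, go to the next iteration
    if (k > 4) break       # leave the loop entirely
    print k
  }
}'
```

The loop prints 1, 3 and 4, each on its own line.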
Resources
  Awk - A Tutorial and Introduction - by Bruce Barnett
    http://www.grymoire.com/Unix/Awk.html
  Awk Tutorial - Main Page
    http://robert.wsi.edu.pl/awk/
Which is not a "scripting language"?
  Auk
  Awk
  Perl
  Pearl
  Bash
  Bam
Summary
  awk is a "primitive" scripting language
    good for processing text files
    good for filtering
  perl is a more modern replacement
    "religious war" over which is better
    if you understand awk, it will be a good basis for understanding perl