Chapter 12: gawk
Yes, it sounds funny.

In this chapter …
– Intro
– Patterns
– Actions
– Control Structures
– Putting it all together



gawk?
– GNU awk
– awk is named for its authors: Aho, Weinberger and Kernighan
– A pattern processing language
– Filters data and generates reports

gawk, cont.
– Syntax:
      gawk [options] [program] [file-list]
      gawk [options] -f program-file [file-list]
– Essentially, a program is a list of patterns to match, each paired with the actions to perform on the lines that match
– The program can be given either on the command line or in a file
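A minimal sketch of both invocation forms. The scratch directory, the sample file `demo.txt` and the program file `first.awk` are hypothetical; `AWK` picks gawk if it is installed and otherwise the system awk, since only POSIX features are used. Both runs should print the first field of each line.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)

# Hypothetical scratch directory and sample data file.
tmp=$(mktemp -d)
printf 'alpha 1\nbeta 2\n' > "$tmp/demo.txt"

# Form 1: the program is given on the command line, inside single quotes.
inline_out=$("$AWK" '{ print $1 }' "$tmp/demo.txt")

# Form 2: the same program is stored in a file and loaded with -f.
printf '%s\n' '{ print $1 }' > "$tmp/first.awk"
file_out=$("$AWK" -f "$tmp/first.awk" "$tmp/demo.txt")

printf '%s\n' "$inline_out" "$file_out"
rm -r "$tmp"
```

The file form is handy once a program grows past a line or two, because the shell quoting problem disappears.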

gawk program
– A gawk program contains one or more lines in the format
      pattern { action }
– The pattern determines which lines of data are selected
– The action determines what to do with those lines
– The default pattern selects all lines
– The default action prints the line
– On the command line, put single quotes around the program so the shell passes $1 and similar text to gawk unchanged
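A short sketch of the two defaults, using a hypothetical three-line data set held in a shell variable (`AWK` prefers gawk and falls back to the system awk). A pattern with no action prints the matching lines; an action with no pattern runs on every line.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)

# Hypothetical input: make, model, year.
data='chevy malibu 1999
ford mustang 1965
chevy impala 1985'

# Pattern only: the default action prints each matching line.
pattern_only=$(printf '%s\n' "$data" | "$AWK" '/chevy/')

# Action only: the default pattern selects every line.
action_only=$(printf '%s\n' "$data" | "$AWK" '{ print $2 }')

# Pattern and action together: the action runs only on matching lines.
both=$(printf '%s\n' "$data" | "$AWK" '/chevy/ { print $2 }')

printf '%s\n' "$both"
```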

Patterns
– Simple numeric or string comparisons: <, <=, ==, !=, >=, >
– Regular expressions (see Appendix A)
   – The ~ operator matches a pattern
   – The !~ operator does not match a pattern
– Combinations using || (OR) and && (AND)

Patterns, cont.
– BEGIN: matches before any lines are processed
– END: matches after all lines are processed
– pattern1,pattern2: a range that starts with a line matching pattern1 and ends with the next line matching pattern2. After matching pattern2, gawk looks for pattern1 again
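A minimal sketch of BEGIN, END and a range pattern on a hypothetical word list (`AWK` prefers gawk and falls back to the system awk). The range output shows gawk restarting the search for pattern1 after pattern2 matches, and a range that never sees pattern2 running to the end of the input.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)

# Hypothetical input: one word per line.
words='a
start
b
stop
c
start
d'

# BEGIN runs before the first record and END after the last one;
# NR in END still holds the number of records read.
framed=$(printf '%s\n' "$words" | "$AWK" 'BEGIN { print "begin" } END { print "end", NR }')

# Range: from a line matching /start/ through the next line matching /stop/,
# then gawk looks for /start/ again. The second range never sees /stop/,
# so it stays active to the end of the input.
ranged=$(printf '%s\n' "$words" | "$AWK" '/start/,/stop/')

printf '%s\n' "$framed" "$ranged"
```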

Variables
– $0: the current record (line)
– $1, $2, …, $n: the fields of the current record
– FS: input field separator (default: SPACE or TAB)
– NF: number of fields in the current record
– NR: number of the current record
– RS: input record separator (default: NEWLINE)
– OFS: output field separator (default: SPACE)
– ORS: output record separator (default: NEWLINE)
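A sketch of the built-in variables at work on hypothetical colon-separated records (`AWK` prefers gawk and falls back to the system awk). Setting FS in BEGIN changes how each record is split; OFS is what print places between comma-separated arguments.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)

# Hypothetical colon-separated input, in the style of /etc/passwd.
data='root:x:0
alice:x:1000'

# Split on ":" and join output fields with " | "; print the record number,
# the number of fields, and fields 1 and 3 of each record.
fields_out=$(printf '%s\n' "$data" |
  "$AWK" 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $3 }')

printf '%s\n' "$fields_out"
```

The same split can be requested with the -F option (`gawk -F: ...`) instead of assigning FS in BEGIN.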

Associative Arrays
– A variable type similar to an array, but indexed by strings instead of integers
– Ex.
   – myAssocArray["name"] = "Bob"
   – myAssocArray["hometown"] = "Austin"
– Ex.
   – studentGrades[ ] = 75
   – studentGrades[ ] = 100
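A sketch of string-indexed arrays, assuming hypothetical "student score" records; the array names `total` and `count` are made up for illustration (`AWK` prefers gawk and falls back to the system awk). Each student's name is the index, and `for (name in total)` walks the indexes in an unspecified order, so the result is sorted.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)

# Hypothetical input: student name and one score per line.
scores='bob 75
amy 100
bob 85'

# Accumulate a total and a count per student, then print each average.
averages=$(printf '%s\n' "$scores" | "$AWK" '
  { total[$1] += $2; count[$1]++ }
  END { for (name in total) print name, total[name] / count[name] }' | sort)

printf '%s\n' "$averages"
```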

Pattern examples
– $1 ~ /^[A-Z]/
   – Matches records where the first field starts with a capital letter
– $3 <= $5
   – Matches records where the third field is less than or equal to the fifth field
– $2 > 5000 && $1 !~ /exempt/
   – Matches records where the second field is greater than 5000 and the first field does not contain "exempt"
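The three patterns above, run against four made-up five-field records (`AWK` prefers gawk and falls back to the system awk). For brevity each command adds the action `{ print $1 }`, so only the first field of each selected record is shown; fields that look like numbers are compared numerically.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)

# Hypothetical records: name, amount, and three numeric columns.
data='Alice 6000 3 0 4
bob 7000 9 0 2
exempt 9000 1 0 1
Carol 100 5 0 5'

# First field starts with a capital letter.
capital=$(printf '%s\n' "$data" | "$AWK" '$1 ~ /^[A-Z]/ { print $1 }')
# Third field is less than or equal to the fifth field.
le=$(printf '%s\n' "$data" | "$AWK" '$3 <= $5 { print $1 }')
# Second field above 5000 and first field not containing "exempt".
big=$(printf '%s\n' "$data" | "$AWK" '$2 > 5000 && $1 !~ /exempt/ { print $1 }')

printf '%s\n' "$capital" "$le" "$big"
```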

Functions
– length(str): returns the length of str
   – Returns the length of the current line if str is omitted
– int(num): returns the integer portion of num
– tolower(str): returns a copy of str converted to lower case
– toupper(str): returns a copy of str converted to upper case
– substr(str,pos,len): returns the substring of str that starts at position pos and is len characters long
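A quick sketch that calls each function on one hypothetical input line (`AWK` prefers gawk and falls back to the system awk). Note that `length` with no argument measures the whole record, and string positions start at 1.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)

funcs_out=$(echo 'Hello World 3.75' | "$AWK" '{
  print length($1)                 # length of one field
  print length                     # no argument: length of the whole record
  print int($3)                    # integer portion of 3.75
  print tolower($1), toupper($2)   # case-converted copies of two fields
  print substr($2, 2, 3)           # 3 characters of "World", starting at position 2
}')

printf '%s\n' "$funcs_out"
```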

Actions
– The default action is to print the entire record
– Using print, you can print particular parts (i.e., fields)
   – Ex. { print $1 }
– Put literal strings in double quotes
– Arguments written side by side are concatenated with nothing between them
   – Separate them with commas to put OFS between them
   – Ex. { print $1, $5 }
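A sketch contrasting concatenation with comma-separated print arguments, using one hypothetical input line (`AWK` prefers gawk and falls back to the system awk). The last command changes OFS to show that the comma inserts whatever OFS currently holds.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)

line='ford mustang 65'

# Side by side: concatenated with nothing in between.
joined=$(echo "$line" | "$AWK" '{ print $1 $2 }')
# Comma: OFS (a single space by default) goes between the arguments.
spaced=$(echo "$line" | "$AWK" '{ print $1, $2 }')
# A string literal, in double quotes inside the single-quoted program.
labeled=$(echo "$line" | "$AWK" '{ print "make:", $1 }')
# With OFS set to ",", the commas now produce comma-separated output.
csv=$(echo "$line" | "$AWK" 'BEGIN { OFS = "," } { print $1, $2, $3 }')

printf '%s\n' "$joined" "$spaced" "$labeled" "$csv"
```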

Actions, cont.
– Separate multiple statements on one line with semicolons
– Other actions usually involve variables (e.g., incrementors, accumulators)
– Variables need not be formally initialized
   – By default they are set to zero or null (the empty string)
– Standard operators function normally: *  /  %  +  -  =  ++  --  +=  -=  *=  /=  %=
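A sketch of an incrementor and an accumulator that are never declared, on hypothetical "make price" lines (`AWK` prefers gawk and falls back to the system awk). `n` and `total` start at zero, the two statements are separated by a semicolon, and END prints the count, the sum and the average.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)

# Hypothetical input: make and price.
prices='ford 2500
chevy 3000
ford 4500'

# total and n spring into existence as 0 the first time they are used.
summary=$(printf '%s\n' "$prices" |
  "$AWK" '{ total += $2; n++ } END { print n, total, total / n }')

printf '%s\n' "$summary"
```

The average is not a whole number, so print formats it with the default output format (six significant digits).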

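Actions, cont.
– Instead of print, you can use printf (C-style)
– Syntax:
   – printf "control-string", arg1, arg2, …, argn
   – control-string contains one or more conversion specifications of the form
         %[-][x][.y]conv
      – the optional - left-justifies the value
      – x is the minimum field width
      – y is the number of decimal places
      – conv is d (decimal integer), f (floating point) or s (string)
   – printf does not add a newline; end the control string with \n when you want one
– Ex. %.2f: floating point with two decimal places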
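A sketch of the three conversions with widths and precision, on one hypothetical record (`AWK` prefers gawk and falls back to the system awk). The `|` characters are only there to make the field widths visible.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)

# %-10s: string, left-justified in 10 columns
# %5d:   integer, right-justified in 5 columns
# %8.2f: floating point, 8 columns wide with 2 decimal places
row=$(echo 'ford 1965 2500.5' |
  "$AWK" '{ printf "%-10s|%5d|%8.2f|\n", $1, $2, $3 }')

printf '%s\n' "$row"
```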

Control Structures
– gawk programs can use several control structures
– if-else, while, for, break and continue
– All are C-style in syntax (what did the K in gawk stand for?)

if … else
– Syntax:
      if (condition) {
          commands
      } else {
          commands
      }

while
– Syntax:
      while (condition) {
          commands
      }

for
– Syntax:
      for (init; condition; increment) {
          commands
      }
– You can use break and continue in both for and while loops
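A sketch that uses every structure above in one program, on two hypothetical input lines (`AWK` prefers gawk and falls back to the system awk). The for loop sums the whole-number fields of a record, skipping anything else with continue and stopping at a 0 with break; a while loop then halves the sum until it is below 10, and if-else picks the message.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)

ctrl=$(printf '3 x 4 0 100\n50\n' | "$AWK" '{
  sum = 0
  for (i = 1; i <= NF; i++) {
    if ($i !~ /^[0-9]+$/) continue   # not a whole number: skip this field
    if ($i == 0) break               # 0 acts as an end marker: stop scanning
    sum += $i
  }
  steps = 0
  while (sum >= 10) {                # halve the sum until it drops below 10
    sum /= 2
    steps++
  }
  if (steps > 0) {
    print "halved", steps, "times to", sum
  } else {
    print "sum", sum
  }
}')

printf '%s\n' "$ctrl"
```

On the first line, the break at the 0 means the 100 is never added.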

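Examples
    gawk '{print}' cars                    # print every line
    gawk '/chevy/' cars                    # print lines containing "chevy"
    gawk '{print $3, $1}' cars             # print fields 3 and 1
    gawk '/chevy/ {print $3, $1}' cars     # fields 3 and 1 of "chevy" lines only
    gawk '$1 ~ /^h/' cars                  # first field starts with "h"
    gawk '2000 <= $5 && $5 < 9000' cars    # fifth field from 2000 up to (not including) 9000
    gawk '/volvo/, /bmw/' cars             # range: a volvo line through the next bmw line
    gawk '{print $3, $1, "$" $5}' cars     # "$" concatenated with field 5
    gawk 'BEGIN {print "Car Info"}' cars   # only a BEGIN rule, so cars is never read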

Putting it all together
    # printf_demo: header lines, then one formatted row per car
    BEGIN {
        print " Miles"
        print "Make Model Year (000) Price"
        print \
        " "
    }
    {
        # expand abbreviated makes before printing
        if ($1 ~ /ply/)  $1 = "plymouth"
        if ($1 ~ /chev/) $1 = "chevrolet"
        printf "%-10s %-8s %2d %5d $ %8.2f\n",\
            $1, $2, $3, $4, $5
    }

Results
    gawk -f printf_demo cars
    Miles
    Make Model Year (000) Price
    plymouth fury $
    chevrolet malibu $
    ford mustang $
    volvo s $
    ford thundbd $
    chevrolet malibu $
    bmw 325i $
    honda accord $
    ford taurus $
    toyota rav $
    chevrolet impala $
    ford explor $

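Associative Arrays
    gawk '{manuf[$1]++}
          END {for (name in manuf) print name, manuf[name]}' cars | sort
– for (name in manuf) visits the indexes in no particular order, so the output is piped through sort
– Output:
    bmw 1
    chevy 3
    ford 4
    honda 1
    plym 1
    toyota 1
    volvo 1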

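Standalone Scripts
– An alternative to running gawk -f at the command line
– Just like making a shell script: the first line defines what runs the script
   – #!/bin/gawk -f
   – Use the path where gawk is installed on your system (often /usr/bin/gawk)
– Then begin your patterns/actions
– Make the file executable (chmod +x) before running it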
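A sketch that builds and runs a tiny standalone script. The script name `count_fields` and the scratch directory are hypothetical, and instead of hard-coding `/bin/gawk` the `#!` line uses whatever path `command -v` finds (gawk if installed, otherwise the system awk). The kernel runs the interpreter with `-f` and the script's path, so the rest of the file is read as the awk program.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)
tmp=$(mktemp -d)

# First line: the interpreter plus -f, like "#!/bin/gawk -f".
printf '#!%s -f\n' "$AWK" > "$tmp/count_fields"
# Remaining lines: ordinary patterns and actions.
cat >> "$tmp/count_fields" <<'EOF'
{ print NR ": " NF " fields" }
EOF
chmod +x "$tmp/count_fields"

# Run it like any other command.
script_out=$(printf 'a b c\nd e\n' | "$tmp/count_fields")

printf '%s\n' "$script_out"
rm -r "$tmp"
```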

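Advanced gawk
– getline: lets you pull input lines in manually
   – Useful if you need to loop through data
– Coprocess: sends output to, and reads input from, a second process using the |& operator
   – A coprocess can be network based, using a special file name of the form /inet/tcp/lport/host/port (e.g., /inet/tcp/0/host/port, where local port 0 lets the system choose one)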
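A sketch of looping with getline, assuming a hypothetical input format in which a header line gives a group name and how many data lines follow it (`AWK` prefers gawk and falls back to the system awk). Inside the main rule, `getline line` reads the next input line into the variable `line` so those data lines never reach the main rule themselves; the loop stops early if getline returns 0 (end of input) or -1 (error). Coprocesses (`|&`) and `/inet` files are gawk-specific and are not shown.

```shell
# Prefer gawk; fall back to any installed awk (only POSIX awk features are used).
AWK=$(command -v gawk || command -v awk)

grouped=$(printf 'fruit 2\napple\npear\nveg 1\nleek\n' | "$AWK" '{
  name = $1; n = $2; items = ""
  for (i = 1; i <= n; i++) {
    if ((getline line) <= 0) break           # end of input or read error
    items = (i == 1) ? line : items "," line # join the pulled lines with commas
  }
  print name ": " items
}')

printf '%s\n' "$grouped"
```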