Published by Madlyn Wilson. Modified over 9 years ago.
1
Programmable Text Processing with awk Lecturer: Prof. Andrzej (AJ) Bieszczad Email: andrzej@csun.edu Phone: 818-677-4954 “UNIX for Programmers and Users” Third Edition, Prentice-Hall, GRAHAM GLASS, KING ABLES Slides partially adapted from Kumoh National University of Technology (Korea) and NYU
2
Programmable Text Processing with awk
The awk utility scans one or more files and performs an action on all of the lines that match a particular condition.
The actions and conditions are described by an awk program and range from the very simple to the complex.
awk got its name from the combined first letters of its authors’ surnames: Aho, Weinberger, and Kernighan.
It borrows its control structures and expression syntax from the C language.
3
awk
awk's purpose:
A general-purpose programmable filter that handles text (strings) as easily as numbers
– this makes awk one of the most powerful of the Unix utilities
A programming language for handling common data manipulation tasks with only a few lines of code
awk is a pattern-action language
awk processes fields
The language looks a little like C but automatically handles input, field splitting, initialization, and memory management
– Built-in string and number data types
– No variable type declarations
awk is a great prototyping language
– start with a few lines and keep adding until it does what you want
awk gets its input from
– files
– redirection and pipes
– directly from standard input
nawk (new awk) is the new standard for awk
– Designed to facilitate large awk programs
– gawk is a free nawk clone from GNU
4
awk Program
An awk program is a list of one or more commands of the form:
    [ pattern ] [ { action } ]
For example:
    BEGIN { print "List of html files:" }
    /\.html$/ { print }     --> the pattern is "/", then "\." (a literal dot), then "html", then "$" (end of line), then "/"
    END { print "There you go!" }
action is performed on every line that matches pattern (in other words, the condition).
If pattern is not provided, action is performed on every line.
If action is not provided, all matching lines are simply sent to standard output.
Since both patterns and actions are optional, actions must be enclosed in braces to distinguish them from patterns.
The statements in an awk program may be indented and formatted using spaces, tabs, and newlines.
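The three rule forms described above (pattern only, action only, pattern plus action) can be sketched in a short shell session. This is only an illustration: the three file names fed to awk are made up, and the variable names are arbitrary.

```shell
# Sample input: three hypothetical file names, one per line.
names=$(printf 'index.html\nnotes.txt\nas1.html\n')

# Pattern only: matching lines are printed by the default action.
pattern_only=$(printf '%s\n' "$names" | awk '/\.html$/')

# Action only: the action runs on every input line.
action_only=$(printf '%s\n' "$names" | awk '{ print "file:", $0 }')

# Pattern and action together: the action runs on matching lines only.
both=$(printf '%s\n' "$names" | awk '/\.txt$/ { print "text file:", $0 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```

The first command prints only the two .html names, the second prefixes every name with "file:", and the third reports the single .txt name.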
5
awk: Patterns and Actions
Search a set of files for patterns.
Perform specified actions upon lines or fields that contain instances of patterns.
Does not alter input files.
Processes one input line at a time.
Every program statement has to have a pattern or an action or both.
Default pattern is to match all lines.
Default action is to print the current record.
Patterns are simply listed; actions are enclosed in { }.
awk scans a sequence of input lines, or records, one by one, searching for lines that match the pattern
– the meaning of "match" depends on the pattern
6
awk: Patterns
A pattern is a selector that determines whether the action is to be executed.
A pattern can be:
– the special token BEGIN or END
– an extended regular expression (enclosed in / /)
– an arithmetic relational expression
– a string-valued expression
– an arbitrary combination of the above
Examples:
    /CSUN/                              matches if the string "CSUN" is in the record
    x > 0                               matches if the condition is true
    /CSUN/ && (name == "UNIX Tools")    matches if both conditions hold
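A small sketch of how these pattern kinds combine. The three input records (name, school, credits) are invented for the illustration and are not part of the original slides.

```shell
# Hypothetical records: name, school, credits.
data=$(printf 'ann CSUN 12\nbob UCLA 15\ncat CSUN 3\n')

# Regular expression pattern: records containing "CSUN".
by_regex=$(printf '%s\n' "$data" | awk '/CSUN/ { print $1 }')

# Relational pattern on a field.
by_relation=$(printf '%s\n' "$data" | awk '$3 > 10 { print $1 }')

# Combination of both with &&.
combined=$(printf '%s\n' "$data" | awk '/CSUN/ && $3 > 10 { print $1 }')

printf 'regex: %s\nrelation: %s\ncombined: %s\n' \
    "$(echo $by_regex)" "$(echo $by_relation)" "$combined"
```

Only the record that satisfies both conditions survives the combined pattern.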
7
Special awk Patterns: BEGIN, END
BEGIN and END provide a way to gain control before and after processing, for initialization and wrap-up.
BEGIN: actions are performed before the first input line is read.
END: actions are performed after the last input line has been processed.
    BEGIN { print "List of html files:" }
    /\.html$/ { print }
    END { print "There you go!" }
8
awk: Actions
An action is a list of one or more of the following kinds of C-like statements, separated by semicolons or newlines:
    if ( conditional ) statement [ else statement ]
    while ( conditional ) statement
    for ( expression; conditional; expression ) statement
    break
    continue
    variable = expression
    print [ list of expressions ] [ > expression ]
    printf format [, list of expressions ] [ > expression ]
    next    (skips the remaining patterns for the current line of input and moves on to the next line)
    exit    (stops reading input; the END action, if any, is still performed)
    { list of statements }
An action may include arithmetic and string expressions, assignments, and multiple output streams.
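The difference between next and exit is easiest to see side by side. The following is a minimal sketch with made-up input: next abandons the current line before later rules see it, while exit stops reading input altogether but still lets the END action run.

```shell
# next: skip comment lines so later rules never see them.
# exit: stop reading input at the line "stop"; END still runs.
out=$(printf '# header\none\ntwo\nstop\nthree\n' | awk '
/^#/     { next }
/^stop$/ { exit }
         { print "line:", $0; count++ }
END      { print count, "lines printed" }
')
printf '%s\n' "$out"
```

The line "three" is never processed because exit fires on the line before it, and the comment line is never counted because next skips the third rule.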
9
awk: An Example
    $ ls | awk '
    BEGIN { print "List of html files:" }
    /\.html$/ { print }
    END { print "There you go!" }
    '
    List of html files:
    index.html
    as1.html
    as2.html
    There you go!
    $ _
10
awk: Variables
awk scripts can define and use variables:
    BEGIN { sum = 0 }
    { sum++ }
    END { print sum }
Some variables are predefined:
    NR          number of records processed so far
    NF          number of fields in the current record
    FILENAME    name of the current input file
    FS          field separator, space or TAB by default
    OFS         output field separator, space by default
    ARGC/ARGV   argument count and argument value array
                – used to get arguments from the command line
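A brief sketch of three of these predefined variables at work. The two input lines and the choice of "-" as the output separator are assumptions made purely for the illustration.

```shell
out=$(printf 'a b c\nd e\n' | awk '
BEGIN { OFS = "-" }          # separator that print inserts between its comma-separated items
      { print NR, NF, $1 }   # record number, field count, first field
')
printf '%s\n' "$out"
```

Each output line joins the record number, the number of fields, and the first field with the OFS value.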
11
awk: Records
Default record separator is newline
– by default, awk processes its input a line at a time.
Special variable RS: record separator
– can be changed in a BEGIN action
– any single character can be used; some implementations (gawk, for example) also accept a regular expression
Special variable NR holds the number of the current record.
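As a small sketch of changing the record separator, the snippet below sets RS to a semicolon in a BEGIN action so that one physical line is read as three records. The input string is invented for the example.

```shell
# One line of input, split into records at each semicolon.
out=$(printf 'alpha;beta;gamma' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$out"
```

NR counts records rather than physical lines, so it reaches 3 here even though the input contains no newline at all.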
12
awk: Fields
Each input line is split into fields.
Special variable FS: field separator
– default is whitespace (one or more spaces or tabs)
– awk -Fc sets FS to the character c
– can also be changed in a BEGIN action
$0 is the entire line.
$1 is the first field, $2 is the second field, ..., $NF is the last field.
Only fields begin with $; variables are unadorned.
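The two ways of setting the field separator can be compared directly. This is a sketch only; the colon-separated sample line is a made-up record in the style of /etc/passwd.

```shell
# Hypothetical colon-separated record.
line='root:x:0:0:root:/root:/bin/sh'

# -F: sets FS to a colon on the command line.
with_flag=$(printf '%s\n' "$line" | awk -F: '{ print $1, $NF }')

# The same separator set in a BEGIN action.
with_begin=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $1, $NF }')

printf '%s\n%s\n' "$with_flag" "$with_begin"
```

Both commands print the first and last fields, so the two output lines are identical.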
13
awk: Simple Output From AWK
Printing Every Line
– If an action has no pattern, the action is performed on all input lines
    { print }       will print all input lines to standard output
    { print $0 }    will do the same thing
Printing Certain Fields
– multiple items can be printed on the same output line with a single print statement
    { print $1, $3 }
– expressions separated by a comma are, by default, separated by a single space when output
14
awk: Output (continued)
Special variable NF: number of fields
– Any valid expression can be used after a $ to indicate the contents of a particular field
– One built-in expression is NF: number of fields
    { print NF, $1, $NF }
– will print the number of fields, the first field, and the last field in the current record
    { print $(NF-2) }
– prints the third-to-last field
Computing and Printing
– You can also do computations on the field values and include the results in your output
    { print $1, $2 * $3 }
15
awk: Output (continued)
Printing Line Numbers
– The built-in variable NR can be used to print line numbers
    { print NR, $0 }
– will print each line prefixed with its line number
Putting Text in the Output
– you can also add other text to the output besides what is in the current record
    { print "total pay for", $1, "is", $2 * $3 }
– Note that the inserted text needs to be surrounded by double quotes
16
awk: Fancier Output
Lining Up Fields
– like C, awk has a printf function for producing formatted output
– printf has the form:
    printf( format, val1, val2, val3, ... )
    { printf("total pay for %s is $%.2f\n", $1, $2 * $3) }
– when using printf, formatting is under your control, so awk provides no automatic spaces or newlines. You have to insert them yourself.
    { printf("%-8s %6.2f\n", $1, $2 * $3) }
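To see the column alignment that the second format string produces, here is a minimal sketch run on three invented records (name, hourly rate, hours worked). The names and figures are assumptions for the illustration.

```shell
# Hypothetical payroll records: name, pay rate, hours.
out=$(printf 'Beth 4.00 0\nKathy 4.00 10\nSusie 4.25 18\n' | awk '
{ printf("%-8s %6.2f\n", $1, $2 * $3) }
')
printf '%s\n' "$out"
```

%-8s left-justifies the name in an 8-character column and %6.2f right-justifies the pay in a 6-character column with two decimals, so every output line has the same width and the amounts line up.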
17
awk: Selection
awk patterns are good for selecting specific lines from the input for further processing.
Selection by Comparison
    $2 >= 5 { print }
Selection by Computation
    $2 * $3 > 50 { printf("%6.2f for %s\n", $2 * $3, $1) }
Selection by Text Content
    $1 == "CSUN"
    /CSUN/
Combinations of Patterns
    $2 >= 4 || $3 >= 20
Selection by Line Number
    NR >= 10 && NR <= 20
18
awk: Arithmetic and Variables
awk variables take on numeric (floating point) or string values according to context.
User-defined variables are unadorned (they need not be declared).
By default, user-defined variables are initialized to the null string, which has numerical value 0.
awk Operators:
    =                     assignment operator; sets a variable equal to a value or string
    ==                    equality operator; returns TRUE if both sides are equal
    !=                    inequality operator
    &&                    logical AND
    ||                    logical OR
    !                     logical NOT
    <, >, <=, >=          relational operators
    +, -, /, *, %, ^      arithmetic
    (no operator symbol)  string concatenation, written by placing two expressions next to each other
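A short sketch of how context decides whether a value is treated as a number or a string, and of the default value of an unset variable. The variable names x, y, and u are arbitrary.

```shell
out=$(awk 'BEGIN {
    x = "3" + 4            # the string "3" is used as a number
    y = 3 4                # two numbers placed side by side are concatenated
    print x, y
    print u + 0, "[" u "]" # unset variable: 0 as a number, empty as a string
}')
printf '%s\n' "$out"
```

Because the program consists of a BEGIN action only, awk runs it without reading any input.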
19
awk: Arithmetic and Variables Examples
Counting is easy to do with awk:
    $3 > 15 { emp = emp + 1 }    # work hours are in the third field
    END { print emp, "employees worked more than 15 hrs" }
Computing sums and averages is also simple:
    { pay = pay + $2 * $3 }      # $2 - pay per hour, $3 - hours
    END { print NR, "employees"
          print "total pay is", pay
          print "average pay is", pay/NR
        }
20
awk: Handling Text
One major advantage of awk is its ability to handle strings as easily as many languages handle numbers.
awk variables can hold strings of characters as well as numbers, and awk conveniently translates back and forth as needed.
This program finds the employee who is paid the most per hour:
    # Fields: employee, payrate
    $2 > maxrate { maxrate = $2; maxemp = $1 }
    END { print "highest hourly rate:", maxrate, "for", maxemp }
21
awk: String Manipulation
String Concatenation
– new strings can be created by combining old ones
    { names = names $1 " " }
    END { print names }
Printing the Last Input Line
– NR retains its value after the last input line has been read, but in some older awk implementations $0 does not, so it is safest to save the line yourself
    { last = $0 }
    END { print last }
22
awk: Built-In Functions
awk contains a number of built-in functions.
Arithmetic
– sin, cos, atan2, exp, int, log, rand, sqrt
String
– length, substitution, find substrings, split strings
Output
– print, printf, print and printf to file
Special
– system: executes a Unix command, e.g., system("clear") to clear the screen
  Note the double quotes around the Unix command
– exit: stop reading input and go immediately to the END pattern-action pair if it exists, otherwise exit the script
23
awk: Built-in Functions
Example: counting lines, words, and characters using length (a poor man’s wc):
    { nc = nc + length($0) + 1    # + 1 accounts for the newline
      nw = nw + NF
    }
    END { print NR, "lines,", nw, "words,", nc, "characters" }
substr(s, m, n) produces the substring of s that begins at position m and is at most n characters long.
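The string functions mentioned above can be tried without any input file. This is a sketch using the standard awk functions length, substr, index, and split; the sample strings are made up.

```shell
out=$(awk 'BEGIN {
    s = "programmable"
    print length(s)              # number of characters in s
    print substr(s, 1, 7)        # at most 7 characters starting at position 1
    print index(s, "gram")       # 1-based position of the substring, 0 if absent
    n = split("a:b:c", parts, ":")
    print n, parts[1], parts[3]  # split returns the number of pieces
}')
printf '%s\n' "$out"
```

Note that string positions in awk start at 1, not 0 as in C.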
24
awk: Control Flow Statements
awk provides several control flow statements for making decisions and writing loops.
if-then-else
    $2 > 6 { n = n + 1; pay = pay + $2 * $3 }
    END { if (n > 0)
              print n, "employees, total pay is", pay, "average pay is", pay/n
          else
              print "no employees are paid more than $6/hour"
        }
25
awk: Loops
while
    # interest1 - compute compound interest
    # input: amount, rate, years
    # output: compound value at end of each year
    { i = 1
      while (i <= $3) {
          printf("\t%.2f\n", $1 * (1 + $2) ^ i)
          i = i + 1
      }
    }
do-while
    do {
        statement1
    } while (expression)
for
    # interest2 - compute compound interest
    # input: amount, rate, years
    # output: compound value at end of each year
    { for (i = 1; i <= $3; i = i + 1)
          printf("\t%.2f\n", $1 * (1 + $2) ^ i)
    }
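The do-while form above is shown only as a template, so here is a minimal concrete sketch. It assumes an awk that supports do-while (nawk, gawk, and POSIX awk do); the starting value 5 is chosen so that the loop test is false from the start.

```shell
out=$(awk 'BEGIN {
    i = 5
    do {
        print "i =", i   # the body runs once even though the test below is false
        i++
    } while (i < 3)
}')
printf '%s\n' "$out"
```

A while loop with the same test would print nothing, because it checks the condition before the first pass rather than after it.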
26
awk: Arrays
Array elements are not declared.
Array subscripts can have any value:
– numbers
– strings! (associative arrays)
    arr[3] = "value"
    grade["Korn"] = 40.3
Example
    # reverse - print input in reverse order by line
    { line[NR] = $0 }    # remember each line
    END { for (i = NR; i > 0; i = i - 1) {
              print line[i]
          }
        }
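String subscripts make counting by key straightforward. The following is a sketch on invented input that tallies how often each word occurs; it uses the for (key in array) loop, whose traversal order is unspecified, so the output is piped through sort to make it stable.

```shell
out=$(printf 'red\nblue\nred\ngreen\nred\nblue\n' | awk '
{ count[$1]++ }                              # the word itself is the subscript
END { for (w in count) print w, count[w] }   # traversal order is unspecified
' | sort)
printf '%s\n' "$out"
```

No declaration or initialization is needed: the first count[$1]++ on a new word creates the element with an initial value of zero.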
27
awk: Examples
In the following example, we run a simple awk program on the text file "float" to insert the number of fields into each line:
    $ cat float                         --> look at the original file.
    Wish I was floating in blue across the sky,
    My imagination is strong,
    And I often visit the days
    When everything seemed so clear.
    Now I wonder what I’m doing here at all…
    $ awk '{ print NF, $0 }' float      --> execute the command.
    9 Wish I was floating in blue across the sky,
    4 My imagination is strong,
    6 And I often visit the days
    5 When everything seemed so clear.
    9 Now I wonder what I’m doing here at all…
    $ _
28
awk: Examples
We run a program that displays the first, third, and last fields of every line:
    $ cat awk2                          --> look at the awk script.
    BEGIN { print "Start of file:", FILENAME }
    { print $1 $3 $NF }                 --> print first, third and last fields.
    END { print "End of file" }
    $ awk -f awk2 float                 --> execute the script.
    Start of file: float
    Wishwassky,
    Myisstrong,
    Andoftendays
    Whenseemedclear.
    Nowwonderall…
    End of file
    $ _
29
awk: Examples
In the next example, we run a program that displays the first, third, and last fields of lines 2 and 3 of "float":
    $ cat awk3                          --> look at the awk script.
    NR > 1 && NR < 4 { print NR, $1, $3, $NF }
    $ awk -f awk3 float                 --> execute the script.
    2 My is strong,
    3 And often days
    $ _
30
awk: Examples
A variable’s initial value is a null string or zero, depending on how you use it.
In the next example, the program counts the number of lines and words in a file as it echoes the lines to standard output:
    $ cat awk4                          --> look at the awk script.
    BEGIN { print "Scanning file" }
    {
      printf "line %d: %s\n", NR, $0;
      lineCount++;
      wordCount += NF;
    }
    END { printf "lines = %d, words=%d\n", lineCount, wordCount }
31
awk: Examples
    $ awk -f awk4 float                 --> execute the script.
    Scanning file
    line 1: Wish I was floating in blue across the sky,
    line 2: My imagination is strong,
    line 3: And I often visit the days
    line 4: When everything seemed so clear.
    line 5: Now I wonder what I’m doing here at all…
    lines = 5, words=33
    $ _
32
awk: Examples
In the following example, we print the fields in each line in reverse order:
    $ cat awk5                          --> look at the awk script.
    {
      for ( i = NF; i >= 1; i-- )
        printf "%s ", $i;
      printf "\n";
    }
    $ awk -f awk5 float                 --> execute the script.
    sky, the across blue in floating was I Wish
    strong, is imagination My
    days the visit often I And
    clear. so seemed everything When
    all… at here doing I’m what wonder I Now
    $ _
33
awk: Examples
In the next example, we display all of the lines that contain a t followed by an e, with any number of characters in between:
    $ cat awk6                          --> look at the script.
    /t.*e/ { print $0 }
    $ awk -f awk6 float                 --> execute the script.
    Wish I was floating in blue across the sky,
    And I often visit the days
    When everything seemed so clear.
    Now I wonder what I’m doing here at all…
    $ _
34
awk: Examples
A condition may be two expressions separated by a comma. In this case, awk performs the action on every line from the first line that matches the first condition through the next line that satisfies the second condition:
    $ cat awk7                          --> look at the awk script.
    /strong/, /clear/ { print $0 }
    $ awk -f awk7 float                 --> execute the script.
    My imagination is strong,           --> first line of the range
    And I often visit the days
    When everything seemed so clear.    --> last line of the range
    $ _
35
awk: Examples
In the next example, we process a file whose fields are separated by colons:
    $ cat awk3                          --> look at the awk script.
    NR > 1 && NR < 4 { print $1, $3, $NF }
    $ cat float2                        --> look at the input file.
    Wish:I:was:floating:in:blue:across:the:sky,
    My:imagination:is:strong,
    And:I:often:visit:the:days
    When:I:wonder:what:I’m:doing:here:at:all…
    Now:I:wonder:what:I’m:doing:here:at:all…
    $ awk -F: -f awk3 float2            --> execute the script.
    My is strong,
    And often days
    $ _
36
awk: Examples
Here’s an example of the use of some built-in functions:
    $ cat test                          --> look at the input file.
    1.1 a
    2.2 at
    3.3 eat
    4.4 beat
    $ cat awk8                          --> look at the awk script.
    {
      printf "$1 = %g ", $1;
      printf "exp = %.2g ", exp($1);
      printf "log = %.2g ", log($1);
      printf "sqrt = %.2g ", sqrt($1);
      printf "int = %d ", int($1);
      printf "substr(%s,1,2) = %s\n", $2, substr($2, 1, 2);
    }
    $ awk -f awk8 test                  --> execute the script.
    $1 = 1.1 exp = 3 log = 0.095 sqrt = 1 int = 1 substr(a,1,2) = a
    $1 = 2.2 exp = 9 log = 0.79 sqrt = 1.5 int = 2 substr(at,1,2) = at
    $1 = 3.3 exp = 27 log = 1.2 sqrt = 1.8 int = 3 substr(eat,1,2) = ea
    $1 = 4.4 exp = 81 log = 1.5 sqrt = 2.1 int = 4 substr(beat,1,2) = be
    $ _
37
awk challenge