Programmable Text Processing with awk Lecturer: Prof. Andrzej (AJ) Bieszczad Phone: 818-677-4954 “UNIX for Programmers and Users”


Programmable Text Processing with awk
Lecturer: Prof. Andrzej (AJ) Bieszczad
“UNIX for Programmers and Users”, Third Edition, Prentice-Hall, Graham Glass, King Ables
Slides partially adapted from Kumoh National University of Technology (Korea) and NYU

The awk utility scans one or more files and performs an action on all of the lines that match a particular condition.
The actions and conditions are described by an awk program and range from the very simple to the complex.
awk got its name from the combined first letters of its authors’ surnames: Aho, Weinberger, and Kernighan.
It borrows its control structures and expression syntax from the C language.

awk
awk's purpose: a general-purpose programmable filter that handles text (strings) as easily as numbers
  – this makes awk one of the most powerful of the Unix utilities
A programming language for handling common data manipulation tasks with only a few lines of code
awk is a pattern-action language
awk processes fields
The language looks a little like C but automatically handles input, field splitting, initialization, and memory management
  – built-in string and number data types
  – no variable type declarations
awk is a great prototyping language
  – start with a few lines and keep adding until it does what you want
awk gets its input from
  – files
  – redirection and pipes
  – directly from standard input
nawk (new awk) is the new standard for awk
  – designed to facilitate large awk programs
  – gawk is a free nawk clone from GNU

awk Program
An awk program is a list of one or more commands of the form:
    [ pattern ] [ { action } ]
For example:
    BEGIN { print "List of html files:" }
    /\.html$/ { print }
    END { print "There you go!" }
The regular expression /\.html$/ reads as: “/” opens the pattern, “\.” matches a literal dot, “html” matches itself, “$” anchors the match at the end of the line, and the closing “/” ends the pattern.
The action is performed on every line that matches the pattern (in other words, the condition).
If the pattern is omitted, the action is performed on every line.
If the action is omitted, all matching lines are simply sent to standard output.
Since both patterns and actions are optional, actions must be enclosed in braces to distinguish them from patterns.
The statements in an awk program may be indented and formatted using spaces, tabs, and newlines.

awk: Patterns and Actions
Searches a set of files for patterns.
Performs specified actions upon lines or fields that contain instances of the patterns.
Does not alter input files.
Processes one input line at a time.
Every program statement has to have a pattern, an action, or both.
  – the default pattern matches all lines
  – the default action prints the current record
Patterns are simply listed; actions are enclosed in { }.
awk scans a sequence of input lines, or records, one by one, searching for lines that match the pattern
  – the meaning of “match” depends on the pattern

awk: Patterns
A pattern is a selector that determines whether the action is to be executed.
A pattern can be:
  – the special token BEGIN or END
  – an extended regular expression (enclosed in //)
  – a relational expression built from arithmetic comparison operators
  – a string-valued expression
  – an arbitrary combination of the above
Examples:
    /CSUN/                                matches if the string "CSUN" is in the record
    x > 0                                 matches if the condition is true
    /CSUN/ && (name == "UNIX Tools")      matches if both conditions hold
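A minimal sketch of a combined pattern, assuming a POSIX shell with awk on the PATH. The three input records (school, course, enrollment) are made up for illustration and are not part of the original slides.

```shell
# Hypothetical sample data: school, course, enrollment (invented for this sketch).
data='CSUN UNIX 32
CSUN Java 0
UCLA UNIX 18'

# A regular expression and a relational expression combined with &&:
# print the course name for records that contain "CSUN" and have a
# positive third field.
out=$(printf '%s\n' "$data" | awk '/CSUN/ && $3 > 0 { print $2 }')
echo "$out"
```

Only the first record satisfies both halves of the pattern, so only its second field is printed.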

Special awk Patterns: BEGIN, END
BEGIN and END provide a way to gain control before and after processing, for initialization and wrap-up.
  – BEGIN: actions are performed before the first input line is read
  – END: actions are performed after the last input line has been processed
    BEGIN { print "List of html files:" }
    /\.html$/ { print }
    END { print "There you go!" }

awk: Actions
An action is a list of one or more of the following kinds of C-like statements, separated by semicolons or newlines:
    if ( conditional ) statement [ else statement ]
    while ( conditional ) statement
    for ( expression; conditional; expression ) statement
    break
    continue
    variable = expression
    print [ list of expressions ] [ > expression ]
    printf format [, list of expressions ] [ > expression ]
    next      (skips the remaining patterns for the current line of input)
    exit      (stops reading input and goes to the END action, if there is one)
    { list of statements }
An action may include arithmetic and string expressions, assignments, and multiple output streams.
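The difference between next and exit can be sketched as follows. This is an illustration only: the input lines and the STOP marker are hypothetical, and the variable name count is an assumption of this sketch.

```shell
# Input lines are invented for illustration.
out=$(printf '%s\n' '# comment' 'one' 'two' 'STOP' 'three' | awk '
/^#/    { next }          # skip the remaining rules for comment lines
/^STOP/ { exit }          # stop reading input; the END action still runs
        { count++; print "seen:", $0 }
END     { print "records printed:", count }')
echo "$out"
```

The comment line never reaches the third rule because of next, and the record after STOP is never read because of exit.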

awk: An Example
    $ ls | awk '
    BEGIN { print "List of html files:" }
    /\.html$/ { print }
    END { print "There you go!" }
    '
    List of html files:
    index.html
    as1.html
    as2.html
    There you go!
    $ _

awk: Variables
awk scripts can define and use variables:
    BEGIN { sum = 0 }
    { sum++ }
    END { print sum }
Some variables are predefined:
  – NR: number of records processed so far
  – NF: number of fields in the current record
  – FILENAME: name of the current input file
  – FS: field separator, space or TAB by default
  – OFS: output field separator, space by default
  – ARGC/ARGV: argument count and argument value array, used to get arguments from the command line
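A short sketch of the predefined variables in use, assuming a POSIX shell with awk available. The two input lines are made up for illustration.

```shell
# Two invented input records with three and two fields.
out=$(printf '%s\n' 'a b c' 'd e' | awk '
BEGIN { OFS = "-" }                   # separator that print puts between comma-separated items
      { print NR, NF, $1 }            # record number, field count, first field
END   { print "total records: " NR }  # NR still holds the final count in END
')
echo "$out"
```

Because OFS is set to "-", the items in the comma-separated print are joined with a dash instead of the default space.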

awk: Records
The default record separator is newline
  – by default, awk processes its input a line at a time
The separator can be changed to another character; some implementations, such as gawk, also accept a regular expression.
Special variable RS: record separator
  – can be changed in a BEGIN action
Special variable NR holds the number of the current record.
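As a rough illustration of changing the record separator, the sketch below sets RS to a semicolon in a BEGIN action. The input string is hypothetical, and a single-character RS is used because that form is portable across awk implementations.

```shell
# Invented input: three records separated by semicolons, no newlines.
out=$(printf 'alpha;beta;gamma' | awk 'BEGIN { RS = ";" } { print NR, $0 }')
echo "$out"
```

Each semicolon-delimited chunk becomes its own record, so NR counts chunks rather than lines.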

awk: Fields
Each input line is split into fields.
Special variable FS: field separator; the default is whitespace (one or more spaces or tabs)
  – awk -Fc sets FS to the character c
  – FS can also be changed in a BEGIN action
$0 is the entire line.
$1 is the first field, $2 is the second field, ..., $NF is the last field.
Only fields begin with $; variables are unadorned.
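The two ways of setting the field separator can be sketched like this. The colon-separated line is a hypothetical passwd-style record chosen for illustration; it does not come from the slides.

```shell
# Hypothetical colon-separated record (seven fields).
line='root:x:0:0:root:/root:/bin/sh'

# Set FS on the command line with -F.
out1=$(printf '%s\n' "$line" | awk -F: '{ print $1, $NF, NF }')

# Set FS inside the program, in a BEGIN action.
out2=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $6 }')

echo "$out1"
echo "$out2"
```

Both forms take effect before the first record is split, which is why the assignment belongs in BEGIN rather than in an ordinary action.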

awk: Simple Output From AWK
Printing Every Line
  – if an action has no pattern, the action is performed on all input lines
  – { print } prints all input lines to standard output
  – { print $0 } does the same thing
Printing Certain Fields
  – multiple items can be printed on the same output line with a single print statement
        { print $1, $3 }
  – expressions separated by a comma are, by default, separated by a single space in the output

awk: Output (continued)
Special variable NF: number of fields
  – any valid expression can be used after a $ to indicate the contents of a particular field
  – one built-in expression is NF, the number of fields
        { print NF, $1, $NF }
  – prints the number of fields, the first field, and the last field in the current record
        { print $(NF-2) }
  – prints the third-to-last field
Computing and Printing
  – you can also do computations on the field values and include the results in your output
        { print $1, $2 * $3 }

awk: Output (continued)
Printing Line Numbers
  – the built-in variable NR can be used to print line numbers
        { print NR, $0 }
  – prints each line prefixed with its line number
Putting Text in the Output
  – you can also add other text to the output besides what is in the current record
        { print "total pay for", $1, "is", $2 * $3 }
  – note that the inserted text needs to be surrounded by double quotes

awk: Fancier Output
Lining Up Fields
  – like C, awk has a printf function for producing formatted output
  – printf has the form:
        printf( format, val1, val2, val3, ... )
        { printf("total pay for %s is $%.2f\n", $1, $2 * $3) }
  – when using printf, formatting is under your control, so awk provides no automatic spaces or newlines; you have to insert them yourself
        { printf("%-8s %6.2f\n", $1, $2 * $3) }
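A small sketch of the column-aligned printf form. The employee records (name, hourly rate, hours) are invented for illustration, and the $3 > 0 pattern is an addition of this sketch so that only employees with hours are listed.

```shell
# Hypothetical records: name, pay per hour, hours worked.
data='Beth 4.00 0
Dan 3.75 10
Kathy 4.00 10'

# %-8s left-justifies the name in 8 columns; %6.2f right-justifies the
# pay in 6 columns with 2 decimals. The \n must be supplied explicitly.
out=$(printf '%s\n' "$data" | awk '$3 > 0 { printf("%-8s %6.2f\n", $1, $2 * $3) }')
echo "$out"
```

With fixed widths in the format string, the pay column lines up regardless of how long each name is.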

awk: Selection
awk patterns are good for selecting specific lines from the input for further processing.
Selection by Comparison
    $2 >= 5 { print }
Selection by Computation
    $2 * $3 > 50 { printf("%6.2f for %s\n", $2 * $3, $1) }
Selection by Text Content
    $1 == "CSUN"
    /CSUN/
Combinations of Patterns
    $2 >= 4 || $3 >= 20
Selection by Line Number
    NR >= 10 && NR <= 20

awk: Arithmetic and Variables
awk variables take on numeric (floating point) or string values according to context.
User-defined variables are unadorned (they need not be declared).
By default, user-defined variables are initialized to the null string, which has numerical value 0.
awk Operators:
    =                   assignment operator; sets a variable equal to a value or string
    ==                  equality operator; returns TRUE if both sides are equal
    !=                  inverse equality operator
    &&                  logical AND
    ||                  logical OR
    !                   logical NOT
    <, <=, >, >=        relational operators
    +, -, /, *, %, ^    arithmetic
    (juxtaposition)     string concatenation
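The behaviour of uninitialized variables and of concatenation can be sketched as below. The input lines and the variable names total, names, and unset are hypothetical, chosen only for this illustration.

```shell
# Invented input: a label and a number per line.
out=$(printf '%s\n' 'x 3' 'y 4' | awk '
{ total += $2          # total starts as "" and counts as 0 in arithmetic
  names = names $1 }   # writing two values side by side concatenates them
END { print names, total, (unset == ""), (unset == 0) }')
echo "$out"
```

The last two values show that a variable that was never assigned compares equal to both the null string and zero, depending on the context in which it is used.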

awk: Arithmetic and Variables Examples
Counting is easy to do with awk:
    $3 > 15 { emp = emp + 1 }    # work hours are in the third field
    END { print emp, "employees worked more than 15 hrs" }
Computing sums and averages is also simple:
    { pay = pay + $2 * $3 }      # $2 - pay per hour, $3 - hours
    END { print NR, "employees"
          print "total pay is", pay
          print "average pay is", pay/NR }

awk: Handling Text
One major advantage of awk is its ability to handle strings as easily as many languages handle numbers.
awk variables can hold strings of characters as well as numbers, and awk conveniently translates back and forth as needed.
This program finds the employee who is paid the most per hour:
    # Fields: employee, payrate
    $2 > maxrate { maxrate = $2; maxemp = $1 }
    END { print "highest hourly rate:", maxrate, "for", maxemp }

awk: String Manipulation
String Concatenation
  – new strings can be created by combining old ones
        { names = names $1 " " }
        END { print names }
Printing the Last Input Line
  – although NR retains its value after the last input line has been read, $0 is not guaranteed to in every awk implementation, so save it in a variable
        { last = $0 }
        END { print last }

awk: Built-In Functions
awk contains a number of built-in functions.
Arithmetic
  – sin, cos, atan2, exp, int, log, rand, sqrt
String
  – length, substitution, find substrings, split strings
Output
  – print, printf, print and printf to file
Special
  – system: executes a Unix command
      e.g., system("clear") to clear the screen
      note the double quotes around the Unix command
  – exit: stop reading input and go immediately to the END pattern-action pair if it exists; otherwise exit the script
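A brief sketch of the string functions named above, using the standard awk calls length, substr, index, split, and gsub. The input text and the variable names n, parts, and s are assumptions made for this illustration.

```shell
# One invented input line is enough to exercise each function.
out=$(echo 'hello world' | awk '{
  print length($0)                 # number of characters in the record
  print substr($0, 1, 5)           # first five characters
  print index($0, "world")         # 1-based position of a substring
  n = split("a:b:c", parts, ":")   # split a string into the array parts
  print n, parts[2]
  s = $0; gsub(/o/, "0", s)        # substitute every match inside s
  print s
}')
echo "$out"
```

Note that gsub modifies its third argument in place, which is why the record is first copied into s.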

awk: Built-in Functions
Example: counting lines, words, and characters using length (a poor man’s wc):
    { nc = nc + length($0) + 1
      nw = nw + NF }
    END { print NR, "lines,", nw, "words,", nc, "characters" }
substr(s, m, n) produces the substring of s that begins at position m and is at most n characters long.

awk: Control Flow Statements
awk provides several control flow statements for making decisions and writing loops.
if-then-else
    $2 > 6 { n = n + 1; pay = pay + $2 * $3 }
    END { if (n > 0)
              print n, "employees, total pay is", pay, "average pay is", pay/n
          else
              print "no employees are paid more than $6/hour" }

awk: Loops
while
    # interest1 - compute compound interest
    # input: amount, rate, years
    # output: compound value at end of each year
    { i = 1
      while (i <= $3) {
          printf("\t%.2f\n", $1 * (1 + $2) ^ i)
          i = i + 1
      }
    }
do-while
    do {
        statement1
    } while (expression)
for
    # interest2 - compute compound interest
    # input: amount, rate, years
    # output: compound value at end of each year
    { for (i = 1; i <= $3; i = i + 1)
          printf("\t%.2f\n", $1 * (1 + $2) ^ i)
    }

awk: Arrays
Array elements are not declared.
Array subscripts can have any value:
  – numbers
  – strings! (associative arrays)
        arr[3] = "value"
        grade["Korn"] = 40.3
Example
    # reverse - print input in reverse order by line
    { line[NR] = $0 }    # remember each line
    END { for (i = NR; i > 0; i = i - 1) {
              print line[i]
          }
    }
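String subscripts make word counting a natural fit for awk arrays. The sketch below is an illustration under assumptions: the input text and the names count and w are made up, and the output is piped through sort because the order in which for (w in count) visits subscripts is not defined.

```shell
# Invented input text; every word becomes an array subscript.
out=$(printf '%s\n' 'to be or not' 'to be' | awk '
{ for (i = 1; i <= NF; i++) count[$i]++ }    # one counter per distinct word
END { for (w in count) print w, count[w] }   # visit every subscript
' | LC_ALL=C sort)
echo "$out"
```

No declaration or sizing is needed: an element springs into existence, with value zero, the first time it is incremented.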

awk: Examples
In the following example, we run a simple awk program on the text file “float” to insert the number of fields into each line:
    $ cat float                          --> look at the original file.
    Wish I was floating in blue across the sky,
    My imagination is strong,
    And I often visit the days
    When everything seemed so clear.
    Now I wonder what I’m doing here at all…
    $ awk '{ print NF, $0 }' float       --> execute the command.
    9 Wish I was floating in blue across the sky,
    4 My imagination is strong,
    6 And I often visit the days
    5 When everything seemed so clear.
    9 Now I wonder what I’m doing here at all…
    $ _

awk: Examples
We run a program that displays the first, third, and last fields of every line:
    $ cat awk2                   --> look at the awk script.
    BEGIN { print "Start of file:", FILENAME }
    { print $1 $3 $NF }          --> print first, third and last fields.
    END { print "End of file" }
    $ awk -f awk2 float          --> execute the script.
    Start of file: float
    Wishwassky,
    Myisstrong,
    Andoftendays
    Whenseemedclear.
    Nowwonderall…
    End of file
    $ _

awk: Examples
In the next example, we run a program that displays the first, third, and last fields of lines 2 and 3 of “float”:
    $ cat awk3                   --> look at the awk script.
    NR > 1 && NR < 4 { print NR, $1, $3, $NF }
    $ awk -f awk3 float          --> execute the script.
    2 My is strong,
    3 And often days
    $ _

awk: Examples
A variable’s initial value is a null string or zero, depending on how you use it. In the next example, the program counts the number of lines and words in a file as it echoes the lines to standard output:
    $ cat awk4                   --> look at the awk script.
    BEGIN { print "Scanning file" }
    {
      printf "line %d: %s\n", NR, $0;
      lineCount++;
      wordCount += NF;
    }
    END { printf "lines = %d, words=%d\n", lineCount, wordCount }

awk: Examples
    $ awk -f awk4 float          --> execute the script.
    Scanning file
    line 1: Wish I was floating in blue across the sky,
    line 2: My imagination is strong,
    line 3: And I often visit the days
    line 4: When everything seemed so clear.
    line 5: Now I wonder what I’m doing here at all…
    lines = 5, words=33
    $ _

awk: Examples
In the following example, we print the fields in each line in reverse order:
    $ cat awk5                   --> look at the awk script.
    {
      for ( i = NF; i >= 1; i-- )
        printf "%s ", $i;
      printf "\n";
    }
    $ awk -f awk5 float          --> execute the script.
    sky, the across blue in floating was I Wish
    strong, is imagination My
    days the visit often I And
    clear. so seemed everything When
    all… at here doing I’m what wonder I Now
    $ _

awk: Examples
In the next example, we display all of the lines that contain a t followed by an e, with any number of characters in between.
    $ cat awk6                   --> look at the script.
    /t.*e/ { print $0 }
    $ awk -f awk6 float          --> execute the script.
    Wish I was floating in blue across the sky,
    And I often visit the days
    When everything seemed so clear.
    Now I wonder what I’m doing here at all…
    $ _

awk: Examples
A condition may be two expressions separated by a comma. In this case, awk performs the action on every line from the first line that matches the first condition to the next line that satisfies the second condition:
    $ cat awk7                   --> look at the awk script.
    /strong/, /clear/ { print $0 }
    $ awk -f awk7 float          --> execute the script.
    My imagination is strong,            --> first line of the range
    And I often visit the days
    When everything seemed so clear.     --> last line of the range
    $ _
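The same range form can be tried without a file by feeding lines through a pipe. This is only a sketch: the BEGIN_BLOCK and END_BLOCK markers and the surrounding lines are hypothetical.

```shell
# Invented input: two marker lines with one line between them.
out=$(printf '%s\n' 'intro' 'BEGIN_BLOCK' 'body' 'END_BLOCK' 'outro' |
  awk '/BEGIN_BLOCK/, /END_BLOCK/ { print NR ": " $0 }')
echo "$out"
```

Both boundary lines are included in the range, and lines before the first match or after the second are skipped.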

awk: Examples
In the next example, we process a file whose fields are separated by colons:
    $ cat awk3                   --> look at the awk script.
    NR > 1 && NR < 4 { print $1, $3, $NF }
    $ cat float2                 --> look at the input file.
    Wish:I:was:floating:in:blue:across:the:sky,
    My:imagination:is:strong,
    And:I:often:visit:the:days
    When:everything:seemed:so:clear.
    Now:I:wonder:what:I’m:doing:here:at:all…
    $ awk -F: -f awk3 float2     --> execute the script.
    My is strong,
    And often days
    $ _

awk: Examples
Here’s an example of the use of some built-in functions:
    $ cat test                   --> look at the input file.
    1.1 a
    2.2 at
    3.3 eat
    4.4 beat
    $ cat awk8                   --> look at the awk script.
    {
      printf "$1 = %g ", $1;
      printf "exp = %.2g ", exp($1);
      printf "log = %.2g ", log($1);
      printf "sqrt = %.2g ", sqrt($1);
      printf "int = %d ", int($1);
      printf "substr(%s,1,2) = %s\n", $2, substr($2, 1, 2);
    }
    $ awk -f awk8 test           --> execute the script.
    $1 = 1.1 exp = 3 log = 0.095 sqrt = 1 int = 1 substr(a,1,2) = a
    $1 = 2.2 exp = 9 log = 0.79 sqrt = 1.5 int = 2 substr(at,1,2) = at
    $1 = 3.3 exp = 27 log = 1.2 sqrt = 1.8 int = 3 substr(eat,1,2) = ea
    $1 = 4.4 exp = 81 log = 1.5 sqrt = 2.1 int = 4 substr(beat,1,2) = be
    $ _

awk challenge