

1 awk
awk is a file-processing programming language that makes text manipulation tasks easy. It is used for:
–Generating reports
–Matching patterns
–Validating data
–Filtering data for transmission
An awk program is a sequence of statements of the form
    pattern { action }
–awk scans the input lines in order, one at a time.
–For each line it tests every pattern; when a pattern matches, the corresponding action is performed.
–Every statement of the awk program is applied to every line of input.

2 awk

3 awk programming model
–An awk program consists of a main input loop. You do not write the loop yourself; awk supplies it.
–The main routine reads one line of input from a file and makes it available for processing.
–The main loop executes as many times as there are lines in the input, applying the routine to one line at a time.
–Preprocessing before the main loop and postprocessing after it are done with BEGIN and END.
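The model above can be seen in a small run. This is a minimal sketch: the three input lines and the variable name model_out are made up for illustration.

```shell
# Hypothetical three-line input, fed on standard input.
model_out=$(printf 'alpha\nbeta\ngamma\n' | awk '
BEGIN { print "start" }            # runs once, before any input is read
      { print NR ": " $0 }         # implicit main loop: runs once per line
END   { print "lines read: " NR }  # runs once, after the last line
')
printf '%s\n' "$model_out"
```

The rule without a pattern is the body of the main loop; awk does the reading and looping itself.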

4 awk
Two ways to present the program to awk:
–Make the program the first argument on the command line, if the program is short.
    awk 'program' [filename ...]
 Examples (a filename of - means standard input):
    % awk '/Smith/ {print}' people
    % awk '/Smith/ {print}' -
–Put the program in a separate file and tell awk to apply that program file to the input files.
 Example:
    awk -f awkprog file1 file2
Keywords and some important functions
–BEGIN, END, FILENAME, FS, NF, NR, OFS, ORS, OFMT, RS
–break, close, continue, exit, exp, for, getline, if, in, index, int, length
–log, next, number, print, printf, split, sprintf, sqrt, string, substr, while
Operators
–Assignment, compound assignment, arithmetic, relational, logical and regular expression matching operators.

5 Some Regular Expression Metacharacters
\ - escapes any metacharacter that follows, including itself.
^ - anchors the following regular expression to the beginning of the string.
$ - anchors the preceding regular expression to the end of the string.
. (dot) - matches any character, including newline.
[...] - matches any one of the characters enclosed between the brackets.
[^...] - a circumflex as the first character inside [] reverses the match to all characters except those listed in the [].
r1|r2 - matches either of the regular expressions r1 and r2.
r* - matches zero or more occurrences of the regular expression that precedes it.
r+ - matches one or more occurrences of the regular expression that precedes it.
r? - matches zero or one occurrence of the regular expression that precedes it.
() - groups regular expressions.
\{n,m\} - matches between n and m occurrences of the single character that precedes it. May not be available in very old versions. awk's extended regular expressions write the interval without backslashes, as {n,m}; the backslashed form is the basic regular expression syntax used by grep and sed.

6 Writing Regular Expressions
Writing regular expressions involves three steps:
–Specification: knowing what you want to match.
–Coding: writing an expression to describe what you want to match.
–Testing: testing the pattern to see what it matches.
Testing your regular expression may result in:
–Hits: lines you wanted to match.
–Misses: lines you did not want to match.
–Omissions: lines you wanted to match but did not.
–False alarms: lines you matched but did not want to match.
Eliminate false alarms by limiting the matches, and capture the omissions by expanding the possible matches.
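The test-and-tighten cycle can be sketched as follows. The sample lines, the goal (match "cat" only as a whole word) and both patterns are assumptions chosen for illustration, not taken from the slides.

```shell
# Hypothetical test lines: "cat" as a word is wanted in two of them only.
sample='the cat sat
concatenate files
a cat
category'

# First attempt: hits both wanted lines, plus two false alarms.
loose=$(printf '%s\n' "$sample" | awk '/cat/')

# Tightened: "cat" must not touch another lowercase letter on either side.
tight=$(printf '%s\n' "$sample" | awk '/(^|[^a-z])cat([^a-z]|$)/')

printf 'loose:\n%s\ntight:\n%s\n' "$loose" "$tight"
```

Limiting the match removes the false alarms ("concatenate files", "category") while keeping both hits.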

7 Some Examples
What do they match?
    [a-zA-Z?+!]
    [a-zA-Z][?+!]
    [-+*/]
    AB\{2,4\}C
    UNIX|LINUX
    Compan(y|ies)
    [0-9][0-9]*\.\{2,\}[a-z][a-z]*

8 Multiline Records
–FS: the default value is a single space. FS can be set to a single character; when more than one character is given, it is interpreted as a regular expression.
–RS: the default value is a newline. The default can be changed.
Example:
    BEGIN { RS = ""; FS = "\n" }   # record separator is a blank line, each line is a field
    {
        print "Name ", $1
        print "Zip ", $NF
    }
Input file:
    John Smith
    235 Alameda
    Santa Clara CA
    95053
Output:
    Name John Smith
    Zip 95053
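A minimal sketch of the same settings applied to two records separated by a blank line. The second address and the variable name addr_out are invented for illustration.

```shell
# Two address records separated by a blank line (second one is hypothetical).
addr_out=$(printf 'John Smith\n235 Alameda\nSanta Clara CA\n95053\n\nMary Jones\n12 Elm St\nRed Bank NJ\n07701\n' | awk '
BEGIN { RS = ""; FS = "\n" }   # blank line ends a record; each line is a field
{ print $1 " -> " $NF " (" NF " fields)" }
')
printf '%s\n' "$addr_out"
```

With RS set to the empty string, each block of lines becomes one record, so $1 is the first line and $NF the last.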

9 Examples
% cat prog1.awk
    # test for integer, string or a blank line
    # + metacharacter: one or more
    /[0-9]+/    { print $0 ": An integer" }
    /[A-Za-z]+/ { print $0 ": A String" }
    /^$/        { print "A Blank line" }
% cat testfile
    1234
    This is a test
    789 Hello
    (blank line)
% awk -f prog1.awk testfile
    1234: An integer
    This is a test: A String
    789 Hello: An integer
    789 Hello: A String
    A Blank line

10 Examples
% cat prog2.awk
    BEGIN { FS = "," }   # comma is the field separator
    {
        print $1
        print $2
        print $3
    }
% cat prog3.awk
    BEGIN { FS = "," }
    /CA/      { print $1 "," $3 }   # matches CA anywhere in the line
    $3 ~ /CA/ { print $1 "," $3 }   # matches CA in the third field only
% cat testfile2
    John Smith, Santa Clara, CA
    Mary Jones, Red Bank, NJ
    Susan Wang, Denver, CO
% awk -f prog2.awk testfile2
What is the output?
More than one character can be specified as a field separator; it is then interpreted as a regular expression.
Examples:
    FS = "\t+"
    How many fields are in the following line? IJK\t\tXYZ
    FS= “[‘:,\t\]
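The question about IJK\t\tXYZ can be checked directly. This is a small sketch; the variable names are made up, and the single-tab case is added only for contrast.

```shell
# The line from the slide: IJK, two tab characters, XYZ.
line=$(printf 'IJK\t\tXYZ')

# Single-character separator: every tab splits, leaving an empty middle field.
one_tab=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\t" }  { print NF }')

# Regular-expression separator: a run of tabs counts as one separator.
tab_run=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\t+" } { print NF }')

printf 'single tab: %s fields, tab run: %s fields\n' "$one_tab" "$tab_run"
```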

11 Examples
$ cat prog4.awk
    BEGIN { printf("Scores\n") }
    { print $0; total = total + $2 }
    # NR: number of input records that have been read
    END { print "Average score is ", total / NR }
$ cat scores
    Smith 80
    Jones 97
    Chan 95
    King 78
$ awk -f prog4.awk scores
    Scores
    Smith 80
    Jones 97
    Chan 95
    King 78
    Average score is 87.5

12 Passing Parameters into awk script
Parameters can be passed from the command line into an awk script: a variable is set on the command line and can be accessed from the awk script. Parameters passed in this way are not available in BEGIN. Each assignment takes effect when awk reaches it in the argument list, just before the input file that follows it is read.
Example: param.awk
    BEGIN { print "Passing Parameters" }
    {
        print "arg1 = ", arg1
        print "arg2 = ", arg2
    }
From the command line, invoke
    awk -f param.awk arg1=100 arg2=200 datafile
A shell script's command line arguments can be passed in as follows. Assume that the following line is in a shell script called awktest.sh:
    awk -f param.awk "arg1=$1" "arg2=$2" datafile
Each assignment is quoted separately; a single quoted string "arg1=$1 arg2=$2" would reach awk as one assignment to arg1. $1 and $2 are the positional parameters given as arguments on the command line when awktest.sh is invoked.
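The timing of command-line assignments can be observed with a short sketch. The temporary data file and the variable names are hypothetical; the -v option is not mentioned on the slide but is a standard awk option whose assignment is made before BEGIN runs.

```shell
# Hypothetical one-line data file in a temporary directory.
tmpdir=$(mktemp -d)
printf 'row\n' > "$tmpdir/datafile"

prog='BEGIN { print "BEGIN: arg1=[" arg1 "]" }
      { print "main:  arg1=[" arg1 "]" }'

# Assignment operand: processed when awk reaches it in the argument list.
operand_out=$(awk "$prog" arg1=100 "$tmpdir/datafile")

# -v assignment: processed before BEGIN runs.
dashv_out=$(awk -v arg1=100 "$prog" "$tmpdir/datafile")

printf '%s\n---\n%s\n' "$operand_out" "$dashv_out"
rm -rf "$tmpdir"
```

In the first run BEGIN sees an empty arg1; in the second it already holds 100.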

13 Patterns Using Regular Expressions
    # print lines ending with ia
    awk '/ia$/ {print}' countries
    # print countries ending with ia
    awk '$1 ~ /ia$/ {print $1}' countries
    # select lines where the third field matches Asia or begins with North or South
    $3 ~ /Asia|^North|^South/ {print}
    # pattern ranges
    /Russia/,/Brazil/ {print}
    # replace USA by United States
    /USA/ {$1 = "United States"; print}
% cat countries
    Australia 3000 Australia
    USA 3615 North America
    Argentina 1072 South America
    India 1270 Asia
    Russia 8650 Asia
    China 3692 Asia
    Brazil 3286 South America

14 Associative Arrays
Arrays in awk are associative arrays: the index can be a number or a string. The order in which items are retrieved by a for-in loop is unspecified.
% cat prog6.awk
    { x[$1] = $2 }
    END {
        for (item in x)
            print item, x[item]
    }
% awk -f prog6.awk scores
    Jones 89
    Smith 65
    Chen 100
    King 120
    Lowel 200
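Because the traversal order is not defined, a stable listing needs an explicit sort. A minimal sketch, reusing the name and score pairs from the earlier scores example; the array and variable names are illustrative.

```shell
scores_out=$(printf 'Smith 80\nJones 97\nChan 95\nKing 78\n' | awk '
{ score[$1] = $2 }                 # string subscript: the student name
END {
    for (name in score)            # traversal order is unspecified
        print name, score[name]
}' | sort)
printf '%s\n' "$scores_out"
```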

15 Example: Computing Grades
% cat prog7.awk
    BEGIN { OFS = "\t" }
    {   # main loop applied to all input lines
        total = 0
        for (I = 2; I <= NF; ++I)
            total += $I
        average = total / (NF - 1)
        # store each student average
        stAvg[NR] = average
        avgByName[$1] = average
        # determine the letter grade
        if (average >= 90) grade = "A"
        else if (average >= 80) grade = "B"
        else if (average >= 70) grade = "C"
        else grade = "F"
        # store a count of the letter grades
        ++classGrade[grade]
    }

16
    # class statistics
    END {
        # calculate class average
        for (x = 1; x <= NR; x++)
            classTotal += stAvg[x]
        classAve = classTotal / NR
        print "Class Average = " classAve
        # determine how many above or below average
        # print number of students per letter grade
        print "Enter name "
        getline name < "-"
        print name ": " avgByName[name]
        for (letterGrade in classGrade)
            print letterGrade ":" classGrade[letterGrade] | "sort"
    }

17 %cat grades Smith Jones Wang Wolf Pratt %awk -f prog7.awk grades Smith C Jones 30 F Wang 79 C Wolf B Pratt 90 A Class Average = 71.8 Enter name Smith Smith: A:1 B:1 C:2 F:1

18 Multidimensional arrays
awk offers a subscript syntax that simulates a reference to multidimensional arrays.
    { for (i = 1; i <= NF; ++i) table[NR, i] = $i }
    END {
        for (k = 1; k <= NR; ++k) {
            for (i = 1; i <= 4; ++i) {     # assumes four fields per line
                total += table[k, i]
                printf("%d ", table[k, i])
            }
            printf("\n")
        }
        print "Total = " total
    }
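The simulation works by joining the subscripts into one string with the built-in variable SUBSEP, which the slide does not name. The sketch below uses a made-up 2x2 input to show that table[2,1] and a hand-built key reach the same one-dimensional array.

```shell
grid_out=$(printf '1 2\n3 4\n' | awk '
{ for (i = 1; i <= NF; ++i) table[NR, i] = $i }
END {
    # (2,1) is really the single string subscript 2 SUBSEP 1
    if ((2, 1) in table) print "table[2,1] = " table[2, 1]
    key = 2 SUBSEP 2
    print "table[2 SUBSEP 2] = " table[key]
    n = 0
    for (k in table) n++           # one flat array with four elements
    print "elements: " n
}')
printf '%s\n' "$grid_out"
```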

19 next and getline
The next statement causes the next input line to be read and passes control back to the top of the script.
% cat prog9.awk
    NF == 2 { next }   # skips to the next record and starts the program from the beginning
    /USA/   { $4 = "United States Of America"; print $0 }
            { print NR }
% cat countries
    Japan Asia
    2: UK Europe
    3: Brazil S.America
    Egypt Africa
    5: USA N.America
    Canada N.America
% awk -f prog9.awk countries
    2
    3
    5: USA N.America United States Of America
    5

20 Using getline
    # using the getline function to read the next line of input
    /^\/+/ {
        getline
        print $1
    }
    # get input from the command line
    BEGIN {
        printf "Enter your name: "
        getline name < "-"
        print name
    }
    /Smith/ {
        getline
        print $1
    }

21
    # reading from a pipe using getline
    # the > 0 test stops the loop at end of output and also if the command fails
    {
        while (("who" | getline) > 0)
            terminal[$1] = $2
    }
    END {
        for (item in terminal)
            print item, terminal[item]
    }

22 Example - A word lookup
    # reads a file with acronyms and their expansions,
    # handles users queries
    BEGIN {
        FS = "\t"; OFS = "\t"
        printf("Enter a word for lookup: ")
    }
    # load the file named acronyms
    FILENAME == "acronyms" {
        wordList[$1] = $2
        next
    }

23 Example - A word lookup (cont)
    # scan for command to exit program
    $0 ~ /^(quit|[qQ]|[Xx]|exit)$/ { exit }
    # process any non-empty line
    $0 != "" {
        if ($0 in wordList)
            print wordList[$0]
        else
            print $0 " not found"
    }
    # prompt user to enter another word
    { printf("Enter another word or q|Q to quit") }
Input arguments: acronyms -

24 split()
split() is a built-in function that can parse any string into elements of an array.
Syntax: numberOfElements = split(string, array, separator)
If no separator is specified, FS is used as the separator.
    {
        n = split($0, days)
        for (j = 1; j <= n; ++j)
            print days[j]
    }
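A short sketch of split() with an explicit separator. The date string and the array name part are invented for illustration.

```shell
split_out=$(echo '2024-03-15' | awk '{
    n = split($0, part, "-")       # returns the number of pieces
    for (j = 1; j <= n; ++j)
        print j ": " part[j]
    print "count = " n
}')
printf '%s\n' "$split_out"
```

The array is indexed from 1 to the returned count, so the return value doubles as the loop bound.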

25 next
The next statement forces awk to immediately stop processing the current record and go on to the next record. The rest of the current rule's action is not executed either.
If you think of the main body of an awk program as a loop, the next statement is analogous to a continue statement: it skips to the end of the body of this implicit loop and executes the increment (which reads another record).
Note: the getline function causes awk to read the next record immediately, but it does not alter the flow of control in any way. So the rest of the current action executes with a new input record.
For example, if your awk program works only on records with four fields, and you don't want it to fail when given bad input, you might use this rule near the beginning of the program:
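The slide's own rule did not survive in this transcript. A rule of that kind might look like the following sketch; the input lines, the skipped counter and the "ok:" prefix are assumptions made for illustration.

```shell
next_out=$(printf 'a b c d\ntoo short\ne f g h\n' | awk '
NF != 4 { skipped++; next }        # bad record: skip every later rule
        { print "ok: " $0 }        # only reached by four-field records
END     { print "skipped: " skipped }
')
printf '%s\n' "$next_out"
```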

26 Example:
    FILENAME == "names.txt" { count += 1; next }
    { print $0 }
    END { print count }
    # counts each line in the file "names.txt"

27
% cat prog9.awk
    NF == 2 { next }   # skips to the next record and starts the program from the beginning
    /USA/   { $4 = "United States Of America"; print $0 }
            { print NR }
% cat countries
    Japan Asia
    2: UK Europe
    3: Brazil S.America
    Egypt Africa
    5: USA N.America
    Canada N.America
% awk -f prog9.awk countries
    2
    3
    5: USA N.America United States Of America
    5

28 getline
getline is used to read the next line of input from the current input file, from a specified file, or from a pipe.
Used without arguments, getline reads the next record from the current input file and splits it into fields. This is useful if you have finished processing the current record but want to continue processing from the next record.
Note: the new value of $0 is used in testing the patterns of any subsequent rules. The original value of $0 that triggered the rule which executed getline is lost.

29 Example:
    /^[0-9]+/ { print "Line number ", NR, ":", "starts with a number" }
    /^\/\*/   { getline }
              { print NR ":" $0 }
Input:
    This is a cat
    1234 a cat
    A test
    /* A comment line */
    990 is the score
Output:
    1:This is a cat
    Line number 2 : starts with a number
    2:1234 a cat
    3:A test
    5:990 is the score

30 getline
Using getline to read a line into a variable
You can use `getline variable' to read the next record from awk's input into the variable variable. No other processing is done. For example, suppose the next line is a comment, or a special string, and you want to read it without triggering any rules. This form of getline allows you to read that line and store it in a variable so that the main read-a-line-and-check-each-rule loop of awk never sees it.
The getline command used in this way sets only the variables NR and FNR. The record is not split into fields, so the values of the fields (including $0) and the value of NF do not change.

31 What is the output of the following program on the input file given below?
    /^[A-Za-z]/ {
        getline tmp
        print tmp
    }
    { print $0 }
Input file:
    ABCD
    1234
    EFGH
    5678
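One way to check the answer is to run the program on that input. In this sketch the input is supplied on standard input rather than from a file, and the variable name getline_out is made up.

```shell
getline_out=$(printf 'ABCD\n1234\nEFGH\n5678\n' | awk '
/^[A-Za-z]/ { getline tmp; print tmp }   # $0 is untouched, tmp holds next line
            { print $0 }
')
printf '%s\n' "$getline_out"
```

Each line that starts with a letter consumes the line after it into tmp, so the numeric lines are printed first and never reach the rules as records.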

32 getline
Using getline < file to read the next record from the file file. Here file is a string-valued expression that specifies the file name. `< file' is called a redirection since it directs input to come from a different place.
For example, the following program reads its input record from the file `input.dat' when it encounters a first field with a value equal to 10 in the current input file.
    awk '{
        if ($1 == 10) {
            getline < "input.dat"
            print
        } else
            print
    }'
Since the main input stream is not used, the values of NR and FNR are not changed. But the record read is split into fields in the normal manner, so the values of $0 and the other fields are changed. So is the value of NF.

33 Using getline to read the output of a command from a pipe
You can pipe the output of a command into getline, using `command | getline'. In this case, the string command is run as a shell command and its output is piped into awk to be used as input. This form of getline reads one record at a time from the pipe.
For example, the following program copies its input to its output, except for lines that begin with @execute, which are replaced by the output produced by running the rest of the line as a shell command:
    awk '{
        if ($1 == "@execute") {
            tmp = substr($0, 10)
            while ((tmp | getline) > 0)
                print
            close(tmp)
        } else
            print
    }' input
The close function is called to ensure that if two identical lines appear in the input, the command is run for each one.

34 close()
close() allows you to close open files and pipes.
–There may be a limit on the number of files and pipes that can be open at the same time.
–Closing a pipe allows you to run the same command twice.
–Example: close("who")

35 What is the output for the given input file Jsmith who TWolf

36 Using getline to read the output of a command from a pipe into a variable
When you use `command | getline var', the output of command is sent through a pipe to getline and into the variable var.
Example:
    awk 'BEGIN {
        "date" | getline current_time
        close("date")
        print "Report printed on " current_time
    }'
In this version of getline, none of the built-in variables are changed, and the record is not split into fields.

37 Using system()
The system() function executes a command supplied as an expression. The output generated by the command is not available within the program for processing. system() returns the exit status of the command that was executed.
Example:
    #!/bin/awk -f
    BEGIN {
        status = system("mkdir temp")
        if (status != 0)
            print "command failed"
    }

38 User-defined functions
A function definition can be anywhere that a pattern-action rule can be. Inputs to the function are passed as a list of parameters.
Example:
    # inserts a string, insertStr, after position in aString
    function insertString(aString, position, insertStr) {
        before = substr(aString, 1, position)
        after = substr(aString, position + 1)
        return before insertStr after
    }
    { print insertString($1, 5, "BBBB") }
Note: no spaces are allowed between the function name and the left parenthesis in a function call.

39
All the variables in the parameter list are local to the function. All variables first used in the body of the function are global. Therefore any temporary variables are declared by adding them to the end of the parameter list.
Example:
    function insertString(aString, position, insertStr,    after) {
        before = substr(aString, 1, position)
        after = substr(aString, position + 1)
        return before insertStr after
    }
    { print insertString($1, 5, "BBBB") }
    { print aString }
    { print "before: " before }
    { print "after: " after }

40
% cat testFile
    HelloWorld
    This is a test
    XYZ
% awk -f fun2.awk testFile
    HelloBBBBWorld
    before: Hello
    after:
    ThisBBBB
    before: This
    after:
    XYZ12BBBB
    before: XYZ12
    after:

41 Functions
Arrays are passed by reference.
    #!/bin/awk -f
    function moveSmallest(LIST, SIZE,    temp, small, smallIndex) {
        small = LIST[1]
        smallIndex = 1             # covers the case where LIST[1] is already smallest
        for (i = 2; i <= SIZE; ++i) {
            if (LIST[i] < small) {
                small = LIST[i]
                smallIndex = i
            }
        }
        # swap the smallest element into the first position
        LIST[smallIndex] = LIST[1]
        LIST[1] = small
        return
    }
    END {
        array[1] = 12; array[2] = 0; array[3] = -1; array[4] = 100
        moveSmallest(array, 4)
        for (i = 1; i <= 4; ++i)
            print array[i]
    }

42 Some built-in Functions
Arithmetic functions
–cos, exp, int, log, sin, sqrt, atan2, rand, srand
Some useful string functions
–index, length, split, sub, substr, tolower, toupper
–gsub(regExp, replaceWithString, inString): globally substitutes replaceWithString for regExp in inString.
–match(string, regExp): returns the position where regExp is found in string, or 0 if no occurrences are found.
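A brief sketch of gsub, match, substr and toupper on a made-up sentence. RSTART and RLENGTH are standard awk variables set by match(); they are not listed on the slide and are shown here only for illustration.

```shell
builtin_out=$(echo 'the cat and the hat' | awk '{
    s = $0
    n = gsub(/the/, "THE", s)          # returns the number of replacements
    print n ": " s
    pos = match($0, /[ch]at/)          # position of the first match, or 0
    print "match at " pos ", RSTART=" RSTART ", RLENGTH=" RLENGTH
    print toupper(substr($0, pos, RLENGTH))
}')
printf '%s\n' "$builtin_out"
```

gsub changes its third argument in place, which is why the line is copied into s first.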

43 Passing parameters into a script
Input is passed into an awk script by setting variables on the command line.
Example:
–awk -f awkprog x=1 y=2 inputfile
–The variables x and y can be accessed in the main loop (not in the BEGIN section).
–The system variables ARGC and ARGV can be used to access the command line arguments.
Example:
    BEGIN { print "BEGIN: " n }
    NR == 1 {
        print ARGC
        print n
        for (i = 0; i < ARGC; ++i)
            print ARGV[i]
    }
% awk -f param.awk n=20 testfile
    BEGIN:
    3
    20
    awk
    n=20
    testfile

44 An array of Environment variables
    #!/bin/awk -f
    BEGIN {
        for (env in ENVIRON)
            print env "=" ENVIRON[env]
        print "Logname = ", ENVIRON["LOGNAME"]
    }