Published by Brook Hubbard. Modified over 9 years ago.
1
13 More Advanced Awk
Mauro Jaskelioff (originally by Gail Hopkins)
2
Introduction
More awk programming
– The awk programming model
– Input to and output from pipes
– system()
– Formatted printing (printf, sprintf)
– Forcing variable types
Using sed and awk together
3
Palindrome Example
Suppose we wanted to write an awk script which takes a number or string and tells the user whether it is a palindrome:
    $ palindrome.sh
    Enter a number: 1221
    successful
    $ palindrome.sh
    Enter a number: 1234
    failure
    $
4
    #!/bin/sh
    echo -n "Enter a number: "
    read a junk
    echo "$a" | awk '
    {
        pal=$1
        stat="successful"
        l=length(pal)
        loop=int(l/2)
        for(i=1;i<=loop;i++)
        {
            first=substr(pal,i,1)
            last=substr(pal,l-i+1,1)
            if(first!=last) stat="failure"
        }
        print stat
    }'
5
Breakdown of Palindrome Example
    #!/bin/sh
    echo -n "Enter a number: "
    read a junk
    echo "$a" | awk '
– echo -n: print the text "Enter a number: " to the terminal. The -n option tells echo not to add a new line
– read a junk: read the number into the variable a. If the user has typed anything else on the same line by mistake, it is read into the variable junk (which is not used)
– echo "$a" | awk: echo the value of a and pipe it to awk for use in the awk part of the script
6
    {
        pal=$1
        stat="successful"
        l=length(pal)
        loop=int(l/2)
        for(i=1;i<=loop;i++) …
– pal is set to the first field of the line that awk reads (which will be the value of a)
– stat is assigned the string "successful"
– The length of pal is found using the length() function and assigned to l
– loop is defined as the integer part of the length (l) divided by 2 (i.e. a whole number, not a decimal)
– The for loop iterates from 1 through to the value of loop, incrementing by 1 each time
7
        {
            first=substr(pal,i,1)
            last=substr(pal,l-i+1,1)
            if(first!=last) stat="failure"
        }
        print stat
    }'
In this loop section, we count in from the front and back of the string and compare each character pair in turn:
– Use the substr() function to get a substring of pal, 1 character long, starting at position i. Assign this to the variable first
– Use the substr() function to get a substring of pal, 1 character long, starting at position l-i+1 (the length minus i, plus 1). Assign this to the variable last
– If the characters in first and last are not the same, set the variable stat to the string "failure"
Finally, print the string in the variable stat. stat will contain "successful" if first and last matched on every iteration of the loop. If there was at least one mismatch, stat will contain "failure".
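The palindrome check walked through above can also be wrapped in a shell function that takes the string as an argument instead of prompting for it. This is only a sketch: the function name is_palindrome is hypothetical, printf replaces echo -n because not every /bin/sh honours -n, and the loop stops at the first mismatch.

```shell
#!/bin/sh
# is_palindrome: hypothetical helper name, not from the slides.
# Prints "successful" if its first argument reads the same in both
# directions, "failure" otherwise.
is_palindrome() {
    printf '%s\n' "$1" | awk '{
        pal = $1
        stat = "successful"
        l = length(pal)
        loop = int(l / 2)
        for (i = 1; i <= loop; i++) {
            first = substr(pal, i, 1)
            last = substr(pal, l - i + 1, 1)
            if (first != last) {
                stat = "failure"
                break        # one mismatch is enough; stop comparing
            }
        }
        print stat
    }'
}

is_palindrome 1221
is_palindrome 1234
```

Run as a script, the two calls should print successful and then failure, matching the session shown on the earlier slide.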
8
Awk’s programming model
Awk has a main input loop
– It reads one line of input from a file and makes it available for processing
– It is executed as many times as there are lines of input
– It does not execute until there is a line of input
– It terminates when there are no more lines of input
Cf. other programming languages, which require the programmer to create the main input loop, open the file(s) and read one line at a time…
9
Awk’s programming model - BEGIN and END
With awk, the whole programming loop is executed for each line of input
Each statement within the loop is executed on each input line that matches it
– (Each statement has a pattern to be matched and a corresponding action to be taken if a match is found)
If you want to do some processing before or after the main programming loop, use BEGIN and END respectively
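A small sketch of the three kinds of rule described above: a BEGIN action, a pattern/action statement inside the main loop, and an END action. The function name count_matches and the sample input are made up for illustration.

```shell
#!/bin/sh
# count_matches: hypothetical helper. Counts input lines containing "awk".
count_matches() {
    awk '
        BEGIN { count = 0 }     # runs once, before any input is read
        /awk/ { count++ }       # runs for each input line matching the pattern
        END   { print count " of " NR " lines matched" }   # runs once, after all input
    '
}

printf 'sed\nawk\ngawk\n' | count_matches
```

The main loop itself is never written out: awk supplies it, and the /awk/ rule is simply tried against each line in turn.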
10
Awk’s programming model - next and getline
Suppose you have the awk statement:
– total = total + $newValue
– … used to provide a total across a number of input lines
– … and you wanted to read the remaining lines of input before moving on to the next awk statement
You need to use either next or getline:
    total = total + $newValue
    next
or
    while ((getline newValue < "…") > 0) {
        total = total + newValue
    }
    print total
In the getline form, newValue holds the whole line just read, so it is used without the $ prefix.
11
next and getline
The next command reads another line of input from the data stream and passes control back to the top of the script
getline is similar but:
– Can also be used to read from files and pipes
– … does not pass control back to the top of the script
getline returns one of three possible values:
–  1 if able to read a line
–  0 if end-of-file encountered
– -1 if an error encountered
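The difference between the two can be seen side by side in the sketch below. Both function names and the sample data are hypothetical; the first function relies on next restarting the script, the second on getline continuing in place.

```shell
#!/bin/sh
# sum_non_comments: hypothetical helper. "next" abandons the current line
# and restarts the script with the following one, so the second rule
# never sees lines beginning with "#".
sum_non_comments() {
    awk '
        /^#/ { next }
             { total = total + $1 }
        END  { print total + 0 }
    '
}

# join_pairs: hypothetical helper. Plain getline loads the following line
# into $0 and carries on with the statement after it.
join_pairs() {
    awk '{
        first = $0
        if ((getline) > 0)        # 1 = a line was read, 0 = end of input
            print first "=" $0
        else
            print first "=(no value)"
    }'
}

printf '# header\n10\n20\n' | sum_non_comments
printf 'a\n1\nb\n' | join_pairs
```

Testing the return value of getline, as join_pairs does, is what lets the script notice that the input ran out halfway through a pair.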
12
A note about getline
getline is a statement (not a function), although it returns a value. If you put brackets after it, e.g.:
    getline()
you will get an error!
13
Reading input from a file and assigning variables
Use the < redirection operator:
– getline < "myFile"
    while ((getline newValue < "myFile") > 0) {
        …
    }
Here, the input record is assigned to the variable newValue
    BEGIN {
        printf "Enter a name: "
        getline < "-"
        print
    }
In this example, the user is prompted to enter their name. The line read from standard input ("-") is assigned to $0, and the print statement outputs the value of $0 by default
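As a runnable sketch of reading a file with getline into a variable, the function below totals the first field of every line of a file from inside a BEGIN block. The name sum_file, the awk variable names and the scratch data are assumptions made for this example.

```shell
#!/bin/sh
# sum_file: hypothetical helper. Totals the first field of every line of
# the file named by its argument, reading it entirely from a BEGIN block.
sum_file() {
    awk -v file="$1" 'BEGIN {
        total = 0
        # getline returns 1 per line read, 0 at end-of-file, -1 on error
        while ((getline line < file) > 0) {
            split(line, parts, " ")   # getline var does not set $1..$NF
            total = total + parts[1]
        }
        close(file)
        print total
    }'
}

tmp=$(mktemp)                 # scratch file for the demonstration
printf '10 apples\n20 pears\n12 plums\n' > "$tmp"
sum_file "$tmp"
rm -f "$tmp"
```

Because the line goes into a variable rather than $0, the fields are not split automatically, which is why split() is called explicitly.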
14
Reading input from a pipe
The UNIX “who am i” command will give the following type of output:
    $ who am i
    zlizmj pts/32 Apr 20 12:25 (10.20.1.40)
This output can be piped to getline:
– "who am i" | getline
Here, $0 will be set to the output of the command and the line will be parsed into fields, such that “zlizmj” will be put in field $1, “pts/32” will be put into $2, etc. The system variable NF will also be set
15
Reading input from a pipe and assigning variables
    awk '
    BEGIN {
        "who am i" | getline
        name = $1
        FS = ":"
    }
    name ~ $1 {print $5}
    ' /etc/passwd
This script pipes the result of the “who am i” command to getline, which parses it into fields. Field number 1 is assigned to the variable name and the field separator (FS) is set to “:”
The script then tests whether name matches the first field ($1) of each line in /etc/passwd (the fields in /etc/passwd are separated by a “:”)
If so, the 5th field of /etc/passwd is printed (which contains the corresponding user’s full name)
16
Reading input from a pipe and assigning variables (2)
The UNIX command whoami returns only the user’s login name:
    $ whoami
    zlizmj
    "whoami" | getline name
    print name
In this example, the output of “whoami” is assigned to the variable name
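The command-to-getline pipe can be tried with any command, not just who am i. In this sketch the function name first_word_of is hypothetical and the command to run is passed in as a string; an echo stands in for who am i so that the result is predictable.

```shell
#!/bin/sh
# first_word_of: hypothetical helper. Runs the command given as its
# argument inside awk and prints the first field of its first output line.
first_word_of() {
    awk -v cmd="$1" 'BEGIN {
        if ((cmd | getline) > 0)   # sets $0, the fields and NF
            print $1
        close(cmd)
    }'
}

first_word_of "echo zlizmj pts/32 Apr 20 12:25"
first_word_of "whoami"
```

The first call prints zlizmj; the second prints whatever login name whoami reports on the machine where it runs.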
17
Some Important Limitations
There is a limit to the number of pipes and files that the system can have open at any one time
– This limit varies from system to system
– Traditionally 20 open files in BSD UNIX
Use the close() function!
Some other limits are:
– Number of fields per record: 100
– Characters per input record: 2048 (set in size.h)
– See the awk manual page for more information
18
Using close() with Pipes and Files
Why use close()?
– So your program can open as many pipes and files as it needs without exceeding the system limit
– It allows your program to run the same command twice
– You may need close() to force an output pipe to finish its work
    { do something | "sort > myFile" }
    END {
        close("sort > myFile")
        while ((getline < "myFile") > 0) {
            do more stuff
        }
    }
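The third reason above, forcing an output pipe to finish, can be sketched as follows. The function name sorted_then_numbered and the use of mktemp for the scratch file are choices made for this example, not part of the slides.

```shell
#!/bin/sh
# sorted_then_numbered: hypothetical helper. Sends every input line through
# sort into a scratch file, closes the pipe so that sort finishes writing,
# then reads the sorted file back and numbers the lines.
sorted_then_numbered() {
    out=$(mktemp)
    awk -v out="$out" '
        BEGIN { cmd = "sort > " out }
        { print $0 | cmd }
        END {
            close(cmd)            # without this, the file may still be empty
            n = 0
            while ((getline line < out) > 0) {
                n++
                print n ": " line
            }
            close(out)
        }
    '
    rm -f "$out"
}

printf 'pear\napple\nfig\n' | sorted_then_numbered
```

Keeping the command string in one variable (cmd) matters: close() must be given exactly the same string that was used to open the pipe.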
19
Directing Output to a File or Pipe
Use print
    print > "myFile"
    print $0 | "sort | uniq"
Use a shell script
    awk '{
        do something
        print $0
    }' $* | sort | uniq
20
Formatted Printing - printf
One of awk’s most important purposes is to produce formatted reports
We can use printf for this
Suppose we wanted the following output from awk:
    Module    Students    Convener
    G51UST    15          Mauro Jaskelioff
    G51CSA    17          Liyang Hu
    G51PRG    39          Paul Dempster
21
Formatted Printing - printf (2)
printf uses format specifiers, each introduced by a % symbol:
    c    ASCII character
    d    decimal integer
    e    floating point
    s    string
    printf("%s\t%s\t%s\n", "Module", "Students", "Convener")
    BEGIN {
        for(i=1;i<=numModules;i++) {
            printf("%s\t%d\t%s\n", module[i], students[i], convener[i])
        }
    }
NOTE: \t inserts a tab character, \n inserts a new line
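A runnable sketch of a report like the one above. It assumes, purely for illustration, that the data arrives as colon-separated lines (module:students:convener), and it uses field widths such as %-8s and %8d instead of tabs so that the columns line up. The function name module_report is hypothetical.

```shell
#!/bin/sh
# module_report: hypothetical helper. Expects colon-separated lines of the
# form module:students:convener (an input layout assumed for this sketch).
module_report() {
    awk -F: '
        BEGIN { printf("%-8s %8s  %s\n", "Module", "Students", "Convener") }
              { printf("%-8s %8d  %s\n", $1, $2, $3) }
    '
}

printf 'G51UST:15:Mauro Jaskelioff\nG51CSA:17:Liyang Hu\n' | module_report
```

In a width such as %-8s the minus sign left-justifies the value; without it, as in %8d, the value is right-justified within the given width.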
22
sprintf
Like printf, but sprintf returns a string that can be assigned to a variable
    while ((getline < "inputFile") > 0) {
        myString = sprintf("%s:%s:%s", $1, $2, $3)
        …
    }
This example repeatedly gets a line from “inputFile” and stores the first, second and third fields in myString as a colon-separated string
23
sprintf (2)
    for(i=startOfAscii; i<=endOfAscii; i++) {
        letter = sprintf("%c", i)
        …
    }
This example converts numbers into ASCII characters
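The number-to-character conversion can be tried with the sketch below. The function name ascii_range and its two arguments (first and last character code) are assumptions made for this example.

```shell
#!/bin/sh
# ascii_range: hypothetical helper. Prints the characters whose codes run
# from the first argument to the second, built up with sprintf.
ascii_range() {
    awk -v start="$1" -v end="$2" 'BEGIN {
        out = ""
        for (i = start + 0; i <= end + 0; i++)
            out = out sprintf("%c", i)   # %c turns a number into a character
        print out
    }'
}

ascii_range 65 70
```

The + 0 on start and end makes sure the loop variable is a number; given a string, %c would print the first character of that string instead.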
24
Built in Arithmetic Functions
awk has a number of arithmetic functions that are built in. Some are shown below:
    exp(x)     Returns e to the power x
    int(x)     Returns a truncated value of x
    sqrt(x)    Returns the square root of x
    cos(x)     Returns the cosine of x
25
Built in String Functions
split(str,arr,fs)
    Splits the string str into elements of array arr, using field separator fs
substr(str,pos,len)
    Returns the substring of str beginning at position pos, up to a maximum length len. If len is not specified then the string from pos to the end is used
length(str)
    Returns the length of the string str, or the length of $0 if no string is specified
26
Built in String Functions (2)
index(str,substr)
    Returns the position of substring substr in string str, or 0 if it is not present
gsub(regex,s,str)
    Globally substitutes s for each match of the regular expression regex in the string str. Returns the number of substitutions. If a string str is not supplied, it will use $0
27
Built in String Functions - match()
match() is used to test whether a regular expression matches a specified string
    match("in UST you learn about shell", /[A-Z]+/)
– match() takes two arguments: the string to be examined, THEN the regular expression (note the change of order)
– match() sets two system variables:
    RSTART - the starting position of the matched substring (this is also the value returned by match())
    RLENGTH - the length of the matched substring in characters
If no match is found, RSTART is set to 0 and RLENGTH is set to -1
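The string functions from the last few slides can be exercised together on the example string used for match(). This is a sketch: the function name string_demo is hypothetical, and LC_ALL=C is set as a precaution so that [A-Z] means plain ASCII capitals whatever the surrounding locale.

```shell
#!/bin/sh
# string_demo: hypothetical helper exercising the built-in string
# functions on a fixed string.
string_demo() {
    LC_ALL=C awk 'BEGIN {
        s = "in UST you learn about shell"
        print length(s)                  # number of characters in s
        print index(s, "you")            # position where "you" starts
        print substr(s, 4, 3)            # 3 characters from position 4
        n = split(s, words, " ")         # fills words[1] .. words[n]
        print n, words[n]
        if (match(s, /[A-Z]+/))          # sets RSTART and RLENGTH
            print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
        t = s
        count = gsub(/e/, "E", t)        # returns the number of substitutions
        print count, t
    }'
}

string_demo
```

Note how RSTART and RLENGTH feed straight into substr() to extract the text that matched.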
28
System Variables that are Arrays
There are two system variables that are arrays:
1. ARGV
– An array containing the command line arguments given to awk
– The number of elements is stored in another variable called ARGC (not an array)
– The array is indexed from 0 (unlike arrays filled by split(), which are indexed from 1)
– The index of the last element is therefore ARGC-1
– E.g. ARGV[ARGC-1], ARGV[2]
– The first element, ARGV[0], is the name of the command that invoked the script
29
System Variables that are Arrays (2)
2. ENVIRON
– An array containing environment variables
– Each element is the value of a variable in the current environment
– The index of each element is the name of the environment variable
– E.g. ENVIRON["PATH"], ENVIRON["SHELL"]
30
ARGV Example
    BEGIN {
        for (x=0; x<ARGC; x++)
            print ARGV[x]
        print ARGC
    }
    $ awk -f parameters.awk 2007 G51UST "Mauro Jaskelioff" students=80 -
    awk
    2007
    G51UST
    Mauro Jaskelioff
    students=80
    -
    6
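Both system arrays can be inspected from a BEGIN block, as in the sketch below. The function name show_args and the environment variable DEMO_GREETING are invented for the example. Because the program consists only of a BEGIN block, awk never tries to open the arguments as input files.

```shell
#!/bin/sh
# show_args: hypothetical helper. Prints each element of ARGV with its
# index, then one environment variable looked up through ENVIRON.
show_args() {
    DEMO_GREETING=hello awk 'BEGIN {
        for (x = 0; x < ARGC; x++)
            print x ": " ARGV[x]
        print "DEMO_GREETING=" ENVIRON["DEMO_GREETING"]
    }' "$@"
}

show_args 2007 G51UST "Mauro Jaskelioff"
```

The text printed for index 0 depends on which awk implementation is installed and how it was invoked, so only the later elements are predictable.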
31
The system() Function
The system() function allows a programmer to execute a command whilst within an awk script
– The awk script waits for the command to finish before continuing execution
– The output of the command is NOT available for processing from within awk
– The system() function returns an exit status which can be tested by the awk script
32
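An example using system()
    BEGIN {
        if (system("mkdir UST") == 0) {
            if (system("cd UST") != 0)
                print "change directory - failed"
        }
        else
            print "make directory - failed"
    }
This example tries to create a new directory called UST. If successful, the code tries to change directory to UST. If the directory cannot be created, an error is printed. Note that cd runs in a separate shell started by system(), so it only confirms that UST can be entered; the working directory of the awk script itself does not change.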
33
An example using system()
    $ awk -f create.awk
    $ ls UST
    $ awk -f create.awk
    mkdir: UST: File exists
    make directory - failed
On the first run, the script (called create.awk) is successful. “ls UST” doesn’t return anything because UST is empty.
The script is then run for a second time, and the mkdir command fails because UST already exists. The first error is given by the mkdir command, the second error is given by the awk script
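The two runs above can be reproduced without leaving a UST directory behind by working in a scratch area. In this sketch the function name make_dir is hypothetical, mkdir's own error message is sent to /dev/null, and mktemp -d provides the scratch directory.

```shell
#!/bin/sh
# make_dir: hypothetical helper. Uses system() to run mkdir and reports
# the outcome based on the exit status that system() returns.
make_dir() {
    awk -v dir="$1" 'BEGIN {
        status = system("mkdir \"" dir "\" 2>/dev/null")
        if (status == 0)
            print "created"
        else
            print "make directory - failed"
    }'
}

parent=$(mktemp -d)           # scratch area for the demonstration
make_dir "$parent/UST"        # first attempt succeeds
make_dir "$parent/UST"        # second attempt fails: UST already exists
rm -rf "$parent"
```

Only the exit status comes back into awk; anything mkdir prints goes straight to the terminal (or, here, to /dev/null).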
34
Use of Backslash
Backslash can be used:
– To continue strings across new lines
    $ awk 'BEGIN {print "hello, \
    > world" }'
    hello, world
35
Use of Backslash (2)
– For escape sequences
    \b - backspace
    \n - new line
    \r - carriage return
    \t - horizontal tab
    \v - vertical tab
    \c - any literal character c:
    $ awk 'BEGIN {print "80\% \"topsy turvy\", 20\% strange" }'
    80% "topsy turvy", 20% strange
36
Forcing Variable Types
In awk, you do not declare variables and give them types
Sometimes you want to force awk to treat a variable as a particular type, e.g. as a number or as a string.
– To force a variable, x, to be treated as a number, put in the line: x=x+0
– To force a variable, x, to be treated as a string, put in the line: x=x ""
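The effect of forcing a type shows up most clearly in comparisons. The sketch below compares the same two values once as numbers and once as strings; the function name compare_as is hypothetical.

```shell
#!/bin/sh
# compare_as: hypothetical helper. Compares its two arguments first as
# numbers (forced with + 0), then as strings (forced with "").
compare_as() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if ((a + 0) < (b + 0))
            print "numeric: less"
        else
            print "numeric: not less"
        if ((a "") < (b ""))
            print "string: less"
        else
            print "string: not less"
    }'
}

compare_as 9 10
```

As numbers, 9 is less than 10; as strings, "9" sorts after "10" because the comparison goes character by character and "9" comes after "1".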
37
Using sed and awk Together - An Example
In this example, sed is used to remove empty lines and comment lines (those beginning with #) before passing the data on to awk:
    #!/bin/sh
    /bin/sed -e '/^$/d' -e '/^#.*/d' | awk …
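Since the awk half of the pipeline is not shown on the slide, the sketch below supplies a stand-in that prints the number of fields on each surviving line. That awk program, the function name count_fields and the sample input are all assumptions; sed is looked up on the PATH rather than at /bin/sed.

```shell
#!/bin/sh
# count_fields: hypothetical helper. sed drops empty lines and lines that
# begin with "#"; awk then reports the field count of each remaining line.
count_fields() {
    sed -e '/^$/d' -e '/^#.*/d' | awk '{ print NF ": " $0 }'
}

printf '# settings\n\nalpha beta\ngamma\n' | count_fields
```

Letting sed do the filtering keeps the awk program free of special cases: every line it sees is real data.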
38
Summary
More advanced awk
– awk’s programming model
– next and getline
– Input/output to/from files and pipes
– Formatted printing
– Built in functions
– ARGV and ARGC
– Forcing variable types