13 More Advanced Awk Mauro Jaskelioff (originally by Gail Hopkins)

Introduction
More awk programming
– The awk programming model
– Input to and output from pipes
– system()
– Formatted printing (printf, sprintf)
– Forcing variable types
Using sed and awk together

Palindrome Example
Suppose we wanted to write an awk script which takes a number or string and tells the user whether it is a palindrome:
    $ palindrome.sh
    Enter a number: 1221
    successful
    $ palindrome.sh
    Enter a number: 1234
    failure
    $

    #!/bin/sh
    echo -n "Enter a number: "
    read a junk
    echo "$a" | awk '
    {
        pal=$1
        stat="successful"
        l=length(pal)
        loop=int(l/2)
        for(i=1;i<=loop;i++)
        {
            first=substr(pal,i,1)
            last=substr(pal,l-i+1,1)
            if(first!=last) stat="failure"
        }
        print stat
    }'

Breakdown of Palindrome Example
    #!/bin/sh
    echo -n "Enter a number: "
    read a junk
    echo "$a" | awk '
– echo -n prints the text “Enter a number: ” to the terminal. The -n option tells echo not to add a trailing new line.
– read stores the first word the user types in the variable a. If the user has typed anything else on the same line by mistake, it is read into the variable junk (which is not used).
– The last line echoes the value of a and pipes it to awk for use in the awk part of the script.

    {
        pal=$1
        stat="successful"
        l=length(pal)
        loop=int(l/2)
        for(i=1;i<=loop;i++)
        …
– pal is set to the first field ($1) of the line awk reads from the pipe (which will be the value of a).
– stat is set to the string “successful”.
– l is set to the length of pal, found using the length() function.
– loop is set to the integer part of l divided by 2 (i.e. a whole number, not a decimal).
– The for loop iterates from 1 through to the value of loop, incrementing i by 1 each time.

        {
            first=substr(pal,i,1)
            last=substr(pal,l-i+1,1)
            if(first!=last) stat="failure"
        }
        print stat
    }'
– In this loop section, we are counting in from the front and back of the string and comparing each character pair in turn.
– first: use the substr() function to get a substring of pal, 1 character long, starting at position i.
– last: use the substr() function to get a substring of pal, 1 character long, starting at position l-i+1 (the length, minus i, plus 1).
– If the characters in first and last are not the same, set the variable “stat” to the string “failure”.
– print stat prints the string in the variable “stat”. stat will contain “successful” if first and last match in every iteration of the loop. If there is at least one mismatch, stat will contain “failure”.
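The same comparison can be sketched as a reusable shell function. This is only an illustration: the function name is_palindrome is hypothetical, the value is passed as an argument instead of being read interactively, and the loop stops at the first mismatch, which the slides' version does not do.

```shell
#!/bin/sh
# Hypothetical helper: prints "successful" if $1 is a palindrome, else "failure".
is_palindrome() {
    printf '%s\n' "$1" | awk '{
        pal = $1
        stat = "successful"
        l = length(pal)
        # Compare character i from the front with character i from the back.
        for (i = 1; i <= int(l / 2); i++)
            if (substr(pal, i, 1) != substr(pal, l - i + 1, 1)) {
                stat = "failure"
                break            # one mismatch is enough
            }
        print stat
    }'
}

is_palindrome 1221
is_palindrome 1234
```

Using printf instead of echo -n avoids relying on echo options, which differ between shells.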

Awk’s programming model
Awk has a main input loop
– It reads one line of input from a file and makes it available for processing
– It is executed as many times as there are lines of input
– It does not execute until there is a line of input
– It terminates when there are no more lines of input
c.f. other programming languages, which require the programmer to create the main input loop, open the file(s) and read one line at a time…

Awk’s programming model - BEGIN and END
With awk, the whole programming loop is executed for each line of input
Each statement within the loop is executed on each input line that matches it
– (Each statement has a pattern to be matched and a corresponding action to be taken if a match is found)
If you want to do some processing before or after the main programming loop, use BEGIN and END respectively
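A minimal sketch of the three phases, assuming numeric input in the first field. The function name sum_first_field is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: adds up the first field of every input line.
sum_first_field() {
    awk '
        BEGIN { total = 0 }              # runs once, before any input is read
        { total = total + $1 }           # no pattern: runs for every input line
        END { print "total:", total }    # runs once, after the last line
    '
}

printf '3\n4\n5\n' | sum_first_field     # prints "total: 12"
```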

Awk’s programming model - next and getline
Suppose you have the awk statement:
– total = total + newValue
– … used to keep a running total across a number of input lines (newValue being the value on the current line)
– … and you want to read the remaining lines of input before moving on to the next awk statement.
You need to use either getline or next:
    while ((getline newValue) > 0) {
        total = total + newValue
    }
    print total
or:
    total = total + $0
    next
(Unlike in the shell, awk variables are written without a $; in awk, $n means field number n.)

next and getline
The next command reads the next line of input and passes control back to the top of the script
The getline command is similar but:
– Can also be used to read from files and pipes
– … does not pass control back to the top of the script
getline returns one of three possible values:
– 1 if able to read a line
– 0 if end-of-file encountered
– -1 if an error encountered

A note about getline
getline is a statement, not a function, although it returns a value. If you put brackets after it, e.g.:
    getline()
you will get an error!
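The difference between the two can be sketched as follows. The function name pair_lines and the sample input are assumptions for illustration only: next abandons the current line and restarts the script, while getline pulls in one more line and carries on from where it is.

```shell
#!/bin/sh
# Hypothetical helper: skips comment lines, then prints the remaining lines in pairs.
pair_lines() {
    awk '
        /^#/ { next }    # skip this line; control goes back to the top of the script
        {
            first = $0
            # getline (no brackets) reads the next line into $0 and returns 1, 0 or -1.
            if ((getline) > 0)
                print first " + " $0
            else
                print first " (no partner)"
        }
    '
}

printf '# note\na\nb\nc\n' | pair_lines
# prints:
#   a + b
#   c (no partner)
```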

Reading input from a file and assigning variables
Use the < redirection operator:
– getline < "myFile"
    while ((getline newValue < "myFile") > 0) { … }
Here, the input record is assigned to the variable “newValue”.
    BEGIN {
        printf "Enter a name: "
        getline < "-"
        print
    }
In this example, the user is prompted to enter their name. getline < "-" reads a line from standard input and assigns it to $0, and the print statement outputs the value of $0 by default.

Reading input from a pipe
The UNIX “who am i” command will give the following type of output:
    $ who am i
    zlizmj pts/32 Apr 20 12:25 ( )
This output can be piped to getline:
– "who am i" | getline
Here, $0 will be set to the output of the command and the line will be parsed into fields, so that “zlizmj” will be put into $1, “pts/32” into $2, etc. The system variable NF will be set to the number of fields.

Reading input from a pipe and assigning variables
    awk '
    BEGIN {
        "who am i" | getline
        name = $1
        FS = ":"
    }
    name ~ $1 { print $5 }
    ' /etc/passwd
This script pipes the result of the “who am i” command to getline, which parses it into fields. The variable “name” is assigned the value of field number 1 and the field separator (FS) is set to “:”.
The script then tests whether name matches the first field ($1) of each line in /etc/passwd (the fields in /etc/passwd are separated by a “:”).
If so, the 5th field of /etc/passwd is printed (which contains the corresponding user’s full name).

Reading input from a pipe and assigning variables (2)
The UNIX command whoami returns only the user’s login name:
    $ whoami
    zlizmj
    "whoami" | getline name
    print name
In this example, the output of “whoami” is assigned to the variable “name”
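A small sketch of the pipe form of getline. The function name first_word_of is hypothetical, and an echo command stands in for “who am i” so the result does not depend on who is logged in.

```shell
#!/bin/sh
# Hypothetical helper: runs the command given in $1 and prints the first
# field of its first output line, followed by the number of fields.
first_word_of() {
    awk -v cmd="$1" 'BEGIN {
        # cmd | getline sets $0, the fields $1..$NF and NF from one line of output.
        if ((cmd | getline) > 0)
            print $1, NF
        close(cmd)
    }'
}

first_word_of "echo zlizmj pts/32 Apr 20 12:25"    # prints "zlizmj 5"
```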

Some Important Limitations
There is a limit to the number of pipes and files that the system can have open at any one time
– This limit varies from system to system
– Traditionally 20 open files in BSD UNIX
Use the close() function!
Some other limits are:
– Number of fields per record: 100
– Characters per input record: 2048 (set in size.h)
– See the awk manual page for more information

Using close() with Pipes and Files
Why use close()?
– So your program can open as many pipes and files as it needs without exceeding the system limit
– It allows your program to run the same command twice
– You may need close() to force an output pipe to finish its work
    { do something | "sort > myFile" }
    END {
        close("sort > myFile")
        while ((getline < "myFile") > 0) {
            do more stuff
        }
    }
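The “run the same command twice” point can be sketched like this. The function name rerun_demo and the echo command are assumptions chosen to keep the result predictable.

```shell
#!/bin/sh
# Hypothetical demo: without close(), the second getline would continue
# reading the same pipe; with close(), the command is started again.
rerun_demo() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        cmd | getline a      # reads "one"
        close(cmd)           # finish with this pipe
        cmd | getline b      # the command runs again, so this also reads "one"
        close(cmd)
        print a, b
    }'
}

rerun_demo    # prints "one one"
```

The string passed to close() must be exactly the same as the one used to open the pipe or file, which is why it is kept in a variable here.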

Directing Output to a File or Pipe
Use print:
    print > "myFile"
    print $0 | "sort | uniq"
Inside awk, the command after | must be a quoted string.
Use a shell script:
    awk '{
        do something
        print $0
    }' $* | sort | uniq
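A short sketch of printing to a pipe from inside awk. The function name unique_first_fields is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: prints the distinct first fields of its input, sorted.
unique_first_fields() {
    awk '
        { print $1 | "sort | uniq" }     # the command is a quoted string
        END { close("sort | uniq") }     # make the pipe finish before awk exits
    '
}

printf 'b 1\na 2\nb 3\n' | unique_first_fields
# prints:
#   a
#   b
```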

Formatted Printing - printf
One of awk’s most important purposes is to produce formatted reports
We can use printf for this
Suppose we wanted the following output from awk:
    Module   Students  Convener
    G51UST   15        Mauro Jaskelioff
    G51CSA   17        Liyang Hu
    G51PRG   39        Paul Dempster

Formatted Printing - printf (2)
printf uses format specifiers, each introduced by a % symbol:
    c   ASCII character
    d   decimal integer
    e   floating point
    s   string
    printf("%s\t%s\t%s\n", "Module", "Students", "Convener")
    BEGIN {
        for(i=1;i<=numModules;i++) {
            printf("%s\t%d\t%s\n", module[i], students[i], convener[i])
        }
    }
NOTE: \t inserts a tab character, \n inserts a new line
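As a rough sketch of a report like the one above, the following reads colon-separated records and prints aligned columns. The function name module_report, the colon-separated input format and the column widths are all assumptions made for this example; widths are used instead of tabs so the columns line up regardless of tab stops.

```shell
#!/bin/sh
# Hypothetical helper: formats "module:students:convener" lines as a table.
module_report() {
    awk -F: '
        BEGIN { printf("%-8s %8s  %s\n", "Module", "Students", "Convener") }
        # %-8s left-justifies in 8 columns, %8d right-justifies an integer.
        { printf("%-8s %8d  %s\n", $1, $2, $3) }
    '
}

printf 'G51UST:15:Mauro Jaskelioff\nG51CSA:17:Liyang Hu\n' | module_report
```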

sprintf
Like printf, but sprintf returns a string that can be assigned to a variable
    while ((getline < "inputFile") > 0) {
        myString = sprintf("%s:%s:%s", $1, $2, $3)
        …
    }
This example repeatedly gets a line from “inputFile” and stores the first, second and third fields, as colon-separated strings, in myString

sprintf (2)
Like printf, but sprintf returns a string that can be assigned to a variable
    for(i=startOfAscii; i<=endofAscii; i++) {
        letter = sprintf("%c",i)
        …
    }
This example converts numbers into ASCII characters
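A runnable sketch of the number-to-character conversion. The function name ascii_range and its two arguments are hypothetical; adding 0 makes sure awk treats the bounds as numbers, since %c prints the first character of its argument when given a string.

```shell
#!/bin/sh
# Hypothetical helper: prints the characters with codes $1 through $2.
ascii_range() {
    awk -v first="$1" -v last="$2" 'BEGIN {
        out = ""
        for (i = first + 0; i <= last + 0; i++)
            out = out sprintf("%c", i)    # sprintf returns the string; nothing is printed yet
        print out
    }'
}

ascii_range 65 70    # prints "ABCDEF"
```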

Built in Arithmetic Functions
awk has a number of arithmetic functions that are built in. Some are shown below:
    exp(x)    Returns e to the power x
    int(x)    Returns a truncated value of x
    sqrt(x)   Returns the square root of x
    cos(x)    Returns the cosine of x

Built in String Functions
    split(str,arr,fs)     Splits the string str into elements of array arr, using field separator fs
    substr(str,pos,len)   Returns the substring of string str beginning at position pos, up to a maximum length len. If len is not specified then the string from pos to the end is used
    length(str)           Returns the length of the string str, or the length of $0 if no string is specified

Built in String Functions (2)
    index(str,substr)     Returns the position of substring substr in string str, or 0 if it is not present
    gsub(regex,s,str)     Globally substitutes s for each match of the regular expression regex in the string str. Returns the number of substitutions. If a string str is not supplied, it will use $0
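The string functions above can be tried out with a short sketch like the following. The function name string_demo and the sample record are made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo of split, substr, index, gsub and length.
string_demo() {
    awk 'BEGIN {
        s = "G51UST:15:Mauro Jaskelioff"
        n = split(s, parts, ":")         # parts[1], parts[2], parts[3]
        print n, parts[1]                # 3 G51UST
        print substr(parts[1], 4)        # UST (no length given: runs to the end)
        print index(s, "15")             # 8
        c = gsub(/a/, "A", parts[3])     # changes parts[3] in place, returns the count
        print c, parts[3]                # 2 MAuro JAskelioff
        print length(parts[3])           # 16
    }'
}

string_demo
```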

Built in String Functions - match()
match() is used to test whether a regular expression matches a specified string:
    match("in UST you learn about shell", /[A-Z]+/)
– match() takes two arguments: the string to be examined, THEN the regular expression (note the change of order compared with gsub())
– match() sets two system variables:
    RSTART - the starting position of the matched substring (this is also the value returned by match())
    RLENGTH - the length of the matched substring in characters
– If no match is found, RSTART is set to 0 and RLENGTH is set to -1
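A sketch of how RSTART and RLENGTH can be combined with substr() to pull out the matched text. The function name find_upper is hypothetical, and LC_ALL=C is set as a precaution so that [A-Z] means only the upper-case ASCII letters.

```shell
#!/bin/sh
# Hypothetical helper: reports the first run of capital letters in $1.
find_upper() {
    LC_ALL=C awk -v s="$1" 'BEGIN {
        if (match(s, /[A-Z]+/))
            print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
        else
            print RSTART, RLENGTH        # 0 and -1 when nothing matches
    }'
}

find_upper "in UST you learn about shell"    # prints "4 3 UST"
find_upper "no capitals here"                # prints "0 -1"
```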

System Variables that are Arrays
There are two system variables that are arrays:
1. ARGV
– An array containing the command line arguments given to awk.
– The number of elements is stored in another variable called ARGC (not an array)
– The array is indexed from 0 (unlike arrays filled by awk functions such as split(), which start at 1)
– The index of the last element is therefore ARGC-1
– E.g. ARGV[ARGC-1], ARGV[2]
– The first element, ARGV[0], is the name of the command that invoked the script

System Variables that are Arrays (2)
2. ENVIRON
– An array containing environment variables
– Each element is the value of one variable in the current environment
– The index of each element is the name of the environment variable
– E.g. ENVIRON["PATH"], ENVIRON["SHELL"]

ARGV Example
    BEGIN {
        for (x=0; x<ARGC; x++)
            print ARGV[x]
        print ARGC
    }
    $ awk -f parameters.awk 2007 G51UST "Mauro Jaskelioff" students=80 -
    awk
    2007
    G51UST
    Mauro Jaskelioff
    students=80
    -
    6
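ARGV and ENVIRON can be combined in one small sketch. The function name show_args_and_env and the environment variable DEMO_COURSE are invented for this example; the loop starts at 1 to skip ARGV[0], whose exact value depends on how awk was invoked.

```shell
#!/bin/sh
# Hypothetical helper: lists its arguments and one environment variable.
show_args_and_env() {
    # DEMO_COURSE is set only in the environment of this one awk process.
    DEMO_COURSE=G51UST awk 'BEGIN {
        for (i = 1; i < ARGC; i++)
            print i ": " ARGV[i]
        print "course: " ENVIRON["DEMO_COURSE"]
    }' "$@"
}

show_args_and_env 2007 "Mauro Jaskelioff"
# prints:
#   1: 2007
#   2: Mauro Jaskelioff
#   course: G51UST
```

Because the program consists only of a BEGIN block, awk does not try to open the arguments as input files.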

The system() Function
– The system() function allows a programmer to execute a command whilst within an awk script.
– The awk script waits for the command to finish before continuing execution.
– The output of the command is NOT available for processing from within awk.
– The system() function returns an exit status which can be tested by the awk script.

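An example using system()
    BEGIN {
        if (system("mkdir UST") == 0) {
            if (system("cd UST") != 0)
                print "change directory - failed"
        }
        else
            print "make directory - failed"
    }
This example tries to create a new directory called UST. If successful, the code tries to change directory to UST. If not, an error is printed.
Note that each system() call runs its command in a separate shell, so the cd only tests that the directory can be entered; it does not change the working directory of the awk script itself.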

An example using system() (2)
    $ awk -f create.awk
    $ ls UST
Here, the script (called create.awk) is run and is successful. “ls UST” doesn’t return anything because UST is empty.
    $ awk -f create.awk
    mkdir: UST: File exists
    make directory - failed
Here, the script is run for a second time and so the mkdir command fails because UST already exists. The first error is given by the mkdir command, the second error is given by the awk script
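The two runs above can be reproduced with a sketch along these lines. The function name try_mkdir is hypothetical, mkdir's own error message is discarded so that only the awk script's message remains, and the directory name is assumed not to contain double quotes.

```shell
#!/bin/sh
# Hypothetical helper: tries to create directory $1 and reports the outcome.
try_mkdir() {
    awk -v dir="$1" 'BEGIN {
        # system() returns the exit status of the command: 0 means success.
        if (system("mkdir \"" dir "\" 2>/dev/null") == 0)
            print "created"
        else
            print "make directory - failed"
    }'
}

scratch=$(mktemp -d)
try_mkdir "$scratch/UST"    # first run: prints "created"
try_mkdir "$scratch/UST"    # second run: prints "make directory - failed"
rm -r "$scratch"
```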

Use of Backslash
Backslash can be used:
– To continue strings across new lines
    $ awk 'BEGIN {print "hello, \
    > world" }'
    hello, world

Use of Backslash (2)
– For escape sequences
    \b - backspace
    \n - new line
    \r - carriage return
    \t - horizontal tab
    \v - vertical tab
    \c - any literal character c:
    $ awk 'BEGIN {print "80\% \"topsy turvy\", 20\% strange" }'
    80% "topsy turvy", 20% strange

Forcing Variable Types
In awk, you do not declare variables and give them types
Sometimes you want to force awk to treat a variable as a particular type, e.g. as a number or as a string.
– To force a variable, x, to be treated as a number, put in the line: x=x+0
– To force a variable, x, to be treated as a string, put in the line: x=x ""
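Why the type matters can be seen in a comparison. In this sketch (the function name compare_demo is made up) the two values are string constants, so awk compares them character by character unless they are forced to be numbers.

```shell
#!/bin/sh
# Hypothetical demo: string comparison versus forced numeric comparison.
compare_demo() {
    awk 'BEGIN {
        a = "10"; b = "9"
        print (a < b)            # 1: as strings, "10" sorts before "9"
        print (a+0 < b+0)        # 0: as numbers, 10 is not less than 9
    }'
}

compare_demo
```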

Using sed and awk Together - An Example
In this example, sed is used to remove empty lines and comment lines (lines starting with #) before passing the data on to awk:
    #!/bin/sh
    /bin/sed -e '/^$/d' -e '/^#.*/d' | awk
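A complete pipeline of this shape might look like the sketch below. The sed part is the one from the slide; the awk program, which just prints the number of fields on each surviving line, and the function name count_fields are placeholders invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: sed drops empty and comment lines, awk processes the rest.
count_fields() {
    sed -e '/^$/d' -e '/^#.*/d' | awk '{ print NF ": " $0 }'
}

printf '# comment\n\nalpha beta\ngamma\n' | count_fields
# prints:
#   2: alpha beta
#   1: gamma
```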

Summary
More advanced awk
– awk’s programming model
– next and getline
– Input/output to/from files and pipes
– Formatted printing
– Built in functions
– ARGV and ARGC
– Forcing variable types