awk: An Advanced Filter


Course code: 10CS44, UNIT-7
Engineered for Tomorrow
Prepared by: Thanu Kurian
Department: Computer Science
Date: 18.01.2014

The awk filter combines features of several filters. Named after its authors, Aho, Weinberger and Kernighan, awk is one of the most powerful utilities for text manipulation.

Simple awk Filtering: the syntax of the awk command is:

    awk options 'selection_criteria {action}' file(s)

The selection_criteria filters input and selects lines for the action component to act upon. The selection_criteria and action together are enclosed in a pair of single quotes.

A typical complete awk command specifies both the selection criteria and the action. The following command selects the directors from emp.lst:

    $ awk '/director/ { print }' emp.lst
    9876 | jai sharma |director |production |12/03/50 |7000
    2365 |barun gupta |director|personnel |11/05/47 |7800
    1006 |chanchal singhavi |director |sales |03/09/38 |6700

The selection_criteria (/director/) selects the lines that are processed in the action section ({ print }). If selection_criteria is missing, the action applies to all lines. If action is missing, the entire line is printed. Either of the two (but not both) is optional, but whatever is supplied must be enclosed within a pair of single quotes.

The print statement, when used without any field specifiers, prints the entire line. print is also the default action of awk. The following 3 forms are equivalent:

    awk '/director/' emp.lst                # printing is the default action
    awk '/director/{ print }' emp.lst       # whitespace is permitted
    awk '/director/ { print $0 }' emp.lst   # $0 is the complete line
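The equivalence of the three forms can be checked without the real emp.lst. The following is a minimal sketch that assumes a few illustrative pipe-delimited records held in a shell variable named data (both the records and the variable name are made up for this example):

```shell
# Illustrative sample records (not the real emp.lst), one per line.
data='9876|jai sharma|director|production|12/03/50|7000
2233|a.k. shukla|g.m.|sales|12/12/52|6000
1006|chanchal singhavi|director|sales|03/09/38|6700'

a=$(printf '%s\n' "$data" | awk '/director/')               # action omitted, print is implied
b=$(printf '%s\n' "$data" | awk '/director/{ print }')      # explicit print
c=$(printf '%s\n' "$data" | awk '/director/ { print $0 }')  # $0 is the whole line

printf '%s\n' "$a"
[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "all three forms agree"
```

Each command should select the same two director records.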

For pattern matching, awk uses regular expressions in sed-style:

    $ awk -F "|" '/sa[kx]s*ena/' emp.lst
    3212 |shyam saksena |d.g.m. |accounts |12/12/55 |6000
    2345 |j.b. saxena |g.m. |marketing |12/04/67 |8000

The regular expressions used by awk are extended regular expressions (ERE), the variety used by grep -E or egrep, which also covers the basic set.

Splitting a Line into Fields: awk uses the special parameter $0 to indicate the entire line. It also identifies fields by $1, $2, $3 and so on. By default, awk treats a contiguous sequence of spaces and tabs as a single delimiter. Because emp.lst is delimited by |, the -F option is used to specify it. To print the name, designation, department and salary of all the sales people:

    $ awk -F "|" '/sales/ { print $2,$3,$4,$6 }' emp.lst
    a.k shukla g.m. sales 6000
    chanchal singhavi director sales 6700
    s.n. dasgupta manager sales 5600
    anil aggarwal manager sales 5000

Here, a comma (,) has been used to delimit the field specifications. This ensures that each field is separated from the next by a space.

We can also use awk with line addressing to select lines. NR holds the current record number, so the following prints lines 3 to 6:

    $ awk -F "|" 'NR == 3, NR == 6 { print NR, $2,$3,$6 }' empn.lst
    3 n.k. gupta chairman 5400
    4 v.k. agarwal g.m. 9000
    5 j.b. saxena g.m. 8000
    6 sumit chakrobarty d.g.m. 6000
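Field selection and line addressing can be tried on inline data. This sketch assumes four illustrative records in the same six-field layout. The variable names data, fields and range are hypothetical.

```shell
# Illustrative records in the six-field, pipe-delimited layout.
data='2233|a.k. shukla|g.m.|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
1006|chanchal singhavi|director|sales|03/09/38|6700
2365|barun sengupta|director|personnel|11/05/47|7800'

# Commas in print separate the selected fields with a space.
fields=$(printf '%s\n' "$data" | awk -F'|' '/sales/ { print $2, $3, $6 }')

# A range of record numbers selects lines 2 through 3.
range=$(printf '%s\n' "$data" | awk -F'|' 'NR == 2, NR == 3 { print NR, $2 }')

printf '%s\n---\n%s\n' "$fields" "$range"
```

The first command should list the two sales records, the second the names on lines 2 and 3.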

printf: Formatting Output
In the printf format, %s is used for string data and %d for integers.

    $ awk -F"|" '/[aA]gg?[ar]+wal/ {
    > printf "%3d %-20s %-12s %d\n", NR,$2,$3,$6 }' empn.lst
    4 v.k. agarwal g.m. 9000
    9 sumit Agarwal executive 7500
    15 anil aggarwal manager 5000

The name and designation are printed in fields 20 and 12 characters wide, respectively. The - flag left-justifies the output.
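To make the padding visible, the sketch below prints one made-up record with | characters between the formatted fields. The record, the four-field layout and the variable name out are assumptions for illustration only.

```shell
# One illustrative record: serial|name|designation|pay
out=$(printf '%s\n' '15|anil aggarwal|manager|5000' |
  awk -F'|' '{ printf "%3d|%-20s|%-12s|%d\n", $1, $2, $3, $4 }')

printf '%s\n' "$out"
printf 'total width: %d characters\n' "${#out}"
```

The serial number is right-justified in 3 columns, while the name and designation are left-justified and padded to 20 and 12 columns.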

Redirecting Standard Output:
Every print and printf statement can be separately redirected with the > and | symbols. The following statement writes its output to the file mslist:

    printf "%s %-10s %-12s %-8s\n", $1, $3, $4, $6 > "mslist"

The following statement sorts the output of the printf statement:

    printf "%s %-10s %-12s %-8s\n", $1, $3, $4, $6 | "sort"

The command or filename that follows the > and | symbols must be enclosed within double quotes.
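Both kinds of redirection can be combined in one program. This is a sketch under assumptions: the two-field input is made up, the file is written to a scratch directory created with mktemp, and the file name is passed in through a variable named out with the -v option instead of being written as a literal string.

```shell
tmp=$(mktemp -d)    # scratch directory for the redirected file

sorted=$(printf '%s\n' '3|c' '1|a' '2|b' |
  awk -F'|' -v out="$tmp/mslist" '{
      print $2 > out        # second field goes to the file
      print $1 | "sort"     # first field is piped through sort
  }')
saved=$(cat "$tmp/mslist")
rm -rf "$tmp"

printf 'file contents:\n%s\nsorted output:\n%s\n' "$saved" "$sorted"
```

The file keeps the input order, while the piped fields come back sorted once awk closes the pipe at exit.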

Variables and Expressions
Expressions comprise strings, numbers, variables and entities that are built by combining them with operators, e.g. (x+5)*12.
awk allows user-defined variables without declaring them; they are implicitly initialized to zero or an empty string.
Variables are case sensitive; x is different from X.
Unlike shell variables, awk variables don't use $.
Strings in awk are always double quoted and can contain any character.

awk provides no operator for concatenating strings. Strings are concatenated by simply placing them side by side:

    x = "sun" ; y = "com"
    print x y         # prints suncom
    print x "." y     # prints sun.com

awk makes automatic conversions between strings and numbers:

    x = "5" ; y = 6 ; z = "A"
    print x y         # y converted to string; prints 56
    print x+y         # x converted to number; prints 11
    print y+z         # z converted to numeric 0; prints 6
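The statements above can be run as they stand inside a BEGIN block, which needs no input file. A minimal sketch (the shell variable out is only there to capture the result):

```shell
out=$(awk 'BEGIN {
    x = "sun"; y = "com"
    print x y          # concatenation by juxtaposition
    print x "." y
    a = "5"; b = 6; c = "A"
    print a b          # b is converted to a string
    print a + b        # a is converted to a number
    print b + c        # c has no numeric value, so it counts as 0
}')
printf '%s\n' "$out"
```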

The Comparison Operators:

    $ awk -F"|" '$3 == "director" || $3 == "chairman" {
    > printf "%-20s %-12s %d\n", $2, $3, $6 }' empn.lst
    n.k. gupta chairman 5400
    lalit choudhary director 8200
    barun sengupta director 7800
    jai sharma director 7000
    chanchal singhavi director 6700

In the above example, the pattern is matched against a single field rather than the whole line. To negate the above condition, we should use the != and && operators:

    $3 != "director" && $3 != "chairman"

~ and !~ : The Regular Expression Operators
For matching a pattern in a specific field, awk offers the ~ and !~ operators to match and negate a match, respectively. They are typically used with field specifiers ($1, $2 etc.), although the left-hand side can be any expression.

    $2 ~ /[cC]ho[wu]dh?ury/ || $2 ~ /sa[xk]s?ena/     # matches 2nd field
    $2 ~ /[cC]ho[wu]dh?ury|sa[xk]s?ena/               # same as above
    $3 !~ /director|chairman/                         # neither director nor chairman
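A short sketch of both operators on made-up data follows. The four-field records and the names data, hits and others are assumptions for illustration.

```shell
# Illustrative records: id|name|designation|department
data='3212|shyam saksena|d.g.m.|accounts
2345|j.b. saxena|g.m.|marketing
9876|jai sharma|director|production
6521|lalit chowdury|director|marketing'

# ~ tests one field against the regular expression.
hits=$(printf '%s\n' "$data" |
  awk -F'|' '$2 ~ /[cC]ho[wu]dh?ury|sa[xk]s?ena/ { print $2 }')

# !~ selects the records whose field does not match.
others=$(printf '%s\n' "$data" |
  awk -F'|' '$3 !~ /director|chairman/ { print $2 }')

printf 'matched:\n%s\nnot director or chairman:\n%s\n' "$hits" "$others"
```

Because the match is tied to $2 or $3, text in the other fields of a record cannot trigger or suppress the selection.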

Number Comparison: awk can also handle numbers, both integer and floating point, and make relational tests on them. The comparison and regular expression matching operators are: < (less than), <= (less than or equal to), == (equal to), != (not equal to), >= (greater than or equal to), > (greater than), ~ (matches a regular expression) and !~ (does not match a regular expression).

For example, to print pay-slips for people whose basic pay exceeds 7500:

    $ awk -F"|" '$6 > 7500 {
    > printf "%-20s %-12s %d\n", $2,$3,$6 }' empn.lst
    v.k. agarwal g.m. 9000
    j.b. saxena g.m. 8000
    lalit choudhary director 8200
    barun sengupta director 7800

To match the people born in 1945 or drawing a basic pay greater than 8000:

    $ awk -F"|" '$6 > 8000 || $5 ~ /45$/' empn.lst

Here, the context address /45$/ matches the string 45 at the end ($) of the field.

Number Processing: awk can perform computations on numbers using the arithmetic operators +, -, *, / and %. awk handles decimal numbers also. To print a rudimentary pay slip for the directors:

    $ awk -F"|" '$3 == "director" {
    > printf "%-20s %-12s %d %d %d\n", $2,$3,$6,$6*0.4,$6*0.15 }' empn.lst
    lalit choudhary director 8200 3280 1230
    barun sengupta director 7800 3120 1170
    jai sharma director 7000 2800 1050
    chanchal singhavi director 6700 2680 1005
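The same computation can be sketched on two made-up three-field records (name|designation|pay). The data and the variable name out are illustrative assumptions.

```shell
out=$(printf '%s\n' 'jai sharma|director|7000' 'a.k. shukla|g.m.|6000' |
  awk -F'|' '$2 == "director" {
      # %d prints the integer part of each computed amount
      printf "%s %d %d %d\n", $1, $3, $3 * 0.4, $3 * 0.15
  }')
printf '%s\n' "$out"
```

Only the director record is selected, and the two percentages are computed from its pay field.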

Variables: awk has certain built-in variables, like NR and $0. It also allows the user to use variables of his/her choice.

    $ awk -F"|" '$3 == "director" && $6 > 6700 {
    > kount = kount + 1
    > printf "%3d %-20s %-12s %d\n", kount,$2,$3,$6 }' empn.lst
    1 lalit choudhary director 8200
    2 barun sengupta director 7800
    3 jai sharma director 7000

Here, the initial value of kount is zero (0) by default.

The -f option: Storing awk Programs in a File
By convention, awk programs take the extension .awk. The previous program is stored in the file empawk.awk:

    $ cat empawk.awk
    $3 == "director" && $6 > 6700 {
        printf "%3d %-20s %-12s %d\n", ++kount, $2,$3,$6 }

Now we can use awk in the following way to get the same output as before:

    awk -F"|" -f empawk.awk empn.lst
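To try the -f option without touching any existing files, the sketch below writes the program into a scratch directory and feeds it three illustrative records. The directory created by mktemp, the sample data and the variable names are assumptions.

```shell
tmp=$(mktemp -d)

# Store the awk program in a file; the quoted EOF keeps $3 and $6 literal.
cat > "$tmp/empawk.awk" <<'EOF'
$3 == "director" && $6 > 6700 {
    printf "%3d %-20s %-12s %d\n", ++kount, $2, $3, $6
}
EOF

out=$(printf '%s\n' \
    '9876|jai sharma|director|production|12/03/50|7000' \
    '1006|chanchal singhavi|director|sales|03/09/38|6700' \
    '2365|barun sengupta|director|personnel|11/05/47|7800' |
  awk -F'|' -f "$tmp/empawk.awk")
rm -rf "$tmp"

printf '%s\n' "$out"
```

Note that no quotes surround the program inside the file; the single quotes are only needed on the shell command line.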

The BEGIN and END sections:
The BEGIN and END sections are used to do some pre- and post-processing work. To print something before processing the first line, for example a heading, the BEGIN section can be used. The END section is useful for printing totals after processing is over. The BEGIN and END sections are optional and take the form:

    BEGIN { action }
    END { action }        # both require curly braces

When both sections are present, the body of the awk program sits between them.
Consider the following awk program, stored in a file empawk2.awk. Here, a suitable heading is printed at the beginning and the average salary at the end.

    BEGIN {
        printf "\t\tEmployee abstract\n\n"
    }
    $6 > 7500 {
        # count the selected records and accumulate their basic pay
        kount++ ; tot += $6
        printf "%3d %-20s %-12s %d\n", kount, $2,$3,$6
    }
    END {
        printf "\n\tThe average basic pay is %6d\n", tot/kount
    }

To execute this program, use the -f option:

    $ awk -F "|" -f empawk2.awk empn.lst
            Employee abstract

    1 v.k. agarwal g.m. 9000
    2 j.b. saxena g.m. 8000
    3 lalit choudhary director 8200
    4 barun sengupta director 7800

    The average basic pay is 8250

We can also perform floating point arithmetic with awk:

    $ awk 'BEGIN { printf "%f\n", 22/7 }'
    3.142857
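The BEGIN, body and END structure can be sketched on a reduced, made-up input of name|pay pairs. The data and the variable name out are illustrative. The check on kount in END is an addition that avoids a division by zero when no record is selected.

```shell
out=$(printf '%s\n' 'a|9000' 'b|8000' 'c|8200' 'd|7800' 'e|6000' |
  awk -F'|' '
    BEGIN { print "Employee abstract" }              # runs before any input is read
    $2 > 7500 { kount++; tot += $2 }                 # runs once per selected record
    END {                                            # runs after the last record
        if (kount > 0)
            printf "The average basic pay is %d\n", tot / kount
    }')
printf '%s\n' "$out"
```

Four of the five records exceed 7500, so the average is taken over those four.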

Built-in Variables: awk has several built-in variables. They are all assigned automatically.

NR variable: signifies the record number of the current line.

FS variable: by default, awk uses a contiguous string of spaces and tabs as the field delimiter. FS redefines this field separator, here as |. It must be set in the BEGIN section so that the body of the program knows its value before it starts processing:

    BEGIN { FS = "|" }

This is an alternative to the -F option, which does the same thing.

OFS variable: awk's default output field separator, a single space, can be reassigned using the variable OFS in the BEGIN section:

    BEGIN { OFS = "~" }

NF variable: holds the number of fields in the current record. By using this variable on a file, say empx.lst, we can locate those lines not having 6 fields as shown:

    $ awk 'BEGIN { FS = "|" }
    > NF != 6 {
    > print "Record No", NR, "has", NF, "fields" }' empx.lst
    Record No 6 has 4 fields
    Record No 17 has 5 fields

FILENAME variable: FILENAME stores the name of the current file being processed. awk can also handle multiple filenames on the command line. By default, awk doesn't print the filename, but we can instruct it to do so:

    '$6 < 4000 { print FILENAME, $0 }'
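FS, OFS and NF can be seen working together on three made-up records, one of which is short. The data, the expected count of 3 fields and the variable names bad and joined are assumptions for this sketch.

```shell
data='a|b|c
d|e
f|g|h'

# NF flags the record that does not have 3 fields.
bad=$(printf '%s\n' "$data" |
  awk 'BEGIN { FS = "|" } NF != 3 { print "Record No", NR, "has", NF, "fields" }')

# OFS replaces the space that print puts between comma-separated items.
joined=$(printf '%s\n' "$data" |
  awk 'BEGIN { FS = "|"; OFS = "~" } NF == 3 { print $1, $3 }')

printf '%s\n%s\n' "$bad" "$joined"
```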

Arrays: An array is a variable that can store a set of values or elements. Each value is accessed by a subscript called the index. awk arrays are different from the ones used in other programming languages in many ways:
They are not formally defined. An array is considered declared the moment it is used.
Array elements are initialized to zero or an empty string unless initialized explicitly.
Arrays expand automatically.
The index can be virtually anything; it can even be a string.

Consider the program empawk3.awk, which uses arrays to store the totals of the basic pay, da, hra and gross pay of the sales and marketing people. Assume that the da is 25% and the hra is 50% of basic pay.

    BEGIN {
        FS = "|"
        printf "%46s\n", "Basic Da Hra Gross"
    }
    /sales|marketing/ {
        da = 0.25*$6 ; hra = 0.50*$6 ; gp = $6+hra+da
        tot[1] += $6 ; tot[2] += da ; tot[3] += hra ; tot[4] += gp
        kount++
    }
    END {
        printf "\t Average %5d %5d %5d %5d\n", \
            tot[1]/kount, tot[2]/kount, tot[3]/kount, tot[4]/kount
    }

Now run this program:

    $ awk -f empawk3.awk empn.lst
                                Basic Da Hra Gross
             Average  6812  1703  3406 11921

Array indices are strings, so numeric indices that convert to the same string refer to the same element:

    $ awk 'BEGIN {
    > direction["N"] = "North" ; direction["S"] = "South" ;
    > direction["E"] = "East" ; direction["W"] = "West" ;
    > printf("N is %s and W is %s\n", direction["N"], direction["W"]) ;
    > mon[1] = "jan" ; mon["1"] = "january" ; mon["01"] = "JAN" ;
    > printf("mon[1] is %s\n", mon[1]) ;
    > printf("mon[01] is also %s\n", mon[01]) ;
    > printf("mon[\"1\"] is also %s\n", mon["1"]) ;
    > printf("But mon[\"01\"] is %s\n", mon["01"]) ;
    > }'
    N is North and W is West
    mon[1] is january
    mon[01] is also january
    mon["1"] is also january
    But mon["01"] is JAN
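String-indexed arrays are handy for keeping separate totals per key. The following sketch assumes made-up department|pay records. The array names tot and n and the key "nobody" are hypothetical.

```shell
out=$(printf '%s\n' 'sales|6000' 'marketing|8000' 'sales|5000' |
  awk -F'|' '
    { tot[$1] += $2; n[$1]++ }       # elements spring into existence on first use
    END {
        printf "sales %d %d\n", tot["sales"], n["sales"]
        printf "marketing %d %d\n", tot["marketing"], n["marketing"]
        printf "unused [%s]\n", tot["nobody"]    # never assigned: empty string
    }')
printf '%s\n' "$out"
```

No declaration or sizing is needed, and an element that was never assigned prints as an empty string.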

Functions: awk has several built-in functions, performing both arithmetic and string operations. The arguments are passed to a function delimited by commas and enclosed by a matched pair of parentheses. Some functions take a variable number of arguments. awk also has some of the common string handling functions.

length: determines the length of its argument; if no argument is present, the entire line is assumed to be the argument. We can use length (without any argument) to locate lines whose length exceeds 1024 characters:

    awk -F"|" 'length > 1024' empn.lst

We can also use length with a field as shown:

    awk -F"|" 'length($2) < 11' empn.lst

index: index(s1,s2) determines the position of a string s2 within a larger string s1.

    x = index("abcde", "b")     # returns the value 2

substr: The substr(stg,m,n) function extracts a substring from a string stg. m represents the starting point of extraction and n indicates the number of characters to be extracted.

split: split(stg,arr,ch) breaks up a string stg on the delimiter ch and stores the fields in an array arr[].

    $ awk -F\| '{ split($5, ar, "/") ; print "19" ar[3] ar[2] ar[1] }' empn.lst
    19521212
    19501203
    19431904
    …….

system: The system function executes a UNIX command within awk.

    BEGIN {
        system("tput clear")     # clears the screen
        system("date")           # executes the UNIX date command
    }
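The string functions described above can be exercised on a single date string inside a BEGIN block. This is a minimal sketch in which the sample string and the names s, ar, n and out are chosen for illustration.

```shell
out=$(awk 'BEGIN {
    s = "12/03/50"
    print length(s)                 # number of characters in s
    print index("abcde", "b")       # position of "b" within "abcde"
    print substr(s, 7, 2)           # 2 characters starting at position 7
    n = split(s, ar, "/")           # split returns the number of pieces
    print n, ar[1], ar[2], ar[3]
}')
printf '%s\n' "$out"
```

Positions in index and substr are counted from 1, and split numbers the array elements from 1 as well.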

substr can also be used in a selection criterion. The following selects the people whose two-digit birth year (characters 7 and 8 of the date field) lies between 46 and 51:

    $ awk -F"|" 'substr($5,7,2) > 45 && substr($5,7,2) < 52' empn.lst
    2365 | barun sengupta |director |personnel |11/05/47|7800|2365
    3564|sudhirAgarwal|executive|personnel|06/07/47|7500|2365
    4290|jayantchoudhary|executive|production|07/09/50|6000|9876
    9876|jai sharma |director |production |12/03/50 |7000|9876

CONTROL FLOW-The if statement:
The conditional structure (the if statement) and the loops (for and while) execute a body of statements depending on whether the control condition is true or false. The if statement takes the following form:

    if ( condition is true ) {
        statements
    } else {          # else is optional
        statements
    }

Example:

    awk -F"|" '{ if ($6 > 7500) printf………

The if statement can be used with the comparison operators and the special symbols ~ and !~ to match a regular expression. It can also be used with the logical operators || and &&. Examples:

    if ( NR >= 3 && NR <= 6 )
    if ( $2 !~ /[aA]gg?[ar]+wal/ )

The if-else structure:

    if ( $6 < 6000 )
        da = 0.25*$6
    else
        da = 1000

The above code can be replaced with a compact conditional structure:

    $6 < 6000 ? da = 0.25*$6 : da = 1000
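The if-else structure and the conditional operator can be compared side by side. This sketch assumes two made-up pay values as input. It writes the conditional as a single assignment, da2 = (condition) ? a : b, which is a common variant of the compact form shown above. The names da1, da2 and out are hypothetical.

```shell
out=$(printf '%s\n' 5000 7000 |
  awk '{
    if ($1 < 6000)
        da1 = 0.25 * $1
    else
        da1 = 1000
    da2 = ($1 < 6000) ? 0.25 * $1 : 1000    # same decision in one expression
    print $1, da1, da2
  }')
printf '%s\n' "$out"
```

Both columns should agree for every input line.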

LOOPING with for:
The for statement executes the loop body as long as the control condition is true. The for statement has 2 forms. The first form is:

    for ( k = 1 ; k <= 9 ; k += 2 )

This form consists of 3 components: the first initializes the value of k, the second checks the condition before every iteration, and the third sets the increment applied after every iteration.
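The loop header above can be run inside a BEGIN block to see which values k takes. A minimal sketch (the variables seq, sum and out are illustrative):

```shell
out=$(awk 'BEGIN {
    for (k = 1; k <= 9; k += 2) {
        seq = seq k " "     # collect each value of k
        sum += k            # and add it to a running total
    }
    print seq "sum=" sum
}')
printf '%s\n' "$out"
```

The loop visits the odd numbers from 1 to 9 and stops once k exceeds 9.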

Using for with an Associative Array:
The second form of the for loop exploits the associative feature of awk's arrays. The loop selects each index of an array:

    for ( k in array )
        commands

Here, k is the subscript of the array. Since k can also be a string, we can use this loop to print all environment variables:

    $ awk 'BEGIN {
    > for ( key in ENVIRON )
    >     print key "=" ENVIRON[key]
    > }'

    LOGNAME=sumit
    MAIL=/var/mail/sumit
    PATH=/usr/bin::/usr/local/bin:/usr/ccs/bin
    TERM=xterm
    HOME=/home/sumit
    SHELL=/bin/bash
    ………

We can use any field as the index because the index is actually a string. We can use the string value of $3 as the subscript of the array kount[]:

    $ awk -F"|" '{ kount[$3]++ }
    > END { for ( desig in kount )
    >     print desig, kount[desig] }' empn.lst

Output:

    g.m. 4
    chairman 1
    executive 2
    director 4
    manager 2
    d.g.m. 2

LOOPING with while:
The while loop repeats a set of instructions as long as its control condition is true. A for loop that centers text, such as the one in the final example of this unit, can easily be replaced with a while loop as shown below:

    k = 0
    while (k < (55 - length($0))/2) {
        printf "%s", " "
        k++
    }
    print $0
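The while loop above can be tried on inline text. This sketch assumes a narrower width of 20 characters instead of 55 so that the padding is easy to count, and it supplies the input with printf. The two input lines and the variable name out are illustrative.

```shell
out=$(printf '%s\n' 'Income statement' 'for' |
  awk '{
    k = 0
    while (k < (20 - length($0)) / 2) {   # half of the unused width
        printf "%s", " "                  # emit one space of left padding
        k++
    }
    print $0
  }')
printf '%s\n' "$out"
```

The 16-character line receives 2 leading spaces and the 3-character line receives 9, which places both roughly in the middle of the 20-character width.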

Centering text with a for loop (this relies on an echo that interprets \n as a newline):

    $ echo "
    > Income statement\nfor\nthe month of August, 2002\nDepartment : Sales" |
    > awk '{ for (k = 1 ; k < (55 - length($0)) / 2 ; k++)
    >     printf "%s", " "
    >     print $0 }'

Output (each line is centered within a width of 55 characters):

                       Income statement
                             for
                  the month of August, 2002
                      Department : Sales