1 Unix Talk #2 AWK overview Patterns and actions Records and fields Print vs. printf.


1 Unix Talk #2
• AWK overview
• Patterns and actions
• Records and fields
• print vs. printf

2 Introduction
• Students' grades in a text file
  – John
  – Alex
• How can I calculate John's current average within this file?
• grep?
  – Search for John with grep? That gives me the line.
  – Now I can use my calculator to figure it out.
• sed?
  – sed will allow me to print, change, delete, etc.
• I really want to automatically manipulate the values within this line.
• This is where awk comes in.
• (awk me amadeus)

3 awk
• Named for the first initials of its authors' last names: Aho, Weinberger and Kernighan
• Which awk are we tawking about?
  – awk
  – nawk: new awk (on CS machines)
  – gawk: GNU awk (bart)

4 AWK syntax
• awk '/pattern/' file
• awk '{action}' file
• awk '/pattern/ {action;}' file
• cat file | awk '{action}'
• awk automatically reads the file for you, line by line.
  – No need to open or close the file (as in C or Java).
  – The pattern section FINDS LINES that match the pattern.
  – The action section performs the actions you defined on the lines it found.
  – The original file does not change.
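A minimal sketch of the forms listed above. The file grades.txt and its contents are made up for illustration; they are not the data from the talk.

```shell
# Hypothetical sample data: a name followed by three scores.
tmp=$(mktemp -d)
printf 'John 85 92 78\nAlex 90 88 95\n' > "$tmp/grades.txt"

# Pattern only: print every line that matches /John/.
pattern_only=$(awk '/John/' "$tmp/grades.txt")

# Action only: run the action on every line.
action_only=$(awk '{ print $1 }' "$tmp/grades.txt")

# Pattern and action: print the second field of the matching lines.
both=$(awk '/Alex/ { print $2 }' "$tmp/grades.txt")

echo "$pattern_only"
echo "$action_only"
echo "$both"
rm -rf "$tmp"
```

The input file is only read, never modified, which is why the same file can be reused for all three commands.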

5 Simple example
• awk '{ print }' fruit_prices
• Note: the pattern is missing here, so the awk command print runs on every line that is read and prints it.

6 Simple example
• awk '/\$[0-9]*\.[0-9][0-9]*/ { print }' fruit_prices
• Prints only the lines that contain a dollar amount such as $0.89.
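A small sketch of the price pattern in use. The contents of fruit_prices are not shown in the talk, so the sample lines below are an assumption.

```shell
# Hypothetical stand-in for the fruit_prices file.
tmp=$(mktemp -d)
printf 'Banana $0.89\nPeach $0.79\nKiwi $1.50\nPineapple $1.29\nOrange free\n' > "$tmp/fruit_prices"

# Keep only the lines that contain a literal "$", optional digits,
# a literal ".", and at least one digit after it.
matches=$(awk '/\$[0-9]*\.[0-9][0-9]*/ { print }' "$tmp/fruit_prices")

echo "$matches"
rm -rf "$tmp"
```

The line "Orange free" has no dollar amount, so it is the only line left out.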

7 Action
• Actions are specified by the programmer and are not limited to print, delete, etc. (p/d/s from sed). That is why it is so awesome!
• Actions consist of
  – variable assignments,
  – arithmetic and logic operators,
  – decision structures,
  – looping structures.
• For example: print, if, while and for
• awk '{print}' filename

8 Execution types
• Format 1: awk 'script'
  – INPUT must come from a pipe or STDIN
  – command | awk 'script'
• Format 2: awk 'script' input1 input2 ... inputn
  – we supply the input FILES as input1, input2, etc.
• Format 3: awk -f script_file input1 ...
• (In a script, # starts a comment)
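The three formats can be tried side by side. This is only a sketch: the file names a.txt, b.txt and number.awk are invented for the example.

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\n' > "$tmp/a.txt"
printf 'three\n' > "$tmp/b.txt"

# Format 1: input arrives on standard input through a pipe.
from_pipe=$(cat "$tmp/a.txt" | awk '{ print NR ": " $0 }')

# Format 2: input files are named on the command line.
from_files=$(awk '{ print NR ": " $0 }' "$tmp/a.txt" "$tmp/b.txt")

# Format 3: the script lives in its own file and is loaded with -f.
cat > "$tmp/number.awk" <<'EOF'
# Lines starting with # are comments.
{ print NR ": " $0 }
EOF
from_script=$(awk -f "$tmp/number.awk" "$tmp/a.txt" "$tmp/b.txt")

echo "$from_pipe"
echo "$from_files"
echo "$from_script"
rm -rf "$tmp"
```

Formats 2 and 3 run the same program on the same files, so they should print the same three numbered lines.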

9 Pattern
• Types
  – Regular expressions
  – BEGIN: its action runs BEFORE any input is read
  – END: its action runs AFTER ALL input has been read
• The pattern is optional
• If no pattern is specified, the action runs for EVERY line.
• awk '{action}' filename
• awk '{print;}' names    # prints all lines
• awk 'BEGIN {print "The average grades"}'

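10 Awk Regular Expression Metacharacters
• Supported
  – ^, $, ., *, +, ?, [ABC], [^ABC],
  – [A-Z], A|B, (AB)+, \, & (the matched text, in sub/gsub replacements)
• Not supported
  – Backreferences, \( \)
  – Repetition counts written as \{ \} (POSIX awk and current gawk accept the unescaped form {n,m}; some older awks do not)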

11 awk 'BEGIN { actions; } /pattern/ { actions; } END { actions; }' files
Execution steps:
1) If a BEGIN pattern is present, awk executes its actions.
2) It reads an input line and parses it into fields.
3) It compares each of the specified patterns against the input line and, on a match, executes that pattern's actions. This step is repeated for all patterns.
4) It repeats steps 2 and 3 while input lines remain.
5) After all input lines have been read, if an END pattern is present, awk executes its actions.
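The execution steps above can be watched in a short run. The three input words are arbitrary sample data.

```shell
result=$(printf 'apple\nbanana\navocado\n' | awk '
BEGIN { print "start" }                              # step 1: before any input
/^a/  { count++; print "match: " $0 }                # steps 2-4: once per matching line
END   { print "matched " count " of " NR " lines" }  # step 5: after all input
')
echo "$result"
```

The BEGIN line comes out first, the two matches follow in input order, and the END summary is last.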

12 Try This!
• Place the following in the file tryawk1.awk
    BEGIN { print "Starting to read input"; nLines = 0; }
    /^.*$/ { nLines++; }
    END { print "DONE: Total lines = " nLines; }
  – Run the command: cat tryawk1.awk | awk -f tryawk1.awk
  – Counts the number of lines in the input
• nLines is a variable … note NO declaration, just use it
• The print command prints a line of text and adds a newline to the end of the line

13 Records and fields
• awk has RECORDS (lines) and FIELDS
• $0 represents the entire line of input
• $1 represents the first field
• print works much like echo
  – print $1 $2     # $1 concatenated with $2
  – print $1, $2    # $1, then OFS, then $2
• cat fruit_prices
• awk '{print;}' fruit_prices       # prints all lines
• awk '{print $0;}' fruit_prices    # prints each entire line
• awk '{print $1;}' fruit_prices    # prints first field in each line
• awk '{print $2;}' fruit_prices    # prints second field in each line
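A quick sketch of the difference between the two print forms. The input line is an invented sample.

```shell
line='Apple 0.89'

# No comma: the two fields are concatenated with nothing in between.
concat=$(echo "$line" | awk '{ print $1 $2 }')

# Comma: the fields are joined with OFS, a single space by default.
with_ofs=$(echo "$line" | awk '{ print $1, $2 }')

# Changing OFS changes what the comma inserts.
custom=$(echo "$line" | awk 'BEGIN { OFS = "-" } { print $1, $2 }')

# $0 is the whole record, unchanged.
whole=$(echo "$line" | awk '{ print $0 }')

echo "$concat"; echo "$with_ofs"; echo "$custom"; echo "$whole"
```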

14 Examples
    cat phones.data
    John Robinson
    Yin Pan
    awk '{ print $1, $2, $3 }' phones.data
    John Robinson
    Yin Pan
    awk '{ print $2 ", ", $1, $3 }' phones.data
    Robinson, John
    Pan, Yin
    awk '/^$/ { print x += 1 }' phones.data
    awk '/Mary/ { print $0 }' phones.data

15 Examples (cont'd)
• ls -l | awk '$6 == "Oct" { sum += $5; } END { print sum; }'
• ls -l | awk -f block_use.awk
    cat block_use.awk
    $6 == "Oct" { sum += $5; }
    END { print sum; }
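The summing rule can be tried on fixed input so the result does not depend on the current directory. The listing below is a made-up stand-in for ls -l output, and it assumes the common layout in which field 5 is the size and field 6 is the month; real ls -l output varies between systems and locales.

```shell
# Hypothetical listing in the usual "ls -l" column order.
listing='-rw-r--r-- 1 user staff 1200 Oct 3 10:15 notes.txt
-rw-r--r-- 1 user staff 300 Sep 28 09:00 old.txt
-rw-r--r-- 1 user staff 800 Oct 12 17:42 draft.txt'

# Add up the sizes of the entries dated October, then print the total once.
total=$(printf '%s\n' "$listing" | awk '$6 == "Oct" { sum += $5 } END { print sum }')
echo "$total"
```

sum is never declared or initialized; awk treats an unset variable as 0 the first time it is used in arithmetic.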

16 Taking Pattern-specific Actions
    #!/bin/sh
    awk '
    /\$[1-9][0-9]*\.[0-9][0-9]*/ { print $0, "*"; }
    /\$0\.[0-9][0-9]*/ { print; }
    ' fruit_prices
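A sketch of the two rules at work, using invented price lines in place of the real fruit_prices file.

```shell
marked=$(printf 'Banana $0.89\nKiwi $1.50\n' | awk '
/\$[1-9][0-9]*\.[0-9][0-9]*/ { print $0, "*" }   # one dollar or more: add a star
/\$0\.[0-9][0-9]*/           { print }           # under a dollar: print unchanged
')
echo "$marked"
```

Each line is tested against every rule in turn; here each sample line happens to match exactly one of them.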

17 Intrinsic variables
• awk defines RECORDS (lines) and FIELDS
  – FS: input field separator (default = space/tab)
  – OFS: output field separator (default = space)
  – ORS: output record separator (default = newline)
  – RS: input record separator (default = newline)
  – NR: number of the current record being processed
  – NF: number of fields within the current record
  – FILENAME: awk sets this variable to the name of the file it is currently reading. (If you have more than one input file, awk resets this variable as it reads each file in turn.)
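A short sketch that prints some of these variables. The file names first.txt and second.txt are invented for the example.

```shell
tmp=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmp/first.txt"
printf 'f\n' > "$tmp/second.txt"

# FILENAME changes per file; NR keeps counting across files; NF is per record.
report=$(cd "$tmp" && awk '{ print FILENAME, NR, NF }' first.txt second.txt)

# ORS replaces the newline that print normally adds after each record.
joined=$(printf 'a\nb\n' | awk 'BEGIN { ORS = ";" } { print }')

echo "$report"
echo "$joined"
rm -rf "$tmp"
```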

18 How does awk work
• awk '{print $1, $3}' names
  – awk reads a line of input into $0, using RS to find where the record ends.
  – It breaks the line into fields based on FS and stores them in numbered variables, starting with $1.
  – print writes the requested fields, separated by OFS.
  – After awk displays its output, it goes to the next line and repeats. The output lines are separated by ORS.

19 Changing the Input Field Separator
• Manually resetting FS in a BEGIN pattern
  – Forces you to hard-code the value of the field separator
  – BEGIN { FS=":"; }
  – Example:
      $ awk 'BEGIN { FS=":"; } { print $1, $6; }' /etc/passwd
• Specifying the -F option to awk
  – awk -F: '{ … }'
  – Enables using a shell variable to specify the field separator dynamically
  – Example:
      $ sep=':'
      $ awk -F"$sep" '{ print $1, $6; }' /etc/passwd
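Both ways of setting the separator, tried on two made-up lines in /etc/passwd layout so that the result does not depend on the local system.

```shell
# Hypothetical passwd-style records (seven colon-separated fields).
passwd_like='root:x:0:0:root:/root:/bin/sh
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Separator hard-coded in a BEGIN block.
via_begin=$(printf '%s\n' "$passwd_like" | awk 'BEGIN { FS = ":" } { print $1, $6 }')

# Separator taken from a shell variable through -F.
sep=':'
via_option=$(printf '%s\n' "$passwd_like" | awk -F"$sep" '{ print $1, $6 }')

echo "$via_begin"
echo "$via_option"
```

Quoting "$sep" keeps the shell from mangling separators that contain spaces or wildcard characters.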

20 Example
• Each record in the input file spans three lines:
    FirstName;LastName;Address;City;State;Zip;Phone
    SSN:DOB:NumberOfDependents
    HospitalizationCode,DentalCode,LifeCode
• Convert this file format to:
    SSN,LastName,FirstName,Address,….

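21
    awk 'BEGIN { OFS=","; FS=";" }
    NR%3==1 { F=$1; L=$2; A=$3; …; FS=":" }    # prepare FS for the next line
    NR%3==2 { SSN=$1; DOB=$2; …; FS="," }
    NR%3==0 { …; print SSN, L, F, A, …; FS=";" }' filename
• An assignment to FS takes effect on the next input line, so each rule sets the separator that the following line needs.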
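A runnable sketch of the same idea on a cut-down record. The sample people, the variable names and the choice of output columns (SSN, last name, first name, address, date of birth, first code) are assumptions made for illustration; the talk leaves the full field list elided.

```shell
# Hypothetical input: two people, three lines each.
input='Jane;Doe;1 Main St;Springfield;IL;62701;555-0100
000-11-2222:1980-01-02:2
H1,D2,L3
Bob;Ray;2 Oak Ave;Dover;DE;19901;555-0101
000-33-4444:1975-05-06:0
H4,D5,L6'

converted=$(printf '%s\n' "$input" | awk '
BEGIN       { OFS = ","; FS = ";" }
# A new FS applies to the NEXT record, so each rule ends by setting
# the separator that the following line uses.
NR % 3 == 1 { first = $1; last = $2; addr = $3; FS = ":" }
NR % 3 == 2 { ssn = $1; dob = $2; FS = "," }
NR % 3 == 0 { print ssn, last, first, addr, dob, $1; FS = ";" }
')
echo "$converted"
```

The second person is there to show that FS is back to ";" by the time the fourth line is read.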

22 print vs. printf
• printf
  – The first argument is a string: the 'format'
  – Prints each character of the format
    • Upon reaching a %, the next few characters are a format specifier
    • The next argument is printed according to the specifier
  – Does not append a newline
  – More control over appearance of output
  – Consider awk 'BEGIN { printf "%5.2f\n", 2/3; }'
    • Prints ␣0.67 (here, ␣ represents a space)
    • %5.2f means print a fractional number (the 'f') in a field 5 characters wide, with 2 digits to the right of the decimal point.
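A sketch of the example above, with square brackets added around the field so the leading space is visible.

```shell
# Width 5, two decimals: "0.67" is four characters, so one space is added in front.
formatted=$(awk 'BEGIN { printf "[%5.2f]\n", 2/3 }')

# printf adds no newline of its own; the two calls land on one line.
no_newline=$(awk 'BEGIN { printf "a"; printf "b"; print "" }')

echo "$formatted"
echo "$no_newline"
```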

23 Why printf
• printf formats the output that print would otherwise write as-is
• We have the print statement, so why printf?
  – printf allows us to FORMAT output.
  – can FORCE a value to print as a string
  – decimals
  – whole numbers
  – how many digits fall on either side of the decimal point
  – scientific notation
  – make things line up nicely

24 printf
• printf (format, what to print)
• printf ("%s", x)
  – %s is a PLACEHOLDER for some OUTPUT.
  – s is a specific type of output (string)
  – Each item (%s) must have ONE matching thing to print in the "what to print" list
  – The format goes inside quotes, followed by a comma, followed by the variables to print, outside the quotes.
• printf (" s = %s ", x)
  – " s = " is a LITERAL string

25 printf format
• s = a character string
• f = a floating-point number
• d or i = the integer part of a decimal number
• e = scientific notation for a floating-point number; g = whichever of e or f is shorter
• c = an ASCII character
• If x=65 and I use this print statement
    printf (" s = %c ", x)
  the output is " s = A "
• awk 'BEGIN{x=65; printf("char: %c\n", x)}'
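One line that exercises each specifier in the list. The values are arbitrary, and the fields are separated with | only to make them easy to tell apart.

```shell
specifiers=$(awk 'BEGIN {
    x = 65   # numeric, so %c prints the character with this code
    printf "%s|%d|%c|%f|%e|%g\n", "text", 3.7, x, 2.5, 1234.5, 1234567
}')
echo "$specifiers"
```

%d drops the fractional part of 3.7, and %f shows six decimals unless a precision is given.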

26 printf
• More control:
  – %wd
    • Prints an integer in a field of width w
    • If the number is shorter than w characters, leading spaces are printed
    • Try awk 'BEGIN { printf "%10d\n", 10; }' /dev/null
  – Try adding a '-' immediately after the %
    • Left-justifies the value in the field

27 printf
• %ws
  – Prints a string in a field of width w
  – Supplies leading spaces as necessary
• Place a '-' immediately after the % to get left justification

28 printf
• %w.df
  – Prints the value in a field of width w
  – Prints d digits to the right of the decimal point
  – Place a '-' immediately after the % to get left justification
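The width and justification rules for %wd, %ws and %w.df can be compared directly. Brackets mark the edges of each field; the values are arbitrary.

```shell
aligned=$(awk 'BEGIN {
    printf "[%10d]\n",   10        # right-justified integer, width 10
    printf "[%-10d]\n",  10        # same field, left-justified
    printf "[%8s]\n",    "awk"     # right-justified string, width 8
    printf "[%-8s]\n",   "awk"     # same field, left-justified
    printf "[%8.3f]\n",  3.14159   # width 8, three digits after the point
    printf "[%-8.3f]\n", 3.14159   # same field, left-justified
}')
echo "$aligned"
```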

29 printf examples
• Apple
• awk '{printf (" %10s %5d %5d %d ", $1, $2, $3, $4 )}' file
• awk '{printf (" %-10s %5d %5d %d ", $1, $2, $3, $4 )}' file
  – The minus sign designates that this field will be LEFT JUSTIFIED
• awk '{printf (" %-10s %-5d %-5d %d ", $1, $2, $3, $4 )}' file
• awk '{printf ("|%-15s|\n", $1)}'

30 printf examples
• Let's put an average in there...
• printf (" %-10s %-5d %-5d %-5d %f ", $1, $2, $3, $4, average )
  – Provides the RAW number (with six digits to the RIGHT of the decimal point)
• printf (" %-10s %-5d %-5d %-5d %.2f ", $1, $2, $3, $4, average )
  – %.2f says use TWO digits to the RIGHT of the decimal point
• printf doesn't provide the newline automatically....
• printf (" %-10s %-5d %-5d %-5d %.2f \n ", $1, $2, $3, $4, average )
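A sketch that computes the average before printing it. The names and scores are invented, and the average is assumed to be the plain mean of fields 2 to 4; the talk does not show how average is calculated.

```shell
table=$(printf 'John 85 92 80\nAlex 90 88 95\n' | awk '{
    average = ($2 + $3 + $4) / 3    # assumed definition: mean of the three scores
    printf "%-10s %-5d %-5d %-5d %.2f\n", $1, $2, $3, $4, average
}')
echo "$table"
```

The explicit \n at the end of the format is what puts each student on a separate line.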

31 The OFMT variable (stands for Output Formatting for numbers)
• A special awk variable
• Controls how numbers are printed by the print statement
• awk 'BEGIN{print ;}'
• awk 'BEGIN{OFMT="%.2f"; print ;}'
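A sketch of OFMT in action. The number printed in the talk's own example did not survive in the transcript, so the value 2/3 below is an arbitrary stand-in.

```shell
# Default OFMT is "%.6g".
default_fmt=$(awk 'BEGIN { print 2/3 }')

# With OFMT set, print uses it for non-integer numbers.
two_places=$(awk 'BEGIN { OFMT = "%.2f"; print 2/3 }')

# Integer values are printed as integers regardless of OFMT.
integer=$(awk 'BEGIN { OFMT = "%.2f"; print 42 }')

echo "$default_fmt"; echo "$two_places"; echo "$integer"
```

OFMT only affects print; printf always follows its own format string.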