

AWK

text processing language

awk
Created for Unix by Aho, Weinberger and Kernighan
Basically an:
▫ interpreted
▫ text processing
▫ programming language
Updated versions
▫ NAWK: New awk
▫ GAWK: Free Software Foundation’s version

awk Basics
Basic form:
▫ awk options 'selection criteria {action}' file(s)
Can use regular expressions
Files are read one line at a time, with each line split into fields
Fields are numbered ($1, $2, etc.)
▫ Entire line is $0
Can run standalone (program given on the command line)
Can run as a program (stored in a file)
Uses whitespace (blanks and tabs) as the default field separator
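
A minimal sketch of the basic form above, run from a shell. The two input lines are made up for illustration; they are not from the personnel file used in the later slides.

```shell
# Feed two whitespace-separated lines to awk and pick fields out of each.
# $1 and $2 are the first two fields; $0 is the entire line.
fields=$(printf 'alpha beta gamma\none two three\n' |
    awk '{ print $2, $1 }')

# A regular expression as the selection criterion: only lines containing "two".
whole=$(printf 'alpha beta gamma\none two three\n' |
    awk '/two/ { print $0 }')

printf '%s\n' "$fields"
printf '%s\n' "$whole"
```

The first command swaps the first two fields of every line; the second prints only the matching line, unchanged.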

-f Option (stored awk programs)
awk programs can be stored in a file
    awk -f awkfile datafile
▫ -f awkfile names the file that holds the awk program
▫ datafile contains the data
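
A small sketch of the -f option, assuming a POSIX shell with mktemp available. The temporary file name, the one-rule program, and the input are all made up for illustration.

```shell
# Store a one-rule awk program in a temporary file, then run it with -f.
prog=$(mktemp)
cat > "$prog" <<'EOF'
# print the first field of every line that has at least two fields
NF >= 2 { print $1 }
EOF

stored=$(printf 'a b\nsolo\nc d e\n' | awk -f "$prog")
rm -f "$prog"
printf '%s\n' "$stored"
```

Keeping the program in a file avoids shell quoting problems once the program grows beyond a line or two.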

Example
Find the TAs in the personnel file
▫ The file is blank separated
  ▪ -F defines the delimiter
  ▪ Use "\ " to escape the blank (a blank after the \, followed by a second blank that ends the argument)
▫ Note: the blank is the default separator anyway
▫ Title is in the 3rd field

# cat personnel.data
Tony Kombol Lecturer
Jinyue Xia TA
Hadi Hashemi TA
#
# awk -F\  '$3 == "TA" { print }' personnel.data
Jinyue Xia TA
Hadi Hashemi TA
#

example
To run an awk program
▫ personnel.data has the data
▫ findta.awk is the code
  ▪ Looks for TA (3rd parm)
  ▪ Prints first name and telephone number (1st and 5th parms)
▫ Note: what small formatting problem is here?

# awk -F\  -f findta.awk personnel.data
TAs
Jinyue
Hadi
Done
# cat personnel.data
Tony Kombol Lecturer
Jinyue Xia TA
Hadi Hashemi TA
# cat findta.awk
BEGIN { print "TAs"; }
$3 == "TA" {print $1 $5}
END { print "Done" }

print and printf
Output goes to standard output
▫ can be redirected with > or |
  ▪ the redirection target must be in quotes:
  ▪ print $2, $1 | "sort"
▫ the output of the print goes to the sort command
print is unformatted
printf allows formatting
▫ %s: string
  ▪ %-20s: field 20 characters wide, left-justified (-)
▫ %d: integer
  ▪ %8d: set aside 8 spaces for the number
▫ %f: floating point
  ▪ %4.8f: field at least 4 characters wide in total, with 8 digits to the right of the decimal point
▫ printf needs \n to start a new line
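
A short sketch of printf formatting and of piping print into sort. The sample values are made up, and the names are borrowed from the personnel example only as filler.

```shell
# %-10s : string, left-justified in a 10-character field
# %5d   : integer, right-justified in a 5-character field
# %8.2f : floating point, at least 8 characters wide, 2 digits after the point
formatted=$(printf 'widget 42 3.14159\n' |
    awk '{ printf "%-10s|%5d|%8.2f|\n", $1, $2, $3 }')

# print ... | "sort" sends every printed line through one sort command.
sorted=$(printf 'Tony Kombol\nJinyue Xia\nHadi Hashemi\n' |
    awk '{ print $2, $1 | "sort" }')

printf '%s\n' "$formatted"
printf '%s\n' "$sorted"
```

The | characters in the format string are only there to make the field widths visible.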

Number processing
AWK supports basic computation
▫ + : addition
▫ - : subtraction
▫ * : multiplication
▫ / : division
▫ % : modulus
▫ ^ : exponentiation
Also supports:
▫ ++ : add one to itself (postfix and prefix)
▫ += : add and assign to self
▫ -- : subtract one from self (postfix and prefix)
▫ -= : subtract from self
▫ *= : multiply self
▫ /= : divide self
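
The operators above can be tried without any input file by putting everything in a BEGIN block. The numbers are arbitrary.

```shell
calc=$(awk 'BEGIN {
    print 7 + 2, 7 - 2, 7 * 2, 7 / 2   # 9 5 14 3.5
    print 7 % 3, 2 ^ 10                # 1 1024
    x = 5
    x += 3      # x is now 8
    x++         # x is now 9
    x *= 2      # x is now 18
    x -= 4      # x is now 14
    x /= 7      # x is now 2
    x--         # x is now 1
    print x
}')
printf '%s\n' "$calc"
```

Note that / is not integer division: 7 / 2 gives 3.5.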

Variables and Expressions
awk is loosely typed
do not need to declare variables
▫ x = 5
do not need $ to access a variable, unlike in the shell
▫ print x
strings are double quoted
▫ x = "This is a string"
no string concatenation operator; concatenation is done by context
▫ x = "string1"; y = "string2"
  print x y
  ▪ Space is required
some conversions are done automatically
▫ x = "56"; y = 43; z = "abc"
  print x y     # gives 5643: y is converted to a string
  print x + y   # gives 99: + converts x to a number
  print y + z   # gives 43: + converts z to the number 0
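
The conversions on this slide can be checked directly. This sketch reuses the slide's x, y and z; the last print is an extra line added to show how to get a blank into the output.

```shell
conv=$(awk 'BEGIN {
    x = "56"; y = 43; z = "abc"
    print x y        # concatenation: y becomes the string "43"
    print x + y      # addition: x becomes the number 56
    print y + z      # "abc" does not look like a number, so it counts as 0
    print x " " y    # a literal blank must be concatenated explicitly
}')
printf '%s\n' "$conv"
```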

Comparison and Logical Operators
awk supports string and numeric comparisons
▫ == is the equality operator
  ▪ = is for assignment
▫ comparisons can be used on strings
  ▪ Beware of conversions when dealing with strings that consist of numbers
▫ ~ is used for regular expressions
  ▪ $2 ~ /[dh]og/
  ▪ true when field 2 (parameter 2) contains hog or dog
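
A sketch of the conversion pitfall and of the ~ operator, using made-up input. Concatenating an empty string onto a field is one common way to force a string comparison; it is shown here as an illustration, not as something the slides prescribe.

```shell
cmp=$(printf '10 dog\n9 frog\n100 hog\n' | awk '
    $1 > 9          { print "numeric:", $1 }   # numeric-looking fields compare as numbers
    ($1 "") < "9"   { print "string:", $1 }    # as strings, "10" and "100" sort before "9"
    $2 ~ /[dh]og/   { print "regex:", $2 }     # field 2 contains dog or hog
')
printf '%s\n' "$cmp"
```

The same value 10 is greater than 9 as a number but less than "9" as a string, because string comparison works character by character.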

Comparison and Logical Operators
awk supports boolean operations
▫ && : and
▫ || : or
▫ ! : not
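
The boolean operators combine comparisons inside a selection criterion. The names, titles and year counts below are invented for this sketch.

```shell
logic=$(printf 'Ann TA 3\nBob RA 7\nCy TA 9\nDee Lecturer 12\n' | awk '
    $2 == "TA" && $3 > 5            { print $1, "is a TA with more than 5 years" }
    $2 == "RA" || $2 == "Lecturer"  { print $1, "is an RA or a Lecturer" }
    !($2 == "TA")                   { print $1, "is not a TA" }
')
printf '%s\n' "$logic"
```

Every rule is tested against every line, so one line can trigger more than one action.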

simple comparison
Field 6 is number of years with organization
▫ Find those with more than 5 years

# awk '$6 > 5 { print $2 ", " $1 ":" $6}' personnelyears.data
Kombol, Tony:6
Flintstone, Fred:10
#
# cat personnelyears.data
Tony Kombol Lecturer
Jinyue Xia TA
Hadi Hashemi TA
Fred Flintstone RA
Barney Rubble URA
#

Regular Expression comparison example
Find the TAs and RAs including the URAs

# cat personnel.data
Tony Kombol Lecturer
Jinyue Xia TA
Hadi Hashemi TA
Fred Flintstone RA
Barney Rubble URA
# awk '$3 ~ /[RT]A/ {print $1 " " $2 " " $5}' personnel.data
Jinyue Xia
Hadi Hashemi
Fred Flintstone
Barney Rubble
#

BEGIN and END Sections
BEGIN and END allow for some pre- and post-processing
▫ Both are optional
General format:
▫ BEGIN { action }
  { action }
  END { action }
▫ BEGIN's actions are done before the processing of the datafile begins
  ▪ Good for headers, setup, etc.
▫ END's actions are done after the processing of the datafile ends
  ▪ Good for post processing, notes, etc.
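
A minimal sketch of the three-part layout, with a header from BEGIN, a per-line action, and a total from END. The item names and counts are made up.

```shell
report=$(printf 'apples 3\npears 4\nplums 5\n' | awk '
    BEGIN { print "Item Count" }          # runs once, before any input is read
          { total += $2; print $1, $2 }   # runs for every input line
    END   { print "Total", total }        # runs once, after the last line
')
printf '%s\n' "$report"
```

The variable total needs no setup in BEGIN: an unset awk variable counts as 0 in arithmetic.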

another regular expression
This is a more complex check using a file for the awk program
▫ Check to see the ID is 800……
  ▪ That is 800 followed by 6 characters

# awk -f findbadid.awk personnelbad.data
List of bad IDs follows
Bad Id has a bad id:
End of list
# cat personnelbad.data
Tony Kombol Lecturer
Jinyue Xia TA
Hadi Hashemi TA
Fred Flintstone RA
Barney Rubble URA
Bad Id LX
# cat findbadid.awk
BEGIN { print "List of bad IDs follows"; }
$4 !~ /^ / { print $1 " " $2 " has a bad id:" $4};
END { print "End of list"; }
#

awk file example

# cat grades.data
Fred Ziffle:99:A
Arnold Ziffle: 55: F
Tara Boomdea: 85:B
Neo:100:A
Buffy Summers: 72:C
Sheldon Cooper:67:D
Zorbon Prentwist: 88 : B
Zorbax Bottlewit:88:B
Bad Grade: 33: A
# cat ckgrades.awk
BEGIN { print "Listing Bs\n" }
$3 == "B" { print $0 }
END { print "\nDone" }
#
# awk -F: -f ckgrades.awk grades.data
Listing Bs

Tara Boomdea: 85:B
Zorbax Bottlewit:88:B

Done
#
Note: " : B" does not get matched, because with -F: the third field of that line is " B" (with the leading blank), not "B"
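
One possible way to make the " : B" line match, which the slides do not show, is to give -F a regular expression that swallows the blanks around the colon. The sketch below uses three lines taken from grades.data.

```shell
# -F' *: *' : the separator is a colon with any number of blanks on either side,
# so the grade field is "B" whether or not the file has blanks around the colon.
grades=$(printf 'Tara Boomdea: 85:B\nZorbon Prentwist: 88 : B\nNeo:100:A\n' |
    awk -F' *: *' '$3 == "B" { print $1 }')
printf '%s\n' "$grades"
```

A field separator longer than one character is treated by awk as a regular expression, which is what makes this work.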

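Positional Parameters
Parameters are usually used as the fields of each line
A parameter can also be passed to the awk program from the shell
▫ Used with a shell program (script)
▫ The shell parameter must be in single quotes inside the awk program; the quotes close and reopen the awk program text so that the shell substitutes the value
  ▪ e.g. instead of
      $4 > 12       (4th parm in line is > 12)
    use
      $4 > '$2'     (4th parm in line is > 2nd parm passed to the program)
▫ Invoked as:
    prog.awk 50 82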
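
The quoting trick on this slide splices the shell parameter into the program text. A common alternative, not shown in the slides, is awk's standard -v option, which assigns the value to an awk variable before the program runs. The function name over_limit, the variable limit, and the data are hypothetical.

```shell
# over_limit LIMIT: print the first field of lines whose 4th field exceeds LIMIT.
over_limit() {
    # limit + 0 forces a numeric comparison even if the value arrives as a string
    awk -v limit="$1" '$4 > limit + 0 { print $1 }'
}

big=$(printf 'a x x 10\nb x x 60\nc x x 90\n' | over_limit 50)
printf '%s\n' "$big"
```

With -v the awk program stays inside one pair of single quotes, so the shell's $1 and awk's $1 cannot be confused.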

Arrays
awk supports arrays
▫ arrays do not need to be "declared"
  ▪ "declared" the minute they are used
Arrays are associative
▫ index can be
  ▪ numeric
  ▪ alphabetic
▫ thisday["Tue"] = "Tuesday";
  thisday[2] = "Tuesday";
  ▪ above are two array elements for the array thisday
  ▪ each references a separate string
  ▪ printf("thisday[\"Tue\"] is %s", thisday["Tue"]) ;
    printf("thisday[2] is %s", thisday[2]) ;
▫ Both will print "Tuesday" for the array element referenced
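
A typical use of an associative array is counting how often each value occurs. This sketch counts made-up job titles; the array name seen is arbitrary. The result is piped through sort because awk does not promise any particular order when walking an array.

```shell
counts=$(printf 'TA\nRA\nTA\nURA\nTA\n' | awk '
    { seen[$1]++ }                      # the element is created on first use
    END {
        for (title in seen)             # traversal order is unspecified
            print title, seen[title]
    }' | sort)
printf '%s\n' "$counts"
```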

Arrays
ENVIRON[ ]
▫ an associative array containing all the environment variables

# awk 'BEGIN{for (env in ENVIRON)print env "=" ENVIRON[env]}'
SSH_CLIENT=
HOME=/home/tkombol
TERM=xterm
LESSOPEN=| /usr/bin/lesspipe %s
SHELL=/bin/bash
USER=tkombol
_=/usr/bin/awk
SHLVL=1
PWD=/home/tkombol
SSH_CONNECTION=
LANG=en_US.UTF-8
MAIL=/var/mail/tkombol
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:
HISTCONTROL=ignoredups
PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
LESSCLOSE=/usr/bin/lesspipe %s %s
LOGNAME=tkombol
SSH_TTY=/dev/pts/2
#

Built-in Variables
awk has a set of built-in variables
▫ Some can be overridden

Built-In Variables
Variable    Function                                     Default
NR          Cumulative # of lines read                   -
FS          Input field separator                        space
OFS         Output field separator                       space
OFMT        Default FP format                            %.6g
RS          Record separator                             newline
NF          Number of fields in current line             -
FILENAME    Current input file                           -
ARGC        Number of arguments in command line          -
ARGV        Array containing list of arguments           -
ENVIRON     Assoc. array of all environment variables    -
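
A short sketch that overrides FS and OFS and reads NR and NF. The colon-separated input is made up.

```shell
vars=$(printf 'a:b:c\nd:e\n' | awk '
    BEGIN { FS = ":"; OFS = "-" }       # override two of the defaults
    { print NR, NF, $1, $NF }           # $NF is the last field of the line
    END { print "lines read: " NR }
')
printf '%s\n' "$vars"
```

OFS is only inserted where print has commas between its arguments; plain concatenation, as in the END rule, ignores it.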

Functions
awk has several built-in functions
▫ () are optional if no parms
  ▪ encouraged to use
▫ Arithmetic functions
▫ String functions

Arithmetic Functions
int(x)
▫ integer part of x
sqrt(x)
▫ square root of x
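
Both functions can be tried from a BEGIN block. The values are arbitrary.

```shell
arith=$(awk 'BEGIN {
    print int(3.9)     # truncates toward zero, does not round
    print int(-3.9)
    print sqrt(16)
}')
printf '%s\n' "$arith"
```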

String Functions
length()
▫ length of complete line
length(x)
▫ length of x
tolower(s)
▫ returns s as lower case
toupper(s)
▫ returns s as upper case
substr(str,m)
▫ returns string starting at m to end of string
substr(str,m,n)
▫ returns string starting at m for n characters
index(s1,s2)
▫ finds the position of s2 inside s1
split(str,arr,ch)
▫ splits str into an array, the delimiter is ch
system("cmd")
▫ executes a system (Linux) command and returns exit status
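
A quick tour of the functions listed above, on a made-up string. String positions in awk start at 1, not 0.

```shell
strs=$(awk 'BEGIN {
    s = "Hello World"
    print length(s)                 # 11
    print toupper(s), tolower(s)    # HELLO WORLD hello world
    print substr(s, 7)              # World
    print substr(s, 1, 5)           # Hello
    print index(s, "World")         # 7
    n = split("a:b:c", parts, ":")  # split returns the number of pieces
    print n, parts[1], parts[3]     # 3 a c
    status = system("true")         # run a shell command, keep its exit status
    print status                    # 0
}')
printf '%s\n' "$strs"
```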

If
Syntax:
▫ if (cond true) {
      statements
  } else {
      statements
  }
▫ Notes:
  ▪ else is optional
  ▪ {} not needed for single statements
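
A sketch of if and else with and without braces, classifying made-up numbers. The thresholds are arbitrary.

```shell
sizes=$(printf '3\n12\n7\n' | awk '{
    if ($1 > 10) {
        print $1, "big"
    } else if ($1 > 5)
        print $1, "medium"      # single statement, so no braces needed
    else
        print $1, "small"
}')
printf '%s\n' "$sizes"
```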

For
Syntax form 1:
▫ for ( startval ; condition ; control) statement
  ▪ C like in form
▫ Example:
  ▪ for ( k=1 ; k<9 ; k++ ) print k
Syntax form 2:
▫ for ( var in array) statement
  ▪ Will set var to every index in the array in turn
  ▪ Great for associative arrays
  ▪ Non numeric indices
  ▪ Gaps in array
  ▪ See the ENVIRON example in an earlier slide
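
Both forms side by side. The array day and its single element are made up; with only one element the traversal order does not matter.

```shell
loop=$(awk 'BEGIN {
    # Form 1: C-style counter
    for (k = 1; k < 4; k++)
        print k

    # Form 2: visit every index of an associative array
    day["Tue"] = "Tuesday"
    for (key in day)
        print key, day[key]
}')
printf '%s\n' "$loop"
```

In form 2 the loop variable receives the index, not the stored value; the value is looked up with day[key].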

While
Syntax:
▫ while (cond is true) {
      statement(s)
  }
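
A minimal while loop that prints the powers of 3 below 100. The numbers are arbitrary.

```shell
powers=$(awk 'BEGIN {
    n = 1
    while (n < 100) {    # the condition is tested before each pass
        print n
        n *= 3
    }
}')
printf '%s\n' "$powers"
```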

continue and break
continue and break can be used in all loops
▫ for
▫ while
break
▫ stops the loop
continue
▫ stops processing statements in the current iteration
▫ continues to next iteration
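
A sketch that uses both statements in one for loop: even numbers are skipped with continue, and the loop is abandoned with break once k passes 7.

```shell
picked=$(awk 'BEGIN {
    for (k = 1; k <= 10; k++) {
        if (k % 2 == 0)
            continue     # skip the rest of this pass, go on to the next k
        if (k > 7)
            break        # leave the loop entirely
        print k
    }
}')
printf '%s\n' "$picked"
```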

Resources
▫ Awk - A Tutorial and Introduction - by Bruce Barnett
▫ Awk Tutorial - Main Page

Summary
awk is a "primitive" scripting language
good for processing text files
▫ filtering
perl is a more modern replacement
▫ "religious war" over which is better
if you understand awk it will be a good basis to understand perl