AWK.

AWK

awk text processing language

awk
Created for Unix by Aho, Weinberger and Kernighan
Basically an:
  interpreted
  text processing
  programming language
Updated versions:
  NAWK: new awk
  GAWK: Free Software Foundation's version

awk Basics
Basic form:
    awk options 'selection criteria {action}' file(s)
Can use regular expressions
Files are read one line at a time, with the contents split into fields
  Fields are numbered ($1, $2, etc.)
  Entire line is $0
Can run standalone
Can run as a program
Uses whitespace (blanks and tabs) as the default separator
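The basic form above can be tried without any data file. The following is a minimal sketch, assuming two sample records in the style of a personnel list piped in on standard input; the selection criterion tests field 3 and the action prints fields 2 and 1.

```shell
# Two sample records are piped to awk instead of being read from a file.
# Selection criterion: field 3 equals "TA". Action: print last name, first name.
out=$(printf '%s\n' \
  'Tony Kombol Lecturer 800111222 704-687-1111' \
  'Jinyue Xia TA 800111333 704-687-2222' |
  awk '$3 == "TA" { print $2 ", " $1 }')
echo "$out"
```

Only the second record satisfies the criterion, so a single line is printed.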

-f Option (stored awk programs)
awk programs can be stored in a file
    awk -f awkfile datafile
-f awkfile names the file that contains the awk program
datafile contains the data

Example
Find the TAs in the personnel file
The file is blank separated
-F defines the delimiter
  Use "\ " to escape the blank (a blank after the \)
  Note: the blank is the default separator anyway
Title is in the 3rd field
# cat personnel.data
Tony Kombol Lecturer 800111222 704-687-1111
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333
#
# awk -F\  '$3 == "TA" { print }' personnel.data

example
To run an awk program
  personnel.data has the data
  findta.awk is the code
    Looks for TA (3rd field)
    Prints first name and telephone number (1st and 5th fields)
Note: what small formatting problem is here?
# awk -F\  -f findta.awk personnel.data
TAs
Jinyue704-687-2222
Hadi704-687-3333
Done
# cat personnel.data
Tony Kombol Lecturer 800111222 704-687-1111
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333
# cat findta.awk
BEGIN { print "TAs"; }
$3 == "TA" {print $1 $5}
END { print "Done" }
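One possible answer to the formatting question: the name and the phone number run together because print $1 $5 concatenates the two fields. A sketch of a fix, assuming the program is written to a temporary file (the mktemp path stands in for findta.awk), is to separate the fields with a comma so that print inserts the output field separator.

```shell
# Write a variant of findta.awk to a temporary file (hypothetical location).
prog=$(mktemp)
cat > "$prog" <<'EOF'
BEGIN { print "TAs" }
$3 == "TA" { print $1, $5 }   # the comma inserts OFS (a space by default)
END { print "Done" }
EOF

# Feed the personnel records on standard input and run the stored program.
out=$(printf '%s\n' \
  'Tony Kombol Lecturer 800111222 704-687-1111' \
  'Jinyue Xia TA 800111333 704-687-2222' \
  'Hadi Hashemi TA 800111444 704-687-3333' |
  awk -f "$prog")
rm -f "$prog"
echo "$out"
```

With the comma, each matching record prints as the first name, a space, then the phone number.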

print and printf
Output goes to standard output
  can be redirected with > or |
  the redirection target must be in quotes:
      print $2, $1 | "sort"
    the output of the print goes to the sort command
print is unformatted
printf allows formatting
  %s - string
    %-20s  20-character field, left-justified (-)
  %d - integer
    %8d  set aside 8 spaces for the number
  %f - floating point
    %4.8f  minimum total width of 4 characters, with 8 digits to the right of the decimal point
printf needs \n to start a new line
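The format specifiers can be seen together in one printf call. This is a small sketch, assuming a single personnel record with a years-of-service value in field 6; the widths chosen here are arbitrary.

```shell
# %-10s : last name, left-justified in a 10-character field
# %4d   : years of service, right-justified in 4 characters
# %8.2f : years divided by 4, total width 8 with 2 decimals
out=$(printf '%s\n' 'Tony Kombol Lecturer 800111222 704-687-1111 6' |
  awk '{ printf "%-10s|%4d|%8.2f\n", $2, $6, $6 / 4 }')
echo "$out"
```

The vertical bars are only there to make the padding visible in the output.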

Number processing
AWK supports basic computation:
  +   addition
  -   subtraction
  *   multiplication
  /   division
  %   modulus
  ^   exponentiation
Also supports:
  ++  add one to itself (postfix and prefix)
  +=  add and assign to self
  --  subtract one from self (postfix and prefix)
  -=  subtract from self
  *=  multiply self
  /=  divide self
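A short sketch of these operators, run entirely in a BEGIN block so no input is needed; the variable name x and the values are arbitrary.

```shell
# Modulus, exponentiation and division on one line,
# then compound assignment and increment on the next.
out=$(awk 'BEGIN {
  x = 7
  print x % 3, 2 ^ 10, x / 2
  x += 3    # x is now 10
  x++       # x is now 11
  print x
}')
echo "$out"
```

Division is not integer division: 7 / 2 gives 3.5.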

Variables and Expressions
awk is loosely typed
  do not need to declare variables
      x = 5
  do not need $ to use variables, unlike bash
      print x
strings are double quoted
      x = "This is a string"
no string concatenation operator; concatenation is done by context
      x = "string1"; y = "string2"
      print x y
  Space is required
some conversions are done automatically
      x = "56"; y = 43; z = "abc"
      print x y    # gives 5643: y is converted to a string
      print x + y  # gives 99: + converts x to a number
      print y + z  # gives 43: + converts z to the number 0

Comparison and Logical Operators
awk supports string and numeric comparisons
  == is the equality operator
    = is for assignment
  < and > can be used on strings
  Beware of conversions when dealing with strings that consist of numbers
~ is used for regular expressions
    $2 ~ /[dh]og/
  field 2 contains "hog" or "dog"
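The warning about strings that consist of numbers can be illustrated as follows. This is a sketch, assuming an input line holding the two values 10 and 9: input fields that look numeric are compared as numbers, while appending an empty string forces a character-by-character string comparison.

```shell
out=$(echo '10 9' | awk '{
  print ($1 < $2)                # numeric comparison: 10 < 9 is false, prints 0
  print (($1 "") < ($2 ""))      # string comparison: "10" sorts before "9", prints 1
}')
echo "$out"
```

The parentheses around each comparison keep print from treating > or < as a redirection.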

Comparison and Logical Operators
awk supports boolean operations
  &&  and
  ||  or
  !   not

simple comparison
Field 6 is number of years with organization
Find those with more than 5 years
# awk '$6 > 5 { print $2 ", " $1 ":" $6}' personnelyears.data
Kombol, Tony:6
Flintstone, Fred:10
#
# cat personnelyears.data
Tony Kombol Lecturer 800111222 704-687-1111 6
Jinyue Xia TA 800111333 704-687-2222 3
Hadi Hashemi TA 800111444 704-687-3333 1
Fred Flintstone RA 800123321 704-687-1212 10
Barney Rubble URA 800112233 704-687-3344 4
#

Regular Expression comparison example
Find the TAs and RAs including the URAs
# awk '$3 ~ /[RT]A/ {print $1 " " $2 " " $5}' personnel.data
Jinyue Xia 704-687-2222
Hadi Hashemi 704-687-3333
Fred Flintstone 704-687-1212
Barney Rubble 704-687-3344
#
# cat personnel.data
Tony Kombol Lecturer 800111222 704-687-1111
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333
Fred Flintstone RA 800123321 704-687-1212
Barney Rubble URA 800112233 704-687-3344

BEGIN and END Sections
BEGIN and END allow for some pre- and post-processing
Both are optional
General format:
    BEGIN { action }
    { action }
    END { action }
BEGIN's actions are done before the processing of the datafile begins
  Good for headers, setup, etc.
END's actions are done after the processing of the datafile ends
  Good for post processing, notes, etc.

another regular expression
This is a more complex check using a file for the awk program
Check to see that the ID is 800......
  That is, 800 followed by 6 characters
# cat findbadid.awk
BEGIN { print "List of bad IDs follows"; }
$4 !~ /^800....../ { print $1 " " $2 " has a bad id:" $4};
END { print "End of list"; }
#
# cat personnelbad.data
Tony Kombol Lecturer 800111222 704-687-1111 6
Jinyue Xia TA 800111333 704-687-2222 3
Hadi Hashemi TA 800111444 704-687-3333 1
Fred Flintstone RA 800123321 704-687-1212 10
Barney Rubble URA 800112233 704-687-3344 4
Bad Id LX 809123456 704-687-8890 0
# awk -f findbadid.awk personnelbad.data
List of bad IDs follows
Bad Id has a bad id:809123456
End of list
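The pattern /^800....../ accepts any six characters after 800 and does not stop at the end of the field. If IDs are meant to be exactly nine digits (an assumption, not stated on the slide), a stricter check could look like the sketch below. The last two records are hypothetical and exist only to exercise the stricter pattern.

```shell
# Stricter variant: 800 followed by exactly six digits, anchored at both ends.
# [0-9] is repeated six times instead of using {6}, since some older awk
# implementations do not support interval expressions.
out=$(printf '%s\n' \
  'Tony Kombol Lecturer 800111222' \
  'Bad Id LX 809123456' \
  'Hypo Thetical TA 800ABCDEF' \
  'Too Long TA 8001112223' |
  awk '$4 !~ /^800[0-9][0-9][0-9][0-9][0-9][0-9]$/ { print $1, $2, "has a bad id:", $4 }')
echo "$out"
```

The original pattern would accept both hypothetical records, while this variant flags them.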

awk file example
# cat ckgrades.awk
BEGIN { print "Listing Bs\n" }
END { print "\nDone" }
# awk -F: -f ckgrades.awk grades.data
Listing Bs
Tara Boomdea: 85:B
Zorbax Bottlewit:88:B
Done
#
# cat grades.data
Fred Ziffle:99:A
Arnold Ziffle: 55: F
Tara Boomdea: 85:B
Neo:100:A
Buffy Summers: 72:C
Sheldon Cooper:67:D
Zorbon Prentwist: 88 : B
Zorbax Bottlewit:88:B
Bad Grade: 33: A
Note: ": B" does not get matched
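The note about ": B" not matching follows from -F: leaving the surrounding blanks inside the field, so field 3 is " B" rather than "B". The sketch below assumes the selection rule compares field 3 with "B" (the rule itself is not shown in the transcript) and uses a regular expression as the field separator so that blanks around the colon are absorbed.

```shell
# FS is the regular expression " *: *": a colon with optional blanks on
# either side. Field 3 then holds the bare grade letter.
out=$(printf '%s\n' \
  'Tara Boomdea: 85:B' \
  'Zorbon Prentwist: 88 : B' \
  'Neo:100:A' |
  awk -F ' *: *' '$3 == "B" { print $1 }')
echo "$out"
```

With this separator the "Zorbon Prentwist" record is reported along with the others graded B.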

Positional Parameters
Parameters ($1, $2, ...) usually refer to the fields of each line
A parameter can also be passed to the awk program
  Used with a shell program (script) that contains the awk program
  Must be in quotes in the program, so that the shell (not awk) substitutes its value
e.g. instead of
    $4 > 12      4th field in line is > 12
use
    $4 > '$2'    4th field in line is > 2nd parameter passed to the program:
    prog.awk 50 82
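An alternative to splicing a shell parameter into the program text is the standard -v option, which assigns a value to an awk variable before the program runs. This is a sketch, not the slide's method; the names limit (shell) and limit (awk) are hypothetical, and in a script the shell value could come from "$2".

```shell
limit=5   # hypothetical threshold; in a wrapper script this could be "$2"

# -v limit="$limit" creates an awk variable, so the program text can stay
# inside one pair of single quotes.
out=$(printf '%s\n' \
  'Tony Kombol Lecturer 800111222 704-687-1111 6' \
  'Jinyue Xia TA 800111333 704-687-2222 3' |
  awk -v limit="$limit" '$6 > limit { print $1 }')
echo "$out"
```

Because the value arrives as an awk variable, field references such as $6 can no longer be confused with shell parameters.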

Arrays
awk supports arrays
  do not need to be "declared"
    "declared" the minute they are used
Arrays are associative
  index can be numeric or alphabetic
      thisday["Tue"] = "Tuesday";
      thisday[2] = "Tuesday";
  above are two array elements for the array thisday
    each references a separate string
      printf("thisday[\"Tue\"] is %s", thisday["Tue"]);
      printf("thisday[2] is %s", thisday[2]);
  Both will print "Tuesday" for the element referenced
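A common use of associative arrays is counting records by key. The sketch below assumes three personnel-style records and an array named count (a hypothetical name) indexed by the title in field 3; elements come into existence the first time they are incremented.

```shell
out=$(printf '%s\n' \
  'Tony Kombol Lecturer' \
  'Jinyue Xia TA' \
  'Hadi Hashemi TA' |
  awk '{ count[$3]++ }    # no declaration needed; the index is a string
       END { print "TA", count["TA"]; print "Lecturer", count["Lecturer"] }')
echo "$out"
```

The keys are printed explicitly here because the order of a for (key in count) traversal is not specified.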

Arrays
ENVIRON[ ]
  an associative array containing all the environment variables
# awk 'BEGIN{for (env in ENVIRON)print env "=" ENVIRON[env]}'
SSH_CLIENT=10.23.161.139 59365 22
HOME=/home/tkombol
TERM=xterm
LESSOPEN=| /usr/bin/lesspipe %s
SHELL=/bin/bash
USER=tkombol
_=/usr/bin/awk
SHLVL=1
PWD=/home/tkombol
SSH_CONNECTION=10.23.161.139 59365 152.15.95.103 22
LANG=en_US.UTF-8
MAIL=/var/mail/tkombol
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:
HISTCONTROL=ignoredups
PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
LESSCLOSE=/usr/bin/lesspipe %s %s
LOGNAME=tkombol
SSH_TTY=/dev/pts/2
#

Built-in Variables
awk has a set of built-in variables
Some can be overridden
Variable   Function                                  Default
NR         Cumulative number of lines read           -
FS         Input field separator                     space
OFS        Output field separator                    space
OFMT       Default floating-point output format      %.6g
RS         Record separator                          newline
NF         Number of fields in current line          -
FILENAME   Current input file                        -
ARGC       Number of arguments on the command line   -
ARGV       Array containing the list of arguments    -
ENVIRON    Assoc. array of all environment variables -
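Several of these variables can be seen working together in a short sketch. It assumes colon-separated input and overrides FS and OFS in a BEGIN block; NR and NF are maintained by awk for each record.

```shell
# FS splits input on ":"; OFS joins the comma-separated print arguments with "-".
# NR is the record number, NF the field count, $NF the last field.
out=$(printf '%s\n' 'a:b:c' 'd:e' |
  awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $NF }')
echo "$out"
```

Each output line shows the record number, the number of fields, the first field and the last field.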

Functions
awk has several built-in functions
  () are optional if there are no parameters, but using them is encouraged
Arithmetic functions
String functions

Arithmetic Functions
int(x)   x truncated to an integer
sqrt(x)  square root of x

String Functions
length()          length of complete line
length(x)         length of x
tolower(s)        returns s as lower case
toupper(s)        returns s as upper case
substr(str,m)     returns string starting at m to end of string
substr(str,m,n)   returns string starting at m for n characters
index(s1,s2)      finds the position of s2 inside s1
split(str,arr,ch) splits str into an array; the delimiter is ch
system("cmd")     executes a system (Linux) command and returns its exit status
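A brief sketch exercising several of these functions on one string; the sample value "Kombol,Tony" and the array name parts are chosen only for illustration.

```shell
out=$(awk 'BEGIN {
  s = "Kombol,Tony"
  print length(s)                  # number of characters in s
  print toupper(substr(s, 1, 6))   # first six characters, upper-cased
  print index(s, "Tony")           # position where "Tony" starts in s
  n = split(s, parts, ",")         # n is the number of pieces
  print n, parts[2]
}')
echo "$out"
```

Positions in substr and index are counted from 1, and split numbers the array elements from 1 as well.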

if
Syntax:
    if (cond true) {
        statements
    } else {
        statements
    }
Notes:
  else is optional
  {} not needed for single statements

for
Syntax form 1:
    for ( startval ; condition ; control ) statement
  C-like in form
  Example:
    for ( k=1 ; k<9 ; k++ ) print k
Syntax form 2:
    for ( var in array ) statement
  Will scan every var in the array
  Great for associative arrays
    Non-numeric indices
    Gaps in array
  See ENVIRON example in previous slide

While
Syntax:
    while (cond is true) {
        statement(s)
    }
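A sketch combining while with if/else: it walks the fields of one input line and counts how many values are above 5. The input values and the variable names over and under are assumptions made for the example.

```shell
out=$(echo '6 3 1 10 4' | awk '{
  i = 1; over = 0; under = 0
  while (i <= NF) {          # NF is the number of fields on the line
    if ($i > 5) over++       # single statements need no braces
    else under++
    i++
  }
  print over, under
}')
echo "$out"
```

Here $i refers to the field whose number is stored in i, so the loop visits every field in turn.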

continue and break
continue and break can be used inside loops
  for
  while
break stops the loop
continue stops processing the remaining statements in this iteration
  the loop continues with the next iteration
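The difference between the two statements shows up in a small C-style for loop. This sketch skips even values with continue and leaves the loop with break once the value exceeds 5; the variable names line and sep are hypothetical.

```shell
out=$(awk 'BEGIN {
  for (k = 1; k < 9; k++) {
    if (k % 2 == 0) continue   # skip the rest of this iteration
    if (k > 5) break           # leave the loop entirely
    line = line sep k          # collect surviving values on one line
    sep = " "
  }
  print line
}')
echo "$out"
```

The values 2, 4 and 6 are skipped by continue, and the loop ends at 7 because of break.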

Resources
Awk - A Tutorial and Introduction - by Bruce Barnett
  http://www.grymoire.com/Unix/Awk.html
Awk Tutorial - Main Page
  http://robert.wsi.edu.pl/awk/

Which is not a "scripting language"?
  Auk
  Awk
  Perl
  Pearl
  Bash
  Bam

Summary
awk is a "primitive" scripting language
  good for processing text files
  filtering
perl is a more modern replacement
  "religious war" over which is better
  if you understand awk it will be a good basis to understand perl