Xiaolan Zhang, Spring 2013, CISC3130: awk

Presentation transcript:


Outline
- Overview
- awk command line
- awk program model: record & field, pattern/action pair
- awk program elements: variable, statement
- Variable, Expression, Function
- Numeric operators
- String functions
- Array variable
- Function
- User-controlled input
- Input/Output Redirection
- External command

awk: what is it?
- A programming language designed to simplify many common text-processing tasks
- Online manual: info system vs. man system
- Version issue: old awk (before the mid-1980s) and new awk (after)
  - awk, oawk, nawk, gawk, mawk …

Overview
    awk [ -F fs ] [ -v var=value ... ] 'program' [ -- ] [ var=value ... ] [ file(s) ]
    awk [ -F fs ] [ -v var=value ... ] -f programfile [ -- ] [ var=value ... ] [ file(s) ]
- -F option: specifies the field separator
- Program: consists of pairs of pattern and braced action, provided on the command line or in a file, e.g.,
    /zhang/ {print $3}
    NR<10 {print $0}
- Initialization (var=value):
  - With the -v option: takes effect before the program is started
  - Other: may be interspersed with filenames, i.e., applies to the files supplied after it
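The command-line form above can be tried with a short sketch like the following. The sample data and the variable name label are invented for illustration; only the -F and -v options described on the slide are used.

```shell
# Invented sample data: colon-separated fields, similar in shape to /etc/passwd.
data='zhang:x:1001
smith:x:1002'

# -F sets the field separator; -v assigns a variable before the program starts.
out=$(printf '%s\n' "$data" | awk -F ':' -v label='uid' '{ print label, $1, $3 }')
echo "$out"
```

Each output line holds the label followed by the first and third fields of the record.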

awk script/program
An executable file:
    #!/bin/awk -f
    BEGIN {
        lines = 0
        total = 0
    }
    {
        lines++
        total += $1
    }
    END {
        if (lines > 0)
            print "average is ", total/lines
        else
            print "no records"
    }
Demo: $ average.awk avg.data

awk programming model
- Input: awk views an input stream as a collection of records, each of which can be further subdivided into fields.
  - Normally, a record is a line, and a field is a word of one or more non-whitespace characters.
  - However, what constitutes a record and a field is entirely under the control of the programmer, and their definitions can even be changed during processing.
- Input is switched automatically from one input file to the next, and awk itself normally handles the opening, reading, and closing of each input file.
  - The programmer does not need to worry about this.

awk program
- An awk program consists of pairs of patterns and braced actions, possibly supplemented by functions that implement actions.
- For each pattern that matches the input, the action is executed; all patterns are examined for every input record.
    pattern { action }    ## Run action if pattern matches
- Either part of a pattern/action pair may be omitted.
  - If the pattern is omitted, the action is applied to every input record:
    { action }            ## Run action for every record
  - If the action is omitted, the default action is to print the matching record on standard output:
    pattern               ## Print record if pattern matches
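The three forms of a pattern/action pair can be compared side by side. This is a minimal sketch on invented three-line input, not part of the original slides.

```shell
# Invented sample input: a name and a number per line.
data='alpha 1
beta 22
gamma 3'

# Pattern only: the default action prints each matching record.
only_pattern=$(printf '%s\n' "$data" | awk '$2 > 2')

# Action only: the action runs for every record.
only_action=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Pattern and action together.
both=$(printf '%s\n' "$data" | awk '/^b/ { print NR ": " $0 }')

printf '%s\n---\n%s\n---\n%s\n' "$only_pattern" "$only_action" "$both"
```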

Awk pattern
- Pattern: a condition that specifies which records the associated action should be applied to
  - A string and/or numeric expression: if it evaluates to nonzero or nonempty (true) for the current input record, the associated action is carried out.
  - Or a regular expression (ERE) to match against the input record: /regexp/ is the same as $0 ~ /regexp/
- Examples:
    NF == 0                                   Select empty records
    NF > 3                                    Select records with more than 3 fields
    NR < 5                                    Select records 1 through 4
    (FNR == 3) && (FILENAME ~ /[.][ch]$/)     Select record 3 in C source files
    $1 ~ /jones/                              Select records with "jones" in field 1
    /[Xx][Mm][Ll]/                            Select records containing "XML", ignoring lettercase
    $0 ~ /[Xx][Mm][Ll]/                       Same as preceding selection
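A few of these pattern styles can be exercised on a small input. The data below is invented, and the case-insensitive match on "jones" is an assumed variation of the slide's examples.

```shell
# Invented sample input; the second line is empty on purpose.
data='jones 10

smith 20 extra fields here
JONES 30'

# Count empty records (NF == 0).
empty=$(printf '%s\n' "$data" | awk 'NF == 0 { n++ } END { print n }')

# Select records with more than 3 fields.
wide=$(printf '%s\n' "$data" | awk 'NF > 3 { print $1 }')

# Match field 1 against a regular expression, ignoring lettercase.
jones=$(printf '%s\n' "$data" | awk '$1 ~ /[Jj][Oo][Nn][Ee][Ss]/ { print $2 }')

echo "$empty / $wide / $jones"
```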

BEGIN, END pattern
- BEGIN pattern: the associated action is performed just once, before any command-line files or ordinary command-line assignments are processed, but after any leading -v option assignments have been done.
  - Normally used to handle special initialization tasks
- END pattern: the associated action is performed just once, after all of the input data has been processed.
  - Normally used to produce summary reports or to perform cleanup actions
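The timing difference between -v assignments and ordinary command-line assignments can be observed directly. In this sketch the variable names a and b are invented, and the file operand - stands for standard input.

```shell
# a is set with -v, so it is visible in BEGIN.
# b is an ordinary command-line assignment, so it is set only when awk
# reaches it in the argument list, after BEGIN has already run.
out=$(printf 'line\n' | awk -v a=1 '
    BEGIN { print "BEGIN: a=" a ", b=" b }
          { print "main: a=" a ", b=" b }
' b=2 -)
echo "$out"
```

In the BEGIN line b is still empty; in the main action it has the value 2.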

Action
- Enclosed by braces
- Statements: separated by newline or ;
  - Assignment statement:
        line = 1
        sum = sum + value
  - print statement:
        print "sum=", sum
  - if statement, if/else statement
  - while loop, do/while loop, for loop (three-part form, and one-part form: for (key in array))
  - break, continue

    $0                 the current record
    $1, $2, …, $NF     the first, second, …, last field of the current record

Simple one-line awk programs
- Using awk to cut:
    awk -F ':' '{print $1,$3;}' /etc/passwd
- To simulate head (first 10 lines):
    awk 'NR<=10 {print $0}' /etc/passwd
- To count lines:
    awk 'END {print NR}' /etc/passwd
- What's my UID (numerical user id)?
    awk -F ':' '/^zhang/ {print $3}' /etc/passwd

Doing something new
- Output the logarithm of the numbers in the first field:
    echo 10 | awk '{print $1, log($1)}'
- Sum all fields together:
    awk '{sum=0; for (i=1;i<=NF;i++) sum+=$i; print sum}' data2
- How about a weighted sum? Four fields with weight assignments (0.1, 0.3, 0.4, 0.2):
    awk '{sum= $1*0.1+$2*0.3+$3*0.4+$4*0.2; print sum}' data2
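The field sum and the weighted sum can be checked on a small input. The two data lines below are invented stand-ins for the data2 file, whose contents are not shown in the slides.

```shell
# Invented stand-in for data2: four numeric fields per line.
data='10 20 30 40
1 1 1 1'

# Sum all fields of each record; the loop must include field NF.
sums=$(printf '%s\n' "$data" |
    awk '{ sum = 0; for (i = 1; i <= NF; i++) sum += $i; print sum }')

# Weighted sum with weights 0.1, 0.3, 0.4, 0.2.
weighted=$(printf '%s\n' "$data" |
    awk '{ print $1*0.1 + $2*0.3 + $3*0.4 + $4*0.2 }')

printf '%s\n%s\n' "$sums" "$weighted"
```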


Awk variables
- Differences from C/C++ variables:
  - Initialized to 0, or the empty string
  - No need to declare; variable types are decided based on context
  - All variables are global (even those used in functions, except function parameters)
- Difference from shell variables:
  - Referenced without $, except for $0, $1, …, $NF
- Conversion between numeric value and string value:
    N = 123; s = "" N       ## s is assigned "123"
    S = "123"; N = 0 + S    ## N is assigned 123
- Floating-point arithmetic operations:
    awk '{print $1 "F=" ($1-32)*5/9 "C"}' data
    echo 38 | awk '{print $1 "F=" ($1-32)*5/9 "C"}'
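The two conversion idioms and the floating-point arithmetic can be combined in one short run. The variable names and the string "42abc" are invented for illustration; it is included to show that only the leading numeric part of a string is used in arithmetic.

```shell
out=$(echo 38 | awk '{
    n = 123
    s = "" n          # number to string by concatenation with ""
    t = "42abc"       # invented example: string with a numeric prefix
    # 0 + t forces numeric conversion; ($1 - 32) * 5 / 9 is floating point
    printf "%s %d %.1fC\n", s "!", 0 + t, ($1 - 32) * 5 / 9
}')
echo "$out"
```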


Working with strings
- length(a): returns the length of string a
- substr(a, start, len): returns a copy of the substring of a that is len characters long, starting at the start-th character
    substr("abcde", 2, 3) returns "bcd"
- toupper(a), tolower(a): lettercase conversion
- index(a, find): returns the starting position of find in a, or 0 if find does not occur in a
    index("abcde", "cd") returns 3
- match(a, regexp): matches string a against regular expression regexp; returns the index of the match if matching succeeds, otherwise returns 0
  - Similar to (a ~ regexp), which returns 1 or 0
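The string functions listed above can be tried in a BEGIN-only program, which needs no input. As an addition to the slide, this sketch also prints RSTART and RLENGTH, the standard variables that match() sets to the position and length of the match.

```shell
out=$(awk 'BEGIN {
    s = "abcde"
    print length(s)             # 5
    print substr(s, 2, 3)       # bcd
    print toupper(s)            # ABCDE
    print index(s, "cd")        # 3
    pos = match(s, /[cd]+/)     # match also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH  # 3 3 2
    print match(s, /x/)         # 0: no match
}')
echo "$out"
```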

String matching
- Two operators: ~ (matches) and !~ (does not match)
- "ABC" ~ "^[A-Z]+$" is true, because the left string contains only uppercase letters, and the right regular expression matches any string of (ASCII) uppercase letters
- A regular expression can be delimited by either quotes or slashes: "ABC" ~ /^[A-Z]+$/

Working with strings: substitute
- sub(regexp, replacement, target)
- gsub(regexp, replacement, target) -- global
- Matches target against regexp, and replaces the leftmost (sub) or all (gsub) longest matches by the string replacement
- E.g., gsub(/[^$-0-9.,]/, "*", amount)
  - Replaces illegal characters in amount with *
- To extract a quoted string constant from a line:
    sub(/^[^"]+"/, "", value)    ## replace everything up to and including the first " by the empty string
    sub(/".*$/, "", value)       ## replace the next " and everything after it by the empty string
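The difference between sub and gsub, and the two-step extraction of a quoted string, can be seen in a short sketch. The strings "banana" and the name = "awk" line are invented test values.

```shell
out=$(awk 'BEGIN {
    s = "banana"
    t = s; sub(/an/, "AN", t)         # leftmost match only
    u = s; n = gsub(/an/, "AN", u)    # all matches; returns the count
    print t
    print u, n

    # Extract the text between the first pair of double quotes.
    value = "name = \"awk\" rest"
    sub(/^[^"]+"/, "", value)         # drop everything through the first quote
    sub(/".*$/, "", value)            # drop the closing quote and the rest
    print value
}')
echo "$out"
```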

Working with strings: splitting
- split(string, array, regexp): breaks string into pieces stored in array, using the delimiter given by regexp; returns the number of pieces
    function split_path(target)
    {
        n = split(target, paths, "/")
        for (k = 1; k <= n; k++)
            print paths[k]
        ## Alternative way to iterate through the array (order is not guaranteed):
        ## for (path in paths)
        ##     print paths[path]
    }
Demo: string.awk
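A quick way to see what split produces is to print each piece with its index. The path below is an invented example; note that a leading delimiter yields an empty first piece.

```shell
out=$(awk 'BEGIN {
    # split returns the number of pieces stored in the array
    n = split("/usr/local/bin", parts, "/")
    for (k = 1; k <= n; k++)
        printf "%d:[%s]\n", k, parts[k]
}')
echo "$out"
```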

String formatting
- sprintf(), printf()
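As a small illustration of the two functions: sprintf returns the formatted string, while printf writes it to output. The format string and values here are invented; the conversions used (%s, %f, %d with width, precision and zero padding) follow the C printf conventions.

```shell
out=$(awk 'BEGIN {
    # %-6s: left-justified in 6 columns; %5.2f: width 5, 2 decimals;
    # %03d: zero-padded to 3 digits
    s = sprintf("%-6s|%5.2f|%03d", "pi", 3.14159, 7)
    printf "%s\n", s
}')
echo "$out"
```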

Outline
- Overview
- awk command line
- awk program model: record & field, pattern/action pair
- awk program elements: variable, statement
- Variable, Expression, Function
- Numeric operators
- String functions
- Command line arguments
- Array variable
- Function
- User-controlled input
- Input/Output Redirection
- External command

Awk: command line arguments
Recall the following key points about awk:
- Command line syntax:
    awk [ -F fs ] [ -v var=value ... ] 'program' [ -- ] [ var=value ... ] [ file(s) ]
    awk [ -F fs ] [ -v var=value ... ] -f programfile [ -- ] [ var=value ... ] [ file(s) ]
- Program model:
  - awk by default opens each file specified on the command line, reads one record at a time, and executes all matching actions in the program

Awk: command line arguments
- Run copy_awk
- Read the test.awk command, and test it:
    test.awk file1 file2 … filen
- What happens and why?
- Now try calling:
    test.awk file1 file2 targetfile=file3 v=3


awk array variables
- Arrays can be indexed using integers or strings (associative arrays)
- For example, ARGV[0], ARGV[1], …, ARGV[ARGC-1]
- Demonstrated using the example of grade calculation
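String-indexed arrays are often used for counting. The sketch below counts word occurrences on invented input; because the order of for (key in array) is unspecified, the result is passed through sort to make it stable.

```shell
out=$(printf 'a b a\nc a b\n' | awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }     # index the array by word
    END { for (w in count) print w, count[w] }    # iteration order unspecified
' | sort)
echo "$out"
```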

Associative array
Suppose the input file is as follows:
    ## weights
    A 90 ## A if total is greater than or equal to 90
    B 80
    C 70
    D 60
    F 0
    alice
    jack
    smith
    john
    zack

weighted_array.awk
    #!/bin/awk -f
    NR == 1 {                        ## read the weights
        for (num = 1; num <= NF; num++)
            w[num] = $num
    }
    /^[A-F] / {                      ## read the letter-grade mapping
        thresh[$1] = $2              ## thresholds, indexed by letter grade
    }
    /^[a-z]/ {                       ## executed once for each student line
        sum = 0
        for (col = 2; col <= NF; col++)
            sum += ($col * w[col-1])
        printf("%s %d ", $0, sum)
        if (sum >= thresh["A"]) print "A"
        else if (sum >= thresh["B"]) print "B"
        else if (sum >= thresh["C"]) print "C"
        else if (sum >= thresh["D"]) print "D"
        else print "F"
    }
Note: $ is needed when referring to fields in the record; no $ for other variables!


Awk user-defined function
- Can be defined anywhere: before, after, or between pattern/action groups
  - Convention: placed after pattern/action code, in alphabetical order
- Definition:
    function name(arg1, arg2, …, argn)
    {
        statement(s)
    }
- Call:
    name(exp1, exp2, …, expn);
    result = name(exp1, exp2, …, expn);
- Named arguments: local variables of the function; they hide global variables with the same name
- return statement: return expr
  - Terminates the current function and returns control to the caller with the value of expr
  - Default value: 0 or "" (empty string)
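A minimal sketch of defining a function, returning a value, and using the result in an action. The function name max and the input numbers are invented for illustration; adding 0 to each argument forces a numeric comparison.

```shell
out=$(printf '3 9\n7 2\n' | awk '
    # Return the larger of two values, compared as numbers.
    function max(a, b) { return (a + 0 > b + 0) ? a : b }
    { print max($1, $2) }
')
echo "$out"
```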

Variable and argument
    function a(num)
    {
        for (n = 1; n <= num; n++)
            printf("%s", "*");
    }
    {
        n = $1
        a(n)
        print n
    }
Warning: variables used in a function body but not included in the argument list are global variables.
Todo:
1. What's the output?
    echo 3 | awk -f global_var.awk
2. Try it …

Solution: make n a local variable
- It is hard to avoid variables with the same name, especially i, j, k, ...
    function a(num,    n)
    {
        for (n = 1; n <= num; n++)
            printf("%s", "*");
    }
    {
        n = $1
        a(n)
        print n
    }
- Convention: list non-argument local variables last, with extra leading spaces
Todo:
1. What's the output now?
    echo 3 | awk -f global_var.awk
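The effect of the extra parameter can be checked by running both versions inline. This sketch reuses the function from the slides as a one-line program instead of a script file.

```shell
# n is global: the loop inside a() overwrites the caller's n (it ends at 4).
global=$(echo 3 | awk '
    function a(num) { for (n = 1; n <= num; n++) printf "*" }
    { n = $1; a(n); print n }')

# n is an extra parameter, hence local: the caller's n is untouched.
local_n=$(echo 3 | awk '
    function a(num,    n) { for (n = 1; n <= num; n++) printf "*" }
    { n = $1; a(n); print n }')

printf '%s\n%s\n' "$global" "$local_n"
```

The first run prints three stars followed by 4, the second prints three stars followed by 3.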

Awk function: factoring.awk
    #!/bin/awk -f
    function factor(number)
    {
        factors = ""                  ## initialize string storing the factoring result
        m = number                    ## m: remaining part to be factored
        for (i = 2; (m > 1) && (i^2 <= m); )    ## try i; i starts from 2 and goes up to sqrt of m
        {
            ## code omitted …
        }
        if (m > 1 && factors != "")   ## if m is not yet 1,
            factors = factors " * " m
        print number, (factors == "") ? " is prime " : (" = " factors)
    }
    { factor($1); }    ## call factor function to factor the first field of each record
Do these:
1. Test it: echo 2013 | factoring.awk
2. Modify it to return the factors string, instead of printing it
3. Add a function, isPrime. Hint: you can call factor()
4. For each line of input, count the number of prime numbers in the line


User-controlled Input
- Usually, one does not worry about reading from files
  - You specify what to do with each line of input
- Sometimes, you want to:
  - Read the next record, in order to process the current one …
  - Read different files:
    - Dictionary files versus text files (to spell check): need to load the dictionary files first …
  - Read a record from a pipeline
- Use getline


Usage of getline
- Interactive awk:
    $ awk 'BEGIN {print "Hi:"; getline answer; print "You said: ", answer;}'
    Hi:
    Yes?
    You said:  Yes?
- To load a dictionary:
    nwords = 1
    while ((getline words[nwords] < dictfile) > 0)    ## dictfile: dictionary file name (not preserved in this transcript)
        nwords++
- To set the current time into a variable:
    "date" | getline now
    close("date")
    print "time is now: " now
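Both getline forms, reading from a command and reading from a file, can be tried without any input data. In this sketch the temporary file, the variable names, and the echo command are invented for illustration.

```shell
# Read one line from a command through a pipe.
piped=$(awk 'BEGIN {
    cmd = "echo hello from a pipe"
    cmd | getline line
    close(cmd)                 # close the pipe once done with it
    print "got: " line
}')

# Read every line of a file; getline returns 1 per record, 0 at end of file.
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"
count=$(awk -v f="$tmp" 'BEGIN {
    while ((getline w < f) > 0)
        n++
    close(f)
    print n
}')
rm -f "$tmp"

printf '%s\n%s\n' "$piped" "$count"
```

The parentheses around (getline w < f) matter: without them the comparison with 0 would bind to the file name instead of the return value.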

Output redirection: to files
    #!/bin/awk -f
    ## usage: copy.awk file1 file2 … filen target=targetfile
    BEGIN {
        if (ARGC < 2) {
            print "Usage: copy.awk files... target=target_file_name"
            exit
        }
        for (k = 0; k < ARGC; k++)            ## access command line arguments
            if (ARGV[k] ~ /^target=/) {       ## extract target file name
                target_file = substr(ARGV[k], 8)
            }
        printf "" > target_file               ## create or truncate the target file
        close(target_file)
    }
    { print FILENAME, $0 >> target_file }
    END { close(target_file) }    ## optional, as files will be closed upon termination
Todo:
1. Try copy.awk out

Output redirection: to pipeline
    #!/bin/awk -f
    # demonstrate using pipeline
    BEGIN { FS = ":" }
    {   # select username for users using bash
        if ($7 ~ "/bin/bash")
            print $1 >> "tmp.txt"
    }
    END {
        close("tmp.txt")                      ## finish writing before reading the file back
        while ((getline < "tmp.txt") > 0) {
            cmd = "mail -s Fellow_BASH_USER " $0
            print "Hello," $0 | cmd           ## send an email to every bash user
            close(cmd)
        }
        close("tmp.txt")
    }
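A side-effect-free way to try output to a pipeline is to pipe records into sort instead of mail. This is a sketch on invented input, not the slide's program; closing the pipe in END makes awk wait for sort to finish before printing the last line.

```shell
out=$(printf 'b\nc\na\n' | awk '
    { print $0 | "sort" }        # every record goes to one sort process
    END {
        close("sort")            # sort writes its output when the pipe closes
        print "done"
    }
')
echo "$out"
```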

Execute external command
- Using the system function (similar to C/C++)
  - E.g., system("rm -f tmp") to remove a file:
        if (system("rm -f tmp") != 0)
            print "failed to rm tmp"
- A shell is started to run the command line passed as argument
  - It inherits the awk program's standard input/output/error
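The return value of system can be checked with commands that are known to succeed or fail. This sketch uses the standard true and false utilities rather than rm, so it changes nothing on disk.

```shell
out=$(awk 'BEGIN {
    # system returns the exit status of the command: 0 means success
    if (system("false") != 0)
        print "false failed"
    if (system("true") == 0)
        print "true succeeded"
}')
echo "$out"
```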
