2000 Copyrights, Danielle S. Lahmani UNIX Tools G22.2245-001, Fall 2000 Danielle S. Lahmani Lecture 6.

1 UNIX Tools G22.2245-001, Fall 2000
Danielle S. Lahmani
email: lahmani@cs.nyu.edu
Lecture 6

2 Overview
- AWK
- SED

3 AWK
- Developed in 1978 at Bell Labs by Aho, Weinberger, and Kernighan.
- A pattern scanning and processing language.
- A programmable filter for text files.

4 AWK: a programming language that
- searches a set of files for patterns;
- performs specified actions on lines or fields that contain instances of those patterns;
- does not alter its input files;
- processes one input line at a time.

5 AWK: features
- Convenient numeric processing.
- Variables, general selection (based on patterns), and control flow in the actions.
- A convenient way of accessing fields within lines.

6 AWK: usage
Usage:
    awk 'program' [filename]*
    awk -f cmdfile [filename]*
The program is enclosed in single quotes so that the shell does not perform parameter substitution (for example, on $1) inside it.
program or cmdfile contains a set of statements of the form:
    pattern { action }
    …
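A minimal sketch of both invocation forms, assuming a POSIX shell; the sample input and the temporary program file are made up for the illustration. Each command prints the first field of every input line.

```shell
# Form 1: program on the command line. The single quotes keep the
# shell from substituting its own $1 into the program text.
printf 'alice 30\nbob 25\n' | awk '{ print $1 }'

# Form 2: the same program read from a file with -f.
# The temporary file stands in for a real cmdfile.
cmdfile=$(mktemp)
printf '{ print $1 }\n' > "$cmdfile"
printf 'alice 30\nbob 25\n' | awk -f "$cmdfile"
rm -f "$cmdfile"
```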

7 AWK: examples
Print the third and second columns of a table, in that order (the comma separates them with a space; { print $3 $2 } would run the two fields together):
    { print $3, $2 }
Print all lines in which the first field is different from the previous line's first field:
    $1 != prev { print; prev = $1 }
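Both examples can be tried on small made-up tables. Note the comma in `print $3, $2`: it inserts the output field separator (a space by default), whereas `$3 $2` would concatenate the two fields.

```shell
# Third and second columns, in that order.
printf '1 2 3\n4 5 6\n' | awk '{ print $3, $2 }'

# Print a line only when its first field differs from the previous
# line's first field; prev starts out as the empty string.
printf 'a 1\na 2\nb 3\na 4\n' | awk '$1 != prev { print; prev = $1 }'
```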

8 AWK: patterns
A pattern is a selector that determines whether the action is executed. A pattern can be:
- the special token BEGIN or END;
- a regular expression;
- a relational expression using arithmetic comparison operators;
- a string-valued expression;
- an arbitrary combination of the above.
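A short sketch of several pattern kinds applied to one small table; the table contents are invented for the example.

```shell
# Sample table (made up): name and quantity.
data='apple 3
pear 12
plum 7'

# Regular-expression pattern: lines that start with "p".
printf '%s\n' "$data" | awk '/^p/'

# Relational pattern: second field less than 10.
printf '%s\n' "$data" | awk '$2 < 10'

# A string comparison combined with a numeric one using ||.
printf '%s\n' "$data" | awk '$1 == "pear" || $2 == 3 { print $1 }'
```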

9 BEGIN and END patterns
BEGIN and END provide a way to gain control before and after processing, for initialization and wrap-up.
- BEGIN: the actions are performed before the first input line is read.
- END: the actions are performed after the last input line has been processed.
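A minimal sketch of initialization and wrap-up, summing a made-up column of numbers.

```shell
# BEGIN runs before the first line is read; the middle action runs once
# per line; END runs after the last line, when sum holds the total.
printf '3\n12\n7\n' | awk 'BEGIN { print "start" } { sum += $1 } END { print "total", sum }'
```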

10 AWK: actions
- An action may include a list of one or more C-like statements, as well as arithmetic and string expressions, assignments, and output to multiple output streams.
- The action is performed on every line that matches the pattern.
- If no pattern is provided, the action is performed on every input line.

11 AWK: actions (continued)
- If no action is provided, all matching lines are sent to standard output.
- Since patterns and actions are both optional, actions must be enclosed in braces to distinguish them from patterns.
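The two optional halves of a statement can be seen separately; the input words are made up for the illustration.

```shell
# Pattern with no action: matching lines are printed unchanged.
printf 'red\ngreen\nblue\n' | awk '/^b/'

# Action with no pattern: applied to every line (prints each line's length).
printf 'red\ngreen\nblue\n' | awk '{ print length($0) }'
```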

12 AWK: records
- Newline is the default record separator, so by default AWK processes its input one line at a time.
- NR is the built-in variable whose value is the number of the current record.
- RS is the built-in variable that holds the record separator.
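A sketch of NR and RS on invented input. The second command uses the POSIX convention that an empty RS separates records by blank lines ("paragraph mode"), in which newline also acts as a field separator.

```shell
# NR in the END action is the number of records read (here, lines).
printf 'x\ny\nz\n' | awk 'END { print NR }'

# RS = "": records are separated by blank lines.
# Prints the record number, its first field, and its field count.
printf 'a b\nc\n\nd\n' | awk 'BEGIN { RS = "" } { print NR, $1, NF }'
```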

13 AWK: fields
Each input line is split into fields.
- FS is the field separator; by default, fields are separated by blanks or tabs.
- The -Fc option sets FS to the character c.
- $0 is the entire line; $1 is the first field, $2 the second field, and so on.
- NF is a built-in variable whose value is the number of fields in the current line, so $NF is the last field.
- Only fields begin with $; variables are unadorned.
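A sketch of field splitting on made-up, passwd-like lines (the data is invented, not read from a real /etc/passwd).

```shell
# -F: makes ":" the field separator; NF is the field count, $NF the last field.
printf 'root:x:0:0\nalice:x:1000:1000\n' | awk -F: '{ print $1, NF, $NF }'

# Default FS: runs of blanks and tabs separate fields; leading blanks are skipped.
printf '   one \t two   three\n' | awk '{ print NF, $2 }'
```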

14 Printing: print, and printf for formatted output
- The following prints the first two fields in reverse order:
    print $2, $1
- The following numbers all the lines:
    $ awk '{ print NR, $0 }'
- Output may be diverted to multiple files (maximum 10 output files):
    { print $1 > "foo1"; print $2 > "foo2" }
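A sketch of formatted output and of the two-file diversion above, on invented input. The format string is an arbitrary choice; foo1 and foo2 are created in the current directory and removed at the end.

```shell
# printf works like C's printf and adds no newline of its own.
printf 'apple 3\npear 12\n' | awk '{ printf "%-6s|%4d\n", $1, $2 }'

# Divert the first field to foo1 and the second to foo2.
printf 'apple 3\npear 12\n' | awk '{ print $1 > "foo1"; print $2 > "foo2" }'
cat foo1 foo2
rm -f foo1 foo2
```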

15 Built-in functions
Built-in functions include:
- length, which computes the length of a string (with no argument, the length of the current line):
    { print length, $0 }
- substr(s, m, n), which produces the substring of s that begins at position m (positions start at 1) and is at most n characters long.
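A one-line sketch of both functions on a made-up line. It also uses the two-argument form substr(s, m), which returns the rest of s from position m.

```shell
# length($0) is the length of the whole line; substr positions start at 1.
echo 'hello world' | awk '{ print length($0), substr($0, 1, 5), substr($2, 2) }'
```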

16 Arithmetic and variables
- AWK variables take on numeric (floating-point) or string values according to context.
- User-defined variables are unadorned; they need not be declared.
- By default, user-defined variables are initialized to the null string, which has numeric value zero.
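A sketch showing undeclared variables and their null/zero initial value; the numbers are invented.

```shell
# sum is never declared: it starts as the null string, which is 0 in arithmetic.
printf '10\n20\n30\n' | awk '{ sum += $1 } END { print sum, sum / NR }'

# An unset variable prints as the empty string but acts as 0 in arithmetic.
awk 'BEGIN { print "[" x "]", x + 0 }'
```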

17 Flow of control statements
AWK supports most of the standard control structures of C. This program looks for pairs of identical adjacent words:
    NF > 0 {
        if ($1 == lastword)
            print "double:", $1, "line:", NR
        for (i = 2; i <= NF; i++)
            if ($i == $(i-1))
                print "double:", $i, "line:", NR
        lastword = $NF
    }

18 Arrays and associative arrays
- Array elements are not declared.
- Subscripts may have any non-null value, including non-numeric strings.
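A word-count sketch using strings as subscripts; the input text is made up. The order in which `for (w in count)` visits keys is unspecified, so the output is piped through sort.

```shell
# The words themselves are the subscripts of the undeclared array count.
printf 'to be or\nnot to be\n' |
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w, count[w] }' |
  sort
```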

19 SED: stream-oriented, non-interactive text editor
Typical usage:
- edit files too large for interactive editing;
- edit files of any size when the editing sequence is too complicated to type in interactive mode;
- perform multiple "global" editing functions efficiently in one pass through the input;
- edit multiple files automatically;
- write conversion programs (sed is a good tool for this).

20 SED usage
    sed 'list of ed commands' filenames...
sed reads one line at a time from the input files, applies the commands from the list, in order, to each line, and writes the edited form on standard output.

21 SED usage
    sed [-n] -e 'command' [file]*
    sed [-n] -f scriptfile [file]*
-n suppresses the default output (except for lines selected by the p command, or by the p flag of the s (substitute) command).
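A sketch of -n, multiple -e options, and -f, assuming a POSIX shell; the input lines and the temporary script file are invented for the demo.

```shell
# -n with p: only explicitly printed lines appear.
printf 'one\ntwo\nthree\n' | sed -n '/t/p'

# Several -e commands are applied, in order, to each line.
printf 'one\ntwo\n' | sed -e 's/o/0/' -e 's/t/T/'

# -f reads the commands from a script file (a temporary file here).
script=$(mktemp)
printf 's/one/1/\ns/two/2/\n' > "$script"
printf 'one\ntwo\n' | sed -f "$script"
rm -f "$script"
```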

22 SED: overall operation
(Reference: Unix in a Nutshell, O'Reilly)
- The input file is unchanged.
- sed processes one line at a time.
- It copies its input (the named files, or standard input) to standard output, possibly performing one or more editing commands on each input line.

23 SED: pattern and hold spaces
- Pattern space: the workspace or temporary buffer where a single line of input (or several lines, with the N command) is held while the editing commands are applied.
- Hold space: a secondary buffer used for temporary storage only (discussed later).

24 SED: conceptual overview
- Each line of input is copied into the pattern space, where addresses (pattern matches) are tested against it.
- Before any editing is done, all editing commands are compiled into a form that is more efficient during the execution phase.
- All editing commands in a sed script are applied, in order, to each input line.

25 SED: conceptual overview (cont.)
- If a command changes the input, the addresses of subsequent commands are applied to the current line in the pattern space, not to the original input line.
- The original input file is unchanged: editing commands modify a copy of the input, and the copy is sent to standard output (which can be redirected to a file).
- Editing commands are applied to all lines (globally) unless line addressing restricts the lines affected.
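A two-command sketch showing that later addresses see the edited pattern space, not the original line; the word "cat" is just sample input.

```shell
# The s command turns "cat" into "dog" in the pattern space, so the
# address /dog/ of the next command matches and the line is deleted.
echo 'cat' | sed -e 's/cat/dog/' -e '/dog/d'    # prints nothing

# In the opposite order, /dog/ is tested before the substitution.
echo 'cat' | sed -e '/dog/d' -e 's/cat/dog/'    # prints "dog"
```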

26 SED: general format of an editing command
    [address1[,address2]] function [arguments]
Addresses select lines for editing by
- line numbers (decimal integers);
- context addresses (regular expressions).

27 SED: regular expressions
- c: an ordinary character matches that character.
- ^ matches the beginning of the line.
- $ matches the end of the line.
- \n matches an embedded newline character, but not the newline at the end of the pattern space.
- . (period) matches any single character, but not newline.
- r* matches any number (zero or more) of occurrences of the regular expression r preceding the *.
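A sketch of anchors, the period, and the star, using sed -n with p to show which made-up lines match.

```shell
# ^ and $ anchor the match: only the line that is exactly "abc".
printf 'abc\nxabc\nabcx\n' | sed -n '/^abc$/p'

# . matches any single character.
printf 'abc\naXc\nac\n' | sed -n '/a.c/p'

# b* matches zero or more b's, so "ac" matches too.
printf 'ac\nabc\nadc\n' | sed -n '/ab*c/p'
```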

28 SED: regular expressions (cont.)
- [...] matches any one of the characters in ...
- [^...] matches any character not in ...
- r1r2 matches r1 followed by r2 (concatenation).
- \(...\) is a tagged regular expression.
- \d matches the same string of characters matched by an expression enclosed in \( and \) earlier in the same pattern; d is a single digit.
- // (the null regular expression) is equivalent to the last regular expression compiled.
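A sketch of bracket expressions, a tagged expression with a back-reference, and the null regular expression; all input lines are invented.

```shell
# [...] and [^...]: lines whose first character is / is not a digit.
printf '1x\nab\n' | sed -n '/^[0-9]/p'
printf '1x\nab\n' | sed -n '/^[^0-9]/p'

# \(.\)\1: lines containing the same character twice in a row.
printf 'book\ncat\n' | sed -n '/\(.\)\1/p'

# The empty regex // reuses the last one compiled (here /cat/).
echo 'a cat' | sed '/cat/s//dog/'
```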

29 Sed: examples
- sed -n '$p' prints the last line of the last input file.
- sed -n '1p' prints the first line of the first input file.
- sed -n '/pattern/p' prints the lines containing pattern.

30 Sed: pattern addressing
If the command has ...            then the command is applied to ...
No address                        each input line.
One address                       all lines that match the address. Some commands accept only one address: a, i, r, q, and =.
Two comma-separated addresses     the first matching line and all succeeding lines up to and including a line matching the second address.
An address followed by !          all lines that do not match the address.
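A sketch of line-number ranges, a context address paired with $, and negation with !, on five made-up lines.

```shell
lines='a
b
c
d
e'
# Two line-number addresses: lines 2 through 4.
printf '%s\n' "$lines" | sed -n '2,4p'

# From the first line matching /d/ through the last line ($).
printf '%s\n' "$lines" | sed -n '/d/,$p'

# ! negates the address: delete every line that does not match /c/.
printf '%s\n' "$lines" | sed '/c/!d'
```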

31 SED: number of addresses (cont.)
Braces {} are used to apply multiple commands to one address or address pair:
    [/pattern1/][,/pattern2/]{
    command1
    command2
    }
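A sketch of grouping two commands under one address pair; the markers "begin" and "end" are invented. The `;` before `}` keeps the one-line form acceptable to both GNU and BSD sed.

```shell
# For each line from /begin/ through /end/, prefix "> " and print it.
printf 'x\nbegin\ny\nend\nz\n' | sed -n '/begin/,/end/{s/^/> /;p;}'
```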

32 SED: whole-line-oriented functions
- delete: d
- append: a
- change: c
- substitute: s
- insert: i
- next: n

33 SED: whole-line-oriented functions
DELETE: [address1][,address2]d
Deletes the addressed line(s) from the pattern space; the line(s) are not passed to standard output. A new line of input is read, and editing resumes with the first command of the script.
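A sketch of d on a line range, and of the fact that a deleted line never reaches later commands; the numbered input is invented.

```shell
# Delete lines 2 and 3.
printf '1\n2\n3\n4\n' | sed '2,3d'

# The s command runs only on lines that survived d (1 and 4).
printf '1\n2\n3\n4\n' | sed -e '2,3d' -e 's/^/line /'
```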

34 SED: whole-line functions: APPEND
    [address]a\
    text
Appends text after each line matched by address. The text is not placed in the pattern space, so subsequent commands cannot be applied to it (and it does not change the line-number counter).

35 SED: whole-line functions: INSERT
    [address]i\
    text
Inserts text before each line matched by address. The text is treated as with the a function.

36 SED: whole-line functions (cont.): CHANGE
    [address1][,address2]c\
    text
Replaces the lines selected by the address with text. The contents of the pattern space are deleted, so no subsequent editing can be applied to it or to the new text.
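A sketch of a, i, and c in one multi-line script, using the portable form where the text follows the backslash on the next line; the input and text lines are made up.

```shell
# a queues text after "one", i writes text before "two",
# and c replaces "three" entirely.
printf 'one\ntwo\nthree\n' | sed '/one/a\
after one
/two/i\
before two
/three/c\
replaced three'
```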

37 SED: whole-line functions: n
n writes the current line to the output (unless default output is suppressed), then reads the next input line into the pattern space, replacing the current line. Control passes to the command following n instead of resuming at the top of the script.
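A sketch of n: keep only the odd-numbered lines of a made-up six-line input.

```shell
# n prints line 1 and reads line 2; d deletes line 2. The next cycle
# starts with line 3, so only odd-numbered lines are output.
printf '1\n2\n3\n4\n5\n6\n' | sed 'n;d'
```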

38 SED: s (substitute) function
    [address]s/pattern/replacement/[flags]
Substitutes replacement for pattern on each addressed line. There can be 0, 1, or 2 addresses.

39 SED: s (substitute) flags
Flags that modify the substitution can be:
- n: a number (1 to 512); replace only the nth occurrence of pattern.
- g: replace all instances of pattern on each addressed line, not just the first.
- p: print the pattern space if a replacement was made.
- w file: write the pattern space to file if a replacement was made. A maximum of 10 different files can be opened.
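A sketch of the numeric, g, and p flags on invented input.

```shell
echo 'aaa' | sed 's/a/b/'      # first occurrence only
echo 'aaa' | sed 's/a/b/2'     # second occurrence only
echo 'aaa' | sed 's/a/b/g'     # every occurrence

# With -n, p prints only the lines where a substitution was made.
printf 'cat\ndog\n' | sed -n 's/cat/CAT/p'
```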

40 SED: substitute function (cont.)
replacement is a string of characters that may contain these special metacharacters:
- & is replaced by the string matched by pattern.
- \d is replaced by the dth substring (d is a single digit) previously enclosed in \( and \) in pattern.
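A sketch of & and \d in the replacement; the name used as input is invented.

```shell
# & stands for the whole matched string.
echo 'abc' | sed 's/b/[&]/'

# \2 and \1 reuse the two tagged substrings, swapping the words.
echo 'John Smith' | sed 's/\(.*\) \(.*\)/\2, \1/'
```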

41 SED: input-output functions
- p: print the pattern space.
- w file: write the pattern space to file.
- r file: copy the contents of file to the output after the current line.
- q: quit; the current pattern space is written (unless -n is given) and no further input is read.
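A sketch of q, r, and w, assuming a POSIX shell with mktemp; the temporary files stand in for real file names.

```shell
# q after line 2: line 2 is still printed, then sed stops.
printf '1\n2\n3\n' | sed 2q

# r: copy a file's contents to the output after line 1.
extra=$(mktemp)
echo 'inserted' > "$extra"
printf '1\n2\n' | sed "1r $extra"

# w: write the lines matching /b/ to a file; -n suppresses normal output.
out=$(mktemp)
printf 'a\nb\n' | sed -n "/b/w $out"
cat "$out"
rm -f "$extra" "$out"
```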

42 SED: line information
- = displays the line number of a line.
- l displays the line with control characters shown in ASCII form.
- p displays the line.
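A sketch of = and l on made-up input; `sed -n '$='` is a common way to count lines.

```shell
# = at the last line prints the line count.
printf 'a\nb\nc\n' | sed -n '$='

# = on a context address prints the number of the matching line.
printf 'a\nb\n' | sed -n '/b/='

# l shows the tab as \t and marks the end of the line with $.
printf 'a\tb\n' | sed -n l
```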

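43 Flow of control functions
- ! (don't): apply the command to the lines that do not match the address.
- { grouping.
- b label: branch to label, or to the end of the script if no label is given.
- t label: same as b, but branch only if a substitution has been made since the last input line was read or the last t was executed.
- :label: place a label to be branched to by t or b.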
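A sketch of b and t, assuming a POSIX sed; the comment-skipping rule and the thousands-separator loop are illustrative choices, not part of the slides. Each label is given its own -e so that BSD sed does not read the rest of the line as part of the label.

```shell
# b with no label jumps to the end of the script, so lines starting
# with # skip the substitution.
printf '#a\na\n' | sed -e '/^#/b' -e 's/a/A/'

# t loops back to :a while the s command keeps succeeding, inserting a
# comma before each trailing group of three digits.
echo 1234567 | sed -e ':a' -e 's/^\([0-9]*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta'
```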

44 Sed drawbacks (reference: The Unix Programming Environment, Kernighan & Pike)
- It is hard to remember text from one line to another.
- It is not possible to go backward in the file.
- There is no way to do forward references like /..../+1.
- There are no facilities for manipulating numbers.

45 SED: multiline input-output functions
These functions, spelled with capital letters, deal with pattern spaces containing embedded newlines, so that patterns can be matched across lines of the input.
- N: the next input line is appended to the current line in the pattern space (creating an embedded newline).
- D: delete the first part of the pattern space, up to and including the first embedded newline, then restart the script with what remains.
- P: print the first part of the pattern space, up to the first embedded newline.
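A sketch of N, P, and D on invented input. The second command is a well-known sed idiom that emulates `uniq` by comparing each line with the next one inside a two-line pattern space.

```shell
# N joins pairs of lines; s replaces the embedded newline with a space.
printf '1\n2\n3\n4\n' | sed 'N;s/\n/ /'

# $!N: append the next line (except at the last line).
# P prints the first line unless it equals the second; D deletes it and
# restarts the script with the second line still in the pattern space.
printf 'a\na\nb\nc\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D'
```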

46 Hold and get functions
- h (hold pattern space): copies the contents of the pattern space into the hold space, replacing its previous contents.
- H (hold pattern space): appends a newline and the contents of the pattern space to what is already in the hold space.

47 Hold and get functions (cont.)
- g (get contents of hold space): copies the contents of the hold space into the pattern space, destroying the previous contents of the pattern space.
- G (get contents of hold space): appends the contents of the hold space to the contents of the pattern space; the former and new contents are separated by a newline.
- x: exchanges the contents of the hold space and the pattern space.
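A sketch of the hold space on invented input. The first command is a common idiom that reverses the lines of its input (like `tac`); the second prints the line that precedes each line containing "b".

```shell
# For every line after the first, G appends the saved (reversed) lines;
# h saves the result; at the last line, p prints everything reversed.
printf '1\n2\n3\n' | sed -n '1!G;h;$p'

# h keeps a copy of each line; on a line matching /b/, x swaps in the
# previous line, p prints it, and the second x swaps back.
printf 'a\nb\nc\n' | sed -n -e '/b/{x;p;x;}' -e 'h'
```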
