Xuan Guo Chapter 3: Utilities for Power Users Graham Glass and King Ables, UNIX for Programmers and Users, Third Edition, Pearson Prentice Hall, 2003.


1 Xuan Guo Chapter 3: Utilities for Power Users Graham Glass and King Ables, UNIX for Programmers and Users, Third Edition, Pearson Prentice Hall, 2003. CSc 3320 1

2 Xuan Guo Regular Expression CSc 3320 2
Suppose we have a 10,000-line text file and want to search it for words.
Query 1: Words of the form “aa _ _ cc”
Query 2: The words “atlanta” or “Atlanta”
Query 3: Words containing three or more consecutive “y” characters

3 Xuan Guo Regular Expression CSc 3320 3
[Figure: Query 1, Query 2 and Query 3 are each written as a Regular Expression; labels: Regular Expression Engine, Application]

4 Xuan Guo Regular Expression CSc 3320 4 1. Vi 2. Sed, Awk, Grep 3. Java, C# http://www.zytrax.com/tech/web/regex.htm

5 Xuan Guo Regular Expression CSc 3320 5
Query 1: Words of the form “aa _ _ cc”: aa..cc
Query 2: The words “atlanta” or “Atlanta”: [aA]tlanta or (atlanta|Atlanta)
Query 3: Words containing three or more consecutive “y” characters: (y){3,}
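
The three queries can be tried with grep -E. This is a minimal sketch: the word list is hypothetical (it is not the 10,000-line file from the slide), and the variable names q1, q2, q3 are only for illustration.

```shell
# Hypothetical word list, one word per line.
words=$(mktemp)
printf '%s\n' aaxycc aacc atlanta Atlanta ATLANTA yy yyyy > "$words"

# Query 1: "aa", then any two characters, then "cc"
q1=$(grep -E 'aa..cc' "$words")
# Query 2: "atlanta" or "Atlanta"
q2=$(grep -E '[aA]tlanta' "$words")
# Query 3: three or more consecutive "y" characters
q3=$(grep -E '(y){3,}' "$words")

echo "Query 1:" $q1
echo "Query 2:" $q2
echo "Query 3:" $q3

rm -f "$words"
```

Note that "aacc" is rejected by Query 1 because each dot must consume exactly one character.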

6 Xuan Guo More Example [ab] [a-z] [A-Z] [0-9] \d [^0-9] [a-z 0-9] (ae|bd) a? a+ a* (ab){3,5} (ab){3,} (ab){3} CSc 3320 6

7 Xuan Guo Other Issues CSc 3320 7 1. Anchors: ^, $ 2. Metacharacters: [, ], {, }, \, ^, $, ?, *, +, ., (, )

8 Xuan Guo Exercise CSc 3320 8 1 Which of the following matches regexp a(ab)*a 1) abababa 2) aaba 3) aabbaa 4) aba 5) aabababa

9 Xuan Guo Exercise CSc 3320 9 2 Which of the following matches regexp ab+c? 1) abc 2) ac 3) abbb 4) bbc

10 Xuan Guo Exercise CSc 3320 10 3 Which of the following matches regexp a.[bc]+ 1) abc 2) abbbbbbbb 3) azc 4) abcbcbcbc 5) ac 6) asccbbbbcbcccc

11 Xuan Guo Exercise CSc 3320 11 4 Which of the following matches regexp (abc|xyz) 1) abc 2) xyz 3) abc|xyz

12 Xuan Guo Exercise CSc 3320 12 5 Which of the following matches regexp [a-z]+[\.\?!] 1) battle! 2) Hot 3) green 4) swamping. 5) jump up. 6) undulate? 7) is.?

13 Xuan Guo Exercise CSc 3320 13 6 Which of the following matches regexp [a-zA-Z]*[^,]= 1) Butt= 2) BotHEr,= 3) Ample 4) FIdDlE7h= 5) Brittle = 6) Other.=

14 Xuan Guo Exercise CSc 3320 14 7 Which of the following matches regexp [a-z][\.\?!]\s+[A-Z] (\s matches any space character) 1) A. B 2) c! d 3) e f 4) g. H 5) i? J 6) k L

15 Xuan Guo Exercise CSc 3320 15 8 Which of the following matches regexp (very )+(fat )?(tall|ugly) man 1) very fat man 2) fat tall man 3) very very fat ugly man 4) very very very tall man

16 Xuan Guo Exercise CSc 3320 16 9 Which of the following matches regexp <[^>]+> 1) 2) 3) 4) <> 5)

17 Xuan Guo Answer CSc 3320 17 (1) 2, 5 (2) 1, 3 (3) 1, 2, 3, 4, 6 (4) 1, 2 (5) 1, 4, 6 (6) 1, 5, 6 (7) 4, 5 (8) 3, 4 (9) 1, 3, 5
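
Some of these answers can be checked mechanically. The sketch below assumes that "matches" means the whole string must match, which grep expresses with -x; the helper name check is hypothetical. Only exercises 1, 3 and 8 are shown.

```shell
# check REGEX CANDIDATE...
# Prints the 1-based numbers of the candidates that match REGEX as a whole line.
check() {
    regex=$1
    shift
    n=0
    hits=""
    for s in "$@"; do
        n=$((n + 1))
        # -E: extended regex, -x: whole line must match, -q: no output
        if printf '%s\n' "$s" | grep -Eqx -e "$regex"; then
            hits="$hits $n"
        fi
    done
    echo "${hits# }"
}

ex1=$(check 'a(ab)*a' abababa aaba aabbaa aba aabababa)
ex3=$(check 'a.[bc]+' abc abbbbbbbb azc abcbcbcbc ac asccbbbbcbcccc)
ex8=$(check '(very )+(fat )?(tall|ugly) man' \
    'very fat man' 'fat tall man' 'very very fat ugly man' 'very very very tall man')

echo "Exercise 1: $ex1"
echo "Exercise 3: $ex3"
echo "Exercise 8: $ex8"
```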

18 Xuan Guo Basic Regular Expression & Extended Regular Expression CSc 3320 18
Meta-characters in Basic Regular Expression: ^ $ . * \( \) [ ] \{ \} \
vi, grep, sed accept basic regular expressions.
Meta-characters in Extended Regular Expression: | ^ $ . * + ? ( ) [ ] { } \
egrep, grep -E, sed -E accept extended regular expressions.
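
The difference between the two syntaxes is easiest to see on the same input. A small sketch with hypothetical sample lines: grouping and repetition counts need backslashes in a basic regular expression, and a plain + is an ordinary character there.

```shell
# Hypothetical sample: four short lines.
sample=$(printf '%s\n' ab abab ababab 'a+b')

# Basic syntax: \( \) and \{ \} carry backslashes.
bre=$(printf '%s\n' "$sample" | grep -c '\(ab\)\{2,\}')
# Extended syntax: ( ) and { } are metacharacters as written.
ere=$(printf '%s\n' "$sample" | grep -E -c '(ab){2,}')

# "a+b" as a basic regex matches only the literal text a+b;
# as an extended regex it means one or more a followed by b.
bre_plus=$(printf '%s\n' "$sample" | grep -c 'a+b')
ere_plus=$(printf '%s\n' "$sample" | grep -E -c 'a+b')

echo "lines with ab repeated twice or more: basic=$bre extended=$ere"
echo "lines matching a+b: basic=$bre_plus extended=$ere_plus"
```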

19 Xuan Guo Grep (Global or Get Regular Expression and Print) CSc 3320 19
Filtering patterns: egrep, fgrep, grep
– grep -hilnvw pattern {fileName}*
– displays lines from files that match the pattern
– pattern : regular expression
-h : do not list file names if many files are specified
-i : ignore case
-l : displays list of files containing pattern
-n : display line numbers
-v : displays lines that do not match the pattern
-w : matches whole words only
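
A short sketch of four of these options in use. The file contents are hypothetical and chosen so that each option gives a different result.

```shell
# Hypothetical three-line file.
notes=$(mktemp)
printf '%s\n' 'Unix is fun' 'unixlike tools' 'I like awk' > "$notes"

ci=$(grep -i -c 'unix' "$notes")        # -i: case-insensitive; -c counts matching lines
whole=$(grep -i -w 'unix' "$notes")     # -w: "unixlike" is not a whole-word match
numbered=$(grep -n 'awk' "$notes")      # -n: prefix each match with its line number
inverted=$(grep -v -c 'like' "$notes")  # -v: count lines that do NOT contain "like"

echo "case-insensitive matches: $ci"
echo "whole-word match: $whole"
echo "numbered match: $numbered"
echo "lines without 'like': $inverted"

rm -f "$notes"
```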

20 Xuan Guo Grep variations CSc 3320 20
– fgrep : pattern must be a fixed string
– egrep : pattern can be an extended regular expression
– -x option in fgrep: displays only lines that are exactly equal to the string
– extended regular expressions:
+ matches one or more of the single preceding character
? matches zero or one of the single preceding character
| either or (ex. a*|b*)
( ) *, +, ? operate on the entire subexpression, not just on the preceding character; ex. (ab|ba)*

21 Xuan Guo Differences CSc 3320 21
grep searches the named files (or standard input) for lines that match a pattern, given as a basic regular expression.
egrep (grep -E on Linux) is extended grep, which adds the regular expression metacharacters +, ?, | and ().
fgrep (grep -F on Linux) is fixed or fast grep; it behaves like grep but does not treat any regular expression metacharacter as special.
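
To illustrate the last point, the sketch below runs the same pattern through grep and grep -F on three hypothetical lines. With grep the dot matches any character; with grep -F it matches only a literal dot.

```shell
# Hypothetical input: three lines that differ only in the middle character.
data=$(printf '%s\n' 'a.c' 'abc' 'a*c')

re_hits=$(printf '%s\n' "$data" | grep -c 'a.c')        # "." is a metacharacter
fixed_hits=$(printf '%s\n' "$data" | grep -F -c 'a.c')  # "." is taken literally

echo "grep:    $re_hits matching lines"
echo "grep -F: $fixed_hits matching line"
```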

22 Xuan Guo CSc 3320 22
48 Dec 3BC1997 LPSX 68.00 LVX2A 138 //line 1
483Sept 5AP1996 USP 65.00 LVX2C 189 //line 2
47Oct 3ZL1998 LPSX 43.00 KVM9D 512 //line 3
219dec 2CC1999 CAD 23.00 PLV2C 68 //line 4
484nov 7PL1996 CAD 49.00 PLV2C 234 //line 5
487may 5PA1998 USP 37.00 KVM9D 644 //line 6
471May 7Zh1999 UDP 37.00 KV30D 643 //line 7
grep "38$" exam1.dat
grep "^[^48]" exam1.dat
grep "[Mm]ay" exam1.dat
grep "K...D" exam1.dat
grep "[A-Z][A-Z][A-Z][9]D" exam1.dat
grep "9\{2,3\}" exam1.dat

23 Xuan Guo Examples CSc 3320 23
grep "38$" exam1.dat
grep "^[^48]" exam1.dat
grep "[Mm]ay" exam1.dat
grep "K...D" exam1.dat
grep "[A-Z][A-Z][A-Z][9]D" exam1.dat
grep "9\{2,3\}" exam1.dat
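
These commands can be run against a scratch copy of the data. The sketch assumes the file holds the seven records exactly as transcribed above, without the //line annotations; the original column spacing is not known, so treat the file contents as an assumption. Each count is the number of lines the command would print.

```shell
# Scratch copy of the data (spacing as transcribed; an assumption).
exam=$(mktemp)
cat > "$exam" <<'EOF'
48 Dec 3BC1997 LPSX 68.00 LVX2A 138
483Sept 5AP1996 USP 65.00 LVX2C 189
47Oct 3ZL1998 LPSX 43.00 KVM9D 512
219dec 2CC1999 CAD 23.00 PLV2C 68
484nov 7PL1996 CAD 49.00 PLV2C 234
487may 5PA1998 USP 37.00 KVM9D 644
471May 7Zh1999 UDP 37.00 KV30D 643
EOF

export LC_ALL=C   # make [A-Z] mean exactly the uppercase ASCII letters

c1=$(grep -c '38$' "$exam")                  # lines ending in 38
c2=$(grep -c '^[^48]' "$exam")               # lines not starting with 4 or 8
c3=$(grep -c '[Mm]ay' "$exam")               # "May" or "may"
c4=$(grep -c 'K...D' "$exam")                # K, any three characters, D
c5=$(grep -c '[A-Z][A-Z][A-Z][9]D' "$exam")  # three capitals, then 9D
c6=$(grep -c '9\{2,3\}' "$exam")             # two or three consecutive 9s

echo "$c1 $c2 $c3 $c4 $c5 $c6"
rm -f "$exam"
```

Every record contains a year of the form 199x, which is why the last pattern matches all seven lines.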

24 Xuan Guo CSV file CSc 3320 24 A CSV file consists of any number of records, separated by line breaks of some kind; each record consists of fields, separated by some other character or string, most commonly a literal comma or tab.

25 Xuan Guo CSV files CSc 3320 25
invent.dat
1. Pen 5 20.00
2. Pencil 10 2.00
3. Rubber 3 3.50
4. Cock 2 45.50

26 Xuan Guo Pattern Scanning and Processing CSc 3320 26
awk: utility that scans one or more files and performs an action on all lines that match a particular condition.
The conditions and actions are specified in an awk program.
awk reads a line
– breaks it into fields separated by tabs/spaces
– or other separators specified by -F option

27 Xuan Guo awk Command CSc 3320 27 awk program has one or more commands: awk [condition] [ \{ action \} ] where condition is one of the following: –special tokens BEGIN or END –an expression involving logical operators, relational operators, and/or regular expressions

28 Xuan Guo awk Command CSc 3320 28
awk [condition] [ \{ action \} ]
action is one of the following kinds of C-like statements
– if-else; while; for; break; continue
– assignment statement: var=expression
– print; printf;
– next (skip remaining patterns on current line)
– exit (stop reading input; the END action, if any, still runs)
– list of statements

29 Xuan Guo awk Command CSc 3320 29
accessing individual fields:
– $1,..., $n refer to fields 1 through n
– $0 refers to entire line
built-in variable NF means number of fields
% awk -F: '{ print NF, $1 }' /etc/passwd
prints the number of fields and the first field of each line in the /etc/passwd file
-F: means to use : as the field separator
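
The effect of -F: can be seen by running the same kind of program with and without it. This sketch feeds awk two passwd-style lines on standard input instead of reading the real /etc/passwd, so the result does not depend on the machine.

```shell
# Two passwd-style sample lines (used instead of the real /etc/passwd).
passwd_sample='nobody:*:-2:-2:Unprivileged User:/:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh'

# With -F: each line splits into 7 colon-separated fields.
with_colon=$(printf '%s\n' "$passwd_sample" | awk -F: '{ print NF, $1 }')
# Without -F awk splits on whitespace, so each line has only 2 fields.
with_space=$(printf '%s\n' "$passwd_sample" | awk '{ print NF }')

printf '%s\n' "$with_colon"
printf '%s\n' "$with_space"
```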

30 Xuan Guo awk Command BEGIN condition triggered before first line read END condition triggered after last line read FILENAME: built-in variable for name of file being processed We will use this data in following examples: CSc 3320 30

31 Xuan Guo awk Example CSc 3320 31
“invent.dat”
Serial No  Product  Quantity  Unit Price
1          Pen      5         20.00
2          Pencil   10        2.00
3          Rubber   3         3.50
4          Cock     2         45.50
$1         $2       $3        $4

32 Xuan Guo awk Example CSc 3320 32
1. Print the name of each product
awk '{print $2}' invent.dat
Pen
Pencil
Rubber
Cock

33 Xuan Guo awk Example CSc 3320 33
2. Print the name of each product and its unit price
awk '{print $2 ">>" $4}' invent.dat
Pen>>20.00
Pencil>>2.00
Rubber>>3.50
Cock>>45.50

34 Xuan Guo awk Example CSc 3320 34
3. Print each line
awk '{print $0}' invent.dat
1. Pen 5 20.00
2. Pencil 10 2.00
3. Rubber 3 3.50
4. Cock 2 45.50

35 Xuan Guo awk Example CSc 3320 35
4. Print the name and unit price of the products whose quantity is at least 5
awk '$3 >= 5 {print $2 ">>" $4}' invent.dat
Pen>>20.00
Pencil>>2.00

36 Xuan Guo awk Example CSc 3320 36
5. Print the name and unit price of the products whose line contains “Pen”
awk '/Pen/ {print $2 ">>" $4}' invent.dat
Pen>>20.00
Pencil>>2.00
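
The conditions in examples 4 and 5 can be tried on a scratch copy of the inventory data. This is a sketch: the temporary file stands in for invent.dat, and the second command uses > instead of >= only to show how the boundary case (Pen, quantity 5) changes the result.

```shell
# Scratch copy of the inventory data shown earlier.
invent=$(mktemp)
printf '%s\n' '1. Pen 5 20.00' '2. Pencil 10 2.00' \
              '3. Rubber 3 3.50' '4. Cock 2 45.50' > "$invent"

# Relational condition: quantity (field 3) at least 5
at_least=$(awk '$3 >= 5 { print $2 ">>" $4 }' "$invent")
# Strictly greater than 5 drops Pen, whose quantity is exactly 5
more_than=$(awk '$3 > 5 { print $2 ">>" $4 }' "$invent")
# Regular expression condition: any line containing "Pen"
pen=$(awk '/Pen/ { print $2 }' "$invent")

printf '%s\n' "$at_least" "--" "$more_than" "--" "$pen"
rm -f "$invent"
```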

37 Xuan Guo awk predefined variables CSc 3320 37
Variable  Meaning                          Example
FILENAME  name of file being processed     invent.dat
RS        record separator                 newline
FS        field separator                  whitespace
NF        number of fields                 4
NR        current line #

38 Xuan Guo awk Example CSc 3320 38 awk '{print FILENAME;print NR}' invent.dat invent.dat 1 invent.dat 2 invent.dat 3 invent.dat 4

39 Xuan Guo awk Example CSc 3320 39
Compute the overall value of these products
1. Pen 5 20.00
2. Pencil 10 2.00
3. Rubber 3 3.50
4. Cock 2 45.50

40 Xuan Guo awk Example CSc 3320 40
BEGIN {
    print "---------------------------"
    print "BEGIN section is only printed once."
    print "==========================="
}

41 Xuan Guo awk Example CSc 3320 41
{
    total = $3 * $4
    recno = $1
    item = $2
    gtotal += total
    printf "%d %s Rs.%f\n", recno, item, total
}

42 Xuan Guo awk Example CSc 3320 42
END {
    print "---------------------------"
    printf "Total Rs. %f\n", gtotal
    print "END section is only printed once."
    print "==========================="
}

43 Xuan Guo awk Example CSc 3320 43 Save the program as example2, then run: awk -f example2 invent.dat

44 Xuan Guo CSc 3320 44
---------------------------
BEGIN section is only printed once.
===========================
1 Pen Rs.100.000000
2 Pencil Rs.20.000000
3 Rubber Rs.10.500000
4 Cock Rs.91.000000
---------------------------
Total Rs. 221.500000
END section is only printed once.
===========================
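
The BEGIN, main and END parts can also be passed to awk inline rather than with -f. The sketch below is a shortened variant, not the slides' exact program: it drops the separator lines and prints two decimal places with %.2f. The temporary file stands in for invent.dat.

```shell
# Scratch copy of the inventory data.
invent=$(mktemp)
printf '%s\n' '1. Pen 5 20.00' '2. Pencil 10 2.00' \
              '3. Rubber 3 3.50' '4. Cock 2 45.50' > "$invent"

report=$(awk '
BEGIN { print "BEGIN section is only printed once." }
{
    total = $3 * $4      # quantity times unit price for this record
    gtotal += total      # running grand total across all records
    printf "%d %s Rs.%.2f\n", $1, $2, total
}
END { printf "Total Rs. %.2f\n", gtotal }
' "$invent")

printf '%s\n' "$report"
grand=$(printf '%s\n' "$report" | tail -n 1)   # last line holds the grand total
rm -f "$invent"
```

The %d conversion turns the first field "1." into the number 1, which is why the serial numbers appear without the trailing dot.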

45 Xuan Guo awk actions CSc 3320 45 Built-in functions: exp(), log(), sqrt(), substr() etc. If condition, for loop, while loop

46 Xuan Guo awk another example CSc 3320 46 % cat /etc/passwd nobody:*:-2:-2:Unprivileged User:/:/usr/bin/false root:*:0:0:System Administrator:/var/root:/bin/sh... lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false

47 Xuan Guo awk Example CSc 3320 47
% cat p2.awk
BEGIN { print "Start of file: " }
{ print $1 " " $6 " " $7 }
END { print "End of file", FILENAME }
% awk -F: -f p2.awk /etc/passwd
Start of file:
nobody / /usr/bin/false
root /var/root /bin/sh
...
lp /var/spool/cups /usr/bin/false
End of file /etc/passwd

48 Xuan Guo awk Operators CSc 3320 48
built-in variable NR contains current line #
remember, “-F:” uses colon as separator
% cat p3.awk
NR > 1 && NR < 4 { print NR, $1, $6, NF }
% awk -F: -f p3.awk /etc/passwd
2 root /var/root 7
3 daemon /var/root 7

49 Xuan Guo awk Variables CSc 3320 49
% cat p4.awk
BEGIN { print "Scanning file" }
{
    printf "line %d: %s\n", NR, $0
    lineCount++;
    wordCount += NF;
}
END { printf "lines = %d, words = %d\n", lineCount, wordCount }
% awk -f p4.awk /etc/passwd
Scanning file
line 1: nobody:*:-2:-2:Unprivileged User:/:/usr/bin/false
line 2: root:*:0:0:System Administrator:/var/root:/bin/sh
...
line 37: lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
lines = 37, words = 141

50 Xuan Guo awk Control Structures CSc 3320 50
% cat p5.awk
{
    for (i = NF; i >= 1; i--)
        printf "%s ", $i;
    printf "\n";
}
% awk -f p5.awk /etc/passwd
User:/:/usr/bin/false nobody:*:-2:-2:Unprivileged
Administrator:/var/root:/bin/sh root:*:0:0:System
...
Services:/var/spool/cups:/usr/bin/false lp:*:26:26:Printing

51 Xuan Guo awk Condition Ranges CSc 3320 51
Condition ranges:
– two expressions separated by a comma
awk performs the action on every line
– from the first line that matches the first expression
– through the next line that matches the second expression
% awk -F: '/nobody/,/root/ {print $0}' /etc/passwd
nobody:*:-2:-2:Unprivileged User:/:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
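
A range can be tried on a few passwd-style lines fed through standard input. This is a sketch: the daemon line is a hypothetical addition, included to show that a line matching the second expression is ignored while the range is not yet open.

```shell
# Four sample lines; the first contains "root" (in /var/root) but precedes the range.
sample='daemon:*:1:1:System Services:/var/root:/usr/bin/false
nobody:*:-2:-2:Unprivileged User:/:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false'

# The range opens on the line matching /nobody/ and closes on the
# next line matching /root/; both end lines are included.
range=$(printf '%s\n' "$sample" | awk -F: '/nobody/,/root/ { print $1 }')

printf '%s\n' "$range"
```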

52 Xuan Guo awk Built-in Functions CSc 3320 52
Built-in functions:
– exp()
– log()
– sqrt()
– substr() etc.
% awk -F: '{print substr($1,1,2)}' /etc/passwd
no
ro
...
lp

53 Xuan Guo Stream Editor (sed) CSc 3320 53
sed
– scans one or more text files
– performs an edit on all lines that match a condition
– actions and conditions may be stored in a file
– may be specified at command line in single quotes
– commands begin with an address or an addressRange or a Regular expression
– does not modify the input file
– writes modified file to standard output

54 Xuan Guo Sed syntax CSc 3320 54
sed -option 'general expression' [data-file]
Replace words action: s/old pattern/new pattern/
Delete lines action: /pattern/d

55 Xuan Guo Sed syntax CSc 3320 55 sed -option 'general expression' [data-file] Search action: -n /pattern/p

56 Xuan Guo CSc 3320 56
Paris  PS1  Charles Chin   01/20/86  30
Ind    PS2  Susan Green    04/05/86  32
SUST   PS2  Lewis SUST     08/11/85  23
JUST   IS1  Xiao Ming      11/30/84  9
HEBUT  IS1  John Main      12/03/84  8
SUST   PS2  Da Ming        06/01/86  35
Paris  IS3  Peter Webor    07/05/82  32
Paris  PS2  Ann Sreph      09/28/85  10
Paris  IS3  Margot Strong  02/29/82  9

57 Xuan Guo Examples CSc 3320 57
Search for lines that start with HEBUT
sed -n '/^HEBUT/p' students
sed '/^HEBUT/p' students // NOT GOOD: without -n, sed prints every line and prints matching lines twice
HEBUT IS1 John Main 12/03/84 8

58 Xuan Guo Examples CSc 3320 58
Replace the first “SUST” on each line with “SDUST”
sed 's/SUST/SDUST/' students

59 Xuan Guo CSc 3320 59
Paris  PS1  Charles Chin   01/20/86  30
Ind    PS2  Susan Green    04/05/86  32
SDUST  PS2  Lewis SUST     08/11/85  23
JUST   IS1  Xiao Ming      11/30/84  9
HEBUT  IS1  John Main      12/03/84  8
SDUST  PS2  Da Ming        06/01/86  35
Paris  IS3  Peter Webor    07/05/82  32
Paris  PS2  Ann Sreph      09/28/85  10
Paris  IS3  Margot Strong  02/29/82  9

60 Xuan Guo Examples CSc 3320 60
Replace every “SUST” on each line with “SDUST”
sed 's/SUST/SDUST/g' students
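
The effect of the g flag shows on a line that contains the string twice. A minimal sketch using one record from the sample data; the single-space column spacing is an assumption.

```shell
# One record in which "SUST" occurs twice.
line='SUST PS2 Lewis SUST 08/11/85 23'

# Without g: only the first occurrence on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/SUST/SDUST/')
# With g: every occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/SUST/SDUST/g')

echo "$first"
echo "$all"
```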

61 Xuan Guo Examples CSc 3320 61
Delete lines that contain “../../86”
sed '/..\/..\/86/d' students
% sed 's/^/  /' file > file.new
– indents each line in the file by 2 spaces
% sed 's/^ *//' file > file.new
– removes all leading spaces from each line of the file
% sed '/a/d' file > file.new
– deletes all lines containing 'a'

62 Xuan Guo Ranges by patterns CSc 3320 62
You can specify two regular expressions as the range. Assuming a "#" starts a comment, you can search for a keyword and remove all comments until you see the second keyword. In this case the two keywords are "start" and "stop":
sed '/start/,/stop/ s/#.*//'
The first pattern turns on a flag that tells sed to perform the substitute command on every line. The second pattern turns off the flag. If the "start" and "stop" patterns occur twice, the substitution is done both times. If the "stop" pattern is missing, the flag is never turned off, and the substitution is performed on every line until the end of the file.
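
A small run of this command on hypothetical input shows which comments survive: only those outside the start/stop range.

```shell
# Hypothetical five-line input; every line except "start" carries a comment.
input='a = 1 # before
start
b = 2 # inside
stop # end marker
c = 3 # after'

# Comments are stripped only from the "start" line through the "stop" line.
result=$(printf '%s\n' "$input" | sed '/start/,/stop/ s/#.*//')

printf '%s\n' "$result"
# Count the lines that still contain a comment (the first and the last).
remaining=$(printf '%s\n' "$result" | grep -c '#')
```

The "stop" line itself is still inside the range, so its comment is removed as well.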

63 Xuan Guo Question CSc 3320 63 Does the sed utility change the students file? How can we save the output?
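
Both questions can be explored with a scratch file. In this sketch the two records and the .new suffix are hypothetical; it relies only on the earlier statement that sed writes its result to standard output, which can then be redirected.

```shell
# Scratch file standing in for "students".
students=$(mktemp)
printf '%s\n' 'SUST PS2 Da Ming' 'Paris PS1 Charles Chin' > "$students"

before=$(cat "$students")
# sed writes the edited text to standard output; redirect it to keep a copy.
sed 's/SUST/SDUST/' "$students" > "$students.new"
after=$(cat "$students")        # contents of the input file after sed ran
saved=$(cat "$students.new")    # contents of the redirected output

echo "input unchanged: $([ "$before" = "$after" ] && echo yes || echo no)"
printf '%s\n' "$saved"

rm -f "$students" "$students.new"
```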

