Xuan Guo Chapter 3: Utilities for Power Users Graham Glass and King Ables, UNIX for Programmers and Users, Third Edition, Pearson Prentice Hall, 2003.



Regular Expression
Suppose we have a 10,000-line text file and want to search it for certain words.
Query 1: words of the form “aa _ _ cc”
Query 2: the word “atlanta” or “Atlanta”
Query 3: words containing three or more consecutive “y” characters

Regular Expression
[Figure: each of Query 1, Query 2, and Query 3 is expressed as a Regular Expression, which a Regular Expression Engine evaluates for the Application.]

Regular Expression
1. Vi
2. Sed, Awk, Grep
3. Java, C#

Regular Expression
Query 1: words of the form “aa _ _ cc”
  aa..cc
Query 2: the word “atlanta” or “Atlanta”
  [aA]tlanta
  (atlanta|Atlanta)
Query 3: words containing three or more consecutive “y” characters
  (y){3,}
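The three queries can be tried with grep in extended mode. This is a minimal sketch: the word list is hypothetical, and the -x option (whole line must match) is an assumption about what "word of the form" means when the file holds one word per line.

```shell
# Hypothetical word list, one word per line.
words=$(mktemp)
printf '%s\n' aabbcc aaxycc aacc atlanta Atlanta ATLANTA yy yyy heyyyy > "$words"

# Query 1: "aa", any two characters, then "cc"; -x requires the whole line to match.
q1=$(grep -Ex 'aa..cc' "$words")

# Query 2: two equivalent spellings of the same pattern.
q2=$(grep -Ex '[aA]tlanta' "$words")
q2_alt=$(grep -Ex '(atlanta|Atlanta)' "$words")

# Query 3: three or more consecutive "y" anywhere in the word.
q3=$(grep -E '(y){3,}' "$words")

printf 'Query 1:\n%s\nQuery 2:\n%s\nQuery 3:\n%s\n' "$q1" "$q2" "$q3"
rm -f "$words"
```

Both forms of Query 2 select the same lines; the bracket form is shorter when only one character differs.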

More Examples
[ab] [a-z] [A-Z] [0-9] \d [^0-9] [a-z 0-9] (ae|bd) a? a+ a* (ab){3,5} (ab){3,} (ab){3}

Other Issues
1. Anchors: ^, $
2. Metacharacters: [, ], {, }, \, ^, $, ?, *, +, ., (, )

Exercise 1
Which of the following are matched in full by this regexp?
  a(ab)*a
1) abababa
2) aaba
3) aabbaa
4) aba
5) aabababa

Exercise 2
Which of the following are matched in full by this regexp?
  ab+c?
1) abc
2) ac
3) abbb
4) bbc

Exercise 3
Which of the following are matched in full by this regexp?
  a.[bc]+
1) abc
2) abbbbbbbb
3) azc
4) abcbcbcbc
5) ac
6) asccbbbbcbcccc

Exercise 4
Which of the following are matched in full by this regexp?
  (abc|xyz)
1) abc
2) xyz
3) abc|xyz

Exercise 5
Which of the following are matched in full by this regexp?
  [a-z]+[\.\?!]
1) battle!
2) Hot
3) green
4) swamping.
5) jump up.
6) undulate?
7) is.?

Exercise 6
Which of the following are matched in full by this regexp?
  [a-zA-Z]*[^,]=
1) Butt=
2) BotHEr,=
3) Ample
4) FIdDlE7h=
5) Brittle =
6) Other.=

Exercise 7
Which of the following are matched in full by this regexp? (\s matches any whitespace character)
  [a-z][\.\?!]\s+[A-Z]
1) A. B
2) c! d
3) e f
4) g. H
5) i? J
6) k L

Exercise 8
Which of the following are matched in full by this regexp?
  (very )+(fat )?(tall|ugly) man
1) very fat man
2) fat tall man
3) very very fat ugly man
4) very very very tall man

Exercise 9
Which of the following are matched in full by this regexp?
  <[^>]+>
1) 2) 3) 4) <> 5)

Answer
(1) 2, 5
(2) 1, 3
(3) 1, 2, 3, 4, 6
(4) 1, 2
(5) 1, 4, 6
(6) 1, 5, 6
(7) 4, 5
(8) 3, 4
(9) 1, 3, 5
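Answers of this kind can be checked mechanically. The sketch below assumes the exercises ask for a match of the entire string, which is what grep -x tests; the helper name matches is made up for this illustration.

```shell
# matches REGEX STRING: succeeds only if the entire STRING matches REGEX
# (extended syntax). "matches" is a hypothetical helper, not a standard command.
matches() {
    printf '%s\n' "$2" | grep -Eqx -e "$1"
}

# Candidates for the regexp a(ab)*a.
for s in abababa aaba aabbaa aba aabababa; do
    if matches 'a(ab)*a' "$s"; then
        echo "match:    $s"
    else
        echo "no match: $s"
    fi
done
```

The same loop works for the other exercises by changing the regexp and the candidate list. Patterns that use \s rely on an extension that not every grep supports.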

Basic Regular Expression & Extended Regular Expression
Meta-characters in Basic Regular Expression:
  ^ $ . * \( \) [ ] \{ \} \
vi, grep, sed accept basic regular expressions.
Meta-characters in Extended Regular Expression:
  | ^ $ . * + ? ( ) [ ] { } \
egrep, grep -E, sed -E accept extended regular expressions.
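The practical difference shows up when the same pattern text is given to grep with and without -E. A small sketch on made-up sample strings:

```shell
sample=$(printf '%s\n' 'ab' 'abab' 'a+b')

# Basic RE: "+" is an ordinary character, so this finds the literal text "a+b".
bre_plus=$(printf '%s\n' "$sample" | grep 'a+b')

# Extended RE: "+" means "one or more of the preceding item".
ere_plus=$(printf '%s\n' "$sample" | grep -E 'a+b')

# Grouping and repetition counts are backslashed in BRE and bare in ERE.
bre_group=$(printf '%s\n' "$sample" | grep -x '\(ab\)\{2\}')
ere_group=$(printf '%s\n' "$sample" | grep -Ex '(ab){2}')

printf 'BRE a+b: %s\n' "$bre_plus"
printf 'ERE a+b: %s\n' "$ere_plus"
printf 'BRE group: %s, ERE group: %s\n' "$bre_group" "$ere_group"
```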

Grep (Global or Get Regular Expression and Print)
Filtering patterns: egrep, fgrep, grep
–grep -hilnvw pattern {fileName}*
–displays lines from files that match the pattern
–pattern: regular expression
-h: do not list file names if many files are specified
-i: ignore case
-l: displays list of files containing pattern
-n: display line numbers
-v: displays lines that do not match the pattern
-w: matches whole words only
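A short sketch of several of these options in use. The two files and their contents are invented for the illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'The cat sat' 'concatenate' 'A CAT again' 'dog' > "$dir/a.txt"
printf '%s\n' 'no felines here' > "$dir/b.txt"

numbered=$(grep -n 'cat' "$dir/a.txt")     # -n: prefix each hit with its line number
anycase=$(grep -i 'cat' "$dir/a.txt")      # -i: also matches "CAT"
whole=$(grep -w 'cat' "$dir/a.txt")        # -w: "concatenate" no longer counts
inverted=$(grep -iv 'cat' "$dir/a.txt")    # -v: lines that do NOT match
files=$(grep -l 'cat' "$dir"/*.txt)        # -l: names of files with a match

printf '%s\n' "$numbered" "$whole" "$inverted" "$files"
rm -rf "$dir"
```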

Grep variations
–fgrep: pattern must be a fixed string
–egrep: pattern can be an extended regular expression
–-x option in fgrep: displays only lines that are exactly equal to the string
–extended regular expressions:
  + matches one or more of the single preceding character
  ? matches zero or one of the single preceding character
  | either or (ex. a*|b*)
  ( ) make *, +, ? operate on the entire subexpression, not just on the preceding character; ex. (ab|ba)*

Differences
grep searches the named files (or standard input) for lines that match a pattern.
egrep (grep -E in Linux) is extended grep, which adds the regular expression metacharacters +, ?, |, and ().
fgrep (grep -F in Linux) is fixed or fast grep; it behaves like grep but does not treat any regular expression metacharacter as special.

exam1.dat
Dec 3BC1997 LPSX LVX2A 138 //line 1
483Sept 5AP1996 USP LVX2C 189 //line 2
47Oct 3ZL1998 LPSX KVM9D 512 //line 3
219dec 2CC1999 CAD PLV2C 68 //line 4
484nov 7PL1996 CAD PLV2C 234 //line 5
487may 5PA1998 USP KVM9D 644 //line 6
471May 7Zh1999 UDP KV30D 643 //line 7

Examples
grep "38$" exam1.dat
grep "^[^48]" exam1.dat
grep "[Mm]ay" exam1.dat
grep "K...D" exam1.dat
grep "[A-Z][A-Z][A-Z][9]D" exam1.dat
grep "9\{2,3\}" exam1.dat
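The commands can be run against a copy of the data. In this sketch the records are copied as they appear in this transcript, where whitespace between some columns and the leading number of the first record seem to have been lost, so results for the original file may differ (in particular for "^[^48]"). The variable names are made up.

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
Dec 3BC1997 LPSX LVX2A 138
483Sept 5AP1996 USP LVX2C 189
47Oct 3ZL1998 LPSX KVM9D 512
219dec 2CC1999 CAD PLV2C 68
484nov 7PL1996 CAD PLV2C 234
487may 5PA1998 USP KVM9D 644
471May 7Zh1999 UDP KV30D 643
EOF

ends_38=$(grep -c '38$' "$data")                   # lines ending in "38"
not_4_8=$(grep -c '^[^48]' "$data")                # first character is neither 4 nor 8
may=$(grep -c '[Mm]ay' "$data")                    # "May" or "may"
k_d=$(grep -c 'K...D' "$data")                     # K, any three characters, D
caps9d=$(grep -c '[A-Z][A-Z][A-Z][9]D' "$data")    # three capitals, then "9D"
nines=$(grep -c '9\{2,3\}' "$data")                # two or three 9s in a row (BRE braces)

echo "$ends_38 $not_4_8 $may $k_d $caps9d $nines"
rm -f "$data"
```

Every record matches the last pattern because each year (1996 to 1999) already contains "99".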

CSV file
A CSV file consists of any number of records, separated by line breaks of some kind; each record consists of fields, separated by some other character or string, most commonly a literal comma or tab.

CSV files
Invent.dat
1. Pen Pencil Rubber Cock

Pattern Scanning and Processing
awk: utility that scans one or more files and performs an action on all lines that match a particular condition.
The conditions and actions are specified in an awk program.
awk reads a line
–breaks it into fields separated by tabs/spaces
–or other separators specified by the -F option

awk Command
An awk program has one or more commands of the form:
[condition] [ { action } ]
where condition is one of the following:
–special tokens BEGIN or END
–an expression involving logical operators, relational operators, and/or regular expressions

awk Command
[condition] [ { action } ]
action is one of the following kinds of C-like statements:
–if-else; while; for; break; continue
–assignment statement: var=expression
–print; printf
–next (skip remaining patterns on current line)
–exit (stop reading input; an END action, if present, still runs)
–list of statements

awk Command
Accessing individual fields:
–$1, ..., $n refer to fields 1 through n
–$0 refers to the entire line
The built-in variable NF means number of fields.
% awk -F: '{ print NF, $1 }' /etc/passwd
prints the number of fields and the first field of each line in the /etc/passwd file
-F: means to use : as the field separator

awk Command
BEGIN condition: triggered before the first line is read
END condition: triggered after the last line is read
FILENAME: built-in variable for the name of the file being processed
We will use this data in the following examples:

awk Example
“invent.dat”
Columns: Serial No ($1), Product ($2), Quantity ($3), Unit Price ($4)
Rows: 1 Pen Rubber Pencil Cock

awk Example
Print the name of each product
awk '{print $2}' invent.dat
Pen
Pencil
Rubber
Cock

awk Example
Print the name of each product and its unit price
awk '{print $2 ">>" $4}' invent.dat
Pen>>20.00
Pencil>>2.00
Rubber>>3.50
Cock>>45.50

awk Example
Print each line
awk '{print $0}' invent.dat
1. Pen Pencil Rubber Cock

awk Example
Print the name and unit price of the products whose quantity is at least 5
awk '$3 >= 5 {print $2 ">>" $4}' invent.dat
Pen>>20.00
Pencil>>2.00

awk Example
Print the name and unit price of the products whose line contains “Pen”
awk '/Pen/ {print $2 ">>" $4}' invent.dat
Pen>>20.00
Pencil>>2.00
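To try these one-liners, a stand-in for invent.dat is needed. In the sketch below the product names and unit prices come from the slides, but the quantities (5, 10, 3, 2) are assumptions, chosen only so that Pen and Pencil pass the $3 >= 5 test as in the output shown.

```shell
inv=$(mktemp)
# Columns: serial no, product, quantity (ASSUMED values), unit price.
cat > "$inv" <<'EOF'
1 Pen 5 20.00
2 Pencil 10 2.00
3 Rubber 3 3.50
4 Cock 2 45.50
EOF

names=$(awk '{print $2}' "$inv")                    # second field of every line
priced=$(awk '{print $2 ">>" $4}' "$inv")           # fields joined by string concatenation
stocked=$(awk '$3 >= 5 {print $2 ">>" $4}' "$inv")  # condition on a numeric field
pens=$(awk '/Pen/ {print $2 ">>" $4}' "$inv")       # condition given as a regular expression

printf '%s\n---\n%s\n---\n%s\n' "$names" "$stocked" "$pens"
rm -f "$inv"
```

Note that /Pen/ tests the whole line, so "Pencil" is selected as well as "Pen".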

awk predefined variables
Variable: meaning (example)
FILENAME: name of file being processed (Invent.dat)
RS: record separator (new line)
FS: field separator (whitespace)
NF: number of fields (4)
NR: current line #

awk Example
awk '{print FILENAME; print NR}' invent.dat
invent.dat
1
invent.dat
2
invent.dat
3
invent.dat
4

awk Example
Compute the overall value of these products
1. Pen Pencil Rubber Cock

awk Example
BEGIN {
  print " "
  print "BEGIN section is only printed once."
  print "==========================="
}

{
  total = $3 * $4
  recno = $1
  item = $2
  gtotal += total
  printf "%d %s Rs.%f\n", recno, item, total
}

END {
  print " "
  printf "Total Rs. %f\n", gtotal
  print "END section is only printed once."
  print "==========================="
}

awk Example
Save the program above in a file named example2 and run:
awk -f example2 invent.dat

BEGIN section is only printed once.
===========================
1 Pen Rs Pencil Rs Rubber Rs Cock Rs
Total Rs
END section is only printed once.
===========================
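The numbers in the output above did not survive in this transcript. The sketch below runs a trimmed version of the same program on stand-in data. The quantities (5, 10, 3, 2) are assumptions, the file names are temporary, and %.2f is used instead of %f only to shorten the output.

```shell
inv=$(mktemp)
prog=$(mktemp)
# Stand-in for invent.dat; the quantities in column 3 are ASSUMED.
cat > "$inv" <<'EOF'
1 Pen 5 20.00
2 Pencil 10 2.00
3 Rubber 3 3.50
4 Cock 2 45.50
EOF

# Stand-in for the example2 program file.
cat > "$prog" <<'EOF'
BEGIN { print "BEGIN section is only printed once." }
{
  total = $3 * $4      # value of this product: quantity times unit price
  gtotal += total      # running grand total, starts at 0
  printf "%d %s Rs.%.2f\n", $1, $2, total
}
END { printf "Total Rs.%.2f\n", gtotal }
EOF

report=$(awk -f "$prog" "$inv")
printf '%s\n' "$report"
rm -f "$inv" "$prog"
```

The BEGIN action runs before any input is read and the END action after the last line, which is why the grand total can only be printed in END.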

awk actions
Built-in functions: exp(), log(), sqrt(), substr() etc.
If condition, for loop, while loop

Another awk example
% cat /etc/passwd
nobody:*:-2:-2:Unprivileged User:/:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
...
lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false

awk Example
% cat p2.awk
BEGIN { print "Start of file: " }
{ print $1 " " $6 " " $7 }
END { print "End of file", FILENAME }
% awk -F: -f p2.awk /etc/passwd
Start of file:
nobody / /usr/bin/false
root /var/root /bin/sh
...
lp /var/spool/cups /usr/bin/false
End of file /etc/passwd

awk Operators
The built-in variable NR contains the current line #
remember, “-F:” uses colon as separator
% cat p3.awk
NR > 1 && NR < 4 { print NR, $1, $6, NF }
% awk -F: -f p3.awk /etc/passwd
2 root /var/root 7
3 daemon /var/root 7

awk Variables
% cat p4.awk
BEGIN { print "Scanning file" }
{
  printf "line %d: %s\n", NR, $0
  lineCount++;
  wordCount += NF;
}
END { printf "lines = %d, words = %d\n", lineCount, wordCount }
% awk -f p4.awk /etc/passwd
Scanning file
line 1: nobody:*:-2:-2:Unprivileged User:/:/usr/bin/false
line 2: root:*:0:0:System Administrator:/var/root:/bin/sh
...
line 37: lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
lines = 37, words = 141

awk Control Structures
% cat p5.awk
{
  for (i = NF; i >= 1; i--)
    printf "%s ", $i;
  printf "\n";
}
% awk -f p5.awk /etc/passwd
User:/:/usr/bin/false nobody:*:-2:-2:Unprivileged
Administrator:/var/root:/bin/sh root:*:0:0:System
...
Services:/var/spool/cups:/usr/bin/false lp:*:26:26:Printing

awk Condition Ranges
Condition ranges:
–two expressions separated by a comma
awk performs the action on every line
–from the first line that matches the first expression
–until the line that matches the second expression
% awk -F: '/nobody/,/root/ {print $0}' /etc/passwd
nobody:*:-2:-2:Unprivileged User:/:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
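Because /etc/passwd differs from machine to machine, the range behaviour is easier to see on fixed input. A minimal sketch with invented, passwd-like lines:

```shell
# Five made-up records; the range opens at "nobody" and closes at "root".
in_range=$(printf '%s\n' 'daemon:x' 'nobody:x' 'bin:x' 'root:x' 'lp:x' |
    awk -F: '/nobody/,/root/ { print $1 }')

printf '%s\n' "$in_range"
```

The lines between the two matching lines are included, and both end points are printed too.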

awk Built-in Functions
Built-in functions:
–exp()
–log()
–sqrt()
–substr() etc.
% awk -F: '{print substr($1,1,2)}' /etc/passwd
no
ro
...
lp

Stream Editor (sed)
sed
–scans one or more text files
–performs an edit on all lines that match a condition
–actions and conditions may be stored in a file
–or may be specified at the command line in single quotes
–commands begin with an address, an address range, or a regular expression
–does not modify the input file
–writes the modified text to standard output

Sed syntax
sed -option 'general expression' [data-file]
Replace words action: s/old pattern/new pattern/
Delete lines action: /pattern/d
Search action (with the -n option): /pattern/p

students
Paris  PS1  Charles Chin   01/20/86  30
Ind    PS2  Susan Green    04/05/86  32
SUST   PS2  Lewis SUST     08/11/85  23
JUST   IS1  Xiao Ming      11/30/84  9
HEBUT  IS1  John Main      12/03/84  8
SUST   PS2  Da Ming        06/01/86  35
Paris  IS3  Peter Webor    07/05/82  32
Paris  PS2  Ann Sreph      09/28/85  10
Paris  IS3  Margot Strong  02/29/82  9

Examples
Search for lines that start with HEBUT
sed -n '/^HEBUT/p' students
HEBUT IS1 John Main 12/03/84 8
sed '/^HEBUT/p' students // NOT GOOD: without -n, sed prints every line, and the matching line twice

Examples
Replace the first “SUST” on each line with “SDUST”
sed 's/SUST/SDUST/' students

Paris  PS1  Charles Chin   01/20/86  30
Ind    PS2  Susan Green    04/05/86  32
SDUST  PS2  Lewis SUST     08/11/85  23
JUST   IS1  Xiao Ming      11/30/84  9
HEBUT  IS1  John Main      12/03/84  8
SDUST  PS2  Da Ming        06/01/86  35
Paris  IS3  Peter Webor    07/05/82  32
Paris  PS2  Ann Sreph      09/28/85  10
Paris  IS3  Margot Strong  02/29/82  9

Examples
Replace every “SUST” on each line with “SDUST” (g flag)
sed 's/SUST/SDUST/g' students
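The effect of the g flag is visible on the one record that contains “SUST” twice. A small sketch, with the record written using single spaces between fields (an assumption about the original layout):

```shell
line='SUST PS2 Lewis SUST 08/11/85 23'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/SUST/SDUST/')

# With g: every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/SUST/SDUST/g')

printf '%s\n%s\n' "$first_only" "$all"
```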

Examples
Delete lines that contain “../../86”
sed '/..\/..\/86/d' students
% sed 's/^/  /' file > file.new
–indents each line in the file by 2 spaces
% sed 's/^ *//' file > file.new
–removes all leading spaces from each line of the file
% sed '/a/d' file > file.new
–deletes all lines containing 'a'

Ranges by patterns
You can specify two regular expressions as the range. Assuming a "#" starts a comment, you can search for a keyword and remove all comments until you see a second keyword. In this case the two keywords are "start" and "stop":
sed '/start/,/stop/ s/#.*//'
The first pattern turns on a flag that tells sed to perform the substitute command on every line. The second pattern turns off the flag. If the "start" and "stop" patterns occur twice, the substitution is done both times. If the "stop" pattern is missing, the flag is never turned off, and the substitution is performed on every line until the end of the file.
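A quick way to watch the flag turn on and off is to feed sed a few invented lines, each carrying a comment:

```shell
# Made-up input: five lines, each ending in a "#" comment.
stripped=$(printf '%s\n' 'a=1#one' 'start#two' 'b=2#three' 'stop#four' 'c=3#five' |
    sed '/start/,/stop/ s/#.*//')

printf '%s\n' "$stripped"
```

Comments survive on the first and last lines, which lie outside the start/stop range, and are removed from the three lines inside it.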

Question
Does the sed utility change students? How can we save the output?
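One way to explore the question is to compare the file before and after a sed run. This is a sketch on a temporary one-record copy; the directory and the name students.new are made up for the illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'SUST PS2 Da Ming 06/01/86 35' > "$dir/students"

before=$(cat "$dir/students")
# sed writes the edited text to standard output; redirect it to keep it.
sed 's/SUST/SDUST/g' "$dir/students" > "$dir/students.new"
after=$(cat "$dir/students")
saved=$(cat "$dir/students.new")

printf 'input unchanged: %s\nsaved copy:      %s\n' "$after" "$saved"
rm -rf "$dir"
```

Redirecting to a different file matters: writing straight back to the input with > would truncate it before sed reads it.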