CS 497C – Introduction to UNIX Lecture 31: - Filters Using Regular Expressions – grep and sed Chin-Chih Chang

1 CS 497C – Introduction to UNIX, Lecture 31: Filters Using Regular Expressions – grep and sed. Chin-Chih Chang, chang@cs.twsu.edu

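2 Substitution

sed's strongest feature is substitution, performed with its s (substitute) command. It has the following format:

    [address]s/expression1/string2/flag

Here expression1 is the regular expression to search for, string2 is the replacement text, and the g (global) flag replaces every occurrence on a line rather than only the first. This is how you replace every | with a colon:

    $ sed 's/|/:/g' emp.lst | head -2

To check whether any substitution was performed, compare the output with the original file using cmp. cmp -l lists every differing byte, so wc -l reports how many characters were changed:

    $ sed 's/|/:/g' emp.lst | cmp -l - emp.lst | wc -l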
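The effect of the g flag and of the cmp check can be tried without emp.lst. The following is a minimal sketch, assuming a made-up two-record sample (the records and the temporary file are illustrative assumptions, not the lecture's data):

```shell
# Hypothetical sample data standing in for emp.lst (not from the lecture).
sample=$(mktemp)
printf '%s\n' '2233|a.k. shukla|g.m.|sales' \
              '9876|jai sharma|director|production' > "$sample"

# Without the g flag only the first | on each line is replaced.
first_only=$(sed 's/|/:/' "$sample" | head -1)

# With the g flag every | on the line is replaced.
all=$(sed 's/|/:/g' "$sample" | head -1)

# cmp -l prints one line per differing byte, so wc -l counts the substitutions.
changed=$(sed 's/|/:/g' "$sample" | cmp -l - "$sample" | wc -l)

echo "$first_only"          # 2233:a.k. shukla|g.m.|sales
echo "$all"                 # 2233:a.k. shukla:g.m.:sales
echo "changed bytes: $changed"
rm -f "$sample"
```

Each sample record holds three | characters, so the count printed on the last line is the number of delimiters in the file.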

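3 Substitution

You can perform multiple substitutions in one invocation of sed by pressing [Enter] at the end of each instruction and closing the quote after the last one. The > at the start of the second line is the shell's secondary prompt, not part of the command:

    $ sed 's/ / /g
    > s/ / /g' form.html

You can compress the multiple spaces before each | as below. Here ^ takes the place of / as the delimiter of the s command:

    $ sed 's^ *|^|^g' emp.lst | head -2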
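Both ideas on this slide, several instructions in one invocation and an alternative delimiter, can be combined. This is a rough sketch under the assumption of a made-up padded record; it writes the pattern with two spaces before the * so that at least one space is required:

```shell
# Hypothetical record with padding spaces before each | delimiter.
record='2233|a.k. shukla    |g.m.   |sales'

# Two instructions in one invocation, one per line inside the quotes.
# The first uses ^ as the delimiter; "  *|" means one or more spaces
# followed by a literal |. The second turns every | into a colon.
compact=$(printf '%s\n' "$record" | sed 's^  *|^|^g
s/|/:/g')

echo "$compact"             # 2233:a.k. shukla:g.m.:sales
```

The instructions run in order on each line, so the second one sees the output of the first.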

4 Substitution

    $ sed '/director/s/director/member/' emp.lst
    $ sed '/director/s//member/' emp.lst

These two commands are equivalent. The second shows that sed "remembers" the scanned pattern: the // (two forward slashes) is an empty (null) regular expression, which sed interprets as the most recently used pattern, here the one in the address. This is called the remembered pattern.
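A quick way to convince yourself that the two forms behave alike is to run both on the same line. A minimal sketch, assuming a made-up record:

```shell
# Hypothetical record (not taken from emp.lst).
line='9876|jai sharma|director|production'

# Explicit form: the address and the search pattern are both spelled out.
explicit=$(printf '%s\n' "$line" | sed '/director/s/director/member/')

# Short form: the empty pattern // reuses the last regular expression used,
# which here is the one in the address /director/.
short=$(printf '%s\n' "$line" | sed '/director/s//member/')

echo "$explicit"            # 9876|jai sharma|member|production
echo "$short"               # same line as above
```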

5 Substitution

When the pattern you search for also occurs in the replacement string, you can use the special character & to represent it:

    $ sed 's/director/executive director/' emp.lst
    $ sed 's/director/executive &/' emp.lst

These two commands are equivalent. The &, known as the repeated pattern, expands to the entire string matched by the search pattern.
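One consequence of this rule is that a literal ampersand in the replacement has to be escaped. The sketch below, assuming a made-up record and a made-up replacement title, shows both cases:

```shell
# Hypothetical record (not taken from emp.lst).
line='9876|jai sharma|director|production'

# & stands for whatever the search pattern matched, here "director".
with_amp=$(printf '%s\n' "$line" | sed 's/director/executive &/')

# To insert a literal ampersand, escape it as \&.
literal=$(printf '%s\n' "$line" | sed 's/director/R\&D head/')

echo "$with_amp"            # 9876|jai sharma|executive director|production
echo "$literal"             # 9876|jai sharma|R&D head|production
```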

6 Regular Expressions

The interval regular expression (IRE) uses an escaped pair of curly braces, \{ and \}, with a single number or a pair of numbers between them. We can use it to display files that have write permission set for the group:

    $ ls -l | grep "^.\{5\}w"

The regular expression ^.\{5\}w matches five characters ( .\{5\} ) at the beginning ( ^ ) of the line, followed by the character w.

7 Regular Expressions

The \{5\} signifies that the previous character ( . ) has to occur five times. The . (dot) matches any single character. The IRE has three forms:

– ch\{m\} : ch (a character or a metacharacter such as . ) occurs exactly m times.
– ch\{m,n\} : ch occurs between m and n times.
– ch\{m,\} : ch occurs at least m times.
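The three forms can be compared on a small word list. This is a minimal sketch, assuming made-up input; the patterns are anchored with ^ and $ so that a longer run of a's cannot satisfy a shorter count:

```shell
# Hypothetical input: one to four a's followed by b.
words='ab
aab
aaab
aaaab'

# a\{2\}   : exactly two a's
exactly=$(printf '%s\n' "$words" | grep '^a\{2\}b$')

# a\{2,3\} : between two and three a's
between=$(printf '%s\n' "$words" | grep '^a\{2,3\}b$')

# a\{3,\}  : at least three a's
at_least=$(printf '%s\n' "$words" | grep '^a\{3,\}b$')

echo "$exactly"             # aab
echo "$between"             # aab and aaab, one per line
echo "$at_least"            # aaab and aaaab, one per line
```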

8 Regular Expressions

We can display the listing for those files that have the write bit set for either the group or others:

    $ ls -l | grep "^.\{5,8\}w"

To locate the people born in 1945 in the sample database, use sed as follows. The pattern matches 45 only when it begins at the 50th character of the line:

    $ sed -n '/^.\{49\}45/p' emp.lst

The tagged regular expression (TRE) uses \( and \) to enclose a pattern.
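Both uses of the IRE as a column selector can be checked on fixed input. The sketch below assumes made-up ls -l lines and a made-up two-field record layout (an id followed by a ddmmyy date), so the column offset is 9 rather than the 49 used for emp.lst:

```shell
# Hypothetical ls -l output (file names, owner and dates are made up).
listing='-rw-r--r-- 1 user staff 10 Jan  1 10:00 private.txt
-rw-rw-r-- 1 user staff 10 Jan  1 10:00 group.txt
-rw-r--rw- 1 user staff 10 Jan  1 10:00 world.txt'

# Group write: w is the 6th character, so exactly 5 characters precede it.
group_w=$(printf '%s\n' "$listing" | grep '^.\{5\}w')

# Group or others write: w preceded by 5 to 8 characters.
group_or_others_w=$(printf '%s\n' "$listing" | grep '^.\{5,8\}w')

# Hypothetical records "id|ddmmyy": the year starts at the 10th character.
# The second record contains 45 too, but not in the year column.
born_45=$(printf '%s\n' '1001|120345' '1002|450352' | sed -n '/^.\{9\}45/p')

echo "$group_w"             # only the group.txt line
echo "$group_or_others_w"   # the group.txt and world.txt lines
echo "$born_45"             # 1001|120345
```

Anchoring with ^ is what turns the interval into a column position; without it, 45 anywhere in the record would match.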

9 Regular Expressions

Suppose you want to replace the words John Wayne with Wayne, John. The sed substitution instruction will then look like this:

    $ echo "John Wayne" | sed 's/\(John\) \(Wayne\)/\2, \1/'

Because the TRE remembers a grouped pattern, you can also look for repeated words like this:

    $ grep "\([a-z][a-z][a-z]*\)  *\1" note
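Both uses of tagged patterns, reordering in sed and back-referencing in grep, can be tried on inline text. A minimal sketch, assuming made-up input lines in place of the file note:

```shell
# \1 and \2 in the replacement refer to the first and second \( \) groups.
swapped=$(echo 'John Wayne' | sed 's/\(John\) \(Wayne\)/\2, \1/')

# Hypothetical text: only the first line contains a doubled word.
# The group captures two or more lowercase letters; \1 must then
# repeat the same letters after one or more spaces.
doubled=$(printf '%s\n' 'paris in the the spring' 'nothing repeated here' |
    grep '\([a-z][a-z][a-z]*\)  *\1')

echo "$swapped"             # Wayne, John
echo "$doubled"             # paris in the the spring
```

Note that the pattern compares letter sequences rather than whole words, so it can also match when the end of one word equals the start of the next.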

10 Regular Expressions

These are the pattern-matching options used by grep, sed, and perl (Page 441):

– abc : match the character string "abc".
– * : zero or more occurrences of the previous character.
– . : match any character except newline.
– .* : nothing or any number of characters.
– a? : match zero or one instance of "a" (extended regular expressions; with grep, use grep -E).
– a* : match zero or more repetitions of "a".

11 Regular Expressions

– [abcde] : match any one of the characters within the brackets.
– [a-b] : match any character within the range a to b.
– [^abcde] : match any character except those within the brackets.
– [^a-b] : match any character except those in the range a to b.
– ^ : match the beginning of a line, e.g., /^#/.
– ^$ : match lines containing nothing.

12 Regular Expressions

– $ : match the end of a line, e.g., /money.$/.
– a\{2\} : match exactly two repetitions of "a".
– a\{4,\} : match four or more repetitions of "a".
– a\{2,4\} : match between two and four repetitions of "a".
– \(exp\) : tag expression exp for later referencing with \1, \2, etc.
– a|b : match a or b (extended regular expressions; with grep, use grep -E).
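A few of the entries in this summary can be exercised with grep on a small sample. This is a rough sketch under the assumption of made-up input; it also shows that ? and | need grep -E, since in a basic regular expression | is an ordinary character:

```shell
# Hypothetical input text (the second line is empty).
text='# a comment

cat
cot
c9t
color
colour'

comments=$(printf '%s\n' "$text" | grep -c '^#')         # lines starting with #
blank=$(printf '%s\n' "$text" | grep -c '^$')            # empty lines
in_set=$(printf '%s\n' "$text" | grep 'c[ao]t')          # cat, cot
not_lower=$(printf '%s\n' "$text" | grep 'c[^a-z]t')     # c9t

# Extended regular expressions: optional character and alternation.
optional=$(printf '%s\n' "$text" | grep -E '^colou?r$')  # color, colour
either=$(printf '%s\n' "$text" | grep -E '^(cat|cot)$')  # cat, cot

# Without -E the | is literal, so no line matches; grep -c prints 0
# and exits non-zero, hence the "|| true".
basic=$(printf '%s\n' "$text" | grep -c 'cat|cot' || true)

echo "comments=$comments blank=$blank basic=$basic"
```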

