Regular Expression Basic and Extended regular expressions. The grep, egrep. Typical examples involving different regular expressions. By: Abhilash C B.

1 Regular Expression Basic and Extended regular expressions. The grep, egrep. Typical examples involving different regular expressions. By: Abhilash C B

2 Regular Expression
A pattern of special characters used to match strings in a search. Typically made up of special characters called metacharacters.
Regular expressions are used throughout UNIX:
Editors: ed, ex, vi
Utilities: grep, egrep, sed, and awk
CSCI The UNIX System

3 Metacharacters
Any non-metacharacter matches itself.
RE Metacharacter    Matches…
.                   Any one character, except newline
[a-z]               Any one of the enclosed characters (e.g. a-z)
*                   Zero or more of the preceding character
? or \?             Zero or one of the preceding character
+ or \+             One or more of the preceding character

4 The grep Utility
The "grep" command searches for text in file(s).
Examples:
% grep root mail.log
% grep r..t mail.log
% grep ro*t mail.log
% grep 'ro*t' mail.log
% grep 'r[a-z]*t' mail.log
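The patterns on this slide can be tried on a small input. The sketch below is only an illustration: the contents of mail.log are not shown in the slides, so the five sample lines are made up. It counts how many lines each pattern selects.

```shell
# Hypothetical stand-in for mail.log; the real file is not shown in the slides.
log='root: session opened
rt: queue empty
rabbit: hop
r12t: odd name
mail delivered'

# grep -c prints the number of matching lines.
n_root=$(printf '%s\n' "$log" | grep -c 'root')        # the literal text "root"
n_dots=$(printf '%s\n' "$log" | grep -c 'r..t')        # r, any two characters, t
n_star=$(printf '%s\n' "$log" | grep -c 'ro*t')        # r, zero or more o, t
n_class=$(printf '%s\n' "$log" | grep -c 'r[a-z]*t')   # r, zero or more lowercase letters, t

echo "root=$n_root  r..t=$n_dots  ro*t=$n_star  r[a-z]*t=$n_class"
```

The patterns are single-quoted here because an unquoted ro*t is also a shell filename pattern: if a file with a matching name exists in the current directory, the shell replaces the pattern before grep ever sees it.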

5 More Metacharacters
RE Metacharacter    Matches…
^                   Beginning of line
$                   End of line
\char               Escapes the meaning of the char following it
[^]                 One character not in the set
\<                  Beginning-of-word anchor
\>                  End-of-word anchor
( ) or \( \)        Tags matched characters to be used later (max = 9)
| or \|             Or grouping
x\{m\}              Repetition of character x, m times (m is an integer)
x\{m,\}             Repetition of character x, at least m times
x\{m,n\}            Repetition of character x, between m and n times

6 An atom specifies what text is to be matched and where it is to be found. An operator combines regular expression atoms.

7 An atom specifies what text is to be matched and where it is to be found.

8 Single-Character Atom
A single character matches itself.

9 Dot Atom
Matches any single character except for a newline character (\n).

10 Class Atom
Matches a single character that can be any of the characters defined in a set.
Example: [ABC] matches either A, B, or C.
Notes:
1) A range of characters is indicated by a dash, e.g. [A-Q].
2) Characters can be excluded from the set, e.g. [^0-9] matches any character other than a digit.
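A minimal sketch of the three class forms mentioned here, run against made-up one-token lines. The C locale is set as an assumption so that the range [A-Q] follows plain ASCII order; in other locales a range may include other characters.

```shell
# Use the C locale so that ranges such as [A-Q] follow plain ASCII order.
LC_ALL=C
export LC_ALL

# Hypothetical input, one token per line.
tokens='A1
Q
Z
42
b'

n_set=$(printf '%s\n' "$tokens" | grep -c '[ABC]')         # lines containing A, B, or C
n_range=$(printf '%s\n' "$tokens" | grep -c '[A-Q]')       # lines containing a letter from A through Q
n_nondigit=$(printf '%s\n' "$tokens" | grep -c '[^0-9]')   # lines containing at least one non-digit

echo "[ABC]=$n_set  [A-Q]=$n_range  [^0-9]=$n_nondigit"
```

Note that [^0-9] selects a line as soon as one character on it is not a digit, so A1 is counted even though it also contains a digit.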

11 Example: Classes

12 Short-hand classes
[:alnum:] [:alpha:] [:upper:] [:lower:] [:digit:] [:space:]
These names are used inside a bracket expression, e.g. [[:digit:]].

13 Anchors
Anchors tell where the next character in the pattern must be located in the text data.

14 Back References: \n
Used to retrieve saved text in one of nine buffers. The text in a saved buffer can be referred to by using a back reference, e.g. \1 \2 \3 ... \9 (more details on this later).

15 Operators

16 Sequence Operator
The sequence operator is implicit: when a series of atoms appears in a regular expression, no operator is written between them.

17 Alternation Operator: | or \|
The alternation operator (| or \|) is used to define one or more alternatives.
Note: the form accepted depends on the version of "grep". egrep uses |; the \| form in basic grep is a GNU extension.

18 Repetition Operator: \{…\}
The repetition operator specifies that the atom or expression immediately before the repetition may be repeated.

19 Basic Repetition Forms
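The three interval forms x\{m\}, x\{m,\} and x\{m,n\} can be sketched on a made-up input of repeated letters. The -x option (match the whole line) is used here only so that each result lists exactly the run lengths the interval allows; without it, a longer run would also match because it contains a shorter one.

```shell
# Hypothetical input: runs of the letter a, one per line.
runs='a
aa
aaa
aaaa'

# -x requires the pattern to match the entire line.
exactly_two=$(printf '%s\n' "$runs" | grep -x 'a\{2\}' | paste -sd ' ' -)
at_least_two=$(printf '%s\n' "$runs" | grep -x 'a\{2,\}' | paste -sd ' ' -)
two_to_three=$(printf '%s\n' "$runs" | grep -x 'a\{2,3\}' | paste -sd ' ' -)

echo "a\{2\}   : $exactly_two"
echo "a\{2,\}  : $at_least_two"
echo "a\{2,3\} : $two_to_three"
```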

20 Short Form Repetition Operators: * + ?

21 Group Operator
When a group of characters is enclosed in parentheses, the next operator applies to the whole group, not only to the previous character.
Note: depends on the version of "grep"; basic grep uses \( and \) instead.
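A short sketch of the difference grouping makes, on a made-up word list. It uses grep -E for the extended syntax and, for basic grep, the \( \) form with the interval \{1,\} in place of +. The -x option is an addition so that only whole-line matches are listed.

```shell
# Hypothetical input.
words='ab
abb
abab'

# Without a group, + repeats only the preceding character b.
last_char=$(printf '%s\n' "$words" | grep -Ex 'ab+' | paste -sd ' ' -)

# With a group, + repeats the whole sequence ab.
whole_group=$(printf '%s\n' "$words" | grep -Ex '(ab)+' | paste -sd ' ' -)

# The same grouped pattern written as a basic regular expression.
bre_group=$(printf '%s\n' "$words" | grep -x '\(ab\)\{1,\}' | paste -sd ' ' -)

echo "ab+    : $last_char"
echo "(ab)+  : $whole_group"
echo "BRE    : $bre_group"
```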

22 Grep detail and examples
grep is a family of commands:
grep     common version
egrep    understands extended REs (| + ? ( ) don't need a backslash); equivalent to grep -E
fgrep    understands only fixed strings, and is therefore faster; equivalent to grep -F
rgrep    traverses subdirectories recursively

23 Commonly used "grep" options:
-c    Print only a count of matched lines.
-i    Ignore uppercase and lowercase distinctions.
-l    List all files that contain the specified pattern.
-n    Print matched lines and line numbers.
-s    Work silently; display nothing except error messages. Useful for checking the exit status. This is the behavior of some historical versions; in POSIX and GNU grep, -s suppresses error messages about nonexistent or unreadable files, and -q suppresses normal output.
-v    Print lines that do not match the pattern.
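The options above can be sketched against two small temporary files. The file names and contents are made up for illustration, and mktemp -d is assumed to be available (it is common but not required by POSIX). The exit-status check uses -q, which is the quiet option on current grep versions.

```shell
# Build two hypothetical files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'From: Mary' 'to: bob' 'from: alice' > "$tmp/a.txt"
printf '%s\n' 'nothing here' > "$tmp/b.txt"

count=$(grep -c 'From' "$tmp/a.txt")                 # -c: number of matching lines
icount=$(grep -ci 'from' "$tmp/a.txt")               # -i: ignore case
files=$(cd "$tmp" && grep -l 'From' a.txt b.txt)     # -l: names of files with a match
numbered=$(grep -n 'bob' "$tmp/a.txt")               # -n: prefix each match with its line number
inverted=$(grep -vc 'From' "$tmp/a.txt")             # -v: lines that do NOT match

# -q prints nothing; only the exit status says whether a match was found.
if grep -q 'alice' "$tmp/a.txt"; then found=yes; else found=no; fi

echo "count=$count icount=$icount files=$files numbered=$numbered inverted=$inverted found=$found"
rm -rf "$tmp"
```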

24 Example: grep with pipe
% ls -l | grep '^d'
drwxr-xr-x 2 krush csci Feb 8 22:12 assignments
drwxr-xr-x 2 krush csci Feb 5 07:43 feb3
drwxr-xr-x 2 krush csci Feb 5 14:48 feb5
drwxr-xr-x 2 krush csci Dec 18 14:29 grades
drwxr-xr-x 2 krush csci Jan 18 13:41 jan13
drwxr-xr-x 2 krush csci Jan 18 13:17 jan15
drwxr-xr-x 2 krush csci Jan 18 13:43 jan20
drwxr-xr-x 2 krush csci Jan 24 19:37 jan22
drwxr-xr-x 4 krush csci Jan 30 17:00 jan27
drwxr-xr-x 2 krush csci Jan 29 15:03 jan29
% ls -l | grep -c '^d'
10
The first command pipes the output of "ls -l" to grep and selects only the directory entries. The second displays the number of lines where the pattern was found, which is not necessarily the number of occurrences of the pattern.
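The same pipeline can be reproduced in a scratch directory. This is a sketch under assumptions: the directory and file names are made up, mktemp -d is assumed to be available, and the C locale is set so that the summary line printed by ls -l begins with "total".

```shell
LC_ALL=C
export LC_ALL

# Scratch directory with two subdirectories and one regular file (hypothetical names).
tmp=$(mktemp -d)
mkdir "$tmp/jan13" "$tmp/jan15"
: > "$tmp/notes.txt"

# In ls -l output, directory entries start with the letter d.
ls -l "$tmp" | grep '^d'

# Count the directory entries instead of printing them.
n_dirs=$(ls -l "$tmp" | grep -c '^d')
echo "directories: $n_dirs"

rm -rf "$tmp"
```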

25 Example: grep with \< \>
% cat grep-datafile
northwest NW Charles Main
western WE Sharon Gray
southwest SW Lewis Dalsass
southern SO Suan Chin
southeast SE Patricia Hemenway
eastern EA TB Savage
northeast NE AM Main Jr
north NO Ann Stephens
central CT KRush
Extra [A-Z]****[0-9]..$5.00

Print the line if it contains the word "north".
% grep '\<north\>' grep-datafile
north NO Ann Stephens

26 Example: grep with a\|b
(grep-datafile is listed on slide 25.)
Print the lines that contain either the expression "NW" or the expression "EA".
% grep 'NW\|EA' grep-datafile
northwest NW Charles Main
eastern EA TB Savage
Note: egrep works with |

27 Example: egrep with +
Print all lines containing one or more 3's.
% egrep '3+' grep-datafile
northwest NW Charles Main
western WE Sharon Gray
southwest SW Lewis Dalsass
Note: grep works with \+

28 Example: egrep with RE: ?
Print all lines containing a 2, followed by zero or one period, followed by a digit.
% egrep '2\.?[0-9]' grep-datafile
southwest SW Lewis Dalsass
Note: grep works with \?

29 Example: egrep with ( )
Print all lines containing one or more consecutive occurrences of the pattern "no".
% egrep '(no)+' grep-datafile
northwest NW Charles Main
northeast NE AM Main Jr
north NO Ann Stephens
Note: grep works with \( \) \+

30 Example: egrep with (a|b)
Print all lines containing the uppercase letter "S", followed by either "h" or "u".
% egrep 'S(h|u)' grep-datafile
western WE Sharon Gray
southern SO Suan Chin
Note: grep works with \( \) \|

31 Example: fgrep
% cat grep-datafile
northwest NW Charles Main 300000.00
western WE Sharon Gray
southwest SW Lewis Dalsass
southern SO Suan Chin
southeast SE Patricia Hemenway
eastern EA TB Savage
northeast NE AM Main Jr
north NO Ann Stephens
central CT KRush
Extra [A-Z]****[0-9]..$5.00

Find all lines in the file containing the literal string "[A-Z]****[0-9]..$5.00". All characters are treated as themselves; there are no special characters.
% fgrep '[A-Z]****[0-9]..$5.00' grep-datafile
Extra [A-Z]****[0-9]..$5.00
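A small sketch of fixed-string matching, written with grep -F (the option form of fgrep). Two of the input lines come from the slide; the lines a.c and abc are added here only to contrast a literal search with a regular expression search.

```shell
# Two lines from the slide plus two made-up lines for the contrast.
lines='Extra [A-Z]****[0-9]..$5.00
north NO Ann Stephens
a.c
abc'

# -F treats every character of the pattern literally.
literal=$(printf '%s\n' "$lines" | grep -F '[A-Z]****[0-9]..$5.00')

# As a regular expression, the dot in a.c matches any character, so abc also matches.
n_regex=$(printf '%s\n' "$lines" | grep -c 'a.c')

# As a fixed string, a.c matches only the text a.c itself.
n_fixed=$(printf '%s\n' "$lines" | grep -Fc 'a.c')

echo "literal line: $literal"
echo "regex a.c matches $n_regex lines; fixed a.c matches $n_fixed line"
```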

32 Example: grep with ^
Print all lines beginning with the letter n.
% grep '^n' grep-datafile
northwest NW Charles Main
northeast NE AM Main Jr
north NO Ann Stephens

33 Example: grep with $
Print all lines ending with a period followed by two zeros.
% grep '\.00$' grep-datafile
northwest NW Charles Main
southeast SE Patricia Hemenway
Extra [A-Z]****[0-9]..$5.00

34 Example: grep with \char
Print all lines containing the number 5, followed by a literal period and any single character.
% grep '5\..' grep-datafile
Extra [A-Z]****[0-9]..$5.00

35 Example: grep with [ ]
Print all lines beginning with either a "w" or an "e".
% grep '^[we]' grep-datafile
western WE Sharon Gray
eastern EA TB Savage

36 Example: grep with [^]
Print all lines ending with a period followed by two characters, neither of which is a zero.
% grep '\.[^0][^0]$' grep-datafile
western WE Sharon Gray
southwest SW Lewis Dalsass
eastern EA TB Savage

37 Example: grep with x\{m\}
Print all lines where there are six consecutive digits followed by a period. Because the pattern is not anchored, any run of six or more digits before a period matches.
% grep '[0-9]\{6\}\.' grep-datafile
northwest NW Charles Main
southwest SW Lewis Dalsass
southeast SE Patricia Hemenway
eastern EA TB Savage
north NO Ann Stephens
central CT KRush

38 Example: grep with \<
Print all lines containing a word starting with "north".
% grep '\<north' grep-datafile
northwest NW Charles Main
northeast NE AM Main Jr
north NO Ann Stephens

39 Summary: regular expressions for the grep family of commands

40 Regular Expressions: Exact Matches
regular expression: cks
UNIX Tools rocks.      match
UNIX Tools sucks.      match
UNIX Tools is okay.    no match
Prof. Andrzej (AJ) Bieszczad Phone:

41 Regular Expressions: Multiple Matches
A regular expression can match a string in more than one place.
regular expression: apple
Scrapple from the apple.    match 1 (in "Scrapple"), match 2 ("apple")

42 Regular Expressions: Matching Any Character
The . regular expression can be used to match any character.
regular expression: o.
For me to poop on.    match 1, match 2

43 Regular Expressions: Alternate Character Classes
Character classes [] can be used to match any specific set of characters.
regular expression: b[eor]at
beat a brat on a boat    match 1 ("beat"), match 2 ("brat"), match 3 ("boat")

44 Regular Expressions: Negated Character Classes
Character classes can be negated with the [^] syntax.
regular expression: b[^eo]at
beat a brat on a boat    "brat" is a match; "beat" and "boat" are not

45 Regular Expressions: Other Character Classes
Other examples of character classes:
[aeiou] will match any of the characters a, e, i, o, or u
[kK]orn will match korn or Korn
Ranges can also be specified in character classes:
[1-9] is the same as [123456789]
[abcde] is equivalent to [a-e]
You can also combine multiple ranges:
[abcde123456789] is equivalent to [a-e1-9]
Note that the - character has a special meaning in a character class, but only if it is used within a range:
[-123] would match the characters -, 1, 2, or 3

46 Regular Expressions: Named Character Classes
Commonly used character classes can be referred to by name: alpha, lower, upper, alnum, digit, punct, cntrl
Syntax: [:name:], used inside a bracket expression
[a-zA-Z]       [[:alpha:]]
[a-zA-Z0-9]    [[:alnum:]]
[45a-z]        [45[:lower:]]
Important for portability across languages
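The three named-class expressions in the table can be checked on a few made-up tokens. This sketch uses grep -E so that + is available, adds -x so that a line counts only if every character belongs to the class, and sets the C locale as an assumption to keep the class membership to plain ASCII.

```shell
LC_ALL=C
export LC_ALL

# Hypothetical input, one token per line.
items='abc
ABC
123
a1
4a5'

n_alpha=$(printf '%s\n' "$items" | grep -Ecx '[[:alpha:]]+')     # letters only
n_alnum=$(printf '%s\n' "$items" | grep -Ecx '[[:alnum:]]+')     # letters and digits only
n_mixed=$(printf '%s\n' "$items" | grep -Ecx '[45[:lower:]]+')   # 4, 5, and lowercase letters only

echo "alpha=$n_alpha  alnum=$n_alnum  [45[:lower:]]=$n_mixed"
```

The last pattern shows that a named class can share one bracket expression with ordinary characters, which is why the inner [:lower:] needs its own pair of brackets.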

47 Regular Expressions: Anchors
Anchors are used to match at the beginning or end of a line (or both).
^ means beginning of the line
$ means end of the line
regular expression: ^b[eor]at    beat a brat on a boat    match ("beat" at the start of the line)
regular expression: b[eor]at$    beat a brat on a boat    match ("boat" at the end of the line)
^word$ matches a line consisting only of "word"; ^$ matches an empty line.
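The four anchored patterns on this slide can be sketched as follows. The first input line is the sentence from the slide; the other lines, including the empty one, are made up so that each pattern has something to select or reject.

```shell
# First line is from the slide; the rest (including the blank line) are made up.
text='beat a brat on a boat
a brat

word
a word'

n_start=$(printf '%s\n' "$text" | grep -c '^b[eor]at')   # line begins with beat, brat, or boat
n_end=$(printf '%s\n' "$text" | grep -c 'b[eor]at$')     # line ends with beat, brat, or boat
n_whole=$(printf '%s\n' "$text" | grep -c '^word$')      # line is exactly "word"
n_empty=$(printf '%s\n' "$text" | grep -c '^$')          # line is empty

echo "start=$n_start  end=$n_end  whole=$n_whole  empty=$n_empty"
```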

48 Regular Expressions: Repetitions
The * is used to define zero or more occurrences of the single regular expression preceding it.
regular expression: ya*y    I got mail, yaaaaaaaaaay!    match
regular expression: oa*o    For me to poop on.    match ("oo" in "poop")
.* matches any sequence of characters, including none

49 Regular Expressions: Repetition Ranges, Subexpressions
Ranges can also be specified:
{n,m} notation can specify a range of repetitions for the immediately preceding regex
{n} means exactly n occurrences
{n,} means at least n occurrences
{n,m} means at least n occurrences but no more than m occurrences
Example:
.{0,} same as .*
a{2,} same as aaa*
If you want to group part of an expression so that * applies to more than just the previous character, use ( ) notation.
Subexpressions are treated like a single character:
a* matches 0 or more occurrences of a
abc* matches ab, abc, abcc, abccc, …
(abc)* matches the empty string, abc, abcabc, abcabcabc, …
(abc){2,3} matches abcabc or abcabcabc

50 Single Quoting Regex
Since many of the special characters used in regexes also have special meaning to the shell, it's a good idea to get in the habit of single quoting your regexes. This will protect any special characters from being operated on by the shell. If you habitually do it, you won't have to worry about when it is necessary.
Even though we are single quoting our regexes so the shell won't interpret the special characters, sometimes we still want to use an operator as itself. To do this, we escape the character with a \ (backslash).
Suppose we want to search for the character sequence 'a*b*'. Unless we do something special, this will match zero or more 'a's followed by zero or more 'b's, which is not what we want.
'a\*b\*' will fix this: now the asterisks are treated as regular characters.
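The a*b* example can be sketched on three made-up lines. Because a*b* can match zero characters, it matches every line; escaping the asterisks, or using grep -F, leaves only the line that contains the literal text.

```shell
# Hypothetical input.
data='a*b*
aabb
xyz'

# Unescaped: zero or more a followed by zero or more b, which matches the empty string on every line.
n_unescaped=$(printf '%s\n' "$data" | grep -c 'a*b*')

# Escaped: each \* is a literal asterisk.
n_escaped=$(printf '%s\n' "$data" | grep -c 'a\*b\*')

# grep -F is an alternative when the whole pattern is meant literally.
n_fixed=$(printf '%s\n' "$data" | grep -Fc 'a*b*')

echo "unescaped=$n_unescaped  escaped=$n_escaped  fixed=$n_fixed"
```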

51 Extended Regular Expressions
Regex also provides an alternation character | for matching one or another subexpression.
(T|Fl)an will match Tan or Flan
^(From|Subject): will match the From and Subject lines of a typical message. It matches the beginning of a line followed by either the characters From or Subject, followed by a ':'.
Subexpressions are used to limit the scope of the alternation.
At(ten|nine)tion matches Attention or Atninetion, not Atten or ninetion as would happen without the parentheses (Atten|ninetion).
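The effect of the parentheses on the scope of | can be sketched with grep -E on the four words from the slide. The -x option is an addition that restricts results to whole-line matches; the last count drops it to show what an unanchored search does.

```shell
# The four strings discussed on the slide, one per line.
names='Attention
Atninetion
Atten
ninetion'

# Grouped: the alternation covers only "ten" or "nine".
grouped=$(printf '%s\n' "$names" | grep -Ex 'At(ten|nine)tion' | paste -sd ' ' -)

# Ungrouped: the alternation splits the whole pattern into "Atten" or "ninetion".
ungrouped=$(printf '%s\n' "$names" | grep -Ex 'Atten|ninetion' | paste -sd ' ' -)

# Without -x the ungrouped pattern matches anywhere in a line, so all four lines are selected.
n_loose=$(printf '%s\n' "$names" | grep -Ec 'Atten|ninetion')

echo "grouped:   $grouped"
echo "ungrouped: $ungrouped"
echo "ungrouped, unanchored: $n_loose lines"
```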

52 Extended Regular Expressions: Repetition Shorthands
The * (star) has already been seen to specify zero or more occurrences of the immediately preceding character.
The + (plus) means one or more:
  abc+d will match abcd, abccd, or abccccccd but will not match 'abd'
  Equivalent to {1,}
The ? (question mark) specifies an optional character, the single character that immediately precedes it:
  July? will match Jul or July
  abc?d will match abd and abcd but not 'abccd'
  Equivalent to {0,1}
  July? is also equivalent to (Jul|July)
The *, ?, and + are known as quantifiers because they specify the quantity of a match.
Quantifiers can also be used with subexpressions:
  (a*c)+ will match c, ac, aac or aacaacac but will not match 'a' or a blank line
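The quantifier examples from this slide, sketched with grep -E. The inputs are the strings named on the slide; -x is an addition so that each string is tested as a whole line.

```shell
# Strings from the slide, one per line.
words='abd
abcd
abccd'
runs='c
ac
aacaacac
a'

plus=$(printf '%s\n' "$words" | grep -Ex 'abc+d' | paste -sd ' ' -)       # one or more c
optional=$(printf '%s\n' "$words" | grep -Ex 'abc?d' | paste -sd ' ' -)   # zero or one c
grouped=$(printf '%s\n' "$runs" | grep -Ex '(a*c)+' | paste -sd ' ' -)    # one or more of: any a's, then c

echo "abc+d  : $plus"
echo "abc?d  : $optional"
echo "(a*c)+ : $grouped"
```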

53 Regular Expressions: Backreferences
Sometimes it is handy to be able to refer to a match that was made earlier in a regex. This is done using backreferences: \n is the backreference specifier, where n is a number.
For example, to find if the first word of a line is the same as the last:
^\([[:alpha:]]\{1,\}\).*\1$
The \([[:alpha:]]\{1,\}\) matches 1 or more letters and saves them; \1 then matches the same text again at the end of the line.
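The backreference pattern can be tried on a few made-up lines. One caveat that follows from the pattern itself: the saved group may be shorter than a whole word, so a line that merely begins and ends with the same letter is also selected, as the third input line shows.

```shell
# Hypothetical input.
lines='hello big world hello
one two three
stop the bus'

# \( \) saves one or more leading letters; \1 must repeat them at the end of the line.
matched=$(printf '%s\n' "$lines" | grep '^\([[:alpha:]]\{1,\}\).*\1$')
n_matched=$(printf '%s\n' "$lines" | grep -c '^\([[:alpha:]]\{1,\}\).*\1$')

printf '%s\n' "$matched"
echo "matching lines: $n_matched"
```

Here "stop the bus" matches because the group can shrink to the single letter s, which also ends the line.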

54 Regular Expressions: Some Practical Examples
Variable names in C: [a-zA-Z_][a-zA-Z_0-9]*
Dollar amount with optional cents: \$[0-9]+(\.[0-9][0-9])?
Time of day: (1[012]|[1-9]):[0-5][0-9] (am|pm)
HTML headers <h1> <H1> <h2> …: <[hH][1-4]>
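The four patterns can be exercised with grep -E on made-up candidates. This is a sketch under assumptions: -x is added so that a candidate must match in full, and the C locale is set so that the letter ranges follow ASCII order. The last count for the time pattern omits -x to show why anchoring matters.

```shell
LC_ALL=C
export LC_ALL

# C identifiers: a letter or underscore, then letters, digits, or underscores.
n_ident=$(printf '%s\n' 'count' '_tmp1' '2fast' | grep -Ecx '[a-zA-Z_][a-zA-Z_0-9]*')

# Dollar amounts: cents are optional, but if present must be exactly two digits.
n_money=$(printf '%s\n' '$5' '$5.00' '$5.0' '5.00' | grep -Ecx '\$[0-9]+(\.[0-9][0-9])?')

# Times of day on a 12-hour clock.
times='9:30 am
12:05 pm
13:00 pm
7:60 am'
n_times=$(printf '%s\n' "$times" | grep -Ecx '(1[012]|[1-9]):[0-5][0-9] (am|pm)')

# Unanchored, "13:00 pm" is selected too, because "3:00 pm" inside it matches.
n_times_loose=$(printf '%s\n' "$times" | grep -Ec '(1[012]|[1-9]):[0-5][0-9] (am|pm)')

# HTML header tags h1 through h4, either case.
n_tags=$(printf '%s\n' '<h1>' '<H3>' '<h5>' | grep -Ecx '<[hH][1-4]>')

echo "ident=$n_ident money=$n_money times=$n_times times_loose=$n_times_loose tags=$n_tags"
```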

55 Regular Expressions: Quick Reference
fgrep, grep, egrep:
xyz          Ordinary characters match themselves (newlines and metacharacters excluded); ordinary strings match themselves
grep, egrep:
\m           Matches literal character m
^            Start of line
$            End of line
.            Any single character
[xy^$z]      Any of x, y, ^, $, or z
[^xy^$z]     Any one character other than x, y, ^, $, or z
[a-z]        Any single character in the given range
r*           Zero or more occurrences of regex r
r1r2         Matches r1 followed by r2
grep:
\(r\)        Tagged regular expression, matches r
\n           Set to what matched the nth tagged expression (n = 1-9)
\{n,m\}      Repetition
egrep:
r+           One or more occurrences of r
r?           Zero or one occurrences of r
r1|r2        Either r1 or r2
(r1|r2)r3    Either r1r3 or r2r3
(r1|r2)*     Zero or more occurrences of r1|r2, e.g., r1, r1r1, r2r1, r1r1r2r1, …
{n,m}        Repetition

56 grep/fgrep/egrep utilities
grep [option] pattern {filename}*
grep searches the named files or standard input and prints each line that contains an instance of the pattern. grep interprets the same regular expressions as ed(1).
2000 Copyrights Danielle S. Lahmani

57 grep/egrep/fgrep utilities
The name comes from the ed command g/re/p (global regular expression print). Regular expressions are specified by giving special meanings to certain characters.

58 grep/egrep/fgrep utilities
^ and $ "anchor" the pattern to the beginning (^) or end ($) of the line.
Regular expression metacharacters overlap with shell metacharacters, so it's always a good idea to enclose grep patterns in single quotes, to suppress interpretation by the shell.
$ grep From $MAIL        locates lines containing From in your mailbox
$ grep '^From' $MAIL     prints lines that begin with From in your mailbox

59 grep utility
grep supports character classes much like those in the shell.
[a-z] matches any lower case letter
[^…] matches any character except those in the class
[^0-9] matches any non-digit
A period '.' is equivalent to the shell's ?: it matches any single character.

60 grep examples
print lines that begin with mary
$ who | grep '^mary'
print lines that end with mary
$ who | grep 'mary$'
list files others can read and write (the seven dots skip the file type and the owner and group permission characters)
$ ls -l | grep '^.......rw'

61 grep examples
The closure operator * applies to the previous character or metacharacter (including a character class) in the expression.
grep 'abc*' matches ab followed by zero or more c's
grep 'ab[a-z]*' matches ab followed by any number of lower case letters

62 grep utility
grep '^[^:]*::' matches the beginning of a line, then zero or more non-colons, followed by a double colon. Inside the brackets, the ^ means not.
No grep regular expression matches a newline; the expressions are applied to each line individually.
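A sketch of the '^[^:]*::' pattern on colon-separated records. The three entries below are made up in the style of a password file; the pattern selects lines whose first field is immediately followed by an empty second field.

```shell
# Hypothetical colon-separated entries.
entries='guest::1001:100::/home/guest:/bin/sh
root:x:0:0:root:/root:/bin/sh
daemon:x:1:1::/:/bin/false'

# [^:]* cannot cross a colon, so the :: must come right after the first field.
empty_second=$(printf '%s\n' "$entries" | grep '^[^:]*::')
n_empty_second=$(printf '%s\n' "$entries" | grep -c '^[^:]*::')

echo "$empty_second"
echo "lines with an empty second field: $n_empty_second"
```

The daemon entry also contains a double colon, but further along the line, so the anchored pattern does not select it.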

63 grep/fgrep/egrep utilities
Example: grep '^[^aeiou]*a[^aeiou]*e$' foo
Looks for zero or more non-vowels followed by a, followed by zero or more non-vowels, followed by e at the end of the line.
Would match dsdakjkjkje …
\ turns off the meaning of the special character that follows:
$ grep \' foo        finds all ' apostrophes in file foo

64 Common grep options
-n prints line numbers
$ grep -n variable *.[ch]        locate variable in C source
-v inverts the sense of the test
$ grep -v From foo        print all lines that do not contain From in file foo
-i makes lower case letters in the pattern match either case in the file
$ grep -i mary $HOME/bin/phone-book

65 grep common options (continued)
-c prints only a count of matched lines
$ grep -c /bin/csh /etc/passwd
-l lists filenames but not matched lines
$ grep -l '^#include' /usr/include/*

66 egrep is an extended grep that works on extended regular expressions. It accepts:
+            one or more occurrences
?            zero or one occurrence
pat1|pat2    "or" operator: matches either pat1 or pat2
(r)          regular expression r; can be nested
(xy)* matches any of the empty string, xy, xyxy, xyxyxy and so on

67 fgrep/egrep utilities
fgrep searches for many literal strings simultaneously. It does not interpret metacharacters such as *.
Both fgrep and egrep have a -f option to read patterns stored in a file. In the file, newlines separate the patterns to be searched for simultaneously.
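A sketch of the -f option with a two-line pattern file. File names and contents are made up, mktemp -d is assumed to be available, and grep -F stands in for fgrep. The same pattern file is then read as regular expressions to show the difference.

```shell
# Hypothetical pattern file and data file in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'a*b' 'north' > "$tmp/patterns"
printf '%s\n' 'northwest NW' 'a*b literal' 'aab' 'southern SO' > "$tmp/data"

# -F -f: each line of the pattern file is a literal string; a line matches if it contains any of them.
n_fixed=$(grep -F -c -f "$tmp/patterns" "$tmp/data")

# -f alone: the same lines are read as regular expressions, so a*b also matches "aab".
n_regex=$(grep -c -f "$tmp/patterns" "$tmp/data")

echo "fixed strings: $n_fixed lines; regular expressions: $n_regex lines"
rm -rf "$tmp"
```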

68 TABLE of Regular expressions for grep and egrep
c        any non-special character c matches itself
\c       turns off any special meaning of character c
^        beginning of line
$        end of line
.        matches any single character
[…]      any one of the characters in …
[^…]     any character not in …

69 TABLE of Regular expressions for grep and egrep (continued)
r*       zero or more occurrences of r
r+       one or more occurrences of r (egrep only)
r?       zero or one occurrences of r (egrep only)
r1r2     r1 followed by r2
r1|r2    r1 or r2 (egrep only)
(r)      nested regular expression r (egrep only)
