1
1.5 Regular Expressions (REs)
Compiler Structures, Semester 2
Objectives:
what is a regular expression?
give examples of REs used in the grep command
2
1. Regular Expressions
A regular expression (RE or regex) is a pattern used to match against text when searching inside a file.
Regexes are used everywhere in Linux:
Editors: ed, ex, vi
Utilities: grep, egrep, sed, and awk
3
String Regex
regex pattern: cks
text: UNIX Tools rocks. (match)
text: UNIX Tools sucks. (match)
text: UNIX Tools is okay. (no match)
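The string example can be checked with a short Python sketch. Python's `re` module is used here only as a convenient stand-in; the slides themselves use grep, whose syntax differs in places.

```python
import re

# The pattern from the slide: the three literal characters "cks".
pattern = re.compile(r"cks")

lines = [
    "UNIX Tools rocks.",
    "UNIX Tools sucks.",
    "UNIX Tools is okay.",
]

# search() looks for the pattern anywhere in the line.
results = {line: bool(pattern.search(line)) for line in lines}
for line, matched in results.items():
    print(f"{line!r}: {'match' if matched else 'no match'}")
```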
4
Multiple Matches
A regex pattern can match text in more than one place.
regex pattern: apple
text: Scrapple from the apple. (match 1 inside "Scrapple", match 2 is the word "apple")
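A minimal Python sketch of the same idea, assuming the `re` module as a stand-in for the tools named in the slides: `finditer` reports every non-overlapping match, so both occurrences show up.

```python
import re

text = "Scrapple from the apple."

# Collect every non-overlapping match of the pattern in the text.
matches = list(re.finditer(r"apple", text))
starts = [m.start() for m in matches]

print(len(matches), "matches at offsets", starts)
```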
5
The . (dot) Regex
The . regex pattern can be used to match any single character in the text.
regex pattern: o.
text: For me to poop on. (match 1, match 2)
6
The Character Class Regex
A character class [] matches any one character from the set listed inside the brackets.
regex pattern: b[eor]at
text: beat a brat on a boat (match 1: beat, match 2: brat, match 3: boat)
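The character class example can be tried in Python (an assumption of this sketch; the slides use grep). The class `[eor]` stands for exactly one of the letters e, o, or r.

```python
import re

text = "beat a brat on a boat"

# [eor] matches one character: 'e', 'o', or 'r'.
found = re.findall(r"b[eor]at", text)
print(found)

# A word with no letter between 'b' and 'at' is not matched.
no_match = re.search(r"b[eor]at", "bat")
```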
7
Character Class Examples
8
Repetition Regex: * (star)
The * defines zero or more copies of the letter before it.
regex pattern: ya*y
text: I got mail, yaaaaaaaaaay! (match)
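A small Python sketch of the star operator, using the `re` module as a stand-in for grep. It also checks the "zero copies" case, which the slide text implies but does not show.

```python
import re

star = re.compile(r"ya*y")

text = "I got mail, yaaaaaaaaaay!"
found = star.search(text).group()
print(found)

# Zero copies of 'a' is allowed, so "yy" matches as well.
zero_copies = star.fullmatch("yy") is not None
one_copy = star.fullmatch("yay") is not None
wrong_letter = star.fullmatch("yby") is not None
```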
9
regex pattern: oa*o
text: I like the zoo. (match)
regex pattern: h.*o
text: Say hello Andrew. (match)
10
regex pattern: h.*o
text: Say hello to Andrew. (match: "hello to")
Regexes are greedy: they match as much of the text as they can.
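Greediness is easy to see by printing what was actually matched. This is a sketch in Python's `re` module (an assumption; the slides use grep), where `.*` is greedy in the same way.

```python
import re

# With a single 'o' after the 'h', the match stops at "hello".
short = re.search(r"h.*o", "Say hello Andrew.").group()

# With a later 'o' available, .* stretches to the last one it can reach.
greedy = re.search(r"h.*o", "Say hello to Andrew.").group()

# a* may match zero letters, so "oo" in "zoo" matches oa*o.
zoo = re.search(r"oa*o", "I like the zoo.").group()

print(short, "|", greedy, "|", zoo)
```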
11
Anchors: ^ $
regex pattern: ^b[eor]at
^ matches the beginning of the text line
text: beat a brat on a boat (match: beat)
regex pattern: b[eor]at$
$ matches the end of the text line
text: beat a brat on a boat (match: boat)
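The effect of the two anchors can be sketched in Python, assuming the `re` module as a stand-in for grep. Without anchors the pattern matches three words; each anchor narrows that to one.

```python
import re

text = "beat a brat on a boat"

unanchored = re.findall(r"b[eor]at", text)
at_start = re.findall(r"^b[eor]at", text)   # only at the start of the line
at_end = re.findall(r"b[eor]at$", text)     # only at the end of the line

print(unanchored, at_start, at_end)
```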
12
More Anchors
13
The | (or) Regex
14
More Repetition Regexes: * + ?
15
More Regex Operations
See the regular expressions "cheat-sheet" at the course website: over 80 operators!
16
2. grep
"grep" uses a regex pattern to search a text file; all the lines containing a match (or matches) are printed.
Examples:
% grep "root" test1
% grep "r..t" test1
% grep "ro*t" test1
% grep "r[a-z]*t" test1
In each command the regex pattern is in "..." and is followed by the text filename.
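What grep does with these four patterns can be approximated in a few lines of Python. This is only a sketch: the `grep` function below is a hypothetical helper, not the real tool, and the list `test1` is made-up content standing in for the file of that name.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain at least one match of pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical stand-in for the contents of the file test1.
test1 = ["root", "rat", "riot", "rt", "rest area"]

for pat in ["root", "r..t", "ro*t", "r[a-z]*t"]:
    print(f"{pat!r:12} -> {grep(pat, test1)}")
```

Note how `ro*t` matches "rt" (zero o's) but not "riot", while `r..t` needs exactly two characters between r and t.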
17
The Grep Family
grep: usual version
egrep: extended REs (| + ? don't need backslash)
fgrep: only fixed strings, so it is faster
18
Common "grep" Options
-c  Print a count of matched lines.
-i  Ignore uppercase/lowercase.
-l  List filenames that contain matches.
-n  Print matched lines and line numbers.
-s  Work silently; only display error messages.
-v  Print lines that do not match the pattern.
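The meaning of -c, -i, -n and -v can be sketched in Python. The `grep` helper and its `ignore_case` and `invert` parameters are hypothetical names chosen for this illustration; they imitate the options rather than reproduce grep itself.

```python
import re

def grep(pattern, lines, ignore_case=False, invert=False):
    """Return (line_number, line) pairs, numbering from 1 as grep -n does."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [(n, line) for n, line in enumerate(lines, start=1)
            if bool(regex.search(line)) != invert]

lines = ["Hello Andy", "my name is andy", "my bye byhe"]

count = len(grep("andy", lines))                  # like -c
any_case = grep("andy", lines, ignore_case=True)  # like -i combined with -n
inverted = grep("andy", lines, invert=True)       # like -v combined with -n

print(count, any_case, inverted)
```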
19
Some Simple Examples
grep searches input lines, a line at a time. If the line contains a string that matches grep's RE (pattern), then the line is output.
input lines (e.g. from a file) -> grep "RE" -> output matching lines (e.g. to a file)
input lines:
hello andy
my name is andy
my bye byhe
20
Examples
grep "and"
input lines:
hello andy
my name is andy
my bye byhe
output lines:
hello andy
my name is andy

grep -E "an|my"
input lines:
hello andy
my name is andy
my bye byhe
output lines:
hello andy
my name is andy
my bye byhe
"|" means "or"
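The two searches can be reproduced with a Python sketch, under the assumption that Python's `|` behaves like the `|` of `grep -E` for these simple patterns.

```python
import re

lines = ["hello andy", "my name is andy", "my bye byhe"]

# Like: grep "and"
and_hits = [line for line in lines if re.search(r"and", line)]

# Like: grep -E "an|my" (a line is kept if it contains "an" or "my")
or_hits = [line for line in lines if re.search(r"an|my", line)]

print(and_hits)
print(or_hits)
```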
21
grep "hel*"
input lines:
hello andy
my name is andy
my bye byhe
output lines:
hello andy
my bye byhe
"*" means "0 or more"
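It may be surprising that "my bye byhe" is output. A short Python sketch (again using `re` as a stand-in for grep) shows why: `l*` may match zero l's, so "he" alone is enough.

```python
import re

lines = ["hello andy", "my name is andy", "my bye byhe"]

# Record the text that hel* actually matched on each matching line.
matched = {}
for line in lines:
    m = re.search(r"hel*", line)
    if m:
        matched[line] = m.group()

print(matched)
```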
22
grep with \< \>
begin and end of word
Look for the word "north"
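Python's `re` module has no `\<` and `\>`; the sketch below assumes `\b` (word boundary) as the nearest equivalent. The sample lines are invented for illustration.

```python
import re

# Hypothetical sample lines, not taken from the slides.
lines = ["north", "northwest", "go north now", "the northern lights"]

# \bnorth\b matches "north" only when it stands alone as a word.
whole_word = [line for line in lines if re.search(r"\bnorth\b", line)]

# Without boundaries, every line containing the letters matches.
anywhere = [line for line in lines if re.search(r"north", line)]

print(whole_word)
```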
23
grep with a\|b
a or b
egrep doesn't need backslash
24
grep with \+
one or more
egrep doesn't need backslash
25
grep with .
any character
26
grep with ^ and $
begin and end of line
27
grep with [ ]
set of chars
28
Fun with a Linux Dictionary
Find the location of the words file
List all the words containing "hh"
29
Look for "niether" or "neither"
Look for words with three "u"s
Count the words with three "a"s
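The dictionary searches can be sketched in Python over a small word list. Everything here is an assumption made for illustration: the list `words` is invented (the real words file is not shown in the transcript), the patterns are one possible choice, and "three" is read as "at least three".

```python
import re

# Hypothetical miniature dictionary; a real words file is far larger.
words = ["withhold", "neither", "niether", "unusual", "banana", "apple"]

def grep(pattern, lines):
    """Return the entries that contain a match of pattern."""
    return [w for w in lines if re.search(pattern, w)]

double_h = grep(r"hh", words)
either_spelling = grep(r"n(ie|ei)ther", words)
three_u = grep(r"u.*u.*u", words)            # at least three u's
three_a_count = len(grep(r"a.*a.*a", words))  # count, as grep -c would

print(double_h, either_spelling, three_u, three_a_count)
```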
30
Complex Regex Examples
Variable names in C: [a-zA-Z_][a-zA-Z_0-9]*
Dollar amount with optional cents: \$[0-9]+(\.[0-9][0-9])?
Time of day: (1[012]|[1-9]):[0-5][0-9] (am|pm)
HTML headers <h1> <H1> <h2> …: <[hH][1-4]>
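These four patterns happen to be valid in Python's `re` syntax as written, so they can be tried directly. The sketch uses `fullmatch`, which requires the whole string to match; the test strings are invented examples.

```python
import re

# Patterns copied from the slide.
c_name = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")
dollars = re.compile(r"\$[0-9]+(\.[0-9][0-9])?")
time_of_day = re.compile(r"(1[012]|[1-9]):[0-5][0-9] (am|pm)")
html_header = re.compile(r"<[hH][1-4]>")

def ok(regex, s):
    """True if the whole string s matches regex."""
    return regex.fullmatch(s) is not None

print(ok(c_name, "_tmp1"), ok(c_name, "1abc"))
print(ok(dollars, "$12.50"), ok(dollars, "$7"), ok(dollars, "$7.5"))
print(ok(time_of_day, "9:05 am"), ok(time_of_day, "13:00 pm"))
print(ok(html_header, "<H4>"), ok(html_header, "<h5>"))
```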
31
3. The RE Language
An RE can be defined as a pattern language (operands and operators) which matches on text strings.
32
Some Possible RE Operands
text characters (e.g. 'a', '1', '(')
the symbol ε (means an empty string ''); in code just use ""
variables, which can be assigned a RE: variable = RE
33
The Basic RE Operators
There are three basic operators:
union '|'
concatenation
closure '*'
34
Union
S | T
use S or T to match strings
Example REs:
a | b
a | b | c
35
Concatenation
S T
use S followed by T to match against strings
Example REs:
a b matches the string "ab"
w | (a b) matches the strings "w" or "ab"
36
Closure
S*
use S 0 or more times to match against strings
Example RE:
a* matches the strings: ε (the empty string), a, aa, aaa, aaaa, aaaaa, ...
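The three basic operators can be exercised in Python, taken here as an assumption since the slides define them abstractly. Concatenation has no symbol in Python's `re` either, so "a b" is written `ab`.

```python
import re

def matches(regex, s):
    """True if the whole string s is matched by regex."""
    return re.fullmatch(regex, s) is not None

union = r"a|b"        # a | b
mixed = r"w|(ab)"     # w | (a b)
closure = r"a*"       # a*

print([s for s in ["a", "b", "ab"] if matches(union, s)])
print([s for s in ["w", "ab", "wab"] if matches(mixed, s)])
print([s for s in ["", "a", "aaaa", "ab"] if matches(closure, s)])
```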
37
3.1. REs for C Identifiers
We define two RE variables, letter and digit:
letter = A | B | C | D ... Z | a | b | c | d ... z
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
id is defined using letter and digit:
id = letter ( letter | digit )*
38
Strings matched by id include:
ab345
w
h5g
Strings not matched:
2
$abc
****
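The definition of id from RE variables can be mirrored in Python by building the pattern out of smaller strings. This is a sketch: character classes replace the long unions, and the name `ident` is used instead of `id` only because `id` is a Python built-in.

```python
import re

# RE variables, written as character classes instead of long unions.
letter = "[A-Za-z]"
digit = "[0-9]"

# id = letter ( letter | digit )*
ident = f"{letter}({letter}|{digit})*"

def is_id(s):
    """True if the whole string s is matched by the id pattern."""
    return re.fullmatch(ident, s) is not None

for s in ["ab345", "w", "h5g", "2", "$abc", "****"]:
    print(f"{s!r}: {'matched' if is_id(s) else 'not matched'}")
```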
39
3.2. REs for Integers and Floats
We redefine digit:
digit = 0|1|2|3|4|5|6|7|8|9
or
digit = [0-9]
int and float:
int = {digit}+
float = {digit}+ "." {digit}+
40
Integers and floats with exponents:
number = {digit}+ ('.' {digit}+ )? ( 'E'('+'|'-')? {digit}+ )?
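The number definition translates almost symbol for symbol into Python. In this sketch the `{digit}` of the slide notation becomes f-string interpolation, and the names `int_re`, `float_re` and `number_re` are hypothetical.

```python
import re

digit = "[0-9]"

int_re = f"{digit}+"
float_re = rf"{digit}+\.{digit}+"
# number = {digit}+ ('.' {digit}+)? ('E' ('+'|'-')? {digit}+)?
number_re = rf"{digit}+(\.{digit}+)?(E[+-]?{digit}+)?"

def is_number(s):
    """True if the whole string s is matched by number_re."""
    return re.fullmatch(number_re, s) is not None

for s in ["42", "3.14", "6.02E+23", "1E-9", ".5", "1.", "E5"]:
    print(f"{s!r}: {is_number(s)}")
```

Both the fraction and the exponent are optional, but a string must start with at least one digit, so ".5" and "E5" are rejected.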
41
4. More on REs
See RE summary on the course website:
regular_expressions_cheat_sheet.pdf
I have the standard RE book:
Mastering Regular Expressions
Jeffrey E. F. Friedl
O'Reilly & Associates
42
There are many websites that explain REs:
helpsheets/unix/regex.html