Published by Florence Hoover. Modified over 9 years ago
1
GREP
2
What's grep? Grep is a popular Unix program that searches text using a special pattern language called regular expressions. The regular-expression grammar used by much other software is based on grep's; Perl extends it further.
3
Compiles: the engine parses your regular-expression search string and produces a state machine. (Diagram label: ANY)
4
Searches: the input is sent into the state machine, conceptually one shape (letter) at a time. (Diagram label: FALSE)
5
Found: the state machine object changes state (in this example, it is set to true). The user checks the machine's state when it finishes running. (Diagram label: TRUE)
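The compile, search, and found steps can be sketched with a hand-built state machine. This is a minimal illustration, assuming the tiny pattern ab*c checked against a whole string; the names `TRANSITIONS`, `ACCEPTING`, and `run_machine` are hypothetical. Python's `re` module is shown alongside only for comparison: internally it is a backtracking engine, so just the compile-then-match workflow corresponds, not the exact machine.

```python
import re

# Hypothetical hand-built state machine for the pattern ab*c (whole-string match).
TRANSITIONS = {
    (0, "a"): 1,  # start state: needs an 'a'
    (1, "b"): 1,  # any number of 'b's loop back to state 1
    (1, "c"): 2,  # a 'c' moves to the accepting state
}
ACCEPTING = {2}


def run_machine(text: str) -> bool:
    state = 0
    for ch in text:  # feed the input one character at a time
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition for this character: reject
            return False
    return state in ACCEPTING  # check the machine's state when it finishes


# The same pattern compiled by Python's regex engine, for comparison.
pattern = re.compile(r"ab*c")

for s in ["ac", "abbbc", "abd", "abcx"]:
    print(s, run_machine(s), pattern.fullmatch(s) is not None)
```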
6
Grep Expressions
The “grep” language is used for regular expressions in text processing. A grep pattern is another name for a “regular expression”.
7
Grep Expressions
A pattern is a string of text to match, which may contain special characters. “john.*” would return True on a search of “john was here”.
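The “john.*” search can be tried with Python's `re` module, used here as a stand-in for grep (its Perl-style syntax agrees with grep on this pattern). `re.search` looks for the pattern anywhere in the text and returns a match object or `None`.

```python
import re

# re.search scans the text for the pattern anywhere in it
found = re.search(r"john.*", "john was here")
missing = re.search(r"john.*", "joan was here")

print(found is not None)  # True
print(found.group())      # john was here (.* greedily takes the rest)
print(missing)            # None
```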
8
Grep Expressions
“.*\.txt”
.* is anything (.) of any length (*)
\. is a literal . (the \ before it means the next character is literal, that is, not special)
txt just matches those letters
This picks out .txt file names (add $ at the end to require that the name actually ends in .txt). It is similar to what you see in Windows, but it is not the same: it is more powerful than the simple “wildcards” (*) you often see.
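A quick comparison of the regex and a shell-style wildcard, assuming a made-up list of file names. Python's `fnmatch` module stands in for simple `*` wildcards; note that it matches the whole name, while an unanchored regex search also accepts names that merely contain ".txt".

```python
import fnmatch
import re

names = ["notes.txt", "report.doc", "mytxt", "notes.txt.bak"]

# Regex: any characters, then a literal dot, then "txt" (anywhere in the name)
regex_hits = [n for n in names if re.search(r".*\.txt", n)]
# Adding $ anchors the match to the end of the name
anchored_hits = [n for n in names if re.search(r".*\.txt$", n)]
# Shell-style wildcard, for comparison
wildcard_hits = fnmatch.filter(names, "*.txt")

print(regex_hits)     # ['notes.txt', 'notes.txt.bak']
print(anchored_hits)  # ['notes.txt']
print(wildcard_hits)  # ['notes.txt']
```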
9
Special Chars
. = any single character
^ = beginning of a line
$ = end of line
\w = word characters (letters, digits, and _)
\d = decimal digits (0-9)
(\w and \d are Perl-style; GNU grep understands \w, but \d needs grep -P.)
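Each special character can be checked with Python's `re` module (a Perl-style engine, used here in place of grep). The sample strings are made up for illustration.

```python
import re

line = "Order 66 shipped"

digits = re.findall(r"\d", line)          # every single digit
word_chars = re.findall(r"\w", "a_1!")    # letters, digits and _ (not "!")
starts = bool(re.search(r"^Order", line))    # "Order" at the start of the line
ends = bool(re.search(r"shipped$", line))    # "shipped" at the end of the line
wrong_start = bool(re.search(r"^shipped", line))
any_char = re.findall(r"O.d", line)       # "." matches the "r"

print(digits, word_chars, starts, ends, wrong_start, any_char)
# ['6', '6'] ['a', '_', '1'] True True False ['Ord']
```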
10
\ = escape char
The backslash \ (it leans to the left) is the most popular escape character.
Uses: sneaking otherwise illegal or special characters past the parser; building special code characters (such as \n for newline); data encodings nearly always have one.
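Escaping in practice, sketched with Python's `re` module. `re.fullmatch` requires the whole string to match, and `re.escape` adds backslashes in front of every special character in a literal string, so it can be searched for as-is.

```python
import re

# Unescaped, "." is special: it matches any character
dot_any = bool(re.fullmatch(r"a.c", "abc"))        # True
# Escaped, "\." matches only a literal dot
dot_literal_b = bool(re.fullmatch(r"a\.c", "abc"))  # False
dot_literal_dot = bool(re.fullmatch(r"a\.c", "a.c"))  # True

# re.escape adds the backslashes for you
literal = re.escape("1+1=2")
escaped_ok = bool(re.fullmatch(literal, "1+1=2"))    # True
# Unescaped, "+" means "one or more", so the literal text no longer matches
unescaped_ok = bool(re.fullmatch("1+1=2", "1+1=2"))  # False

print(dot_any, dot_literal_b, dot_literal_dot, escaped_ok, unescaped_ok)
```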
11
Examples
... = three of anything (any three characters)
\d\d\d = three digits (remember: the \ is the escape code)
\w\w\w = three word characters (letters, digits, or _; no symbols). Good: abc, a34. Bad: ab!, a b
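These three-character patterns can be checked with `re.fullmatch` in Python, which succeeds only when the whole string matches. The test strings are illustrative; note that "a34" passes \w\w\w because digits count as word characters.

```python
import re

three_any = re.compile(r"...")      # any three characters
three_digits = re.compile(r"\d\d\d")
three_word = re.compile(r"\w\w\w")  # letters, digits or _

checks = {
    "any a!9": bool(three_any.fullmatch("a!9")),
    "digits 123": bool(three_digits.fullmatch("123")),
    "digits 12a": bool(three_digits.fullmatch("12a")),
    "word abc": bool(three_word.fullmatch("abc")),
    "word a34": bool(three_word.fullmatch("a34")),  # digits are word characters
    "word ab!": bool(three_word.fullmatch("ab!")),  # "!" is a symbol
}
print(checks)
```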
12
Approach
Searching for “john” or “joan”: what is the difference between them? Only the third letter: jo_n. What symbol works in the blank? jo\wn or jo.n
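Trying both answers on a few made-up words shows how they differ: jo.n also accepts a space or a hyphen in the blank, while jo\wn accepts only word characters. A third option, the character set jo[ha]n (sets are covered under Quantity Chars), accepts exactly the two names.

```python
import re

words = ["john", "joan", "jo n", "jo-n", "jean"]

word_blank = [w for w in words if re.fullmatch(r"jo\wn", w)]
any_blank = [w for w in words if re.fullmatch(r"jo.n", w)]
set_blank = [w for w in words if re.fullmatch(r"jo[ha]n", w)]

print(word_blank)  # ['john', 'joan']
print(any_blank)   # ['john', 'joan', 'jo n', 'jo-n']
print(set_blank)   # ['john', 'joan']
```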
13
Special Chars
\D = non-digits
\W = non-word characters
\s = whitespace
\S = non-whitespace
\n = newline (return/enter key)
\t = tab
14
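\s\s\s = three whitespace characters: tabs, spaces, possibly newlines
\D\s\W = non-digit, whitespace, non-word character
Examples that match: ! !, x !, = ?
Examples that do not: x 4, = 4, A 5 (4 and 5 are word characters, so \W rejects them)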
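The negated classes and whitespace codes can be verified with Python's `re` module. The sample strings are illustrative; the last loop confirms which of the \D\s\W examples actually match.

```python
import re

whitespace = re.findall(r"\s", "a b\tc\nd")          # space, tab, newline
three_ws = bool(re.fullmatch(r"\s\s\s", " \t\n"))    # True

pattern = re.compile(r"\D\s\W")
results = {s: bool(pattern.fullmatch(s)) for s in ["! !", "x !", "x 4", "A 5"]}

print(whitespace)  # [' ', '\t', '\n']
print(three_ws)    # True
print(results)     # {'! !': True, 'x !': True, 'x 4': False, 'A 5': False}
```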
15
Quantity Chars
* = 0 or more
? = 0 or 1
+ = 1 or more
[] = any one of the chars inside the brackets, e.g. [abc]
[^] = NOT any of the chars inside the brackets, e.g. [^abc]
[a-zA-Z] = ranges of chars
16
Examples
X+ = 1 or more X: X, XXX
[XYZ] = any one of these chars: X, Y, Z
[XYZxyz]+ = 1 or more of any of these: y, XYz, zYZZyX, ZZzzzzz
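The quantity characters and sets can be checked with `re.fullmatch` in Python. The word "colour" is only an illustrative use of ?; the [XYZxyz]+ strings are the ones listed above, plus one ("XYa") that should fail.

```python
import re

plus_many = bool(re.fullmatch(r"X+", "XXX"))        # True
plus_empty = bool(re.fullmatch(r"X+", ""))          # False: + needs at least one X
star_empty = bool(re.fullmatch(r"X*", ""))          # True: * allows zero
optional = bool(re.fullmatch(r"colou?r", "color"))  # True: ? makes the u optional
one_of = bool(re.fullmatch(r"[XYZ]", "Y"))          # True

set_results = {s: bool(re.fullmatch(r"[XYZxyz]+", s))
               for s in ["y", "XYz", "zYZZyX", "ZZzzzzz", "XYa"]}

print(plus_many, plus_empty, star_empty, optional, one_of)
print(set_results)  # all True except 'XYa'
```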
17
EXAMPLES
[a-zA-Z0-9] = any one letter or digit, but no spaces
\.?$ = maybe ends with a . (remember: $ is end of line); since the dot is optional, this matches every line
.* = 0 to ∞ of any character
[^abc]* = 0 to ∞ of anything but lowercase a, b, or c
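A short check of these examples with Python's `re` module. It shows in particular that \.?$ matches even a line with no dot, and picks up the dot when there is one.

```python
import re

alnum = re.findall(r"[a-zA-Z0-9]", "a1 B!")         # no space, no "!"
no_dot = re.search(r"\.?$", "no dot")               # matches the empty end of line
with_dot = re.search(r"\.?$", "end.")               # matches the final "."
not_abc_ok = re.fullmatch(r"[^abc]*", "xyz") is not None   # True
not_abc_bad = re.fullmatch(r"[^abc]*", "xaz") is not None  # False: contains "a"

print(alnum)                                  # ['a', '1', 'B']
print(no_dot is not None, repr(no_dot.group()))  # True ''
print(repr(with_dot.group()))                 # '.'
print(not_abc_ok, not_abc_bad)                # True False
```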
18
Problems
Unicode vs. ASCII: the regular-expression language is older than Unicode. Many newer engines support Unicode, but minor extensions to the language are required for full Unicode support.
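One concrete Unicode difference, shown with Python 3's `re` module: for text strings, \w matches Unicode letters such as "é" by default, while the `re.ASCII` flag restricts it to the old ASCII meaning. Other engines and grep builds behave differently, so this only illustrates the issue.

```python
import re

word = "caf\u00e9"  # "café" with a precomposed é

unicode_words = re.findall(r"\w+", word)           # Unicode-aware \w
ascii_words = re.findall(r"\w+", word, re.ASCII)   # ASCII-only \w

print(unicode_words)  # ['café']
print(ascii_words)    # ['caf']
```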
19
Options
RegExp engines typically have options:
ignoreCase saves you from writing [Aa] for each letter
global keeps matching until the end of the input; by default, the engine stops at the first match (global is useful for replace)
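How these two options look in Python's `re` module, used as an example engine. Python spells ignoreCase as `re.IGNORECASE` (or `re.I`). It has no separate "global" flag: `re.sub` replaces every match by default, and `count=1` gives the stop-at-the-first-match behavior.

```python
import re

text = "Apple apple APPLE"

exact = re.findall(r"apple", text)                    # case-sensitive
ignore_case = re.findall(r"apple", text, re.IGNORECASE)

replace_all = re.sub(r"apple", "pear", text, flags=re.I)           # "global"
replace_first = re.sub(r"apple", "pear", text, count=1, flags=re.I)

print(exact)          # ['apple']
print(ignore_case)    # ['Apple', 'apple', 'APPLE']
print(replace_all)    # pear pear pear
print(replace_first)  # pear apple APPLE
```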
20
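Options
multiline: Most tools (grep, for example) break the input into lines and start over at the end of each line. An engine that is given the whole input as one string treats line endings as ordinary characters, so ^ and $ refer only to the beginning and end of the entire input; the multiline option makes ^ and $ match at the beginning and end of every line.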
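The effect of the multiline option, sketched with Python's `re.MULTILINE` flag on a made-up two-line string.

```python
import re

text = "one two\nthree four"

first_word = re.findall(r"^\w+", text)                     # ^ = start of input only
first_words = re.findall(r"^\w+", text, re.MULTILINE)      # ^ = start of each line
last_words = re.findall(r"\w+$", text, re.MULTILINE)       # $ = end of each line

print(first_word)   # ['one']
print(first_words)  # ['one', 'three']
print(last_words)   # ['two', 'four']
```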
21
/Common Use/
/string/ marks a regular expression the way “quotes” mark a string. If you write the pattern as a “string” instead, you must escape each backslash: /\d\d/ (a 2-digit pattern) vs. “\\d\\d” (the same 2-digit pattern written as a string).
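Python has no /.../ regex literal, but its raw strings (r"...") play the same role by keeping backslashes as typed. This sketch shows that the raw form and the doubled-backslash string are the same four characters and find the same 2-digit matches.

```python
import re

# Raw string: backslashes kept as typed, much like /\d\d/ in Perl or JavaScript
raw = r"\d\d"
# Ordinary string: each backslash must be doubled
escaped = "\\d\\d"

same = raw == escaped                      # True: both are the characters \d\d
matches = re.findall(escaped, "a 12 b 345")

print(same)     # True
print(matches)  # ['12', '34']
```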