Published by Mervin Grant. Modified over 9 years ago.
1
ADVANCED REGULAR EXPRESSIONS
Canh Le My
Feb 09, 2015
2
Outline
1. Basic regular expression review
2. Greedy/Lazy
3. Word boundary
4. Back reference
5. Named group
6. Lookahead & lookbehind
7. Conditional
8. Demo: a very simple application
9. Q/A
3
1. Basic regular expression review
A regular expression (abbreviated regex or regexp, and sometimes called a rational expression) is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings (Wikipedia).
4
1. Basic regular expression review
Meta-character(s)   Description
\w                  Matches a word character: a letter, a digit, or “_”.
\W                  Matches any character that is not a word character (not a letter, a digit, or “_”).
\s                  Matches a whitespace character.
\S                  Matches anything BUT a whitespace character.
\d                  Matches a digit.
\D                  Matches a non-digit.
^                   Matches the beginning of a line or string.
$                   Matches the end of a line or string.
5
1. Basic regular expression review
Meta-character(s)   Description
.                   Normally matches any character except a newline. Within square brackets the dot is literal.
( )                 Groups a series of pattern elements into a single element.
?                   Matches the preceding pattern element zero or one times.
*                   Matches the preceding pattern element zero or more times.
{M,N}               Denotes the minimum M and the maximum N match count.
[...]               Denotes a set of possible character matches.
[^...]              Matches every character except the ones inside the brackets.
|                   Separates alternate possibilities.
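A quick way to check the metacharacters reviewed above is to try them in Python's standard re module. This is only a minimal sketch; the sample string and the variable names are made up for illustration and do not come from the slides.

```python
import re

# Hypothetical sample text, not taken from the slides.
text = "Order 66 shipped to_user on 2015-02-09"

digits = re.findall(r"\d+", text)               # runs of digits
words = re.findall(r"\w+", text)                # runs of letters, digits and "_"
date = re.search(r"\d{4}-\d{2}-\d{2}$", text)   # counted repetition plus the $ anchor
starts_with_order = re.match(r"^(Order|Invoice)\s", text) is not None  # ^, ( ), | and \s

print(digits)
print(words)
print(date.group(0))
print(starts_with_order)
```

Note that \w keeps "to_user" together because the underscore counts as a word character.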
6
2. Greedy/Lazy
By default, regex repetition operators are greedy: they try to match as much of the string as possible.
In some cases this is not the desired effect. Make an operator lazy by appending ? to it (for example *? or +?).
Input                Pattern   Option   Match
you are so greedy              /gi      you are so greedy
you are so greedy              /gi      you are so greedy
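The difference between greedy and lazy repetition is easiest to see on text with repeated delimiters. The sketch below uses Python's re module; the tagged input string and both patterns are assumptions chosen for illustration, since the patterns of the slide's own example did not survive.

```python
import re

# Hypothetical input: the tags are an assumption, not the slide's original example.
html = "<b>you</b> are <i>so</i> greedy"

# Greedy: .+ runs to the LAST ">" it can reach, so everything is one match.
greedy = re.findall(r"<.+>", html, re.IGNORECASE)

# Lazy: .+? stops at the FIRST ">" it can reach, so each tag is its own match.
lazy = re.findall(r"<.+?>", html, re.IGNORECASE)

print(greedy)
print(lazy)
```

The only change between the two patterns is the ? appended to the + quantifier.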
7
3. Word boundary \b
\b is called a “word boundary” metacharacter. It is an anchor like the caret ^ or the dollar sign $. This match is zero length.
\b matches a position that is:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
Input                              Pattern      Option   Match
This import is really important    import       /gi      "import" twice (the word "import" and the start of "important")
This import is really important    \Wimport\W   /gi      " import " (including the surrounding spaces)
Import is important                \Wimport\W   /gi      ∅
This import is really important    \bimport\b   /gi      "import"
Import is important                \bimport\b   /gi      "Import"
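The three patterns from the table above can be compared side by side in Python. This is a minimal sketch using the slide's two input sentences; the helper name find_all is hypothetical, and re.IGNORECASE stands in for the slide's /i option (findall already returns every match, like /g).

```python
import re

sentences = ["This import is really important", "Import is important"]

def find_all(pattern, text):
    """Return every case-insensitive match of pattern in text."""
    return re.findall(pattern, text, re.IGNORECASE)

# Plain substring: also hits the "import" inside "important".
plain = [find_all(r"import", s) for s in sentences]

# \W on both sides: needs a real non-word character, so it consumes the
# spaces and fails when the word sits at the very start of the string.
non_word = [find_all(r"\Wimport\W", s) for s in sentences]

# \b on both sides: zero-length, works at string edges, matches only the word.
boundary = [find_all(r"\bimport\b", s) for s in sentences]

print(plain)
print(non_word)
print(boundary)
```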
8
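4. Back reference
Back references match the same text as previously matched by a capturing group.
If the string matches a regex, groups[0] is the whole matched text (not the whole input string); groups[1], groups[2], ... hold the text of each capturing group.
Input                   Pattern        Option   Match
Hello back reference    (.)\1          /gi      "ll"
alert(“XSS”);           .*?            /gi      alert(“XSS”);
abc_abc                 ([abc]+)_\1    /gi      "abc_abc"
abc_cba                 ([abc]+)_\1    /gi      "c_c"
The last row is only a partial match: the engine finds "c_c" in the middle of abc_cba. To reject such input entirely (∅), anchor the pattern, for example \b([abc]+)_\1\b.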
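The back-reference rows above can be reproduced with Python's re module. This is a minimal sketch; the helper name matched_texts and the "xx abc_abc yy" string are hypothetical. It also shows that an unanchored ([abc]+)_\1 still finds the partial match "c_c" inside abc_cba, and that group 0 is the matched text rather than the whole input.

```python
import re

def matched_texts(pattern, text):
    """Return the full text (group 0) of every match, case-insensitively."""
    return [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]

# \1 must repeat exactly what group 1 captured.
doubled = matched_texts(r"(.)\1", "Hello back reference")
same_both_sides = matched_texts(r"([abc]+)_\1", "abc_abc")

# Unanchored, the engine backtracks until group 1 is just "c".
partial = matched_texts(r"([abc]+)_\1", "abc_cba")

# With word boundaries the partial match is no longer allowed.
anchored = matched_texts(r"\b([abc]+)_\1\b", "abc_cba")

# group(0) is the matched text, group(1) is the first capturing group.
m = re.search(r"([abc]+)_\1", "xx abc_abc yy")
whole_match, first_group = m.group(0), m.group(1)

print(doubled, same_both_sides, partial, anchored)
print(whole_match, first_group)
```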
9
5. Named group – Named reference
Supported by nearly all modern regular expression engines.
Long regular expressions with lots of groups and backreferences may be hard to read; naming the groups makes them easier to follow.
Python: (?P<name>group), then back reference with (?P=name)
.NET framework: (?<name>group) or (?'name'group), then back reference with \k<name> or \k'name'
Input            Pattern     Match
alert(“XSS”);    \w+)>.*?    alert(“XSS”);
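A named group and its named back reference might look like this in Python. This is a sketch under assumptions: the input string and the group name "tag" are invented for illustration, because the slide's own example lost its markup during extraction.

```python
import re

# Hypothetical input: one well-formed element and one with a mismatched closing tag.
html = '<script>alert("XSS");</script> <b>bold</i>'

# (?P<tag>...) names the group; (?P=tag) must repeat the same text.
pattern = re.compile(r"<(?P<tag>\w+)>.*?</(?P=tag)>", re.IGNORECASE)

matches = [(m.group("tag"), m.group(0)) for m in pattern.finditer(html)]
print(matches)
```

The mismatched pair <b>...</i> is skipped because the closing tag must repeat whatever the "tag" group captured. The same group is still reachable by number as m.group(1), so naming costs nothing.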
11
6. Lookahead & lookbehind (look around)
A lookaround tries to match its sub-pattern at the current position but consumes no characters; it only reports the result: match or no match. This match is zero length.
Positive and negative lookahead: (?=regex) and (?!regex)
Positive and negative lookbehind: (?<=regex) and (?<!regex)
E.g.: find a word that does not end with s
Input     Pattern          Option   Match
John’s    \b\w+[^s]\b      /gi      "John’" (wrong: [^s] consumes the apostrophe)
John’s    \b\w+(?<!s)\b    /gi      "John"
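The example above, plus one case for each of the other lookaround forms, can be tried in Python. This is a minimal sketch; apart from "John's" (written with a plain ASCII apostrophe), the sample strings are made up for illustration. Python's re requires the lookbehind sub-pattern to have a fixed width, which holds for the single characters used here.

```python
import re

# Character class: [^s] must consume a character, so it swallows the apostrophe.
naive = re.findall(r"\b\w+[^s]\b", "John's")

# Negative lookbehind: only checks that the previous character is not "s".
fixed = re.findall(r"\b\w+(?<!s)\b", "John's")
no_s_words = re.findall(r"\b\w+(?<!s)\b", "John's cats chase one mouse", re.IGNORECASE)

# Positive lookahead: digits that are followed by "px" (the "px" is not consumed).
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookahead: whole numbers that are NOT followed by "px".
non_px = re.findall(r"\b\d+(?!\d|px)", "10px 20em 30px")

# Positive lookbehind: digits that are preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", "cost $15 and 20")

print(naive, fixed, no_s_words)
print(px_values, non_px, dollars)
```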
12
7. Conditional
A special construct (?ifthen|else) allows you to create conditional regular expressions. If the "if" part evaluates to true, the regex engine attempts to match the "then" part; otherwise it attempts the "else" part. In the pattern below, the condition (2) is true when capturing group 2 took part in the match.
E.g.:
Pattern: ^((From|To)|Subject): ((?(2)\w+@\w+\.[a-z]+|.+))
Input                             Match
From: lemycanh@gmail.com          Group[1]: From      Group[2]: From   Group[3]: lemycanh@gmail.com
To: le_my_canh@yahoo.com          Group[1]: To        Group[2]: To     Group[3]: le_my_canh@yahoo.com
Subject: Test conditional regex   Group[1]: Subject                    Group[3]: Test conditional regex
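Python's re module supports this conditional syntax, so the slide's pattern can be run as written. This is a minimal sketch; the input lines use placeholder addresses rather than the ones on the slide, and the helper name parse_header is hypothetical.

```python
import re

# The slide's pattern: group 2 is set only for "From" or "To".
header = re.compile(r"^((From|To)|Subject): ((?(2)\w+@\w+\.[a-z]+|.+))")

def parse_header(line):
    """Return (group 1, group 2, group 3), or None when the line does not match."""
    m = header.match(line)
    return m.groups() if m else None

from_line = parse_header("From: alice@example.com")
subject_line = parse_header("Subject: Test conditional regex")

# Group 2 matched, so the "then" branch demands an address and the match fails.
bad_from = parse_header("From: not an address")

print(from_line)
print(subject_line)
print(bad_from)
```

For the Subject line, group 2 is reported as None because it did not take part in the match, which is exactly what selects the "else" branch.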
13
DEMO
14
Q&A Thank you for your attention.
15
Reference
http://www.regular-expressions.info
http://regex101.com
http://www.smashingmagazine.com/2009/05/06/introduction-to-advanced-regular-expressions/
http://code.tutsplus.com/tutorials/advanced-regular-expression-tips-and-techniques--net-11011