1
Advanced Regular Expressions
Or: What’s special about RegEx in MX
CFUNITED – The premier ColdFusion conference
2
Your Presenter
Michael Dinowitz
Head of House of Fusion
Publisher of Fusion Authority
Founding member of Team Macromedia
Doing this since June 95
Called on for the black magic code
June 28th – July 1st 2006
3
Disclaimer & Introduction
If you don’t know the basics – get out
No real changes from CF 5 or CFMX 6
4
Basic additions
Greedy vs. Lazy
  + is one or more, and as many as it can
  +? is one or more, but only as many as it needs
  ++ is the same as greedy but does not allow backtracking (not in CFMX)
Nested sub expressions
  Numbered in order of execution: from the outside in
  Then left to right
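The greedy/lazy difference and the numbering of nested sub expressions can be sketched in a few lines of cfscript. This is a minimal illustration, not part of the original slides; the sample strings and variable names are made up for the example.

```cfml
<cfscript>
// Hypothetical sample text with two bold sections.
html = "<b>one</b> and <b>two</b>";

// Greedy: .+ takes as much as it can, so the match runs to the LAST </b>.
greedy = reReplace(html, "<b>.+</b>", "[X]");
// greedy is "[X]"

// Lazy: .+? takes only as much as it needs, so the match stops at the FIRST </b>.
lazy = reReplace(html, "<b>.+?</b>", "[X]");
// lazy is "[X] and <b>two</b>"

// Nested sub expressions are numbered by their opening parenthesis:
// the outer group is \1, then the inner groups left to right (\2, \3, \4).
swapped = reReplace("2006-06-28", "((\d+)-(\d+)-(\d+))", "\3/\4/\2 (from \1)");
// swapped is "06/28/2006 (from 2006-06-28)"
</cfscript>
```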
5
Character Vs. Posix classes
Non-special characters become special
Uses a backslash (\) to specify being special
Shorter than POSIX classes
Harder to ‘read’ for newbies
6
Basic Character Classes
\b: word boundary
  Any jump from alphanumeric to non-alphanumeric, or the reverse
  refindnocase('\bbig\b', 'big')
\B: non-boundary, between any 2 of the same ‘types’ of characters
  refindnocase('\B', 'big') = 2
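To see what the word boundary buys you, compare a search with and without \b. A small sketch with an invented sample string:

```cfml
<cfscript>
// Hypothetical sample: "big" appears inside "bigger" and as its own word.
text = "a bigger big deal";

// Without boundaries, the first hit is the "big" inside "bigger".
anywhere = reFindNoCase("big", text);
// anywhere is 3

// With \b on both sides, only the stand-alone word matches.
wholeWord = reFindNoCase("\bbig\b", text);
// wholeWord is 10
</cfscript>
```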
7
More Character Classes
\A: same as ^, but cannot be combined with (?m)
\Z: same as $, but cannot be combined with (?m)
\n: newline
\r: carriage return
\t: tab
\d: any digit ([0-9])
\D: any non-digit ([^0-9])
8
More Character Classes
\w: any alphanumeric character ([[:alnum:]])
\W: any non-alphanumeric character ([^[:alnum:]])
\s: any whitespace character, including tab, space, newline, carriage return, and form feed ([ \t\n\r\f])
\S: any non-whitespace character ([^ \t\n\r\f])
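The escape-style classes are drop-in replacements for their POSIX equivalents. The sketch below runs a few of them side by side; the input strings and variable names are invented for illustration.

```cfml
<cfscript>
// \d and [[:digit:]] find the same thing: the first digit is at position 7.
posixPos  = reFind("[[:digit:]]+", "Order 66 shipped");
escapePos = reFind("\d+", "Order 66 shipped");
// posixPos and escapePos are both 7

// \D strips everything that is not a digit.
digitsOnly = reReplace("tel: 555-1234", "\D", "", "all");
// digitsOnly is "5551234"

// \s+ collapses any run of tabs, newlines and spaces into one space.
messy = "one" & chr(9) & chr(10) & "two";
tidy  = reReplace(messy, "\s+", " ", "all");
// tidy is "one two"
</cfscript>
```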
9
Expression Modifiers
At beginning of expression
(?i): causes expression to be case insensitive (same as the NoCase version of the function)
(?m): multi-line mode
  ^ and $ match at each line, not only the entire string
  Carriage return, Chr(13), is not recognized as a line ending
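A short sketch of both modifiers in action. The two-line sample string is an assumption made for the example, built with Chr(10) as the line break.

```cfml
<cfscript>
// Hypothetical two-line string, separated by a newline, Chr(10).
lines = "first line" & chr(10) & "second line";

// Without (?m), ^ only matches at the start of the whole string.
plain = reFind("^second", lines);
// plain is 0

// With (?m), ^ also matches right after each newline.
multi = reFind("(?m)^second", lines);
// multi is 12

// (?i) makes a plain reFind() behave like reFindNoCase().
noCase = reFind("(?i)SECOND", lines);
// noCase is 12
</cfscript>
```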
10
Expression Modifiers
(?x) ignores all white space
  Also allows usage of ## for comments
  ## will comment to end of line
  reFind("(?x)
      one                  ##first option
      |two                 ##second option
      |three\ point\ five  ## note escaped spaces
      ", "three point five")
11
Group Modifiers
Affects only the group it’s in
Must be at beginning of group
(?##) comment
  Must escape # (write it as ## in CFML)
(?:) does not add group to return collection
(?=) positive lookahead
(?!) negative lookahead
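The effect of (?:) on the return collection is easiest to see by asking reFind() for sub expressions (fourth argument set to true) and counting what comes back. A minimal sketch; the sample text and variable names are invented.

```cfml
<cfscript>
// Two capturing groups: the returned pos/len arrays hold
// the whole match plus one entry per group.
capturing = reFind("(big|small) (dog)", "a big dog", 1, true);
capturingCount = arrayLen(capturing.pos);
// capturingCount is 3

// (?:...) still groups the alternation, but adds nothing to the arrays.
nonCapturing = reFind("(?:big|small) (dog)", "a big dog", 1, true);
nonCapturingCount = arrayLen(nonCapturing.pos);
// nonCapturingCount is 2
</cfscript>
```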
12
Positive Lookahead
Tests if the text in the parentheses exists
Does not save the text into return collection
Does not ‘consume text’
<a(?=.+href).+?href="([^"]+).+?>
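Here is a sketch of both properties, first on a toy string and then with the slide's expression. The sample text, the tag, and the variable names are assumptions for illustration only.

```cfml
<cfscript>
// The lookahead checks for " dog" but does not consume it:
// the match is only the 3 characters of "big", and it is the second "big".
look = reFind("big(?= dog)", "big cat, big dog", 1, true);
// look.pos[1] is 10 and look.len[1] is 3

// The slide's expression, applied to a hypothetical anchor tag.
tag   = '<a class="nav" href="/about.cfm" target="_blank">About</a>';
regex = '<a(?=.+href).+?href="([^"]+).+?>';
found = reFindNoCase(regex, tag, 1, true);

// The lookahead adds nothing to the return collection,
// so the href value is the first saved group, at index 2.
href = mid(tag, found.pos[2], found.len[2]);
// href is "/about.cfm"
</cfscript>
```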
13
Negative Lookahead
Tests if the text in the parentheses does not exist
Does not save the text into return collection
Does not ‘consume text’
(<a(?!.+?target) [^>]+>)
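The slide's expression can be tried against two hypothetical tags, one with a target attribute and one without. The tags and variable names below are made up for the example.

```cfml
<cfscript>
regex = "(<a(?!.+?target) [^>]+>)";

// Hypothetical sample tags.
withTarget    = '<a href="/a.cfm" target="_blank">';
withoutTarget = '<a href="/a.cfm">';

// "target" is found ahead of <a, so the negative lookahead fails: no match.
hit1 = reFindNoCase(regex, withTarget);
// hit1 is 0

// No "target" ahead, so the lookahead passes and the tag matches.
hit2 = reFindNoCase(regex, withoutTarget);
// hit2 is 1
</cfscript>
```

Note that the lookahead scans the rest of the string, not just the current tag, so a later tag that contains "target" would also block the match.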
14
Replace conversion
Used in REReplace()/REReplaceNoCase()
Either converts the ‘next’ character or a specific section of characters
\u: converts the next character to uppercase
\l: converts the next character to lowercase
\U…\E: converts block to uppercase
\L…\E: converts block to lowercase
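A sketch of the single-character and block conversions in a replacement string. The inputs and variable names are invented, and the results in the comments are what the rules above should produce.

```cfml
<cfscript>
// \u uppercases only the next character of the replacement,
// here the single letter saved in \1 at the start of each word.
capitalized = reReplace("michael dinowitz", "\b(\w)", "\u\1", "all");
// expected: "Michael Dinowitz"

// \U ... \E uppercases everything between the two markers.
shouted = reReplace("see the note here", "(note)", "\U\1\E");
// expected: "see the NOTE here"

// \L ... \E does the same for lowercase.
quiet = reReplace("PLEASE stop", "(PLEASE)", "\L\1\E");
// expected: "please stop"
</cfscript>
```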
15
Not Supported
Positive Lookbehinds
Negative Lookbehinds
Other features
All accessible through the Java RegEx engine
Massimo has a CFC pre-built to do this
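One way to reach the Java engine directly is to create a java.util.regex.Pattern from CFML. The sketch below is only an illustration of that route and is not the CFC mentioned above; the sample string and variable names are assumptions.

```cfml
<cfscript>
// java.util.regex.Pattern ships with the JVM, so no extra install is assumed.
// (?<=\$) is a positive lookbehind: the digits must be preceded by a dollar sign.
pattern = createObject("java", "java.util.regex.Pattern").compile("(?<=\$)\d+");
matcher = pattern.matcher("cost: $42");

amount = "";
if (matcher.find()) {
    // group() returns the matched text; the "$" itself is not part of the match.
    amount = matcher.group();
}
// amount is "42"
</cfscript>
```

Keep in mind that java.util.regex is a different engine from the one behind REFind(), so expressions should be retested when moving between the two.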
16
Resources
Chapters in most CFMX books
CF-RegEx mailing list
This presentation
Books:
  Mastering Regular Expressions, 2nd Edition
  Teach Yourself Regular Expressions in 10 Minutes
  Java Regular Expressions: Taming the java.util.regex Engine