Published by Judith Hubbard. Modified over 9 years ago.
1
Satisfy Your Technical Curiosity
Regular Expressions: The hidden power language
Roy Osherove, www.iserializable.com
Methodology & Team System Expert, Sela Group, www.Sela.co.il
2
Tools
http://tools.osherove.com
www.ISerializable.com
3
The Log File
4
Developer Problem: Make this log file useful
Entries from an old *nix system’s log file
Must be converted to and from various formats
Must be searchable by users
Format may change
Search fields can be added, removed or renamed at runtime

Date CPUs |ram|cpu HH:mm:ss action user domain.machine
25/05/1998 1|00512|x86 21:49:12 [Search] Anakin Antler.Anita1
25/05/98 1|00512|x86 21:51:15 [Update] Anakin Antler.Anita1
26/05/1998 1|00256|x86 11:02:45 [Search] Darth Cydot.Uk.Gerry2k
26/05/98 1|00256|x86 11:12:49 [Update] Darth Cydot.Uk.Gerry2k
27/05/98 1|00512|x86 15:34:30 [Search] Anakin Anterl.Anita1
12/08/1998 2|01024|x86 10:14:53 [Search] Obi Monaco.Huarez
5
About 15 minutes later… Done.
About 45 minutes later… Home early.
6
You can be home early too!
Regex is easier than you think
7
What are Regular Expressions?
A language to describe a language using “patterns”
Think SQL or XPath, but for text
Popularized by Perl and *nix shell scripting
Many variations and frameworks exist; only one for .NET (for now)
Used in most languages
8
Common Regex Uses
Text Validation
  Phones, emails, addresses or any format requirement
Text Manipulation
  Transform text
Text Parsing
  Find in files, site scraping, data collection
9
What .NET brings to the table
Full object model
Extended syntax
Optimization techniques in the framework
10
.NET Regular Expressions
Show up in several places:
In the classes of the System.Text.RegularExpressions namespace
Via the RegularExpressionValidator control (for ASP.NET)
Sprinkled in dozens of other places
  Browser capabilities filter
  In the WSDL tag
  And many more
11
Key Classes within System.Text.RegularExpressions
Regex
  Contains the pattern and matching options
  Important methods:
    IsMatch() returns a boolean
    Replace() returns a string
    Split() returns a string array
    …
  Main use: validating, splitting and replacing text
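The three methods above have close counterparts in most regex libraries. As a minimal sketch, here they are with Python's `re` module standing in for the .NET `Regex` class; the patterns and sample strings are made up for illustration and are not from the talk.

```python
import re

# Hypothetical pattern: a run of one or more digits.
digits = re.compile(r"\d+")

# Counterpart of Regex.IsMatch(): does the pattern occur anywhere in the input?
has_digits = digits.search("build 42") is not None

# Counterpart of Regex.Replace(): substitute every match with a replacement.
masked = digits.sub("#", "room 12, floor 3")

# Counterpart of Regex.Split(): cut the input wherever the pattern matches.
parts = re.split(r"\s*,\s*", "a, b ,c")

print(has_digits, masked, parts)
```

In .NET the same calls would be made on a `Regex` instance (or through its static methods); only the method names differ.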
12
The Process
[Diagram: a pattern, input text and options go into Regex, which returns matches, splits text, or replaces text]
13
Validation
14
Syntax
Match exact text as written in the pattern
  ‘a’ will match every ‘a’ in the text
Except for special symbols, which have reserved meanings
15
Enclosing Alternatives with []
The square brackets let you specify a list of alternative characters. Used together with the - operator, they can also specify character ranges.
  [Cc]          Capital or lowercase c
  [A-Z]         Any capital letter A through Z
  [A-Za-z]      Any capital or lowercase letter
  [0-9]         Any digit 0 through 9
  [A-Za-z0-9]   Any letter or digit
  [0-9.+&=%-]   Any digit or special character listed
Notice: most special characters need no escape inside []. A literal - must come first or last (or be escaped); anywhere else it is read as a range operator.
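A short sketch of character classes in action, using Python's `re` module as a stand-in for .NET (the sample strings are invented). Note that a hyphen between two characters inside brackets is always read as a range, so the literal hyphen is placed last here; a reversed range such as `+-&` is rejected as an invalid pattern.

```python
import re

# Each character class matches exactly ONE character from its set.
lower_c = re.fullmatch(r"[Cc]", "c") is not None
upper_only = re.fullmatch(r"[A-Z]", "q") is not None
letter_or_digit = re.fullmatch(r"[A-Za-z0-9]", "7") is not None

# Hyphen placed last so it is a literal '-', not a range operator.
special = re.compile(r"[0-9.+&=%-]")
found = special.findall("a=1&b=2-3")

# With the hyphen in the middle, '+-&' is a reversed range and fails to compile.
try:
    re.compile(r"[0-9.+-&=%]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False

print(lower_c, upper_only, letter_or_digit, found, reversed_range_ok)
```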
16
Controlling Expression Frequency with {}
The {} operators let you control how many times the preceding expression may occur.
The expression takes one of these two forms:
  {Occurrences}                    [A-Za-z]{3}
  {MinOccurrences,MaxOccurrences}  [A-Za-z]{1,3}
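The two forms can be checked against a few invented inputs. This is a minimal sketch in Python's `re` (used here in place of .NET); `fullmatch` is used so the whole string has to fit the pattern.

```python
import re

exactly_three = re.compile(r"[A-Za-z]{3}")    # {occurrences}
one_to_three = re.compile(r"[A-Za-z]{1,3}")   # {min,max}

# Keep only the candidates that match in full.
three = [s for s in ["ab", "abc", "abcd"] if exactly_three.fullmatch(s)]
up_to_three = [s for s in ["", "a", "abc", "abcd"] if one_to_three.fullmatch(s)]

print(three, up_to_three)
```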
17
Basic Frequency Operators
  ?   0 or 1
  *   0 or more
  +   1 or more
So, 3+ will match 3, 33, 3333 but not 45, 678.
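The `3+` example can be verified directly, along with one illustration each for `?` and `*`. This is a sketch in Python's `re` as a stand-in for .NET; the `colou?r` and `ab*` patterns and the helper `whole` are illustrative additions, not from the slides.

```python
import re

def whole(pattern, text):
    """True when the entire text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# '+' : one or more of the preceding expression.
plus = [s for s in ["3", "33", "3333", "45", "678"] if whole(r"3+", s)]

# '?' : the preceding expression is optional (0 or 1).
optional = [s for s in ["color", "colour", "colouur"] if whole(r"colou?r", s)]

# '*' : zero or more of the preceding expression.
star = [s for s in ["a", "ab", "abbb", "b"] if whole(r"ab*", s)]

print(plus, optional, star)
```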
18
Wildcard Operator: .
. matches any non-newline character
  Unless singleline mode (RegexOptions.Singleline) has been turned on for the pattern
Examples:
  A.$ would match a capital A followed by exactly one character at the end of the input. Will not match Abc
  A.+ would match a capital A followed by one or more non-newline characters
  \.htm.? would match ".htm" followed by an optional non-newline character
Backslash (\) escapes characters that have reserved meanings in regular expressions
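The wildcard examples can be tried out as follows. This is a sketch in Python's `re`, where the `re.DOTALL` flag plays the role of .NET's `RegexOptions.Singleline`; the input strings are invented.

```python
import re

# By default '.' refuses to match a newline...
dot_default = re.search(r"A.", "A\n") is not None
# ...but does once "dot matches everything" mode is switched on.
dot_all = re.search(r"A.", "A\n", re.DOTALL) is not None

# A.$ : capital A, one character, then end of input.
a_dot_end_abc = re.search(r"A.$", "Abc") is not None
a_dot_end_xab = re.search(r"A.$", "xAb") is not None

# A.+ : capital A followed by one or more non-newline characters.
a_dot_plus = re.search(r"A.+", "Abc").group()

# \.htm.? : the backslash makes the first dot literal; the second is a wildcard.
html_ext = re.search(r"\.htm.?", "index.html").group()
htm_ext = re.search(r"\.htm.?", "index.htm").group()

print(dot_default, dot_all, a_dot_end_abc, a_dot_end_xab, a_dot_plus, html_ext, htm_ext)
```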
19
Convenience Expressions
  \d   Any digit
  \D   Any non-digit (it must still match a character, just not a digit)
  \s   Any whitespace character (such as a space or tab)
  \S   Any character other than a whitespace character
  \w   Any letter, digit or underscore
  \W   Any character other than a letter, digit or underscore
Many more: Unicode, hex values, negative lookarounds…
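A quick way to see what each shorthand picks up is to run it over one sample string. This is a sketch in Python's `re`, whose shorthand classes behave like .NET's for this input; the sample string is made up. Note that the underscore counts as a word character.

```python
import re

sample = "Log 42:\tok_1!"

digits = re.findall(r"\d", sample)       # every digit
whitespace = re.findall(r"\s", sample)   # the space and the tab
words = re.findall(r"\w+", sample)       # runs of letters, digits, underscore
non_word = re.findall(r"\W", sample)     # everything \w rejects

print(digits, whitespace, words, non_word)
```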
20
Quick Quiz!
[A-Za-z]{3}
  3 capital or lowercase letters
  Candidates: Abc, abc, aBC, 1bc
[A-Z][a-z]{2,4}
  A capital letter followed by at least 2 but not more than 4 lowercase letters
  Candidates: Abc, Acbde, abcde, ABcde
\w{3,8}\.\w{3}
  3 to 8 alphanumeric characters, followed by a dot and 3 alphanumerics
  Candidates: Filename.txt, d0main.com, 1234.567, 34.456
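The quiz can be scored mechanically. The sketch below uses Python's `re` in place of .NET and assumes the quiz is about validating the whole string, hence `fullmatch`. With a plain search, a candidate such as ABcde would still contain a matching substring (Bcde).

```python
import re

# Patterns and candidates copied from the quiz.
quiz = {
    r"[A-Za-z]{3}": ["Abc", "abc", "aBC", "1bc"],
    r"[A-Z][a-z]{2,4}": ["Abc", "Acbde", "abcde", "ABcde"],
    r"\w{3,8}\.\w{3}": ["Filename.txt", "d0main.com", "1234.567", "34.456"],
}

# For each pattern, keep the candidates that match in full.
answers = {
    pattern: [c for c in candidates if re.fullmatch(pattern, c)]
    for pattern, candidates in quiz.items()
}

for pattern, matched in answers.items():
    print(pattern, "->", matched)
```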
21
Splitting and Manipulating
22
Text Manipulation
23
The Spammer
24
Key Classes within System.Text.RegularExpressions (2)
MatchCollection - Match
  MatchCollection stores all the matches found
GroupCollection - Group
CaptureCollection - Capture
Regex.Match() returns a Match
Regex.Matches() returns a MatchCollection
…
Main use: parsing, searching, collecting data
25
Simple parsing
Parsing for emails
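The difference between fetching one match and fetching all of them can be sketched as follows. Python's `re` stands in for .NET here: `search` plays the role of `Regex.Match()` and `finditer` the role of `Regex.Matches()`. The email pattern is the ungrouped one from the grouping slide; the sample text and addresses are invented.

```python
import re

EMAIL = re.compile(r"[\w\.\-]+@[\w\.\-]+\.\w{2,5}")
text = "Contact roy@example.com or sales@example.org today."

# First match only (compare Regex.Match()).
first = EMAIL.search(text)
first_email = first.group() if first else None

# Every match in the input (compare Regex.Matches()).
emails = [m.group() for m in EMAIL.finditer(text)]

print(first_email, emails)
```

Each match object also carries its position (`m.start()`, `m.end()`), much like the `Index` and `Length` properties of a .NET `Match`.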
26
Grouping (the coolest part)
27
Grouping (pay attention!)
Groups give us object models
Example input: an HTML file containing Roy@Osherove.com
Create a capture hierarchy and use it in code
  Without groups:    [\w\.\-]+@[\w\.\-]+\.\w{2,5}
  With named groups: (?<userName>[\w\.\-]+)@(?<domain>[\w\.\-]+\.\w{2,5})
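Named groups turn each match into a small record that code can read by field name. The sketch below uses Python's `re`, which spells named groups `(?P<name>...)` rather than .NET's `(?<name>...)`; apart from that syntax change the pattern is the one from the slide. The HTML snippet is a made-up stand-in for the HTML file.

```python
import re

# The slide's pattern, with .NET's (?<name>...) rewritten as Python's (?P<name>...).
EMAIL = re.compile(r"(?P<userName>[\w\.\-]+)@(?P<domain>[\w\.\-]+\.\w{2,5})")

html = "<p>Contact: Roy@Osherove.com</p>"

# Each match exposes its groups by name, like Match.Groups["userName"] in .NET.
records = [m.groupdict() for m in EMAIL.finditer(html)]

for record in records:
    print(record["userName"], "at", record["domain"])
```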
28
Grouping Emails & The Regulator
29
Getting back to the first problem: Make this log file useful
Entries from an old *nix system’s log file
Must be converted to and from various formats
Must be searchable by users
Format may change
Search fields can be added, removed or renamed at runtime

Date CPUs |ram|cpu HH:mm:ss action user domain.machine
25/05/1998 1|00512|x86 21:49:12 [Search] Anakin Antler.Anita1
25/05/98 1|00512|x86 21:51:15 [Update] Anakin Antler.Anita1
26/05/1998 1|00256|x86 11:02:45 [Search] Darth Cydot.Uk.Gerry2k
26/05/98 1|00256|x86 11:12:49 [Update] Darth Cydot.Uk.Gerry2k
27/05/98 1|00512|x86 15:34:30 [Search] Anakin Anterl.Anita1
12/08/1998 2|01024|x86 10:14:53 [Search] Obi Monaco.Huarez
30
How do I start?
Take a sample of the log file
Recognize the data pattern for each entry
Use groups to get each line’s values
Create a tool that uses this regex to parse a log file
  The tool will use the returned results to generate the log as XML
  Load the XML into a DataSet
  Allow the user to enter “Select” statements on the DataSet
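The first steps of this plan can be sketched as follows. This is not the tool shown in the talk: the pattern, the group names and the `log_to_xml` helper are assumptions written against the sample entries, and Python's `re` and `xml.etree.ElementTree` stand in for the .NET classes. The sketch stops at the XML; loading it into a DataSet is specific to .NET.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for the sample entries; group names are illustrative.
LOG_LINE = re.compile(
    r"(?P<date>\d{2}/\d{2}/(?:\d{4}|\d{2}))\s+"       # 25/05/1998 or 25/05/98
    r"(?P<cpus>\d+)\|(?P<ram>\d+)\|(?P<cpu>\w+)\s+"   # 1|00512|x86
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"                 # 21:49:12
    r"\[(?P<action>\w+)\]\s+"                         # [Search]
    r"(?P<user>\w+)\s+"                               # Anakin
    r"(?P<domain>[\w.]+)\.(?P<machine>\w+)"           # Cydot.Uk.Gerry2k
)

SAMPLE = """\
25/05/1998 1|00512|x86 21:49:12 [Search] Anakin Antler.Anita1
25/05/98 1|00512|x86 21:51:15 [Update] Anakin Antler.Anita1
26/05/1998 1|00256|x86 11:02:45 [Search] Darth Cydot.Uk.Gerry2k
"""

def log_to_xml(text):
    """Build one <entry> per matching line, one child element per named group."""
    root = ET.Element("log")
    for match in LOG_LINE.finditer(text):
        entry = ET.SubElement(root, "entry")
        for name, value in match.groupdict().items():
            ET.SubElement(entry, name).text = value
    return root

root = log_to_xml(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
```

Because the XML element names come straight from the group names, adding, removing or renaming a search field only means editing the pattern, which could itself be loaded at runtime.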
31
Parsing a log file
32
Regulazy
Build simple expressions by example
No syntax knowledge needed
Free
Tools.osherove.com
33
When not to use Regex
When it’s easier and more readable to do it otherwise
Not just because it’s “cool”
  Hard to read
  Steep learning curve
  Hard to maintain
“Sometimes, when confronted with a problem, you might decide to solve it with Regular Expressions for the wrong reasons. Now you’ve got two problems.”
34
Summary
Amazing parsing flexibility
Good skill to have anywhere
Can save you time and nerves
With power comes responsibility
  Weigh the pros and cons before using
35
Resources
The Regulator: tools.osherove.com
Regulazy: tools.osherove.com
Regexlib.com: regex archive and cheat sheet (http://www.regexlib.com)
http://www.regular-expressions.info
Roy Osherove: Royo@sela.co.il
Blog: www.iserializable.com
36
Thank you!
Questions?
Roy Osherove: Royo@sela.co.il
Blog: www.iserializable.com