Satisfy Your Technical Curiosity Regular Expressions Roy Osherove www.iserializable.com Methodology & Team System Expert Sela Group www.Sela.co.il The.

Presentation transcript:

Satisfy Your Technical Curiosity Regular Expressions Roy Osherove Methodology & Team System Expert Sela Group The hidden power language

Tools

The Log File

Developer Problem: Make this log file useful
- Old log file from a *nix system’s entries
- Converted to and from various formats
- Searched by users
- Format may change
- Search fields can be added, removed or renamed at runtime
Date CPUs|ram|cpu HH:mm:ss action user domain.machine
25/05/1998 1|00512|x86 21:49:12 [Search] Anakin Antler.Anita1
25/05/98 1|00512|x86 21:51:15 [Update] Anakin Antler.Anita1
26/05/1998 1|00256|x86 11:02:45 [Search] Darth Cydot.Uk.Gerry2k
26/05/98 1|00256|x86 11:12:49 [Update] Darth Cydot.Uk.Gerry2k
27/05/98 1|00512|x86 15:34:30 [Search] Anakin Anterl.Anita1
12/08/1998 2|01024|x86 10:14:53 [Search] Obi Monaco.Huarez

About 15 minutes later… Done. About 45 minutes later… Home early.

You can be home early too! Regex is easier than you think.

What are Regular Expressions?
- A language to describe a language using “patterns”
- Think SQL or XPath, but for text
- Popularized by Perl and *nix shell scripting
- Many variations and frameworks exist
- Only one for .NET (for now)
- Used in most languages

Common Regex Uses
- Text Validation: phones, emails, addresses or any format requirement
- Text Manipulation: transform text
- Text Parsing: find in files, site scraping, data collection

What .NET brings to the plate
- Full object model
- Extended syntax
- Optimization techniques in the framework

.NET Regular Expressions
Show up in several places:
- In the classes of the System.Text.RegularExpressions namespace
- Via the RegularExpressionValidator validator control (for ASP.NET)
- Sprinkled in dozens of other places
  - Browser capabilities filter
  - In the WSDL tag
  - And many more

Key Classes within System.Text.RegularExpressions
Regex
- Contains the pattern and matching options
- Important methods:
  - IsMatch() returns boolean
  - Replace() returns a string
  - Split() returns a string array
  - …
- Main Use: Validation, Splitting, Replacing text
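The three methods listed for Regex have close counterparts in most regex libraries. The sketch below uses Python's `re` module as a stand-in for the .NET class (the method names differ, and the pattern and sample strings are illustrative assumptions, not taken from the slides).

```python
import re

# Illustrative pattern (an assumption): one or more digits.
digits = re.compile(r"\d+")

# Rough counterpart of IsMatch(): is there a match anywhere in the input?
has_digits = digits.search("order 66") is not None

# Rough counterpart of Replace(): substitute every match.
masked = digits.sub("#", "call 555 or 777")

# Rough counterpart of Split(): cut the input wherever the pattern matches.
parts = digits.split("a1b22c")

print(has_digits, masked, parts)
```

The pattern is compiled once and reused, which mirrors the idea of a Regex object that holds the pattern and its options.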

The Process (diagram): Pattern, Input and Options feed the Regex, which produces Matches, Splits Text or Replace text.

Validation

Syntax
- Match exact text as written in the pattern
- ‘a’ will match all ‘a’ in the text
- Except for special symbols:

Enclosing Alternatives with []
The square brackets allow you to specify a list of alternate values. Used in conjunction with the - operator, you can even specify character ranges.
[Cc]          Capital or lowercase c
[A-Z]         Any capital letter A through Z
[A-Za-z]      Any capital or lowercase letter
[0-9]         Any digit 0 through 9
[A-Za-z0-9]   Any letter or digit
[0-9.+&=%-]   Any digit or special char listed
Notice: no escape needed inside the brackets, as long as a literal - comes last (anywhere else it is read as a range operator, and a range such as +-& runs backwards)
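A quick way to check character classes is to test single characters against them. This is a minimal sketch in Python's `re` module rather than .NET; the test characters are assumptions chosen for illustration. It also shows why the position of a literal hyphen matters: Python rejects a class in which `+-&` forms a backwards range.

```python
import re

# Each class matches exactly one character from the listed set.
assert re.fullmatch(r"[Cc]", "c")
assert re.fullmatch(r"[A-Z]", "Q")
assert re.fullmatch(r"[A-Za-z0-9]", "7")
assert not re.fullmatch(r"[A-Z]", "q")

# A literal hyphen is safest at the end of the class.
special = re.compile(r"[0-9.+&=%-]")
matches = [ch for ch in "5.+-&=%x" if special.fullmatch(ch)]

# With the hyphen between + and &, the class is read as the range "+" to "&",
# which runs backwards in character order, so compilation fails in Python.
try:
    re.compile(r"[0-9.+-&=%]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False

print(matches, reversed_range_ok)
```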

Controlling Expression Frequency with {}
The {} operators allow you to control the frequency of the preceding expression. The expression takes one of these two forms:
{Occurrences}                     [A-Za-z]{3}
{MinOccurrences,MaxOccurrences}   [A-Za-z]{1,3}
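The two forms can be tried out directly. A small sketch in Python's `re` module (not .NET), using the slide's own patterns; the sample strings are assumptions.

```python
import re

exactly_three = re.compile(r"[A-Za-z]{3}")
one_to_three = re.compile(r"[A-Za-z]{1,3}")

# fullmatch() requires the whole string to fit the pattern.
three_ok = bool(exactly_three.fullmatch("abc"))    # True: exactly 3 letters
two_fails = bool(exactly_three.fullmatch("ab"))    # False: only 2 letters
two_ok = bool(one_to_three.fullmatch("ab"))        # True: between 1 and 3
four_fails = bool(one_to_three.fullmatch("abcd"))  # False: more than 3

print(three_ok, two_fails, two_ok, four_fails)
```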

Basic Frequency Operators
?   0 or 1
*   0 or more
+   1 or more
So 3+ will match 3, 33, 3333 but not 45, 678.
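The slide's `3+` example can be checked against its own sample values. The sketch below uses Python's `re` module as a stand-in for .NET; the `colou?r` and `ab*c` patterns and their word lists are added assumptions to illustrate `?` and `*`.

```python
import re

# The slide's example: 3+ means one or more 3s.
samples = ["3", "33", "3333", "45", "678"]
plus_hits = [s for s in samples if re.fullmatch(r"3+", s)]

# ? makes the preceding character optional (0 or 1 occurrences).
optional_hits = [w for w in ["color", "colour", "colouur"]
                 if re.fullmatch(r"colou?r", w)]

# * allows the preceding character any number of times, including none.
star_hits = [w for w in ["ac", "abc", "abbbc", "adc"]
             if re.fullmatch(r"ab*c", w)]

print(plus_hits, optional_hits, star_hits)
```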

Wildcard Operator: .
- . matches any non-newline character
  - Unless single-line mode (RegexOptions.Singleline) has been turned on for the pattern
- Examples:
  - A.$ would match a capital A followed by exactly one character at the end of the text. Will not match Abc
  - A.+ would match a capital A followed by one or more non-newline characters
  - \.htm.? would match ".htm" followed by an optional non-newline character
- Backslash escapes characters that have reserved meanings in regular expressions
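The wildcard examples can be exercised as follows. This is a sketch in Python's `re` module, where the flag that lets `.` match a newline is `re.DOTALL`; in .NET the corresponding option is `RegexOptions.Singleline`. The sample strings are assumptions.

```python
import re

# "." matches any single character except a newline by default.
dot_plain = bool(re.fullmatch(r"A.", "Ab"))
dot_newline = bool(re.fullmatch(r"A.", "A\n"))
dot_newline_dotall = bool(re.fullmatch(r"A.", "A\n", re.DOTALL))

# A.$ needs exactly one character between the A and the end of the text.
ends_ab = bool(re.search(r"A.$", "Ab"))
ends_abc = bool(re.search(r"A.$", "Abc"))

# The escaped dot matches a literal "."; the trailing .? is optional.
extensions = re.findall(r"\.htm.?", "index.html page.htm")

print(dot_plain, dot_newline, dot_newline_dotall, ends_ab, ends_abc, extensions)
```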

Convenience Expressions
\d   Any digit
\D   Any non-digit (it must still match one character)
\s   Any whitespace character (such as a space or tab)
\S   Any character other than a whitespace character
\w   Any letter, digit or underscore
\W   Any character other than a letter, digit or underscore
Many more: Unicode, hex values, negative lookarounds…
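To see what each shorthand picks out, it helps to run them over one short string. A minimal sketch in Python's `re` module (the shorthands behave the same way on this ASCII input as in .NET); the sample text is an assumption.

```python
import re

text = "Room 42,\tfloor 7"

digits = re.findall(r"\d", text)      # every single digit
words = re.findall(r"\w+", text)      # runs of letters, digits or underscores
gaps = re.findall(r"\s+", text)       # runs of whitespace
non_words = re.findall(r"\W+", text)  # runs of everything \w excludes

# \w also accepts the underscore, which is easy to forget.
underscore_is_word = bool(re.fullmatch(r"\w", "_"))

print(digits, words, gaps, non_words, underscore_is_word)
```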

Quick Quiz!
[A-Za-z]{3}: 3 capital or lowercase letters
  Candidates: Abc, abc, aBC, 1bc
[A-Z][a-z]{2,4}: A capital letter followed by at least 2 but not more than 4 lowercase letters
  Candidates: Abc, Acbde, abcde, ABcde
\w{3,8}\.\w{3}: 3 to 8 alphanumeric characters, followed by a dot and 3 alphanumerics
  Candidates: Filename.txt, d0main.com
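One way to check quiz answers is to run each candidate through its pattern. The sketch below uses Python's `re` module and assumes the whole candidate must match (`fullmatch`); an unanchored search such as .NET's `IsMatch()` would also accept candidates that merely contain a match, for example ABcde.

```python
import re

# Patterns and candidates as they appear on the quiz slide.
quiz = {
    r"[A-Za-z]{3}": ["Abc", "abc", "aBC", "1bc"],
    r"[A-Z][a-z]{2,4}": ["Abc", "Acbde", "abcde", "ABcde"],
    r"\w{3,8}\.\w{3}": ["Filename.txt", "d0main.com"],
}

# For every pattern, record whether each candidate matches in full.
answers = {
    pattern: {c: re.fullmatch(pattern, c) is not None for c in candidates}
    for pattern, candidates in quiz.items()
}

for pattern, verdicts in answers.items():
    print(pattern, verdicts)
```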

Splitting and Manipulating

Text Manipulation

The Spammer

Key Classes within System.Text.RegularExpressions (2)
- MatchCollection - Match
  - MatchCollection stores all the matches found
- GroupCollection - Group
- CaptureCollection - Capture
- Regex.Match() returns Match
- Regex.Matches() returns MatchCollection
- …
- Main Use: Parsing, searching, collecting data
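The difference between a single match and a collection of matches can be sketched like this. Python's `re` module stands in for .NET here: `search()` plays roughly the role of `Regex.Match()` and `finditer()` that of `Regex.Matches()`. The time pattern and the sample line are assumptions modeled on the log file shown earlier.

```python
import re

log_line = "21:49:12 [Search] then 21:51:15 [Update]"
time_pattern = re.compile(r"\d{2}:\d{2}:\d{2}")

# One match object for the first hit (roughly Regex.Match()).
first = time_pattern.search(log_line)
first_value = first.group()

# All match objects, in order (roughly Regex.Matches()).
all_matches = list(time_pattern.finditer(log_line))
positions = [(m.start(), m.group()) for m in all_matches]

print(first_value, positions)
```

Each match object carries both the matched text and its position, which is what makes it useful for parsing and not only for validation.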

Simple parsing: Parsing for emails

Grouping (the coolest part)

Grouping (pay attention!)
- Groups give us object models
- HTML File
- Create a capture hierarchy and use it in code
[\w\.\-]+\.\w{2,5}
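Named groups are what turn a flat match into something closer to an object model. The sketch below splits the slide's pattern into two named groups; the group names, the split point and the HTML snippet are all illustrative assumptions. It uses Python's `re` module, where named groups are written `(?P<name>...)`; .NET writes them `(?<name>...)`.

```python
import re

# The slide's pattern, divided into two named groups (names are hypothetical).
host_pattern = re.compile(r"(?P<name>[\w.\-]+)\.(?P<suffix>\w{2,5})")

# Hypothetical HTML fragment to search.
html = ('<a href="http://tools.osherove.com">tools</a> '
        '<a href="http://regexlib.com">lib</a>')

# Each match exposes its groups by name instead of by position.
found = [(m.group("name"), m.group("suffix"))
         for m in host_pattern.finditer(html)]

print(found)
```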

Grouping emails & The Regulator

Getting back to the first problem: Make this log file useful
- Old log file from a *nix system’s entries
- Converted to and from various formats
- Searched by users
- Format may change
- Search fields can be added, removed or renamed at runtime
Date CPUs|ram|cpu HH:mm:ss action user domain.machine
25/05/1998 1|00512|x86 21:49:12 [Search] Anakin Antler.Anita1
25/05/98 1|00512|x86 21:51:15 [Update] Anakin Antler.Anita1
26/05/1998 1|00256|x86 11:02:45 [Search] Darth Cydot.Uk.Gerry2k
26/05/98 1|00256|x86 11:12:49 [Update] Darth Cydot.Uk.Gerry2k
27/05/98 1|00512|x86 15:34:30 [Search] Anakin Anterl.Anita1
12/08/1998 2|01024|x86 10:14:53 [Search] Obi Monaco.Huarez

How do I start?
- Take a sample of the log file
- Recognize the data pattern for each entry
- Use groups to get each line’s values
- Create a tool that uses this regex to parse a log file
- The tool will use the returned results to generate the log as XML
- Load the XML into a DataSet
- Allow user to print “Select” statements on the DataSet
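The steps above can be sketched end to end. This is one possible reading in Python, not the tool shown in the talk: the pattern, the group names and the helper functions `parse_log` and `to_xml` are assumptions, and a plain filter over the XML tree stands in for loading a .NET DataSet and selecting from it.

```python
import re
import xml.etree.ElementTree as ET

# One named group per field of the log header (group names are hypothetical).
LOG_LINE = re.compile(
    r"(?P<date>\d{2}/\d{2}/(?:\d{4}|\d{2}))\s+"
    r"(?P<cpus>\d+)\|(?P<ram>\d+)\|(?P<cpu>\w+)\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"\[(?P<action>\w+)\]\s+"
    r"(?P<user>\w+)\s+"
    r"(?P<domain>[\w.]+)\.(?P<machine>\w+)"
)

# Three entries copied from the sample log.
SAMPLE = """\
25/05/1998 1|00512|x86 21:49:12 [Search] Anakin Antler.Anita1
26/05/98 1|00256|x86 11:12:49 [Update] Darth Cydot.Uk.Gerry2k
12/08/1998 2|01024|x86 10:14:53 [Search] Obi Monaco.Huarez
"""

def parse_log(text):
    """Return one dict of named-group values per line that fits the pattern."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.fullmatch(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries

def to_xml(entries):
    """Build a <log> element holding one <entry> per parsed line."""
    root = ET.Element("log")
    for entry in entries:
        node = ET.SubElement(root, "entry")
        for field, value in entry.items():
            ET.SubElement(node, field).text = value
    return root

entries = parse_log(SAMPLE)
root = to_xml(entries)

# Stand-in for a "Select": users of every Search entry.
searches = [e.findtext("user") for e in root.findall("entry")
            if e.findtext("action") == "Search"]
print(searches)
```

Because the field names live in the pattern as group names, adding, removing or renaming a search field means editing the pattern only; the XML generation follows whatever groups the pattern defines.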

Parsing a log file

Regulazy
- Build simple expressions by example
- No syntax knowledge needed
- Free
- Tools.osherove.com

When not to use Regex
- When it’s easier and more readable to do it otherwise
- Not just because it’s “cool”
  - Hard to read
  - Steep learning curve
  - Hard to maintain
“Sometimes, when confronted with a problem, you might decide to solve it with Regular Expressions for the wrong reasons. Now you’ve got two problems.”

Summary
- Amazing parsing flexibility
- Good skill to have anywhere
- Can save you time and nerves
- With power comes responsibility
- Weigh the pros and cons before using

Resources
- The Regulator: tools.osherove.com
- Regulazy: tools.osherove.com
- Regexlib.com – Regex archive + Cheat Sheet
- Roy Osherove: Blog:

Thank you! Questions? Roy Osherove: Blog:
