Regular Expressions BKF03 Brian Ciccolo
Agenda
  o Definition
  o Uses (within Aspen and beyond)
  o Matching
  o Replacing
What's a Regular Expression?
In computing, regular expressions, also referred to as regex or regexp, provide a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.
Why Use a Regex?
Validate data entry
  o Example: Verify that a date field has the format mm/dd/yyyy
Find/replace on steroids
  o Example: Reformat phone numbers to (###) ###-####
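The date validation example above can be sketched in a few lines of Python. This is only an illustration: the pattern and the helper name `is_mmddyyyy` are assumptions, not something taken from Aspen.

```python
import re

# Assumed pattern: two-digit month, two-digit day, four-digit year.
DATE_FORMAT = re.compile(r"\d{2}/\d{2}/\d{4}")

def is_mmddyyyy(value: str) -> bool:
    """Return True when the whole string has the shape mm/dd/yyyy."""
    # fullmatch() requires the pattern to cover the entire string.
    return DATE_FORMAT.fullmatch(value) is not None

print(is_mmddyyyy("03/09/2011"))  # True
print(is_mmddyyyy("3/9/2011"))    # False: single-digit month and day
print(is_mmddyyyy("2011-03-09"))  # False: wrong separators and order
```

Note that this checks the format only. A value such as 99/99/9999 has the right shape and passes, so a real validator would still need a calendar check.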
Regex Use in Aspen
Data validation
  o Date, time field input
  o Validation rules (new in 3.0; see session TEC07)
Find/replace on steroids
  o System Log filter
  o Field formatting
RegEx Examples Using Notepad++
Select the proper Search Mode: choose the "Regular expression" option for our examples
Matching – The Basics
Literals: plain old text
Classes:
  Example      Definition
  [abc]        a, b, or c
  [a-z]        Any lowercase letter
  [a-zA-Z]     Any lowercase or uppercase letter
  [0-9]        Any digit, 0 through 9
  [^a-zA-Z]    Not a letter (could be a digit or punctuation)
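To see the character classes in action, here is a small Python sketch that applies each class from the table to one made-up sample string. The sample text is an assumption chosen for illustration.

```python
import re

sample = "Abz3-c"

# Each call returns every single character in the sample that the class matches.
any_of_abc = re.findall(r"[abc]", sample)       # 'A' is uppercase, so it is skipped
lowercase = re.findall(r"[a-z]", sample)
letters = re.findall(r"[a-zA-Z]", sample)
digits = re.findall(r"[0-9]", sample)
non_letters = re.findall(r"[^a-zA-Z]", sample)  # ^ inside brackets negates the class

print(any_of_abc)   # ['b', 'c']
print(lowercase)    # ['b', 'z', 'c']
print(letters)      # ['A', 'b', 'z', 'c']
print(digits)       # ['3']
print(non_letters)  # ['3', '-']
```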
Matching – Predefined Classes
  Predefined Class   Definition
  .                  Any character
  \d                 Any digit: [0-9]
  \D                 Any non-digit: [^0-9]
  \s                 A whitespace character (space, tab, newline)
  \S                 A non-whitespace character: [^\s]
  \w                 A word character: [a-zA-Z_0-9]
  \W                 A non-word character (e.g., punctuation or whitespace): [^\w]
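The predefined classes can be tried out the same way. The sketch below uses Python's `re` module with an invented sample string. Two Python-specific details are worth knowing: the `re.ASCII` flag restricts `\d`, `\s`, and `\w` to the ASCII ranges shown in the table, and `.` does not match a newline unless the `re.DOTALL` flag is set.

```python
import re

sample = "ID_7 ok!"

digits = re.findall(r"\d", sample, re.ASCII)
whitespace = re.findall(r"\s", sample, re.ASCII)
word_chars = "".join(re.findall(r"\w", sample, re.ASCII))
non_word = re.findall(r"\W", sample, re.ASCII)  # everything \w rejects, spaces included

# By default the dot skips the newline between 'a' and 'b'.
dot_default = re.findall(r".", "a\nb")
dot_all = re.findall(r".", "a\nb", re.DOTALL)

print(digits)       # ['7']
print(whitespace)   # [' ']
print(word_chars)   # ID_7ok
print(non_word)     # [' ', '!']
print(dot_default)  # ['a', 'b']
print(dot_all)      # ['a', '\n', 'b']
```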
Matching – Quantifiers
  Quantifier   Definition
  ?            Matches 0 or 1 time (Not supported by Notepad++)
  +            Matches 1 or more times
  *            Matches 0 or more times
  {n,m}        Matches at least n times but no more than m times (Not supported by Notepad++)
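A quick way to compare the quantifiers is to apply each one to the same letter and see which words still match. The following Python sketch does that; the word list and the helper `matching_words` are made up for the illustration.

```python
import re

words = ["color", "colour", "colouur", "colouuur"]

def matching_words(pattern: str) -> list[str]:
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]

zero_or_one = matching_words(r"colou?r")      # 'u' may appear 0 or 1 time
one_or_more = matching_words(r"colou+r")      # 'u' must appear at least once
zero_or_more = matching_words(r"colou*r")     # any number of 'u', including none
one_to_two = matching_words(r"colou{1,2}r")   # 'u' appears once or twice

print(zero_or_one)   # ['color', 'colour']
print(one_or_more)   # ['colour', 'colouur', 'colouuur']
print(zero_or_more)  # ['color', 'colour', 'colouur', 'colouuur']
print(one_to_two)    # ['colour', 'colouur']
```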
Matching – Greedy vs. Lazy
Quantifiers are greedy by default: they match as many characters as possible
Sometimes you want to match the fewest characters possible. Enter lazy quantifiers.
  Quantifier   Lazy Equivalent*
  ?            ??
  +            +?
  *            *?
* Not supported by Notepad++
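The difference between greedy and lazy matching is easiest to see on text with repeated delimiters. This Python sketch runs both forms against an invented line of quoted, comma-separated values.

```python
import re

line = '"Smith","John"'

# Greedy: .* runs to the end of the line, then backs up to the LAST quote.
greedy = re.findall(r'".*"', line)

# Lazy: .*? stops at the FIRST closing quote it can reach.
lazy = re.findall(r'".*?"', line)

print(greedy)  # ['"Smith","John"']
print(lazy)    # ['"Smith"', '"John"']
```

The greedy pattern swallows both fields as one match, which is rarely what a find/replace on delimited data intends.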
Replacing – Groups
Groups in the regex can be used in the replacement value
  o Delimited with parentheses in the regex
  o Identified with \n, where n is the nth group in the original expression
  o \0 represents the entire match (not supported in Notepad++)
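Here is a minimal Python sketch of group references in a replacement. The sample strings are invented. Python's `re.sub` accepts `\1`, `\2`, and so on just as described above, but it spells the whole-match reference as `\g<0>` rather than `\0`; other engines differ (Java, for example, writes `$1` and `$0`).

```python
import re

# Two groups, swapped in the replacement: "Last, First" becomes "First Last".
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Doe, Jane")

# \g<0> is Python's name for the entire match.
bracketed = re.sub(r"\d+", r"[\g<0>]", "Room 12")

print(swapped)    # Jane Doe
print(bracketed)  # Room [12]
```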
Reformatting Dates
Change mm/dd/yyyy to yyyy-mm-dd
  Regex: (\d+)/(\d+)/(\d+)
  Replacement: \3-\1-\2
Step 2: pad the single digits!
  Regex: -(\d)([-"])
  Replacement: -0\1\2
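The two steps can be reproduced in Python, assuming (as the `"` in the step 2 character class suggests) that each date sits inside double quotes. One thing shows up when the code is run: the step 2 pattern consumes the delimiter after the digit, so when both month and day are single digits, one pass pads only the month and a second pass is needed for the day. The function `to_iso` below is a hypothetical variant that swaps the second group for a lookahead, `(?=[-"])`, so a single pass pads both.

```python
import re

def slide_step2(text: str) -> str:
    """Step 2 exactly as written on the slide, applied once."""
    return re.sub(r'-(\d)([-"])', r"-0\1\2", text)

def to_iso(text: str) -> str:
    """Reorder mm/dd/yyyy to yyyy-mm-dd, then zero-pad single digits."""
    reordered = re.sub(r"(\d+)/(\d+)/(\d+)", r"\3-\1-\2", text)
    # Assumption: a lookahead checks the delimiter without consuming it,
    # so the hyphen before the day is still available for the next match.
    return re.sub(r'-(\d)(?=[-"])', r"-0\1", reordered)

one_pass = slide_step2('"2011-3-9"')
two_passes = slide_step2(one_pass)
converted = to_iso('"3/9/2011","12/25/2010"')

print(one_pass)    # "2011-03-9"
print(two_passes)  # "2011-03-09"
print(converted)   # "2011-03-09","2010-12-25"
```

In an editor without lookahead support, running the step 2 Replace All twice gives the same result.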
Reformatting Phone Numbers (v1)
Wrap the area code in parentheses
  Regex: "(\d\d\d)-
  Replacement: "(\1) 
Note: the replacement ends with a space!
Reformatting Phone Numbers (v2)
Strip punctuation (numbers only)
  Regex: \((\d+)\) (\d+)-(\d+)
  Replacement: \1\2\3
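Both phone number rewrites can be chained in Python to check that the patterns behave as described. The sample number is invented and assumed to sit inside double quotes, which is what the leading `"` in the v1 regex anchors on: without it, `\d\d\d-` would also match the 555- in the middle of the number.

```python
import re

raw = '"617-555-1234"'

# v1: wrap the area code in parentheses; the replacement ends with a space.
v1 = re.sub(r'"(\d\d\d)-', r'"(\1) ', raw)

# v2: keep only the three digit groups and drop the punctuation between them.
v2 = re.sub(r"\((\d+)\) (\d+)-(\d+)", r"\1\2\3", v1)

print(v1)  # "(617) 555-1234"
print(v2)  # "6175551234"
```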
Reformatting Social Security Numbers
Format SSN as ###-##-####
Do it in Aspen!
  o Define a record in the Regular Expression Library table
  o Set the regex on the Person ID field in the Data Dictionary
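Outside Aspen, the same regex-plus-format idea can be sketched in Python. This is not Aspen's configuration syntax: the pattern, the format template, and the function name `format_ssn` are all assumptions made for the illustration.

```python
import re

# Assumed pattern: nine digits, with or without hyphens between the groups.
SSN_PATTERN = re.compile(r"(\d{3})-?(\d{2})-?(\d{4})")
# Assumed format template: the three groups joined by hyphens.
SSN_FORMAT = r"\1-\2-\3"

def format_ssn(raw: str) -> str:
    """Return the value as ###-##-####, or unchanged if it is not an SSN."""
    match = SSN_PATTERN.fullmatch(raw.strip())
    if match is None:
        return raw  # leave unrecognized input alone rather than guessing
    return match.expand(SSN_FORMAT)

print(format_ssn("123456789"))    # 123-45-6789
print(format_ssn("123-45-6789"))  # 123-45-6789
print(format_ssn("12345"))        # 12345
```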
Define a Regular Expression Regex and format properties
Update the Data Dictionary Link to the regex
Verify the Results
Extras
  o Wikipedia Entry
  o Regular Expressions Cheat Sheet (V2)
  o Java regex support
  o Notepad++ text editor and regex support
Thank you.