Regular Expressions. Overview Regular expressions allow you to do complex searches within text documents. Examples: Search 8-K filings for restatements.


1 Regular Expressions

2 Overview Regular expressions allow you to do complex searches within text documents. Example: search 8-K filings for restatements. A Boolean search for “restate” would yield too many false positives; regular expressions provide tremendous flexibility.

3 Getting Started Open your “RegexBuddy” program. We are going to build regular expressions to find specific text in this document using a variety of “Tokens.”

4 Specifying Literal Text A literal is a character that is interpreted “as is”: the application does not attempt to give it a special meaning. For example, suppose you were looking for the two characters “\t”. Because \t typically represents a tab, you need to tell the application that you are looking for a literal “\t” and not a tab character.

5 Specifying Literal Text Click on “Insert Token”, then click on “Literal Text”. In the text box, type “\t” and click OK. You will see “\\t” in the regular expression window. The first “\” tells Perl to interpret the following “\” literally.
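The same escaping rule can be tried outside RegexBuddy. The sketch below assumes Python's `re` module rather than Perl (the two agree on this point), and the sample string is made up for illustration.

```python
import re

# Hypothetical sample: a literal backslash followed by "t", then a real tab.
text = "path\\to" + "\t" + "end"

# The regex \\t is an escaped backslash followed by the letter t.
literal_hits = re.findall(r"\\t", text)

# The regex \t is a tab character.
tab_hits = re.findall(r"\t", text)

# re.escape builds the escaped pattern for an arbitrary literal string.
escaped = re.escape("\\t")

print(literal_hits, tab_hits, escaped)
```

Here `re.escape` plays roughly the role of the “Literal Text” token: it adds the backslashes so you do not have to.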

6 Non-printable characters \t – Tab \r – Carriage return \n – Newline (UNIX/Linux) \r\n – Newline (Windows)
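As a small illustration of these tokens, the sketch below splits text that mixes Windows and UNIX line endings. It assumes Python's `re` module, and the sample strings are hypothetical.

```python
import re

# Hypothetical text mixing Windows (\r\n) and UNIX (\n) line endings.
text = "first\r\nsecond\nthird"

# \r?\n matches an optional carriage return followed by a newline,
# so both conventions are treated as a single line break.
lines = re.split(r"\r?\n", text)

# \t matches a tab; here each tab is replaced with one space.
untabbed = re.sub(r"\t", " ", "a\tb")

print(lines, untabbed)
```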

7 Dot and Short-Hand Character Classes Dot: . matches any character but newline (unless modified with s). Short-hand character classes: \w matches any word character (includes numbers and “_”); \W matches any non-word character; \d matches a digit character; \D matches a non-digit character; \s matches a whitespace character; \S matches a non-whitespace character.
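A quick way to see what each class picks up is to run it over a short string. This is a sketch assuming Python's `re` module, where `re.DOTALL` corresponds to Perl's s modifier; the sample text is invented.

```python
import re

sample = "Form 8-K filed_2005\n"

words = re.findall(r"\w+", sample)      # runs of word characters
digits = re.findall(r"\d", sample)      # single digit characters
non_space = re.findall(r"\S+", sample)  # runs of non-whitespace characters

# By default "." stops at a newline; re.DOTALL lifts that restriction.
dot_default = re.match(r".*", "one\ntwo").group()
dot_all = re.match(r".*", "one\ntwo", re.DOTALL).group()

print(words, digits, non_space)
```

Note that the hyphen in “8-K” is not a word character, so `\w+` splits it into “8” and “K”, while `\S+` keeps it whole.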

8 Character Class and Anchors Character class: [456] matches 4, 5 or 6; [^456] matches any single character except 4, 5 or 6. Exercise: create an expression that matches either “Balls” or “Balks”. Anchors: \A beginning of the string; \z end of the string; ^ beginning of the line; $ end of the line (^ and $ work per line only when the m modifier is on; otherwise they anchor to the whole string).
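One possible answer to the “Balls” or “Balks” exercise, together with the anchors, is sketched below. It assumes Python's `re` module, where `re.MULTILINE` corresponds to Perl's m modifier and the end-of-string anchor has traditionally been spelled `\Z` rather than `\z`.

```python
import re

# [lk] accepts exactly one character, either "l" or "k".
balls_or_balks = re.compile(r"Bal[lk]s")
matches = balls_or_balks.findall("Balls, Balks, Balms")

# [^456] matches any single character except 4, 5 or 6.
not_456 = re.findall(r"[^456]", "14567")

text = "alpha\nbeta"
# With re.MULTILINE, ^ matches at the start of every line;
# \A only ever matches at the very start of the string.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
string_start = re.findall(r"\A\w+", text, re.MULTILINE)

print(matches, not_456, line_starts, string_start)
```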

9 Alternation Alternation is essentially “OR.” | - is inserted between alternatives. Boy|Girl – matches “Boy” or “Girl”
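A minimal sketch of alternation, assuming Python's `re` module. The second pattern is a hypothetical variation on the restatement search from the overview, showing how parentheses limit the reach of `|`.

```python
import re

# The | applies to everything on either side, so this is "Boy" OR "Girl".
found = re.findall(r"Boy|Girl", "Boy meets Girl")

# (?:...) groups without capturing, so | only chooses between "d" and "ment".
scoped = re.findall(r"restate(?:d|ment)", "restated the restatement")

print(found, scoped)
```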

10 Quantifiers x? matches 0 or 1 x; x* matches 0 or more occurrences of x; x+ matches 1 or more occurrences of x; (xyz)+ matches 1 or more occurrences of xyz; x{m,n} matches at least m and at most n occurrences of x.
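Each quantifier can be checked against a few test strings. The sketch below assumes Python's `re` module; the helper `full` and the sample patterns are made up for illustration.

```python
import re

def full(pattern, s):
    """Return True when the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

optional = [full(r"colou?r", s) for s in ("color", "colour", "colouur")]
star = [full(r"ab*", s) for s in ("a", "abbb", "b")]
plus = [full(r"ab+", s) for s in ("a", "ab", "abbb")]
group_plus = [full(r"(xyz)+", s) for s in ("xyz", "xyzxyz", "xy")]
bounded = [full(r"x{2,3}", s) for s in ("x", "xx", "xxx", "xxxx")]

print(optional, star, plus, group_plus, bounded)
```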

11 Grouping and Backreferencing (string) captures the text matched inside the parentheses so it can be used for backreferencing. $1 refers to the contents of the first set of parentheses; $2 refers to the contents of the second set. In the regex toolkit, put the following in the regular expression window: (.*)\s(.*) Put the following in the “Test” window: John Smith. Then select Group 2 from the highlight drop-down.
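The John Smith exercise can be reproduced in code. This sketch assumes Python's `re` module, where Perl's $1 and $2 are read with `group(1)` and `group(2)` and written as `\1` and `\2` in a replacement string.

```python
import re

m = re.match(r"(.*)\s(.*)", "John Smith")

first = m.group(1)   # contents of the first set of parentheses
second = m.group(2)  # contents of the second set of parentheses

# Backreferences in the replacement swap the two captured parts.
swapped = re.sub(r"(.*)\s(.*)", r"\2, \1", "John Smith")

print(first, second, swapped)
```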

12 Greediness Normally, expressions match as many characters as possible (they are greedy). Given $_ = "ab12345AB"; the substitution s/ab[0-9]*/X/ produces: XAB. We can turn off greediness by adding a “?” after the greedy quantifier (*). The substitution s/ab[0-9]*?/X/ produces: X12345AB
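The two substitutions can be compared side by side. The sketch assumes Python's `re.sub` as a stand-in for Perl's s/// operator.

```python
import re

text = "ab12345AB"

# Greedy: [0-9]* consumes every digit it can, so "ab12345" is replaced.
greedy = re.sub(r"ab[0-9]*", "X", text)

# Lazy: [0-9]*? consumes as few digits as possible (none), so only "ab" is replaced.
lazy = re.sub(r"ab[0-9]*?", "X", text)

print(greedy, lazy)
```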

13 Substitution of subpatterns Remember that using () causes Perl to remember the contents. Suppose we want to replace “Fred” with “Freddy”. Put “(Fred)” in the regular expression window, put \1dy in the replace window, and put “Fred Couples” in the Test window.
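The Fred to Freddy replacement looks like this in code, assuming Python's `re.sub` in place of the RegexBuddy replace window.

```python
import re

# \1 in the replacement re-inserts whatever the first group captured.
result = re.sub(r"(Fred)", r"\1dy", "Fred Couples")

# \g<1> is an unambiguous spelling, useful when a digit follows the reference.
result_explicit = re.sub(r"(Fred)", r"\g<1>dy", "Fred Couples")

print(result)
```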

14 Look Ahead and Look Behind These let you check ahead of or behind the current position for a particular pattern before continuing the match. /PATTERN(?=pattern)/ positive look ahead; /PATTERN(?!pattern)/ negative look ahead; /(?<=pattern)PATTERN/ positive look behind; /(?<!pattern)PATTERN/ negative look behind.
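The four forms are easier to tell apart on a concrete string. This is a sketch assuming Python's `re` module, which uses the same lookaround syntax; the sample text is invented. In every case the lookaround text is checked but not included in the match.

```python
import re

text = "price: $100, cost: 200, $300"

# Positive look ahead: words that are followed by a colon.
labels = re.findall(r"\w+(?=:)", text)

# Negative look ahead: whole words that are not followed by a colon.
not_labels = re.findall(r"\b\w+\b(?!:)", text)

# Positive look behind: digits that are preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", text)

# Negative look behind: digit runs preceded by neither "$" nor another digit.
plain = re.findall(r"(?<![\$\d])\d+", text)

print(labels, not_labels, dollars, plain)
```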

15 Mode Modifiers Dot matches newlines (s in Perl); case insensitive (i in Perl); ^ and $ match at line breaks (m in Perl); free-spacing (x in Perl).
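The sketch below shows each modifier in turn. It assumes Python's `re` module, where Perl's s, i, m and x correspond to the flags `re.DOTALL`, `re.IGNORECASE`, `re.MULTILINE` and `re.VERBOSE`; the sample text and the year pattern are hypothetical.

```python
import re

text = "Total\nassets"

# i: case-insensitive matching.
case_insensitive = re.findall(r"total", text, re.IGNORECASE)

# s: the dot also matches the newline between the two words.
with_dot_all = re.search(r"Total.assets", text, re.DOTALL) is not None
without_dot_all = re.search(r"Total.assets", text) is not None

# m: $ matches at the end of every line, not just the end of the string.
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

# x: whitespace and # comments inside the pattern are ignored.
year = re.compile(r"""
    (19|20)   # century
    \d{2}     # two-digit year
""", re.VERBOSE)
year_match = year.search("fiscal 2005").group()

print(case_insensitive, with_dot_all, without_dot_all, line_ends, year_match)
```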

16 Note on Regex Regular expressions can be used on many platforms besides Perl. For example, SAS has built-in support for Perl regular expressions.

