
1 REGEX

2 Problems
Have a big text file, want to extract data
– Phone numbers:
1-503-123-1234
503-123-1234
(503) 123-1234
123-1234
503.123.1234

3 Regular Expressions
– Format for specifying patterns
Pattern consists of
– Literals
– Ranges
– Special values
– Quantity indicators
– Groupings

4 Literals
Characters without special meaning are interpreted literally
1      Look for 1
123    Look for 123
12A    Look for 12A

5 Ranges
[ ] encloses a set of options; exactly one of them must appear
[123]    Look for 1, 2, or 3
[AB]     Look for A or B
2[BC]    Look for 2 followed by B or C

6 Ranges
[a-b] indicates a range of characters
[0-9]       Look for any digit 0 through 9
[1-3]       Look for 1, 2, or 3
[a-zA-Z]    Look for any lowercase or uppercase letter
[0-9A-Z]    Look for a digit or an uppercase letter

7 Ranges
[^ ] says not any of these
[^123]    Look for anything but 1, 2, or 3
AA[^A]    Look for 2 A's followed by anything that is not an A
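The bracket expressions above can be tried out with std::regex_match, which succeeds only when the whole string fits the pattern. This is a minimal sketch; the helper name full_match is made up for illustration and is not from the slides.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if ALL of `s` matches `pattern`.
bool full_match(const std::string& s, const std::string& pattern) {
    const std::regex re(pattern);      // default grammar is ECMAScript
    return std::regex_match(s, re);    // whole-string match, not a search
}
```

For example, full_match("2B", "2[BC]") should be true, while full_match("AAA", "AA[^A]") should be false because the third character is an A.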

8 Special Characters
. means any character but newline
A.C    Matches ABC, ADC, A_C, A+C…

9 Special Characters
^ at the start of a pattern means nothing can come before the match
$ at the end of a pattern means nothing can come after the match
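A search normally accepts a match anywhere in the text, so the anchors are what pin it to the ends. The sketch below uses std::regex_search; the helper name found is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches somewhere inside `text`.
bool found(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    return std::regex_search(text, re);   // any substring may match
}
```

With this helper, found("a123b", "123") should succeed, found("a123b", "^123") should fail because an 'a' comes first, and found("123", "^123$") should succeed because nothing surrounds the digits.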

10 Special Characters
\s any whitespace
– Tab, space, etc…
\d any digit
– Same as [0-9]
\w any word character
– Same as [a-zA-Z0-9_] (letters, digits, and underscore)

11 Special Characters
\S anything BUT whitespace
\D anything BUT a digit
\W anything BUT a word character
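One way to see what each shorthand class covers is to delete everything it matches and look at what is left. The sketch below assumes the default ECMAScript grammar of std::regex, where \w also accepts digits and the underscore; the helper name strip_all is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: remove every character matched by `pattern`.
std::string strip_all(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    return std::regex_replace(text, re, "");   // replace each match with nothing
}
```

Stripping \D from "(503) 123-1234" should leave only the ten digits, and stripping \W from "user_42 !" should leave "user_42". The patterns are easiest to pass as raw string literals such as R"(\D)".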

12 Quantity Indicators
{n} Must have n copies of whatever came before
\d{5}    Match 5 digits
A{3}B    Match 3 A's followed by a B

13 Quantity Indicators
{n,m} n to m copies (no space after the comma)
\d{2,5}    Match 2 to 5 digits
{n,} n or more copies
\d{3,}     Match any sequence of 3 or more digits

14 Quantity Indicators
? indicates 0 or 1
+ indicates 1 or more
* indicates 0 or more
A?B+C* could be: BBBB, AB, ABBBC, ABCCCCC, B, BCCCC, …
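The quantity indicators can be checked by filtering a list of candidate strings and keeping only those that match in full. A minimal sketch, with the helper name keep_matching invented for the example:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: keep only the candidates that match `pattern` in full.
std::vector<std::string> keep_matching(const std::vector<std::string>& candidates,
                                       const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> kept;
    for (const std::string& c : candidates) {
        if (std::regex_match(c, re)) {   // whole candidate must fit the pattern
            kept.push_back(c);
        }
    }
    return kept;
}
```

Filtering the slide's examples with A?B+C* should keep all six of them, while strings such as AAB (two A's) or AC (no B) should be dropped.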

15 \
\ to escape special characters
\[    Find a [
\.    Find a .
\\    Find a \
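In C++ source there is a second layer of escaping: an ordinary string literal consumes backslashes itself, so each one in the pattern must be doubled unless a raw string literal is used. The sketch below writes one pattern both ways; the names escaped_literal, raw_literal, and is_decimal are made up for the example.

```cpp
#include <regex>
#include <string>

// The same regex, \d+\.\d+, written two ways in C++ source.
const std::string escaped_literal = "\\d+\\.\\d+";  // every backslash doubled
const std::string raw_literal     = R"(\d+\.\d+)";  // raw string: no doubling

// Hypothetical helper: digits, a literal dot, then more digits.
bool is_decimal(const std::string& s) {
    static const std::regex re(raw_literal);
    return std::regex_match(s, re);
}
```

Here is_decimal("3.14") should be true and is_decimal("3x14") false. Without the backslash before the dot, the pattern would accept both, since an unescaped . matches any character.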

16 Grouping
( ) groups sequences
– Apply options to whole group
– Can extract each group from results

17 |
| gives multiple options: the pattern matches if any one of the alternatives matches
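Groups and alternatives can be sketched together with std::smatch, which stores the text captured by each pair of parentheses (index 0 is the whole match, index 1 the first group, and so on). The function names below are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: split "123-1234" into its two digit groups.
// Returns false and leaves the outputs untouched if `s` has another shape.
bool split_local_number(const std::string& s,
                        std::string& prefix, std::string& line) {
    static const std::regex re(R"((\d{3})-(\d{4}))");
    std::smatch m;
    if (!std::regex_match(s, m, re)) {
        return false;
    }
    prefix = m[1].str();   // text captured by the first ( )
    line = m[2].str();     // text captured by the second ( )
    return true;
}

// Alternation inside a group: the separator may be '-' or a literal '.'.
bool is_local_number(const std::string& s) {
    static const std::regex re(R"(\d{3}(-|\.)\d{4})");
    return std::regex_match(s, re);
}
```

Calling split_local_number on "123-1234" should yield "123" and "1234", and is_local_number should accept both "123-1234" and "123.1234" but reject "123 1234".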

18 Testing
http://www.debuggex.com/
Qt Creator:

19 In C++
Part of C++11 (header <regex>)
– Only partially implemented in GCC before 4.9; complete from GCC 4.9 onward
– Also available in the Boost.Xpressive library
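Putting the pieces together, the phone-number problem from the start of the deck might be approached as below. This is a sketch under assumptions, not the presenter's solution: the pattern is one possible reading of the five example formats, it allows mixed separators, it does not check what surrounds a match, and the function name extract_phone_numbers is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every phone-number-like substring of `text`.
std::vector<std::string> extract_phone_numbers(const std::string& text) {
    // Optional "1-", then an optional area code written as "(503) ",
    // "503-" or "503.", then 3 digits, a '-' or '.', and 4 digits.
    static const std::regex phone(
        R"((?:1-)?(?:\(\d{3}\) |\d{3}[-.])?\d{3}[-.]\d{4})");

    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), phone), end;
         it != end; ++it) {
        found.push_back(it->str());   // str() is the whole matched text
    }
    return found;
}
```

std::sregex_iterator walks the text and yields one match after another, so a single pass over the file contents collects all five formats. The (?: ) form groups without capturing, which keeps the result to the whole match only.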

