2 Regular Expressions
Regular expressions come from formal language theory, where they describe a class of languages called regular languages. Regular expressions also provide a convenient, compact way of expressing patterns and are particularly useful for describing the token classes (e.g., integer, float, identifier) to be recognized by the lexical analysis (scanning) phase of compilation.
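The use of regular expressions to describe token classes can be sketched with a small scanner. This is a minimal illustration in Python, not a definitive lexer: the token names, the patterns in `TOKEN_SPEC`, and the `tokenize` function are all assumptions made for this example, and a real language would define its own token classes.

```python
import re

# Hypothetical token classes for illustration only.
# Order matters: FLOAT is listed before INTEGER so "3.14" is not split at the dot.
TOKEN_SPEC = [
    ("FLOAT",      r"[0-9]+\.[0-9]+"),
    ("INTEGER",    r"[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OPERATOR",   r"[+\-*/=]"),
    ("SKIP",       r"\s+"),   # whitespace is recognized but discarded
    ("MISMATCH",   r"."),     # any other single character is an error
]

# Combine the classes into one pattern with a named group per class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_class, lexeme) pairs for the input text."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(
                f"unexpected character {match.group()!r} at position {match.start()}"
            )
        tokens.append((kind, match.group()))
    return tokens


if __name__ == "__main__":
    for token in tokenize("rate = 3.14 * x2 + 10"):
        print(token)
```

Listing the more specific class first is a design choice of this sketch: alternation in Python's `re` takes the first alternative that matches, not the longest one.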
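3 FAs = Regular Expressions
Finite automata (FAs) and regular expressions have the same expressive power: both describe exactly the regular languages.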
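The equivalence can be illustrated on one small language. The sketch below, in Python, assumes an example language chosen for this illustration (strings over {a, b} with an even number of a's), writes it once as a two-state deterministic finite automaton and once as a regular expression, and checks that the two agree on every short string. The names `TRANSITIONS`, `dfa_accepts`, `regex_accepts`, and `agree_up_to` are hypothetical. Agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

# Two-state DFA: the state records whether an even or odd number of a's was read.
TRANSITIONS = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}
START = "even"
ACCEPTING = {"even"}


def dfa_accepts(s):
    """Run the DFA over s and report whether it ends in an accepting state."""
    state = START
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING


# The same language as a regular expression: a's come in pairs, b's anywhere.
EVEN_AS = re.compile(r"(b*ab*a)*b*")


def regex_accepts(s):
    """Report whether the whole string s matches the regular expression."""
    return EVEN_AS.fullmatch(s) is not None


def agree_up_to(max_length):
    """Check that DFA and regex agree on every string over {a, b} up to max_length."""
    for length in range(max_length + 1):
        for letters in product("ab", repeat=length):
            s = "".join(letters)
            if dfa_accepts(s) != regex_accepts(s):
                return False
    return True


if __name__ == "__main__":
    print(agree_up_to(8))
```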
4 Regular Expressions in Popular Applications and Languages
Unix utility grep: uses regular expressions to perform text searches against files or standard input.
JavaScript, Microsoft ASP, and Microsoft .NET: string search and replace operations.
Macromedia ColdFusion: input validation using the tag (actually processed by client-side JavaScript).
MySQL: database searches (e.g., SELECT * FROM table WHERE column REGEXP 'pattern').
5 Regular Expressions in Popular Applications and Languages
Perl: granddaddy of regular expression implementations. Support for regular expressions is a core part of Perl.
PHP: provides Perl-compatible regular expression support.
Java: Perl-based regular expressions were added in version 1.4 via the java.util.regex.Matcher and java.util.regex.Pattern classes.
Source: Ben Forta. Teach Yourself Regular Expressions in 10 Minutes. SAMS, 2004.
6 egrep regular expressions
8 Examples
To find words of two or more letters that begin and end with y:
% egrep '^y.*y$' /usr/dict/words
Try:
% egrep 'y.*y' /usr/dict/words
For help with spelling we might type:
% egrep '^rec(ei|ie)ve$' /usr/dict/words
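The effect of the anchors `^` and `$` in these commands can be checked without a dictionary file. The following Python sketch assumes a small hypothetical word list (`WORDS`) in place of /usr/dict/words and a helper named `grep` that, like egrep, keeps every line containing a match. For these particular patterns the Python `re` syntax and the egrep syntax coincide.

```python
import re

# Hypothetical stand-in for /usr/dict/words, one word per entry.
WORDS = [
    "yay", "yearly", "yesterday", "yoyo", "anyway",
    "y", "every", "receive", "recieve", "believe",
]


def grep(pattern, lines):
    """Return the lines that contain a match for pattern anywhere in the line."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Anchored: the word must start with y and end with y.
anchored = grep(r"^y.*y$", WORDS)

# Unanchored: the word only has to contain two y's somewhere.
unanchored = grep(r"y.*y", WORDS)

# Alternation: accept either spelling so the dictionary shows which one exists.
spelling = grep(r"^rec(ei|ie)ve$", WORDS)

if __name__ == "__main__":
    print(anchored)    # ['yay', 'yearly', 'yesterday']
    print(unanchored)  # ['yay', 'yearly', 'yesterday', 'yoyo', 'anyway']
    print(spelling)    # ['receive', 'recieve']
```

With a real dictionary only the correct spelling would be listed; the sample list contains both spellings so that the alternation is visible.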
9 More examples
For help with a crossword puzzle we might type:
% egrep '^s..u.t..e$' /usr/dict/words
To search for all occurrences of the string F(N) in the file ch4.txt, we could type:
% egrep 'F\(N\)' ch4.txt
To find two consecutive occurrences of the word "the" separated by one or more spaces:
% egrep '(^| )the +the( |$)' myfile.txt
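The three patterns above can be exercised on a few sample strings. This Python sketch uses made-up inputs (not the contents of ch4.txt or myfile.txt) and hypothetical variable names, and it also shows why the parentheses in F(N) must be escaped: unescaped, they form a group and the pattern matches the text FN instead.

```python
import re

# Crossword: nine letters with s, u, t, e fixed at positions 1, 4, 6, 9.
crossword = re.compile(r"^s..u.t..e$")

# Escaped parentheses match literal "(" and ")".
literal_call = re.compile(r"F\(N\)")

# Unescaped parentheses only group, so this pattern means the two letters "FN".
grouped = re.compile(r"F(N)")

# The word "the" twice in a row, separated by one or more spaces,
# bounded by a space or by the start/end of the line.
repeated_the = re.compile(r"(^| )the +the( |$)")

if __name__ == "__main__":
    print(bool(crossword.search("structure")))              # True
    print(bool(crossword.search("signature")))              # False
    print(bool(literal_call.search("F(N) = F(N-1) + 1")))   # True
    print(bool(grouped.search("F(N)")))                     # False
    print(bool(grouped.search("FN")))                       # True
    print(bool(repeated_the.search("Paris in the the spring")))  # True
    print(bool(repeated_the.search("bathe the baby")))      # False
    print(bool(repeated_the.search("the theory")))          # False
```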
10 Practical real-world uses for regular expressions
To determine whether input is a valid North American phone number: a three-digit area code (which may not start with 0 or 1), followed by a seven-digit number formatted as a three-digit prefix, a hyphen, and a four-digit line number.
Examples: (313) (810)
Other uses: to check the format of ZIP codes, SSNs, IP addresses, URLs, addresses.
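The phone number rule can be sketched as a validator in Python. The slide does not fully specify the layout, so this example assumes the area code is written in parentheses and followed by a single space, as in "(AAA) PPP-LLLL". The pattern `NANP`, the function `is_valid_phone`, and the sample numbers are hypothetical and serve only as illustration.

```python
import re

# Assumed layout: "(AAA) PPP-LLLL"
#   AAA  three-digit area code whose first digit is 2-9 (not 0 or 1)
#   PPP  three-digit prefix
#   LLLL four-digit line number
NANP = re.compile(r"\([2-9][0-9]{2}\) [0-9]{3}-[0-9]{4}")


def is_valid_phone(text):
    """Return True if the whole string matches the assumed phone number layout."""
    return NANP.fullmatch(text) is not None


if __name__ == "__main__":
    # Made-up sample inputs.
    for sample in ["(313) 555-1212", "(013) 555-1212", "313-555-1212"]:
        print(sample, is_valid_phone(sample))
```

Using `fullmatch` rather than `search` means extra characters before or after the number cause rejection, which is usually what input validation needs.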