1
Regular expressions
Mastering Regular Expressions by Jeffrey E. F. Friedl
Linux editors and commands (e.g. grep) use regular expressions
2
Language Theory
Chomsky identified four classes of language
Programming languages are described by a context-free grammar
Regular languages are somewhat simpler

Type  Characteristics
0     Unrestricted
1     Context-sensitive
2     Context-free
3     Regular

Copyright © 2007 Addison-Wesley. All rights reserved.
3
Regular Grammars
Regular grammars are grammars whose BNF rules are restricted to the form
  <lhs> -> terminal <non-terminal>
  (a rule may also end in a terminal alone, so that a derivation can stop)
Regular grammars can be represented by finite state automata and by regular expressions
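A rough sketch of that equivalence, written in Python (not a language the slides use). It puts a hand-written finite state automaton for simple identifiers next to the corresponding regular expression. The identifier rule (a letter followed by letters or digits), the state numbering and all function names are assumptions made for illustration.

```python
import re

def char_kind(ch):
    """Classify a character into the input alphabet of the automaton."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"

# Hypothetical automaton: state 0 is the start state,
# state 1 means "inside an identifier" and is the only accepting state.
TRANSITIONS = {
    (0, "letter"): 1,
    (1, "letter"): 1,
    (1, "digit"): 1,
}
ACCEPTING = {1}

def dfa_accepts(text):
    """Run the automaton over text; a missing transition means reject."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, char_kind(ch)))
        if state is None:
            return False
    return state in ACCEPTING

# The same language written as a regular expression.
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def regex_accepts(text):
    """Accept only if the whole string is an identifier."""
    return IDENT_RE.fullmatch(text) is not None

if __name__ == "__main__":
    for sample in ["x1", "count", "1x", "", "a_b"]:
        print(repr(sample), dfa_accepts(sample), regex_accepts(sample))
```

Both functions should agree on every input, since they describe the same regular language.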
4
Regular Expressions
First described by Stephen Kleene
Used for pattern matching
  Unix utilities like grep and awk
  built into many scripting languages (e.g. Perl)
  libraries exist for other languages (Pattern and Matcher classes in Java)
No standard notation
  Many languages use Perl Compatible Regular Expressions
Useful for describing things like identifiers and numbers for a programming language
5
Regular Expression Components
Atoms - the characters that can be combined to make the pattern being described
Concatenation - a sequence of atoms
Alternation - a choice between several patterns
Kleene closure (*) - 0 or more occurrences
Positive closure (+) - 1 or more occurrences
Nothing () - the empty pattern
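The components above can be tried out one at a time. This is a minimal sketch using Python's `re` module, chosen here only for illustration; the helper name `whole` is hypothetical and checks whether a pattern describes the entire string.

```python
import re

def whole(pattern, text):
    """True if the pattern describes the entire string, not just a part of it."""
    return re.fullmatch(pattern, text) is not None

if __name__ == "__main__":
    print(whole(r"ab", "ab"))       # concatenation: a then b
    print(whole(r"a|b", "b"))       # alternation: a or b
    print(whole(r"a*", ""))         # Kleene closure accepts zero occurrences
    print(whole(r"a+", ""))         # positive closure needs at least one
    print(whole(r"(ab)*", "abab"))  # closure applied to a grouped sequence
    print(whole(r"", ""))           # the empty pattern describes the empty string
```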
6
Patterns and Matching
A pattern is generally enclosed between a matched pair of characters, most commonly //
  /pattern/
Languages that support pattern matching may have a match operator

Language  Match operator  No-match operator
AWK       ~               !~
Perl      =~, m//         !~
7
Metacharacters
Characters that have a special meaning within a pattern

.    any single character
\    escape character
^    matches beginning of string
$    matches end of string
[ ]  used to enclose a character class
()   used to group characters
*    0 or more occurrences
+    1 or more occurrences
?    0 or 1 occurrences
|    OR
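A short sketch of a few of these metacharacters in action, again assuming Python's `re` module as a stand-in for the notation on the slide. The sample patterns and the helper `found` are made up for illustration.

```python
import re

def found(pattern, text):
    """True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

if __name__ == "__main__":
    # ? makes the preceding character optional
    print(found(r"colou?r", "color"), found(r"colou?r", "colour"))
    # . matches any single character, so an unescaped dot is very permissive
    print(found(r"3.14", "3x14"))
    # \ removes the special meaning, leaving a literal dot
    print(found(r"3\.14", "3x14"), found(r"3\.14", "3.14"))
    # () groups, | chooses between alternatives
    print(found(r"gr(a|e)y", "grey"), found(r"gr(a|e)y", "groy"))
```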
8
Simple Examples
A single character : /a/
  matches any string that contains the letter a
A sequence of characters
  /ab/ matches any string that contains the letter a followed immediately by the letter b
  /bird/ matches any string that contains the word bird
  /Regular/ matches any string that contains the word Regular (matches are case-sensitive by default)
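These "contains" matches can be checked with a few lines of Python, used here only as a convenient stand-in for the /pattern/ notation. The function name `contains` and the sample sentences are assumptions.

```python
import re

def contains(pattern, text):
    """Search the whole string; a match anywhere counts."""
    return re.search(pattern, text) is not None

if __name__ == "__main__":
    print(contains(r"a", "banana"))
    print(contains(r"ab", "cabbage"))   # a followed immediately by b
    print(contains(r"ab", "a b"))       # the space breaks the sequence
    print(contains(r"bird", "a bird in the hand"))
    print(contains(r"Regular", "regular expressions"))  # case matters by default
```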
9
More Examples
Any character : a.
  a followed by any character
A choice of two characters : a | b
  a  b  ac  ab  bc  but not  cd  ef
Optional repeated character : ab*
  a  ab  abb  abbbb  abracadabra
Optional repeated sequence : a(bc)*
  a  abc  abcbc
At least one of a sequence : ab+
  ab  abb  abbbb  abracadabra
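The lists on this slide can be checked mechanically. The sketch below assumes Python's `re` module. The names `SLIDE_EXAMPLES`, `all_match` and `first_match` are hypothetical, and the alternation is written without spaces because a space inside a pattern is itself a literal character.

```python
import re

# Pattern -> strings the slide says it matches.
SLIDE_EXAMPLES = {
    r"a|b": ["a", "b", "ac", "ab", "bc"],
    r"ab*": ["a", "ab", "abb", "abbbb", "abracadabra"],
    r"a(bc)*": ["a", "abc", "abcbc"],
    r"ab+": ["ab", "abb", "abbbb", "abracadabra"],
}

def all_match(pattern, strings):
    """True if the pattern is found somewhere in every string."""
    return all(re.search(pattern, s) is not None for s in strings)

def first_match(pattern, text):
    """Return the text of the leftmost match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

if __name__ == "__main__":
    for pattern, strings in SLIDE_EXAMPLES.items():
        print(pattern, all_match(pattern, strings))
    # Which part of the string did the pattern actually cover?
    print(first_match(r"ab*", "abbbb"))
    print(first_match(r"ab+", "abracadabra"))
```

Looking at the matched text shows that * and + take as many repetitions as they can, and that abracadabra qualifies for ab+ only because it contains ab.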
10
Anchors
Sometimes you want to check for something at the beginning or end of a string
  /^The/ matches only if the first three characters in the string are The
  /tar$/ matches only if the last three characters of the string are tar
If you need to match the beginning and/or end of a word, you can add a space at the appropriate end
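A quick sketch of both anchors and of the space trick, assuming Python's `re` module in place of the /pattern/ notation. The sample strings and the helper `found` are made up.

```python
import re

def found(pattern, text):
    """True if the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

if __name__ == "__main__":
    # ^ ties the match to the start of the string
    print(found(r"^The", "The end"), found(r"^The", "In The end"))
    # $ ties the match to the end of the string
    print(found(r"tar$", "guitar"), found(r"tar$", "tarp"))
    # Spaces on both sides approximate "the" as a separate word
    print(found(r" the ", "over the hill"), found(r" the ", "another day"))
```

The space trick misses a word that sits at the very start or end of the string, or next to punctuation.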
11
Character Classes
You can put a set of characters inside square brackets to create a character class
  [abc] means any one of a, b or c
A ^ as the first character means any character that isn't in the set
  [^abc] means any character except a, b or c
You can also specify ranges of characters (based on ASCII codes)
  [0-9] is any digit
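To see which characters each class picks out, the following sketch collects every match in a sample string. It assumes Python's `re` module; the function name `picked` and the sample strings are illustrative.

```python
import re

def picked(char_class, text):
    """Return every single character in text that the class matches."""
    return re.findall(char_class, text)

if __name__ == "__main__":
    print(picked(r"[abc]", "cabbage"))   # ['c', 'a', 'b', 'b', 'a']
    print(picked(r"[^abc]", "abcdef"))   # ['d', 'e', 'f']
    print(picked(r"[0-9]", "a1b22"))     # ['1', '2', '2']
```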
12
Perl Compatible Regular Expressions
Use \b to specify a word boundary
Named character classes
  \d for any digit
  \w for letters, digits and underscores
  \s for whitespace
  \D, \W, \S match any character that is not in the corresponding lower case set
{} after a regular expression can be used to specify a number of repeats
/i at end of pattern means case-insensitive
/s at end of pattern means . also matches newlines
  . normally only matches characters other than newlines
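Python's `re` module accepts most of this notation, so it is used below as an approximation of Perl-compatible behaviour. The /i and /s modifiers have no direct syntax there; the flags `re.IGNORECASE` and `re.DOTALL` play the same role. The helper names and sample strings are assumptions.

```python
import re

def words_named_cat(text):
    """\\b keeps 'cat' from matching inside a longer word."""
    return re.findall(r"\bcat\b", text)

def is_local_number(text):
    """{n} repeats the preceding item exactly n times."""
    return re.fullmatch(r"\d{3}-\d{4}", text) is not None

if __name__ == "__main__":
    print(words_named_cat("cat concat cat."))   # ['cat', 'cat']
    print(re.findall(r"\w+", "x_1 = y2"))       # ['x_1', 'y2']
    print(re.findall(r"\D+", "ab12cd"))         # ['ab', 'cd']
    print(is_local_number("555-1234"), is_local_number("55-1234"))
    # Counterpart of /i
    print(re.search(r"hello", "HELLO", re.IGNORECASE) is not None)
    # Counterpart of /s: without it, . stops at a newline
    print(re.search(r"a.b", "a\nb") is not None)
    print(re.search(r"a.b", "a\nb", re.DOTALL) is not None)
```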
13
Regular Expressions for String Manipulation
split( regexp, string) tokenizes a string
s/regexp/replacement/ substitutes replacement for the first match of regexp
  g at end means do all occurrences
Expression memory allows you to remember what was matched by the parts of the pattern in parentheses
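The same three operations exist in Python's `re` module, used here as a rough analogue of the Perl forms on the slide: `re.split` for split, `re.sub` for s///, and match groups for expression memory. Note that `re.sub` replaces every occurrence unless `count` is given, which is the reverse of Perl's default. The function names and sample data are hypothetical.

```python
import re

def tokenize(text):
    """Split on commas, ignoring any whitespace around them."""
    return re.split(r"\s*,\s*", text)

def replace_first(text):
    """Like s/cat/dog/ : only the first occurrence is replaced."""
    return re.sub(r"cat", "dog", text, count=1)

def replace_all(text):
    """Like s/cat/dog/g : every occurrence is replaced."""
    return re.sub(r"cat", "dog", text)

def page_range(text):
    """Parentheses remember the pieces they matched; return them as a tuple."""
    m = re.search(r"(\d+)-(\d+)", text)
    return (m.group(1), m.group(2)) if m else None

def swap_words(text):
    """Remembered pieces can be reused in the replacement as \\1 and \\2."""
    return re.sub(r"(\w+) (\w+)", r"\2 \1", text)

if __name__ == "__main__":
    print(tokenize("a, b ,c"))           # ['a', 'b', 'c']
    print(replace_first("cat cat"))      # dog cat
    print(replace_all("cat cat"))        # dog dog
    print(page_range("pages 10-20"))     # ('10', '20')
    print(swap_words("hello world"))     # world hello
```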
14
Regular Expressions in Java
Java has classes for using regular expressions
The String class has a matches method
  parameter is a regular expression
The java.util.regex package has classes that can be used for pattern matching operations
  Pattern represents a compiled regular expression
  Matcher performs pattern matching operations on an input string, using a Pattern
15
Try these
Give a regular expression to recognize
  Java identifiers
  integer literals
  a phone number with optional country code
  number on a license plate
Can you think of any others?