1
ISBN 0-321-33025-0
Regular expressions
http://en.wikipedia.org/wiki/Regular_expression
Mastering Regular Expressions by Jeffrey E. F. Friedl
–(on reserve in the library)
Linux grep command
2
Copyright © 2006 Addison-Wesley. All rights reserved.
Language Theory
Chomsky identified four classes of language
–Types 2 and 3 useful for programming language specification
Backus (on ALGOL58 committee) developed notation for specifying programming languages

Type  Characteristics
0     Unrestricted
1     Context-sensitive
2     Context-free
3     Regular
3
Regular Grammars
Regular grammars are grammars whose BNF rules are restricted to the forms
–<nonterminal> -> terminal <nonterminal>
–<nonterminal> -> terminal
Regular grammars can be represented by finite state automata and by regular expressions
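As a small illustration of the equivalence stated above, the sketch below describes one regular language twice: as a hand-written finite state automaton and as a regular expression. The example language (one or more a followed by a single b), the state names, and the use of Python's re module are assumptions chosen for illustration; none of them come from the slides.

```python
import re

# Hypothetical example language: one or more 'a' followed by exactly one 'b'.
# A regular grammar for it:  S -> a A,  A -> a A,  A -> b
TRANSITIONS = {
    ("S", "a"): "A",
    ("A", "a"): "A",
    ("A", "b"): "ACCEPT",
}

def accepts(text):
    """Run the finite state automaton over text; True if it ends in ACCEPT."""
    state = "S"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined for this input: reject
            return False
    return state == "ACCEPT"

# The same language written as a regular expression.
PATTERN = re.compile(r"a+b")

def matches(text):
    """True if the whole text is described by the regular expression."""
    return PATTERN.fullmatch(text) is not None
```

Both functions should agree on every input string, which is one informal way to check that the automaton and the expression describe the same language.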
4
Regular Expressions
First described by Stephen Kleene
Used for pattern matching
–Unix utilities like grep and awk
–built into many scripting languages
–libraries exist for other languages (Pattern and Matcher classes in Java)
No standard notation
Useful for describing things like identifiers and numbers for a programming language
5
Regular Expression Components
Atoms - the characters that can be combined to make the pattern being described
Concatenation - a sequence of atoms
Alternation - a choice between several patterns
Kleene closure (*) - 0 or more occurrences
Positive closure (+) - 1 or more occurrences
Nothing (ε) - the empty string
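The components listed above can be tried out directly. This is a minimal sketch using Python's re module as a stand-in notation (an assumption; the slides do not use Python), with a hypothetical helper named whole that checks whether an entire string is described by a pattern.

```python
import re

atom = re.compile(r"a")            # atom: a single character
concatenation = re.compile(r"ab")  # concatenation: a followed by b
alternation = re.compile(r"a|b")   # alternation: a or b
kleene = re.compile(r"a*")         # Kleene closure: zero or more a
positive = re.compile(r"a+")       # positive closure: one or more a

def whole(pattern, text):
    """True if the entire text is described by the pattern."""
    return pattern.fullmatch(text) is not None
```

The empty string separates the two closures: whole(kleene, "") holds because zero occurrences are allowed, while whole(positive, "") does not.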
6
Patterns and Matching
A pattern is generally enclosed between a matched pair of characters, most commonly //
–/pattern/
Languages that support pattern matching often have a match (and a doesn't match) operator
–~ is the match operator in awk
–!~ is the doesn't match operator in awk
7
Regular Expression Metacharacters
Characters that have a special meaning within a pattern

.     any single character
\     escape character
^     matches beginning of string
$     matches end of string
[ ]   used to enclose a character class
()    used to group characters
*     0 or more occurrences
+     1 or more occurrences
?     0 or 1 occurrence
|     OR
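A few of the metacharacters above in action. This is a sketch in Python's re module (an assumption, since the slides do not name a language at this point); the sample words and the helper name found are made up for illustration.

```python
import re

def found(pattern, text):
    """True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

checks = {
    "dot": found(r"c.t", "cut"),              # . stands for any single character
    "escape": found(r"3\.14", "3.14"),        # \. is a literal period
    "escape_miss": found(r"3\.14", "3x14"),   # an escaped . no longer matches x
    "optional": found(r"^colou?r$", "color"), # ? makes the u optional
    "group_or": found(r"^(cat|dog)s$", "dogs"),  # () groups the alternatives
}
```

Every entry in checks is expected to be True except "escape_miss", which shows that escaping removes the special meaning of the period.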
8
Simple Examples
A single character : /a/
–Matches any string that contains the letter a
A sequence of characters
–/ab/ matches any string that contains the letter a followed immediately by the letter b
–/bird/ matches any string that contains the word bird
–/Regular/ matches any string that contains the word Regular (matches are case-sensitive by default)
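The "contains" behaviour described above corresponds to searching for the pattern anywhere in the string. A minimal sketch, assuming Python's re.search as the stand-in for /pattern/ and a hypothetical helper named contains:

```python
import re

def contains(pattern, text):
    """True if pattern matches anywhere in text, like /pattern/ applied to a line."""
    return re.search(pattern, text) is not None

has_a = contains(r"a", "banana")                     # a single character
has_ab = contains(r"ab", "cab")                      # a immediately followed by b
split_ab = contains(r"ab", "a b")                    # a space in between: no match
has_bird = contains(r"bird", "blackbirds")           # found inside a longer word
wrong_case = contains(r"Regular", "regular grammars")  # case-sensitive by default
```

Note that /bird/ also matches inside longer words such as blackbirds, since nothing in the pattern requires a word boundary.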
9
More Examples
Any character : a.
–a followed by any character
A choice of two characters : a|b
–a b ac ab bc but not cd ef
Optional repeated character : ab*
–a ab abb abbbb abracadabra
Optional repeated sequence : a(bc)*
–a abc abcbc
At least one of a character : ab+
–ab abb abbbb abracadabra
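The sample strings listed above can be checked mechanically. The sketch below assumes Python's re.search (a match anywhere in the string counts) as the matching rule; the variable names are illustrative only.

```python
import re

# Pattern -> strings the slide lists as matching it.
examples = {
    r"a|b":    ["a", "b", "ac", "ab", "bc"],
    r"ab*":    ["a", "ab", "abb", "abbbb", "abracadabra"],
    r"a(bc)*": ["a", "abc", "abcbc"],
    r"ab+":    ["ab", "abb", "abbbb", "abracadabra"],
}

# True only if every listed string matches its pattern.
all_match = all(
    re.search(pattern, text)
    for pattern, strings in examples.items()
    for text in strings
)

# Strings the slide lists as NOT matching a|b; this list should stay empty.
non_matches = [text for text in ["cd", "ef"] if re.search(r"a|b", text)]

# a. needs a second character after the a.
dot_needs_char = re.search(r"a.", "a") is None
```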
10
Anchors
Sometimes you want to check for something at the beginning or end of a string
–/^The/ matches only if the first three characters in the string are The
–/tar$/ matches only if the last three characters of the string are tar
–If you need to match the beginning and/or end of a word, you can add a space at the appropriate end
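A short sketch of the anchors above, written with Python's re module (an assumption; the sample strings are made up). It also shows the add-a-space technique for words and one of its limits.

```python
import re

starts_with_the = re.compile(r"^The")  # The must be the first three characters
ends_with_tar = re.compile(r"tar$")    # tar must be the last three characters
word_the = re.compile(r" the ")        # spaces on both sides approximate a whole word

at_start = starts_with_the.search("The end") is not None
not_at_start = starts_with_the.search("At The end") is not None
at_end = ends_with_tar.search("guitar") is not None
not_at_end = ends_with_tar.search("tartan") is not None
whole_word = word_the.search("in the end") is not None
inside_word = word_the.search("other end") is not None
word_at_start = word_the.search("the end") is not None
```

The last check comes out False: a pattern with a leading space cannot match a word at the very start of the string, which is why some languages offer a dedicated word-boundary notation. In Python specifically, $ also matches just before a final newline.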
11
Character Classes
You can put a set of characters inside square brackets to create a character class
–[abc] means any one of a b or c
A ^ as the first character means any character that isn't in the set
–[^abc] means any character except a b or c
You can also specify ranges of characters (based on ASCII codes)
–[0-9] is any digit
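Character classes and ranges are what make patterns for identifiers and numbers compact. The sketch below uses Python's re module; the exact identifier rule (a letter or underscore, then letters, digits, or underscores) is an assumption for illustration, not a rule taken from the slides.

```python
import re

# Hypothetical identifier rule: letter or underscore, then letters/digits/underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
INTEGER = re.compile(r"[0-9]+")   # one or more digits
NOT_ABC = re.compile(r"[^abc]")   # any single character other than a, b, or c

def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None

def is_integer(text):
    return INTEGER.fullmatch(text) is not None

def first_non_abc(text):
    """Return the first character outside the set {a, b, c}, or None."""
    match = NOT_ABC.search(text)
    return match.group() if match else None
```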
12
awk and Regular Expressions
awk is a language that uses patterns to determine how to process lines in a text file
The format of an awk program is a sequence of statements of the form pattern {action}
–pattern is a regular expression
Each line is checked against the pattern; lines that match are processed
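The pattern {action} model described above can be imitated in a few lines. This is only a rough Python sketch of the idea, not awk itself: run_rules, the two rules, and the sample lines are all hypothetical names and data.

```python
import re

def run_rules(rules, lines):
    """Check every line against every (pattern, action) rule, awk-style."""
    output = []
    for line in lines:
        for pattern, action in rules:
            if re.search(pattern, line):      # the line matches this rule's pattern
                output.append(action(line))   # so the rule's action runs on it
    return output

rules = [
    (r"^#", lambda line: "comment: " + line),
    (r"[0-9]+", lambda line: "has number: " + line),
]
lines = ["# version 2", "plain text", "total 42"]
result = run_rules(rules, lines)
```

The first line matches both rules, so both actions run on it; the second line matches neither and produces no output.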
13
String functions in AWK
[g]sub( pattern, replacement[, string])
–gsub replaces globally
–default string is $0
–returns number of replacements
index( string, toFind)
length( string)
split( string, array[, fieldSep])
substr( string, start[, length])
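For readers more familiar with Python, the sketch below gives rough analogues of some of the functions above. These are illustrative approximations, not awk's implementation: awk modifies the target string in place and returns the count, whereas these helpers return the new text together with the count; replacement syntax also differs between the two languages, and substr here assumes start is at least 1.

```python
import re

def gsub(pattern, replacement, text):
    """Replace every match; return (new_text, number_of_replacements)."""
    return re.subn(pattern, replacement, text)

def sub(pattern, replacement, text):
    """Replace only the first match; return (new_text, number_of_replacements)."""
    return re.subn(pattern, replacement, text, count=1)

def index(text, to_find):
    """1-based position of to_find in text, or 0 if it is absent."""
    return text.find(to_find) + 1

def substr(text, start, length=None):
    """Substring beginning at 1-based position start (assumes start >= 1)."""
    begin = start - 1
    return text[begin:] if length is None else text[begin:begin + length]
```

The main thing to notice is the 1-based indexing in index and substr, which differs from Python's own 0-based slicing.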
14
Built-in Variables
ARGC, ARGV - command line arguments
–arg of form var=value assigns value to a variable
FILENAME - name of current file
FNR - record number in current file
NR - number of records read so far
NF - number of fields in current record
FS - field separator (" " by default)
RS - record separator ("\n" by default)
RSTART - start of string matched
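To make NR and NF concrete, the sketch below walks over some text the way awk would with its default separators: one record per line, fields separated by runs of whitespace. It is a Python approximation under those assumptions; the function name records and the sample text are hypothetical.

```python
def records(text):
    """Yield (NR, NF, fields) for each line, using default-style separators."""
    for nr, record in enumerate(text.splitlines(), start=1):
        fields = record.split()        # split on runs of whitespace
        yield nr, len(fields), fields  # NR so far, NF for this record, the fields

seen = list(records("a b c\nd e\n"))
```

Here seen holds one tuple per line: the first record has three fields and the second has two.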
15
Regular Expressions in perl
perl uses =~ and !~ for match and doesn't match
perl uses \b to specify a word boundary
perl has some named character classes
–\d for any digit
–\w for letters, digits and underscores
–\s for whitespace
–\D, \W, \S exclude the characters in the lower case set
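Python's re module happens to accept the same \b, \d, \w, \s, and \D notation, so the classes above can be sketched there. The sample strings are made up, apart from the ISBN, which is the one on the title slide. One caveat: for ordinary Python strings these classes are Unicode-aware by default, so \d can match digits beyond 0-9 unless the re.ASCII flag is used.

```python
import re

WORD_CAT = re.compile(r"\bcat\b")  # cat as a whole word only

standalone = WORD_CAT.search("the cat sat") is not None
embedded = WORD_CAT.search("concatenate") is not None

numbers = re.findall(r"\d+", "slide 15 of 17")       # runs of digits
words = re.findall(r"\w+", "a_b c-d")                # letters, digits, underscores
pieces = re.split(r"\s+", "a  b\tc")                 # split on runs of whitespace
digits_only = re.sub(r"\D", "", "0-321-33025-0")     # drop every non-digit
```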
16
String Manipulation in perl
split( regexp, string) tokenizes a string
tr/a-z/A-Z/ transliterates characters
s/regexp/replacement/ substitutes for regexp
–g at end means do all occurrences
/i at end of pattern means case-insensitive
/s at end of pattern means match newlines
–. normally only matches characters other than newlines
Expression memory allows you to remember what matches parts of pattern in parentheses
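Each operation above has a close counterpart in Python, which makes the behaviour easy to experiment with. The sketch below is an analogy rather than Perl code; the sample strings are made up, and the comments name the Perl form each line is meant to resemble.

```python
import re
import string

tokens = re.split(r",\s*", "a, b,c")                  # like split(/,\s*/, $str)

to_upper = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
upper = "regex".translate(to_upper)                   # like tr/a-z/A-Z/

first_only = re.sub(r"o", "0", "foo boo", count=1)    # like s/o/0/
every = re.sub(r"o", "0", "foo boo")                  # like s/o/0/g

ignore_case = re.search(r"regular", "Regular", re.IGNORECASE) is not None  # like /i
dot_default = re.search(r"a.b", "a\nb") is not None   # . skips newlines by default
dot_all = re.search(r"a.b", "a\nb", re.DOTALL) is not None  # like /s

# Parenthesized parts are remembered and can be reused as \1 and \2.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
```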
17
Regular Expressions in Java
Java has classes for using regular expressions
–The String class has a matches method whose parameter is a regular expression
–The java.util.regex package has classes that can be used for pattern matching operations
  Pattern represents a compiled regular expression
  Matcher objects perform various pattern matching operations on an input string
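One point worth knowing about the Java API is that String.matches succeeds only when the entire string matches, while Matcher.find looks for a match anywhere in the input. The sketch below shows the same distinction using Python's re module as an analogy (an assumption made to stay consistent with the other examples; it is not Java code, and the sample strings are made up).

```python
import re

pattern = re.compile(r"[0-9]+")  # compiled once and reused, in the spirit of Pattern

# Whole-string matching, comparable to String.matches in Java.
whole = pattern.fullmatch("2006") is not None
whole_with_prefix = pattern.fullmatch("year 2006") is not None

# Searching anywhere in the input, comparable to Matcher.find in Java.
found = pattern.search("year 2006")
found_text = found.group()    # the text that matched
found_start = found.start()   # 0-based index where the match begins
```

The whole-string check fails for "year 2006" even though the digits are present, while the search finds them at index 5.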