Published by Rachel McGee. Modified over 9 years ago.
1
LING 388: Language and Computers Sandiway Fong Lecture 6: 9/15
2
Administrivia
reminder
–optional homework exercises (from lecture 5)
–due tomorrow (usual rules apply)
–for those of you who missed one or more questions on homework 1
3
Administrivia
homework 2
–out next week
–requires access to Microsoft Word
–or an alternative: Open Office (free download, see openoffice.org)
4
Today’s Topic Regular Expressions (RE)
5
Regular Expressions
(formally) equivalent to
–finite state automata (FSA), and
–regular grammars
used in
–string pattern matching, typically for a single word form
search text:
–Unix (e)grep, Perl, Microsoft Word
caution:
–differences in notation and implementation
[Figure: Regular Grammars, FSA, Regular Expressions]
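The equivalence with finite state automata can be illustrated with a small sketch. The language used here (one or more a's followed by a single b), the transition table and the function names are all hypothetical choices made for illustration, and Python's re module stands in for the tools named on the slide. The check only compares the two formalisms on short strings; it is not a proof of equivalence.

```python
import re
from itertools import product

# Hypothetical example language: one or more a's followed by a single b.
# The automaton is a transition table: (state, symbol) -> next state.
TRANSITIONS = {
    (0, "a"): 1,  # first a
    (1, "a"): 1,  # any further a's
    (1, "b"): 2,  # closing b
}
START, ACCEPTING = 0, {2}

def fsa_accepts(s: str) -> bool:
    """Run the automaton over s; a missing transition means rejection."""
    state = START
    for ch in s:
        if (state, ch) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING

def re_accepts(s: str) -> bool:
    """The same language as a regular expression, matched against the whole string."""
    return re.fullmatch(r"aa*b", s) is not None

# Compare both recognisers on every string over {a, b} of length 0 to 5.
candidates = ["".join(p) for n in range(6) for p in product("ab", repeat=n)]
agree = all(fsa_accepts(s) == re_accepts(s) for s in candidates)
print(agree)
```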
6
Regular Expressions
shorthand for describing sets of strings
String
–sequence of zero or more characters
–(typically, unbroken by spaces)
Examples
–aaa
–john
–mary45
–NT$
–ε (empty string)
7
Regular Expressions
–shorthand
stringⁿ
–exactly n occurrences of string
–n = 0,1,2,3,...
examples
–a⁴b³ = aaaabbb
–(uv)² = uvuv
–((ab)²(ba)²)² = ababbabaababbaba
Note:
–parentheses are used to group sequences of characters (strings)
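The power notation can be tried out with ordinary string repetition. This is a minimal sketch in Python; the helper name power is an assumption introduced here, not part of the lecture.

```python
def power(s: str, n: int) -> str:
    """Return s written n times in a row; n = 0 gives the empty string."""
    return s * n

# The three examples from the slide, built up from the inside out.
a4b3 = power("a", 4) + power("b", 3)
uv2 = power("uv", 2)
nested = power(power("ab", 2) + power("ba", 2), 2)

print(a4b3)
print(uv2)
print(nested)
```

Grouping matters in the last example: the inner powers are expanded first (abab and baba), concatenated, and only then is the whole group repeated.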
8
Regular Expressions
shorthand for describing sets of strings
string+
–set of one or more occurrences of string
–i.e. the set {string¹, string², string³, ...}
–Note: set is infinite
examples
–a+ = {a, aa, aaa, aaaa, aaaaa, …}
–(abc)+ = {abc, abcabc, abcabcabc, …}
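Since the set is infinite, code can only list its first few members. The sketch below does that and then checks membership with Python's re module, which is assumed here as a stand-in for the tools discussed in the lecture; first_members is a hypothetical helper name.

```python
import re

def first_members(s: str, count: int) -> list:
    """First `count` members of string+: s repeated 1, 2, ..., count times."""
    return [s * n for n in range(1, count + 1)]

a_plus = first_members("a", 5)
abc_plus = first_members("abc", 3)

# Every listed member should be accepted by the pattern (abc)+ ...
all_match = all(re.fullmatch(r"(abc)+", w) is not None for w in abc_plus)
# ... while the empty string (zero occurrences) is not in a+.
empty_matches = re.fullmatch(r"a+", "") is not None

print(a_plus)
print(abc_plus)
```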
9
Regular Expressions
shorthand for describing sets of strings
string*
–set of zero or more occurrences of string
–i.e. the set {string⁰, string¹, string², string³, ...}
–string⁰ = ε (the empty string)
examples
–a* = {ε, a, aa, aaa, aaaa, …}
–(abc)* = {ε, abc, abcabc, …}
Note:
–a a* = a+
–a {ε, a, aa, aaa, aaaa, …} = {a, aa, aaa, aaaa, aaaaa, …}
Language = a set of strings
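The identity a a* = a+ can be spot-checked by testing both patterns against every short string over a two-letter alphabet. This is a sketch using Python's re module (an assumption; the lecture itself uses grep and Word), and matches_whole is a hypothetical helper. Agreement on strings up to length 6 illustrates the identity but does not prove it.

```python
import re
from itertools import product

def matches_whole(pattern: str, s: str) -> bool:
    """True if the entire string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None

# The star admits the empty string; the plus does not.
star_accepts_empty = matches_whole(r"a*", "")
plus_accepts_empty = matches_whole(r"a+", "")

# Members of (abc)* among a few candidates.
abc_star = [s for s in ["", "abc", "abcabc", "ab"] if matches_whole(r"(abc)*", s)]

# All strings over {a, b} of length 0 to 6.
strings = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
identity_holds = all(
    matches_whole(r"aa*", s) == matches_whole(r"a+", s) for s in strings
)
print(identity_holds)
```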
10
Regular Expressions
Wildcard Characters
match a range of characters
. (period)
matches any single character
examples
–.+ed = set of all strings of length 3 or greater containing ed and having at least one character preceding it
  matches: worked, bed, pre-education
  does not match: ed, education (no character precedes ed)
–.*fix = set of all strings of length 3 or greater containing fix
  matches: prefix, infix, infixed, suffix, fix
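The word lists on this slide can be checked mechanically. The sketch below uses re.search from Python's re module, which looks for the pattern anywhere inside a word, roughly as a text search would; Python is an assumption here, and notation in grep or Word may differ.

```python
import re

ed_words = ["worked", "bed", "pre-education", "ed", "education"]
fix_words = ["prefix", "infix", "infixed", "suffix", "fix"]

# .+ed needs at least one character before ed, so bare "ed" and
# "education" (where ed comes first) are rejected.
ed_hits = [w for w in ed_words if re.search(r".+ed", w)]

# .*fix allows zero characters before fix, so "fix" itself is accepted.
fix_hits = [w for w in fix_words if re.search(r".*fix", w)]

print(ed_hits)
print(fix_hits)
```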
11
Regular Expressions
Wildcard Characters
match a range of characters
[characters] (list of matching characters)
matches any single character in the list
examples
–[s,z]ation
  organization, organisation
  Note: in grep and Perl the list is written without separators, [sz]ation; there a comma inside the brackets would itself be matched
–[a-z]
  any character in the range lowercase a to z
  Note: not uppercase
–[0-9]
  any digit
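A short sketch of bracketed character lists, again assuming Python's re module rather than the tools named in the lecture. In this notation the characters are listed without separators, so the s-or-z pattern is written [sz]; the last line shows that a comma placed inside the brackets is treated as one more listed character. The word organication is a made-up non-match.

```python
import re

words = ["organization", "organisation", "organication"]
sz_hits = [w for w in words if re.search(r"[sz]ation", w)]

chars = "aZq9"
lowercase = [c for c in chars if re.fullmatch(r"[a-z]", c)]  # range a to z only
digits = [c for c in chars if re.fullmatch(r"[0-9]", c)]     # any digit

# With [s,z] the comma is part of the list, so "," alone is accepted.
comma_is_listed = re.fullmatch(r"[s,z]", ",") is not None

print(sz_hits, lowercase, digits, comma_is_listed)
```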
12
Regular Expressions: grep
excerpts from the manpage
–The caret ^ and the dollar sign $ are metacharacters that respectively match the empty string at the beginning and end of a line.
–The symbol \b matches the empty string at the edge of a word.
–The symbols \< and \> respectively match the empty string at the beginning and end of a word.
terminology
–word
  unbroken sequence of digits, underscores and letters
13
Regular Expressions: grep
Excerpts from the manpage
–A regular expression may be followed by one of several repetition operators:
  ?      The preceding item is optional and matched at most once.
  *      The preceding item will be matched zero or more times.
  +      The preceding item will be matched one or more times.
  {n}    The preceding item is matched exactly n times.
  {n,}   The preceding item is matched n or more times.
  {n,m}  The preceding item is matched at least n times, but not more than m times.
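The six repetition operators can be compared side by side on runs of the letter a. This sketch assumes Python's re module, whose operators are written the same way as the ones quoted above; the helper name accepted and the test strings are illustrative choices.

```python
import re

def accepted(pattern: str, candidates: list) -> list:
    """Candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s) is not None]

runs = ["", "a", "aa", "aaa", "aaaa"]

optional = accepted(r"a?", runs)      # at most once
star = accepted(r"a*", runs)          # zero or more
plus = accepted(r"a+", runs)          # one or more
exactly = accepted(r"a{3}", runs)     # exactly 3
at_least = accepted(r"a{2,}", runs)   # 2 or more
between = accepted(r"a{2,3}", runs)   # at least 2, at most 3

for name, result in [("?", optional), ("*", star), ("+", plus),
                     ("{3}", exactly), ("{2,}", at_least), ("{2,3}", between)]:
    print(name, result)
```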
14
Regular Expressions: GNU grep
Excerpts from the manpage
concatenation
–Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated subexpressions.
disjunction
–Two regular expressions may be joined by the infix operator |; the resulting regular expression matches any string matching either subexpression.
15
Regular Expressions: Examples
Regular Expression
–gupp(y|ies)
examples
–guppy
–guppies
Regular Expression
–beds?
examples
–bed
–beds
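Both patterns can be tested against the listed words plus a few near misses. This is a sketch assuming Python's re module; the near-miss words (guppys, gupp, bedss, be) are made up for contrast.

```python
import re

guppy_candidates = ["guppy", "guppies", "guppys", "gupp"]
bed_candidates = ["bed", "beds", "bedss", "be"]

# gupp(y|ies): the shared prefix gupp concatenated with either y or ies.
guppy_hits = [w for w in guppy_candidates if re.fullmatch(r"gupp(y|ies)", w)]

# beds?: the ? applies to the preceding s only, making it optional.
bed_hits = [w for w in bed_candidates if re.fullmatch(r"beds?", w)]

print(guppy_hits)
print(bed_hits)
```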
16
Regular Expressions: Examples
Example
–\b99 matches 99 in “there are 99 bottles …”
–but not the 99 in “there are 299 bottles …”
–Note: $99 splits into $ and the word 99 ($ is not a digit, underscore or letter), so \b99 will match 99 here
–word
  unbroken sequence of digits, underscores and letters
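The three cases can be reproduced with \b in Python's re module, which is assumed here to treat word edges the same way for these inputs. The sentences are shortened versions of the ones on the slide, and has_word_initial_99 is a hypothetical helper name.

```python
import re

def has_word_initial_99(text: str) -> bool:
    """True if 99 occurs somewhere in text starting at the edge of a word."""
    return re.search(r"\b99", text) is not None

plain = has_word_initial_99("there are 99 bottles")
inside_number = has_word_initial_99("there are 299 bottles")
after_dollar = has_word_initial_99("it costs $99")

print(plain, inside_number, after_dollar)
```

In 299 there is no word edge between 2 and 99, because both sides are digits; after $ there is one, because $ is not a word character.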
17
Regular Expressions: Examples
Example (sheeptalk)
–ba!
–baa!
–baaa!
–…
regular expression
–baa*!
–ba+!
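The two sheeptalk patterns describe the same language, which can be spot-checked on a handful of strings. This sketch assumes Python's re module; the sample list, including the non-sheeptalk strings, is made up for illustration.

```python
import re

samples = ["ba!", "baa!", "baaa!", "b!", "ba", "aba!", "baa!!"]

# b, then one a, then zero or more further a's, then !
by_star = [s for s in samples if re.fullmatch(r"baa*!", s)]
# b, then one or more a's, then !
by_plus = [s for s in samples if re.fullmatch(r"ba+!", s)]

print(by_star)
print(by_plus)
```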
18
Regular Expressions: Microsoft Word terminology: –wildcard search
19
Regular Expressions: Microsoft Word