1
Regular Expression in Java 101
COMP204 Source: Sun tutorial, …
2
What are they?
a way to describe patterns in strings
similar to regex in Perl
cryptic syntax: “write once, ponder many times”
used to search, parse, modify textual data
Java: java.util.regex with classes Pattern, Matcher, and PatternSyntaxException, plus utility methods in String class
3
String constants match
regex: foo
string: foo
  => 0:3 "foo"
string: foofoofoo
  => 0:3 "foo"
  => 3:6 "foo"
  => 6:9 "foo"
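The start:end listing above can be reproduced with a small find loop. This is a minimal sketch; the class name FindAllDemo and the helper findAll are made up for illustration, and only Pattern, Matcher, find(), start(), end() and group() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Hypothetical helper: formats every match as start:end "text",
    // mirroring the notation used on the slides.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // end() is exclusive: it is the index just past the last matched character
            hits.add(m.start() + ":" + m.end() + " \"" + m.group() + "\"");
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String hit : findAll("foo", "foofoofoo")) {
            System.out.println(hit);
        }
    }
}
```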
4
Meta characters
Some characters are “special”, e.g. a single dot “.” matches any character:
regex: cat.
string: cats
  => 0:4 "cats"
Others are: ([{\^-$|]})?*+.
Use a meta character literally: “escape” it with a backslash (e.g. \.), or “quote” it, e.g. \Q.\E
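In Java source the backslash itself has to be doubled inside a string literal, which is easy to trip over. The sketch below contrasts the unescaped dot with the escaped and quoted forms; the class and method names are illustrative only, while Pattern.matches and Pattern.quote are standard library calls.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // "." is a meta character here: it stands for any single character
    static boolean dotAsWildcard(String s) {
        return Pattern.matches("cat.", s);
    }

    // "\\." in Java source is the regex \. and matches a literal dot only
    static boolean dotEscaped(String s) {
        return Pattern.matches("cat\\.", s);
    }

    // Pattern.quote wraps the text in \Q...\E so every character is literal
    static boolean dotQuoted(String s) {
        return Pattern.matches(Pattern.quote("cat."), s);
    }

    public static void main(String[] args) {
        System.out.println(dotAsWildcard("cats"));
        System.out.println(dotEscaped("cats"));
        System.out.println(dotQuoted("cat."));
    }
}
```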
5
Character classes [abc] a, b, or c (simple class)
[^abc] any character except a, b, or c (negation) [a-zA-Z] a through z or A through Z, inclusive (range) [a-d[m-p]] a through d, or m through p: [a-dm-p] (union) [a-z&&[def]] d, e, or f (intersection) [a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction) [a-z&&[^m-p]] a through z, and not m through p: [a-lq-z](subtraction)
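Union, intersection and subtraction are the less familiar parts of that table, so a quick check against single characters may help. This is only a sketch; CharClassDemo and matchesClass are hypothetical names, and the test relies on String.matches requiring the whole string to match.

```java
class CharClassDemo {
    // Hypothetical helper: does the single character c belong to the given class?
    // String.matches succeeds only if the entire one-character string matches.
    static boolean matchesClass(String charClass, char c) {
        return String.valueOf(c).matches(charClass);
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("[a-d[m-p]]", 'n'));    // union
        System.out.println(matchesClass("[a-z&&[def]]", 'e'));  // intersection
        System.out.println(matchesClass("[a-z&&[^m-p]]", 'n')); // subtraction
    }
}
```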
6
Predefined classes (see Pattern)
.    Any character (may or may not match line terminators)
\d   digit: [0-9]
\D   non-digit: [^0-9]
\s   whitespace character: [ \t\n\x0B\f\r]
\S   non-whitespace character: [^\s]
\w   word character: [a-zA-Z_0-9]
\W   non-word character: [^\w]
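Two everyday uses of the predefined classes, as a rough sketch: collapsing runs of whitespace with \s and stripping non-digits with \D. The class and method names are invented for this example; String.replaceAll and String.trim are the only library calls used.

```java
class PredefinedDemo {
    // Replace every run of whitespace (\s+) with a single space
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // Remove every non-digit character (\D), keeping digits only
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(collapseWhitespace("  a \t b\n c "));
        System.out.println(digitsOnly("tel: 555-0199"));
    }
}
```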
7
Greedy Quantifiers
X?       X, once or not at all
X*       X, zero or more times
X+       X, one or more times
X{n}     X, exactly n times
X{n,}    X, at least n times
X{n,m}   X, at least n but not more than m times
8
Reluctant quantifiers
X??       X, once or not at all
X*?       X, zero or more times
X+?       X, one or more times
X{n}?     X, exactly n times
X{n,}?    X, at least n times
X{n,m}?   X, at least n but not more than m times
9
Possessive Quantifiers
X?+       X, once or not at all
X*+       X, zero or more times
X++       X, one or more times
X{n}+     X, exactly n times
X{n,}+    X, at least n times
X{n,m}+   X, at least n but not more than m times
10
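What’s the difference?
string: xfooxxxxxxfoo
// greedy quantifier: takes as much as possible, then backs off until the rest matches
regex: .*foo
  => 0:13 "xfooxxxxxxfoo"
// reluctant quantifier: takes as little as possible, then expands until the rest matches
regex: .*?foo
  => 0:4 "xfoo"
  => 4:13 "xxxxxxfoo"
// possessive quantifier: takes as much as possible and never backs off
regex: .*+foo
  No match found.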
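The three results can be checked side by side by running the same find loop with each regex. A minimal sketch follows; QuantifierDemo and findAll are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: collects every match as start:end "text"
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.start() + ":" + m.end() + " \"" + m.group() + "\"");
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println("greedy:     " + findAll(".*foo", input));
        System.out.println("reluctant:  " + findAll(".*?foo", input));
        // Possessive .*+ swallows the whole input and never gives back "foo",
        // so the list stays empty.
        System.out.println("possessive: " + findAll(".*+foo", input));
    }
}
```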
11
Capturing groups
Quantifiers apply to single characters (e.g. a*, matches everything, why?), character classes (e.g. \s+) or groups (e.g. (dog){2})
Groups are numbered left-to-right by their opening parenthesis:
((A)(B(C))) =>
  1  ((A)(B(C)))
  2  (A)
  3  (B(C))
  4  (C)
Refer to groups with e.g. \2 for group two:
regex: (\w)\1
string: hello
  => 2:4 "ll"
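Group numbering and backreferences can be inspected through Matcher.group(int) and Matcher.groupCount(). The sketch below uses invented names (GroupDemo, groups, firstDoubledChar); note that group 0 is always the whole match, so the returned array has groupCount() + 1 entries.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: returns group 0 (the whole match) followed by
    // groups 1..groupCount(), or an empty array if the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    // Finds the first doubled word character using the backreference \1
    static String firstDoubledChar(String input) {
        Matcher m = Pattern.compile("(\\w)\\1").matcher(input);
        return m.find() ? m.start() + ":" + m.end() + " " + m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC")));
        System.out.println(firstDoubledChar("hello"));
    }
}
```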
12
Boundaries
^    The beginning of a line
$    The end of a line
\b   A word boundary
\B   A non-word boundary
\A   The beginning of the input
\G   The end of the previous match
\Z   The end of the input but for the final terminator, if any
\z   The end of the input
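A short sketch of two boundary matchers in action. One detail worth knowing: in java.util.regex, ^ and $ match only at the start and end of the whole input by default, and match at each line only when the pattern is compiled with Pattern.MULTILINE. The class and helper names below are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times the pattern is found in the input
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // \b on both sides restricts matches to whole words
    static int countWholeWord(String word, String text) {
        return count(Pattern.compile("\\b" + Pattern.quote(word) + "\\b"), text);
    }

    // Counts words that sit at a position where ^ matches
    static int countLineStarts(String text, boolean multiline) {
        Pattern p = multiline
                ? Pattern.compile("^\\w+", Pattern.MULTILINE)
                : Pattern.compile("^\\w+");
        return count(p, text);
    }

    public static void main(String[] args) {
        System.out.println(countWholeWord("dog", "dog doggie hotdog dog"));
        System.out.println(countLineStarts("one\ntwo", false));
        System.out.println(countLineStarts("one\ntwo", true));
    }
}
```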
13
Pattern class
  boolean b = Pattern.matches("a*b", "aaaaab");
or
  Pattern p = Pattern.compile("a*b");
  Matcher m = p.matcher("aaaaab");
  boolean b = m.matches();
The latter allows for efficient reuse: the regex is compiled once and the Pattern can create many Matchers.
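The reuse mentioned above typically looks like a Pattern held in a constant and applied to many inputs. This is a sketch under that assumption; ReuseDemo, AB and countMatching are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ReuseDemo {
    // Compiled once; a Pattern is immutable and can be shared freely
    private static final Pattern AB = Pattern.compile("a*b");

    // Counts the inputs that match a*b in their entirety
    static int countMatching(List<String> inputs) {
        int n = 0;
        for (String s : inputs) {
            // A fresh Matcher per input; the compiled Pattern is reused
            if (AB.matcher(s).matches()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countMatching(Arrays.asList("aaaaab", "b", "ba", "ab")));
    }
}
```

Pattern.matches(regex, input) recompiles the regex on every call, so it suits one-off checks rather than loops.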
14
Splitting a string using a regex
  Pattern p = Pattern.compile("a*b");
  String[] items = p.split("aabbab");
  for (String s : items)
      System.out.println(s);
similar to split(regex) method in class String (see last slide Lecture11.ppt)
  String[] items = "aabbab".split("a*b");
15
Matcher class
loads of methods, e.g. to access groups (see test harness) or replace expressions:
  Pattern p = Pattern.compile("dog");
  Matcher m = p.matcher("the dog runs");
  String result = m.replaceAll("cat");
  System.out.println(result);
  => "the cat runs"
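Group access and replacement can be combined: in a replacement string, $1, $2, ... refer to the captured groups. The sketch below rearranges an ISO-style date; the class name, method names and the date format are assumptions made for illustration, not part of the original slides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherDemo {
    // Three capturing groups: year, month, day
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Rewrites every yyyy-mm-dd date as dd/mm/yyyy via group references
    static String reformatDates(String text) {
        return DATE.matcher(text).replaceAll("$3/$2/$1");
    }

    // Returns group 1 (the year) of the first date found, or null if none
    static String firstYear(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(reformatDates("released 2007-03-15"));
        System.out.println(firstYear("released 2007-03-15"));
    }
}
```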
16
String class has one-off methods
  "the dog runs".replaceFirst("dog", "cat");
  => "the cat runs"
  "aabcbdabe".split("a*b");
  => {"", "c", "d", "e"}
  "xfooxxxxxxfoo".matches(".*foo");
  => true