ADSA: RegExprs/8 1 241-423 Advanced Data Structures and Algorithms Objective –look at programming with regular expressions (REs) in Java Semester 2, 2013-2014.

Regular Expressions (in Java)

Contents
1. What are REs?
2. First Example
3. Case Insensitive Matching
4. Some Basic Patterns
5. Built-in Character Classes
6. Sequences and Alternatives
7. Some Boundary Matchers
8. Grouping
9. (Greedy) Quantifiers
10. Three Types of Quantifiers
11. Capturing Groups
12. Escaping Metacharacters
13. split() and REs
14. Replacing Text
15. Look-ahead & Look-behind
16. More Information

1. What are Regular Expressions?
A regular expression (RE) is a pattern used to search through text.
It either matches the text (or part of it), or fails to match
–you can easily extract the matching parts, or change them

REs are not easy to use at first
–they're like a different programming language inside Java
But REs bring so much power to string manipulation that they are worth the effort.
Look back at the "Discrete Math" notes on REs and UNIX grep.

2. First Example
The RE "[a-z]+" matches a sequence of one or more lowercase letters
–[a-z] means any character from a to z, and + means “one or more”
Use this pattern to search "Now is the time"
–it will match ow
–if applied repeatedly, it will find is, the, time, then fail

Code
    import java.util.regex.*;

    public class RegexTest {
        public static void main(String args[]) {
            String pattern = "[a-z]+";
            String text = "Now is the time";
            Pattern p = Pattern.compile(pattern);
            Matcher m = p.matcher(text);
            while (m.find())
                System.out.println( text.substring( m.start(), m.end() ) );
        }
    }

Output:
    ow
    is
    the
    time

Create a Pattern and Matcher
Compile the pattern
    Pattern p = Pattern.compile("[a-z]+");
Create a matcher for the text using the pattern
    Matcher m = p.matcher("Now is the time");

Finding a Match
m.find() returns true if the pattern matches any part of the text string; false otherwise
If called again, m.find() will start searching from where the last match ended.

Printing what was Matched
After a successful match:
–m.start() returns the index of the first character matched
–m.end() returns the index of the last character matched, plus one
This is what most String methods require
–e.g. "Now is the time".substring(m.start(), m.end()) returns the matched substring

If there is no current match (find() has not been called yet, or it returned false), m.start() and m.end() throw an IllegalStateException
–this is a RuntimeException, so you don’t have to catch it
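The successful and failed cases can be compared side by side. This is a minimal sketch; the class name MatchStateDemo and the helper describe() are made up for illustration and are not part of the slides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchStateDemo {
    // Hypothetical helper: reports the match positions, or what happens
    // when start() is called without a current match.
    static String describe(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.find()) {
            return "matched " + m.start() + "-" + m.end();
        }
        try {
            m.start();              // no current match, so this throws
            return "unexpected";
        } catch (IllegalStateException e) {
            return "no match";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("[a-z]+", "Now is the time")); // matched 1-3
        System.out.println(describe("[0-9]+", "Now is the time")); // no match
    }
}
```

In normal code the try/catch is unnecessary: test the result of find() first and only then call start(), end() or group().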

Test Rig
    import java.util.regex.*;

    public class TestRegex {
        public static void main(String[] args) {
            if (args.length != 2) {
                System.out.println("Usage: java TestRegex string regExp");
                System.exit(0);
            }
            System.out.println("Input: \"" + args[0] + "\"");
            System.out.println("Regular expression: \"" + args[1] + "\"");
            Pattern p = Pattern.compile(args[1]);
            Matcher m = p.matcher(args[0]);
            while (m.find())
                System.out.println("Match \"" + m.group() + "\" at positions " +
                                   m.start() + "-" + (m.end()-1));
        } // end of main()
    } // end of TestRegex class

m.group() returns the string matched by the pattern
–usually used instead of String.substring()

3. Case Insensitive Matching
    String sentence = "The quick brown fox and BROWN tiger jumps over the lazy dog";
    Pattern pattern = Pattern.compile("brown", Pattern.CASE_INSENSITIVE);   // CASE_INSENSITIVE is a flag
    Matcher matcher = pattern.matcher(sentence);
    while (matcher.find())
        System.out.format("Text \"%s\" found at %d to %d.%n",
                          matcher.group(), matcher.start(), matcher.end());

Output:
    Text "brown" found at 10 to 15.
    Text "BROWN" found at 24 to 29.

Many flags can also be written as part of the RE:
    Pattern pattern = Pattern.compile( "(?i)brown" );
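The flag argument and the embedded (?i) form should behave the same way. A small sketch to check this, using a shortened version of the sentence above; the class InlineFlagDemo and its count() helper are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InlineFlagDemo {
    // Hypothetical helper: counts how many times the pattern is found.
    static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        String sentence = "The quick brown fox and BROWN tiger";
        // Flag passed as a second argument
        System.out.println(count(Pattern.compile("brown", Pattern.CASE_INSENSITIVE), sentence)); // 2
        // Same flag embedded in the RE
        System.out.println(count(Pattern.compile("(?i)brown"), sentence));                       // 2
        // No flag: only the lowercase word is found
        System.out.println(count(Pattern.compile("brown"), sentence));                           // 1
    }
}
```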

4. Some Basic Patterns
    abc           exactly this sequence of three letters
    [abc]         any one of the letters a, b, or c
    [^abc]        any character except one of the letters a, b, or c
    [a-z]         any one character from a through z
    [a-zA-Z0-9]   any one letter or digit
The set of characters defined by [...] is called a character class.

Example
    // search for a string that begins with "bat" and a number in the range [3-7]
    String input = "bat1, bat2, bat3, bat4, bat5, bat6, bat7, bat8";
    Pattern pattern = Pattern.compile( "bat[3-7]" );
    Matcher matcher = pattern.matcher(input);
    while (matcher.find())
        System.out.format("Text \"%s\" found at %d to %d.%n",
                          matcher.group(), matcher.start(), matcher.end());

Output:
    Text "bat3" found at 12 to 16.
    Text "bat4" found at 18 to 22.
    Text "bat5" found at 24 to 28.
    Text "bat6" found at 30 to 34.
    Text "bat7" found at 36 to 40.

5. Built-in Character Classes
    .     any one character except a line terminator
    \d    a digit: [0-9]
    \D    a non-digit: [^0-9]
    \s    a whitespace character: [ \t\n\x0B\f\r]   (notice the space at the start of the class)
    \S    a non-whitespace character: [^\s]
    \w    a word character: [a-zA-Z_0-9]
    \W    a non-word character: [^\w]

In Java you will need to "double escape" the RE backslashes when you use them inside Java strings:
    \\d  \\D  \\S  \\s  \\W  \\w
Note: if you read in a pattern from somewhere (the keyboard, a file), there's no need to double escape the text.
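A short sketch of why the doubling is only a matter of Java source syntax. The class DoubleEscapeDemo and the method typedPattern(), which stands in for text read from the keyboard, are assumptions made for this illustration.

```java
import java.util.regex.Pattern;

class DoubleEscapeDemo {
    // Stands in for a pattern typed by a user: three characters \ d +
    static String typedPattern() {
        return new String(new char[] {'\\', 'd', '+'});
    }

    public static void main(String[] args) {
        String literal = "\\d+";                 // written with two backslashes in source code
        System.out.println(literal);             // prints \d+
        System.out.println(literal.length());    // 3: the string holds only one backslash
        System.out.println(literal.equals(typedPattern()));          // true
        System.out.println(Pattern.matches(typedPattern(), "2014")); // true
    }
}
```

The compiler turns each \\ in a string constant into a single backslash, so the RE engine sees \d+ in both cases.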

Example 1
    // search for a whitespace, 'f', and any two chars
    Pattern pattern = Pattern.compile( "\\sf.." );
    Matcher matcher = pattern.matcher( "The quick brown fox jumps over the lazy dog");
    while (matcher.find()) {
        System.out.format("Text \"%s\" found at %d to %d.%n",
                          matcher.group(), matcher.start(), matcher.end());
    }

Output:
    Text " fox" found at 15 to 19.

Example 2
    // match one or more digits followed by one or more word characters
    Pattern p = Pattern.compile( "\\d+\\w+" );
    Matcher m = p.matcher("this is the 1st test string");
    if (m.find())
        System.out.println("matched [" + m.group() + "] from " + m.start() + " to " + m.end() );
    else
        System.out.println("didn’t match");

Output:
    matched [1st] from 12 to 15

Subtraction
You can use subtraction with character classes.
–e.g. a character class that matches everything from a to z, except the vowels (a, e, i, o, u)
–written as [a-z&&[^aeiou]]

Search excluding vowels
    Pattern pattern = Pattern.compile( "[a-z&&[^aeiou]]" );
    Matcher matcher = pattern.matcher("The quick brown fox.");
    while (matcher.find()) {
        System.out.format("Text \"%s\" found at %d to %d.%n",
                          matcher.group(), matcher.start(), matcher.end());
    }

Output:
    Text "h" found at 1 to 2.
    Text "q" found at 4 to 5.
    Text "c" found at 7 to 8.
    Text "k" found at 8 to 9.
    Text "b" found at 10 to 11.
    Text "r" found at 11 to 12.
    Text "w" found at 13 to 14.
    Text "n" found at 14 to 15.
    Text "f" found at 16 to 17.
    Text "x" found at 18 to 19.

6. Sequences and Alternatives
Two patterns written one after the other match in sequence:
–e.g., [A-Za-z]+[0-9] will match one or more letters immediately followed by one digit
The bar, |, is used to separate alternatives
–e.g., abc|xyz will match either abc or xyz
–best to use parentheses to make the scope clearer: (abc)|(xyz)
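The scope of | matters because it binds more loosely than sequencing. A minimal sketch, with made-up names (AlternationDemo, whole()), comparing abc|xyz with ab(c|x)yz:

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical helper: true if the whole text matches the RE.
    static boolean whole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Without parentheses the alternatives are the full sequences abc and xyz
        System.out.println(whole("abc|xyz", "abc"));     // true
        System.out.println(whole("abc|xyz", "xyz"));     // true
        System.out.println(whole("abc|xyz", "abcyz"));   // false
        // Parentheses limit the choice to the single character c or x
        System.out.println(whole("ab(c|x)yz", "abcyz")); // true
        System.out.println(whole("ab(c|x)yz", "abxyz")); // true
        System.out.println(whole("ab(c|x)yz", "abc"));   // false
    }
}
```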

Search for 't' or 'T'
    // inside [...] the bar is an ordinary character, so the class is written [tT], not [t|T]
    Pattern pattern = Pattern.compile( "[tT]" );
    Matcher matcher = pattern.matcher( "The quick brown fox jumps over the lazy dog");
    while (matcher.find())
        System.out.format("Text \"%s\" found at %d to %d.%n",
                          matcher.group(), matcher.start(), matcher.end());

Output:
    Text "T" found at 0 to 1.
    Text "t" found at 31 to 32.

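7. Some Boundary Matchers
    ^     the beginning of a line (by default, of the whole input; of each line with Pattern.MULTILINE)
    $     the end of a line (by default, of the whole input; of each line with Pattern.MULTILINE)
    \b    a word boundary
    \B    not a word boundary
    \G    the end of the previous match
\b, \B, and \G are written as \\b, \\B, and \\G in Java strings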
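In Java, ^ and $ anchor to the whole input unless the Pattern.MULTILINE flag is set, which makes them work line by line. A small sketch with made-up two-line text; LineAnchorDemo and count() are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineAnchorDemo {
    // Hypothetical helper: counts how many times the pattern is found.
    static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        String text = "lazy dog\nhot dog";   // two lines of sample text

        // Default mode: $ only matches at the end of the input
        System.out.println(count(Pattern.compile("dog$"), text));                    // 1
        // MULTILINE: $ also matches just before each line terminator
        System.out.println(count(Pattern.compile("dog$", Pattern.MULTILINE), text)); // 2

        // Default mode: ^ only matches at the start of the input
        System.out.println(count(Pattern.compile("^hot"), text));                    // 0
        System.out.println(count(Pattern.compile("^hot", Pattern.MULTILINE), text)); // 1
    }
}
```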

Find "dog" at End of Line
    Pattern pattern = Pattern.compile( "dog$" );
    Matcher matcher = pattern.matcher( "The quick brown dog jumps over the lazy dog");
    while (matcher.find())
        System.out.format("Text \"%s\" found at %d to %d.%n",
                          matcher.group(), matcher.start(), matcher.end());

Output:
    Text "dog" found at 40 to 43.

Look for a Country
    ArrayList<String> countries = new ArrayList<String>();
    countries.add("Austria");
    // : more adds

    /* Look for a country that starts with "I" with any 2nd letter
       and either "a" or "e" in the 3rd position. */
    Pattern pattern = Pattern.compile( "^I.[ae]" );
    for (String c : countries) {
        Matcher matcher = pattern.matcher(c);
        if (matcher.lookingAt())
            System.out.println("Found: " + c);
    }

Output:
    Found: Iceland
    Found: Iraq
    Found: Ireland
    Found: Italy

m.lookingAt() returns true if the pattern matches at the beginning of the text string, false otherwise.
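It may help to compare lookingAt() with the other two matching methods of Matcher: find() searches anywhere in the text, lookingAt() must match at the start, and matches() must match the entire text. A minimal sketch; MatchKindsDemo, check() and the second test string are made up for illustration.

```java
import java.util.regex.Pattern;

class MatchKindsDemo {
    // Hypothetical helper: runs all three matching methods on fresh matchers.
    static String check(String regex, String text) {
        Pattern p = Pattern.compile(regex);
        return "find=" + p.matcher(text).find()
             + " lookingAt=" + p.matcher(text).lookingAt()
             + " matches=" + p.matcher(text).matches();
    }

    public static void main(String[] args) {
        // The pattern matches the first three letters only
        System.out.println(check("I.[ae]", "Ireland"));
        // prints find=true lookingAt=true matches=false

        // The pattern occurs, but not at the start
        System.out.println(check("I.[ae]", "Northern Ireland"));
        // prints find=true lookingAt=false matches=false

        // The pattern covers the whole text
        System.out.println(check("I.[ae].*", "Ireland"));
        // prints find=true lookingAt=true matches=true
    }
}
```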

Word Boundaries: \b \B
A word boundary is a position between a word character (\w) and a non-word character (\W), or at the beginning or end of a string if the first or last character is a word character.
A word boundary is zero length: it matches a position, not a character.

Examples
    String s = "A nonword boundary is the opposite of a word boundary, " +
               "i.e., anything other than a word boundary.";

    // match all words "word"
    Pattern p1 = Pattern.compile("\\bword\\b");
    Matcher m1 = p1.matcher(s);
    while (m1.find())
        System.out.println("p1 match: " + m1.group() + " at " + m1.start());

    // match word ending with "word" but not the word "word"
    Pattern p2 = Pattern.compile("\\Bword\\b");
    Matcher m2 = p2.matcher(s);
    while (m2.find())
        System.out.println("p2 match: " + m2.group() + " at " + m2.start());

Output:
    p1 match: word at 40
    p1 match: word at 83
    p2 match: word at 5

8. Grouping
A group treats multiple characters as a single unit.
–a group is created by placing characters inside parentheses
–e.g. the RE (dog) is the group containing the letters "d", "o" and "g".

Find the Words 'the' or 'quick'
    String text = "the quick brown fox jumps over the lazy dog";
    Pattern pattern = Pattern.compile( "(the)|(quick)" );
    Matcher matcher = pattern.matcher(text);
    while (matcher.find()) {
        System.out.format("Text \"%s\" found at %d to %d.%n",
                          matcher.group(), matcher.start(), matcher.end());
    }

Output:
    Text "the" found at 0 to 3.
    Text "quick" found at 4 to 9.
    Text "the" found at 31 to 34.

9. (Greedy) Quantifiers
X represents some pattern:
    X?        optional, X occurs once or not at all
    X*        X occurs zero or more times
    X+        X occurs one or more times
    X{n}      X occurs exactly n times
    X{n,}     X occurs n or more times
    X{n,m}    X occurs at least n but not more than m times

Example
    String[] exprs = { "x?", "x*", "x+", "x{2}", "x{2,}", "x{2,5}" };
    String input = "xxxxxx yyyxxxxxx zzzxxxxxx";
    for (String expr : exprs) {
        Pattern pattern = Pattern.compile(expr);
        Matcher matcher = pattern.matcher(input);
        System.out.println(" ");
        System.out.format("regex: %s %n", expr);
        while (matcher.find())
            System.out.format("Text \"%s\" found at %d to %d.%n",
                              matcher.group(), matcher.start(), matcher.end());
    }

Output
regex: x?
    Text "x" found at 0 to 1.
    Text "x" found at 1 to 2.
    Text "x" found at 2 to 3.
    Text "x" found at 3 to 4.
    Text "x" found at 4 to 5.
    Text "x" found at 5 to 6.
    Text "" found at 6 to 6.
    Text "" found at 7 to 7.
    Text "" found at 8 to 8.
    Text "" found at 9 to 9.
    Text "x" found at 10 to 11.
    Text "x" found at 11 to 12.
    Text "x" found at 12 to 13.
    Text "x" found at 13 to 14.
    Text "x" found at 14 to 15.
    Text "x" found at 15 to 16.
    Text "" found at 16 to 16.
    Text "" found at 17 to 17.
    Text "" found at 18 to 18.
    Text "" found at 19 to 19.
    Text "x" found at 20 to 21.
    Text "x" found at 21 to 22.
    Text "x" found at 22 to 23.
    Text "x" found at 23 to 24.
    Text "x" found at 24 to 25.
    Text "x" found at 25 to 26.
    Text "" found at 26 to 26.

regex: x*
    Text "xxxxxx" found at 0 to 6.
    Text "" found at 6 to 6.
    Text "" found at 7 to 7.
    Text "" found at 8 to 8.
    Text "" found at 9 to 9.
    Text "xxxxxx" found at 10 to 16.
    Text "" found at 16 to 16.
    Text "" found at 17 to 17.
    Text "" found at 18 to 18.
    Text "" found at 19 to 19.
    Text "xxxxxx" found at 20 to 26.
    Text "" found at 26 to 26.

regex: x+
    Text "xxxxxx" found at 0 to 6.
    Text "xxxxxx" found at 10 to 16.
    Text "xxxxxx" found at 20 to 26.

regex: x{2}
    Text "xx" found at 0 to 2.
    Text "xx" found at 2 to 4.
    Text "xx" found at 4 to 6.
    Text "xx" found at 10 to 12.
    Text "xx" found at 12 to 14.
    Text "xx" found at 14 to 16.
    Text "xx" found at 20 to 22.
    Text "xx" found at 22 to 24.
    Text "xx" found at 24 to 26.

regex: x{2,}
    Text "xxxxxx" found at 0 to 6.
    Text "xxxxxx" found at 10 to 16.
    Text "xxxxxx" found at 20 to 26.

regex: x{2,5}
    Text "xxxxx" found at 0 to 5.
    Text "xxxxx" found at 10 to 15.
    Text "xxxxx" found at 20 to 25.

Matching SSN Numbers
    ArrayList<String> input = new ArrayList<String>();
    input.add(" ");
    input.add(" ");
    input.add(" (attack)");
    input.add(" ");
    input.add(" ");
    for (String ssn : input)
        if (ssn.matches( "^(\\d{3}-?\\d{2}-?\\d{4})$" ))
            System.out.println("Found good SSN: " + ssn);

Output:
    Found good SSN:
    Found good SSN:

String.matches(String regex) returns true or false depending on whether the whole string matches the RE (regex).
str.matches(regex) is the same as: Pattern.matches(regex, str)

10. Three Types of Quantifiers
1. A greedy quantifier will match as much as it can, and back off if it needs to
–see examples on previous slides
2. A reluctant quantifier will match as little as possible, then take more if it needs to
–you make a quantifier reluctant by adding a ? :
    X??  X*?  X+?  X{n}?  X{n,}?  X{n,m}?
3. A possessive quantifier will match as much as it can, and never lets go
–you make a quantifier possessive by appending a + :
    X?+  X*+  X++  X{n}+  X{n,}+  X{n,m}+

Quantifier Examples
The text is "aardvark".
1. Use the pattern a*ardvark ( a* is greedy)
–the a* will first match aa, but then ardvark won’t match
–the a* then “backs off” and matches only a single a, allowing the rest of the pattern ( ardvark ) to succeed
2. Use the pattern a*?ardvark ( a*? is reluctant)
–the a*? will first match zero characters (the null string), but then ardvark won’t match
–the a*? then extends and matches the first a, allowing the rest of the pattern ( ardvark ) to succeed
3. Use the pattern a*+ardvark ( a*+ is possessive)
–the a*+ will match the aa, and will not back off, so ardvark never matches and the pattern match fails
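The three aardvark cases above can be checked directly. A minimal sketch; the class QuantifierDemo and its helper are illustrative names, not part of the slides.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: does the whole word "aardvark" match the RE?
    static boolean matchesAardvark(String regex) {
        return Pattern.matches(regex, "aardvark");
    }

    public static void main(String[] args) {
        System.out.println(matchesAardvark("a*ardvark"));   // greedy: true, after backing off one a
        System.out.println(matchesAardvark("a*?ardvark"));  // reluctant: true, after extending to one a
        System.out.println(matchesAardvark("a*+ardvark"));  // possessive: false, keeps both a's
    }
}
```

Greedy and reluctant quantifiers find the same overall match here; they differ only in the order in which possibilities are tried. The possessive form is the only one that changes the result.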

Reluctant Example
    Pattern pat = Pattern.compile( "e.+?d" );
    Matcher mat = pat.matcher("extend cup end table");
    while (mat.find())
        System.out.println("Match: " + mat.group());

Output:
    Match: extend
    Match: end

11. Capturing Groups
Parentheses are used for grouping, but they also capture (keep for later use) anything matched by that part of the pattern.
Example: ([a-zA-Z]*)([0-9]*) matches any number of letters followed by any number of digits
If the match succeeds:
–group 1 (\1) holds the matched letters
–group 2 (\2) holds the matched digits
–group 0 holds everything matched by the entire pattern (in Java it is read with m.group(0); \0 is not a back-reference)

Capturing groups are numbered by counting their opening parentheses from left to right:
–( ( A ) ( B ( C ) ) )
    group 0 = group 1 = ((A)(B(C))), group 2 = (A), group 3 = (B(C)), group 4 = (C)
Example: ([a-zA-Z])\1 will match a double letter, such as the tt in letter
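The numbering rule can be seen by printing every group of the nested pattern. A small sketch, using the literal letters A, B and C as the sub-patterns; GroupNumberDemo and groups() are names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberDemo {
    // Hypothetical helper: returns groups 0..groupCount() of the first match,
    // or an empty array if the pattern is not found.
    static String[] groups(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) return new String[0];
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++)
            result[i] = m.group(i);
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++)
            System.out.println("group " + i + " = " + g[i]);
        // group 0 = ABC, group 1 = ABC, group 2 = A, group 3 = BC, group 4 = C

        // Back-reference \1 (written \\1 in a Java string) finds a double letter
        System.out.println(groups("([a-zA-Z])\\1", "letter")[0]);   // tt
    }
}
```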

A word puzzle: "what is the only word in English which has three consecutive double letters?"
Two possible answers are "sweet-tooth" and "hoof-footed", but they use hyphens, which I'm not allowing
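Back-references make the puzzle easy to test by program. The sketch below is one possible approach, not the slides' solution: three capturing groups, each followed by its own back-reference. The class name and the word "bookkeeper" (a commonly cited answer) are additions for illustration.

```java
import java.util.regex.Pattern;

class DoubleLetterPuzzle {
    // Three double letters in a row: xx yy zz, lowercase letters only
    private static final Pattern THREE_DOUBLES =
        Pattern.compile("([a-z])\\1([a-z])\\2([a-z])\\3");

    static boolean hasThreeDoubles(String word) {
        return THREE_DOUBLES.matcher(word).find();
    }

    public static void main(String[] args) {
        System.out.println(hasThreeDoubles("bookkeeper"));   // true: oo kk ee
        System.out.println(hasThreeDoubles("sweet-tooth"));  // false: the hyphen breaks the run
        System.out.println(hasThreeDoubles("hoof-footed"));  // false: the hyphen breaks the run
        System.out.println(hasThreeDoubles("committee"));    // false: mm is separated from tt ee
    }
}
```

To search a word list, the same method could be applied to each line of a dictionary file.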

Matcher.group()
If m is a matcher that has just got a successful match, then
–m.group(n) returns the String matched by capturing group n
    this could be an empty string
    this will be null if the pattern as a whole matched but this particular group didn’t match anything
–m.group(0) returns the String matched by the entire pattern (same as m.group() )
    this could be an empty string

Examples
Move all the consonants at the beginning of a string to the end
–"sheila" becomes "eilash"
    Pattern p = Pattern.compile( "([^aeiou]*)(.*)" );   // (.*) means “all the rest of the chars”
    Matcher m = p.matcher("sheila");
    if (m.matches())
        System.out.println(m.group(2) + m.group(1));

12. Escaping Metacharacters
A lot of special characters – parentheses, brackets, braces, stars, the plus sign, etc. – are used in REs
–they are called metacharacters

Suppose you want to search for the character sequence a* (an a followed by an ordinary " * ")
–"a*" doesn’t work; that means “zero or more a's”
–"a\*" doesn’t work; \* is not a legal escape sequence in a Java String constant, so the code will not compile
–"a\\*" does work; it’s the three-char string a, \, *
Just to make things even more difficult, it’s illegal in a RE to put a backslash before an alphabetic character that is not a recognised escape; non-alphabetic characters can always be escaped.
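Besides escaping by hand, java.util.regex also provides Pattern.quote(), which turns a whole string into a literal pattern. A minimal sketch comparing the two; LiteralStarDemo, count() and the sample text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralStarDemo {
    // Hypothetical helper: counts how many times the RE is found in the text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        String text = "a* aa a*";
        // Escaped by hand: the RE engine sees a\*
        System.out.println(count("a\\*", text));              // 2
        // Escaped by the library: no need to know which chars are metacharacters
        System.out.println(count(Pattern.quote("a*"), text)); // 2
    }
}
```

Pattern.quote() is convenient when the text to search for comes from user input and may contain any metacharacter.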

13. split() and REs
    String colours = "Red,White, Blue Green Yellow, Orange";
    // Pattern for finding commas and whitespaces
    Pattern splitter = Pattern.compile( "[,\\s]+" );
    String[] cols = splitter.split(colours);
    for (String colour : cols)
        System.out.println("Colour = \"" + colour + "\"");

Or use String.split(String regex):
    String colours = "Red,White, Blue Green Yellow, Orange";
    // Pattern for finding commas and whitespaces
    String[] cols = colours.split( "[,\\s]+" );
    for (String colour : cols)
        System.out.println("Colour = \"" + colour + "\"");

14. Replacing Text
If m is a matcher, then
–m.replaceFirst(replacement) returns a new String where the first substring matched by the pattern is replaced by replacement
–m.replaceAll(replacement) returns a new String where all matched substrings are replaced

Example 1
    Pattern pattern = Pattern.compile( "a" );
    Matcher matcher = pattern.matcher("a b c a b c");
    String output = matcher.replaceAll("x");   // is "x b c x b c"

Example 2
    String str = "Java1 Java2 JDK Java2S Java2s.com";
    Pattern pat = Pattern.compile( "Java.*? " );
    Matcher mat = pat.matcher(str);
    System.out.println("Original: " + str);
    str = mat.replaceAll("Java ");
    System.out.println("Modified: " + str);

Output:
    Original: Java1 Java2 JDK Java2S Java2s.com
    Modified: Java Java JDK Java Java2s.com
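Replacement strings can also refer to capturing groups: in Java, $1, $2, ... in the replacement stand for the text captured by that group. A short sketch with made-up names (ReplaceDemo, swapPairs, replaceFirstA) and sample strings:

```java
import java.util.regex.Pattern;

class ReplaceDemo {
    // Swaps each pair of words separated by a single space, using $2 and $1
    static String swapPairs(String text) {
        return Pattern.compile("(\\w+) (\\w+)").matcher(text).replaceAll("$2 $1");
    }

    // Replaces only the first "a", for comparison with replaceAll()
    static String replaceFirstA(String text) {
        return Pattern.compile("a").matcher(text).replaceFirst("x");
    }

    public static void main(String[] args) {
        System.out.println(swapPairs("hello world"));          // world hello
        System.out.println(swapPairs("one two three four"));   // two one four three
        System.out.println(replaceFirstA("a b c a b c"));      // x b c a b c
    }
}
```

Because $ is special in a replacement string, a literal dollar sign has to be written as \\$ there.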

15. Look-ahead & Look-behind
A look-ahead expression checks the text that follows the current match position, without consuming it.
A look-behind expression checks the text that comes immediately before the current match position.
Both are zero-width, and they do not capture values.

Operations
    (?:X)     X, as a non-capturing group
    (?=X)     X, via zero-width positive look-ahead
    (?!X)     X, via zero-width negative look-ahead
    (?<=X)    X, via zero-width positive look-behind
    (?<!X)    X, via zero-width negative look-behind
    (?>X)     X, as an independent, non-capturing group

Look-ahead Example 1
Does the input text contain “incident” but not “theft” anywhere?
Pattern: "(?!.*theft).*incident.*"
–use it with matches(), so that the look-ahead is tested at the start of the text
Result:
–"There was a crime incident" matches
–"The incident involved a theft" no match
–"The theft was a serious incident" no match
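The three results above can be reproduced with String.matches(). The sketch also shows why find() would be the wrong method for this pattern. The class LookAheadDemo and its method name are illustrative only.

```java
import java.util.regex.Pattern;

class LookAheadDemo {
    static final String REGEX = "(?!.*theft).*incident.*";

    // matches() tests the whole text, so the look-ahead is applied at position 0
    static boolean incidentWithoutTheft(String text) {
        return text.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(incidentWithoutTheft("There was a crime incident"));       // true
        System.out.println(incidentWithoutTheft("The incident involved a theft"));    // false
        System.out.println(incidentWithoutTheft("The theft was a serious incident")); // false

        // find() may start later in the text, after the word "theft",
        // where the negative look-ahead no longer sees it
        System.out.println(Pattern.compile(REGEX)
                .matcher("The theft was a serious incident").find());                 // true
    }
}
```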

Example 2
    // John names excluding John Smith
    String regex = "John (?!Smith)[A-Z]\\w+";
    Pattern pattern = Pattern.compile(regex);
    String str = "I think that John Smith is a fictional character. His real name might be John Jackson, John Gestling, or John Hulmes for all we know.";
    Matcher matcher = pattern.matcher(str);
    while (matcher.find())
        System.out.println("MATCH: " + matcher.group());

Output:
    MATCH: John Jackson
    MATCH: John Gestling
    MATCH: John Hulmes

Look-behind Example
    // find text which is preceded by "
    Pattern pat = Pattern.compile( "(?<= );
    String str = "The Java2s website can be found at There, you can find some Java examples.";
    Matcher matcher = pat.matcher(str);
    while (matcher.find())
        System.out.println(":" + matcher.group() + ":");

Output:
    :
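As a separate illustration of positive look-behind (not a reconstruction of the slide's example), the sketch below finds numbers that are immediately preceded by a dollar sign. The pattern, the sample sentence and the names LookBehindDemo and amounts() are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookBehindDemo {
    // Collects every run of digits that has a '$' directly before it.
    // The '$' is checked by the look-behind but is not part of the match.
    static List<String> amounts(String text) {
        Matcher m = Pattern.compile("(?<=\\$)\\d+").matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find())
            found.add(m.group());
        return found;
    }

    public static void main(String[] args) {
        System.out.println(amounts("Tea costs $3 and cake costs $12, 7 left"));   // [3, 12]
    }
}
```

The 7 is skipped because the character before it is a space, not a dollar sign.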

16. More Information
Look in any Java textbook that deals with J2SE 1.4 or later.
–I've placed a RE extract from "Java: How to Program", 7th ed. on the ADSA website
I explained REs in the "Discrete Maths" subject (using grep).

The Java tutorial on REs is very good:
– essential/regex/
Online tutorials:
– guide-to-regular-expressions-in-java-part-1/
–and part-2

Many examples at:
–
The standard text on REs in different languages (including Java):
–Mastering Regular Expressions, Jeffrey E F Friedl, O'Reilly, 2006