Published by Alexina O’Connor. Modified over 9 years ago.
1
Regular Expression
Mohsen Mollanoori
2
What is RegeX?
“A notation to describe regular languages.”
“Not necessarily (and not usually) regular”
“A Powerful String Processing Tool”
“A pattern that can be matched against a string”
“A Language But Not A Language”
3
What Does RegeX Do?
String processing:
  Matching strings against a specific pattern
  Splitting strings
  Changing substrings
  Extracting substrings
4
What Programming Languages Support RegeX?
Almost all of them:
  Perl
  Java
  .Net (C#, VB.Net, …)
  PHP
  Ruby
  JavaScript
  …
And even many IDEs, editors, and utilities:
  grep
  eclipse
  Visual Studio .Net
  vim
  emacs
  …
5
The Notation
Symbol   Meaning                                           Example
.        Any single char                                   /.at/ matches “cat”, “bat”, “pat”, “mat”
*        Zero or more occurrences of the preceding char    /a*b/ matches “b”, “aaaaab”
+        One or more occurrences of the preceding char     /a+b/ matches “ab”, “aaaaab”
?        Zero or one occurrence of the preceding char      /a?b/ matches “ab” and “b”
6
Example 1
String: “Term”, “Term1”, “Term2”
Pattern: /Term./
Result: “Term1”, “Term2”
7
Example 2
String: “Term”, “Term1”, “Term2”
Pattern: /Term.?/
Result: “Term”, “Term1”, “Term2”
8
Example 3
String: “Term”, “Term1”, “Term2”
Pattern: /Term1?/
Result: “Term”, “Term1”
9
Example 4
String: “Term1”, “Term11”, “Term2”, “Term”
Pattern: /Term1+/
Result: “Term1”, “Term11”
10
Example 5
String: “Term1”, “Term11”, “Term2”, “Term”
Pattern: /Term1*/
Result: “Term1”, “Term11”, “Term”
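Examples 1 to 5 read most naturally as whole-string matches: each input is kept only if the pattern covers it from start to end. A minimal Java sketch of that reading follows; the class name `TermFilter` and its `filter` helper are made up for illustration, and `Matcher.matches()` is used because it requires the entire input to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original slides.
final class TermFilter {
    // Keeps only the inputs that the pattern matches in full.
    static List<String> filter(String regex, List<String> inputs) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String input : inputs) {
            if (pattern.matcher(input).matches()) {
                kept.add(input);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("Term", "Term1", "Term11", "Term2");
        System.out.println(filter("Term.", terms));   // [Term1, Term2]
        System.out.println(filter("Term.?", terms));  // [Term, Term1, Term2]
        System.out.println(filter("Term1?", terms));  // [Term, Term1]
        System.out.println(filter("Term1+", terms));  // [Term1, Term11]
        System.out.println(filter("Term1*", terms));  // [Term, Term1, Term11]
    }
}
```

With a substring search (`Matcher.find()`) instead of `matches()`, /Term1?/ would also hit the "Term" prefix of "Term2", so the choice of method matters for these results.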
11
Character Classes
Example          Meaning
[pnm]            “p” or “n” or “m”
[Qq]             “Q” or “q”
[A-Z]            Upper Case Letters
[A-Za-z]         Letters
[^A-Z]           Every char EXCEPT A-Z
[A-Z&&[^C-E]]    A-Z but NOT C-E
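The last row uses the `&&` intersection syntax, which `java.util.regex` supports but many other regex flavours do not. A small sketch that exercises three of the classes in Java; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original slides.
final class CharClassDemo {
    // True when the whole input matches the given character-class pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[Qq]", "q"));            // true
        System.out.println(matches("[^A-Z]", "q"));          // true
        System.out.println(matches("[^A-Z]", "Q"));          // false
        System.out.println(matches("[A-Z&&[^C-E]]", "B"));   // true
        System.out.println(matches("[A-Z&&[^C-E]]", "D"));   // false
    }
}
```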
12
Example 6
String: “CAT”, “Cat”, “cat”
Pattern: /[Cc]at/
Result: “Cat”, “cat”
13
Example 7
String: “CAT”, “Cat”, “cat”
Pattern: /[Cc][Aa][Tt]/
Result: “CAT”, “Cat”, “cat”
14
Example 8
String: “Term”, “Term1”, “Term2”
Pattern: /[A-Za-z]+/
Result: “Term”
15
Example 9
String: “Term”, “Term1”, “Term222”
Pattern: /.*[0-9]+/
Result: “Term1”, “Term222”
16
Example 10
String: “Term”, “Term1”, “Term222”
Pattern: /[^0-9]+/
Result: “Term”
17
Repeating Chars (Intervals)
Example   Description
a{3}      Matches “aaa”
a{3,5}    Matches “aaa”, “aaaa”, “aaaaa”
a{3,}     Matches “aaa”, “aaaa”, …
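As a rough illustration of the interval notation, the sketch below builds an `a{min,max}` pattern from two integers and tests whole strings against it. The helper name `isRunOfA` is an assumption made for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original slides.
final class IntervalDemo {
    // True when the input consists of between min and max 'a' characters and nothing else.
    static boolean isRunOfA(String input, int min, int max) {
        String regex = "a{" + min + "," + max + "}";
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(isRunOfA("aa", 3, 5));      // false: too short
        System.out.println(isRunOfA("aaaa", 3, 5));    // true
        System.out.println(isRunOfA("aaaaaa", 3, 5));  // false: too long
        System.out.println(Pattern.matches("a{3,}", "aaaaaaa")); // true: no upper bound
    }
}
```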
18
Predefined Character Classes
Class   Description
\d      Digit
\D      Non-digit
\s      Whitespace
\S      Non-whitespace
\w      Word character (letter, digit, or underscore)
\W      Non-word character
\b      Word boundary
\B      Non-word boundary
\A      The beginning of the input
\z      The end of the input
19
Example 11
String: “This is some text !”
Pattern: /is/
Result: matches both the “is” inside “This” and the standalone word “is”
20
Example 12
String: “This is some text !”
Pattern: /\bis\b/
Result: matches only the standalone word “is”
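The difference between Examples 11 and 12 can be checked by listing where each pattern matches. The sketch below does this with `Matcher.find()`; the class name and the `matchStarts` helper are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original slides.
final class WordBoundaryDemo {
    // Returns the start index of every match of regex in text.
    static List<Integer> matchStarts(String regex, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "This is some text !";
        System.out.println(matchStarts("is", text));        // [2, 5]
        System.out.println(matchStarts("\\bis\\b", text));  // [5]
    }
}
```

Index 2 is the "is" inside "This"; only index 5 has a word boundary on both sides.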
21
Example 13
Variable names (a letter followed by up to 15 word characters)
Pattern: /[A-Za-z]\w{0,15}/
22
Groups
Email addresses:
/[A-Za-z0-9_]+@.+\.\w+/
/([A-Za-z0-9_]+)@(.+)\.(\w+)/
$1 Username
$2 Server
$3 Domain
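In Java the same three groups are read with `Matcher.group(n)`. The following is a minimal sketch using the grouped pattern from the slide; the class name, the `parts` helper, and the sample address are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original slides.
final class EmailGroups {
    private static final Pattern EMAIL =
            Pattern.compile("([A-Za-z0-9_]+)@(.+)\\.(\\w+)");

    // Returns {username, server, domain}, or null when the input does not match.
    static String[] parts(String email) {
        Matcher matcher = EMAIL.matcher(email);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        String[] p = parts("mohsen@mail.example.org"); // made-up address
        System.out.println("Username = " + p[0]); // Username = mohsen
        System.out.println("Server = " + p[1]);   // Server = mail.example
        System.out.println("Domain = " + p[2]);   // Domain = org
    }
}
```

Because `.+` is greedy, the server group runs up to the last dot, which is why "mail.example" lands in `$2` and only "org" in `$3`.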
23
RegeX & Perl
open(IN, "File.txt") or die "Cannot open File.txt: $!";  # open file
while ($line = <IN>)                                       # read line by line
{
    if ($line =~ /([A-Za-z0-9_]+)@(.+)\.(\w+)/)
    {
        print 'User = ', $1, ", Server = ", $2, "\n";
    }
}
close(IN);
24
RegeX & Ruby
open('in.txt', 'r').readlines.each do |line|
  puts line if line =~ /^([a-z0-9_]+)@(.+)\.(.+)$/i
end
25
RegeX & Java
java.util.regex.Pattern
java.util.regex.Matcher
java.util.Scanner
java.lang.String
  replaceAll(regex, replacement)
  replaceFirst(regex, replacement)
  matches(regex)
  split(regex)
26
Example 16
String email = readEmailFromSomewhere();
if (email.matches("([A-Za-z0-9_]+)@(.+)\\.(\\w+)")) {
    System.out.println("valid email");
} else {
    System.out.println("invalid email");
}
27
Example 17
String str = "098 123-456-789";
String[] nums = str.split("[\\s-]");
for (String num : nums) {
    System.out.println(num);
}
28
Example 18
// Remove tags from HTML
// (the tags inside the string literals are an assumed reconstruction)
String html = "<html><title>This is a title.</title>"
            + "<body>This is body of an HTML file"
            + "!</body></html>";
String text = html.replaceAll("<[^>]+>", " ");
String normalizedText = text.replaceAll("\\s+", " ");
System.out.println(normalizedText);
29
Example 19
// Hyperlink URLs
// (the HTML tags in the string literals are an assumed reconstruction)
String html = "<p>Please Visit http://myhomepage.com</p>";
html = html.replaceAll("https?://([-.A-Za-z]+)", "<a href=\"$0\">$1</a>");
System.out.println(html);
30
Example 20
Convert MixedCase to underlined_format
String MixedCase = "ThisIsSomeTextInMixedCaseFormat";
String temp = MixedCase.replaceAll("([a-z])([A-Z])", "$1_$2");
String underlined_format = temp.toLowerCase();
System.out.println(underlined_format);
// result: this_is_some_text_in_mixed_case_format
31
Convert underlined_format to MixedCase ?
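One possible answer, offered as a sketch rather than the author's solution: `String.replaceAll` cannot upper-case a captured group, so the reverse conversion needs a `Matcher` loop with `appendReplacement`. The class and method names here are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original slides.
final class CaseConverter {
    // Matches a lower-case letter at the start of the input or right after an underscore.
    private static final Pattern WORD_START = Pattern.compile("(?:^|_)([a-z])");

    static String toMixedCase(String underlined) {
        Matcher matcher = WORD_START.matcher(underlined);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            // Drop the underscore (if any) and upper-case the captured letter.
            matcher.appendReplacement(result, matcher.group(1).toUpperCase());
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(toMixedCase("this_is_some_text_in_mixed_case_format"));
        // ThisIsSomeTextInMixedCaseFormat
    }
}
```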
32
Example 21
The Pipe Sign
Find strings of 0s & 1s that have an even number of 1s or an even number of 0s
str = '110100101'
puts str =~ /^(1*(01*01*)*|0*(10*10*)*)$/ ? 'Yes' : 'No'
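A pattern like this is easy to get subtly wrong, so it helps to compare it against plain counting on every short input. The Java sketch below does that for one formulation of the pattern, in which each repeated group holds exactly two 0s (or two 1s) and allows the other digit after each of them. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original slides.
final class ParityCheck {
    // Left branch: even number of 0s. Right branch: even number of 1s.
    private static final Pattern EVEN =
            Pattern.compile("1*(01*01*)*|0*(10*10*)*");

    static boolean byRegex(String bits) {
        return EVEN.matcher(bits).matches();
    }

    static boolean byCounting(String bits) {
        int zeros = 0;
        int ones = 0;
        for (char c : bits.toCharArray()) {
            if (c == '0') zeros++; else ones++;
        }
        return zeros % 2 == 0 || ones % 2 == 0;
    }

    // Compares both checks on every binary string of length 0..maxLength.
    static boolean agreeUpTo(int maxLength) {
        for (int len = 0; len <= maxLength; len++) {
            for (int n = 0; n < (1 << len); n++) {
                StringBuilder bits = new StringBuilder();
                for (int bit = len - 1; bit >= 0; bit--) {
                    bits.append((n >> bit) & 1);
                }
                if (byRegex(bits.toString()) != byCounting(bits.toString())) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(byRegex("110100101")); // true: four 0s
        System.out.println(byRegex("00100"));     // true: four 0s
        System.out.println(byRegex("01"));        // false: one 0 and one 1
        System.out.println(agreeUpTo(10));        // true
    }
}
```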
33
Example 22
Finding Unintentionally Repeated Words
text = 'hello, this is some some text!'
?
34
Back References
\i refers to the i-th matched group
Example: /(.)\1/ matches “aa”, “bb”, “11”, “##”
35
Example 22
Finding Unintentionally Repeated Words
text = 'hello, this is some some text!'
if text =~ /\b(\w+)\W+\1\b/
  puts $1 + " is repeated more than once"
end
# some is repeated more than once
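The same back-reference idea carries over to Java, where `\1` is written `\\1` inside a string literal. A minimal sketch, with a trailing `\b` so that a word followed by a longer word with the same prefix ("the theory") is not reported; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original slides.
final class RepeatedWords {
    // A whole word, some non-word characters, then the same whole word again.
    private static final Pattern REPEATED = Pattern.compile("\\b(\\w+)\\W+\\1\\b");

    static List<String> find(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = REPEATED.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(find("hello, this is some some text!")); // [some]
        System.out.println(find("the theory"));                      // []
    }
}
```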
36
You don't even need to write code
An editor that supports RegeX: the eclipse find/replace dialog box
37
Microsoft VS.NET Quick Replace
Use Regular Expression
Extracting timestamps from a log file
38
Some Rewriting System!
rewrite(input)
    temp = input
    do
        before = temp
        temp = rewrite temp using rule1
        temp = rewrite temp using rule2
        ...
        after = temp
    while (before != after)
    return temp
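The loop above can be sketched in Java by treating each rule as a (regex, replacement) pair and repeating until a full pass changes nothing. The `Rewriter` class and the sample rule are assumptions for illustration, and the loop only terminates for rule sets that eventually reach a fixed point.

```java
// Hypothetical demo class, not part of the original slides.
final class Rewriter {
    // Applies every {regex, replacement} rule in order, repeating until nothing changes.
    static String rewrite(String input, String[][] rules) {
        String current = input;
        String before;
        do {
            before = current;
            for (String[] rule : rules) {
                current = current.replaceAll(rule[0], rule[1]);
            }
        } while (!before.equals(current));
        return current;
    }

    public static void main(String[] args) {
        // One rule: delete an empty pair of parentheses.
        String[][] rules = { { "\\(\\)", "" } };
        // Pass 1 gives "a()b", pass 2 gives "ab", pass 3 changes nothing.
        System.out.println(rewrite("a(()())b", rules)); // ab
    }
}
```

A single `replaceAll` pass cannot remove nested pairs, which is why the outer loop is needed.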
39
XML
40
MML
@Students {
    @Student(faculty="Computer Engineering" student-id="8017024") {
        @Name(first="Mohsen" last="Mollanoori");
        @Terms {
            @Term(num="1") {
                @Lesson(name="Statistics" mark="10");
                @Lesson(name="Math1" mark="10");
            }
        }
    }
}
41
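Example 23
MML 2 XML
// The replacement strings are an assumed reconstruction based on the surviving $4.
do {
    before = mml;
    // @Tag(attrs);  becomes  <Tag attrs/>
    mml = mml.replaceAll(
        "@([A-Za-z]+)(\\(([^)]*)\\))?\\s*;",
        "<$1 $3/>");
    // @Tag(attrs) { body }  becomes  <Tag attrs>body</Tag>, innermost blocks first
    mml = mml.replaceAll(
        "@([A-Za-z]+)(\\(([^)]*)\\))?\\s*\\{([^\\{\\}]*)\\}",
        "<$1 $3>$4</$1>");
    after = mml;
} while (!before.equals(after));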
42
Example 24
Remove Text from XML (Keep Tags Only)
Is this correct?
// <tag> is an assumed stand-in for the original element name
String xml = "<tag>Text Here</tag>";
xml = xml.replaceAll(">[^<]*<", "");
Match: ">Text Here<"
Result: "<tag/tag>"
43
Look Ahead & Look Behind
// <tag> is an assumed stand-in for the original element name
String xml = "<tag>Text Here</tag>";
xml = xml.replaceAll("(?<=>)[^<]*(?=<)", "");
Look behind using ?<= to see a ‘>’
Look ahead using ?= to see a ‘<’
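Lookarounds check the neighbouring characters without consuming them, so the angle brackets survive the replacement. The sketch below contrasts the two approaches; the class name and the `<note>` element are made up for illustration.

```java
// Hypothetical demo class, not part of the original slides.
final class LookaroundDemo {
    // Consumes the surrounding '>' and '<', so they are deleted along with the text.
    static String stripNaive(String xml) {
        return xml.replaceAll(">[^<]*<", "");
    }

    // Only asserts that '>' precedes and '<' follows; the match itself is just the text.
    static String stripWithLookaround(String xml) {
        return xml.replaceAll("(?<=>)[^<]*(?=<)", "");
    }

    public static void main(String[] args) {
        String xml = "<note>Text Here</note>";
        System.out.println(stripNaive(xml));          // <note/note>
        System.out.println(stripWithLookaround(xml)); // <note></note>
    }
}
```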
44
Example 25
Over Matching
// <a> and <b> are assumed stand-ins for the original element names
xml = "<a>aaa</a><b>bbb</b>";
xml = xml.replaceFirst(">.*<", "");
Match: ">aaa</a><b>bbb<"
Result: xml = "<a/b>";
45
Greedy & Non Greedy
Greedy    Non Greedy
*         *?
+         +?
?         ??
{a,b}     {a,b}?
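A greedy quantifier takes as much as it can and gives characters back only when forced to, while its non-greedy form stops at the first position that lets the rest of the pattern match. A small sketch on a made-up input; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original slides.
final class GreedyDemo {
    // Returns the text of the first match, or null when there is none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String text = "say \"hi\" and \"bye\"";
        System.out.println(firstMatch("\".*\"", text));  // "hi" and "bye"
        System.out.println(firstMatch("\".*?\"", text)); // "hi"
    }
}
```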
46
Example 26
Solution to Over Matching
// <a> and <b> are assumed stand-ins for the original element names
xml = "<a>aaa</a><b>bbb</b>";
xml = xml.replaceFirst(">.*?<", "");
Match: ">aaa<"
Result: xml = "<a/a><b>bbb</b>";
47
Example 27
String xml = "aabb";
xml = xml.replaceAll(".{2,3}", "-");
System.out.println(xml); // result: -b

xml = "aabb";
xml = xml.replaceAll(".{2,3}?", "-");
System.out.println(xml); // result: --
48
Example 28
String xml = "aabb";
xml = xml.replaceAll(".?", "-");
System.out.println(xml); // result: -----

xml = "aabb";
xml = xml.replaceAll(".??", "-");
System.out.println(xml); // result: -a-a-b-b-
49
Further Reading & Works
“Teach Yourself Regular Expressions in 10 Minutes”, Sams Publishing, February 28, 2004, ISBN: 0-672-32566-7
“Mastering Regular Expressions, 3rd Edition”, by Jeffrey E. F. Friedl, O'Reilly, August 2006, ISBN: 0-596-52812-4
Java Regular Expression Documents
Practice, Practice, Practice
50
Thanks