Regular Expression
Mohsen Mollanoori


1 Regular Expression Mohsen Mollanoori

2 What is RegeX?
  “A notation to describe regular languages.”
  “Not necessarily (and not usually) regular”
  “A powerful string processing tool”
  “A pattern that can be matched against a string”
  “A language, but not a language”

3 What Does RegeX Do?
  String processing:
    Matching strings against a specific pattern
    Splitting strings
    Changing substrings
    Extracting substrings

4 What Programming Languages Support RegeX?
  Almost all of them:
    Perl, Java, .Net (C#, VB.Net, …), PHP, Ruby, JavaScript, …
  And even many IDEs, editors, and utilities:
    grep, eclipse, Visual Studio.Net, vim, emacs, …

5 The Notation
  Symbol  Meaning                                         Example
  .       Any single char                                 /.at/ matches “cat”, “bat”, “pat”, “mat”
  *       Zero or more occurrences of the preceding char  /a*b/ matches “b”, “aaaaab”
  +       One or more occurrences of the preceding char   /a+b/ matches “ab”, “aaaaab”
  ?       Zero or one occurrence of the preceding char    /a?b/ matches “ab” and “b”

6 Example 1
  Strings: “Term”, “Term1”, “Term2”
  Pattern: /Term./
  Result:  “Term1”, “Term2”

7 Example 2
  Strings: “Term”, “Term1”, “Term2”
  Pattern: /Term.?/
  Result:  “Term”, “Term1”, “Term2”

8 Example 3
  Strings: “Term”, “Term1”, “Term2”
  Pattern: /Term1?/
  Result:  “Term”, “Term1”

9 Example 4
  Strings: “Term1”, “Term11”, “Term2”, “Term”
  Pattern: /Term1+/
  Result:  “Term1”, “Term11”

10 Example 5
  Strings: “Term1”, “Term11”, “Term2”, “Term”
  Pattern: /Term1*/
  Result:  “Term1”, “Term11”, “Term”
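
The results in Examples 1 to 5 assume the pattern has to cover the whole string, which is how Java's `String.matches` behaves. A minimal sketch that reproduces them; the class name `TermExamples` and the helper `fullMatches` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TermExamples {
    // Returns the inputs that the regex matches in full (String.matches semantics).
    static List<String> fullMatches(String regex, String... inputs) {
        List<String> hits = new ArrayList<>();
        for (String input : inputs) {
            if (input.matches(regex)) {
                hits.add(input);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(fullMatches("Term.", "Term", "Term1", "Term2"));            // [Term1, Term2]
        System.out.println(fullMatches("Term.?", "Term", "Term1", "Term2"));           // [Term, Term1, Term2]
        System.out.println(fullMatches("Term1?", "Term", "Term1", "Term2"));           // [Term, Term1]
        System.out.println(fullMatches("Term1+", "Term1", "Term11", "Term2", "Term")); // [Term1, Term11]
        System.out.println(fullMatches("Term1*", "Term1", "Term11", "Term2", "Term")); // [Term1, Term11, Term]
    }
}
```

With search semantics (for example `Matcher.find`), /Term1?/ would also find a match inside “Term2”, so the choice between full match and search matters when reading these results.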

11 Character Classes
  Example         Meaning
  [pnm]           “p” or “n” or “m”
  [Qq]            “Q” or “q”
  [A-Z]           Upper case letters
  [A-Za-z]        Letters
  [^A-Z]          Every char EXCEPT A-Z
  [A-Z&&[^C-E]]   A-Z but NOT C-E (intersection, Java syntax)

12 Example 6
  Strings: “CAT”, “Cat”, “cat”
  Pattern: /[Cc]at/
  Result:  “Cat”, “cat”

13 Example 7
  Strings: “CAT”, “Cat”, “cat”
  Pattern: /[Cc][Aa][Tt]/
  Result:  “CAT”, “Cat”, “cat”

14 Example 8
  Strings: “Term”, “Term1”, “Term2”
  Pattern: /[A-Za-z]+/
  Result:  “Term”

15 Example 9
  Strings: “Term”, “Term1”, “Term222”
  Pattern: /.*[0-9]+/
  Result:  “Term1”, “Term222”

16 Example 10
  Strings: “Term”, “Term1”, “Term222”
  Pattern: /[^0-9]+/
  Result:  “Term”
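
A few of the character classes above can be checked directly in Java. This is only a sketch; `CharClassExamples` and `fullMatch` are illustrative names, and the checks use whole-string matching via `String.matches`.

```java
class CharClassExamples {
    // True when the regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[Cc]at", "Cat"));         // true
        System.out.println(fullMatch("[Cc]at", "CAT"));         // false
        System.out.println(fullMatch("[A-Z&&[^C-E]]", "B"));    // true: B is in A-Z and not in C-E
        System.out.println(fullMatch("[A-Z&&[^C-E]]", "D"));    // false: D is excluded by [^C-E]
        System.out.println(fullMatch(".*[0-9]+", "Term222"));   // true
        System.out.println(fullMatch("[^0-9]+", "Term1"));      // false: contains a digit
    }
}
```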

17 Repeating Chars (Intervals)
  Example   Description
  a{3}      Matches “aaa”
  a{3,5}    Matches “aaa”, “aaaa”, “aaaaa”
  a{3,}     Matches “aaa”, “aaaa”, …

18 Predefined Character Classes
  Class  Description
  \d     Digit
  \D     Non-digit
  \s     Whitespace
  \S     Non-whitespace
  \w     Word character (letter, digit, or underscore)
  \W     Non-word character
  \b     Word boundary
  \B     Non-word boundary
  \A     The beginning of the input
  \z     The end of the input

19 Example 11
  String:  “This is some text !”
  Pattern: /is/
  Result:  two matches, the “is” inside “This” and the word “is”

20 Example 12
  String:  “This is some text !”
  Pattern: /\bis\b/
  Result:  one match, the standalone word “is”
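
The difference between /is/ and /\bis\b/ shows up when listing where each pattern matches. A small sketch using `Matcher.find`; the class `WordBoundaryExample` and the helper `matchStarts` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryExample {
    // Collects the start index of every match of the regex in the text.
    static List<Integer> matchStarts(String regex, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "This is some text !";
        System.out.println(matchStarts("is", text));        // [2, 5]
        System.out.println(matchStarts("\\bis\\b", text));  // [5]
    }
}
```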

21 Example 13
  Variable names (a letter followed by up to 15 word characters)
  Pattern: /[A-Za-z]\w{0,15}/

22 Groups
  Email addresses:
    /[A-Za-z0-9_]+@.+\.\w+/
  With groups:
    /([A-Za-z0-9_]+)@(.+)\.(\w+)/
    $1  Username
    $2  Server
    $3  Domain
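
In Java the same groups are read with `Matcher.group(n)`. A minimal sketch, assuming whole-string matching; the class `EmailGroups`, the method `parts`, and the sample address are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailGroups {
    // Same pattern as on the slide, compiled once and reused.
    private static final Pattern EMAIL = Pattern.compile("([A-Za-z0-9_]+)@(.+)\\.(\\w+)");

    // Returns {username, server, domain}, or null when the address does not match.
    static String[] parts(String address) {
        Matcher m = EMAIL.matcher(address);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Hypothetical address, used only for illustration.
        System.out.println(Arrays.toString(parts("mohsen@mail.example.com")));
        // [mohsen, mail.example, com]
    }
}
```

Because `(.+)` is greedy, the server group takes everything up to the last dot, so only the final label ends up in the domain group.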

23 RegeX & Perl
    open(IN, "File.txt") or die "Cannot open File.txt: $!";   # open file
    while ($line = <IN>) {                                    # read line by line
        if ($line =~ /([A-Za-z0-9_]+)@(.+)\.(\w+)/) {
            print 'User = ', $1, ', Server = ', $2, "\n";
        }
    }
    close(IN);

24 RegeX & Ruby
    open('in.txt', 'r').readlines.each do |line|
      puts line if line =~ /^([a-z0-9_]+)@(.+)\.(.+)$/i
    end

25 RegeX & Java
  java.util.regex.Pattern
  java.util.regex.Matcher
  java.util.Scanner
  java.lang.String
    replaceAll(regex, replacement)
    replaceFirst(regex, replacement)
    matches(regex)
    split(regex)

26 Example 16
    String email = readEmailFromSomewhere();
    if (email.matches("([A-Za-z0-9_]+)@(.+)\\.(\\w+)")) {
        System.out.println("valid email");
    } else {
        System.out.println("invalid email");
    }

27 Example 17
    String str = "098 123-456-789";
    String[] nums = str.split("[\\s-]");   // split on whitespace or hyphen
    for (String num : nums) {
        System.out.println(num);
    }

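28 Example 18
    // Remove tags from HTML
    // (the tags in the literal below are sample markup, assumed for illustration)
    String html = "<html><head><title>This is a title.</title></head>"
                + "<body>This is body of a <b>HTML</b> file"
                + "!</body></html>";
    String text = html.replaceAll("<[^>]+>", " ");
    String normalizedText = text.replaceAll("\\s+", " ");
    System.out.println(normalizedText);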

29 Example 19
    // Hyperlink URLs
    // (the <a href> markup in the replacement is an assumed reconstruction)
    String html = " Please Visit http://myhomepage.com ";
    html = html.replaceAll("https?://([-.A-Za-z]+)", "<a href=\"$0\">$1</a>");
    System.out.println(html);

30 Example 20
  Convert MixedCase to underlined_format
    String MixedCase = "ThisIsSomeTextInMixedCaseFormat";
    String temp = MixedCase.replaceAll("([a-z])([A-Z])", "$1_$2");
    String underlined_format = temp.toLowerCase();
    System.out.println(underlined_format);
    // result: this_is_some_text_in_mixed_case_format

31 Convert underlined_format to MixedCase ?
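
One possible answer, offered as a sketch and not necessarily what the presenter had in mind: `replaceAll` alone cannot change case, so the code below walks the matches with `Matcher.appendReplacement` and upper-cases the first letter of each word. The class name `UnderscoreToMixedCase` and the method `convert` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnderscoreToMixedCase {
    // A lowercase letter at the start of the input or right after an underscore.
    private static final Pattern WORD_START = Pattern.compile("(?:^|_)([a-z])");

    static String convert(String underlined) {
        Matcher m = WORD_START.matcher(underlined);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Replace "_x" (or a leading "x") with "X".
            m.appendReplacement(out, m.group(1).toUpperCase());
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("this_is_some_text_in_mixed_case_format"));
        // ThisIsSomeTextInMixedCaseFormat
    }
}
```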

32 Example 21
  The Pipe Sign
  Find strings of 0s & 1s that have an even number of 1s or an even number of 0s
    str = '110100101'
    puts(str =~ /^((1*01*0)*1*|(0*10*1)*0*)$/ ? 'Yes' : 'No')
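
The same idea in Java, as a sketch. The pattern here is a variant that groups each pair of 0s (or 1s) together with the digits in front of it, so that inputs such as "00100" are also accepted; `EvenOnesOrZeros` and `hasEvenOnesOrZeros` are illustrative names.

```java
import java.util.regex.Pattern;

class EvenOnesOrZeros {
    // Left alternative: even number of 0s. Right alternative: even number of 1s.
    private static final Pattern EVEN =
            Pattern.compile("(1*01*0)*1*|(0*10*1)*0*");

    static boolean hasEvenOnesOrZeros(String bits) {
        // matches() requires the whole input to fit one of the alternatives.
        return EVEN.matcher(bits).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasEvenOnesOrZeros("110100101") ? "Yes" : "No"); // Yes (four 0s)
        System.out.println(hasEvenOnesOrZeros("00100") ? "Yes" : "No");     // Yes (four 0s)
        System.out.println(hasEvenOnesOrZeros("0001") ? "Yes" : "No");      // No (three 0s, one 1)
    }
}
```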

33 Example 22
  Finding Unintentionally Repeated Words
    text = 'hello, this is some some text!'
    ?

34 Back References
  \i refers to the i-th matched group
  Example: /(.)\1/ matches “aa”, “bb”, “11”, “##”

35 Example 22
  Finding Unintentionally Repeated Words
    text = 'hello, this is some some text!'
    if text =~ /(\b\w+\b)\W+\1/
      puts $1 + " is repeated more than once"
    end
    # some is repeated more than once
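
A Java sketch of the same back-reference idea. It adds a trailing `\b` after `\1`, an assumption beyond the slide, so that “some something” is not reported as a repetition; `RepeatedWords` and `find` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedWords {
    // A word, some non-word characters, then the same word again as a whole word.
    private static final Pattern REPEATED = Pattern.compile("\\b(\\w+)\\W+\\1\\b");

    // Returns each word that is immediately repeated in the text.
    static List<String> find(String text) {
        List<String> repeated = new ArrayList<>();
        Matcher m = REPEATED.matcher(text);
        while (m.find()) {
            repeated.add(m.group(1));
        }
        return repeated;
    }

    public static void main(String[] args) {
        System.out.println(find("hello, this is some some text!")); // [some]
        System.out.println(find("some something"));                 // []
    }
}
```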

36 You don't even need to write code
  An editor that supports RegeX
  eclipse find/replace dialog box

37 Microsoft VS.NET
  Quick Replace
  Use Regular Expression
  Extracting timestamps from a log file

38 Some Rewriting System!
    rewrite(input)
      temp = input
      do
        before = temp
        temp = rewrite temp using rule1
        temp = rewrite temp using rule2
        ...
        after = temp
      while (before != after)
      return temp
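
The loop above could be written in Java roughly as follows: apply every rule in turn and stop once a full pass changes nothing. This is a sketch; the class `Rewriter` and the bracket-stripping rule set are hypothetical and only there to make the loop runnable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class Rewriter {
    // Applies all rules repeatedly until a full pass leaves the text unchanged.
    static String rewrite(String input, List<UnaryOperator<String>> rules) {
        String temp = input;
        String before;
        do {
            before = temp;
            for (UnaryOperator<String> rule : rules) {
                temp = rule.apply(temp);
            }
        } while (!before.equals(temp));
        return temp;
    }

    // Hypothetical rule set: delete empty "()" and "[]" pairs.
    static String stripEmptyBrackets(String input) {
        List<UnaryOperator<String>> rules = new ArrayList<>();
        rules.add(s -> s.replaceAll("\\(\\)", ""));
        rules.add(s -> s.replaceAll("\\[\\]", ""));
        return rewrite(input, rules);
    }

    public static void main(String[] args) {
        // Each pass removes the innermost pairs, so nested input needs several passes.
        System.out.println("[" + stripEmptyBrackets("([()])") + "]"); // []
        System.out.println(stripEmptyBrackets("(]"));                 // (]
    }
}
```

A single `replaceAll` only sees the innermost pairs, which is why the fixed-point loop is needed for nested input.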

39 XML

40 MML
    @Students {
      @Student(faculty="Computer Engineering" student-id="8017024") {
        @Name(first="Mohsen" last="Mollanoori");
        @Terms {
          @Term(num="1") {
            @Lesson(name="Statistics" mark="10");
            @Lesson(name="Math1" mark="10");
          }
        }
      }
    }

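41 Example 23
  MML 2 XML
    // The XML replacement strings below are assumed reconstructions:
    // $1 = element name, $3 = attributes, $4 = element body.
    do {
        before = mml;
        mml = mml.replaceAll(
            "@([A-Za-z]+)(\\(([^)]*)\\))?;",
            "<$1 $3/>");
        mml = mml.replaceAll(
            "@([A-Za-z]+)(\\(([^)]*)\\))?\\{([^\\{\\}]*)\\}",
            "<$1 $3>$4</$1>");
        after = mml;
    } while (!before.equals(after));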

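42 Example 24
  Remove text from XML (keep tags only). Is this correct?
    String xml = "<tag>Text Here</tag>";   // sample markup, assumed for illustration
    xml = xml.replaceAll(">[^<]*<", "");
  Match:  “>Text Here<”
  Result: “<tag/tag>” (the “>” and “<” are removed along with the text)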

43 Look Ahead & Look Behind
    String xml = "<tag>Text Here</tag>";   // sample markup, assumed for illustration
    xml = xml.replaceAll("(?<=>)[^<]*(?=<)", "");
  Look behind using ?<= to see a ‘>’
  Look ahead using ?= to see a ‘<’
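
A runnable comparison of the plain pattern and the lookaround pattern, as a sketch. The sample markup `<tag>Text Here</tag>` is an assumption for illustration, and `KeepTagsOnly`, `naive`, and `withLookaround` are made-up names.

```java
class KeepTagsOnly {
    // Consumes the surrounding '>' and '<' together with the text.
    static String naive(String xml) {
        return xml.replaceAll(">[^<]*<", "");
    }

    // Lookbehind and lookahead only check for '>' and '<' without consuming them.
    static String withLookaround(String xml) {
        return xml.replaceAll("(?<=>)[^<]*(?=<)", "");
    }

    public static void main(String[] args) {
        String xml = "<tag>Text Here</tag>"; // assumed sample markup
        System.out.println(naive(xml));          // <tag/tag>
        System.out.println(withLookaround(xml)); // <tag></tag>
    }
}
```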

44 Example 25
  Over matching
    xml = "<a>aaa</a><b>bbb</b>";   // sample markup, assumed for illustration
    xml = xml.replaceFirst(">.*<", "");
  Match:  “>aaa</a><b>bbb<”
  Result: xml = "<a/b>";

45 Greedy & Non Greedy
  Greedy   Non Greedy
  *        *?
  +        +?
  ?        ??
  {a,b}    {a,b}?

46 Example 26
  Solution to over matching
    xml = "<a>aaa</a><b>bbb</b>";   // sample markup, assumed for illustration
    xml = xml.replaceFirst(">.*?<", "");
  Match:  “>aaa<”
  Result: xml = "<a/a><b>bbb</b>";
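
Greedy and non-greedy matching can be compared side by side in Java. This is a sketch on assumed sample markup; `GreedyVsReluctant`, `greedy`, and `reluctant` are illustrative names.

```java
class GreedyVsReluctant {
    // ".*" grabs as much as possible: from the first '>' to the last '<'.
    static String greedy(String xml) {
        return xml.replaceFirst(">.*<", "");
    }

    // ".*?" grabs as little as possible: from the first '>' to the next '<'.
    static String reluctant(String xml) {
        return xml.replaceFirst(">.*?<", "");
    }

    public static void main(String[] args) {
        String xml = "<a>aaa</a><b>bbb</b>"; // assumed sample markup
        System.out.println(greedy(xml));     // <a/b>
        System.out.println(reluctant(xml));  // <a/a><b>bbb</b>
    }
}
```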

47 Example 27
    String xml = "aabb";
    xml = xml.replaceAll(".{2,3}", "-");     // greedy: takes "aab", leaves "b"
    System.out.println(xml);                 // result: -b

    String xml2 = "aabb";
    xml2 = xml2.replaceAll(".{2,3}?", "-");  // non greedy: takes "aa", then "bb"
    System.out.println(xml2);                // result: --

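48 Example 28
    String xml = "aabb";
    xml = xml.replaceAll(".?", "-");     // greedy: each char, plus the empty match at the end
    System.out.println(xml);             // result: -----

    String xml2 = "aabb";
    xml2 = xml2.replaceAll(".??", "-");  // non greedy: only empty matches, chars are kept
    System.out.println(xml2);            // result: -a-a-b-b-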

49 Further Reading & Works
  “Teach Yourself Regular Expressions in 10 Minutes”, Sams Publishing, February 28, 2004, ISBN: 0-672-32566-7
  “Mastering Regular Expressions, 3rd Edition”, by Jeffrey E. F. Friedl, O'Reilly, August 2006, ISBN: 0-596-52812-4
  Java regular expression documents
  Practice, practice, practice

50 Thanks

