OOP Tirgul 11

What We'll Be Seeing Today
• Regular Expressions Basics
• Doing it in Java
• Advanced Regular Expressions
• Summary

Regular Expressions Basics

Regular Expressions
• A regular expression is a kind of pattern that can be applied to text (Strings, in Java).
• A regular expression either matches the text (or part of the text), or it fails to match.
• If a regular expression matches part of the text, you can easily find out which part.
• If a regular expression is complex, you can easily find out which parts of the regular expression match which parts of the text.

Some simple Patterns (each example is checked against the whole string)
• abc: exactly this sequence of three letters. Matches "abc"; does not match "ab".
• [abc]: any one of the letters a, b, or c. Matches "a"; does not match "ab".
• [^abc]: any one character except a, b, or c. Immediately after an opening bracket, ^ means "not"; outside brackets it means the beginning of the string. Matches "z"; does not match "b" or "za".
• [a-z]: any one character from a through z, inclusive. Matches "t"; does not match "9".
• [a-zA-Z0-9]: any one letter or digit. Matches "P", "r", "8"; does not match "_", "(", "#".
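The table can be checked with a few lines of Java. This is a minimal sketch, not the slide's own code: the class SimplePatterns and its fullMatch helper are illustrative names. It uses the standard Pattern.matches(regex, input), which, like the table, succeeds only when the regex covers the whole input.

```java
import java.util.regex.Pattern;

// Illustrative helper class for checking the "simple patterns" table.
final class SimplePatterns {
    // True only if regex matches the ENTIRE input, not just part of it.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"abc", "abc"}, {"abc", "ab"},
            {"[abc]", "a"}, {"[abc]", "ab"},
            {"[^abc]", "z"}, {"[^abc]", "b"}, {"[^abc]", "za"},
            {"[a-z]", "t"}, {"[a-z]", "9"},
            {"[a-zA-Z0-9]", "P"}, {"[a-zA-Z0-9]", "8"}, {"[a-zA-Z0-9]", "_"}
        };
        for (String[] c : cases) {
            // Prints the pattern, the quoted input, and whether it matched
            System.out.printf("%-12s %-6s %b%n", c[0], "\"" + c[1] + "\"", fullMatch(c[0], c[1]));
        }
    }
}
```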

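Some simple Patterns (quantifiers)
• * matches zero or more occurrences. [abc]* matches "abba"; does not match "abta".
• + matches one or more occurrences. abc+ matches "abcc"; does not match "" or "aabc". (The + applies only to the c.)
• ? matches zero or one occurrence (optional). [abc]? matches "" and "a"; does not match "bcbt".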
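A similar sketch for the quantifiers (Quantifiers and fullMatch are hypothetical names). The extra (abc)+ line uses grouping, which comes later in the tirgul, to show how to repeat more than one character.

```java
import java.util.regex.Pattern;

// Illustrative helper class for the quantifier examples.
final class Quantifiers {
    // True only if regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[abc]*", "abba"));   // true
        System.out.println(fullMatch("[abc]*", "abta"));   // false: t is not in [abc]
        System.out.println(fullMatch("abc+", "abcc"));     // true: + repeats only the c
        System.out.println(fullMatch("abc+", "aabc"));     // false
        System.out.println(fullMatch("(abc)+", "abcabc")); // true: the whole group repeats
        System.out.println(fullMatch("[abc]?", ""));       // true: zero occurrences
        System.out.println(fullMatch("[abc]?", "bcbt"));   // false
    }
}
```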

Sequences and alternatives
• If one pattern is followed by another, the two patterns must match consecutively. For example, [A-Za-z]+[0-9] matches one or more letters immediately followed by one digit.
• The vertical bar | separates alternatives. For example, the pattern abc|[xyz]+ matches either abc or one or more of the letters x, y, z.
• abc|[xyz]+ matches "abc", "zyyxyxz", "x"; does not match "xabc", "xyzabc", "zyydxyxz", "g".
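The examples above, run through the same kind of whole-string check (Alternatives and fullMatch are illustrative names, not from the slides).

```java
import java.util.regex.Pattern;

// Illustrative helper class for sequences and alternatives.
final class Alternatives {
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Sequence: one or more letters immediately followed by exactly one digit
        System.out.println(fullMatch("[A-Za-z]+[0-9]", "abc7"));  // true
        System.out.println(fullMatch("[A-Za-z]+[0-9]", "abc77")); // false
        // | has the lowest precedence, so this reads as (abc)|([xyz]+)
        String alt = "abc|[xyz]+";
        for (String s : new String[] {"abc", "zyyxyxz", "x", "xabc", "xyzabc", "zyydxyxz", "g"}) {
            System.out.println(s + " -> " + fullMatch(alt, s));
        }
    }
}
```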

Predefined character classes
• . any one character except a line terminator
• \d a digit: [0-9]
• \D a non-digit: [^0-9]
• \s a whitespace character: [ \t\n\x0B\f\r]
• \S a non-whitespace character: [^\s]
• \w a word character: [a-zA-Z_0-9]
• \W a non-word character: [^\w]
Notice the space in the \s class. Spaces are significant in regular expressions!
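A quick sketch of the predefined classes in action. In Java source every regex backslash is written twice (see Double Backslashes below); the class and helper names are illustrative assumptions.

```java
import java.util.regex.Pattern;

// Illustrative helper class for the predefined character classes.
final class PredefinedClasses {
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d\\d", "42"));     // true
        System.out.println(fullMatch("\\D", "x"));         // true
        System.out.println(fullMatch("\\w+", "my_var1"));  // true: _ is a word character
        System.out.println(fullMatch("\\w+", "my-var"));   // false: - is not
        System.out.println(fullMatch("a b", "a b"));       // true: the space is literal
        System.out.println(fullMatch("a b", "ab"));        // false: the space is required
        System.out.println(fullMatch("a\\s+b", "a \t b")); // true: space, tab, space
        System.out.println(fullMatch(".", "\n"));          // false: . excludes line terminators
    }
}
```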

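Example: match the entire string
For the regular expression \d+.?|abc|[xyz][abc]\s* (here . means any character, not a literal dot):
• Matches the entire string: "90", "90_", "abc", "0", "xa "
• Does not match the entire string: "", "90abc", "a", "xp", "xab"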
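The answers above can be verified by compiling the pattern once and calling matches() on each string. A sketch, with WholeStringExample as an illustrative class name.

```java
import java.util.regex.Pattern;

final class WholeStringExample {
    // The "." in \d+.? means "any character", not a literal dot.
    static final Pattern P = Pattern.compile("\\d+.?|abc|[xyz][abc]\\s*");

    static boolean fullMatch(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"90", "90_", "abc", "", "0", "90abc", "a", "xa ", "xp", "xab"};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" -> " + fullMatch(s));
        }
    }
}
```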


Doing it in Java

Doing it in Java, I
• Import the regex package (java.util.regex).
• Compile the regex pattern.
• Create a matcher for a specific piece of text.
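An illustrative sketch of the three steps, where the import lines are step 1. CompileAndMatch and matcherFor are hypothetical names; Pattern.compile and Pattern.matcher are the standard java.util.regex API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CompileAndMatch {
    // Steps 2 and 3: compile the regex, then bind it to one particular text.
    static Matcher matcherFor(String regex, String text) {
        Pattern p = Pattern.compile(regex); // a Pattern is immutable and can be reused
        return p.matcher(text);             // a Matcher holds the state of one search
    }

    public static void main(String[] args) {
        Matcher m = matcherFor("[a-z]+", "Now is the time");
        System.out.println(m.matches()); // false: 'N' and the spaces are not in [a-z]
    }
}
```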

Doing it in Java, II
Now that we have a matcher m:
• m.matches() returns true if the pattern matches the entire text string, and false otherwise.

Doing it in Java, II - Example Will it match? False


Doing it in Java, II - Example Will it match? True

Doing it in Java, II
Now that we have a matcher m:
• m.matches() returns true if the pattern matches the entire text string, and false otherwise.
• m.lookingAt() returns true if the pattern matches at the beginning of the text string, and false otherwise.

Doing it in Java, II - Example Will it match? True
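An illustrative sketch (not the slides' example code) contrasting matches() and lookingAt() on the text "Now is the time"; the class and method names are assumptions.

```java
import java.util.regex.Pattern;

final class MatchesVsLookingAt {
    // matches(): the pattern must cover the whole text
    static boolean matches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // lookingAt(): the pattern must match a prefix of the text
    static boolean lookingAt(String regex, String text) {
        return Pattern.compile(regex).matcher(text).lookingAt();
    }

    public static void main(String[] args) {
        String text = "Now is the time";
        System.out.println(matches("[A-Za-z]+", text));   // false: stops at the first space
        System.out.println(lookingAt("[A-Za-z]+", text)); // true: "Now" is matched at the start
        System.out.println(lookingAt("[a-z]+", text));    // false: text starts with 'N'
        System.out.println(lookingAt("is", text));        // false: "is" is not at the start
    }
}
```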

Doing it in Java, III
• m.find() returns true if the pattern matches any part of the text string, and false otherwise.
• If called again, m.find() starts searching from the first character after the previous match.
• m.find() returns true once for each match the string contains; after that, it returns false.
• To search the same text again from the beginning, call m.reset() first.
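A minimal sketch of the usual find() loop, assuming we just want every matched substring in a list (FindAll and findAll are illustrative names). m.group() returns the text of the current match; it is covered under Capturing groups in Java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindAll {
    // Calls find() until it returns false, collecting each matched substring.
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {        // each call resumes after the previous match
            found.add(m.group()); // text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("[a-z]+", "Now is the time")); // [ow, is, the, time]
    }
}
```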

Doing it in Java, III
• After a successful match, m.start() returns the index of the first character matched.
• After a successful match, m.end() returns the index of the last character matched, plus 1.

Doing it in Java, III
• If no match was attempted, or if the match was unsuccessful, m.start() and m.end() throw an IllegalStateException.
• This is a RuntimeException, so you don't have to catch it.

Doing it in Java, III - Example What will be the matches?
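An illustrative sketch (not the slide's code) of start() and end() after each find(), plus the IllegalStateException thrown when no match is available. StartEnd and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StartEnd {
    // Returns "start-end" for every match, separated by spaces.
    static String spans(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(m.start()).append('-').append(m.end()); // end() is one past the last char
        }
        return sb.toString();
    }

    // start() before any match attempt throws IllegalStateException.
    static boolean startFailsBeforeMatching() {
        Matcher m = Pattern.compile("x").matcher("abc");
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(spans("[a-z]+", "Now is the time")); // 1-3 4-6 7-10 11-15
        System.out.println(startFailsBeforeMatching());         // true
    }
}
```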

Advanced Regular Expressions

Double Backslashes
• Backslashes have a special meaning in regular expressions; e.g., \b means a word boundary.
• Backslashes also have a special meaning in Java string literals; e.g., \b means the backspace character.
• Java syntax rules apply first! "\b[a-z]+\b" is a string containing two backspace characters, not word boundaries.
• Solution: add another backslash: "\\b[a-z]+\\b".

Using Metacharacters in Java
• Many special characters are used in defining regular expressions; these are called metacharacters.
• Examples: parentheses, brackets and braces ( ) [ ] { }, stars *, plus signs +, etc.

Using Metacharacters in Java
Suppose you want to search for the character sequence a* (an a followed by a star):
• "a*" doesn't work; it means "zero or more a's".
• "a\*" doesn't work; \* is not a valid escape sequence in a Java string literal, so this is a compilation error.
• "a\\*" does work; it's the three-character string a\*, which the regex engine reads as a literal a followed by a literal star.
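A sketch of the three cases above (EscapeMetachars is an illustrative name). It also uses the standard Pattern.quote, not mentioned on the slides, which makes every character of its argument literal.

```java
import java.util.regex.Pattern;

final class EscapeMetachars {
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a*", "a*"));   // false: here * means "zero or more a's"
        System.out.println(fullMatch("a*", "aaa"));  // true
        // fullMatch("a\*", ...) would not compile: \* is not a Java string escape.
        System.out.println(fullMatch("a\\*", "a*")); // true: the regex is the 3 characters a\*
        // Pattern.quote makes every character of its argument literal.
        System.out.println(fullMatch(Pattern.quote("a*"), "a*")); // true
    }
}
```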

Double Backslashes - Example
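A sketch showing why the double backslash matters, using the pattern from the Double Backslashes slide (WordBoundary and findAll are illustrative names).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordBoundary {
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Now is the time";
        // "\\b" in Java source is the two characters \b: a regex word boundary.
        System.out.println(findAll("\\b[a-z]+\\b", text)); // [is, the, time]
        // Without boundaries, the "ow" inside "Now" is found too.
        System.out.println(findAll("[a-z]+", text));       // [ow, is, the, time]
        // "\b" in Java source is a backspace character, which the text does not contain.
        System.out.println(findAll("\b[a-z]+\b", text));   // []
    }
}
```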

Capturing groups
• In regular expressions, parentheses are used for grouping, but they also capture (keep for later use) anything matched by that part of the pattern.
• Example: match a list of integers, e.g. 7,6,4,3,2,8,10, with the pattern (\d+,)+(\d+).
• If the match succeeds, \1 holds the last repetition of (\d+,), i.e. the one-before-last integer with its comma ("8,"), and \2 holds the last integer ("10").
• In addition, group 0 holds everything matched by the entire pattern. (In Java, read it with m.group(0); \0 is not a back reference in a Java regex.)
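A minimal sketch that applies (\d+,)+(\d+) to the list from the slide and reads the groups with m.group(n) (IntegerList and parse are illustrative names).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IntegerList {
    static final Pattern LIST = Pattern.compile("(\\d+,)+(\\d+)");

    // Returns {group 0, group 1, group 2}, or null if text is not such a list.
    static String[] parse(String text) {
        Matcher m = LIST.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(0), m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] g = parse("7,6,4,3,2,8,10");
        System.out.println(g[0]); // 7,6,4,3,2,8,10
        System.out.println(g[1]); // 8,  (only the last repetition is kept)
        System.out.println(g[2]); // 10
    }
}
```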

 If m is a Matcher that has just performed a successful match:  m.group(n) returns the String matched by capturing group n, where Capturing groups are numbered by counting their opening parentheses from left to right. This could be an empty string. This will be null if the pattern as a whole matched but this particular group didn’t match anything. Capturing groups in Java

 If m is a Matcher that has just performed a successful match:  m.group() returns the String matched by the entire pattern (same as m.group(0)). This could also be an empty string. Capturing groups in Java

Capturing groups - Example What will be printed?


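Capturing groups - Example
• Inside the pattern, \1 tries to match again exactly the text that group 1 captured.
• Without the \1, the expression will not match.
• What will group(2) return? An exception! m.group(n) throws an IndexOutOfBoundsException when the pattern has no group n.
What will be printed?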
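A sketch of a back reference, assuming we want to find a letter that appears twice in a row (BackReference and its methods are illustrative names, not the slide's code). It also shows the exception thrown by group(2) when the pattern has only one group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BackReference {
    // ([a-z]) captures one letter; \1 then requires that same letter again.
    static final Pattern DOUBLE = Pattern.compile("([a-z])\\1");

    static String firstDoubledLetter(String text) {
        Matcher m = DOUBLE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // group(2) on a pattern with a single group throws IndexOutOfBoundsException.
    static boolean group2Throws() {
        Matcher m = DOUBLE.matcher("balloon");
        if (!m.find()) {
            return false;
        }
        try {
            m.group(2);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstDoubledLetter("balloon")); // l
        System.out.println(firstDoubledLetter("cat"));     // null
        System.out.println(group2Throws());                // true
    }
}
```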

Example Use of Capturing Groups
• Say word holds a word in English.
• We want to move all the consonants at the beginning of word (if any) to the end of the word (so cat becomes atc).
What will be printed?
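One possible solution, assuming word is lowercase and that any character other than a, e, i, o, u counts as a consonant (so y does too). This is a sketch, not necessarily the slide's code; MoveConsonants and rotate are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MoveConsonants {
    // Group 1: the leading run of non-vowels (possibly empty). Group 2: the rest.
    static final Pattern P = Pattern.compile("([^aeiou]*)(.*)");

    static String rotate(String word) {
        Matcher m = P.matcher(word);
        // For a word without line breaks the pattern always matches.
        return m.matches() ? m.group(2) + m.group(1) : word;
    }

    public static void main(String[] args) {
        System.out.println(rotate("cat"));    // atc
        System.out.println(rotate("string")); // ingstr
        System.out.println(rotate("apple"));  // apple: no leading consonants
    }
}
```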

Example Use of Capturing Groups
If p is a Pattern and m a Matcher on p, then:
• m.replaceFirst(String replacement) returns a new String where the first substring matched by p has been replaced by replacement. Note: replacement is not a regular expression, but $n inside it refers to capturing group n, so $ and \ in it must be escaped.
• m.replaceAll(String replacement) returns a new String where every substring matched by p has been replaced by replacement.
• m.reset() resets this matcher (i.e. it will start searching from the beginning of the text).
• m.reset(newText) resets this matcher and gives it a new text to examine (any CharSequence, such as a String, StringBuilder, StringBuffer, or CharBuffer).
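A sketch of the replace and reset methods (ReplaceDemo and its helpers are illustrative names). replaceFirst and replaceAll reset the matcher before they start, and Matcher.quoteReplacement, a standard method not mentioned on the slide, escapes $ and \ in a replacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceDemo {
    static final Pattern DIGITS = Pattern.compile("(\\d+)");

    static String replaceFirst(String text, String replacement) {
        return DIGITS.matcher(text).replaceFirst(replacement);
    }

    static String replaceAll(String text, String replacement) {
        return DIGITS.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        Matcher m = DIGITS.matcher("a1b22c333");
        System.out.println(m.replaceFirst("#"));  // a#b22c333
        System.out.println(m.replaceAll("<$1>")); // a<1>b<22>c<333>: $1 is group 1
        m.reset("x9y");                           // same pattern, new text
        System.out.println(m.replaceAll("#"));    // x#y
        // quoteReplacement escapes $ and \ so "$5" is inserted literally
        System.out.println(m.replaceAll(Matcher.quoteReplacement("$5"))); // x$5y
    }
}
```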

Summary

• Regular expressions are a very powerful tool to analyze and manipulate text:
  - Basic expressions: ., *, +, [a-z], a{n,m}, w?, …
  - Recursive (built from smaller expressions): expr, expr1|expr2, expr1expr2
  - Capturing groups: (exp1)((exp2)|(exp3))\1\2
• Patterns in Java:
  - Pattern patt = Pattern.compile("[a-z]+");
  - Matcher matcher = patt.matcher("Now is the time");
  - matcher.matches(), matcher.find(), matcher.group(i)
• Remember: white spaces, \\, …