
1 Regular Expressions Lecture 3

2 Regular Expressions
Motivation: to search for strings using partially specified patterns.
Examples:
– To validate data fields (dates, email, address)
– To filter text (disallowed web sites)
– To identify particular strings in a text
– To do replacement in a text (color -> colour)
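Two of the tasks above, replacement and identification, can be sketched with Python 3's standard `re` module. The sample sentence is made up for illustration, and the patterns are deliberately simple.

```python
import re

text = "My favourite color is red; your colors are blue and green."

# Replacement: American "color" -> British "colour" (this also rewrites "colors").
british = re.sub(r"color", "colour", text)

# Identification: find every word that starts with "color" or "colour".
found = re.findall(r"colou?r\w*", british)

print(british)  # My favourite colour is red; your colours are blue and green.
print(found)    # ['colour', 'colours']
```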

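3 Formal Definition of Regular Expressions
Regular expressions are defined over a finite alphabet $\Sigma$:
1. $\varepsilon$ is a regular expression and denotes the set $\{\varepsilon\}$ (the set containing only the empty string).
2. For each $a \in \Sigma$, $a$ is a regular expression and denotes the set $\{a\}$.
3. If $r$ and $s$ are regular expressions denoting the sets $R$ and $S$ respectively, then
– $(r \mid s)$ is a regular expression denoting $R \cup S$ (union);
– $(r.s)$ is a regular expression denoting $RS = \{xy : x \in R,\ y \in S\}$ (concatenation);
– $(r^*)$ is a regular expression denoting $R^*$, the set of all concatenations of zero or more strings from $R$ (Kleene closure).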
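To make the construction rules concrete, here is a small Python 3 sketch that computes, for a regular expression written as a nested tuple, every string of length at most `n` in the language it denotes. The tuple encoding and the function name `lang` are hypothetical, invented for this example; they are not a standard library API.

```python
def lang(r, n):
    """Strings of length <= n in the language denoted by the regex AST r.

    Hypothetical AST encoding used only in this sketch:
      ("eps",)       epsilon, denotes {""}
      ("sym", a)     a symbol of the alphabet, denotes {a}
      ("alt", r, s)  (r | s), denotes R union S
      ("cat", r, s)  (r.s),   denotes RS
      ("star", r)    (r*),    denotes R*
    """
    kind = r[0]
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if len(r[1]) <= n else set()
    if kind == "alt":
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n) if len(x + y) <= n}
    if kind == "star":
        base = lang(r[1], n)
        result, frontier = {""}, {""}
        # R* = R^0 | R^1 | R^2 | ...; keep appending non-empty pieces of R
        # until no new string of length <= n appears.
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if y and len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")


# ((a | b)*).a : strings over {a, b} that end in "a"
ends_in_a = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
print(sorted(lang(ends_in_a, 2)))  # ['a', 'aa', 'ba']
```

Bounding the length by `n` is what keeps the sets finite; the languages themselves (for example anything built with `*`) are usually infinite.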

4 Advantages of REs
– The language can be stated as a formal algebra.
– Regular expressions form a language for expressing patterns.
– Recognizers for regular expressions can be implemented efficiently.

5 Recognizers
A recognizer for a language is a program that takes a string x as input and answers “yes” if x is a sentence of the language and “no” otherwise. In other words, a recognizer is a machine that emits only two possible responses to its input.
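A minimal sketch of a yes/no recognizer built on Python 3's `re` module. The helper name `make_recognizer` is hypothetical; the example language, strings over {a, b} ending in "abb", is chosen only for illustration.

```python
import re


def make_recognizer(pattern):
    """Return a function answering "yes"/"no" for membership in the pattern's language."""
    compiled = re.compile(pattern)

    def recognize(x):
        # fullmatch requires the *whole* string x to be in the language;
        # re.search would also accept strings that merely contain a match.
        return "yes" if compiled.fullmatch(x) else "no"

    return recognize


ends_in_abb = make_recognizer(r"(a|b)*abb")
print(ends_in_abb("aabb"))  # yes
print(ends_in_abb("abab"))  # no
```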

6 Finite State Automaton
A Finite State Automaton (FSA) is an abstract machine with a finite number of states. Regular expressions can be viewed as a way to describe an FSA.
Kleene’s Theorem (1956): FSAs and REs describe the same languages:
– Any regular expression can be implemented as an FSA.
– Any FSA can be described by a regular expression.
Regular languages are those that can be recognized by FSAs (equivalently, characterized by regular expressions).
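As an illustration of Kleene's theorem, the sketch below hand-codes a deterministic FSA for the example language `(a|b)*abb` and checks, in Python 3, that it agrees with the equivalent regular expression on every string over {a, b} up to length 8. The state numbering is an assumption of this example: state k records how much of "abb" the input currently ends with.

```python
import re
from itertools import product

# DFA for (a|b)*abb. State 0: no progress, 1: ends in "a", 2: ends in "ab", 3: ends in "abb".
TRANSITIONS = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
ACCEPTING = {3}


def dfa_accepts(x):
    state = 0
    for ch in x:
        if (state, ch) not in TRANSITIONS:  # symbol outside the alphabet {a, b}
            return False
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING


# Cross-check the automaton against the regular expression on all short strings.
pattern = re.compile(r"(a|b)*abb")
for length in range(9):
    for letters in product("ab", repeat=length):
        s = "".join(letters)
        assert dfa_accepts(s) == bool(pattern.fullmatch(s)), s
```

An exhaustive check over short strings is evidence, not a proof; the theorem itself guarantees the equivalence.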

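7 Basic Metacharacters
– Wild card: . (any single character; by default not a newline)
– Optional: ? (the preceding item occurs zero times or once)
– Repetition: * (zero or more) and + (one or more)
– Choice: [Mm][0123456789]
– Ranges: [a-z][0-9]
– Negation: [^Mm] (only when ‘^’ occurs immediately after ‘[’)
– Disjunction: |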
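Each metacharacter can be tried out with `re.findall` in Python 3, which returns every non-overlapping match from left to right. The sample strings are invented for this sketch.

```python
import re

examples = {
    "wild card": re.findall(r"h.t", "hat hot h t h\nt"),       # . skips the newline
    "optional": re.findall(r"colou?r", "color colour"),        # u may be absent
    "zero or more": re.findall(r"ab*", "a ab abbb"),           # b* can match nothing
    "one or more": re.findall(r"ab+", "a ab abbb"),            # b+ needs at least one b
    "choice/range": re.findall(r"[Mm][0-9]", "M1 m2 N3 M45"),  # one char from each class
    "negation": re.findall(r"[^Mm]", "Mom"),                   # any char except M or m
    "disjunction": re.findall(r"cat|dog", "cat dog cow"),
}

for name, result in examples.items():
    print(f"{name}: {result}")
```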

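8 Special Backslashes
– \d: digit (i.e. numeral)
– \D: non-digit
– \s: whitespace (space, tab, newline, etc.)
– \S: non-whitespace
– \w: ‘word’ character, i.e. alphanumeric or underscore ([a-zA-Z0-9_] for ASCII text)
– \W: non-word character
Standard escape sequences: \t: tab, \n: newline.
\ is a general escape character: it makes a metacharacter literal, e.g. \. matches a literal dot.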
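A short Python 3 sketch of the backslash classes. Patterns are written as raw strings (`r"..."`) so that Python passes the backslash through to the regex engine unchanged. Note that in Python 3, `\w` also matches non-ASCII letters unless the `re.ASCII` flag is given.

```python
import re

sample = "id_42\tcount 7\n"

digits = re.findall(r"\d", sample)    # ['4', '2', '7']
words = re.findall(r"\w+", sample)    # ['id_42', 'count', '7']  (underscore counts as \w)
spaces = re.findall(r"\s", sample)    # ['\t', ' ', '\n']
dots = re.findall(r"\.", "a.b.c")     # ['.', '.']  escaped dot is literal

unicode_word = re.findall(r"\w+", "café")           # ['café']
ascii_word = re.findall(r"\w+", "café", re.ASCII)  # ['caf']
```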

9 Anchors
Anchors are zero-width: they do not match characters in the text; instead they match positions in the text.
– ^: matches the beginning of a line (or of the text)
– $: matches the end of a line (or of the text)
– \b: matches a word boundary (i.e. a position with \w on one side but not the other)
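In Python 3, `^` and `$` match only at the start and end of the whole text unless the `re.MULTILINE` flag is set, in which case they also match at every line start and end. The sketch below shows both modes, plus `\b`; the last line shows that anchors consume no characters, since substituting at each `\b` inserts text without removing any.

```python
import re

text = "first line\nsecond line"

text_start = re.findall(r"^\w+", text)                  # ['first']
line_starts = re.findall(r"^\w+", text, re.MULTILINE)   # ['first', 'second']
line_ends = re.findall(r"\w+$", text, re.MULTILINE)     # ['line', 'line']

# \b keeps "cat" from matching inside "concatenate" or "bobcat".
whole_words = re.findall(r"\bcat\b", "cat concatenate bobcat cat.")  # ['cat', 'cat']

# Zero width: mark every word boundary without deleting anything.
marked = re.sub(r"\b", "|", "hi there")  # '|hi| |there|'
```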

10 Introduction to Python
– Development started in late 1989 at CWI (Centrum Wiskunde & Informatica, the national research institute for mathematics and computer science) in Amsterdam.
– Owned by the Python Software Foundation.
– Open source language:
  – download from www.python.org
  – extensive documentation and tutorials

11 Introduction to Python
– Available for Unix, Linux, Windows, Mac, etc.
– Easy to learn and user friendly.
– Clear syntax.
– Object-oriented paradigm (encourages good programming practices).
– A small number of powerful high-level data types.
– New built-in functions, modules and data types can be added by implementing them in a compiled language such as C/C++.

12 Introduction to Python
Variables
– A variable is a name that refers to a value.
– Limitations:
  – cannot be a keyword (e.g. and, or, if; print is a keyword only in Python 2)
  – cannot start with a number
  – case sensitive (age and Age are different variables)
  – cannot include illegal characters (e.g. $, %, +, =)
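The naming rules are themselves a regular language, so they can be checked with a regular expression plus a keyword lookup. This is a Python 3 sketch that assumes ASCII-only names (Python 3 also allows many non-ASCII letters in identifiers); `is_valid_variable_name` is a hypothetical helper, while `keyword.iskeyword` is the standard library check.

```python
import keyword
import re

# ASCII-only assumption: a letter or underscore, then letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_variable_name(name):
    return bool(IDENTIFIER.fullmatch(name)) and not keyword.iskeyword(name)


for candidate in ["age", "Age", "_total", "2nd_place", "if", "price$", "print"]:
    print(candidate, is_valid_variable_name(candidate))
```

`"age"` and `"Age"` both pass but name different variables. `"print"` passes because it is an ordinary function name in Python 3.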

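13 Introduction to Python
Numbers (Python 2 behaviour)
– Integers: whole numbers with no decimal places. An int is a C long: 4 bytes (32 bits) on 32-bit platforms. Dividing one integer by another gives a whole-number (floor) result.
– Long integers are written with an L at the end (454321354534L); they hold values beyond the int range (above roughly 2 billion on 32-bit builds).
– Floating-point numbers are numbers with a decimal point.
– If you want a decimal result, make at least one operand a float (e.g. 10/4.0 = 2.5, whereas 10/4 = 2).
– In Python 3, int has unlimited precision (no L suffix), / always gives a float (10/4 = 2.5), and // performs floor division (10//4 = 2).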
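The slide describes Python 2 arithmetic. For comparison, this Python 3 sketch shows true division `/`, floor division `//` (which reproduces Python 2's integer `10/4 = 2`), and arbitrary-precision integers.

```python
half = 10 / 4         # 2.5  true division always returns a float
floored = 10 // 4     # 2    floor division
negative = -7 // 2    # -4   floor rounds toward negative infinity, not toward zero
mixed = 10 / 4.0      # 2.5
big = 2 ** 100        # no L suffix: ints grow as large as needed

print(half, floored, negative, mixed)
print(big)  # 1267650600228229401496703205376
```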

14 Introduction to Python
Strings
– A string is a sequence of characters and must be enclosed in single or double quotation marks, e.g.: course = "Introduction to AI techniques"
– Use a backslash to escape characters that would otherwise have a special meaning, e.g.:
  var1 = "He said \"I play cricket\" "
  var2 = 'It\'s amazing'
– Other escapes: backslash = \\, new line = \n, tab = \t, etc.
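A Python 3 sketch of the escapes above, plus the raw-string form used for regular expressions in the earlier slides: a raw string keeps each backslash as an ordinary character instead of treating it as an escape.

```python
var1 = "He said \"I play cricket\""
var2 = 'It\'s amazing'
var3 = "It's amazing"        # switching quote style avoids the escape entirely
backslash = "\\"             # a single backslash character
two_lines = "line 1\nline 2"
regex_digit = r"\d"          # raw string: backslash + d, two characters

print(var1)                                          # He said "I play cricket"
print(var2 == var3)                                  # True
print(len(backslash), len(regex_digit), len("\t"))  # 1 2 1
```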

15 Introduction to Python
– Concatenation: the + operator is overloaded for strings, e.g.: s = str1 + "XYZ" + str2
– Repetition: a string can be repeated with *, e.g.:
  s = "superman"
  print s*3
  Output: supermansupermansuperman (no spaces are inserted)
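The same operations in Python 3. Repetition joins the copies with nothing in between, so a separator must be added explicitly, for example with `str.join`. Naming a variable `str` is avoided here because that would hide the built-in `str` type.

```python
str1, str2 = "abc", "def"
joined = str1 + "XYZ" + str2     # 'abcXYZdef'

word = "superman"
tripled = word * 3               # 'supermansupermansuperman'
spaced = " ".join([word] * 3)    # 'superman superman superman'

print(joined)
print(tripled)
print(spaced)
```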

16 Introduction to Python
Math Operations
– Basic operations: add +, subtract -, multiply *, divide /, exponent **, modulus %
– Order of precedence: parentheses, exponents, multiply/divide/modulus, add/subtract. Operators with equal precedence are evaluated left to right, except ** which groups right to left.
  e.g.: 6 * (3 + 2) = 30, 6 * 3 + 2 = 20
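A few Python 3 expressions that exercise the precedence rules, including the right-to-left grouping of `**` and the fact that `**` binds more tightly than a unary minus.

```python
results = {
    "6 * (3 + 2)": 6 * (3 + 2),  # 30   parentheses first
    "6 * 3 + 2": 6 * 3 + 2,      # 20   * before +
    "7 % 3 * 2": 7 % 3 * 2,      # 2    same level, left to right: (7 % 3) * 2
    "10 - 4 - 3": 10 - 4 - 3,    # 3    left to right: (10 - 4) - 3
    "2 ** 3 ** 2": 2 ** 3 ** 2,  # 512  right to left: 2 ** (3 ** 2)
    "-2 ** 2": -2 ** 2,          # -4   means -(2 ** 2)
}

for expr, value in results.items():
    print(f"{expr} = {value}")
```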

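17 Introduction to Python
Input (Python 2)
– For strings, use raw_input(), e.g.: email = raw_input("What is your email? ")
– For numbers, use input(), e.g.: age = input("What is your age? ")
– Note: Python 2's input() evaluates whatever the user types as an expression, so int(raw_input(...)) is the safer choice for numbers. In Python 3, raw_input() was renamed input(), which always returns a string.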
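A Python 3 sketch that reads both values and, tying back to the lecture's motivation, validates the email with a regular expression. The pattern is a deliberately simplified assumption; real address syntax (RFC 5322) is far more permissive. The function names and the `read` parameter are hypothetical; `read` defaults to the built-in `input` but can be swapped for a stub so the code can be tested without a keyboard.

```python
import re

# Simplified, illustrative pattern: name@domain.tld (one or more dotted parts).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def ask_email(read=input):
    email = read("What is your email? ").strip()
    return email if EMAIL.fullmatch(email) else None


def ask_age(read=input):
    # Python 3's input() returns a string; convert it explicitly.
    return int(read("What is your age? "))
```

Interactive use is simply `ask_email()`. In a test, pass a stub such as `ask_age(lambda prompt: "32")`.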

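18 Introduction to Python
Output (Python 2 print statement)
– For strings, use print, e.g.:
  print "What is your email?"
  Output: What is your email?
  email = "What is your email?"
  print email
  Output: What is your email?
– For numbers, use the same print statement, e.g. with pi = 3.14159:
  print pi
  Output: 3.14159
– In Python 3, print is a function: print("What is your email?")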
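In Python 3, `print` is a function with optional `sep`, `end` and `file` arguments. This sketch prints into an in-memory `io.StringIO` buffer instead of the screen so that the exact output can be inspected; drop the `file=out` argument to print normally.

```python
import io
import math

out = io.StringIO()  # stands in for the screen

print("What is your email?", file=out)
print(f"{math.pi:.5f}", file=out)          # pi rounded to 5 decimal places
print("a", "b", "c", sep="-", file=out)    # custom separator between arguments
print("no newline", end="", file=out)      # suppress the trailing newline

text = out.getvalue()
print(repr(text))
```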

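19 Introduction to Python
Comments
– Use the # symbol to start a comment.
– Everything after # on that line is ignored by the interpreter, e.g.: age = 32 # age must be greater than 32
– Python has no multi-line comment delimiters: start each line with #. (A triple-quoted string is sometimes used instead, but it is a string literal, not a true comment.)
Indentation
– Python is a whitespace-sensitive language (danger: be careful!). The body of a block must be indented, e.g.:
  if x != y:
      x = y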
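To see that indentation is part of the syntax, this Python 3 sketch compiles two versions of the same `if` statement from strings with the built-in `compile`. The indented version runs; the unindented one is rejected with an `IndentationError` before any of it executes.

```python
good = "x, y = 1, 2\nif x != y:\n    x = y\n"
bad = "x, y = 1, 2\nif x != y:\nx = y\n"  # body not indented

namespace = {}
exec(compile(good, "<good>", "exec"), namespace)
print(namespace["x"])  # 2

try:
    compile(bad, "<bad>", "exec")
    bad_rejected = False
except IndentationError as err:
    bad_rejected = True
    print("IndentationError:", err.msg)
```

The exact wording of `err.msg` varies between Python versions, so only the exception type should be relied on.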

20 Python Resources
– How to Think Like a Computer Scientist: Learning with Python, by Allen B. Downey, Jeffrey Elkner and Chris Meyers. This text has been released under the Open Book Project.
– Learning Python, by Mark Lutz and David Ascher. This is a good book for beginners to Python; its web page has corrections, source code, etc.
– Programming Python, by Mark Lutz. A programmer's reference: 1255 pp., and not intended for beginners.
– Dive into Python, by Mark Pilgrim. Advertised as a free book for "experienced programmers". The homepage also has a number of useful links to Python resources.
– Main site for Python documentation. Note the reference page for regular expression syntax.
– Python HOWTO page (including the Regular Expression HOWTO).

