Regular Expressions & Automata Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.

Slides:



Advertisements
Similar presentations
C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
Advertisements

Regular Expressions in Perl By Josue Vazquez. What are Regular Expressions? A template that either matches or doesn’t match a given string. Often called.
YES-NO machines Finite State Automata as language recognizers.
Turing Machines (At last!). Designing Universal Computational Devices Was Not The Only Contribution from Alan Turing… Enter the year 1940: The world is.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
1 Regular Expressions & Automata Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.
Regular Expressions in Java. Namespace in XML Transparency No. 2 Regular Expressions Regular expressions are an extremely useful tool for manipulating.
Regular Expressions in Java. Regular Expressions A regular expression is a kind of pattern that can be applied to text ( String s, in Java) A regular.
COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of Monash University.
CS 310 – Fall 2006 Pacific University CS310 Turing Machines Section 3.1 November 6, 2006.
October 29, 2009Introduction to Cognitive Science Lecture 14: Theory of Computation I 1 Finite Automata Example 2: Can we build a finite automaton that.
79 Regular Expression Regular expressions over an alphabet  are defined recursively as follows. (1) Ø, which denotes the empty set, is a regular expression.
Regular expressions Mastering Regular Expressions by Jeffrey E. F. Friedl Linux editors and commands (e.g.
Finite Automata Chapter 5. Formal Language Definitions Why need formal definitions of language –Define a precise, unambiguous and uniform interpretation.
Topics Automata Theory Grammars and Languages Complexities
Turing Machines CS 105: Introduction to Computer Science.
1 Overview Regular expressions Notation Patterns Java support.
Languages and Machines Unit two: Regular languages and Finite State Automata.
Regular Languages A language is regular over  if it can be built from ;, {  }, and { a } for every a 2 , using operators union ( [ ), concatenation.
Applications of Regular Expressions BY— NIKHIL KUMAR KATTE 1.
Regular Language & Expressions. Regular Language A regular language is one that a finite state machine (fsm) will accept. ‘Alphabet’: {a, b} ‘Rules’:
Turing Machines A more powerful computation model than a PDA ?
Language Recognizer Connecting Type 3 languages and Finite State Automata Copyright © – Curt Hill.
Thopson NFA Presenter: Yuen-Shuo Li Date: 2014/5/7 Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.
CS490 Presentation: Automata & Language Theory Thong Lam Ran Shi.
Chapter 2. Regular Expressions and Automata From: Chapter 2 of An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition,
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Automata, Computability, & Complexity by Elaine Rich ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Slides provided by author Slides edited for.
Automating Construction of Lexers. Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code.
1 Computability Five lectures. Slides available from my web page There is some formality, but it is gentle,
Lexical Analysis I Specifying Tokens Lecture 2 CS 4318/5531 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper and Torczon.
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
L ECTURE 3 Chapter 4 Regular Expressions. I MPORTANT T ERMS Regular Expressions Regular Languages Finite Representations.
___________________________________________ COMPILER Theory___________________________________________ Fourth Year (First Semester) Dr. Hamdy M. Mousa.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Turing Machines Chapter 17. Languages and Machines SD D Context-Free Languages Regular Languages reg exps FSMs cfgs PDAs unrestricted grammars Turing.
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
When you read a sentence, your mind breaks it into tokens—individual words and punctuation marks that convey meaning. Compilers also perform tokenization.
Regular Expressions and Languages A regular expression is a notation to represent languages, i.e. a set of strings, where the set is either finite or contains.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
Natural Language Processing Lecture 4 : Regular Expressions and Automata.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
CS 203: Introduction to Formal Languages and Automata
September1999 CMSC 203 / 0201 Fall 2002 Week #15 – 2/4/6 December 2002 Prof. Marie desJardins.
1 Section 13.1 Turing Machines A Turing machine (TM) is a simple computer that has an infinite amount of storage in the form of cells on an infinite tape.
1 Turing Machines and Equivalent Models Section 13.1 Turing Machines.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 12 Mälardalen University 2007.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen.
Conversions Regular Expression to FA FA to Regular Expression.
OOP Tirgul 11. What We’ll Be Seeing Today  Regular Expressions Basics  Doing it in Java  Advanced Regular Expressions  Summary 2.
Set, Alphabets, Strings, and Languages. The regular languages. Clouser properties of regular sets. Finite State Automata. Types of Finite State Automata.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Lexical Analysis (Tokenizing) COMP 3002 School of Computer Science.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Theory of Computation Lecture #
Jaya Krishna, M.Tech, Assistant Professor
Chapter 9 TURING MACHINES.
REGULAR LANGUAGES AND REGULAR GRAMMARS
CS21 Decidability and Tractability
Compiler Construction
Decidability and Tractability
Regular Expressions in Java
Regular Expressions in Java
Announcements - P1 part 1 due Today - P1 part 2 due on Friday Feb 1st
Regular Expressions in Java
Presentation transcript:

Regular Expressions & Automata Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park

Complexity to Computability Just looked at algorithmic complexity How many steps required At high end of complexity is computability Decidable Undecidable

Complexity to Computability Approach complexity from different direction Look at simple models of computation Regular expressions Finite automata Turing Machines

Overview Regular expressions Notation Patterns Java support Automata Languages Finite State Machines Turing Machines Computability

Regular Expression (RE) Notation for describing simple string patterns Very useful for text processing Finding / extracting pattern in text Manipulating strings Automatically generating web pages

Regular Expression Regular expression is composed of Symbols Operators ConcatenationAB UnionA | B ClosureA*

Definitions Alphabet Set of symbols  Examples  {a, b}, {A, B, C}, {a-z,A-Z,0-9}… Strings Sequences of 0 or more symbols from alphabet Examples  , “a”, “bb”, “cat”, “caterpillar”… Languages Sets of strings Examples  , {  }, {“a”}, {“bb”, “cat”}… empty string

More Formally Regular expression describes a language over an alphabet L(E) is language for regular expression E Set of strings generated from regular expression String in language if it matches pattern specified by regular expression

Regular Expression Construction Every symbol is a regular expression Example “a” REs can be constructed from other REs using Concatenation Union | Closure *

Regular Expression Construction Concatenation A followed by B L(AB) = { ab | a  L(A) AND b  L(B) } Example a {“a”} ab {“ab”}

Regular Expression Construction Union A or B L(A | B) = { a | a  L(A) OR a  L(B) } Example a | b {“a”, “b”}

Regular Expression Construction Closure Zero or more A L(A*) = { a | a =  OR a  L(A) OR a  L(A)L(A)… } Example a* { , “a”, “aa”, “aaa”, “aaaa” …} (ab)*c {“c”, “abc”, “ababc”, “abababc”…}

Regular Expressions in Java Java supports regular expressions In java.util.regex.* Applies to String class in Java 1.4 Introduces additional specification methods Simplifies specification Does not increase power of regular expressions Can simulate with concatenation, union, closure

Regular Expressions in Java Concatenation ab “ab” (ab)c “abc” Union ( bar | or square brackets [ ] ) a | b “a”, “b” [abc] “a”, “b”, “c” Closure (star *) (ab)* , “ab”, “abab”, “ababab” … [ab]* , “a”, “b”, “aa”, “ab”, “ba”, “bb” …

Regular Expressions in Java One or more (plus +) (a)+One or more “a”s Range (dash –) [a–z]Any lowercase letters [0–9]Any digit Complement (caret ^ at beginning of RE) [^a]Any symbol except “a” [^a–z]Any symbol except lowercase letters

Regular Expressions in Java Precedence Higher precedence operators take effect first Precedence order Parentheses( … ) Closurea* b+ Concatenationab Uniona | b Range[ … ]

Regular Expressions in Java Examples ab+“ab”, “abb”, “abbb”, “abbbb”… (ab)+“ab”, “abab”, “ababab”, … ab | cd“ab”, “cd” a(b | c)d“abd”, “acd” [abc]d“ad”, “bd”, “cd” When in doubt, use parentheses

Regular Expressions in Java Predefined character classes [.]Any character except end of line [\d]Digit: [0-9] [\D]Non-digit: [^0-9] [\s]Whitespace character: [ \t\n\x0B\f\r] [\S]Non-whitespace character: [^\s] [\w]Word character: [a-zA-Z_0-9] [\W]Non-word character: [^\w]

Regular Expressions in Java Literals using backslash \ Need two backslash Java compiler will interpret 1 st backslash for String Examples \\]“]” \\.“.” \\\\“\” 4 backslashes interpreted as \\ by Java compiler

Using Regular Expressions in Java 1. Compile pattern import java.util.regex.*; Pattern p = Pattern.compile("[a-z]+"); 2. Create matcher for specific piece of text Matcher m = p.matcher("Now is the time"); 3. Search text Boolean found = m.find(); Returns true if pattern is found in text

Using Regular Expressions in Java If pattern is found in text m.group()  string found m.start()  index of the first character matched m.end()  index after last character matched m.group() is same as substring(m.start(), m.end()) Calling m.find() again Starts search after end of current pattern match If no more matches, return to beginning of string

Complete Java Example Code Output ow – is – the – time – import java.util.regex.*; public class RegexTest { public static void main(String args[]) { Pattern p = Pattern.compile(“[a-z]+”); Matcher m = p.matcher(“Now is the time”); while (m.find()) { System.out.print(m.group() + “ – ”); } } }

Language Recognition Accept string if and only if in language Abstract representation of computation Performing language recognition can be Simple Strings with even number of 1’s Hard Strings representing legal Java programs Impossible! Strings representing nonterminating Java programs

Automata Simple abstract computers Can be used to recognize languages Finite state machine States + transitions Turing machine States + transitions + tape

Finite State Machine States Starting Accepting Finite number allowed Transitions State to state Labeled by symbol L(M) = { w | w ends in a 1} q1q1 q2q Start State Accept State a

Finite State Machine Operations Move along transitions based on symbol Accept string if ends up in accept state Reject string if ends up in non-accepting state q1q1 q2q “011” Accept “10” Reject

Finite State Machine Properties Powerful enough to recognize regular expressions In fact, finite state machine  regular expression Languages recognized by finite state machines Languages recognized by regular expressions 1-to-1 mapping

Turing Machine Defined by Alan Turing in 1936 Finite state machine + tape Tape Infinite storage Read / write one symbol at tape head Move tape head one space left / right Tape Head …… q1q1 q2q

Turing Machine Allowable actions Read symbol from current square Write symbol to current square Move tape head left Move tape head right Go to next state

Turing Machine *10010* … … Current State Current Content Value to Write Direction to Move New state to enter START**LeftMOVING 10LeftMOVING 01LeftMOVING **No moveHALT Tape Head

Turing Machine Operations Read symbol on current square Select action based on symbol & current state Accept string if in accept state Reject string if halts in non-accepting state Reject string if computation does not terminate Halting problem It is undecidable in general whether long-running computations will eventually accept

Computability A language is computable if it can be recognized by some algorithm with finite number of steps Church-Turing thesis Turing machine can recognize any language computable on any machine Intuition Turing machine captures essence of computing Program (finite state machine) + Memory (tape)