1 i206: Lecture 18: Regular Expressions Marti Hearst Spring 2012.

Slides:



Advertisements
Similar presentations
4b Lexical analysis Finite Automata
Advertisements

Theory Of Automata By Dr. MM Alam
8/25/2009 Sofya Raskhodnikova Intro to Theory of Computation L ECTURE 1 Theory of Computation Course information Overview of the area Finite Automata Sofya.
C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
Computer Organization Computer Architecture, Data Representation, Information Theory i206 Fall 2010 John Chuang Some slides adapted from Marti Hearst,
Computer Organization Boolean Logic and the CPU i206 Fall 2010 John Chuang Some slides adapted from Marti Hearst, Brian Hayes, or Glenn Brookshear.
Software Design Analysis of Algorithms i206 Fall 2010 John Chuang Some slides adapted from Glenn Brookshear, Marti Hearst, or Goodrich & Tamassia.
Finite Automata and Regular Expressions i206 Fall 2010 John Chuang Some slides adapted from Marti Hearst.
I206: Distributed Computing Applications & Infrastructure Fall 2010.
Computational Language Finite State Machines and Regular Expressions.
C SC 473 Automata, Grammars & Languages Automata, Grammars and Languages Discourse 01 Introduction.
1 Foundations of Software Design Lecture 1: Course Overview Intro to Binary and Boolean Marti Hearst SIMS, University of California at Berkeley.
Software Design i206 Fall 2010 John Chuang Some slides adapted from Glenn Brookshear, Brian Hayes, Marti Hearst, or James Landay.
Inter-Process Communication i206 Fall 2010 John Chuang Some slides adapted from Coulouris, Dollimore and Kindberg; Calvert and Donahoo.
Operating System & Memory Hierarchy i206 Fall 2010 John Chuang Some slides adapted from Glenn Brookshear, Brian Hayes, or Marti Hearst.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
1 Foundations of Software Design Lecture 22: Regular Expressions and Finite Automata Marti Hearst Fall 2002.
Course Review i206 Fall 2010 John Chuang. 2 Outline  Test 3 topics  Course review  Course evaluation.
Finite Automata Chapter 5. Formal Language Definitions Why need formal definitions of language –Define a precise, unambiguous and uniform interpretation.
Distributed Systems & Networks i206 Fall 2010 John Chuang Some slides adapted from Coulouris, Dollimore and Kindberg.
Networking and Internetworking i206 Fall 2010 John Chuang.
Chapter 1 The Big Picture. QUIZ 2 5 Explain the abstractions we normally apply when using the following systems: DVD player Registering for classes on.
Final Exam Review Instructor : Yuan Long CSC2010 Introduction to Computer Science Apr. 23, 2013.
By: Er. Sukhwinder kaur.  Computation Computation  Algorithm Algorithm  Objectives Objectives  What do we study in Theory of Computation ? What do.
1. Fundamentals of Computer Systems Define a computer system Computer Systems in the modern world Professional standards for computer systems Ethical,
Language Recognizer Connecting Type 3 languages and Finite State Automata Copyright © – Curt Hill.
Quiz # 2 Chapters 4, 5, & 6.
Finite-State Machines with No Output Longin Jan Latecki Temple University Based on Slides by Elsa L Gunter, NJIT, and by Costas Busch Costas Busch.
Finite-State Machines with No Output
CS/IT 138 THEORY OF COMPUTATION Chapter 1 Introduction to the Theory of Computation.
INTRODUCTION TO THE THEORY OF COMPUTATION INTRODUCTION MICHAEL SIPSER, SECOND EDITION 1.
March 1, 2009 Dr. Muhammed Al-mulhem 1 ICS 482 Natural Language Processing Regular Expression and Finite Automata Muhammed Al-Mulhem March 1, 2009.
Compsci Today’s topics l Binary Numbers  Brookshear l Slides from Prof. Marti Hearst of UC Berkeley SIMS l Upcoming  Networks Interactive.
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
1 i206: Lecture 2: Computer Architecture, Binary Encodings, and Data Representation Marti Hearst Spring 2012.
1 INFO 2950 Prof. Carla Gomes Module Modeling Computation: Language Recognition Rosen, Chapter 12.4.
1 i206: Lecture 9: Review; Intro to Algorithms Marti Hearst Spring 2012.
LING 388: Language and Computers Sandiway Fong Lecture 6: 9/15.
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture of a compiler PART II:
Compsci Today’s topics l Binary Numbers  Brookshear l Slides from Prof. Marti Hearst of UC Berkeley SIMS l Upcoming  Networks Interactive.
Corpus Linguistics- Practical utilities (Lecture 7) Albert Gatt.
2. Regular Expressions and Automata 2007 년 3 월 31 일 인공지능 연구실 이경택 Text: Speech and Language Processing Page.33 ~ 56.
1 i206: Lecture 4: The CPU, Instruction Sets, and How Computers Work Marti Hearst Spring 2012.
Deterministic Finite Automata CS 130: Theory of Computation HMU textbook, Chapter 2 (Sec 2.2)
Natural Language Processing Lecture 4 : Regular Expressions and Automata.
SM2220 – Class 06 Finite Automata. SM2220 – Class 06 Topic in theoretical computing. A subset of computation machines. Closely related to formal language.
Finite State Machines 1.Finite state machines with output 2.Finite state machines with no output 3.DFA 4.NDFA.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
1 i206: Lecture 3: Boolean Logic, Logic Circuits Marti Hearst Spring 2012.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
Mealy and Moore Machines Lecture 8 Overview Moore Machines Mealy Machines Sequential Circuits.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Computer Organisation
Lexical analysis Finite Automata
Compilers Welcome to a journey to CS419 Lecture5: Lexical Analysis:
Fundamentals & Ethics of Information Systems IS 201
University of Gujrat Department of Computer Science
Jaya Krishna, M.Tech, Assistant Professor
i206: Lecture 20: Memory Management; Operating Systems
Some slides by Elsa L Gunter, NJIT, and by Costas Busch
INTRODUCTION TO THE THEORY OF COMPUTATION
Finite Automata.
4b Lexical analysis Finite Automata
Regular Expressions
i206: Lecture 19: Regular Expressions, cont.
4b Lexical analysis Finite Automata
Chapter 6 Programming the basic computer
Teori Bahasa dan Automata Lecture 6: Regular Expression
Presentation transcript:

1 i206: Lecture 18: Regular Expressions Marti Hearst Spring 2012

2 Formal Languages Bits & Bytes Binary Numbers Number Systems Gates Boolean Logic Circuits CPU Machine Instructions Assembly Instructions Program Algorithms Application Memory Data compression Compiler/ Interpreter Operating System Data Structures Analysis I/O Memory hierarchy Design Methodologies/ Tools Process Truth table Venn Diagram DeMorgan ’ s Law Numbers, text, audio, video, image, … Decimal, Hexadecimal, Binary AND, OR, NOT, XOR, NAND, NOR, etc. Register, Cache Main Memory, Secondary Storage Op-code, operands Instruction set arch Lossless v. lossy Info entropy & Huffman code Adders, decoders, Memory latches, ALUs, etc. Data Representation Data storage Principles ALUs, Registers, Program Counter, Instruction Register Network Distributed Systems Security Cryptography Standards & Protocols Inter-process Communication Searching, sorting, etc. Stacks, queues, maps, trees, graphs, … Big-O Formal models Finite automata regex

3

4 Theories of Computation What are the fundamental capabilities and limitations of computers? –Complexity theory: what makes some problems computationally hard and others easy? –Automata theory: definitions and properties of mathematical models of computation –Computability theory: what makes some problems solvable and others unsolvable?

5 Finite Automata Also known as finite state automata (FSA) or finite state machine (FSM) A simple mathematical model of a computer Applications: hardware design, compiler design, text processing

6 A First Example Touch-less hand-dryer DRYER OFF DRYER ON

7 A First Example Touch-less hand-dryer hand DRYER OFF DRYER ON hand

8 Finite Automata State Graphs The start state An accepting state A transition x A state

9 Finite Automata Transition:s 1  x s 2 –In state s 1 on input “ x ” go to state s 2 If end of input –If in accepting state => accept –Else => reject If no transition possible => reject

10 Language of a FA Language of finite automaton M: set of all strings accepted by M Example: Which of the following are in the language? –x, tmp2, 123, a?, 2apples A language is called a regular language if it is recognized by some finite automaton letter letter | digit S A

11 Example What is the language of this FA? + digit S B A -

12 Regular Expressions Regular expressions are used to describe regular languages Arithmetic expression example: (8+2)*3 Regular expression example: (\+|-)?[0-9]+

13 Example What is the language of this FA? Regular expression: (\+|-)?[0-9]+ + digit S B A -

14 Three Equivalent Representations Finite automata Regular expressions Regular languages Each can describe the others Theorem: For every regular expression, there is a deterministic finite-state automaton that defines the same language, and vice versa. Adapted from Jurafsky & Martin 2000

15 Regex Rules ? Zero or one occurrences of the preceding character/regex woodchucks? “ how much wood does a woodchuck chuck? ” behaviou?r “ behaviour is the British spelling of behavior ” * Zero or more occurrences of the preceding character/regex baa* ba, baa, baaa, baaaa … ba* b, ba, baa, baaa, baaaa … [ab]* , a, b, ab, ba, baaa, aaabbb, … [0-9][0-9]* any positive integer, or zero cat.*cat A string where “ cat ” appears twice anywhere + One or more occurrences of the preceding character/regex ba+ ba, baa, baaa, baaaa … {n} Exactly n occurrences of the preceding character/regex ba{3}baaa

16 Regex Rules * is greedy: Home Lazy (non-greedy) quantifier: Home

17 Regex Rules [] Disjunction (Union) [wW]ood “ how much wood does a Woodchuck chuck? ” [abcd]* “ you are a good programmer ” [A-Za-z0-9] (any letter or digit) [A-Za-z]* (any letter sequence) | Disjunction (Union) (cats?|dogs?)+ “ It ’ s raining cats and a dog. ” ( ) Grouping (gupp(y|ies))* “ His guppy is the king of guppies. ”

18 Regex Rules ^ $ \b Anchors (start/end of input string; word boundary) ^The “ The cat in the hat. ” ^The end\.$ “ The end. ” ^The.* end\.$ “ The bitter end. ” (the)* “ I saw him the other day. ” (\bthe\b)* “ I saw him the other day. ” Special rule: when ^ is FIRST WITHIN BRACKETS it means NOT [^A-Z]* (anything not an upper case letter)

19 Regex Rules \ Escape characters \. “ The + and \ characters are missing. ” \+ “ The + and \ characters are missing. ” \\ “ The + and \ characters are missing. ”

20 Operator Precedence What is the difference? –[a-z][a-z]|[0-9]* –[a-z]([a-z]|[0-9])* OperatorPrecedence ()highest * + ? {} sequences, anchors |lowest

21 Regex for Dollars No commas \$[0-9]+(\.[0-9][0-9])? With commas \$[0-9][0-9]?[0-9]?(,[0-9][0-9][0-9])*(\.[0-9][0-9])? With or without commas \$[0-9][0-9]?[0-9]?((,[0-9][0-9][0-9])*| [0-9]*) (\.[0-9][0-9])?

22 Regex in Python import re result = re.search(pattern, string) result = re.findall(pattern, string) result = re.match(pattern, string) Python documentation on regular expressions – –Some useful flags like IGNORECASE, MULTILINE, DOTALL, VERBOSE