CSA3050: Natural Language Algorithms. Words and Finite State Machinery (Natural Language Processing, October 2007)



Acknowledgement
Material derived or copied from:
– Jurafsky and Martin, Speech and Language Processing, Prentice Hall, 2000
– Richard Sproat, lecture notes

Outline
– Words
– Regular Languages
– Regular Expressions
– Finite State Automata

What is a Word?
– A series of speech sounds that symbolizes meaning without being divisible into smaller units.
– Any segment of written or printed discourse ordinarily appearing between spaces or between a space and a punctuation mark.
– A set of linguistic forms produced by combining a single base with various inflectional elements without change in the part-of-speech elements.
– The smallest meaningful element of language. When written, it stands alone with a space on either side of it.

Information Associated with Words
Spelling
– orthographic
– phonological
Syntax
– POS (part of speech)
– valency
Semantics
– meaning
– relationship to other words

Properties of Words
Sequence
– characters, e.g. "pollution"
– phonemes
Delimitation
– whitespace
– other?
Structure
– simple ("atomic") words
– complex ("molecular") words

Complex Words
Complex words have subparts, e.g. "enlargement" = en + large + ment.
– Some subparts are valid words: large.
– Others are prefixes and suffixes: en, ment.
N.B. The complex word can be built in different ways:
– (en + large) + ment
– en + (large + ment)

Morphological Processes
Affixation
– prefix
– suffix
– circumfix: għandi - mgħandix
– infix: phenidine - phenetidine
Other morphological processes
– redoubling (mexa; mexxa)
– vowel change (swim; swam)

Complex Words Formed by Concatenation
prefixes + roots + suffixes
– prefixes: dis, re, un, en
– roots: large, charge, infect, code, decide
– suffixes: ed, ing, ee, er, ly
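
The concatenation scheme on this slide can be sketched in a few lines of Python. This is only an illustration: the empty string standing for "no prefix" or "no suffix", and the function name `concatenations`, are assumptions added here. Plain concatenation ignores spelling changes and produces many non-words, which is one reason a more careful model is needed.

```python
from itertools import product

# Affix and root lists taken from the slide; "" (no affix) is an added assumption.
PREFIXES = ["", "dis", "re", "un", "en"]
ROOTS = ["large", "charge", "infect", "code", "decide"]
SUFFIXES = ["", "ed", "ing", "ee", "er", "ly"]


def concatenations(prefixes, roots, suffixes):
    """Return every prefix + root + suffix string, in generation order."""
    return [p + r + s for p, r, s in product(prefixes, roots, suffixes)]


candidates = concatenations(PREFIXES, ROOTS, SUFFIXES)

# Real words such as "enlarge" and "recharge" are generated ...
assert "enlarge" in candidates
assert "recharge" in candidates
# ... but so are misspellings and non-words, because nothing constrains the combinations.
assert "dischargeed" in candidates
assert "uncodeee" in candidates
```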

The Language of Words
What kind of formal language is the language of words? One which can be constructed out of:
– a characteristic set of basic symbols (alphabet)
– a characteristic set of combining operations: union (disjunction), concatenation, iteration
That is, a regular language (regular set).

Outline
– Words
– Regular Languages
– Regular Expressions
– Finite State Automata

Regular Languages
A regular language is a language over a finite alphabet that can be built from finite sets of strings using one or more of the following operations:
– set union
– concatenation
– Kleene star (iteration: zero or more repetitions)
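
As a rough illustration of the three operations, languages can be modelled as Python sets of strings. The function names are hypothetical, and because a Python set must be finite, `star` here is only a bounded approximation of the true Kleene star (it stops after `max_reps` repetitions).

```python
def union(a, b):
    """Set union of two languages."""
    return a | b


def concat(a, b):
    """Every string of a followed by every string of b."""
    return {x + y for x in a for y in b}


def star(a, max_reps):
    """Bounded approximation of Kleene star: 0 to max_reps repetitions of a."""
    result = {""}          # zero repetitions gives the empty string
    layer = {""}
    for _ in range(max_reps):
        layer = concat(layer, a)
        result |= layer
    return result


A = {"a"}
B = {"b"}

# Zero to two a's followed by zero to two b's.
lang = concat(star(A, 2), star(B, 2))
assert lang == {"", "a", "aa", "b", "bb", "ab", "abb", "aab", "aabb"}
assert union(A, B) == {"a", "b"}
```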

Some things that are regular languages
– Zero or more a’s followed by zero or more b’s
– The set of words in an English dictionary
– Dates
– URLs
– English?

Some things that are not regular languages
– Zero or more a’s followed by exactly the same number of b’s
– The set of all English palindromes (e.g. Madam I'm Adam)
– The set that includes all sentences of the form:
  the cat slept
  the cat the dog bit slept
  the cat the dog the man fed bit slept
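
The contrast between the first regular and the first non-regular example can be made concrete. In the sketch below (function names are assumptions), a*b* is checked with an ordinary regular expression, whereas the "same number of b's" language is checked by comparing lengths, which amounts to unbounded counting that no finite set of states can do.

```python
import re


def in_a_star_b_star(s):
    """Regular: zero or more a's followed by zero or more b's."""
    return re.fullmatch(r"a*b*", s) is not None


def in_an_bn(s):
    """Not regular: n a's followed by exactly n b's (checked by counting)."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n


assert in_a_star_b_star("aaab")
assert not in_an_bn("aaab")      # three a's but only one b
assert in_an_bn("aaabbb")
assert not in_a_star_b_star("aba")
```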

Some special regular languages
– The universal language (Σ*)
– The empty language (Ø)
Note: the empty language contains no strings at all. It is not the same as the empty string, nor as the language {ε} that contains only the empty string.

Some closure properties of regular languages
– Intersection
– Complementation
– Difference
– Reversal
– Power
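
Two of these closure properties can be sketched with deterministic automata, which the slides introduce later. The representation below (a tuple of states, alphabet, transition dictionary, start state and final states) and the two example automata are assumptions made for illustration. Complementation swaps final and non-final states of a complete DFA; intersection runs two DFAs in parallel on pairs of states. Difference then follows as intersection with a complement.

```python
from itertools import product


def accepts(dfa, word):
    """Run a complete DFA over a word drawn from its alphabet."""
    states, alphabet, delta, start, finals = dfa
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in finals


def complement(dfa):
    """Swap final and non-final states (requires a complete DFA)."""
    states, alphabet, delta, start, finals = dfa
    return (states, alphabet, delta, start, states - finals)


def intersection(d1, d2):
    """Product construction: track a pair of states, one per automaton."""
    s1, alphabet, t1, q1, f1 = d1
    s2, _, t2, q2, f2 = d2
    states = set(product(s1, s2))
    delta = {((p, q), a): (t1[(p, a)], t2[(q, a)])
             for (p, q) in states for a in alphabet}
    finals = {(p, q) for (p, q) in states if p in f1 and q in f2}
    return (states, alphabet, delta, (q1, q2), finals)


ALPHABET = {"a", "b"}
# Hypothetical example 1: strings with an even number of a's.
EVEN_A = ({0, 1}, ALPHABET,
          {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
# Hypothetical example 2: strings ending in b.
ENDS_B = ({0, 1}, ALPHABET,
          {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})

both = intersection(EVEN_A, ENDS_B)
odd_a = complement(EVEN_A)

assert accepts(both, "aab")
assert not accepts(both, "ab")
assert accepts(odd_a, "ab")
```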

Characterising Classes of Sets
[Diagram relating three views: class of sets or languages, notation, machine]

Outline
– Words
– Regular Languages
– Regular Expressions
– Finite Automata

Regular Expressions
– Notation for describing regular sets
– Used extensively in the Unix operating system (grep, sed, etc.) and also in some Microsoft products (Word)
– Xerox Finite State tools use a somewhat different notation with a similar function

Regular Expressions
a        a simple symbol
A B      concatenation
A | B    alternation operator
A & B    intersection operator
A*       Kleene star
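
For comparison, here is how the same operators look in Python's standard `re` module, which is not the notation used on the slide. Python's `re` has no intersection operator, so the helper `matches_both` (a hypothetical name) emulates `A & B` by requiring a full match against both patterns.

```python
import re


def matches(pattern, s):
    """True if the whole string s is described by the pattern."""
    return re.fullmatch(pattern, s) is not None


def matches_both(p1, p2, s):
    """Emulate intersection: s must be in both languages."""
    return matches(p1, s) and matches(p2, s)


assert matches(r"a", "a")            # a simple symbol
assert matches(r"ab", "ab")          # concatenation
assert matches(r"a|b", "b")          # alternation
assert matches(r"(ab)*", "ababab")   # Kleene star
assert matches(r"(ab)*", "")         # star includes zero repetitions

# Intersection of a*b* and (ab)*: only "" and "ab" are in both.
assert matches_both(r"a*b*", r"(ab)*", "ab")
assert not matches_both(r"a*b*", r"(ab)*", "aabb")
```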

Characterising Classes of Sets
[Diagram relating three views: class of sets or languages, notation, machine]

Outline
– Words
– Regular Languages
– Regular Expressions
– Finite Automata

Finite Automaton
A finite automaton is a quintuple (Q, Σ, q0, F, δ) where:
– Q is a finite set of states
– Σ is an alphabet of symbols
– q0 ∈ Q is a start state
– F ⊆ Q is the set of final states
– δ is a transition relation δ(q, σ, q') between a state q ∈ Q, a symbol σ ∈ Σ and a state q' ∈ Q
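
The quintuple maps directly onto a small data structure. The sketch below is one possible encoding, not the course's own code: the class name, field names and the example automaton (which accepts strings such as "baa!" and "baaa!") are assumptions. Because δ is a relation rather than a function, the recogniser tracks a set of current states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiniteAutomaton:
    states: frozenset      # Q
    alphabet: frozenset    # Σ
    start: str             # q0
    finals: frozenset      # F
    delta: frozenset       # transition relation: set of (q, σ, q') triples

    def step(self, current, symbol):
        """All states reachable from any state in `current` on `symbol`."""
        return {q2 for (q1, s, q2) in self.delta
                if q1 in current and s == symbol}

    def accepts(self, word):
        """True if some path for `word` ends in a final state."""
        current = {self.start}
        for symbol in word:
            current = self.step(current, symbol)
        return bool(current & self.finals)


# Hypothetical example: b, then at least two a's, then "!".
sheep = FiniteAutomaton(
    states=frozenset({"q0", "q1", "q2", "q3", "q4"}),
    alphabet=frozenset({"b", "a", "!"}),
    start="q0",
    finals=frozenset({"q4"}),
    delta=frozenset({
        ("q0", "b", "q1"),
        ("q1", "a", "q2"),
        ("q2", "a", "q3"),
        ("q3", "a", "q3"),
        ("q3", "!", "q4"),
    }),
)

assert sheep.accepts("baa!")
assert sheep.accepts("baaaa!")
assert not sheep.accepts("ba!")
```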

Representation of FSAs: State Diagram

State Table

Mr. Kleene

Kleene’s theorem
The languages accepted by NFAs are exactly the languages described by regular expressions.
– Kleene’s Theorem, part 1: to each regular expression there corresponds an NFA.
– Kleene’s Theorem, part 2: to each NFA there corresponds a regular expression.

Converting a Regular Expression to an NFA
– The NFA representing the empty string is: 1 --ε--> 2
– The NFA representing a single character a is: 1 --a--> 2

Regular Expression to NFA
[Diagram from Leonidas Fegaras, Univ. Texas]
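
The diagrams for this construction are not reproduced here, so the following is a sketch of the standard Thompson-style construction rather than a copy of the slides. It starts from the two base cases above and adds ε-transitions to combine fragments. All names (`Fragment`, `symbol`, `concat`, `union`, `star`, `accepts`) are assumptions, and each fragment is assumed to be used only once when building a larger one.

```python
import itertools

_ids = itertools.count()   # source of fresh state numbers
EPS = None                 # label used for ε-transitions


class Fragment:
    """An NFA fragment: one start state, one end state, a list of edges."""
    def __init__(self, start, end, edges):
        self.start, self.end, self.edges = start, end, edges


def symbol(label):
    """Base case: start --label--> end (pass EPS for the empty string)."""
    s, e = next(_ids), next(_ids)
    return Fragment(s, e, [(s, label, e)])


def concat(f, g):
    """f followed by g: link f's end to g's start with ε."""
    return Fragment(f.start, g.end, f.edges + g.edges + [(f.end, EPS, g.start)])


def union(f, g):
    """f or g: a new start branches to both, both ends join a new end."""
    s, e = next(_ids), next(_ids)
    extra = [(s, EPS, f.start), (s, EPS, g.start), (f.end, EPS, e), (g.end, EPS, e)]
    return Fragment(s, e, f.edges + g.edges + extra)


def star(f):
    """Zero or more f: allow skipping f and looping back to its start."""
    s, e = next(_ids), next(_ids)
    extra = [(s, EPS, f.start), (f.end, EPS, e), (s, EPS, e), (f.end, EPS, f.start)]
    return Fragment(s, e, f.edges + extra)


def eps_closure(states, edges):
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for (p, label, r) in edges:
            if p == q and label is EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts(frag, word):
    """Simulate the NFA by tracking the set of possible states."""
    current = eps_closure({frag.start}, frag.edges)
    for ch in word:
        moved = {r for (p, label, r) in frag.edges if p in current and label == ch}
        current = eps_closure(moved, frag.edges)
    return frag.end in current


# NFA for the regular expression (a|b)* a
nfa = concat(star(union(symbol("a"), symbol("b"))), symbol("a"))
assert accepts(nfa, "bba")
assert not accepts(nfa, "ab")
```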


Deterministic Finite Automata
– In a deterministic finite automaton (DFA), every state/symbol pair maps to a unique state.
– In other words, δ is a function.
– Why do we care about DFAs? Efficiency: recognising an input takes one transition per symbol, with no alternatives to search.
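
Because δ is a function, a DFA can be run with a single current state and one dictionary lookup per input symbol. The sketch below assumes a dictionary keyed by (state, symbol) pairs and treats a missing entry as rejection; the example automaton, which accepts binary strings ending in "01", is a hypothetical illustration.

```python
def run_dfa(delta, start, finals, word):
    """Follow exactly one transition per symbol; reject if none is defined."""
    state = start
    for symbol in word:
        key = (state, symbol)
        if key not in delta:
            return False
        state = delta[key]
    return state in finals


# Hypothetical DFA: binary strings that end in "01".
DELTA = {
    ("s0", "0"): "s1", ("s0", "1"): "s0",
    ("s1", "0"): "s1", ("s1", "1"): "s2",
    ("s2", "0"): "s1", ("s2", "1"): "s0",
}

assert run_dfa(DELTA, "s0", {"s2"}, "1101")
assert not run_dfa(DELTA, "s0", {"s2"}, "010")
```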

Equivalence of NFAs and DFAs

Subset Construction for Determinisation
– States which are connected by an ε transition will be represented by the same state in the DFA.
– If there are multiple transitions on the same symbol, we can regard a transition as moving from a state to a set of states (i.e. the union of all the states reachable by a transition on the current symbol). These states are combined into a single DFA state.
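
The two rules above can be sketched as follows. This is a minimal version under stated assumptions: the NFA is given as a list of (state, label, state) edges with `None` standing for ε, DFA states are frozensets of NFA states, and empty target sets are simply left undefined instead of adding an explicit dead state. The function names and the example NFA for (a|b)*ab are hypothetical.

```python
from collections import deque

EPS = None  # label used for ε-transitions


def eps_closure(states, edges):
    """Rule 1: states linked by ε belong to the same DFA state."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for (p, label, r) in edges:
            if p == q and label is EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def determinize(start, finals, edges, alphabet):
    """Return (start_set, final_sets, delta) of an equivalent DFA."""
    start_set = frozenset(eps_closure({start}, edges))
    delta, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        for a in sorted(alphabet):
            # Rule 2: union of all states reachable on the same symbol.
            moved = {r for (p, label, r) in edges if p in current and label == a}
            target = frozenset(eps_closure(moved, edges))
            if not target:
                continue  # no transition: leave the DFA partial
            delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    final_sets = {s for s in seen if s & finals}
    return start_set, final_sets, delta


def run_dfa(delta, start, finals, word):
    """Run the resulting DFA: one lookup per symbol."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in finals


# Hypothetical NFA for (a|b)*ab, with one ε-transition from state 0 to state 1.
EDGES = [(0, EPS, 1), (1, "a", 1), (1, "b", 1), (1, "a", 2), (2, "b", 3)]
d_start, d_finals, d_delta = determinize(0, {3}, EDGES, {"a", "b"})

assert d_start == frozenset({0, 1})
assert d_finals == {frozenset({1, 3})}
assert run_dfa(d_delta, d_start, d_finals, "aab")
assert not run_dfa(d_delta, d_start, d_finals, "ba")
```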

Subset construction for determinization
[Worked example spanning six slides; the figures are not included in this transcript]