CSA3050: Natural Language Algorithms Finite State Devices

Sources
Blackburn & Striegnitz, Ch. 2

Parsers vs. Recognisers
Recognizers tell us whether a given input is accepted by some finite state automaton. Often we would like to have an explanation of why it was accepted. Parsers give us that kind of explanation. What form does it take?

Finite State Parser
The output of a finite state parser is a sequence of nodes and arcs. If we gave the input [h,a,h,a,!] to a parser for our first laughing automaton, it should give us [1,h,2,a,3,h,2,a,3,!,4]. The technique in Prolog for turning a recognizer into a parser is to add one or more extra arguments to keep track of the structure that was found.
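The laughing automaton itself is not reproduced on this slide; a minimal reconstruction, with the states and arcs read off the parse [1,h,2,a,3,h,2,a,3,!,4], would be:

% Reconstructed "laughing" FSA (inferred from the parse above).
initial(1).
final(4).
arc(1,2,h).      % 1 --h--> 2
arc(2,3,a).      % 2 --a--> 3
arc(3,2,h).      % 3 --h--> 2, so "ha" can repeat
arc(3,4,'!').    % 3 --!--> 4, laughter ends with !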

Base Case
Recogniser:
recognize1(Node, []) :- final(Node).
Parser:
parse1(Node, [], [Node]) :- final(Node).

Recursive Case
Recogniser:
recognize1(Node1, String) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    recognize1(Node2, NewString).
Parser:
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).
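The helper traverse1/3 is not spelled out on this slide; assuming single-symbol arc labels, it simply consumes one input symbol when it matches the label, and a top-level driver starts at an initial state (both assumed here, in the style of the testparse1 predicate shown later):

traverse1(Label, [Label|Symbols], Symbols).

testrecognize1(Symbols) :-
    initial(Node),
    recognize1(Node, Symbols).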

Complex Labels
So far we have only considered transitions with single-character labels. More complex labels are possible, e.g. symbols comprising several characters. We can construct an FSA recognising English noun phrases built from words such as: the, a, wizard, witch, broomstick, hermione, harry, ron, with, fast.

FSA for Noun Phrases
(transition diagram of the noun-phrase automaton; figure not reproduced in this transcript)

FSA for NPs in Prolog
initial(1).
final(3).
arc(1,2,a).
arc(1,2,the).
arc(2,2,brave).
arc(2,2,fast).
arc(2,3,witch).
arc(2,3,wizard).
arc(2,3,broomstick).
arc(2,3,rat).
arc(1,3,harry).
arc(1,3,ron).
arc(1,3,hermione).
arc(3,1,with).

Parsing a Noun Phrase
testparse1(Symbols, Parse) :-
    initial(Node),
    parse1(Node, Symbols, Parse).

?- testparse1([the,fast,wizard], Z).
Z = [1, the, 2, fast, 2, wizard, 3]
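The arc from state 3 back to state 1 lets the automaton loop through a postmodifying prepositional phrase. For example (a query of my own, with the output worked out from the FSA above):

?- testparse1([the,fast,wizard,with,the,rat], Z).
Z = [1, the, 2, fast, 2, wizard, 3, with, 1, the, 2, rat, 3]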

Rewriting Categories
It is also possible to obtain a more abstract parse, e.g.
?- testparse2([the,fast,wizard], Z).
Z = [1, det, 2, adj, 2, cn, 3]
What changes are required to obtain this behaviour?

1. Changes to the FSA
% FSA
initial(1).
final(3).
arc(1,2,det).
arc(2,2,adj).
arc(2,3,cn).
arc(1,3,pn).
arc(3,1,prep).

% Lexicon
lex(a,det).
lex(the,det).
lex(fast,adj).
lex(brave,adj).
lex(witch,cn).
lex(wizard,cn).
lex(broomstick,cn).
lex(rat,cn).
lex(harry,pn).
lex(hermione,pn).
lex(ron,pn).
lex(with,prep).

2. Changes to the Parser
Parse1:
parse1(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse1(Label, String, NewString),
    parse1(Node2, NewString, Path).

Parse2:
parse2(Node1, String, [Node1,Label|Path]) :-
    arc(Node1, Node2, Label),
    traverse2(Label, String, NewString),
    parse2(Node2, NewString, Path).

traverse2(Label, [Symbol|Symbols], Symbols) :-
    lex(Symbol, Label).
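Together with a base case parse2(Node, [], [Node]) :- final(Node). and a driver analogous to testparse1 (both assumed here, not shown on the slide), this produces the category-level parse:

testparse2(Symbols, Parse) :-
    initial(Node),
    parse2(Node, Symbols, Parse).

?- testparse2([the,fast,wizard], Z).
Z = [1, det, 2, adj, 2, cn, 3]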

Handling Jumps
traverse3('#', String, String).
traverse3(Cat, [Word|Words], Words) :-
    lex(Word, Cat).
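As an illustration (my own, not on the original slide), a '#' arc added to the noun-phrase automaton makes the determiner optional; with parse3 defined as parse2 but calling traverse3, the bare phrase [fast,wizard] is then accepted:

arc(1,2,'#').    % jump: skip the determiner position

?- testparse3([fast,wizard], Z).
Z = [1, '#', 2, adj, 2, cn, 3]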

Finite State Transducers
A finite state transducer is essentially a finite state automaton that works on two (or more) tapes. The most common way to think about a transducer is as a kind of "translating machine": it works by reading from one tape and writing onto the other.

A Translator from a to b
(diagram: a single state, both initial and final, with an a:b loop)
initial state: marked with an arrowhead
final state: marked with a double circle
a:b: read a from the first tape and write b to the second tape

Prolog Representation
:- op(250, xfx, :).
initial(1).
final(1).
arc(1,1,a:b).

Modes of Operation
Generation mode: the machine writes a string of a's on one tape and a string of b's on the other tape. Both strings have the same length.
Recognition mode: it accepts when the word on the first tape consists of exactly as many a's as the word on the second tape consists of b's.
Translation mode (left to right): it reads a's from the first tape and writes a b onto the second tape for every a that it reads.
Translation mode (right to left): it reads b's from the second tape and writes an a onto the first tape for every b that it reads.

Transducers and Jumps
Transducers can make jumps, going from one state to another without doing anything on one or both of the tapes. So transitions of the form a:#, #:a or #:# are possible.

Simple Transducer in Prolog
transduce1(Node, [], []) :- final(Node).
transduce1(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse1(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce1(Node2, NewTape1, NewTape2).

Traverse for FST
traverse1(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).

testtrans1(Tape1, Tape2) :-
    initial(Node),
    transduce1(Node, Tape1, Tape2).
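With the a:b machine from the Prolog Representation slide, the four modes of operation correspond to different instantiation patterns of the same call (example queries of my own):

?- testtrans1([a,a,a], [b,b,b]).      % recognition: both tapes given
yes
?- testtrans1([a,a], Tape2).          % translation, left to right
Tape2 = [b,b]
?- testtrans1(Tape1, [b,b,b]).        % translation, right to left
Tape1 = [a,a,a]
?- testtrans1(Tape1, Tape2).          % generation: enumerates pairs
Tape1 = [], Tape2 = [] ;
Tape1 = [a], Tape2 = [b] ; ...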

Handling Jumps: 4 Cases
Jump on both tapes.
Jump on the first but not the second tape.
Jump on the second but not the first tape.
Jump on neither tape (this is what traverse1 does).

4 Corresponding Clauses
traverse2('#':'#', Tape1, Tape1, Tape2, Tape2).
traverse2('#':L2, Tape1, Tape1, [L2|RestTape2], RestTape2).
traverse2(L1:'#', [L1|RestTape1], RestTape1, Tape2, Tape2).
traverse2(L1:L2, [L1|RestTape1], RestTape1, [L2|RestTape2], RestTape2).
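The transducer itself only needs its traversal step replaced; a transduce2 analogous to transduce1 (assumed, not shown on the slide) would read:

transduce2(Node, [], []) :- final(Node).
transduce2(Node1, Tape1, Tape2) :-
    arc(Node1, Node2, Label),
    traverse2(Label, Tape1, NewTape1, Tape2, NewTape2),
    transduce2(Node2, NewTape1, NewTape2).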

Morphological Analysis with FSTs
Morphology is concerned with the internal structure of words:
– How can a word be decomposed into morphemes?
– How do the morphemes combine?
– What are legitimate combinations?
Basic idea is to write FSTs that map the surface form of a word to a description of the morphemes that constitute that word, or vice versa. Example: wizard+s to wizard+PL, or kiss+ed to kiss+PAST.

Plural Nouns in English
Regular forms
– add -s, as in wizard+s
– add -es, as in witch+es
Handled with morpho-phonological rules that insert an e whenever the morpheme preceding the s ends in s, x, ch or another sibilant.
Irregular forms
– mouse/mice
– automaton/automata
Handled on a case-by-case basis.
We require a transducer that translates wizard+s into wizard+PL, witch+es into witch+PL, mice into mouse+PL and automata into automaton+PL.
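As a rough illustration of the e-insertion rule (my own sketch, not part of the slides, and stated over spellings rather than as a transducer), the condition can be tested on the final characters of the stem:

% Decide whether Stem takes -es rather than plain -s.
needs_e(Stem) :-
    atom_chars(Stem, Chars),
    (   append(_, [s], Chars)
    ;   append(_, [x], Chars)
    ;   append(_, [z], Chars)
    ;   append(_, [c,h], Chars)
    ;   append(_, [s,h], Chars)
    ).

plural(Stem, Plural) :-
    (   needs_e(Stem) -> atom_concat(Stem, es, Plural)
    ;   atom_concat(Stem, s, Plural)
    ).

% ?- plural(witch, P).    P = witches
% ?- plural(wizard, P).   P = wizards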

FST for English Plurals
(transition diagram of the plural transducer; figure not reproduced in this transcript)

FST in Prolog
lex(wizard:wizard, 'STEM-REG1').
lex(witch:witch, 'STEM-REG2').
lex(automaton:automaton, 'IRREG-SG').
lex(automata:'automaton-PL', 'IRREG-PL').
lex(mouse:mouse, 'IRREG-SG').
lex(mice:'mouse-PL', 'IRREG-PL').
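Given these facts, an irregular plural can already be mapped to its underlying description by direct lookup, for instance:

?- lex(mice:Underlying, Class).
Underlying = 'mouse-PL',
Class = 'IRREG-PL'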