FST Morphology Miriam Butt October 2002 Based on Beesley and Karttunen 2002

Recap Last Time: Finite State Automata can model almost anything that involves a finite number of states. We modeled a Coke machine and saw that it could also be thought of as defining a language. We will now look at the extension to natural language more closely.

A One-Word Language: network for the language {"canto"} [network diagram].

A Three-Word Language: network for the language {"canto", "tigre", "mesa"} [network diagram].

Analysis: A Successful Match. Input: mesa, matched against the network for {"canto", "tigre", "mesa"} [network diagram showing the path m-e-s-a to a final state].

Rejects: The analysis of libro, tigra, cant and mesas will fail. Why?
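To make the accept/reject behaviour concrete, here is a minimal sketch in Python (a toy illustration, not the Xerox tools): the three-word network is stored as a trie of states, and an input is accepted only if every symbol finds a matching arc and the path ends in a final state.

```python
# A toy finite-state recognizer for the language {"canto", "tigre", "mesa"}.
# Each state is a dict from input symbol to next state; "#" marks finality.

def build_fsa(words):
    start = {}
    for word in words:
        state = start
        for ch in word:
            state = state.setdefault(ch, {})
        state["#"] = True          # mark the end of the word as a final state
    return start

def accepts(fsa, s):
    state = fsa
    for ch in s:
        if ch not in state:        # no matching arc: reject immediately
            return False
        state = state[ch]
    return state.get("#", False)   # accept only if we stopped in a final state

FSA = build_fsa(["canto", "tigre", "mesa"])
print(accepts(FSA, "mesa"))    # True
print(accepts(FSA, "libro"))   # False: no arc for 'l' from the start state
print(accepts(FSA, "tigra"))   # False: no arc for 'a' after "tigr"
print(accepts(FSA, "cant"))    # False: stops in a non-final state
print(accepts(FSA, "mesas"))   # False: extra symbol after the final state
```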

Transducers: Beyond Accept and Reject [two-level network diagram: canto, tigre and mesa on the upper side paired with canto, tigre and mesa on the lower side].

Analysis Process: Start at the start state. Match the input symbols of the string against the lower-side symbols on the arcs, consuming the input symbols and finding a path to a final state. If successful, return the string of upper-side symbols on the path as the result. If unsuccessful, return nothing.

A Two-Level Transducer. Input: mesa, Output: mesa [two-level network diagram for canto, tigre, mesa].

A Lexical Transducer. Input: canto, Output: cantar+Verb+PresInd+1P+Sg. One possible path through the network: the upper side c a n t a r +Verb +PresInd +1P +Sg is paired with the lower side c a n t 0 0 0 0 0 o.

A Lexical Transducer. Input: canto, Output: canto+Noun+Masc+Sg. Another possible path through the network: the upper side c a n t o +Noun +Masc +Sg is paired with the lower side c a n t o 0 0 0.
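A minimal Python sketch of this lookup procedure, assuming the transducer has already been expanded into its paths of upper:lower symbol pairs (a toy illustration, not the Xerox lookup code; the tag names follow the slides, and 0 stands for the epsilon symbol):

```python
# Analysis ("lookup"): match the input against the lower-side symbols and
# return the upper-side symbols. The two paths are the ones shown on the
# slides for the surface form canto.

PATHS = [
    # cantar+Verb+PresInd+1P+Sg : cant00000o
    [("c", "c"), ("a", "a"), ("n", "n"), ("t", "t"),
     ("a", "0"), ("r", "0"),
     ("+Verb", "0"), ("+PresInd", "0"), ("+1P", "0"), ("+Sg", "o")],
    # canto+Noun+Masc+Sg : canto000
    [("c", "c"), ("a", "a"), ("n", "n"), ("t", "t"), ("o", "o"),
     ("+Noun", "0"), ("+Masc", "0"), ("+Sg", "0")],
]

def lookup(paths, surface):
    """Return every upper-side string whose lower side matches the input."""
    analyses = []
    for path in paths:
        lower = "".join(low for _, low in path if low != "0")   # drop epsilons
        if lower == surface:                    # the whole input was consumed
            analyses.append("".join(up for up, _ in path if up != "0"))
    return analyses

print(lookup(PATHS, "canto"))
# ['cantar+Verb+PresInd+1P+Sg', 'canto+Noun+Masc+Sg']
print(lookup(PATHS, "canta"))   # []  -- no path matches, so nothing is returned
```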

The Tags. Tags or symbols like +Noun or +Verb are arbitrary: the naming convention is determined by the (computational) linguist and depends on the larger picture (type of theory/type of application). One very successful tagging/naming convention is the Penn Treebank tag set.

The Tags. What kinds of tags might be useful?

Generation vs. Analysis: The same finite-state transducers we have been using for the analysis of a given surface string can also be used in reverse: for generation. The XRCE people think of analysis as lookup and of generation as lookdown.

Generation --- Lookdown. Input: cantar+Verb+PresInd+1P+Sg, Output: canto [the same path through the lexical transducer, now read from the upper side to the lower side].

Generation --- Lookdown. Generation Process: Start at the start state and the beginning of the input string. Match the input symbols of the string against the upper-side symbols on the arcs, consuming the input symbols and finding a path to a final state. If successful, return the string of lower-side symbols on the path as the result. If generation is unsuccessful, return nothing.
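The same idea in reverse, again as a toy Python sketch over pre-expanded paths (not the actual XRCE lookdown code):

```python
# Generation ("lookdown"): the same cantar path as in the analysis sketch,
# but now the input is matched against the UPPER-side symbols and the
# LOWER-side symbols are returned. "0" is again the epsilon symbol.

CANTAR_PATH = [
    ("c", "c"), ("a", "a"), ("n", "n"), ("t", "t"),
    ("a", "0"), ("r", "0"),
    ("+Verb", "0"), ("+PresInd", "0"), ("+1P", "0"), ("+Sg", "o"),
]

def lookdown(paths, lexical):
    """Return every lower-side (surface) string whose upper side matches the input."""
    surfaces = []
    for path in paths:
        upper = [up for up, _ in path if up != "0"]
        if upper == lexical:
            surfaces.append("".join(low for _, low in path if low != "0"))
    return surfaces

lexical_input = ["c", "a", "n", "t", "a", "r", "+Verb", "+PresInd", "+1P", "+Sg"]
print(lookdown([CANTAR_PATH], lexical_input))   # ['canto']
```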

Concatenation: One can also concatenate two existing languages (finite-state networks) with one another to build up new words productively/dynamically. This works nicely, but one has to write extra rules to avoid things like *trys and *tryed, though trying is okay.

Concatenation. Network for the language {"work"}; network for the language {"s", "ed", "ing"} [two network diagrams].

Concatenation of the two networks [network diagram]. What strings/language does this result in?
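On finite languages, concatenation can be sketched directly as a set operation in Python (a toy illustration; the real networks are concatenated arc by arc, but the resulting language is the same):

```python
# Concatenation of two finite languages (sets of strings): every string of the
# first language followed by every string of the second.

def concatenate(lang1, lang2):
    return {w1 + w2 for w1 in lang1 for w2 in lang2}

print(sorted(concatenate({"work"}, {"s", "ed", "ing"})))
# ['worked', 'working', 'works']

# Blind concatenation overgenerates for stems like "try", which is why the
# extra rules mentioned above are needed:
print(sorted(concatenate({"try"}, {"s", "ed", "ing"})))
# ['tryed', 'trying', 'trys']  -- *trys and *tryed must be repaired by rules
```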

Composition: Composition is an operation on two relations. Composition of the two relations <x,y> and <y,z> yields <x,z>. Example: <"Katze","chat"> composed with <"chat","cat"> gives <"Katze","cat">.
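A minimal Python sketch of composition over relations represented as sets of string pairs (xfst composes the networks themselves, but the set-based picture is the same; the Katze/chat/cat example anticipates the next slides):

```python
# Composition of two relations as sets of pairs: <x,y> composed with <y,z>
# yields <x,z> whenever the middle elements match.

def compose(rel1, rel2):
    return {(x, z) for (x, y1) in rel1 for (y2, z) in rel2 if y1 == y2}

R1 = {("Katze", "chat")}   # German : French
R2 = {("chat", "cat")}     # French : English
print(compose(R1, R2))     # {('Katze', 'cat')}
```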

Composition [two network diagrams: one relating Katze (upper) to chat (lower), one relating chat (upper) to cat (lower)].

Merging the two networks [network diagram aligning Katze : chat : cat].

Composition. The composition of the networks relates Katze to cat [network diagram]. What is this reminiscent of?

Other Uses for the Transducers: Upper/Lower Casing [network diagram relating CAT on the upper side to cat on the lower side].
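As a last toy sketch (plain Python, not xfst; the function names are only illustrative), the casing relation can be treated as a character-level transducer that is applied in either direction:

```python
# The casing relation as character pairs: an upper-case letter on the upper
# side is paired with its lower-case counterpart on the lower side.
import string

PAIRS = dict(zip(string.ascii_uppercase, string.ascii_lowercase))   # 'C' -> 'c', ...

def apply_down(word):
    """Read the upper side, emit the lower side: CAT -> cat."""
    return "".join(PAIRS.get(ch, ch) for ch in word)

def apply_up(word):
    """Apply the same relation in the other direction: cat -> CAT."""
    inverse = {low: up for up, low in PAIRS.items()}
    return "".join(inverse.get(ch, ch) for ch in word)

print(apply_down("CAT"))   # cat
print(apply_up("cat"))     # CAT
```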