Finite State Morphology Alexander Fraser & Luisa Berlanda CIS, Ludwig-Maximilians-Universität München Computational Morphology.

Slides:



Advertisements
Similar presentations
Lexical Analysis Consider the program: #include main() { double value = 0.95; printf("value = %f\n", value); } How is this translated into meaningful machine.
Advertisements

CS Morphological Parsing CS Parsing Taking a surface input and analyzing its components and underlying structure Morphological parsing:
C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
Finite-State Transducers Shallow Processing Techniques for NLP Ling570 October 10, 2011.
Principles of Programming Fundamental of C Programming Language and Basic Input/Output Function 1.
Chapter Chapter Summary Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output Language Recognition Turing.
October 2006Advanced Topics in NLP1 Finite State Machinery Xerox Tools.
JavaScript, Third Edition
Python. What is Python? A programming language we can use to communicate with the computer and solve problems We give the computer instructions that it.
CPSC 388 – Compiler Design and Construction
CS190/295 Programming in Python for Life Sciences: Lecture 1 Instructor: Xiaohui Xie University of California, Irvine.
Computer Programming for Biologists Class 2 Oct 31 st, 2014 Karsten Hokamp
Theory Of Automata By Dr. MM Alam
1 Lab Session-III CSIT-120 Fall 2000 Revising Previous session Data input and output While loop Exercise Limits and Bounds Session III-B (starts on slide.
Languages & Strings String Operations Language Definitions.
Fortran 1- Basics Chapters 1-2 in your Fortran book.
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
October 2006Advanced Topics in NLP1 CSA3050: NLP Algorithms Finite State Transducers for Morphological Parsing.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2007 Lecture4 1 August 2007.
Morphological Recognition We take each sub-lexicon of each stem class and we expand each arc (e.g. the reg-noun arc) with all the morphemes that make up.
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 2 Input, Processing, and Output.
IST 210: PHP BASICS IST 210: Organization of Data IST210 1.
Compilers: lex/3 1 Compiler Structures Objectives – –describe lex – –give many examples of lex's use , Semester 1, Lex.
Programming I Introduction Introduction The only way to learn a new programming language is by writing programs in it. The first program to.
Introduction to Java Thanks to Dan Lunney (SHS). Java Basics File names The “main” method Output to screen Escape Sequence – Special Characters format()
CONVERTING TO CHOMSKY NORMAL FORM
The string data type String. String (in general) A string is a sequence of characters enclosed between the double quotes "..." Example: Each character.
CS190/295 Programming in Python for Life Sciences: Lecture 3 Instructor: Xiaohui Xie University of California, Irvine.
REGULAR EXPRESSIONS. Lexical Analysis Lexical analysers can be constructed by programs such as LEX These programs employ as input a description of the.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Finite State Transducers for Morphological Parsing
COMP 3438 – Part II - Lecture 2: Lexical Analysis (I) Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ. 1.
D. M. Akbar Hussain: Department of Software & Media Technology 1 Compiler is tool: which translate notations from one system to another, usually from source.
Chapter 2: Java Fundamentals
8/29/2014BCHB Edwards Introduction to Python BCHB Lecture 2.
1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture of a compiler PART II:
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
CPS 506 Comparative Programming Languages Syntax Specification.
Introducing Python CS 4320, SPRING Lexical Structure Two aspects of Python syntax may be challenging to Java programmers Indenting ◦Indenting is.
Introduction to Lex Fan Wu
Chapter 2 part #1 C++ Program Structure
CSA3050: Natural Language Algorithms Finite State Devices.
October 2007Natural Language Processing1 CSA3050: Natural Language Algorithms Words and Finite State Machinery.
Natural Language Processing Lecture 4 : Regular Expressions and Automata.
Introduction to Python Dr. José M. Reyes Álamo. 2 Three Rules of Programming Rule 1: Think before you program Rule 2: A program is a human-readable set.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
By Mr. Muhammad Pervez Akhtar
Three Basic Concepts Languages Grammars Automata.
November 2003Computational Morphology III1 CSA405: Advanced Topics in NLP Xerox Notation.
October 2004CSA3050 NLP Algorithms1 CSA3050: Natural Language Algorithms Morphological Parsing.
CS201 Introduction to Sabancı University 1 Chapter 2 Writing and Understanding C++ l Writing programs in any language requires understanding.
IST 210: PHP Basics IST 210: Organization of Data IST2101.
Unix RE’s Text Processing Lexical Analysis.   RE’s appear in many systems, often private software that needs a simple language to describe sequences.
Finite State Morphology Alexander Fraser & Luisa Berlanda CIS, Ludwig-Maximilians-Universität München Computational Morphology.
Two Level Morphology Alexander Fraser & Liane Guillou CIS, Ludwig-Maximilians-Universität München Computational Morphology.
Finite State Morphology Alexander Fraser & Liane Guillou CIS, Ludwig-Maximilians-Universität München Computational Morphology.
Compiler Chapter 4. Lexical Analysis Dept. of Computer Engineering, Hansung University, Sung-Dong Kim.
CIS, Ludwig-Maximilians-Universität München Computational Morphology
Wel come.
Finite State Morphology
CS190/295 Programming in Python for Life Sciences: Lecture 1
Programming in Perl Introduction
Introduction to Python
Introduction to CS Your First C Programs
MSIS 655 Advanced Business Applications Programming
1.5 Regular Expressions (REs)
Unit 3: Variables in Java
Chapter 2 part #1 C++ Program Structure
PYTHON - VARIABLES AND OPERATORS
Presentation transcript:

Finite State Morphology Alexander Fraser & Luisa Berlanda CIS, Ludwig-Maximilians-Universität München Computational Morphology and Electronic Dictionaries SoSe

Outline How to work with SFST? SFST Programming Exercises

SFST programming language for developing finite-state transducers compiler which translates programs to transducers tools for – applying transducers – printing transducers – comparing transducers

SFST Example Session > echo "Hello\ World\!" > test.fststoring a small test program > fst-compiler test.fst test.acalling the compiler test.fst: 2 > fst-mor test.ainteractive transducer usage reading transducer...transducer is loaded finished. analyze> Hello World!input Hello World!recognised analyze> Hello Worldanother input no result for Hello Worldnot recognised analyze> qterminate program

SFST Basics Write code in terminal: echo " program code " > filename.fst Write code in a file: write program code, save as filename.fst Compile: fst-compiler filename.fst filename.a Execute: fst-mor filename.a

SFST Programming Language Colon operator a:b empty string symbol <> Example: m:m o:i u:<> s:c e:e identity mapping a (an abbreviation for a:a) Example: m o:i u:<> s:c e {abc}:{AB} is expanded to a:A b:B c:<> Example: {mouse}:{mice} Is this expression equivalent to the previous two?

Exercise Try to write these examples as code and try out, what they analyze and generate. m:m o:i u:<> s:c e:e m o:i u:<> s:c e {mouse}:{mice}

Exercise Try to write these examples as code and try out, what they analyze and generate. John | Mary | James mouse | {mouse}:{mice}

Disjunction John | Mary | James accepts these three strings and maps them onto themselves mouse | {mouse}:{mice} analyses mouse and mice as mouse

Multi-Character Symbols strings enclosed in are treated as a single unit. {mouse }:{mice} analyses mice as mouse

Multi-Character Symbols A more complex example: schreib { }:{} (\ { }:{e} |\ { }:{st} |\ { }:{t} |\ { }:{en} |\ { }:{t} |\ { }:{en}) The backslashes (\) indicate that the expression continues in the next line What is the analysis of schreibst and schreiben?

Exercise Try to write this example as code and try out, what it analyzes and generates. schreib { }:{} (\ { }:{e} |\ { }:{st} |\ { }:{t} |\ { }:{en} |\ { }:{t} |\ { }:{en})

Multi-Character Symbols What is the analysis of schreibst and schreiben? Schreibst: schreib Schreiben: schreib schreib

Character Ranges [abc]:[AB] is expanded to a:A|b:B|c:B Example: [a-z A-Z]:[b-za B-ZA]* What do you get for the input IBM? What does this transducer recognise? [+-]? [0-9]* (\. [0-9]+)? Note:Characters with a special meaning (! ?. & % | * : ( ) [ ] { } etc.) need to be quoted with a backslash. Blanks are ignored unless they are quoted with a backslash.

Exercise Try to write these examples as code and try out, what they analyze and generate. [a-z A-Z]:[b-za B-ZA]* [+-]? [0-9]* (\. [0-9]+)?

Character Ranges What do you get for the input IBM? HAL What does this transducer recognise? Integers ans Floats, negative or positive

Composition [a-z A-Z]:[b-za B-ZA]* || [a-z A-Z]:[b-za B-ZA]* The output of the first transducer is the input of the next transducer (in generation mode). The compiler produces a single transducer which directly produces the final output without an intermediate step.

Exercise Try to write this examples as code and try out, what it analyzes and generates. [a-z A-Z]:[b-za B-ZA]* || [a-z A-Z]:[b-za B-ZA]*

Printing Transducers > echo ‘[a-z A-Z]:[b-za B-ZA]* || [a-z A-Z]:[b-za B-ZA]*‘ > test.fst > fst-compiler-utf8 test.fst test.a > fst-print test.a 0 0 Y A 0 0 Z B 0 0 A C 0 0 B D 0 0 C E … 0 0 x z 0

SFST Programs fst-compiler test.fst test.a reads an SFST program and translates it into a transducer. Use fst-compiler-utf8 it you use Unicode symbols. fst-print test.a prints the transitions of a compiled transducer in tabular form fst-generate test.a prints all the string-to-string mappings (with colon notation) and might not terminate. fst-mor test.a analyse and generate strings interactively fst-compact test.a test.ca convert the transducer to a more efficient format fst-infl2 test.ca input.txt analyses a sequence of strings with a compact transducer Call the programs with option „-h“ to get information on available options.

Transducer Variables $Vroot$ = walk | talk | bark% list of verbs with regular inflection $Vinfl$ = :<> (\% regular verbal inflection [ ]:<> |\ { }:{s} |\ { }:{ing} |\ { }:{ed}) $Nroot$ = hat | head | trick% list of nouns with regular inflection $Ninfl$ = :<> (\% regular nominal inflection { }:{} |\ { }:{s}) $Vroot$ $Vinfl$ | $Nroot$ $Ninfl$% combine stems and inflectional endings

Exercise Try to write this examples as code and try out, what it analyzes and generates. $Vroot$ = walk | talk | bark % list of verbs with regular inflection $Vinfl$ = :<> (\ % regular verbal inflection [ ]:<> |\ { }:{s} |\ { }:{ing} |\ { }:{ed}) $Nroot$ = hat | head | trick % list of nouns with regular inflection $Ninfl$ = :<> (\ % regular nominal inflection { }:{} |\ { }:{s}) $Vroot$ $Vinfl$ | $Nroot$ $Ninfl$

Exercise Write a program that maps all letters to a number Write a program that maps the word foot onto it´s plural form

Solution [ ] :[a-z A-Z]* f e:o e:o t

Homework Write a pipeline that maps all letters to lowercase and orders them backwards Family Huber has three children. Their first child is called Mia, the next one Toni and the last one Pia. Family Band has three children as well. Michael, Paul and Pia. Write a program, that can tell us the following details about the children: which family does he/she belong to, is it a son or a daugter, was he/she the first, second or third child. Output format:

Thank you for your attention