Beesley 2001 Finite-State Technology and Linguistic Applications 12-16 March 2001 Xerox Research Centre Europe Grenoble Laboratory 6, chemin de Maupertuis.


Beesley 2001 Finite-State Technology and Linguistic Applications March 2001 Xerox Research Centre Europe Grenoble Laboratory 6, chemin de Maupertuis MEYLAN, France Kenneth BEESLEY

Ken Beesley: Brief Introduction
B.A., Linguistics and Computer Science, Brigham Young University, 1978
Diploma, Linguistics and Phonetics, Univ. of Glasgow, 1979
D.Phil., “Epistemics” (Cognitive Science), Univ. of Edinburgh, 1983
ALPNET, computer-assisted translation, Arabic morphology project, exposure to Finite-State Morphology from Lauri Karttunen at COLING 1988
Microlytics (Xerox spinoff), Xerox Corporation 1993-present
Morphology projects: Arabic, Spanish, Portuguese, Italian, Dutch, (Malay), (Aymara); also teaching finite-state programming techniques
Some people are into finite-state programming for the mathematics and algorithms; I’m in it because it lets me build working systems for interesting natural languages.

Goals for the Week
Introduce finite-state theory
Introduce the Xerox Finite-State “Calculus”, a practical software implementation of the theory: xfst, lexc
Try to convince you that finite-state natural-language processing is a Good Thing
The Hope: Inspire a few of you to start your own computational projects, perhaps on Maltese
Finite-state techniques are widely used today in both research and industry for natural-language processing. The software implementations and documentation are improving steadily, and they are increasingly available to all of us.

Schedule
Monday 12 March    LC1117      Gentle Introduction
Tuesday 13         Unix Lab    Intro. to xfst
Wednesday 14       Unix Lab    More on xfst
Thursday 15        Unix Lab    Intro. to lexc
Friday             CCT         Linguistics Circle

Today’s Goals
Understand “Regular” Languages and Relations.
Understand the mathematical operations that can be performed on such Languages and Relations.
Understand how Languages, Relations, Regular Expressions, and Networks are interrelated.
Understand that we can create finite-state networks and compute with them using Xerox Finite-State Technology:
xfst interface
–Regular-Expression Compiler
–Access to Finite-State Algorithms
lexc language
–Used mainly for lexicons and for describing morphotactics

Why is “Finite State” Computing So Interesting?
Finite-state systems are mathematically elegant, easily manipulated and modifiable.
Computationally efficient.
Usually very compact.
The programming we linguists do is declarative. We describe the facts of our natural language; i.e. we write grammars. We do not hack ad hoc code.
The runtime code, which applies our systems to linguistic input, is already written and it is completely language-independent.
Finite-state systems are inherently bidirectional: we can use the same system to analyze and to generate.

What is Finite-State Computing Good For?
Mostly “lower-level” natural language processing
–Tokenization
–Spelling checking/correction
–Phonology
–Morphological Analysis/Generation (emphasis this week)
–Part-of-Speech Tagging
–“Shallow” Syntactic Parsing and “Chunking”
Finite-state techniques cannot do everything; but for tasks where they do apply, they are extremely attractive.

Where is Xerox Finite-State Technology Used?
Xerox Research
–Xerox Palo Alto Research Center
–Xerox Research Centre Europe
Xerox Business Units and Partners
–ATS
–MKMS
–Inxight
Universities and Research Groups
–Over 70 licensees
We would like to make Xerox technology the de facto standard

The Gentle Introduction
Chapter 1 of The Book
Physical Finite-State Machines (Automata)
Linguistic Finite-State Machines
–Symbol
–Alphabet
–Language
Lookup and Generation
Quick Review of Set Theory
Languages, Relations and Transducers

Physical Machines with Finite States: The Lightswitch Machine
[Diagram: states OFF and ON; arcs labeled PUSH UP and PUSH DOWN.]

Physical Machines with Finite States: The Lightswitch Toggle Machine
[Diagram: states OFF and ON; arcs labeled PUSH.]

Physical Machines with Finite States: The Fan in Ken’s Old Car
[Diagram: states OFF, LOW, MED, HI; three arcs labeled R and three arcs labeled L.]

Physical Machines with Finite States: Three-Way Lightswitch
[Diagram: states OFF, LOW, MED, HI; four arcs labeled R.]

The Cola Machine
Need to enter 25 cents (USA) to get a drink
Accepts the following coins:
–Nickel = 5 cents
–Dime = 10 cents
–Quarter = 25 cents
For simplicity, our machine needs exact change
We will model only the coin-accepting mechanism

Physical Machines with Finite States: The Cola Machine
[Diagram: the coin-accepting network, with arcs labeled N, D and Q leading from the Start State to the Final/Accept State.]

The Cola Machine Language
List of all the sequences of coins accepted:
Q
DDN DND NDD
DNNN NDNN NNDN NNND
NNNNN
Think of the coins as SYMBOLS or CHARACTERS
The set of symbols accepted is the ALPHABET of the machine
Think of sequences of coins as WORDS or “strings”
The set of words accepted by the machine is its LANGUAGE
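The coin-accepting mechanism can be sketched in a few lines of Python. This is only an illustration, not the Xerox software: the names `step`, `accepts` and `language` are invented here, and naming each state by the running total in cents is an assumption about how the slide's network is laid out.

```python
from itertools import product

# Coin values in cents, as given on the slide.
COIN_VALUES = {"N": 5, "D": 10, "Q": 25}
START, FINAL = 0, 25  # assumption: states are named by the running total


def step(state, coin):
    """Return the next state, or None if the coin would overshoot 25 cents."""
    total = state + COIN_VALUES[coin]
    return total if total <= FINAL else None


def accepts(word):
    """True if the coin sequence drives the machine from START to FINAL."""
    state = START
    for coin in word:
        state = step(state, coin)
        if state is None:  # no arc for this coin: the word is rejected
            return False
    return state == FINAL


def language(max_len=5):
    """Enumerate every accepted word; five nickels is the longest one."""
    words = []
    for length in range(1, max_len + 1):
        for combo in product("NDQ", repeat=length):
            word = "".join(combo)
            if accepts(word):
                words.append(word)
    return words


print(language())
```

The enumeration yields the same nine words as the list on the slide, which is a useful check that the alphabet { N, D, Q } and the exact-change rule fully determine the language.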

Linguistic Machines
[Diagram: a network whose arcs are labeled with letters, encoding the words canto, tigre and mesa; the word mesa is “applied” to the network.]
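A linguistic machine works like the cola machine, with letters in place of coins. The sketch below builds a simple network from a word list and applies a word to it. The function names are hypothetical, the word list (canto, tigre, mesa) is read off the slide's diagram, and the network built here is a plain letter tree, whereas a compiled Xerox network would normally also share common endings.

```python
def build_network(words):
    """Build a deterministic network: state -> {symbol: next_state}."""
    arcs = {0: {}}   # state 0 is the start state
    finals = set()   # states where a complete word ends
    for word in words:
        state = 0
        for symbol in word:
            if symbol not in arcs[state]:
                new_state = len(arcs)       # allocate a fresh state number
                arcs[new_state] = {}
                arcs[state][symbol] = new_state
            state = arcs[state][symbol]
        finals.add(state)
    return arcs, finals


def apply_word(network, word):
    """Follow the arcs symbol by symbol; accept only in a final state."""
    arcs, finals = network
    state = 0
    for symbol in word:
        state = arcs[state].get(symbol)
        if state is None:   # no matching arc: the word is not in the language
            return False
    return state in finals


network = build_network(["canto", "tigre", "mesa"])
print(apply_word(network, "mesa"))   # accepted
print(apply_word(network, "mes"))    # rejected: stops in a non-final state
```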

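More Linguistic Machines
[Diagram: a second letter-labeled network, and A Transducer whose upper side reads mesa+Noun+Fem+Pl and whose lower side reads mesa00s, i.e. mesas. “Apply Up” maps mesas to mesa+Noun+Fem+Pl; “Apply Down” maps mesa+Noun+Fem+Pl to mesas.]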
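The bidirectional behaviour of a transducer can be illustrated with a single path of paired symbols. This is a minimal sketch under stated assumptions: the pairing of +Noun and +Fem with the empty symbol (written 0 on the slide) and of +Pl with s follows the slide's “mesa00s”, while the names `PATHS`, `apply_up` and `apply_down` are invented for the example and are not the xfst interface.

```python
EPSILON = ""  # the slide writes the empty symbol as 0

# One path through the transducer, as (upper, lower) symbol pairs.
MESAS_PATH = [
    ("m", "m"), ("e", "e"), ("s", "s"), ("a", "a"),
    ("+Noun", EPSILON), ("+Fem", EPSILON), ("+Pl", "s"),
]
PATHS = [MESAS_PATH]  # a real transducer encodes many paths in one network


def upper(path):
    """Concatenate the upper-side symbols: the analysis string."""
    return "".join(u for u, _ in path)


def lower(path):
    """Concatenate the lower-side symbols: the surface string."""
    return "".join(l for _, l in path)


def apply_up(surface):
    """Analysis: match the lower side, return the upper side."""
    return [upper(p) for p in PATHS if lower(p) == surface]


def apply_down(analysis):
    """Generation: match the upper side, return the lower side."""
    return [lower(p) for p in PATHS if upper(p) == analysis]


print(apply_up("mesas"))
print(apply_down("mesa+Noun+Fem+Pl"))
```

The same data serves both directions, which is the sense in which finite-state systems are inherently bidirectional.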

A Morphological Analyzer
[Diagram: a Transducer relating the Surface Word Language to the Analysis Word Language.]

A Quick Review of Set Theory
A set is a collection of objects.
[Diagram: a set containing A, B, D, E.]
We can enumerate the “members” or “elements” of finite sets: { A, D, B, E }.
There is no significant order in a set, so { A, D, B, E } is the same set as { E, A, D, B }, etc.

Uniqueness of Elements
You cannot have two or more ‘A’ elements in the same set.
[Diagram: a set containing A, B, D, E.]
{ A, A, D, B, E } is just a redundant specification of the set { A, D, B, E }.

Cardinality of Sets
The Empty Set: { }
A Finite Set: e.g. { Norway, Denmark, Sweden }
An Infinite Set: e.g. The Set of all Positive Integers

Simple Operations on Sets: Union
[Diagram: Set 1 and Set 2 contain the elements A, B, C, D, E between them.]
Union of Set 1 and Set 2: { A, B, C, D, E }

Simple Operations on Sets (2): Union
Set 1: { A, B, C }
Set 2: { C, D }
Union of Set 1 and Set 2: { A, B, C, D }

Simple Operations on Sets (3): Intersection
Set 1: { A, B, C }
Set 2: { C, D }
Intersection of Set 1 and Set 2: { C }

Simple Operations on Sets (4): Subtraction
Set 1: { A, B, C }
Set 2: { C, D }
Set 1 minus Set 2: { A, B }
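These set facts can be checked directly with Python's built-in `set` type. The sketch uses the overlapping example, Set 1 = { A, B, C } and Set 2 = { C, D }; the variable names are chosen here for illustration.

```python
set1 = {"A", "B", "C"}
set2 = {"C", "D"}

union = set1 | set2         # everything in either set
intersection = set1 & set2  # only what both sets share
difference = set1 - set2    # Set 1 minus Set 2

# Order is not significant, and repeated elements collapse into one.
same_order = {"A", "D", "B", "E"} == {"E", "A", "D", "B"}
no_duplicates = {"A", "A", "D", "B", "E"} == {"A", "D", "B", "E"}

# Cardinality is the number of elements; the empty set has cardinality 0.
print(sorted(union), sorted(intersection), sorted(difference))
print(same_order, no_duplicates, len(union), len(set()))
```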

Formal Languages
Very Important Concept in Formal Language Theory: A Language is just a Set of Words.
We use the terms “word” and “string” interchangeably.
A Language can be empty, have finite cardinality, or be infinite in size.
You can union, intersect and subtract languages, just like any other sets.

Union of Languages (Sets)
Language 1: { dog, cat, rat }
Language 2: { elephant, mouse }
Union of Language 1 and Language 2: { dog, cat, rat, elephant, mouse }

Intersection of Languages (Sets)
Language 1: { dog, cat, rat }
Language 2: { elephant, mouse }
Intersection of Language 1 and Language 2: { } (the two languages share no words, so the intersection is the empty language)

Intersection of Languages (Sets)
Language 1: { dog, cat, rat }
Language 2: { rat, mouse }
Intersection of Language 1 and Language 2: { rat }

Subtraction of Languages (Sets)
Language 1: { dog, cat, rat }
Language 2: { rat, mouse }
Language 1 minus Language 2: { dog, cat }

Languages
A language is a set of words (=strings).
Words (strings) are composed of symbols (letters) that are “concatenated” together.
At another level, words are composed of “morphemes”.
In most natural languages, we concatenate morphemes together to form whole words.
For sets consisting of words (i.e. for Languages), the operation of concatenation is very important.

Concatenation of Languages
Root Language: { work, talk, walk }
Suffix Language: { 0, ing, ed, s } (0 is the empty string)
The concatenation of the Suffix language after the Root language:
work working worked works
talk talking talked talks
walk walking walked walks
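Concatenation of two languages pairs every word of the first with every word of the second. A minimal sketch, with the function name `concatenate` chosen here for illustration and the slide's 0 represented as Python's empty string:

```python
def concatenate(language1, language2):
    """Return every word of language1 followed by every word of language2."""
    return {w1 + w2 for w1 in language1 for w2 in language2}


roots = {"work", "talk", "walk"}
suffixes = {"", "ing", "ed", "s"}  # "" plays the role of 0, the empty string

words = concatenate(roots, suffixes)
print(sorted(words))
```

Because the Suffix language contains the empty string, the bare roots work, talk and walk remain in the result, giving 3 x 4 = 12 words in all.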

Languages and Networks
[Diagram: Network/Language 1 and Network/Language 2, each drawn with arcs labeled by single symbols, followed by the network for the concatenation of Network 1 and Network 2.]

Grammars, Languages, Networks
[Diagram: a Grammar written in xfst or lexc Describes a Language or Relation and Compiles Into a Finite-State Network; the network is used to Recognize or Map.]
In the coming days, we will learn how to write xfst and lexc grammars and compile them into working systems.

Tasks/Exercises
Read chapter 1, at least up to page 28
Do Exercises (page 34) and (page 36).
For more rigor, read Chapter 2.
Do the graphing exercise in Appendix B (page 381).