Computational Linguistics Yoad Winter *General overview *Examples: Transducers; Stanford Parser; Google Translate; Word-Sense Disambiguation * Finite State.


Computational Linguistics Yoad Winter *General overview *Examples: Transducers; Stanford Parser; Google Translate; Word-Sense Disambiguation * Finite State Automata and Formal Grammars

Linguistics - from Theory to Technology
[Diagram: Theoretical Linguistics, Computational Linguistics, Natural Language Processing, Language Technology; labels: INFO, TLW, Industry]

Computational Linguistics
Goals of CL:
* Foundations for Linguistics in Computer Science (e.g. formal language theory)
* Computable linguistic theories (HPSG, LFG, Categorial Grammar)
* Implementation of demos for linguistic theories
* (Mathematical Linguistics)

Natural Language Processing
Goals of NLP – practical applications of CL:
* Speech recognition/synthesis
* Machine translation
* Summarization
* Question answering
* Text categorization
* Grammar checking
Statistical NLP:
* Unsupervised
* Supervised (corpus-based)

Language Technology
Goals of LT:
* Useful linguistic resources (lexicons, grammar rules, semantic webs)
* Implementation of the most useful tools involving language processing (Google Translate, Word spell checker, MS Speech Recognizer, etc.)

Language Processing - Tasks (J&M 2009)
I: Words. Input: words in text. Output: part of speech (Noun/Verb); morphological information.
II: Speech. Input: speech sound wave. Output: text. Input: text. Output: speech sound wave.
III: Syntax. Input: sentence in text. Output: phrases in sentences (noun phrase, verb phrase).
IV: Semantics & Pragmatics. Input: sentence/text. Output: action/reasoning.
V: Applications. Input: sentence/text. Output: translation.

Processing - General Idea
Start with null information state I=0
Repeat while there is language to read:
- Read a language token T
- Recognize T: extract information I(T)
- Update information state I using I(T)
- Do some action using I(T)
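The loop above can be sketched in Python. This is only an illustration under stated assumptions: tokens are whitespace-separated words, the information state is a list, and TOY_LEXICON, recognize and process are hypothetical names that do not come from the slides.

```python
# Hypothetical toy lexicon: token -> information I(T), here a part-of-speech tag.
TOY_LEXICON = {"I": "PRP", "want": "VBP", "a": "DT", "check": "NN"}


def recognize(token):
    """Extract the information I(T) for a token; unknown tokens get 'UNK'."""
    return TOY_LEXICON.get(token, "UNK")


def process(text, action=print):
    """Read tokens one by one, update the information state, and act on each."""
    state = []                          # null information state
    for token in text.split():          # read a language token T
        info = recognize(token)         # recognize T: extract I(T)
        state.append((token, info))     # update the information state using I(T)
        action(f"{token}/{info}")       # do some action using I(T)
    return state
```

Calling process("I want a check") performs the action once per token (printing by default) and returns the accumulated state.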

Example 1 – Finite State Transducers
Arcs (input | output): I want to | ε; I want to have some | ε; cash | V; cash | N; a check | ε
I want to cash a check:
- Start from state 0
- Read “I want to”, move to state 1, and output nothing
- Read “cash”, move to state 3, and output V
- Read “a check”, move to state 4, and output nothing

Example 1 – Finite State Transducers
Arcs (input | output): I want to | ε; I want to have some | ε; cash | V; cash | N; a check | ε
I want to have some cash:
- Start from state 0
- Read “I want to have some”, move to state 2, and output nothing
- Read “cash”, move to state 4, and output N
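The two traces can be reproduced with a small Python sketch of the transducer. The arcs and state numbers are taken from the traces above. Treating state 4 as the only final state is an assumption, and the names ARCS, FINAL_STATES and transduce are hypothetical. Because “I want to” is a prefix of “I want to have some”, the sketch backtracks when the first choice leads to a dead end.

```python
# Arcs: (state, input phrase) -> (next state, output); None stands for ε.
ARCS = {
    (0, "I want to"): (1, None),
    (0, "I want to have some"): (2, None),
    (1, "cash"): (3, "V"),
    (2, "cash"): (4, "N"),
    (3, "a check"): (4, None),
}
FINAL_STATES = {4}  # assumption: the slides do not name the final state


def _search(words, pos, state, outputs):
    """Try every arc leaving `state` whose phrase matches the words at `pos`."""
    if pos == len(words):
        return outputs if state in FINAL_STATES else None
    for (src, phrase), (dst, out) in ARCS.items():
        label = phrase.split()
        if src == state and words[pos:pos + len(label)] == label:
            emitted = outputs + ([out] if out is not None else [])
            result = _search(words, pos + len(label), dst, emitted)
            if result is not None:
                return result
    return None


def transduce(sentence):
    """Return the list of outputs if the sentence is accepted, else None."""
    return _search(sentence.split(), 0, 0, [])
```

With these arcs, transduce("I want to cash a check") yields the output V and transduce("I want to have some cash") yields N, matching the two traces.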

Example 2 – Stanford Parser
Your query: I want to cash a check
Tagging: I/PRP want/VBP to/TO cash/VB a/DT check/NN
Parse: [parse tree not included in the transcript]

Example 2 – Stanford Parser
Your query: I want to have some cash
Tagging: I/PRP want/VBP to/TO have/VB some/DT cash/NN
Parse: [parse tree not included in the transcript]

Example 3 – Google Translate

Summary
We have seen ways to process:
- words, word-by-word: transducers
- sentences, with a tree structure: Stanford Parser
A word like CASH must be disambiguated as Noun or Verb in order to be translated correctly.
Other kinds of disambiguation?

Example 4 - Word-Sense Disambiguation
the light blue car:
1. de lichtblauwe auto (light as a shade of blue)
2. de lichte blauwe auto (light as the opposite of heavy)
John likes the light blue car but not the deep blue car
John was able to lift the light blue car but not the heavy blue car
Google Translate: lichtblauwe auto in both cases
Word-sense disambiguation: finding the right sense of the word

Basic Model 1: Finite State Automata (FSA)
q0 – start state
q4 – accepting state
arrows – transitions, also defined by a transition table

FSA - formally

Tracing the execution of an FSA
“baaa!” is accepted because, taking the input symbols one by one, we end in the accepting state q4 once the whole input has been read.
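A trace like this can be sketched in Python. The transcript does not include the transition table, so the table below is an assumption: it is one table consistent with start state q0, accepting state q4 and the acceptance of “baaa!”. The names TRANSITIONS and trace are hypothetical.

```python
# Assumed transition table: (state, input symbol) -> next state.
TRANSITIONS = {
    ("q0", "b"): "q1",
    ("q1", "a"): "q2",
    ("q2", "a"): "q3",
    ("q3", "a"): "q3",   # loop: any number of further a's
    ("q3", "!"): "q4",
}
START, ACCEPTING = "q0", {"q4"}


def trace(string):
    """Return (list of visited states, accepted?) for the input string."""
    state = START
    path = [state]
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:            # no arrow for this symbol: reject
            return path, False
        path.append(state)
    return path, state in ACCEPTING
```

For the input baaa! the visited states are q0, q1, q2, q3, q3, q4, and the string is accepted. An input such as ba! is rejected because state q2 has no arrow for the symbol “!”.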

FSAs as Grammars
An FSA may describe an infinite set of strings over a finite input alphabet Σ. We thus say that an FSA describes a grammar over Σ, which derives a formal language over Σ.
More formally:
Σ – a finite set
Σ* – all the strings over Σ (infinite)
L(FSA) – the language of the FSA: the set of strings S in Σ* that are derived (accepted) by the FSA.
Any set described by an FSA is called regular.

Non-regular languages and complexity
L = { ab, aabb, aaabbb, aaaabbbb, … }, that is, n a's followed by exactly n b's for n ≥ 1, can be shown to be non-regular. No FSA can derive this language L!
But there are grammars that can also generate non-regular languages!
Are natural languages regular or non-regular?
How hard is it for a computer to recognize regular and non-regular languages?
Are there different classes of formal languages in terms of their complexity?
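To see why L needs more than a fixed number of states, here is a minimal Python sketch of a recognizer for L. It is an illustration, not part of the slides, and the name accepts_anbn is hypothetical. The recognizer keeps a counter that can grow without bound, which is exactly the kind of memory a finite-state automaton lacks.

```python
def accepts_anbn(string):
    """Accept strings of n a's followed by exactly n b's, for n >= 1."""
    count = 0
    i = 0
    # Count the leading a's.
    while i < len(string) and string[i] == "a":
        count += 1
        i += 1
    if count == 0:
        return False
    # Cancel one a for every b that follows.
    while i < len(string) and string[i] == "b":
        count -= 1
        i += 1
    # Accept only if the whole input was consumed and the counts match.
    return i == len(string) and count == 0
```

The counter may take any non-negative value, so no fixed, finite set of states can stand in for it.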

Another way to define regular languages – regular expressions
A regular expression is a compact way of describing a regular language.
Example: baa(a*)! describes the same language as the FSA we saw.
We say that this regular expression matches any string in this language, and does not match other strings.
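The example expression can be tried out with Python's standard re module. This is a sketch: re.fullmatch is used because the slides speak of matching a whole string, and the helper name matches_sheep is hypothetical.

```python
import re

SHEEP = re.compile(r"baa(a*)!")


def matches_sheep(string):
    """True if the whole string is matched by baa(a*)!."""
    return SHEEP.fullmatch(string) is not None
```

For instance, matches_sheep("baa!") and matches_sheep("baaaa!") hold, while "ba!" and "baaa" are not matched.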

Regular expressions - formally
Σ - a finite alphabet
1- Any symbol in Σ is a regular expression that matches itself. “a” matches “a”; “b” matches “b”; etc.
2- If A and B are regular expressions then AB is a regular expression that matches any concatenation of a string that A matches with a string that B matches. “ab” matches “ab”
3- If A and B are regular expressions then A|B is a regular expression that matches any string that A or B matches. “a|b” matches both “a” and “b”
4- If A is a regular expression then A* is a regular expression that matches any concatenation of zero or more strings that A matches. “a*” matches the empty string, “a”, “aa”, “aaa” etc.

Examples
Convention: we give precedence to *. AB* = A(B*)
Convenience: we let ε match the empty string.
a|b* matches {ε, "a", "b", "bb", "bbb",...}
(a|b)* matches the set of all strings with no symbols other than "a" and "b", including the empty string: {ε, "a", "b", "aa", "ab", "ba", "bb", "aaa",...}
ab*(c|ε) denotes the set of strings starting with "a", then zero or more "b"s and finally optionally a "c": {"a", "ac", "ab", "abc", "abb", "abbc",...}
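These sets can be checked by brute force with Python's re module. This is a sketch under two assumptions: Python has no ε symbol, so (c|ε) is written with an empty alternative as (c|), and the helper name language is hypothetical. The helper lists every string over a given alphabet, up to a maximum length, that the pattern matches in full.

```python
import re
from itertools import product


def language(pattern, alphabet, max_len):
    """List all strings over `alphabet` of length <= max_len matched in full."""
    regex = re.compile(pattern)
    matched = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            candidate = "".join(symbols)
            if regex.fullmatch(candidate):
                matched.append(candidate)
    return matched


print(language(r"a|b*", "ab", 3))       # ['', 'a', 'b', 'bb', 'bbb']
print(language(r"(a|b)*", "ab", 2))     # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(language(r"ab*(c|)", "abc", 3))   # ['a', 'ab', 'ac', 'abb', 'abc']
```

The empty string '' in the output plays the role of ε in the sets above.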

At home
Read about transducers as preparation for Eva’s class.