CSCI 5832 Natural Language Processing

Slides:



Advertisements
Similar presentations
CS Morphological Parsing CS Parsing Taking a surface input and analyzing its components and underlying structure Morphological parsing:
Advertisements

Natural Language Processing Lecture 3—9/3/2013 Jim Martin.
Computational Morphology. Morphology S.Ananiadou2 Outline What is morphology? –Word structure –Types of morphological operation – Levels of affixation.
C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Pushdown Automata Chapter 12. Recognizing Context-Free Languages Two notions of recognition: (1) Say yes or no, just like with FSMs (2) Say yes or no,
Morphological Analysis Chapter 3. Morphology Morpheme = "minimal meaning-bearing unit in a language" Morphology handles the formation of words by using.
Finite-State Transducers Shallow Processing Techniques for NLP Ling570 October 10, 2011.
Finite-state automata 2 Day 13 LING Computational Linguistics Harry Howard Tulane University.
Regular Expressions (RE) Used for specifying text search strings. Standarized and used widely (UNIX: vi, perl, grep. Microsoft Word and other text editors…)
6/2/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.
1 Regular Expressions and Automata September Lecture #2-2.
1 Introduction to Computability Theory Lecture12: Decidable Languages Prof. Amos Israeli.
LIN3022 Natural Language Processing Lecture 3 Albert Gatt 1LIN3022 Natural Language Processing.
Computational language: week 9 Finish finite state machines FSA’s for modelling word structure Declarative language models knowledge representation and.
Regular Expressions and Automata Chapter 2. Regular Expressions Standard notation for characterizing text sequences Used in all kinds of text processing.
Topics Non-Determinism (NFSAs) Recognition of NFSAs Proof that regular expressions = FSAs Very brief sketch: Morphology, FSAs, FSTs Very brief sketch:
Introduction to English Morphology Finite State Transducers
Finite-state automata 3 Morphology Day 14 LING Computational Linguistics Harry Howard Tulane University.
Lecture 3, 7/27/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 4 28 July 2005.
October 2006Advanced Topics in NLP1 CSA3050: NLP Algorithms Finite State Transducers for Morphological Parsing.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2007 Lecture4 1 August 2007.
Morphological Recognition We take each sub-lexicon of each stem class and we expand each arc (e.g. the reg-noun arc) with all the morphemes that make up.
Introduction Morphology is the study of the way words are built from smaller units: morphemes un-believe-able-ly Two broad classes of morphemes: stems.
10/8/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.
Session 11 Morphology and Finite State Transducers Introduction to Speech Natural and Language Processing (KOM422 ) Credits: 3(3-0)
Computabilty Computability Finite State Machine. Regular Languages. Homework: Finish Craps. Next Week: On your own: videos +
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2007 Lecture 3 27 July 2007.
Morphological Analysis Chapter 3. Morphology Morpheme = "minimal meaning-bearing unit in a language" Morphology handles the formation of words by using.
Natural Language Processing Lecture 2—1/15/2015 Susan W. Brown.
2. Regular Expressions and Automata 2007 년 3 월 31 일 인공지능 연구실 이경택 Text: Speech and Language Processing Page.33 ~ 56.
CSA3050: Natural Language Algorithms Finite State Devices.
Natural Language Processing Chapter 2 : Morphology.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
1/11/2016CPSC503 Winter CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.
Donghyun (David) Kim Department of Mathematics and Physics North Carolina Central University 1 Chapter 1 Regular Languages Some slides are in courtesy.
October 2004CSA3050 NLP Algorithms1 CSA3050: Natural Language Algorithms Morphological Parsing.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
Two Level Morphology Alexander Fraser & Liane Guillou CIS, Ludwig-Maximilians-Universität München Computational Morphology.
1/29/02CSE460 - MSU1 Nondeterminism-NFA Section 4.1 of Martin Textbook CSE460 – Computability & Formal Language Theory Comp. Science & Engineering Michigan.
CIS, Ludwig-Maximilians-Universität München Computational Morphology
CSE202: Introduction to Formal Languages and Automata Theory
CPSC 503 Computational Linguistics
Lecture 7 Summary Survey of English morphology
Speech and Language Processing
Compilers Welcome to a journey to CS419 Lecture5: Lexical Analysis:
CPSC 503 Computational Linguistics
CPSC 503 Computational Linguistics
Natural Language Processing
Speech and Language Processing
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
CSCI 5832 Natural Language Processing
LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing Dan Jurafsky 11/24/2018 LING 138/238 Autumn 2004.
CSCI 5832 Natural Language Processing
By Mugdha Bapat Under the guidance of Prof. Pushpak Bhattacharyya
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Non-Deterministic Finite Automata
CSC NLP - Regex, Finite State Automata
CHAPTER 2 Context-Free Languages
Finite Automata.
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
CSCI 5832 Natural Language Processing
MA/CSSE 474 Theory of Computation
CPSC 503 Computational Linguistics
CPSC 503 Computational Linguistics
Chapter 1 Regular Language
CPSC 503 Computational Linguistics
Lecture 5 Scanning.
Morphological Parsing
CSCI 5832 Natural Language Processing
Presentation transcript:

CSCI 5832 Natural Language Processing Lecture 3 Jim Martin 11/12/2018 CSCI 5832 Spring 2006

Today 1/23 Review FSA Combining FSA English Morphology Determinism and Non-Determinism Combining FSA English Morphology 11/12/2018 CSCI 5832 Spring 2006

Review Regular expressions are just a compact textual representation of FSAs Recognition is the process of determining if a string/input is in the language defined by some machine. Recognition is straightforward with deterministic machines. 11/12/2018 CSCI 5832 Spring 2006

D-Recognize 11/12/2018 CSCI 5832 Spring 2006

Three Views Three equivalent formal ways to look at what we’re up to (not including tables) Regular Expressions Finite State Automata Regular Languages 11/12/2018 CSCI 5832 Spring 2006

Regular Languages More on these in a couple of weeks S → b a a A 11/12/2018 CSCI 5832 Spring 2006

Non-Determinism 11/12/2018 CSCI 5832 Spring 2006

Non-Determinism cont. Yet another technique Epsilon transitions Key point: these transitions do not examine or advance the tape during recognition 11/12/2018 CSCI 5832 Spring 2006

Equivalence Non-deterministic machines can be converted to deterministic ones with a fairly simple construction That means that they have the same power; non-deterministic machines are not more powerful than deterministic ones in terms of the languages they can accept It also means that one way to do recognition with a non-deterministic machine is to turn it into a deterministic one. 11/12/2018 CSCI 5832 Spring 2006

Non-Deterministic Recognition In a ND FSA there exists at least one path through the machine for a string that is in the language defined by the machine. But not all paths directed through the machine for an accept string lead to an accept state. No paths through the machine lead to an accept state for a string not in the language. 11/12/2018 CSCI 5832 Spring 2006

Non-Deterministic Recognition So success in a non-deterministic recognition occurs when a path is found through the machine that ends in an accept. Failure occurs when all of the possible paths lead to failure. 11/12/2018 CSCI 5832 Spring 2006

Example b a a a ! \ q0 q1 q2 q2 q3 q4 11/12/2018 CSCI 5832 Spring 2006

Example 11/12/2018 CSCI 5832 Spring 2006

Example 11/12/2018 CSCI 5832 Spring 2006

Example 11/12/2018 CSCI 5832 Spring 2006

Example 11/12/2018 CSCI 5832 Spring 2006

Example 11/12/2018 CSCI 5832 Spring 2006

Example 11/12/2018 CSCI 5832 Spring 2006

Example 11/12/2018 CSCI 5832 Spring 2006

Example 11/12/2018 CSCI 5832 Spring 2006

Key Points States in the search space are pairings of tape positions and states in the machine. By keeping track of as yet unexplored states, a recognizer can systematically explore all the paths through the machine given an input. 11/12/2018 CSCI 5832 Spring 2006

ND-Recognize 11/12/2018 CSCI 5832 Spring 2006

Infinite Search If you’re not careful such searches can go into an infinite loop. How? 11/12/2018 CSCI 5832 Spring 2006

Why Bother? Non-determinism doesn’t get us more formal power and it causes headaches so why bother? More natural (understandable) solutions 11/12/2018 CSCI 5832 Spring 2006

Compositional Machines Formal languages are just sets of strings Therefore, we can talk about various set operations (intersection, union, concatenation) This turns out to be a useful exercise 11/12/2018 CSCI 5832 Spring 2006

Union 11/12/2018 CSCI 5832 Spring 2006

Concatenation 11/12/2018 CSCI 5832 Spring 2006

Negation Construct a machine M2 to accept all strings not accepted by machine M1 and reject all the strings accepted by M1 Invert all the accept and not accept states in M1 Does that work for non-deterministic machines? 11/12/2018 CSCI 5832 Spring 2006

Intersection Accept a string that is in both of two specified languages An indirect construction… A^B = ~(~A or ~B) 11/12/2018 CSCI 5832 Spring 2006

Motivation Let’s have a meeting on Thursday, Jan 26th. Writing an FSA to recognize English date expressions is not terribly hard. Except for the part about rejecting invalid dates. Write two FSAs: one for the form of the dates, and one for the calendar arithmetic part Intersect the two machines 11/12/2018 CSCI 5832 Spring 2006

Administration Homework questions? Anything else? 11/12/2018 CSCI 5832 Spring 2006

Assignment 1 Strings are an easy and not very good way to represent texts Normally, we want lists of sentences that consist of lists of tokens, that ultimately may point to strings representing words (lexemes) Lists are central to Python and will make your life easy if you let them 11/12/2018 CSCI 5832 Spring 2006

Transition Finite-state methods are particularly useful in dealing with a lexicon. Lots of devices, some with limited memory, need access to big lists of words. So we’ll switch to talking about some facts about words and then come back to computational methods 11/12/2018 CSCI 5832 Spring 2006

English Morphology Morphology is the study of the ways that words are built up from smaller meaningful units called morphemes We can usefully divide morphemes into two classes Stems: The core meaning-bearing units Affixes: Bits and pieces that adhere to stems to change their meanings and grammatical functions 11/12/2018 CSCI 5832 Spring 2006

English Morphology We can also divide morphology up into two broad classes Inflectional Derivational 11/12/2018 CSCI 5832 Spring 2006

Word Classes By word class, we have in mind familiar notions like noun and verb We’ll go into the gory details in Ch 5 Right now we’re concerned with word classes because the way that stems and affixes combine is based to a large degree on the word class of the stem 11/12/2018 CSCI 5832 Spring 2006

Inflectional Morphology Inflectional morphology concerns the combination of stems and affixes where the resulting word Has the same word class as the original Serves a grammatical/semantic purpose that is Different from the original But nevertheless transparently related to the original 11/12/2018 CSCI 5832 Spring 2006

Nouns and Verbs (English) Nouns are simple Markers for plural and possessive Verbs are only slightly more complex Markers appropriate to the tense of the verb 11/12/2018 CSCI 5832 Spring 2006

Regulars and Irregulars Ok so it gets a little complicated by the fact that some words misbehave (refuse to follow the rules) Mouse/mice, goose/geese, ox/oxen Go/went, fly/flew The terms regular and irregular will be used to refer to words that follow the rules and those that don’t. 11/12/2018 CSCI 5832 Spring 2006

Regular and Irregular Verbs Regulars… Walk, walks, walking, walked, walked Irregulars Eat, eats, eating, ate, eaten Catch, catches, catching, caught, caught Cut, cuts, cutting, cut, cut 11/12/2018 CSCI 5832 Spring 2006

Derivational Morphology Derivational morphology is the messy stuff that no one ever taught you. Quasi-systematicity Irregular meaning change Changes of word class 11/12/2018 CSCI 5832 Spring 2006

Derivational Examples Verb/Adj to Noun -ation computerize computerization -ee appoint appointee -er kill killer -ness fuzzy fuzziness 11/12/2018 CSCI 5832 Spring 2006

Derivational Examples Noun/Verb to Adj -al Computation Computational -able Embrace Embraceable -less Clue Clueless 11/12/2018 CSCI 5832 Spring 2006

Compute Many paths are possible… Start with compute Computer -> computerize -> computerization Computation -> computational Computer -> computerize -> computerizable Compute -> computee 11/12/2018 CSCI 5832 Spring 2006

Simple Rules 11/12/2018 CSCI 5832 Spring 2006

Adding in the Words 11/12/2018 CSCI 5832 Spring 2006

Derivational Rules 11/12/2018 CSCI 5832 Spring 2006

Parsing/Generation vs. Recognition Recognition is usually not quite what we need. Usually if we find some string in the language we need to find the structure in it (parsing) Or we have some structure and we want to produce a surface form (production/generation) Example From “cats” to “cat +N +PL” 11/12/2018 CSCI 5832 Spring 2006

Finite State Transducers The simple story Add another tape Add extra symbols to the transitions On one tape we read “cats”, on the other we write “cat +N +PL” 11/12/2018 CSCI 5832 Spring 2006

Next Time On to Chapter 3 11/12/2018 CSCI 5832 Spring 2006

FSAs and the Lexicon First we’ll capture the morphotactics The rules governing the ordering of affixes in a language. Then we’ll add in the actual words 11/12/2018 CSCI 5832 Spring 2006