600.465 - Intro to NLP - J. Eisner1 Finite-State Programming Some Examples.

Slides:



Advertisements
Similar presentations
Second Grade Saxon Math Lesson 33
Advertisements

Intro to NLP - J. Eisner1 Learning in the Limit Golds Theorem.
VOORBLAD.
How to convert a left linear grammar to a right linear grammar
Days, Months, Dates, and Ordinal Numbers
Intro to NLP - J. Eisner1 Phonology [These slides are missing most examples and discussion from class …]
Intro to NLP - J. Eisner1 Splitting Words a.k.a. “Word Sense Disambiguation”
Intro to NLP - J. Eisner1 Finite-State Methods.
1 Chapter Parsing Techniques. 2 Section 12.3 Parsing Techniques We know (via a theorem) that the context- free languages are exactly those languages.
Finite State Automata. A very simple and intuitive formalism suitable for certain tasks A bit like a flow chart, but can be used for both recognition.
C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
Spelling Lesson 11 Spelling Abbreviations Dec. Sat. Nov. Mon. Jan. Wed. Apr. Sun. Feb. Thurs. Oct. Aug. Tues. Fri. Mar. St. Blvd. Sept. Ave. Rd. Spelling.
176 Formal Languages and Applications: We know that Pascal programming language is defined in terms of a CFG. All the other programming languages are context-free.
January 2012 Monday Tuesday Wednesday Thursday Friday Sat/ Sun / /8 14/15 21/22 28/
Context-Free Grammars Lecture 7
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
1 CSC 594 Topics in AI – Applied Natural Language Processing Fall 2009/2010 Overview of NLP tasks (text pre-processing)
Information Extraction Junichi Tsujii Graduate School of Science University of Tokyo Japan Ronen Feldman Bar Ilan University Israel.
Context Free Grammars Reading: Chap 12-13, Jurafsky & Martin This slide set was adapted from J. Martin, U. Colorado Instructor: Paul Tarau, based on Rada.
Intro to NLP - J. Eisner1 Part-of-Speech Tagging A Canonical Finite-State Task.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2007 Lecture4 1 August 2007.
Computational Linguistics Yoad Winter *General overview *Examples: Transducers; Stanford Parser; Google Translate; Word-Sense Disambiguation * Finite State.
Intro to NLP - J. Eisner1 Finite-State and the Noisy Channel.
Intro to NLP - J. Eisner1 Finite-State Methods.
AN IMPLEMENTATION OF A REGULAR EXPRESSION PARSER
Natural Language Processing Lecture 6 : Revision.
Chapter 10. Parsing with CFGs From: Chapter 10 of An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by.
The Meeting Place Second Grade Saxon Math Lesson 30B.
The Meeting Place Second Grade Saxon Math Lesson 34.
Finite State Parsing & Information Extraction CMSC Intro to NLP January 10, 2006.
10. Parsing with Context-free Grammars -Speech and Language Processing- 발표자 : 정영임 발표일 :
Grammars CPSC 5135.
PART I: overview material
Context Free Grammars Reading: Chap 9, Jurafsky & Martin This slide set was adapted from J. Martin, U. Colorado Instructor: Rada Mihalcea.
Finite-State Methods in Natural Language Processing Lauri Karttunen LSA 2005 Summer Institute July 25, 2005.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
DAILY GRAMMAR PRACTICE (DGP)
Chart Parsing and Augmenting Grammars CSE-391: Artificial Intelligence University of Pennsylvania Matt Huenerfauth March 2005.
Auckland 2012Kilgarriff: NLP and Corpus Processing1 The contribution of NLP: corpus processing.
Abbreviation rules O’Rourke Elementary 3 rd Grade.
Non Leap YearLeap Year DateDay NumberMod 7Day NumberMod 7 13-Jan Feb Mar Apr May Jun Jul
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
1.It will help them feel like part of a group and also it will make the school’s sports team feel encourage. 2.To gain knowledge 3. Because they are comfortable.
Syntax and Semantics Form and Meaning of Programming Languages Copyright © by Curt Hill.
2010 Corporate Calendar incl. release and black-out dates As of 30 December 2009.
Intro to NLP - J. Eisner1 Finite-State and the Noisy Channel.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
Part-of-Speech Tagging
Context-Free Grammars: an overview
CS510 Compiler Lecture 4.
Primary Longman Elect 3A Chapter 5 Asking about dates.
Payroll Calendar Fiscal Year


Building Finite-State Machines
Lecture 7: Introduction to Parsing (Syntax Analysis)
2017 Jan Sun Mon Tue Wed Thu Fri Sat
CHAPTER 2 Context-Free Languages
R.Rajkumar Asst.Professor CSE

languages & relations regular expressions finite-state networks
Chunk Parsing CS1573: AI Application Development, Spring 2003
Jan Sun Mon Tue Wed Thu Fri Sat
Finite-State Programming
Teori Bahasa dan Automata Lecture 9: Contex-Free Grammars

TIMELINE NAME OF PROJECT Today 2016 Jan Feb Mar Apr May Jun
Reviewing Abbreviations
Presentation transcript:

Intro to NLP - J. Eisner1 Finite-State Programming Some Examples

Intro to NLP - J. Eisner2 Finite-state programming

Intro to NLP - J. Eisner3 Finite-state programming

Intro to NLP - J. Eisner4 Finite-state programming

Intro to NLP - J. Eisner5 Some Xerox Extensions $ containment => restriction replacement Make it easier to describe complex languages and relations without extending the formal power of finite-state systems. slide courtesy of L. Karttunen (modified)

Intro to NLP - J. Eisner6 Containment$[ab*c] Must contain a substring that matches ab*c. Accepts xxxacyy Rejects bcba ?* [ab*c] ?* Equivalent expression b c aa,b,c,?a,b,c,? ? Warning: ? in regexps means any character at all. ? But ? in machines means any character not explicitly mentioned anywhere in the machine.

Intro to NLP - J. Eisner7 Restriction ? c b b c ? a c a => b _ c Any a must be preceded by b and followed by c. Accepts bacbbacde Rejects baca ~[ ~[?* b] a ?* ] & ~[ ?* a ~[c ?*] ] Equivalent expression slide courtesy of L. Karttunen (modified) ab contains a not preceded by b ac contains a not followed by c

Intro to NLP - J. Eisner8 Replacement a:b b a ? ? b:a a a:b a b -> b a Replace ab by ba. Transduces to Transduces abcdbaba to bacdbbaa [ ~$[a b] [[a b].x. [b a]] ]* ~$[a b] Equivalent expression slide courtesy of L. Karttunen (modified)

Intro to NLP - J. Eisner9 Replacement is Nondeterministic a b -> b a | x Replace ab by ba or x, nondeterministically. Transduces to {,,,} Transduces abcdbaba to { bacdbbaa, bacdbxa, xcdbbaa, xcdbxa }

Intro to NLP - J. Eisner10 Replacement is Nondeterministic [ a b -> b a | x ].o. [ x => _ c ] Replace ab by ba or x, nondeterministically. Transduces to {,,,} Transduces abcdbaba to { bacdbbaa, bacdbxa, xcdbbaa, xcdbxa }

Intro to NLP - J. Eisner11 Replacement is Nondeterministic a b | b | b a | a b a -> x applied to aba Four overlapping substrings match; we havent told it which one to replace so it chooses nondeterministically a b a a b a a x a a x x a x slide courtesy of L. Karttunen (modified)

Intro to NLP - J. Eisner12 More Replace Operators Optional replacement: a b (->) b a Directed replacement guarantees a unique result by constraining the factorization of the input string by Direction of the match (rightward or leftward) Length (longest or shortest) slide courtesy of L. Karttunen

Intro to NLP - J. Left-to-right, Longest-match Replacement a b | b | b a | a b x applied to aba a b a a b a a x a a x x a x slide courtesy of L. left-to-right, longest left-to-right, shortest match right-to-left, longest match right-to-left, shortest match

Intro to NLP - J. Eisner14 Using … for marking 0:[ [ 0:] ? a e i o u ] a|e|i|o|u -> [... ] p o t a t o p[o]t[a]t[o] Note: actually have to write as Note: actually have to write as -> %[... %] or or -> [... ] since [] are parens in the regexp language slide courtesy of L. Karttunen (modified)

Intro to NLP - J. Eisner15 Using … for marking 0:[ [ 0:] ? a e i o u ] a|e|i|o|u -> [... ] p o t a t o p[o]t[a]t[o] Which way does the FST transduce potatoe? slide courtesy of L. Karttunen (modified) How would you change it to get the other answer? p o t a t o e p[o]t[a]t[o][e] p[o]t[a]t[o e] vs.

Intro to NLP - J. Eisner16 Example: Finnish Syllabification define C [ b | c | d | f... define V [ a | e | i | o | u ]; s t r u k t u r a l i s m i s t r u k - t u - r a - l i s - m i [C* V+ "-" || _ [C V] Insert a hyphen after the longest instance of the C* V+ C* pattern in front of a C V pattern. C* V+ C* pattern in front of a C V pattern. slide courtesy of L. Karttunenwhy?

Intro to NLP - J. Eisner17 Conditional Replacement The relation that replaces A by B between L and R leaving everything else unchanged. A -> B Replacement L _ R Context Sources of complexity: l Replacements and contexts may overlap l Alternative ways of interpreting between left and right. slide courtesy of L. Karttunen

Intro to NLP - J. Eisner18 Hand-Coded Example: Parsing Dates Today is [Tuesday, July 25, 2000]. Today is Tuesday, [July 25, 2000]. Today is [Tuesday, July 25], Today is Tuesday, [July 25], Today is [Tuesday], July 25, Best result Bad results Need left-to-right, longest-match constraints. slide courtesy of L. Karttunen

Intro to NLP - J. Eisner19 Source code: Language of Dates Day = Monday | Tuesday |... | Sunday Month = January | February |... | December Date = 1 | 2 | 3 |... | 3 1 Year = %0To9 (%0To9 (%0To9 (%0To9))) - %0?* from 1 to 9999 AllDates = Day | (Day, ) Month Date (, Year)) slide courtesy of L. Karttunen

Intro to NLP - J. Eisner20 actually represents 7 arcs, each labeled by a string Object code: All Dates from 1/1/1 to 12/31/9999,, Feb Jan Mar May Jun Jul Apr Aug Oct Nov Dec Sep 3,, Tue Mon Wed Fri Sat Sun Thu MayJanFebMarAprJun JulAugOctNovDecSep 13 states, 96 arcs date expressions slide courtesy of L. Karttunen

Intro to NLP - J. Eisner21 Parser for Dates [DT... ] Compiles into an unambiguous transducer (23 states, 332 arcs). Today is [DT Tuesday, July 25, 2000] because yesterday was [DT Monday] and it was [DT July 24] so tomorrow must be [DT Wednesday, July 26] and not [DT July 27] as it says on the program. Xerox left-to-right replacement operator slide courtesy of L. Karttunen (modified)

Intro to NLP - J. Eisner22 Problem of Reference Valid dates Tuesday, July 25, 2000 Tuesday, February 29, 2000 Monday, September 16, 1996 Invalid dates Wednesday, April 31, 1996 Thursday, February 29, 1900 Tuesday, July 26, 2000 slide courtesy of L. Karttunen

Intro to NLP - J. Eisner23 Refinement by Intersection AllDates ValidDates WeekdayDate MaxDays In Month 31 => Jan|Mar|May| … _ 30 => Jan|Mar|Apr| … _ LeapYears Feb 29, => _ … slide courtesy of L. Karttunen (modified) Xerox contextual restriction operator Q: Why do these rules start with spaces? (And is it enough?) Q: Why does this rule end with a comma? Q: Can we write the whole rule?

Intro to NLP - J. Eisner24 Defining Valid Dates AllDates & MaxDaysInMonth & LeapYears & WeekdayDates = ValidDates AllDates: 13 states, 96 arcs date expressions ValidDates: 805 states, 6472 arcs date expressions slide courtesy of L. Karttunen

Intro to NLP - J. Eisner25 Parser for Valid and Invalid Dates [AllDates - [ID... ], [VD... ] Today is [VD Tuesday, July 25, 2000], not [ID Tuesday, July 26, 2000]. valid date invalid date 2688 states, arcs slide courtesy of L. Karttunen Comma creates a single FST that does left-to-right longest match against either pattern

Intro to NLP - J. Eisner26 More Engineering Applications Markup Dates, names, places, noun phrases; spelling/grammar errors? Hyphenation Informative templates for information extraction (FASTUS) Word segmentation (use probabilities!) Part-of-speech tagging (use probabilities – maybe!) Translation Spelling correction / edit distance Phonology, morphology: series of little fixups? constraints? Speech Transliteration / back-transliteration Machine translation? Learning …

Intro to NLP - J. Eisner27 FASTUS – Information Extraction Appelt et al, 1992-? Input: Bridgestone Sports Co. said Friday it has set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be shipped to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with … Output: Relationship: TIE-UP Entities:Bridgestone Sports Co. A local concern A Japanese trading house Joint Venture Company:Bridgestone Sports Taiwan Co. Amount:NT$

Intro to NLP - J. Eisner28 FASTUS: Successive Markups (details on subsequent slides) Tokenization.o. Multiwords.o. Basic phrases (noun groups, verb groups …).o. Complex phrases.o. Semantic Patterns.o. Merging different references

Intro to NLP - J. Eisner29 FASTUS: Tokenization Spaces, hyphens, etc. wouldnt would not their them s company. company. but Co. Co.

Intro to NLP - J. Eisner30 FASTUS: Multiwords set up joint venture San Francisco Symphony Orchestra, Canadian Opera Company … use a specialized regexp to match musical groups.... what kind of regexp would match company names?

Intro to NLP - J. Eisner31 FASTUS : Basic phrases Output looks like this (no nested brackets!): … [NG it] [VG had set_up] [NG a joint_venture] [Prep in] … Company Name:Bridgestone Sports Co. Verb Group:said Noun Group:Friday Noun Group:it Verb Group:had set up Noun Group:a joint venture Preposition:in Location:Taiwan Preposition:with Noun Group:a local concern

Intro to NLP - J. Eisner32 FASTUS: Noun Groups Build FSA to recognize phrases like approximately 5 kg more than 30 people the newly elected president the largest leftist political force a government and commercial project Use the FSA for left-to-right longest-match markup What does FSA look like? See next slide …

Intro to NLP - J. Eisner33 FASTUS: Noun Groups Described with a kind of non-recursive CFG … (a regexp can include names that stand for other regexps) NG Pronoun | Time-NP | Date-NP NG (Det) (Adjs) HeadNouns … Adjs sequence of adjectives maybe with commas, conjunctions, adverbs … Det DetNP | DetNonNP DetNP detailed expression to match the only five, another three, this, many, hers, all, the most … …

Intro to NLP - J. Eisner34 FASTUS: Semantic patterns BusinessRelationship = NounGroup(Company/ies) VerbGroup(Set-up) NounGroup(JointVenture) with NounGroup(Company/ies) | … ProductionActivity = VerbGroup(Produce) NounGroup(Product) NounGroup(Company/ies) NounGroup & … is made easy by the processing done at a previous level Use this for spotting references to put in the database.