LING 001 Introduction to Linguistics, Spring 2010. Computational linguistics: syntactic parsing and part-of-speech tagging (Apr. 5).


Computational linguistics. Topics: syntax, semantics, grammar, and the lexicon; lexical semantics and ontologies; phonology/morphology, word segmentation, and tagging; summarization; language generation; paraphrasing and textual entailment; parsing and chunking; spoken language processing, understanding, and speech-to-speech translation; linguistic, psychological, and mathematical models of language; computational pragmatics; dialogue and conversational agents; computational models of discourse; information retrieval; question answering; word sense disambiguation; information extraction and text mining; semantic role labeling; sentiment analysis and opinion mining; corpus-based modeling of language; machine translation and translation aids; multilingual processing; multimodal systems and representations; statistical and machine learning methods; applications; corpus development and language resources; evaluation methods and user studies.

Computational linguistics. Emphasis is on integrating linguistic and other knowledge to produce working systems; system performance is important. Computational linguistics deals with language as it is actually used: there is little need to worry about rare constructions and distinctions, but fragments, typos, false starts, ambiguities, non-native speakers, etc. must be handled. Ambiguity in natural language is pervasive, and this is what makes computational linguistics hard.

Ambiguity
Lexical: bank; unlockable.
Syntactic: I shot an elephant in my pajamas. (How he got in my pajamas, I'll never know.) I forgot how good beer tastes. I met Mary and Elena's mother at the mall yesterday.
Semantic: Every cat chases a mouse. The police refused the demonstrators a permit because they feared violence / ... because they advocated violence.

Parsing
Parsing: taking an input and producing some sort of structure for it. A syntactic parser is a device (or algorithm) that takes a phrase or sentence as input and uses a grammar (including a lexicon) to produce the syntactic structure(s) appropriate for that phrase or sentence (often called parse trees, or just trees).

Context-free grammar
The type of grammar most often applied in parsing is the context-free grammar: a set of rules/productions (and a lexicon) that specify how a syntactic constituent can be composed of smaller constituents. ("Context-free" means that how a constituent expands does not depend on what other constituents are around it.)

Context-free grammar
The symbols for constituents (e.g., phrases and sentences) are called non-terminal symbols; those representing words are terminal symbols. Each rule has a single non-terminal on the left-hand side of the arrow, which is expanded into the (non-terminal or terminal) symbols on the right-hand side; the non-terminals on the right-hand side can in turn be expanded by other rules. The vertical stroke | is shorthand for alternative expansions. The grammar "accepts" a sentence if there is a way of expanding S (the start symbol), then all of the sub-constituents, and so on, until the leaves of the tree match the words of the sentence (the terminal symbols). To accept noun phrases instead, we can treat NP as the start symbol.
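As an illustration (the rules and words below are made up, not from the slides), a toy grammar of this kind can be written down directly, with the | alternatives stored as a list; expanding S left to right until only terminals remain produces a sentence the grammar accepts:

```python
# A toy context-free grammar: each non-terminal maps to its list of
# alternative expansions (the "|" shorthand becomes list entries).
grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N":   [["flight"], ["meal"]],
    "V":   [["includes"]],
}

def generate(symbol="S"):
    """Expand a symbol left to right until only terminals remain."""
    if symbol not in grammar:          # terminal: an actual word
        return [symbol]
    expansion = grammar[symbol][0]     # always pick the first alternative
    words = []
    for sym in expansion:
        words.extend(generate(sym))
    return words

print(" ".join(generate()))  # the flight includes the flight
```

Always taking the first alternative yields one derivation; a full generator or parser must consider every alternative expansion.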

Parsing
Parsing amounts to running a grammar backwards to find the possible structures of a sentence; it can be viewed as a search problem. Top-down strategy: consider all expansions of the start symbol, then the expansions of each of those constituents, and so on, until we reach expansions that match all the words in the sentence. (What are the problems?)
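A minimal sketch of the top-down strategy is a recursive-descent recognizer; the grammar, lexicon, and sentences here are illustrative, not from the slides:

```python
# Top-down (recursive-descent) recognition: expand from S and try to
# match the words left to right, backtracking over alternative rules.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {"the": "Det", "a": "Det", "flight": "N", "meal": "N", "includes": "V"}

def match(symbols, words):
    """Can the symbol sequence expand to exactly these words?"""
    if not symbols:
        return not words                       # success iff all words consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                       # non-terminal: try each rule
        return any(match(exp + rest, words) for exp in GRAMMAR[first])
    # pre-terminal: must match the next word's category
    return bool(words) and LEXICON.get(words[0]) == first and match(rest, words[1:])

print(match(["S"], "the flight includes a meal".split()))   # True
print(match(["S"], "includes the flight".split()))          # False
```

One of the problems the slide alludes to shows up immediately: a left-recursive rule such as NP → NP PP would make match recurse forever, and the parser expands many hypotheses that the input can never support.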

Parsing
Bottom-up strategy: examine the words and postulate all the small constituents that might contain them, then see which of those can be fitted together into larger constituents, and so on, until we reach a tree. (What are the problems?)
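A standard bottom-up realization of this idea is the CKY algorithm, sketched below for a grammar in Chomsky normal form (the rules and sentence are illustrative, not from the slides):

```python
from itertools import product

# Bottom-up recognition with CKY: build all constituents spanning each
# substring, smallest spans first, then combine them into larger ones.
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICON = {"the": "Det", "a": "Det", "flight": "N", "meal": "N", "includes": "V"}

def cky_recognize(words):
    n = len(words)
    # chart[i][j] = set of non-terminals covering words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i + 1].add(LEXICON[w])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # try every split point
                for b, c in product(chart[i][k], chart[k][j]):
                    if (b, c) in BINARY:
                        chart[i][j].add(BINARY[(b, c)])
    return "S" in chart[0][n]

print(cky_recognize("the flight includes a meal".split()))  # True
print(cky_recognize("the meal includes".split()))           # False
```

The corresponding problem is visible in the chart: many local constituents are postulated that never fit into any complete tree.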

Parsing
The left-corner strategy (top-down prediction with bottom-up verification): make the left-most expansion (top-down), find rules that handle the left-most words (bottom-up), and repeat. Example: Does this flight include a meal?

Parsing
(Step-by-step parse trees for "Does this flight include a meal?" not reproduced in this transcript.)

Probabilistic CFGs and statistical parsing
Attach probabilities to the context-free grammar rules (a PCFG): the probabilities of the expansions of a given non-terminal sum to 1. Goal: find a single parse tree for a sentence (the maximum-probability tree) instead of all possible parse trees.

Probabilistic CFGs and statistical parsing
Tree probabilities for two competing parses (products of the rule probabilities used in each tree; the trees themselves are not reproduced): .15 × .40 × .05 × .05 × … = 1.5 × 10⁻⁶ versus … × .40 × .40 × .05 × … = 1.7 × 10⁻⁶.
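The probability of a tree is simply the product of the probabilities of the rules it uses, which is easy to compute mechanically. The rule probabilities and the tree below are made up for illustration:

```python
# Probability of a parse tree under a PCFG: the product of the
# probabilities of all rules applied in the tree. Trees are nested
# (label, children...) tuples; probabilities here are illustrative.
RULE_PROB = {
    ("S", ("NP", "VP")): 0.80,
    ("NP", ("Det", "N")): 0.40,
    ("VP", ("V", "NP")): 0.30,
    ("Det", ("the",)): 0.50,
    ("N", ("flight",)): 0.05,
    ("N", ("meal",)): 0.05,
    ("V", ("includes",)): 0.10,
}

def tree_prob(tree):
    if isinstance(tree, str):                  # a word: no rule applied
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        p *= tree_prob(child)
    return p

t = ("S", ("NP", ("Det", "the"), ("N", "flight")),
          ("VP", ("V", "includes"),
                 ("NP", ("Det", "the"), ("N", "meal"))))
print(tree_prob(t))  # ≈ 2.4e-06
```

With probabilities this small, real parsers work with log probabilities to avoid underflow.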

Probabilistic CFGs and statistical parsing
The rule probabilities can be estimated from an annotated database (a treebank), such as the Penn Treebank. (Sample treebank tree not reproduced.)

Human parsing
While most sentences are ambiguous in some way, people rarely notice these ambiguities; they seem to see only one interpretation of a sentence.
Lexical subcategorization preferences:
The women kept the dogs on the beach.
  = The women kept the dogs which were on the beach: 5%
  = The women kept them (the dogs) on the beach: 95%
The women discussed the dogs on the beach.
  = The women discussed the dogs which were on the beach: 90%
  = The women discussed them (the dogs) while on the beach: 10%
(keep has a preference for VP → V NP PP; discuss has a preference for VP → V NP)
Part-of-speech preferences: The complex houses married and single students and their families. (houses is more likely to be a noun)

Head lexicalization of PCFGs
The head word of a phrase gives a good representation of the phrase's structure and meaning; lexicalization puts the properties of words back into a PCFG. Lexicalized probabilistic context-free grammars perform much better than plain PCFGs (88% vs. 73% accuracy).

Part-of-speech tagging
Part-of-speech (POS) tagging:
Input: the lead paint is unsafe
Output: the/Det lead/N paint/N is/V unsafe/Adj
Uses of POS tagging:
Text-to-speech: how do we pronounce "lead"? Which words bear a pitch accent?
It can differentiate word senses that involve part-of-speech differences (what is the meaning of "interest"?).
Tagged text helps linguists find interesting syntactic constructions in texts ("google", "ssh", etc. used as verbs).
POS tagging is not parsing. It is highly accurate: the state of the art is 97% accuracy. But the baseline is already 90%: (1) tag every word with its most frequent tag; (2) tag unknown words as nouns.
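The 90% baseline described above takes only a few lines of code; the tiny training "corpus" here is made up for illustration:

```python
from collections import Counter, defaultdict

# Baseline tagger: tag each word with its most frequent tag in the
# training data, and tag unknown words as nouns.
train = [
    ("the", "Det"), ("lead", "N"), ("paint", "N"), ("is", "V"),
    ("unsafe", "Adj"), ("the", "Det"), ("lead", "N"), ("dog", "N"),
    ("can", "V"), ("lead", "V"), ("the", "Det"), ("pack", "N"),
]

counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1
most_frequent = {w: c.most_common(1)[0][0] for w, c in counts.items()}

def baseline_tag(words):
    # Unknown words default to "N" (noun), per the slide's recipe.
    return [(w, most_frequent.get(w, "N")) for w in words]

print(baseline_tag("the lead paint is unsafe".split()))
```

Note that "lead" is tagged N here because N outnumbers V for it in the training data; the baseline has no way to use context, which is exactly what the HMM tagger below adds.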

Part-of-speech tagging
The Penn Treebank tagset, with 45 different POS tags, is the most widely used. (Tag table not reproduced.)

Part-of-speech tagging
Percentage of words accented ("stressed") under each part-of-speech category in different speech genres. (Chart not reproduced.)

Hidden Markov Model POS tagger
The HMM has been widely used in many fields: natural language processing, speech synthesis/recognition, computer vision, biology, economics, climatology, etc. In an HMM tagger, the top row of the model is the unobserved (hidden) states, interpreted as POS tags; the bottom row is the observed output, the words. The task: find the most likely hidden state sequence (POS tag sequence) given an observation sequence (word sequence).

Hidden Markov Model POS tagger
Paths (hidden state sequences) are represented in a trellis. (Trellis diagram not reproduced.)
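Filling in the trellis column by column, keeping for each tag only the best-scoring path so far, is the Viterbi algorithm. A minimal sketch, with made-up transition and emission probabilities (a real tagger estimates them from a treebank):

```python
# Viterbi decoding over the trellis: for each word position and tag,
# keep the best path (and its probability) ending in that tag.
# All probabilities below are illustrative, not estimated from data.
TAGS = ["Det", "N", "V"]
START = {"Det": 0.8, "N": 0.1, "V": 0.1}                  # P(tag | start)
TRANS = {  # P(next tag | current tag)
    "Det": {"Det": 0.05, "N": 0.9, "V": 0.05},
    "N":   {"Det": 0.1,  "N": 0.3, "V": 0.6},
    "V":   {"Det": 0.6,  "N": 0.3, "V": 0.1},
}
EMIT = {  # P(word | tag); unlisted words get probability 0
    "Det": {"the": 0.7},
    "N":   {"can": 0.1, "fish": 0.4, "dog": 0.3},
    "V":   {"can": 0.4, "fish": 0.2, "barks": 0.2},
}

def viterbi(words):
    # best[t] = (probability of best path ending in tag t, that path)
    best = {t: (START[t] * EMIT[t].get(words[0], 0.0), [t]) for t in TAGS}
    for w in words[1:]:
        best = {
            t: max(
                ((p * TRANS[prev][t] * EMIT[t].get(w, 0.0), path + [t])
                 for prev, (p, path) in best.items()),
                key=lambda x: x[0],
            )
            for t in TAGS
        }
    return max(best.values(), key=lambda x: x[0])[1]

print(viterbi("the can barks".split()))  # ['Det', 'N', 'V']
```

Here context does the disambiguating: "can" is emitted more often as V, but the Det-to-N transition after "the" outweighs that, so the tagger picks N.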
