Thoughts on Treebanks
Christopher Manning, Stanford University

Q1: What do you really care about when you're building a parser?
- Completeness of information
  - There's not much point in having a treebank if you end up having to do unsupervised learning anyway
  - You want the human annotation to add value
  - Classic bad example: noun compound structure in the Penn English Treebank
- Consistency of information
  - If things are annotated inconsistently, you lose both in training (if the inconsistency is widespread) and in evaluation
  - Bad example: "long ago" constructions: as long ago as ...; not so long ago
- Mutual information
  - Categories should be as mutually informative as possible
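The "long ago" inconsistency above is the kind of thing that can be caught mechanically. A minimal sketch, assuming (purely for illustration; this is not the actual PTB tooling) that annotated spans are available as (tokens, bracketing) pairs:

```python
from collections import defaultdict

# Group every annotated span by its token sequence and flag sequences that
# received more than one analysis across the corpus.
def find_inconsistencies(annotated_spans):
    analyses = defaultdict(set)
    for tokens, analysis in annotated_spans:
        analyses[tuple(tokens)].add(analysis)
    return {toks: alts for toks, alts in analyses.items() if len(alts) > 1}

spans = [
    (["long", "ago"], "(ADVP (RB long) (RB ago))"),
    (["long", "ago"], "(ADVP (JJ long) (RB ago))"),  # divergent retagging
    (["so", "long", "ago"], "(ADVP (RB so) (RB long) (RB ago))"),
]
inconsistent = find_inconsistencies(spans)
```

Identical token sequences with divergent analyses are exactly what a "vertical" consistency review wants to surface.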

Q3: What info (e.g., function tags, empty categories, coindexation) is useful, what is not?
- Information on function is definitely useful
  - We should move to always having typed dependencies
  - Clearest example in the Penn English Treebank: temporal NPs
- Empty categories don't necessarily give much value in the dumbed-down world of Penn English Treebank parsing work
  - Though it should be tried again/more
  - But they are definitely useful if you want to know this stuff!
    - Subcategorization/argument-structure determination
    - Natural language understanding!!
    - Cf. the work of Johnson, Levy and Manning, etc. on long-distance dependencies
- I'm sceptical that there is a categorical argument/adjunct distinction to be made
  - Leave it to the real numbers
  - This means that subcategorization frames can only be statistical
  - Cf. Manning (2003)
  - I've got some more slides on this from another talk if you want ...
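The temporal-NP point can be made concrete with a toy illustration (not the slides' actual conversion): when the PTB's -TMP function tag is present, a converter can emit a typed temporal-modifier (tmod) dependency instead of an uninformative one. The function name and the "dep" fallback label are assumptions.

```python
# With a function tag, the dependency type is recoverable; without it,
# the relation between verb and NP head is left generic.
def np_to_dependency(governing_verb, np_label, np_head_word):
    relation = "tmod" if np_label.endswith("-TMP") else "dep"
    return (relation, governing_verb, np_head_word)

with_tag = np_to_dependency("left", "NP-TMP", "yesterday")
without_tag = np_to_dependency("left", "NP", "yesterday")
```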

Q3 (cont.): What info (e.g., function tags, empty categories, coindexation) is useful, what is not?
- Do you prefer a more refined tagset for parsing? Yes. I mightn't use it, but I often do
- The transform-detransform framework:
  RawInput → TransformedInput → Parser → TransformedOutput → DesiredOutput
  - I think everyone does this to some extent
  - Some, like Johnson, Klein and Manning, have exploited it very explicitly: NN-TMP, IN^T, NP-Poss, VP-VBG, NP-v, ...
  - Everyone else should think about it more
- It's easy to throw away overly precise information, or to move information around deterministically (tag to phrase or vice versa), if it's represented completely and consistently!
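The transform-detransform pipeline can be sketched as a pair of tree rewrites, here using the NP-TMP enrichment as the running example. The nested-tuple tree encoding and the tiny temporal word list are illustrative assumptions, not the actual Johnson or Klein-Manning machinery:

```python
# Enrich labels before training/parsing; strip the enrichment afterwards
# to recover the original tagset.
TEMPORAL_WORDS = {"yesterday", "today", "tomorrow"}

def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [word for child in children for word in leaves(child)]

def transform(tree):
    # Transform step: mark NPs dominating a temporal word as NP-TMP.
    if isinstance(tree, str):
        return tree
    label, children = tree
    if label == "NP" and set(leaves(tree)) & TEMPORAL_WORDS:
        label = "NP-TMP"
    return (label, [transform(child) for child in children])

def detransform(tree):
    # Detransform step: drop the -TMP suffix to get back the plain labels.
    if isinstance(tree, str):
        return tree
    label, children = tree
    return (label.split("-")[0], [detransform(child) for child in children])

sentence = ("S", [("NP", [("PRP", ["I"])]),
                  ("VP", [("VBD", ["left"]),
                          ("NP", [("NN", ["yesterday"])])])])
enriched = transform(sentence)
```

The round trip detransform(transform(t)) == t is what makes it safe to experiment with refined tagsets: the refinement is recoverable and removable.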

Q4: How does grammar writing interact with treebanking?
- In practice, they often haven't interacted much
- I'm a great believer that they should
  - Having a grammar is a huge guide to how things should be parsed and a way to check parsing consistency
  - It also allows opportunities for updating analyses, etc.
  - Cf. the Redwoods Treebank and subsequent efforts
- The inability to automatically update treebanks is a growing problem
  - Current English treebanking isn't having much impact because of annotation differences with the original PTB
- Feedback from users has only rarely been harvested

Q5: What methodological lessons can be drawn for treebanking?
- Good guidelines (loosely, a grammar!)
- Good, trained people
- Annotator buy-in
- Ann Bies said all this ... I strongly agree!
- I think technology for treebank validation has been seriously underexploited
  - Doing vertical searches/checks almost always turns up inconsistencies
  - Either these or a grammar should give vertical review
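One way to mechanize the "vertical search" idea is to tabulate, for each word, every tag it has received across the corpus and surface minority tags for human review. A sketch, assuming a corpus represented as sentences of (word, tag) pairs; the 20% threshold and the toy data are arbitrary choices for illustration:

```python
from collections import Counter, defaultdict

# Report words whose minority tags fall below a frequency threshold,
# pairing each with its majority tag for comparison.
def vertical_tag_report(tagged_sents, min_ratio=0.2):
    tag_dist = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            tag_dist[word.lower()][tag] += 1
    suspects = {}
    for word, tags in tag_dist.items():
        total = sum(tags.values())
        rare = {t: n for t, n in tags.items() if n / total < min_ratio}
        if rare:
            suspects[word] = (tags.most_common(1)[0][0], rare)
    return suspects

# "long" is tagged RB nine times and JJ once: the JJ tagging gets flagged.
corpus = [[("long", "RB"), ("ago", "RB")]] * 9 + [[("long", "JJ"), ("ago", "RB")]]
report = vertical_tag_report(corpus)
```

Not every flagged item is an error, but the flagged list is a far smaller haystack for an annotator to review than the whole corpus.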

Q6: What are the advantages and disadvantages of pre-processing the data to be treebanked with an automatic parser?
- The economics are clear: you reduce annotation costs
- The costs are also clear:
  - The parser places a large bias on the trees produced
  - Humans are lazy/reluctant to correct mistakes
  - A clear example: I think it is fair to say that many POS errors in the Penn English Treebank can be traced to the POS tagger
    - E.g., sentence-initial capitalized Separately, Frankly, Currently, Hopefully analyzed as NNP
    - That doesn't look like a human being's mistakes to me
- The answer: more use of technology to validate and check humans
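The Separately/Frankly error pattern is easy to screen for mechanically. A sketch, again assuming sentences of (word, tag) pairs (the corpus format and function name are illustrative):

```python
# Flag sentence-initial capitalized "-ly" words tagged NNP: a signature of
# tagger-induced errors that a human corrector waved through.
def suspect_initial_nnp(tagged_sents):
    hits = []
    for i, sent in enumerate(tagged_sents):
        if sent:
            word, tag = sent[0]
            if tag == "NNP" and word.endswith("ly") and word[0].isupper():
                hits.append((i, word))
    return hits

sents = [
    [("Separately", "NNP"), (",", ","), ("profits", "NNS"), ("fell", "VBD")],
    [("Frankly", "RB"), (",", ","), ("no", "DT"), ("one", "NN"), ("knows", "VBZ")],
]
flagged = suspect_initial_nnp(sents)
```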

Q7: What are the advantages of a phrase-structure and/or a dependency treebank for parsing?
- The current split in the literature between "phrase-structure" and "dependency" parsing is largely bogus (in my opinion)
  - The Collins/Bikel parser operates largely in the manner of a dependency parser
  - The Stanford parser contains a strict (untyped) dependency parser
- Phrase-structure parsers have the advantage of phrase-structure labels
  - A dependency parser is just a phrase-structure parser in which you cannot refer to phrasal types or condition on phrasal span
  - This extra information is useful; it's silly not to use it
- Labeling phrasal heads (= dependencies) is useful; silly not to do it
  - Automatic "head rules" should have had their day by now!!
- Scoring based on dependencies is much better than Parseval!!!
- Labeling dependency types is useful
  - Especially in languages with freer word order
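Dependency-based scoring of the kind advocated here boils down to attachment accuracy. A minimal sketch, assuming a parse is represented as one (head_index, relation) entry per token (1-based heads, 0 for the root); the relation labels are illustrative:

```python
# Unlabeled attachment score (UAS) counts correct heads; labeled attachment
# score (LAS) additionally requires the dependency type to match.
def attachment_scores(gold, predicted):
    assert len(gold) == len(predicted)
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / n
    las = sum(g == p for g, p in zip(gold, predicted)) / n
    return uas, las

# "I left yesterday": every head is right, but one relation label is wrong.
gold = [(2, "nsubj"), (0, "root"), (2, "tmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "dep")]
uas, las = attachment_scores(gold, pred)
```

Unlike Parseval bracket scoring, this directly measures the head and relation decisions, which is why labeled dependency evaluation matters most in freer word order languages.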