Towards Semi-Automated Annotation for Prepositional Phrase Attachment
Sara Rosenthal, William J. Lipovsky, Kathleen McKeown, Kapil Thadani, Jacob Andreas
Columbia University

Background
Most standard techniques for text analysis rely on existing annotated data
The LDC and ELRA provide annotated data for many tasks
But systems do poorly when applied to text from a different domain or genre
Can annotation tasks be extended to new genres at low cost?

Experiment
Determine whether annotators without formal linguistic training can do as well as linguists
– Task: identify the correct attachment point for a given prepositional phrase (PP)
– Annotators: workers on Amazon Mechanical Turk
– Evaluation: comparison with the Penn Treebank

Approach
Automatic extraction of PPs, plus correct and plausible attachment points, from the Penn Treebank
Creation of multiple-choice questions for each PP to post on Mechanical Turk
Comparison of worker responses to the Treebank

Outline
Related Work
Extracting PPs and Attachment Points
User Studies
Evaluation and Analysis

Related Work
Recent work in PP attachment achieved 83% accuracy on formal genres (Agirre et al., 2008)
PP attachment training is typically done on the RRR dataset (Ratnaparkhi et al., 1994)
– Presumes an oracle that extracts the two candidate attachment points
Previous research has evaluated Mechanical Turk workers on other, smaller-scale tasks (Snow et al., 2008)

Extracting PPs and Attachment Points
The meeting, which is expected to draw 20,000 to Bangkok, was going to be held at the Central Plaza Hotel, but the government balked at the hotel’s conditions for undertaking the necessary expansions.

Extracting PPs and Attachment Points
PPs are found through tree traversal
The closest left sibling is the correct attachment point
Verbs or NPs to the left of the PP are plausible attachments
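
To make the procedure concrete, here is a minimal sketch of the traversal described above, assuming NLTK and its bundled Penn Treebank sample; the function name and filtering details are illustrative assumptions, not the authors' code.

```python
# Sketch of the extraction heuristics above, using NLTK's bundled
# Penn Treebank sample (requires nltk.download('treebank')).
# Names and details are illustrative, not the authors' code.
from nltk.corpus import treebank
from nltk.tree import ParentedTree

def pp_attachment_cases(tree):
    """Yield (pp, correct, candidates) for each PP that yields a question."""
    t = ParentedTree.convert(tree)
    for pp in t.subtrees(lambda s: s.label().startswith("PP")):
        parent, idx = pp.parent(), pp.parent_index()
        if parent is None or idx == 0:
            continue  # no left sibling, so no attachment point to recover
        correct = parent[idx - 1]  # closest left sibling = correct attachment
        # Verbs and NPs to the left of the PP are the plausible candidates.
        candidates = [sib for sib in parent[:idx]
                      if isinstance(sib, ParentedTree)
                      and sib.label().startswith(("NP", "VB", "VP"))]
        if isinstance(correct, ParentedTree) and len(candidates) > 1:
            yield pp, correct, candidates

for tree in treebank.parsed_sents()[:25]:
    for pp, correct, candidates in pp_attachment_cases(tree):
        print(" ".join(pp.leaves()), "->", " ".join(correct.leaves()))
```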

User Studies
Pilot Study
– 20 PP attachment cases
– Experimented with 3 question wordings
– Selected the wording with the most accurate responses (16/20)
Full Study
– Ran question extraction on 3,000 Penn Treebank sentences
– Selected the first 1,000 for questions, avoiding:
   Similar sentences (e.g., “University of Pennsylvania” vs. “University of Colorado”)
   Complex constructions where the tree structure didn’t identify the answer (e.g., “The decline was even steeper than in November.”)
   Forward modification
– Workers self-identified as US residents
– Each question was posed to 3 workers
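
For illustration, one way an extracted case could be rendered as a multiple-choice question. The phrasing below is purely hypothetical: the slides do not reproduce the wording that won the pilot.

```python
def make_question(sentence, pp, candidates):
    """Render one PP-attachment case as a multiple-choice question.
    The wording is hypothetical; the slides do not give the winning
    wording from the pilot study."""
    lines = [f'In the sentence below, which phrase does "{pp}" modify?', "",
             f"  {sentence}", ""]
    for letter, candidate in zip("abcde", candidates):
        lines.append(f"  ({letter}) {candidate}")
    return "\n".join(lines)

print(make_question(
    "The meeting was going to be held at the Central Plaza Hotel.",
    "at the Central Plaza Hotel",
    ["the meeting", "was going", "to be held"]))
```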

Full Study Statistics
Average time per task: 49 seconds
5 hours and 25 minutes to complete the entire task
Total expense: $135
– $120 on workers (i.e., $0.04 per response across 3,000 responses)
– $15 on the Mechanical Turk fee

Results
Basis                                    Percent correct attachment points
3,000 individual responses               86.7%
Unanimous agreement (1,000 questions)    71.8%
Majority agreement (1,000 questions)     92.2%

Error Analysis
Manual analysis of the 78 incorrect cases
Difficulty when the correct attachment point is a verb or adjective
– e.g., “The morbidity rate is a striking finding among many of us”
No problem when the correct attachment point is a noun
The system incorrectly handled conjunctions as attachment points; workers who chose the first constituent were marked incorrect
– e.g., “The thrift holding company said it expects to obtain regulatory approval and complete the transaction by year-end.”

[Chart: number of questions at each level of worker agreement]
When 3/3 agree, the response is correct 97% of the time
When just 2/3 agree, the response is correct 82% of the time
When there is no agreement, the answer is always wrong
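
The agreement levels in the chart correspond to a simple vote over the three responses per question. A minimal sketch of that aggregation, assuming each question comes with three worker answers plus the Treebank answer (the data below is hypothetical):

```python
from collections import Counter

def majority(responses, gold):
    """Return (#votes for the modal answer, whether it matches Treebank).
    With no agreement (1 vote each), the modal answer is arbitrary; per
    the chart above, those cases were always wrong anyway."""
    answer, votes = Counter(responses).most_common(1)[0]
    return votes, answer == gold

# Hypothetical data: three worker answers and the Treebank answer.
questions = [(("NP1", "NP1", "NP1"), "NP1"),
             (("NP1", "NP1", "VP"), "NP1"),
             (("NP1", "NP2", "VP"), "NP2")]
totals, hits = Counter(), Counter()
for responses, gold in questions:
    votes, ok = majority(responses, gold)
    totals[votes] += 1
    hits[votes] += ok
for votes in (3, 2, 1):
    if totals[votes]:
        print(f"{votes}/3 agree: {hits[votes] / totals[votes]:.0%} correct")
```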

Conclusions
Non-experts are capable of disambiguating PP attachment in the Wall Street Journal
Accuracy increases by 15% (82% -> 97%) when agreement goes from 2 to 3 workers, suggesting possibly higher accuracy with more workers
A methodology for obtaining large corpora for new genres and domains
What’s next?
– See our paper in the NAACL Workshop on Amazon Mechanical Turk
– It presents a method and results for collecting PP attachments on blogs without parsing