
Desiderata for Annotating Data to Train and Evaluate Bootstrapping Algorithms Ellen Riloff School of Computing University of Utah

Outline
Overview of Bootstrapping Paradigm
Diversity of Seeding Strategies
Criteria for Generating Seed (Training) Data
Annotating Data for Evaluation
Conclusions

The Bootstrapping Era
Unannotated Texts + Manual Annotation
or
Unannotated Texts + Automatic Annotation via Seeding

Why Bootstrapping? Manually annotating data:
– is time-consuming and expensive
– is deceptively difficult
– often requires linguistic expertise
NLP systems benefit from domain-specific training:
– it is not realistic to expect manually annotated data for every domain and task
– domain-specific training is sometimes essential

Additional Benefits of Bootstrapping
Dramatically easier and faster system development.
– Allows for free-wheeling experimentation with different categories and domains.
Encourages cross-resource experimentation.
– Allows for more analysis across domains, corpora, genres, and languages.

Outline
Overview of Bootstrapping Paradigm
Diversity of Seeding Strategies
Criteria for Generating Seed (Training) Data
Annotating Data for Evaluation
Conclusions

Automatic Annotation with Seeding
A common goal is to avoid the need for manual text annotation. The system should be trainable with seeding that can be done by anyone!
Seeding is often done using “stand-alone” examples or rules.
Fast, less expertise required, but noisier!

Seeding strategies:
– seed words
– seed patterns
– seed rules
– seed heuristics
– seed classifiers

Seed Nouns for Semantic Class Bootstrapping
[diagram: seed nouns (e.g., anthrax, ebola, cholera, flu, plague) are matched in unannotated texts → co-occurrence statistics are collected over shared contexts (e.g., “, a virus that …”, “diseases such as”) → new nouns are hypothesized (e.g., smallpox, tularemia, botulism)]
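A minimal sketch of the co-occurrence step, assuming a toy tokenized corpus; the window size and count-based scoring are illustrative, not the exact statistics of any published semantic-class bootstrapper:

```python
from collections import Counter

def context_windows(tokens, word, k=2):
    """Collect the k tokens on each side of every occurrence of `word`."""
    ctxs = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = tuple(tokens[max(0, i - k):i])
            right = tuple(tokens[i + 1:i + 1 + k])
            ctxs.append((left, right))
    return ctxs

def score_candidates(tokens, seeds, candidates, k=2):
    """Score each candidate noun by how often its context windows
    also occur around the seed nouns."""
    seed_ctxs = Counter()
    for s in seeds:
        seed_ctxs.update(context_windows(tokens, s, k))
    return {c: sum(seed_ctxs[ctx] for ctx in context_windows(tokens, c, k))
            for c in candidates}
```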

Seed Words for Extraction Pattern Learning [Riloff & Jones, 1999]
[diagram: seed nouns (e.g., anthrax, ebola, cholera, flu, plague) are matched in unannotated texts → the best pattern (e.g., “infected with <noun>”) is added to the pattern dictionary → the best nouns it extracts (e.g., smallpox, tularemia, botulism) are added to the noun dictionary → repeat]
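A rough sketch of the alternating loop in the spirit of Riloff & Jones (1999); the `extract` function and the overlap-based pattern score are simplifying assumptions, not the published scoring formula:

```python
def mutual_bootstrap(corpus_sents, seed_nouns, candidate_patterns,
                     extract, n_iters=10):
    """Alternate between choosing the best extraction pattern and
    accepting the nouns it extracts. `extract(pattern, sent)` is assumed
    to return the noun phrases that `pattern` extracts from `sent`."""
    lexicon = set(seed_nouns)
    pattern_dict = []
    for _ in range(n_iters):
        best_pat, best_hits, best_score = None, set(), 0
        for pat in candidate_patterns:
            hits = set()
            for sent in corpus_sents:
                hits.update(extract(pat, sent))
            score = len(hits & lexicon)   # reward overlap with known lexicon
            if score > best_score:
                best_pat, best_hits, best_score = pat, hits, score
        if best_pat is None:              # no pattern matched any known noun
            break
        pattern_dict.append(best_pat)
        candidate_patterns = [p for p in candidate_patterns if p != best_pat]
        lexicon |= best_hits              # accept everything the pattern found
    return lexicon, pattern_dict
```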

Seed Words for Word Sense Disambiguation [Yarowsky, 1995]
Yarowsky’s best WSD performance came from a list of top collocates, manually assigned to the correct sense. Ex: “life” and “manufacturing” for “plant”.
Sense | Training example
A | …zonal distribution of plant life from the…
A | …many dangers to plant and animal life…
? | …union responses to plant closures…
? | …company said the plant still operating…
B | …copper manufacturing plant found that…
B | …keep a manufacturing plant profitable…
Also exploited the “one sense per discourse” heuristic.
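A minimal sketch of the seeding step only (not Yarowsky’s full decision-list learner): occurrences of the target word are labeled when a seed collocate appears nearby, and everything else stays unlabeled for later self-training rounds. The seed table mirrors the toy example above:

```python
SEEDS = {"life": "A", "manufacturing": "B"}   # collocate -> sense

def seed_label(sentences, target="plant", window=5):
    """Label occurrences of `target` whose nearby words contain a seed
    collocate; ambiguous or clue-free occurrences stay unlabeled ('?')."""
    labeled = []
    for sent in sentences:
        toks = sent.lower().split()
        if target not in toks:
            continue
        i = toks.index(target)
        nearby = toks[max(0, i - window):i + window + 1]
        senses = {SEEDS[w] for w in nearby if w in SEEDS}
        labeled.append((sent, senses.pop() if len(senses) == 1 else "?"))
    return labeled
```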

Seed Patterns for Extraction Pattern Bootstrapping [Yangarber et al., 2000]
[diagram: seed patterns split the unannotated documents into relevant and irrelevant sets → candidate patterns are ranked by their correlation with the relevant documents → the best pattern is added to the pattern set → repeat]
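A hedged sketch of the ranking idea: candidate patterns are scored by how strongly they correlate with the documents the current pattern set already marks as relevant. The `matches` predicate and the precision-times-coverage score are stand-ins, not Yangarber et al.’s exact formula:

```python
def rank_patterns(docs, pattern_set, candidates, matches):
    """Rank candidate patterns against the relevant-document set induced
    by the current pattern set. `matches(pattern, doc)` -> bool is an
    assumed predicate."""
    relevant = {i for i, d in enumerate(docs)
                if any(matches(p, d) for p in pattern_set)}
    def score(pat):
        hits = {i for i, d in enumerate(docs) if matches(pat, d)}
        if not hits:
            return 0.0
        prec = len(hits & relevant) / len(hits)   # agreement with relevant set
        return prec * len(hits & relevant)        # weighted by coverage
    return sorted(candidates, key=score, reverse=True)
```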

Seed Patterns for Relevant Region Classifier Bootstrapping [Patwardhan & Riloff, 2007]
[diagram: seed patterns drive a pattern-based classifier that labels unlabeled sentences as relevant or irrelevant → those sentences train an SVM classifier → the SVM relabels sentences in a self-training loop]
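A self-training sketch using scikit-learn, assuming the pattern-based classifier has already produced `seed_rel` and `seed_irr` sentence lists; the confidence margin is an illustrative choice, not a published threshold:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def self_train(seed_rel, seed_irr, unlabeled, rounds=5, margin=1.0):
    """Fit an SVM on pattern-labeled seed sentences, then repeatedly
    absorb the unlabeled sentences it classifies with high confidence."""
    texts = list(seed_rel) + list(seed_irr)
    labels = [1] * len(seed_rel) + [0] * len(seed_irr)
    pool = list(unlabeled)
    vec, clf = None, None
    for _ in range(rounds):
        vec = TfidfVectorizer()
        X = vec.fit_transform(texts)
        clf = LinearSVC().fit(X, labels)
        if not pool:
            break
        scores = clf.decision_function(vec.transform(pool))
        still_unlabeled = []
        for sent, s in zip(pool, scores):
            if abs(s) >= margin:          # confident: add as new training data
                texts.append(sent)
                labels.append(int(s > 0))
            else:
                still_unlabeled.append(sent)
        pool = still_unlabeled
    return clf, vec
```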

Seed Rules for Named Entity Recognition Bootstrapping [Collins & Singer, 1999]
Spelling rules:
Full-string=New_York → Location
Full-string=California → Location
Full-string=U.S. → Location
Contains(Mr.) → Person
Contains(Incorporated) → Organization
Full-string=Microsoft → Organization
Full-string=I.B.M. → Organization
[diagram: seed spelling rules label names in unannotated text, and spelling rules and contextual rules then bootstrap each other]
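A small sketch of the seeding step: applying the spelling rules above to candidate name strings. The rule encoding and the underscore-joined token format are assumptions made for illustration:

```python
SEED_RULES = [
    ("full", "New_York", "Location"),
    ("full", "California", "Location"),
    ("full", "U.S.", "Location"),
    ("contains", "Mr.", "Person"),
    ("contains", "Incorporated", "Organization"),
    ("full", "Microsoft", "Organization"),
    ("full", "I.B.M.", "Organization"),
]

def apply_seed_rules(name):
    """Label a candidate name with the first matching seed spelling rule;
    unmatched names are left for the contextual rules learned later."""
    for kind, key, label in SEED_RULES:
        if kind == "full" and name == key:
            return label
        if kind == "contains" and key in name.split("_"):
            return label
    return None

# apply_seed_rules("Mr._Smith") -> "Person"
```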

Seed Heuristics for Coreference Classifiers [Bean & Riloff, 1999; Bean & Riloff, 2005]
[diagram: seed heuristics produce reliable case resolutions from unannotated text; these feed existential NP learning (yielding existential NP knowledge and a non-referential NP classifier) and contextual role learning via caseframe generation and application (yielding contextual role knowledge); on new text, candidate antecedent evidence gathering feeds a resolution decision model]

Example Coreference Seeding Heuristics
Non-anaphoric NPs: noun phrases that appear in the first sentence of a document are not anaphoric.
Anaphoric NPs:
– Reflexive pronouns with only one NP in scope: “The regime gives itself the right…”
– Relative pronouns with only one NP in scope: “The brigade, which attacked…”
– Simple appositives of the form “NP, NP”: “Mr. Cristiani, president of the country…”
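Two of these heuristics sketched in code; the regex NP matching is a crude stand-in for a real parser, so treat it as illustrative only:

```python
import re

# Crude regex stand-ins for NP detection; a real system would use a parser.
PROPER_NP = r"(?:Mr\.|Mrs\.|Dr\.)?\s?[A-Z][a-z]+(?:\s[A-Z][a-z]+)*"
APPOSITIVE = re.compile(r"(" + PROPER_NP + r"),\s((?:the\s|a\s)?[a-z]+(?:\s[a-z]+)*),")

def appositive_pairs(sentence):
    """Seed heuristic: in a simple 'NP, NP' appositive the two NPs corefer,
    e.g. 'Mr. Cristiani, president of the country, ...'."""
    return [(m.group(1), m.group(2)) for m in APPOSITIVE.finditer(sentence)]

def first_sentence_negatives(first_sentence_nps):
    """Seed heuristic: NPs in a document's first sentence are not anaphoric,
    so they make reliable negative training examples."""
    return [(np, "non-anaphoric") for np in first_sentence_nps]

# appositive_pairs("Mr. Cristiani, president of the country, said ...")
# -> [('Mr. Cristiani', 'president of the country')]
```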

Seed Classifiers for Subjectivity Bootstrapping [Wiebe & Riloff, 2005]
[diagram: subjective clues drive a rule-based subjective sentence classifier and a rule-based objective sentence classifier → applied to unlabeled texts, they produce labeled subjective and objective sentences for training]
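A minimal sketch of the high-precision rule idea: call a sentence subjective only with several clue hits, objective only with none, and abstain otherwise. The clue list and threshold are invented for illustration; Wiebe & Riloff’s actual classifiers also consider clues in neighboring sentences:

```python
SUBJECTIVE_CLUES = {"terrible", "wonderful", "outrageous", "hate", "love"}

def seed_classify(sentence, strong=2):
    """Rule-based seed classifier: high precision at the cost of coverage,
    so only confident sentences become training data."""
    toks = set(sentence.lower().split())
    hits = len(toks & SUBJECTIVE_CLUES)
    if hits >= strong:
        return "subjective"
    if hits == 0:
        return "objective"
    return None   # abstain; leave for later bootstrapping rounds
```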

Outline
Overview of Bootstrapping Paradigm
Diversity of Seeding Strategies
Criteria for Generating Seed (Training) Data
Annotating Data for Evaluation
Conclusions

The Importance of Good Seeding
Poor seeding can lead to a variety of problems:
– Bootstrapping learns only low-frequency cases → high precision but low recall
– Bootstrapping thrashes → only subsets are learned
– Bootstrapping goes astray / gets derailed → the wrong concept is learned
– Bootstrapping sputters and dies → nothing is learned

General Criteria for Seed Data
Seeding instances should be FREQUENT.
Want as much coverage and contextual diversity as possible!
Bad animal seeds: coatimundi, giraffe, terrier

Common Pitfall #1
Assuming you know what phenomena are most frequent.
Something may be frequent in nearly all domains and corpora… except yours!
Seed instances must be frequent in your training corpus. Never assume you know what is frequent!
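A trivial but worthwhile check, sketched in Python: count candidate seeds in your own training corpus before committing to them:

```python
from collections import Counter

def seed_frequencies(corpus_tokens, candidate_seeds):
    """Rank candidate seeds by frequency in YOUR corpus; drop anything rare."""
    counts = Counter(corpus_tokens)
    return sorted(((s, counts[s]) for s in candidate_seeds),
                  key=lambda pair: -pair[1])

# seed_frequencies(tokens, ["anthrax", "ebola", "coatimundi"]) might reveal
# that "coatimundi" never occurs in your corpus at all.
```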

General Criteria for Seed Data
Seeding instances should be UNAMBIGUOUS.
Ambiguous seeds create noisy training data.
Bad animal seeds: bat, jaguar, turkey

Common Pitfall #2
Careless inattention to ambiguity.
Something may seem like a perfect example at first, but further reflection may reveal other common meanings or uses.
If a seed instance does not consistently represent the desired concept (in your corpus), then bootstrapping can be derailed.
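One low-tech safeguard, sketched below, is a keyword-in-context view of each candidate seed so a human can spot unintended senses (e.g., "turkey" the country vs. the bird) before bootstrapping starts:

```python
def concordance(tokens, seed, window=6, max_lines=20):
    """Print a quick keyword-in-context view of a candidate seed."""
    shown = 0
    for i, tok in enumerate(tokens):
        if tok.lower() == seed.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            print(f"{left:>40} [{tok}] {right}")
            shown += 1
            if shown >= max_lines:
                break
```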

General Criteria for Seed Data
Seeding instances should be REPRESENTATIVE.
Want instances that:
– cover all of the desired categories
– are not atypical category members
Bad bird seeds: penguin, ostrich, hummingbird
Why? Bootstrapping is fragile in its early stages…

Common Pitfall #3
Insufficient coverage of different classes or contexts.
It is easy to forget that all desired classes and types need to be adequately represented in the seed data.
Seed data for negative instances may need to be included as well!

Bootstrapping a Single Category

Bootstrapping Multiple Categories

General Criteria for Seed Data
Seeding instances should be DIVERSE.
Bad animal seeds: cat, cats, housecat, housecats…
Want instances that cover different regions of the search space.
Bad animal seeds: dog, cat, ferret, parakeet (all common household pets)
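A quick sanity check for the cat/cats/housecat problem, sketched with crude plural stripping and substring tests (a real check might use a lemmatizer):

```python
def near_duplicate_seeds(seeds):
    """Flag seed pairs that are probably morphological variants of each
    other and therefore add little diversity."""
    def stem(w):
        return w[:-1] if w.endswith("s") else w
    stems = [stem(s.lower()) for s in seeds]
    flagged = []
    for i in range(len(seeds)):
        for j in range(i + 1, len(seeds)):
            if (stems[i] == stems[j]
                    or stems[i] in stems[j]
                    or stems[j] in stems[i]):
                flagged.append((seeds[i], seeds[j]))
    return flagged

# near_duplicate_seeds(["cat", "cats", "housecat", "ferret"])
# -> [('cat', 'cats'), ('cat', 'housecat'), ('cats', 'housecat')]
```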

Common Pitfall #4
Not balancing coverage and diversity.
Diversity is important, but there must be a critical mass of examples representing different parts of the search space.
One example from each of several wildly different classes may not provide enough traction.

Outline
Overview of Bootstrapping Paradigm
Diversity of Seeding Strategies
Criteria for Generating Seed (Training) Data
Annotating Data for Evaluation
Conclusions

Evaluation
A key motivation for bootstrapping algorithms is to create easily retrainable systems.
And yet… bootstrapping systems are usually evaluated on just a few data sets.
– Bootstrapping models are typically evaluated no differently than supervised learning models!
Manually annotated evaluation data is still a major bottleneck.

The Road Ahead: A Paradigm Shift?
Manual annotation efforts have primarily focused on the need for large amounts of training data.
As the need for training data decreases, we have the opportunity to shift the focus to evaluation data.
The NLP community has a great need for more extensive evaluation data!
– from both a practical and a scientific perspective

Need #1: Cross-Domain Evaluations
We would benefit from more analysis of cross-domain performance.
– Will an algorithm behave consistently across domains?
– Can we characterize the types of domains that a given technique will (or will not) perform well on?
– Are the learning curves similar across domains?
– Does the stopping criterion behave similarly across domains?

Need #2: Seeding Robustness Evaluations
Bootstrapping algorithms are sensitive to their initial seeds. We should evaluate systems using different sets of seed data and analyze:
– quantitative performance
– qualitative performance
– learning curves
Different bootstrapping algorithms should be evaluated on the same seeds!
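A sketch of such an evaluation harness, assuming a `bootstrap(seeds, corpus)` function that returns a set of learned items scored against a gold set; reporting mean and standard deviation over seed sets makes seeding sensitivity visible instead of hiding it behind one hand-picked seed set:

```python
import random
import statistics

def seed_robustness(bootstrap, seed_pool, corpus, gold, n_trials=10, k=5):
    """Run a bootstrapping function with several random seed sets and
    report mean/stdev F1 across trials."""
    scores = []
    for trial in range(n_trials):
        rng = random.Random(trial)        # reproducible seed sets
        seeds = rng.sample(seed_pool, k)
        learned = bootstrap(seeds, corpus)
        tp = len(learned & gold)
        p = tp / len(learned) if learned else 0.0
        r = tp / len(gold) if gold else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores.append(f1)
    return statistics.mean(scores), statistics.stdev(scores)
```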

Standardizing Seeding?
Since so many seeding strategies exist, it will be hard to generate “standard” training seeds.
My belief: training and evaluation regimens shouldn’t have to be the same.
Idea: domain/corpus analyses of different categories and vocabulary could help researchers take a more principled approach to seeding.
Example: analysis of the most frequent words, syntactic constructions, and semantic categories in a corpus.

If you annotate it, they will come
People are starved for data! If annotated data is made available, people will use it.
If annotated data exists for multiple domains, then evaluation expectations will change.
Idea: create a repository of annotated data for different domains, coupled with well-designed annotation guidelines.
– Other researchers can use the guidelines to add their own annotated data to the repository.

Problem Child: Sentiment Analysis
Hot research topic du jour: everyone wants to do it! But everyone wants to use their own favorite data.
Manual annotations are often done simplistically, by a single annotator:
– impossible to know the quality of the annotations
– impossible to compare results across papers
Making annotated data available for a suite of domains might help. Making standardized annotation guidelines available might also help, and encourage people to share data.

Conclusions
Many different seeding strategies are used, often for automatic annotation.
Bootstrapping methods are sensitive to their initial seeds. Want seed cases that are:
– frequent, unambiguous, representative, and diverse
A substantial need exists for manually annotated evaluation data!
– to better evaluate claims of portability
– to better understand bootstrapping behavior