Semi-supervised Training of Statistical Parsers
CMSC Natural Language Processing
January 26, 2006
Roadmap
Motivation:
–Resource bottleneck
Co-training
Co-training with different parsers:
–CFG & LTAG
Experiments:
–Initial seed set size
–Parse selection
–Domain porting
Results and discussion
Motivation: Issues
Current statistical parsers:
–Many grammatical models
–Significant progress: F-score ~93%
Issues:
–Trained on ~1M words of the Penn WSJ treebank
  Annotation is a significant investment of time & money
–Portability: single genre – business news
  Later treebanks – smaller, still news
–Training resource bottleneck
Motivation: Approach
Goal:
–Enhance portability and performance without large amounts of additional training data
Observations:
–"Self-training": train a parser on its own output
  Very small improvement (better counts for heads)
  Limited to slightly refining the current model
–Ensemble methods, voting: useful
Approach: Co-training
Co-Training
Co-training (Blum & Mitchell 1998):
–Weakly supervised training technique
  Successful for basic classification
–Materials:
  Small "seed" set of labeled examples
  Large set of unlabeled examples
–Training: evidence from multiple models
  Optimize degree of agreement between models on unlabeled data
  Train several models on seed data
  Run them on unlabeled data
  Use new "reliable" labeled examples to train the others
  Iterate (see the sketch below)
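A minimal sketch of this loop, assuming hypothetical parser objects with train/parse/score methods (all names and the 0.9 threshold are illustrative, not from Blum & Mitchell or Steedman et al.):

# Minimal co-training loop sketch (illustrative names, not the authors' code).
def co_train(parser_a, parser_b, seed, unlabeled, iterations=10, cache_size=30):
    """Each parser labels a small cache of sentences; the examples it scores
    as most reliable are handed to the *other* parser as new training data."""
    labeled_a, labeled_b = list(seed), list(seed)
    for _ in range(iterations):
        parser_a.train(labeled_a)
        parser_b.train(labeled_b)
        # Draw a cache of unlabeled sentences (the cache-manager role).
        cache, unlabeled = unlabeled[:cache_size], unlabeled[cache_size:]
        scored_a = [(s, parser_a.parse(s), parser_a.score(s)) for s in cache]
        scored_b = [(s, parser_b.parse(s), parser_b.score(s)) for s in cache]
        # Keep only "reliable" newly labeled examples (selection heuristics vary; see later slides).
        labeled_b += [(s, tree) for s, tree, score in scored_a if score > 0.9]
        labeled_a += [(s, tree) for s, tree, score in scored_b if score > 0.9]
    return parser_a, parser_b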
Co-training Issues
Challenge: picking reliable novel examples
–No guaranteed, simple approach
–Rely on heuristics:
  Intersection: highly ranked by the other parser, low by self
  Difference: the other parser's score exceeds one's own by some margin
–Possibly employ parser confidence measures
Experimental Structure
Approach (Steedman et al., 2003):
–Focus here: co-training with different parsers
  Also examined reranking, supertaggers & parsers
–Co-train CFG (Collins) & LTAG
–Data: Penn Treebank WSJ, Brown, NA News
Questions:
–How to select reliable novel samples?
–How does labeled seed size affect co-training?
–How effective is co-training within and across genres?
System Architecture
Two "different" parsers:
–"Views" – can differ by feature space
  Here: Collins-CFG & LTAG
–Comparable performance, different formalisms
Cache Manager:
–Draws unlabeled sentences for the parsers to label
–Selects a subset of the newly labeled sentences for the training set
Two Different Parsers
Both train on treebank input:
–Lexicalized, head information percolated
Collins-CFG:
–Lexicalized CFG parser
–"Bi-lexical": each pair of non-terminals leads to a bigram relation between a pair of lexical items
–Ph = head percolation; Pm = modifiers of the head daughter
LTAG:
–Lexicalized TAG parser
–Bigram relations between trees
–Ps = substitution probability; Pa = adjunction probability
The parsers differ in tree creation and in the depth of lexical relations (schematic decomposition below)
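A schematic of how such lexicalized models factor a tree's probability into bilexical events. This is only illustrative: the exact conditioning contexts in the Collins-CFG and LTAG models differ from what is shown here.

% Schematic only; conditioning contexts are simplified.
P_{CFG}(T)  \approx \prod_{\text{local trees}} P_h(H \mid P, w_P) \; \prod_{M} P_m(M, w_M \mid P, H, w_P)
P_{LTAG}(T) \approx \prod_{\text{nodes } \eta} P_s(\tau', w' \mid \tau, w, \eta) \; \prod_{\text{nodes } \eta} P_a(\tau', w' \mid \tau, w, \eta)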
Selecting Labeled Examples
Scoring the parse:
–Ideal – the true score – is impossible
–F-prob: trust the parser; F-norm-prob: normalize by sentence length
–F-entropy: difference between the parse score distribution and uniform
–Baselines: number of parses, sentence length
Selecting (newly labeled) sentences:
–Goal: minimize noise, maximize training utility
–S-base: n highest scores (both parsers use the same set)
–Asymmetric: teacher/student
  S-topn: teacher's top n
  S-intersect: sentences in teacher's top n and student's bottom n
  S-diff: teacher's score exceeds student's by some amount
(A sketch of these heuristics follows below.)
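A sketch of the scoring functions and the S-intersect heuristic, assuming hypothetical parse objects with log_prob and length fields; the names are illustrative, not taken from the paper.

import math

def f_prob(parse):
    return parse.log_prob                          # trust the parser's own score

def f_norm_prob(parse):
    return parse.log_prob / max(parse.length, 1)   # normalize by sentence length

def f_entropy(n_best):
    # Difference between the n-best score distribution and a uniform one;
    # a peaked distribution suggests the parser is confident.
    probs = [math.exp(p.log_prob) for p in n_best]
    total = sum(probs) or 1.0
    probs = [p / total for p in probs]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return math.log(len(probs)) - entropy if len(probs) > 1 else 0.0

def s_intersect(teacher_scored, student_scored, n):
    # Keep sentences the teacher ranks in its top n but the student ranks in its
    # bottom n: confidently labeled by the teacher, yet informative for the student.
    top_teacher = {s for s, _ in sorted(teacher_scored, key=lambda x: -x[1])[:n]}
    bottom_student = {s for s, _ in sorted(student_scored, key=lambda x: x[1])[:n]}
    return top_teacher & bottom_student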
Experiments: Initial Seed Size
Typically evaluate after all training; here, also consider the convergence rate:
–Initial rapid growth, tailing off with more data
–Largest improvement comes from the earliest training instances
–Collins-CFG plateaus at 40K (89.3); LTAG is still improving
  Will benefit from additional training
Co-training with 500 vs. 1000 seed instances:
–Less data, greater benefit from co-training
  Enhances coverage
–However, the 500-sentence seed does not reach the level of the 1000-sentence seed
Experiments: Parse Selection
Contrast:
–Select all newly labeled sentences vs. S-intersect (67%)
Co-training experiments (500-sentence seed set):
–LTAG performs better with S-intersect
  Reduces noise; LTAG is sensitive to noisy trees
–Collins-CFG performs better with select-all
  CFG needs to increase coverage, so more samples help
Experiments: Cross-domain
Train on a Brown corpus seed:
–Co-train on WSJ
–Collins-CFG with S-intersect improves 76.6 → 78.3
  Mostly in the first 5 iterations
  Lexicalizing for new-domain vocabulary
Train on a Brown + WSJ seed:
–Co-train on additional WSJ
–Baseline improves to 78.7, co-training to 80
  Gradual improvement; new constructions?
Summary
Semi-supervised parser training via co-training:
–Two different parse formalisms provide different views
  Enhances effectiveness
–Biggest gains with small seed sets
–Cross-domain enhancement
–Selection methods depend on the parse model and the amount of seed data
Findings
–Co-training enhances parsing when trained on small labeled datasets (hundreds of sentences)
–Co-training aids genre porting without labels for the new genre
–Co-training improves further with any labeled data for the new genre
–Selecting reliable samples is crucial; several approaches examined