
Weakly Supervised Training For Parsing Mandarin Broadcast Transcripts Wen Wang ICASSP 2008 Min-Hsuan Lai Department of Computer Science & Information Engineering National Taiwan Normal University

2 Outline
Introduction
Co-training
Data
Selecting Parsers for Co-training
Word Segmentation and Parsing
Experimental Results

3 Introduction
Parsing is an important research area in natural language processing (NLP), aiming at resolving structural ambiguity. In this paper, we explore weakly supervised learning approaches to parsing Chinese broadcast news (BN) and broadcast conversation (BC) transcripts, and examine Chinese parsing issues such as parsing unsegmented character sequences rather than words and the effect of word segmentation on parsing accuracy.

4 Co-training
General co-training algorithm
Informally, co-training can be described as follows: pick multiple classifiers ("views") of a classification problem, build a model for each view, and train these models on a small set of labeled data. Then, from a large set of unlabeled data, sample a subset (the cache), label it using the current models, select examples from the labeled results, and add them to the training pool. This procedure is iterated until the unlabeled set is exhausted.
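The selection step is what the following slides call example selection. As a rough, hedged sketch in Python (the train/label interface, cache size, and round count are illustrative assumptions, not taken from the paper; select_examples stands for one of the strategies on the next slides):

```python
import random

# Minimal sketch of the general co-training loop described above.
# The interface (train/label methods, cache_size, rounds) is assumed
# for illustration and is not taken from the paper.
def co_train(view_a, view_b, labeled, unlabeled, select_examples,
             cache_size=100, rounds=10):
    pool_a, pool_b = list(labeled), list(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        view_a.train(pool_a)
        view_b.train(pool_b)
        # Sample a cache from the unlabeled data and label it with both views.
        cache = random.sample(unlabeled, min(cache_size, len(unlabeled)))
        labeled_by_a = [(x, view_a.label(x)) for x in cache]
        labeled_by_b = [(x, view_b.label(x)) for x in cache]
        # Each view acts as teacher for the other view (the student).
        pool_b.extend(select_examples(labeled_by_a, view_b))
        pool_a.extend(select_examples(labeled_by_b, view_a))
        unlabeled = [x for x in unlabeled if x not in cache]
    return view_a, view_b
```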

5 Co-training Example selection approaches for co-training

6 In Algorithm 1, we call the classifier that provides additional training data the teacher, and the opposite classifier, which receives that data, the student. Since the labeled output from both classifiers is noisy, an important question is which newly labeled examples from the teacher should be added to the training data pool of the student. This issue of example selection plays an important role in the learning rate of co-training and in the performance of the resulting classifiers.

7 Co-training
Example selection approaches for co-training
Naive co-training
–simply adds all examples in the cache labeled by the teacher to the training data pool of the student.
Agreement-based co-training
–selects the subset of the labeled cache that maximizes the agreement of the two classifiers on unlabeled data. The student classifier is the one being retrained and the teacher classifier is the one remaining static.
–Hence, this approach aims to improve the performance of the two classifiers alternately, rather than simultaneously.
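As a loose illustration only (the agreement criterion here is simplified to exact parse agreement on the cache, which is weaker than maximizing agreement over the unlabeled data), the first two strategies might look like this:

```python
# Hypothetical sketches of naive and agreement-based selection; both take
# the teacher-labeled cache and the (still static) student model, so they
# can be passed as select_examples to the co_train sketch above.
def naive_selection(teacher_labeled, student):
    # Add every teacher-labeled cache example to the student's pool.
    return list(teacher_labeled)

def agreement_selection(teacher_labeled, student):
    # Simplification: keep cache examples on which the student's current
    # model already produces the same parse as the teacher.
    return [(x, tree) for x, tree in teacher_labeled
            if student.label(x) == tree]
```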

8 Co-training
Example selection approaches for co-training
Max-score
–selects the top n examples with the highest scores (based on a scoring function) when labeled by the teacher, and adds them to the training pool of the student.
Max-t-min-s
–selects examples whose scores fall within the top m percent of high-scoring labeled examples for the teacher and within the bottom n percent of low-scoring labeled examples for the student.
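Hedged sketches of the two score-based strategies follow; the scoring function (passed in here as per-example scores, e.g. a parser's probability for its best parse) and the default thresholds are assumptions, not values from the paper.

```python
# Illustrative sketches of the score-based selection strategies.
def max_score_selection(teacher_labeled, teacher_scores, n=50):
    # Take the n teacher-labeled examples with the highest teacher scores.
    ranked = sorted(zip(teacher_labeled, teacher_scores),
                    key=lambda pair: pair[1], reverse=True)
    return [example for example, _ in ranked[:n]]

def max_t_min_s_selection(teacher_labeled, teacher_scores, student_scores,
                          m_pct=20, n_pct=20):
    # Keep examples the teacher scores in its top m percent and the student
    # scores in its bottom n percent: examples the teacher is confident
    # about but the student currently finds hard.
    k = len(teacher_labeled)
    by_teacher = sorted(range(k), key=lambda i: teacher_scores[i], reverse=True)
    by_student = sorted(range(k), key=lambda i: student_scores[i])
    top_t = set(by_teacher[:max(1, k * m_pct // 100)])
    bottom_s = set(by_student[:max(1, k * n_pct // 100)])
    return [teacher_labeled[i] for i in sorted(top_t & bottom_s)]
```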

9 Data
Set            Words   Sentences
CTB-training   –       –
CTB-dev        –       –
CTB-test       –       –
BN-test        31K     1565
BC-test        11K     1482

10 Selecting Parsers for Co-training
We investigated four publicly available parsers:
–Charniak's maximum-entropy-inspired parser with the MaxEnt reranker
–Stanford unlexicalized parser
–Berkeley parser
–Dan Bikel's reimplementation of Michael Collins' Model 2 parser
To select two of them for our co-training setup, we considered two important factors: accuracy and mutual complementariness.

11 Selecting Parsers for Co-training
To evaluate parser accuracy, we consider the F-measure, the harmonic mean of labeled bracket precision (P) and recall (R): F = 2PR / (P + R).
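A small sketch of this PARSEVAL-style bracket F-measure, assuming constituents are represented as (label, start, end) tuples:

```python
# Labeled-bracket F-measure: harmonic mean of precision and recall over
# constituents, each represented here as a (label, start, end) tuple.
def f_measure(gold_brackets, test_brackets):
    gold, test = set(gold_brackets), set(test_brackets)
    matched = len(gold & test)
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 3 gold constituents, the parse proposes 3 and gets 2 right,
# so P = R = 2/3 and F = 2/3.
```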

12 Selecting Parsers for Co-training
The co-training principle requires the two views to be conditionally independent, or at least weakly conditionally independent. To measure the structural complementariness between parsers, we adapted a measure of structural consistency between parsers; maximizing structural complementariness then amounts to selecting the pair of parsers with the minimal structural consistency.

13 Selecting Parsers for Co-training
Average crossing brackets (ACB):

A          B          ACB(A, B)
Charniak   Stanford   2.11
Berkeley   Stanford   2.09
Charniak   Bikel      2.05
Berkeley   Bikel      2.01
Charniak   Berkeley   1.99
Bikel      Stanford   1.87

Since we need the best combination of high parser accuracy and mutual complementariness, we selected Charniak's parser and the Berkeley parser for co-training.
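One plausible way to compute such a pairwise measure (an assumption about the details, not the paper's exact procedure) is to count, per sentence, how many constituent spans from one parser's output cross spans from the other's, then average over sentences:

```python
# Sketch of a crossing-brackets count between two parsers' outputs for the
# same sentences; spans are (start, end) index pairs.
def crosses(span_a, span_b):
    (i, j), (k, l) = span_a, span_b
    # Two spans cross if they overlap but neither contains the other.
    return (i < k < j < l) or (k < i < l < j)

def crossing_brackets(spans_a, spans_b):
    return sum(crosses(a, b) for a in spans_a for b in spans_b)

def average_crossing_brackets(parses_a, parses_b):
    # parses_a / parses_b: one list of spans per sentence, same order.
    pairs = list(zip(parses_a, parses_b))
    return sum(crossing_brackets(a, b) for a, b in pairs) / len(pairs)
```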

14 Word Segmentation and Parsing
We examined this character-based parsing strategy with Charniak's parser and the Berkeley parser on the converted character-based CTB. Results shown in Table 1 demonstrate that parsing unsegmented text loses about 8% absolute in F-measure compared to parsing the original word-segmented treebank. We also found that it is essential to ensure consistent word segmentation between the treebank used for training the parsers and the word-segmented text data being parsed.
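For reference, the character-based input condition simply drops the word boundaries; a trivial sketch is below (the treebank-side conversion, which splits each word node into character leaves, is more involved and not shown):

```python
# Turn a word-segmented sentence into the unsegmented character sequence
# used by the character-based parsing condition.
def to_characters(segmented_words):
    # e.g. ["我们", "喜欢", "鱼"] -> ["我", "们", "喜", "欢", "鱼"]
    return [ch for word in segmented_words for ch in word]
```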

15 Experimental Results

16 Experimental Results

17 Experimental Results
In conclusion, we have shown that co-training can be effectively applied to bootstrap parsers for parsing Mandarin BN and BC transcripts by combining labeled and unlabeled data. We also found that parsing unsegmented text is still quite inferior to parsing at the word level, and that it is essential to use a consistent word segmentation model both for training the parsers and for the text they are applied to parse.