Part C: Parsing the web and domain adaptation

In-domain vs. out-of-domain: annotated data in Domain A is used to train a parser; applying that parser to texts from Domain A is in-domain parsing, while applying it to texts from Domain B is out-of-domain parsing.

Motivation
- Few or no labeled resources exist for parsing text of the target domain.
- Unsupervised grammar induction? There is a lot of work, but accuracies significantly lag behind those of supervised systems, and results are reported only on short sentences or assume the existence of gold POS tags.
- Goal: build strong parsers by exploiting the labeled resources of existing domains plus unlabeled data for the target domain.

Outline
- Three shared tasks for parsing out-of-domain text
- Approaches for parsing out-of-domain text
  - news-domain text
  - web data

Shared tasks
- CoNLL 2007 shared task on domain adaptation
- CoNLL 2009 shared task on domain adaptation
- SANCL 2012: parsing the web

CoNLL 2007 shared task on domain adaptation
Setup for the domain adaptation track:
- Training data: large-scale labeled data for the source domain (WSJ)
- Development data: labeled data for biomedical abstracts
- Test data: labeled data for chemical abstracts
- Unlabeled data: large-scale unlabeled data for each of the train/dev/test domains
The goal is to use the labeled data of the source domain, plus any unlabeled data, to produce accurate parsers for the target domains.
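For concreteness, the labeled data in these shared tasks is distributed in the column-based CoNLL-X format (one token per line, a blank line between sentences). Below is a minimal reading sketch; the file name is a hypothetical placeholder.

```python
# Minimal sketch: reading sentences in the column-based CoNLL-X format
# used by the CoNLL 2007 shared task (file name is hypothetical).
def read_conll(path):
    """Yield sentences as lists of (form, pos, head, deprel) tuples."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                  # blank line ends a sentence
                if sentence:
                    yield sentence
                    sentence = []
                continue
            cols = line.split("\t")
            # CoNLL-X columns: ID FORM LEMMA CPOS POS FEATS HEAD DEPREL PHEAD PDEPREL
            form, pos, head, deprel = cols[1], cols[4], int(cols[6]), cols[7]
            sentence.append((form, pos, head, deprel))
    if sentence:
        yield sentence

if __name__ == "__main__":
    for sent in read_conll("wsj-train.conll"):
        print(sent)
        break
```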

CoNLL 2009 shared task on domain adaptation
Setup for the domain adaptation track:
- Czech, German, English (Brown corpus)
- No unlabeled data
- Provides initial out-of-domain results for the three languages

SANCL 2012: parsing the web
Data setup (Petrov and McDonald, 2012):
- Labeled data
  - Training: WSJ-train
  - Development: emails, weblogs, WSJ-dev
  - Test: answers, newsgroups, reviews, WSJ-test
- Unlabeled data: large-scale unlabeled data for all domains
The goal is to build a single system that can robustly parse all domains.

Data sets for SANCL 2012

Approaches for parsing canonical out-of-domain text (CoNLL 2007)
Feature-based approaches:
- Only include features that transfer well (Dredze+, 07)
- Structural correspondence learning: transform features from the source domain to the target domain (Shimizu and Nakagawa, 07)
Ensemble-based approaches:
- Stacking (Dredze+, 07)
- Co-training (Sagae and Tsujii, 07)
- A variant of self-training (Watson and Briscoe, 07)

Approaches for parsing canonical out-of-domain text (CoNLL 2007)
Other approaches:
- Tree revision rules for the target domain (Attardi+, 07)
- Training instance weighting (Dredze+, 07)
- Hybrid: use the output of a Constraint Grammar parser (Bick, 07)
- Use collocations and relational nouns from unlabeled target-domain data (Schneider+, 07)

Frustratingly hard domain adaptation (Dredze+, 2007)
- Theoretical work on domain adaptation attributes adaptation loss to two sources (Ben-David+, 2006):
  - the difference in the distribution between domains
  - the difference in labeling functions
- The error analysis of Dredze+ (2007) suggests that the primary source of errors is the difference in annotation guidelines between treebanks.
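A simplified sketch of the kind of bound this theory provides (the exact divergence measure and constants vary across versions of the analysis): for a hypothesis h,

```latex
\epsilon_T(h) \;\le\; \epsilon_S(h) \;+\; d(\mathcal{D}_S,\mathcal{D}_T) \;+\; \lambda,
\qquad
\lambda \;=\; \min_{h'} \bigl[\epsilon_S(h') + \epsilon_T(h')\bigr]
```

where \epsilon_S and \epsilon_T are the source- and target-domain errors, d(\mathcal{D}_S,\mathcal{D}_T) measures the difference between the two input distributions, and \lambda is small only if one hypothesis can fit both domains' labeling functions. Divergent annotation guidelines inflate exactly this last term, which is what the error analysis of Dredze+ (2007) points to.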

Frustratingly hard domain adaptation (Dredze+, 2007)
Challenges for adaptation from WSJ (90%) to BIO (84%):
- Annotation divergences between BIO and WSJ:
  - Unlike WSJ, BIO contains many long sequences of digits.
  - Complex noun phrases
  - Appositives
  - WSJ uses fine-grained POS tags such as NNP, while BIO uses NN.
- A long list of failed attempts

Frustratingly hard domain adaptation (Dredze+, 2007)
- Feature manipulation:
  - remove features less likely to transfer
  - add features more likely to transfer
  - use word-clustering features
- Parser diversity: ensemble of parsers (similar to stacking and bagging)
- Target-focused learning: assign higher weights to training instances similar to the target domain (see the sketch after this list)
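As an illustration of target-focused learning, one simple way to weight source instances is by their lexical similarity to unlabeled target-domain text. The overlap measure below is an illustrative assumption, not the exact criterion used by Dredze+ (2007).

```python
# A minimal sketch of "target-focused learning": weight each source-domain
# training sentence by its vocabulary overlap with unlabeled target-domain
# text, so that target-like instances count more during training.
from collections import Counter

def target_vocab(unlabeled_target_sents):
    vocab = Counter()
    for sent in unlabeled_target_sents:
        vocab.update(w.lower() for w in sent)
    return vocab

def instance_weight(source_sent, vocab):
    """Fraction of source tokens that also appear in the target vocabulary."""
    if not source_sent:
        return 0.0
    hits = sum(1 for w in source_sent if w.lower() in vocab)
    return hits / len(source_sent)

# Usage: pass the weights to any trainer that supports per-instance weights.
target = [["plz", "go", "there"], ["i", "like", "it", "very", "much"]]
source = [["Stocks", "rose", "sharply"], ["I", "like", "it", "a", "lot"]]
vocab = target_vocab(target)
for sent in source:
    print(sent, instance_weight(sent, vocab))
```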

Domain + non-canonical text differences
- Domain A: canonical text
- Domain B: non-canonical text

Parsing non-canonical out-of-domain text (SANCL)
What is new?
- Inconsistent usage of punctuation and capitalization
- Lexical shift due to increased use of slang, technical jargon, or other phenomena
- Spelling mistakes and ungrammatical sentences
- Some syntactic structures are more frequent in web text than in newswire: questions, imperatives, long lists of names, sentence fragments, ...

Examples
- "Plz go there."
- "I like it very much!!!!!!"
- "Gooooooooo …"

Approaches for parsing non-canonical out-of-domain text (SANCL)

Approaches
- Text normalization: map Domain B non-canonical text to Domain B canonical text.
- Domain adaptation: bridge the gap between Domain A canonical text and Domain B canonical text.

Approaches for parsing non-canonical out-of-domain text (SANCL)
Main approaches:
- Text normalization (preprocessing)
- Ensemble of parsers
- Self-training for constituent parsing
- Word clustering/embedding
- Co-/tri-training (unsuccessful)
- Instance weighting and genre classification

Text normalization
Preprocessing the data leads to better POS tagging and parsing performance (Foster, 2010; Gadde+, 2011; Le Roux, Foster+, 2012).

Text normalization
The preprocessing rules of Le Roux, Foster+ (2012):
- emoticon => comma or full stop
- e-mail address, URL => generic strings
- uppercased words => lowercased
- abbreviations, spelling variants (plz, ppl) => standard form
- nt, s => n't, 's
- repeated punctuation (!!!) => collapsed into one
- list items (# 2) => removed
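A minimal sketch of this kind of rule-based normalization; the regular expressions, placeholder strings, and abbreviation list are illustrative assumptions rather than the exact rules of Le Roux, Foster+ (2012).

```python
# Illustrative rule-based normalization of web text before tagging/parsing.
import re

ABBREVIATIONS = {"plz": "please", "ppl": "people"}  # small illustrative list

def normalize(text):
    text = re.sub(r"https?://\S+", "_URL_", text)      # URLs -> generic string
    text = re.sub(r"\S+@\S+\.\S+", "_EMAIL_", text)    # e-mail addresses
    text = re.sub(r"[:;]-?[)(DP]", ",", text)          # emoticons -> comma
    text = re.sub(r"([!?.])\1+", r"\1", text)          # "!!!" -> "!"
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)        # "Gooooo" -> "Goo"
    tokens = [ABBREVIATIONS.get(t.lower(), t) for t in text.split()]
    return " ".join(tokens).lower()

print(normalize("Plz go there :) I like it very much!!!!!! http://example.com"))
```

Note that this sketch lowercases everything for simplicity, whereas the original rules only lowercase fully uppercased words.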

Text normalization
The preprocessing rules of Seddah+ (2012):
- an OntoNotes/PTB token normalization stage
- smileys, URLs, e-mail addresses, and similar entities
- correcting tokens or token sequences
- spelling error patterns
- lowercasing
- rewriting rules for dealing with frequent amalgams (gonna, im)

Text normalization
The high-precision text replacements of McClosky+ (2012):
- 1,057 spelling auto-correction rules (yuo => you) from the Pidgin instant-messaging client
- 151 common Internet abbreviations (LOL => "laughing out loud")
- Limited gain: such spelling errors are infrequent in the unlabeled data.

Ensemble of parsers
- Product-of-experts (Alpage, DCU-Paris13)
- Stacking (IMS, Stanford, UPenn)
- Voting (CPH-Trento, DCU-Paris13, HIT)
- Bagging (HIT)
- Up-training (IMS)
- Re-ranking (DCU-Paris13, IMS, Stanford)
- Model merging (OHSU, Stanford)
These ensembles obtain large improvements, but the gains look more like improvements to in-domain parsing in general. What is their specific contribution to domain adaptation?
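As an illustration of the voting approach, the sketch below takes a per-token majority vote over the head predictions of several dependency parsers. Real systems add tie-breaking and often restore tree well-formedness (e.g. via a maximum spanning tree); both are omitted here.

```python
# Minimal per-token voting over the outputs of several dependency parsers:
# for each word, keep the head predicted by the majority of parsers.
from collections import Counter

def vote(predictions):
    """predictions: one head sequence per parser, for the same sentence."""
    n_tokens = len(predictions[0])
    voted = []
    for i in range(n_tokens):
        heads = Counter(p[i] for p in predictions)
        voted.append(heads.most_common(1)[0][0])  # most frequent head wins
    return voted

# Three parsers, one 4-token sentence (heads are token indices, 0 = ROOT).
parser_outputs = [
    [2, 0, 2, 3],
    [2, 0, 2, 2],
    [3, 0, 2, 3],
]
print(vote(parser_outputs))   # -> [2, 0, 2, 3]
```

Note that a per-token vote does not guarantee a valid tree, which is why several SANCL systems project the voted heads back onto a well-formed structure.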

Exploring unlabeled data
- Self-training (successful for constituent parsers)
  - two-stage generative model and reranker (Charniak and Johnson, 2005)
  - generative PCFG-LA model (Petrov and Klein, 2007)
- Word clusters or embeddings
- Co-/tri-training (unsuccessful for dependency parsers)
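A minimal sketch of one round of self-training for a constituent parser; here `train` and `parse` stand in for a real parser's training and decoding interfaces (assumptions, not a specific API).

```python
# Self-training sketch: parse unlabeled target-domain text with the current
# model and add the automatically parsed trees to the training data.
def self_train(labeled, unlabeled, train, parse, rounds=1):
    model = train(labeled)
    for _ in range(rounds):
        auto_labeled = [parse(model, sent) for sent in unlabeled]
        model = train(labeled + auto_labeled)   # retrain on gold + auto trees
    return model

# Usage: self_train(wsj_trees, web_sentences, train=my_trainer, parse=my_parser)
```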

Why is self-training unsuccessful for dependency parsing?
- Generative models suffer less from over-fitting during training.
- Current dependency parsing models are commonly discriminative: linear models with online training and no probabilistic interpretation.
- Generative dependency models lead to unsatisfactory accuracy.

Evaluation results
Top 4 systems of SANCL on POS tagging, evaluated on answers, newsgroups, reviews, and WSJ plus the average (tagging performance is very important):
1. DCU-Paris13 (Roux, Foster+)
2. HIT (Zhang+)
3. IMS (Bohnet+)
4. Stanford (McClosky+)

Which one is the best/most important?
Main approaches:
- Text normalization (preprocessing)
- Ensemble of parsers
- Self-training for constituent parsing
- Word clustering/embedding
- Co-/tri-training (unsuccessful)
- Instance weighting and genre classification

End of Part C

References
G. Attardi, F. Dell'Orletta, M. Simi, A. Chanev, and M. Ciaramita. Multilingual dependency parsing and domain adaptation using DeSR. In Proc. of the CoNLL 2007 Shared Task.
Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In NIPS.
E. Bick. Hybrid ways to improve domain independence in an ML dependency parser. In Proc. of the CoNLL 2007 Shared Task.
Eugene Charniak and Mark Johnson. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking. In Proceedings of ACL 2005, pp. 173–180.
D. Das and S. Petrov. Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proc. of ACL-HLT.
M. Dredze, J. Blitzer, P. P. Talukdar, K. Ganchev, J. Graca, and F. Pereira. Frustratingly hard domain adaptation for dependency parsing. In Proc. of the CoNLL 2007 Shared Task.
Matthew S. Dryer and Martin Haspelmath, editors. The World Atlas of Language Structures Online. Munich: Max Planck Digital Library.

Jennifer Foster. "cba to check the spelling": investigating parser performance on discussion forum posts. In Proceedings of HLT-NAACL.
Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner, Joseph Le Roux, Stephen Hogan, Joakim Nivre, Deirdre Hogan, and Josef van Genabith. #hardtoparse: POS tagging and parsing the twitterverse. In Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence.
Phani Gadde, L. V. Subramaniam, and Tanveer A. Faruquie. Adapting a WSJ trained part-of-speech tagger to noisy text: Preliminary results. In Proceedings of the Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data.
Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages. In Proceedings of the CoNLL 2009 Shared Task, pp. 1–18.
Terry Koo, Xavier Carreras, and Michael Collins. Simple semi-supervised dependency parsing. In Proceedings of ACL 2008, pp. 595–603.

David McClosky, Wanxiang Che, Marta Recasens, Mengqiu Wang, Richard Socher, and Christopher D. Manning. Stanford's System for Parsing the English Web. In Notes of the First Workshop on SANCL.
Ryan McDonald, Slav Petrov, and Keith Hall. Multi-source transfer of delexicalized dependency parsers. In Proceedings of EMNLP.
Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. The CoNLL 2007 Shared Task on Dependency Parsing. In Proceedings of EMNLP-CoNLL 2007, pp. 915–932.
S. Petrov, D. Das, and R. McDonald. A universal part-of-speech tagset. In ArXiv.
Slav Petrov and Dan Klein. Improved inference for unlexicalized parsing. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 404–411.
Slav Petrov and Ryan McDonald. Overview of the 2012 Shared Task on Parsing the Web. In Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL).

Joseph Le Roux, Jennifer Foster, Joachim Wagner, Rasul Samad Zadeh Kaljahi, and Anton Bryl. DCU-Paris13 systems for the SANCL 2012 shared task. In Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL).
K. Sagae and J. Tsujii. Dependency parsing and domain adaptation with LR models and parser ensembles. In Proc. of the CoNLL 2007 Shared Task.
G. Schneider, K. Kaljurand, F. Rinaldi, and T. Kuhn. Pro3Gres parser in the CoNLL domain adaptation shared task. In Proc. of the CoNLL 2007 Shared Task.
Djamé Seddah, Benoît Sagot, and Marie Candito. Robust pre-processing and semi-supervised lexical bridging for user-generated content parsing. In Notes of the First Workshop on SANCL.
N. Shimizu and H. Nakagawa. Structural correspondence learning for dependency parsing. In Proc. of the CoNLL 2007 Shared Task.
Oscar Täckström, Ryan McDonald, and Joakim Nivre. Target Language Adaptation of Discriminative Transfer Parsers. In Proc. of NAACL.
Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit. Cross-lingual word clusters for direct transfer of linguistic structure. In Proceedings of NAACL-HLT.

R. Watson and T. Briscoe. Adapting the RASP system for the CoNLL07 domain-adaptation task. In Proc. of the CoNLL 2007 Shared Task.
Meishan Zhang, Wanxiang Che, Yijia Liu, Zhenghua Li, and Ting Liu. HIT dependency parsing: Bootstrap aggregating heterogeneous parsers. In Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL).