
Part D: multilingual dependency parsing

Motivation
- A difficult syntactic ambiguity in one language may be easy to resolve in another language (bilingual constraints)
- A more accurate parser on one language may help a less accurate one on another language (semi-supervised)
- Rich labeled resources in one language can be transferred to build parsers of another language (unsupervised)

Obstacles
- Syntactic non-isomorphism across languages
- Different annotation choices (guidelines)
- Partial (incomplete) parse trees resulting from projection
- Parsing errors on the source side
- Word alignment errors

Research map (two perspectives)
- Methods
  - Delexicalized (does not rely on bitext)
  - Projection (relies on bitext)
    - Project hard edges
    - Project edge weights (distributions)
    - Use as training objective (bilingual constraints)
- Supervision (existence of a target treebank)
  - Semi-supervised
  - Unsupervised

Bilingual word reordering features (Huang+ 09)
- Supervised with bilingual constraints
- Use word reordering information from the source language as extra features when parsing target text (using automatic word alignment as the bridge)
- Two characteristics:
  - Bilingual text with target-side labels
  - No parse trees on the source side

Build local classifiers via projection (Jiang & Liu 10)
- Semi-supervised; project edges
- Step 1: project to obtain dependency/non-dependency classification instances
- Step 2: build a target-language local dependency/non-dependency classifier
- Step 3: feed the classifier's outputs into a supervised parser as extra weights at test time
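As a minimal sketch of Step 1: a parsed source sentence plus a word alignment yields dependency/non-dependency instances over target word pairs. The helper `project_instances` and its one-to-one alignment format are illustrative assumptions, not the paper's actual data structures.

```python
def project_instances(src_heads, alignment):
    """Project dependency/non-dependency decisions through a word alignment.

    src_heads: dict mapping each source word index to its head index
    alignment: dict mapping source index -> target index (one-to-one here)
    Returns (positive, negative) lists of (head, dependent) target pairs.
    """
    positive, negative = [], []
    aligned = list(alignment.items())
    for d_src, d_tgt in aligned:
        for h_src, h_tgt in aligned:
            if h_src == d_src:
                continue
            if src_heads.get(d_src) == h_src:
                positive.append((h_tgt, d_tgt))   # projected dependency
            else:
                negative.append((h_tgt, d_tgt))   # projected non-dependency
    return positive, negative
```

Both kinds of instances then train the binary classifier of Step 2.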

Prune projected structures with target-language marginals (Li+ 14)
- Semi-supervised; project edges from English to Chinese
- Goal: acquire extra large-scale, high-quality labeled training instances from bitext
- Step 1: parse the English sentences of the bitext
- Step 2: project the English trees onto Chinese, and filter unlikely dependencies with target-language marginal probabilities (handling cross-language non-isomorphism, and parsing and alignment errors)
- Step 3: train with the extra, partially annotated training instances
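The filtering in Step 2 can be sketched as below; the function name and the threshold value are illustrative assumptions, not taken from the paper.

```python
def filter_projected_edges(projected_edges, marginals, threshold=0.3):
    """Keep a projected edge (head, dep) only if the target-language
    parser assigns it a marginal probability of at least `threshold`.
    The surviving edges form a partial tree used as a training instance
    with partial annotations."""
    return {edge for edge in projected_edges
            if marginals.get(edge, 0.0) >= threshold}
```

Edges pruned here are simply left unannotated, which is what makes the resulting instances partial rather than wrong.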


Delexicalized transfer (no bitext) (McDonald+ 11)
- Unsupervised; delexicalized
- Direct delexicalized transfer (Zeman & Resnik 08, Søgaard 11, Cohen+ 11)
  - Train a delexicalized parser on the source treebank, ignoring words and using only universal POS tags (Petrov+ 11)

Delexicalized transfer (no bitext) (McDonald+ 11)
- Direct delexicalized transfer
- Step 1: replace the language-specific POS tags in the source treebank with the universal POS tags (Petrov+ 11)
  - NOUN (nouns), VERB (verbs), ADJ (adjectives), ADV (adverbs), PRON (pronouns), DET (determiners), ADP (pre/postpositions), NUM (numerals), CONJ (conjunctions), PRT (particles), PUNC (punctuation marks), X (a catch-all tag)
- Step 2: train a delexicalized source-language parser
  - Uses only POS tag features (no lexical features)
  - 89.3 (lexicalized) => 82.5 (delexicalized) on the English test set
- Step 3: parse target-language sentences tagged with universal POS tags
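Step 1 amounts to a table lookup; the mapping below is only an illustrative fragment of the Petrov+ 11 fine-to-universal mapping, not the full table.

```python
# Illustrative fragment of a fine-to-universal POS mapping (Petrov+ 11 style).
UNIVERSAL_MAP = {"NN": "NOUN", "NNS": "NOUN", "VBD": "VERB",
                 "JJ": "ADJ", "DT": "DET", "IN": "ADP", ".": "PUNC"}

def delexicalize(tagged_sentence):
    """Replace (word, fine_tag) pairs with universal POS tags only,
    discarding the lexical forms; unseen tags fall back to X."""
    return [UNIVERSAL_MAP.get(tag, "X") for _, tag in tagged_sentence]
```

The delexicalized parser of Step 2 then sees only these tag sequences, which is why it transfers directly to any target language with universal tags.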

Delexicalized transfer (no bitext) (McDonald+ 11)
- Multi-source delexicalized transfer
  - Use multiple source treebanks (besides English) and concatenate them into one large delexicalized training set (e.g., English, Greek, German, ...) to train a delexicalized parser for Danish
  - Can improve parsing accuracy

Delexicalized + projected transfer (with bitext) (McDonald+ 11)
- Unsupervised; delexicalized + projection (as a training objective)
- Workflow (Danish as the target language):
  - Step 1: parse a set of Danish sentences with the delexicalized transfer parser (trained on the English treebank)
  - Step 2: train a lexicalized Danish parser using the predicted parses as gold standard
  - Step 3: improve the lexicalized Danish parser on bitext
    - Extrinsic objective: make the parser's outputs more consistent with the outputs of the lexicalized English parser (training with multiple objectives, Hall+ 11)

Delexicalized transfer with cross-lingual word clusters (Täckström+ 12)
- Unsupervised; delexicalized + cross-lingual word clusters
- In delexicalized dependency parsers, only POS tag features are used
- In other previous work on dependency parsing, features based on word clusters work very well

Delexicalized transfer with cross-lingual word clusters (Täckström+ 12)
- Word clusters derived from large-scale unlabeled data can alleviate data sparseness
- Feature granularity: POS tags < word clusters < words
- (Table: accuracy (LAS) of a supervised English parser with each feature set)

Delexicalized transfer with cross-lingual word clusters (Täckström+ 12)
- Re-lexicalize the delexicalized transfer parser using cross-lingual word clusters
- Cross-lingual word clusters need to be consistent across languages
- A cross-lingual clustering algorithm leverages large amounts of monolingual unlabeled data, plus parallel data through which the cross-lingual word cluster constraints are enforced
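A crude stand-in for the parallel-data part of such a clustering algorithm is to project cluster IDs through word alignments by majority vote; the helper and its data format are hypothetical simplifications of the joint objective in Täckström+ 12.

```python
from collections import Counter, defaultdict

def project_clusters(src_clusters, aligned_pairs):
    """Assign each target word the most frequent cluster ID among the
    source words it is aligned to.

    src_clusters: dict mapping source word -> cluster ID
    aligned_pairs: iterable of (source_word, target_word) aligned tokens
    """
    votes = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        if src in src_clusters:
            votes[tgt][src_clusters[src]] += 1
    return {tgt: counts.most_common(1)[0][0]
            for tgt, counts in votes.items()}
```

The resulting target-side cluster features let the transferred parser be "re-lexicalized" without any target treebank.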


Selective multi-source delexicalized transfer (Täckström+ 13)
- Unsupervised; delexicalized (also in Naseem+ 12)
- Some delexicalized features do not transfer well across typologically different languages
- Selective parameter sharing based on typological and language-family features (Dryer & Haspelmath 11)

Selective multi-source delexicalized transfer (Täckström+ 13)
- Selectively use features for different source-target language pairs
- Indo-European: Bulgarian, Catalan, Czech, Greek, English, Spanish, Italian, Dutch, Portuguese, Swedish
- Altaic: Japanese, Turkish

Ambiguity-aware transfer (Täckström+ 13)
- Unsupervised; delexicalized
- Re-lexicalize delexicalized parsers using target-language unlabeled text
- First, parse the unlabeled data with the base delexicalized parser, and use the arc marginals to construct ambiguous labels (a forest) for each sentence
- Then, train a lexicalized parser on the unlabeled data via ambiguity-aware training
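Constructing the ambiguous label sets from arc marginals might look like the sketch below; the `keep` margin and indexing scheme are illustrative assumptions, not the paper's exact pruning rule.

```python
def ambiguous_heads(marginals, n_words, keep=0.25):
    """For each dependent, keep every candidate head whose arc marginal
    is within `keep` of the best candidate, yielding an ambiguous
    'forest' label instead of a single committed tree.

    marginals: dict (head, dep) -> P(arc); word indices 1..n_words, 0 = root
    """
    forest = {}
    for dep in range(1, n_words + 1):
        scores = {head: marginals.get((head, dep), 0.0)
                  for head in range(0, n_words + 1) if head != dep}
        best = max(scores.values())
        forest[dep] = sorted(h for h, s in scores.items() if s >= best - keep)
    return forest
```

Ambiguity-aware training then maximizes the probability mass of all trees consistent with this forest, rather than of one possibly wrong tree.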


Ambiguity-aware self-training vs. ambiguity-aware ensemble training
- Ensemble training: add the outputs of another parser (Naseem+ 12) to the ambiguous label set
- Can improve accuracy

Syntax projection as posterior regularization (Ganchev+ 09)
- Unsupervised; project high-probability edges (used in the training objective)
- Key points:
  - Filter unlikely projected dependencies according to source-side dependency probabilities and alignment probabilities (project only high-confidence dependencies)
  - Give the model more flexibility via posterior regularization

Syntax projection as posterior regularization (Ganchev+ 09)
- Posterior regularization: after training, the model should satisfy the constraint that the expected fraction of conserved edges among the projected edges is at least 90%
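The constrained quantity can be sketched as an expected-count check; this is an illustrative helper that treats each projected edge's posterior arc marginal as its conservation probability.

```python
def conserved_fraction(posterior, projected_edges):
    """Expected fraction of projected edges that are conserved under
    the model posterior.

    posterior: dict (head, dep) -> marginal probability of that arc
    The PR constraint of Ganchev+ 09 requires this value to be >= 0.9.
    """
    if not projected_edges:
        return 1.0  # vacuously satisfied when nothing was projected
    return sum(posterior.get(edge, 0.0)
               for edge in projected_edges) / len(projected_edges)
```

During training, the auxiliary distribution q is chosen inside the set of distributions for which this expectation meets the bound.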

Syntax projection as posterior regularization (Ganchev+ 09)
- The training objective:

  \max_\theta \; \log p_\theta(\mathbf{x}) + \log p(\theta) - \min_{q \in Q} \mathrm{KL}\big( q(\mathbf{y}) \,\big\|\, p_\theta(\mathbf{y} \mid \mathbf{x}) \big)

  where \log p_\theta(\mathbf{x}) is the log-likelihood of the target-side bitext, \log p(\theta) is the model prior, p_\theta(\mathbf{y} \mid \mathbf{x}) is the model posterior, and Q is the set of distributions satisfying the desired posterior constraints.

Syntax projection as posterior regularization (Ganchev+ 09)
- Create several simple rules to handle different annotation decisions (annotation guidelines), e.g., main verbs should dominate auxiliary verbs
- Can largely improve parsing accuracy

Combining unsupervised and projection objectives (Liu+ 13)
- Unsupervised; project edges
- Derive local dependency/non-dependency instances from projection (Jiang & Liu 10)


Combining unsupervised and projection objectives (Liu+ 13)
- Joint objective: a linear interpolation of
  - the unsupervised objective
  - the projection objective

Transfer distributions as bilingual guidance (Ma & Xia 14)
- Unsupervised; hybrid of projection and delexicalized transfer; transfers edge weights (distributions)
- Projected weights (lexicalized)
- Delexicalized weights

Transfer distributions as bilingual guidance (Ma & Xia 14)
- Use the target-language sentences to train a parser by minimizing the KL divergence from the transferred distribution to the distribution of the learned model
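The quantity being minimized can be sketched per token as below; this illustrative helper compares a transferred head distribution with the learned model's distribution over candidate heads.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between the transferred head distribution p and the
    learned model's head distribution q for one dependent.

    p, q: dicts mapping candidate head index -> probability
    eps guards against log(0) for heads missing from q.
    """
    return sum(p_h * math.log((p_h + eps) / (q.get(h, 0.0) + eps))
               for h, p_h in p.items() if p_h > 0.0)
```

Summing this divergence over all dependents in all target sentences gives the training objective that pulls the learned model toward the transferred guidance.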

Transfer distributions as bilingual guidance (Ma & Xia 14)
- Use unlabeled data by minimizing entropy (not very helpful)

Summary of Part D
- Help target-language parsing with source-language treebanks and bitext
- Semi-supervised vs. unsupervised
- Projection vs. delexicalized
- Project hard edges, transfer edge weights, or use bilingual constraints

Summary of Part D
- Future thoughts:
  - Word alignment errors (probabilities)
  - Source-language parsing errors (probabilities)
  - How to capture cross-language non-isomorphism
  - Joint word alignment and bilingual parsing?

References
- Matthew S. Dryer and Martin Haspelmath, editors. 2011. The World Atlas of Language Structures Online. Munich: Max Planck Digital Library.
- Kuzman Ganchev, Jennifer Gillenwater, and Ben Taskar. 2009. Dependency grammar induction via bitext projection constraints. In Proceedings of the 47th ACL.
- Liang Huang, Wenbin Jiang, and Qun Liu. 2009. Bilingually-constrained (monolingual) shift-reduce parsing. In Proceedings of EMNLP, pages 1222–1231.
- Wenbin Jiang and Qun Liu. 2010. Dependency parsing and projection based on word-pair classification. In Proceedings of ACL, pages 897–904.
- Zhenghua Li, Min Zhang, and Wenliang Chen. 2014. Soft Cross-lingual Syntax Projection for Dependency Parsing. In Proceedings of COLING.
- Kai Liu, Yajuan Lü, Wenbin Jiang, and Qun Liu. 2013. Bilingually-guided monolingual dependency grammar induction. In Proceedings of ACL.
- Xuezhe Ma and Fei Xia. 2014. Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization. In Proceedings of ACL.
- Ryan McDonald, Slav Petrov, and Keith Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In Proceedings of EMNLP.

- Slav Petrov, Dipanjan Das, and Ryan McDonald. 2011. A universal part-of-speech tagset. ArXiv preprint.
- Anders Søgaard. 2011. Data point selection for cross-language adaptation of dependency parsers. In Proceedings of ACL, pages 682–686.
- Kathrin Spreyer and Jonas Kuhn. 2009. Data-driven dependency parsing of new languages using incomplete and noisy training data. In Proceedings of CoNLL, pages 12–20.
- Oscar Täckström, Ryan McDonald, and Joakim Nivre. 2013. Target Language Adaptation of Discriminative Transfer Parsers. In Proceedings of NAACL.
- Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit. 2012. Cross-lingual word clusters for direct transfer of linguistic structure. In Proceedings of NAACL-HLT.