Statistical NLP Spring 2011
Lecture 28: Diachronics
Dan Klein – UC Berkeley

Evolution: Main Phenomena. Mutations of sequences over time; speciation.

Tree of Languages
Challenge: identify the phylogeny.
Much work in biology, e.g. by Warnow, Felsenstein, Steel…
Also in linguistics, e.g. Warnow et al., Gray and Atkinson…
http://andromeda.rutgers.edu/~jlynch/language.html

Statistical Inference Tasks. Inputs: modern text and a phylogeny. Outputs: ancestral word forms, cognate groups / translations, and grammatical inference.

Outline: (1) ancestral word forms, (2) cognate groups / translations, (3) grammatical inference.

Language Evolution: Sound Change
Latin camera /kamera/ became French chambre /ʃambʁ/ through a series of changes: change /k/ → /tʃ/ → /ʃ/, deletion of /e/, insertion of /b/.
English camera comes from Latin ("camera obscura"); English chamber comes from Old French, borrowed before the initial /t/ of /tʃ/ dropped.
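
To make the mechanics concrete, here is a minimal sketch (not the lecture's model) of applying an ordered cascade of sound changes as string rewrites; the rules are simplified, context-free stand-ins for the real, context-dependent changes.

```python
# A toy cascade of the sound changes listed above (camera -> chambre).
# Real changes are context-dependent; these simple rewrites are illustrative.
RULES = [
    ("k", "tʃ"),    # /k/ -> /tʃ/ (palatalization)
    ("tʃ", "ʃ"),    # /tʃ/ -> /ʃ/ (later weakening)
    ("e", ""),      # deletion of /e/
    ("mr", "mbr"),  # insertion of epenthetic /b/
]

def apply_changes(form: str) -> str:
    """Apply each sound change in historical order, everywhere it matches."""
    for old, new in RULES:
        form = form.replace(old, new)
    return form

print(apply_changes("kamera"))  # ʃambra, close to French /ʃambʁ/
```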

Diachronic Evidence. Prescriptive corrections document changes in progress: Yahoo! Answers [2009]: "tonight not tonite"; Appendix Probi [ca. 300]: "tonitru non tonotru".

Synchronic (Comparative) Evidence

Simple Model: Single Characters

A Model for Word Forms [BGK 07]. Word forms evolve along the branches of the phylogeny: Latin (LA) focus /fokus/, an Ibero-Romance (IB) intermediate /fogo/, Italian (IT) fuoco /fwɔko/, Spanish (ES) fuego /fweɣo/, Portuguese (PT) fogo /fogo/.

Contextual Changes. Sound changes depend on context, e.g. the word-initial sequence # f o in /fokus/ corresponding to # f w ɔ in /fwɔko/.

Changes are Systematic. The same changes recur across many words: compare /fokus/ → /fweɣo/, /fogo/, /fwɔko/ with /kentrum/ → /sentro/, /tʃɛntro/.

Experimental Setup
Small data set: Romance. French, Italian, Portuguese, Spanish; 2,344 words; complete cognate sets; target: (Vulgar) Latin.
Large data set: Oceanic. 661 languages; 140K words; incomplete cognate sets; target: Proto-Oceanic [Blust, 1993].

Data: Romance

Learning: Objective. Fit the model to the observed modern forms (/fweɣo/, /fogo/, /fwɔko/), treating ancestral forms such as /fokus/ as latent.

Learning: EM
M-Step: find parameters θ which fit the (expected) sound change counts. Easy: gradient ascent on θ.
E-Step: find (expected) change counts given the parameters. Hard: the latent variables (ancestral forms like /fokus/) are string-valued.
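
As a concrete illustration, here is a runnable toy EM under the "single characters" simplification from earlier, with a star tree over three modern languages, one shared substitution matrix, and invented data; unlike the closed-form M-step below, the lecture's model needs gradient ascent because its changes are featurized and contextual.

```python
# Toy EM: each cognate set is one character per modern language, the
# ancestral character is latent, and every branch shares one substitution
# matrix theta[(a, b)] = P(modern b | ancestral a). Data are invented.
import math
from collections import defaultdict
from itertools import product

ALPHABET = "fhp"

def em(cognate_sets, n_iters=50):
    # Initialize toward faithful copying to break symmetry.
    theta = {(a, b): 0.8 if a == b else 0.2 / (len(ALPHABET) - 1)
             for a, b in product(ALPHABET, repeat=2)}
    for _ in range(n_iters):
        counts = defaultdict(float)
        for leaves in cognate_sets:
            # E-step: posterior over the latent ancestral character
            # (uniform prior over the alphabet).
            post = {a: math.prod(theta[(a, b)] for b in leaves) for a in ALPHABET}
            z = sum(post.values())
            for a in ALPHABET:
                for b in leaves:
                    counts[(a, b)] += post[a] / z  # expected change counts
        # M-step: refit theta to the expected counts (closed form here;
        # the lecture's featurized model uses gradient ascent instead).
        for a in ALPHABET:
            total = sum(counts[(a, b)] for b in ALPHABET)
            for b in ALPHABET:
                theta[(a, b)] = counts[(a, b)] / total if total else theta[(a, b)]
    return theta

# Toy cognate sets, one character per modern language (IT, ES, PT),
# echoing Latin f > Spanish h (farina -> harina):
theta = em([("f", "h", "f"), ("f", "h", "f"), ("p", "p", "p")])
print(round(theta[("f", "h")], 2))  # learned probability of the f -> h change
```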

Computing Expectations [Holmes 01, BGK 07]. The standard approach, e.g. [Holmes 2001], is Gibbs sampling each sequence in turn (running example: "grass").

A Gibbs Sampler. Repeatedly resample one ancestral sequence at a time, conditioned on the sequences at neighboring nodes in the tree ("grass" example).
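
A runnable toy version of such a sampler, for the earlier tree (LA → IT, LA → IB; IB → ES, PT) collapsed to single-character states; the branch model sub() is invented, and the real sampler resamples whole strings, which is exactly where it can get stuck (next slide).

```python
# A toy Gibbs sampler for the tree LA -> IT, LA -> IB, IB -> {ES, PT},
# with single-character states instead of strings.
import random
from collections import Counter

ALPHABET = "fhp"

def sub(a, b):
    # Hypothetical substitution model: copy with prob 0.8, else change.
    return 0.8 if a == b else 0.1

def resample(weights):
    chars, ws = zip(*weights.items())
    return random.choices(chars, weights=ws)[0]

def gibbs(it, es, pt, n_sweeps=5000, burn_in=500):
    la, ib = random.choice(ALPHABET), random.choice(ALPHABET)
    samples = Counter()
    for sweep in range(n_sweeps):
        # Resample LA given its children IT and IB (uniform root prior).
        la = resample({a: sub(a, it) * sub(a, ib) for a in ALPHABET})
        # Resample IB given its parent LA and its children ES and PT.
        ib = resample({a: sub(la, a) * sub(a, es) * sub(a, pt) for a in ALPHABET})
        if sweep >= burn_in:
            samples[(la, ib)] += 1
    return samples

# Posterior over (Latin, Ibero-Romance) ancestors of observed f / h / f:
print(gibbs("f", "h", "f").most_common(3))
```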

Getting Stuck. How can the sampler jump to a state where the liquids /r/ and /l/ have a common ancestor?

Getting Stuck. The sampler needs to wander through a valley of lower probability before it can get to the other peak.

Solution: Vertical Slices [BGK 08]. Replace single-sequence resampling with ancestry resampling.

Details: Defining “Slices”. The sampling domains (kernels) are indexed by contiguous subsequences (anchors) of the observed leaf sequences. The correct construction of section(G) is non-trivial but very efficient.

Results: Alignment Efficiency
Is ancestry resampling faster than basic Gibbs? Hypothesis: larger gains for deeper trees, because the random walk Gibbs must perform grows with tree depth.
Setup: fixed wall-time budget; synthetic data generated with the same parameters over deeper and deeper phylogenetic trees; alignment quality measured by the sum-of-pairs score.
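
For reference, a minimal sketch of one common form of the sum-of-pairs score (the exact scoring used in these experiments may differ): sum a simple pairwise score over all pairs of gapped rows in the multiple alignment.

```python
# A simple variant of the sum-of-pairs score. The scoring values here are
# illustrative defaults, not the experiments' exact parameters.
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=0, gap=0):
    """Score equal-length gapped rows by summing over all row pairs."""
    score = 0
    for r1, r2 in combinations(alignment, 2):
        for a, b in zip(r1, r2):
            if a == "-" or b == "-":
                score += gap
            else:
                score += match if a == b else mismatch
    return score

print(sum_of_pairs(["fokus", "f-ogo", "fwɔko"]))  # 4
```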

Results: Romance

Learned Rules / Mutations

Comparison to Other Methods
Evaluation metric: edit distance from a reconstruction made by a linguist (lower is better).
Comparison to the system of [Oakes, 2000], which uses exact inference and deterministic rules, on reconstruction of Proto-Malayo-Javanic, cf. [Nothofer, 1975]. Measured by average Levenshtein edit distance to the linguist's reconstructions, our system outperforms this previous approach.
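
The metric itself is plain Levenshtein edit distance; a standard dynamic-programming implementation:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(edit_distance("fokus", "fwɔko"))  # 4
```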

Data: Oceanic (target: Proto-Oceanic). Source: http://language.psy.auckland.ac.nz/austronesian/research.php

Result: More Languages Help. Mean edit distance from the [Blust, 1993] reconstructions drops as the number of modern languages used increases.

Results: Large Scale. The tree used here is based on the full Austronesian language dataset.

Visualization: Learned Universals. All of the recovered links are very reasonable sound changes, even though the model had no features encoding natural classes.

Regularity and Functional Load
In a language, some pairs of sounds are more contrastive than others (carry higher functional load).
Example: English "p"/"d" versus "t"/"th". "p"/"d" distinguishes many minimal pairs (pot/dot, pin/din, press/dress, pew/dew, ...), while "t"/"th" distinguishes few (tin/thin); functional load is accordingly high for the first pair and low for the second.
An open question in linguistics: are sounds that are less contrastive in a language more vulnerable to sound change?
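
As a concrete illustration, one simple proxy for functional load is the number of minimal pairs a contrast distinguishes in a lexicon ([King, 67] uses an entropy-based measure instead); the toy lexicon below is invented.

```python
# Count the word pairs that would collapse if two sounds were merged.
from itertools import combinations

def minimal_pairs(lexicon, x, y):
    """Word pairs that become identical if sounds x and y are merged."""
    merged = {}
    for word in lexicon:
        key = tuple(x if seg == y else seg for seg in word)
        merged.setdefault(key, set()).add(word)
    return [pair for words in merged.values() if len(words) > 1
            for pair in combinations(sorted(words), 2)]

lexicon = [("p","ɪ","n"), ("d","ɪ","n"), ("t","ɪ","n"), ("θ","ɪ","n"),
           ("p","ɒ","t"), ("d","ɒ","t")]
print(len(minimal_pairs(lexicon, "p", "d")))  # 2: pin/din, pot/dot
print(len(minimal_pairs(lexicon, "t", "θ")))  # 1: tin/thin
```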

Functional Load: Timeline
Functional Load Hypothesis (FLH): sound changes are less frequent when they merge phonemes with high functional load [Martinet, 55].
Previous research within linguistics: "FLH does not seem to be supported by the data" [King, 67]. Caveat: only four languages were used in King's study [Hockett 67; Surendran et al., 06].
Our work: we reexamined the question with two orders of magnitude more data [BGK, under review].

Regularity and Functional Load
Data: only 4 languages from the Austronesian data. Each dot in the plot is a sound change identified by the system: the x axis is the functional load as computed by [King, 67]; the y axis is the merger posterior probability (values near 1 indicate a systematic sound change, lower values context-dependent changes).

Regularity and Functional Load
Data: all 637 languages from the Austronesian data. Computing functional load from this larger set gives a much clearer view of the relationship between functional load and sound change: the data appear to support the functional load hypothesis, i.e. sound changes are more frequent when they merge phonemes with low functional load. (Axes as before: merger posterior probability vs. functional load as computed by [King, 67].)

Outline: (1) ancestral word forms (above), (2) cognate groups / translations (next), (3) grammatical inference.

Cognate Groups. Given modern forms such as /fwɔko/, /fweɣo/, /fogo/ ("fire"), /vɛrbo/, /berβo/, and /tʃɛntro/, /sentro/, /sɛntro/, group them into cognate sets.

Model: Cognate Survival. Each word survives (+) or dies (-) along each branch of the tree; e.g. Latin omnis survives as Italian ogni but has died out in Spanish and Portuguese.
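
A toy computation under such a survival model, with invented per-branch survival rates: the probability of the omnis pattern (attested in Italian only) is obtained by summing over whether the word survived into the Ibero-Romance (IB) intermediate.

```python
# A word survives each branch independently (rates below are invented) and
# is attested in a modern language iff it survived every branch from Latin
# down. Tree: LA -> IT, LA -> IB, IB -> {ES, PT}.
SURVIVE = {"IT": 0.8, "IB": 0.9, "ES": 0.7, "PT": 0.7}  # hypothetical rates

def pattern_prob(it, es, pt):
    total = 0.0
    for ib_alive in (True, False):
        p = SURVIVE["IB"] if ib_alive else 1 - SURVIVE["IB"]
        p *= SURVIVE["IT"] if it else 1 - SURVIVE["IT"]
        for leaf, alive in (("ES", es), ("PT", pt)):
            if ib_alive:
                p *= SURVIVE[leaf] if alive else 1 - SURVIVE[leaf]
            elif alive:
                p = 0.0  # a dead ancestor cannot yield a living reflex
        total += p
    return total

print(pattern_prob(True, False, False))  # P(attested in Italian only)
```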

Results: Grouping Accuracy. Metric: fraction of words correctly grouped, compared across methods [Hall and Klein, in submission].

Semantics: Matching Meanings. English tag occurs with "name", "label", "along"; English day occurs with "night", "sun", "week". German Tag occurs with "nacht", "sonne", "woche", so its contexts match English day rather than English tag.
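
A minimal sketch of this context-matching idea: map German context words through a small seed dictionary and compare co-occurrence vectors by cosine similarity. The counts and dictionary entries below are invented.

```python
# Score candidate translations by cosine similarity of context vectors.
import math
from collections import Counter

SEED = {"nacht": "night", "sonne": "sun", "woche": "week"}

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

en_day = Counter({"night": 9, "sun": 7, "week": 5})
en_tag = Counter({"name": 8, "label": 6, "along": 3})
de_tag_contexts = {"nacht": 8, "sonne": 6, "woche": 4}
de_tag = Counter({SEED.get(w, w): c for w, c in de_tag_contexts.items()})

# Despite the identical surface form, context favors day over tag:
print(cosine(de_tag, en_day), cosine(de_tag, en_tag))  # ~1.0 vs 0.0
```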

Outline: (1) ancestral word forms (above), (2) cognate groups / translations (above), (3) grammatical inference (next).

Grammar Induction. Task: given sentences, infer a grammar (and parse tree structures), e.g. English "congress narrowly passed the amended bill" and French "les faits sont très clairs" ("the facts are very clear").

Shared Prior. The grammars of related languages are tied together through a shared prior (same English and French example sentences).

Results: Phylogenetic Prior. Modern languages: Dutch, Danish, Swedish, Spanish, Portuguese, Slovene, Chinese, English; ancestor nodes: WG, NG, RM, G, IE, GL. Average relative gain: 29%.

Conclusion
Phylogeny-structured models can: accurately reconstruct ancestral words; give evidence in open linguistic debates; detect translations from form and context; improve language learning algorithms.
Lots of questions still open: Can we get better phylogenies using these high-res models? What do these models have to say about the very earliest languages? Proto-World?

Thank you! nlp.cs.berkeley.edu