Morphological Segmentation Inside-Out

Presentation transcript:

Morphological Segmentation Inside-Out Ryan Cotterell, Arun Kumar, Hinrich Schütze

Old Idea: Surface Morphological Segmentation We are going to give examples in English, but other languages are far more complex!

unachievability → un achiev abil ity (Segment)
PREFIX STEM SUFFIX SUFFIX

One common way of processing morphology is what we will call *surface* morphological segmentation. The goal, roughly speaking, is to separate the surface form of a word into its sequence of morphemes, perhaps with a labeling. This task has attracted a lot of attention over the years, with a number of supervised and unsupervised methods being proposed.
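As a concrete illustration (our sketch, not the authors' code), a labeled surface segmentation can be represented as a list of (morpheme, label) pairs; the lookup below is a hypothetical stand-in for a trained segmenter:

```python
from typing import List, Tuple

Segmentation = List[Tuple[str, str]]  # (morpheme, label) pairs

def surface_segmentation(word: str) -> Segmentation:
    # Toy lookup standing in for a learned segmentation model.
    gold = {
        "unachievability": [("un", "PREFIX"), ("achiev", "STEM"),
                            ("abil", "SUFFIX"), ("ity", "SUFFIX")],
    }
    return gold[word]

# Defining property of *surface* segmentation: the segments
# concatenate back to the surface form exactly.
assert "".join(m for m, _ in surface_segmentation("unachievability")) == "unachievability"
```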

Semi-New Idea: Canonical Morphological Segmentation

unachievability → unachieveableity (Restore underlying form) → un achieve able ity (Segment)
PREFIX STEM SUFFIX SUFFIX

This work focuses on a different formulation of the task: canonical segmentation. The goal here is to map the surface form into an underlying form and *then* segment it. To point out the differences compared to the last slide, we have restored the "e" in "achieve" and mapped "abil" to "able".
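A minimal sketch of the two-step pipeline, again with hypothetical toy lookups standing in for the learned components:

```python
def restore(surface: str) -> str:
    # Surface form -> underlying form (toy lookup only).
    return {"unachievability": "unachieveableity"}[surface]

def canonical_segmentation(surface: str):
    underlying = restore(surface)
    # Segment the *underlying* form (toy lookup only).
    table = {"unachieveableity": [("un", "PREFIX"), ("achieve", "STEM"),
                                  ("able", "SUFFIX"), ("ity", "SUFFIX")]}
    return table[underlying]

# Unlike surface segments, canonical segments need not concatenate
# back to the surface form -- only to the underlying form.
print(canonical_segmentation("unachievability"))
```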

Why is canonicalization useful? Here's why you should care about this problem. Segmenting words alone is not enough; we eventually need to reason about the relationships between words. When we perform canonical segmentation, it becomes immediately clear which words share morphemes.

unachievability, achievement, underachiever, achieves. Segmentation does not happen in isolation; ideally, we would like to analyze all the words in a language's lexicon.

un achiev abil ity
achieve ment
under achiev er
achieve s

Are "achiev" and "achieve" the same morpheme?
un achiev abil ity
achieve ment
under achiev er
achieve s

unachievability, achievement, underachiever, achieves. Segmentation does not happen in isolation; ideally, we would like to analyze all the words in a language's lexicon.

unachieveableity
achievement
underachieveer
achieves

un achieve able ity
achieve ment
under achieve er
achieve s

Canonical segmentations are standardized across words:
un achieve able ity
achieve ment
under achieve er
achieve s
This enables better preprocessing, e.g., a more meaningful reduction in sparsity and reasoning about compositionality.
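To make the sparsity point concrete with a toy count (our example, not from the paper): under surface segmentation, "achiev" and "achieve" are distinct types, whereas canonical segmentation merges them, so every word in the family shares the stem "achieve":

```python
surface = [["un", "achiev", "abil", "ity"], ["achieve", "ment"],
           ["under", "achiev", "er"], ["achieve", "s"]]
canonical = [["un", "achieve", "able", "ity"], ["achieve", "ment"],
             ["under", "achieve", "er"], ["achieve", "s"]]

def morpheme_types(segmented_words):
    return {m for word in segmented_words for m in word}

print(len(morpheme_types(surface)))    # 9 types: "achiev" and "achieve" both occur
print(len(morpheme_types(canonical)))  # 8 types: all four words share "achieve"
```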

unachievability, thinkable, accessible, untouchable. Segmentation does not happen in isolation; ideally, we would like to analyze all the words in a language's lexicon.

unachieveableity, thinkable, accessable, untouchable. Segmentation does not happen in isolation; ideally, we would like to analyze all the words in a language's lexicon.

un achieve able ity
think able
access able
un touch able


New Idea: Morphology as Parsing

unachievability, achievement, underachiever, achieves. Segmentation does not happen in isolation; ideally, we would like to analyze all the words in a language's lexicon.

unachieveableity
achievement
underachieveer
achieves

un achieve able ity
achieve ment
under achieve er
achieve s


under achieve er


PREFIX STEM SUFFIX
under achieve er
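To make the "morphology as parsing" view concrete, here is a minimal sketch (ours, not the authors' implementation) that enumerates the binary bracketings of a morpheme sequence, i.e., the search space a parsing model would score:

```python
from functools import lru_cache

def bracketings(morphemes):
    """Enumerate all binary trees (as nested tuples) over a morpheme tuple."""
    @lru_cache(maxsize=None)
    def trees(i, j):
        if j - i == 1:
            return [morphemes[i]]          # a single morpheme is a leaf
        out = []
        for k in range(i + 1, j):          # choose a split point, chart-style
            out.extend((left, right)
                       for left in trees(i, k)
                       for right in trees(k, j))
        return out
    return trees(0, len(morphemes))

for tree in bracketings(("under", "achieve", "er")):
    print(tree)
# ('under', ('achieve', 'er'))  -- achieve+er combine first
# (('under', 'achieve'), 'er')  -- under+achieve combine first
```

A real parser would score each tree (e.g., with labeled spans) rather than enumerate them all, but the chart recursion over split points is the same.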

Why are trees useful? Segmenting a word into a flat sequence of morphemes is not always enough; the next two slides give two reasons why the tree structure over the morphemes matters.

Reason 1: Words are ambiguous! The tree captures the ambiguity; a flat segmentation doesn't!
((un lock) able) — "capable of being unlocked"
(un (lock able)) — "incapable of being locked"
The flat segmentation un lock able (PREFIX STEM SUFFIX) is the same for both readings: "???"
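Using the bracketings sketch from the previous slide, the two readings of "unlockable" are exactly the two binary trees over its segments:

```python
for tree in bracketings(("un", "lock", "able")):
    print(tree)
# ('un', ('lock', 'able'))  -> "incapable of being locked"
# (('un', 'lock'), 'able')  -> "capable of being unlocked"
```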

Reason 2: Trees model the order of affixation. The path of derivation achieve → underachieve → underachiever is encoded in the tree (under achieve er, labeled PREFIX STEM SUFFIX), giving the model more features.

New Resource: To the best of our knowledge, the fully supervised version of this task has never been considered before in the literature, so we also introduce a new annotated resource.

A morphological treebank for English (size table on slide).

A Joint Model: To the best of our knowledge, the fully supervised version of this task has never been considered before in the literature, so we introduce a novel joint probability model.

Canonical segmentation parse tree: un achieve able ity. Underlying form: unachieveableity. Word (surface form): unachievability.

We model the probability of a canonical segmentation and an underlying form given the surface form of a word. The first factor scores a (canonical segmentation, underlying form) pair; basically, it asks how good this pair is, e.g., un-achieve-able-ity and unachieveableity. This is a structured factor and can be seen as the score of a semi-Markov model. The second factor scores an (underlying form, word) pair. This notation belies a bit of the complexity: the factor is, again, structured, since in general we have to encode all possible alignments between the two strings. Luckily, we can encode this as a weighted finite-state machine; the paper explains this in detail. Putting the factors together gives our model. The remaining details, such as the feature templates, can be found in the paper.

We define the model as proportional to the exponential of a linear model, composed of two different factors: one asking how good the (tree, underlying form) pair is, e.g., (s = un achieve able ity, u = unachieveableity), and one asking how good the (underlying form, word) pair is, e.g., (u = unachieveableity, w = unachievability).
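Written out (our reconstruction from the description above; the weight vectors and feature functions are our notation, not necessarily the paper's), the globally normalized log-linear model is:

```latex
% s: canonical segmentation (tree), u: underlying form, w: surface word.
% First factor scores (s, u); second factor scores (u, w).
p_\theta(s, u \mid w) =
  \frac{\exp\!\big( \eta^\top f(s, u) + \omega^\top g(u, w) \big)}
       {\sum_{s',\, u'} \exp\!\big( \eta^\top f(s', u') + \omega^\top g(u', w) \big)}
```

The sum in the denominator ranges over all trees and underlying forms, which is why exact inference is hard, as the next slide explains.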

Inference and Learning. Inference is intractable! We use approximate inference with importance sampling; decoding is also done with importance sampling; learning uses AdaGrad (Duchi et al., 2011).

Unfortunately, marginal inference in our model is intractable (we explain why in the paper). As the model is globally normalized, even computing a gradient requires inference. To solve this, we rely on an approximation known as importance sampling: at a high level, importance sampling takes samples from an easy proposal distribution and lets the model rescore them. Decoding, a.k.a. MAP inference, is also intractable, but, again, we can approximately solve it with importance sampling. Once we have our approximate gradient via importance sampling, we train the model with AdaGrad.
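At that high level, self-normalized importance sampling looks like the following sketch (ours, not the paper's code): draw samples from an easy proposal distribution q, reweight them by the model's unnormalized score, and average:

```python
import math

def importance_estimate(samples, log_q, log_score, statistic):
    """Self-normalized importance sampling estimate of E_model[statistic].

    samples   -- draws from an easy proposal distribution q
    log_q     -- function: log q(x) under the proposal
    log_score -- function: the model's *unnormalized* log-score of x
    statistic -- function: the scalar quantity whose model expectation
                 we want, e.g., a feature count needed for the gradient
    """
    log_w = [log_score(x) - log_q(x) for x in samples]
    m = max(log_w)                            # stabilize before exponentiating
    w = [math.exp(lw - m) for lw in log_w]
    z = sum(w)                                # self-normalization constant
    return sum(wi * statistic(x) for wi, x in zip(w, samples)) / z
```

The same sample-then-rescore machinery gives an approximate decoder: among the proposal samples, return the one with the highest model score.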

Experimental Results
Key point: Do trees help segmentation accuracy?
Baseline: flat segmentation model
New task:

Results

Fin. Thank You!