Japanese Dependency Structure Analysis Based on Maximum Entropy Models
Kiyotaka Uchimoto†, Satoshi Sekine‡, Hitoshi Isahara†
† Kansai Advanced Research Center, Communications Research Laboratory
‡ New York University

Outline
- Background
- Probability model for estimating dependency likelihood
- Experiments and discussion
- Conclusion

Background
- Japanese dependency structure analysis
  - Preparing a dependency matrix
  - Finding an optimal set of dependencies for the entire sentence

(Figure: the example sentence 太郎は赤いバラを買いました。"Taro bought a red rose.", segmented into the bunsetsus 太郎は (Taro_wa, "Taro"), 赤い (aka_i, "red"), バラを (bara_wo, "rose"), 買いました。(kai_mashita, "bought"), with dependency arrows drawn between them.)
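As a minimal illustration (not from the slides; the head assignments follow the arrows in the figure above), the dependency structure of the example can be written as a map from each bunsetsu index to the index of its head:

```python
# The example sentence, segmented into bunsetsus.
bunsetsus = ["太郎は", "赤い", "バラを", "買いました。"]  # Taro / red / rose / bought

# head[i] = index of the bunsetsu that bunsetsu i depends on;
# the rightmost bunsetsu has no head.
head = {
    0: 3,  # 太郎は (Taro) -> 買いました (bought)
    1: 2,  # 赤い (red)    -> バラを (rose)
    2: 3,  # バラを (rose) -> 買いました (bought)
}
```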

Background (2)
- Approaches to preparing a dependency matrix
  - Rule-based approach: several problems with handcrafted rules
    - Coverage and consistency
    - The rules have to be changed according to the target domain
  - Corpus-based approach

Background (3)
- Corpus-based approach
  - Learning the likelihoods of dependencies from a tagged corpus (Collins, 1996; Fujio and Matsumoto, 1998; Haruno et al., 1998)
  - Probability estimation based on maximum entropy models (Ratnaparkhi, 1997)
- Maximum entropy model
  - Learns the weights of given features from a training corpus

Probability model
- Assigning one of two tags to each pair of bunsetsus
  - Whether or not there is a dependency between the two bunsetsus
  - Probabilities of dependencies are estimated by the M.E. model
- Overall dependencies in a sentence
  - Product of the probabilities of all dependencies
  - Assumption: dependencies are independent of each other
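The formula on this slide did not survive transcription; under the independence assumption above, it presumably had the following form, where S is a sentence of n bunsetsus b_1 ... b_n and D is a candidate dependency structure assigning each non-final bunsetsu b_i a head b_head(i):

```latex
P(D \mid S) \;\approx\; \prod_{i=1}^{n-1} P\bigl(\mathrm{dep} \,\big|\, b_i \rightarrow b_{\mathrm{head}(i)}\bigr)
```

Each factor is the M.E. model's estimate that bunsetsu b_i depends on b_head(i).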

M.E. model
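The equation on this slide was also lost in transcription; the conditional maximum entropy model it refers to (Ratnaparkhi, 1997) has the standard form

```latex
p(a \mid b) \;=\; \frac{1}{Z(b)}\,\exp\!\Bigl(\sum_i \lambda_i f_i(a, b)\Bigr),
\qquad
Z(b) \;=\; \sum_{a'} \exp\!\Bigl(\sum_i \lambda_i f_i(a', b)\Bigr)
```

where a is the tag (dependency or no dependency), b is the pair of bunsetsus with its context, the f_i are binary feature functions, and the λ_i are the feature weights learned from the training corpus.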

Feature sets
- Basic features (expanded from Haruno's list (Haruno, 1998))
  - Attributes of a bunsetsu itself: character strings, parts of speech, and inflection types of the bunsetsu
  - Attributes between bunsetsus: existence of punctuation, and the distance between bunsetsus
- Combined features

Feature sets (2)
(Figure: the example sentence again, with basic features a, b on the anterior bunsetsu and c, d on the posterior bunsetsu, labeled "Head" and "Type", and feature e between the two bunsetsus.)
- Basic features: a, b, c, d, e
- Combined features
  - Twin: (b,c); Triplet: (b,c,e); Quadruplet: (a,b,c,d); Quintuplet: (a,b,c,d,e)
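A minimal sketch (function and variable names are hypothetical, not the authors') of how the combined features named on this slide could be generated from the five basic features of a bunsetsu pair:

```python
def combined_features(a, b, c, d, e):
    """Build the combined features for one bunsetsu pair.

    a, b -- "Head" and "Type" attributes of the anterior bunsetsu
    c, d -- "Head" and "Type" attributes of the posterior bunsetsu
    e    -- attributes between the bunsetsus (distance, punctuation, ...)
    """
    return {
        "twin":       (b, c),
        "triplet":    (b, c, e),
        "quadruplet": (a, b, c, d),
        "quintuplet": (a, b, c, d, e),
    }

# Each distinct tuple value observed in training becomes one binary
# feature f_i of the M.E. model, firing when all components match.
```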

Algorithm
- Detect the dependencies in a sentence by analyzing it backwards (from right to left)
  - Characteristics of Japanese dependencies:
    - Dependencies are directed from left to right
    - Dependencies do not cross
    - Each bunsetsu, except the rightmost one, depends on exactly one bunsetsu
    - In many cases, the left context is not necessary to determine a dependency
- Beam search (see the sketch below)
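A sketch of the backward beam-search strategy under the constraints listed above. This is an assumed reconstruction, not the authors' code; `dep_prob` is a hypothetical stand-in for the M.E. model's probability estimate:

```python
import math

def analyze_backward(n, dep_prob, beam_width=5):
    """Backward (right-to-left) dependency analysis with a beam search.

    n          -- number of bunsetsus in the sentence
    dep_prob   -- dep_prob(i, j, heads): probability (> 0) that bunsetsu i
                  depends on bunsetsu j, given the heads chosen so far
    beam_width -- number of partial analyses kept at each step
    Returns heads[i] = j for the best-scoring complete analysis.
    """
    beam = [(0.0, {})]  # (log probability, heads of the bunsetsus to the right)
    for i in range(n - 2, -1, -1):  # the rightmost bunsetsu has no head
        candidates = []
        for logp, heads in beam:
            # "No crossing" plus "heads lie to the right" means the only
            # legal heads for i are on the head chain starting at i + 1.
            j = i + 1
            while True:
                p = dep_prob(i, j, heads)
                candidates.append((logp + math.log(p), {**heads, i: j}))
                if j not in heads:  # reached the rightmost bunsetsu
                    break
                j = heads[j]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]  # prune to the beam width
    return beam[0][1]
```

Because the beam keeps only the best partial analyses, the product of dependency probabilities is maximized approximately; the slide's observation that left context is rarely needed is what makes this right-to-left order effective.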

Experiments
- Using the Kyoto University text corpus (Kurohashi and Nagao, 1997)
  - A tagged corpus of the Mainichi newspaper
  - Training: 7,958 sentences (Jan. 1st to 8th)
  - Testing: 1,246 sentences (Jan. 9th)
- The input sentences were morphologically analyzed, and their bunsetsu boundaries were given correctly (gold segmentation).

Results of dependency analysis
- When analyzing a sentence backwards, the previously determined context has almost no effect on the accuracy.

Relationship between the number of bunsetsus and accuracy
- The accuracy does not significantly degrade as sentence length increases.

Features and accuracy
(Figure repeated from "Feature sets (2)": basic features a-e, with "Head" and "Type" marked on the anterior and posterior bunsetsus.)
- Experiments without individual feature sets
  - Useful basic features:
    - Type of the anterior bunsetsu (-17.41%) and the part-of-speech tag of the head word of the posterior bunsetsu (-10.99%)
    - Distance between bunsetsus (-2.50%), existence of punctuation in the bunsetsu (-2.52%), and existence of brackets (-1.06%)
  → Preferential rules with respect to these features

Features and accuracy (2)
- Experiments without the combined features
  - Combined features are useful (-18.31%)
  → Basic features are related to each other

Lexical features and accuracy
- Experiment with the lexical features of the head word
  - Better accuracy than without them (removing them: -0.84%)
  - Many idiomatic expressions had high dependency probabilities:
    - "応じて (oujite, according to) --- 決める (kimeru, decide)"
    - "形で (katachi_de, in the form of) --- 行われる (okonawareru, be held)"
  - More training data → expect to collect more such expressions

Amount of training data and accuracy
- Accuracy of 81.84% even with only 250 training sentences
- The M.E. framework has suitable characteristics for overcoming the data-sparseness problem.

Comparison with related works

Comparison with related works (2)
- Combining a parser based on a handmade CFG and a probabilistic dependency model (Shirai, 1998)
  - Using several corpora: the EDR corpus, RWC corpus, and Kyoto University corpus
- The accuracy achieved by our model was about 3% higher than that of Shirai's model
  - while using a much smaller set of training data

Comparison with related works (3)
- M.E. model (Ehara, 1998)
  - A set of features similar in kind to ours, but only pairwise combinations of two features
  - Using TV news articles for training and testing
    - Average sentence length: 17.8 bunsetsus (cf. 10 in the Kyoto University corpus)
- Difference in the combined features
  - We also use triplet, quadruplet, and quintuplet features (+5.86%)
  - The accuracy of our system was about 10% higher than that of Ehara's system

Comparison with related works (4)
- Maximum likelihood model (Fujio, 1998); decision tree models and a boosting method (Haruno, 1998)
  - Sets of features similar in kind to ours
  - Using the EDR corpus for training and testing
    - The EDR corpus is ten times as large as our corpus
  - Accuracy was around 85%, slightly worse than ours

Comparison with related works (5)
- Experiments with Fujio's and Haruno's feature sets
  - The important factor in statistical approaches is feature selection.

Future work
- Feature selection
  - Automatic feature selection (Berger, 1996, 1998; Shirai, 1998)
- Considering new features
  - How to deal with coordinate structures: taking a wide range of information into account

Conclusion
- Japanese dependency structure analysis based on the M.E. model
  - Dependency accuracy of our system: 87.2% on the Kyoto University corpus
- Experiments without feature sets
  - Some basic and combined features strongly contribute to improving the accuracy
- Amount of training data and accuracy
  - Good accuracy even with a small set of training data
  - The M.E. framework has suitable characteristics for overcoming the data-sparseness problem