Dependency Model Using Posterior Context


Dependency Model Using Posterior Context
Kiyotaka Uchimoto†, Masaki Murata†, Satoshi Sekine‡, Hitoshi Isahara†
†Kansai Advanced Research Center, Communications Research Laboratory, Japan
‡New York University, USA

Background
Japanese dependency structure analysis. Example: 太郎は赤いバラを買いました。 (Taro bought a red rose.)
The sentence is segmented into bunsetsus: 太郎は (Taro_wa, "Taro"), 赤い (Aka_i, "red"), バラを (bara_wo, "rose"), 買いました (kai_mashita, "bought"); at the morpheme level, 太郎 / は / 赤 / い / バラ / を / 買い / ました。 Each bunsetsu except the last depends on a bunsetsu to its right: 太郎は and バラを depend on 買いました, and 赤い depends on バラを.
Analysis involves preparing a dependency matrix and finding an optimal set of dependencies for the entire sentence.
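To make the target representation concrete, here is a minimal sketch (an illustrative encoding, not the authors' data format) of the example sentence as bunsetsus that each point to the bunsetsu they depend on:

```python
# The example sentence as bunsetsus with head indices (None = sentence root).
# In Japanese, every dependency points rightward, so head index > own index.
bunsetsus = [
    {"text": "太郎は",     "romaji": "Taro_wa",     "gloss": "Taro",   "head": 3},
    {"text": "赤い",       "romaji": "Aka_i",       "gloss": "red",    "head": 2},
    {"text": "バラを",     "romaji": "bara_wo",     "gloss": "rose",   "head": 3},
    {"text": "買いました", "romaji": "kai_mashita", "gloss": "bought", "head": None},
]
for b in bunsetsus:
    if b["head"] is not None:
        print(f'{b["text"]} -> {bunsetsus[b["head"]]["text"]}')
```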

Conventional (old) model
A statistical approach: each element in the dependency matrix is estimated as a probability.
Each relationship between two bunsetsus is assigned one of two tags, "1" or "0": whether or not there is a dependency between the two bunsetsus.
This model considers only the relationship between the two bunsetsus themselves.
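As a rough sketch of this pairwise formulation (illustrative code, not the authors' implementation; the probabilities are made up), the score of a full parse is simply the product of the probabilities of the dependencies it contains:

```python
# Conventional pairwise model: dep_prob[i][j] = estimated probability that
# bunsetsu i depends on bunsetsu j; a parse scores the product of its arcs.
from math import prod

def parse_score(heads, dep_prob):
    """heads[i] = index of the head of bunsetsu i (None for the root)."""
    return prod(dep_prob[i][h] for i, h in enumerate(heads) if h is not None)

# Toy 3-bunsetsu sentence: bunsetsus 0 and 1 both depend on 2.
dep_prob = {0: {1: 0.3, 2: 0.7}, 1: {2: 1.0}}
print(parse_score([2, 2, None], dep_prob))  # 0.7 * 1.0 = 0.7
```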

New model using posterior context
A relationship between two bunsetsus is classified into one of three categories, according to what the anterior bunsetsu depends on:
"0" (between): a bunsetsu between the two
"1" (dependent): the posterior bunsetsu
"2" (beyond): a bunsetsu beyond the posterior one
The dependency probability of two bunsetsus is the product of the probabilities of the relationships between the left bunsetsu and each of the bunsetsus to its right in the sentence. The probability of the overall dependencies in a sentence is the product of the probabilities of all the dependencies, identified by analyzing the sentence from right to left.

Example: the current bunsetsu has five modifiee candidates (the bunsetsus 1-5 to its right). For each candidate k, the model gives the probabilities of the three categories beyond (bynd), dependent (dpnd), and between (btwn). The probability that the current bunsetsu depends on candidate j is the product
bynd(1) × … × bynd(j-1) × dpnd(j) × btwn(j+1) × … × btwn(5),
normalized over the five candidates. Here the products are 0.155 for candidate 1, 0.329 for candidate 2, 0 for candidates 3 and 4 (whose dpnd probability is 0), and 0.379 for candidate 5, giving normalized dependency probabilities of 18.0%, 38.1%, 0%, 0%, and 43.9%.
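A minimal sketch of this product-and-normalize rule (illustrative Python; the candidate probabilities below are made up, not the figure's values):

```python
# Posterior-context head selection for one bunsetsu: for "head = j", multiply
# P(beyond) for candidates nearer than j, P(dependent) at j, and P(between)
# for candidates farther than j, then normalize over all candidates.
from math import prod

def head_distribution(cats):
    """cats[k] = (p_beyond, p_dependent, p_between) for candidate k, left to right."""
    n = len(cats)
    scores = [
        prod(cats[k][0] for k in range(j))            # nearer candidates: "beyond"
        * cats[j][1]                                  # candidate j: "dependent"
        * prod(cats[k][2] for k in range(j + 1, n))   # farther candidates: "between"
        for j in range(n)
    ]
    total = sum(scores)
    return [s / total for s in scores]

# Three modifiee candidates with made-up category probabilities.
cats = [(0.7, 0.3, 0.0), (0.5, 0.4, 0.1), (0.2, 0.3, 0.5)]
print([round(p, 3) for p in head_distribution(cats)])  # [0.058, 0.538, 0.404]
```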

Experiments
Both models were implemented within a maximum entropy framework.
Features: attributes of a bunsetsu itself and attributes of the relationship between two bunsetsus.
Corpus: the Kyoto University text corpus (Kurohashi and Nagao, 1997), a tagged corpus of the Mainichi newspaper.
Training: 7,958 sentences (Jan. 1st to 8th); testing: 1,246 sentences (Jan. 9th).
The input sentences were morphologically analyzed and their bunsetsus identified correctly (gold segmentation).
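For flavor, a maximum entropy classifier (equivalently, multinomial logistic regression) over the three categories might be set up as below. This is a stand-in sketch only: the paper used its own ME toolkit and far richer features, and the feature names and scikit-learn are my assumptions, not the authors' setup.

```python
# Sketch: maximum entropy = multinomial logistic regression over the three
# posterior-context categories. Features are toy stand-ins for the bunsetsu
# attributes used in the paper.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# One instance per (anterior bunsetsu, candidate) pair; labels:
# 0 = "between", 1 = "dependent", 2 = "beyond".
train_feats = [
    {"cand_pos": "verb", "ant_particle": "wa", "distance": "far"},
    {"cand_pos": "noun", "ant_particle": "none", "distance": "adjacent"},
    {"cand_pos": "verb", "ant_particle": "wo", "distance": "near"},
]
train_labels = [0, 1, 2]  # toy labels

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(train_feats), train_labels)

# The resulting per-category probabilities feed the product in the sketch above.
test = {"cand_pos": "verb", "ant_particle": "wa", "distance": "near"}
print(clf.predict_proba(vec.transform([test])))
```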

Results of dependency analysis
The dependency accuracy of the new model was about 1% higher than that of the old model, and sentence accuracy improved by 3%, even when using exactly the same features.

Relationship between the number of bunsetsus and accuracy
The accuracy of the new model is almost always better than that of the old model, regardless of sentence length.

Amount of training data and accuracy
The accuracy of the new model is about 1% higher than that of the old model for every size of training data.

Conclusion
A new model for dependency structure analysis:
Learns the relationship between two bunsetsus as three categories: "between," "dependent," and "beyond."
Estimates the dependency likelihood by considering not only the relationship between the two bunsetsus but also the relationships between the left bunsetsu and all of the bunsetsus to its right.
The dependency accuracy of the new model was almost always better than that of the old model for any sentence length, and about 1% higher for any size of training data.
Future work: applying a similar model to English sentences.