A Joint Segmenting and Labeling Approach for Chinese Lexical Analysis
Xinhao Wang, Jiazhong Nie, Dingsheng Luo, and Xihong Wu
Speech and Hearing Research Center, Department of Machine Intelligence, Peking University
ECML PKDD 2008, Antwerp, September 18th, 2008

Cascaded Subtasks in NLP
Typical cascades: word segmentation and named entity recognition; POS tagging; chunking and parsing; word sense disambiguation.
Drawbacks of the pipeline:
 Errors introduced by earlier subtasks propagate through the pipeline and can never be recovered by downstream subtasks.
 Information sharing among the subtasks is prohibited by this pipeline manner.

Researchers' Efforts on Joint Processing
 Reranking (Shi, 2007; Sutton, 2005; Zhang, 2003): as an approximation of joint processing, reranking may miss the true optimal result, which often lies outside the k-best list.
 Taking multiple subtasks as a single one (Luo, 2003; Miller, 2000; Yi, 2005; Nakagawa, 2007; Ng, 2004): the obstacle is the requirement of a corpus annotated with multi-level information.
 Unified probabilistic models (Sutton, 2004; Duh, 2005): Dynamic Conditional Random Fields (DCRFs) and Factorial Hidden Markov Models (FHMMs) are trained jointly and perform all the subtasks at once, but both suffer from the absence of multi-level data annotation.

A Unified Framework for Joint Processing
A WFST-based approach is presented to jointly perform a cascade of segmentation and labeling tasks. It has two remarkable features:
 WFSTs offer a unified framework that can represent many widely used models, such as lexical constraints, n-gram language models, and Hidden Markov Models (HMMs), so multiple knowledge sources can be modeled in a single transducer representation.
 Multiple WFSTs can be composed into a single WFST, which makes it possible to perform a cascade of subtasks with one-pass decoding.

Weighted Finite State Transducers (WFSTs)
 A WFST is a generalization of a finite state automaton that realizes a weighted relation between strings.
 Composition operation. Figure: an example of WFST composition. Two simple WFSTs are shown in (a) and (b); states are drawn as circles labeled with their unique numbers, bold circles represent initial states, and double circles represent final states. The input label, output label, and weight of a transition t are written as in(t):out(t)/weight(t). (c) shows the composition of (a) and (b).
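Composition is the operation that chains two weighted relations together. The following is a minimal sketch, assuming the tropical semiring (weights are negative log costs, added along a path); it ignores epsilon labels, which a full toolkit such as OpenFst handles with epsilon filters, and it is not the authors' implementation.

```python
from collections import defaultdict

class WFST:
    """A tiny weighted transducer; states are arbitrary hashable values."""
    def __init__(self, start, finals):
        self.start = start                    # initial state
        self.finals = set(finals)             # final states
        self.arcs = defaultdict(list)         # state -> [(in, out, weight, next)]

    def add_arc(self, src, ilabel, olabel, weight, dst):
        self.arcs[src].append((ilabel, olabel, weight, dst))

def compose(a, b):
    """Compose a and b: output labels of a must match input labels of b;
    weights along matched arcs are added (tropical semiring)."""
    c = WFST((a.start, b.start),
             {(p, q) for p in a.finals for q in b.finals})
    stack, seen = [c.start], {c.start}
    while stack:
        p, q = stack.pop()
        for i1, o1, w1, p2 in a.arcs[p]:
            for i2, o2, w2, q2 in b.arcs[q]:
                if o1 == i2:                  # labels must agree
                    c.add_arc((p, q), i1, o2, w1 + w2, (p2, q2))
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        stack.append((p2, q2))
    return c
```

Later sketches in this transcript reuse this WFST class and compose function.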

Joint Chinese Lexical Analysis
 The WFST-based approach: a uniform representation for the models of the individual subtasks, and integration of the multiple models by composition.
 Tasks: word segmentation, part-of-speech tagging, and person and location name recognition.

Multiple Subtasks Modeling
 An n-gram language model over word classes is adopted for word segmentation (a toy scoring sketch follows below).
 Hidden Markov Models (HMMs) are adopted for both name recognition and POS tagging.
 In name recognition, both Chinese characters and words are used as model units, and it is performed simultaneously with word segmentation.
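For intuition, here is a minimal sketch of how a class-based bigram scores one candidate segmentation. In the paper this knowledge is encoded as a WFSA and composed with the other models; the direct scoring function below, its dictionaries, and its crude smoothing floor are illustrative assumptions only.

```python
import math

def segmentation_cost(words, word2class, bigram, emit):
    """Negative log probability of a candidate segmentation under a
    class-based bigram: class-transition costs plus word-given-class costs."""
    cost, prev = 0.0, "BEGIN"
    for w in words:
        c = word2class.get(w, w)                        # ordinary words act as their own class
        cost += -math.log(bigram.get((prev, c), 1e-6))  # P(class | previous class)
        cost += -math.log(emit.get((c, w), 1.0))        # P(word | class); 1 when class == word
        prev = c
    cost += -math.log(bigram.get((prev, "END"), 1e-6))  # sentence-end transition
    return cost
```

The candidate with the lowest cost is the preferred segmentation; in the WFST setting the same comparison is carried out by shortest-path decoding over the composed transducer.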

The Pipeline System vs. The Joint System
[Diagram] Pipeline Baseline: compose the segmentation models, decode to obtain the best segmentation, then compose and decode again for labeling to produce the output. Integrated Analyzer: compose all models into a single WFST and decode once to produce the output. (A code sketch of both strategies follows below.)
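A minimal sketch of the two strategies in the diagram, reusing the WFST class and compose function from the earlier sketch. shortest_path is a plain Dijkstra over non-negative (negative-log) weights, and the model lists stand in for the lexicon, class language model, and HMM transducers; none of these names come from the paper.

```python
import heapq
from itertools import count

def shortest_path(fst):
    """Dijkstra over a WFST with non-negative weights; returns the
    cheapest output label sequence from the start to any final state."""
    tie = count()                              # tie-breaker so the heap never compares states
    heap = [(0.0, next(tie), fst.start, [])]
    done = set()
    while heap:
        cost, _, state, out = heapq.heappop(heap)
        if state in done:
            continue
        done.add(state)
        if state in fst.finals:
            return out, cost
        for _ilabel, olabel, w, nxt in fst.arcs[state]:
            if nxt not in done:
                heapq.heappush(heap, (cost + w, next(tie), nxt, out + [olabel]))
    return None, float("inf")

def linear_fsa(labels):
    """Linear acceptor for a fixed symbol sequence."""
    fsa = WFST(0, {len(labels)})
    for i, lab in enumerate(labels):
        fsa.add_arc(i, lab, lab, 0.0, i + 1)
    return fsa

def pipeline_analyze(sentence_fsa, seg_models, label_models):
    # Stage 1: segmentation only; the single best word sequence is kept.
    lattice = sentence_fsa
    for m in seg_models:
        lattice = compose(lattice, m)
    best_words, _ = shortest_path(lattice)
    # Stage 2: labeling is restricted to that one segmentation,
    # so stage-1 errors can never be recovered.
    lattice = linear_fsa(best_words)
    for m in label_models:
        lattice = compose(lattice, m)
    return shortest_path(lattice)

def joint_analyze(sentence_fsa, all_models):
    # Integrated analyzer: compose every knowledge source into one WFST,
    # then decode once over the full joint search space.
    lattice = sentence_fsa
    for m in all_models:
        lattice = compose(lattice, m)
    return shortest_path(lattice)
```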

Simulation Setup
 Corpus: People's Daily of China, annotated by the Institute of Computational Linguistics of Peking University. The January-May 1998 portion is used as the training set, June 1998 as the test set, and the first 2000 sentences of the test set as the development set.
 Results (F1, %) are reported for the Pipeline Baseline and the Integrated Analyzer on word segmentation, POS tagging, person name recognition, and place name recognition.

The Statistical Significance Test
 The approximate randomization approach (Yeh, 2000) is adopted to test the performance improvement produced by joint processing; the evaluation metric tested is the F1 value of word segmentation.
 The two systems' responses for each sentence are shuffled and randomly reassigned to the systems, and the significance level is computed from the shuffled results (a sketch follows below).
 10 sets of 500 sentences each are randomly selected and tested. For all 10 selected sets, the p-values are far below the significance threshold.
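A minimal sketch of the approximate randomization test, assuming each system's output is reduced to per-sentence (true positive, false positive, false negative) counts from which F1 is computed; the function names, the count representation, and the default number of shuffles are illustrative assumptions, not the paper's exact setup.

```python
import random

def f1(counts):
    """F1 over aggregated (tp, fp, fn) counts."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def approx_randomization(counts_a, counts_b, shuffles=10000):
    """p-value for the observed F1 difference between systems A and B."""
    observed = abs(f1(counts_a) - f1(counts_b))
    at_least_as_large = 0
    for _ in range(shuffles):
        sa, sb = [], []
        for ca, cb in zip(counts_a, counts_b):
            if random.random() < 0.5:       # swap the two responses for this sentence
                ca, cb = cb, ca
            sa.append(ca)
            sb.append(cb)
        if abs(f1(sa) - f1(sb)) >= observed:
            at_least_as_large += 1
    # Add-one smoothing on the count, as recommended by Yeh (2000).
    return (at_least_as_large + 1) / (shuffles + 1)
```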

Discussions
 This approach keeps the full search space and chooses the optimal result based on the multi-level knowledge sources, rather than reranking k-best candidates.
 The models for each subtask are trained separately, while decoding is conducted jointly; this avoids the need for a corpus annotated with multi-level information.
 When a segmentation task precedes a labeling task, the WFST-based approach naturally enforces the consistency constraint imposed by the segmentation.
 The unified WFST framework makes it easy to apply the presented analyzer in other natural language applications that are also based on WFSTs, such as speech recognition and machine translation.

Conclusion
 Within the unified framework of WFSTs, a joint processing approach is presented to perform a cascade of segmentation and labeling subtasks.
 It has been demonstrated that joint processing is superior to the traditional pipeline manner.
 The findings suggest two directions for future research: integrating more linguistic knowledge into the analyzer, such as organization name recognition and shallow parsing; and incorporating the integrated analyzer into harder tasks such as ASR and MT, where rich linguistic knowledge plays an important role and may lead to promising performance improvements.

Thank you for your attention!

Uniform Representation (1)
Lexicon WFSTs. (a) is the FSA representing an input example; (b) is the FST representing a toy dictionary.

Uniform Representation (2)
The WFSA representing a toy bigram language model, where un(w1) denotes the unigram probability of w1, bi(w1,w2) the bigram probability of w2 given the history w1, and back(w1) the backoff weight of w1 (a construction sketch follows below). Word classes:
 wi: the i-th word listed in the dictionary
 CNAME: Chinese person names
 TNAME: translated person names
 LOC: location names
 NUM: number expressions
 LETTER: letter strings
 NON: other non-Chinese-character strings
 BEGIN: beginnings of sentences
 END: ends of sentences
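A hypothetical sketch of how such a backoff bigram model can be laid out as a WFSA with the WFST class from the earlier sketch: one state per word (or class) history plus a backoff state, with weights as negative log probabilities in the slide's un/bi/back notation. The epsilon backoff arcs would need an epsilon-aware composition in practice, and the actual transducer layout in the paper may differ.

```python
import math

def bigram_wfsa(unigrams, bigrams, backoffs):
    """Backoff bigram acceptor: unigrams[w] = P(w), bigrams[(w1, w2)] = P(w2 | w1),
    backoffs[w1] = backoff weight of history w1."""
    EPS, BACKOFF = "<eps>", "<backoff>"
    fsa = WFST("BEGIN", {"END"})
    fsa.add_arc("BEGIN", EPS, EPS, 0.0, BACKOFF)            # empty history can back off
    for w, p_uni in unigrams.items():
        # Backoff state -> history state of w, reading w at its unigram cost.
        fsa.add_arc(BACKOFF, w, w, -math.log(p_uni), w)
        # History state of w -> backoff state, paying the backoff weight.
        fsa.add_arc(w, EPS, EPS, -math.log(backoffs.get(w, 1.0)), BACKOFF)
    for (w1, w2), p_bi in bigrams.items():
        # Observed bigram: move directly from history w1 to history w2.
        fsa.add_arc(w1, w2, w2, -math.log(p_bi), w2)
    return fsa
```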

Uniform Representation (3)
POS WFSTs. (a) is the WFST representing the relationship between words and POS tags; (b) is the WFSA representing a toy bigram model over POS tags. [Figure: the CNAME sub-model for Chinese person names, composed of a surname, the first character of the given name, and the second character of the given name.]
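A minimal sketch, under the same assumptions as the earlier WFST sketches, of the word-to-POS transducer in panel (a): a single-state machine whose arcs read a word and emit one of its possible tags, weighted by -log P(word | tag) as in an HMM emission model. The entries are toy values, not the paper's lexicon.

```python
import math

def pos_fst(emissions):
    """emissions maps (word, tag) -> P(word | tag); returns a one-state WFST."""
    fst = WFST("S", {"S"})
    for (word, tag), prob in emissions.items():
        fst.add_arc("S", word, tag, -math.log(prob), "S")
    return fst
```

Composing this transducer with the POS-bigram WFSA of panel (b) yields the HMM tagger applied to the segmented word sequence.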

The Statistical Significance Test
 The approximate randomization approach (Yeh, 2000): the two systems' responses for each sentence are shuffled and randomly reassigned to the systems, and the significance level is computed from the shuffled results.
 The number of shuffles is fixed at 2^20. Since the test set contains more than 21,000 sentences, using 2^20 shuffles to approximate the full set of possible shuffles is no longer reasonable; thus, 10 sets of 500 sentences each are randomly selected and tested.
 For all 10 selected sets, the p-values are far below the significance threshold.