Punctuation Generation Inspired Linguistic Features For Mandarin Prosodic Boundary Prediction CHEN-YU CHIANG, YIH-RU WANG AND SIN-HORNG CHEN 2012 ICASSP.


Punctuation Generation Inspired Linguistic Features For Mandarin Prosodic Boundary Prediction CHEN-YU CHIANG, YIH-RU WANG AND SIN-HORNG CHEN 2012 ICASSP REPORTER: HUANG-WEI CHEN

Introduction (1/2)
• Proper prosodic break prediction makes synthesized speech sound more natural and intelligible in terms of intonation and rhythm, without losing or distorting the original meaning of the text.
• Previous break prediction studies mainly focused on two issues:
1. The design or choice of prediction model (N-gram, ANN, Markov model, CART).
2. The choice of features (POS, word length, sentence length, position in sentence).
• This paper focuses on the second issue and proposes a statistical linguistic feature called punctuation confidence, motivated by automatic Chinese punctuation generation and the linguistic characteristics of the Chinese punctuation system.

Introduction (2/2)
• In this study, a conditional random field (CRF)-based automatic punctuation generation model is constructed to predict punctuation and to generate the associated confidence measure, referred to as punctuation confidence, from punctuation mark (PM)-removed word/POS sequences.
• The punctuation confidence can be regarded as a statistical linguistic feature that measures the likelihood of inserting a major PM into the text.
• It is reasonable to hypothesize that word junctures which are more likely to have major PMs inserted in text are also more likely to have pause breaks inserted in the utterance.

Prosody Labeling of Speech Corpus (1/3)
• Prosody labeling of the speech:
• Seven break types, {B0, B1, B2-1, B2-2, B2-3, B3, B4}, delimit an utterance into four types of prosodic units, namely syllable (SYL), prosodic word (PW), prosodic phrase (PPh), and breath group/prosodic phrase group (BG/PG).

Prosody Labeling of Speech Corpus (2/3)
• Prosody labeling of the speech:
• B4 is defined as a major break accompanied by a long pause and an apparent F0 reset across adjacent syllables.
• B3 is a major break with a medium pause and a medium F0 reset.
• B0 and B1 represent, respectively, the non-breaks of a tightly coupled syllable juncture and a normal syllable boundary within a PW, both with no identifiable pause between SYLs.
• B2 is a minor break with three variants: F0 reset (B2-1), short pause (B2-2), or pre-boundary syllable duration lengthening (B2-3).

Prosody Labeling of Speech Corpus (3/3)
• Analysis of the prosody labeling:
• Among the various prosodic-acoustic features, pause duration is the most salient cue for specifying boundaries of prosodic units. Therefore, this study aims to improve break prediction accuracy for the pause-related break types, i.e. B4, B3, and B2-2.
• From the seven break types, this study defines four break prediction targets: (1) B4, (2) B3, (3) B2-2, and (4) non-pause break type (NPB).

Relationship between the labeled break types and PM types (1/2)
• It is generally agreed that pause breaks co-occur with PMs.
• It can be seen from the table that most PM locations co-occur with the pause-related break types (B2-2, B3, and B4), while most intra-word locations map to NPB.
• In between, non-PM inter-word locations co-occur with NPB, B2-2, and B3.

Relationship between the labeled break types and PM types (2/2)
• About 40% of prosodic phrase boundaries (B3s), and a large proportion of B2-2s, come from non-PM inter-word junctures.
• It can be found from the table that the major PM set is highly correlated with the major breaks. This implies that a word juncture at which a major PM tends to be inserted in text is more likely to be a major break in the utterance.

The proposed method (1/2)
• MLP = multi-layer perceptron

CRF-based Punctuation Generator (1/2)
• The task of the CRF-based Punctuation Generator can be viewed as a label-tagging problem that labels each lexical word juncture with the presence or absence of a major PM, Y, using features of the lexical words, W, and POS tags, S.
• Here N(W, S) is a normalization factor ensuring that the summation of P(Y|W, S) equals 1; t is the lexical word index; Yt represents the presence or absence of a major PM between the t-th and (t+1)-th lexical words; I is the number of feature functions; and fi is the i-th feature function.
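The equation itself appears only as an image in the original slide; the definitions above match the standard linear-chain CRF, so a plausible reconstruction (the feature weights λi are assumed, as in the usual CRF formulation) is:

```latex
P(Y \mid W, S) = \frac{1}{N(W, S)}
  \exp\!\left( \sum_{t} \sum_{i=1}^{I} \lambda_i \, f_i(Y_{t-1}, Y_t, W, S, t) \right)
```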

CRF-based Punctuation Generator(2/2)

Context Analyzer and MLP Break Predictor (1/2)
• The Context Analyzer forms the basic contextual linguistic feature vector for the break prediction task:
1. 6 Boolean flags for the 6 PM types.
2. 5 real numbers: the lengths (in syllables) of the current/previous/following sentence (delimited by the major PMs), and the distances (in syllables) to the previous/following major PM.
3. (m+n)×82 POS Boolean flags, consisting of:
  1. Level-3 POS of the m previous/n following words: 47 categories proposed by CKIP.
  2. Level-2 POS of the m previous/n following words: 23 categories merged from the Level-3 POS.
  3. Level-1 POS of the m previous/n following words: substantive word or function word.
4. (m+n)×5 word-length flags: word lengths of the m previous/n following words, with categories 1, 2, 3, 4, and ≥5 syllable(s).
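The feature sets above determine the dimensionality of the basic contextual feature vector; a minimal sketch of that bookkeeping (the function name is ours, the counts are from the slide):

```python
def basic_feature_dim(m: int, n: int) -> int:
    """Dimensionality of the basic contextual linguistic feature
    vector for a context window of m previous / n following words."""
    pm_flags = 6                   # Boolean flags for the 6 PM types
    lengths_and_distances = 5      # sentence lengths + distances to major PMs
    pos_flags = (m + n) * 82       # Level-1/2/3 POS flags per context word
    word_len_flags = (m + n) * 5   # word length in {1, 2, 3, 4, >=5} syllables
    return pm_flags + lengths_and_distances + pos_flags + word_len_flags
```

This reproduces the (m+n)×87+11 figure quoted on the next slide, since 82+5 = 87 per context word and 6+5 = 11 juncture-level values.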

Context Analyzer and MLP Break Predictor (2/2)
• The input feature vector of the MLP Break Predictor includes:
1. The predicted punctuation.
2. The punctuation confidence.
3. The basic contextual linguistic features, of (m+n)×87+11 dimensions.
• The MLP Break Predictor has a 3-layer structure with one hidden layer. The output layer consists of five nodes corresponding to NPB, B2-2, B3, B4, and Be (end of the utterance). Both the hidden and output layers use standard sigmoid activation functions. The MLP is trained with the error back-propagation algorithm.
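A forward pass of such a predictor can be sketched as follows. Only the 5-node sigmoid output layer and the single sigmoid hidden layer are stated on the slide; the hidden size and initialization here are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BreakPredictorMLP:
    """Sketch of the 3-layer MLP break predictor.
    Output nodes: NPB, B2-2, B3, B4, Be (end of utterance).
    hidden_dim is a hypothetical choice; the paper does not state it."""

    def __init__(self, in_dim, hidden_dim=100, out_dim=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(0.0, 0.1, (hidden_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def forward(self, x):
        h = sigmoid(x @ self.W1 + self.b1)      # sigmoid hidden layer
        return sigmoid(h @ self.W2 + self.b2)   # sigmoid output layer

    def predict(self, x):
        # Index of the most activated output node (0..4)
        return int(np.argmax(self.forward(x)))
```

In practice the weights would be fitted with error back-propagation, as the slide states; the sketch only shows inference.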

Experimental results (1/4)
• The corpus used for training the CRF-based Punctuation Generator was the Academia Sinica Balanced Corpus of Modern Chinese V.4.0, which consists of 9,454,734 words (31,126 paragraphs).
• The number of target punctuation classes is 2: one for the major PMs { 。, !, ?, ;, :, , } and another for all others.
• The CRF model was trained with the CRF++ toolkit, using a feature cutoff that kept only features occurring at least 3 times in the training data.
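A CRF++ feature template along these lines would expose the word/POS context described earlier to the model; this template is illustrative (the paper does not list its actual template), and the stated cutoff of 3 corresponds to crf_learn's `-f` option:

```
# Hypothetical CRF++ template: column 0 = lexical word, column 1 = POS tag.
# Unigram features over a small context window around the juncture.
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[-1,1]
U05:%x[0,1]
U06:%x[1,1]
# Bigram feature over adjacent output labels (major PM present/absent).
B
```

Training with the minimum feature frequency of 3 stated on the slide would then be, e.g., `crf_learn -f 3 template train.data model`.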

Experimental results (2/4)
• From Table 4, the punctuation prediction performs quite well, but we still need to check the effects of insertion and deletion errors in the punctuation prediction on break prediction.
• Interestingly, generated punctuations that hit major PMs were more likely to correlate with B4s, while deletion errors were more likely to correlate with B3s.
• This implies that the generated punctuation has some ability to disambiguate between B3 and B4.

Experimental results (3/4)
• It can be clearly observed that punctuation confidence values are higher for target breaks with longer pause durations.
• This shows that the punctuation confidence can help to disambiguate the various break targets at both PM and non-PM word junctures.

Experimental results (4/4)

Conclusion
• Two statistical linguistic features, predicted punctuation and punctuation confidence, are introduced in this paper to help improve the performance of prosodic break prediction.
• In future work, it would be worthwhile to combine the punctuation confidence with durational information of prosodic units to further improve prosodic break prediction.