Yun-Nung (Vivian) Chen, Yu Huang, Sheng-Yi Kong, Lin-Shan Lee, National Taiwan University, Taiwan

Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features. National Taiwan University, Taiwan. Speakers: 黃宥、陳縕儂 (Yu Huang, Yun-Nung Chen)
Hello, everybody. I am Vivian Chen, from National Taiwan University. Today I'm going to present my work on automatic key term extraction from spoken course lectures using branching entropy and prosodic/semantic features.

Outline (Key Term Extraction, NTU): Introduction; Proposed Approach (Branching Entropy, Feature Extraction, Learning Methods); Experiments & Evaluation; Conclusion.

Introduction

Definition. Key term: a term with higher term frequency that carries core content. Two types: keyword (a single word) and key phrase. Advantages: indexing and retrieval, and relating key terms to segments of documents.
First I will define what a key term is. A key term is a term that has higher term frequency and carries core content. There are two types of key terms. One of them is a phrase; we call it a key phrase. For example, "language model" is a key phrase. The other type is a single word; we call it a keyword, like "entropy". Key term extraction has two advantages: key terms help us index and retrieve documents, and we can construct the relationships between key terms and document segments. Here's an example.

Introduction
We can show some key terms related to "acoustic model". If a key term and "acoustic model" co-occur in the same document, they are relevant, so we can show them to users.

Introduction: acoustic model, language model, hmm, n gram, phone, hidden Markov model.
Then we can construct a key term graph to represent the relationships between these key terms.

Introduction: bigram, hmm, acoustic model, language model, n gram, hidden Markov model, phone. Target: extract key terms from course lectures.
Similarly, we can construct the relations between "language model" and other terms, and then show the whole graph to see how the key terms are organized.

Proposed Approach

Automatic Key Term Extraction. Flow: archive of spoken documents (original speech signals) → ASR → ASR transcriptions → Branching Entropy and Feature Extraction → Learning Methods (K-means Exemplar, AdaBoost, Neural Network).
Here's the flow chart. We start from a large archive of spoken documents.


Automatic Key Term Extraction: Phrase Identification.
First, branching entropy is used to identify phrases.

Automatic Key Term Extraction: Key Term Extraction. Output: key terms (e.g. entropy, acoustic model, ...).
Then key terms are extracted from the identified candidates by learning methods over a set of features.


Branching Entropy. How to decide the boundary of a phrase? Consider "hidden Markov model": it can be preceded by many words (is, of, in, ...) and followed by many words (represent, is, can, ...), yet "hidden" is almost always followed by the same word.
The target of this work is to decide the boundary of a phrase, but where is the boundary? We can observe some characteristics first: "hidden" is almost always followed by "Markov".


Branching Entropy. "hidden" is almost always followed by the same word; "hidden Markov" is almost always followed by the same word; but "hidden Markov model" is followed by many different words, so the boundary lies after "model". We define branching entropy to decide possible boundaries.

Branching Entropy: Definition of Right Branching Entropy. Let X be a candidate (e.g. "hidden Markov") and x_i the distinct words that can follow X (its children). The probability of child x_i given X is P(x_i|X) = C(X x_i) / C(X), where C(·) is the corpus count, and the right branching entropy of X is H_r(X) = -Σ_i P(x_i|X) log P(x_i|X).

Branching Entropy: Decision of Right Boundary. Find the right boundary located between X and x_i where the entropy rises, i.e. H_r(X x_i) > H_r(X): inside a phrase the followers stay predictable, and the entropy jumps once the phrase is complete.


Branching Entropy: Decision of Left Boundary. The left boundary is decided symmetrically on the reversed word sequence (X: "model Markov hidden"): find the left boundary located between X and x_i where the left branching entropy rises. Branching entropy is implemented using a PAT tree.

Branching Entropy: Implementation in the PAT tree. Each PAT-tree node stores a phrase and its count, e.g. X = "hidden Markov" with children x_1 = "hidden Markov model" and x_2 = "hidden Markov chain"; sibling branches such as "Markov state", "Markov variable", and "Markov distribution" appear as other nodes with their own counts.
In the PAT tree we compute the right branching entropy for each node: P(x_i|X) follows from the node counts as defined earlier, and from it H_r(X). We compute H_r(X) for all X in the PAT tree, and the left branching entropy H_l for all reversed phrases in the reverse PAT tree.
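The branching-entropy computation can be sketched in plain Python. This is a minimal illustration over a flat token list (the function name and the max_len cutoff are my own choices), not the PAT-tree implementation the slides describe:

```python
import math
from collections import Counter, defaultdict

def right_branching_entropy(tokens, max_len=4):
    """For every n-gram X of up to max_len words, compute the entropy of
    the distribution over the words that follow X in the token stream."""
    followers = defaultdict(Counter)
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n >= len(tokens):
                break
            followers[tuple(tokens[i:i + n])][tokens[i + n]] += 1
    entropy = {}
    for x, counts in followers.items():
        total = sum(counts.values())
        entropy[x] = -sum((c / total) * math.log(c / total, 2)
                          for c in counts.values())
    return entropy
```

On a toy corpus where "hidden Markov model" is each time followed by a different word, H_r of "hidden" and of "hidden Markov" is 0 while H_r of "hidden Markov model" is large; that rise is exactly what the boundary decision looks for.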

Automatic Key Term Extraction: Feature Extraction.
Next, a set of features is extracted for each candidate term.

Feature Extraction: Prosodic features, computed for each candidate term at its first appearance. Speakers tend to use longer duration to emphasize key terms. Feature Duration (I-IV): normalized duration (max, min, mean, range), where the duration of phone "a" is normalized by the average duration of phone "a".
For each term we compute some prosodic features. First, we believe a lecturer uses longer duration to emphasize key terms. For the first appearance of a candidate term, we compute the duration of each phone and normalize it by the average duration of that phone. Four values then represent the term: the maximum, minimum, mean, and range over all phones in the term.

Feature Extraction: Prosodic features. Higher pitch may represent significant information.
We believe that higher pitch may carry important information, so we extract the pitch contour of each candidate term.

Feature Pitch (I-IV): F0 (max, min, mean, range).
The method is like duration, but the segment unit is changed from phone to single frame; the same four values represent the features.

Feature Extraction: Prosodic features. Higher energy emphasizes important information.
Similarly, we think higher energy may mark important information, so we extract the energy of each frame in a candidate term.

Feature Energy (I-IV): energy (max, min, mean, range).
The energy features are computed like pitch. This completes the first set of features.
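The four-value (max, min, mean, range) summary behind Duration, Pitch, and Energy can be sketched as below; a minimal illustration under an assumed data layout (per-phone durations plus corpus-average normalization), not the paper's actual feature extractor:

```python
def normalized_durations(term_phones, avg_duration):
    """term_phones: list of (phone, duration) for one candidate term.
    Normalize each phone's duration by that phone's corpus-average duration."""
    return [d / avg_duration[p] for p, d in term_phones]

def prosodic_summary(values):
    """Collapse per-segment values (phones for Duration, frames for Pitch
    and Energy) into the four features I-IV: max, min, mean, range."""
    return {"I": max(values), "II": min(values),
            "III": sum(values) / len(values),
            "IV": max(values) - min(values)}
```

For pitch and energy the same summary is applied, only with per-frame F0 or energy values instead of normalized phone durations.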

Feature Extraction: Lexical features. TF: term frequency; IDF: inverse document frequency; TFIDF: tf * idf; PoS: the PoS tag.
The second set of features is lexical. These well-known features may indicate the importance of a term, and we use them to represent each candidate term.
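The TF, IDF, and TFIDF rows can be computed directly; a minimal sketch (the unsmoothed IDF form log(N/df) is an assumption, not necessarily the variant used in the paper):

```python
import math

def tfidf_scores(docs, term):
    """docs: one {term: count} dict per document. Returns TF*IDF of `term`
    in each document, with IDF = log(N / document frequency)."""
    df = sum(1 for doc in docs if term in doc)
    if df == 0:
        return [0.0] * len(docs)
    idf = math.log(len(docs) / df)
    return [doc.get(term, 0) * idf for doc in docs]
```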

Feature Extraction: Semantic features, from Probabilistic Latent Semantic Analysis (PLSA). Latent topic probability; key terms tend to focus on limited topics. D_i: documents; T_k: latent topics; t_j: terms.
The third set of features is semantic. The assumption is that key terms tend to focus on limited topics. Using PLSA, we obtain the probability of each latent topic given a candidate term.

Feature Extraction: Semantic features. A key term's latent topic probability concentrates on a few topics, while a non-key term's spreads evenly; how do we use this distribution? Feature LTP (I-III): Latent Topic Probability (mean, variance, standard deviation), describing the probability distribution.


Feature Extraction: Semantic features. Latent Topic Significance (LTS): the within-topic to out-of-topic frequency ratio; a key term has high within-topic frequency for its topics, a non-key term does not. Feature LTS (I-III): Latent Topic Significance (mean, variance, standard deviation).


Feature Extraction: Semantic features. Latent Topic Entropy (LTE): the entropy of a term's latent topic distribution; a key term has lower LTE, a non-key term higher LTE. Feature LTE: the term's entropy over latent topics.
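The latent topic entropy feature follows directly from the PLSA topic posterior; a minimal sketch, assuming the posterior P(T_k|t) has already been estimated:

```python
import math

def latent_topic_entropy(topic_posterior):
    """topic_posterior: list of P(T_k | t) over the K latent topics.
    Key terms concentrate on few topics and get LOWER entropy; non-key
    terms spread over many topics and get HIGHER entropy."""
    return -sum(p * math.log(p, 2) for p in topic_posterior if p > 0)
```

For example, a term spread uniformly over 4 topics gets an LTE of 2 bits, while a term tied to a single topic gets 0.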

Automatic Key Term Extraction: Learning Methods.
Finally, learning approaches are used to extract key terms from the features.

Learning Methods: Unsupervised learning with K-means Exemplar. Transform each term into a vector in LTS (Latent Topic Significance) space, run K-means, and take the term at the centroid of each cluster as a key term. The terms in the same cluster focus on a single topic and are related to that key term, so the key term can represent the topic.
The first method is unsupervised: K-means Exemplar. We transform each word into a vector in latent topic significance space and run K-means on these vectors. Since the terms in a cluster focus on a single topic, we extract the exemplar closest to each centroid as the key term representing that topic.
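The K-means Exemplar idea can be sketched in pure Python: plain K-means over the term vectors plus a nearest-to-centroid pick. The vector contents and names below are illustrative assumptions, not the paper's setup:

```python
import math
import random

def kmeans_exemplars(term_vectors, k, iters=20, seed=0):
    """term_vectors: {term: vector in LTS space}. Runs plain K-means and
    returns, per cluster, the term closest to the centroid: the exemplar,
    taken as the key term representing that cluster's topic."""
    rng = random.Random(seed)
    terms = list(term_vectors)
    centroids = [list(term_vectors[t]) for t in rng.sample(terms, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for t in terms:  # assign each term to its nearest centroid
            j = min(range(k), key=lambda c: math.dist(term_vectors[t], centroids[c]))
            clusters[j].append(t)
        for j, members in enumerate(clusters):  # recompute centroids
            if members:
                dim = len(centroids[j])
                centroids[j] = [sum(term_vectors[t][d] for t in members) / len(members)
                                for d in range(dim)]
    return [min(members, key=lambda t: math.dist(term_vectors[t], centroids[j]))
            for j, members in enumerate(clusters) if members]
```

On well-separated topic vectors, each returned exemplar comes from a different topical cluster.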

Learning Methods: Supervised learning with Adaptive Boosting (AdaBoost) and a Neural Network.
We also use two supervised methods, AdaBoost and a neural network, which automatically adjust the weights of the features to produce a classifier.
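As a rough illustration of "automatically adjust the weights of the features", here is a one-neuron network (logistic regression trained by SGD). It only stands in for, and is much simpler than, the AdaBoost and neural-network classifiers actually used:

```python
import math

def train_classifier(X, y, lr=0.5, epochs=500):
    """X: feature vectors, y: 1 for key term, 0 otherwise.
    Learns one weight per feature plus a bias with stochastic gradient
    descent on the logistic loss; returns a 0/1 predictor."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            err = p - yi  # gradient of the logistic loss w.r.t. the pre-activation
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0
```

The learned per-feature weights play the role the slide describes: features that discriminate key terms from non-key terms receive larger weights.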

Experiments & Evaluation

Experiments: Corpus. NTU lecture corpus: Mandarin Chinese with embedded English words, single speaker, 45.2 hours. Example: 「我們的solution是viterbi algorithm」 (Our solution is the Viterbi algorithm).
We evaluate our approach on the NTU lecture corpus, which mixes Mandarin Chinese with English words as in the example. The lectures come from a single speaker and total about 45 hours.

Experiments: ASR Accuracy. Acoustic model: bilingual AM built from out-of-domain Chinese (CH) and English (EN) corpora, adapted with some data from the target speaker. Language model: trigram interpolation of a background LM (out-of-domain corpora) with an adaptive LM (in-domain corpus). Character accuracy: Mandarin 78.15%, English 53.44%, overall 76.26%.
In the ASR system we train two acoustic models, for Chinese and English, and adapt them with some target-speaker data to obtain a bilingual acoustic model. The language model is a trigram interpolation of out-of-domain corpora and some in-domain data. The resulting accuracy is shown above.

Experiments: Reference Key Terms. Annotations were collected from 61 students who had taken the course. If the k-th annotator labeled N_k key terms, each of them received a score of 1/N_k, and all other terms 0. Terms are ranked by the sum of the scores from all annotators, and the top N terms are chosen from the list, where N is the average N_k. Here N = 154 key terms: 59 key phrases and 95 keywords.
To evaluate our results, we need a reference key term list. The reference key terms come from the annotations of students who have taken the course. We sort all terms and take the top N as key terms, where N (= 154) is the average number of key terms labeled per student.
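This annotator-weighted scoring can be sketched as follows, assuming each annotator's labels weigh 1/N_k as described above:

```python
from collections import defaultdict

def reference_key_terms(annotations):
    """annotations: one list of labeled key terms per annotator.
    Each of annotator k's N_k labels receives score 1/N_k; terms are
    ranked by total score and the top N kept, N = average N_k."""
    scores = defaultdict(float)
    for labels in annotations:
        for term in labels:
            scores[term] += 1 / len(labels)
    n = round(sum(len(a) for a in annotations) / len(annotations))
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Dividing by N_k keeps a student who labels everything from dominating the ranking: each annotator distributes one unit of total score over their labels.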

Experiments: Evaluation. Unsupervised learning: the number of extracted key terms is set to N. Supervised learning: 3-fold cross validation.
Finally we evaluate the results. For unsupervised learning, we set the number of extracted key terms to N; supervised learning is evaluated with 3-fold cross validation.
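The F-measure reported in the experiments compares the extracted key-term set against the reference list; a standard sketch:

```python
def f_measure(extracted, reference):
    """F1 of the extracted key-term set against the reference key terms."""
    tp = len(set(extracted) & set(reference))
    if tp == 0:
        return 0.0
    precision = tp / len(set(extracted))
    recall = tp / len(set(reference))
    return 2 * precision * recall / (precision + recall)
```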

Experiments: Feature Effectiveness (neural network for keywords from ASR transcriptions; Pr: prosodic, Lx: lexical, Sm: semantic). F-measures: single feature sets 20.78, 35.63, and 42.86; Pr + Lx 48.15; Pr + Lx + Sm 56.55.
Each set of features alone gives an F1 between about 20% and 42% (rows a-c). Row d shows that prosodic and lexical features are additive, and row e shows that adding semantic features further improves the performance, so all three sets of features are useful.

Experiments: Overall Performance (AB: AdaBoost, NN: Neural Network). F-measures: 67.31, 62.39, 55.84; conventional TFIDF scores without branching entropy (with stop word removal and PoS filtering): 51.95, 23.38.
The baseline is conventional TFIDF without extracting phrases by branching entropy. The better performance with branching entropy shows it is very useful, confirming our assumption that a term with higher branching entropy is more likely to be a key term. K-means Exemplar outperforms TFIDF, supervised approaches beat unsupervised ones, and the best results come from supervised learning with the neural network: F1 of 67.31 and 62.39.

Experiments: Overall Performance (AB: AdaBoost, NN: Neural Network). F-measures for manual/ASR transcriptions: 67.31/62.39, 62.70/57.68, 55.84/51.95, 52.60/43.51, 23.38/20.78.
The performance on ASR transcriptions is slightly worse than on manual transcriptions but still reasonable, and supervised learning using the neural network gives the best results: F1 of 67.31 on manual and 62.39 on ASR transcriptions.

Conclusion

Conclusion. We propose a new approach to extract key terms. The performance is improved by identifying phrases with branching entropy and by combining prosodic, lexical, and semantic features. The results are encouraging.
From the above experiments, we conclude that the proposed approach extracts key terms effectively. The performance is improved by two ideas: using branching entropy to extract key phrases, and using the three sets of features together.

Thanks for your attention! Q & A. NTU Virtual Instructor: http://speech.ee.ntu.edu.tw/~RA/lecture