12.0 Spoken Document Understanding and Organization

References:
1. "Spoken Document Understanding and Organization", IEEE Signal Processing Magazine, Sept. 2005, Special Issue on Speech Technology in Human-Machine Communication
2. "Speech-to-text and Speech-to-speech Summarization of Spontaneous Speech", IEEE Transactions on Speech and Audio Processing, Dec. 2004
3. "Multi-layered Summarization of Spoken Document Archives by Information Extraction and Semantic Structuring", Interspeech 2006, Pittsburgh, USA

Multi-media Content in the Future Network Era
— Integrating All Knowledge, Information and Services Globally
— The Most Attractive Form of the Network Content will be Multi-media, which usually Includes Speech Information
— The Speech Information, if Included, usually Tells the Subjects, Topics and Concepts of the Multi-media Content, and thus Becomes the Key for Indexing, Retrieval and Browsing

Future Integrated Networks:
— Real-time Information: weather, traffic, flight schedule, stock price, sports scores
— Electronic Commerce: virtual banking, on-line transactions, on-line investments
— Knowledge Archives: digital libraries, virtual museums
— Intelligent Working Environment: e-mail processors, intelligent agents, teleconferencing, distant learning
— Private Services: personal notebook, business databases, home appliances, network entertainments

Network Content Indexing/Retrieval/Browsing in the Future Era of Wireless Multi-media
— Multi-media Content Indexed/Retrieved/Browsed Based on the Speech Information
— User Instructions in either Text or Speech Form
— Network Access is Primarily Text-based Today, but almost all Roles of Text can be Replaced by Speech in the Future
(Diagram: the user reaches private/personal services and public information and services over the future networks, with text or voice input/output, through text-based retrieval, spoken document retrieval, spoken dialogue and text-to-speech synthesis; the content carries both text information and voice information.)

Multi-media/Spoken Document Understanding and Organization (Ⅰ)
Written Documents are Better Structured and Easier to Browse
— in paragraphs with titles
— easily shown on the screen
— easily decided at a glance whether it is what the user is looking for
Multi-media/Spoken Documents are just Video/Audio Signals
— not easy to show on the screen
— the user can't go through each one from beginning to end during browsing
— better approaches for understanding/organization of multi-media/spoken documents become necessary

Multi-media/Spoken Document Understanding and Organization (Ⅱ)
Key Term/Named Entity Extraction from Multi-media/Spoken Documents
— personal names, organization names, location names, event names
— very often keywords in the multi-media/spoken documents
— very often out-of-vocabulary (OOV) words, difficult for recognition
Multi-media/Spoken Document Segmentation
— automatically segmenting a multi-media/spoken document into short paragraphs, each with a central topic
Information Extraction for Multi-media/Spoken Documents
— extraction of key information such as who, when, where, what and how for the information described by multi-media/spoken documents
— very often the relationships among the key terms/named entities
Summarization for Multi-media/Spoken Documents
— automatically generating a summary (in text or speech form) for each short paragraph
Title Generation for Multi-media/Spoken Documents
— automatically generating a title (in text or speech form) for each short paragraph
— a very concise summary indicating the topic area
Topic Analysis and Organization for Multi-media/Spoken Documents
— analyzing the subject topics of the short paragraphs
— clustering and organizing the subject topics of the short paragraphs into graphic structures giving the relationships among them for easier access

Integration Relationships among the Involved Technology Areas
(Diagram: the involved technology areas, including key term extraction and named entity extraction from spoken documents, semantic analysis, and information indexing, retrieval and browsing, with the integration relationships among them.)

Key Term Selection (1/2)
Topic Entropy of a term: the entropy of the term's distribution over a set of latent topics
— high topic entropy: the term is spread evenly across many topics, so it carries less topical information
— low topic entropy: the term is concentrated on a few topics, so it carries more topical information

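A minimal sketch of this criterion, assuming each candidate term t already has a topic distribution P(Tk | t) (for example from the PLSA analysis of a later slide); the function names, toy distributions and the cut-off are illustrative only:

```python
import math

def topic_entropy(p_topic_given_term):
    """Entropy of a term's distribution over latent topics.
    Low entropy -> the term concentrates on a few topics (good key term candidate)."""
    return -sum(p * math.log(p) for p in p_topic_given_term if p > 0.0)

def select_key_terms(term_topic_dist, num_terms=50):
    """Rank candidate terms by ascending topic entropy and keep the top ones.
    term_topic_dist: dict mapping term -> list of P(topic | term) values."""
    ranked = sorted(term_topic_dist, key=lambda t: topic_entropy(term_topic_dist[t]))
    return ranked[:num_terms]

# Toy usage: "acoustic" is topic-specific, "today" is spread over all topics.
dists = {
    "acoustic": [0.85, 0.05, 0.05, 0.05],
    "today":    [0.25, 0.25, 0.25, 0.25],
}
print(select_key_terms(dists, num_terms=1))   # ['acoustic']
```
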
Named Entity Extraction
— HMM-based Approaches
— Rule-based Approaches
Special Approaches Used
— context information among different sentences in the same document properly considered
— matching with automatically retrieved relevant text news to identify out-of-vocabulary (OOV) words
— multi-layered Viterbi search to handle a long named entity composed of several named entities of different types

Named Entity Extraction
Context Information Extracted
— some named entities may not be easily identified from a single sentence, but can be extracted when information in several sentences is jointly considered
— example (Chinese broadcast news): "遊戲橘子高階人事異動……對於遊戲橘子企圖跨足研發領域……遊戲橘子董事長表示……" (high-level personnel changes at Gamania (遊戲橘子) … regarding Gamania's attempt to expand into R&D … the chairman of Gamania said …), where the same company name recurs across several sentences
Named Entity Matching using a Retrieved Text News Corpus (e.g. Google text news corpora) to Identify Some Out-of-Vocabulary (OOV) Words, subject to a confidence measure threshold
— example: "娜莉颱風重創花蓮縣壽豐鄉" (Typhoon Nari (娜莉) devastated Shoufeng Township (壽豐鄉), Hualien County), where the OOV names 娜莉 and 壽豐 are first mis-recognized as the homophonous in-vocabulary words 那裡 and 受封, and then recovered by matching against the retrieved text news
Multi-layered Viterbi Search
— handling the situation that a named entity may be the concatenation of several named entities of different types
— example: "台北市中正紀念堂是一個熱門的旅遊景點" (The Chiang Kai-shek Memorial Hall (中正紀念堂) in Taipei City (台北市) is a popular tourist attraction), where the long named entity 台北市中正紀念堂 is composed of 台北市 and 中正紀念堂

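The OOV-recovery step can be pictured with a small sketch: suppose the suspicious spans of the transcript and the named entities found in the retrieved text news are already available, and a crude syllable-level similarity stands in for the real phonetic matching and confidence measure. The tiny character-to-syllable table, the threshold and all names here are illustrative assumptions, not the actual system:

```python
from difflib import SequenceMatcher

# Hypothetical character-to-syllable table; a real system would use full
# pronunciation lexica and acoustic confidence scores.
SYLLABLE = {"那": "na3", "裡": "li3", "娜": "na4", "莉": "li4",
            "受": "shou4", "封": "feng1", "壽": "shou4", "豐": "feng1"}

def syllables(text):
    return [SYLLABLE.get(ch, ch) for ch in text]

def phonetic_similarity(a, b):
    """Crude phonetic similarity: compare base syllables, ignoring tones."""
    strip = lambda s: [syl.rstrip("0123456789") for syl in s]
    return SequenceMatcher(None, strip(syllables(a)), strip(syllables(b))).ratio()

def recover_oov_entities(transcript_spans, text_news_entities, threshold=0.8):
    """For each suspicious span in the transcript, propose the most
    phonetically similar named entity found in the retrieved text news."""
    recovered = {}
    for span in transcript_spans:
        best = max(text_news_entities, key=lambda e: phonetic_similarity(span, e))
        if phonetic_similarity(span, best) >= threshold:
            recovered[span] = best
    return recovered

# Toy usage with the slide's example: the OOV names 娜莉 / 壽豐 were
# mis-recognized as the homophones 那裡 / 受封.
print(recover_oov_entities(["那裡", "受封"], ["娜莉", "壽豐", "花蓮縣"]))
# -> {'那裡': '娜莉', '受封': '壽豐'}
```
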
Spoken Document Segmentation
Training Phase
— training corpora (text form, short paragraphs) grouped by K-means clustering into L clusters C1, C2, …, CL, each with a topic and each containing many short paragraphs
— P(s|Cj) estimated for every cluster Cj by N-gram probabilities, where s is a sentence
Segmentation Phase
— dividing the word sequence into sentences d = s1, s2, s3, … by pause duration
— Viterbi search over the hidden Markov model of clusters, with emission probabilities P(s|C1), P(s|C2), P(s|C3), …, and transition probabilities P1 (staying within a cluster) and P2 (moving to another cluster)
— P1, P2 may be modified by story length modeling and pause duration modeling
— a transition from a cluster Ci into another cluster Cj is taken as a proper segmentation point

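A minimal sketch of the segmentation-phase Viterbi search, assuming the per-cluster sentence log-likelihoods log P(s|Cj) are already available from the N-gram models of the training phase, and modelling transitions with just the two log-probabilities for staying in the same cluster (P1) versus switching clusters (P2); the interface and names are illustrative:

```python
def segment_document(sentences, clusters, log_emission, log_stay, log_switch):
    """Viterbi search over an HMM whose states are topic clusters.
    sentences: list of sentences s1..sN
    clusters: list of cluster ids C1..CL
    log_emission(s, c): log P(s | c), e.g. from per-cluster N-gram models
    log_stay / log_switch: log transition probabilities P1 / P2
    Returns the sentence indices at which a segmentation point is placed."""
    if not sentences:
        return []
    n, L = len(sentences), len(clusters)
    delta = [[log_emission(sentences[0], c) for c in clusters]]
    back = []
    for t in range(1, n):
        row, brow = [], []
        for j, c in enumerate(clusters):
            best_i, best = max(
                ((i, delta[t - 1][i] + (log_stay if i == j else log_switch))
                 for i in range(L)), key=lambda x: x[1])
            row.append(best + log_emission(sentences[t], c))
            brow.append(best_i)
        delta.append(row)
        back.append(brow)
    # Backtrack the best cluster sequence
    state = max(range(L), key=lambda j: delta[-1][j])
    path = [state]
    for brow in reversed(back):
        state = brow[state]
        path.append(state)
    path.reverse()
    # A cluster change between consecutive sentences is a segmentation point
    return [t for t in range(1, n) if path[t] != path[t - 1]]
```
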
Spoken Document Summarization
Selecting Important Sentences to be Concatenated into a Summary
— sentence scoring
— given a summarization ratio
Selected Sentences Collectively Represent Some Concepts Closest to those of the Complete Document
— removing the concepts already mentioned previously
— concepts presented smoothly

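One common way to realize this sentence selection is a greedy, MMR-style procedure: score each sentence, keep the highest-scoring ones up to the summarization ratio, and penalize candidates that are too similar to sentences already selected. The slides do not fix a particular formula, so the sketch below, its scoring functions and its weights are illustrative assumptions:

```python
def summarize(sentences, relevance, similarity, ratio=0.1, redundancy_weight=0.5):
    """Greedy MMR-style sentence selection (an illustrative stand-in for the
    sentence scoring / concept-redundancy removal described above).
    sentences: list of sentence strings (or ids)
    relevance(s): importance score of sentence s w.r.t. the whole document
    similarity(a, b): similarity between two sentences, in [0, 1]
    ratio: summarization ratio, i.e. fraction of sentences to keep"""
    budget = max(1, int(round(ratio * len(sentences))))
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < budget:
        def mmr(i):
            redundancy = max((similarity(sentences[i], sentences[j])
                              for j in selected), default=0.0)
            return relevance(sentences[i]) - redundancy_weight * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    # Re-order by original position so the concepts are presented smoothly
    return [sentences[i] for i in sorted(selected)]
```
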
Title Generation for Spoken Documents (1/2)
Training Phase
— training documents D = {dj, j = 1, 2, …, N} (text form) together with the human-generated titles of the training documents T = {tj, j = 1, 2, …, N} (text form)
— developing statistical relationships between words in the training documents and words in their human-generated titles
Generation Phase
— new spoken documents D = {di, i = 1, 2, …, M} (speech form) are first transcribed into term sequences
— suitable terms are then identified and used to generate a readable title, giving the computer-generated titles of the new spoken documents T = {ti, i = 1, 2, …, M} (text form, speech form)

Title Generation for Spoken Documents (2/2)
(Block diagram: the training corpus is used to train a term selection model, a term ordering model and a title length model; a new spoken document first goes through automatic summarization, and the Viterbi algorithm then scores candidate titles over the summary with the three models to produce the output title.)

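A minimal sketch of how such a scored Viterbi search over the summary terms might look, assuming the three models are already trained and exposed as probability functions; the exact factorization, the start symbol None for the ordering model and all names are illustrative assumptions rather than the actual system:

```python
import math

def generate_title(summary_terms, sel_prob, order_prob, length_prob, max_len=10):
    """Illustrative Viterbi-style title search with three models:
      sel_prob(w):       P(term w is used in a title)   (term selection model)
      order_prob(u, w):  P(w follows u in a title); u=None means title start
                                                       (term ordering model)
      length_prob(n):    P(a title has n terms)         (title length model)
    summary_terms: candidate terms taken from the automatic summary."""
    terms = list(dict.fromkeys(summary_terms))          # unique, keep order
    if len(terms) < 2:
        return terms

    def safe_log(p):
        return math.log(p) if p > 0 else -1e9

    # best[m][w] = (score, backpointer) of the best (m+1)-term title ending in w
    best = [{w: (safe_log(sel_prob(w)) + safe_log(order_prob(None, w)), None)
             for w in terms}]
    for m in range(1, max_len):
        layer = {}
        for w in terms:
            prev_w, prev_score = max(
                ((u, best[m - 1][u][0]) for u in terms if u != w),
                key=lambda x: x[1])
            layer[w] = (prev_score + safe_log(sel_prob(w))
                        + safe_log(order_prob(prev_w, w)), prev_w)
        best.append(layer)
    # Combine with the title length model to pick the best length and end term
    answers = []
    for n in range(1, max_len + 1):
        w, (score, _) = max(best[n - 1].items(), key=lambda kv: kv[1][0])
        answers.append((score + safe_log(length_prob(n)), n, w))
    _, n, w = max(answers)
    title = [w]
    for layer in range(n - 1, 0, -1):
        w = best[layer][w][1]
        title.append(w)
    return list(reversed(title))
```
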
Topic Analysis and Organization for Spoken Documents
Example Approach: based on Probabilistic Latent Semantic Analysis (PLSA)
— terms (words, syllable pairs, etc.) and documents analyzed by probabilities defined over a set of latent topics
— trained by the EM algorithm
— related documents don't have to share common sets of terms, and related terms don't have to co-exist in the same set of documents
Broadcast News Clustered by the Latent Topics and Organized in a Two-dimensional Tree Structure, or as a Two-layer Map
— news stories in the same cluster or in closely located clusters usually address related topics
— clusters labeled by the terms with the highest probabilities
— easier to browse related news stories within a cluster or across nearby clusters

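A minimal numpy sketch of PLSA trained with EM, modelling P(w|d) = Σk P(w|Tk) P(Tk|d); the toy counts, the number of topics and the iteration count are illustrative only:

```python
import numpy as np

def train_plsa(counts, num_topics, iterations=50, seed=0):
    """Minimal PLSA trained with EM.
    counts: (num_docs, num_terms) matrix of term counts n(d, w)
    Returns P(w|Tk) with shape (num_topics, num_terms)
        and P(Tk|d) with shape (num_docs, num_topics)."""
    rng = np.random.default_rng(seed)
    num_docs, num_terms = counts.shape
    p_w_given_t = rng.random((num_topics, num_terms))
    p_w_given_t /= p_w_given_t.sum(axis=1, keepdims=True)
    p_t_given_d = rng.random((num_docs, num_topics))
    p_t_given_d /= p_t_given_d.sum(axis=1, keepdims=True)
    for _ in range(iterations):
        # E-step: P(Tk | d, w) for every (d, w) pair
        joint = p_t_given_d[:, :, None] * p_w_given_t[None, :, :]   # (d, k, w)
        posterior = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        weighted = counts[:, None, :] * posterior                   # n(d,w) P(k|d,w)
        # M-step: re-estimate both sets of probabilities
        p_w_given_t = weighted.sum(axis=0)
        p_w_given_t /= p_w_given_t.sum(axis=1, keepdims=True) + 1e-12
        p_t_given_d = weighted.sum(axis=2)
        p_t_given_d /= p_t_given_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_given_t, p_t_given_d

# Toy usage: 4 "documents" over 5 terms, 2 latent topics.
counts = np.array([[5, 3, 0, 0, 1],
                   [4, 4, 1, 0, 0],
                   [0, 1, 4, 5, 2],
                   [0, 0, 3, 4, 3]], dtype=float)
p_w_given_t, p_t_given_d = train_plsa(counts, num_topics=2)
# Label each topic cluster by its highest-probability terms (as on the slide)
print(np.argsort(-p_w_given_t, axis=1)[:, :2])
```
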
Query-based Local Semantic Structuring for Retrieved Spoken Documents
User's Query Produces Many Retrieved Spoken Documents
— difficult to display on the screen
Better User/System Interaction
— the system may provide better information about the semantic structure of the retrieved documents to the user, for example a topic hierarchy built from the retrieved documents
— the user may then enter a more precise query to the system
(Diagram: the user sends a query/instruction to the retrieval system over the spoken document archive; the retrieved documents are organized into a topic hierarchy, which is presented to the user through multi-modal dialogue.)

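A small sketch of one way such a local semantic structure could be built for the retrieved documents: cluster their term vectors and label each cluster with its strongest terms. The flat spherical k-means grouping below is only a stand-in for the topic hierarchy; the matrix layout, parameters and names are illustrative assumptions:

```python
import numpy as np

def structure_retrieved_docs(doc_term_matrix, vocabulary, num_clusters=3,
                             iterations=20, labels_per_cluster=3, seed=0):
    """Group the retrieved documents into a few topic clusters and label each
    cluster with its most representative terms, as a simple stand-in for the
    topic hierarchy shown to the user.
    doc_term_matrix: (num_docs, num_terms) TF-IDF (or count) matrix
    vocabulary: list of terms aligned with the matrix columns."""
    rng = np.random.default_rng(seed)
    num_clusters = min(num_clusters, len(doc_term_matrix))
    docs = doc_term_matrix / (np.linalg.norm(doc_term_matrix, axis=1,
                                             keepdims=True) + 1e-12)
    centers = docs[rng.choice(len(docs), size=num_clusters, replace=False)]
    for _ in range(iterations):                       # spherical k-means
        assign = np.argmax(docs @ centers.T, axis=1)  # cosine similarity
        for k in range(num_clusters):
            members = docs[assign == k]
            if len(members):
                centers[k] = members.mean(axis=0)
        centers /= np.linalg.norm(centers, axis=1, keepdims=True) + 1e-12
    structure = []
    for k in range(num_clusters):
        top = np.argsort(-centers[k])[:labels_per_cluster]
        structure.append({"label": [vocabulary[i] for i in top],
                          "documents": np.where(assign == k)[0].tolist()})
    return structure
```
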