12.0 Spoken Document Understanding and Organization

References:
1. "Spoken Document Understanding and Organization", IEEE Signal Processing Magazine, Sept. 2005, Special Issue on Speech Technology in Human-Machine Communication
2. "Speech-to-text and Speech-to-speech Summarization of Spontaneous Speech", IEEE Transactions on Speech and Audio Processing, Dec. 2004
3. "Multi-layered Summarization of Spoken Document Archives by Information Extraction and Semantic Structuring", Interspeech 2006, Pittsburgh, USA

Multi-media Content in the Future Network Era
— Integrating All Knowledge, Information and Services Globally
— The Most Attractive Form of the Network Content will be Multi-media, which usually Includes Speech Information
— The Speech Information, if Included, usually Tells the Subjects, Topics and Concepts of the Multi-media Content, and thus Becomes the Key for Indexing, Retrieval and Browsing

Future Integrated Networks:
— Real-time Information: weather, traffic, flight schedule, stock price, sports scores
— Electronic Commerce: virtual banking, on-line transactions, on-line investments
— Knowledge Archives: digital libraries, virtual museums
— Intelligent Working Environment: e-mail processors, intelligent agents, teleconferencing, distant learning
— Private Services: personal notebook, business databases, home appliances, network entertainments

Network Content Indexing/Retrieval/Browsing in the Future Era of Wireless Multi-media
— Multi-media Content Indexed/Retrieved/Browsed Based on the Speech Information
— User Instructions in either Text or Speech Form
— Network Access is Primarily Text-based Today, but almost all Roles of Text can be Replaced by Speech in the Future
(Diagram: the user reaches private/personal services and public information and services over the future networks, with text or voice input/output, through text-based retrieval, spoken document retrieval, spoken dialogue and text-to-speech synthesis; the content carries both text information and voice information.)

Multi-media/Spoken Document Understanding and Organization (Ⅰ)
Written Documents are Better Structured and Easier to Browse
— in paragraphs with titles
— easily shown on the screen
— easily decided at a glance whether it is what the user is looking for
Multi-media/Spoken Documents are just Video/Audio Signals
— not easy to show on the screen
— the user can't go through each one from beginning to end during browsing
— better approaches for understanding/organization of multi-media/spoken documents become necessary

Multi-media/Spoken Document Understanding and Organization (Ⅱ)
Key Term/Named Entity Extraction from Multi-media/Spoken Documents
— personal names, organization names, location names, event names
— very often keywords in the multi-media/spoken documents
— very often out-of-vocabulary (OOV) words, difficult for recognition
Multi-media/Spoken Document Segmentation
— automatically segmenting a multi-media/spoken document into short paragraphs, each with a central topic
Information Extraction for Multi-media/Spoken Documents
— extraction of key information such as who, when, where, what and how for the information described by multi-media/spoken documents
— very often the relationships among the key terms/named entities
Summarization for Multi-media/Spoken Documents
— automatically generating a summary (in text or speech form) for each short paragraph
Title Generation for Multi-media/Spoken Documents
— automatically generating a title (in text or speech form) for each short paragraph
— a very concise summary indicating the topic area
Topic Analysis and Organization for Multi-media/Spoken Documents
— analyzing the subject topics of the short paragraphs
— clustering and organizing the subject topics of the short paragraphs into graphic structures giving the relationships among them for easier access

Integration Relationships among the Involved Technology Areas
(Diagram: the involved technology areas, including key term extraction and named entity extraction from spoken documents, semantic analysis, and information indexing, retrieval and browsing, with the integration relationships among them.)

Key Term Selection (1/2)
Topic Entropy of a term: the entropy of the term's distribution over a set of latent topics
— high topic entropy: the term is spread evenly across many topics, so it carries less topical information
— low topic entropy: the term is concentrated on a few topics, so it carries more topical information

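A minimal sketch of this criterion, assuming each candidate term t already has a topic distribution P(Tk | t) (for example from the PLSA analysis of a later slide); the function names, toy distributions and the cut-off are illustrative only:

```python
import math

def topic_entropy(p_topic_given_term):
    """Entropy of a term's distribution over latent topics.
    Low entropy -> the term concentrates on a few topics (good key term candidate)."""
    return -sum(p * math.log(p) for p in p_topic_given_term if p > 0.0)

def select_key_terms(term_topic_dist, num_terms=50):
    """Rank candidate terms by ascending topic entropy and keep the top ones.
    term_topic_dist: dict mapping term -> list of P(topic | term) values."""
    ranked = sorted(term_topic_dist, key=lambda t: topic_entropy(term_topic_dist[t]))
    return ranked[:num_terms]

# Toy usage: "acoustic" is topic-specific, "today" is spread over all topics.
dists = {
    "acoustic": [0.85, 0.05, 0.05, 0.05],
    "today":    [0.25, 0.25, 0.25, 0.25],
}
print(select_key_terms(dists, num_terms=1))   # ['acoustic']
```
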
Named Entity Extraction
— HMM-based Approaches
— Rule-based Approaches
Special Approaches Used
— context information among different sentences in the same document properly considered
— matching with automatically retrieved relevant text news to identify out-of-vocabulary (OOV) words
— multi-layered Viterbi search to handle a long named entity composed of several named entities of different types

Named Entity Extraction
Context Information Extracted
— some named entities may not be easily identified from a single sentence, but can be extracted when information in several sentences is jointly considered
— example (Chinese broadcast news): "遊戲橘子高階人事異動……對於遊戲橘子企圖跨足研發領域……遊戲橘子董事長表示……" (high-level personnel changes at Gamania (遊戲橘子) … regarding Gamania's attempt to expand into R&D … the chairman of Gamania said …), where the same company name recurs across several sentences
Named Entity Matching using a Retrieved Text News Corpus (e.g. Google text news corpora) to Identify Some Out-of-Vocabulary (OOV) Words, subject to a confidence measure threshold
— example: "娜莉颱風重創花蓮縣壽豐鄉" (Typhoon Nari (娜莉) devastated Shoufeng Township (壽豐鄉), Hualien County), where the OOV names 娜莉 and 壽豐 are first mis-recognized as the homophonous in-vocabulary words 那裡 and 受封, and then recovered by matching against the retrieved text news
Multi-layered Viterbi Search
— handling the situation that a named entity may be the concatenation of several named entities of different types
— example: "台北市中正紀念堂是一個熱門的旅遊景點" (The Chiang Kai-shek Memorial Hall (中正紀念堂) in Taipei City (台北市) is a popular tourist attraction), where the long named entity 台北市中正紀念堂 is composed of 台北市 and 中正紀念堂

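The OOV-recovery step can be pictured with a small sketch: suppose the suspicious spans of the transcript and the named entities found in the retrieved text news are already available, and a crude syllable-level similarity stands in for the real phonetic matching and confidence measure. The tiny character-to-syllable table, the threshold and all names here are illustrative assumptions, not the actual system:

```python
from difflib import SequenceMatcher

# Hypothetical character-to-syllable table; a real system would use full
# pronunciation lexica and acoustic confidence scores.
SYLLABLE = {"那": "na3", "裡": "li3", "娜": "na4", "莉": "li4",
            "受": "shou4", "封": "feng1", "壽": "shou4", "豐": "feng1"}

def syllables(text):
    return [SYLLABLE.get(ch, ch) for ch in text]

def phonetic_similarity(a, b):
    """Crude phonetic similarity: compare base syllables, ignoring tones."""
    strip = lambda s: [syl.rstrip("0123456789") for syl in s]
    return SequenceMatcher(None, strip(syllables(a)), strip(syllables(b))).ratio()

def recover_oov_entities(transcript_spans, text_news_entities, threshold=0.8):
    """For each suspicious span in the transcript, propose the most
    phonetically similar named entity found in the retrieved text news."""
    recovered = {}
    for span in transcript_spans:
        best = max(text_news_entities, key=lambda e: phonetic_similarity(span, e))
        if phonetic_similarity(span, best) >= threshold:
            recovered[span] = best
    return recovered

# Toy usage with the slide's example: the OOV names 娜莉 / 壽豐 were
# mis-recognized as the homophones 那裡 / 受封.
print(recover_oov_entities(["那裡", "受封"], ["娜莉", "壽豐", "花蓮縣"]))
# -> {'那裡': '娜莉', '受封': '壽豐'}
```
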
Spoken Document Segmentation
Training Phase
— training corpora (text form, short paragraphs) grouped by K-means clustering into L clusters C1, C2, …, CL, each with a topic and each containing many short paragraphs
— P(s|Cj) estimated for every cluster Cj by N-gram probabilities, where s is a sentence
Segmentation Phase
— dividing the word sequence into sentences d = s1, s2, s3, … by pause duration
— Viterbi search over the hidden Markov model of clusters, with emission probabilities P(s|C1), P(s|C2), P(s|C3), …, and transition probabilities P1 (staying within a cluster) and P2 (moving to another cluster)
— P1, P2 may be modified by story length modeling and pause duration modeling
— a transition from a cluster Ci into another cluster Cj is taken as a proper segmentation point

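A minimal sketch of the segmentation-phase Viterbi search, assuming the per-cluster sentence log-likelihoods log P(s|Cj) are already available from the N-gram models of the training phase, and modelling transitions with just the two log-probabilities for staying in the same cluster (P1) versus switching clusters (P2); the interface and names are illustrative:

```python
def segment_document(sentences, clusters, log_emission, log_stay, log_switch):
    """Viterbi search over an HMM whose states are topic clusters.
    sentences: list of sentences s1..sN
    clusters: list of cluster ids C1..CL
    log_emission(s, c): log P(s | c), e.g. from per-cluster N-gram models
    log_stay / log_switch: log transition probabilities P1 / P2
    Returns the sentence indices at which a segmentation point is placed."""
    if not sentences:
        return []
    n, L = len(sentences), len(clusters)
    delta = [[log_emission(sentences[0], c) for c in clusters]]
    back = []
    for t in range(1, n):
        row, brow = [], []
        for j, c in enumerate(clusters):
            best_i, best = max(
                ((i, delta[t - 1][i] + (log_stay if i == j else log_switch))
                 for i in range(L)), key=lambda x: x[1])
            row.append(best + log_emission(sentences[t], c))
            brow.append(best_i)
        delta.append(row)
        back.append(brow)
    # Backtrack the best cluster sequence
    state = max(range(L), key=lambda j: delta[-1][j])
    path = [state]
    for brow in reversed(back):
        state = brow[state]
        path.append(state)
    path.reverse()
    # A cluster change between consecutive sentences is a segmentation point
    return [t for t in range(1, n) if path[t] != path[t - 1]]
```
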
Spoken Document Summarization
Selecting Important Sentences to be Concatenated into a Summary
— sentence scoring
— given a summarization ratio
Selected Sentences Collectively Represent Some Concepts Closest to those of the Complete Document
— removing the concepts already mentioned previously
— concepts presented smoothly

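One common way to realize this sentence selection is a greedy, MMR-style procedure: score each sentence, keep the highest-scoring ones up to the summarization ratio, and penalize candidates that are too similar to sentences already selected. The slides do not fix a particular formula, so the sketch below, its scoring functions and its weights are illustrative assumptions:

```python
def summarize(sentences, relevance, similarity, ratio=0.1, redundancy_weight=0.5):
    """Greedy MMR-style sentence selection (an illustrative stand-in for the
    sentence scoring / concept-redundancy removal described above).
    sentences: list of sentence strings (or ids)
    relevance(s): importance score of sentence s w.r.t. the whole document
    similarity(a, b): similarity between two sentences, in [0, 1]
    ratio: summarization ratio, i.e. fraction of sentences to keep"""
    budget = max(1, int(round(ratio * len(sentences))))
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < budget:
        def mmr(i):
            redundancy = max((similarity(sentences[i], sentences[j])
                              for j in selected), default=0.0)
            return relevance(sentences[i]) - redundancy_weight * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    # Re-order by original position so the concepts are presented smoothly
    return [sentences[i] for i in sorted(selected)]
```
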
Title Generation for Spoken Documents (1/2)
Training Phase
— training documents D = {dj, j = 1, 2, …, N} (text form) together with the human-generated titles of the training documents T = {tj, j = 1, 2, …, N} (text form)
— developing statistical relationships between words in the training documents and words in their human-generated titles
Generation Phase
— new spoken documents D = {di, i = 1, 2, …, M} (speech form) are first transcribed into term sequences
— suitable terms are then identified and used to generate a readable title, giving the computer-generated titles of the new spoken documents T = {ti, i = 1, 2, …, M} (text form, speech form)

Title Generation for Spoken Documents (2/2)
(Block diagram: the training corpus is used to train a term selection model, a term ordering model and a title length model; a new spoken document first goes through automatic summarization, and the Viterbi algorithm then scores candidate titles over the summary with the three models to produce the output title.)

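A minimal sketch of how such a scored Viterbi search over the summary terms might look, assuming the three models are already trained and exposed as probability functions; the exact factorization, the start symbol None for the ordering model and all names are illustrative assumptions rather than the actual system:

```python
import math

def generate_title(summary_terms, sel_prob, order_prob, length_prob, max_len=10):
    """Illustrative Viterbi-style title search with three models:
      sel_prob(w):       P(term w is used in a title)   (term selection model)
      order_prob(u, w):  P(w follows u in a title); u=None means title start
                                                       (term ordering model)
      length_prob(n):    P(a title has n terms)         (title length model)
    summary_terms: candidate terms taken from the automatic summary."""
    terms = list(dict.fromkeys(summary_terms))          # unique, keep order
    if len(terms) < 2:
        return terms

    def safe_log(p):
        return math.log(p) if p > 0 else -1e9

    # best[m][w] = (score, backpointer) of the best (m+1)-term title ending in w
    best = [{w: (safe_log(sel_prob(w)) + safe_log(order_prob(None, w)), None)
             for w in terms}]
    for m in range(1, max_len):
        layer = {}
        for w in terms:
            prev_w, prev_score = max(
                ((u, best[m - 1][u][0]) for u in terms if u != w),
                key=lambda x: x[1])
            layer[w] = (prev_score + safe_log(sel_prob(w))
                        + safe_log(order_prob(prev_w, w)), prev_w)
        best.append(layer)
    # Combine with the title length model to pick the best length and end term
    answers = []
    for n in range(1, max_len + 1):
        w, (score, _) = max(best[n - 1].items(), key=lambda kv: kv[1][0])
        answers.append((score + safe_log(length_prob(n)), n, w))
    _, n, w = max(answers)
    title = [w]
    for layer in range(n - 1, 0, -1):
        w = best[layer][w][1]
        title.append(w)
    return list(reversed(title))
```
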
Topic Analysis and Organization for Spoken Documents
Example Approach: based on Probabilistic Latent Semantic Analysis (PLSA)
— terms (words, syllable pairs, etc.) and documents analyzed by probabilities defined over a set of latent topics
— trained by the EM algorithm
— related documents don't have to share common sets of terms, and related terms don't have to co-exist in the same set of documents
Broadcast News Clustered by the Latent Topics and Organized in a Two-dimensional Tree Structure, or as a Two-layer Map
— news stories in the same cluster or in closely located clusters usually address related topics
— clusters labeled by the terms with the highest probabilities
— easier to browse related news stories within a cluster or across nearby clusters

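A minimal numpy sketch of PLSA trained with EM, modelling P(w|d) = Σk P(w|Tk) P(Tk|d); the toy counts, the number of topics and the iteration count are illustrative only:

```python
import numpy as np

def train_plsa(counts, num_topics, iterations=50, seed=0):
    """Minimal PLSA trained with EM.
    counts: (num_docs, num_terms) matrix of term counts n(d, w)
    Returns P(w|Tk) with shape (num_topics, num_terms)
        and P(Tk|d) with shape (num_docs, num_topics)."""
    rng = np.random.default_rng(seed)
    num_docs, num_terms = counts.shape
    p_w_given_t = rng.random((num_topics, num_terms))
    p_w_given_t /= p_w_given_t.sum(axis=1, keepdims=True)
    p_t_given_d = rng.random((num_docs, num_topics))
    p_t_given_d /= p_t_given_d.sum(axis=1, keepdims=True)
    for _ in range(iterations):
        # E-step: P(Tk | d, w) for every (d, w) pair
        joint = p_t_given_d[:, :, None] * p_w_given_t[None, :, :]   # (d, k, w)
        posterior = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        weighted = counts[:, None, :] * posterior                   # n(d,w) P(k|d,w)
        # M-step: re-estimate both sets of probabilities
        p_w_given_t = weighted.sum(axis=0)
        p_w_given_t /= p_w_given_t.sum(axis=1, keepdims=True) + 1e-12
        p_t_given_d = weighted.sum(axis=2)
        p_t_given_d /= p_t_given_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_given_t, p_t_given_d

# Toy usage: 4 "documents" over 5 terms, 2 latent topics.
counts = np.array([[5, 3, 0, 0, 1],
                   [4, 4, 1, 0, 0],
                   [0, 1, 4, 5, 2],
                   [0, 0, 3, 4, 3]], dtype=float)
p_w_given_t, p_t_given_d = train_plsa(counts, num_topics=2)
# Label each topic cluster by its highest-probability terms (as on the slide)
print(np.argsort(-p_w_given_t, axis=1)[:, :2])
```
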
Query-based Local Semantic Structuring for Retrieved Spoken Documents
User's Query Produces Many Retrieved Spoken Documents
— difficult to display on the screen
Better User/System Interaction
— the system may provide better information about the semantic structure of the retrieved documents to the user, for example a topic hierarchy built from the retrieved documents
— the user may then enter a more precise query to the system
(Diagram: the user sends a query/instruction to the retrieval system over the spoken document archive; the retrieved documents are organized into a topic hierarchy, which is presented to the user through multi-modal dialogue.)

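A small sketch of one way such a local semantic structure could be built for the retrieved documents: cluster their term vectors and label each cluster with its strongest terms. The flat spherical k-means grouping below is only a stand-in for the topic hierarchy; the matrix layout, parameters and names are illustrative assumptions:

```python
import numpy as np

def structure_retrieved_docs(doc_term_matrix, vocabulary, num_clusters=3,
                             iterations=20, labels_per_cluster=3, seed=0):
    """Group the retrieved documents into a few topic clusters and label each
    cluster with its most representative terms, as a simple stand-in for the
    topic hierarchy shown to the user.
    doc_term_matrix: (num_docs, num_terms) TF-IDF (or count) matrix
    vocabulary: list of terms aligned with the matrix columns."""
    rng = np.random.default_rng(seed)
    num_clusters = min(num_clusters, len(doc_term_matrix))
    docs = doc_term_matrix / (np.linalg.norm(doc_term_matrix, axis=1,
                                             keepdims=True) + 1e-12)
    centers = docs[rng.choice(len(docs), size=num_clusters, replace=False)]
    for _ in range(iterations):                       # spherical k-means
        assign = np.argmax(docs @ centers.T, axis=1)  # cosine similarity
        for k in range(num_clusters):
            members = docs[assign == k]
            if len(members):
                centers[k] = members.mean(axis=0)
        centers /= np.linalg.norm(centers, axis=1, keepdims=True) + 1e-12
    structure = []
    for k in range(num_clusters):
        top = np.argsort(-centers[k])[:labels_per_cluster]
        structure.append({"label": [vocabulary[i] for i in top],
                          "documents": np.where(assign == k)[0].tolist()})
    return structure
```
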