Human – Network Voice Interface in A Wireless Era


1 Human – Network Voice Interface in A Wireless Era

2 Future Integrated Networks
Information–related Activities, Applications and Services in the Future Network Era
Future Integrated Networks:
– Real–time Information: weather, traffic, flight schedules, stock prices, sports scores
– Electronic Commerce: virtual banking, on–line transactions, on–line investments
– Knowledge Archives: digital libraries, virtual museums
– Intelligent Working Environment: e–mail processors, intelligent agents, teleconferencing, distance learning
– Private Services: personal notebook, business databases, home appliances, network entertainment
• Multi–media, Multi–lingual, Multi–functional
• Cross–culture, Cross–domain, Cross–region
• Integrating All Knowledge Systems and Information–related Activities and Services Globally
Multiple User Terminals: telephone set, handset, PDA, vehicular electronics, home appliance, personal computer, etc.

3 Wireless Access of Global Multi–media Information
At Any Time, from Anywhere
As handset size shrinks while the required functionality grows and the user environment changes, a voice interface will be useful for all user terminals.
Examples: voice retrieval, voice browser, voice portal, voice web, spoken-dialogue-based access to intelligent agents

4 Scenario for Network Information Access
[Diagram labels] Spoken Dialogue (speech); Information Retrieval; Text-to-speech Synthesis (text information to speech information); Internet; Public Services/Information/Knowledge; Private Services/Databases/Applications; content types: text, image, video, speech, …

5 Convergence of PSTN and Internet
PSTN (for voice) and the Internet (for data and multi-media content) are converging
[Diagram labels] handsets, telephones, PSTN, Internet, PCs, servers
Driving Forces for the Convergence:
– “anywhere, any time” wireless services
– voice provides the most convenient and natural interaction interface
– attractive content over the Internet
– content (human information) is why the Internet is attractive, while voice directly carries human information
Speech-enabled Access to Web-based Applications

6 Voice Interface for Human-network Interaction
– huge volumes of data disseminated across the globe by optical fiber networks
– at any time, from anywhere, by wireless terminals
– vehicular electronics, PDA, handset, home appliance, etc.: new platforms accessing the global network information/services
– traditional keyboard/mouse no longer adequate: size shrinkage, different user environment, etc.; desired functionalities/human–network interactions increasing
– voice interface will be one of the few most important, natural, user-friendly, attractive interfaces
– examples: voice retrieval, voice browser, voice portal, voice web, voice–based web–user interaction, voice–based web tools/application interfaces, etc.
– voice interface is the only major “missing link” in the “semi–mature” technology chain

7 Functionalities for Voice Interface
Core Technologies / Functionalities for Voice Interface

8 Speech Recognition as a pattern recognition problem
[Block diagram]
Recognition: unknown speech signal x(t) → Feature Extraction → feature vector sequence X → Pattern Matching → Decision Making → output word W
Training: training speech y(t) → Feature Extraction → Reference Patterns Y (used by Pattern Matching)
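The three blocks on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say which features or which matching method are used, so the frame log-energy feature and the dynamic time warping (DTW) distance below are stand-ins, and the function names are hypothetical.

```python
import math

def extract_features(signal, frame_len=4):
    """Toy feature extraction (assumption): one log-energy value per frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log(energy + 1e-8))
    return feats

def dtw_distance(x, y):
    """Pattern matching (assumption): DTW distance between two sequences."""
    inf = float("inf")
    d = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Allow stretching either sequence, or advancing both together.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(x)][len(y)]

def recognize(signal, reference_patterns):
    """Decision making: pick the word whose reference pattern is closest."""
    X = extract_features(signal)
    return min(reference_patterns,
               key=lambda w: dtw_distance(X, reference_patterns[w]))
```

Reference patterns Y would be built by running `extract_features` on the training speech for each word; DTW is chosen here only because it tolerates utterances spoken at different speeds.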

9 Basic Approach for Large Vocabulary Speech Recognition
A Simplified Block Diagram
[Block diagram] Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence
Knowledge sources used in decoding:
– Acoustic Models (Speech Corpora → Model Training)
– Lexicon (Lexical Knowledge-base)
– Language Model (Text → Language Model Construction)
– ICG Grammar
Example
– Input Sentence: this is speech
– Acoustic Models: (th-ih-s-ih-z-s-p-iy-ch)
– Lexicon: (th-ih-s) → this, (ih-z) → is, (s-p-iy-ch) → speech
– Language Model: (this) – (is) – (speech), P(this) P(is | this) P(speech | this is)
– P(wi|wi-1): bi-gram language model; P(wi|wi-1,wi-2): tri-gram language model, etc.
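The bi-gram language model P(wi|wi-1) mentioned on this slide can be estimated from counts. The sketch below is a minimal illustration, not the method used in the presentation: it assumes plain maximum-likelihood estimates with no smoothing, adds hypothetical `<s>` and `</s>` sentence-boundary markers, and uses a made-up three-sentence corpus.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and word pairs; <s> and </s> mark sentence boundaries."""
    history, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        history.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))
    return history, bigrams

def sentence_prob(sentence, history, bigrams):
    """P(sentence) as a product of P(w_i | w_{i-1}), unsmoothed."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        if history[prev] == 0:
            return 0.0                        # unseen history
        prob *= bigrams[(prev, cur)] / history[prev]
    return prob

# Hypothetical toy corpus, for illustration only.
corpus = ["this is speech", "this is text", "that is speech"]
history, bigrams = train_bigram(corpus)
p = sentence_prob("this is speech", history, bigrams)
```

With a bi-gram model the third factor in the slide's example becomes P(speech | is) rather than P(speech | this is); a real system would also smooth the estimates so that unseen word pairs do not get zero probability.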

10 Speech Recognition Technologies, Applications and Problems
Technologies and Applications
– Word Recognition: voice commands/instructions
– Keyword Spotting: identifying keywords from a pre-defined keyword set in input voice utterances
– Large Vocabulary Continuous Speech Recognition: entering longer texts, remote dictation
Problems
– Speaker Dependent/Independent/Adaptive
– Acoustic Reception/Background Noise/Channel Distortion
– Read/Spontaneous/Conversational Speech

11 Text-to-speech Synthesis
Transforming any input text into the corresponding speech signal, e.g. Web page reading
– Prosodic modeling
– Basic voice units/rule-based, non-uniform units/corpus-based
[Block diagram] Input Text → Text Analysis and Letter-to-sound Conversion → Prosody Generation → Signal Processing and Concatenation → Output Speech Signal
Resources used by the blocks: Lexicon and Rules, Prosodic Model, Voice Unit Database

12 Speaker Verification
Verifying the speaker as claimed
– Applications requiring verification
– Text dependent/independent
– Integrated with other verification schemes
[Block diagram] input speech → Feature Extraction → Verification (using Speaker Models) → yes/no
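The yes/no decision on this slide can be illustrated as a score compared against a threshold. Everything specific below is an assumption made for the sketch: the slide does not describe the speaker models or the scoring, so a mean feature vector per speaker, cosine similarity, and a threshold of 0.9 are used as simple stand-ins.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def enroll(feature_vectors):
    """Toy speaker model (assumption): mean of the enrollment vectors."""
    n = len(feature_vectors)
    dim = len(feature_vectors[0])
    return [sum(v[k] for v in feature_vectors) / n for k in range(dim)]

def verify(input_features, claimed_id, speaker_models, threshold=0.9):
    """Accept the claim only if the input is close to the claimed model."""
    score = cosine(input_features, speaker_models[claimed_id])
    return score >= threshold
```

Unlike identification, verification compares the input only against the model of the claimed speaker, so its cost does not grow with the number of enrolled speakers.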

13 Information Retrieval Including Voice
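– Text Documents/Instructions
– Speech Documents/Instructions
– Voice Personal Notebook/Private Database
[Diagram] A speech instruction or a text instruction is matched against text documents and speech documents (d1, d2, d3).
Example instruction: "I'd like to find news about the formation of the new government?"
Example speech document: "President-elect Chen Shui-bian this morning…"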
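Matching an instruction against documents d1, d2, d3 can be sketched as ranking by similarity. The code below is a rough illustration only: it assumes that speech instructions and speech documents have already been transcribed to text by a recognizer, and it uses TF-IDF weighting with cosine similarity, which the slide does not name. The document texts are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a TF-IDF vector (word -> weight) for each document."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))
    n = len(tokenized)
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    vectors = {d: {w: c * idf[w] for w, c in Counter(words).items()}
               for d, words in tokenized.items()}
    return vectors, idf

def rank(query, docs):
    """Return document ids ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    q = {w: c * idf.get(w, 0.0)
         for w, c in Counter(query.lower().split()).items()}
    q_norm = math.sqrt(sum(v * v for v in q.values()))

    def score(vec):
        dot = sum(weight * vec.get(w, 0.0) for w, weight in q.items())
        norm = q_norm * math.sqrt(sum(v * v for v in vec.values()))
        return dot / norm if norm else 0.0

    return sorted(docs, key=lambda d: score(vectors[d]), reverse=True)

# Hypothetical transcripts standing in for d1, d2, d3.
docs = {
    "d1": "news about the new government formation",
    "d2": "stock price report today",
    "d3": "president elect speaks about the new government this morning",
}
ranking = rank("news about formation of the new government", docs)
```

In a real system the transcripts of speech documents contain recognition errors, so retrieval from speech usually needs indexing that is more tolerant than exact word matching.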

14 Multi-lingual Functionalities
Code-Switching Problem
– English words/phrases inserted in spoken Chinese sentences: 人人都用Computers,家家都上Internet ("Everyone uses computers, every household is on the Internet")
– the whole sentence switched to English: 準備好了嗎?Let’s go! ("Are you ready? Let’s go!")
Cross-language Network Information Processing
– globalized network with multi-lingual content/users
– cross-language network information processing, with spoken Chinese language input as an example
Chinese Dialects/Accents
– Taiwanese, Cantonese, Shanghainese, etc.; hundreds of Chinese dialects
– code-switching problem: dialects mixed with Mandarin (or plus English)
– Mandarin with a variety of strong accents
Language Dependent/Independent Technologies

15 Spoken Dialogue Systems
– Almost all human-network interactions can be made by spoken dialogue
– Speech understanding
– System/user/mixed initiatives
– Reliability/efficiency, dialogue modeling/flow control
[Block diagram] Input Speech → Speech Recognition and Understanding → User’s Intention → Dialogue Manager (with Discourse Context) → Response to the user → Sentence Generation and Speech Synthesis → Output Speech
Other diagram labels: Databases, Internet, Networks, Users, Dialogue Server
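The role of the dialogue manager and the discourse context can be illustrated with a small slot-filling loop. This is a sketch under assumptions, not the system in the presentation: the weather domain, the `DATABASE` contents, the keyword-based `understand` function, and the system-initiative prompts are all hypothetical, and the input is taken to be text already produced by a speech recognizer.

```python
# Hypothetical back-end database: (city, day) -> forecast.
DATABASE = {("taipei", "today"): "sunny", ("taipei", "tomorrow"): "rainy"}
CITIES = {"taipei"}
DAYS = {"today", "tomorrow"}

def understand(recognized_text):
    """Toy speech understanding: map recognized words to intention slots."""
    intention = {}
    for word in recognized_text.lower().replace("?", "").split():
        if word in CITIES:
            intention["city"] = word
        if word in DAYS:
            intention["day"] = word
    return intention

def dialogue_manager(intention, context):
    """Merge the intention into the discourse context, then ask or answer."""
    context.update(intention)
    if "city" not in context:
        return "Which city?"          # system initiative: request missing slot
    if "day" not in context:
        return "Which day?"
    forecast = DATABASE.get((context["city"], context["day"]))
    if forecast is None:
        return "Sorry, I have no forecast for that."
    city = context["city"].title()
    return f"The weather in {city} {context['day']} is {forecast}."

# Two-turn example: the context carries "tomorrow" into the second turn.
context = {}
first = dialogue_manager(understand("What is the weather tomorrow?"), context)
second = dialogue_manager(understand("Taipei"), context)
```

Keeping the slots in `context` across turns is what lets the second, one-word utterance be interpreted correctly; the returned strings would then go to sentence generation and speech synthesis.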

