Published by Oswin Lambert. Modified over 9 years ago.
Some Voice Enable Component

Group members: CHUAH SIONG YANG 499410001, LIM CHUN HEAN 400415001
Advisor: Professor MICHEAL

Project Purpose: For developers, a more general and friendly user interface is important to a product. A user interface is the system by which users interact with a machine, and it includes hardware (physical) and software (logical) components. Generally, the goal of improving an interface is to let a minimal input achieve the desired output while minimizing undesired output. We are working on a VUI (voice user interface), which makes human interaction with computers possible through a voice/speech platform in order to initiate an automated service or process. Accurate output is important to users, and the VUI must also respond quickly: people do not have the patience to wait several seconds for results, so on mobile devices a lightweight VUI is more useful. Most mobile devices have no physical keyboard and rely on a touch-screen keyboard instead, but pressing such small buttons can be tedious and inaccurate, so an easy-to-use, accurate, and reliable VUI could be a major breakthrough in ease of use. We chose pocketsphinx, a product of CMUSphinx (Carnegie Mellon University's Sphinx), which is free and open source.

Speech Recognition: We use pocketsphinx for this part, translating spoken words into text. The process is also known as "automatic speech recognition" (ASR) or "speech to text" (STT). Speech recognition applications include voice user interfaces such as voice dialling, call routing, domotic appliance control, search, simple data entry, preparation of structured documents, speech-to-text processing, and aircraft (usually termed direct voice input). The performance of speech recognition systems is usually evaluated in terms of accuracy and speed.
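As an illustration of how accuracy and speed can be scored, the sketch below computes a word error rate (word-level edit distance divided by the number of reference words) and a real-time factor (processing time divided by audio duration). The function names and the sample sentences are assumptions made for this example; they do not come from the project or from pocketsphinx.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Time spent recognizing divided by the duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical reference transcript and recognizer output.
    print(word_error_rate("two fried rice and one tea", "two fried rice and tea"))
    print(real_time_factor(processing_seconds=0.5, audio_seconds=2.0))
```

A real-time factor below 1 means the recognizer keeps up with incoming audio, which is the property a lightweight mobile VUI needs.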
Accuracy is usually rated with word error rate (WER), whereas speed is measured with the real-time factor. Other measures of accuracy include Single Word Error Rate (SWER) and Command Success Rate (CSR). Both acoustic modeling and language modeling are important parts of modern statistically based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems. Language modeling is also used in many other natural language processing applications, such as document classification and statistical machine translation.

Basic concept of Speech: Speech is a complex phenomenon. The naive perception is that speech is built from words and that each word consists of phones, but this is not strictly true. Speech is a dynamic process without clearly distinguished parts: a continuous audio stream in which rather stable states mix with dynamically changing ones, with no certain boundaries between units or between words. As a result, speech-to-text translation and other applications of speech are never 100% correct. The acoustic properties of the waveform corresponding to a phone can vary greatly depending on many factors, such as phone context, speaker, and style of speech. So-called coarticulation makes phones sound very different from their "canonical" representation. Since transitions between words are more informative than stable regions, developers often talk about diphones: parts of phones between two consecutive phones. Sometimes developers also talk about subphonetic units: different substates of a phone.

Our Project: We chose a restaurant menu ordering system on Android as our application. In addition to the touch interface, it provides a voice user interface, so that users can order menu items and their quantities by voice. A management system is also provided to manage the menu database and the resulting orders.
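The poster does not describe how recognized text is turned into an order, so the following is only a minimal sketch of one possible approach: scan the transcript for number words and known menu items, then pair each item with the most recent quantity. The menu entries, the number-word table, and the name `parse_order` are all hypothetical and chosen purely for illustration.

```python
# Hypothetical menu and number vocabulary; the real project would load
# its items from the menu database.
MENU = ("fried rice", "iced tea", "noodle soup")
NUMBER_WORDS = {
    "a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def parse_order(transcript: str, menu=MENU) -> dict:
    """Map a recognized sentence to {menu item: quantity}."""
    words = transcript.lower().split()
    # Try longer item names first so multi-word items are not shadowed.
    items = sorted(menu, key=lambda name: -len(name.split()))
    order = {}
    quantity = 1  # default when no number word precedes an item
    i = 0
    while i < len(words):
        if words[i] in NUMBER_WORDS:
            quantity = NUMBER_WORDS[words[i]]
            i += 1
            continue
        for item in items:
            parts = item.split()
            if words[i:i + len(parts)] == parts:
                order[item] = order.get(item, 0) + quantity
                quantity = 1
                i += len(parts)
                break
        else:
            i += 1  # filler word such as "and" or "please": skip it
    return order


if __name__ == "__main__":
    print(parse_order("two fried rice and one iced tea"))
```

Restricting matches to a fixed menu vocabulary keeps the parser simple and tolerant of filler words, at the cost of ignoring anything the menu does not list.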
System structure (diagram labels): Login interface; Login database; Ordering system; Management system; Change menu (Add, Change, Delete); Touch user interface; Voice user interface; Receive order; Check order; Order database; Menu database.

References:
http://cmusphinx.sourceforge.net/
http://en.wikipedia.org/wiki/User_interface#Types
http://en.wikipedia.org/wiki/Voice_user_interface
http://blog.csdn.net/zouxy09/article/details/7941585
http://en.wikipedia.org/wiki/Speech_recognition#Algorithms
http://en.wikipedia.org/wiki/Speech_perception