Voice Recognition Lawrence Pan Syen Hassan Jamme Tan.


Overview
- History of voice recognition
- Why voice recognition?
- Technology behind voice recognition
  - Five major steps
- Common applications
- Current leaders
  - Demonstrations
  - Product evaluation
- Implementation of our own voice recognition system
  - Grade retrieval system for EE3414
- Future challenges

History of Voice Recognition
- Radio Rex (house-trained dog), 1922
- U.S. Department of Defense, 1940’s
  - Speech Understanding Research (SUR) program
    - Carnegie Mellon University & MIT
  - Automatic interception & translation of Russian radio transmissions (FAILURE)
    - Original message: “the spirit is willing but the flesh is weak”
    - Translated message: “the vodka is strong but the meat is disgusting”

History Cont’d
- First major achievements
  - Bell Laboratories, 1952
    - Successful recognition of numbers 0 to 9, spoken over telephone
  - MIT, 1959
    - Successful recognition of vowels with 93% accuracy
  - Carnegie Mellon University, 1970’s
    - HARPY system: capable of recognizing complete sentences

History Cont’d
- Obstacles
  - Computing power: over 50 computers needed for the HARPY system to perform
  - Ability to recognize speech from any person
    - Taking into account different accents, speech tones, etc.
  - Ability to recognize continuous speech
    - so…we…do…not…have…to…speak…like…this!
- Commercialization of voice recognition systems

History Cont’d
[Figure: computation required vs. computation available in processors over time]
[Figure: accuracy and task complexity progress over time]

Why Voice Recognition?
- Convenience
  - Natural user interface: human speech
  - Improved services for the disabled
  - Wider range of users
- Future possibilities and improvements
  - Internet use over phones through voice portals
  - Advanced applications implementing voice control in all areas

Technology behind Voice Recognition
[Figure: five major steps used by the speech recognizer]

Five major steps in voice recognition
- Capture and Digitization
  - System interacts with the telephony device to capture voice input at 8000 samples/sec
- Spectral Representation
  - Voice samples are converted to a graphical representation
- Segmentation
  - Speech signals are broken down into segments
  - Improves accuracy
  - Reduces computation: it is impossible to process the entire signal in real time
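
The capture, spectral representation, and segmentation steps above can be sketched roughly as follows. This is only an illustration, not the method of any particular recognizer: the 8000 samples/sec rate comes from the slide, while the 20 ms frame length, the function names, and the synthetic 1000 Hz test tone are assumptions chosen for the example.

```python
import cmath
import math

SAMPLE_RATE = 8000                           # samples/sec, as stated on the slide
FRAME_MS = 20                                # assumed frame length, not from the slides
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000   # 160 samples per frame


def segment(samples, frame_len=FRAME_LEN):
    """Segmentation: split the signal into consecutive, non-overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def magnitude_spectrum(frame):
    """Spectral representation: naive DFT magnitudes for bins 0..N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def dominant_frequency(frame):
    """Frequency in Hz of the strongest bin in one frame."""
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * SAMPLE_RATE / len(frame)


# Stand-in for captured input: 0.1 s of a 1000 Hz tone (assumed test signal).
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]
frames = segment(tone)
```

Working on one short frame at a time is what keeps the computation bounded; a production system would use an FFT and overlapping, windowed frames rather than this naive DFT.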

Graphical Representations

Acoustic Model
- Phoneme: smallest phonetic unit in a language
  - Creates a distinction between words, e.g. b in boy and t in toy
- Allophone: different pronunciations of a phoneme/letter
  - E.g. t in tab, t in stab, tt in stutter
- Database (lexicon) of all words known to the system for a language
  - Should contain several recordings for certain words
    - E.g. “the” can be pronounced “duh” or “dee”

Acoustic Model Cont’d
- Trellis
  - Data structure made up of all possible combinations of allophones
- Training of acoustic models
  - For single-user systems
    - Text is read by the user and recognized by the system
  - For multi-user systems
    - Utterances spoken by many users are compiled into a database, then input into a recognizer
    - Weights are put on certain allophones
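
As a rough sketch of the trellis idea, the snippet below enumerates every combination of allophones for a short phoneme sequence. The allophone labels and the `ALLOPHONES` table are made up for illustration; a real recognizer would store these paths compactly and attach trained weights rather than list them all.

```python
from itertools import product

# Hypothetical allophone inventory: each phoneme maps to its spoken variants.
ALLOPHONES = {
    "t": ["t_aspirated", "t_unaspirated", "t_flap"],
    "ae": ["ae"],
    "b": ["b"],
}


def trellis_paths(phonemes):
    """Return every allophone sequence that could realize the phoneme sequence."""
    return [list(path) for path in product(*(ALLOPHONES[p] for p in phonemes))]


# The word "tab" as the phoneme sequence t-ae-b (assumed transcription).
paths = trellis_paths(["t", "ae", "b"])
```

The number of paths is the product of the variant counts per phoneme, which is why weighting and pruning matter for longer utterances.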

Language Model
- Languages have structures (i.e. grammar)
  - Difference between two words can be difficult to understand
  - Can be distinguished using context
    - E.g. “ours” and “hours” can be determined if the previous word is “two”
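
The "ours" versus "hours" example can be illustrated with a tiny bigram lookup. This is a sketch under stated assumptions: the counts in `BIGRAM_COUNTS` are invented for the example, and `pick_word` is a hypothetical helper, not part of any recognizer described in the slides.

```python
# Hypothetical counts of (previous word, word) pairs from some training text.
BIGRAM_COUNTS = {
    ("two", "hours"): 50,
    ("two", "ours"): 1,
    ("is", "ours"): 30,
    ("is", "hours"): 2,
}


def pick_word(previous_word, candidates):
    """Choose the candidate seen most often after previous_word."""
    return max(candidates,
               key=lambda word: BIGRAM_COUNTS.get((previous_word, word), 0))


after_two = pick_word("two", ["ours", "hours"])
after_is = pick_word("is", ["ours", "hours"])
```

With these made-up counts, the same pair of sound-alike candidates resolves differently depending on the preceding word, which is the role the slide assigns to context.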

Common Applications
- Call center automation
  - Widely used in all industries (consumer interface)
    - Airline companies: booking flights, general info, etc.
    - Banking companies: “pay by phone”, account balances, etc.
    - Delivery services (FedEx): tracking orders, etc.
    - All general customer service systems
- Computer integration of voice recognition
  - Personal computers
    - Speech-to-text dictation
    - Accessibility purposes: voice control of computers

Common Applications Cont’d
- Integrated into automobiles:
  - Visteon Voice Technology™ used in Infiniti Q45
  - Controls:
    - Climate
    - CD player
    - Navigation system

Competing Standards
- VoiceXML (Voice Extensible Markup Language)
  - Partners: AT&T, IBM, Motorola, Lucent Tech.
  - Used in implementation of most voice portals
  - Shifting target toward web developers
- SALT (Speech Application Language Tags)
  - Partners: Microsoft, Intel, Cisco, SpeechWorks
  - Targeted toward web developers

Current Leaders
- Dragon Systems:
  - NaturallySpeaking: PC-based user-side programs for automatic speech recognition (ASR)
  - Automotive, telephony, mobile, games, embedded chips
- SpeechWorks: connects users to industry voice portals
  - AOLbyPhone, FedEx, E*Trade, etc.
- BeVocal: provides voice portals for BellSouth, etc.
- TellMe: provides voice portals for AT&T, Merrill Lynch, etc.
- Philips Speech Recognition
  - Services automotive, mobile device, and consumer electronics industries
- IBM ViaVoice, MS Agent

Demonstrations
- SpeechWorks™ product line
  - United Airlines' toll-free flight information line (demo)
  - BankWorks Automated Bill Payment (demo)
  - FedEx Rate Finder (demo)
  - E*Trade Stock (demo)
  - AOLbyPhone service (demo)
- BeVocal solutions

Magical Merlin’s Grade Retrieval System
Designed in Visual Basic using Microsoft’s MSAgent

Menu              Recognized voice commands
First Exam        First Exam, First Test, First Midterm
Second Exam       Second Exam, Second Test, Second Midterm
Quiz Grades       Quiz Grades, Grade on Quizzes
Homework Grades   Homework Grades, Grade on Homework
Project Grade     Project Grade, Grade on Project
Final Grade       Final Grade, Grade for course
Main Menu         Main menu, Main, Class

Click on my belly for a short demonstration
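
The mapping from recognized phrases to menu items can be sketched as a simple lookup. The original system was built in Visual Basic with MSAgent; the Python below is only an illustrative stand-in, and the names `COMMANDS`, `LOOKUP`, and `menu_for` are hypothetical. The phrases themselves are taken from the table above.

```python
# Menu item -> phrases the system accepts for it (from the slide's table).
COMMANDS = {
    "First Exam": ["First Exam", "First Test", "First Midterm"],
    "Second Exam": ["Second Exam", "Second Test", "Second Midterm"],
    "Quiz Grades": ["Quiz Grades", "Grade on Quizzes"],
    "Homework Grades": ["Homework Grades", "Grade on Homework"],
    "Project Grade": ["Project Grade", "Grade on Project"],
    "Final Grade": ["Final Grade", "Grade for course"],
    "Main Menu": ["Main menu", "Main", "Class"],
}

# Invert the table once so each lowercase phrase points at its menu item.
LOOKUP = {phrase.lower(): menu
          for menu, phrases in COMMANDS.items()
          for phrase in phrases}


def menu_for(utterance):
    """Return the menu item for a recognized utterance, or None if unknown."""
    return LOOKUP.get(utterance.strip().lower())
```

For example, `menu_for("First Midterm")` returns `"First Exam"`, while an unlisted phrase returns `None`. Restricting recognition to a small, fixed phrase list like this keeps the recognizer's task simple.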

Future Challenges
- Speech technology
- VoiceXML vs. SALT
- Voice-enabling web content
- Real-time access to source data
  - Stock market, traffic, sports, etc.
- Clear connection needed for effective use of voice portals
- Security issues involved
- Advertising-based revenue
