ISTD 2003, Interactive Systems Technical Design
Seminar work: Audio / Speech
Ville-Mikko Rautio, Timo Salminen, Vesa Hyvönen


Introduction
Hearing is one of the basic senses humans use to gather information about their surroundings. Using audio and speech as an alternative input and output method can therefore contribute a great deal to the user experience of interactive systems and make it more natural.

Motivation
When building interactive systems, the user interface should behave in line with the expectations users have formed from real-world experience. Today's user interfaces are mainly based on keyboard and screen, and feedback from the system is given almost exclusively in visual form. Computer-based systems can achieve a much better user experience by also conveying information through other basic senses such as hearing, taste, touch and smell.

Implementation
Basically two components: audio playback and speech/audio recognition.
Design issues:
- Audio can be speech or non-speech
- Who are you designing for?
  - Different users, different abilities
  - Blind, old and disabled people
  - Human diversity: physical, perceptual, cultural and intellectual differences
- Mobile computing
  - Limited input, limited output, slow processor, small memory, limited battery life, slow network connection
- Communication protocol
- Speech recognition causes major problems
  - Accuracy
  - Usage in critical systems?

Applications
MIT Media Lab: Nomadic Radio (Wearable Audio Computing)
- A client-server based messaging infrastructure
- Utilizes spatialized audio, speech synthesis and recognition
- Delivers hourly news broadcasts, voice mail, calendar reminders, weather forecasts and stock reports
HP Labs: SpeechBot
- A search engine for audio and video content that is hosted and played from other websites
- Uses speech recognition

Nomadic Radio Network Architecture [figure]

Strengths / Advantages
- Data input possible without a keyboard
  - Mobile devices
- Excellent for hands-busy or eyes-busy situations

Strengths / Advantages (continued)
- People with visual or other disabilities
- A natural way for humans to interface with the environment
- Increases the bandwidth of communication
- Devices with limited screens need an additional output method
- Technology is available now

Limitations / Weaknesses
- Input is error-prone, especially in noisy environments
- Vocabulary size in recognition: controlling objects and things is limited
- A communication protocol is needed
  - "Computer! Shut down the lights!"
  - Can lead to an unnatural experience
- How to tell the user what the communication protocol is like:
  - Explicit: tell the user exactly what to say ("Welcome to the library, say "XXX" to...")
  - Implicit: open-ended, potential for errors ("Welcome to the library, what would you like to do...")
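The explicit style of communication protocol can be illustrated with a small sketch. This is only an illustration under stated assumptions: the command words, the action names and the function names are hypothetical and do not come from any system mentioned in these slides, and the "utterance" is assumed to be text already produced by some speech recogniser.

```python
# Hypothetical command vocabulary for a library service.
# The words and action names are illustrative assumptions only.
COMMANDS = {
    "search": "open_search",
    "renew": "renew_loans",
    "hours": "read_opening_hours",
}


def explicit_prompt(commands):
    """Build a prompt that tells the user exactly what can be said."""
    options = ", ".join(f'"{word}"' for word in commands)
    return f"Welcome to the library. Say one of: {options}."


def interpret(utterance, commands):
    """Map recognised text to an action, or None if it is outside the vocabulary."""
    word = utterance.strip().lower()
    return commands.get(word)


def respond(utterance, commands=COMMANDS):
    """Confirm a known command, or repeat the explicit prompt on failure."""
    action = interpret(utterance, commands)
    if action is None:
        return "Sorry, I did not understand. " + explicit_prompt(commands)
    return f"OK: {action}"
```

With an explicit protocol the vocabulary stays small, so anything outside it can simply be rejected and the prompt repeated. An implicit, open-ended prompt would need far more than a lookup table, which is where the potential for errors comes from.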

Limitations / Weaknesses (continued)
- Speech output sounds unnatural
- Asymmetry
  - Speech input is faster than typing
  - Speech output is slower than reading
- Feedback and latency
  - The user needs to know whether recognition was successful
  - Is the system processing data or waiting for input?
  - Time taken to recognise an utterance
  - Pauses
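One way to address the feedback questions above is to make the system state explicit and give each state its own cue. The sketch below is a minimal illustration, not a description of any particular product: the state names and the feedback messages are assumptions chosen for the example.

```python
from enum import Enum


class State(Enum):
    """Hypothetical states of a speech interface (names are assumptions)."""
    WAITING = "waiting for input"
    PROCESSING = "processing"
    RECOGNISED = "recognised"
    REJECTED = "not recognised"


def feedback(state, heard=None):
    """Return the cue to present to the user for a given state."""
    if state is State.WAITING:
        return "Listening..."
    if state is State.PROCESSING:
        return "Working on it..."
    if state is State.RECOGNISED:
        # Echo what was understood so the user can confirm success.
        return f'Heard: "{heard}"'
    return "Sorry, please repeat."
```

Giving a distinct cue while the system is processing lets the user tell a slow recogniser apart from one that is still waiting for input.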

Selected Industrial Players
IBM
- Conversational Biometrics: combines multiple verification sources such as voice biometrics with spoken knowledge
- Embedded ViaVoice: IBM speech technology for mobile devices
  - Command and control (C&C)
  - Text-to-Speech (TTS)
Sony
- SDR-4X: prototype of an entertainment robot using multi-modal human interaction technology
  - Individual person detection by the tone of voice
  - Continuous speech recognition and unknown vocabulary acquisition
  - Speech synthesis and singing voice production

SDR-4X [figure]

Selected International Research Groups and Projects
The MBROLA Project
- Develops a speech synthesis engine for many different languages
- Speech engine core freely available
Stanford University: Interactive Workspaces
- Goal is to create an interactive space where people can work collaboratively using natural gestures
Speech Interface Group, MIT Media Laboratory
- Major player, numerous projects
- Example: Nomadic Radio (Wearable Audio Computing)

Selected International Research Groups and Projects (continued)
MIT, Project Oxygen
- Pervasive, human-centered computing
- Integrated software system that will reside in the public domain
- Speech and vision provide the main modes of interaction in Oxygen
- Multilingual systems support dialog among participants speaking different languages
- The SpeechBuilder utility supports development of spoken interfaces

Selected Finnish Research Groups and Projects
VTT, Interactive Intelligent Electronics (IIE)
- User interface technologies for future home environments, The Smart-Its Project, Beyond the GUI, …
Helsinki University of Technology, Neural Network Research Centre
- Adaptive Natural Language Processing
Tampere University of Technology, Speech-based and Pervasive Interaction Group
- USIX-Interact, Dumas, Mobile User Interfaces, …

Companies and Research Groups in Oulu
MediaTeam Oulu, Language and Audio Technology
- CBIR: Content Based Information Retrieval
- Filling of the Semantic Gap in the Retrieval of Audio and Video Recordings
- Multiparametric prosodic analysis of phonetic and phonological correlates of emotions
Vikings

Future Developments
- Multimodality
- Multilingual, natural speech interaction
- Emotional state
- Biometrics