Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley
Lecturer: Jim Warren
Lecture 15
Based on Chapter 13: Speech and Hearing

Chapter 13: Speech and Hearing
The Human Perceptual System
– Hearing, speech, non-speech
Using Sound in Interaction Design
Technical Issues Concerning Sound

The Human Perceptual System: Hearing
– Our ears tell our eyes where to look
– People become habituated to continuous sounds

The Human Perceptual System: Hearing
– We can respond to audio input more quickly than we can to visual stimuli
– We can quickly locate the source of a sound, using two cues (see the sketch below):
  Interaural time difference (ITD)
  Interaural intensity difference (IID)
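As a rough, illustrative aside (not from the lecture), the classic Woodworth approximation estimates the ITD from the extra path a wavefront travels around the head; the sketch below assumes a spherical head of radius 8.75 cm and the speed of sound in air, so it is a back-of-the-envelope model rather than a psychoacoustic implementation.

import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference (seconds) for a source at the
    given azimuth (0 deg = straight ahead, 90 deg = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

# A source directly to one side reaches the nearer ear roughly 0.65 ms earlier.
print(round(woodworth_itd(90) * 1000, 2), "ms")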

The Human Perceptual System: Hearing
– Sound plays a vital role in our sense of connection to our environment (auditory presence)
– Immersive, realistic virtual auditory environments require high-quality sound
  To create a more realistic virtual auditory environment, the auditory signal must also be measured close to the user's eardrum, yielding head-related transfer functions (HRTFs)

The Human Perceptual System: Speech
– Speech is a significant part of our interaction with the world
Advantages of speech
– People gravitate to verbal modes of communication
– It is easier to speak than to write
Disadvantages of speech
– It requires the knowledge of a language
– It is more efficient to read than to listen

The Human Perceptual System: Speech
– We can speak faster than we can write
– We can read faster than we can listen
– The most efficient method of communication depends on the context

The Human Perceptual System: Nonspeech Sound
– We monitor our nonspeech auditory environment habitually and, to some degree, unconsciously
Advantages of nonspeech sound
– It informs us about the success of our actions
– It can be processed more quickly than speech
– It does not depend on the knowledge of a language

The Human Perceptual System: Nonspeech Sound
Disadvantages of nonspeech sound
– It can be ambiguous
– It must be learned
– It must be familiar
– It offers only limited discrimination (few distinct sounds can be told apart reliably)
– It is transitory
– It can become annoying

The Human Perceptual System
– We often judge the success of an action by auditory feedback
– Auditory stimuli are transitory
– Sound can be annoying or inappropriate

Using Sound in Interaction Design: The Role of Sound
– Redundant coding
  Reinforces visual feedback on actions
– Positive/negative feedback
  Maybe we should use sound to confirm success more, instead of always using it to indicate failure
– Speech applications
  E.g., voice annotations attached to a text document
– Nonspeech applications
  E.g., 'auditory icons'

Using Sound in Interaction Design
Auditory icons – Gaver applied the concept of ecological listening to the computer interface
– Recordings of everyday sounds
– Exploited analogies with real-world objects and events (see the sketch below)
  File types related to different materials
  File size related to volume or pitch
SonicFinder
– A redundant auditory layer that reinforced essential feedback about tasks
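Purely to illustrate Gaver's idea of mapping interface objects onto everyday sounds, here is a hypothetical Python sketch; the sample file names and mapping rules are invented for this example and are not SonicFinder's actual implementation.

# Hypothetical Gaver-style mappings: the sound's "material" stands for the
# file type and its playback pitch stands for the file size.
MATERIAL_FOR_TYPE = {
    "text": "paper_tap.wav",
    "application": "metal_clank.wav",
    "folder": "wood_knock.wav",
}

def auditory_icon(file_type, size_bytes):
    """Pick a sample and a playback pitch for a desktop object."""
    sample = MATERIAL_FOR_TYPE.get(file_type, "generic_thud.wav")
    # Bigger objects sound "heavier": scale pitch down as size grows.
    pitch_factor = max(0.5, 1.5 - size_bytes / 1_000_000)
    return {"sample": sample, "pitch": round(pitch_factor, 2)}

print(auditory_icon("text", 20_000))           # small text file: higher pitch
print(auditory_icon("application", 900_000))   # large application: lower pitch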

Using Sound in Interaction Design
Benefits of auditory icons
– They disperse some of the cognitive processing over multiple channels
– They allow users to interact simultaneously with screen objects and with objects beyond the view of the screen
  Or on other users' screens in a collaborative computing environment

Using Sound in Interaction Design
Concerns for auditory icons
– Learnability of the mapping between the icon and the object represented
  "Oink" and "bow wow" have high articulatory directness
  A swishing sound accompanying a paintbrush tool also has high articulatory directness
  A system beep carries no information about the error it represents

Using Sound in Interaction Design
Auditory icons – formal guidelines (Mynatt)
– Identifiability: the user must be able to recognize the sound's source. Familiar sounds will be more easily recognized and remembered.
– Conceptual mapping: how well does the sound map to the aspect of the user interface represented by the auditory icon?
– Physical parameters: the physical parameters of the sound, such as length, intensity, sound quality, and frequency range, can affect its usability. No one parameter should be allowed to dominate, or the user may infer unintended significance from it.

Using Sound in Interaction Design
Auditory icons – formal guidelines (Mynatt), continued
– User preference: how the user responds emotionally to the auditory icon is also important. Is the sound harsh, or too cute?
– Cohesion: the auditory icons used in an interface must also be evaluated as a cohesive set. Each auditory icon should be distinct; they should not sound too similar to each other.

Using Sound in Interaction Design
Auditory icons – procedural guidelines (Mynatt)
– Use sounds that are:
  Short
  Of wide frequency range
  Equal in length, intensity, and sound quality
– Use free-form questions to determine how easy it is to identify the sounds
– If it is not easy to identify the sounds, evaluate how easy it is to learn them

Using Sound in Interaction Design
Earcons (Blattner, Sumikawa, and Greenberg, 1989)
– "Nonverbal audio messages used in the user–computer interface to provide information to the user about some computer object, operation, or interaction"
– Short musical phrases that represent system objects or processes
– Involve musical listening

Using Sound in Interaction Design
Earcons (Blattner, Sumikawa, and Greenberg, 1989)
– Earcons can be used to:
  Reinforce icon family relationships
  Support menu hierarchies
  Support navigational structures

Using Sound in Interaction Design
Hierarchical earcons
– Each node in a hierarchy inherits the attributes of the previous level (see the sketch below)
[Figure: a hierarchical structure and the corresponding hierarchical earcons]
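A minimal sketch (an invented illustration, not from Heim or Blattner et al.) of the inheritance idea: each node keeps its parent's motif and appends one note of its own, so related items share an audible family resemblance. A real implementation would map the note names to pitches or timbres and play them through an audio or MIDI library.

FAMILY_NOTE = {"Error": "C4", "File": "E4", "NotFound": "G4"}

def earcon_for(path):
    """Each node inherits its parent's motif and appends its own note."""
    motif = []
    for part in path.split("/"):
        motif.append(FAMILY_NOTE.get(part, "rest"))
    return motif

print(earcon_for("Error"))                # ['C4']
print(earcon_for("Error/File"))           # ['C4', 'E4']
print(earcon_for("Error/File/NotFound"))  # ['C4', 'E4', 'G4']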

Using Sound in Interaction Design
Earcons versus auditory icons
– Earcons and auditory icons need not be mutually exclusive
– Consider the entire structure of an interface, and design its auditory layer with a consistent sound ecology

Using Sound in Interaction Design
Globalization and localization
– Both concrete (real-world) and abstract (musical) sounds involve cultural biases; musical sounds in particular are culturally biased

Technical Issues Concerning Sound
– Sound Waves
– Computer-Generated Sound
– Speech Recognition

Technical Issues Concerning Sound
Sound waves
– Sound is made up of waves and can be described in terms of frequency (pitch) and amplitude (volume, power, decibels [dB]), as well as waveform (timbre) (see the sketch below)
– The human ear can perceive sound in the range of 20 to 20,000 Hz (20 kHz)
  With age and noise exposure, though, we tend to lose the upper range
[Figure: example waveforms for 'oh', 'trumpet', 'beep', and 'buzz']
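As a small worked example of the decibel scale mentioned above (added here for illustration, not part of the original slide), sound pressure level is defined relative to a 20 µPa reference, roughly the threshold of hearing:

import math

P_REF = 20e-6  # reference sound pressure in pascals (approx. threshold of hearing)

def spl_db(pressure_pa):
    """Sound pressure level in dB relative to the 20 micropascal reference."""
    return 20 * math.log10(pressure_pa / P_REF)

print(round(spl_db(20e-6)))   # 0 dB   -> about the threshold of hearing
print(round(spl_db(0.02)))    # 60 dB  -> roughly conversational speech
print(round(spl_db(20.0)))    # 120 dB -> approaching the threshold of pain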

Technical Issues Concerning Sound
Computer-generated sound
– Synthesis
– Sampling
– MIDI
– Speech Generation
– Speech Recognition

Technical Issues Concerning Sound
Synthesis
– Digital signal generators use software to create sound waves
– Once the wave is generated, it can be processed to produce an almost unlimited range of sounds
– Frequency modulation (FM) synthesis (see the sketch below)
  The frequency of one sound wave (the modulator) affects the parameters of a second wave (the carrier)
  We can also apply filters, e.g., to allow only part of the frequency range
  However, it is difficult to imitate acoustic instruments convincingly
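To make the FM idea concrete, here is a minimal, self-contained Python sketch that writes one second of an FM tone to a WAV file; the carrier, modulator, and modulation-index values are arbitrary choices for illustration, not figures from the lecture.

import math
import struct
import wave

SAMPLE_RATE = 44_100
CARRIER_HZ, MODULATOR_HZ, MOD_INDEX = 440.0, 110.0, 3.0

def fm_sample(t):
    """y(t) = sin(2*pi*fc*t + I * sin(2*pi*fm*t)) -- the modulator bends the carrier."""
    return math.sin(2 * math.pi * CARRIER_HZ * t
                    + MOD_INDEX * math.sin(2 * math.pi * MODULATOR_HZ * t))

frames = b"".join(
    struct.pack("<h", int(32767 * 0.5 * fm_sample(n / SAMPLE_RATE)))
    for n in range(SAMPLE_RATE)  # one second of audio
)

with wave.open("fm_tone.wav", "wb") as wav_out:
    wav_out.setnchannels(1)           # mono
    wav_out.setsampwidth(2)           # 16-bit samples
    wav_out.setframerate(SAMPLE_RATE)
    wav_out.writeframes(frames)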

Technical Issues Concerning Sound
Sampling
– High-fidelity sounds can be obtained by using digital samples of actual instruments
– A sample is basically a snapshot of a sound wave at a certain point in time that captures its amplitude information
  The wave must be sampled at at least twice the rate of its highest frequency component (Nyquist–Shannon sampling theorem)
  CDs are sampled at a rate of 44.1 kHz, slightly greater than twice the 20 kHz upper limit of human hearing (see the arithmetic below)
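The storage cost of that sampling rate is easy to work out; the arithmetic below (an added illustration, not a slide from the lecture) shows why uncompressed CD-quality audio is bulky and why compact representations such as MIDI or compressed formats matter in interactive applications.

sample_rate_hz = 44_100   # CD sampling rate
bytes_per_sample = 2      # 16-bit samples
channels = 2              # stereo
seconds = 60

bytes_per_minute = sample_rate_hz * bytes_per_sample * channels * seconds
print(f"{bytes_per_minute / 1_000_000:.1f} MB per minute of uncompressed audio")
# -> about 10.6 MB per minute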

Technical Issues Concerning Sound
MIDI (Musical Instrument Digital Interface)
– MIDI files are analogous to the piano roll on a player piano
– A MIDI file contains information about pitch, duration, and intensity (see the sketch below)
– MIDI files contain no timbre information
– Small file sizes
– Playback depends on the sounds embedded in the target device
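The "piano roll" nature of MIDI is visible in its raw messages; the sketch below builds the three standard bytes of a note-on and note-off event (status, note number, velocity). Actually sending them to a synthesizer would require a MIDI library or device, which is deliberately left out here.

def note_on(channel, note, velocity):
    """Status byte 0x90 + channel, then note number (60 = middle C) and velocity."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel, note):
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, 0])

msg = note_on(channel=0, note=60, velocity=100)
print(msg.hex())  # '903c64' -- pitch and intensity are encoded, but no timbre:
                  # the receiving device decides which instrument sound to use.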

Technical Issues Concerning Sound
Speech generation
– Computers can generate synthetic speech
  A significant benefit to people with visual impairments
– Applications that convert text to verbal output are called text-to-speech (TTS) systems (see the sketch below)
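As one concrete (and hedged) illustration of how simple a desktop TTS call can be, the sketch below assumes the third-party pyttsx3 Python package and a platform speech engine are installed; other TTS APIs look different.

import pyttsx3  # assumed available: pip install pyttsx3

engine = pyttsx3.init()              # picks up the platform's default speech engine
engine.setProperty("rate", 160)      # approximate words per minute (pace)
engine.say("You have three new messages.")
engine.runAndWait()                  # block until the utterance has been spoken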

Technical Issues Concerning Sound
Speech generation
– TTS systems have been used for:
  Information access systems that facilitate remote access to databases
  Transactional systems that process customer orders
  Global positioning system–based mobile navigation systems that output driving directions
  Augmentative systems that aid disabled users
– The quality of speech generation is getting better (and computers can increasingly easily store digitised human speech), so speech emanating from computers is increasingly available to UI designers

Speech Synthesis Markup Language (SSML)
– Designed by the W3C to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications
– Example prompts, e.g.: "Take a deep breath, then continue." "Press 1 or wait for the tone." "I didn't hear you! Please repeat." (see the sketch below for possible markup)
– Tags for controlling pitch, pace, emphasis, etc.
– See the W3C SSML specification
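The sketch below shows, purely as an assumed illustration (not the slide's original markup), how prompts like those above might be tagged in SSML; the break, prosody, and emphasis elements are standard SSML 1.0, held here in a Python string for convenience.

ssml_prompt = """<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  Take a deep breath <break time="500ms"/> then continue.
  <prosody rate="slow">Press 1 or wait for the tone.</prosody>
  <emphasis level="strong">I didn't hear you!</emphasis> Please repeat.
</speak>"""

print(ssml_prompt)  # would be handed to an SSML-aware TTS engine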

Technical Issues Concerning Sound
Speech recognition
– Two distinct applications: transcription and transaction
– Automatic speech recognition (ASR) systems allow users to speak in real time, and this input is converted into text that is displayed on the screen (see the sketch below)
– E.g., Dragon Systems' NaturallySpeaking® and IBM's ViaVoice®
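For a flavour of transcription in practice, here is a hedged sketch using the third-party SpeechRecognition package; the audio file name is invented, and the Google Web Speech backend shown is just one of several the library supports.

import speech_recognition as sr  # assumed available: pip install SpeechRecognition

recognizer = sr.Recognizer()
with sr.AudioFile("meeting_excerpt.wav") as source:   # hypothetical recording
    audio = recognizer.record(source)                 # read the whole file

try:
    print(recognizer.recognize_google(audio))         # best-guess transcript
except sr.UnknownValueError:
    print("Speech was unintelligible")                # ASR remains error prone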

Speech recognition
– Full vocabulary speech recognition is still hard
  Requires system to be trained to user
  Requires user to speak distinctly
  Best with optimal mic placement
– Limited vocabulary is easier
  E.g., for limited-domain commands
  Can understand unrestricted speakers for simple domains like digits

Technical Issues Concerning Sound
Speech recognition concerns
– Speech can interfere with problem-solving activities
– Verbal input can be inappropriate in certain situations

Speech and human-to-human interaction
– Speech is a natural and easy channel for communication between humans, so it suits software whose function is to support collaboration
  Synchronous – Second Life, some networked action multiplayer video games
  Asynchronous – 'voice mail' or voice annotations
– Problem – unlike text, speech cannot easily be searched later

Technical Issues Concerning Sound
Searching speech
– Speech files do not afford easy opportunities for indexing and searching
– ASR systems can be used to transcribe speech files and create transcripts that can be searched like any other text file (see the sketch below)
  But this is highly error prone – interactive correction, as well as individual training, are key aspects of successful ASR
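A toy illustration (invented for this point, not from the lecture) of why transcripts make speech searchable: once ASR has produced text, an ordinary inverted index applies, and any recognition errors propagate straight into that index.

from collections import defaultdict

transcripts = {                      # imagined output from an ASR pass
    "msg_001.wav": "please review the budget figures before friday",
    "msg_002.wav": "the review meeting has moved to monday morning",
}

index = defaultdict(set)             # word -> set of audio files containing it
for audio_file, text in transcripts.items():
    for word in text.split():
        index[word].add(audio_file)

print(sorted(index["review"]))       # ['msg_001.wav', 'msg_002.wav']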

Technical Issues Concerning Sound
Multimedia indexing
– Large collections of multimedia documents are being created in domains as diverse as medicine, entertainment, and education
– Archiving speech according to content can be difficult: the system must not only recognize the meaning of spoken language, it must also create relationships according to content

Technical Issues Concerning Sound
– Multimedia has become a common element in contemporary computing environments; however, we have only begun to understand how to take advantage of its potential

Summary
– Hearing is an important human sensory channel providing context and feedback
– Non-speech sounds can be used like icons to create an auditory language for HCI
– Speech processing – generation and recognition – is just beginning to enrich our user interfaces