CS 445/656 Computer & New Media


Audio, Speech and Music

Topics for Monday & Wednesday
- General Audio
- Speech
- Music
- Music management support

General Audio
- Mapping audio cues to events: recognizing sounds related to particular events (e.g. gunshot, falling, scream)
- Mapping events to audio cues: e.g. an audio debugger that speeds up stepping through code
- Spatialized audio: provides an additional geographic/navigational channel

Background Audio in Games
Immersion
- Most successful computer games have one important element in common: the ability to draw players in
- A sense of being "in the game", where thoughts, attention and goals are all focused on the game
Background audio
- All the sound in a game, including music and sound effects
- Communicates aspects of the narrative, conveys emotion, and enriches the experience

Background Audio in Games
How can audio immersion be measured?
- Immersion questionnaires
- Psychological instruments
- Behavior during gameplay
- Functional Magnetic Resonance Imaging (fMRI)

Lair of Beowulf
The user navigates a mostly-sound world consisting of a number of caves, each with a certain theme.

DigiWall
A computer game interface in the form of a climbing wall
https://www.youtube.com/watch?v=mPkp8ziM34M
In both games, audio is used:
- To create a sense of presence
- To communicate instructions, cues, clues, feedback, and results from the game
- To blur the borders between the virtual reality of the game and the physical reality of the player

Ambience & Sound Effects
- Ambient sounds can be strong carriers of emotion and mood
  - In Beowulf, air flows softly through the game world
  - In DigiWall, ambience sets the basic mood and encourages physical activity
- Sound effects provide cues and clues
  - Natural sounds are used to warn, draw attention, and indicate direction
https://www.youtube.com/watch?v=LgTTMsj-K38

Spatialized Audio
The projection and localization of sound sources in physical or virtual space, or a sound's movement through that space.
- Beamforming: timing signals so that constructive superimposition creates a stronger signal at the desired location
- Crosstalk cancellation: destructive interference removes parts of the signal at the desired location
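A minimal sketch of the delay-and-sum idea behind beamforming (not from the slides): each speaker is delayed so that all wavefronts arrive at the target point at the same time, interfering constructively there. The speaker layout, target point, and 343 m/s speed of sound are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, an assumed room-temperature value

def focusing_delays(speaker_positions, target, c=SPEED_OF_SOUND):
    """Per-speaker delays (seconds) so every wavefront arrives at `target`
    simultaneously, producing constructive interference at that point."""
    distances = [math.dist(p, target) for p in speaker_positions]
    farthest = max(distances)
    # Delay the nearer speakers so their sound arrives with the farthest one.
    return [(farthest - d) / c for d in distances]

# Three speakers on a line, focusing on a point in front of the middle one:
speakers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
delays = focusing_delays(speakers, (1.0, 2.0))
```

The nearest speaker gets the largest delay and the farthest gets none; crosstalk cancellation works the same way but inverts part of the signal to cancel at the target instead.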

Head-Related Transfer Function (HRTF)
- Describes the transformation of a sound from the free field to the ear
- Differences in timing and signal strength between the two ears determine how we identify the position of a sound
- The impulse response from the source to the eardrum is called the Head-Related Impulse Response (HRIR); its Fourier transform H(f) is the HRTF
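The timing difference mentioned above is the interaural time difference (ITD). A common simplified spherical-head approximation, Woodworth's formula, estimates it as ITD = (r/c)(θ + sin θ); this model and the 8.75 cm head radius are standard textbook assumptions, not values from the slides.

```python
import math

def itd_woodworth(azimuth_deg, head_radius=0.0875, c=343.0):
    """Approximate interaural time difference (seconds) for a source at the
    given azimuth, using Woodworth's spherical-head formula."""
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))
```

For a source straight ahead the ITD is zero; at 90 degrees it is roughly 0.65 ms, which is the cue range the auditory system uses for lateral localization.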

Audio Signal Analysis
- The Fast Fourier Transform (FFT) and the Discrete Wavelet Transform (DWT) are commonly used on audio signals
- Both transform the view of the signal from the time domain to the frequency domain
- They allow analysis of frequency features across time (e.g. the power contained in a frequency interval)
- FFTs use equal-sized windows, whereas wavelet windows can vary with frequency
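To make the time-to-frequency view concrete, here is a naive discrete Fourier transform over one window (an FFT computes exactly the same spectrum, just faster); the 64-sample window and the 8-cycle sine are toy values chosen for illustration.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Magnitude spectrum of one window via the naive DFT definition."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]  # bins up to the Nyquist frequency

# A 64-sample window of a sine completing 8 cycles per window:
n = 64
signal = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
spectrum = dft_magnitudes(signal)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

The energy concentrates in bin 8, matching the sine's frequency; summing squared magnitudes over a range of bins gives the power in that frequency interval.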

Audio Signal Analysis
Mel-frequency cepstral coefficients (MFCCs)
- Based on FFTs
- Map FFT results into bands approximating the human auditory system
- The mel scale and log amplitude are natural choices because they relate to how we perceive sounds
- MFCCs are commonly used as features in speech recognition systems, such as systems that automatically recognize numbers spoken into a telephone
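A small sketch of the mel-scale mapping that underlies MFCC band placement, using the common HTK-style formula mel = 2595 log10(1 + f/700); the 0-8000 Hz range and band count below are illustrative choices.

```python
import math

def hz_to_mel(f):
    """HTK-style mel scale: roughly equal steps in perceived pitch."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_filter_centers(f_lo, f_hi, n_bands):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    def mel_to_hz(m):
        return 700.0 * (10 ** (m / 2595.0) - 1.0)
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    return [mel_to_hz(m_lo + i * (m_hi - m_lo) / (n_bands + 1))
            for i in range(1, n_bands + 1)]

centers = mel_filter_centers(0.0, 8000.0, 10)
```

Equal mel spacing yields filter bands that are narrow at low frequencies and wide at high frequencies, mirroring the resolution of the human auditory system.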

Echology
- An interactive soundscape combining human collaboration with aquarium activity
- Engages visitors to spend more time with (and learn more about) beluga whales
- The motion of each layer controls one channel of sound
- Sound is spatialized based on whale activity and human interaction
http://www.vanaqua.org/learn/see-and-learn/live-cams/beluga-cam

Echology
- Uses spatial sound as the core expressive component that participants interact with
- Octophonic spatial sound lets participants experience the movement of sound in a plane formed above their heads
- 8 buttons represent reflection points at the edges of the 8 loudspeakers
- The movement of beluga whales across a layer controls amplitude and triggers sounds

Echology: Interaction
- 4 full circles represent the location and amplitude of a layer of sound; each circle fades in and out as the belugas' level of activity increases or decreases
- 8 blue Pac-Man-shaped circles represent the reflection points and the current reflection angle of each speaker
- By hitting a button, a participant changes the direction of the reflection angle; by default, each points to its adjacent speaker

Echology Architecture

Speech
- Speaker segmentation: identify when a change in speaker occurs; useful for basic indexing or summarization of speech content
- Speaker identification: identify who is speaking during a segment; enables search (and other features) based on speaker
- Speech recognition: identify the content of the speech

Speaker Segmentation
Speaker diarisation: partitioning an input audio stream into homogeneous segments according to speaker identity
- Bottom-up clustering: split the full audio into many small clusters, then progressively merge redundant clusters until each one corresponds to a real speaker
- Top-down clustering: start with a single cluster and split until the number of clusters equals the number of speakers
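The bottom-up strategy can be sketched as simple agglomerative clustering. This toy version uses 2-D points with Euclidean centroid distance standing in for per-segment speaker embeddings; real diarisation systems use richer segment models and distances (e.g. BIC or GMM-based), and the threshold here is an arbitrary illustrative value.

```python
import math

def merge_closest(clusters, threshold):
    """One bottom-up step: merge the two clusters with the closest centroids
    if that distance is below `threshold`; return True if a merge happened."""
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            ci = [sum(x) / len(clusters[i]) for x in zip(*clusters[i])]
            cj = [sum(x) / len(clusters[j]) for x in zip(*clusters[j])]
            d = math.dist(ci, cj)
            if best is None or d < best[0]:
                best = (d, i, j)
    if best and best[0] < threshold:
        _, i, j = best
        clusters[i] = clusters[i] + clusters.pop(j)
        return True
    return False

def bottom_up_diarise(segments, threshold):
    """Start with one cluster per segment; merge until no pair is close enough.
    Each surviving cluster should correspond to one speaker."""
    clusters = [[s] for s in segments]
    while merge_closest(clusters, threshold):
        pass
    return clusters

# Five segment "embeddings" from two well-separated speakers:
segments = [(0.0, 0.0), (0.5, 0.1), (10.0, 10.0), (10.2, 9.9), (0.2, 0.3)]
speakers = bottom_up_diarise(segments, threshold=2.0)
```

Top-down clustering inverts the loop: it would start from one cluster holding all segments and recursively split until the cluster count matches the speaker count.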

Speaker Segmentation
Open-source speaker diarisation software:
- ALIZE speaker diarization
- SpkDiarization
- Audioseg
- SHoUT

Speech Recognition
- Start by segmenting utterances and characterizing phonemes, using gaps to segment
- Group segments into words
- Classifiers for a limited vocabulary (HMMs), trained with Baum-Welch re-estimation and decoded with the Viterbi algorithm
- Continuous speech requires language models for disambiguation
- Systems may be speaker-dependent or speaker-independent
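The Viterbi decoding step above can be illustrated on a toy HMM. The two states ("sil" for silence, "speech"), the low/high energy observations, and all probabilities below are invented for illustration; a real recognizer runs the same dynamic program over phoneme states and acoustic feature vectors.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for an observation sequence."""
    # V[t][s] = (best probability of a path ending in state s at time t,
    #            backpointer to the previous state on that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, prev)
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ("sil", "speech")
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.2, "speech": 0.8}}
emit_p = {"sil": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.2, "high": 0.8}}
path = viterbi(["low", "low", "high", "high"],
               states, start_p, trans_p, emit_p)
```

Baum-Welch re-estimation would learn `trans_p` and `emit_p` from unlabeled audio; Viterbi then recovers the best state sequence given those trained parameters.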