
1 CS 445/656 Computer & New Media
Audio, Speech and Music

2 Topics for Monday & Wednesday
General Audio
Speech
Music
Music management support

3 General Audio
Mapping audio cues to events
Recognizing sounds related to particular events (e.g. gunshot, falling, scream)
Mapping events to audio cues
Audio debugger to speed up stepping through code (see the sketch below)
Spatialized audio
Provides an additional geographic/navigational channel
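As a quick illustration of mapping events to audio cues (an assumption for illustration, not something from the slides), the sketch below assigns each event type a short sine tone and renders a stream of events to a WAV file. The event names and frequencies are invented.

# Hypothetical sketch: map program/debugger events to short audio cues.
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 44100

# Each event type gets a distinct pitch so it can be recognized by ear
# (event names and frequencies are illustrative assumptions).
EVENT_TONES = {
    "breakpoint_hit": 440.0,   # A4
    "exception":      220.0,   # low warning tone
    "loop_iteration": 880.0,   # short high tick
}

def tone(freq_hz, duration_s=0.15, amplitude=0.3):
    # Generate a short sine-wave cue for one event.
    t = np.linspace(0.0, duration_s, int(SAMPLE_RATE * duration_s), endpoint=False)
    return amplitude * np.sin(2.0 * np.pi * freq_hz * t)

def render_event_stream(events):
    # Concatenate the cues for a sequence of events into one audio buffer.
    cues = [tone(EVENT_TONES[e]) for e in events if e in EVENT_TONES]
    return np.concatenate(cues) if cues else np.zeros(1)

audio = render_event_stream(["loop_iteration", "loop_iteration", "breakpoint_hit"])
wavfile.write("event_cues.wav", SAMPLE_RATE, (audio * 32767).astype(np.int16))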

4 Background Audio in Games
Immersion
Most successful computer games have one important element in common: the ability to draw players in
A sense of being "in the game", where thoughts, attention and goals are all focused on the game
Background audio
All of the sound, including music and sound effects
Communicates aspects of the narrative, conveys emotion, and enriches the experience

5 Background Audio in Games
How to measure audio immersion?
Immersion questionnaire
Psychological instruments
Behavior during gameplay
Functional Magnetic Resonance Imaging (fMRI)

6 Lair of Beowulf
The user should be able to navigate a mostly sound-based world consisting of a number of caves, each with a certain theme

7 DigiWall
A computer game interface in the form of a climbing wall
In both games, audio is used:
In ways that create a sense of presence
To communicate instructions, cues, clues, feedback and results from the game
To blur the borders between the virtual reality of the game and the physical reality of the player

8 Ambience & Sound Effects
Ambient sounds can be strong carriers of emotion and mood
In Beowulf, air softly flowing through the game world
In DigiWall, used to set the basic mood and encourage physical activity
Sound effects for cues and clues
Natural sounds to warn, draw attention, and give direction

9 Spatialized Audio
The projection and localization of sound sources in physical or virtual space, or a sound's spatial movement through that space
Beamforming
Timing signals for constructive interference (constructive superimposition) to create a stronger signal at a desired location (see the sketch below)
Crosstalk cancellation
Destructive interference to remove parts of the signal at a desired location
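A minimal delay-and-sum beamforming sketch, assuming a small microphone array and a far-field (plane-wave) source; the function names and geometry are illustrative, not any particular system's implementation.

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(signals, mic_positions, target_direction, sample_rate):
    # signals: (n_mics, n_samples) time-aligned recordings.
    # mic_positions: (n_mics, 3) coordinates in meters.
    # target_direction: unit vector from the array toward the desired source.
    n_mics, n_samples = signals.shape
    # A plane wave from target_direction reaches mics farther along that
    # direction earlier; delay each channel so the wavefronts line up.
    arrival = -(mic_positions @ np.asarray(target_direction)) / SPEED_OF_SOUND
    delays = arrival.max() - arrival          # non-negative per-channel delays
    out = np.zeros(n_samples)
    for channel, d in zip(signals, delays):
        shift = int(round(d * sample_rate))
        out[shift:] += channel[:n_samples - shift]
    # Aligned channels add constructively; off-target sounds partially cancel.
    return out / n_mics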

10 Head-Related Transfer Function (HRTF)
Describes the transformation of a sound from the free field to the ear
Differences in timing and signal strength between the two ears determine how we identify the position of a sound
The impulse response from the source to the eardrum is called the Head-Related Impulse Response (HRIR), and its Fourier transform H(f) is called the HRTF
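A minimal sketch of HRTF-based spatialization, assuming a measured HRIR pair is available (the arrays passed in are placeholders): the mono source is convolved with the left and right HRIRs to produce binaural audio.

import numpy as np

def apply_hrir(mono_signal, hrir_left, hrir_right):
    # Convolve one mono signal with the left/right HRIRs to get a binaural pair.
    left = np.convolve(mono_signal, hrir_left)
    right = np.convolve(mono_signal, hrir_right)
    n = max(len(left), len(right))
    stereo = np.zeros((n, 2))
    stereo[:len(left), 0] = left
    stereo[:len(right), 1] = right
    return stereo

# The HRTF itself is just the Fourier transform of the HRIR, e.g.:
# H_left = np.fft.rfft(hrir_left)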

11 Audio Signal Analysis
Fast Fourier Transform (FFT) and Discrete Wavelet Transform (DWT)
Transforms commonly used on audio signals
Transform the view of the signal from the time domain to the frequency domain
Allow for analysis of frequency features across time (e.g. power contained in a frequency interval); see the sketch below
FFTs use equal-sized windows, whereas wavelet windows can vary with frequency
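A small sketch of FFT-based analysis with equal-sized windows, computing the power contained in a frequency interval over time. The parameter choices (25 ms windows, 10 ms hops) are assumptions, not values from the slides.

import numpy as np
from scipy import signal as sps

def band_power_over_time(audio, sample_rate, f_lo, f_hi, window_s=0.025, hop_s=0.010):
    # Power contained in [f_lo, f_hi] Hz for each short-time analysis frame.
    nper = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    freqs, times, stft = sps.stft(audio, fs=sample_rate, nperseg=nper,
                                  noverlap=nper - hop)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return times, np.sum(np.abs(stft[band, :]) ** 2, axis=0)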

12 Audio Signal Analysis
Mel-frequency cepstral coefficients (MFCC)
Based on FFTs
Map the results into bands approximating the human auditory system
Natural to use the mel scale and log amplitude since they relate to how we perceive sounds
MFCCs are commonly used as features in speech recognition systems, such as systems that automatically recognize numbers spoken into a telephone
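A minimal MFCC extraction sketch, assuming the librosa library is available; the file name and parameter values are placeholders.

import librosa

y, sr = librosa.load("utterance.wav", sr=16000)        # load and resample the audio
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)     # shape (13, n_frames)
# Each column holds one frame's 13 coefficients on the mel/log-amplitude scale,
# usable directly as a feature vector for a speech recognizer.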

13 Echology
An interactive soundscape combining human collaboration with aquarium activity
Engages visitors to spend more time with (and learn more about) beluga whales
Motion in each layer controls one channel of sound
Spatialized sound based on whale activity and human interaction

14 Echology
Uses spatial sound as its core expressive component, which participants interact with
Octophonic spatial sound allows participants to experience the movement of sound in a plane formed above their heads
8 buttons represent reflection points at the edges of the 8 loudspeakers
The movement of beluga whales across a layer controls amplitude and triggers sounds
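A rough sketch of how a mono sound layer might be distributed over 8 loudspeakers arranged in a circle; this is an assumed amplitude-panning scheme for illustration, not Echology's actual implementation.

import numpy as np

N_SPEAKERS = 8
SPEAKER_ANGLES = np.arange(N_SPEAKERS) * (2 * np.pi / N_SPEAKERS)

def pan_gains(source_angle, spread=1.0):
    # Per-speaker gains for a virtual source at source_angle (radians).
    # Speakers closer to the source direction receive more energy.
    diff = np.angle(np.exp(1j * (SPEAKER_ANGLES - source_angle)))  # wrapped angle difference
    gains = np.maximum(0.0, np.cos(diff)) ** spread
    total = np.sum(gains ** 2)
    return gains / np.sqrt(total) if total > 0 else gains

def render_layer(mono, source_angle, amplitude):
    # Scale one layer by its activity-driven amplitude and distribute it
    # over the 8 channels; result has shape (8, n_samples).
    return amplitude * np.outer(pan_gains(source_angle), mono)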

16 Echology: Interaction
4 full circles represent the location and amplitude of a layer of sound. Each circle fades in and out as the level of activity of the belugas increases or decreases.
8 blue Pac-Man-shaped circles represent the reflection points and the current reflection angle of each speaker. By hitting a button, participants change the direction of the reflection angle. The default pattern has each pointing to its adjacent speaker.

18 Echology Architecture

19 Speech
Speaker segmentation
Identify when a change in speaker occurs
Useful for basic indexing or summarization of speech content
Speaker identification
Identify who is speaking during a segment
Enables search (and other features) based on speaker
Speech recognition
Identify the content of speech

20 Speaker Segmentation
Speaker diarisation
Partitioning an input audio stream into homogeneous segments according to speaker identity
Bottom-up clustering
Start by splitting the full audio into many small clusters and progressively merge redundant clusters until each one corresponds to a real speaker (see the sketch below)
Top-down clustering
Start with a single cluster and split until the number of clusters equals the number of speakers
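A toy bottom-up sketch, assuming librosa and scikit-learn are available: cut the audio into fixed-length segments, describe each with averaged MFCCs, and merge them agglomeratively. Real diarisation systems use richer features, change detection, and automatic stopping criteria; the file name and parameters here are placeholders.

import numpy as np
import librosa
from sklearn.cluster import AgglomerativeClustering

def diarize(path, segment_s=2.0, n_speakers=2):
    y, sr = librosa.load(path, sr=16000)
    seg_len = int(segment_s * sr)
    feats = []
    for start in range(0, len(y) - seg_len + 1, seg_len):
        mfcc = librosa.feature.mfcc(y=y[start:start + seg_len], sr=sr, n_mfcc=13)
        feats.append(mfcc.mean(axis=1))        # one averaged feature vector per segment
    # Merge segments bottom-up until n_speakers clusters remain.
    labels = AgglomerativeClustering(n_clusters=n_speakers).fit_predict(np.array(feats))
    return labels                              # labels[i] = speaker cluster of segment i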

21 Speaker Segmentation
Open source speaker diarisation software:
ALIZE speaker diarization
SpkDiarization
Audioseg
SHoUT

22 Speech Recognition
Start by segmenting utterances and characterizing phonemes
Use gaps to segment
Group segments into words
Classifiers for limited vocabulary (HMMs)
Using the Viterbi algorithm and Baum-Welch re-estimation (see the sketch below)
Continuous speech
Language models for disambiguation
Speaker dependent or not
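A minimal Viterbi decoding sketch for a discrete-observation HMM, worked in the log domain; the start, transition, and emission matrices are assumed inputs for illustration, not a trained recognizer.

import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    # obs: sequence of observation symbol indices.
    # start_p: (n_states,), trans_p: (n_states, n_states), emit_p: (n_states, n_symbols).
    n_states = len(start_p)
    T = len(obs)
    logd = np.full((T, n_states), -np.inf)     # best log-prob of a path ending in state s at time t
    back = np.zeros((T, n_states), dtype=int)  # backpointers for path recovery
    logd[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n_states):
            cand = logd[t - 1] + np.log(trans_p[:, s])
            back[t, s] = np.argmax(cand)
            logd[t, s] = cand[back[t, s]] + np.log(emit_p[s, obs[t]])
    # Trace back the most likely state sequence.
    path = [int(np.argmax(logd[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]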

