CS 445/656 Computer & New Media

Audio and Speech

Topics for Today
General Audio
- Basics of audio signal
- Features
- Event detection
Speech
- Speech detection
- Segmentation
- Speaker identification
- Recognition
Audio generation in software applications

The Audio Signal
Energy at each frequency step for every recorded point in time
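The picture of energy at each frequency step for each point in time is a spectrogram. The sketch below builds one from a plain list of samples. It is illustrative only: `dft` and `spectrogram` are hypothetical helpers, the DFT is the naive O(N^2) form (an FFT computes the same values much faster), and no window function is applied, although practical systems usually taper each frame (e.g., with a Hann window) to reduce spectral leakage.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a list of real samples."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def spectrogram(signal, frame_size, hop):
    """Energy |X[k]|^2 at each frequency bin k, for each frame of `frame_size` samples.

    Frames start every `hop` samples. Only bins 0..frame_size // 2 are kept,
    because the upper half of a real signal's DFT mirrors the lower half.
    """
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        spectrum = dft(signal[start:start + frame_size])
        frames.append([abs(x) ** 2 for x in spectrum[: frame_size // 2 + 1]])
    return frames


# Toy input: a 1 kHz sine sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(256)]
spec = spectrogram(tone, frame_size=64, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), peak_bin, peak_bin * rate / 64)  # -> 7 8 1000.0
```

Each bin spans `rate / frame_size` Hz (125 Hz here), so the energy peak in bin 8 corresponds to the 1 kHz tone.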

Features for Audio Analysis
Data over Time and Frequency

Energy Over Time
What are these? Speech, music, gunshot

Summarizing the Audio Signal
Sum energy for bands of frequencies over intervals of time
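Summing energy over frequency bands and time intervals turns a large spectrogram into a compact grid of features. A minimal sketch, assuming the spectrogram is already available as a list of frames of per-bin energies; `band_energies`, the band edges, and the interval length are hypothetical choices for illustration, not values from the slides (real systems often use logarithmic or mel-spaced bands).

```python
def band_energies(spec, band_edges, frames_per_interval):
    """Sum spectrogram energy over frequency bands and over intervals of frames.

    spec: list of frames, each a list of per-bin energies.
    band_edges: bin indices delimiting bands, e.g. [0, 4, 8] -> bins [0, 4) and [4, 8).
    frames_per_interval: consecutive frames summed into one time interval
    (the last interval may be shorter).
    """
    summary = []
    for start in range(0, len(spec), frames_per_interval):
        chunk = spec[start:start + frames_per_interval]
        row = [
            sum(sum(frame[lo:hi]) for frame in chunk)
            for lo, hi in zip(band_edges, band_edges[1:])
        ]
        summary.append(row)
    return summary


# Toy spectrogram: 4 frames x 8 bins; energy moves from low to high bins.
spec = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 2, 2, 2, 2],
    [0, 0, 0, 0, 2, 2, 2, 2],
]
print(band_energies(spec, band_edges=[0, 4, 8], frames_per_interval=2))  # -> [[8, 0], [0, 16]]
```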

Audio Signal Analysis
Fast Fourier Transform (FFT)
- Commonly used on audio signals
- Applied to successive windows, allows analysis of frequency features across time
Discrete Wavelet Transform (DWT)
- FFT analysis uses equal-sized windows, whereas wavelet windows vary with frequency
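To make the window-size contrast concrete, here is a sketch of a multi-level Haar DWT, the simplest wavelet, chosen here purely for illustration. Each level halves the number of coefficients: the first level captures the highest frequencies with the finest time resolution, and deeper levels capture lower frequencies with coarser time resolution. `haar_dwt` is a hypothetical helper and assumes the signal length is divisible by `2 ** levels`.

```python
import math


def haar_dwt(signal, levels):
    """Multi-level Haar discrete wavelet transform.

    Returns (approximation, [details_level1, details_level2, ...]).
    Assumes len(signal) is divisible by 2 ** levels.
    """
    approx = list(signal)
    details = []
    s = 1 / math.sqrt(2)  # orthonormal scaling, so total energy is preserved
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * s for a, b in pairs])  # high-frequency part
        approx = [(a + b) * s for a, b in pairs]          # low-frequency part
    return approx, details


x = [4, 6, 10, 12, 8, 6, 5, 5]
approx, details = haar_dwt(x, levels=3)
print([len(d) for d in details], len(approx))  # -> [4, 2, 1] 1
```

A windowed FFT of the same signal would instead yield the same number of time frames for every frequency bin.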

Audio Signal Analysis
Mel-frequency cepstral coefficients (MFCC)
- Based on FFTs
- Map the results into bands approximating the human auditory system
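A minimal sketch of the usual MFCC steps, starting from one frame's FFT power spectrum: apply triangular filters spaced evenly on the mel scale (narrow at low frequencies, wide at high ones), take the log of each band's energy, then apply a DCT-II. The mel formula shown is one common variant, and the filterbank recipe, `mel_filterbank`, and `mfcc_from_power` are assumptions for illustration; steps such as pre-emphasis, windowing, and liftering are omitted.

```python
import math


def hz_to_mel(f):
    # One common mel-scale formula; other variants exist.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, rate):
    """Triangular filters with centers equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(rate / 2)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * n_bins
        for k in range(left, center):   # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge, peak of 1.0 at center
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def mfcc_from_power(power, bank, n_coeffs):
    """Log mel-band energies followed by a DCT-II, keeping the first n_coeffs."""
    energies = [sum(w * p for w, p in zip(filt, power)) for filt in bank]
    logs = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
    n = len(logs)
    return [
        sum(logs[i] * math.cos(math.pi * c * (i + 0.5) / n) for i in range(n))
        for c in range(n_coeffs)
    ]


# Toy usage: a flat power spectrum from a 512-point FFT at 8 kHz.
bank = mel_filterbank(n_filters=20, n_fft=512, rate=8000)
coeffs = mfcc_from_power([1.0] * 257, bank, n_coeffs=13)
print(len(coeffs))  # -> 13
```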

Event Detection
Mapping audio cues to events
- Recognizing sounds related to particular events (e.g., gunshot, falling, scream)
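One very simple cue for a sudden, loud event such as a gunshot is a sharp jump in short-time energy. The sketch below flags such frames. It is a hypothetical heuristic for illustration: `detect_impulsive_events` and its `ratio` threshold are assumptions, and recognizing *which* event occurred would require classifying richer features rather than thresholding energy alone.

```python
def frame_energies(signal, frame_size):
    """Short-time energy: sum of squared samples in consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in signal[i:i + frame_size])
        for i in range(0, len(signal) - frame_size + 1, frame_size)
    ]


def detect_impulsive_events(signal, frame_size, ratio=10.0):
    """Return indices of frames whose energy exceeds `ratio` times the previous frame's."""
    energies = frame_energies(signal, frame_size)
    return [
        i for i in range(1, len(energies))
        if energies[i] > ratio * (energies[i - 1] + 1e-12)
    ]


# Quiet background with one loud burst occupying samples 400-499.
quiet = [0.01 * (-1) ** n for n in range(400)]
burst = [0.9 * (-1) ** n for n in range(100)]
signal = quiet + burst + quiet
print(detect_impulsive_events(signal, frame_size=100))  # -> [4]
```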

Classifying Audio Signals
- Features are extracted from audio signals
  - Can be time-based, frequency-based, or both
  - Features create a multidimensional space of data points
- Supervised learning
  - Train a classifier with a set of labeled signals
  - SVMs, neural nets, …
- Unsupervised learning
  - Cluster unlabeled signals based on similarity
  - HAC (hierarchical agglomerative clustering), K-means, …
- The same approach applies to almost any type of signal, not just audio
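As an example of the unsupervised case, here is a plain k-means (Lloyd's algorithm) sketch over feature vectors. The two-dimensional feature values are hypothetical (imagine, say, mean energy and zero-crossing rate per clip), and the random initialization and fixed iteration count are simplifying assumptions.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster feature vectors (tuples) into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical features for six clips forming two obvious groups.
points = [(0.1, 0.2), (0.15, 0.22), (0.12, 0.18), (0.9, 0.8), (0.85, 0.82), (0.88, 0.79)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Cluster numbers are arbitrary; only which clips share a label is meaningful. Turning a cluster into, e.g., "music" requires attaching a label to it afterwards.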

Speech Detection
Another audio signal classification task
- Complicated by background sounds
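A classic lightweight approach to speech detection combines short-time energy with zero-crossing rate (ZCR). The sketch below treats the quietest frame as the noise floor and accepts a frame as speech only if it is well above that floor and has a modest ZCR, so a loud broadband hiss is rejected. The thresholds, helper names, and synthetic "voiced" and "hiss" signals are all assumptions for illustration; real detectors are typically trained classifiers.

```python
import math


def frames_of(signal, frame_size):
    return [signal[i:i + frame_size]
            for i in range(0, len(signal) - frame_size + 1, frame_size)]


def mean_energy(frame):
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_speech(signal, frame_size, energy_factor=4.0, max_zcr=0.5):
    """Per-frame True/False speech decisions from energy and ZCR thresholds."""
    frames = frames_of(signal, frame_size)
    noise_floor = max(min(mean_energy(f) for f in frames), 1e-10)
    return [
        mean_energy(f) > energy_factor * noise_floor and zero_crossing_rate(f) < max_zcr
        for f in frames
    ]


rate = 8000
background = [0.01 * (-1) ** n for n in range(200)]
voiced = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(200)]
hiss = [0.5 * (-1) ** n for n in range(200)]
print(detect_speech(background + voiced + hiss + background, frame_size=200))
# -> [False, True, False, False]
```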

Distinguishing between Speakers
- Speaker segmentation/diarization
  - Identify when a change in speaker occurs
  - Self-similarity assessments
  - Useful for basic indexing or summarization of speech content
- Speaker identification
  - Requires labels attached to the training data, or a label attached to a cluster from unsupervised learning
  - Enables search (and other features) based on speaker
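A self-similarity assessment can locate a speaker change. Compare every analysis window with every other, then score each candidate boundary by how similar the windows within each side are, minus how similar they are across it. The sketch uses cosine similarity and a simplified checkerboard-style novelty score. The per-window feature vectors are hypothetical (in practice they might be averaged MFCCs), and `novelty` and `half_width` are illustrative names.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def self_similarity(features):
    """Matrix of pairwise cosine similarities between feature vectors."""
    return [[cosine(u, v) for v in features] for u in features]


def novelty(sim, half_width):
    """Score t = mean similarity within [t-w, t) and [t, t+w) minus mean similarity across them."""
    n = len(sim)
    scores = [0.0] * n
    for t in range(half_width, n - half_width + 1):
        before = range(t - half_width, t)
        after = range(t, t + half_width)
        within = (sum(sim[i][j] for i in before for j in before)
                  + sum(sim[i][j] for i in after for j in after))
        across = sum(sim[i][j] for i in before for j in after)
        scores[t] = (within - 2 * across) / (2 * half_width * half_width)
    return scores


# Hypothetical per-window features: speaker A for 4 windows, then speaker B.
feats = [(1.0, 0.1)] * 4 + [(0.1, 1.0)] * 4
scores = novelty(self_similarity(feats), half_width=2)
print(max(range(len(scores)), key=scores.__getitem__))  # -> 4
```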

Speech Recognition
Segment utterances & characterize phonemes
- Use gaps to segment
- Group phoneme segments into words
- Group words into requests or sentences
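Gap-based segmentation can be sketched on a per-frame voiced/unvoiced mask (e.g., from a speech detector). A run of at least `min_gap` silent frames closes the current segment. A long gap threshold yields utterance-sized segments, while a short one splits further at shorter pauses, loosely mirroring the grouping levels listed above. `segment_by_gaps` and the mask are hypothetical.

```python
def segment_by_gaps(is_voiced, min_gap):
    """Split a per-frame voiced mask into (start, end) spans; `end` is exclusive.

    A run of at least `min_gap` unvoiced frames ends the current segment;
    shorter pauses stay inside it. Trailing silence is not included.
    """
    segments = []
    start = None
    silent_run = 0
    for i, voiced in enumerate(is_voiced):
        if voiced:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_gap:
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        segments.append((start, len(is_voiced) - silent_run))
    return segments


mask = [c == "1" for c in "1110111000011"]
print(segment_by_gaps(mask, min_gap=3))  # -> [(0, 7), (11, 13)]
print(segment_by_gaps(mask, min_gap=1))  # -> [(0, 3), (4, 7), (11, 13)]
```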

Speech Recognition
Continuous speech
- Language models for disambiguation
- Speaker-dependent training improves recognition
What to do for a noisy signal?
- Topic spotting
- Heuristic search
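A language model disambiguates acoustically similar transcriptions by preferring word sequences that are more probable in text. The sketch trains an add-one smoothed bigram model on a tiny hypothetical corpus and picks the likelier of two candidates. In a real recognizer, these scores are combined with acoustic model scores rather than used alone, and a pure language-model score like this one also tends to favor shorter candidates.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their histories; <s> marks the start of each sentence."""
    histories, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        vocab.update(words[1:])
        histories.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return histories, bigrams, vocab


def log_prob(sentence, model):
    """Add-one (Laplace) smoothed bigram log-probability of a word sequence."""
    histories, bigrams, vocab = model
    v = len(vocab) + 1  # +1 leaves some probability mass for unseen words
    words = ["<s>"] + sentence.lower().split()
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (histories[prev] + v))
        for prev, word in zip(words, words[1:])
    )


# Hypothetical training text and two acoustically similar candidates.
corpus = [
    "it is hard to recognize speech",
    "we recognize speech with a model",
    "the speech is clear",
]
model = train_bigram(corpus)
candidates = ["recognize speech", "wreck a nice beach"]
print(max(candidates, key=lambda s: log_prob(s, model)))  # -> recognize speech
```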

Playing Back or Generating Audio
Where do you find audio cues in software outside of games?
- Mapping events in software to audio cues
  - LogoMedia included audio cues to speed up stepping through code
  - InfoSound used audio to aid in program comprehension
  - Caitlin mapped code elements to different instruments
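Mapping software events to audio cues can be as simple as a table from event names to short tones. The sketch below is not how LogoMedia, InfoSound, or Caitlin were implemented. It only illustrates the general idea with hypothetical event names and pitches, synthesizing sine tones and writing them with Python's standard `wave` module.

```python
import math
import struct
import wave

# Hypothetical mapping: program event -> (tone frequency in Hz, duration in ms).
EVENT_CUES = {
    "step": (440.0, 50),
    "breakpoint_hit": (880.0, 150),
    "exception": (220.0, 400),
}


def tone(freq, duration_ms, rate, amplitude=0.5):
    """Sine tone with a short linear fade in/out so cues do not click."""
    n = rate * duration_ms // 1000
    fade = max(1, n // 20)
    return [
        amplitude * min(1.0, i / fade, (n - 1 - i) / fade)
        * math.sin(2 * math.pi * freq * i / rate)
        for i in range(n)
    ]


def render_events(events, rate=8000, gap_ms=50):
    """Concatenate the cue for each event, separated by short silences."""
    samples = []
    for name in events:
        freq, duration_ms = EVENT_CUES[name]
        samples.extend(tone(freq, duration_ms, rate))
        samples.extend([0.0] * (rate * gap_ms // 1000))
    return samples


def write_wav(samples, path, rate=8000):
    """Write samples in [-1, 1] as a mono 16-bit PCM WAV file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
```

For example, `write_wav(render_events(["step", "step", "breakpoint_hit"]), "cues.wav")` writes a file that plays two short high beeps followed by a longer, higher one.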

Playing Back or Generating Audio
Where do you find audio cues in software outside of games?
- Mapping events to audio cues
  - Audio debugger to speed up stepping through code
- Spatialized audio
  - Provides an additional geographic/navigational channel
  - Example: Michael Joyce’s Interactive Central Park

Spatialized Audio
Additional geographic/navigational channel
Examples:
- Joyce’s interactive Central Park hyperaudio
- Audio maps of a city for the visually impaired
  - Convey distances, directions, and object sizes
  - Not usable while moving, at the time of writing

Spatialized Audio Generation
- Head-related transfer function (HRTF)
  - Differences in timing and signal strength between the ears determine how we identify the position of a sound
  - Easy to apply with headphones
- In open space
  - Beamforming: timing for constructive interference to create a stronger signal at the desired location
  - Crosstalk cancellation: destructive interference to remove parts of the signal at the desired location
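The two timing ideas above can be sketched numerically. `binaural_pan` is a crude headphone stand-in for an HRTF that uses only an interaural time difference (Woodworth's spherical-head approximation) and an assumed simple level drop at the far ear. A real HRTF also applies direction-dependent filtering. `focus_delays` computes delay-and-sum beamforming delays so that every loudspeaker's wavefront reaches a target point at the same moment. The head radius, speed of sound, gain rule, and function names are all assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at about 20 degrees C
HEAD_RADIUS = 0.0875    # m, a commonly assumed average head radius


def binaural_pan(mono, azimuth_deg, rate):
    """Crude headphone spatialization from interaural time and level differences only.

    azimuth_deg: 0 = straight ahead, positive = to the listener's right, clamped to +/-90.
    Returns (left, right) sample lists of equal length.
    """
    theta = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    itd = HEAD_RADIUS / SPEED_OF_SOUND * (abs(theta) + math.sin(abs(theta)))
    delay = round(itd * rate)                      # far-ear delay in samples
    far_gain = 1.0 - 0.5 * abs(math.sin(theta))    # assumed simple level drop
    near = list(mono) + [0.0] * delay
    far = [0.0] * delay + [s * far_gain for s in mono]
    return (far, near) if theta > 0 else (near, far)


def focus_delays(speaker_positions, target):
    """Delay (s) per loudspeaker so all wavefronts arrive at `target` together.

    Positions are (x, y) in metres. The farthest speaker gets zero delay and nearer
    ones wait, so the signals add constructively at the target.
    """
    dists = [math.dist(p, target) for p in speaker_positions]
    farthest = max(dists)
    return [(farthest - d) / SPEED_OF_SOUND for d in dists]


left, right = binaural_pan([1.0, 0.0, 0.0, 0.0], azimuth_deg=90, rate=44100)
print(right.index(1.0), left.index(0.5))  # -> 0 29
print(focus_delays([(0, 0), (1, 0), (2, 0)], target=(1, 1)))
```

Crosstalk cancellation uses the same kind of delay calculation, but it chooses delays and polarity so the signals cancel at the chosen location instead of reinforcing.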

Echology
An interactive 2D soundscape combining human collaboration with aquarium activity
- Goal: engage visitors to spend more time with (and learn more about) Beluga whales
- Spatialized sound based on whale activity and human interaction

Echology Interaction
- Whale activity is classified to create different sounds in the soundstage
- Visitors determine how sounds move through space

Echology Architecture

