1
Audio and Speech
CS 445/656 Computer & New Media
2
Topics for Today
General Audio
  Basics of audio signal
  Features
  Event detection
Speech
  Detection
  Segmentation
  Speaker identification
  Recognition
Audio generation in software applications
3
The Audio Signal
Energy at each frequency step for every recorded point in time
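One way to obtain "energy at each frequency step for every point in time" is a short-time Fourier transform. The sketch below is a minimal illustration using only the Python standard library and a naive DFT; the frame length, hop size, and 8 kHz sample rate are assumptions chosen for readability, and a real system would normally use an optimized FFT library.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins of a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def spectrogram(samples, frame_len=64, hop=32):
    """Energy (squared magnitude) per frequency bin for each time frame."""
    # Hann window reduces leakage between neighboring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        frames.append([m * m for m in dft_magnitudes(frame)])
    return frames

SAMPLE_RATE = 8000  # assumed sample rate in Hz
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
spec = spectrogram(tone)  # spec[time_frame][frequency_bin]
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64  # bin index converted to Hz
```

Each row of `spec` is one point in time and each column is one frequency step, which is the grid the slide describes.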
4
Features for Audio Analysis
Data over Time and Frequency
5
Energy Over Time
What are these?
  Speech
  Music
  Gunshot
6
Summarizing the Audio Signal
Sum energy for bands of frequencies over intervals of time
7
Audio Signal Analysis
Fast Fourier Transform (FFT)
  Commonly used on audio signals
  Allows for analysis of frequency features across time
Discrete Wavelet Transform (DWT)
  FFT analysis uses equal-sized windows, whereas wavelet windows can vary with frequency
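To illustrate how wavelet windows vary with scale, here is a minimal sketch of a multi-level Haar DWT, the simplest wavelet. It is offered as an example of the idea, not as the transform used in the course; the input values are made up. Level-1 detail coefficients each cover 2 samples, level-2 coefficients cover 4, and so on, so fast changes are analyzed with short windows and slow changes with long ones.

```python
import math

def haar_step(signal):
    """One Haar DWT level: scaled pairwise sums (approx) and differences (detail)."""
    s = 1 / math.sqrt(2)  # orthonormal scaling keeps total energy unchanged
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal, levels):
    """Return [detail_1, ..., detail_levels, approx]; length must be divisible by 2**levels."""
    out = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        out.append(detail)
    out.append(approx)
    return out

signal = [4.0, 4.0, 4.0, 4.0, 0.0, 8.0, 0.0, 8.0]
coeffs = haar_dwt(signal, levels=2)
# coeffs[0]: 4 fine-scale details, nonzero only where the signal oscillates quickly
# coeffs[1]: 2 coarser details; coeffs[2]: 2 approximation coefficients
```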
8
Audio Signal Analysis
Mel-frequency cepstral coefficients (MFCC)
  Based on FFTs
  Maps results into bands approximating the human auditory system
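A simplified MFCC pipeline can be sketched as: power spectrum, triangular mel-spaced filters, log, then a DCT. The version below uses only the standard library and omits steps that production implementations usually include (pre-emphasis, windowing, liftering). The filter count, coefficient count, and the common `2595 * log10(1 + f/700)` mel formula are assumptions, not details from the slides.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]  # fractional bin positions
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, n_filters=10, n_coeffs=5):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for empty bands.
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    # DCT-II of the log band energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters) for m, e in enumerate(log_e))
            for c in range(n_coeffs)]

frame = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
coeffs = mfcc(frame, sample_rate=8000)
```

The filters get wider as frequency rises, which is how the bands approximate the ear's coarser resolution at high frequencies.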
9
Event Detection
Mapping audio cues to events
  Recognizing sounds related to particular events (e.g., a gunshot, a fall, a scream)
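As a first step toward event detection, a sudden loud sound can be flagged by comparing short-term energy against a baseline. This is only a sketch under assumed parameters (`frame_len`, `ratio`); it says that something impulsive happened, not which event it was, which would require a classifier on top.

```python
def frame_energies(samples, frame_len):
    """Sum of squared samples for each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def detect_impulses(samples, frame_len=4, ratio=8.0):
    """Indices of frames whose energy exceeds `ratio` times the median frame energy."""
    energies = frame_energies(samples, frame_len)
    baseline = sorted(energies)[len(energies) // 2] or 1e-12  # guard against all-silent input
    return [i for i, e in enumerate(energies) if e > ratio * baseline]

quiet = [0.1, -0.1] * 8          # 16 low-level samples (made-up background)
bang = [0.9, -0.8, 0.7, -0.6]    # 4 loud samples standing in for an impulsive event
signal = quiet + bang + quiet
events = detect_impulses(signal)
# events == [4]: the fifth frame contains the loud burst
```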
10
Classifying Audio Signals
Features are extracted from audio signals
  Can be time or frequency or both
  Features create a multidimensional space of data points
Supervised learning
  Train classifier with a set of labeled signals
  SVMs, neural nets, …
Unsupervised learning
  Cluster unlabeled signals based on similarity
  HAC, K-means, …
Same for almost any type of signal, not just audio
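The two learning styles can be contrasted on toy feature vectors. In this sketch a nearest-centroid classifier stands in for the supervised case (it is much simpler than the SVMs or neural nets named on the slide), and a basic K-means loop shows the unsupervised case. The two-dimensional feature values and the labels are made up for illustration.

```python
import math

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def nearest(point, centers):
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def train_nearest_centroid(labeled):
    """Supervised: labeled maps each label to its list of feature vectors."""
    return {label: centroid(vecs) for label, vecs in labeled.items()}

def classify(model, point):
    return min(model, key=lambda label: math.dist(point, model[label]))

def kmeans(points, centers, iterations=10):
    """Unsupervised: refine the given initial centers, return centers and assignments."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers, [nearest(p, centers) for p in points]

# Hypothetical (energy, zero-crossing rate) features.
speech = [[0.2, 0.10], [0.3, 0.15]]
music = [[0.8, 0.50], [0.9, 0.60]]

model = train_nearest_centroid({"speech": speech, "music": music})
predicted = classify(model, [0.25, 0.12])

points = speech + music
centers, assignments = kmeans(points, centers=[points[0], points[2]])
```

The supervised model returns a label; the clustering only returns group indices, which someone still has to name.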
11
Speech Detection
Another audio signal classification task
Complicated by background sounds
12
Distinguishing between Speakers
Speaker segmentation/diarization
  Identify when a change in speaker occurs
  Self-similarity assessments
  Useful for basic indexing or summarization of speech content
Speaker identification
  Requires label attached to training data or label attached to cluster from unsupervised learning
  Enables search (and other features) based on speaker
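A self-similarity assessment for speaker change detection can be sketched by comparing the average feature vector just before each frame with the one just after it. The window size, threshold, and two-dimensional "speaker" features below are assumptions for illustration; real systems typically compare MFCC-like features over much longer windows.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def mean_vector(frames):
    return [sum(col) / len(frames) for col in zip(*frames)]

def change_points(frames, window=2, threshold=0.5):
    """Frame indices where the windows before and after differ strongly."""
    changes = []
    for i in range(window, len(frames) - window + 1):
        left = mean_vector(frames[i - window:i])
        right = mean_vector(frames[i:i + window])
        if cosine_distance(left, right) > threshold:
            changes.append(i)
    return changes

speaker_a = [1.0, 0.1]  # hypothetical feature vector for one voice
speaker_b = [0.1, 1.0]  # hypothetical feature vector for another voice
frames = [speaker_a] * 4 + [speaker_b] * 4
changes = change_points(frames)
# changes == [4]: the speaker switches at frame 4
```

This finds when the speaker changes but not who is speaking; attaching names requires the labels described under speaker identification.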
14
Speech Recognition
Segment utterances & characterize phonemes
  Use gaps to segment
Group phoneme segments into words
Group words into requests or sentences
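Using gaps to segment can be sketched as finding runs of sound separated by enough consecutive quiet frames. The silence threshold and minimum gap length are assumed values, and the per-frame energies are made up.

```python
def segment_by_gaps(energies, silence=0.05, min_gap=2):
    """Return (start, end) frame ranges, end exclusive, split at gaps of >= min_gap quiet frames."""
    segments = []
    start = None
    quiet_run = 0
    for i, e in enumerate(energies):
        if e > silence:
            if start is None:
                start = i          # a new segment begins
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap:
                # The segment ended where the quiet run began.
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        segments.append((start, len(energies) - quiet_run))
    return segments

energies = [0.0, 0.6, 0.7, 0.01, 0.5, 0.0, 0.0, 0.8, 0.9, 0.0]
segments = segment_by_gaps(energies)
# segments == [(1, 5), (7, 9)]: the single quiet frame at index 3 is too short to split on
```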
15
Speech Recognition
Continuous speech
  Language models for disambiguation
  Speaker-dependent training improves recognition
What to do for a noisy signal
  Topic spotting
  Heuristic search
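Language-model disambiguation can be illustrated with a toy bigram model that scores competing transcriptions of similar-sounding audio. The counts below are invented for the example and do not come from any corpus; a real recognizer would combine such scores with acoustic scores rather than use them alone.

```python
import math

BIGRAM_COUNTS = {  # hypothetical toy counts
    ("to", "recognize"): 6,
    ("to", "wreck"): 1,
    ("recognize", "speech"): 8,
    ("wreck", "a"): 2,
    ("a", "nice"): 5,
    ("nice", "beach"): 4,
}
VOCAB_SIZE = 7  # distinct words in the toy counts

def bigram_log_prob(words):
    """Add-one smoothed bigram log-probability of a word sequence."""
    unigram = {}
    for (first, _), count in BIGRAM_COUNTS.items():
        unigram[first] = unigram.get(first, 0) + count
    total = 0.0
    for prev, word in zip(words, words[1:]):
        numerator = BIGRAM_COUNTS.get((prev, word), 0) + 1
        denominator = unigram.get(prev, 0) + VOCAB_SIZE
        total += math.log(numerator / denominator)
    return total

candidates = [
    ["to", "recognize", "speech"],
    ["to", "wreck", "a", "nice", "beach"],
]
best = max(candidates, key=bigram_log_prob)
# best is the "recognize speech" reading under these toy counts
```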
16
Playing Back or Generating Audio
Where do you find audio cues in software outside of games?
Mapping events in software to audio cues
  LogoMedia included audio cues to speed up stepping through code
  InfoSound used audio to aid in program comprehension
  Caitlin mapped code elements to different instruments
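Mapping software events to audio cues can be sketched by assigning each event a pitch and rendering the sequence as a WAV stream. The event names and frequencies are hypothetical and are not taken from LogoMedia, InfoSound, or Caitlin; only the standard-library `wave` module is used.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000
CUE_FREQUENCIES = {  # hypothetical event-to-pitch mapping
    "step": 440.0,
    "breakpoint": 660.0,
    "error": 220.0,
}

def tone(freq_hz, seconds=0.1, volume=0.5):
    """A sine tone as floats in [-volume, volume]."""
    n = int(SAMPLE_RATE * seconds)
    return [volume * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]

def render_cues(events):
    """Concatenate one tone per event, in order."""
    samples = []
    for event in events:
        samples.extend(tone(CUE_FREQUENCIES[event]))
    return samples

def to_wav_bytes(samples):
    """Encode float samples as 16-bit mono PCM in an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
    return buf.getvalue()

wav_bytes = to_wav_bytes(render_cues(["step", "step", "breakpoint", "error"]))
```

Writing `wav_bytes` to a file with a `.wav` extension gives something a standard audio player can open.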
17
Playing Back or Generating Audio
Where do you find audio cues in software outside of games?
Mapping events to audio cues
  Audio debugger to speed up stepping through code
Spatialized audio
  Provides additional geographic/navigational channel
  Example: Michael Joyce’s Interactive Central Park
18
Spatialized Audio
Additional geographic/navigational channel
Examples
  Joyce’s interactive Central Park hyperaudio
  Audio maps of city for the visually impaired
    Conveys distances, directions, and object sizes
    Not for use while moving at time of writing
19
Spatialized Audio Generation
Head-related transfer function (HRTF)
  Differences in timing and signal strength determine how we identify the position of a sound
  Easy to apply with headphones
In open space
  Beamforming
    Timing for constructive interference to create a stronger signal at the desired location
  Crosstalk cancellation
    Destructive interference to remove parts of the signal at the desired location
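The timing and level cues can be approximated for headphones without a full HRTF. The sketch below delays and attenuates the far-ear channel; it uses Woodworth's spherical-head formula for the timing difference, while the head radius and the crude level model are assumptions. A measured HRTF would also capture frequency-dependent filtering by the head and outer ear, which this ignores.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate at room temperature
HEAD_RADIUS = 0.0875    # m, an assumed average head radius

def interaural_time_difference(azimuth_deg):
    """Woodworth's spherical-head approximation of the ITD in seconds."""
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))

def spatialize(mono, azimuth_deg, sample_rate):
    """Return (left, right) channels; positive azimuth places the source to the right."""
    delay = round(abs(interaural_time_difference(azimuth_deg)) * sample_rate)
    # Crude level difference (assumption): far ear drops to half amplitude at 90 degrees.
    far_gain = 1.0 - 0.5 * abs(math.sin(math.radians(azimuth_deg)))
    near = list(mono) + [0.0] * delay                     # padded so both channels match in length
    far = [0.0] * delay + [s * far_gain for s in mono]    # arrives later and quieter
    return (far, near) if azimuth_deg >= 0 else (near, far)

left, right = spatialize([1.0, 1.0], azimuth_deg=90, sample_rate=44100)
```

Beamforming relies on the same kind of per-channel delay arithmetic, applied to loudspeakers so that their signals add constructively at a chosen point.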
20
Echology
An interactive 2D soundscape combining human collaboration with aquarium activity
Goal: engage visitors to spend more time with (and learn more about) beluga whales
Spatialized sound based on whale activity and human interaction
21
Echology Interaction
Whale activity is classified to create different sounds in the soundstage
Visitors determine how sounds move through space
22
Echology Architecture
23
Topics for Today
General Audio
  Basics of audio signal
  Features
  Event detection
Speech
  Detection
  Segmentation
  Speaker identification
  Recognition
Audio generation in software applications