1
Audio and Speech
Computers & New Media
2
Topics for Today
General Audio
  Basics of audio signal
  Features
  Event detection
Speech
  Speech detection
  Segmentation
  Speaker identification
  Recognition
Audio generation in software applications
3
The Audio Signal
Energy at each frequency step for every recorded point in time
4
Features for Audio Analysis
Data over Time and Frequency
5
Energy Over Time
What are these?
Speech, music, gunshot
6
Summarizing the Audio Signal
Sum energy for bands of frequencies over intervals of time
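The summarization step above can be sketched in a few lines. This is a minimal illustration, not the method used in the lecture: it assumes the signal is already available as a grid of energy values indexed by time frame and frequency bin, and the names summarize_energy, bins_per_band, and frames_per_interval are hypothetical.

```python
def summarize_energy(spectrogram, bins_per_band, frames_per_interval):
    """Sum energy over blocks of frequency bins and time frames.

    spectrogram[t][f] is assumed to hold the energy at time frame t
    and frequency bin f. Returns one row per time interval and one
    value per frequency band.
    """
    n_frames = len(spectrogram)
    n_bins = len(spectrogram[0]) if spectrogram else 0
    summary = []
    for t0 in range(0, n_frames, frames_per_interval):
        t1 = min(t0 + frames_per_interval, n_frames)
        row = []
        for f0 in range(0, n_bins, bins_per_band):
            f1 = min(f0 + bins_per_band, n_bins)
            # Total energy inside this time-interval x frequency-band block.
            total = sum(spectrogram[t][f]
                        for t in range(t0, t1)
                        for f in range(f0, f1))
            row.append(total)
        summary.append(row)
    return summary
```

Larger blocks give a coarser but more compact description of the signal; the block sizes are a tuning choice.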
7
Audio Signal Analysis
Fast Fourier Transform (FFT)
  Commonly used on audio signals
  Allows for analysis of frequency features across time
Discrete Wavelet Transform (DWT)
  FFT-based analysis uses equal-sized windows, whereas wavelet windows can vary with frequency
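As a rough sketch of frequency analysis over equal-sized windows, the code below slices a signal into fixed-length frames and computes the magnitude spectrum of each. To stay dependency-free it uses a naive discrete Fourier transform, which yields the same magnitudes as an FFT but is much slower; a real application would call an FFT library. The function names and the absence of a tapering window are simplifying assumptions.

```python
import cmath

def dft_magnitudes(frame):
    """Magnitude of each non-negative frequency bin of one frame (naive DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[i] * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i in range(n))
        mags.append(abs(acc))
    return mags

def short_time_spectrum(signal, window_size, hop):
    """Apply the same-sized analysis window at every hop along the signal."""
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frames.append(dft_magnitudes(signal[start:start + window_size]))
    return frames
```

Each returned row describes one point in time, so the result has the time-by-frequency layout described on the earlier slides.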
8
Audio Signal Analysis
Mel-frequency cepstral coefficients (MFCC)
  Based on FFTs
  Maps results into bands approximating the human auditory system
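The band mapping can be illustrated with the mel scale itself. The sketch below is not a full MFCC pipeline (it omits the filterbank, logarithm, and cosine transform); it only shows how band edges spaced evenly in mel become progressively wider in hertz. It assumes the commonly used conversion mel = 2595 * log10(1 + f / 700); other variants of the mel formula exist, and the function names are hypothetical.

```python
import math

def hz_to_mel(hz):
    """Convert hertz to mel using one common form of the mel formula."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(low_hz, high_hz, n_bands):
    """Return n_bands + 2 edge frequencies in hertz, equally spaced in mel."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_bands + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(n_bands + 2)]
```

Narrow bands at low frequencies and wide bands at high frequencies mirror the way human hearing resolves pitch more finely at the low end.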
9
Event Detection
Mapping audio cues to events
  Recognizing sounds related to particular events (e.g., gunshot, falling, scream)
10
Classifying Audio Signals
Features are extracted from audio signals
  Can be time or frequency or both
  Features create a multidimensional space of data points
Supervised learning
  Train classifier with set of labeled signals
  SVMs, neural nets, …
Unsupervised learning
  Cluster unlabeled signals based on similarity
  HAC, K-means, …
Same for most any type of signal, not just audio
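To make the unsupervised case concrete, here is a minimal K-means sketch over feature vectors. It assumes each signal has already been reduced to a fixed-length list of numbers, takes the starting centroids as an argument (real implementations usually pick them randomly), and runs a fixed number of iterations rather than testing for convergence. The names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centroids, iterations=10):
    """Cluster feature vectors; returns (assignments, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids
```

Nothing here is specific to audio: any signal that can be turned into feature vectors can be clustered the same way.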
11
Speech Detection
Another audio signal classification task
Complicated by background sounds
12
Distinguishing between Speakers
Speaker segmentation/diarization
  Identify when a change in speaker occurs
  Self-similarity assessments
  Useful for basic indexing or summarization of speech content
Speaker identification
  Requires label attached to training data or label attached to cluster from unsupervised learning
  Enables search (and other features) based on speaker
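One simple way to picture a self-similarity assessment is to compare each frame's feature vector with the previous one and flag a possible speaker change where similarity drops. The sketch below uses cosine similarity and a fixed threshold; both are illustrative assumptions rather than the approach from the lecture, and practical diarization systems compare longer stretches of audio.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def change_points(features, threshold):
    """Indices of frames that differ sharply from the frame before them."""
    return [i for i in range(1, len(features))
            if cosine_similarity(features[i - 1], features[i]) < threshold]
```

The returned indices split the recording into segments, which can then be indexed or, with labels, matched to known speakers.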
13
Speech Recognition
Segment utterances & characterize phonemes
  Use gaps to segment
  Group phoneme segments into words
  Group words into requests or sentences
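Gap-based segmentation can be sketched as follows, assuming the signal has been reduced to one energy value per frame. A frame counts as silent when its energy falls below a threshold, and a run of at least min_gap silent frames ends the current segment. The threshold, the minimum gap length, and the function name are all hypothetical choices for illustration.

```python
def segment_by_gaps(energies, threshold, min_gap):
    """Return (start, end) frame ranges of sound, with end exclusive."""
    segments = []
    start = None   # first frame of the segment being built, if any
    quiet = 0      # consecutive silent frames seen inside that segment
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:
                # The gap is long enough: close the segment before the gap.
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Close a segment that runs to the end, dropping trailing silence.
        segments.append((start, len(energies) - quiet))
    return segments
```

Short pauses inside a word stay within one segment, while longer silences separate utterances.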
14
Speech Recognition
Continuous speech
  Language models for disambiguation
  Speaker-dependent training improves recognition
What to do for a noisy signal
  Topic spotting
  Heuristic search
15
Playing Back or Generating Audio
Where do you find audio cues in software outside of games?
Mapping events in software to audio cues
  LogoMedia included audio cues to speed up stepping through code
  InfoSound used audio to aid in program comprehension
  Caitlin mapped code elements to different instruments
16
Spatialized Audio
Additional geographic/navigational channel
Examples
  Joyce’s interactive Central Park hyperaudio
  Audio maps of city for the visually impaired
    Conveys distances, directions, and object sizes
    Not for use while moving at time of writing
17
Spatialized Audio Generation
Head-related transfer function (HRTF)
  Differences in timing and signal strength determine how we identify the position of a sound
  Easy to apply with headphones
In open space
  Beamforming
    Timing for constructive interference to create a stronger signal at the desired location
  Crosstalk cancellation
    Destructive interference to remove parts of the signal at the desired location
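The timing idea behind beamforming can be sketched as a delay calculation: each loudspeaker is delayed so that all wavefronts reach the target location at the same instant and add constructively. This is a simplified illustration that assumes point sources in free space, positions in metres, and a speed of sound of roughly 343 m/s (air at about 20 °C); the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value for air at ~20 °C

def focusing_delays(speaker_positions, target):
    """Per-speaker delays in seconds so all signals arrive at target together."""
    distances = [math.dist(p, target) for p in speaker_positions]
    farthest = max(distances)
    # The farthest speaker plays immediately; nearer speakers wait for it.
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]
```

Crosstalk cancellation relies on the same geometry, but the timing is arranged so that signals arrive out of phase and cancel at the chosen location.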
18
Echology: Interacting with Spatialized Audio
An interactive 2D soundscape combining human collaboration with aquarium activity
Goal: engage visitors to spend more time with (and learn more about) beluga whales
Spatialized sound based on whale activity and human interaction
19
Echology Interaction
Whale activity is classified to create different sounds in the soundstage
Visitors determine how sounds move through space
20
Echology Architecture
21
Topics for Today
General Audio
  Basics of audio signal
  Features
  Event detection
Speech
  Speech detection
  Segmentation
  Speaker identification
  Recognition
Audio generation in software applications