Published by Edmund Hart. Modified over 9 years ago.
1
Centre for Computational Creativity
Semantic Audio Studio Tools and Techniques using MPEG-7
Dr. Michael Casey
Centre for Computational Creativity, Department of Computing, City University, London
2
Overview
MPEG-7 Tools
  Low Level Audio Descriptors
  Statistical Sound Models (Semantic ?)
Music Unmixing
  Independent Spectrogram Separation
Sound Classification
  Automatic label extraction
“Semantic” processing
  Segment Similarity, Structure Extraction, Musaics
  S-Matrix (Self-Similarity Matrix)
  C-Matrix (Cross-Similarity Matrix)
  Segment Replacement Musaics
3
Semantic Audio Analysis
Acoustic Features Extraction Semantic Audio Description
4
MPEG-7 Audio Descriptors: Header
5
MPEG-7 Audio Descriptors: Segments
6
MPEG-7 Audio Descriptors: Descriptor
7
Some Useful Descriptors for Music Processing
AudioSpectrumEnvelopeD
AudioSpectrumBasisD
AudioSpectrumProjectionD
SoundModelDS
SoundModelStatePathD
SoundModelStateHistogramD
8
EXAMPLE 1: MUSIC UNMIXING
9
AudioSpectrumBasisD
10
AudioSpectrumBasisD
[Diagram: SVD / ICA Basis Rotation, AudioSpectrumProjectionD, AudioSpectrumBasisD]
11
AudioSpectrumBasisD
12
AudioSpectrumProjectionD
[Diagram: SVD / ICA Basis Rotation, AudioSpectrumProjectionD, AudioSpectrumBasisD]
13
AudioSpectrumProjectionD
14
Outer Product Spectrum Reconstruction
Individual Basis Component
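The outer-product reconstruction named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function names and toy numbers are assumptions. One spectral basis vector paired with its time function yields one spectrogram layer, and summing layers gives a multi-component reconstruction.

```python
def outer(time_function, spectral_basis):
    """One spectrogram layer: rows are time frames, columns are frequency bins."""
    return [[t * b for b in spectral_basis] for t in time_function]


def add_layers(layers):
    """Sum equally sized layers element by element."""
    n_frames, n_bins = len(layers[0]), len(layers[0][0])
    return [[sum(layer[i][j] for layer in layers) for j in range(n_bins)]
            for i in range(n_frames)]


# Toy data (hypothetical): 3 time frames, 2 frequency bins, 2 basis components.
time_functions = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
spectral_bases = [[1.0, 0.5], [0.0, 2.0]]

layers = [outer(t, b) for t, b in zip(time_functions, spectral_bases)]
reconstruction = add_layers(layers)
```

Keeping only the first layer corresponds to a 1-component reconstruction; adding more layers moves toward the full spectrogram.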
15
4 Component Reconstruction
16
10 Component Reconstruction
17
Music Unmixing
Linear basis projection using SVD and ICA
  spectrum subspace separation
  fast computation of subspace ICA
  full-rate filterbank masking
Blocked ICA functions
  subspace reconstruction $Y_j = X V_j V_j^{+}$
  cluster subspaces to identify “tracks”
  sum masked filterbank output to create audio
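The subspace reconstruction on this slide can be written out as follows. This is a hedged reading: the slide text lost its subscripts and superscripts in extraction, so the subscript j and the pseudo-inverse superscript are inferred from the stray characters on the slide. The symbols U, S, K_j, and v_{jk} are introduced here for illustration. X is taken to be the spectrogram with time frames as rows, and V_j the matrix whose K_j columns are the basis vectors of subspace j.

```latex
\begin{align}
X   &= U S V^{\mathsf{T}}
      && \text{(SVD; columns of $V$ are spectral basis vectors)} \\
Y_j &= X V_j V_j^{+}
      && \text{(reconstruction of subspace $j$)} \\
Y_j &= \sum_{k=1}^{K_j} (X v_{jk})\, v_{jk}^{\mathsf{T}}
      && \text{(if the columns $v_{jk}$ of $V_j$ are orthonormal, so $V_j^{+} = V_j^{\mathsf{T}}$)}
\end{align}
```

In the orthonormal case each term of the sum is the outer product of one time function, X v_{jk}, with one spectral basis vector, which matches the outer-product reconstruction slide.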
18
Subspace Extraction
[Figure: Mixture Spectrogram; Independent Spectrogram Subspace Layers with 1, 4, and 10 Components; Spectral Basis, Time Function, Spectrogram Layer]
19
Music Unmixing Example (Pink Floyd: mono -> 9 subspace tracks)
20
EXAMPLE 2: AUTOMATIC AUDIO CLASSIFICATION
21
Sound Model DS and related descriptors
[Diagram: AudioSpectrumEnvelopeD x AudioSpectrumBasisD, AudioSpectrumProjectionD; ContinuousHiddenMarkovModelDS with states 1 2 3 4 and transitions T(i,j); SoundModelStatePathD: 1 3 3 2 2 3 4 4 4 4...]
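A state path such as the one on this slide is the most likely state sequence of a hidden Markov model given the observed features. The sketch below uses Viterbi decoding with one-dimensional Gaussian emissions. It is an illustration under stated assumptions: the function names and the two-state toy model are hypothetical, and real projected spectral features would be multi-dimensional.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(obs, log_init, log_trans, means, variances):
    """Return the most likely state path, using 1-based state indices."""
    n = len(log_init)
    score = [log_init[s] + log_gauss(obs[0], means[s], variances[s])
             for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + log_gauss(x, means[s], variances[s]))
        back.append(pointers)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [s + 1 for s in reversed(path)]


# Hypothetical two-state model with "sticky" transitions.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
path = viterbi([0.1, -0.2, 4.9, 5.2, 5.0, 0.3],
               log_init, log_trans, means=[0.0, 5.0], variances=[1.0, 1.0])
```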
22
Sound Recognition using HMMs
Trained HMMs, Sound Database
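One common way to recognise a sound with a bank of trained HMMs is to score the observation sequence under every class model and pick the class with the highest likelihood. The slide does not spell out the decision rule, so the sketch below assumes that maximum-likelihood rule. The class names, model parameters, and one-dimensional Gaussian emissions are all hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, model):
    """Forward algorithm in the log domain; model = (init, trans, means, vars)."""
    init, trans, means, variances = model
    n = len(init)
    alpha = [math.log(init[s]) + log_gauss(obs[0], means[s], variances[s])
             for s in range(n)]
    for x in obs[1:]:
        alpha = [logsumexp([alpha[r] + math.log(trans[r][s]) for r in range(n)])
                 + log_gauss(x, means[s], variances[s]) for s in range(n)]
    return logsumexp(alpha)


def classify(obs, models):
    """Name of the model under which the observations are most likely."""
    return max(models, key=lambda name: log_likelihood(obs, models[name]))


# Hypothetical trained models, one per sound class.
trans = [[0.8, 0.2], [0.2, 0.8]]
models = {
    "class_a": ([0.5, 0.5], trans, [0.0, 1.0], [1.0, 1.0]),
    "class_b": ([0.5, 0.5], trans, [5.0, 6.0], [1.0, 1.0]),
}
label = classify([5.1, 5.8, 6.2, 5.0], models)
```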
23
MPEG-7: Intelligent Music Browsing
24
Music Genre Classification

Class Name       Num of Files   Num Segments
 1) Blues                  79             86
 2) hiphop                 15            129
 3) Gospel                 23             25
 4) Country                27             28
 5) DrumNBass              26            275
 6) Classical               8            156
 7) 2Step                  39            311
 8) Merengue               34            304
 9) Reggae                 80            398
10) Salsa                  39            425
--------------------------------------------
Totals                    370           2137
25
Music Genre Classification
26
Semantic Audio: General Sound Taxonomy
27
DS: General Audio Classification
28
EXAMPLE 3: STRUCTURE EXTRACTION
29
Structure Discovery
Acoustic Features, State-Space Models, Hierarchical Structure Discovery
30
SoundModelStatePathD
State Path: a simplified representation of spectral dynamics
31
SoundModelStateHistogramD
[Figure: state index against time in seconds, 0.01 s frames]
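A state histogram summarises how often each model state occurs in a stretch of the state path. The sketch below is a minimal illustration, assuming 1-based state indices and non-empty input; the function names and the five-frame segment length are hypothetical. The example path is the one shown on the Sound Model DS slide.

```python
from collections import Counter


def state_histogram(state_path, n_states):
    """Relative frequency of each state (1-based) in a non-empty path."""
    counts = Counter(state_path)
    total = len(state_path)
    return [counts.get(s, 0) / total for s in range(1, n_states + 1)]


def segment_histograms(state_path, n_states, frames_per_segment):
    """One histogram per consecutive block of frames."""
    return [state_histogram(state_path[i:i + frames_per_segment], n_states)
            for i in range(0, len(state_path), frames_per_segment)]


path = [1, 3, 3, 2, 2, 3, 4, 4, 4, 4]
whole = state_histogram(path, 4)
per_segment = segment_histograms(path, 4, frames_per_segment=5)
```

Because each histogram sums to one, segments of different lengths remain comparable.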
32
High-Level Structure Discovery
33
S-Matrix
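A self-similarity matrix compares every segment of a song with every other segment of the same song. The slides do not state which similarity measure is used, so the sketch below assumes cosine similarity between per-segment state histograms. The function names and toy histograms are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def s_matrix(histograms):
    """S[i][j] = similarity between segment i and segment j of one song."""
    return [[cosine(a, b) for b in histograms] for a in histograms]


# Toy histograms (hypothetical): segments 0 and 2 use the same states.
histograms = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
S = s_matrix(histograms)
```

Repeated sections of a song show up as high off-diagonal entries, here S[0][2].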
34
STRUCTURE EXTRACTION == SEGMENTATION
35
Structure Discovery
Low level features, High-level Structure
Acoustic Features, State-Space Models, Hierarchical Structure Discovery
36
High-Level Structure Discovery
Alanis Morissette: Human Segmentation, Machine Segmentation
37
High-Level Structure Discovery
Cranberries: Human Segmentation, Machine Segmentation
38
High-Level Structure Discovery
Nirvana: Human Segmentation, Machine Segmentation
39
High-Level Structure Discovery
40
EXAMPLE 4: MUSAICS
41
Musaics (Music Mosaics)
C-Matrix: Cross-Song Similarity Matrix
  Outer product of target and source histograms
Find segments similar to target segment
  Similarity between all target and database segments
  SORT columns of similarity matrix
Replace segments with similar material
  Segmentation boundaries (beat alignment)
  Replace with “best fit” using DTW on most similar segments
EXAMPLES
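The matching step on this slide can be sketched as follows. This is one reading of "outer product of target and source histograms": the matrix of inner products between every target-segment histogram and every source-segment histogram. Sorting is done per target segment, with one row per target; the slide's "columns" may reflect the opposite orientation. Function names and toy histograms are hypothetical.

```python
def c_matrix(target_hists, source_hists):
    """C[i][j] = inner product of target segment i and source segment j."""
    return [[sum(t * s for t, s in zip(th, sh)) for sh in source_hists]
            for th in target_hists]


def rank_sources(c):
    """For each target segment, source indices ordered from most to least similar."""
    return [sorted(range(len(row)), key=lambda j: row[j], reverse=True)
            for row in c]


# Toy state histograms (hypothetical): 2 target segments, 3 database segments.
target_hists = [[0.8, 0.2, 0.0], [0.0, 0.1, 0.9]]
source_hists = [[0.0, 0.0, 1.0], [0.7, 0.3, 0.0], [0.3, 0.4, 0.3]]

C = c_matrix(target_hists, source_hists)
ranking = rank_sources(C)
```

The first few entries of each ranking are the candidate replacements that a later alignment step would choose between.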
42
Musaics
[Diagram: Target, Extract MPEG-7, Database, StatePathHistograms, Segment Beats, Match, Replace, Musaic]
43-53
Musaics
54
Musaics
New Content by Similarity Replacement
  C-Matrix: Cross-Song Similarity Map
  1 Target, Many Sources
Constraints
  Preserve Rhythm by Beat Tracking
  Preserve Beats by DTW alignment
Bigger Source Database == Better
  Greater Number of Accurate Matches
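Dynamic time warping (DTW), named in the constraints on this slide, scores how well two sequences match when one may be locally stretched or compressed. The sketch below is the textbook DTW recurrence on one-dimensional sequences, not necessarily the variant used in the talk. The function names, the absolute-difference cost, and the toy sequences are assumptions.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW cost between two non-empty sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Cheapest way to reach (i, j): insertion, deletion, or match.
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = dist(a[i - 1], b[j - 1]) + step
    return cost[n][m]


def best_fit(target, candidates):
    """Index of the candidate with the lowest DTW cost to the target."""
    return min(range(len(candidates)),
               key=lambda k: dtw_distance(target, candidates[k]))


# Toy sequences (hypothetical), e.g. state paths of beat-aligned segments.
target = [1, 2, 3, 3, 2]
candidates = [[3, 3, 1, 1], [1, 2, 2, 3, 2], [4, 4, 4]]
choice = best_fit(target, candidates)
```

Here the second candidate wins because it differs from the target only in how long each value is held, which DTW absorbs at no cost.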
55
Acknowledgements
International Standards Organisation
  ISO/IEC JTC 1 SC29 WG11 (MPEG)
Mitsubishi Electric Research Labs
Massachusetts Institute of Technology
  Music Mind Machine Group (formerly Machine Listening Group)
  Paris Smaragdis, Youngmoo Kim, Brian Whitman, Iroro Orife, John Hershey, Alex Westner, Kevin Wilson
City University
  Department of Computing
  Centre for Computational Creativity