1
Audio Thumbnailing of Popular Music Using Chroma-Based Representations
Matt Williamson and Chris Scharf
Implementation based on: Mark A. Bartsch and Gregory H. Wakefield, IEEE Transactions on Multimedia, Vol. 7, No. 1, February 2005
2
Introduction
Multimedia content is growing rapidly
Efficient methods of browsing are necessary
Indexing and retrieval methods are media-dependent
3
Primary goal
Minimize audition time for a given type of media
4
Current methods: Images
–Downsampling
  Produces a smaller version of the image (a thumbnail)
  Reduces the cost of delivery and display
5
Current methods: Speech audio
–Symbolic representation
  Produces a transcript of the audio
6
What about music?
Adapt an existing method:
–Downsampling (time compression)
  Results in highly distorted, unintelligible audio
7
What about music?
Adapt an existing method (cont’d):
–Symbolic representation (score transcription)
  Extremely difficult
  Results in essentially meaningless information
  Does not convey other important elements:
  –Vocal style
  –Instruments used
  –Processing effects used
8
Essential problem
Adapting existing methods cannot reduce the audition time for music while effectively conveying the “gist” of the song
9
Possible Solution
Audio thumbnailing via chroma-based analysis
10
Audio thumbnailing
Produces a short clip of the selection to represent the “gist” of the song
11
Chroma-based analysis
Based on the extraction of chroma features from the audio
Thumbnailing algorithm:
–Frame Segmentation
–Feature Calculation
–Correlation Calculation
–Correlation Filtering
–Thumbnail Selection
12
Chroma Feature Extraction
Extract frequencies from the audio file
Calculate chroma values from the frequencies (see the sketch below)
Categorize chroma values into pitch classes
–12 pitch classes: A, A#/Bb, B, C, …, G#/Ab
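The slide’s chroma formula did not survive transcription; below is a minimal sketch of the standard equal-tempered mapping from frequency to pitch class, assuming A4 = 440 Hz as the reference (the reference pitch and the rounding to the nearest semitone are our assumptions, not details taken from the paper).

    import numpy as np

    PITCH_CLASSES = ["A", "A#/Bb", "B", "C", "C#/Db", "D",
                     "D#/Eb", "E", "F", "F#/Gb", "G", "G#/Ab"]

    def chroma_class(freq_hz, ref_hz=440.0):
        # Distance from A4 in semitones; the nearest integer, taken mod 12, is the pitch class.
        semitones = 12.0 * np.log2(freq_hz / ref_hz)
        return PITCH_CLASSES[int(round(semitones)) % 12]

    # Example: middle C at roughly 261.63 Hz maps to "C".
    print(chroma_class(261.63))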
13
Frame Segmentation
Authors’ implementation:
–Frame length determined via a beat-tracking algorithm
–Range: 0.25 s to 0.56 s
Our implementation:
–Fixed frame length equal to the average of that range: 0.41 s (see the sketch below)
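A minimal sketch of the fixed 0.41 s framing described above, assuming a mono signal stored in a NumPy array and non-overlapping frames (the hop size equal to the frame length is an assumption; the authors instead derive frame boundaries from beat tracking).

    import numpy as np

    def segment_frames(samples, sample_rate, frame_sec=0.41):
        # Split a 1-D mono signal into consecutive, non-overlapping frames of frame_sec seconds.
        frame_len = int(round(frame_sec * sample_rate))
        n_frames = len(samples) // frame_len          # drop any trailing partial frame
        trimmed = samples[:n_frames * frame_len]
        return trimmed.reshape(n_frames, frame_len)   # shape: (n_frames, frame_len)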
14
Feature Calculation
Calculate a 12-element chroma feature vector v_t for each frame:
–Apply an FFT to each frame and fold the spectral energy into the 12 chroma bins (see the sketch below)
–Constraints:
  Minimum frequency: 20 Hz (the lower limit of human hearing)
  Maximum frequency: 2000 Hz (above this, the perception of chroma weakens)
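A minimal sketch of the per-frame feature, assuming FFT magnitudes between 20 Hz and 2000 Hz are simply summed into the 12 chroma bins and the result is unit-normalized; as noted in the Conclusion, the authors’ exact spectral weighting was unclear to us, so this plain summation is an assumption.

    import numpy as np

    def chroma_vector(frame, sample_rate, f_min=20.0, f_max=2000.0, ref_hz=440.0):
        # Fold FFT magnitudes between f_min and f_max into 12 chroma bins (bin 0 = A).
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
        v = np.zeros(12)
        for f, mag in zip(freqs, spectrum):
            if f_min <= f <= f_max:
                v[int(round(12.0 * np.log2(f / ref_hz))) % 12] += mag
        # Unit-normalize so later correlations reduce to inner products.
        return v / (np.linalg.norm(v) + 1e-12)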
15
Correlation Calculation
Calculate the similarity matrix C
–Each element C(i, j) is the correlation between feature vectors v_i and v_j (see the sketch below)
–High correlation along diagonals of the matrix indicates repetition within the song
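A minimal sketch of the similarity matrix, assuming the chroma vectors were unit-normalized in the previous step so that the correlation reduces to an inner product (whether the mean of each vector is removed first is not stated on the slide).

    import numpy as np

    def similarity_matrix(chroma):
        # chroma: (n_frames, 12) array whose rows are unit-normalized chroma vectors.
        # C[i, j] is then the correlation (inner product) between frames i and j.
        return chroma @ chroma.T

With unit-normalized rows, C is symmetric with ones on its main diagonal, and repeated sections show up as high-valued off-diagonal stripes.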
16
Correlation Filtering
Calculate the filtered time-lag matrix T:
–Exposes similarity between extended segments that are separated by a constant lag
–Filtering is performed along the diagonals of C
  Uses a symmetric rectangular window (a uniform moving-average filter)
–T is then “rotated” so that the diagonals are oriented vertically, giving time and lag axes (see the sketch below)
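A minimal sketch of the diagonal filtering, with time on the rows and lag on the columns to match the coordinates used in Thumbnail Selection; the choice of window length (here, the desired thumbnail length in frames) is an assumption.

    import numpy as np

    def time_lag_matrix(C, window_frames):
        # T[t, lag] = average similarity between the window of frames starting at t and
        # the window starting at t + lag (a uniform moving average along each diagonal of C).
        n = C.shape[0]
        T = np.zeros((n, n))
        kernel = np.ones(window_frames) / window_frames
        for lag in range(1, n):
            diag = np.diagonal(C, offset=lag)          # C[t, t + lag] for all valid t
            valid = len(diag) - window_frames + 1      # windows that fit entirely on this diagonal
            if valid <= 0:
                break
            T[:valid, lag] = np.convolve(diag, kernel, mode="valid")
        return T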
17
Thumbnail Selection
Select the maximum value in T (see the sketch below)
–The location of this value indicates:
  The start of the repeated segment (the y-coordinate)
  The lag to its repetition (the x-coordinate)
–Constraints:
  Minimum lag = 1/10 of the song length
  Maximum start time = 3/4 of the song length (to reduce susceptibility to a “fading repeat”)
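A minimal sketch of the final selection using the T layout from the previous sketch (rows = start frame, columns = lag); the thumbnail length parameter and the conversion back to seconds are our assumptions.

    import numpy as np

    def select_thumbnail(T, thumb_frames, frame_sec=0.41):
        # Pick the (start, lag) cell with the largest filtered similarity, subject to
        # the constraints above: lag >= 1/10 of the song, start <= 3/4 of the song.
        n_frames = T.shape[0]
        masked = T.copy()
        masked[:, :max(n_frames // 10, 1)] = -np.inf   # forbid short lags
        masked[(3 * n_frames) // 4:, :] = -np.inf      # forbid late starts (guards against a fading repeat)
        start, lag = np.unravel_index(np.argmax(masked), masked.shape)
        start_sec = start * frame_sec
        return start_sec, start_sec + thumb_frames * frame_sec   # thumbnail interval in seconds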
18
Results
Jimmy Buffett – “Math Sucks”
–System: [64, 89]
Lifehouse – “You and Me”
–System: [38, 63]
Gavin DeGraw – “I Don’t Want To Be”
–System: [95, 120]
Super Mario Brothers Theme
–System: [18, 43]
19
Conclusion
Successfully extracted time segments that closely match the chorus of each song
Feature Calculation issue:
–The authors’ implementation is unclear
20
Possible Uses
Audio domain:
–Improved search capability (searching for similar songs)
–Audio fingerprinting
Other domains:
–Detection of irregular heartbeats
21
Suggested Improvements and Alternatives
Image-based analysis of the waveform
Tested alternatives:
–MSE on signal frequencies
  Chroma-based analysis proved more accurate