ELEN 6820 Speech and Audio Processing, Prof. D. Ellis, Columbia University. Midterm Presentation: High Quality Music Metacompression Using Repeated-Segment Residuals


Slide 1. Title
ELEN 6820 Speech and Audio Processing, Prof. D. Ellis, Columbia University
Midterm Presentation: High Quality Music Metacompression Using Repeated-Segment Residuals
Asheesh Kashyap, Spring 2005

Slide 2. Music Compression
- Music contains a great deal of self-similarity and repetition at several levels of detail (many repeated segments).
- Redundancy can be removed by storing a single copy of each repeated segment and referencing that copy every time the segment is used.
- The scheme can be combined with other audio compression techniques such as MP3 (hence "metacompression").
- The concept has already been explored by Joseph Hazboun, "Detection of Audio Similarity for Redundancy Removal", ELEN 6820, Spring 2004.
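The store-once, reference-many idea can be illustrated with a minimal sketch. It assumes that repeated segments are bit-identical, so a dictionary lookup is enough to find them. This is a simplification: the real methods match segments by correlation thresholds. The names `encode`, `decode` and `Segment` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a segment is a tuple of samples so it can be a dict key.
Segment = Tuple[float, ...]


def encode(segments: Sequence[Segment]) -> Tuple[List[Segment], List[int]]:
    """Store each distinct segment once; the stream becomes a list of references."""
    codebook: List[Segment] = []          # single stored copy of each segment
    index_of: Dict[Segment, int] = {}     # segment -> position in the codebook
    references: List[int] = []            # one reference per segment of the song
    for seg in segments:
        if seg not in index_of:           # exact match only (simplifying assumption)
            index_of[seg] = len(codebook)
            codebook.append(seg)
        references.append(index_of[seg])
    return codebook, references


def decode(codebook: Sequence[Segment], references: Sequence[int]) -> List[Segment]:
    """Rebuild the song by looking up every reference in the codebook."""
    return [codebook[i] for i in references]


verse = (0.1, 0.4, -0.2)
chorus = (0.7, -0.5, 0.3)
song = [verse, chorus, verse, verse, chorus]
codebook, refs = encode(song)
print(len(codebook), refs)  # 2 [0, 1, 0, 0, 1]
```

Five segments shrink to two stored copies plus five small integer references. The codebook and the reference list would then be passed to an ordinary codec such as MP3.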

Slide 3. Previous Work (Hazboun's Method)
Phase I: Divide the song into 1 s segments and correlate each segment with every other segment. Keep pairs with correlation > 0.78.
Phase II: Group successive 1 s segments together.
Phase III: Find similar 256 ms pairs with correlation > 0.82 (fine tuning).
Phase IV: Align segments using a 2 ms STFT correlation.
Phase V: Compare segments based on the sum of spectral energy over each frequency, and discard segments with similarity < 0.995.*
Phase VI: Remove overlapping segments and define new, longer similar segments (new start and end points).
Phase VII: Encode the audio stream by removing redundant segments.
* In Hazboun's example, an identical tune with different lyrics has a correlation of 0.968.
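Phases I and V can be sketched in a few lines of NumPy under some loudly stated assumptions. Phase I is assumed to use the zero-lag Pearson correlation coefficient between raw waveform segments. Phase V is assumed to compare the per-frequency energy profiles, summed over time, with a cosine similarity. Hazboun's code may compute both measures differently. All function names and the STFT sizes (`n_fft=512`, `hop=256`) are hypothetical.

```python
import numpy as np


def split_segments(x, fs, seg_sec=1.0):
    """Cut a mono signal into non-overlapping seg_sec-second segments (one per row)."""
    n = int(seg_sec * fs)
    count = len(x) // n
    return x[:count * n].reshape(count, n)


def correlated_pairs(segments, threshold=0.78):
    """Phase I sketch: zero-lag correlation coefficient between every pair of segments."""
    corr = np.corrcoef(segments)  # each row is treated as one variable (segment)
    pairs = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if corr[i, j] > threshold:
                pairs.append((i, j, float(corr[i, j])))
    return pairs


def spectral_energy_similarity(a, b, n_fft=512, hop=256):
    """Phase V sketch (assumed measure): cosine similarity of per-frequency energy."""
    window = np.hanning(n_fft)

    def energy_profile(x):
        frames = np.array([x[s:s + n_fft] * window
                           for s in range(0, len(x) - n_fft + 1, hop)])
        # Sum |STFT|^2 over time, leaving one energy value per frequency bin.
        return np.sum(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)

    ea, eb = energy_profile(a), energy_profile(b)
    return float(np.dot(ea, eb) / (np.linalg.norm(ea) * np.linalg.norm(eb)))


# Toy song: segment 2 is a slightly noisy repeat of segment 0; segment 1 is unrelated.
rng = np.random.default_rng(0)
fs = 8000
base = rng.standard_normal(fs)
song = np.concatenate([base, rng.standard_normal(fs),
                       base + 0.01 * rng.standard_normal(fs)])
segs = split_segments(song, fs)
pairs = correlated_pairs(segs)
keep = [(i, j) for i, j, _ in pairs
        if spectral_energy_similarity(segs[i], segs[j]) >= 0.995]
print(keep)  # [(0, 2)]
```

The sketch applies the coarse, cheap correlation test first and then runs the stricter spectral check only on surviving pairs. This mirrors the funnel structure of the phases above.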

Slide 4. Extensions to Previous Work
- Current methods, such as Hazboun's, apply a simple replacement scheme for repeated segments.
- This imposes a high standard for audio similarity (corr > 0.995), so audibly dissimilar segments are removed from consideration (a conservative approach).
- Extension A: relax the similarity constraint by storing residuals (the error difference between reference and repeated segments).
- Extension B: separate the music and voice components (music has more self-similarity).
- Validate performance using two samples from contemporary, techno and classical music.

Slide 5. Extension A: Residuals
- Residual: the error difference between the reference and repeated segments, i.e. residual = reference - repeated.
- Transmitting residuals allows a more precise reconstruction of the original waveform (higher quality) and relaxes the audio similarity constraint. The same idea is the basis of video compression.
- Residuals should compress well, since they contain much less information than the original signal (lower amplitude and/or fewer components).
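The following is a minimal sketch of the residual definition on this slide and its inverse at the decoder. It uses a synthetic example that is purely an assumption: two occurrences share the same accompaniment but carry different "vocals", modelled here as quiet sine tones. In practice the residual itself would also be lossily coded (e.g. by MP3), so reconstruction would only be approximate. The function names are hypothetical.

```python
import numpy as np


def make_residual(reference, repeated):
    """Residual as defined on the slide: reference - repeated."""
    return np.asarray(reference, dtype=float) - np.asarray(repeated, dtype=float)


def reconstruct(reference, residual):
    """Decoder side: invert the definition, repeated = reference - residual."""
    return np.asarray(reference, dtype=float) - residual


def energy(x):
    return float(np.sum(np.square(x)))


fs = 8000
t = np.arange(fs) / fs
music = np.sin(2 * np.pi * 220 * t)                     # shared accompaniment
reference = music + 0.1 * np.sin(2 * np.pi * 600 * t)   # first occurrence (vocal A)
repeated = music + 0.1 * np.sin(2 * np.pi * 900 * t)    # later occurrence (vocal B)

residual = make_residual(reference, repeated)
ratio = energy(residual) / energy(repeated)
print(ratio)  # about 0.0198: the shared music cancels, only the vocal difference remains
```

In this example, the low residual energy is what the slide means by "lower amplitude and/or fewer components". The accompaniment cancels exactly, so only the vocal difference has to be transmitted.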

Slide 6. Extension A: Modification to Hazboun's Code
- Change Phase V to relax the similarity requirement from 0.995 down to 0.945 in 0.010 increments. This should allow compression of segments with similar music but different vocals.
- Modify Phase VII to generate residuals for repeated segments instead of removing them.
- Convert the WAV output to MP3 and compare its compression with the baseline (i.e., converting the unmodified song from WAV to MP3).
- Convert the MP3 back to WAV, decode, and compare the SNR with the original decoded song.
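Two parts of this evaluation plan can be sketched directly:
- the threshold sweep (0.995 down to 0.945 in 0.010 steps);
- the SNR comparison.

SNR is assumed here to be the usual signal-energy over error-energy ratio in dB. The slide does not give a formula, so treat this as an assumption. The example also uses the 0.968 value from Hazboun's footnote to show at which thresholds a "same tune, different lyrics" pair would start to be accepted. Function names are hypothetical.

```python
import numpy as np


def relaxed_thresholds(start=0.995, stop=0.945, step=0.010):
    """Phase V thresholds from start down to stop (inclusive) in fixed steps."""
    count = int(round((start - stop) / step)) + 1
    # Round to 3 decimals to hide floating-point drift such as 0.9449999999.
    return [round(start - k * step, 3) for k in range(count)]


def snr_db(original, decoded):
    """SNR of a decoded signal against the original, in dB (assumed definition)."""
    original = np.asarray(original, dtype=float)
    noise = original - np.asarray(decoded, dtype=float)
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))


print(relaxed_thresholds())  # [0.995, 0.985, 0.975, 0.965, 0.955, 0.945]

# A pair with similarity 0.968 (Hazboun's different-lyrics example) passes only
# once the threshold has been relaxed to 0.965 or below.
print([thr for thr in relaxed_thresholds() if 0.968 >= thr])  # [0.965, 0.955, 0.945]
```

Each threshold in the sweep would yield one encoded file. Its MP3 size would be compared against the baseline, and `snr_db` would be applied to its decoded output.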

Slide 7. Extension B: Separating Voice from Music
- Separating voice from music may improve compression: changed lyrics produce different formants, which can hamper correlation and alignment.
- Challenging part: separating music and voice is an extremely difficult problem.
- Compressing the voice and music components separately would require two streams or files, so the combined compression would have to be much better to pay off.
- Perfect separation is not required for our purposes, since the goal is compression.
- Correlation and alignment are performed on segments with the voice removed, but encoding uses the original segments (so the music component is maximally compressed).
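The split described on this slide can be sketched as follows. The reference is matched on voice-suppressed audio, but the residual is built from the original audio. Several pieces are assumptions:
- Matching is assumed to use zero-lag correlation.
- The default threshold of 0.945 is simply the lowest value from the Phase V sweep.
- The toy example pretends the voice has been removed perfectly, which the slide explicitly says is not required (or realistic).

The function names and the tuple return format are hypothetical.

```python
import numpy as np


def best_reference(target_music, candidates_music):
    """Pick the candidate whose voice-suppressed audio correlates best with the target's."""
    scores = [np.corrcoef(target_music, c)[0, 1] for c in candidates_music]
    best = int(np.argmax(scores))
    return best, float(scores[best])


def encode_segment(target_original, target_music, candidates_original, candidates_music,
                   threshold=0.945):
    """Match on voice-suppressed audio, but compute the residual on the original audio."""
    idx, score = best_reference(target_music, candidates_music)
    if score < threshold:
        return ("raw", None, np.asarray(target_original, dtype=float))
    # residual = reference - repeated, using the ORIGINAL (voiced) segments
    residual = np.asarray(candidates_original[idx], dtype=float) - target_original
    return ("residual", idx, residual)


fs = 8000
t = np.arange(fs) / fs
music_a = np.sin(2 * np.pi * 220 * t)
music_b = np.sin(2 * np.pi * 330 * t)
voice_1 = 0.3 * np.sin(2 * np.pi * 700 * t)
voice_2 = 0.3 * np.sin(2 * np.pi * 1100 * t)

candidates_original = [music_b + voice_1, music_a + voice_1]
candidates_music = [music_b, music_a]          # idealized voice removal
target_original = music_a + voice_2            # same music, different vocals
kind, idx, residual = encode_segment(target_original, music_a,
                                     candidates_original, candidates_music)
```

Because matching ignores the vocals, the target is paired with the candidate that shares its accompaniment, even though the lyrics differ. The decoder recovers the target as `candidates_original[idx] - residual`.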

Slide 8. Extension B: Simple Algorithm
- Formants are still visible in the presence of music.
- For each time slice, use cepstral analysis to find the maximum peak in the 70-255 Hz range (the voice excitation pitch).
- Build a filter bank that attenuates frequencies at the pitch harmonics.
- Take the derivative across the spectrogram (along the time axis) to minimize horizontal bands (sustained musical notes).
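The three steps on this slide can be sketched as a small NumPy pipeline. Pitch comes from the largest real-cepstrum peak at quefrencies corresponding to 70-255 Hz. The "filter bank" is approximated by a per-bin spectral mask that attenuates a fixed band around every harmonic. The time derivative is a first difference between spectrogram columns. Several details are assumptions, not taken from the slide:
- the frame size and hop;
- the notch width (`width_hz`) and attenuation (`atten`);
- the use of a Hann window.

All function names are hypothetical.

```python
import numpy as np


def cepstral_pitch(frame, fs, fmin=70.0, fmax=255.0):
    """Pitch (Hz) of one time slice: largest real-cepstrum peak between fmin and fmax."""
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-10)  # epsilon avoids log(0)
    cepstrum = np.real(np.fft.ifft(log_mag))
    q_min = int(np.ceil(fs / fmax))   # shortest allowed pitch period, in samples
    q_max = int(np.floor(fs / fmin))  # longest allowed pitch period, in samples
    q_peak = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return fs / q_peak


def harmonic_notch_mask(f0, n_fft, fs, width_hz=20.0, atten=0.1):
    """Per-bin gain for an rfft spectrum: atten within width_hz of each k*f0, else 1."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    mask = np.ones_like(freqs)
    for harmonic in np.arange(f0, fs / 2, f0):
        mask[np.abs(freqs - harmonic) <= width_hz] = atten
    return mask


def suppress_voice(signal, fs, n_fft=1024, hop=512):
    """Magnitude spectrogram (freq x time) with voice harmonics attenuated, then
    differenced along time so that steady horizontal bands (held notes) cancel."""
    window = np.hanning(n_fft)
    columns = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft]
        mask = harmonic_notch_mask(cepstral_pitch(frame, fs), n_fft, fs)
        columns.append(np.abs(np.fft.rfft(frame * window)) * mask)
    spec = np.array(columns).T        # shape: (n_fft // 2 + 1, n_frames)
    return np.diff(spec, axis=1)      # time derivative: constant rows become zero


# Synthetic "voiced" test tone: 24 harmonics of a 125 Hz pitch.
fs = 8000
n = np.arange(4 * 1024)
voiced = sum(np.cos(2 * np.pi * 125 * h * n / fs) for h in range(1, 25))
f0 = cepstral_pitch(voiced[:1024], fs)   # should come out close to 125 Hz
out = suppress_voice(voiced, fs)
```

The test tone is perfectly stationary, so every spectrogram column is the same. The time derivative is therefore essentially zero, which illustrates why differencing suppresses sustained horizontal bands. A real mixture of time-varying voice and music would of course leave a nonzero result.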

