Content-Based MP3 Information Retrieval
Chueh-Chih Liu
Department of Accounting Information Systems
Chihlee Institute of Technology
2005/06/16
Outline
1. Introduction
2. System Architecture
3. Proposed Methods
4. Experimental Results
5. Conclusions
1. Introduction
Multimedia information retrieval
– Traditional multimedia databases: retrieval by file name or keywords
– Content-Based Retrieval (CBR): text, pictures, and audio are retrieved by descriptions of their information content
Music files on computers are mainly WAV and MIDI
– WAV (Waveform Audio): records the real audio data
– MIDI (Musical Instrument Digital Interface): records simplified musical data such as notes
MP3 (MPEG Audio Layer-3)
– Tone quality close to that of CD music
– Small storage space
[Figure: comparison of WAV and MIDI in tone quality and memory space]
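To put "small storage space" in numbers, the sketch below compares one minute of uncompressed CD-quality audio with an MP3. The CD parameters (44.1 kHz, 16-bit, stereo) are standard; the 128 kbit/s MP3 bitrate is an assumption (a common setting, not a figure from these slides), and the WAV file header is ignored.

```python
# Rough storage comparison: uncompressed CD-quality PCM (as stored in WAV)
# versus MP3 at an assumed bitrate. The WAV header is ignored.

SAMPLE_RATE_HZ = 44_100      # CD audio sample rate
BITS_PER_SAMPLE = 16         # CD audio sample depth
CHANNELS = 2                 # stereo
MP3_BITRATE_BPS = 128_000    # assumption: a commonly used MP3 bitrate


def wav_bitrate_bps() -> int:
    """Bit rate of uncompressed PCM audio at CD quality."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def size_bytes(bitrate_bps: int, seconds: float) -> float:
    """Storage needed for `seconds` of audio at the given bit rate."""
    return bitrate_bps * seconds / 8


if __name__ == "__main__":
    minute = 60
    wav = size_bytes(wav_bitrate_bps(), minute)
    mp3 = size_bytes(MP3_BITRATE_BPS, minute)
    print(f"WAV: {wav / 1e6:.2f} MB per minute")
    print(f"MP3: {mp3 / 1e6:.2f} MB per minute")
    print(f"compression ratio: {wav / mp3:.1f}:1")
```

At these settings one minute takes about 10.58 MB as WAV and 0.96 MB as MP3, roughly an 11:1 reduction.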
1. Introduction cont.
Query by Humming (QBH) is attracting increasing attention
When QBH is used to search MP3 music, the following issues must be considered:
– Background music
– Inaccurate humming
– Inconsistent phrase lengths
2. System Architecture
Establishment of the MP3 feature database: MP3 Database → MP3 Segmentation → Feature Extraction and Transformation → MP3 Phrase Feature Database
Music content-based retrieval: hummed query → Feature Extraction and Transformation → Similarity Matching against the MP3 Phrase Feature Database → ranked list of MP3 results (1., 2., 3., ...)
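A minimal sketch of the two stages of this architecture, assuming segmentation and melody extraction have already reduced every MP3 phrase to a list of MIDI note numbers (those steps are not sketched here). All names (`build_phrase_feature_db`, `retrieve`, and the rest) are hypothetical, and the feature and distance functions are placeholders meant to be swapped for one of the proposed methods.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type aliases and names; the slides define no API.
Features = List[float]
FeatureFn = Callable[[Sequence[int]], Features]
DistanceFn = Callable[[Features, Features], float]


def build_phrase_feature_db(phrases: Dict[str, Sequence[int]],
                            extract: FeatureFn) -> Dict[str, Features]:
    """Offline stage: map each phrase id to its feature vector.

    Assumes every phrase is already a sequence of MIDI note numbers."""
    return {pid: extract(notes) for pid, notes in phrases.items()}


def retrieve(query_notes: Sequence[int], db: Dict[str, Features],
             extract: FeatureFn, distance: DistanceFn,
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Online stage: rank database phrases by distance to the hummed query."""
    q = extract(query_notes)
    scored = [(pid, distance(q, feats)) for pid, feats in db.items()]
    scored.sort(key=lambda item: item[1])  # smaller distance = more similar
    return scored[:top_k]


# Placeholder feature and distance, only to make the sketch runnable.
def note_differences(notes: Sequence[int]) -> Features:
    return [float(b - a) for a, b in zip(notes, notes[1:])]


def euclidean_over_shorter(a: Features, b: Features) -> float:
    n = min(len(a), len(b))
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(n)))


if __name__ == "__main__":
    phrases = {"song_a#1": [79, 76, 76, 77, 74],
               "song_b#1": [60, 62, 64, 65, 67]}
    db = build_phrase_feature_db(phrases, note_differences)
    print(retrieve([78, 75, 75, 76], db, note_differences,
                   euclidean_over_shorter, top_k=2))
```

Keeping extraction and distance as parameters mirrors the diagram: both stages share the same "Feature Extraction and Transformation" step, so database and query features stay comparable.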
3. Proposed Methods
A relative note is the pitch change between the current note and its preceding note: "<" (current note higher), "=" (same pitch), or ">" (current note lower)
The N-gram approach is based on the relationship among N adjacent notes
N-grams are used to define mapping functions (N = 2, 3, and 4)
Example (MIDI note numbers): 79, 76, 76, 77, 74, 74, 72, 74, 76, 77, 79, 79, 79
Relative notes: 79 > 76 = 76 < 77 > 74 = 74 > 72 < 74 < 76 < 77 < 79 = 79 = 79
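The relative-note definition can be sketched as follows, using the slides' notation in which `a < b` means the current note b is higher than its predecessor a. The function name is hypothetical.

```python
from typing import List, Sequence


def relative_notes(notes: Sequence[int]) -> List[str]:
    """Pitch change between each note and its predecessor:
    '<' = current note higher, '=' = same pitch, '>' = current note lower."""
    symbols = []
    for prev, cur in zip(notes, notes[1:]):
        if cur > prev:
            symbols.append("<")
        elif cur == prev:
            symbols.append("=")
        else:
            symbols.append(">")
    return symbols


if __name__ == "__main__":
    melody = [79, 76, 76, 77, 74, 74, 72, 74, 76, 77, 79, 79, 79]
    print(" ".join(relative_notes(melody)))
```

A melody of 13 notes yields 12 relative notes, so an N-gram of N notes covers N - 1 relative notes.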
3. Proposed Methods cont.
Similarity matching of MP3 phrases, starting from MIDI notes:
– Method I: relative notes → N-gram mapping functions (Bi-gram, Tri-gram, Four-gram) → Euclidean distance; the Bi-gram, Tri-gram, and Four-gram results are combined in suitable proportions
– Method II: distance between adjacent notes → Euclidean distance
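3. Proposed Methods: Method I (Method Based on Relative Note)
Each N-gram of relative notes is mapped to an integer code:
– Bi-gram (N = 2): < 1, = 2, > 3
– Tri-gram (N = 3): << 1, <= 2, <> 3, =< 4, == 5, => 6, >< 7, >= 8, >> 9
– Four-gram (N = 4): <<< 1, <<= 2, <<> 3, <=< 4, <== 5, <=> 6, <>< 7, <>= 8, <>> 9, =<< 10, =<= 11, =<> 12, ==< 13, === 14, ==> 15, =>< 16, =>= 17, =>> 18, ><< 19, ><= 20, ><> 21, >=< 22, >== 23, >=> 24, >>< 25, >>= 26, >>> 27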
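The codes in these tables coincide with reading the relative notes as a base-3 number ('<' as 0, '=' as 1, '>' as 2, first symbol most significant) and adding 1; an N-gram of N notes spans N - 1 relative notes, which gives 3, 9, and 27 codes. The sketch below regenerates the tables under that reading; the helper names are hypothetical.

```python
from itertools import product
from typing import Dict

# Digit value of each relative-note symbol; '<' = up, '=' = same, '>' = down.
SYMBOL_INDEX = {"<": 0, "=": 1, ">": 2}


def mapping_code(gram: str) -> int:
    """Read the relative notes as a base-3 number (first symbol most
    significant) and number the results from 1."""
    code = 0
    for symbol in gram:
        code = code * 3 + SYMBOL_INDEX[symbol]
    return code + 1


def mapping_table(n: int) -> Dict[str, int]:
    """Mapping function for N-grams: every string of n - 1 relative notes."""
    grams = ("".join(p) for p in product("<=>", repeat=n - 1))
    return {g: mapping_code(g) for g in grams}


if __name__ == "__main__":
    for n in (2, 3, 4):
        table = mapping_table(n)
        print(f"N = {n}:", ", ".join(f"{g} {c}" for g, c in table.items()))
```

Because the encoding is positional, changing the first relative note moves the code much further than changing the last one, which is the effect Method III addresses.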
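3. Proposed Methods: Method I (Method Based on Relative Note), example
The example melody 79 > 76 = 76 < 77 > 74 = 74 > 72 < 74 < 76 < 77 < 79 = 79 = 79 is encoded with the Bi-gram mapping function (< 1, = 2, > 3) and the Tri-gram mapping function (<< 1, <= 2, <> 3, =< 4, == 5, => 6, >< 7, >= 8, >> 9)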
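A sketch of encoding the example melody with the Bi-gram, Tri-gram, and Four-gram mapping functions, assuming the N-note windows overlap and advance one note at a time (the step size is not stated on the slide). The relative-note and mapping-code helpers are redefined so the block runs on its own; all names are hypothetical.

```python
from typing import List, Sequence

# '<' = current note higher, '=' = same, '>' = lower (the slides' a < b notation)
SYMBOL_INDEX = {"<": 0, "=": 1, ">": 2}


def relative_notes(notes: Sequence[int]) -> List[str]:
    """Relative note between each note and its predecessor."""
    return ["<" if cur > prev else "=" if cur == prev else ">"
            for prev, cur in zip(notes, notes[1:])]


def mapping_code(gram: Sequence[str]) -> int:
    """Base-3 code of a run of relative notes, numbered from 1."""
    code = 0
    for symbol in gram:
        code = code * 3 + SYMBOL_INDEX[symbol]
    return code + 1


def encode(notes: Sequence[int], n: int) -> List[int]:
    """Map every window of n consecutive notes (n - 1 relative notes) to its code.

    Assumption: windows overlap and advance one note at a time."""
    rel = relative_notes(notes)
    width = n - 1
    return [mapping_code(rel[i:i + width]) for i in range(len(rel) - width + 1)]


if __name__ == "__main__":
    melody = [79, 76, 76, 77, 74, 74, 72, 74, 76, 77, 79, 79, 79]
    for n, name in [(2, "Bi-gram"), (3, "Tri-gram"), (4, "Four-gram")]:
        print(name, encode(melody, n))
```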
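3. Proposed Methods: Method I, Euclidean distance
$D(\mathbf{q}_i, \mathbf{p}_j) = \sqrt{\sum_{k=1}^{N} (q_{i,k} - p_{j,k})^2}$
– $\mathbf{q}_i$: the vector of query i hummed by the singer
– $\mathbf{p}_j$: the vector of phrase j in the MP3 phrase feature database
– N: the smaller of the two dimensions
When the dimensions differ, the shorter vector is compared with each length-N window of the longer vector and the minimum is taken
Example (mapping-function codes, N = 3): $D((3,2,1),(2,1,3)) = \sqrt{6}$; for $(2,1,3)$ against $(3,2,1,3,3)$, $D = \min(\sqrt{6}, 0, \sqrt{5}) = 0$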
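A sketch of this Euclidean distance over N = min(dimensions), taking the minimum over every alignment of the shorter vector inside the longer one (our reading of the Min(·,·,·) in the example). The function name is hypothetical, and raising an error for an empty vector is a choice made here.

```python
import math
from typing import Sequence


def window_distance(q: Sequence[float], p: Sequence[float]) -> float:
    """Euclidean distance over N = min(len(q), len(p)) dimensions.

    The shorter vector is slid over the longer one and the smallest
    distance over all alignments is returned."""
    short, long_ = (q, p) if len(q) <= len(p) else (p, q)
    n = len(short)
    if n == 0:
        raise ValueError("empty feature vector")
    best = math.inf
    for start in range(len(long_) - n + 1):
        d = math.sqrt(sum((short[k] - long_[start + k]) ** 2 for k in range(n)))
        best = min(best, d)
    return best


if __name__ == "__main__":
    print(window_distance([3, 2, 1], [2, 1, 3]))        # one alignment
    print(window_distance([2, 1, 3], [3, 2, 1, 3, 3]))  # three alignments
```

Sliding the shorter vector lets a short hummed query match any part of a longer phrase, which addresses the inconsistent phrase lengths raised in the introduction.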
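3. Proposed Methods: Method I, combined distance
$D = w_B D_B + w_T D_T + w_F D_F$, with $w_B + w_T + w_F = 1$
where $D_B$, $D_T$, and $D_F$ are the distances computed with the Bi-gram, Tri-gram, and Four-gram mapping functions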
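A self-contained sketch of the weighted combination, assuming overlapping N-gram windows and the sliding-window Euclidean distance for each N-gram. The default weights 0.3 / 0.5 / 0.2 are hypothetical: the slides only require $w_B + w_T + w_F = 1$, and the conclusions rank Tri-gram > Bi-gram > Four-gram.

```python
import math
from typing import List, Sequence

SYMBOL_INDEX = {"<": 0, "=": 1, ">": 2}  # '<' up, '=' same, '>' down


def encode(notes: Sequence[int], n: int) -> List[int]:
    """N-gram mapping codes over overlapping windows of n notes (assumed)."""
    rel = ["<" if b > a else "=" if b == a else ">" for a, b in zip(notes, notes[1:])]
    width = n - 1
    codes = []
    for i in range(len(rel) - width + 1):
        code = 0
        for symbol in rel[i:i + width]:
            code = code * 3 + SYMBOL_INDEX[symbol]
        codes.append(code + 1)
    return codes


def window_distance(q: Sequence[float], p: Sequence[float]) -> float:
    """Minimum Euclidean distance of the shorter vector over windows of the longer."""
    short, long_ = (q, p) if len(q) <= len(p) else (p, q)
    n = len(short)
    if n == 0:
        raise ValueError("feature vector is empty (too few notes for this N)")
    return min(math.sqrt(sum((short[k] - long_[s + k]) ** 2 for k in range(n)))
               for s in range(len(long_) - n + 1))


def combined_distance(query: Sequence[int], phrase: Sequence[int],
                      w_b: float = 0.3, w_t: float = 0.5, w_f: float = 0.2) -> float:
    """D = w_B*D_B + w_T*D_T + w_F*D_F with w_B + w_T + w_F = 1.

    Default weights are hypothetical and only respect Tri > Bi > Four."""
    if not math.isclose(w_b + w_t + w_f, 1.0):
        raise ValueError("weights must sum to 1")
    d_b = window_distance(encode(query, 2), encode(phrase, 2))
    d_t = window_distance(encode(query, 3), encode(phrase, 3))
    d_f = window_distance(encode(query, 4), encode(phrase, 4))
    return w_b * d_b + w_t * d_t + w_f * d_f


if __name__ == "__main__":
    melody = [79, 76, 76, 77, 74, 74, 72, 74, 76, 77, 79, 79, 79]
    print(combined_distance(melody[3:8], melody))
```

Since the query must yield at least one Four-gram code, this sketch needs queries of four or more notes.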
3. Proposed Methods: Method II (Method Based on Note Difference)
The hummed query and each MP3 phrase are represented by the pitch differences between adjacent notes and compared with the Euclidean distance
Example: 79, 76, 76, 77, 74, 74, 72, 74, 76, 77, 79, 79, 79 → -3, 0, +1, -3, 0, -2, +2, +2, +1, +2, 0, 0
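A sketch of Method II. It assumes the same sliding-window minimum as Method I when the query and phrase lengths differ, which the slide does not state explicitly; the function names are hypothetical. Because only differences are kept, a query hummed in a different key produces the same vector.

```python
import math
from typing import List, Sequence


def note_differences(notes: Sequence[int]) -> List[int]:
    """Pitch difference between adjacent notes (semitones for MIDI note numbers)."""
    return [cur - prev for prev, cur in zip(notes, notes[1:])]


def method2_distance(query_notes: Sequence[int], phrase_notes: Sequence[int]) -> float:
    """Euclidean distance between note-difference vectors; when lengths differ,
    the shorter vector is slid over the longer one and the minimum is kept
    (assumed, mirroring the Method I distance)."""
    q, p = note_differences(query_notes), note_differences(phrase_notes)
    short, long_ = (q, p) if len(q) <= len(p) else (p, q)
    n = len(short)
    if n == 0:
        raise ValueError("need at least two notes")
    return min(math.sqrt(sum((short[k] - long_[s + k]) ** 2 for k in range(n)))
               for s in range(len(long_) - n + 1))


if __name__ == "__main__":
    melody = [79, 76, 76, 77, 74, 74, 72, 74, 76, 77, 79, 79, 79]
    print(note_differences(melody))
    # The same fragment hummed 5 semitones higher still matches exactly.
    print(method2_distance([m + 5 for m in melody[2:7]], melody))
```

Unlike the three-symbol contour of Method I, a single semitone of humming error changes this distance, which is one way to read the "too rigid" remark in the results.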
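3. Proposed Methods: Method III (Modified Mapping Function)
Original Tri-gram mapping (row: first relative note; column: second relative note, in the order <, =, >): < row: 1 2 3; = row: 4 5 6; > row: 7 8 9
Correct phrase: 3 > 2 < 3 (><, code 7)
– Sample 1: 1 < 2 < 3 (<<, code 1), difference from the correct code: 6
– Sample 2: 3 > 2 > 1 (>>, code 9), difference from the correct code: 2
Each sample has exactly one wrong relative note, yet the two differences are very different (6 vs. 2)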
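The arithmetic of this slide can be reproduced directly. The sketch below (hypothetical helper names) shows that with the original codes the penalty for one wrong relative note depends on its position. It does not implement the modified values T1 to T9, whose numbers are not given.

```python
# Original Tri-gram mapping: row = first relative note, column = second.
ORIGINAL = {"<<": 1, "<=": 2, "<>": 3,
            "=<": 4, "==": 5, "=>": 6,
            "><": 7, ">=": 8, ">>": 9}


def tri_gram(a: int, b: int, c: int) -> str:
    """Relative notes of three consecutive pitches ('<' up, '=' same, '>' down)."""
    def rel(x: int, y: int) -> str:
        return "<" if y > x else "=" if y == x else ">"
    return rel(a, b) + rel(b, c)


def code_difference(sample: str, correct: str, table=ORIGINAL) -> int:
    """Absolute difference between the codes of two tri-grams."""
    return abs(table[sample] - table[correct])


if __name__ == "__main__":
    correct = tri_gram(3, 2, 3)   # "><" -> 7
    sample1 = tri_gram(1, 2, 3)   # "<<" -> 1 (first relative note wrong)
    sample2 = tri_gram(3, 2, 1)   # ">>" -> 9 (second relative note wrong)
    print(code_difference(sample1, correct), code_difference(sample2, correct))
```

Passing a different `table` to `code_difference` is where a modified mapping would plug in.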
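Improved mapping: the codes are replaced by values T1 to T9 in the same layout (< row: T1 T2 T3; = row: T4 T5 T6; > row: T7 T8 T9)
Correct phrase: 3 > 2 < 3 (><, T7)
– Sample 1: 1 < 2 < 3 (<<, T1), difference |T7 − T1|
– Sample 2: 3 > 2 > 1 (>>, T9), difference |T7 − T9|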
4. Experimental Results
The aims of these experiments are to:
– Investigate the performance of the matching methods
– Probe some influencing factors in order to achieve better performance
Data collection
– The 20 hottest songs on the Cashbox KTV billboard
– 675 phrases
4. Experimental Results cont.
Results within the TOP 5 / TOP 10 / TOP 15 ranks:
– Method I: 8.5% / 38.3% / 53.2%
– Method II: 27.6% / 66.0% / 76.6%
– Method III: 14.9% / 78.7% / 91.5%
Method I: N-gram melody similarity comparison (the baseline)
Method II: melody similarity comparison using differences between adjacent notes
Method III: N-gram melody similarity comparison with the modified mapping function
Possible sources of error: the mapping function, melody extraction from the MP3 music, songs not sung correctly, and how well the features describe the characteristics of the music
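The TOP 5 / TOP 10 / TOP 15 figures are presumably the share of queries whose correct phrase is ranked within the first k results; that reading is an assumption. A sketch of computing such a hit rate, with a hypothetical function name and toy data (not the experimental data):

```python
from typing import Sequence


def top_k_hit_rate(rankings: Sequence[Sequence[str]],
                   targets: Sequence[str], k: int) -> float:
    """Fraction of queries whose target phrase appears among the first k results."""
    if len(rankings) != len(targets) or not targets:
        raise ValueError("need one ranking per target and at least one target")
    hits = sum(1 for ranked, target in zip(rankings, targets) if target in ranked[:k])
    return hits / len(targets)


if __name__ == "__main__":
    # Toy data (hypothetical): three queries, each with one correct phrase.
    rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
    targets = ["a", "a", "b"]
    for k in (1, 2, 3):
        print(f"TOP {k}: {top_k_hit_rate(rankings, targets, k):.1%}")
```

Under this reading the rate can only grow as k grows, which matches each method's row in the results.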
4. Experimental Results cont.
– Method I: more flexible
– Method II: too rigid
5. Conclusions
The aim of this paper is to develop a system that lets users retrieve MP3 music objects simply by humming
Matching methods
– Using N-grams and mapping functions for melody similarity comparison (Methods I and III)
– Using differences between adjacent notes for melody similarity comparison (Method II)
Our experimental results show that
– Encoding the melody with N-grams and matching it through mapping functions is feasible
5. Conclusions cont.
Proportion of Bi-gram, Tri-gram, and Four-gram
– Bi-gram information is inherently loose, but it is more flexible and fault-tolerant
– Four-gram is more rigid and strict, but less flexible
– Tri-gram behaves between Bi-gram and Four-gram
Resulting preference: Tri-gram > Bi-gram > Four-gram
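5. Conclusions cont.
Method II vs. Method III
– Features: Method II uses the up-and-down relationship and the characteristic of pitch difference; Method III uses the up-and-down relationship
– Flexibility: Method II is too rigid in a sense; Method III is more flexible
– Results: Method II is almost correct; Method III makes some mistakes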
5. Conclusions cont.
Future work
– Find more reasonable mapping functions
– Merge Method III and Method II
– Use better fault-tolerant techniques
– Consider different kinds of musical characteristics
– Develop a better method of extracting the melody of MP3 objects
– Segment MP3 phrases automatically