Machine-Learning Based Classification of Speech and Music
Khan, MKS; Al-Khatib, WG. Springer, Multimedia Systems, Vol. 12, pp. 55-67.


1 Machine-Learning Based Classification of Speech and Music
Khan, MKS; Al-Khatib, WG
Springer, Multimedia Systems, Vol. 12, pp. 55-67
King Fahd University of Petroleum & Minerals, http://www.kfupm.edu.sa

Summary
The need to classify audio into categories such as speech or music is an important aspect of many multimedia document retrieval systems. In this paper, we investigate audio features that have not previously been used in music-speech classification: the mean and variance of the discrete wavelet transform, the variance of Mel-frequency cepstral coefficients, the root mean square of a lowpass signal, and the difference of the maximum and minimum zero-crossings. We then apply fuzzy C-means clustering to the problem of selecting a viable set of features that enables better classification accuracy. Three classification frameworks have been studied: Multi-Layer Perceptron (MLP) neural networks, Radial Basis Function (RBF) neural networks, and Hidden Markov Models (HMM); the results of each framework are reported and compared. Our extensive experimentation has identified a subset of features that contributes most to accurate classification, and has shown that MLP networks are the most suitable classification framework for the problem at hand.

References
1. BEIERHOLM T, 2004, P 17 INT C PATT REC, V2, P379
2. BEZDEK JC, 1981, PATTERN RECOGNITION
3. BUGATTI A, 2002, EURASIP J APPL SIG P, V4, P372
4. CAREY MJ, 1999, P IEEE INT C AC SPEE, V1, P149
5. CHOU W, 2001, P ICASSP 01 SALT LAK, V2, P865
6. CYBENKO G, 1989, MATH CONTROL SIGNAL, V2, P303
7. DELFS C, 1998, P INT C AC SPEECH SI, V3, P1569
8. DUDA RO, 2001, PATTERN CLASSIFICATI
9. ELMALEH K, 2000, P ICASSP2000 JUN, V4, P2445
10. HARB H, 2001, P 7 INT C DISTR MULT, P257
11. HARB H, 2003, P 7 INT C SIGN PROC, V2, P125
12. HOYT JD, 1994, P INT C NEUR NETW IE, V7, P4493

Copyright: King Fahd University of Petroleum & Minerals; http://www.kfupm.edu.sa
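The summary names several frame-level features, two of which are simple to compute directly from the waveform: the root mean square of the (assumed already lowpass-filtered) signal and the difference of the maximum and minimum per-frame zero-crossing counts. As a minimal illustrative sketch (not the authors' implementation; `frame_features` and `zc_range` are hypothetical helper names), they might look like this:

```python
import math

def frame_features(samples, frame_len=256):
    """Split a signal into non-overlapping frames; return per-frame
    RMS values and zero-crossing counts (a sign change between two
    consecutive samples counts as one crossing)."""
    rms, zc = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
        zc.append(sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0))
    return rms, zc

def zc_range(zc_counts):
    """Difference of the maximum and minimum per-frame zero-crossing
    counts -- small for steady tones, larger for speech-like signals."""
    return max(zc_counts) - min(zc_counts)
```

On a steady sine tone every frame has the same zero-crossing count, so `zc_range` is 0; speech, which alternates voiced and unvoiced segments, tends to produce a much larger spread, which is what makes this feature discriminative.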

2 References (continued)
13. KARNEBACK S, 2001, P EUR C SPEECH COMM, P1891
14. KHAN MKS, 2005, THESIS KING FAHD U P
15. LAMBROU T, 1998, P INT C AC SPEECH SI, V6, P3621
16. LI DG, 2001, PATTERN RECOGN LETT, V22, P533
17. LIPP OV, 2004, EMOTION, V4, P233, DOI 10.1037/1528-3542.4.3.233
18. LU L, 2001, P 9 ACM INT C MULT, P203
19. LU L, 2002, IEEE T SPEECH AUDI P, V10, P504, DOI 10.1109/TSA.2002.804546
20. LU L, 2003, ACM MULTIMEDIA SYSTE, V8, P482
21. MAMMONE RJ, 1994, ARTIFICIAL NEURAL NE
22. PANAGIOTAKIS C, 2004, IEEE T MULTIMEDIA
23. PARRIS ES, 1999, P EUROSPEECH 99 BUD, P2191
24. PELTONEN V, 2001, THESIS TAMPERE U TEC
25. PINQUIER J, 2002, P ICSLP 02, V3, P2005
26. PINQUIER J, 2002, P INT C AC SPEECH SI, V4, P4164
27. PINQUIER J, 2003, P INT C AC SPEECH SI, V2, P17
28. RABINER LR, 1986, IEEE ASSP MAG, V3, P4
29. SAAD EM, 2002, P 19 NAT RAD SCI C N, P208
30. SAUNDERS J, 1996, P INT C AC SPEECH SI, V2, P993
31. SCHEIRER E, 1997, P ICASSP 97, V2, P1331
32. SHAO X, 2003, P 4 INT C INF COMM S, V3, P1823
33. SRINIVASAN SH, 2004, P INT C AC SPEECH SI, V4, P321
34. TZANETAKIS G, 1999, EUROMICRO WORKSH MUS, V2, P61
35. TZANETAKIS G, 2001, P INT S MUS INF RETR, P205
36. TZANETAKIS G, 2002, IEEE T SPEECH AUDI P, V10, P293
37. WANG WQ, 2003, P INF COMM SIGN PROC, V3, P1325

For pre-prints please write to: wasfi@ccse.kfupm.edu.sa
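The summary's feature-selection step uses fuzzy C-means clustering, the algorithm introduced in reference 2 (Bezdek, 1981). A minimal one-dimensional sketch of the algorithm's two alternating updates (this is a generic illustration of the method under simple deterministic initialisation, not the authors' code) could read:

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    """Minimal 1-D fuzzy C-means (Bezdek, 1981) for illustration.

    points: list of floats; c: number of clusters (>= 2); m: fuzzifier (> 1).
    Returns (centers, u) where u[i][k] is the degree to which
    points[i] belongs to cluster k (memberships sum to 1 per point).
    """
    # Deterministic init: c evenly spaced points from the sorted data.
    srt = sorted(points)
    centers = [srt[i * (len(srt) - 1) // (c - 1)] for i in range(c)]
    u = [[0.0] * c for _ in points]
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        for i, x in enumerate(points):
            d = [abs(x - ck) or 1e-12 for ck in centers]  # guard zero distance
            for k in range(c):
                u[i][k] = 1.0 / sum((d[k] / dj) ** (2.0 / (m - 1.0)) for dj in d)
        # Center update: membership-weighted mean with weights u_ik^m.
        for k in range(c):
            w = [ui[k] ** m for ui in u]
            centers[k] = sum(wi * x for wi, x in zip(w, points)) / sum(w)
    return centers, u
```

Unlike hard k-means, every point keeps a graded membership in every cluster, which is what makes the method usable for ranking how strongly each candidate feature associates with the speech and music groups.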

