MIR Lab: R&D Foci and Demos
J.-S. Roger Jang (張智星)
Multimedia Information Retrieval (MIR) Lab
CSIE Dept., National Taiwan University
http://mirlab.org/jang
2018/5/13
Our R&D Foci
  About me
  Mission
    Use machine learning to tackle real-world problems with immediate applications
  Approaches
    New learning paradigms
    GPU for big data
  Application domains
    Music: retrieval and analysis
    Speech: recognition, scoring, and synthesis
    Image: classification and analysis for semiconductor manufacturing automation
Music-related Research
  Mature technologies
    Query by singing/humming
    Audio fingerprinting
    Music genre classification
    Music mood classification
    Beat tracking
    Query by tapping
    Pitch/time modification
  Under development
    MART
    Audio watermarking
    Audio melody extraction
    Singing voice separation
    Score following
    Drum ID for gaming
    Singing scoring
    Vibrato detection
    Enthusiasm detection
Focus: Music Retrieval
  Large-scale music search
    Query by singing/humming
    Audio fingerprinting
  Achievements
    Top-ranked for some MIREX tasks:
      Genre classification
      Mood classification
      Beat tracking
      Audio melody extraction
    Technology transfer to several companies
  Flow chart
    [Figure: clients (PC, smartphones, mobile devices) send a request containing acoustic features to cloud servers; the servers send back a response containing the search result.]
  Applications on toys
    [Video: Billy Bass toy. "I'm Billy Bass. I know what you are singing! Pat me and sing to me!"]
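The request/response flow in the flow chart can be illustrated with a minimal sketch. The slide only states that clients send acoustic features and servers return a search result; everything else here is an assumption made for illustration: the pitch-contour feature, the mean subtraction for key invariance, the linear resampling, the mean absolute distance, the JSON message layout, and all function and song names are hypothetical and are not the lab's actual system.

```python
import json
from statistics import mean

def extract_features(pitch):
    """Client side (assumed feature): a pitch contour in semitones,
    shifted to zero mean so that the singer's key does not matter."""
    m = mean(pitch)
    return [round(p - m, 3) for p in pitch]

def build_request(pitch):
    """Client side: pack the acoustic features into a request message."""
    return json.dumps({"features": extract_features(pitch)})

def resample(vec, n):
    """Linearly resample vec to n points so tempo differences are tolerated."""
    if n == 1 or len(vec) == 1:
        return [vec[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(vec) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(vec) - 1)
        frac = pos - lo
        out.append(vec[lo] * (1 - frac) + vec[hi] * frac)
    return out

def handle_request(request_json, database):
    """Server side: compare the query with every song and rank by distance."""
    query = json.loads(request_json)["features"]
    results = []
    for title, melody in database.items():
        ref = extract_features(melody)
        q = resample(query, len(ref))
        dist = mean(abs(a - b) for a, b in zip(q, ref))
        results.append({"title": title, "distance": round(dist, 3)})
    results.sort(key=lambda r: r["distance"])
    return json.dumps({"results": results})

# Hypothetical two-song database of pitch contours (MIDI note numbers).
DATABASE = {
    "Song A": [60, 62, 64, 65, 67, 65, 64, 62],
    "Song B": [67, 67, 60, 60, 62, 62, 60, 60],
}

# The user hums Song A five semitones lower than the stored version.
hummed = [55, 57, 59, 60, 62, 60, 59, 57]
response = json.loads(handle_request(build_request(hummed), DATABASE))
print(response["results"][0]["title"])  # prints: Song A
```

In this sketch only the compact feature vector travels from client to server, which matches the split shown in the flow chart. A real system would need a more robust matcher and an index that scales to a large song collection.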
Demos for Music-related Research
  PC
    Query by singing & humming
    Audio fingerprinting
    Genre classification
    Beat tracking
    Singing voice separation
    Pitch scaling
    Real-time pitch tracking
    Drum position identification
  Embedded systems
    QBSH over toys
  Apps
    Auto-rhythm game
    Beat-off drum game
Speech-related Research
  Mature technologies
    Voice commands
    Speech scoring: Mandarin, English, Japanese, Taiwanese
    Text-to-speech synthesis for Mandarin
    Speech emotion classification
  Under development
    Long utterance and text alignment
    Speaker recognition
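Snapshots of ASRA Applications
  Mandarin speech scoring software (licensed to 資策會, the Institute for Information Industry)
  Japanese speech scoring software (licensed to 巨匠電腦)
  English speech scoring software (licensed to Speak2me)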
Demos for Speech-related Research
  PC
    Idiom relay (成語接龍)
    Recitation machine (唸唸不忘)
    Bricks of idioms (一語中的)
    Stress detection
    Text-to-speech synthesis
    Chinese conversation classroom
    Speech scoring
    Voice commands
    Lucy's Café
  Embedded systems
    Toys
    Voice commands over iOS/Android
  Mobile apps
    Speech scoring game
Image-related Research
  Projects with TSMC
    Wafer map failure pattern recognition
    Depth from SEM images
    Defect circuit image detection
    Wafer image enhancement
    Etching width prediction
  Face-based analysis
    Face recognition
    Age estimation
    Expression ID
    Gender classification
  Others
    Human identification
    Particle tracking
    Leaf identification
    People counting
Thank you for your attention! Questions & comments?