Low Power Speech Enhancement
David Halupka
Ph.D. Candidate, Electronics Group
June 24th, 2005
University of Toronto

Motivation
- Today's recognition systems can achieve 95%+ recognition accuracy after extensive training
  - Research systems: same accuracy with no training
- Typically: 10% accuracy in the presence of noise, reverberation, and conflicting conversations
- Humans are equipped to deal with noisy environments
  - Two ears → let us localize and focus on a single speaker
- Complex noise: one sensor doesn't cut it
  - Multiple microphones → superhuman noise filtering

Step 1: Sound Localization
[Figure: signals m1(t) and m2(t) from two microphones plotted against time t; diagram labels d, x, and x+τν. Caption: Time-Based Cross-Correlation]
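
The time-based cross-correlation named on this slide can be sketched as follows. This is a minimal illustration, not the implementation behind the talk: the function names, the sampling rate `fs`, and the speed of sound (343 m/s) are assumptions. It estimates the delay τ between m1(t) and m2(t) as the lag that maximizes their cross-correlation, then converts it to the extra path length τν.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 °C


def estimate_delay(m1, m2, fs):
    """Estimate tau (seconds) such that m2(t) is approximately m1(t - tau).

    m1, m2: equal-length 1-D sample arrays from the two microphones.
    fs: sampling rate in Hz (hypothetical parameter).
    """
    # Full cross-correlation: entry k corresponds to lag k - (len(m1) - 1).
    corr = np.correlate(m2, m1, mode="full")
    lag = int(np.argmax(corr)) - (len(m1) - 1)
    return lag / fs


def path_difference(tau):
    """Extra distance (metres) travelled to the later microphone: tau * nu."""
    return tau * SPEED_OF_SOUND


# Usage: an impulse that reaches microphone 2 five samples after microphone 1.
m1 = np.zeros(64)
m1[10] = 1.0
m2 = np.zeros(64)
m2[15] = 1.0
tau = estimate_delay(m1, m2, fs=16000)
extra_path = path_difference(tau)
```

A positive τ means the sound reached microphone 1 first; a negative τ means it reached microphone 2 first.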

Step 2: Speech Enhancement

A Hard Case for Hardware
- Localization is an exhaustive linear search
  - Gradient search, etc. not applicable
  - Each time delay must be checked
  - Each likelihood can be evaluated in parallel
- A 1 GHz Intel Pentium III is needed just for real-time localization → consumes 35 W
- A speech interface is beneficial for handheld devices, but battery life is limited
  - Palm M100 → 150 mW
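
The search structure described on this slide can be sketched in software. The code below is an illustrative assumption, not the hardware design: `lag_score` uses a plain correlation sum as a stand-in for the likelihood, and `max_lag` and `workers` are hypothetical parameters. The point it shows is that every candidate delay is scored independently of the others, so the scores can be computed concurrently and only the final maximum needs all of them.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def lag_score(m1, m2, lag):
    """Correlation score for one candidate delay (stand-in for the likelihood)."""
    n = len(m1)
    if lag >= 0:
        # Pair m1[i] with m2[i + lag].
        return float(np.dot(m1[:n - lag], m2[lag:]))
    # Negative lag: pair m1[i - lag] with m2[i].
    return float(np.dot(m1[-lag:], m2[:n + lag]))


def exhaustive_search(m1, m2, max_lag, workers=4):
    """Score every delay in [-max_lag, max_lag] and return the best one."""
    lags = list(range(-max_lag, max_lag + 1))
    # No score depends on another, so the evaluations can run side by side.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda lag: lag_score(m1, m2, lag), lags))
    return lags[int(np.argmax(scores))], scores


# Usage: the second signal lags the first by five samples.
m1 = np.zeros(64)
m1[10] = 1.0
m2 = np.zeros(64)
m2[15] = 1.0
best_lag, scores = exhaustive_search(m1, m2, max_lag=20)
```

The thread pool here only illustrates the independence of the evaluations; it is not meant as a performance claim for Python.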

Results – 0.18 μm CMOS
- Die size: 2.51 mm × 2.51 mm; power utilization: 29 mW
- Die size: 1.51 mm × 1.38 mm; power utilization: 3.45 mW
- FPGA: 184 mW
- DSP: 650 mW