The Beatbox Voice-to-Drum Synthesizer

ABSTRACT
The Beatbox is a real-time voice-to-drum synthesizer intended primarily for the entertainment of small children. It accepts speech input limited to a small dictionary of sounds that the system is pre-trained to recognize. Each sound in the dictionary maps to a predetermined drumbeat, which is then played back to the user. In this way, someone with no knowledge of drumming can effectively "play" the instrument with his or her mouth.

MOTIVATION
Speech recognition is a key tool in the design of next-generation, user-friendly computer applications. A major obstacle still standing in the way of this goal is the detection of stop consonants: sounds produced by stopping the flow of air in the mouth and then releasing it in a burst (e.g., b, d, g, k, p, and t). Telling stop consonants apart is difficult because they sound so similar. Building a system that distinguishes stop consonants may help bring speech recognition one step closer to reality.

SYSTEM FLOW DIAGRAM
[Diagram: Voice Input → Digital Sampling by sound card → Digital Signal Processing → Pattern Recognition using Hidden Markov Models → Audio Playback and Visual Feedback]

DIGITAL SIGNAL PROCESSING UNIT
[Flowchart: Input Signal → Windowing → Fast Fourier Transform → Critical Band Integration → Logarithmic Compression → Cepstrum]
When a user's voice input triggers the underlying engine, it is converted into a digital signal and passed on to the DSP unit. The signal is divided into short windows, each about 25 ms long, and each window is multiplied by a Hanning window before its FFT is taken. The spectrum is then passed through a filter bank modeled on the human ear, which compresses the information contained in the signal. Further redundancies are removed through cepstral analysis before the processed signal is handed over to the Pattern Recognition subsystem.
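The DSP chain described above (windowing, FFT, critical band integration, logarithmic compression, cepstrum) can be sketched in pure Python. This is a minimal sketch under assumptions the poster does not state: 16 kHz audio, a 10 ms hop between 25 ms windows, zero-padding to a 512-point FFT, 20 triangular mel-spaced filters standing in for the critical bands, and a DCT-II for the cepstral step (i.e., MFCC-style features). All function names and constants are hypothetical.

```python
import cmath
import math

# Hypothetical parameters (not given on the poster): 16 kHz audio, 25 ms windows,
# 10 ms hop, 512-point FFT, 20 mel-spaced bands, 12 cepstral coefficients.
RATE, WIN_MS, HOP_MS, N_FFT, N_BANDS, N_CEPS = 16000, 25, 10, 512, 20, 12

def hann(n):
    """Symmetric Hann ("Hanning") window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def band_filters(n_bands, n_fft, rate):
    """Triangular filters equally spaced on the mel scale (critical-band stand-in)."""
    top = mel(rate / 2)
    edges = [int((n_fft + 1) * inv_mel(top * i / (n_bands + 1)) / rate)
             for i in range(n_bands + 2)]
    bank = []
    for j in range(1, n_bands + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        w = [0.0] * (n_fft // 2 + 1)
        for b in range(lo, mid):  # rising edge of the triangle
            w[b] = (b - lo) / (mid - lo)
        for b in range(mid, hi):  # falling edge of the triangle
            w[b] = (hi - b) / (hi - mid)
        bank.append(w)
    return bank

def dct2(x, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(x)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x))
            for k in range(n_out)]

def frame_features(frame, bank):
    """Window -> FFT -> band integration -> log compression -> cepstrum."""
    assert len(frame) <= N_FFT
    w = hann(len(frame))
    padded = [s * wi for s, wi in zip(frame, w)] + [0.0] * (N_FFT - len(frame))
    power = [abs(c) ** 2 for c in fft(padded)[: N_FFT // 2 + 1]]
    energies = [sum(p * f for p, f in zip(power, filt)) for filt in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # floor avoids log(0)
    return dct2(log_energies, N_CEPS)

def extract(signal):
    """Slice the signal into overlapping 25 ms windows; one feature vector each."""
    win, hop = RATE * WIN_MS // 1000, RATE * HOP_MS // 1000
    bank = band_filters(N_BANDS, N_FFT, RATE)
    return [frame_features(signal[s:s + win], bank)
            for s in range(0, len(signal) - win + 1, hop)]

if __name__ == "__main__":
    # 0.1 s of a 1 kHz tone as a stand-in for recorded voice input
    tone = [math.sin(2 * math.pi * 1000 * t / RATE) for t in range(RATE // 10)]
    feats = extract(tone)
    print(len(feats), len(feats[0]))  # 8 12
```

Each row of the result is one cepstral feature vector per window; a sequence of such rows is the kind of observation sequence the Pattern Recognition unit consumes. The hand-written FFT keeps the sketch dependency-free; a real system would use an optimized library FFT.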
PATTERN RECOGNITION UNIT
[Flowchart: N samples of /k/ received from DSP unit → Viterbi algorithm infers N state sequences → Estimate parameters of /k/ HMM → Iterate till convergence]
Training: The recognition and classification system is based on the theory of Hidden Markov Models (HMMs). Given the observation sequences, we use the Viterbi algorithm to infer the most likely underlying "hidden state" sequence for each one. We then re-estimate the parameters of the HMM from these state sequences, iterating until convergence is achieved.
Testing: Each dictionary sound has a corresponding pre-trained HMM. The input signal is scored against each HMM and classified as the sound whose HMM gives the highest estimated likelihood.

AUTHORS: Priyadarshini Routh, Aryeh Levine, Raphael Levy
ADVISOR: Prof. Lawrence Saul

The Beatbox system comprises three main components:
- The DSP unit accepts voice input, then cleans and analyzes the incoming signal.
- The Pattern Recognition subsystem uses frequency characteristics to determine, probabilistically, the most likely match for the input data.
- The Demonstration system is a GUI that controls the audio and visual feedback given to the user.

DEMONSTRATION
90% accuracy on dictionary sounds when the system is trained on the same user.
80% accuracy when a pre-existing training set is used.

[Figures: DSP Flowchart; How the System Hears You! (waveforms and spectrograms of /k/, /pff/, and /t/); Training the /k/ HMM]
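The Training and Testing procedures described under PATTERN RECOGNITION UNIT can be sketched in Python. This is a minimal sketch under several assumptions the poster does not confirm: each dictionary sound gets a left-to-right HMM with one diagonal-covariance Gaussian per state; training is "Viterbi training" (align frames to states with the Viterbi algorithm, re-estimate each state's mean and variance from its aligned frames, repeat until the total best-path score stops improving) with transition probabilities held fixed; and testing scores each model by its best-path (Viterbi) log-likelihood rather than the full forward-algorithm likelihood. The function names, the stay probability of 0.6, and the toy one-dimensional features are hypothetical.

```python
import math

NEG_INF = float("-inf")

def log_gauss(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def left_to_right(means, variances, stay=0.6):
    """Hypothetical left-to-right HMM: each state either repeats or advances."""
    n = len(means)
    log_A = [[NEG_INF] * n for _ in range(n)]
    for s in range(n):
        if s + 1 < n:
            log_A[s][s], log_A[s][s + 1] = math.log(stay), math.log(1 - stay)
        else:
            log_A[s][s] = 0.0  # final state absorbs
    return {"log_pi": [0.0] + [NEG_INF] * (n - 1), "log_A": log_A,
            "means": [list(m) for m in means], "variances": [list(v) for v in variances]}

def viterbi(obs, model):
    """Best state path through the HMM and its log-likelihood."""
    pi, A, mu, var = model["log_pi"], model["log_A"], model["means"], model["variances"]
    n = len(pi)
    delta = [pi[s] + log_gauss(obs[0], mu[s], var[s]) for s in range(n)]
    backptrs = []
    for x in obs[1:]:
        # best predecessor of each state, then extend the best partial paths
        ptr = [max(range(n), key=lambda r: delta[r] + A[r][s]) for s in range(n)]
        delta = [delta[ptr[s]] + A[ptr[s]][s] + log_gauss(x, mu[s], var[s]) for s in range(n)]
        backptrs.append(ptr)
    state = max(range(n), key=lambda s: delta[s])
    best, path = delta[state], [state]
    for ptr in reversed(backptrs):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return best, path[::-1]

def viterbi_train(sequences, model, max_iter=20, tol=1e-4, var_floor=1e-3):
    """Align frames to states, re-estimate Gaussians, repeat (transitions fixed)."""
    prev_total = NEG_INF
    for _ in range(max_iter):
        buckets = [[] for _ in model["means"]]
        total = 0.0
        for obs in sequences:
            score, path = viterbi(obs, model)
            total += score
            for x, s in zip(obs, path):
                buckets[s].append(x)
        for s, frames in enumerate(buckets):
            if not frames:
                continue  # no frames aligned to this state: keep its old parameters
            dim = len(frames[0])
            mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
            model["means"][s] = mean
            model["variances"][s] = [
                max(sum((f[d] - mean[d]) ** 2 for f in frames) / len(frames), var_floor)
                for d in range(dim)]
        if total - prev_total < tol:
            break  # total best-path score has stopped improving
        prev_total = total
    return model

def classify(obs, models):
    """Pick the dictionary sound whose HMM gives the highest Viterbi score."""
    scores = {name: viterbi(obs, m)[0] for name, m in models.items()}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    # Toy 1-D "features"; real input would be the DSP unit's cepstral vectors.
    k = viterbi_train([[[0.0], [0.2], [5.0], [5.2]], [[-0.2], [0.0], [4.8], [5.0]]],
                      left_to_right([[1.0], [4.0]], [[1.0], [1.0]]))
    t = left_to_right([[10.0], [20.0]], [[1.0], [1.0]])
    label, _ = classify([[0.1], [0.2], [4.9], [5.1]], {"/k/": k, "/t/": t})
    print(label)  # /k/
```

Working in log probabilities keeps long observation sequences from underflowing, and the variance floor stops a state that captures nearly identical frames from collapsing to zero variance.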