Published by Kelsie Gonzalez. Modified over 9 years ago.
2
Hierarchy of Design
Voice Controlled Remote
- Voice Input
- Control Path
- Speech Processing
- IR Interface
3
Characteristics of Speech
- Amplitude variations
- Frequency variations
- Continuous in the frequency domain
- Most of the energy is within 100 Hz to 4 kHz
- Requires >8 kHz sampling for intelligible speech
4
Our Speech Algorithm
- Isolated word: cannot distinguish important areas in a stream of uninterrupted speech
- “Small” vocabulary: in the zero to tens of words region (Up, Down, Power, Surf)
- Training required: tells the device what the command sounds like
- Speaker dependent: re-training required for a separate user
5
The Voice Input
- Condenser microphone
- Signal is amplified approximately 6000x
- Sampling rate ~8 kHz
- 8-bit linear conversion
6
Word Boundary Detection
- Samples continuously
- Has the threshold level been reached?
- Begin analyzing the data
- Is the threshold level being reached very often?
- Stop analyzing the data
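The threshold test on this slide can be sketched as a simple endpoint detector. This is a minimal sketch, not the presenters' implementation: it assumes signed samples centered on zero, and it assumes analysis stops once the threshold has not been reached for a run of consecutive samples. The names `find_word_boundaries`, `threshold`, and `max_quiet` are hypothetical.

```python
def find_word_boundaries(samples, threshold, max_quiet):
    """Return (start, end) indices of the first detected word, or None.

    samples:   signed integer samples centered on zero (assumption).
    threshold: amplitude that counts as "loud" (hypothetical parameter).
    max_quiet: consecutive sub-threshold samples that end the word
               (hypothetical parameter).
    The returned end index is exclusive.
    """
    start = None      # index of the first loud sample
    last_loud = None  # index of the most recent loud sample
    quiet_run = 0     # consecutive quiet samples since last_loud

    for i, s in enumerate(samples):
        loud = abs(s) >= threshold
        if start is None:
            # Still waiting for the threshold to be reached.
            if loud:
                start = i
                last_loud = i
        elif loud:
            last_loud = i
            quiet_run = 0
        else:
            quiet_run += 1
            if quiet_run >= max_quiet:
                # Threshold is no longer being reached: stop analyzing.
                return start, last_loud + 1

    if start is None:
        return None
    return start, last_loud + 1


if __name__ == "__main__":
    audio = [0, 1, 0, 50, -60, 2, 40, 0, 0, 0, 0, 1]
    print(find_word_boundaries(audio, threshold=30, max_quiet=3))
```

Requiring a run of quiet samples, rather than a single one, keeps brief dips inside a word from ending it early.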
7
Zero Crossings
- One transition from positive to negative or vice versa
- Algorithm to determine the frequency of the signal
- Frequency is inversely proportional to the period
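As an illustration of the idea, a zero-crossing count can be turned into a rough frequency estimate, since a periodic signal crosses zero twice per period. This is a sketch under stated assumptions: samples are signed and centered on zero, and the function names are hypothetical rather than taken from the presentation.

```python
def count_zero_crossings(samples):
    """Count sign changes (positive to negative or vice versa)."""
    crossings = 0
    prev_sign = 0
    for s in samples:
        sign = (s > 0) - (s < 0)  # +1, -1, or 0
        if sign == 0:
            continue  # ignore exact zeros; wait for the next nonzero sample
        if prev_sign != 0 and sign != prev_sign:
            crossings += 1
        prev_sign = sign
    return crossings


def estimate_frequency(samples, sample_rate):
    """Estimate frequency in Hz: two zero crossings per period."""
    if not samples:
        return 0.0
    duration = len(samples) / sample_rate  # seconds covered by the samples
    return count_zero_crossings(samples) / (2.0 * duration)


if __name__ == "__main__":
    import math
    rate = 8000  # matches the ~8 kHz sampling rate mentioned in the slides
    tone = [math.sin(2 * math.pi * 1000 * n / rate + 0.1) for n in range(rate)]
    print(estimate_frequency(tone, rate))
```

The estimate is only meaningful for signals dominated by one frequency; for speech it serves as a coarse feature rather than a pitch measurement.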
8
Energy Analysis
- The energy of the signal is the amplitude squared (Parseval’s theorem).
- We used the absolute value of the amplitude instead.
- Calculated in real time, as each sample is received.
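A running accumulator shows how such a measure could be updated one sample at a time. This is a minimal sketch assuming signed samples centered on zero; the class name `RunningEnergy` and its fields are hypothetical. It keeps both the squared sum and the absolute-value sum so the two can be compared.

```python
class RunningEnergy:
    """Accumulates energy measures as samples arrive, one at a time."""

    def __init__(self):
        self.abs_sum = 0  # sum of |x|, the cheaper measure named on the slide
        self.sq_sum = 0   # sum of x^2, the textbook energy
        self.count = 0    # number of samples seen so far

    def update(self, sample):
        """Fold one new sample into the running totals."""
        self.abs_sum += abs(sample)
        self.sq_sum += sample * sample
        self.count += 1

    def mean_abs(self):
        """Average absolute amplitude, or 0.0 before any sample arrives."""
        return self.abs_sum / self.count if self.count else 0.0


if __name__ == "__main__":
    energy = RunningEnergy()
    for x in [3, -4, 0, 5]:
        energy.update(x)
    print(energy.abs_sum, energy.sq_sum, energy.mean_abs())
```

The absolute value avoids a multiplication per sample, which is one plausible reason to prefer it on a small microcontroller.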
9
The Recognition Process
- Compare the characteristics of the sample against Command1
- Compare the characteristics of the sample against Command2
- Compare the characteristics of the sample against Command3
- Compare the characteristics of the sample against Command4
- The command most similar to the recognized word is taken as the command that was spoken
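The comparison step can be sketched as nearest-template matching. The slides do not say which features or similarity measure were used, so everything here is an assumption: each word is represented by a short feature vector (for example, zero-crossing and energy values), and similarity is the sum of absolute differences. The name `recognize` and the template values are hypothetical.

```python
def recognize(features, templates):
    """Return (best_command, distance) for the closest stored template.

    features:  list of numbers describing the spoken sample (assumption).
    templates: dict mapping command name to a feature list of the same
               length, recorded during training (assumption).
    """
    best_name = None
    best_distance = None
    for name, template in templates.items():
        if len(template) != len(features):
            raise ValueError("feature length mismatch for " + name)
        # Sum of absolute differences: smaller means more similar.
        distance = sum(abs(a - b) for a, b in zip(features, template))
        if best_distance is None or distance < best_distance:
            best_name = name
            best_distance = distance
    return best_name, best_distance


if __name__ == "__main__":
    trained = {
        "Up": [12, 300],
        "Down": [8, 520],
        "Power": [20, 610],
        "Surf": [35, 250],
    }
    print(recognize([9, 500], trained))
```

A practical system would likely also reject samples whose best distance is too large, so that unrelated noise does not trigger a command.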
10
The Infra-Red Beam
- Detects and stores codes for common Sony TVs
- Utilizes a blind copycat method of IR memory; no decoding occurs
- Method easily modified to other IR protocols
11
General A/V IR coding schemes
- 38-40 kHz carrier at 940 nm wavelength
- Carrier output is gated by the bit stream.
- Most protocols use Pulse Width Modulation for bit encoding.
  - Logic ‘1’s coded as T (un-modulated) followed by 2T (modulated), where T ≈ 550 µs
  - Logic ‘0’s coded as T (un-modulated) followed by T (modulated).
- Various bit lengths, start and end sequences.
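The bit encoding described on this slide can be sketched as a list of carrier-on and carrier-off segments. This is an illustration only: it assumes T is 550 microseconds (the unit is garbled in the source), it omits start and end sequences because they vary by protocol, and the names `encode_bits` and `T_US` are hypothetical.

```python
T_US = 550  # assumed base period in microseconds


def encode_bits(bits, t_us=T_US):
    """Encode bits as (carrier_on, duration_us) segments.

    Per the slide: each bit is T un-modulated, followed by
    2T modulated for a logic 1 or T modulated for a logic 0.
    """
    segments = []
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        segments.append((False, t_us))                     # carrier off for T
        segments.append((True, (2 if bit else 1) * t_us))  # carrier on
    return segments


def total_duration_us(segments):
    """Total transmission time of the encoded segments in microseconds."""
    return sum(duration for _, duration in segments)


if __name__ == "__main__":
    frame = encode_bits([1, 0, 1])
    print(frame)
    print(total_duration_us(frame))
```

During each carrier-on segment the IR LED would be driven by the 38-40 kHz carrier; during carrier-off segments it stays dark.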
12
Gated modulation to Carrier
13
Bit stream for “Power” command
14
Bit stream for “Channel up” command
15
Common North American IR code sequence
16
The Control Path
- Implemented in two Moore state machines
- Training/Initialization
- Active/Recognition
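In a Moore machine the output depends only on the current state, not on the input that caused the transition. The sketch below illustrates that property for a recognition-style machine. The states, events, and outputs are all hypothetical; the slides do not list the actual states of either machine.

```python
# Output attached to each state (Moore property: output = f(state)).
# All state and output names are hypothetical.
OUTPUTS = {
    "IDLE": "wait",
    "CAPTURE": "sample_audio",
    "MATCH": "compare_templates",
    "SEND": "emit_ir",
}

# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    ("IDLE", "word_start"): "CAPTURE",
    ("CAPTURE", "word_end"): "MATCH",
    ("MATCH", "match_found"): "SEND",
    ("MATCH", "no_match"): "IDLE",
    ("SEND", "done"): "IDLE",
}


def step(state, event):
    """Return the next state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="IDLE"):
    """Feed events in order; return the final state and outputs seen."""
    outputs = [OUTPUTS[state]]
    for event in events:
        state = step(state, event)
        outputs.append(OUTPUTS[state])
    return state, outputs


if __name__ == "__main__":
    print(run(["word_start", "word_end", "match_found", "done"]))
```

A training machine could follow the same pattern, with states that capture and store a template for each command instead of matching against one.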
17
The Surf Function
- Start and stop the function with the utterance of the command SURF
- Enables a three-second preview of each channel
- Risk of developing carpal-tunnel syndrome decreases sharply!