SPEECH TECHNOLOGY: An Overview
Gopala Krishna. A (gopalakrishna@students)

WHAT IS SPEECH TECHNOLOGY ABOUT?
Speech technology is about processing human speech:
- as a signal
- as a form of language

Speech Processing by Machine
Algorithms: speech recognition, synthesis, coding, etc.

WHAT ALL IS INVOLVED IN PROCESSING SPEECH?
A multi-disciplinary field. Speech technology draws on:
- Linguistics
- Physiology
- Psychology
- Signal Processing
- Acoustics (Physics)
- Statistics
- Pattern Recognition
- Communication Theory
- Computer Science: A.I. Heuristics / Machine Learning

Applications:
- Man-Machine Communication
  - Smart talking machines and devices
  - Speech-enabled web interfaces
- Communication
  - Speech Coding
  - Speech Enhancement
- Biometrics
  - Speaker Identification (security applications)
- Entertainment Technology
  - Singing Voices
  - Voice Conversion
  - Artificial Characters / Avatars

Research Areas:
- Speech Recognition (speech to text)
- Speech Synthesis (text to speech)
- Speech Coding (compression of speech)
- Speech Enhancement (voice quality)
- Speaker Recognition (identity of the speaker)
- Spoken Language Identification (which language?)
- Language Models (modeling of natural text)
- Multimedia (integration of audio and visual modes)

WHAT ARE WE DOING CURRENTLY?
1. INDIAN LANGUAGE SPEECH SYNTHESIS
   - Hindi and Telugu TTS building
   - Prosody
2. SPEECH RECOGNITION
   - Large Vocabulary ASR
   - Landmark-based ASR
3. SPEECH-ENABLED INTERFACES

Text-To-Speech Synthesis (TTS)
- Indian Language TTS Effort
  - Hindi, Telugu, Tamil, Kannada, Bengali
- Text Normalization
  - Machine learning techniques
- Speech Segmentation
  - Ergodic HMMs, SVMs, etc.

Automatic Speech Recognition
- Large Vocabulary ASR
  - “Mimic”ing the Sphinx
- Alternative ASR Techniques
  - HMM-ANN Hybrid
  - Dynamic Bayesian Networks (DBNs)
  - Landmark-based Segmentation

Speech-Enabled Interfaces
- Screen Readers
  - RAVI (Reading Aid for the Visually Impaired)
  - Porting to low-memory devices
- Talking Tourist Aid
  - Agent (PDA)
- Speech-to-Speech Devices
  - Limited-domain bilingual translation

Projects
- Hindi TTS (sponsored by NOKIA)
- Telugu TTS (sponsored by Bhrigus Inc.)
- Speech Recognition Systems for Indian Languages (sponsored by HP Labs India)
- Reading Software for the Blind (sponsored by the Ministry of Social Justice)

Stream Courses
- Speech Technology: A Practical Introduction
- Topics in Speech Processing
- Building TTS and ASR Systems
- Signal Processing
- Language Modeling
  - Intro. to NLP
  - Language and Statistics
- Machine Learning, Pattern Recognition

WHO WILL YOU BE WORKING WITH?
- S. P. Kishore (Ph.D. @ CMU, Scientist @ IIIT)
- Prof. Rajeev Sangal
- Dr. Vasudeva Varma
- Faculty members of the Speech Group at CMU: Dr. Alan Black (TTS), Dr. Jim Baker (ASR)…..
- Fellow researchers

What you get at the end
- Career opportunities: of late, many companies and R&D organizations have been investing in Indian languages, and specifically in speech systems (Microsoft Research (Bangalore), HP Labs India, Yahoo India)
- Research skills: publications
- Interaction and collaboration with faculty members of the Speech Group at Carnegie Mellon
- System-building skills: you would have developed speech systems using state-of-the-art techniques
- Opportunities for higher education in India and abroad

QUESTIONS? [ skishore@cs.cmu.edu ]