Natural Language Processing and Speech Enabled Applications by Pavlovic Nenad.


2 Presentation Content
What is natural language processing
– Speech synthesis
– Speech recognition
– Natural language understanding
Basic concepts and terms
Types of speech recognition engines
Hardware requirements
How speech recognition/synthesis works
Speech enabled applications
Applications of speech enabled systems
Commercial & non-commercial software

3 Natural Language Processing
Natural Language Processing (NLP), or Computational Linguistics (CL), “is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty” [1]. “It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science that is aiming at computational models of human cognition” [1].

4 Natural Language Processing
In other words, NLP is a discipline that aims to build computer systems that can analyze, understand, and generate human speech. The sub-areas of NLP research are therefore:
– Speech Recognition (speech analysis),
– Speech Synthesis (speech generation), and
– Natural Language Understanding (NLU).

5 Speech Recognition & Synthesis
Speech recognition is the process of converting spoken language into written text or a similar form.
Speech synthesis is the process of converting text into spoken language.

6 Natural Language Understanding
Natural Language Understanding (NLU) is the process of analyzing recognized words and transforming them into data that is meaningful to a computer. In other words, NLU is a computer-based system that “understands” human language. NLU is used in combination with speech recognition.

7 Basic Terms and Concepts
– Utterance: any stream of speech between two periods of silence.
– Pronunciation: what the speech engine thinks a word should sound like.
– Grammar: defines the domain (of words) within which the recognition engine works.
– Vocabulary (dictionary): the list of words (utterances) that the speech recognition engine can recognize.
– Training: the process of adapting the recognition system to a speaker.

8 Basic Terms and Concepts
Accuracy is the measure of a recognizer’s ability to correctly recognize utterances.
Speaker Dependence
– A speaker-dependent system is designed for only one user at a time.
– A speaker-independent system is designed for a variety of speakers.
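
The slide does not say how accuracy is measured. One widely used measure is the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The sketch below is an illustration under that assumption; the function name `word_error_rate` is hypothetical and not part of the presentation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one row earlier) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "show me the status" with the recognized text "show the status" gives one deletion over four reference words, so a WER of 0.25.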

9 Types of Speech Recognition
Speech recognizers are divided into several classes according to the type of utterance that they can recognize:
– Isolated words
– Connected words
– Continuous speech (computer dictation)
– Spontaneous speech
– Voice verification
– Voice identification

10 Hardware Requirements
Natural Language Processing requires powerful systems in order to work accurately and with minimal response time. The important hardware parts are:
– Sound card
– Microphone
– Processor/RAM

11 How does speech synthesis work?
There are five major steps in the process of speech synthesis:
– Structure analysis: process the structure of the input text.
– Text pre-processing: analyze the input text for special constructs of the language.
– Text-to-phoneme conversion: convert each word to phonemes (e.g. “times” = “t ay m s”).
– Prosody analysis: determine the appropriate prosody for the sentence (e.g. pitch, timing, pausing).
– Waveform production: use the phoneme and prosody information to produce the audio waveform.
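
The text pre-processing and text-to-phoneme steps can be sketched in a few lines. This is a toy illustration, not how any particular synthesis engine works: the abbreviation table, the digit table, and the lexicon are hypothetical, and only the transcription of “times” comes from the slide (the one for “two” is an assumption).

```python
import re

# Hypothetical lookup tables for illustration only.
ABBREVIATIONS = {"mr.": "mister", "dr.": "doctor"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
LEXICON = {
    "times": ["t", "ay", "m", "s"],  # transcription given on the slide
    "two": ["t", "uw"],              # assumed transcription
}

def preprocess(text):
    """Text pre-processing: expand abbreviations and digits, drop punctuation."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token.isascii() and token.isdigit():
            words.extend(DIGITS[digit] for digit in token)  # read digit by digit
        else:
            cleaned = re.sub(r"[^a-z']", "", token)
            if cleaned:
                words.append(cleaned)
    return words

def to_phonemes(words):
    """Text-to-phoneme conversion: look each word up in the lexicon.

    Words missing from the lexicon map to None; a real engine would fall
    back to letter-to-sound rules instead.
    """
    return [LEXICON.get(word) for word in words]
```

Calling `preprocess("Mr. Smith called 2 times.")` yields the word list `["mister", "smith", "called", "two", "times"]`, which `to_phonemes` then maps word by word. Prosody analysis and waveform production are left out because they need acoustic models that a short sketch cannot represent.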

12 How does speech recognition work?
The basic characteristics of the most commonly used speech recognizers are:
– Mono-lingual,
– Process a single input at a time,
– Can optionally adapt to the voice of the speaker,
– Grammars can be dynamically updated, and
– Have a small, defined set of properties.

13 How does speech recognition work?
1. Grammar design: defines the words that may be spoken by a user and the patterns in which they may be spoken.
2. Signal processing: analyzes the spectrum (frequency) characteristics of the incoming audio. This step holds knowledge of the environment (how the user pronounces phonemes), i.e. the user profile.
3. Phoneme recognition: compares the spectrum patterns to the patterns of the phonemes.
4. Word recognition: compares the sequence of likely phonemes against the words and patterns of words specified by the grammar.
5. Result generation: provides information about the words that the recognizer has detected.
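
Step 4 (word recognition) can be illustrated with a small sketch that compares a sequence of likely phonemes against the pronunciations of the words a grammar allows. The pronunciation table, the similarity measure (`difflib.SequenceMatcher` from the Python standard library), and the 0.6 threshold are all assumptions chosen for illustration; real recognizers use probabilistic acoustic models instead.

```python
from difflib import SequenceMatcher

# Hypothetical pronunciations for the words allowed by a tiny grammar.
PRONUNCIATIONS = {
    "transfer": ["t", "r", "ae", "n", "s", "f", "er"],
    "status": ["s", "t", "ae", "t", "ah", "s"],
}

def recognize_word(observed, pronunciations=PRONUNCIATIONS, threshold=0.6):
    """Return the grammar word whose phonemes best match `observed`.

    Returns None when no word reaches the similarity threshold, which
    models an utterance that falls outside the grammar.
    """
    best_word, best_score = None, 0.0
    for word, phonemes in pronunciations.items():
        score = SequenceMatcher(None, observed, phonemes).ratio()
        if score > best_score:
            best_word, best_score = word, score
    return best_word if best_score >= threshold else None
```

A phoneme sequence with one sound missing, such as `["t", "r", "ae", "n", "f", "er"]`, still resolves to "transfer", while an unrelated sequence is rejected. Restricting the candidates to grammar words is what keeps this comparison small and fast.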

14 Speech Enabled Applications - 1
The primary aim of speech enabled applications is to improve the interaction between user and machine. For this purpose, speech recognition, speech synthesis, or both are used. The choice mostly depends on the type of application and its purpose.

15 Speech Enabled Applications - 2
Speech synthesis is fairly easy to use. After setting the “type” of voice, the speed of “speaking”, the duration of the pause between sentences, and so on, the speech synthesis engine is ready for use.

16 Speech Enabled Applications - 3
Applying speech recognition requires careful analysis of the possible inputs to the system and the way in which the user provides them. The way in which the user provides input to the system, and the way the application responds to the user, is called Natural Language Dialog. Choosing the type of Natural Language Dialog is the first decision that the developer must make.

17 Natural Language Dialog - 1
Three essential types of interaction are available to software applications:
– Direct dialog,
– Mixed initiative dialog, and
– Natural dialog.

18 Natural Language Dialog - 2
Direct dialog interaction directs the user to perform a specific task by asking for information at each turn and expecting specific words or phrases in response.
System: “Welcome to ABC bank customer services system. Please say your name.”
User: “Nenad Pavlovic”
System: “Please say your account number.”
User: “ ”
System: “Would you like to perform a transfer or to see the status on your account?”
User: “Transfer.”, etc…
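
A direct dialog can be modeled as a fixed sequence of prompts, each of which accepts only an expected kind of answer. The sketch below is a minimal illustration under that assumption; the step table, the regular expressions, and the function name `run_direct_dialog` are hypothetical, and typed strings stand in for recognized speech.

```python
import re

# Each step: (slot name, system prompt, pattern the reply must match in full).
STEPS = [
    ("name", "Please say your name.",
     re.compile(r"[A-Za-z]+(?: [A-Za-z]+)+")),
    ("account", "Please say your account number.",
     re.compile(r"\d+")),
    ("action",
     "Would you like to perform a transfer or to see the status on your account?",
     re.compile(r"transfer|status", re.IGNORECASE)),
]

def run_direct_dialog(replies):
    """Walk through STEPS in order, consuming one user reply per turn."""
    slots = {}
    step = 0
    for reply in replies:
        if step == len(STEPS):
            break  # every question has been answered
        slot, _prompt, pattern = STEPS[step]
        cleaned = reply.strip().rstrip(".")
        if pattern.fullmatch(cleaned):
            slots[slot] = cleaned
            step += 1  # advance only after an expected answer
        # Otherwise the step stays the same, so the prompt would be repeated.
    return slots
```

With the replies `["Nenad Pavlovic", "12345", "Transfer."]` (the account number is a made-up placeholder, since the slide leaves it blank) all three slots are filled in order. A reply such as "my account" at the second step is rejected and the same question is asked again, which is the rigidity the slide describes.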

19 Natural Language Dialog - 3
Mixed initiative dialog is similar to direct dialog, but it gives the speaker some freedom: it allows the user to have as much or as little control as s/he desires.
System: “Welcome to ABC bank customer services system. Please say your name.”
User: “My name is Nenad Pavlovic, and my account number is: ”
System: “Would you like to perform a transfer or to see the status on your account?”
User: “Show me the status and then go to transfers.”, etc…
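
One common way to think about mixed initiative dialog is slot filling: the system takes whatever information the user volunteers in a single utterance and then asks only for what is still missing. The sketch below illustrates that idea; the patterns and the function names `fill_slots` and `missing_slots` are hypothetical, and the name pattern is deliberately simplistic (it reads everything up to the next comma or period).

```python
import re

# Hypothetical patterns; each captures the slot value in group 1.
SLOT_PATTERNS = {
    "name": re.compile(r"my name is ([^,.]+)", re.IGNORECASE),
    "account": re.compile(r"account number is:? *(\d+)", re.IGNORECASE),
    "action": re.compile(r"\b(transfer|status)", re.IGNORECASE),
}

def fill_slots(utterance, slots=None):
    """Add every slot found in `utterance` to a copy of `slots`."""
    slots = dict(slots or {})
    for slot, pattern in SLOT_PATTERNS.items():
        if slot not in slots:
            match = pattern.search(utterance)
            if match:
                slots[slot] = match.group(1).strip()
    return slots

def missing_slots(slots):
    """Slots the system still has to ask for, in prompt order."""
    return [slot for slot in SLOT_PATTERNS if slot not in slots]
```

If the user says "My name is Nenad Pavlovic, and my account number is: 12345" (the number is a made-up placeholder), both the name and the account slots are filled in one turn, and `missing_slots` reports that only the action remains to be asked.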

20 Natural Language Dialog - 4
Natural dialog allows the user to enjoy a more unstructured interaction with an application (as natural as possible).
System: “Welcome to City Directory Dialer, how can I help you?”
User: “I’d like to call Mr. George Eleftherakis in Tsimiski building.”
System: “George Eleftherakis – Tsimiski building. Is this correct?”
User: “Yes”
System: “George Eleftherakis is found in directory. Calling…”, etc…

21 Grammars vs. Statistical NLU
The more freedom the user is given to interact with the application, the more complex the processing of input data becomes. Depending on the complexity of the possible user inputs and the interaction dialog used, one of two implementation approaches is chosen:
– Grammar-based NLU
– Statistical NLU

22 Grammars vs. Statistical NLU
Grammar-based NLU relies on defining (creating) the grammar, which means constructing the phrases and stating all possible words that can be used.
– Advantages: fast; allows freedom in constructing phrases.
– Disadvantages: practical only for a small set of phrases and words; a word or phrase that is not defined will not be recognized.
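
The trade-off above can be seen in a toy grammar. The sketch below expands a small set of hypothetical rules into every phrase the grammar allows and accepts an utterance only if it is one of them; the rule names and vocabulary are invented for illustration and do not follow any specific grammar format.

```python
from itertools import product

# Hypothetical grammar: symbols in angle brackets are rules, the rest are words.
GRAMMAR = {
    "<command>": [["<verb>", "<object>"]],
    "<verb>": [["show"], ["open"]],
    "<object>": [["status"], ["transfers"]],
}

def expand(symbol, grammar=GRAMMAR):
    """Return every word sequence that `symbol` can produce."""
    if symbol not in grammar:
        return [[symbol]]  # a plain word produces only itself
    phrases = []
    for alternative in grammar[symbol]:
        parts = [expand(item, grammar) for item in alternative]
        for combination in product(*parts):
            phrases.append([word for part in combination for word in part])
    return phrases

ALLOWED = {" ".join(phrase) for phrase in expand("<command>")}

def understand(utterance):
    """Return the normalized phrase if the grammar allows it, else None."""
    normalized = " ".join(utterance.lower().split())
    return normalized if normalized in ALLOWED else None
```

Lookup in the precomputed set is fast, which matches the stated advantage, but "display status" is rejected simply because "display" was never defined, which matches the stated disadvantage.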

23 Grammars vs. Statistical NLU
Statistical NLU relies on a statistical model of utterances derived from actual conversation data.
– Advantages: supports a huge set of phrases and words.
– Disadvantages: slow; difficult to add new phrases.
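
As a rough illustration of "a statistical model of utterances derived from actual conversation data", the sketch below trains a bigram model with add-one smoothing on a few sample utterances and scores new ones. The choice of a bigram model, the smoothing, and the training sentences are assumptions made for this example; the presentation does not specify a model.

```python
import math
from collections import Counter

def train_bigrams(utterances):
    """Count word pairs in the training utterances."""
    contexts, bigrams, vocabulary = Counter(), Counter(), {"<s>", "</s>"}
    for utterance in utterances:
        words = ["<s>"] + utterance.lower().split() + ["</s>"]
        vocabulary.update(words)
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return contexts, bigrams, len(vocabulary)

def log_probability(utterance, model):
    """Log-probability of an utterance under the add-one smoothed model."""
    contexts, bigrams, vocabulary_size = model
    words = ["<s>"] + utterance.lower().split() + ["</s>"]
    total = 0.0
    for previous, current in zip(words, words[1:]):
        numerator = bigrams[(previous, current)] + 1
        denominator = contexts[previous] + vocabulary_size
        total += math.log(numerator / denominator)
    return total
```

Trained on sentences such as "show me the status" and "go to transfers", the model gives a higher score to word orders it has seen than to a scrambled version like "status the me show". Unseen phrases still receive a (low) score rather than being rejected outright, which is the main contrast with the grammar-based approach.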

24 Uses of Speech Applications
Speech technology is mostly used in the following areas:
– Dictation
– Command and Control
– Telephony
– Wearables
– Medical/Disabilities
– Embedded Applications

25 Speech Systems
Commercial
– IBM’s ViaVoice (Linux, Windows, MacOS)
– Dragon NaturallySpeaking (Windows)
– Microsoft’s Speech Engine (Windows)
– BaBear (Linux, Windows, MacOS)
– SpeechWorks (Linux, Sparc & x86 Solaris, Tru64, Unixware, Windows)
Non-commercial
– OpenMind Speech (Linux)
– XVoice (Linux)
– CVoiceControl/kVoiceControl (Linux)
– GVoice (Linux)

26 Conclusion
Developers’ perspective: developing a speech enabled application does not require redesigning systems or explicitly designing them to support speech. Speech is treated as an “attached entity” and can be viewed as a separate module. It also does not require special linguistic or programming skills.
Business perspective: the use of speech enabled applications can noticeably improve the accuracy and effectiveness of employees who work with large amounts of data, many people, or both.

Thank you
Pavlovic Nenad

28 References
[1] Radev, D. R. (2001), “Natural Language Processing FAQ”, Columbia University, Dept. of Computer Science, NYC.