Natural Language Processing and Speech Enabled Applications

Natural Language Processing and Speech Enabled Applications by Pavlovic Nenad

Presentation Content
What is natural language processing: speech synthesis, speech recognition, natural language understanding
Basic concepts and terms
Types of speech recognition engines
Hardware requirements
How speech recognition/synthesis works
Speech enabled applications
Applications of speech enabled systems
Commercial & non-commercial software

Natural Language Processing
Natural Language Processing (NLP) or Computational Linguistics (CL) “is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty” [1]. “It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science that is aiming at computational models of human cognition” [1].

Natural Language Processing
In other words, NLP is a discipline that aims to build computer systems able to analyze, understand, and generate human speech. Its sub-areas of research are therefore: Speech Recognition (speech analysis), Speech Synthesis (speech generation), and Natural Language Understanding (NLU).

Speech Recognition & Synthesis
Speech recognition is the process of converting spoken language into written text or some similar form. Speech synthesis is the process of converting text into spoken language.

Natural Language Understanding
Natural Language Understanding (NLU) is the process of analyzing recognized words and transforming them into data that is meaningful to a computer. In other words, an NLU system is a computer-based system that “understands” human language. NLU is used in combination with speech recognition.

Basic Terms and Concepts
Utterance: any stream of speech between two periods of silence.
Pronunciation: what the speech engine thinks a word should sound like.
Grammar: defines the domain (of words) within which the recognition engine works.
Vocabulary (dictionary): the list of words (utterances) that the speech recognition engine can recognize.
Training: the process of adapting the recognition system to a speaker.
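The definition of an utterance as speech bounded by silence can be illustrated with a small sketch. It assumes the audio has already been reduced to one energy value per frame; the function name `split_utterances`, the threshold, and the minimum silence length are hypothetical choices made for illustration, not part of any particular speech engine.

```python
def split_utterances(energies, silence_threshold=0.1, min_silence_frames=3):
    """Return (start, end) frame ranges of utterances; end is exclusive.

    A frame counts as speech when its energy is above silence_threshold.
    An utterance ends once min_silence_frames silent frames follow it.
    """
    utterances = []
    start = None        # first frame of the utterance being tracked
    silent_run = 0      # consecutive silent frames seen inside it
    for i, energy in enumerate(energies):
        if energy > silence_threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the utterance at the last speech frame.
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended while an utterance was still open.
        utterances.append((start, len(energies) - silent_run))
    return utterances


frames = [0, 0, 0.5, 0.6, 0, 0.4, 0, 0, 0, 0.7, 0.8, 0]
print(split_utterances(frames))
```

The short pause at frame 4 is shorter than the minimum silence, so it stays inside the first utterance rather than splitting it.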

Basic Terms and Concepts
Accuracy: the measure of a recognizer’s ability to recognize utterances correctly.
Speaker dependence: a speaker-dependent system is designed for only one user (at a time); a speaker-independent system is designed for a variety of speakers.
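One simple way to put a number on accuracy is the fraction of test utterances that the recognizer returned exactly. The sketch below assumes that definition; the function name `utterance_accuracy` is hypothetical, and real evaluations often use finer-grained word-level measures instead.

```python
def utterance_accuracy(expected, recognized):
    """Fraction of utterances recognized exactly (case-insensitive)."""
    if len(expected) != len(recognized):
        raise ValueError("expected and recognized must have the same length")
    if not expected:
        return 0.0
    correct = sum(
        e.strip().lower() == r.strip().lower()
        for e, r in zip(expected, recognized)
    )
    return correct / len(expected)


spoken = ["transfer", "status", "show status", "transfer"]
heard = ["transfer", "status", "status", "transfer"]
print(utterance_accuracy(spoken, heard))
```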

Types of Speech Recognition
Speech recognizers are divided into several classes according to the type of utterance they can recognize: isolated words, connected words, continuous speech (computer dictation), spontaneous speech, voice verification, and voice identification.

Hardware Requirements
Natural Language Processing requires strong systems in order to work accurately and with a minimum response time. The important hardware parts are: sound card, microphone, and processor/RAM.

How speech synthesis works
There are five major steps in the process of speech synthesis:
1. Structure analysis: process the structure of the input text.
2. Text pre-processing: analyze the input text for special constructs of the language.
3. Text-to-phoneme conversion: convert each word to phonemes (e.g. “times” = “t ay m s”).
4. Prosody analysis: determine the appropriate prosody for the sentence (e.g. pitch, timing, pausing).
5. Waveform production: use the phoneme and prosody information to produce the audio waveform.
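The five synthesis steps can be sketched as a chain of small functions. This is a toy illustration under stated assumptions: the two-entry phoneme dictionary, the digit expansion table, and all function names are hypothetical, and the last step returns a description of what would be rendered instead of real audio.

```python
import re

# Hypothetical lookup tables; a real engine uses large lexicons and rules.
PHONEMES = {"two": ["t", "uw"], "times": ["t", "ay", "m", "s"]}
NUMBER_WORDS = {"2": "two"}


def structure_analysis(text):
    """Step 1: split the input text into sentences."""
    return [s.strip() for s in re.findall(r"[^.!?]+[.!?]?", text) if s.strip()]


def preprocess(sentence):
    """Step 2: normalize special constructs (here, only digits)."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [NUMBER_WORDS.get(w, w) for w in words]


def to_phonemes(words):
    """Step 3: look up phonemes; unknown words fall back to their letters."""
    phonemes = []
    for word in words:
        phonemes.extend(PHONEMES.get(word, list(word)))
    return phonemes


def prosody(sentence):
    """Step 4: pick a pitch contour from the sentence type."""
    return "rising" if sentence.endswith("?") else "falling"


def produce_waveform(phonemes, contour):
    """Step 5 (stub): describe the audio instead of generating samples."""
    return {"phonemes": phonemes, "contour": contour}


def synthesize(text):
    return [
        produce_waveform(to_phonemes(preprocess(s)), prosody(s))
        for s in structure_analysis(text)
    ]


for unit in synthesize("2 times. Times 2?"):
    print(unit)
```

Keeping each step as its own function mirrors the slide: any single stage, such as the phoneme lookup, could be swapped for a better one without touching the rest.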

How speech recognition works
The basic characteristics of the most commonly used speech recognizers are: they are mono-lingual, they process a single input at a time, they can optionally adapt to the voice of the speaker, their grammars can be dynamically updated, and they have a small, defined set of properties.

How speech recognition works
1. Grammar design: defines the words that may be spoken by a user and the patterns in which they may be spoken.
2. Signal processing: analyzes the spectrum (frequency) characteristics of the incoming audio.
3. Phoneme recognition: compares the spectrum patterns to the patterns of the phonemes.
4. Word recognition: compares the sequence of likely phonemes against the words and patterns of words specified by the grammar.
5. Result generation: provides information about the words that the recognizer has detected.
User profile: holds the knowledge of the environment (how the user pronounces phonemes).
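The grammar design, word recognition, and result generation stages can be sketched without any audio. The example below assumes that signal processing and phoneme recognition have already produced a list of likely phonemes; the pronunciation table, the three-phrase grammar, and the 0.6 acceptance score are hypothetical values. Similarity is measured with `difflib.SequenceMatcher` from the Python standard library, which stands in for the probabilistic matching a real recognizer would use.

```python
from difflib import SequenceMatcher

# Hypothetical pronunciations and grammar (grammar design).
PRONUNCIATIONS = {
    "transfer": ["t", "r", "ae", "n", "s", "f", "er"],
    "status": ["s", "t", "ae", "t", "ah", "s"],
    "show": ["sh", "ow"],
}
GRAMMAR = [["transfer"], ["status"], ["show", "status"]]


def phrase_phonemes(phrase):
    """Concatenate the pronunciations of the words in a phrase."""
    phonemes = []
    for word in phrase:
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def recognize(phonemes, min_score=0.6):
    """Word recognition: find the grammar phrase closest to the phonemes.

    Returns (phrase, score); phrase is None when nothing in the grammar
    is similar enough (result generation reports a rejection).
    """
    best_phrase, best_score = None, 0.0
    for phrase in GRAMMAR:
        score = SequenceMatcher(None, phonemes, phrase_phonemes(phrase)).ratio()
        if score > best_score:
            best_phrase, best_score = phrase, score
    if best_phrase is None or best_score < min_score:
        return None, best_score
    return " ".join(best_phrase), best_score


print(recognize(["sh", "ow", "s", "t", "ae", "t", "ah", "s"]))
print(recognize(["t", "r", "ae", "n", "s", "f"]))   # final phoneme missed
print(recognize(["k", "ae", "t"]))                   # not in the grammar
```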

Speech Enabled Applications - 1
The primary aim of speech enabled applications is to improve the interaction between user and machine. For this purpose, speech recognition, speech synthesis, or both are used. Which of them is used depends mostly on the type of application and its purpose.

Speech Enabled Applications - 2
Speech synthesis is fairly easy to use. After setting the “type” of voice, the speed of “speaking”, the duration of the pause between sentences, and so on, the speech synthesis engine is ready for use.

Speech Enabled Applications - 3
Applying speech recognition requires careful analysis of the possible inputs to the system and of the way in which the user provides them. The way in which the user provides input to the system, together with the way the application responds to the user, is called the Natural Language Dialog. Choosing the Natural Language Dialog is the first decision that the developer must make.

Natural Language Dialog - 1
The three essential types of interaction available to software applications are: direct dialog, mixed initiative dialog, and natural dialog.

Natural Language Dialog - 2
Direct dialog: the interaction directs the user to perform a specific task by asking for information at each turn and expecting specific words or phrases in response.
System: “Welcome to ABC bank customer services system. Please say your name.”
User: “Nenad Pavlovic”
System: “Please say your account number.”
User: “1234-123-12332-1233”
System: “Would you like to perform a transfer or to see the status on your account?”
User: “Transfer.”, etc.
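A direct dialog maps naturally onto a fixed list of prompts, each with a narrow pattern for the expected answer. The sketch below assumes the recognizer already delivers each reply as text; the step table, the regular expressions, and the function name `run_direct_dialog` are hypothetical.

```python
import re

# Each step: (slot name, prompt, pattern the reply must match in full).
STEPS = [
    ("name", "Please say your name.",
     re.compile(r"[A-Za-z]+( [A-Za-z]+)+")),
    ("account", "Please say your account number.",
     re.compile(r"\d+(-\d+)*")),
    ("action",
     "Would you like to perform a transfer or to see the status on your account?",
     re.compile(r"transfer|status", re.IGNORECASE)),
]


def run_direct_dialog(replies):
    """Walk through STEPS in order, re-asking until a reply fits.

    `replies` is an iterable of recognized user replies. Returns the
    slots collected so far when the replies run out.
    """
    replies = iter(replies)
    collected = {}
    for slot, prompt, pattern in STEPS:
        while slot not in collected:
            print("System:", prompt)
            reply = next(replies, None)
            if reply is None:
                return collected          # user hung up
            reply = reply.strip().rstrip(".")
            print("User:", reply)
            if pattern.fullmatch(reply):
                collected[slot] = reply
    return collected


result = run_direct_dialog(
    ["Nenad Pavlovic", "hello", "1234-123-12332-1233", "Transfer."]
)
print(result)
```

The unexpected reply "hello" is rejected and the account prompt is repeated, which is the defining behavior of this dialog type: the system, not the user, decides what may be said at each turn.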

Natural Language Dialog - 3
Mixed initiative dialog: similar to the direct dialog, but it gives the speaker some freedom. It allows the user to have as much or as little control as s/he desires.
System: “Welcome to ABC bank customer services system. Please say your name.”
User: “My name is Nenad Pavlovic, and my account number is: 1234-123-12332-1233”
System: “Would you like to perform a transfer or to see the status on your account?”
User: “Show me the status and then go to transfers.”, etc.
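One common way to support mixed initiative is slot filling: every reply is scanned for any of the items the application needs, and the system only asks for what is still missing. The sketch below is a minimal illustration of that idea; the slot patterns, prompts, and function names are hypothetical and cover only the phrasing used in the bank example.

```python
import re

# Hypothetical patterns; each captures one piece of information.
SLOT_PATTERNS = {
    "name": re.compile(
        r"my name is ([A-Za-z]+(?: [A-Za-z]+)*?)(?:,| and|\.|$)", re.IGNORECASE
    ),
    "account": re.compile(r"account number is:? ?([\d-]+)", re.IGNORECASE),
}
PROMPTS = {
    "name": "Please say your name.",
    "account": "Please say your account number.",
}


def fill_slots(utterance, slots):
    """Store every slot value found in the utterance that is not yet known."""
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        if match and slot not in slots:
            slots[slot] = match.group(1)
    return slots


def next_prompt(slots):
    """Ask for the first missing slot, or move on when all are filled."""
    for slot, prompt in PROMPTS.items():
        if slot not in slots:
            return prompt
    return "Would you like to perform a transfer or to see the status on your account?"


slots = fill_slots(
    "My name is Nenad Pavlovic, and my account number is: 1234-123-12332-1233", {}
)
print(slots)
print(next_prompt(slots))
```

Because the user volunteered both items in one reply, the account-number prompt is skipped, matching the example exchange above.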

Natural Language Dialog - 4
Natural dialog: allows the user to enjoy a more unstructured interaction with an application (as natural as possible).
System: “Welcome to City Directory Dialer, how can I help you?”
User: “I’d like to call Mr. George Eleftherakis in Tsimiski building.”
System: “George Eleftherakis – Tsimiski building. Is this correct?”
User: “Yes”
System: “George Eleftherakis is found in directory. Calling…”, etc.

Grammars vs. Statistical NLU
The more freedom the user is given to interact with the application, the more complex the processing of the input data becomes. Depending on the complexity of the possible user inputs and on the interaction dialog used, one of two implementation approaches is chosen: grammar-based NLU or statistical NLU.

Grammars vs. Statistical NLU
Grammar-based NLU: relies on defining (creating) the grammar, which means constructing the phrases and stating all possible words that can be used.
Advantages: fast; allows freedom in phrase construction.
Disadvantages: suitable only for a small set of phrases and words; a word or phrase that is not defined will not be recognized.
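The trade-off of the grammar-based approach shows up even in a tiny sketch. Here the grammar is a hypothetical sequence of word alternatives that is expanded into every phrase it allows; lookup is then a fast set membership test, and anything outside the set is rejected.

```python
from itertools import product

# Hypothetical grammar: one tuple of alternative words per position.
GRAMMAR = [("show", "check"), ("my",), ("status", "balance")]


def expand(grammar):
    """Enumerate every phrase the grammar allows."""
    return {" ".join(words) for words in product(*grammar)}


PHRASES = expand(GRAMMAR)


def understand(utterance):
    """Return the normalized phrase if the grammar defines it, else None."""
    text = utterance.lower().strip(" .?!")
    return text if text in PHRASES else None


print(sorted(PHRASES))
print(understand("Show my balance."))
print(understand("What is my balance?"))  # not defined, so not recognized
```

The last call fails even though its meaning is close to an allowed phrase, which is exactly the disadvantage listed above.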

Grammars vs. Statistical NLU
Statistical NLU: relies on a statistical model of utterances derived from actual conversation data.
Advantages: handles a huge set of phrases and words.
Disadvantages: slow; difficult to add new phrases.
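As a rough illustration of the statistical approach, the sketch below trains a naive Bayes classifier with add-one smoothing on a handful of labeled utterances and uses it to guess the intent of phrases it has never seen. The technique is chosen here only as a simple stand-in for a statistical model; the six training sentences and the two intent labels are invented for the example.

```python
import math
from collections import Counter, defaultdict

# Invented sample of "conversation data" with intent labels.
TRAINING = [
    ("i want to transfer money", "transfer"),
    ("please move money to savings", "transfer"),
    ("send money to my other account", "transfer"),
    ("show me my balance", "status"),
    ("what is the status of my account", "status"),
    ("how much money do i have", "status"),
]


def train(examples):
    """Count words per intent and how often each intent occurs."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    for text, intent in examples:
        intent_counts[intent] += 1
        word_counts[intent].update(text.split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, intent_counts, vocabulary


def classify(text, model):
    """Return the intent with the highest smoothed log-probability."""
    word_counts, intent_counts, vocabulary = model
    total = sum(intent_counts.values())
    best_intent, best_score = None, -math.inf
    for intent, count in intent_counts.items():
        score = math.log(count / total)
        denominator = sum(word_counts[intent].values()) + len(vocabulary)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[intent][word] + 1) / denominator)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


model = train(TRAINING)
print(classify("could you transfer some money", model))
print(classify("what is my balance", model))
```

Unlike a grammar, the model accepts phrasings nobody wrote down in advance, but supporting a new kind of request means collecting and labeling more data and retraining.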

Uses of Speech Applications
Speech technology is mostly used in the following areas: dictation, command and control, telephony, wearables, medical, disabilities, and embedded applications.

Speech Systems
Commercial: IBM’s ViaVoice (Linux, Windows, MacOS); Dragon NaturallySpeaking (Windows); Microsoft’s Speech Engine (Windows); BaBear (Linux, Windows, MacOS); SpeechWorks (Linux, Sparc & x86 Solaris, Tru64, Unixware, Windows)
Non-commercial: OpenMind Speech (Linux); XVoice (Linux); CVoiceControl/kVoiceControl (Linux); GVoice (Linux)

Conclusion
Developers’ perspective: developing a speech enabled application does not require redesigning a system or explicitly designing it to support speech. Speech is treated as an “attached entity” and can be viewed as a separate module. It also does not require special linguistic or programming skills.
Business perspective: speech enabled applications can noticeably improve the accuracy and effectiveness of employees who work with large amounts of data, large numbers of people, or both.

Thank you
Pavlovic Nenad, pavlovic@city.academic.gr

References
[1] Radev, D. R. (2001), “Natural Language Processing FAQ”, Columbia University, Dept. of Computer Science, NYC.