OpenWorld 2018 Audio Recognition Using Oracle Data Science Platform

1 OpenWorld 2018 Audio Recognition Using Oracle Data Science Platform
Welcome, and thank you for attending this hands-on lab. My name is Jean-René Gauthier and I’m a lead data scientist working on the Oracle Data Science Platform. I was part of DataScience.com, which Oracle acquired in June of this year. Jean-René Gauthier, Lead Data Scientist, Oracle Data Science Platform, October 24, 2018. Confidential – Oracle Internal/Restricted/Highly Restricted

2 Presentation Agenda
1. Quick Overview of the Upcoming Oracle Data Science Platform
2. Speech Recognition and Keyword Spotting
3. Overview of the Lab / Lab
4. The Lab GitHub Public Repository
The lab/demo we will go through today runs on the legacy DataScience.com platform. The features available on the Oracle Data Science Platform will be similar to what you will use on the legacy product today. Ask a few questions: How many of you have coded in or are familiar with Python? How many have used Jupyter notebooks before? How many have used a data science platform before?

3 Oracle Data Science Platform
What is it? The Oracle Data Science Platform enables data science teams to organize their work, easily access data and computing resources, and build, train, deploy, and manage models on the Oracle Cloud. What’s the value? The Oracle Data Science Platform makes data science teams more productive and enables them to deploy more work faster to power their organizations with machine learning. How do we make data scientists more productive? By giving them self-service access to scalable compute resources to train and deploy their models, and a platform to manage those models after they have been deployed, all on the latest and greatest Oracle Cloud Infrastructure hardware (GPUs). The value? Only 5% of data science projects make their way to production. We want to shorten the time to model deployment and limit the involvement of IT teams and DevOps engineers in completing data science projects. *Final name pending legal review and approval

4 Oracle Data Science Platform Core Capabilities
End-to-end platform for enterprise data scientists
- Data science workflow: collaboration for enterprise data science teams in projects
- Model building and training*: Python development in Jupyter notebooks
- Model deployment: deploy models as APIs and serve predictions in real time
- Version control: an external Git provider is required for files
- Access to open source: curated sets of packages for data science use cases
- Access to compute: self-service access to spin up containers on an OKE cluster of OCI VMs (CPU only)
- Access to data: Oracle Object Store
* Model training runs in a single Jupyter container with reserved CPU/memory (not distributed over multiple containers)
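As a rough illustration of the "deploy models as APIs" capability, the sketch below shows how a client might post one audio clip to a deployed model over HTTP using only the Python standard library. The endpoint URL (`ENDPOINT_URL`) and the JSON payload schema are hypothetical placeholders, not the platform's documented API; a real deployment would also need whatever authentication the platform requires.

```python
import json
import urllib.request

# Hypothetical endpoint: the real URL and auth scheme depend on the deployment.
ENDPOINT_URL = "https://example.com/models/keyword-spotter/predict"


def build_prediction_request(samples, sample_rate=16000, url=ENDPOINT_URL):
    """Build (but do not send) a JSON POST request carrying one audio clip.

    The payload schema {"sample_rate": ..., "samples": [...]} is an
    assumption made for illustration only.
    """
    payload = {"sample_rate": sample_rate, "samples": list(samples)}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(samples, sample_rate=16000, url=ENDPOINT_URL, timeout=10.0):
    """Send the clip and return the decoded JSON response (e.g. a predicted label)."""
    request = build_prediction_request(samples, sample_rate, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping `build_prediction_request` separate from `predict` lets the payload be inspected or tested without a live endpoint.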

5 Speech Recognition
“The task of speech recognition is to map an acoustic signal containing a spoken natural language utterance into the corresponding sequence of words intended by the speaker.” (Goodfellow et al. 2016, Deep Learning, MIT Press)
High-level overview: use cases, task of the lab, dataset, waveform and sampling, FFT, machine learning model, lab.
Applications are endless:
- Speech transcription
- Text captions for movies or TV
- Issuing commands to your car while driving
- Issuing commands to personal assistants (e.g. Siri, Google Home, Alexa)
- Etc.
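The "waveform and sampling" and "FFT" steps can be illustrated with a short NumPy sketch (NumPy is an assumption here; the slide does not name a library). It samples a pure 440 Hz tone at 16 kHz for one second and uses a real-input FFT to recover the tone's frequency. With N = 16,000 samples the frequency bins are spaced f_s/N = 1 Hz apart and run up to the Nyquist frequency f_s/2 = 8 kHz.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, the rate used by the Speech Commands clips
DURATION = 1.0        # seconds

# Sample a pure 440 Hz tone: x[n] = sin(2 * pi * f * n / fs)
n = np.arange(int(SAMPLE_RATE * DURATION))
signal = np.sin(2 * np.pi * 440.0 * n / SAMPLE_RATE)

# Real-input FFT: bins run from 0 Hz up to the Nyquist frequency fs / 2
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1.0 / SAMPLE_RATE)

peak_hz = freqs[np.argmax(spectrum)]
print(f"bins: {freqs.size}, resolution: {freqs[1]:.1f} Hz, peak: {peak_hz:.1f} Hz")
# prints "bins: 8001, resolution: 1.0 Hz, peak: 440.0 Hz"
```

Because the clip holds a whole number of cycles, all of the tone's energy falls into a single bin; real speech spreads across many bins and changes over time.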

6 Speech Recognition – Keyword spotting task
Problem statement: Personal assistant devices use keyword spotting to start interactions (e.g. “Hey Siri”, “Alexa”, “Hey Google”). Most devices run a small keyword recognition (spotting) model locally. The device listens and runs the model to spot keywords. If a keyword is recognized, data transfer to the cloud starts; otherwise, the device keeps listening and calling the locally stored model for inference. You need a fast local model that you can store on the device, because you don’t want to continuously transfer data to the cloud: that is very costly and inefficient. You need a small model that runs fewer operations; that’s the key difference here. The keyword is typically one word or a very short phrase, like “Hey Siri”.
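The listen-locally, escalate-on-detection control flow described above can be sketched in plain Python. `local_model` stands in for the small on-device classifier and `threshold` for its decision cutoff; both, along with the toy energy-based scorer used as an example, are hypothetical and not part of the lab.

```python
from typing import Callable, Iterable, List


def spot_keywords(
    frames: Iterable[List[float]],
    local_model: Callable[[List[float]], float],
    threshold: float = 0.8,
) -> List[int]:
    """Return the indices of frames that would trigger a cloud session.

    local_model (hypothetical) maps one audio frame to a keyword score.
    Frames scoring below the threshold never leave the device.
    """
    triggered = []
    for index, frame in enumerate(frames):
        score = local_model(frame)      # cheap, on-device inference
        if score >= threshold:
            triggered.append(index)     # a real device would start streaming to the cloud here
    return triggered


def toy_model(frame: List[float]) -> float:
    """Toy scorer: mean energy of the frame (a stand-in, not a keyword model)."""
    return sum(x * x for x in frame) / len(frame)


frames = [[0.0] * 4, [0.9] * 4, [0.1] * 4]
print(spot_keywords(frames, toy_model))  # prints [1]
```

The point of the design is that the expensive path (the cloud) is only taken for the rare frames that pass the cheap local check.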

7 The Lab in One Slide
Notebook 1 (optional), Notebook 2, Notebook 3
This is what we’re going to do. Using Jupyter notebooks written in Python on the legacy DataScience.com platform, we will build a machine learning model (a deep learning model) with Keras that can recognize audio keywords. We start with a collection of 1-second audio clips of someone saying a single word (cat, dog, left, right, etc.). Then we transform that data into a 2-D map called a spectrogram. Notebook 1 gives a basic introduction to spectrograms; I don’t think we’ll have time to go through it, so it’s optional. In notebook 2, we’ll train a convolutional neural network with Keras to classify these keywords, a modeling technique that is generally applied to images. We’re then going to deploy this model as a REST API endpoint, following what a real data scientist would do on this platform to move a model to a production environment. Lastly, I will run notebook 3 on my local laptop and call that web service to classify different audio clips. I’ll also record live clips and classify them with the REST API endpoint.
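The waveform-to-spectrogram step can be sketched as a short-time Fourier transform in NumPy: slice the clip into overlapping windows, taper each with a Hann window, take the FFT of each, and keep the log magnitude. The 20 ms window and 10 ms step are illustrative assumptions, not the lab notebooks' settings. For a one-second, 16 kHz clip they give a 99 × 161 array (time frames × frequency bins), the kind of 2-D "image" a convolutional network can consume.

```python
import numpy as np


def log_spectrogram(signal, sample_rate=16_000, window_ms=20, step_ms=10, eps=1e-10):
    """Log-magnitude spectrogram via a short-time Fourier transform (STFT).

    Window and step sizes are assumptions for illustration.
    Returns an array of shape (num_frames, num_freq_bins).
    """
    window_len = int(sample_rate * window_ms / 1000)  # samples per frame
    step = int(sample_rate * step_ms / 1000)          # hop between frame starts
    num_frames = 1 + (len(signal) - window_len) // step
    window = np.hanning(window_len)

    # Slice the waveform into overlapping frames and taper each with a Hann window
    frames = np.stack([
        signal[i * step : i * step + window_len] * window
        for i in range(num_frames)
    ])
    # FFT each frame and keep the magnitudes of the non-negative frequencies
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)  # eps avoids log(0) on silent frames


# Usage: a one-second 1 kHz test tone
t = np.arange(16_000) / 16_000
spec = log_spectrogram(np.sin(2 * np.pi * 1000 * t))
print(spec.shape)  # prints (99, 161)
```

With a 320-sample window the bins are 16000 / 320 = 50 Hz apart, so the 1 kHz tone shows up as a bright horizontal line at bin 20 in every frame.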

8 The Speech Commands Dataset (Warden 2018)
Standard training and evaluation dataset for simple speech recognition tasks:
- 105k utterances in WAVE audio file format
- A single word spoken per clip; 35 different words: dog, cat, bed, bird, up, wow, yes, etc.
- One second or less per clip
- 16 kHz sampling rate, 16-bit, single channel
- 2,618 speakers recorded
- Recorded with phone or computer microphones in realistic settings
This is the dataset we’re going to use. It’s called the Speech Commands dataset. English only.
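Given the format above (16 kHz, 16-bit, single channel, at most one second), a loader can decode each file and zero-pad shorter clips so every example has exactly 16,000 samples. The sketch below uses only the Python standard library; the function name `load_clip` and the choice to pad with trailing zeros are assumptions for illustration, not the lab's code.

```python
import array
import io
import wave

SAMPLE_RATE = 16_000        # Hz, as in the Speech Commands dataset
CLIP_SAMPLES = SAMPLE_RATE  # one second of audio


def load_clip(wav_bytes: bytes) -> list:
    """Decode a 16 kHz, 16-bit mono WAV and zero-pad it to exactly one second.

    Samples are scaled from int16 to floats in [-1.0, 1.0).
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if fmt != (1, 2, SAMPLE_RATE):
            raise ValueError(f"expected (1 channel, 2 bytes, 16000 Hz), got {fmt}")
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h", raw)  # signed 16-bit integers
    samples = [s / 32768.0 for s in pcm[:CLIP_SAMPLES]]
    samples += [0.0] * (CLIP_SAMPLES - len(samples))  # pad clips shorter than 1 s
    return samples
```

Fixing every clip to the same length matters because the spectrogram, and therefore the network's input, must have the same shape for every example.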

9 References
Warden, P. 2018, “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition”
Sainath, T. N., Parada, C. 2015, “Convolutional Neural Networks for Small-footprint Keyword Spotting”, Interspeech 2015, ISCA

10 The Lab GitHub Public Repository
Any questions? All the lab materials are freely available on github.com. Feel free to download them and go through all the notebooks in your free time.

11

12 CNN Model for Keyword Spotting Task
CNN model (2 convolutional layers + 2 max-pooling layers + 2 dropout layers; 2 fully connected layers) (see also Sainath & Parada 2015)
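Because keyword spotting needs a small model, it helps to size the layer stack by hand. The sketch below computes output shapes and parameter counts in plain Python. Every hyperparameter is an assumption, not a value taken from the lab: 3 × 3 stride-1 "valid" convolutions with 32 and 64 filters, non-overlapping 2 × 2 max pooling, one dropout after each pooling layer, a 128-unit hidden layer, a 35-way output (one per word in the dataset), and a 99 × 161 single-channel spectrogram input.

```python
def conv2d(shape, filters, kernel=(3, 3)):
    """Stride-1 'valid' convolution: returns (output shape, parameter count)."""
    h, w, c = shape
    kh, kw = kernel
    params = (kh * kw * c + 1) * filters  # kernel weights plus one bias per filter
    return (h - kh + 1, w - kw + 1, filters), params


def max_pool(shape, pool=(2, 2)):
    """Non-overlapping max pooling (stride = pool size): no trainable parameters."""
    h, w, c = shape
    return (h // pool[0], w // pool[1], c), 0


def dense(units_in, units_out):
    """Fully connected layer: weight matrix plus one bias per output unit."""
    return (units_out,), units_in * units_out + units_out


def cnn_summary(input_shape=(99, 161, 1), num_classes=35):
    """Shapes and parameter counts for a hypothetical 2-conv / 2-FC keyword model."""
    rows = []
    shape = input_shape
    for i, filters in enumerate((32, 64), start=1):
        shape, p = conv2d(shape, filters)
        rows.append((f"conv{i}", shape, p))
        shape, p = max_pool(shape)
        rows.append((f"pool{i}", shape, p))
        rows.append((f"dropout{i}", shape, 0))  # dropout changes neither shape nor params
    flat = shape[0] * shape[1] * shape[2]
    shape, p = dense(flat, 128)
    rows.append(("fc1", shape, p))
    shape, p = dense(shape[0], num_classes)
    rows.append(("fc2", shape, p))
    return rows


for name, shape, params in cnn_summary():
    print(f"{name:9s} {str(shape):15s} {params:>10,d}")
print("total params:", sum(p for _, _, p in cnn_summary()))
```

Under these assumptions the first fully connected layer holds 7,159,936 of the 7,183,267 parameters, which suggests why small-footprint designs pay close attention to the size of the flattened feature map.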
