AN IMPROVED AUDIO Jenn Tam Computer Science Dept. Carnegie Mellon University SOAPS 2008, Pittsburgh, PA.

Presentation transcript:

1 WHAT ARE CAPTCHAS? CAPTCHAs are tests generated by computers and generally passable by humans but not current computer programs.

2 THE PROBLEM WITH CURRENT AUDIO CAPTCHAS In some cases the human passing rate is only 70%! To make the CAPTCHAs secure, noise was injected into the audio files making it harder for both computers and humans to pass.

3 ARE CURRENT AUDIO CAPTCHAS SECURE? A CAPTCHA is considered broken once a program can pass it 5% of the time. Since the current audio CAPTCHAs use a limited vocabulary, it was possible for us to collect enough data to train a system that could pass the current audio CAPTCHAs more than 45% of the time.

4 HOW DID WE TEST THE CURRENT AUDIO CAPTCHAs?
- Selected three different types of audio CAPTCHAs: Google, reCAPTCHA, and Digg
- Collected 1000 CAPTCHAs per type of audio CAPTCHA to use for training and testing
- Created an ASR system using machine learning techniques

5 THE ALGORITHM
Given the .wav file of an audio CAPTCHA:
- Segmentation: select the portions of the audio that are most likely digits/letters
- Recognition
  - Extract features from the segment
  - Classify the segment as digit/letter or noise and output the label
- Stop once a maximum number of segments are classified

6 ALGORITHM DETAILS - SEGMENTATION CAPTCHAs were manually labeled and segmented. We created training segments using this information. For testing, we chose the highest energy peaks in the test CAPTCHA and selected fixed size segments roughly centered at the peaks.
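The test-time segmentation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the non-overlapping frame energy measure, and the rule that discards overlapping candidates are all assumptions made for the example.

```python
def frame_energies(samples, frame_len):
    """Sum of squared amplitudes for each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def pick_segments(samples, frame_len, seg_len, max_segments):
    """Return (start, end) sample ranges roughly centered on the
    highest-energy frames. Overlap suppression is an assumption."""
    energies = frame_energies(samples, frame_len)
    # Visit frames from most to least energetic.
    order = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    segments = []
    for frame in order:
        center = frame * frame_len + frame_len // 2
        # Clamp the fixed-size window so it stays inside the signal.
        start = max(0, min(center - seg_len // 2, len(samples) - seg_len))
        end = start + seg_len
        # Skip candidates that overlap a segment we already kept.
        if any(start < e and s < end for s, e in segments):
            continue
        segments.append((start, end))
        if len(segments) == max_segments:
            break
    return sorted(segments)
```

With a 100-sample signal that is silent except for bursts at samples 20-24 and 70-74, `pick_segments(samples, 5, 10, 2)` selects one window around each burst.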

7 ALGORITHM DETAILS - FEATURES
We used three popular techniques for extracting features from speech to derive 5 sets of features from the audio:
- Mel-frequency cepstral coefficients (MFCC)
- Perceptual linear prediction (PLP)
- Relative spectral transform with PLP (RASTA-PLP)
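As background for the MFCC item, the sketch below shows only the mel-scale frequency warping that MFCC front ends build on. It is not a full MFCC pipeline. The slides do not say which mel formula or filter count was used, so the common 2595/700 variant and the helper names here are assumptions.

```python
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to mels (common 2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(low_hz, high_hz, n_filters):
    """Center frequencies (Hz) of n_filters bands spaced evenly on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_filters + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_filters)]
```

The centers come out densely packed at low frequencies and sparse at high ones, which mirrors how human hearing resolves pitch.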

8 ALGORITHM DETAILS - AdaBoost
- Used decision stumps for weak classifiers
- For each type of audio CAPTCHA we created enough classifiers to label a segment as a digit, letter, or noise.
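To make the decision-stump idea concrete, here is a minimal binary AdaBoost sketch in plain Python. It assumes labels of +1 and -1 (for example, "digit" versus "noise"). The slides do not say how the individual classifiers were combined into digit/letter/noise labels, so that step is left out, and all function names are hypothetical.

```python
import math


def train_stump(X, y, w):
    """Find the (error, feature, threshold, polarity) stump with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    """Apply a single threshold test to one feature of the row."""
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adaboost_train(X, y, rounds):
    """Binary AdaBoost; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified points, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Each stump tests a single feature against a threshold; boosting reweights the training segments so later stumps focus on the ones earlier stumps got wrong.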

9 ALGORITHM DETAILS - SVM Created a single multiclass classifier using all the training segments (from 900 CAPTCHAs)

10 ALGORITHM DETAILS - k-NN
- Created 5 classifiers, one for each of the feature sets
- Used Euclidean distance as our distance metric
- Cross-validation gives us k=1
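A nearest-neighbor classifier with Euclidean distance is short enough to sketch directly. The function name and the toy feature vectors below are illustrative assumptions; with k=1, as cross-validation selected, the prediction is simply the label of the closest training segment.

```python
import math
from collections import Counter


def knn_predict(training, query, k=1):
    """Label `query` by majority vote among its k nearest training points.

    `training` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy usage with made-up 2-D feature vectors.
training = [([0.0, 0.0], "noise"), ([1.0, 1.0], "digit"), ([5.0, 5.0], "letter")]
label = knn_predict(training, [0.9, 1.2])
```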

11 THE ALGORITHM
Input: Audio CAPTCHA as an audio file
- Segmentation
  - Find the highest energy peak, and extract a fixed-size segment centered at that peak
- Recognition
  - Extract features from the segment
  - Give the segment to the classifier and obtain a label
- Stop extracting segments once all segments have been labeled or a max solution size is reached.
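The loop above can be sketched end to end as follows. This is one possible reading of the slide, not the authors' implementation. The per-sample energy measure, the step that zeroes out audio once it has been used, and the `extract_features` and `classify` callables are all assumptions supplied by the caller.

```python
def solve_captcha(samples, seg_len, extract_features, classify, max_solution_size):
    """Repeatedly take the highest-energy peak, classify a window around it,
    and stop when the solution is full or no energy remains."""
    remaining = [x * x for x in samples]  # per-sample energy; zeroed once used
    found = []  # (start_index, label) pairs for non-noise segments
    while len(found) < max_solution_size and any(e > 0 for e in remaining):
        peak = max(range(len(remaining)), key=remaining.__getitem__)
        # Clamp the fixed-size window so it stays inside the signal.
        start = max(0, min(peak - seg_len // 2, len(samples) - seg_len))
        end = min(len(samples), start + seg_len)
        label = classify(extract_features(samples[start:end]))
        if label != "noise":
            found.append((start, label))
        # Mark this region as consumed so the next peak lies elsewhere.
        for i in range(start, end):
            remaining[i] = 0.0
    # Report labels in the order they occur in the audio.
    return "".join(label for _, label in sorted(found))


# Toy usage: the "feature" is the segment's maximum amplitude.
samples = [0.0] * 100
samples[20], samples[60], samples[85] = 1.0, 0.5, 0.1
toy_classify = lambda peak: "7" if peak > 0.9 else ("3" if peak > 0.4 else "noise")
answer = solve_captcha(samples, 10, max, toy_classify, 5)
```

In a real system `extract_features` would compute something like MFCC, PLP, or RASTA-PLP features, and `classify` would be one of the trained classifiers.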

12 ANALYSIS OF CURRENT AUDIO CAPTCHAs
Using three machine learning techniques to perform ASR on the CAPTCHAs:
- AdaBoost
- Support Vector Machines (SVM)
- k-Nearest Neighbor (k-NN)

13 THE GOAL Make a secure audio CAPTCHA which will be easier for a human to pass and harder for a computer to pass. Equate solving a CAPTCHA with doing some useful work. In other words, create an audio reCAPTCHA.

14 WHAT IS reCAPTCHA? reCAPTCHA helps digitize text on which OCR fails by using the text as its CAPTCHA. Since millions of people solve CAPTCHAs each day, millions of words get digitized each day!

15

16 THE AUDIO RECAPTCHA Takes advantage of the human ability to understand words through context. Will help transcribe digital audio on which ASR systems fail. The audio being used was originally recorded with the intention that it should be easily understood by humans.

17 HOW WILL IT WORK?
- Start with a database of phrases with known transcriptions.
- Give the user adjacent phrases to transcribe as the CAPTCHA.
- Check the user's solution against the database to determine the result of the test.
- Store the rest of the solution as a transcription.
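The grading step can be illustrated with a small sketch. Several things here are assumptions not stated in the slides: that the known phrase comes first, that matching is exact after lowercasing and stripping punctuation, and the function names themselves. A deployed system would likely tolerate small errors.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and split into words."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def grade(solution, known_transcript):
    """Return (passed, leftover_words).

    The user passes if their solution begins with the known phrase; the
    leftover words are a candidate transcription of the unknown phrase.
    """
    sol = normalize(solution)
    known = normalize(known_transcript)
    if sol[:len(known)] != known:
        return False, []
    return True, sol[len(known):]


passed, leftover = grade(
    "That was the shot that killed Harry Lime. He died in a sewer",
    "That was the shot that killed Harry Lime",
)
```

Here `leftover` holds the words that would be stored as a transcription of the phrase with no known answer.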

18 [Figure: example audio split into three labeled segments (Segment #1, Segment #2, Segment #3). Phrases shown: "That was the shot that killed Harry Lime. He died in a" and "Harry Lime he died in a sewer beneath Vienna"; overlapping portion shown: "Harry Lime. He died in a".]

19 ANALYSIS OF SECURITY
- Speaker-independent recognition is difficult.
- Open vocabularies make it even more difficult for ASR systems.
- AM broadcasts and .mp3 compression cause the loss of important data needed for automatic analysis.

20 CONCLUSION
- CAPTCHAs need to be more accessible, yet remain secure and not too difficult for humans.
- Deploy audio reCAPTCHA through the reCAPTCHA site.
- Help make knowledge captured in audio available in text form.

21 ACKNOWLEDGEMENTS
- Dr. Luis von Ahn, CMU
- Dr. Manuel Blum, CMU
- Dr. Roni Rosenfeld, CMU
- David Huggins-Daines, CMU
- Jiri Simsa, CMU
- Sean Hyde, CMU