Breaking CAPTCHA By Willer Travassos

What is CAPTCHA? CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It is a challenge-response test used to ensure that the response comes from a human being. It usually requires users to type the letters and/or digits (the response) shown in an image that appears on screen (the challenge).

Why use it & Who uses it CAPTCHAs are used to prevent automated actions that might degrade quality of service (QoS), whether through resource exhaustion or resource abuse. They are commonly used to stop spamming and automated postings, and to limit excessive automated probing of a resource. Services like Gmail, Yahoo! Mail, forums, wikis, and many others use CAPTCHAs to avoid at least one of the three.

Reason to break it There might be several reasons to break a CAPTCHA, usually depending on the service that a person is trying to use. Here we concentrate on the CAPTCHAs in mail services, especially Yahoo! Mail, Gmail, and Windows Live Hotmail. The four main reasons to break mail CAPTCHAs are as follows:

Reason to break it
1. Signing up for a mail service gives access to a wide array of other services.
2. Those three companies are unlikely to be blacklisted.
3. Those three services are free to sign up for.
4. It is hard to keep track of who is using the accounts and their services for spamming, since millions of other users utilize these services.

Overview of how each of the 3 CAPTCHAs was broken Hackers seem to take a similar approach to breaking each CAPTCHA. It consists of a client-server architecture. In it, the server program sits and waits for CAPTCHA information from the client program (both programs might be located on infected machines). The client program is responsible for reading the image files containing the CAPTCHAs and sending them to the server. Once the server receives a CAPTCHA image, it tries to break it.

A more concrete way of attacking CAPTCHA As you can imagine, it was not clear to me how the server program breaks a CAPTCHA. Reports on the breaking of the three mail services were never clear about the process used. The sites that contained information about these programs were not clear on whether breaking the CAPTCHA was fully automated. And the hackers' sites were in Russian.

A more concrete way of attacking CAPTCHA According to “Computers beat humans at single character recognition in reading-based Human Interaction Proofs”, by K. Chellapilla, K. Larson, P. Simard, M. Czerwinski, computers are really good at single character recognition. So breaking a CAPTCHA becomes a matter of segmenting/separating the characters in the CAPTCHA text.

MSN CAPTCHA One thing to note is that there are several flavors of text-based CAPTCHAs. Thus, a method for breaking a CAPTCHA is usually specific to one particular flavor. The method shown here is the one designed for breaking the MSN CAPTCHA (before the changes made to it). The structure of the MSN CAPTCHA is as follows:

MSN CAPTCHA
1. It consists of eight characters.
2. Only upper-case letters and digits are used.
3. The challenge text is dark blue, set against a lighter background.
4. Warping is used to distort both the individual characters and the CAPTCHA as a whole.

MSN CAPTCHA
5. Random dashes/arcs of different thicknesses and sizes are added to resist segmentation. They can be divided into 3 categories:
– Thick Arcs: they have the same color as the text, and do not intersect any characters.
– Thin Arcs: they have the same color as the text, and intersect characters and other arcs.
– Thin Background Arcs: they have the same color as the background, and intersect characters, removing pixels from them.

Segmenting MSN CAPTCHA An attack on the MSN CAPTCHA has to handle the following tasks:
– Identification and removal of arcs.
– Identification of character locations, and division of characters.
To break an MSN CAPTCHA, an attack follows these 7 steps:

Segmenting MSN CAPTCHA
1. Binarization
2. Broken Character Correction
3. Vertical Segmentation
4. Color Filling Segmentation
5. Thick Arc Removal
6. Locating Connected Characters
7. Segmentation of Connected Characters

Binarization The MSN CAPTCHA contains different shades of blue in the same image. Thus, we convert the image to a black-and-white one, so that background and foreground are easy to separate.
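A minimal sketch of this step, assuming the image has already been loaded as a grid of grayscale values (0 = black, 255 = white). The function name `binarize` and the fixed threshold of 128 are illustrative assumptions, not details taken from the attack itself.

```python
def binarize(gray, threshold=128):
    """Turn a grayscale grid into a 0/1 grid.

    1 marks a foreground (dark text) pixel, 0 marks background.
    The fixed threshold is an assumption made for this sketch.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


if __name__ == "__main__":
    sample = [[10, 200, 30],
              [220, 90, 240]]
    for row in binarize(sample):
        print(row)
```

All shades of blue darker than the threshold collapse into a single foreground value, which is all the later steps need.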

Broken Character Correction Here we take care of thin background arcs that remove parts of characters. Background arcs are usually 1-2 pixels wide, and they become more pronounced after Binarization. This step is necessary because we want each character to stay whole, and to avoid pieces of a character being treated as arcs.

Broken Character Correction Thankfully, the method to restore characters is a simple one. It consists of checking the immediate vertical and horizontal neighbors of a pixel of background color. If 2 of the pixels surrounding our pixel are of foreground color, then we turn that pixel into foreground.
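The rule can be sketched as follows, assuming a binarized grid where 1 is foreground and 0 is background. Reading "2 pixels" as "at least two of the four direct neighbors" is an assumption; a stricter variant would require an opposite pair (left and right, or top and bottom). The function name is hypothetical.

```python
def repair_broken_characters(img, min_neighbors=2):
    """Fill background pixels that have enough foreground 4-neighbors.

    Decisions are made on the original grid and written to a copy,
    so one repaired pixel does not trigger further repairs.
    """
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(height):
        for x in range(width):
            if img[y][x] == 1:
                continue  # already foreground
            count = 0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and img[ny][nx] == 1:
                    count += 1
            if count >= min_neighbors:
                out[y][x] = 1
    return out
```

A one-pixel-wide background arc that cuts through a stroke has foreground both above and below it (or on both sides), so it is filled back in, while a background pixel touching only the edge of a character is left alone.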

Broken Character Correction Results

Vertical Segmentation The first attempt to segment the CAPTCHA is made here, by cutting it vertically into chunks containing one or more letters. This is done by mapping the CAPTCHA into a histogram that represents the number of foreground pixels per column of the image.
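A small sketch of the histogram idea, assuming a binarized grid where 1 is foreground. Cutting wherever a column has zero foreground pixels is the simplest possible cutting rule and is an assumption here; the function names are hypothetical.

```python
def column_histogram(img):
    """Number of foreground pixels in each column."""
    return [sum(column) for column in zip(*img)]


def vertical_chunks(img):
    """Return (start, end) column ranges, end exclusive.

    A chunk is a maximal run of columns that each contain
    at least one foreground pixel.
    """
    chunks, start = [], None
    for x, count in enumerate(column_histogram(img)):
        if count > 0 and start is None:
            start = x                      # a chunk begins
        elif count == 0 and start is not None:
            chunks.append((start, x))      # an empty column ends it
            start = None
    if start is not None:
        chunks.append((start, len(img[0])))
    return chunks
```

Characters that touch or overlap horizontally end up in the same chunk, which is why further segmentation steps are still needed.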

Color filling segmentation Here a color filling segmentation (CFS) algorithm is used to color each connected component/object (arc or character) with a different color. It detects a foreground pixel, and then traces all of its foreground neighbors until all pixels in this object have been traversed. Then it looks for another foreground pixel outside of the current object, and repeats the previous process until every object is located.

Color filling segmentation While traversing the pixels of an object, the algorithm gives every pixel it visits that object's color. This helps further segment the letters in a CAPTCHA, since the colors reveal objects that could not be separated in the Vertical Segmentation step.
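Color filling can be sketched as a breadth-first flood fill that hands out a new integer "color" per object. The grid format (1 = foreground), the use of 8-connectivity, and the function name are assumptions made for this sketch.

```python
from collections import deque


def color_fill_segmentation(img):
    """Label each connected foreground object with its own color.

    Returns (labels, count): labels has 0 for background and
    1..count for the objects, in scan order.
    """
    height, width = len(img), len(img[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if img[y][x] != 1 or labels[y][x] != 0:
                continue  # background, or already colored
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and img[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

Two objects that share a column but do not touch, such as a character with an arc hanging above it, receive different colors here even though a column histogram would keep them in one chunk.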

Thick Arc Removal Once CFS is done, we look into the characteristics of arcs, and how we can recognize them. General characteristics of such arcs are as follows:
– They are usually made up of a small number of pixels.
– They do not contain circles (closed loops), unlike chars such as A and B.
– They are usually located near the border of the image.
– Their shape is related to their location. Ex: arcs at the beginning of the image are usually tall and narrow, while arcs at the end tend to be wider and short.

Thick Arc Removal One thing to note is that thick arcs never cross a character, unless a thin arc (which can cross a character) crosses the thick arc, or the Broken Character Correction joins it with a character.

Thick Arc Removal To remove a thick arc, the following procedures are used:
Circle Detection:
– Draw a bounding box around an object.
– Use color filling to color all the background that is not enclosed by the object.
– Scan the box for pixels that still have the original background color; if any are found, the char has a circle.
If there is a circle, the object is a character and we skip the remaining steps. If not, we go on to arc detection and removal.
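The circle test above can be sketched as a flood fill from the edge of the bounding box. The input is assumed to be the object cropped to its bounding box (1 = object pixel, 0 = background); the function name and the use of 4-connectivity for the background are assumptions.

```python
def has_circle(obj):
    """True if the object encloses at least one background region."""
    height, width = len(obj), len(obj[0])
    filled = [[False] * width for _ in range(height)]
    # Start from every background pixel on the border of the box.
    stack = [(y, x)
             for y in range(height) for x in range(width)
             if (y in (0, height - 1) or x in (0, width - 1))
             and obj[y][x] == 0]
    for y, x in stack:
        filled[y][x] = True
    # Color all background reachable from the border.
    while stack:
        y, x = stack.pop()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < height and 0 <= nx < width
                    and obj[ny][nx] == 0 and not filled[ny][nx]):
                filled[ny][nx] = True
                stack.append((ny, nx))
    # Any background pixel left uncolored is enclosed by the object.
    return any(obj[y][x] == 0 and not filled[y][x]
               for y in range(height) for x in range(width))
```

Characters such as A, B, or 8 return True and are kept; an open stroke returns False and goes on to the arc checks.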

Thick Arc Removal Scan objects that passed the first step:
– We count the number of pixels in each object in order to differentiate chars from arcs, and remove the arcs.
Relative position checking:
– We look at the chunks of objects that we got from CFS and Vertical Segmentation.
– The positioning of the objects in these chunks can then tell us whether they are arcs or not, which is a removal criterion. Ex: chars are usually close to the equator (horizontal midline) of the image, and arcs are at the extremities.

Thick Arc Removal Detection of the remaining arcs:
– We count the number of objects left in the image. If the count is greater than eight, then there are still arcs left. Usually, the arc is either the last or the first object.
– We then check whether the first or the last object has a circle in it. If neither contains a circle, we remove the one with the smaller pixel count. We repeat until there are 8 objects left.
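A sketch of this loop, assuming each object is described by a dictionary with a `pixels` count and a precomputed `has_circle` flag, ordered from left to right. The dictionary layout is hypothetical, and the case where both end objects contain a circle is not covered by the description above; falling back to the pixel count there is an assumption.

```python
def remove_remaining_arcs(objects, expected=8):
    """Drop end objects until only `expected` objects remain."""
    objects = list(objects)  # work on a copy
    while len(objects) > expected:
        first, last = objects[0], objects[-1]
        if first["has_circle"] != last["has_circle"]:
            # Exactly one end has a circle: the other end is the arc.
            victim = -1 if first["has_circle"] else 0
        else:
            # No circle to decide by: remove the smaller end object.
            victim = 0 if first["pixels"] <= last["pixels"] else -1
        objects.pop(victim)
    return objects


if __name__ == "__main__":
    chars = [{"name": f"c{i}", "pixels": 40, "has_circle": False}
             for i in range(8)]
    arc = {"name": "arc", "pixels": 7, "has_circle": False}
    print([o["name"] for o in remove_remaining_arcs(chars + [arc])])
```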

Locating Connected Characters This step tries to take care of the chars not separated by CFS and Vertical Segmentation, by estimating how many characters are connected. We exploit the design of the MSN CAPTCHA to figure out which characters are connected:
– Objects containing two or more chars are always wide, never tall, i.e., chars are never on top of each other, always side-by-side.
– A single character is, as a rule, no wider than 35 pixels after being normalized.
– The MSN CAPTCHA always uses 8 chars.

Locating Connected Characters Using this information, we can guess which chunks contain connected chars, and how many of them.
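One way to turn these three facts into a guess is sketched below. It is a simple heuristic written for illustration, not the procedure from the original attack: each object first gets the smallest number of characters that keeps every character within 35 pixels, and any characters still missing from the total of 8 go to the object with the widest per-character share. The function name and parameters are hypothetical.

```python
def estimate_char_counts(widths, total_chars=8, max_char_width=35):
    """Guess how many characters each object holds, left to right."""
    # Smallest count per object so that no character exceeds the limit.
    counts = [max(1, (w + max_char_width - 1) // max_char_width)
              for w in widths]
    # Hand out remaining characters to the widest-per-character object.
    while sum(counts) < total_chars:
        widest = max(range(len(widths)),
                     key=lambda i: widths[i] / counts[i])
        counts[widest] += 1
    return counts


if __name__ == "__main__":
    # Five objects: three single characters, one pair, one triple.
    print(estimate_char_counts([30, 70, 28, 100, 25]))
```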

Segmentation of Connected Characters With the previous step having located the connected characters and determined the n non-connected objects in the image, we can segment the leftover c connected characters (where n + c = 8) by:
– Finding the width of the connected object.
– Dividing the object into c parts of equal size, thus getting the 8 final chars with 90% accuracy.
Using segmentation and recognition together, the MSN CAPTCHA was broken with a success rate of 61%.
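The even split can be sketched in a few lines. The column coordinates and the function name are assumptions; `parts` is the number of characters estimated for that one connected object.

```python
def split_evenly(left, right, parts):
    """Cut the column range [left, right) into equal-width pieces.

    Returns a list of (start, end) ranges, end exclusive.
    """
    width = right - left
    cuts = [left + round(i * width / parts) for i in range(parts + 1)]
    return list(zip(cuts[:-1], cuts[1:]))


if __name__ == "__main__":
    # An object spanning columns 10..69 that holds two characters.
    print(split_evenly(10, 70, 2))
```

Because the characters are warped and of varying width, an even cut is only an approximation, which fits the less-than-perfect segmentation accuracy reported above.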

Microsoft’s Response After the news that the MSN CAPTCHA had been broken, Microsoft answered with a new scheme that tries to blunt the published attacks.

Example of what they can do with spam emails For example, when a hacker gains access to a Gmail account, he/she also has access to social networks, Google applications, and free web hosting.
Ex1: A hacker may use Google Pages to redirect people to blacklisted sites. This gets around numerous well-established spam filters, which do not block the Google Pages subdomain because Google is widely whitelisted.
Ex2: A hacker may use Orkut (Google’s social network, comparable to Facebook) to write scrap/wall messages that redirect users to other web pages or execute code, infecting a machine with malware.

New CAPTCHAS Due to the weaknesses of text-based CAPTCHAs, new CAPTCHA flavors are being developed in the hope of replacing the text-based version. Most of them rely on the human capacity to understand the meaning of pictures, geometric shapes, and points within shapes and pictures.

New CAPTCHAS One of the new CAPTCHAs is called Kitten Auth, and it uses pictures of animals to determine whether a user is human. Images (all different) are shown on a grid, and the challenge asks the user to click on all animals of a certain type. If the user gets them all, he/she passes.

New CAPTCHAS Imagination is a two-step CAPTCHA that first asks the user to CLICK on the geometric center of any of the pictures displayed. Once the user passes the first test, he/she is asked to ANNOTATE (recognize), through radio buttons, the object being displayed in an image.

Questions?

References
Jeff Yan, Ahmad Salah El Ahmad. “A Low-cost Attack on a Microsoft CAPTCHA”. School of Computing Science, Newcastle University, UK.
K. Chellapilla, K. Larson, P. Simard, M. Czerwinski. “Computers beat humans at single character recognition in reading-based Human Interaction Proofs”. 2nd Conference on Email and Anti-Spam (CEAS).
“Network Security Research and AI”. research.blogspot.com/2008/01/yahoo-captcha-is-broken.html. [Feb. 27, 2008]
WebSense. “Sumeet Prasad”. [Feb. 27, 2008]
Ryan Naraine, Dancho Danchev, Adam O'Donnell. “Zero Day Blog”. [Feb. 27, 2008]
Spam Trackers. [Feb. 28, 2008]
“13BIT IT news Blog”. [March 2, 2008]
“Spybot Search & Destroy Forums”. [March 2, 2008]
Sam Hocevar. “PWNTCHA”. [March 2, 2008]
“Kitten Auth”. [March 3, 2008]
“Three Lights Bright”. busted.html. [March 3, 2008]
“IMAGINATION”. [March 2, 2008]