CAP Malware and Software Vulnerability Analysis
Term Project Proposal - Spring 2009
Professor: Dr. Zou
Team members: Andrew Mantel & Peter Matthews
Image Tagging: An Alternative Approach to Current CAPTCHA Techniques
What is CAPTCHA?
Completely Automated Public Turing test to tell Computers and Humans Apart
– A challenge-response test used by many websites to establish that a user is a human rather than a script or bot.
Relies on the fact that a number of tasks are relatively easy for humans to perform but difficult for computers.
– For example, humans can easily read distorted text that computer programs cannot.
CAPTCHA Uses
Preventing spam and worm comments on blogs and message-board posts
Preventing automated user registration
Preventing abuse of online polls
Preventing dictionary and brute-force password attacks
Used extensively by Google, YouTube, Facebook, Yahoo, MySpace, and many other high-traffic websites.
Text-Based CAPTCHA
In very wide use
– Fairly easy to implement
– Intuitive
– Usable by speakers of other languages who are familiar with the Latin character set
Advances in optical character recognition (OCR) technology have forced the distortions applied to the text to become increasingly complicated and extreme.
Making CAPTCHA images more obscure and difficult for computers to read also makes them more difficult for humans to read.
– User tolerance only extends so far.
Examples Produced by Google CAPTCHA
Source: Usability of CAPTCHAs Or “Usability Issues in CAPTCHA Design” by Jeff Yan, School of Computing Science, Newcastle University, UK.
Modern text-based CAPTCHA images are becoming very difficult even for humans to read. This can only become worse as OCR technology improves.
Examples of cracked CAPTCHA systems Image Source: A Low-cost Attack on a Microsoft CAPTCHA, Jeff Yan, Ahmad Salah El Ahmad, CCS Image Source: Yahoo MSN
Alternative CAPTCHA Tests
Audio-based
– Require the user to recognize speech in a distorted audio recording.
– Useful for those with visual impairments.
Image-based
– Require the user to recognize the visual information conveyed by an image.
– The core of our approach.
Image Tagging
Basic idea behind image tagging:
o Task the user with identifying a thing portrayed in an image
Examples: sun, lion, basketball
Image Tagging
We will attempt to show that image tagging has the following properties:
1. It is easy for a human to solve reliably.
2. It has a solution space large enough that random guessing attacks are unlikely to succeed.
3. It is sufficiently difficult for current computers to solve.
4. It can scale in difficulty to keep pace with advances in computer technology.
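Feature 2 can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that a bot guesses uniformly at random among a fixed number of equally likely tags and is allowed several independent attempts. The function name `random_guess_success` and its parameters `num_tags` and `attempts` are hypothetical and do not come from the proposal.

```python
def random_guess_success(num_tags: int, attempts: int = 1) -> float:
    """Probability that at least one of `attempts` independent, uniformly
    random guesses hits the single correct tag among `num_tags` candidates.

    Assumption (not from the proposal): every tag is equally likely and
    each attempt is independent of the others.
    """
    if num_tags < 1 or attempts < 0:
        raise ValueError("num_tags must be >= 1 and attempts must be >= 0")
    miss_once = 1.0 - 1.0 / num_tags      # chance that a single guess is wrong
    return 1.0 - miss_once ** attempts    # chance that not every guess is wrong


if __name__ == "__main__":
    # Compare a few hypothetical vocabulary sizes for a bot with five tries.
    for n in (10, 1_000, 100_000):
        print(n, random_guess_success(n, attempts=5))
```

Real answers are unlikely to be uniformly distributed, and a bot that guesses the most common tags would do at least as well as this estimate, so the result is best read as the optimistic case for the defender.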
Scene Tagging
Image tagging is potentially vulnerable to data-mining techniques.
We will explore scene tagging as an implementable realization of image tagging.
Scene tagging:
o Same basic idea as image tagging
o Task the user with identifying a single thing within an image that contains multiple things
Scene Tagging Example: (modified from source:
Scene Tagging
With respect to scene tagging, we will examine:
1. The same features listed above for image tagging.
2. Various ways to automatically generate scene tagging problems.
3. Various types of scene tagging problems.
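One possible way to generate a scene tagging problem automatically (item 2) is sketched below. Everything in it is a hypothetical illustration rather than the proposal's design: the names `PlacedObject`, `generate_scene`, and `check_click`, the square bounding boxes, the canvas dimensions, and the click-to-answer interaction are all assumptions. Rendering the actual image is omitted; the sketch only covers choosing objects, placing them without overlap, picking a target, and checking an answer.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacedObject:
    label: str
    x: int      # left edge in pixels
    y: int      # top edge in pixels
    size: int   # side length of the square bounding box

    def contains(self, px: int, py: int) -> bool:
        """True if the point (px, py) lies inside this object's box."""
        return (self.x <= px < self.x + self.size
                and self.y <= py < self.y + self.size)


def overlaps(a: PlacedObject, b: PlacedObject) -> bool:
    """True if the two bounding boxes share any area."""
    return not (a.x + a.size <= b.x or b.x + b.size <= a.x
                or a.y + a.size <= b.y or b.y + b.size <= a.y)


def generate_scene(labels, count=3, width=400, height=300, size=60,
                   rng=None, max_tries=1000):
    """Place `count` distinct labels at random non-overlapping positions
    and pick one of them as the object the user must identify."""
    rng = rng or random.Random()
    placed = []
    for label in rng.sample(labels, count):
        for _ in range(max_tries):
            candidate = PlacedObject(label,
                                     rng.randrange(width - size + 1),
                                     rng.randrange(height - size + 1),
                                     size)
            if not any(overlaps(candidate, other) for other in placed):
                placed.append(candidate)
                break
        else:
            raise RuntimeError("could not place objects without overlap")
    target = rng.choice(placed)
    return placed, target


def check_click(target: PlacedObject, px: int, py: int) -> bool:
    """Accept the answer if the click lands inside the target's box."""
    return target.contains(px, py)
```

A caller would prompt the user with something like "click on the " followed by `target.label`, then pass the click coordinates to `check_click`. Because positions and the target are drawn fresh each time, the same source images can yield many distinct problems, which is one way scene tagging might resist simple lookup of previously seen images.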
References
Jeff Yan, Ahmad Salah El Ahmad, A low-cost attack on a Microsoft captcha, Proceedings of the 15th ACM conference on Computer and communications security, October 27-31, 2008, Alexandria, Virginia, USA.
Jeff Yan, Usability of CAPTCHAs Or “Usability Issues in CAPTCHA Design”.
Network Security Research and AI: Around the CAPTCHA.