CAPTCHA solving
    Tianhui Cai
    Period 3
CAPTCHAs
    Completely Automated Public Turing test to tell Computers and Humans Apart
    Determine whether a user is a human or a computer, to prevent spam, etc.
    Found on many website registration pages
    Audio and visual
    Visual CAPTCHAs contain noise and distortions:
        rotation
        translation
        scaling
        noise
        warp
Goal
    Solve a CAPTCHA: pretend to be a human
    Read the image and figure out what it says
    This has been done before
    Show weaknesses of visual CAPTCHAs
Procedure
    Acquire image (from internet)
    Remove background clutter
    Segmentation (separating letters)
    Letter identification
Implementation
    Java
    Acquire images: captchas.net, which has a formula to get the actual text from the image
    Remove background clutter: median filter, etc.
    Segmentation: flood fill
    Letter identification: neural network
First quarter progress summary
    Three-layer backpropagation neural network written and tested; it works
    Neural networks are good for classification and are often used for image recognition
    A network consists of artificial neurons, each of which converts its inputs to an output
    Backpropagation is used to let the neural network learn
        Training
        Testing
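As an illustration of the three-layer backpropagation idea on this slide, here is a minimal sketch. It is not the project's actual code: the class name `ThreeLayerNet`, the sigmoid activation, the weight range, the learning rate, and the seed are all assumptions chosen for the example.

```java
import java.util.Random;

// Hypothetical sketch: input layer -> one hidden layer -> output layer,
// sigmoid units, trained by online backpropagation on squared error.
final class ThreeLayerNet {
    final double[][] wHidden; // [hidden][inputs + 1]; last column is the bias
    final double[][] wOutput; // [outputs][hidden + 1]; last column is the bias
    private final double[] hidden;
    private final double[] output;
    private final double learningRate;

    ThreeLayerNet(int nIn, int nHidden, int nOut, double learningRate, long seed) {
        Random rng = new Random(seed);
        wHidden = randomWeights(nHidden, nIn + 1, rng);
        wOutput = randomWeights(nOut, nHidden + 1, rng);
        hidden = new double[nHidden];
        output = new double[nOut];
        this.learningRate = learningRate;
    }

    private static double[][] randomWeights(int rows, int cols, Random rng) {
        double[][] w = new double[rows][cols];
        for (double[] row : w)
            for (int j = 0; j < cols; j++) row[j] = rng.nextDouble() - 0.5;
        return w;
    }

    private static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Each neuron converts its inputs to one output: sigmoid(weighted sum + bias).
    private static void layer(double[][] w, double[] in, double[] out) {
        for (int i = 0; i < out.length; i++) {
            double sum = w[i][in.length]; // bias weight
            for (int j = 0; j < in.length; j++) sum += w[i][j] * in[j];
            out[i] = sigmoid(sum);
        }
    }

    double[] predict(double[] input) {
        layer(wHidden, input, hidden);
        layer(wOutput, hidden, output);
        return output.clone();
    }

    // One backpropagation step for a single (input, target) pair.
    void train(double[] input, double[] target) {
        predict(input);
        double[] deltaOut = new double[output.length];
        for (int k = 0; k < output.length; k++)
            deltaOut[k] = (target[k] - output[k]) * output[k] * (1 - output[k]);
        // Hidden deltas use the output weights from before this step's update.
        double[] deltaHidden = new double[hidden.length];
        for (int j = 0; j < hidden.length; j++) {
            double err = 0;
            for (int k = 0; k < output.length; k++) err += deltaOut[k] * wOutput[k][j];
            deltaHidden[j] = err * hidden[j] * (1 - hidden[j]);
        }
        update(wOutput, deltaOut, hidden);
        update(wHidden, deltaHidden, input);
    }

    private void update(double[][] w, double[] delta, double[] in) {
        for (int i = 0; i < delta.length; i++) {
            for (int j = 0; j < in.length; j++) w[i][j] += learningRate * delta[i] * in[j];
            w[i][in.length] += learningRate * delta[i];
        }
    }
}
```

Training would call `train` repeatedly over labeled examples; testing would call `predict` on examples held out from training and compare the outputs with the known labels.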
Second quarter progress
    Image processing
        Noise removal
        Segmentation
Noise removal
    Modified median filter
    Advantages: unlike Gaussian blur, it doesn't lose edge data
    Disadvantages: it trades off edge integrity against noise removal
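For reference, a plain 3x3 median filter can be sketched as below. This is the standard filter, not the modified version used in the project (the slides do not say what the modification is). The class name `MedianFilterSketch` and the grayscale `int[][]` image representation are assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch of a standard 3x3 median filter on a grayscale image.
final class MedianFilterSketch {
    // Returns a copy of img in which each interior pixel is replaced by the
    // median of its 3x3 neighborhood. Border pixels are copied unchanged.
    static int[][] apply(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][];
        for (int y = 0; y < h; y++) out[y] = img[y].clone();
        int[] window = new int[9];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int k = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        window[k++] = img[y + dy][x + dx];
                Arrays.sort(window);
                out[y][x] = window[4]; // middle of 9 sorted values
            }
        }
        return out;
    }
}
```

An isolated noise pixel is outvoted by its neighbors and disappears, while a straight edge between a dark and a light region stays where it is, because the majority on each side of the edge is unchanged. Thin strokes about one pixel wide can be removed along with the noise.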
Segmentation
    Flood fill
    Advantages: it's easy and often used
    Disadvantages: letters may be stuck together in some cases and broken up in others
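Flood-fill segmentation can be sketched as connected-component labeling on a binarized image. The class name `FloodFillSegmenter`, the `boolean[][]` ink mask, the `parse` helper, and the choice of 4-connectivity are assumptions for illustration, not details taken from the project.

```java
import java.util.ArrayDeque;

// Hypothetical sketch: label each 4-connected region of ink pixels.
final class FloodFillSegmenter {
    // Returns a label map: 0 = background, 1..n = region ids in scan order.
    static int[][] label(boolean[][] ink) {
        int h = ink.length, w = ink[0].length;
        int[][] labels = new int[h][w];
        int next = 0;
        int[] dx = {1, -1, 0, 0};
        int[] dy = {0, 0, 1, -1};
        ArrayDeque<int[]> stack = new ArrayDeque<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!ink[y][x] || labels[y][x] != 0) continue;
                next++; // found an unlabeled ink pixel: start a new region
                labels[y][x] = next;
                stack.push(new int[] {x, y});
                while (!stack.isEmpty()) { // iterative fill avoids deep recursion
                    int[] p = stack.pop();
                    for (int d = 0; d < 4; d++) {
                        int nx = p[0] + dx[d], ny = p[1] + dy[d];
                        if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                        if (!ink[ny][nx] || labels[ny][nx] != 0) continue;
                        labels[ny][nx] = next;
                        stack.push(new int[] {nx, ny});
                    }
                }
            }
        }
        return labels;
    }

    // Number of regions = largest label in the map.
    static int countRegions(int[][] labels) {
        int max = 0;
        for (int[] row : labels)
            for (int v : row) max = Math.max(max, v);
        return max;
    }

    // Convenience for small examples: '#' is ink, anything else is background.
    static boolean[][] parse(String... rows) {
        boolean[][] ink = new boolean[rows.length][rows[0].length()];
        for (int y = 0; y < rows.length; y++)
            for (int x = 0; x < rows[y].length(); x++)
                ink[y][x] = rows[y].charAt(x) == '#';
        return ink;
    }
}
```

Both disadvantages fall out of this directly: two letters that share even one ink pixel come back as a single region, and a letter with a gap in its stroke (or a separate dot, as in "i") comes back as two.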
Third quarter goals
    Neural network: make it savable, so that training can be kept between runs
    Feed the segments from flood fill into the neural net for training, then test the neural net and run it
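One possible way to make a network savable is to write its weight arrays out with standard Java serialization, sketched below. The class name `WeightStore` and the layout (one `double[][]` weight matrix per layer) are assumptions, not the project's design.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: persist per-layer weight matrices with Java serialization.
final class WeightStore {
    // Serializes the weight matrices (one double[][] per layer) to bytes.
    static byte[] toBytes(double[][][] layers) {
        try (ByteArrayOutputStream buf = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(layers); // arrays of primitives are Serializable
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restores weight matrices previously produced by toBytes.
    static double[][][] fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (double[][][]) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes the weights to a file so a later run can continue from them.
    static void save(Path file, double[][][] layers) {
        try {
            Files.write(file, toBytes(layers));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads weights back from a file written by save.
    static double[][][] load(Path file) {
        try {
            return fromBytes(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this in place, a run could train for a while, call `save`, and a later run could call `load` and either keep training or go straight to testing.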