Published by Edmund Singleton. Modified over 9 years ago.
1
Pattern Recognition Research Lab, D. Lopresti & H. S. Baird
Protecting eCommerce from Robots Impersonating Human Users
Henry S. Baird, Terry Riopka, Jon Bentley (Avaya Labs), Michael A. Moll, Sui-Yu Wang
2
A Pitfall of the World Wide Web
© Peter Steiner, The New Yorker, July 5, 1993, p. 61 (Vol. 69, No. 20)
3
Straws in the wind…
- Mid 90’s: spammers trolling for email addresses
  - in defense, people start disguising them, e.g. “baird AT cse DOT lehigh DOT edu”
- 1997: abuse of ‘Add-URL’ feature at AltaVista
  - some write programs to add their URL many times to skew search rankings in their favor
  - Andrei Broder et al. (then at DEC SRC)
- A user action which is legitimate when performed once becomes abusive when repeated many times
  - no effective legal recourse
  - how to block or slow down these programs …
4
The first known instance…
AltaVista’s AddURL filter: an image of text, not ASCII
- 1999: “ransom note filter”
  - randomly pick letters, fonts, rotations, and render as an image
  - every user is required to read and type it in correctly
  - reduced “spam add_URL” by “over 95%”
- Weaknesses: isolated chars, filterable noise, affine deformations
M. D. Lillibridge, M. Abadi, K. Bharat, & A. Z. Broder, “Method for Selectively Restricting Access to Computer Systems,” U.S. Patent No. 6,195,698, Filed April 13, 1998, Issued February 27, 2001.
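The “ransom note” idea above can be sketched roughly as follows. This is a minimal illustration, not the patented AltaVista method: the font names, the rotation range, and the function names are all assumptions, and rendering the glyph specifications to an actual image is left out.

```python
import random
import string

# Hypothetical font names; a real system would map these to installed typefaces.
FONTS = ["serif", "sans", "mono", "script"]

def make_challenge(rng, length=6, max_rotation=30):
    """Pick random letters, each with its own random font and rotation.

    Returns (answer, glyphs); glyphs is what a renderer would draw as an image.
    """
    answer = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    glyphs = [
        {
            "char": ch,
            "font": rng.choice(FONTS),
            "rotation": rng.randint(-max_rotation, max_rotation),  # degrees
        }
        for ch in answer
    ]
    return answer, glyphs

def grade(answer, typed):
    """Accept the response only if the user typed the letters correctly."""
    return typed.strip().lower() == answer

if __name__ == "__main__":
    answer, glyphs = make_challenge(random.Random())
    for g in glyphs:
        print(g)
```

Only the image would be sent to the user; the answer string stays on the server and is compared with whatever the user types.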
5
Alan Turing (1912-1954)
- 1936: a universal model of computation
- 1940s: helped break Enigma (U-boat) cipher
- 1949: first serious uses of a working computer, including plans to read printed text (he expected it would be easy)
- 1950: proposed a test for machine intelligence
6
Turing’s Test for AI
How to judge that a machine can ‘think’:
- play an ‘imitation game’ conducted via teletypes
- a human judge & two invisible interlocutors:
  - a human
  - a machine ‘pretending’ to be human
- after asking any questions (challenges) he/she wishes, the judge decides which is human
- failure to decide correctly would be convincing evidence of machine intelligence
Modern GUIs invite richer challenges than teletypes…
A. Turing, “Computing Machinery & Intelligence,” Mind, Vol. 59(236), 1950.
7
“CAPTCHAs”
Completely Automated Public Turing Tests to Tell Computers & Humans Apart
- challenges can be generated & graded automatically (i.e. the judge is a machine)
- accepts virtually all humans, quickly & easily
- rejects virtually all machines
- resists automatic attack for many years (even assuming that its algorithms are known?)
NOTE: machines administer, but cannot pass the test!
L. von Ahn, M. Blum, N. J. Hopper, J. Langford, “CAPTCHA: Using Hard AI Problems For Security,” Proc., EuroCrypt 2003, Warsaw, Poland, May 4-8, 2003.
8
Some Typical CAPTCHAs
Microsoft, eBay/PayPal, Yahoo!, PARC’s PessimalPrint
9
Cropping up everywhere…
Used to defend against:
- skewing search-engine rankings (AltaVista, 1999)
- infesting chat rooms, etc. (Yahoo!, 2000)
- gaming financial accounts (PayPal, 2001)
- robot spamming (MailBlocks, SpamArrest, 2002)
In the last two years: Overture, Chinese website, HotMail, CD-rebate, TicketMaster, MailFrontier, Qurb, Madonnarama, Gay.com, …
… how many have you seen?
On the horizon:
- ballot stuffing, password guessing, denial-of-service attacks
- ‘blunt force’ attacks (e.g. UT Austin break-in, Mar ’03)
- … many others
D. P. Baron, “eBay and Database Protection,” Case No. P-33, Case Writing Office, Stanford Graduate School of Business, Stanford Univ., 2001.
10
The Limitations of Image Understanding Technology
- There remains a large gap in ability between human and machine vision systems, even when reading printed text
- Performance of OCR machines has been systematically studied: 7-year-olds can consistently do better!
- This ability gap has been mapped quantitatively
S. Rice, G. Nagy, T. Nartker, OCR: An Illustrated Guide to the Frontier, Kluwer Academic Publishers: 1999.
11
Image Degradation Modeling
- Effects of printing & imaging
- We can generate challenging images pseudorandomly
[Figure panels: blur, threshold (thrs), sensitivity (sens), threshold x blur]
H. Baird, “Document Image Defect Models,” in H. Baird, H. Bunke, & K. Yamamoto (Eds.), Structured Document Image Analysis, Springer-Verlag: New York, 1992.
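To make the three parameters named on this slide concrete, here is a small sketch of a pseudorandom degradation pipeline on a grayscale grid. It is not the defect model from the cited chapter: the box blur stands in for whatever point-spread function the real model uses, “sensitivity” is assumed to mean per-pixel Gaussian noise added before thresholding, and ink is assumed to be encoded as high values.

```python
import random

def box_blur(img, radius):
    """Mean filter over a (2*radius+1)-square window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out

def degrade(img, blur_radius, threshold, sensitivity, rng):
    """Blur, add per-pixel noise (assumed meaning of 'sensitivity'), then binarize.

    img holds floats in [0, 1] with ink = 1.0; the result holds 0/1 ints.
    """
    blurred = box_blur(img, blur_radius)
    return [
        [1 if v + rng.gauss(0.0, sensitivity) >= threshold else 0 for v in row]
        for row in blurred
    ]

if __name__ == "__main__":
    glyph = [[0.0] * 7 for _ in range(7)]
    for y in range(1, 6):
        glyph[y][3] = 1.0  # a thin vertical stroke
    for row in degrade(glyph, 1, 0.3, 0.05, random.Random(1)):
        print("".join("#" if p else "." for p in row))
```

Raising the threshold thins or erases strokes after blurring, while lowering it thickens them, which is the kind of parameter-dependent damage the slide refers to.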
12
Machine Accuracy is Often a Nearly Monotonic Function of Parameters
T. K. Ho & H. S. Baird, “Large Scale Simulation Studies in Image Pattern Recognition,” IEEE Trans. on PAMI, Vol. 19, No. 10, pp. 1067-1079, October 1997.
13
Can You Read These Degraded Images?
Of course you can … but OCR machines cannot!
14
The PessimalPrint CAPTCHA
Three OCR machines fail when:
- blur = 0.0 & threshold 0.02 - 0.08
- threshold = 0.02 & any value of blur
OCR outputs: ~~~.I~~~   ~~i1~~   N/A   ~~I~~
… but people find all these easy to read
A. Coates, H. Baird, R. Fateman, “Pessimal Print: A Reverse Turing Test,” Proc. 6th IAPR Int’l Conf. on Doc. Anal. & Recogn. (ICDAR’01), Seattle, WA, Sep 10-13, 2001.
15
1st Int’l Workshop on Human Interactive Proofs
PARC, Palo Alto, CA, January 9-11, 2002
16
2nd Int’l Workshop on Human Interactive Proofs
Lehigh University, Bethlehem, PA, May 19-20, 2005
17
Variations & Generalizations
- CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart
- HUMANOID: text-based dialogue which an individual can use to authenticate that he/she is himself/herself (‘naked in a glass bubble’)
- PHONOID: individual authentication using spoken language
- Human Interactive Proof (HIP): an automatically administered challenge/response protocol allowing a person to authenticate him/herself as belonging to a certain group over a network without the burden of passwords, biometrics, mechanical aids, or special training
18
Weaknesses of Existing CAPTCHAs
- English lexicon is too predictable:
  - dictionaries are too small
  - only 1.2 bits of entropy per character (cf. Shannon)
- Physics-based image degradations are vulnerable to well-studied image restoration attacks (example image omitted)
- Complex images irritate people
  - even when they can read them
  - need user-tolerance experiments
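A quick back-of-the-envelope calculation shows why 1.2 bits per character is a problem. The sketch below takes the slide’s 1.2 bits/char figure as given and compares it with uniformly random letters; the 8-letter word length and the function names are assumptions chosen only for illustration.

```python
import math

def word_entropy_bits(length, bits_per_char):
    """Total entropy of a word, assuming the per-character rate simply adds up."""
    return length * bits_per_char

def guess_space(bits):
    """Number of equally likely candidates corresponding to that many bits."""
    return 2 ** bits

# English text at the slide's estimate of 1.2 bits per character.
english_bits = word_entropy_bits(8, 1.2)            # 9.6 bits
# Letters drawn uniformly at random from a 26-letter alphabet.
uniform_bits = word_entropy_bits(8, math.log2(26))  # about 37.6 bits

if __name__ == "__main__":
    print(f"English word: {english_bits:.1f} bits, ~{guess_space(english_bits):.0f} candidates")
    print(f"Random string: {uniform_bits:.1f} bits, ~{guess_space(uniform_bits):.3g} candidates")
```

Under these assumptions an 8-letter English word offers an attacker a search space of only a few hundred effective candidates, while a random 8-letter string offers on the order of 10^11.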
19
Human Readers
Literature on the psychophysics of reading is helpful:
- many kinds of familiarity help, not just English words
- optimal word-image size is known: 0.3-2 degrees subtended angle
- optimal contrast conditions known
- other factors measured for the best performance: to achieve and sustain “critical reading speed”
BUT gives no answer to: where’s the optimal comfort zone?
G. E. Legge, D. G. Pelli, G. S. Rubin, & M. M. Schleske, “Psychophysics of Reading: I. Normal Vision,” Vision Research 25(2), 1985.
J. Grainger & J. Segui, “Neighborhood Frequency Effects in Visual Word Recognition,” Perception & Psychophysics 47, 1990.
20
The BaffleText CAPTCHA
- Nonsense words
  - generate ‘pronounceable’ – not ‘spellable’ – words using a variable-length character n-gram Markov model
  - they look familiar, but aren’t in any lexicon, e.g. ablithan, wouquire, quasis
- Gestalt perception
  - force inference of a whole word-image from fragmentary or occluded characters (example image omitted)
  - using a single familiar typeface also helps
M. Chew & H. S. Baird, “BaffleText: A Human Interactive Proof,” Proc., SPIE/IS&T Conf. on Document Recognition & Retrieval X, Santa Clara, CA, January 23-24, 2003.
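The nonsense-word idea can be illustrated with a small character-level Markov generator. This is only a sketch: it uses a fixed-order model (two characters of context) rather than the variable-length n-gram model the slide mentions, the training list is a toy, and the function names are hypothetical.

```python
import random
from collections import defaultdict

def train(words, order=2):
    """Map each `order`-character context to the list of characters that followed it."""
    model = defaultdict(list)
    for w in words:
        padded = "^" * order + w + "$"  # ^ marks the start, $ the end of a word
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, rng, order=2, max_len=10):
    """Walk the chain from the start context until '$' or max_len characters."""
    state, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[state])
        if nxt == "$":
            break
        out.append(nxt)
        state = state[1:] + nxt
    return "".join(out)

def nonsense_word(model, lexicon, rng, order=2, min_len=5, max_len=10, tries=1000):
    """Return a generated word that is NOT in the lexicon, or None if none is found."""
    for _ in range(tries):
        w = generate(model, rng, order, max_len)
        if len(w) >= min_len and w not in lexicon:
            return w
    return None

if __name__ == "__main__":
    words = ["quality", "require", "within", "ability", "another",
             "without", "acquire", "inquire", "history", "whither"]
    print(nonsense_word(train(words), set(words), random.Random()))
```

Because every transition was observed in real words, the output tends to look pronounceable, and the lexicon check rejects anything a dictionary attack could look up.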
21
Mask Degradations
Parameters of pseudorandom mask generator:
- shape type: square, circle, ellipse, mixed
- density: black-area / whole-area
- range of radii of shapes
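A mask generator driven by the three parameters above might look roughly like this. It is a sketch under assumptions, not the BaffleText generator: only squares and circles are drawn, shapes are placed uniformly at random until the requested density is reached, and how the mask is then combined with the word image is not shown.

```python
import random

def make_mask(width, height, density, r_min, r_max, shape, rng):
    """Stamp random shapes onto a grid until black-area / whole-area >= density.

    shape is "square" or "circle"; radii are drawn uniformly from [r_min, r_max].
    """
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must be between 0 and 1")
    mask = [[0] * width for _ in range(height)]
    black, target = 0, density * width * height
    while black < target:
        cx, cy = rng.randrange(width), rng.randrange(height)
        r = rng.randint(r_min, r_max)
        for y in range(max(0, cy - r), min(height, cy + r + 1)):
            for x in range(max(0, cx - r), min(width, cx + r + 1)):
                inside = shape == "square" or (x - cx) ** 2 + (y - cy) ** 2 <= r * r
                if inside and not mask[y][x]:
                    mask[y][x] = 1
                    black += 1
    return mask

if __name__ == "__main__":
    for row in make_mask(40, 12, 0.3, 1, 3, "circle", random.Random(7)):
        print("".join("#" if p else "." for p in row))
```

The loop may overshoot the target density by at most the area of the last shape stamped.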
22
User Acceptance
% of subjects willing to solve a BaffleText…
   17%  every time they send email
   39%  … if it cut spam by 10x
   89%  every time they register for an e-commerce site
   94%  … if it led to more trustworthy recommendations
  100%  every time they register for an email account
Out of 18 responses to the exit survey.
23
Many Are Vulnerable to Character-Segmentation Attack
Effective strategy of attack:
- Segment image into characters
- Apply aggressive OCR to isolated chars
- If it’s known (or guessed) that the word is ‘spellable’ (e.g. legal English), use the lexicon to constrain interpretations
Patrice Simard (MS Research) reports that this breaks many widely used CAPTCHAs
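The last step of the attack, using a lexicon to constrain noisy per-character OCR results, can be sketched as below. Segmentation and OCR themselves are assumed to have already happened; the score dictionaries, the toy lexicon, and the function names are all hypothetical.

```python
def greedy_read(char_scores):
    """Unconstrained reading: take the top-scoring candidate at each position."""
    return "".join(max(pos, key=pos.get) for pos in char_scores)

def best_lexicon_match(char_scores, lexicon):
    """Pick the lexicon word whose letters have the highest total OCR score.

    char_scores is a list with one dict per segmented character,
    mapping candidate letter -> classifier score.
    """
    best_word, best_score = None, float("-inf")
    for word in lexicon:
        if len(word) != len(char_scores):
            continue
        score = sum(pos.get(ch, 0.0) for pos, ch in zip(char_scores, word))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

if __name__ == "__main__":
    # Hypothetical OCR scores for four segmented characters of the word "spam".
    scores = [
        {"s": 0.6, "5": 0.4},
        {"p": 0.6, "q": 0.4},
        {"o": 0.55, "a": 0.45},  # OCR slightly prefers the wrong letter here
        {"m": 0.7, "n": 0.3},
    ]
    print(greedy_read(scores))                                           # spom
    print(best_lexicon_match(scores, ["spam", "span", "spot", "scam"]))  # spam
```

The lexicon repairs an individual OCR mistake, which is why words that are not in any dictionary make this step much less useful to an attacker.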
24
So, try to generate word-images that will be hard to segment into characters
- Slice characters up:
  - vertical cuts; then
  - horizontal cuts
- Set size of cuts to constant within a word
- Choose positions of cuts randomly
- Force pieces to drift apart: ‘scatter’ horiz. & vert.
- Change intercharacter space
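The cut-and-scatter steps can be sketched on a small binary glyph. This is a simplified illustration, not the actual ScatterType generator: cuts fall on a regular grid instead of at random positions, “size of cuts” is interpreted (as an assumption) as the spacing between cuts, and each piece gets an independent uniform offset.

```python
import random

def scatter(glyph, cut_size, max_dx, max_dy, rng):
    """Cut a 0/1 glyph into cut_size-square pieces and displace each piece.

    Each piece moves by a random offset in [-max_dx, max_dx] x [-max_dy, max_dy];
    the output canvas is padded so that no piece falls off the edge.
    """
    h, w = len(glyph), len(glyph[0])
    out = [[0] * (w + 2 * max_dx) for _ in range(h + 2 * max_dy)]
    for y0 in range(0, h, cut_size):       # horizontal cuts
        for x0 in range(0, w, cut_size):   # vertical cuts
            dx = rng.randint(-max_dx, max_dx)
            dy = rng.randint(-max_dy, max_dy)
            for y in range(y0, min(y0 + cut_size, h)):
                for x in range(x0, min(x0 + cut_size, w)):
                    if glyph[y][x]:
                        out[y + max_dy + dy][x + max_dx + dx] = 1
    return out

if __name__ == "__main__":
    # A crude 6x6 letter "L".
    glyph = [[1, 0, 0, 0, 0, 0] for _ in range(5)] + [[1, 1, 1, 1, 1, 1]]
    for row in scatter(glyph, 2, 1, 1, random.Random(4)):
        print("".join("#" if p else "." for p in row))
```

Displaced pieces may overlap, so the output can contain fewer ink pixels than the input.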
25
Character fragments can interpenetrate
Not only is it hard to segment the word into characters, …
… it can be hard to recombine characters’ fragments into characters
26
How Well Can People Read These?
We carried out a human legibility trial with the help of ~60 volunteers: students, faculty, & staff at Lehigh Univ., plus colleagues at Avaya Labs Research
27
Subjects were told they got it right/wrong – after they rated its ‘difficulty’
28
Subjective difficulty ratings are correlated with illegibility
[Figure: right vs. wrong answers by difficulty rating, from 1 (Easy) to 5 (Impossible)]
29
People Rated These “Easy” (1/5)
aferatic, memmari, heiwho, nampaign
30
Rated “Medium Hard” (3/5)
overch / ovorch
wouwould
atlager / adager
weland / wejund
31
Rated “Impossible” (5/5)
acchown / echaeva
gualing / gealthas
bothere / beadave
caquired / engaberse
32
Why is ScatterType legible at all?
Should it surprise you that this is legible…?
We speculate that we can read it because:
- human readers exploit typeface consistency cues
- … evidence remains in small details of local shape
- this ability seems largely unconscious
33
Mean Horizontal Scatter vs Mean Vertical Scatter
[Figure: right vs. wrong answers, with difficulty ratings from 1 (Easy) to 5 (Impossible)]
Mirage: data analysis tool, Tin Kam Ho, Bell Labs.
34
The Arms Race
- When will serious technical attacks be launched?
  - ‘spam kings’ make $$ millions
  - two spam-blocking firms rely on CAPTCHAs
- How long can a CAPTCHA withstand attack?
  - especially if its algorithms are published or guessed
- Strategy: keep a pipeline of defenses in reserve
  - continuing partnership between R&D & users
35
The 2nd HIP Workshop
May 2005, Lehigh University, Bethlehem, PA
Advisory Board:
- Manuel Blum, CMU
- Doug Tygar, UCB CS/SIMS
- Patrice Simard, Microsoft Research
- Gordon Legge, Univ. Minnesota
Organizers: Henry Baird, Dan Lopresti
36
Lots of Open Research Questions
- What are the most intractable obstacles to machine vision?
  - segmentation, occlusion, degradations, …?
- Under what conditions is human reading most robust?
  - linguistic & semantic context, Gestalt, style consistency…?
- Where are ‘ability gaps’ located?
  - quantitatively, not just qualitatively
- How to generate challenges strictly within ability gaps?
  - fully automatically
  - an indefinitely long sequence of distinct challenges
37
Disguised CAPTCHAs
Note that many normal navigation aids are CAPTCHAs (though not designed for that purpose)
38
Implicit CAPTCHAs
We are investigating design principles for “implicit CAPTCHAs” that relieve these drawbacks:
- Challenges disguised as necessary browsing links
- Challenges that can be answered with a single click while still providing several bits of confidence
- Challenges that can be answered only through experience of the context of the particular website
  - weave CAPTCHAs into a multi-page “story”
  - can’t be extracted and “farmed-out” to people
- Challenges that are so easy that failure indicates a failed robot attack
39
Alan Turing might have enjoyed the irony …
A technical problem – machine reading – which he thought would be easy, has resisted attack for 50 years, and now allows the first widespread practical use of variants of his test for artificial intelligence.
40
Contact
Henry S. Baird
baird@cse.lehigh.edu