Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms by Patrick Gage Kelley, Saranga Komanduri, Michelle.

Presentation transcript:

Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms
by Patrick Gage Kelley, Saranga Komanduri, Michelle L. Mazurek, Richard Shay, Timothy Vidas, Lujo Bauer, Nicolas Christin, Lorrie Faith Cranor, and Julio López
Carnegie Mellon University, Pittsburgh, PA, USA {fpgage, sarangak,
CS558 Lecture on Passwords II
Presenter: Eirini Aikaterini Degleri, 2735

Table of contents
1. Introduction
2. Data collection methodology
3. Data analysis methodology
4. Findings
5. Conclusion
Computer Science Department, Passwords2

Introduction: Background and Related Work
Summarizes the different types of data collection and analysis that have been used, and discusses evaluations of the impact of password policies and metrics for quantifying password strength.

Introduction
• Text-based passwords
  - are the most commonly used authentication method in computer systems
  - suffer from many security issues
• We need
  - research that defines metrics to characterize password strength and evaluates password-composition policies using these metrics
  - an efficient method for calculating how effectively several heuristic password-guessing algorithms guess passwords

This paper
Investigates:
a) the resistance of passwords created under different conditions to guessing
b) the performance of guessing algorithms under different training sets
c) the relationship between passwords explicitly created under a given composition policy and other passwords that happen to meet the same requirements
d) the relationship between guessability, as measured with password-cracking algorithms, and entropy estimates

Origin of dataset
• Online survey
• Participants were recruited using Amazon's Mechanical Turk crowdsourcing service (MTurk), which produces more diverse samples than typical lab studies.
• Participants created real passwords that were needed several days later to complete the study.
• These passwords did not protect high-value accounts.

Impact of password policies
• Passwords created under stricter composition requirements were
  - more resistant to automated cracking
  - more difficult for participants to create and remember
• Policies that are too strict induce coping strategies that can hurt both security and productivity.

Measuring password strength
• One possible metric is information entropy:
  - the expected value (in bits) of the information contained in a string
  - a lower bound on the expected number of guesses needed to find a text
• An alternative metric of password strength is "guessability," which characterizes the time needed for an efficient password-cracking algorithm to discover a password.
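To make the entropy metric concrete, here is a minimal sketch of Shannon entropy over an empirical distribution. The function name `shannon_entropy_bits` and the toy password list are assumptions made for illustration; they do not come from the paper or the slides.

```python
import math
from collections import Counter

def shannon_entropy_bits(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over every distinct value observed.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy data (hypothetical): four distinct, equally likely passwords.
toy = ["alpha", "bravo", "charlie", "delta"]
print(shannon_entropy_bits(toy))
```

A set in which every password is identical has zero entropy under this measure, which is one reason entropy computed from a small sample should be read as a rough indicator only.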

Password-cracking techniques
• Markov-model approaches make password guesses based on the contextual frequency of characters.
• The approach of Weir et al. takes text structures from training data and applies mangling rules to the text itself.
This work:
• generates blacklists that restrict password creation in some study conditions
• implements a new measure of password strength, the guess number, by applying the Weir algorithm and a variation of the Markov model

Data collection methodology
• the methodology for collecting plaintext passwords
• the word lists used to assemble the blacklists used in some conditions
• the eight conditions under which the authors gathered data

Collection instrument
• Each participant was given a scenario for making a new password and asked to create a password that met a set of password-composition requirements.
• The survey scenario
  - had users create low-value passwords
  - used the password only to link their survey responses
• The other scenario
  - was designed to elicit higher-value passwords
  - told users that their provider was under attack, so they had to change their passwords under a new policy

Word lists for algorithm training
• Word lists were used as training data in the analysis and to assemble the blacklists used in some of the experimental conditions.
• Lists used:
  - RockYou
  - MySpace
  - the inflection list: words in varied grammatical forms such as plurals and past tense
  - the simple dictionary: the word list found on most Unix systems
  - two cracking dictionaries from the Openwall Project: standard and mangled versions of dictionary words and common passwords

Conditions I
• basic8survey: Password must have at least 8 characters.
  - Only this condition uses the survey scenario.
• basic8: Password must have at least 8 characters.
• basic16: Password must have at least 16 characters.
• dictionary8: Password must have at least 8 characters. It may not contain a dictionary word.
  - The authors removed non-alphabetic characters and checked the remainder against a dictionary, ignoring case.
• comprehensive8: Password must have at least 8 characters including an uppercase and lowercase letter, a symbol, and a digit. It may not contain a dictionary word.
  - Same dictionary check as in dictionary8.

Conditions II
• blacklistEasy: Password must have at least 8 characters. It may not contain a dictionary word.
  - The password was checked against the simple Unix dictionary, ignoring case. Unlike the dictionary8 and comprehensive8 conditions, the password was not stripped of non-alphabetic characters before the check.
• blacklistMedium:
  - Same as the blacklistEasy condition, except that the paid Openwall list was used.
• blacklistHard:
  - Same as the blacklistEasy condition, except that a five-billion-word dictionary was used, created with the algorithm outlined by Weir et al. Both training and testing were conducted case-insensitively, increasing the strength of the blacklist.
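The two kinds of checks described in the conditions above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the dictionary check is assumed to be an exact match of the stripped remainder against the word list, and a "symbol" is assumed to be any non-alphanumeric character.

```python
import re

def fails_dictionary_check(password, dictionary):
    """dictionary8-style check: strip non-alphabetic characters, ignore case.
    Assumption: the remainder must match a dictionary entry exactly."""
    letters_only = re.sub(r"[^A-Za-z]", "", password).lower()
    return letters_only in dictionary

def fails_blacklist_check(password, blacklist):
    """blacklist-style check: ignore case, but do NOT strip other characters."""
    return password.lower() in blacklist

def meets_comprehensive8(password, dictionary):
    """At least 8 characters with upper, lower, digit, and symbol,
    and passing the dictionary8-style check."""
    return (
        len(password) >= 8
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(not c.isalnum() for c in password)  # assumed meaning of "symbol"
        and not fails_dictionary_check(password, dictionary)
    )

words = {"password", "sunshine"}  # hypothetical word list
print(fails_dictionary_check("Pass1word!", words))  # stripped form is "password"
print(fails_blacklist_check("Pass1word!", words))   # unstripped form is not listed
```

The example shows why the two checks differ in strictness: a password with digits or symbols inserted into a dictionary word is caught by the stripping check but passes a blacklist lookup that uses the same word list.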

Data analysis methodology
• how the authors analyzed the collected password data
• how they measured the resistance of passwords to cracking

Guess-number calculators
• A calculator function maps a password to the number of guesses required to guess that password.
  - The output value is the guess number of the password.
• A separate guess-number calculator must be implemented for each cracking algorithm under consideration.

Authors' steps
1. Used a guessing algorithm's calculator to look up the associated guess number for each password.
2. Computed the percentage of passwords that would be cracked by a given algorithm.
3. Computed the percentage that would be cracked with a given number of guesses.
4. Used calculators to compare the performance of different cracking algorithms, and of different training-set tunings within each algorithm.
5. Combined guess-number results across a variety of algorithms and training sets to develop a general picture of the overall strength of a set of passwords.
6. Implemented two guess-number calculators:
  - one for a brute-force algorithm loosely based on the Markov model
  - one for the heuristic algorithm proposed by Weir et al.
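Steps 2 and 3 above amount to simple counting once each password has a guess number. The sketch below assumes a hypothetical representation in which a calculator returns an integer guess number, or `None` when the algorithm would never guess the password; the sample values are made up for illustration.

```python
def fraction_cracked(guess_numbers, budget=None):
    """Share of passwords cracked within `budget` guesses.

    guess_numbers: one entry per password; None means never guessed.
    budget: maximum number of guesses, or None for an unlimited attacker.
    """
    cracked = sum(
        1
        for g in guess_numbers
        if g is not None and (budget is None or g <= budget)
    )
    return cracked / len(guess_numbers)

# Hypothetical guess numbers for four passwords.
sample = [10, 1000, None, 5_000_000]
print(fraction_cracked(sample))          # cracked by the algorithm at all
print(fraction_cracked(sample, 1000))    # cracked within 1000 guesses
```

Evaluating `fraction_cracked` over a range of budgets gives the kind of curve used to compare conditions, algorithms, and training sets.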

Brute Force Markov (BFM) calculator
• The BFM calculator determines guess numbers for a brute-force cracking algorithm loosely based on Markov chains.
• The BFM algorithm
  - uses the training set to calculate the frequency of first characters and of digrams within the password body
  - uses these frequencies to deterministically construct guessing orders for unknown passwords
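The training step described above can be sketched as counting first characters and digrams, then ranking candidates by frequency. This is only a rough illustration: the names `train_bfm` and `ranked` are hypothetical, and the alphabetical tie-break is an assumption chosen so that the ordering is deterministic, not a detail taken from the paper.

```python
from collections import Counter, defaultdict

def train_bfm(training_passwords):
    """Count first characters and digrams (adjacent character pairs)."""
    first_counts = Counter()
    digram_counts = defaultdict(Counter)  # previous char -> Counter of next chars
    for pw in training_passwords:
        if not pw:
            continue
        first_counts[pw[0]] += 1
        for prev, nxt in zip(pw, pw[1:]):
            digram_counts[prev][nxt] += 1
    return first_counts, digram_counts

def ranked(counter):
    """Characters from most to least frequent; ties broken alphabetically (assumed)."""
    return [ch for ch, _ in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))]

first, digrams = train_bfm(["abc", "abd", "bad"])  # toy training set
print(ranked(first))         # order in which first characters would be tried
print(ranked(digrams["a"]))  # order of characters tried after an "a"
```

A guessing order built this way tries the most frequent first character first, then extends each prefix with the most frequent successor of its last character.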

Weir algorithm calculator
• The Weir algorithm uses the following definitions:
  - structures are patterns of character types such as letters, digits, and symbols
  - a terminal is one instantiation of a structure
  - a probability group is a set of terminals with the same probability of occurring
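As an illustration of the "structure" definition above, the sketch below maps a password to a pattern of character-type runs and estimates structure probabilities from a training list. The `L`/`D`/`S` run notation (for example `L8D2S1`) and both function names are assumptions made for this example; the slides do not specify a notation.

```python
from collections import Counter
from itertools import groupby

def char_class(ch):
    """Classify a character as a letter (L), digit (D), or symbol (S)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"

def structure(password):
    """Pattern of character-type runs, e.g. 'password12!' -> 'L8D2S1'."""
    return "".join(
        f"{cls}{len(list(run))}" for cls, run in groupby(password, key=char_class)
    )

def structure_probabilities(training_passwords):
    """Relative frequency of each structure in the training data."""
    counts = Counter(structure(pw) for pw in training_passwords)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

print(structure("password12!"))
print(structure_probabilities(["abc123", "xyz789", "hello!", "qwerty12"]))
```

Under this reading, `abc123` and `xyz789` are two terminals of the same structure, and terminals that end up with equal probability would fall into the same probability group.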

Entropy
• To investigate how well entropy estimates correlate with guess resistance, the authors compared guess-number results for each condition against two entropy approximations.
• The first applies the NIST guidelines:
  - each password-composition rule contributes a specific amount of entropy
  - the entropy of the policy is the sum of the entropy contributed by each rule
• The second approximation is calculated empirically from the plaintext passwords in their dataset:
  - for each password condition, calculate the entropy contributed by the number, content, and type of each character
  - sum the individual entropy contributions to estimate the total entropy of the passwords in that condition
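The "sum the individual contributions" idea behind the empirical estimate can be sketched in a much simplified form. The version below is an assumption-laden stand-in, not the authors' procedure: it adds the entropy of the password-length distribution to the entropy of the characters observed at each position, treating the contributions as independent and ignoring character type. All names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(items):
    """Shannon entropy, in bits, of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def additive_entropy_estimate(passwords):
    """Simplified estimate: length entropy plus per-position character entropy."""
    total = entropy_bits([len(pw) for pw in passwords])
    longest = max(len(pw) for pw in passwords)
    for pos in range(longest):
        # Only passwords long enough to have a character at this position count.
        total += entropy_bits([pw[pos] for pw in passwords if len(pw) > pos])
    return total

toy = ["aa", "ab", "ba", "bb"]  # hypothetical condition with four passwords
print(additive_entropy_estimate(toy))
```

Because the contributions are simply added, an estimate of this kind cannot capture dependencies between positions, which is consistent with the slides' later point that entropy is only a rough guide to guess resistance.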

Findings
The authors calculated guess numbers under 32 different combinations of algorithm and training data, and distilled from them four major findings with application both to selecting password policies and to conducting password research.

Major findings
1. Basic16 provides the greatest security against a powerful attacker, outperforming the more complicated comprehensive8 condition.
2. Access to abundant, closely matched training data is important for successfully cracking passwords from stronger composition policies.
3. Adding more and better training data provides no benefit against passwords from weaker conditions, but a significant boost against stronger ones.
4. Passwords created under a specific composition policy do not have the same guess resistance as passwords selected from a different group that happen to meet the rules of that policy.
5. Entropy provides only a very rough approximation of overall password strength.

Comparing policies for guessability
• Two experiments evaluate the guessability of all seven password-composition policies, but against differently trained guessing algorithms.
• Experiment P4 simulates an attacker with access to a broad variety of publicly available data for training.
• Experiment E simulates a powerful attacker with extraordinary insight into the password sets under consideration.

Experiment P4 (P for "trained with public data")
It consists of a Weir-algorithm calculator trained on all the public word lists used and tested on 1000 passwords from each condition.

Experiment E (E for "trained with everything")
It consists of a Weir-algorithm calculator trained with all the public data used in P4 plus 500 passwords from each of the eight conditions.

Effects of training-data selection
• The authors examine the effect of varying the amount and source of training data on both total cracking success and cracking efficiency.
• The choice of training data affects different password-policy conditions differently: abundant, closely matched training data is critical when cracking passwords from harder-to-guess conditions, but less so when cracking passwords from easier ones.

Effects of test-data selection
Problem:
• Researchers typically don't have access to passwords created under the password-composition policy they want to study, so they work with subsets of other password sets that happen to meet that policy's requirements.
• Are subsets like these representative of passwords actually created under a specific policy?
• No. They may in fact contain passwords that are more difficult to guess than passwords created under the policy in question.

Guessability and entropy
• It remains unclear how well entropy reflects the guess resistance of a password set.
• Information entropy provides a theoretical lower bound on the guessability of a set of passwords. In practice, however, a system administrator may be more concerned about how many passwords can be cracked in a given number of guesses than about the average guessability across the population.

Experiment E
Relationship among the resistance of the collected password sets to heuristic cracking, empirical entropy estimates calculated from those sets, and NIST entropy estimates for all password conditions.

Conclusion

Key points of this paper, results
• A technique for evaluating password strength that can be implemented for a variety of password-guessing algorithms and tuned using a variety of training sets to gain insight into the comparative guess resistance of different sets of passwords.
• Basic16 is superior for large numbers of guesses. Combined with a prior result that basic16 is also easier for users, this suggests that basic16 is the better policy choice.
• The effectiveness of a dictionary check depends heavily on the choice of dictionary.
• Effective attacks on passwords created under complex or rare-in-practice composition policies require access to abundant, closely matched training data.
• Entropy provides only a rough correlation with guess resistance and is unable to correctly predict quantitative differences in guessability among password sets.

End of Part II
Thank you very much