August 6, 2007Electronic Voting Technology 2007 On Estimating the Size and Confidence of a Statistical Audit Javed A. Aslam College of Computer and Information.

Slides:



Advertisements
Similar presentations
Making Sure Every Vote Counts in the Digital Era: The Need for Standards Mandating Voter-Verified Paper Ballots Sarah Rovito 2007 WISE Intern August 3,
Advertisements

Electronic Voting: Danger and Opportunity J. Alex Halderman Department of Computer Science Center for Information Technology Policy Princeton University.
Discrete (Categorical) Data Analysis
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. More About Categorical Variables Chapter 15.
© 2010 Pearson Prentice Hall. All rights reserved Least Squares Regression Models.
1. Estimation ESTIMATION.
Sample size computations Petter Mostad
Basic Elements of Testing Hypothesis Dr. M. H. Rahbar Professor of Biostatistics Department of Epidemiology Director, Data Coordinating Center College.
Evaluating Hypotheses
8-3 Testing a Claim about a Proportion
Chapter 9 Hypothesis Testing II. Chapter Outline  Introduction  Hypothesis Testing with Sample Means (Large Samples)  Hypothesis Testing with Sample.
Reliability in Embedded Software Joseph Lucas. Requirements Real time/reactive operation Real time/reactive operation Small size, low weight Small size,
Sample size calculations
Quantitative Business Methods for Decision Making Estimation and Testing of Hypotheses.
Inference for Categorical Variables 2/29/12 Single Proportion, p Distribution Intervals and tests Difference in proportions, p 1 – p 2 One proportion or.
Hypothesis Testing and T-Tests. Hypothesis Tests Related to Differences Copyright © 2009 Pearson Education, Inc. Chapter Tests of Differences One.
INFERENTIAL STATISTICS – Samples are only estimates of the population – Sample statistics will be slightly off from the true values of its population’s.
Hypothesis Testing.
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.
Sullivan – Fundamentals of Statistics – 2 nd Edition – Chapter 11 Section 2 – Slide 1 of 25 Chapter 11 Section 2 Inference about Two Means: Independent.
+ Chapter 9 Summary. + Section 9.1 Significance Tests: The Basics After this section, you should be able to… STATE correct hypotheses for a significance.
8.1 Inference for a Single Proportion
Audit Purpose of Audit Quality assurance procedure Check accuracy of machine tally of ballots Ballots for a contest are sampled, manually verified, and.
1 CSI5388: Functional Elements of Statistics for Machine Learning Part I.
Copyright © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 22 Using Inferential Statistics to Test Hypotheses.
Chapter 9 Hypothesis Testing II: two samples Test of significance for sample means (large samples) The difference between “statistical significance” and.
PARAMETRIC STATISTICAL INFERENCE
California Secretary of State Voting Systems Testing Summit November 28 & 29, 2005, Sacramento, California Remarks by Kim Alexander, President, California.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 8-3 Testing a Claim About a Proportion.
CHAPTER 17: Tests of Significance: The Basics
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Section Inference about Two Means: Independent Samples 11.3.
1 Chapter 10: Introduction to Inference. 2 Inference Inference is the statistical process by which we use information collected from a sample to infer.
Testing Hypothesis That Data Fit a Given Probability Distribution Problem: We have a sample of size n. Determine if the data fits a probability distribution.
Biostatistics in Practice Peter D. Christenson Biostatistician Session 4: Study Size and Power.
Introduction to sample size and power calculations Afshin Ostovar Bushehr University of Medical Sciences.
EMIS 7300 SYSTEMS ANALYSIS METHODS FALL 2005 Dr. John Lipp Copyright © Dr. John Lipp.
Section 10.1 Confidence Intervals
Introduction to Inferece BPS chapter 14 © 2010 W.H. Freeman and Company.
RDPStatistical Methods in Scientific Research - Lecture 41 Lecture 4 Sample size determination 4.1 Criteria for sample size determination 4.2 Finding the.
Statistics 101 Chapter 10 Section 2. How to run a significance test Step 1: Identify the population of interest and the parameter you want to draw conclusions.
Tests of Significance: The Basics BPS chapter 15 © 2006 W.H. Freeman and Company.
: An alternative representation of level of significance. - normal distribution applies. - α level of significance (e.g. 5% in two tails) determines the.
Logic and Vocabulary of Hypothesis Tests Chapter 13.
Review of Statistics.  Estimation of the Population Mean  Hypothesis Testing  Confidence Intervals  Comparing Means from Different Populations  Scatterplots.
1 Estimation of Population Mean Dr. T. T. Kachwala.
Electronic Voting: Danger and Opportunity
VVPAT Building Confidence in U.S. Elections. WHAT IS VVPAT ? Voter-verifiable paper audit trail Requires the voting system to print a paper ballot containing.
URBDP 591 A Lecture 16: Research Validity and Replication Objectives Guidelines for Writing Final Paper Statistical Conclusion Validity Montecarlo Simulation/Randomization.
Statistics for Business and Economics Module 1:Probability Theory and Statistical Inference Spring 2010 Lecture 8: Tests of significance and confidence.
Class Six Turn In: Chapter 15: 30, 32, 38, 44, 48, 50 Chapter 17: 28, 38, 44 For Class Seven: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 Read.
Ronald L. Rivest MIT NASEM Future of Voting Meeting June 12, 2017
Confidence Intervals for Proportions
Introduction For inference on the difference between the means of two populations, we need samples from both populations. The basic assumptions.
Chapter 4. Inference about Process Quality
Unit 5: Hypothesis Testing
P-values.
Chapter 9: Inferences Involving One Population
R. E. Wyllys Copyright 2003 by R. E. Wyllys Last revised 2003 Jan 15
1. Estimation ESTIMATION.
Audit Thoughts Ronald L. Rivest MIT CSAIL Audit Working Meeting
Chapter 8: Inference for Proportions
Ronald L. Rivest MIT NASEM Future of Voting December 7, 2017
Chapter 9 Hypothesis Testing.
ISI Day – 20th Anniversary
Statistical Inference
Four-Cut: An Approximate Sampling Procedure for Election Audits
Significance Tests: The Basics
Significance Tests: The Basics
Type I and Type II Errors
Presentation transcript:

August 6, 2007Electronic Voting Technology 2007 On Estimating the Size and Confidence of a Statistical Audit Javed A. Aslam College of Computer and Information Science Northeastern University Raluca A. Popa and Ronald L. Rivest Computer Science and Artificial Intelligence Laboratory M.I.T.

August 6, 2007Electronic Voting Technology Outline Motivation Background How Do We Audit? The Problem Analysis Model Sample Size Bounds Conclusions

August 6, 2007Electronic Voting Technology Motivation There have been cases of electoral fraud (Gumbel’s Steal This Vote, Nation Books, 2005) Would like to ensure confidence in elections Auditing = comparing statistical sample of paper ballots to electronic tally Provides confidence in a software independent manner

August 6, 2007Electronic Voting Technology How Do We Audit? Proposed Legislation: Holt Bill (2007) Voter-verified paper ballots Manual auditing Granularity: Machine, Precinct, County Procedure Determine u, # precincts to audit, from margin of victory Sample u precincts randomly Compare hand count of paper ballots to electronic tally in sampled precincts If all are sufficiently close, declare electronic result final If any are significantly different, investigate!

August 6, 2007Electronic Voting Technology How Do We Audit? Proposed Legislation: Holt Bill (2007) Voter-verified paper ballots Manual auditing Granularity: Machine, Precinct, County Procedure Determine u, # precincts to audit, from margin of victory Sample u precincts randomly Compare hand count of paper ballots to electronic tally in sampled precincts Our formulas are independent of the auditing procedure

August 6, 2007Electronic Voting Technology The Problem How many precincts should one audit to ensure high confidence in an election result?

August 6, 2007Electronic Voting Technology Previous Work Saltman (1975): The first to study auditing by sampling without replacement Dopp and Stenger (2006): Choosing appropriate audit sizes Alvarez et al. (2005): Study of real case auditing of punch-card machines

August 6, 2007Electronic Voting Technology Hypothesis Testing Null hypothesis: The reported election outcome is incorrect (electronic tally indicates different winner than paper ballots) Want to reject the null hypothesis Need to sample enough precincts to ensure that, if no fraud is detected, the election outcome is correct with high confidence

August 6, 2007Electronic Voting Technology Model n precincts b corrupted (“bad”) Sample u precincts (without replacement) c = desired confidence Want: If there are ≥ b corrupted precincts, then sample contains at least one with probability ≥ c Equivalently: If the sample contains no corrupted precincts, then the election outcome is correct with probability ≥ c Typical values: n = 400, b = 50, c = 95%

August 6, 2007Electronic Voting Technology Minimum # of precincts adversary must corrupt to change election outcome Derived from margin of victory Our formulas are independent of b’s calculation b = (half margin of victory) · n What is b? margin [times 5 (Dopp and Stenger, 2006)]

August 6, 2007Electronic Voting Technology Rule of Three If we draw a sample of size ≥ 3n/b with replacement, then: Expect to see at least three corrupted precincts Will see at least one corrupted precinct with c ≥ 95% In practice, we sample without replacement (no repeated precincts)

August 6, 2007Electronic Voting Technology Sample Size Probability that no corrupted precinct is detected: Optimal Sample Size: Minimum u such that Pr ≤ 1- c Problem: Need a computer Goal: Derive a simple and accurate upper bound that an election official can compute on a hand-held calculator Pr = ﴾ ﴿ / ﴾ ﴿ n-b u nunu

August 6, 2007Electronic Voting Technology Our Bounds Intuition: How many different precincts are sampled by the Rule of Three? Our without replacement upper bounds: ACCURACYACCURACY

August 6, 2007Electronic Voting Technology Our Bounds Intuition: How many different precincts are sampled by the Rule of Three? Our without replacement upper bounds: Example: n = 400, b = 50 (margin=5%), c = 95%

August 6, 2007Electronic Voting Technology Conservative: provably an upper bound Accurate: For n ≤10,000, b ≤ n/2, c ≤ 0.99 (steps of 0.01): 99% is exact, 1% overestimates by 1 precinct Analytically, it overestimates by at most –ln(1-c)/2, e.g. three precincts for c < Can be computed on a hand-held calculator Our Bound

August 6, 2007Electronic Voting Technology Observations Margin of Victory 10% 20% 1% Precincts to Audit 1% Fixed level of auditing is not appropriate n = 400, c=95%

August 6, 2007Electronic Voting Technology Observations (cont’d) Margin of Victory 10% 20% 1% Precincts to Audit n = 400, c=65% 2% Holt Bill (2007): Tiered auditing Holt Tier

August 6, 2007Electronic Voting Technology Related Problems Inverse questions Estimate confidence level c from u, b, and n Estimate detectable fraud level b from u, c, and n Auditing with constraints Holt Bill (2007): Audit at least one precinct in each county Future work Handling precincts of variable sizes (Stanislevic, 2006)

August 6, 2007Electronic Voting Technology We develop a formula for the sample size: that is: Conservative (an upper bound) Accurate Simple, easy to compute on a pocket calculator Applicable to different other settings Conclusions

August 6, 2007Electronic Voting Technology Thank you! Questions?