Julia Lane, and many many coauthors
BIG DATA DEFINITION
“Big Data” is an imprecise description of a rich and complicated set of characteristics, practices, techniques, ethics, and outcomes all associated with data. (AAPOR)
No canonical definition
  By characteristics: Volume, Velocity, Variety (and Variability and Veracity)
  By source: found vs. made
  By use: professionals vs. citizen science
  By reach: datafication
  By paradigm: Fourth paradigm
Source: Julia Lane
IMPLICATIONS FOR MEASUREMENT
New business model
  Federal agencies no longer major players
New analytical model
  Outliers
  Fine-grained analysis
  New units of analysis
New sets of skills
  Computer scientists
  Citizen scientists
=> Different cost structure
EXAMPLE
Source: Ian Foster, University of Chicago
Source: Jason Owen Smith and UMETRICS data
ACCESS FOR RESEARCH
VALUE IN OTHER FIELDS
DATA HAVE VALUE
SO WE NEED TO GET THINGS RIGHT
CORE QUESTIONS
What is the legal framework?
What is the practical framework?
What is the statistical framework?
LEGAL FRAMEWORK
Current legal structure inadequate
  “The recording, aggregation, and organization of information into a form that can be used for data mining, here dubbed ‘datafication’, has distinct privacy implications that often go unrecognized by current law” (Strandburg)
Assessment of harm from privacy inadequate
Privacy and big data are incompatible
  Anonymity not possible
  Informed consent not possible
Source: Julia Lane
BAROCAS AND NISSENBAUM
INFORMED CONSENT (NISSENBAUM)
STATISTICAL FRAMEWORK
Importance of valid inference
  The role of statisticians/access
Inadequate current statistical disclosure limitation
  Diminished role of federal statistical agencies
  Limitations of surveys
New analytical framework:
  Mathematically rigorous theory of privacy
  Measurement of privacy loss
  Differential privacy
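
ILLUSTRATION: MEASURING PRIVACY LOSS
To make "measurement of privacy loss" concrete, here is a minimal sketch of the standard Laplace mechanism for a counting query. It is not taken from the slides. The function names (laplace_noise, private_count), the example ages, and the choice of epsilon are all hypothetical and chosen only for illustration. The parameter epsilon is the privacy-loss budget: a smaller epsilon means more noise and stronger protection.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records satisfying `predicate`.

    A mechanism M is epsilon-differentially private if, for any two
    datasets D and D' differing in one record and any output set S,
    Pr[M(D) in S] <= exp(epsilon) * Pr[M(D') in S].
    Adding Laplace noise with scale sensitivity/epsilon to a count
    satisfies this definition.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical data: ages of six respondents.
    ages = [23, 35, 41, 67, 70, 18]
    noisy = private_count(ages, lambda a: a >= 65, epsilon=0.5)
    print(f"noisy count of respondents aged 65+: {noisy:.2f}")
```

Each released answer spends part of the privacy budget, and the losses of successive releases add up. That is what makes privacy loss a measurable quantity here rather than a yes/no judgment about anonymity.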
PRACTICAL FRAMEWORK
SOME SUGGESTIONS
AND A REMINDER OF WHY