“Mortgages, Privacy, and Deidentified Data”
Professor Peter Swire
Ohio State University
Center for American Progress
Consumer Financial Protection Bureau Conference on “New Research on Sustainable Mortgages & Access to Credit”
October 6, 2011

Overview
- Federal experience to date with deidentification (“DeID”)
- Why DeID technically harder over time
- Technical & administrative measures to protect identity
- Court records: public records and privacy
- Conclusion: Technology alone often cannot succeed, so the choice becomes make public, keep private, or create effective data use agreements

Federal DeID to Date
- 2000 HIPAA rule
  - Recognized that reidentification (“ReID”) is possible
  - Can scrub 18 data fields, or have an expert attest that the risk of ReID is “very small”
  - Current HHS study in progress on DeID; similar issues to financial data
- Data.gov
  - Administration push for transparency
  - Privacy & DeID more challenging than many had hoped
- Census data
  - History of census data sensitivity, required data collection
  - Suppress small cell size; technical limits on researchers’ access

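The small-cell suppression mentioned for census data can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `suppressed_counts`, the threshold of 5, and the toy loan records are hypothetical and are not taken from any agency's actual disclosure rules.

```python
from collections import Counter

def suppressed_counts(records, key_fields, min_cell_size=5):
    """Tabulate records by key_fields and blank out small cells.

    min_cell_size=5 is an assumed threshold for illustration only.
    A suppressed cell is reported as None instead of its true count.
    """
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return {cell: (n if n >= min_cell_size else None)
            for cell, n in counts.items()}

# Hypothetical toy data: loans tabulated by census tract and loan type.
loans = (
    [{"tract": "A", "loan_type": "fixed"}] * 12
    + [{"tract": "A", "loan_type": "ARM"}] * 2
    + [{"tract": "B", "loan_type": "fixed"}] * 7
)

table = suppressed_counts(loans, ["tract", "loan_type"])
for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```

The two ARM loans in tract A fall below the threshold, so that cell is withheld while the larger cells are published. This protects the small group at the cost of losing signal for researchers.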
Why DeID is Harder over Time
- Two tech trends
  - Search vastly improved: Google incorporated in 1998
  - Increase in (almost) unique publicly available facts
- Mortgages
  - Street View pictures of each house
  - Public records and likely market values & date of sale of each house
  - Social networks, blogs, marketing information available for purchase:
    - “We got our new house today, and Bank X did a great/lousy job”
- How hard for forensic, automated efforts to reID?
  - Sweeney “K-anonymity”; can shrink a “deID mortgage” to one or a few properties

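The k-anonymity idea can be made concrete with a short sketch: a data set is k-anonymous over a set of quasi-identifiers if every combination of their values is shared by at least k records. The field names (`zip`, `sale_month`, `loan_amount_k`), the helper names, and the four toy records below are hypothetical, chosen only to show how a "deID mortgage" can collapse to a single property.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values on all quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def generalize(record):
    """Coarsen a record (assumed scheme): 3-digit ZIP, sale year only,
    loan amount dropped."""
    return {"zip3": record["zip"][:3], "sale_year": record["sale_month"][:4]}

# Hypothetical records with names and addresses already stripped.
loans = [
    {"zip": "43201", "sale_month": "2011-03", "loan_amount_k": 180},
    {"zip": "43201", "sale_month": "2011-03", "loan_amount_k": 180},
    {"zip": "43202", "sale_month": "2011-03", "loan_amount_k": 250},
    {"zip": "43202", "sale_month": "2011-04", "loan_amount_k": 250},
]

raw_k = k_anonymity(loans, ["zip", "sale_month", "loan_amount_k"])
coarse = [generalize(r) for r in loans]
coarse_k = k_anonymity(coarse, ["zip3", "sale_year"])
print(raw_k, coarse_k)
```

With the detailed fields, two of the records are unique (k = 1), so anyone who can match ZIP code, sale month, and loan amount against public sale records could single out the house. Coarsening the fields raises k to 4 here, but only by discarding detail that researchers may need.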
Technical Measures
- Technical measures to DeID may:
  - Be subject to ReID (previous slide);
  - Introduce noise to data; or
  - Both
- Add noise (or subtract signal)
  - Census approach
    - Public data set, suppress small cell size, lots of noise; or
    - Researchers can run regressions using somewhat better data
  - Cynthia Dwork’s “differential privacy” (Microsoft Research)
    - Limits queries into database based on tolerance for ReID
  - Agrawal and other IBM research
    - “Hippocratic Database” adds noise with goal of allowing analysis but minimizing risk of linkage

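One common way to realize the differential privacy idea is the Laplace mechanism combined with a privacy budget: each count query gets noise scaled to 1/epsilon, and queries are refused once the total budget is spent. The sketch below is a simplified illustration under those assumptions, not a description of any production system; the class name `BudgetedCounter`, the budget values, and the toy records are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

class BudgetedCounter:
    """Answers noisy count queries until an assumed privacy budget runs out."""

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)
        self._remaining = total_epsilon
        self._rng = random.Random(seed)

    def count(self, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise PermissionError("privacy budget exhausted")
        self._remaining -= epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        # A count changes by at most 1 when one record is added or removed,
        # so its sensitivity is 1 and the noise scale is 1 / epsilon.
        return true_count + laplace_noise(1.0 / epsilon, self._rng)

# Hypothetical toy data: 40 loans, 10 of them delinquent.
loans = [{"delinquent": i < 10} for i in range(40)]
db = BudgetedCounter(loans, total_epsilon=1.0, seed=0)

first = db.count(lambda r: r["delinquent"], epsilon=0.5)
second = db.count(lambda r: r["delinquent"], epsilon=0.5)
try:
    db.count(lambda r: r["delinquent"], epsilon=0.5)
    refused = False
except PermissionError:
    refused = True
print(first, second, refused)
```

A smaller epsilon means more noise per answer and stronger protection. The budget is what limits the number of queries, which illustrates the tradeoff between research utility and tolerance for ReID.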
Administrative Measures
- HIPAA data use agreements
  - Agreements apply to a “limited data set”, with obvious identifiers (name, address) stripped out
- Data use agreement
  - Contractual guarantees to use data only for limited purposes, such as research
  - Promise to use appropriate safeguards on data
  - Promise not to reID the data
- 2009 CDT conference report on DeID and health data emphasized importance of administrative safeguards

Public Records & Privacy
- Court records have been the subject of intense study on tradeoffs of public records and privacy
  - Strong reasons for public access
  - Privacy: juvenile court, financial account info, etc.
- Annual Williamsburg conference, each November
- Many state task forces on subject

Conclusion
- Some records are or should be public
- Some records are or should be private
- Ability to ReID is large and growing
- Technical measures to mask exist but are limited in applicability
- Administrative measures often essential for researchers to get meaningful results
- Technology alone often cannot succeed, so the choice becomes make public, keep private, or create effective data use agreements