Computation and Society: The Case of Privacy and Fairness

Omer Reingold, Stanford CS, April 2017. Collaborators: Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Aaron Roth, Guy Rothblum, Salil Vadhan, Rich Zemel, …

CS and Other Disciplines
First: “Tell me what you do again?” (a.k.a. “I have that problem with my modem …”). Then: “We have tons of data, do you have any clever algorithm for us?” Now: the power of the computational lens. Various natural and social phenomena can be viewed as computations. (Recent example: “A Lizard With Scales That Behave Like a Computer Simulation,” NYT 4/12/17, reporting on a Nature article.) The age of collaboration!

Big Data + ML Revolution
Weapons of Math Destruction

Computation and Society
With the centrality of algorithms and data, more and more policy questions revolve around computation. Here: tradeoffs between privacy, fairness, and economic utility. Other examples: censorship vs. free speech on social platforms, filtering of news (the filter bubble), identifying fake news, net neutrality, national security vs. individual freedoms (the San Bernardino cell phone case), loss of jobs due to automation, fear of AI, … CS can inform public debate but also extend the range of solutions.

Sensitive Information
Digital footprint: browsing history, social network interactions, location, s, pictures, levels of physical activity, food consumption

Privacy vs. Secrecy: Private Analysis/Learning from a Corpus of Data
What can crypto do for us? Encryption, computation on encrypted data, secure function evaluation. This gives secrecy rather than privacy. Privacy: what is safe to compute and share? Crypto (secure function evaluation): how to compute it? Invaluable when the data curator is untrusted (or distributed). Lots of good research questions on the crypto side, for another talk …

Notions of Privacy - Anonymization
The outcome of a learning algorithm may leak sensitive data. “Traditionally” (with some legal protections): anonymization, de-identification, k-anonymity, … The President's Council of Advisors on Science and Technology report to the President on big data and privacy: “Anonymization is increasingly easily defeated by the very techniques that are being developed for many legitimate applications of big data. In general, as the size and diversity of available data grows, the likelihood of being able to re-identify individuals (that is, re-associate their records with their names) grows substantially. While anonymization may remain somewhat useful as an added safeguard in some situations, approaches that deem it, by itself, a sufficient safeguard need updating.” Industry reaction: nah.

Notions of Privacy - DP
The outcome of a learning algorithm may leak sensitive data. Recent (decade-old) notion: Differential Privacy (DP) [Dwork, McSherry, Nissim, Smith]. Incredible impact on various disciplines as well as industry (Google, Apple, startups, …). Lots of variants: distributional DP, pan-privacy, … Lots of good questions, for another talk … Differential privacy (loosely): your increased harm from being in the corpus is small. One motivation: it encourages opt-in.
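To make "your increased harm from being in the corpus is small" concrete, here is a minimal sketch of one standard DP technique, the Laplace mechanism applied to a counting query. The slide does not say which mechanism was presented. The function names and the `epsilon` parameter are assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records satisfying `predicate`.

    A counting query has sensitivity 1: adding or removing one person
    changes the true count by at most 1, so Laplace noise with scale
    1/epsilon is the usual calibration for epsilon-DP.
    """
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Smaller `epsilon` means more noise and stronger protection. The answer barely depends on whether any one person is in `records`, which is the opt-in motivation above.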

DP via Expectation of Privacy
A study on the connection between smoking and cancer compromises the privacy of smokers (even with DP). No single definition: we need to incorporate social choice. What is a reasonable expectation of privacy? Assume I only want to protect Alice: allow Alice to erase herself and a few others from the database. DP provides similar protection simultaneously to everyone. Any different “protection for an individual” implies a different variant of DP. A way to interface policy-makers and privacy experts.

Classification
Taxation, advertising, health care, financial aid, schooling, paper acceptance, banking

Privacy and Classifiers
Privacy-preserving classifiers (observable outcomes of classification): Alice sees a particular ad; Alice clicks on the ad. What information is leaked about Alice? A more challenging scenario, missing even a good definition.

Apply Classifier on a Coarse Noisy Version?
Influenced by our definition of fairness (later). If the coarse version doesn’t distinguish between possible Omers, then sensitive properties may be protected?

Good Definition?
Not as strong as crypto definitions, or even DP: information is leaked. Protection: blend me in with the (surrounding) crowd. If your surroundings are “normative,” this may imply meaningful protection (and substantiate users' currently unjustified sense of security). Lots of possible failings (as with k-anonymity). Only as strong as the similarity metric.

Fairness in Classification
Health care, advertising, financial aid, schooling, taxation, paper acceptance, banking

Concern: Discrimination
The population includes minorities: ethnic, religious, medical, geographic. Protected by law, policy, ethics. A catalog of evils: redlining, reverse tokenism, self-fulfilling prophecy, … Discrimination may be subtle!

Credit Application (WSJ 8/4/10)
A user visits capitalone.com. Capital One uses tracking information provided by the tracking network [x+1] to personalize offers. Concern: steering minorities into higher rates (illegal).

Suggested A CS Perspective
An individual-based notion of fairness: fairness through awareness. A versatile framework for obtaining and understanding fairness (including fair affirmative action). Fairness vs. privacy: privacy does not imply fairness, but its definitions and techniques are useful.

Fairness through Blindness
Ignore all irrelevant/protected attributes, e.g., Facebook “sex” & “interested in men/women.” Point of failure: redundant encodings. Machine learning: you don’t need to see the label to be able to predict it. E.g., redlining.
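The redundant-encoding failure can be illustrated with a small sketch: even when the protected attribute is hidden, a correlated proxy may recover it. The function name, the proxy values (`zip_A`, `zip_B`), and the toy rows are all made up for illustration and are not data from the talk.

```python
from collections import Counter, defaultdict

def proxy_recovery_rate(rows):
    """rows: list of (proxy_value, protected_value) pairs.

    Returns the fraction of rows whose protected value is recovered by
    guessing, for each proxy value, the majority protected value seen
    with that proxy. A rate near 1.0 means the proxy encodes the
    protected attribute almost perfectly.
    """
    by_proxy = defaultdict(Counter)
    for proxy, protected in rows:
        by_proxy[proxy][protected] += 1
    correct = sum(max(counts.values()) for counts in by_proxy.values())
    return correct / len(rows)

# Hypothetical toy data: neighborhood as a proxy for group membership.
toy_rows = ([("zip_A", "S")] * 9 + [("zip_A", "T")] * 1 +
            [("zip_B", "T")] * 9 + [("zip_B", "S")] * 1)
rate = proxy_recovery_rate(toy_rows)
```

On this toy data a classifier that never sees the group label can still guess it for 18 of the 20 rows, which is how redlining works.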

Group Fairness (Statistical Parity)
Equalize minority S with general population T at the level of outcomes: Pr[outcome o | S] = Pr[outcome o | T]. Insufficient as a notion of fairness: it has some merit, but can be abused. Example: advertise a burger joint to carnivores in T and vegans in S. Example: self-fulfilling prophecy. Example: multiculturalism …
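A minimal sketch of how the parity condition can be checked, and how the burger example satisfies it while still being useless to S. The function names and the toy populations are assumptions for illustration only.

```python
from collections import Counter

def outcome_distribution(outcomes):
    """Empirical distribution Pr[outcome o] over a list of outcomes."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return {o: c / n for o, c in counts.items()}

def parity_gap(outcomes_S, outcomes_T):
    """Largest difference |Pr[o | S] - Pr[o | T]| over all outcomes o.

    Statistical parity holds exactly when this gap is 0.
    """
    p = outcome_distribution(outcomes_S)
    q = outcome_distribution(outcomes_T)
    return max(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in set(p) | set(q))

# Hypothetical burger-ad data: each person is (eats_meat, shown_ad).
group_T = [(True, True), (True, True), (False, False), (False, False)]
group_S = [(True, False), (True, False), (False, True), (False, True)]

gap = parity_gap([shown for _, shown in group_S],
                 [shown for _, shown in group_T])
useful_ads_in_S = sum(1 for eats_meat, shown in group_S if eats_meat and shown)
```

Here half of each group sees the ad, so the gap is zero, yet no one in S who would want the ad receives it. Parity at the level of outcomes says nothing about which individuals get which outcome.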

Lesson: Fairness is task-specific
Fairness requires understanding of the classification task. Utility and fairness align! Cultural understanding of protected groups. Awareness! Secrecy ≠ fairness.

Our Approach: Individual Fairness
Treat similar individuals similarly. “Similar”: for the purpose of (fairness in) the classification task. “Similarly”: a similar distribution over outcomes.
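One way to make "treat similar individuals similarly" checkable is a Lipschitz-style condition: the distance between two individuals' outcome distributions should not exceed their task-specific distance. The sketch below assumes this formalization and uses total variation distance. The function names, the toy metric, and the toy classifiers are illustrative assumptions.

```python
from itertools import combinations

def total_variation(p, q):
    """Total variation distance between two outcome distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def fairness_violations(individuals, classifier, metric, tol=1e-12):
    """Return the pairs (x, y) whose outcome distributions differ by more
    than the similarity metric allows.

    `classifier(x)` returns a dict mapping outcomes to probabilities;
    `metric(x, y)` is the task-specific distance, assumed to lie in [0, 1].
    """
    return [(x, y) for x, y in combinations(individuals, 2)
            if total_variation(classifier(x), classifier(y)) > metric(x, y) + tol]

# Toy setup: individuals are scores in [0, 1], distance is the score gap.
people = [0.0, 0.1, 0.9]
distance = lambda x, y: abs(x - y)
smooth = lambda x: {"accept": x, "reject": 1.0 - x}
cutoff = lambda x: ({"accept": 1.0, "reject": 0.0} if x >= 0.05
                    else {"accept": 0.0, "reject": 1.0})
```

The randomized `smooth` classifier passes, while the hard `cutoff` treats the nearly identical individuals 0.0 and 0.1 completely differently and is flagged.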

Metric: Who Decides?
Assume a task-specific similarity metric: the extent to which two individuals are similar w.r.t. the classification task at hand. Privacy and fairness are context-specific and depend on society’s norms. How can we facilitate informed public discussion (taking into account algorithmic limitations and ML insights)? Can we learn a good metric? Can we avoid learning past biases? User control? Not obvious if possible. Users need to be informed …

I Was Rejected. Why?
NYC teachers. Simple explanations of complicated classifiers? Additional risk of gaming? Books in parents’ home. Adversarial errors in deep learning.

False discovery — Just Getting Worse
“Trouble at the Lab” – The Economist

Accuracy and Privacy Align
Showed how to facilitate adaptive investigations using differential privacy. Reusable holdout. There is a limit on how much we can squeeze data: for privacy, but also for the risk of overfitting.
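The reusable-holdout idea can be sketched roughly as follows: answer each query from the training set unless it disagrees noticeably with the holdout, and only then reveal a noised holdout value. This is a simplified sketch, not the published algorithm; it omits the query budget, and the class name and default parameter values are illustrative assumptions.

```python
import random

def laplace(scale, rng):
    # Difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

class ReusableHoldout:
    """Simplified threshold-based holdout access (no query budget)."""

    def __init__(self, train, holdout, threshold=0.04, noise_scale=0.01, seed=None):
        self.train, self.holdout = list(train), list(holdout)
        self.threshold, self.noise_scale = threshold, noise_scale
        self.rng = random.Random(seed)

    def query(self, phi):
        """Estimate the mean of phi (a function from a record to [0, 1])."""
        train_est = mean(phi(r) for r in self.train)
        holdout_est = mean(phi(r) for r in self.holdout)
        noisy_threshold = self.threshold + laplace(self.noise_scale, self.rng)
        if abs(train_est - holdout_est) > noisy_threshold:
            # Training estimate looks overfit: answer from the holdout, with noise.
            return holdout_est + laplace(self.noise_scale, self.rng)
        # Estimates agree: reveal nothing new about the holdout.
        return train_est
```

Because the holdout is only consulted through noisy comparisons, an analyst who adapts queries to earlier answers learns little about the specific holdout records. This is why the privacy tool also limits overfitting.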
