Computation and Society: The Case of Privacy and Fairness

Computation and Society: The Case of Privacy and Fairness
Omer Reingold, Stanford CS, April 2017
Collaborators: Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Aaron Roth, Guy Rothblum, Salil Vadhan, Rich Zemel, ...

CS and Other Disciplines
First: "Tell me what you do again?" (a.k.a. "I have that problem with my modem ...")
Then: "We have tons of data, do you have any clever algorithms for us?"
Now: the power of the computational lens. Various natural and social phenomena can be viewed as computations. (Recent example: "A Lizard With Scales That Behave Like a Computer Simulation," NYT 4/12/17, reporting on a Nature article.)
The age of collaboration!

Big Data + ML Revolution
"Weapons of Math Destruction" (Cathy O'Neil)

Computation and Society
With the centrality of algorithms and data, more and more policy questions revolve around computation.
Here: tradeoffs between privacy, fairness, and economic utility.
Other examples: censorship vs. free speech on social platforms, filtering of news (the filter bubble), identifying fake news, net neutrality, national security vs. individual freedoms (the San Bernardino cell phone case), loss of jobs due to automation, fear of AI, ...
CS can inform public debate but also extend the range of solutions.

Sensitive Information
Digital footprint: browsing history, social network interactions, location, emails, pictures, levels of physical activity, food consumption.

Privacy vs. Secrecy
Private analysis/learning from a corpus of data: what can crypto do for us?
Encryption, computation on encrypted data, secure function evaluation (SFE).
These give secrecy rather than privacy:
- Privacy: what is safe to compute and share?
- Crypto (SFE): how to compute it?
Invaluable when the data curator is untrusted (or distributed).
Lots of good research questions on the crypto side, for another talk ...

Notions of Privacy: Anonymization
The outcome of a learning algorithm may leak sensitive data.
"Traditionally" (with some legal protections): anonymization, deidentification, k-anonymity, ...
The President's Council of Advisors on Science and Technology, in its report to the president on big data and privacy:
"Anonymization is increasingly easily defeated by the very techniques that are being developed for many legitimate applications of big data. In general, as the size and diversity of available data grows, the likelihood of being able to re-identify individuals (that is, re-associate their records with their names) grows substantially. While anonymization may remain somewhat useful as an added safeguard in some situations, approaches that deem it, by itself, a sufficient safeguard need updating."
Industry reaction: nah.
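To make the re-identification risk concrete, here is a toy sketch (not from the slides; invented names and values, assuming pandas) of the classic linkage attack: joining "anonymized" records with a public dataset on quasi-identifiers.

```python
# Toy linkage attack: "deidentified" medical records are re-identified by
# joining them with a public voter roll on quasi-identifiers.
import pandas as pd

medical = pd.DataFrame({"zip": ["02139"], "dob": ["1945-07-31"],
                        "sex": ["F"], "diagnosis": ["hypertension"]})
voters = pd.DataFrame({"zip": ["02139"], "dob": ["1945-07-31"],
                       "sex": ["F"], "name": ["Jane Doe"]})

# The join re-associates the diagnosis with a name.
print(medical.merge(voters, on=["zip", "dob", "sex"]))
```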

Notions of Privacy: Differential Privacy (DP)
The outcome of a learning algorithm may leak sensitive data.
A recent (decade-old) notion: differential privacy [Dwork, McSherry, Nissim, Smith].
Incredible impact on various disciplines as well as industry (Google, Apple, startups, ...).
Lots of variants: distributional DP, pan-privacy, ... Lots of good questions, for another talk ...
Differential privacy (loosely): your increased harm from being in the corpus is small.
One motivation: encourages opt-in.
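A minimal sketch of how DP is achieved in the simplest case (not from the slides; this is the standard Laplace mechanism for a counting query, whose sensitivity to any one record is 1):

```python
import numpy as np

def dp_count(records, predicate, epsilon):
    """Release a counting query with epsilon-differential privacy.

    Adding or removing one record changes the true count by at most 1
    (sensitivity 1), so Laplace noise of scale 1/epsilon suffices to make
    any individual's presence hard to detect from the output.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Example: a noisy count of smokers; smaller epsilon = more privacy, more noise.
records = [{"smoker": True}, {"smoker": False}, {"smoker": True}]
print(dp_count(records, lambda r: r["smoker"], epsilon=0.5))
```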

DP via Expectation of Privacy
A study on the connection between smoking and cancer compromises the privacy of smokers (even with DP).
There is no single definition; we need to incorporate social choice. What is a reasonable expectation of privacy?
Assume I only want to protect Alice: allow Alice to erase herself and a few others from the database.
DP provides similar protection simultaneously to everyone, and any different "protection for an individual" implies a different variant of DP.
A way to interface between policy-makers and privacy experts.
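For reference, the standard facts behind this claim (not spelled out on the slide): the basic epsilon-DP guarantee for one individual extends automatically to "Alice and a few others," degrading linearly in the group size k.

```latex
% epsilon-DP for one individual:
\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S]
  \quad \text{for all } D, D' \text{ differing in one record.}
% Group privacy for Alice and k-1 others follows automatically:
\Pr[M(D) \in S] \le e^{k\varepsilon} \, \Pr[M(D'') \in S]
  \quad \text{for all } D, D'' \text{ differing in } k \text{ records.}
```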

Classification
Advertising, health care, financial aid, schooling, taxation, paper acceptance, banking, ...

Privacy and Classifiers
Privacy-preserving classifiers (observable outcomes of classification):
- Alice sees a particular ad.
- Alice clicks on the ad.
What information is leaked about Alice?
A more challenging scenario, missing even a good definition.

Apply the Classifier to a Coarse, Noisy Version?
Influenced by our definition of fairness (later).
If the coarse version doesn't distinguish among possible Omers, then sensitive properties may be protected??
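A rough illustration of the idea (my own sketch, not from the talk; the bucket width and noise scale are made-up parameters): the classifier never sees the exact feature vector, only a bucketized, noised one that many similar individuals share.

```python
import numpy as np

def coarsen(features, bucket_width=10.0, noise_scale=5.0):
    """Map a user's numeric feature vector to a coarse, noisy version
    before classification, so nearby users become indistinguishable."""
    features = np.asarray(features, dtype=float)
    noisy = features + np.random.laplace(0.0, noise_scale, size=features.shape)
    return np.round(noisy / bucket_width) * bucket_width

# Two similar users are likely to be mapped to the same coarse input.
print(coarsen([42.0, 118.0]), coarsen([43.5, 121.0]))
```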

A Good Definition?
Not as strong as crypto definitions, or even DP: information is leaked.
Protection: blend me in with the (surrounding) crowd.
If your surroundings are "normative," this may imply meaningful protection (and substantiate users' currently unjustified sense of security).
Lots of possible failings (as with k-anonymity). Only as strong as the similarity metric.

Fairness in Classification
Advertising, health care, financial aid, schooling, taxation, paper acceptance, banking, ...

Concern: Discrimination
Populations include minorities: ethnic, religious, medical, geographic.
Protected by law, policy, and ethics.
A catalog of evils: redlining, reverse tokenism, self-fulfilling prophecies, ...
Discrimination may be subtle!

Credit Application (WSJ 8/4/10)
A user visits capitalone.com. Capital One uses tracking information provided by the tracking network [x+1] to personalize offers.
Concern: steering minorities into higher rates (illegal).

A Suggested CS Perspective
An individual-based notion of fairness: fairness through awareness.
A versatile framework for obtaining and understanding fairness (including fair affirmative action).
Fairness vs. privacy: privacy does not imply fairness, but its definitions and techniques are useful.

Fairness through Blindness
Ignore all irrelevant/protected attributes, e.g., Facebook's "sex" and "interested in men/women."
Point of failure: redundant encodings.
Machine learning: you don't need to see the label to be able to predict it. E.g., redlining.
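A toy demonstration of a redundant encoding (my own sketch with synthetic data, assuming scikit-learn): drop the protected attribute, and a correlated proxy such as ZIP code still predicts it almost perfectly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
zip_code = rng.integers(0, 10, size=2000)  # proxy feature kept in the data
# Protected attribute strongly (but not perfectly) correlated with ZIP code:
protected = ((zip_code < 4) ^ (rng.random(2000) < 0.05)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    zip_code.reshape(-1, 1).astype(float), protected, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("protected attribute recovered with accuracy:", clf.score(X_te, y_te))
```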

Group Fairness (Statistical Parity)
Equalize the minority S with the general population T at the level of outcomes:
Pr[outcome o | S] = Pr[outcome o | T]
Insufficient as a notion of fairness: it has some merit, but can be abused.
- Example: advertise a burger joint to carnivores in T and to vegans in S.
- Example: self-fulfilling prophecy.
- Example: multiculturalism ...
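Statistical parity is at least easy to measure; a minimal sketch (a hypothetical helper of mine, assuming binary outcomes and group membership):

```python
import numpy as np

def parity_gap(outcomes, in_group_s):
    """Difference in positive-outcome rates between the protected group S
    and the rest of the population T; zero means statistical parity."""
    outcomes = np.asarray(outcomes)
    in_group_s = np.asarray(in_group_s, dtype=bool)
    return outcomes[in_group_s].mean() - outcomes[~in_group_s].mean()

# Example: S receives positive outcomes 50 points less often (gap = -0.5).
print(parity_gap([1, 1, 0, 1, 0, 0, 1, 0], [0, 0, 0, 0, 1, 1, 1, 1]))
```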

Lesson: Fairness Is Task-Specific
Fairness requires understanding of the classification task; utility and fairness align!
It also requires cultural understanding of protected groups. Awareness!
Secrecy ≠ fairness.

Our Approach: Individual Fairness
Treat similar individuals similarly:
- Similar for the purpose of (fairness in) the classification task.
- Similar distributions over outcomes.
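Formally (the Lipschitz condition from "Fairness through Awareness"; here d is the task-specific similarity metric on individuals and D a distance between distributions over outcomes):

```latex
D\bigl(M(x), M(y)\bigr) \le d(x, y) \quad \text{for all individuals } x, y.
```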

The Metric: Who Decides?
Assume a task-specific similarity metric: the extent to which two individuals are similar w.r.t. the classification task at hand.
Privacy and fairness are context-specific and depend on society's norms.
How can we facilitate informed public discussion (taking into account algorithmic limitations and ML insights)?
Can we learn a good metric? Can we avoid learning past biases?
User control? Not obviously possible; users need to be informed ...

"I Was Rejected. Why?"
Examples: NYC teacher evaluations; books in parents' home.
Simple explanations of complicated classifiers? An additional risk of gaming?
Adversarial errors in deep learning.

False Discovery: Just Getting Worse
"Trouble at the Lab" (The Economist)

Accuracy and Privacy Align
Showed how to facilitate adaptive investigations using differential privacy: the reusable holdout.
There is a limit on how much we can squeeze out of data, for privacy but also because of the risk of overfitting.
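A simplified sketch of the mechanism behind the reusable holdout (Thresholdout, Dwork et al. 2015; the threshold and noise scale here are illustrative, and the full algorithm also tracks a budget of overfitting queries):

```python
import numpy as np

def thresholdout(train_vals, holdout_vals, threshold=0.04, sigma=0.01):
    """Answer one statistical query (mean of per-record values in [0, 1]).

    If the training estimate agrees with the holdout estimate up to a noisy
    threshold, reuse the training answer; the holdout is barely touched and
    stays valid across many adaptive queries. Otherwise return a noisy
    holdout estimate, paying for the disagreement.
    """
    train_mean = np.mean(train_vals)
    holdout_mean = np.mean(holdout_vals)
    if abs(train_mean - holdout_mean) > threshold + np.random.normal(0, sigma):
        return holdout_mean + np.random.normal(0, sigma)
    return train_mean
```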