Tradeoffs Between Fairness and Accuracy in Machine Learning


1 Tradeoffs Between Fairness and Accuracy in Machine Learning
Aaron Roth

2-3 [Title and figure slides; no transcript text.]

4 The Basic Setup
Given: data consisting of $d$ features and a label.
Goal: find a rule to predict the label from the features.
[Table: example records with features $x$ = (College?, Bankruptcy?, Tall?, Employed?, Homeowner?) and label $y$ = Paid back loan?, each taking values Y/N.]
Example rule: (College and Employed and not Bankruptcy) or (Tall and Employed and not College).

5 The Basic Setup
Given data, select a hypothesis $h : \{Y,N\}^d \to \{Y,N\}$.
The goal is not prediction on the training data, but prediction on new examples.

6 The Basic Setup
[Diagram: training data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ drawn from a distribution $P$ is fed to the algorithm, which outputs a hypothesis $h$; a new example $(x, y) \sim P$ is then classified as $h(x)$.]
The example is misclassified if $h(x) \neq y$.

7 The Basic Setup
Goal: find a classifier minimizing the true error
$err(h) = \Pr_{(x,y) \sim P}[h(x) \neq y]$
We don't know $P$, but we can minimize the empirical error:
$err(h, D) = \frac{1}{n}\,\big|\{(x,y) \in D : h(x) \neq y\}\big|$

8 The Basic Setup
Empirical Error Minimization: try a lookup table!
$h(x) = Y$ if $x \in \{YNYYN,\ YNNYN,\ YNYYN,\ NYYYY\}$; $h(x) = N$ otherwise.
Then $err(h, D) = 0$. But this would over-fit: we haven't learned, just memorized. Learning must summarize.

9 The Basic Setup
Instead, limit hypotheses to come from a "simple" class $C$, e.g. linear functions or small decision trees.
Compute $h^* = \arg\min_{h \in C} err(h, D)$. Then
$err(h^*) \le \underbrace{err(h^*, D)}_{\text{training error}} + \underbrace{\max_{h \in C}\big(err(h) - err(h, D)\big)}_{\text{generalization error}}$
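To make empirical error minimization over a restricted class concrete, here is a minimal sketch that enumerates decision stumps (threshold rules on a single feature) as the "simple" class; the synthetic data, the grid of thresholds, and the choice of stumps are assumptions of the sketch, not from the talk:

```python
# Minimal ERM sketch over a simple class C: decision stumps on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # 200 examples, d = 5 features
y = (X[:, 0] > 0.3).astype(int)      # true label depends only on feature 0

def stump(j, t, s):
    """Predict 1 if s * (x_j - t) > 0, else 0."""
    return lambda x: int(s * (x[j] - t) > 0)

def empirical_error(h, X, y):
    return np.mean([h(x) != yi for x, yi in zip(X, y)])

# Enumerate a finite grid of stumps and pick the empirical error minimizer.
candidates = [stump(j, t, s)
              for j in range(X.shape[1])
              for t in np.linspace(-2, 2, 21)
              for s in (+1, -1)]
h_star = min(candidates, key=lambda h: empirical_error(h, X, y))
print("training error:", empirical_error(h_star, X, y))
```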

10 Why might machine learning be "unfair"?
Many reasons:
- Data might encode existing biases. E.g. the labels are not "Committed a crime?" but "Was arrested."
- Data collection feedback loops. E.g. we only observe "Paid back loan?" if the loan was granted.
- Different populations with different properties. E.g. "SAT score" might correlate with the label differently in populations that employ SAT tutors.
- Less data (by definition) about minority populations.

11 Why might machine learning be "unfair"?
(Same list as slide 10; the next slides focus on the third reason: different populations with different properties.)

12 [Scatter plot: Number of Credit Cards vs. SAT Score, Population 1 and Population 2.]

13 [Scatter plot: Number of Credit Cards vs. SAT Score, Population 1 and Population 2 (continued).]

14 Upshot: To be "fair", the algorithm may need to explicitly take group membership into account. Otherwise, optimizing accuracy fits the majority population.
[Scatter plot: Number of Credit Cards vs. SAT Score, Population 1 and Population 2.]

15 Why might machine learning be "unfair"?
(Same list as slide 10; the next slides focus on the fourth reason: less data about minority populations.)

16 [Scatter plot: Number of Credit Cards vs. SAT Score, Population 1 and Population 2.]

17 Upshot: Algorithms trained on minority populations will be less accurate; qualified individuals will be denied at a higher rate.
[Scatter plot: Number of Credit Cards vs. SAT Score, Population 1 and Population 2.]

18 Why might machine learning be "unfair"?
(Same list as slide 10; the next slides focus on the second reason: data collection feedback loops.)

19 Toy Model
Two kinds of loan applicants:
- Type 1 pays back a loan with unknown probability $p_1$.
- Type 2 pays back a loan with unknown probability $p_2$.
Initially, the bank believes $p_1, p_2$ are uniform in $[0,1]$.
Every day, the bank makes a loan to the type most likely to pay it back according to its posterior distribution on $p_1, p_2$.
The bank observes whether the loan is repaid (but not the counterfactual), then updates its posterior.

20-28 Toy Model (animation)
Suppose the bank has made $n_i$ loans to population $i$; $s_i$ have been paid back and $d_i$ have defaulted ($n_i = s_i + d_i$). The posterior expected payoff of the next loan to population $i$ is $\frac{s_i + 1}{s_i + d_i + 2}$. In truth, $p_1 = p_2 = \frac{1}{2}$.
In the animation, one population (call it Population 1) receives the first loan, the borrower defaults, and its estimate drops to $\frac{0+1}{1+2} \approx 0.33$; it stays at $(n_1, s_1, d_1) = (1, 0, 1)$ from then on, while every subsequent loan goes to Population 2:
Slide 20: both populations at $(0, 0, 0)$, estimates $0.5$ and $0.5$.
Slide 21: Population 1 at $(1, 0, 1)$, estimate $0.33$; Population 2 at $(0, 0, 0)$, estimate $0.5$.
Slide 22: Population 2 at $(1, 1, 0)$, estimate $0.67$.
Slide 23: Population 2 at $(2, 2, 0)$, estimate $0.75$.
Slide 24: Population 2 at $(3, 2, 1)$, estimate $0.60$.
Slide 25: Population 2 at $(4, 2, 2)$, estimate $0.50$.
Slide 26: Population 2 at $(5, 3, 2)$, estimate $0.57$.
Slide 27: Population 2 at $(6, 4, 2)$, estimate $0.625$.
Slide 28: Population 2 at $(7, 4, 3)$, estimate $0.55$.

29 Toy Model
Upshot: Algorithms making myopically optimal decisions may forever discriminate against a qualified population because of an unlucky prefix. Less myopic algorithms balance exploration and exploitation, but that has its own unfairness issues.
After $x$ more rounds, Population 1 is still stuck at $(n_1, s_1, d_1) = (1, 0, 1)$ with estimate $0.33$, while Population 2 has $n_2 = x$, $s_2 \approx x/2$, $d_2 \approx x/2$, so its estimate stays near $0.5 > 0.33$ and it keeps receiving every loan. (A minimal simulation sketch of this greedy policy follows.)
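A minimal simulation sketch of the greedy bank above, assuming Beta(1,1) priors so that the posterior mean after $s$ repayments and $d$ defaults is $(s+1)/(s+d+2)$, as on the slides; the round count, seed, and tie-breaking rule are arbitrary choices of the sketch:

```python
# Greedy (myopic) bank from the toy model, with two equally qualified populations.
import random

def greedy_bank(p=(0.5, 0.5), rounds=1000, seed=1):
    random.seed(seed)
    s = [0, 0]      # repaid loans per population
    d = [0, 0]      # defaults per population
    loans = [0, 0]  # total loans granted per population
    for _ in range(rounds):
        # Lend to the population with the higher posterior mean (ties go to population 0).
        means = [(s[i] + 1) / (s[i] + d[i] + 2) for i in range(2)]
        i = max(range(2), key=lambda j: means[j])
        loans[i] += 1
        if random.random() < p[i]:
            s[i] += 1
        else:
            d[i] += 1
    return loans, s, d

# Both populations are equally qualified (p_1 = p_2 = 0.5), yet one unlucky early
# default can freeze a population out forever under the greedy rule.
print(greedy_bank())
```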

30 So what can we do? A solution in a stylized model...
Joint work with Matthew Joseph, Michael Kearns, Jamie Morgenstern, and Seth Neel.

31-33 A Richer Model [figure slides; no transcript text]

34 The Contextual Bandit Problem
$k$ populations $i \in \{1, \ldots, k\}$, each with an unknown function $f_i : \mathbb{R}^d \to [0,1]$, e.g. $f_i(x) = \langle \beta_i, x \rangle$.
In rounds $t \in \{1, \ldots, T\}$:
- A member of each population $i$ arrives with an application $x_i^t \in \mathbb{R}^d$.
- The bank chooses an individual $i_t$ and obtains a reward $r_{i_t}^t$ such that $\mathbb{E}[r_{i_t}^t] = f_{i_t}(x_{i_t}^t)$.
- The bank does not observe rewards for individuals it did not choose.

35 The Contextual Bandit Problem
Measure the performance of the algorithm via regret:
$\text{Regret} = \frac{1}{T}\left(\sum_t \max_i f_i(x_i^t) - \sum_t f_{i_t}(x_{i_t}^t)\right)$
"UCB"-style algorithms can get regret $O\!\left(\sqrt{d \cdot k / T}\right)$.
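To make the protocol and the regret definition concrete, here is a minimal sketch of the interaction loop, assuming linear reward functions $f_i(x) = \langle \beta_i, x \rangle$, uniform contexts, and Gaussian noise; the `run`, `RandomPolicy`, and `betas` names are illustrative, and any policy exposing `select`/`update` can be plugged in:

```python
# Interaction protocol and time-averaged regret, as defined on the slide.
import numpy as np

def run(policy, betas, T=2000, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    B = np.asarray(betas)                    # shape (k, d): one beta_i per population
    k, d = B.shape
    total = 0.0
    for t in range(T):
        X = rng.uniform(0, 1, size=(k, d))   # one applicant per population this round
        means = np.einsum('ij,ij->i', B, X)  # f_i(x_i^t) for each i
        i = policy.select(X)                 # policy sees only the contexts
        reward = means[i] + noise * rng.normal()
        policy.update(i, X[i], reward)
        total += means.max() - means[i]
    return total / T                         # average regret, as on the slide

class RandomPolicy:
    """Context-ignoring baseline; any object with select/update plugs in."""
    def __init__(self, k, seed=0):
        self.k, self.rng = k, np.random.default_rng(seed)
    def select(self, X):
        return int(self.rng.integers(self.k))
    def update(self, i, x, r):
        pass

betas = [[1.0, 0.0], [0.5, 0.5]]             # two illustrative populations, d = 2
print(run(RandomPolicy(k=2), betas))
```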

36 A Fairness Definition
"Fair Equality of Opportunity": "... assuming there is a distribution of natural assets, those who are at the same level of talent and ability, and have the same willingness to use them, should have the same prospects of success regardless of their initial place in the social system." – John Rawls

37 A Fairness Definition
"Fair Equality of Opportunity": The algorithm should never preferentially favor (i.e., choose with higher probability) a less qualified individual over a more qualified individual.

38 A Fairness Definition
"Fair Equality of Opportunity": for every round $t$,
$\Pr[\text{Alg chooses } i] > \Pr[\text{Alg chooses } j]$ only if $f_i(x_i^t) > f_j(x_j^t)$.

39 Is Fairness in Conflict with Profit?
Note: the optimal policy is fair! Every day it selects $\arg\max_i f_i(x_i^t)$, and the constraint only prohibits favoring worse applicants. So maybe the learner's goals are already aligned with this constraint?
Main takeaway: Wrong. Fairness is still an obstruction to learning the optimal policy. Even for $d = 0$, a fair algorithm requires $\Theta(k^3)$ rounds to obtain non-trivial regret.

40 FairUCB
[Figure: each applicant's estimated reward $\langle \hat\beta_i^t, x_i^t \rangle = \hat\mu_i^t$ is shown with a confidence interval on the reward axis. The "standard" UCB choice [Auer et al. '02] picks the highest upper confidence bound. FairUCB's rule: if two intervals overlap, we don't know which applicant is better, so play them with equal probability.]

41 "Standard Approach" Biases Towards Uncertainty
Example with 2 features: $x_1$ is "level of education" and $x_2$ is "number of tophats owned".
Group 1: $\beta = (1, 0)$. 90% of applications are random with $x_1 = x_2$ (perfectly correlated); 10% are random with $x_1, x_2$ independent.
Group 2: $\beta = (0.5, 0.5)$. All applications are uniform.
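A sketch of this two-group example as a data generator only, assuming features are scaled to $[0,1]$ (an assumption of the sketch, not stated on the slide); the function names are illustrative:

```python
# Data generator for the slide's two-group example.
import numpy as np

rng = np.random.default_rng(0)

def group1_application():
    # beta = (1, 0): the true reward depends only on education (x1).
    if rng.random() < 0.9:
        v = rng.random()
        x = np.array([v, v])      # 90%: x1 = x2, perfectly correlated
    else:
        x = rng.random(2)         # 10%: x1, x2 independent
    return x, x @ np.array([1.0, 0.0])

def group2_application():
    # beta = (0.5, 0.5): both features matter equally; uniform applications.
    x = rng.random(2)
    return x, x @ np.array([0.5, 0.5])

print(group1_application(), group2_application())
```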

42 "Standard Approach" Biases Towards Uncertainty
"Unfair rounds": rounds $t$ in which the algorithm selects a sub-optimal applicant.
Discrimination index for an individual: the probability of being victimized, conditioned on being present at an unfair round.

43 Chaining [Figure: overlapping confidence intervals linked into a chain.]

44 FairUCB Pseudo-Code
For $t = 1$ to $\infty$:
- Using OLS estimators $\hat\beta_i$, construct reward estimates $\hat\mu_i^t = \langle \hat\beta_i, x_i^t \rangle$ and widths $w_i$ such that, with high probability, $\mu_i^t \in [\hat\mu_i^t - w_i, \hat\mu_i^t + w_i]$.
- Compute $i^* = \arg\max_i (\hat\mu_i^t + w_i)$.
- Select an individual $i_t$ uniformly at random from the set of individuals chained to $i^*$.
- Observe reward $r_{i_t}^t$ and update $\hat\beta_{i_t}$.
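A minimal sketch of this selection rule, assuming ridge-regularized least-squares estimates and a heuristic confidence width; this is a sketch under those assumptions, not the paper's exact construction. It exposes the same `select`/`update` interface as the harness sketched after slide 35, so the two can be combined, e.g. `run(FairUCBSketch(k=2, d=2), betas)`.

```python
# FairUCB-style selection with "chaining" of overlapping confidence intervals.
import numpy as np

class FairUCBSketch:
    def __init__(self, k, d, lam=1.0, alpha=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.A = [lam * np.eye(d) for _ in range(k)]   # per-population Gram matrices
        self.b = [np.zeros(d) for _ in range(k)]
        self.k, self.alpha = k, alpha

    def select(self, X):
        """X[i] is population i's applicant this round; return the chosen index."""
        mu, w = np.zeros(self.k), np.zeros(self.k)
        for i in range(self.k):
            A_inv = np.linalg.inv(self.A[i])
            beta_hat = A_inv @ self.b[i]
            mu[i] = beta_hat @ X[i]
            w[i] = self.alpha * np.sqrt(X[i] @ A_inv @ X[i])   # heuristic width
        lo, hi = mu - w, mu + w
        # Start the chain at the highest upper confidence bound, then keep adding
        # any interval that overlaps the chain until it stops growing.
        chain = {int(np.argmax(hi))}
        changed = True
        while changed:
            changed = False
            for j in range(self.k):
                if j not in chain and any(lo[j] <= hi[c] and lo[c] <= hi[j] for c in chain):
                    chain.add(j)
                    changed = True
        return int(self.rng.choice(sorted(chain)))   # uniform over the chained set

    def update(self, i, x, r):
        self.A[i] += np.outer(x, x)
        self.b[i] += r * x
```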

45 FairUCB's Guarantees
Regret is $O\!\left(\sqrt{d\,k^3 / T}\right)$: worse than without fairness, but only by a factor of $O\!\left(k \cdot \sqrt{d}\right)$.
So when the $f_i$ are linear, fairness has a cost, but only a mild one. What about when the $f_i$ are non-linear?
Being able to learn quickly and fairly $\Leftrightarrow$ being able to quickly find tight confidence intervals.
The "cost of fairness" can sometimes be exponential in $d$.

46 Discrimination Index [Plot: discrimination index for Standard UCB vs. Fair UCB.]

47 Performance

48 Discussion
- Even without a taste for discrimination "baked in", natural (optimal) learning procedures can disproportionately benefit certain populations.
- Imposing "fairness" constraints is not without cost. We must quantify the tradeoffs between fairness and other desiderata.
- There is no Pareto-dominant solution: how should we choose between different points on the tradeoff curve?
- None of these points is unique to "algorithmic" decision making; algorithms just allow formal analysis to bring these universal issues to the fore.

49 Fairness Definition

50 Where does the cost of fairness come from?
Distribution over instances: $k$ Bernoulli arms, where the only uncertainty is their means. Arm $i$ has mean $\mu_i$, which is either of two candidate values ($b_i$ or $b_i'$ in the figures that follow).

51-55 Lower bound particulars (animation)
[Figure: the candidate means $b_k', b_k, b_{k-1}', b_{k-1}, \ldots, b_1', b_1$ plotted along the axis "Mean of Distribution". For each pair of arms whose candidate means are adjacent, a fair algorithm must play the two arms with equal probability until it is confident their means differ, working through the pairs one at a time.]

56 So, there is a separation between fair and unfair learning.
Without fairness one can achieve non-trivial regret after far fewer rounds, but this lower bound implies that any fair algorithm needs $\Theta(k^3)$ rounds to obtain non-trivial regret (as on slide 39).

