Fair Clustering through Fairlets (NIPS 2017)


Fair Clustering through Fairlets (NIPS 2017)
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii

Objective
A fair clustering algorithm under the disparate impact doctrine, where each protected class must have approximately equal representation in every cluster.
A formulation of fair clustering under the k-center and k-median objectives.

Clustering and Fairness
Given a set X of points lying in some metric space, the goal is to find a partition of X into k clusters that optimizes a particular objective function.
Each point has unprotected attributes (its coordinates) and a protected attribute (its color).
Disparate impact then translates to color balance in each cluster.

The two objectives
k-center: Given a set of data points X with distances $d(x_i, x_j) \in \mathbb{N}$ satisfying the triangle inequality, find a subset $C \subseteq X$ with $|C| = k$ such that the maximum distance from a point in X to its closest point in C is minimized:
$\varphi(X, C) = \max_{x \in X} \min_{c \in C} d(x, c)$
k-median: Given a set of data points X, choose the k centers $c_i$ so as to minimize the sum of the distances from each x to its nearest $c_i$:
$\psi(X, C) = \sum_{x \in X} \min_{c \in C} d(x, c)$
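As a minimal sketch of the two objectives above, the functions below evaluate the k-center and k-median costs for a given set of centers. The function names, the use of 2D tuples as points, and the default Euclidean distance are assumptions made for illustration; any metric could be passed as `dist`.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]  # assumption: points are 2D tuples


def k_center_cost(points: Sequence[Point], centers: Sequence[Point],
                  dist: Callable[[Point, Point], float] = math.dist) -> float:
    """Largest distance from any point to its closest center."""
    return max(min(dist(x, c) for c in centers) for x in points)


def k_median_cost(points: Sequence[Point], centers: Sequence[Point],
                  dist: Callable[[Point, Point], float] = math.dist) -> float:
    """Sum over all points of the distance to the closest center."""
    return sum(min(dist(x, c) for c in centers) for x in points)
```

With points on a line at 0, 1, 5 and 7 and centers at 0 and 5, the distances to the nearest center are 0, 1, 0 and 2, so the k-center cost is 2 and the k-median cost is 3.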

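Balance
For $Y \subseteq X$,
$\mathrm{balance}(Y) = \min\left( \frac{\#\mathrm{RED}(Y)}{\#\mathrm{BLUE}(Y)}, \frac{\#\mathrm{BLUE}(Y)}{\#\mathrm{RED}(Y)} \right) \in [0, 1]$
and for a clustering C,
$\mathrm{balance}(C) = \min_{c \in C} \mathrm{balance}(c)$
A subset with an equal number of red and blue points has balance 1, while a monochromatic subset has balance 0.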
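The balance definition can be sketched in a few lines. Here a subset is represented only by the list of its points' colors, given as the strings "red" and "blue". That representation and the function names are assumptions for illustration, and treating an empty subset as having balance 0 is a convention chosen here, not something the slides specify.

```python
from collections import Counter
from typing import Iterable, Sequence


def balance(colors: Iterable[str]) -> float:
    """Balance of one subset, given the color of each of its points."""
    counts = Counter(colors)
    red, blue = counts["red"], counts["blue"]
    if red == 0 or blue == 0:
        return 0.0  # monochromatic (or, by our convention, empty) subset
    return min(red / blue, blue / red)


def clustering_balance(clusters: Sequence[Sequence[str]]) -> float:
    """Balance of a clustering: the minimum balance over its clusters."""
    return min(balance(c) for c in clusters)
```

For instance, a cluster with two red points and one blue point has balance 1/2, and a clustering containing that cluster together with a perfectly balanced one also has balance 1/2.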

LEMMA
Lemma A: Let $Y, Y' \subseteq X$ be disjoint. If $C$ is a clustering of $Y$ and $C'$ is a clustering of $Y'$, then $\mathrm{balance}(C \cup C') = \min(\mathrm{balance}(C), \mathrm{balance}(C'))$.
Lemma B: Let $\mathrm{balance}(X) = b/r$ for some integers $1 \le b \le r$ such that $\gcd(b, r) = 1$. Then there exists a clustering $\mathcal{Y} = \{Y_1, \ldots, Y_m\}$ of $X$ such that
$|Y_j| \le b + r$ for each $Y_j \in \mathcal{Y}$, i.e., each cluster is small, and
$\mathrm{balance}(\mathcal{Y}) = b/r = \mathrm{balance}(X)$.
$\mathcal{Y}$ is called a $(b, r)$-fairlet decomposition of $X$, and each $Y \in \mathcal{Y}$ a fairlet.
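To make Lemma B concrete, here is a small sketch that builds some (b, r)-fairlet decomposition while ignoring distances entirely. It only illustrates that such a decomposition exists and is not the cost-aware construction. The function name and the representation of points as two lists, one per color, are assumptions for illustration.

```python
from fractions import Fraction
from typing import List, Sequence


def arbitrary_fairlet_decomposition(red: Sequence, blue: Sequence) -> List[list]:
    """Split the points into groups of b minority and r majority points,
    where b/r is balance(X) in lowest terms. Distances are ignored."""
    minority, majority = sorted((list(red), list(blue)), key=len)
    if not minority:
        raise ValueError("balance is 0: no fairlet decomposition exists")
    ratio = Fraction(len(minority), len(majority))  # reduced automatically
    b, r = ratio.numerator, ratio.denominator
    m = len(minority) // b  # number of fairlets; also equals len(majority) // r
    return [minority[i * b:(i + 1) * b] + majority[i * r:(i + 1) * r]
            for i in range(m)]
```

With 4 red and 6 blue points the balance is 2/3, so the sketch returns two fairlets, each containing 2 red and 3 blue points.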

$(t, k)$-fair clustering
In the $(t, k)$-fair center (resp. $(t, k)$-fair median) problem, the goal is to partition $X$ into $C$ such that $|C| = k$, $\mathrm{balance}(C) \ge t$, and $\varphi(X, C)$ (resp. $\psi(X, C)$) is minimized.

Fair k-center: (1, 1)-fairlets
Create a graph $G = (B \cup R, E)$, where $B$ and $R$ are the blue and red points and $E = \{(b_i, r_j)\}$ with edge weights $w_{ij} = d(b_i, r_j)$.
A decomposition into fairlets corresponds to some perfect matching in the graph.
$\varphi(X, \mathcal{Y})$ is exactly the cost of the maximum-weight edge in the matching.
Define $G_\tau$ as the threshold graph that has the same nodes as $G$ but only those edges whose weight is at most $\tau$.
We can then look for the minimum $\tau$ for which $G_\tau$ has a perfect matching.
Finally, for each fairlet $Y_i$, we can arbitrarily set one of the two nodes as the center.
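The threshold-graph search described above can be sketched as follows, assuming equally many blue and red points. The perfect-matching test uses a simple augmenting-path routine (Kuhn's algorithm) and the smallest feasible threshold is found by binary search over the distinct edge weights. Both are choices made for this sketch, and all function names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # assumption: points are 2D tuples


def perfect_matching(n: int, allowed: List[List[bool]]) -> Optional[List[int]]:
    """Return match_of_red (red index -> blue index) or None if no perfect matching."""
    match_of_red = [-1] * n

    def try_augment(i: int, seen: List[bool]) -> bool:
        for j in range(n):
            if allowed[i][j] and not seen[j]:
                seen[j] = True
                if match_of_red[j] == -1 or try_augment(match_of_red[j], seen):
                    match_of_red[j] = i
                    return True
        return False

    for i in range(n):
        if not try_augment(i, [False] * n):
            return None
    return match_of_red


def one_one_fairlets(blue: Sequence[Point], red: Sequence[Point],
                     dist: Callable[[Point, Point], float] = math.dist):
    """Smallest tau whose threshold graph has a perfect matching, plus the
    matching as (blue index, red index) pairs; each pair is one fairlet."""
    n = len(blue)
    if n == 0 or n != len(red):
        raise ValueError("need equally many (at least one) blue and red points")
    w = [[dist(b, r) for r in red] for b in blue]
    thresholds = sorted({x for row in w for x in row})
    lo, hi, best = 0, len(thresholds) - 1, None
    while lo <= hi:  # binary search: feasibility is monotone in tau
        mid = (lo + hi) // 2
        tau = thresholds[mid]
        allowed = [[w[i][j] <= tau for j in range(n)] for i in range(n)]
        matching = perfect_matching(n, allowed)
        if matching is not None:
            best, hi = (tau, matching), mid - 1
        else:
            lo = mid + 1
    tau, matching = best  # the largest threshold always admits a matching
    return tau, [(matching[j], j) for j in range(n)]
```

Binary search works because adding edges can never destroy a perfect matching, so every threshold above a feasible one is also feasible.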

Fair k-center: $(1, t')$-fairlets
Transform the problem into a minimum cost flow (MCF) problem with the following edges:
A $(\beta, \rho)$ edge with cost 0 and capacity $\min(|B|, |R|)$.
A $(\beta, b_i)$ edge for each $b_i \in B$ and an $(r_i, \rho)$ edge for each $r_i \in R$, each with cost 0 and capacity $t' - 1$.
For each $b_i \in B$ and each $j \in [t']$, a $(b_i, b_i^j)$ edge, and similarly for each $r_i \in R$, each with cost 0 and capacity 1.
For each $b_i \in B$, $r_j \in R$, and $1 \le k, l \le t'$, a $(b_i^k, r_j^l)$ edge with capacity 1. The cost of this edge is 1 if $d(b_i, r_j) \le \tau$ and $\infty$ otherwise.
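The edge list of this network can be assembled mechanically. The sketch below only builds the edges as `(tail, head, capacity, cost)` tuples and does not solve the flow problem. The node labels, the function name, and the direction of the red copy edges (taken as the mirror image of the blue ones, from copy to original) are assumptions. Node supplies and demands are not given on the slide and are left out.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # assumption: points are 2D tuples


def build_mcf_edges(blue: Sequence[Point], red: Sequence[Point],
                    t_prime: int, tau: float,
                    dist: Callable[[Point, Point], float] = math.dist) -> List[tuple]:
    """Return the MCF edges as (tail, head, capacity, cost) tuples."""
    copies = range(1, t_prime + 1)
    edges = [("beta", "rho", min(len(blue), len(red)), 0)]
    for i in range(len(blue)):
        edges.append(("beta", ("b", i), t_prime - 1, 0))
        for j in copies:  # original blue node -> its j-th copy
            edges.append((("b", i), ("b", i, j), 1, 0))
    for i in range(len(red)):
        edges.append((("r", i), "rho", t_prime - 1, 0))
        for j in copies:  # assumed direction: j-th red copy -> original red node
            edges.append((("r", i, j), ("r", i), 1, 0))
    for i, b in enumerate(blue):
        for j, r in enumerate(red):
            cost = 1 if dist(b, r) <= tau else math.inf
            for k in copies:
                for l in copies:
                    edges.append((("b", i, k), ("r", j, l), 1, cost))
    return edges
```

For 2 blue points, 3 red points and t' = 2 this yields 1 + 5 + 10 + 24 = 40 edges. The infinite costs would need to be replaced by a large finite constant, or the edges dropped, before the list is handed to a typical min-cost-flow solver.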

Fair k-center: $(1, t')$-fairlets

LEMMA
Lemma C: Let $\mathcal{Y}$ be an optimal solution of cost $C$ to the MCF instance. Then it is possible to construct a $(1, t')$-fairlet decomposition for the $(1/t', k)$-fair center problem of cost at most $C$.

Theorem
For each fixed $t' \ge 3$, finding an optimal $(1, t')$-fairlet decomposition is NP-hard.
Finding the minimum-cost $(1/t', k)$-fair median clustering is NP-hard.

Greedy Furthest point Algorithm
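The slide gives only the algorithm's name. As a hedged illustration, the sketch below implements the standard farthest-first traversal for k-center attributed to Gonzalez (1985): start from an arbitrary point and repeatedly add the point farthest from the centers chosen so far. The function name, the choice of the first input point as the starting center, and the Euclidean default metric are assumptions; how the presentation combines this step with fairlets is not shown here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # assumption: points are 2D tuples


def greedy_furthest_point(points: Sequence[Point], k: int,
                          dist: Callable[[Point, Point], float] = math.dist) -> List[Point]:
    """Pick k centers by farthest-first traversal."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [points[0]]  # arbitrary first center (assumption: the first point)
    # nearest[i] = distance from points[i] to its closest chosen center
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(nd, dist(p, points[idx])) for nd, p in zip(nearest, points)]
    return centers
```

On the points 0, 1, 10 and 11 on a line with k = 2, the traversal starts at 0 and then picks 11, the point farthest from it.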

Datasets
Diabetes (1000 records, gender to be balanced)
Bank (1000 records, married or unmarried to be balanced)
Census (600 records, gender to be balanced)

Results

Future Work
Extend this idea to situations where the protected class is not binary.
Extend the idea to other clustering objective functions.

References
Gonzalez, Teofilo F. "Clustering to minimize the maximum intercluster distance." Theoretical Computer Science 38 (1985): 293-306.

THANK YOU