1
Turning Privacy Leaks into Floods: Surreptitious Discovery of Social Network Friendships Michael T. Goodrich Univ. of California, Irvine joint w/ Arthur U. Asuncion
2
Problem Definition Discover the friendships
3
Problem Definition Discover the friendships
4
Leveraging Information Leaks Leak: Friendship list can be viewed by friends-of-friends. This allows: –Given two people, X and Y, we can tell whether X and Y have a friend in common. Leverage: We use this to discover the friends list for members of the network
5
Abstracting the Problem Viewed abstractly, we are trying to learn binary attribute vectors.
6
Group Testing Input: n items, numbered 0, 1, …, n-1, at most d of which are defective. Output: the indices of the defective items. Items can be grouped into subsets, each of which can be tested to see whether it contains a defective item. Goal: minimize the total number of tests. Original problem: testing blood samples.
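To make the problem statement concrete, here is a minimal sketch of group testing in Python. The `GroupTester` class and `find_defectives` function are hypothetical names introduced for illustration, and the search uses simple adaptive binary splitting, a standard textbook strategy that is not the scheme used in this talk.

```python
from typing import Iterable, List, Set


class GroupTester:
    """Hypothetical oracle: reports whether a pool contains a defective item."""

    def __init__(self, defective: Iterable[int]) -> None:
        self._defective: Set[int] = set(defective)
        self.tests_used = 0

    def test(self, pool: Iterable[int]) -> bool:
        # One pooled test: positive iff at least one defective item is present.
        self.tests_used += 1
        return any(item in self._defective for item in pool)


def find_defectives(tester: GroupTester, items: Iterable[int]) -> List[int]:
    """Adaptive binary splitting: discard negative pools, split positive ones."""
    items = list(items)
    if not items or not tester.test(items):
        return []
    if len(items) == 1:
        return items
    mid = len(items) // 2
    return find_defectives(tester, items[:mid]) + find_defectives(tester, items[mid:])


tester = GroupTester(defective={3, 41})
found = find_defectives(tester, range(64))
print(found, "found with", tester.tests_used, "pooled tests instead of 64")
```

When d is much smaller than n, pooling needs far fewer tests than checking every item individually, which is the effect the goal above is after.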
7
Testing Schemes Non-adaptive: All tests must be done in parallel. Adaptive: Tests can be done sequentially. Adaptive is easier, but our framework requires a non-adaptive approach.
8
Facebook Application Each member has a “vector” of friendships. For any member M, the system returns a bit for whether M has a friend in common with the attacker, even if M restricts this information to friends-of-friends. We can use a non-adaptive scheme to learn friendship relationships in any sub-community in Facebook.
9
DNA Application DNA sequences are stored in a database, D. For any sequence Q, the database returns a score for how close Q is to each sequence in D. We form a binary vector w.r.t. places where mutations happen relative to a reference string R. We can use a non-adaptive scheme to learn DNA strings in D.
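As a rough illustration of the binary vector mentioned above, the following Python sketch marks the positions where a sequence differs from the reference R. The function name `mutation_vector` is hypothetical, and the sketch assumes aligned sequences of equal length with substitutions only; the slides do not say how insertions or deletions are handled.

```python
from typing import List


def mutation_vector(sequence: str, reference: str) -> List[int]:
    """Return 1 at each position where `sequence` differs from `reference`.

    Assumption: both strings are already aligned and have equal length.
    """
    if len(sequence) != len(reference):
        raise ValueError("sequence and reference must have the same length")
    return [int(a != b) for a, b in zip(sequence, reference)]


reference = "ACCTAA"
print(mutation_vector("ACGTAC", reference))
```

A sequence close to R yields a sparse vector, which is the property the later slides on sparse data sets rely on.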
10
Netflix Application Movie ratings vectors are stored in a database, D. For any vector V, the database returns a score for how close V is to each vector in the database. We can form a binary attribute vector for movies. We can use a non-adaptive scheme to learn ratings vectors in D.
11
Matrix View of Testing A non-adaptive testing regimen can be viewed as a t x n binary matrix M: –M[i,j] = 1 if and only if test i includes item j. M is d-disjunct if the Boolean sum of any d columns does not contain any other column. –An item is defective iff all its tests are positive. M is d-separable if the Boolean sums of each set of at most d columns are distinct (harder analysis algorithm). [Figure: the t x n matrix M]
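The decoding rule for a d-disjunct matrix ("an item is defective iff all its tests are positive") can be sketched in Python as follows. The helper names `run_tests` and `decode` are hypothetical, and the example matrix is a small 1-disjunct matrix chosen for illustration: its columns are all 2-element subsets of 4 rows, so no column contains another.

```python
from itertools import combinations
from typing import Iterable, List

Matrix = List[List[int]]


def run_tests(M: Matrix, defective: Iterable[int]) -> List[bool]:
    """Test i is positive iff it includes at least one defective item."""
    defective = set(defective)
    return [any(row[j] for j in defective) for row in M]


def decode(M: Matrix, outcomes: List[bool]) -> List[int]:
    """Declare item j defective iff every test that includes j is positive."""
    t, n = len(M), len(M[0])
    return [j for j in range(n)
            if all(outcomes[i] for i in range(t) if M[i][j])]


# Columns are the 6 two-element subsets of rows {0, 1, 2, 3}.
columns = list(combinations(range(4), 2))
M = [[1 if i in col else 0 for col in columns] for i in range(4)]

outcomes = run_tests(M, defective={3})
print(decode(M, outcomes))
```

Here 4 tests identify any single defective item among 6, and all 4 tests can be run in parallel because the matrix is fixed in advance.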
12
Randomized Approach Use a randomized approach motivated by Bloom filtering. Construct a matrix M, but relax the requirements. Given a set D of d columns in M and a column j, say j is distinguishable from D if there is a row i such that M[i,j]=1 but M[i,j’]=0 for each j’ in D. M is 𝒟-distinguishable if, for a particular collection 𝒟 of subsets, the matrix M will find them distinguishable.
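The definition of "j is distinguishable from D" translates directly into a short check. The function name `is_distinguishable` and the tiny example matrix are hypothetical, chosen only to show one positive and one negative case.

```python
from typing import Iterable, List

Matrix = List[List[int]]


def is_distinguishable(M: Matrix, j: int, D: Iterable[int]) -> bool:
    """True iff some row i has M[i][j] == 1 and M[i][k] == 0 for every k in D."""
    D = list(D)
    return any(row[j] == 1 and all(row[k] == 0 for k in D) for row in M)


# Column 0 covers rows {0, 2}, column 1 covers {1, 2}, column 2 covers {2}.
M = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
]

print(is_distinguishable(M, 0, {1, 2}))  # row 0 separates column 0 from the set
print(is_distinguishable(M, 2, {0}))     # column 2's only row is also in column 0
```

If j is distinguishable from the set of truly defective columns, at least one of j's tests comes back negative, so j is correctly ruled out.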
13
Constructing the Matrix Given t (set in the analysis), let M be a 2t x n matrix defined randomly: –For each column j, choose t/d rows of M at random and set these entries to 1. –that is, we “inject” j into those t/d tests
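A minimal Python sketch of this construction is shown below. The function name `random_test_matrix` is hypothetical, and two details are assumptions rather than statements from the slides: rows are sampled without replacement, and t/d is rounded down when d does not divide t.

```python
import random
from typing import List, Optional


def random_test_matrix(n: int, t: int, d: int,
                       seed: Optional[int] = None) -> List[List[int]]:
    """Build a random 2t x n binary matrix with t/d ones per column."""
    rng = random.Random(seed)
    num_rows = 2 * t
    per_column = t // d  # assumption: floor when d does not divide t
    M = [[0] * n for _ in range(num_rows)]
    for j in range(n):
        # "Inject" item j into per_column distinct, randomly chosen tests.
        for i in rng.sample(range(num_rows), per_column):
            M[i][j] = 1
    return M


M = random_test_matrix(n=10, t=12, d=3, seed=0)
print(len(M), "tests;", sum(row[0] for row in M), "ones in column 0")
```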
14
Technique for Social Networks Insert a small set of fictional network members. Form connections with random network members. Test the common-friends condition for the fictional members. Image from http://www.politicsforum.org/images/flame_warriors/flame_53.php
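The three steps above can be simulated end to end on toy data. This is a sketch under stated assumptions, not the authors' implementation: each fictional account plays the role of one test, `has_common_friend` is a hypothetical stand-in for the leaked common-friend bit, and no real social network API is involved.

```python
import random
from typing import List, Optional, Set


def has_common_friend(friends_a: Set[int], friends_b: Set[int]) -> bool:
    """Stand-in for the leak: do the two friend lists intersect?"""
    return not friends_a.isdisjoint(friends_b)


def recover_friends(target_friends: Set[int], n: int, t: int, d: int,
                    seed: Optional[int] = None) -> Set[int]:
    """Estimate a target's friends among members 0..n-1 via 2t fake accounts."""
    rng = random.Random(seed)
    num_tests = 2 * t
    per_member = max(1, t // d)
    # Step 1 and 2: each fake account befriends a random set of members.
    fake_friends: List[Set[int]] = [set() for _ in range(num_tests)]
    for member in range(n):
        for i in rng.sample(range(num_tests), per_member):
            fake_friends[i].add(member)
    # Step 3: one common-friend query per fake account (all independent).
    outcomes = [has_common_friend(f, target_friends) for f in fake_friends]
    # Decode: keep members whose every test came back positive.
    return {
        member for member in range(n)
        if all(outcomes[i] for i in range(num_tests) if member in fake_friends[i])
    }


true_friends = {2, 17, 30}
recovered = recover_friends(true_friends, n=50, t=30, d=3, seed=1)
print(sorted(recovered))
```

The recovered set always contains every true friend, because any test that includes a true friend is positive. It may also contain false positives when the random matrix fails to distinguish a non-friend, which is why the number of tests matters.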
15
Exploiting Sparse Data Sets Histogram of differences from R: Table of sizes, lengths, and differences from R:
16
Number of Tests Needed in Theory 1st column: To clone entire database with high probability. 2nd column: To clone sparsest 50% of database with high probability. 3rd column: To clone entire database with probability 1.
17
Different Choices for “d” Tradeoff: –The smaller the “d”, the faster we can recover sparse vectors –With very small “d”, it can take a long time to recover the vectors that are not so sparse. But most vectors are sparse so we generally want a pretty small “d” Attack on a Netflix user who has rated 98 movies. With smaller “d”, the rate of convergence is faster.
18
Different choices for “d” Here we vary “d” on the x-axis and we plot the mean and median number of tests required across the vectors in the database.
19
Distance from R More tests are needed for vectors which are further from the reference R (but note most vectors are close to R). We also see the tradeoff between various “d”
20
Thresholding Behavior There are critical values of our estimated value for d:
21
Conclusion and Future Work We have presented a way to turn privacy leaks into floods, with a number of applications: –Social networks –DNA databases –Ratings vectors Future work: extend our approach to non-binary vectors (e.g., friends and foes)