Inferring Privacy Information from Social Networks
Presenter: Ieng-Fat Lam
Date: 2007/12/25
Paper to be presented
Jianming He¹, Wesley W. Chu¹, and Zhenyu (Victor) Liu², "Inferring Privacy Information from Social Networks". Lecture Notes in Computer Science, pages 154-165, Springer-Verlag Berlin Heidelberg, 2006.
¹ Computer Science Department, UCLA, Los Angeles, CA 90095, USA
² Google Inc., USA
Motivation
- Online social network services are becoming popular.
- Privacy confidentiality problems are increasingly challenging, and are urgent research issues in building next-generation information systems.
- Existing techniques and policies, such as cryptography, security protocols, and government policies, aim to block direct disclosure of sensitive personal information.
Block Direct Disclosure (figure)
Motivation (cont.)
- What about indirect disclosure? It can be achieved through pieces of seemingly innocuous or unrelated information.
- In a social network, people are linked by the same dance club, the same interest, the same office, or similar professions.
- Privacy can therefore be indirectly disclosed through social relations.
Indirect Disclosure (figure)
Problem
Study privacy disclosure in social networks:
- Can privacy be indirectly disclosed?
- To what extent?
- Under what conditions?
The Research
- Perform privacy inference by mapping Bayesian networks to social networks, modeling the causal relations among people.
- Discuss factors that might affect the inference: prior probability, influence strength, and society openness.
- Conduct extensive experiments on a real online social network structure.
Probability: Rules for consistent reasoning
Cox's two axioms:
- First: if we specify how much we believe something is true, we have implicitly specified how much we believe it is false.
  Sum rule: P(X|I) + P(¬X|I) = 1
- Second: if we first specify how much we believe Y is true, and then state how much we believe X is true given that Y is true, we have implicitly specified how much we believe both X and Y are true.
  Product rule: P(X,Y|I) = P(X|Y,I) × P(Y|I)
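A minimal numeric check of the two rules on a toy joint distribution; the probability values are illustrative assumptions, not from the slides:

```python
# Sanity-check the sum and product rules on a toy joint distribution
# P(X, Y). The probability values are illustrative assumptions.
joint = {
    (True, True): 0.30, (True, False): 0.20,
    (False, True): 0.10, (False, False): 0.40,
}

def p_x(x):            # marginal P(X = x)
    return sum(p for (xv, _), p in joint.items() if xv == x)

def p_y(y):            # marginal P(Y = y)
    return sum(p for (_, yv), p in joint.items() if yv == y)

def p_x_given_y(x, y): # conditional P(X = x | Y = y)
    return joint[(x, y)] / p_y(y)

# Sum rule: P(X|I) + P(not X|I) = 1
assert abs(p_x(True) + p_x(False) - 1.0) < 1e-12
# Product rule: P(X, Y|I) = P(X|Y, I) * P(Y|I)
assert abs(joint[(True, True)] - p_x_given_y(True, True) * p_y(True)) < 1e-12
```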
Probability: Rules for consistent reasoning (cont.)
- The conditioning symbol I denotes relevant background information.
- There is no such thing as an absolute probability: although I is often omitted from the notation, we must never forget its existence.
Bayes' theorem
From the product rule,
P(X,Y|I) = P(X|Y,I) × P(Y|I)
and, since P(X,Y|I) = P(Y,X|I),
P(X,Y|I) = P(Y|X,I) × P(X|I).
Equating the two factorizations gives Bayes' theorem:
P(X|Y,I) = P(Y|X,I) × P(X|I) / P(Y|I)
Bayes' theorem (cont.)
Replace X and Y by hypothesis and data:
- P(hypothesis | I): prior probability; our state of knowledge about the truth of the hypothesis before we have analyzed the current data.
- P(data | hypothesis, I): conditional probability of seeing the data given that the hypothesis is true.
- P(hypothesis | data, I): posterior probability; our state of knowledge about the truth of the hypothesis in the light of the data.
Bayes' theorem (cont.)
- P(data | I): marginal probability of the data; the probability of witnessing the data under all mutually exclusive hypotheses. It can be calculated as
  P(data | I) = Σ_h P(data | hypothesis h, I) × P(hypothesis h | I)
- The ratio P(data | hypothesis, I) / P(data | I) represents the impact that the data has on the belief in the hypothesis.
- Bayes' theorem thus measures how much the data should alter a belief in a hypothesis, e.g., using toss results to infer whether a coin is fair.
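A minimal sketch of the coin example, assuming just two mutually exclusive hypotheses with equal priors; the biased coin's head probability and the toss sequence are illustrative assumptions:

```python
# Infer whether a coin is fair from toss results. Two hypotheses:
# "fair" (P(head) = 0.5) and "biased" (an assumed P(head) = 0.8).
from math import prod

hypotheses = {"fair": 0.5, "biased": 0.8}   # hypothesis -> P(head)
prior = {"fair": 0.5, "biased": 0.5}        # assumed equal priors

tosses = ["H", "H", "T", "H", "H", "H"]     # observed data (illustrative)

def likelihood(p_head, data):
    """P(data | hypothesis, I) for independent tosses."""
    return prod(p_head if t == "H" else 1.0 - p_head for t in data)

# Marginal probability of the data: sum over all hypotheses.
p_data = sum(likelihood(p, tosses) * prior[h] for h, p in hypotheses.items())

# Posterior of each hypothesis via Bayes' theorem.
for h, p in hypotheses.items():
    print(f"P({h} | data) = {likelihood(p, tosses) * prior[h] / p_data:.3f}")
```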
Bayesian Networks
- A Bayesian network is a graph representation of a joint probability distribution (physical or Bayesian) over a set of variables, including the consideration of the network structure.
- It consists of a network structure and conditional probability tables (CPTs).
Bayesian Networks (cont.)
- Network structure: represented as a directed acyclic graph (DAG), i.e., a directed graph without cycles.
- Each node corresponds to a random variable and is associated with a CPT; each edge indicates a dependence relationship between the connected variables, capturing a causal relationship.
- Conditional probability tables (CPTs): enumerate the conditional probabilities of a node and quantify the causal relationships.
Bayesian Networks (cont.)
Detecting credit-card fraud (example only): the slide's figure shows the nodes, the cause-to-effect relations, and the CPTs of a small Bayesian network for the query we want to solve.
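As a stand-in for the slide's figure, here is a minimal two-node sketch; the node names and all CPT values are hypothetical, not the paper's:

```python
# A two-node Bayesian network, Fraud -> UnusualPurchase. The node names
# and all probabilities are hypothetical assumptions for illustration.
p_fraud = {True: 0.01, False: 0.99}               # CPT of the root node

# CPT of UnusualPurchase given its parent: P(U = True | Fraud = f).
p_unusual_given_fraud = {True: 0.90, False: 0.10}

def posterior_fraud(unusual_observed=True):
    """P(Fraud | UnusualPurchase) by enumerating the tiny DAG."""
    def p_u(f):
        p = p_unusual_given_fraud[f]
        return p if unusual_observed else 1.0 - p
    joint = {f: p_u(f) * p_fraud[f] for f in (True, False)}
    return joint[True] / sum(joint.values())      # normalize by P(U)

print(f"P(fraud | unusual purchase) = {posterior_fraud(True):.4f}")
```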
Bayesian Inference
Problem statement (indirect inference): is it possible to predict someone's attributes by looking at their friends' attributes?
- In the real world, people are acquainted via all types of relations, and a personal attribute may be sensitive to only certain types of relations.
- To infer people's privacy from social relations, we must filter out the other types of relations; the study is therefore conducted in homogeneous societies.
Bayesian Inference (cont.)
Homogeneous societies:
- Reflect small, closely related groups such as an office, a class, or a club.
- Individuals are connected by a single type of social relation, in this case "friendship".
- The impact of every person on his or her friends is the same.
Bayesian Inference (cont.)
To perform the inference, we model the causal relations among people. To infer attribute A for a person X:
1. Construct a Bayesian network from X's social network.
2. Analyze the Bayesian network for the probability that X has attribute A.
Inference is performed in two settings:
- Single-hop inference: involves only direct friends.
- Multiple-hop inference: considers friends multiple hops away.
Single-hop Inference (method)
The case: we know the attribute values of all direct friends of node X.
- Define Y_ij as the jth friend of X at i hops away. If a friend can be reached via more than one route, use the shortest path (the smaller i).
- Let Y_i be the set of Y_ij (1 ≤ j ≤ n_i), where n_i is the number of X's friends at i hops away.
- For instance, Y_1 = {Y_11, Y_12, ..., Y_1n_1}, the direct friends, which are one hop away.
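The hop sets Y_i can be computed with a breadth-first search, which naturally assigns each friend its shortest-path hop count; the graph below is an illustrative assumption:

```python
# Compute the hop sets Y_i over an undirected friendship graph by BFS,
# so a friend reachable via several routes gets the smaller hop count i.
from collections import deque

friends = {                      # illustrative adjacency list
    "X": ["Y11", "Y12", "Y13"],
    "Y11": ["X", "Y21"],
    "Y12": ["X", "Y21"],         # Y21 is reachable via two routes
    "Y13": ["X"],
    "Y21": ["Y11", "Y12", "Y31"],
    "Y31": ["Y21"],
}

def hop_sets(graph, source):
    """Return {i: set of friends exactly i hops from source}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:   # first visit = shortest path
                dist[nb] = dist[node] + 1
                queue.append(nb)
    sets = {}
    for node, d in dist.items():
        if d > 0:
            sets.setdefault(d, set()).add(node)
    return sets

print(hop_sets(friends, "X"))    # {1: {Y11, Y12, Y13}, 2: {Y21}, 3: {Y31}}
```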
Single-hop Inference (cont.)
An example: Y_11, Y_12 and Y_13 are direct friends of X, and their attribute values are known (the shaded nodes in the figure).
Single-hop Inference (cont.)
Bayesian network construction relies on two assumptions:
- Localization assumption: considering only direct friends is sufficient.
- Naive Bayesian assumption: remove the relationships between friends.
Single-hop Inference (cont.)
Localization assumption:
- Given the attribute values of X's direct friends Y_1, friends more than one hop away (i.e., Y_i for i > 1) are conditionally independent of X.
- Inference therefore involves only direct friends; Y_21 and Y_31 are removed.
- Deciding the DAG linking: if there is no cycle, we obtain the Bayesian network immediately; otherwise, we remove cycles by deleting the edges with the weakest relations (an approximate conversion).
Reduction via Localization Assumption (figure)
Single-hop Inference (cont.)
Naive Bayesian assumption (for the DAG):
- Given the attribute value of the query node X, the attribute values of the direct friends Y_1 are conditionally independent of each other.
- The final DAG is obtained by removing the connection between Y_11 and Y_12.
Reduction via Naive Bayesian Assumption (figure)
Single-hop Inference (cont.)
Bayesian inference: use the Bayes decision rule to predict the attribute of X.
- For a general Bayesian network with maximum depth i, the prediction for X is the attribute value with the maximum conditional (posterior) probability, given the observed attribute values of the other nodes in the network.
Single-hop Inference (cont.)
Bayesian inference (cont.):
- By the localization assumption, we use only the direct friends Y_1: P(X = x | Y_1).
- By the naive Bayesian assumption, the Y_1 are independent of each other given X: P(Y_11, Y_12 | X = x) = P(Y_11 | X = x) × P(Y_12 | X = x).
- Here x and y_1j are the attribute values of X and Y_1j, with 1 ≤ j ≤ n_1 and x, y_1j ∈ {t, f}.
Single-hop Inference (cont.)
Bayesian inference (cont.):
- In the assumed homogeneous network, the CPT of every node is the same, so P(Y_1j = y_1j | X = x) can be written as P(Y = y | X = x).
- The posterior probability then depends only on N_1t, the number of friends with attribute value t, so P(X = x | N_1t = n_1t) stands for P(X = x | Y_1). If N_1t = n_1t, we obtain
  P(X = x | N_1t = n_1t) ∝ P(X = x) × P(Y = t | X = x)^n_1t × P(Y = f | X = x)^(n_1 − n_1t)    (3)
Single-hop Inference (cont.)
Bayesian inference (cont.):
- To compute (3) we need the conditional probability P(Y = y | X = x), which we obtain by parameter estimation (4).
- Substituting (4) and (3) into the Bayes decision rule (1) yields the predicted attribute value x.
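A minimal sketch of the whole single-hop procedure under these assumptions; the prior and CPT values are illustrative, not estimated from data as in the paper's equation (4):

```python
# Single-hop inference in a homogeneous network: every node shares one
# CPT, so the posterior depends only on n1t, the number of the n1 direct
# friends having attribute value t. All probabilities are illustrative.
prior_t = 0.3                          # P(X = t)
p_t_given = {True: 0.7, False: 0.3}    # P(Y = t | X = x), shared CPT

def posterior_t(n1, n1t):
    """P(X = t | N_1t = n1t), equation (3) up to normalization."""
    scores = {}
    for x in (True, False):
        p_t = p_t_given[x]
        scores[x] = (prior_t if x else 1.0 - prior_t) \
            * p_t ** n1t * (1.0 - p_t) ** (n1 - n1t)
    return scores[True] / (scores[True] + scores[False])

def predict(n1, n1t):
    """Bayes decision rule (1): pick the value with maximum posterior."""
    p = posterior_t(n1, n1t)
    return ("t", p) if p >= 0.5 else ("f", p)

print(predict(5, 4))   # e.g. 4 of X's 5 direct friends have attribute t
```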
Multiple-hop Inference
- In the real world people may hide their information, so the localization assumption is not applicable.
- A generalized localization assumption is therefore proposed: given the attribute of Y_ij, the jth friend of X at i hops, the attribute of X is conditionally independent of the descendants of Y_ij.
Multiple-hop Inference (cont.)
Generalized localization assumption (cont.):
- If the attribute of X's direct friend Y_1j is unknown, the attribute of X is conditionally dependent on the attributes of Y_1j's direct friends.
- This continues until we reach a descendant of Y_1j with a known attribute.
Multiple-hop Inference (cont.)
Generalized localization assumption (cont.):
- Interpretation of this model: when we predict the attribute of X, we treat him or her as an egocentric person who influences his or her friends but not vice versa, so the attribute value of X is reflected by the friends.
- We still apply the Bayes decision rule, but the calculation of the posterior probability is more complicated: variable elimination is used, with the same techniques to derive the predicted x in (1).
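A minimal sketch of the two-hop case on a chain X → Y1 → Y2, where Y1's attribute is hidden and Y2's is known; summing the hidden Y1 out is variable elimination in its simplest form (all probabilities are illustrative):

```python
# Two-hop inference on the chain X -> Y1 -> Y2: Y1 is hidden, Y2 known.
# Eliminate Y1 by summing over its values. Probabilities are illustrative.
prior_t = 0.3                          # P(X = t)
p_t_given = {True: 0.7, False: 0.3}    # P(child = t | parent), shared CPT

def p_child(child_is_t, parent):
    p = p_t_given[parent]
    return p if child_is_t else 1.0 - p

def posterior_t_two_hop(y2_is_t):
    """P(X = t | Y2) with the hidden Y1 marginalized out."""
    scores = {}
    for x in (True, False):
        prior = prior_t if x else 1.0 - prior_t
        # Variable elimination: sum the chain factor over Y1's values.
        s = sum(p_child(y1, x) * p_child(y2_is_t, y1) for y1 in (True, False))
        scores[x] = prior * s
    return scores[True] / sum(scores.values())

print(f"P(X = t | Y2 = t) = {posterior_t_two_hop(True):.3f}")
```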
Experimental Study
- The performance metric considered is inference accuracy: the percentage of nodes predicted correctly by inference.
- Three characteristics of a social network might affect Bayesian inference: prior probability, influence strength, and society openness.
Experimental Study (cont.)
Prior probability P(X = t):
- The probability that people in the social network have attribute A.
- Naive inference: if P(X = t) ≥ 0.5, predict that every query node has value t; otherwise, value f.
- The average naive inference accuracy is max(P(X = t), 1 − P(X = t)); it is used as the reference against which Bayesian inference is compared.
Experimental Study (cont.)
Influence strength P(Y = t | X = t):
- The conditional probability that Y has attribute A given that direct friend X has the same attribute; it measures how X influences its friend Y.
- The higher the influence strength, the higher the probability that X and Y share attribute A.
Society openness O(A): the percentage of people in the society who release attribute A.
Experimental Study (cont.)
Data set: 66,766 profiles from LiveJournal (which had 2.6 million active members), with 4,031,348 friend relations.
Attribute assignment:
- For each member, assign a CPT and determine the actual attribute value based on the parents' values and the assigned CPT.
- Start from the set of nodes whose in-degree is 0 and explore the rest of the network through friendship links.
- All members are assigned the same CPT; different CPTs are used to evaluate inference performance.
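A sketch of this assignment step on a toy directed graph; for brevity the shared CPT here conditions only on the first parent's value, an assumption made for illustration:

```python
# Assign synthetic attributes: walk the network from in-degree-0 nodes
# in topological order and sample each member's value from a shared CPT.
# The graph and all probabilities are illustrative assumptions.
import random
from collections import deque

parents = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["C"]}
prior_t = 0.3                  # P(attr = t) for in-degree-0 nodes
strength = 0.7                 # P(attr = t | first parent has t)

def assign_attributes(parents, seed=0):
    rng = random.Random(seed)
    indeg = {n: len(ps) for n, ps in parents.items()}
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    attr = {}
    queue = deque(n for n, d in indeg.items() if d == 0)
    while queue:               # topological order over friendship links
        n = queue.popleft()
        ps = parents[n]
        # Simplified CPT: condition on the first parent only.
        p_t = prior_t if not ps else (strength if attr[ps[0]] else 1 - strength)
        attr[n] = rng.random() < p_t
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return attr

print(assign_attributes(parents))   # e.g. {'A': False, 'B': True, ...}
```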
Experimental Study (cont.)
After attribute assignment we obtain a social network; to infer each individual, we build a corresponding Bayesian network and conduct Bayesian inference.
Experimental Result
Comparison of Bayesian and naive inference (figure), with prior probability 0.3 and influence strength from 0.1 to 0.9.
Experimental Study (cont.)
Effect of influence strength and prior probability:
- Prior probability: 0.05, 0.1, 0.3 and 0.5; influence strength: 0.1 to 0.9.
- The lowest accuracy occurs when the influence strength equals the prior probability: knowing the friend relations then provides no more information than knowing the prior probability alone, although people are actually interdependent.
- Bayesian inference does better the larger the difference between influence strength and prior probability, i.e., the stronger the influence of a parent on its children (whether positive or negative).
Experimental Result (figure)
Experimental Study (cont.)
Society openness:
- So far the society openness was assumed to be 100%, i.e., all friends' attribute values are known.
- To study inference at different levels of openness, the attributes of a certain percentage of members (from 10% to 90%) are randomly hidden.
- Setting: prior probability P(X = t) = 0.3; society openness O(A) = 10%, 50% and 90%.
Experimental Result (figure)
Experimental Result
- Inference accuracy decreases when more attributes are hidden, but the decrease is relatively small, whereas one might expect accuracy to drop drastically.
- This motivates the following discussion on society openness.
Discussions on Society Openness
Single-hop inference:
- The Bayesian network is a two-level tree, so we can derive the variation of the posterior probability due to the change of openness.
- Let N_1t and N'_1t be the numbers of friends with attribute t before and after hiding h friends (hiding a node is the same as removing it in the inference). The variation is
  ΔP = |P(X = t | N_1t = n_1t) − P(X = t | N'_1t = n'_1t)|
- Result: in 70% to 90% of the cases the variation is less than 0.1, so the posterior is unlikely to vary greatly when nodes are hidden randomly.
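A small self-contained check of this variation with the single-hop posterior; the CPT is intentionally asymmetric and all numbers are illustrative:

```python
# Variation of the single-hop posterior when friends are hidden.
# All probabilities are illustrative assumptions.
prior_t = 0.3                  # P(X = t)
p_t_t, p_t_f = 0.7, 0.2        # P(Y = t | X = t), P(Y = t | X = f)

def post(n1, n1t):
    """P(X = t | N_1t = n1t) among n1 visible friends."""
    s_t = prior_t * p_t_t ** n1t * (1 - p_t_t) ** (n1 - n1t)
    s_f = (1 - prior_t) * p_t_f ** n1t * (1 - p_t_f) ** (n1 - n1t)
    return s_t / (s_t + s_f)

# Hide h = 2 of 10 friends, one of whom had attribute t:
delta = abs(post(10, 6) - post(8, 5))
print(f"delta posterior = {delta:.3f}")   # small, consistent with the slide
```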
Discussions on Society Openness (cont.)
Multiple-hop inference:
- Use complete k-ary trees, in which all internal nodes have k children; hide a node together with all of its ancestors, and check the variation of the posterior probability.
- Parameters: number of children k; maximum depth d of the hidden nodes; prior probability 0.3; influence strength 0.7.
- Result: when k = 1, the posterior probability varies significantly as more nodes are hidden; when k > 1, the posterior probability does not vary much.
Discussions on Society Openness (cont.)
Multiple-hop inference (cont.): example with k = 2 and d = 2 (Y_11 and Y_21 are hidden).
Conclusions
- Privacy may be indirectly released via social relations.
- The inference accuracy of privacy information is closely related to the influence strength between friends.
- Even in a society where people hide their attributes, privacy can still be inferred by Bayesian inference.
- To protect privacy: hide the friendship relations, or ask friends to hide their attributes.
Thank you!
Questions?