Mining Privacy Settings to Find Optimal Privacy-Utility Tradeoffs for Social Network Services
Shumin Guo, Keke Chen
Data Intensive Analysis and Computing (DIAC) Lab, Kno.e.sis Center, Wright State University
Outline
- Introduction
  - Background
  - Research goals
  - Contributions
- Our modeling methods
  - The IRT model
  - Our research hypothesis
  - Modeling social network privacy and utility
  - The weighted/personalized utility model
  - Tradeoff between privacy and utility
- The experiments
  - Social network data from Facebook
  - Experimental results
- Conclusions
Introduction
Background
- Social network services (SNS) are popular
- SNS are filled with private info, which creates privacy risks
  - Online identity theft
  - Insurance discrimination
  - ...
- Protecting SNS privacy is complicated
  - Many new, young users do not realize the privacy risks and do not know how to protect their privacy
  - Privacy settings consist of tens of options and involve an implicit privacy-utility tradeoff
- Can we provide privacy guidance for new, young users?
Some facts
- Facebook's privacy settings cover 27 profile items
- Each item is set to one of four exposure levels ("me only", "friends only", "friends of friends", "everyone")
- By default, most items are set to the highest exposure level
  - It is in the SNS provider's best interest to get people exposed and connected to each other
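As a concrete illustration, here is a minimal sketch of how one user's settings could be represented as data and dichotomized for the later analysis. The items shown ("current_city", "relationships", "graduate_school" are mentioned later in the deck) and the "not 'everyone' counts as hidden" rule are our simplifying assumptions, not the paper's exact encoding.

```python
# Minimal sketch: one user's privacy settings as data (illustrative, not the full 27 items).
EXPOSURE_LEVELS = ["me only", "friends only", "friends of friends", "everyone"]

user_settings = {
    "current_city": "everyone",            # Facebook's defaults favor high exposure
    "relationships": "friends of friends",
    "graduate_school": "me only",
}

def is_hidden(level):
    """Dichotomization assumed here for the later IRT analysis:
    anything not exposed to 'everyone' counts as hidden."""
    assert level in EXPOSURE_LEVELS
    return level != "everyone"

hidden_vector = {item: int(is_hidden(level)) for item, level in user_settings.items()}
# {'current_city': 0, 'relationships': 1, 'graduate_school': 1}
```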
Research goals
- Understand the SNS privacy problem
  - The level of "privacy sensitivity" for each personal item
  - Quantification of privacy
  - The balance between privacy and SNS utility
- Enhancement of SNS privacy
  - How to help users express their privacy concerns?
  - How to help users automate the privacy configuration with their utility preferences in mind?
Our contributions
- Develop a privacy quantification framework that considers both privacy and utility
- Understand common users' privacy concerns
- Help users achieve optimal privacy settings based on their utility preferences
- We study the framework with real data obtained from Facebook
Modeling SNS Users’ Privacy Concerns
Basic idea
- Use the Item Response Theory (IRT) model to understand existing SNS users' privacy settings
- Derive the quantification of privacy concern with the privacy IRT model
- Map a new user's privacy concern to the IRT model to find the best privacy setting
The Item Response Theory (IRT) model
- A classic model used in standardized test evaluation
- Example: estimate the ability level of an examinee based on his/her answers to a number of questions
The two-parameter model
- The black curve represents Question 1 and the red curve Question 2
- The x-axis represents the quantified trait ("ability", or an attitude such as "privacy concern")
- The y-axis represents the probability of giving the right answer to that question
- β represents the difficulty level: β2 > β1 means Q2 is more difficult, so at the same ability level the probability of answering Q2 correctly is lower than for Q1
- α represents the discrimination level of the question, i.e., whether it can clearly separate low-ability users from high-ability ones; flat curves (small α) have low discrimination power
- The IRT model is learned from users' answers to a set of questions (or from users' privacy settings for a number of items)
- Parameters: α, the level of discrimination of a question; β, the level of difficulty of a question; θ, the level of a person's trait
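For reference, a minimal sketch of the standard two-parameter logistic (2PL) item characteristic curve that the slide's figure illustrates; the α and β values are made up purely for illustration.

```python
import numpy as np

def icc_2pl(theta, alpha, beta):
    """Two-parameter logistic (2PL) item characteristic curve:
    probability of a 'positive' response at trait level theta,
    with discrimination alpha and difficulty beta."""
    return 1.0 / (1.0 + np.exp(-alpha * (theta - beta)))

# Two illustrative questions: Q2 is more difficult (beta2 > beta1).
theta = np.linspace(-3.0, 3.0, 7)
p_q1 = icc_2pl(theta, alpha=1.5, beta=-0.5)
p_q2 = icc_2pl(theta, alpha=1.5, beta=1.0)
# At every theta, p_q2 < p_q1: at the same ability level the probability
# of answering the harder question correctly is lower.
```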
Mapping to the privacy problem
- Question answer -> profile item setting
- Ability -> level of privacy concern
- β -> sensitivity of the profile item
- α -> contribution to the overall privacy concern
What we get...
[Figure: fitted item characteristic curves for profile items such as "relationships", "network", and "current_city"; x-axis: level of privacy concern, y-axis: probability of hiding the item]
Our Research Approach
- Observation: users disclose some profile items while hiding others
  - If a user believes an item is not too sensitive, he/she will disclose it
  - If a user perceives an item as critical to realizing his/her social utility, he/she may also disclose it
  - Otherwise, the user will hide the item
- Hypothesis: users make an implicit balance judgment behind their SNS activities
  - If utility gain > privacy risk: disclose
  - If utility gain < privacy risk: hide
Modeling SNS privacy
- Use the two-parameter IRT model with a new interpretation:
  - α: profile-item weight for a user's overall privacy concern
  - β: sensitivity level of the profile item
  - θ: level of a user's privacy concern
The complete result looks like…
Finding optimal settings
- Theorem: the privacy rating at θ_i (a plausible form is sketched below) is defined in terms of
  - s_ij: user i's setting for item j (1: hidden, 0: disclosed)
  - P_j(θ_i): the probability of hiding item j at privacy-concern level θ_i
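The slide only names the quantities in the theorem, so the sketch below is one plausible reading: it assumes the privacy rating at θ_i is the expected number of hidden items, Σ_j P_j(θ_i), and that the recommended setting hides item j whenever P_j(θ_i) ≥ 0.5. Both are our assumptions, not necessarily the paper's exact definitions.

```python
import numpy as np

def icc_2pl(theta, alpha, beta):
    """2PL curve: probability of hiding an item at concern level theta."""
    return 1.0 / (1.0 + np.exp(-alpha * (theta - beta)))

def privacy_rating(theta, alphas, betas):
    """Assumed rating: expected number of hidden items at concern level theta."""
    return float(np.sum(icc_2pl(theta, np.asarray(alphas), np.asarray(betas))))

def optimal_setting(theta, alphas, betas, threshold=0.5):
    """Assumed rule: hide item j (s_j = 1) when its hiding probability
    at theta reaches the threshold; otherwise disclose it (s_j = 0)."""
    probs = icc_2pl(theta, np.asarray(alphas), np.asarray(betas))
    return (probs >= threshold).astype(int)

# Illustrative (made-up) item parameters for three items.
alphas = [1.2, 0.8, 2.0]
betas = [-1.0, 0.5, 1.5]
print(privacy_rating(0.0, alphas, betas))   # ~1.2 expected hidden items
print(optimal_setting(0.0, alphas, betas))  # [1 0 0]
```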
Modeling SNS utility – the same method
- λ: profile-item weight for a user's SNS utility
- μ: importance level of the profile item
- φ: level of a user's utility preference
- We can derive λ = α and μ = -β
- For the utility model: exposing an item loses privacy but gains utility, so the utility indicator of that item is the flip of s_ij
An important result
- For a specific privacy setting over θ_i: privacy rating + utility rating ≈ a constant
- Privacy and utility are linearly related
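A quick numeric check of this relation under the same assumptions as the previous sketch, plus the further assumption that the utility-preference level is the negation of the privacy-concern level (φ = -θ). With λ = α and μ = -β, each item's hiding and exposing probabilities then sum to 1, so the two ratings sum to the number of items.

```python
import numpy as np

def icc_2pl(x, a, b):
    """2PL curve with discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (x - b)))

# Illustrative (made-up) item parameters.
alphas = np.array([1.2, 0.8, 2.0])
betas = np.array([-1.0, 0.5, 1.5])

for theta in (-2.0, 0.0, 2.0):
    phi = -theta                                  # assumption: phi = -theta
    privacy = icc_2pl(theta, alphas, betas).sum()  # expected # of hidden items
    utility = icc_2pl(phi, alphas, -betas).sum()   # lambda = alpha, mu = -beta
    print(theta, privacy + utility)  # sums to the number of items (3), up to floating-point error
```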
The weighted/personalized utility model
- Users often have a clear intention for using the SNS but less knowledge about privacy
- Users want to put a higher utility weight on certain group(s) of profile items than on others
- Users can assign specific weights to profile items to express their preferences
- The utility IRT model can be revised into a weighted model (details skipped here; see the sketch below)
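Since the weighting details are skipped in the slides, the sketch below shows only one plausible form: normalized, user-supplied item weights applied to the per-item exposure probabilities before summing. The normalization and the example weights are our assumptions, not the paper's scheme.

```python
import numpy as np

def icc_2pl(x, a, b):
    return 1.0 / (1.0 + np.exp(-a * (x - b)))

def weighted_utility_rating(phi, lambdas, mus, weights):
    """Assumed weighted utility rating: user-supplied item weights are
    normalized and applied to the per-item exposing probabilities."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize so the weights sum to 1
    probs = icc_2pl(phi, np.asarray(lambdas), np.asarray(mus))
    return float(np.dot(w, probs))

# Example: a user who cares most about the second (hypothetical) item.
print(weighted_utility_rating(phi=0.5,
                              lambdas=[1.2, 0.8, 2.0],
                              mus=[1.0, -0.5, -1.5],
                              weights=[1, 5, 1]))
```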
Illustration of the tradeoff between privacy and utility
The Experiments
The real data from Facebook
- Data crawled from Facebook with two accounts
  - Account "normal": a normal Facebook account with a certain number of friends
  - Account "fake": a fake account with no friends
- Data crawling steps
  - For the friends and "friends of friends" (FoF) of the normal account, crawl the profile-item visibility of each user
  - For the same group of users, crawl their profile-item visibility again using the fake account
- We then apply the following inference rules
Deriving privacy settings of users
- Based on the data crawled from the two accounts' FoF views, we derive each user's privacy setting using a set of inference rules (a guessed sketch follows below)
- Notation: E: everyone, FoF: friends of friends, F: friends only, O: the account owner only
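The rule table itself is not reproduced in the text, so the function below is a guessed reconstruction: an item visible to the friendless fake account is treated as exposed to everyone (E); an item visible only through the normal account's friend/FoF crawl is treated as restricted to friends or FoF; an item visible to neither is treated as owner-only (O). The paper's exact rules may differ.

```python
def infer_setting(visible_to_fake, visible_to_normal):
    """Guessed inference rule (not the paper's exact table): combine what the
    friendless fake account and the normal account (a friend or FoF of the
    target user) can each see for a given profile item."""
    if visible_to_fake:
        return "E"       # exposed to everyone
    if visible_to_normal:
        return "F/FoF"   # restricted to friends or friends of friends
    return "O"           # hidden from both crawls: owner only

def to_hidden_indicator(setting):
    """One possible dichotomization for the IRT analysis (our assumption):
    anything not public counts as hidden."""
    return 0 if setting == "E" else 1
```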
Experimental results
- "Cleaned # of FoF": shared FoFs are removed to reduce the bias in modeling
Note: items that are not filled in by the user are also treated as "hidden"
- Some people simply ignore some items, or the items have no value for them (e.g., "graduate school")
- This is consistent with our rationale of "disclosing/hiding" items
Validated with 5-fold cross-validation (p-value < 0.05)
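The validation procedure itself is not shown in the slides; the snippet below is only a generic sketch of a 5-fold split over users of the hidden/disclosed response matrix, with placeholder fit and evaluation functions rather than the paper's statistical test.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(responses, fit_model, evaluate, n_splits=5, seed=0):
    """Generic k-fold cross-validation over users (rows of the 0/1
    hidden/disclosed matrix). fit_model and evaluate are placeholders
    for the IRT estimation and goodness-of-fit steps."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(responses):
        model = fit_model(responses[train_idx])        # e.g. estimate alpha_j, beta_j
        scores.append(evaluate(model, responses[test_idx]))
    return np.mean(scores)
```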
Privacy rating vs. real setting
- For each θ_i (the level of privacy concern) there is one "privacy rating" (defined on a previous slide)
- The real settings may deviate from this ideal setting
Results of learning the weighted utility model
- The weighting scheme is to be studied further
Tradeoff between privacy and utility (unweighted)
[Figure: utility rating vs. privacy rating]
- Few people have a very high level of privacy concern; very few have a very high privacy rating
- More people tend to have lower privacy ratings, i.e., implicitly higher utility ratings
- The relationship is roughly linear
Tradeoff between privacy and weighted utility
Conclusion
- A framework to address the tradeoff between privacy and utility
- A latent trait model (IRT) is used for modeling both privacy and utility
- We develop a personalized utility model and a tradeoff method for users to find an optimal configuration based on their utility preferences
- The models are validated with a large dataset crawled from Facebook