Xintao Wu, Aug 25, 2014: Research Overview

Presentation transcript:

Research Overview
Xintao Wu, Aug 25, 2014

Outline
Introduction
Privacy Preserving Social Network Analysis
- Input perturbation
- Output perturbation
Fraud Detection in Social Networks
- Spectral analysis of graph topology
- Detecting Random Link Attacks
- Detecting weak anomalies
Sample Projects
Conclusions and Future Work

Trustworthy Computing
Trustworthy = reliability, security, privacy, usability
Sample research challenges
- Understand and capture emergent behaviors and interactions among regular users, fraudsters, and victims
- Design secure, survivable, persistent systems that keep working under attack
- Enable privacy protection in collecting, analyzing, and sharing personal data

Privacy Breach Cases
Nydia Velázquez (1994)
- A medical record of her suicide attempt was disclosed
AOL Search Log (2006)
- The anonymized release of 650K users’ search histories lasted less than 24 hours
Netflix Contest (2009)
- The $1M contest was cancelled due to a privacy lawsuit
23andMe (2013)
- Genetic testing was ordered discontinued by the FDA due to genetic privacy

Acxiom
Privacy
- In 2003, EPIC alleged that Acxiom provided consumer information to the US Army "to determine how information from public and private records might be analyzed to help defend military bases from attack."
- In 2013, Acxiom was among nine companies that the FTC investigated to see how they collect and use consumer data.
Security
- In 2003, more than 1.6 billion customer records were stolen during the transmission of information to and from Acxiom's clients.

Privacy Regulation (Forrester)
[Figure: map of privacy regulation by country. Legend: Most restricted, Restricted, Some restrictions, Minimal restrictions, Effectively no restrictions, No legislation or no information]

Privacy Protection Laws
USA
- HIPAA for health care
- Gramm-Leach-Bliley Act of 1999 for financial institutions
- COPPA for children's online privacy
- State regulations, e.g., California Senate Bill 1386
Canada
- PIPEDA (Personal Information Protection and Electronic Documents Act)
European Union
- Directive 95/46/EC: provides guidelines for member state legislation and forbids sharing data with states that do not protect privacy
Contractual obligations
- Individuals should have notice about how their data is used and have opt-out choices

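Privacy Preserving Data Mining
ssn | name | zip | race | … | age | sex | income | … | disease
 | | 28223 | Asian | … | 20 | M | 85k | … | Cancer
 | | 28223 | Asian | … | 30 | F | 70k | … | Flu
 | | 28262 | Black | … | 20 | M | 120k | … | Heart
 | | 28261 | White | … | 26 | M | 23k | … | Cancer
 | | … | … | … | … | … | … | … | …
 | | | Asian | … | 20 | M | 110k | … | Flu
69% unique on zip and birth date; 87% unique on zip, birth date, and gender
Generalization (k-anonymity, l-diversity, t-closeness)
Randomization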

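Social Network Data
The data owner releases an anonymized table to the data miner.
Original table (data owner):
name | sex | age | disease | salary
Ada | F | 18 | cancer | 25k
Bob | M | 25 | heart | 110k
Cathy | F | 20 | cancer | 70k
Dell | M | 65 | flu | 65k
Ed | M | 60 | cancer | 300k
Fred | M | 24 | flu | 20k
George | M | 22 | cancer | 45k
Harry | M | 40 | flu | 95k
Irene | F | 45 | heart | 70k
Released table (data miner):
id | sex | age | disease | salary
5 | F | Y | cancer | 25k
3 | M | Y | heart | 110k
6 | F | Y | cancer | 70k
1 | M | O | flu | 65k
7 | M | O | cancer | 300k
2 | M | Y | flu | 20k
9 | M | Y | cancer | 45k
4 | M | M | flu | 95k
8 | F | M | heart | 70k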
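The generalization step on this slide can be sketched in a few lines. This is only an illustration: the age thresholds behind Y/M/O are not stated in the slides, so the cutoffs below (under 30, 30 to 59, 60 and over) are an assumption chosen to reproduce the released table, and the helper names `age_group`, `anonymize`, and `k_anonymity` are hypothetical. The random ids of the released table are omitted.

```python
from collections import Counter

# Original table from the slide: (name, sex, age, disease, salary).
records = [
    ("Ada", "F", 18, "cancer", "25k"), ("Bob", "M", 25, "heart", "110k"),
    ("Cathy", "F", 20, "cancer", "70k"), ("Dell", "M", 65, "flu", "65k"),
    ("Ed", "M", 60, "cancer", "300k"), ("Fred", "M", 24, "flu", "20k"),
    ("George", "M", 22, "cancer", "45k"), ("Harry", "M", 40, "flu", "95k"),
    ("Irene", "F", 45, "heart", "70k"),
]

def age_group(age):
    """Generalize an exact age to Y/M/O (thresholds are an assumption)."""
    if age < 30:
        return "Y"
    if age < 60:
        return "M"
    return "O"

def anonymize(rows):
    """Drop the name and generalize the age, as in the released table."""
    return [(sex, age_group(age), disease, salary)
            for _name, sex, age, disease, salary in rows]

def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[i] for i in quasi_identifiers) for row in rows)
    return min(groups.values())

released = anonymize(records)
# Treat (sex, age group) as the quasi-identifier.
k = k_anonymity(released, quasi_identifiers=(0, 1))
print(released[0], "k =", k)
```

With (sex, age group) as the quasi-identifier, the released table is only 1-anonymous, because the (M, M) and (F, M) groups each contain a single person. That is the kind of weakness the re-identification threat exploits.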

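Threat of Re-identification
An attacker attacks the released table:
id | sex | age | disease | salary
5 | F | Y | cancer | 25k
3 | M | Y | heart | 110k
6 | F | Y | cancer | 70k
1 | M | O | flu | 65k
7 | M | O | cancer | 300k
2 | M | Y | flu | 20k
9 | M | Y | cancer | 45k
4 | M | M | flu | 95k
8 | F | M | heart | 70k
Privacy breaches
- Identity disclosure
- Link disclosure
- Attribute disclosure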

Privacy Preservation in Social Network Analysis
Input Perturbation
- K-anonymity
- Generalization
- Randomization
Output Perturbation
- Background on differential privacy
- Differential privacy preserving social network mining

Our Work
Feature preserving randomization
- Spectrum preserving randomization (SDM08)
- Markov chain based feature preserving randomization (SDM09)
Reconstruction from randomized graph (SDM10)
Link privacy (from the attacker's perspective)
- Exploiting node similarity features (PAKDD09 Best Student Paper Runner-up Award)
- Exploiting graph space via Markov chain (SDM09)
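As background for the randomization work listed above, here is a minimal sketch of plain edge randomization: add k false edges and delete k true edges, so the edge count stays the same. This is not the spectrum preserving or Markov chain based method from the cited papers, which additionally constrain which edges may change. The function name `rand_add_del` and the toy graph are hypothetical.

```python
import random
from itertools import combinations

def rand_add_del(n, edges, k, seed=None):
    """Randomize an undirected graph on nodes 0..n-1 by adding k
    non-edges and deleting k existing edges, both chosen uniformly."""
    rng = random.Random(seed)
    true_edges = {tuple(sorted(e)) for e in edges}
    non_edges = [p for p in combinations(range(n), 2) if p not in true_edges]
    added = set(rng.sample(non_edges, k))
    deleted = set(rng.sample(sorted(true_edges), k))
    return (true_edges - deleted) | added

# Toy graph: a path 0-1-2-3-4-5 plus the chord (0, 2).
original = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 2)}
randomized = rand_add_del(6, original, k=2, seed=7)
print(sorted(randomized))
```

The released graph has the same number of edges as the original and differs from it in exactly 2k edge positions. Larger k gives more link privacy but distorts features such as the spectrum more, which is the trade-off the feature preserving variants address.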

PSNet (NSF )

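Output Perturbation
The data owner holds the table; the data miner only sees noisy query answers.
name | sex | age | disease | salary
Ada | F | 18 | cancer | 25k
Bob | M | 25 | heart | 110k
Cathy | F | 20 | cancer | 70k
Dell | M | 65 | flu | 65k
Ed | M | 60 | cancer | 300k
Fred | M | 24 | flu | 20k
George | M | 22 | cancer | 45k
Harry | M | 40 | flu | 95k
Irene | F | 45 | heart | 70k
The data miner sends a query f and receives the query result + noise.
The answer cannot be used to derive whether any individual is included in the database.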

Differential Guarantee [Dwork, TCC06]
Database x (name, disease): (Ada, cancer), (Bob, heart), (Cathy, cancer), (Dell, flu), (Ed, cancer), (Fred, flu)
Query f = count(#cancer). The mechanism K returns f(x) + noise = 3 + noise.
Database x' is x with one individual's record removed. K returns f(x') + noise = 2 + noise.
The two noisy answers are hard to tell apart, achieving opt-out.
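The count query on this slide can be illustrated with the standard Laplace mechanism. The sketch below makes several assumptions: a count query has sensitivity 1, so the noise scale is 1/epsilon; the value epsilon = 0.5 is arbitrary; Ed is picked as the individual who opts out purely for illustration; and `laplace_noise`, `private_count`, and `is_cancer` are hypothetical helper names.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Answer a count query with epsilon-differential privacy.
    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so the Laplace scale is 1 / epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def is_cancer(record):
    return record[1] == "cancer"

x = [("Ada", "cancer"), ("Bob", "heart"), ("Cathy", "cancer"),
     ("Dell", "flu"), ("Ed", "cancer"), ("Fred", "flu")]
# Neighboring database: assume (for illustration) that Ed opts out.
x_prime = [r for r in x if r[0] != "Ed"]

rng = random.Random(0)
print("f(x)  + noise:", private_count(x, is_cancer, epsilon=0.5, rng=rng))
print("f(x') + noise:", private_count(x_prime, is_cancer, epsilon=0.5, rng=rng))
```

The true answers are 3 and 2, but each released value is blurred by noise with scale 2, so a single answer gives little evidence about whether Ed's record is present. A smaller epsilon means more noise and stronger protection.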

Our Work
DP-preserving cluster coefficient (ASONAM12)
- Divide and conquer
- Smooth sensitivity
DP-preserving spectral graph analysis (PAKDD13)
- LNPP: based on Laplace noise perturbation
- SBMF: based on the exponential mechanism and MBF density
Linear refinement of DP-preserving query answering (PAKDD13 Best Application Paper)
DP-preserving graph generation based on degree correlation (TDP13)

SMASH (NIH R01GM103309)

Cyber Fraud
Cyber crime
- Costs the US economy $400 billion annually
OSN (online social network) fraud and attacks
- Sybil attack, spam, viral marketing, fraudulent auction, brand jacking, denial of service, etc.
- Fake followers on Twitter (used in viral marketing) are worth $360 million annually on the black market.

Topology-based Detection
Fraud characterization
- Individual vs. collusive
- Robot vs. money-motivated regular user
- Random vs. selective target
- Static vs. dynamic
Traditional topology-based detection methods
- Incur high computational cost
- Have difficulty detecting collaborative attacks or subtle anomalies

Random Link Attack [Shrivastava, ICDE08]
An abstraction of collaborative attacks, including spam, viral marketing, etc.
The attacker creates some fake nodes and uses them to attack a large set of randomly selected regular nodes.
Fake nodes also mimic the real graph structure among themselves to evade detection.
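To make the attack model concrete, the sketch below injects a random link attack into a small graph. It is a simulation under stated assumptions, not the construction from the cited paper: each fake node links to a fixed number of uniformly chosen regular nodes, and fake nodes link to each other independently with probability `inner_prob` as a crude stand-in for mimicking real graph structure. The name `inject_rla` and all parameters are hypothetical.

```python
import random

def inject_rla(n_regular, edges, n_fake, victims_per_fake, inner_prob, seed=None):
    """Return (attacked edge set, list of fake node ids).
    Regular nodes are 0..n_regular-1; fake nodes get the next ids."""
    rng = random.Random(seed)
    attacked = {tuple(sorted(e)) for e in edges}
    fake_nodes = list(range(n_regular, n_regular + n_fake))
    # Each fake node attacks randomly selected regular nodes.
    for fake in fake_nodes:
        for victim in rng.sample(range(n_regular), victims_per_fake):
            attacked.add((victim, fake))  # victim id < fake id
    # Fake nodes connect among themselves to look like a real community.
    for i, a in enumerate(fake_nodes):
        for b in fake_nodes[i + 1:]:
            if rng.random() < inner_prob:
                attacked.add((a, b))
    return attacked, fake_nodes

regular_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
attacked, fakes = inject_rla(6, regular_edges, n_fake=2,
                             victims_per_fake=3, inner_prob=1.0, seed=1)
print(fakes, sorted(attacked))
```

Because victims are chosen at random, the fake nodes' outside links are spread across unrelated parts of the graph, unlike the links of a regular user.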

Spectral Graph Analysis based Fraud Detection
Examine the spectral space of the graph topology.
Consider a network with n nodes and m edges that is undirected and unweighted, without considering link or node attribute information.
Adjacency matrix A (symmetric)
Adjacency eigenspace

Eigenspace
[Figure labels: Principal, Minor]

Projecting Nodes in Spectral Space [SDM09]
Spectral coordinate: [formula lost in extraction]
k-orthogonal line pattern
- [formula lost in extraction] when nodes u, v are from the same community
- [formula lost in extraction] when nodes u, v are from different communities
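The line pattern can be checked numerically on a toy graph. The formulas are missing from this transcript, so the sketch assumes the usual definition: the spectral coordinate of node u is the vector of its entries in the eigenvectors of the k largest eigenvalues of the adjacency matrix A. The eigenvectors are computed with shifted power iteration plus deflation, chosen here only to keep the example dependency-free. All function names and the two-clique test graph are hypothetical.

```python
import math

def top_eigenvectors(adj, k, iters=1000):
    """Unit eigenvectors for the k largest eigenvalues of a symmetric
    0/1 matrix, via power iteration on A + shift*I with deflation."""
    n = len(adj)
    # Every eigenvalue lies in [-max degree, max degree], so adding this
    # shift makes the largest eigenvalue also the largest in magnitude.
    shift = max(sum(row) for row in adj)
    found = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed, asymmetric start
        for _step in range(iters):
            # Remove components along eigenvectors already found.
            for u in found:
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
            w = [sum(adj[i][j] * v[j] for j in range(n)) + shift * v[i]
                 for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        found.append(v)
    return found

def spectral_coordinates(adj, k):
    """Row u holds node u's entries in the top-k eigenvectors."""
    vecs = top_eigenvectors(adj, k)
    return [tuple(vec[u] for vec in vecs) for u in range(len(adj))]

def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) *
                  math.sqrt(sum(b * b for b in q)))

def two_cliques(size):
    """Two cliques of `size` nodes joined by a single bridge edge."""
    n = 2 * size
    adj = [[0] * n for _ in range(n)]
    for block in (range(size), range(size, n)):
        for i in block:
            for j in block:
                if i != j:
                    adj[i][j] = 1
    adj[size - 1][size] = adj[size][size - 1] = 1
    return adj

coords = spectral_coordinates(two_cliques(4), k=2)
same = cosine(coords[0], coords[1])       # nodes in the same clique
different = cosine(coords[0], coords[5])  # nodes in different cliques
print(round(same, 3), round(different, 3))
```

Nodes 0 and 1 sit on the same line through the origin (cosine close to 1), while nodes 0 and 5 lie on nearly orthogonal lines (cosine close to 0), which matches the k-orthogonal line pattern for k = 2 communities.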

Example
[Figure: spectral coordinates of the Polbook network]

Evaluation on Web Spam Challenge Data [ICDE11]
Data: a snapshot of websites in the .UK domain (2007) with 114K nodes and 1.8M links, to which a mix of 8 RLAs with varied sizes and connection patterns is added.
SPCTRA: based on spectral space
GREEDY: based on outer-triangles [Shrivastava, ICDE08]
SPCTRA is much faster: 36s vs. 26h

Privacy Preserving Data Mining (NSF CAREER)

Genetic Privacy (NSF SCH, pending)
BIBM13 Best Paper Award

oSafari (NSF SaTC)

Manipulation in E-Commerce (NSF III, pending)
[Diagram labels: Structured Topic Analysis, Spectral Bipartite Graph Analysis, D-S based Evidence Fusion; Bot-committed, Money-motivated; Reviews, Ratings, Ranks]

GWAS
Genome-wide association studies (GWAS) typically focus on associations between single-nucleotide polymorphisms (SNPs) and human traits like common diseases.

Privacy Preserving Database Application Testing (NSF )
[Diagram labels: ER, Data, DDL, Catalog, Production db, RNRS, Conflict resolution, Disclosure Assessment, Rule Analyzer, R’NR’S’, Schema & Domain Filter, Schema’, Domain’, Data Generator, Mock DB, User]

Data Generation for Testing DB Applications (NSF )
How to generate data to cover paths?

Big Data Computing
Drowning in data
- Volume, Velocity, Variety, and Veracity
- 2.5 exabytes every day
- Web data, healthcare, e-commerce, social networks
Advancing technology
- Cheap storage and processing power
- Growth in huge data centers
- Data is in the “cloud”: Amazon AWS, Hadoop, Azure
- Computing is in the “cloud”

Social Media Customer Analytics
Data sources: network topology (friendship, followship, interaction), structured profile, transaction database, unstructured text (e.g., blog, tweet), retweet sequence, product and review
Example tables:
name | sex | age | disease | salary
Ada | F | 18 | cancer | 25k
Bob | M | 25 | heart | 110k
…
id | sex | age | address | income
5 | F | Y | NC | 25k
3 | M | Y | SC | 110k
Issues: entity resolution, patterns, temporal/spatial, scalability, visualization, sentiment, privacy
Velocity, Variety: 10GB tweets per day
Belk and Lowe’s; Chancellor’s special fund

Samsung AVC Denial Log Analysis
Volume and Velocity: 1 million log files per day, each with thousands of entries
S3, Hive and EMR

Drivers of Data Computing
6A’s: Anytime, Anywhere, Access to Anything by Anyone Authorized
4V’s: Volume, Velocity, Variety, Veracity
Reliability, Security, Privacy, Usability

Thank You! Questions?
Collaborators: Aidong Lu, Xinghua Shi, Jun Li (Oregon), Dejing Dou (Oregon), Tao Xie (UIUC)
Doctoral graduates: Songtao Guo, Ling Guo, Kai Pan, Leting Wu, Xiaowei Ying
Doctoral students: Yue Wang, Yuemeng Li, Zhilin Luo (visiting)