Privacy Framework for RDF Data Mining
Master’s Thesis Project Proposal
By: Yotam Aron



Overview
Motivation and Goal
Background
Proposed Solution and Design
Example
Conclusion

Motivation
Data mining continues to become more widespread.
◦ It is useful for research, public policy, etc.
We want to maintain the privacy of the participants in the database.
Little work has been done on privacy for semantic web data.

Previous Work
Anonymization: k-anonymity [1].
Differential privacy systems: PINQ [2], AIRAVAT [3].
Drawbacks:
◦ They do not apply to semantic web data.
◦ They do not support SPARQL.

Goal
Develop a system that protects dataset participants’ personal data when the data is queried through SPARQL.
The system should integrate well with existing SPARQL endpoints.
It should be relatively easy for both the user and the administrator to use.

Background
Rule-based Privacy Policies in AIR
Differential Privacy

Rule-based Privacy Policies in AIR [4]
Rules define patterns in a SPARQL query. If a pattern is matched, the rule infers compliance or non-compliance of the incoming SPARQL query.

AIR Example [5]
AIR policy (extract):
    air:if {
        :W s:TriplePattern :T.
        :T log:includes { :X type:F :V }.
    };
    air:then [
        air:description ("type:F was selected in " q:QUERY);
        air:assert { q:QUERY air:non-compliant-with q:Policy4. }
    ].
Query:
    SELECT ?s WHERE { ?s type:F ?p }
AIR will show that the query is non-compliant with Policy4.
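To make the rule's logic concrete, here is a minimal Python sketch of what the Policy4 rule checks. It is not AIR itself, and the helper names `triple_patterns` and `check_policy4` are hypothetical. The naive parser assumes a single flat WHERE block whose triple patterns are separated by periods, so it would break on IRIs or literals containing periods.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def triple_patterns(query: str) -> List[Triple]:
    """Naively extract (subject, predicate, object) patterns from one flat WHERE { ... } block."""
    body = query[query.index("{") + 1 : query.rindex("}")]
    patterns = []
    for chunk in body.split("."):
        parts = chunk.split()
        if len(parts) == 3:  # skip empty chunks and anything that is not a simple triple
            patterns.append((parts[0], parts[1], parts[2]))
    return patterns


def check_policy4(query: str) -> Tuple[bool, Optional[str]]:
    """Mimic the rule: any triple pattern using predicate type:F makes the query non-compliant."""
    for _subject, predicate, _obj in triple_patterns(query):
        if predicate == "type:F":
            return False, "type:F was selected in " + query
    return True, None


compliant, reason = check_policy4("SELECT ?s WHERE {?s type:F ?p}")
print(compliant)  # False
print(reason)
```

AIR expresses the same check declaratively and can also record a justification; the sketch only reproduces the compliant/non-compliant outcome.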

Differential Privacy Overview
Goals: minimize the probability of a privacy breach while maximizing statistical accuracy.
The definition requires that, given two similar datasets (for example, datasets differing in a single record), a query function evaluated on them gives similar results with high probability.
It makes no assumptions about the underlying dataset.

Differential Privacy Definition
We say a randomized computation M provides ɛ-differential privacy if, for any two datasets A and B and any set of possible outputs S ⊆ Range(M),
Pr[M(A) ∈ S] ≤ Pr[M(B) ∈ S] × exp(ɛ × |A ⊕ B|),
where A ⊕ B is the symmetric difference of A and B (the records that appear in one dataset but not the other).
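As a numerical sanity check of the definition, the sketch below assumes the Laplace mechanism (the standard choice for numeric queries; the slides do not say which mechanism SPIM will use) applied to a COUNT query. For datasets with |A ⊕ B| = 1, the true counts differ by at most 1. The code evaluates the ratio of the two output densities over a grid of outputs and confirms it never exceeds exp(ɛ); since the bound holds pointwise for densities, it holds for every output set S after integration. The function names are illustrative.

```python
import math


def laplace_pdf(x: float, mu: float, b: float) -> float:
    """Density of the Laplace distribution with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)


def max_density_ratio(count_a: int, count_b: int, epsilon: float) -> float:
    """Largest ratio p_A(x) / p_B(x) over a grid of outputs x for a noisy COUNT.

    COUNT has sensitivity 1, so the Laplace scale is b = 1 / epsilon.
    """
    b = 1 / epsilon
    grid = [i / 100 for i in range(-1000, 1001)]  # outputs from -10 to 10
    return max(laplace_pdf(x, count_a, b) / laplace_pdf(x, count_b, b) for x in grid)


epsilon = 0.5
ratio = max_density_ratio(2, 1, epsilon)  # e.g. true count 2 with a record, 1 without it
print(ratio <= math.exp(epsilon) + 1e-12)  # True
```

The bound is tight: for any output at or above the larger count, the ratio equals exp(ɛ) exactly.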

Differential Privacy in Practice

Limitations of Differential Privacy
Only statistical data is protected.
High variance in the data yields poor query results.
The theory is not always perfect in practice:
◦ It assumes no collusion among users.
◦ Covert-channel attacks [6].
◦ What value of ɛ should be chosen?
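The question of which ɛ to choose is a trade-off that a short calculation makes concrete. Assuming the Laplace mechanism (the standard choice for numeric queries; the slides do not fix a mechanism), a query f with sensitivity Δf receives noise of scale b = Δf/ɛ, whose standard deviation is √2·b. For a COUNT query (Δf = 1):

```latex
b = \frac{\Delta f}{\varepsilon}, \qquad
\sigma = \sqrt{2}\, b = \frac{\sqrt{2}\,\Delta f}{\varepsilon}
\quad\Longrightarrow\quad
\sigma \approx 14.1 \ (\varepsilon = 0.1), \qquad
\sigma \approx 1.41 \ (\varepsilon = 1), \qquad
\sigma \approx 0.71 \ (\varepsilon = 2).
```

A small ɛ gives strong privacy, but on a dataset as small as the four-record salary example a standard deviation of about 14 would swamp a true count of 2. A large ɛ keeps answers accurate but weakens the guarantee in the definition.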

Example, No DP
Name      Salary
Alice     31,000
Bob       47,000
Charlie   20,000
David     21,000
Query: SELECT COUNT(Name) WHERE (Salary < 25000)
Answer: 2

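Example, No DP (David removed)
Name      Salary
Alice     31,000
Bob       47,000
Charlie   20,000
Query: SELECT COUNT(Name) WHERE (Salary < 25000)
Answer: 1
Big difference in answers: comparing the two results reveals that David's salary is below 25,000.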

Example, With DP
Name      Salary
Alice     31,000
Bob       47,000
Charlie   20,000
David     21,000
Query: SELECT COUNT(Name) WHERE (Salary < 25000)
Answer: 2 + noise ≈ 2 (with high probability)

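Example, With DP (David removed)
Name      Salary
Alice     31,000
Bob       47,000
Charlie   20,000
Query: SELECT COUNT(Name) WHERE (Salary < 25000)
Answer: 1 + noise. A noisy answer near 2 is almost as likely here as when David's record is included, so with high probability the two datasets are indistinguishable.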
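The noisy counts in the examples above can be reproduced with a short Python sketch. It assumes the Laplace mechanism (the slides do not name one) and assumes the filter is on salary below 25,000, since the table has no Age column and that filter yields the answers 2 and 1. `true_count` and `noisy_count` are illustrative names. The standard library has no Laplace sampler, so the noise is drawn as the difference of two exponential variates, which is Laplace-distributed.

```python
import random
from typing import Dict

# Toy data from the example slides.
SALARIES: Dict[str, int] = {"Alice": 31_000, "Bob": 47_000, "Charlie": 20_000, "David": 21_000}


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def true_count(data: Dict[str, int], threshold: int = 25_000) -> int:
    """COUNT(Name) WHERE Salary < threshold."""
    return sum(1 for salary in data.values() if salary < threshold)


def noisy_count(data: Dict[str, int], epsilon: float, rng: random.Random) -> float:
    """COUNT has sensitivity 1, so Laplace noise of scale 1/epsilon gives epsilon-DP."""
    return true_count(data) + laplace_noise(1 / epsilon, rng)


rng = random.Random(42)
without_david = {name: s for name, s in SALARIES.items() if name != "David"}
print(true_count(SALARIES), true_count(without_david))  # 2 1
print(noisy_count(SALARIES, 1.0, rng), noisy_count(without_david, 1.0, rng))
```

Repeated runs give answers spread around the true value with standard deviation √2/ɛ, so a single answer does not reliably reveal whether David's record is present.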

Practical Consequences of DP
An individual’s inclusion in the dataset is unlikely to pose a privacy risk.
The answers to queries can still be useful.

Achieving Differential Privacy in RDF
Current differential privacy techniques were developed for relational databases.
As a first approximation, reduce the triplestore to a relational database.
The mechanism will be improved as the project progresses.

Example of RDF-RDBS Reduction
RDF (Turtle):
    :Person1 foaf:name "Alice";
             foaf:member :DIG;
             foaf:age "21";
             foaf:knows :Person2, :Person3.
    :Person2 foaf:name "Bob";
             foaf:member :DIG;
             foaf:knows :Person3.
    :Person3 foaf:name "Charlie";
             foaf:age "22".
Relational table:
    ID       foaf:name  foaf:member  foaf:knows          foaf:age
    Person1  "Alice"    DIG          [Person2, Person3]  "21"
    Person2  "Bob"      DIG          [Person3]           None
    Person3  "Charlie"  None         None                "22"
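The reduction can be sketched in Python as a pivot: one row per subject, one column per predicate, and a predicate that has several objects for some subject (here foaf:knows) becomes a list-valued column. This is an illustrative sketch, not the thesis's implementation. `reduce_to_table` is a hypothetical name, and the triples are hard-coded rather than read from a triplestore.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple, Union

Cell = Optional[Union[str, List[str]]]
Triple = Tuple[str, str, str]

# The triples from the Turtle example, one (subject, predicate, object) per entry.
TRIPLES: List[Triple] = [
    (":Person1", "foaf:name", "Alice"),
    (":Person1", "foaf:member", ":DIG"),
    (":Person1", "foaf:age", "21"),
    (":Person1", "foaf:knows", ":Person2"),
    (":Person1", "foaf:knows", ":Person3"),
    (":Person2", "foaf:name", "Bob"),
    (":Person2", "foaf:member", ":DIG"),
    (":Person2", "foaf:knows", ":Person3"),
    (":Person3", "foaf:name", "Charlie"),
    (":Person3", "foaf:age", "22"),
]


def reduce_to_table(triples: List[Triple]) -> Tuple[List[str], Dict[str, Dict[str, Cell]]]:
    """Pivot triples into one row per subject and one column per predicate."""
    values: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    columns: List[str] = []
    for s, p, o in triples:
        if p not in columns:
            columns.append(p)  # keep first-seen predicate order
        values[s][p].append(o)
    # A predicate with several objects for any subject becomes a list-valued column.
    multi = {p for props in values.values() for p, objs in props.items() if len(objs) > 1}
    table: Dict[str, Dict[str, Cell]] = {}
    for s, props in values.items():
        row: Dict[str, Cell] = {}
        for p in columns:
            objs = props.get(p)
            if not objs:
                row[p] = None  # property absent for this subject
            elif p in multi:
                row[p] = list(objs)
            else:
                row[p] = objs[0]
        table[s] = row
    return columns, table


columns, table = reduce_to_table(TRIPLES)
for subject, row in table.items():
    print(subject, [row[c] for c in columns])
```

Making foaf:knows list-valued for every row, as the slide's table does for Person2, keeps each column's type uniform, which simplifies applying relational aggregate techniques afterwards.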

Proposed Solution: SPARQL Privacy Insurance Module (SPIM)
Build a layer between the user and the endpoint.
Integrate both AIR and differential privacy.
Integrate a credential-checking system.
Modify an existing differential privacy framework for use with triplestores.

Contributions
Complete privacy protection for triplestores.
Differential privacy sensitivity for SPARQL 1.1 aggregate functions, including COUNT, SUM, AVG, MIN, and MAX.
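As an illustration of what sensitivity analysis for these aggregates involves, the sketch below returns the standard global sensitivities under the add/remove-one-record relation used in the definition, assuming every aggregated value is known to lie in a range [lo, hi] (for instance, a bound recorded in the service description). These are textbook bounds, not the thesis's results, and `sensitivity` is a hypothetical name. AVG is left out because it is commonly released as a noisy SUM divided by a noisy COUNT.

```python
def sensitivity(aggregate: str, lo: float, hi: float) -> float:
    """Global sensitivity of a SPARQL aggregate when one record is added or removed.

    Assumes every aggregated value lies in [lo, hi]; MIN and MAX additionally
    assume both neighboring result sets are non-empty.
    """
    name = aggregate.upper()
    if name == "COUNT":
        return 1.0  # one record changes the count by at most 1
    if name == "SUM":
        return max(abs(lo), abs(hi))  # the added/removed value itself
    if name in ("MIN", "MAX"):
        return hi - lo  # e.g. {lo} vs {lo, hi} moves MAX from lo to hi
    # AVG is often released as noisy SUM / noisy COUNT instead (not handled here).
    raise ValueError(f"no sensitivity rule for {aggregate}")


# Hypothetical bound: each hospital reports between 0 and 10,000 visits.
print(sensitivity("SUM", 0, 10_000))  # 10000
```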

System Overview

Components: the SPIM privacy module, TAAC credential checking, AIR rule-based privacy, the differential privacy module, the SPARQL endpoint, the user interface, policy files, user data, and the service description.


TAAC (credential checking) will:
◦ Verify that the user has permission to access the data.
◦ Send data about the user to the central module.

SPIM central module: controls the order of privacy operations and interfaces with the SPARQL endpoint.

AIR: a reasoner that uses rule-based policies to check queries for privacy hazards. It also extracts the information needed for differential privacy.

Policy files: contain the rules for AIR.

Differential privacy module: checks query limits (based on ɛ usage) and applies noise to statistical data.

User data: contains each user's ɛ data.


Service description: contains information used when adding noise.

Miscellaneous:
◦ Interface to the SPARQL endpoint
◦ Transaction file
◦ Improved differential privacy output
◦ Service description generator

Potential Extensions:
◦ Robustness against attacks
◦ Concurrency
◦ Optimization for large systems
◦ Customizable UI
◦ Accountability

Sample Scenario
Triplestore data mining in biotechnology applications.
Biofirm provides data about hospitals in the US.
Alice is a PhD student at MIT who would like to query Biofirm’s database for research purposes. She received permission yesterday and is logging in for the first time.

Preprocessing
Biofirm installs SPIM and runs the service description generation code.
◦ It may need to create the correct interface.
Biofirm makes sure the UI is accessible online.

Sample Compliant Query
Alice would like to know the total number of visits that Boston hospitals received.
    SELECT (SUM(?s) AS ?people)
    WHERE {
      ?h a biofirm:Hospital.
      ?h biofirm:visits ?s.
      ?h biofirm:location geo:Boston.
    }
Epsilon value: 1.0

Alice enters the query into the provided user interface.

TAAC ensures that Biofirm has given Alice access to its triplestore.

The query request arrives at the SPIM central module.

Policyrunner is called to check the query for triple patterns that violate a policy. No violations are found. Since this is Alice’s first time, AIR extracts what type of permissions Alice has.

SPIM creates a profile for Alice, gives her an ɛ value (suppose it is 2.0), and stores it in the triplestore.

SPIM extracts the variables that will yield statistical results; differential privacy will be applied to these.
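One simple way to find the variables that need noise is to look for aggregate projections in the SELECT clause. The regex-based sketch below is an assumption for illustration (a real implementation would use a SPARQL parser), and `statistical_variables` is a hypothetical name. It handles only the `(AGG(?var) AS ?out)` form, without DISTINCT or nested expressions.

```python
import re
from typing import Dict, Tuple

# Matches projections such as "(SUM(?s) as ?people)"; case-insensitive, DISTINCT not handled.
AGGREGATE = re.compile(
    r"\(\s*(COUNT|SUM|AVG|MIN|MAX)\s*\(\s*(\?\w+|\*)\s*\)\s+AS\s+\?(\w+)\s*\)",
    re.IGNORECASE,
)


def statistical_variables(query: str) -> Dict[str, Tuple[str, str]]:
    """Map each aggregate output variable to (aggregate function, aggregated argument)."""
    return {out: (func.upper(), arg) for func, arg, out in AGGREGATE.findall(query)}


alice_query = """SELECT (SUM(?s) as ?people) WHERE{ ?h a biofirm:Hospital.
  ?h biofirm:visits ?s. ?h biofirm:location geo:Boston. }"""
print(statistical_variables(alice_query))  # {'people': ('SUM', '?s')}
```

Knowing both the function (SUM) and the aggregated variable (?s, bound through biofirm:visits) is what lets the differential privacy module look up the right sensitivity.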

The differential privacy module ensures that answering the query will not exceed Alice's given epsilon value.

This is Alice’s first time: her epsilon value is 2.0 and the epsilon for this query is 1.0, so everything looks good.
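The budget check rests on sequential composition: answering queries with ɛ₁, ɛ₂, … is, in total, (ɛ₁ + ɛ₂ + …)-differentially private, so a per-user budget can simply be decremented. Below is a minimal sketch, assuming budgets are kept in memory rather than in the triplestore as the slides propose; `EpsilonBudget` is a hypothetical name.

```python
from typing import Dict


class EpsilonBudget:
    """Per-user privacy budget; epsilons of successive queries add up (sequential composition)."""

    def __init__(self, initial: float) -> None:
        self.initial = initial
        self.remaining: Dict[str, float] = {}

    def allows(self, user: str, epsilon: float) -> bool:
        # A first-time user gets a fresh profile with the initial budget.
        remaining = self.remaining.setdefault(user, self.initial)
        return 0 < epsilon <= remaining

    def charge(self, user: str, epsilon: float) -> None:
        if not self.allows(user, epsilon):
            raise ValueError("epsilon budget exceeded")
        self.remaining[user] -= epsilon


budget = EpsilonBudget(initial=2.0)  # Alice's supposed budget in the scenario
print(budget.allows("alice", 1.0))   # True: first query, epsilon 1.0 <= 2.0
budget.charge("alice", 1.0)
print(budget.remaining["alice"])     # 1.0
print(budget.allows("alice", 1.5))   # False: only 1.0 left
```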

The query is sent to the endpoint, and the results are received.

The differential privacy module adds noise to the appropriate fields and updates the epsilon values.
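A minimal sketch of the noise step, assuming the Laplace mechanism and assuming the per-field sensitivities have already been derived (for example, from value bounds in the service description). Only the statistical fields are perturbed, and the query's ɛ is split evenly across them so that, by sequential composition, the whole result row costs ɛ. `noisy_results` and the bound of 10,000 visits are hypothetical.

```python
import random
from typing import Dict, Union

Row = Dict[str, Union[str, float]]


def noisy_results(row: Row, sensitivities: Dict[str, float], epsilon: float,
                  rng: random.Random) -> Row:
    """Add Laplace noise only to the statistical fields listed in `sensitivities`.

    The query's epsilon is split evenly across those fields, so by sequential
    composition the whole row is released with epsilon in total.
    """
    noisy = dict(row)  # non-statistical fields are copied unchanged
    if not sensitivities:
        return noisy
    share = epsilon / len(sensitivities)
    for var, sensitivity in sensitivities.items():
        if sensitivity == 0:
            continue  # a zero-sensitivity field needs no noise
        scale = sensitivity / share
        # Difference of two exponentials with mean `scale` is Laplace(0, scale).
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        noisy[var] = float(row[var]) + noise
    return noisy


# Hypothetical: the service description bounds visits per hospital by 10,000,
# so the SUM bound to ?people has sensitivity 10,000.
row = {"city": "Boston", "people": 52_000.0}  # made-up endpoint result
print(noisy_results(row, {"people": 10_000.0}, 1.0, random.Random(7)))
```

With these made-up bounds the noise scale is 10,000 visits, which shows why tight bounds in the service description matter for useful answers.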

SPIM is ready to return the results.

Alice receives the results.
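The order of operations in the walkthrough above can be condensed into a short sketch of the central module. Every component here is a hypothetical stand-in passed in as a function: `taac_allows` for TAAC, `air_compliant` for AIR, `budget_ok` and `charge` for the differential privacy module's ɛ bookkeeping, `run_on_endpoint` for the SPARQL endpoint, and `add_noise` for the noise step. The sketch only shows where a query can be rejected and in what order the steps run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Row = Dict[str, Union[str, float]]


@dataclass
class SpimPipeline:
    """Hypothetical sketch of the order of privacy operations in the SPIM central module."""

    taac_allows: Callable[[str], bool]       # stand-in for TAAC credential checking
    air_compliant: Callable[[str], bool]     # stand-in for AIR's rule-based check
    budget_ok: Callable[[str, float], bool]  # DP module: would this query exceed the budget?
    run_on_endpoint: Callable[[str], Row]    # stand-in for the SPARQL endpoint
    add_noise: Callable[[Row, float], Row]   # DP module: perturb statistical fields
    charge: Callable[[str, float], None]     # DP module: update the user's epsilon

    def handle(self, user: str, query: str, epsilon: float) -> Row:
        if not self.taac_allows(user):
            raise PermissionError("user has not been granted access to the triplestore")
        if not self.air_compliant(query):
            raise PermissionError("query is non-compliant with a rule-based policy")
        if not self.budget_ok(user, epsilon):
            raise PermissionError("query would exceed the user's epsilon budget")
        results = self.run_on_endpoint(query)  # only reached after every check passes
        noisy = self.add_noise(results, epsilon)
        self.charge(user, epsilon)  # budget is updated once noise has been applied
        return noisy


charged = []
pipeline = SpimPipeline(
    taac_allows=lambda user: user == "alice",
    air_compliant=lambda query: "type:F" not in query,
    budget_ok=lambda user, eps: eps <= 2.0,
    run_on_endpoint=lambda query: {"people": 100.0},  # made-up result for the toy run
    add_noise=lambda row, eps: row,  # identity placeholder, no real noise
    charge=lambda user, eps: charged.append((user, eps)),
)
print(pipeline.handle("alice", "SELECT (SUM(?s) AS ?people) WHERE { ?h biofirm:visits ?s }", 1.0))  # {'people': 100.0}
print(charged)  # [('alice', 1.0)]
```

Injecting each component as a function keeps the ordering logic testable on its own, which fits the proposal's goal of integrating with existing SPARQL endpoints rather than replacing them.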

Summary
The system will combine rule-based privacy with differential privacy.
Develop differential privacy techniques for semantic web data.
Make the privacy module friendly to both clients and administrators.

References
[1] k-anonymity
[2] PINQ
[3] AIRAVAT
[4] AIR
[5] AIR policy example: http://dig.csail.mit.edu/2009/IARPA-PIR/usecase1/generic-policies.n3
[6] Differential Privacy Under Fire