Data Leakage Detection by Akshay Vishwanathan (0801003) Joseph George (0801027) S. Prasanth (0801069) Guided by: Ms. Krishnapriya.

Introduction In the course of doing business, sometimes sensitive data must be handed over to supposedly trusted third parties. For example, a hospital may give patient records to researchers who will devise new treatments. We call the owner of the data the distributor and the supposedly trusted third parties the agents. Our goal is to detect when the distributor’s sensitive data has been leaked by agents, and if possible to identify the agent that leaked the data.

Existing System We develop a model for assessing the “guilt” of agents. We also consider the option of adding “fake” objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members. If it turns out that an agent was given one or more fake objects that were leaked, then the distributor can be more confident that the agent was guilty.

Problem Definition The distributor’s data allocation to agents has one constraint and one objective. The distributor’s constraint is to satisfy agents’ requests, by providing them with the number of objects they request or with all available objects that satisfy their conditions. The objective is to be able to detect an agent who leaks any portion of the data.

Problem Setup and Notation
Entities and agents: A distributor owns a set T = {t1, ..., tm} of valuable data objects. The distributor wants to share some of the objects with a set of agents U1, U2, ..., Un, but does not wish the objects to be leaked to other third parties.
Guilty agents: Suppose that after giving objects to agents, the distributor discovers that a set S ⊆ T has leaked. This means that some third party, called the target, has been caught in possession of S.
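To make the notation concrete, the sketch below models these entities in C# (the project's stated language). The type and member names are illustrative assumptions, not part of the original design.

using System.Collections.Generic;

// Illustrative types for the notation above; names are assumptions.
class DataObject
{
    public int Id;          // identifies object ti in T
    public bool IsFake;     // true for distributor-generated decoys (see later slides)
}

class Agent
{
    public string Name;
    public HashSet<DataObject> Ri = new HashSet<DataObject>();  // objects given to Ui
}

class Distributor
{
    public List<DataObject> T = new List<DataObject>();  // T = {t1, ..., tm}
    public List<Agent> Agents = new List<Agent>();       // agents U1, ..., Un
}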

Agent Guilt Model To compute the probability Pr{Gi|S} that agent Ui is guilty given the leaked set S, we need an estimate for the probability that the values in S can be “guessed” by the target.
Assumption 1. For all t, t′ ∈ S such that t ≠ t′, the provenance of t is independent of the provenance of t′.
Assumption 2. An object t ∈ S can only be obtained by the target in one of two ways: a single agent Ui leaked t from its own set Ri, or the target guessed (or obtained through other means) t without the help of any of the n agents.
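Under these assumptions, the underlying paper (Papadimitriou and Garcia-Molina, reference 1) derives the guilt probability as Pr{Gi|S} = 1 − ∏ over t ∈ S ∩ Ri of (1 − (1 − p)/|Vt|), where p is the probability that the target guesses an object on its own and Vt is the set of agents that received object t. A minimal C# sketch of this computation, reusing the illustrative types introduced earlier:

using System.Collections.Generic;
using System.Linq;

static class GuiltModel
{
    // Pr{Gi|S}: probability that agent ui is guilty given the leaked set S.
    // p: probability the target guessed an object without any agent's help.
    public static double PrGuilty(Agent ui, IEnumerable<DataObject> S,
                                  List<Agent> allAgents, double p)
    {
        double prNotGuilty = 1.0;
        foreach (DataObject t in S)
        {
            if (!ui.Ri.Contains(t)) continue;                 // ui never held t
            int vt = allAgents.Count(a => a.Ri.Contains(t));  // |Vt|
            prNotGuilty *= 1.0 - (1.0 - p) / vt;              // ui did not leak this t
        }
        return 1.0 - prNotGuilty;
    }
}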

Disadvantages of the Existing System The fake objects act, in effect, as a watermark for the entire set. If an agent learns of the existence of a fake object, he can easily remove it using widely available software that strips watermarks from data. In addition, there is no way to notify the distributor when the data is leaked.

Proposed System We present algorithms for distributing objects to agents in a way that improves our chances of identifying a leaker. We also design a system in which an e-mail alert is sent to the distributor when a fake object is downloaded by another agent.

Advantages of the Proposed System It is possible to assess the likelihood that an agent is responsible for a leak, based on the overlap of his data with the leaked data and with the data of other agents. The algorithms we have presented implement data distribution strategies that can improve the distributor’s chances of identifying a leaker.

Data Allocation Problem The main focus of the proposed system is the data allocation problem: how can the distributor “intelligently” give data to agents in order to improve the chances of detecting a guilty agent? The two types of requests we handle are sample and explicit. Fake objects are objects generated by the distributor that are not in the set T but are designed to look like real objects; they are distributed to agents together with the T objects in order to increase the chances of detecting agents that leak data.

Explicit Data Requests Explicit request Ri = EXPLICIT(T, condi): agent Ui receives all objects in T that satisfy condi.
Algorithm 1. Allocation for Explicit Data Requests (EF)
Input: R1, ..., Rn; cond1, ..., condn; b1, ..., bn; B
Output: R1, ..., Rn, F1, ..., Fn
1: R <- ∅  // Agents that can receive fake objects
2: for i = 1, ..., n do
3:   if bi > 0 then
4:     R <- R ∪ {i}
5:     Fi <- ∅
6: while B > 0 do
7:   i <- SELECTAGENT(R, R1, ..., Rn)
8:   f <- CREATEFAKEOBJECT(Ri, Fi, condi)
9:   Ri <- Ri ∪ {f}
10:  Fi <- Fi ∪ {f}
11:  bi <- bi - 1
12:  if bi = 0 then
13:    R <- R \ {Ri}
14:  B <- B - 1
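A hedged C# rendering of Algorithm 1, reusing the illustrative types from earlier. SELECTAGENT and CREATEFAKEOBJECT are taken as delegates, since the slides do not fix their implementations (the paper considers random and optimizing variants):

using System;
using System.Collections.Generic;

static class ExplicitAllocator
{
    // R[i]: objects allocated to agent i per its condition; b[i]: agent i's
    // fake-object limit; B: total fake-object budget. Returns F[i], the fakes
    // handed to each agent (each fake is also added to R[i]).
    public static List<HashSet<DataObject>> AddFakeObjects(
        List<HashSet<DataObject>> R, int[] b, int B, string[] cond,
        Func<List<int>, int> selectAgent,
        Func<HashSet<DataObject>, HashSet<DataObject>, string, DataObject> createFake)
    {
        int n = R.Count;
        var eligible = new List<int>();              // agents that may receive fakes
        var F = new List<HashSet<DataObject>>();
        for (int i = 0; i < n; i++)
        {
            F.Add(new HashSet<DataObject>());
            if (b[i] > 0) eligible.Add(i);
        }
        while (B > 0 && eligible.Count > 0)          // stop when the budget runs out
        {
            int i = selectAgent(eligible);                   // SELECTAGENT
            DataObject f = createFake(R[i], F[i], cond[i]);  // CREATEFAKEOBJECT
            R[i].Add(f);
            F[i].Add(f);
            if (--b[i] == 0) eligible.Remove(i);     // agent i reached its limit
            B--;
        }
        return F;
    }
}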

Sample Data Requests Sample request Ri = SAMPLE(T, mi): any subset of mi records from T can be given to Ui.
Algorithm 2. Allocation for Sample Data Requests (SF)
Input: m1, ..., mn, |T|  // Assuming mi <= |T|
Output: R1, ..., Rn
1: a <- 0|T|  // a[k]: number of agents who have received object tk
2: R1 <- ∅, ..., Rn <- ∅
3: remaining <- Σ i=1..n mi
4: while remaining > 0 do
5:   for all i = 1, ..., n : |Ri| < mi do
6:     k <- SELECTOBJECT(i, Ri)  // May also use additional parameters
7:     Ri <- Ri ∪ {tk}
8:     a[k] <- a[k] + 1
9:     remaining <- remaining - 1
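A matching C# sketch of Algorithm 2. SELECTOBJECT is again pluggable (the paper gives random and overlap-minimizing variants) and is assumed to return an object the agent does not already hold:

using System;
using System.Collections.Generic;

static class SampleAllocator
{
    // T: the distributor's objects; m[i]: the number of objects agent i requested.
    // selectObject(i, Ri, a) returns the index k of the next object for agent i;
    // it may consult a[k] (how many agents already hold tk) to minimize overlap.
    public static List<HashSet<DataObject>> Allocate(
        IList<DataObject> T, int[] m,
        Func<int, HashSet<DataObject>, int[], int> selectObject)
    {
        int n = m.Length;
        var a = new int[T.Count];                    // a[k]: agents holding tk
        var R = new List<HashSet<DataObject>>();
        for (int i = 0; i < n; i++) R.Add(new HashSet<DataObject>());

        int remaining = 0;
        foreach (int mi in m) remaining += mi;       // total objects still to hand out

        while (remaining > 0)
        {
            for (int i = 0; i < n; i++)
            {
                if (R[i].Count >= m[i]) continue;    // agent i's request is satisfied
                int k = selectObject(i, R[i], a);    // SELECTOBJECT
                R[i].Add(T[k]);
                a[k]++;
                remaining--;
            }
        }
        return R;
    }
}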

Software Requirements
Language: C#.NET
Technology: ASP.NET
IDE: Visual Studio 2008
Operating System: Microsoft Windows XP SP2
Backend: Microsoft SQL Server 2005

Hardware Requirements
Processor: Intel Pentium or higher
RAM: 512 MB (minimum)
Hard Disk: 40 GB

Conclusion In a perfect world there would be no need to hand over sensitive data to agents that may unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a perfect world we could watermark each object so that we could trace its origins with absolute certainty. In practice, however, data must often be shared with agents that are not fully trusted, and watermarks can be destroyed. The guilt model, allocation algorithms, and fake objects presented here let the distributor assess the likelihood that a particular agent leaked the data.

References
1. P. Papadimitriou and H. Garcia-Molina, “Data Leakage Detection,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 1, pp. 51-63, 2011.
2. P. Buneman and W.-C. Tan, “Provenance in Databases,” Proc. ACM SIGMOD, 2007.
3. R. Agrawal and J. Kiernan, “Watermarking Relational Databases,” Proc. 28th Int’l Conf. Very Large Data Bases (VLDB ’02), VLDB Endowment, 2002.
4. B. Mungamuru and H. Garcia-Molina, “Privacy, Preservation and Performance: The 3 P’s of Distributed Data Management,” technical report, Stanford Univ., 2008.

Thank You