Published by Darlene Rose. Modified over 9 years ago.
1
Privacy Preserving Schema and Data Matching
Scannapieco, Bertino, Figotin and Elmagarmid
Presented by: Vidhi Thapa
2
INTRODUCTION
Record matching: the process of identifying records that represent the same real-world entity.
It can be executed within a single source or across sources.
Goal: record matching that preserves the privacy of both data and schema.
3
RECORD MATCHING
Record matching involves sharing and integrating data while protecting the privacy of that data.
Two major innovations: approximate matching and awareness of schema information.
4
RECORD MATCHING: COMPONENTS
Embedding: embed records in a Euclidean space (method used: SparseMap).
Comparison functions: edit distance.
Matching decision rule: classify records as a match or a non-match.
5
EXAMPLE: EDIT DISTANCE
e("Virginia", "Vermont") = 5
Virginia → Verginia → Verminia → Vermonia → Vermonta → Vermont
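The value on this slide can be checked with a short dynamic-programming routine. The sketch below assumes the standard Levenshtein definition (unit-cost insertions, deletions, and substitutions); the function name `edit_distance` is ours, not the authors'.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(edit_distance("Virginia", "Vermont"))  # 5, as on the slide
```

The five edits listed on the slide (four substitutions and one deletion) are one optimal edit script for this pair.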
6
HYPOTHESES
Parties P and Q store the records to be matched in the relations R_P(A_1, …, A_n) and R_Q(B_1, …, B_n), respectively. Two hypotheses are considered:
1. The relations have identical schemas.
2. The relations have possible schema-level conflicts.
After record matching between R_P and R_Q, P will know only a set P_Match, consisting of the records in R_P that match records in R_Q. Similarly, Q will know only the set Q_Match.
7
SECURE DATA MATCHING
Pairs of records are compared by means of a comparison function.
A third party is introduced to assure privacy.
SparseMap embeds objects from a metric space using reference sets.
Number of reference subsets = ⌊log_2 N⌋^2
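To make the role of reference sets concrete, here is a minimal sketch of a Lipschitz-style embedding, the family of techniques SparseMap belongs to. It is not SparseMap itself: the distance-approximation and greedy-sampling heuristics are left out, and the size schedule (⌊log_2 N⌋ random subsets for each size 2^1, 2^2, …) is an assumption chosen so that the total count is ⌊log_2 N⌋^2. All function names are hypothetical.

```python
import random


def edit_distance(a, b):
    # Compact Levenshtein distance, used as the metric d.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def build_reference_sets(objects, seed=0):
    """Draw floor(log2 N)^2 random reference subsets (assumed size schedule)."""
    n = len(objects)
    levels = n.bit_length() - 1  # floor(log2 n) for n >= 1
    rng = random.Random(seed)
    sets = []
    for j in range(1, levels + 1):
        for _ in range(levels):
            sets.append(rng.sample(objects, 2 ** j))
    return sets


def embed(obj, reference_sets, dist=edit_distance):
    """Coordinate i is d(obj, S_i): the distance to the nearest member of S_i."""
    return [min(dist(obj, r) for r in ref) for ref in reference_sets]


names = [f"name{i:02d}" for i in range(16)]  # hypothetical data, N = 16
refs = build_reference_sets(names)
print(len(refs))  # 16 subsets, i.e. floor(log2 16)^2
print(embed("name03", refs))
```

By the triangle inequality, no single coordinate can differ between two objects by more than their original distance, which is what makes the embedded vectors usable for approximate matching.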
8
HEURISTICS
Distance approximation. Input: object o, set S_i. Output: approximate d(o, S_i).
Greedy sampling. Input: m coordinates. Output: the t <= m most discriminating coordinates.
9
DATA MATCHING PROTOCOL
Assume parties P and Q store the records to be matched in the relations R_P(A_1, …, A_n) and R_Q(B_1, …, B_n), respectively.
The third-party-based protocol consists of the following three phases:
Phase 1: Setting of the embedding space
Phase 2: Embedding of R_P and R_Q values
Phase 3: Comparison to decide matching records
10
Phase 1
11
Phase 2
12
ILLUSTRATION: STRESS
E.g., Academic (8.0, 5.0, 7.0, 7.0) and usefull (6.0, 6.0, 6.0, 7.0)
Using the 1st coordinate: 0.5625
Using the 2nd coordinate: 0.7656
Using the 3rd coordinate: 0.7656
Using the 4th coordinate: 1.0
Choose the 1st coordinate.
Using the 1st and 2nd coordinates: 0.5191
Using the 1st and 3rd coordinates: 0.5191
Using the 1st and 4th coordinates: 0.5625
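The stress values on this slide can be reproduced under two assumptions that the slide does not state explicitly: the original distance between "Academic" and "usefull" is their edit distance, 8, and the stress of a single pair is ((d' - d) / d)^2, where d' is the Euclidean distance over the chosen coordinates. The sketch below also runs a greedy selection over this one pair; `stress` and `greedy_select` are hypothetical names, and a real run would aggregate stress over many pairs.

```python
import math


def stress(u, v, coords, original_distance):
    """Single-pair stress: squared relative error of the embedded distance."""
    embedded = math.sqrt(sum((u[c] - v[c]) ** 2 for c in coords))
    return ((embedded - original_distance) / original_distance) ** 2


def greedy_select(u, v, original_distance, t):
    """Repeatedly add the coordinate that gives the lowest stress."""
    chosen = []
    remaining = list(range(len(u)))
    while len(chosen) < t and remaining:
        best = min(remaining,
                   key=lambda c: stress(u, v, chosen + [c], original_distance))
        chosen.append(best)
        remaining.remove(best)
    return chosen


academic = (8.0, 5.0, 7.0, 7.0)
usefull = (6.0, 6.0, 6.0, 7.0)
d = 8  # assumed original distance: edit distance of "Academic" and "usefull"

# Coordinates are 0-based here, so [0] is the slide's "1st coordinate".
for coords in ([0], [1], [2], [3], [0, 1], [0, 2], [0, 3]):
    print(coords, round(stress(academic, usefull, coords, d), 4))

print(greedy_select(academic, usefull, d, t=2))
```

With these assumptions the printed values agree with the slide. The second greedy step is a tie between the 2nd and 3rd coordinates, and this sketch simply keeps the first one it encounters.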
13
Phase 3
Given a vector v in P_str and a vector w in Q_str, the Euclidean distance between them is calculated.
The decision rule is applied to all record comparisons: if it evaluates to true, the records of P_str and Q_str are inserted into the two sets P_Match and Q_Match, respectively.
The final sets are sent to the respective parties.
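A minimal sketch of the third party's comparison step follows. The slide does not specify the decision rule, so a simple distance threshold is assumed here; the record identifiers, vectors, and the threshold value are all hypothetical.

```python
import math


def match_records(p_vectors, q_vectors, threshold):
    """Compare every embedded P record with every embedded Q record.

    Assumed decision rule: a pair matches when its Euclidean
    distance is at most `threshold`.
    """
    p_match, q_match = set(), set()
    for p_id, v in p_vectors.items():
        for q_id, w in q_vectors.items():
            if math.dist(v, w) <= threshold:
                p_match.add(p_id)
                q_match.add(q_id)
    return p_match, q_match


# Hypothetical embedded records held by the third party.
p_str = {"p1": (8.0, 5.0), "p2": (1.0, 1.0)}
q_str = {"q1": (8.0, 6.0), "q2": (20.0, 20.0)}

p_match, q_match = match_records(p_str, q_str, threshold=1.5)
print(p_match, q_match)  # {'p1'} {'q1'}
```

Only `p_match` would be returned to P and only `q_match` to Q, which mirrors the requirement that each party learns just its own matching records.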
14
SECURE SCHEMA MATCHING
S_W: global schema owned by third party W
L_W: language
α_W: alphabet
S_P and S_Q are the source schemas owned by the two parties.
If S_W is Customer(Name, DateofBirth, ResidenceAddress) and S_P is Cust(FirstName, LastName, DateofBirth), the mapping is concatenate(Cust.FirstName, Cust.LastName) = Customer.Name.
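The mapping in this example can be sketched as a small function that rewrites a record of P's local schema in terms of the global schema. The separator used in the concatenation and the sample values are assumptions; ResidenceAddress has no counterpart in Cust and is therefore omitted.

```python
def map_cust_to_customer(cust):
    """Express a Cust record using attribute names of the global Customer schema."""
    return {
        # concatenate(Cust.FirstName, Cust.LastName) = Customer.Name
        "Name": f'{cust["FirstName"]} {cust["LastName"]}',  # space separator is assumed
        "DateofBirth": cust["DateofBirth"],
    }


record = {"FirstName": "Ada", "LastName": "Lovelace", "DateofBirth": "1815-12-10"}
print(map_cust_to_customer(record))
```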
15
SECURE SCHEMA MATCHING (contd)
P generates S_P'(D_1, …, D_s) from the mapping of S_P with S_W(D_1, …, D_L).
Q generates S_Q'(D_1, …, D_x) from the mapping of S_Q with S_W(D_1, …, D_L).
P and Q negotiate a secret key k, the embedding parameters (Lx, N, dist), and a hash function h.
P sends H_P = (h(D_1, k), …, h(D_s, k)) to W.
Q sends H_Q = (h(D_1, k), …, h(D_x, k)) to W.
W computes the intersection H_P ∩ H_Q.
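The exchange above can be sketched with a keyed hash. The slides only say "hash function h" and a secret key k, so HMAC-SHA256 is an assumption, as are the key value and the attribute lists used in the example.

```python
import hashlib
import hmac


def keyed_hash(attribute: str, key: bytes) -> str:
    """h(D, k): keyed hash of an attribute name (HMAC-SHA256 is assumed)."""
    return hmac.new(key, attribute.encode("utf-8"), hashlib.sha256).hexdigest()


def hash_schema(attributes, key):
    """What a party sends to W: the hashed attribute names of its mapped schema."""
    return [keyed_hash(a, key) for a in attributes]


def third_party_intersection(h_p, h_q):
    """W only sees hashes and computes H_P ∩ H_Q."""
    return set(h_p) & set(h_q)


key = b"secret-negotiated-by-P-and-Q"  # hypothetical shared key, unknown to W
h_p = hash_schema(["Name", "DateofBirth"], key)        # from S_P'
h_q = hash_schema(["Name", "ResidenceAddress"], key)   # from S_Q'

common = third_party_intersection(h_p, h_q)
print(len(common))  # 1: only the hash of "Name" is shared
```

Because W does not hold k, it can tell which hashed attributes coincide but cannot recompute the hash of a guessed attribute name to test it.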
16
SECURITY ANALYSIS
Length of the database
Database size
Set of matching records
Set of matching attributes
Number of matching attributes
17
EXPERIMENTAL EVALUATION
19
CONCLUSION
Privacy-preserving record matching between two parties that can have different schemas.
Requires privacy at the schema level.
Privacy is obtained by embedding records in a vector space.
Applications: DNA sequences, images, proteins, etc.