Privacy Preserving Subgraph Matching on Large Graphs in Cloud

Presentation transcript:

Privacy Preserving Subgraph Matching on Large Graphs in Cloud. Zhao Chang*, Lei Zou†, Feifei Li* (*University of Utah, USA; †Peking University, China). Hi, my name is Zhao Chang, and I am from the University of Utah. Today I will introduce our paper, Privacy Preserving Subgraph Matching on Large Graphs in Cloud.

Outline: Background; K-Automorphism; Subgraph Matching; Experimental Results. Here is a brief outline.

Background. Subgraph matching on large graphs: the problem is subgraph matching, and the solution is cloud computing. Data privacy: the problem is the untrusted cloud, and the solution is to anonymize the original data graph and the query graphs. The problem we need to solve is performing subgraph matching on large graphs. A public cloud can provide subgraph matching as a service. However, uploading the data graph and the query graphs directly to an untrusted cloud would leak private data. Our solution is therefore to anonymize the data graph and the query graphs before uploading them to the cloud.

K-Automorphism: Original Graph G. We use the k-automorphism privacy model to anonymize graphs, which was proposed in earlier work published at VLDB 2009. Here is the original data graph G, taken from a professional social network. Each node has a node type and node attributes. P means the node is a person, S means it is a school, and C means it is a company. An edge between two P nodes is a spouse relation, an edge between a P node and a C node is a works-at relation, and an edge between a P node and an S node is a graduated-from relation. For example, p1 and p2 are a married couple who graduated from the same school, UIUC, and work at the same company, Google.

K-Automorphism: Original Graph G and Original Query Q. Here is an original query graph Q. We need to find the matches of Q on G. In each match M, every vertex must have the same vertex type as the corresponding vertex in Q and must contain that vertex's attributes. For example, here is a match of Q on G.
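The matching condition just described (same vertex type, query attributes contained in the data vertex, and every query edge present) can be sketched in Python. This is a brute-force illustration only, not the algorithm from the paper; the vertex names, the attributes, and the helper names `vertex_ok` and `find_matches` are assumptions made up for this example.

```python
from itertools import permutations

# Toy data graph: vertex -> (type, attributes). All values are hypothetical.
G_VERTICES = {
    "p1": ("P", {"engineer"}),
    "p2": ("P", {"engineer"}),
    "s1": ("S", {"UIUC"}),
    "c1": ("C", {"Google"}),
}
# Undirected edges: spouse, graduated-from, and works-at relations.
G_EDGES = {frozenset(e) for e in [("p1", "p2"), ("p1", "s1"), ("p2", "s1"),
                                  ("p1", "c1"), ("p2", "c1")]}

# Toy query: a person who graduated from UIUC and works at Google.
Q_VERTICES = {"q1": ("P", set()), "q2": ("S", {"UIUC"}), "q3": ("C", {"Google"})}
Q_EDGES = [("q1", "q2"), ("q1", "q3")]


def vertex_ok(q_vertex, g_vertex):
    """Same type, and the data vertex contains all attributes of the query vertex."""
    q_type, q_attrs = Q_VERTICES[q_vertex]
    g_type, g_attrs = G_VERTICES[g_vertex]
    return q_type == g_type and q_attrs <= g_attrs


def find_matches():
    """Try every injective assignment of query vertices to data vertices."""
    q_names = list(Q_VERTICES)
    matches = []
    for image in permutations(G_VERTICES, len(q_names)):
        mapping = dict(zip(q_names, image))
        if not all(vertex_ok(q, g) for q, g in mapping.items()):
            continue
        if all(frozenset((mapping[u], mapping[v])) in G_EDGES for u, v in Q_EDGES):
            matches.append(mapping)
    return matches


MATCHES = find_matches()
```

In this toy graph both p1 and p2 satisfy the query, so `MATCHES` holds two mappings.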

K-Automorphism: K-Automorphic Graph Gk (K = 2) and Label Correspondence Table (LCT). To preserve data privacy, the client transforms the original data graph into a k-automorphic graph Gk. We need to protect both label privacy and structural privacy. For label privacy, we combine different labels into label groups using k-anonymity. The label correspondence table records which labels belong to each label group. For structural privacy, the k-automorphic graph Gk can be divided into k identical partitions, so each vertex has at least k-1 other symmetric vertices.

K-Automorphism: K-Automorphic Graph Gk (K = 2), Original Query Q, Outsourced Query Qo, and Label Correspondence Table (LCT). For the query graph, the client uses the same label combination to turn the original query Q into the outsourced query Qo.
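A minimal sketch of how a label correspondence table could be applied to query labels, assuming the LCT is a simple mapping from label group to original labels. The group names "A" and "B", the labels "MIT" and "Microsoft", and the function `anonymize_labels` are hypothetical and not taken from the slides.

```python
# Hypothetical LCT: label group -> original labels it covers.
# Each group holds at least k = 2 labels, in the spirit of k-anonymity.
LCT = {
    "A": {"UIUC", "MIT"},
    "B": {"Google", "Microsoft"},
}

# Reverse lookup built from the LCT: original label -> label group.
LABEL_TO_GROUP = {label: group for group, labels in LCT.items() for label in labels}


def anonymize_labels(vertex_labels):
    """Replace every original label on every vertex with its label group."""
    return {v: {LABEL_TO_GROUP[label] for label in labels}
            for v, labels in vertex_labels.items()}


# Labels of the original query Q (toy values) and of the outsourced query Qo.
QUERY_LABELS = {"q2": {"UIUC"}, "q3": {"Google"}}
OUTSOURCED_QUERY_LABELS = anonymize_labels(QUERY_LABELS)
```

Because the data graph and the query go through the same table, a query label and the matching data label always land in the same group, at the cost of extra false positives that the client filters later.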

K-Automorphism: workflow between client and cloud. The client holds G (Original Graph) and Q (Original Query). The cloud holds Gk (K-Automorphic Graph) and Qo (Anonymized Query). We denote the matches of any query Q on any data graph G by R(Q, G). The cloud computes R(Qo, Gk) and returns it to the client, which uses a hash index to obtain R(Q, G).

K-Automorphism, Optimization 1: upload Go rather than Gk to the cloud; save cloud storage. Outsourced Graph Go (K = 2). However, if the client uploads the whole k-automorphic graph Gk to the cloud, it needs a large amount of cloud storage. Our Optimization 1 is therefore to upload a small subset Go rather than Gk. Go includes only the first block of Gk and the one-hop neighbors of its vertices. This greatly reduces cloud storage.

K-Automorphism, Optimization 1: Outsourced Graph Go (K = 2), Outsourced Query Qo, and Alignment Vertex Table (AVT). The client also needs to upload the outsourced query Qo and the alignment vertex table to the cloud.
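A small sketch of how Go might be cut out of Gk, assuming the AVT is a list of rows whose first column is the first block, and assuming Go keeps every edge with at least one endpoint in that block. The toy graph, the vertex names, and `outsourced_graph` are hypothetical.

```python
# Hypothetical AVT for k = 2: each row is a vertex and its symmetric counterpart.
AVT = [("u1", "w1"), ("u2", "w2"), ("u3", "w3")]

# Toy 2-automorphic graph Gk: swapping u_i and w_i maps the edge set onto itself.
GK_EDGES = {("u1", "u2"), ("u2", "u3"), ("w1", "w2"), ("w2", "w3"),
            ("u3", "w1"), ("w3", "u1")}


def outsourced_graph(avt, gk_edges):
    """Keep the first block of Gk plus the one-hop neighbors of its vertices."""
    first_block = {row[0] for row in avt}
    # Assumption: an edge is kept when at least one endpoint is in the first block.
    go_edges = {e for e in gk_edges if e[0] in first_block or e[1] in first_block}
    go_vertices = first_block | {v for e in go_edges for v in e}
    return go_vertices, go_edges


GO_VERTICES, GO_EDGES = outsourced_graph(AVT, GK_EDGES)
```

On this toy input Go keeps 4 of the 6 edges; the dropped edges can be reconstructed from the AVT because they are images of kept edges.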

Subgraph Matching, Star Matching: Outsourced Query Qo, Query Decomposition (Star S1, Star S2), Outsourced Graph Go. The cloud then runs the subgraph matching algorithm on the outsourced graph. Given an outsourced query Qo, the cloud first performs query decomposition, dividing Qo into several stars. The cloud then performs star matching for each star query on the outsourced graph Go.
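The two steps above, query decomposition into stars and star matching, can be sketched as follows. The greedy choice of star centers is an assumption for illustration (the talk does not state how centers are chosen), and the toy query, graph, label groups, and function names are all hypothetical.

```python
from itertools import permutations


def decompose_into_stars(query_edges):
    """Greedy sketch: repeatedly take the vertex covering the most remaining edges."""
    remaining = set(query_edges)
    stars = []
    while remaining:
        degree = {}
        for u, v in remaining:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        # Ties are broken by vertex name so the result is deterministic.
        center = max(sorted(degree), key=degree.get)
        leaves = sorted(v if u == center else u
                        for u, v in remaining if center in (u, v))
        stars.append((center, leaves))
        remaining = {e for e in remaining if center not in e}
    return stars


def match_star(star, query_labels, graph_labels, adjacency):
    """Match the center by label group, then assign distinct neighbors to the leaves."""
    center, leaves = star
    results = []
    for g_center, label in graph_labels.items():
        if label != query_labels[center]:
            continue
        neighbors = sorted(adjacency.get(g_center, ()))
        for image in permutations(neighbors, len(leaves)):
            if all(graph_labels[g] == query_labels[q] for q, g in zip(leaves, image)):
                results.append({center: g_center, **dict(zip(leaves, image))})
    return results


# Toy outsourced query and graph with label groups "P", "A", "B".
QUERY_EDGES = [("q1", "q2"), ("q1", "q3"), ("q3", "q4")]
QUERY_LABELS = {"q1": "P", "q2": "A", "q3": "P", "q4": "B"}
GRAPH_LABELS = {"x1": "P", "x2": "A", "x3": "P", "x4": "B"}
ADJACENCY = {"x1": {"x2", "x3"}, "x2": {"x1"}, "x3": {"x1", "x4"}, "x4": {"x3"}}

STARS = decompose_into_stars(QUERY_EDGES)
R_S1 = match_star(STARS[0], QUERY_LABELS, GRAPH_LABELS, ADJACENCY)
R_S2 = match_star(STARS[1], QUERY_LABELS, GRAPH_LABELS, ADJACENCY)
```

Each star can be matched independently by scanning candidate centers and their neighbors, which is why the intermediate results need a join afterwards.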

Subgraph Matching, Optimization 2: estimate the search space of matching a star query; better label generalization, better query decomposition; save query time cost. Since the cloud needs to perform star matching, our Optimization 2 is to estimate the search space of matching a star query. This estimate leads to better label generalization and better query decomposition, which reduces query time. This example shows how we estimate the search space of matching a star query. Given the vertex type P of the center of star S1, the search space looks like this.
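The talk does not spell out the estimator, so the following is only one plausible way to approximate a star's search space, not the authors' formula: for each candidate center, multiply the number of neighbors that fit each leaf. The function `estimate_star_search_space`, the toy graph, and the label groups are all assumptions.

```python
from math import prod


def estimate_star_search_space(center_label, leaf_labels, graph_labels, adjacency):
    """Rough upper bound: sum over candidate centers of the product of
    per-leaf candidate counts (distinctness of leaves is not enforced)."""
    total = 0
    for vertex, label in graph_labels.items():
        if label != center_label:
            continue
        counts = [sum(1 for n in adjacency[vertex] if graph_labels[n] == leaf)
                  for leaf in leaf_labels]
        total += prod(counts)
    return total


ADJACENCY = {"x1": ["x2", "x3", "x4"], "x2": ["x1"], "x3": ["x1"],
             "x4": ["x1", "x5"], "x5": ["x4"]}

# Fine-grained label groups: "A" and "B" are kept apart.
FINE_LABELS = {"x1": "P", "x5": "P", "x2": "A", "x3": "A", "x4": "B"}
FINE = estimate_star_search_space("P", ["A", "B"], FINE_LABELS, ADJACENCY)

# Coarser generalization: "A" and "B" merged into one group "AB".
COARSE_LABELS = {"x1": "P", "x5": "P", "x2": "AB", "x3": "AB", "x4": "AB"}
COARSE = estimate_star_search_space("P", ["AB", "AB"], COARSE_LABELS, ADJACENCY)
```

Comparing `FINE` with `COARSE` illustrates why the choice of label groups matters: merging labels that often occur around the same centers inflates the number of candidate star matches.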

Subgraph Matching, Result Join: Alignment Vertex Table (AVT), Automorphic Function F1. After star matching, the cloud has the matching results of each star on Go. The cloud then needs to join these intermediate results, R(S1, Go) and R(S2, Go). Using the alignment vertex table, the cloud can generate the matches R(S1, Gk) and R(S2, Gk). The cloud then joins R(S1, Gk) and R(S2, Gk) to generate the matches of the outsourced query Qo on the whole k-automorphic graph Gk, denoted by R(Qo, Gk).

Subgraph Matching, Optimization 3: generate Rin rather than all the results; save query time cost and communication cost. However, because the k-automorphic graph Gk is symmetric, the cloud does not need to generate all of R(Qo, Gk). In fact, joining R(S1, Go) and R(S2, Gk) is enough. We denote this result set by Rin. The size of Rin is only 1/k of the size of R(Qo, Gk).
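A sketch of the result join and of Optimization 3, assuming each AVT row lists a vertex followed by its images, so that the automorphic function F_i shifts a vertex i columns to the right. The star matches on Go are given as inputs here; the names `automorphic_function`, `expand`, and `join` and the toy data are hypothetical.

```python
# Hypothetical AVT for k = 2: each row is a vertex and its image under F1.
AVT = [("u1", "w1"), ("u2", "w2"), ("u3", "w3")]
K = 2


def automorphic_function(avt, shift):
    """F_shift maps each vertex to the one `shift` columns to its right, cyclically."""
    mapping = {}
    for row in avt:
        for i, vertex in enumerate(row):
            mapping[vertex] = row[(i + shift) % len(row)]
    return mapping


def expand(matches_on_go, avt, k):
    """R(S, Gk): apply F_0 (identity) through F_{k-1} to every match found on Go."""
    expanded = []
    for shift in range(k):
        f = automorphic_function(avt, shift)
        expanded.extend({q: f[g] for q, g in m.items()} for m in matches_on_go)
    return expanded


def join(left, right):
    """Combine matches that agree on shared query vertices and stay injective."""
    joined = []
    for l in left:
        for r in right:
            if any(l[q] != r[q] for q in l.keys() & r.keys()):
                continue
            merged = {**l, **r}
            if len(set(merged.values())) == len(merged):
                joined.append(merged)
    return joined


# Toy star matches on Go (star S1 covers a, b, c; star S2 covers c, d).
R_S1_GO = [{"a": "u1", "b": "u2", "c": "u3"}]
R_S2_GO = [{"c": "u3", "d": "w1"}]

R_S1_GK = expand(R_S1_GO, AVT, K)
R_S2_GK = expand(R_S2_GO, AVT, K)

R_FULL = join(R_S1_GK, R_S2_GK)  # R(Qo, Gk), the unoptimized join
R_IN = join(R_S1_GO, R_S2_GK)    # Rin, the join used by Optimization 3
```

In this toy case `R_IN` has one match and `R_FULL` has two, which is the 1/k ratio for K = 2.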

Subgraph Matching, Client Filtering: Generating Candidate Results, Filtering, Matching Result. The cloud transmits only Rin to the client. The client can then recover R(Qo, Gk) by using the alignment vertex table. Using a hash index, the client filters out the false positives and keeps only the real matches of the original query Q on the original data graph G. After that, the client has the final matching result.
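A sketch of the client-side steps, assuming the same AVT layout as a list of rows. The talk only says the client uses "some hash index"; the Python set of original edges and the dictionary of original labels below are hypothetical stand-ins for it, and the toy graph and query are made up.

```python
AVT = [("u1", "w1"), ("u2", "w2"), ("u3", "w3")]
K = 2

# What the cloud sent back: Rin (toy value).
R_IN = [{"a": "u1", "b": "u2", "c": "u3", "d": "w1"}]

# Client-side knowledge of the original graph G (stand-in for the hash index).
G_EDGES = {frozenset(e) for e in [("u1", "u2"), ("u2", "u3"),
                                  ("u3", "w1"), ("w1", "w2")]}
G_LABELS = {"u1": "x1", "u2": "y1", "u3": "z1", "w1": "x2", "w2": "y2", "w3": "z2"}

# Original query Q with original (not generalized) labels.
Q_EDGES = [("a", "b"), ("b", "c"), ("c", "d")]
Q_LABELS = {"a": "x1", "b": "y1", "c": "z1", "d": "x2"}


def recover(r_in, avt, k):
    """Rebuild R(Qo, Gk) by shifting every match in Rin 0..k-1 columns in the AVT."""
    candidates = []
    for shift in range(k):
        f = {v: row[(i + shift) % len(row)] for row in avt for i, v in enumerate(row)}
        candidates.extend({q: f[g] for q, g in m.items()} for m in r_in)
    return candidates


def is_real_match(match):
    """Keep a candidate only if labels and edges also hold in the original graph."""
    labels_ok = all(G_LABELS[g] == Q_LABELS[q] for q, g in match.items())
    edges_ok = all(frozenset((match[u], match[v])) in G_EDGES for u, v in Q_EDGES)
    return labels_ok and edges_ok


CANDIDATES = recover(R_IN, AVT, K)
FINAL = [m for m in CANDIDATES if is_real_match(m)]
```

Here the second candidate relies on edges that exist only in Gk and on generalized labels, so it is dropped as a false positive.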

Experimental Results. Original data graphs: 10^5 to 10^7 vertices and 10^6 to 10^8 edges; the ratio of the number of vertices to the number of different labels is around 10^3. Original query graphs: 4 to 12 edges. In the last part, I will show some of our experimental results. The original data graphs have between 10^5 and 10^7 vertices and between 10^6 and 10^8 edges. The ratio of the number of vertices to the number of different labels is around one thousand. The original query graphs have 4 to 12 edges.

Experimental Results. Methods: EFF applies all three optimizations; BAS directly uploads the whole k-automorphic graph to the cloud; RAN randomly combines labels into label groups; FSIM combines labels with similar frequencies into the same label group. We compare the performance of these four methods.

Experimental Results. Figures: cloud processing time on DBpedia (K = 3); client processing time on DBpedia (K = 3). The two figures show the cloud processing time and the client processing time on DBpedia when k equals 3. EFF shows the best performance in terms of cloud processing time. Because BAS has already generated R(Qo, Gk) in the cloud, it spends less time on client processing than the other methods.

Thanks! Do you have any questions?