Private and Secure Secret Shared MapReduce

Presentation transcript:

Private and Secure Secret Shared MapReduce. 30th Annual IFIP WG 11.3 Conference on Data and Applications Security and Privacy (DBSec 2016), Trento, Italy. Shlomi Dolev (1), Yin Li (2), and Shantanu Sharma (1). (1) Ben-Gurion University of the Negev, Israel; (2) Xinyang Normal University, China.

Outline Introduction System Settings Overview of the Approach Count Operation Search and Fetch Operation Other Operations Conclusion

Introduction. Why is it required to ensure privacy? Users send data to the clouds, where curious mappers and reducers can store useful data and learn the given job. Where is it required? Banking, financial, retail, and healthcare.

Introduction. What do others do? Work on encrypted data. Authentication, and compressing and encrypting data. Encrypt data-at-rest. Secure storage of data in HDFS. Provide authentication before using a Hadoop cluster. They are making ‘computationally secure’ data. But for how long is it secure? Our goal: make the data information secure.

System Settings. [Figure: the data owner holds the database and the user sits at the user side; several clouds each hold secret-shares of the database and run a MapReduce job with a master process. Step 1: distribute secret-shares. Step 2: send queries. Step 3: master process; obtaining addresses of outputs. Step 4: fetching outputs; interpolation to obtain the final results. Notations: M: mapper, R: reducer.]

Adversarial Setting Honest-but-curious adversary Wants to gain knowledge But executes computations honestly

Parameters of Analysis Communication cost Computational cost Number of rounds

Overview of the Approach. Accumulating Automata*. Make shares of the data (or input split). Send these shares to mappers. Mappers do not know the computation or the data. Mappers have a defined accumulating automaton. Example: search for the pattern “LO” in the string “LOXLO”. *S. Dolev, N. Gilboa, X. Li, “Accumulating Automata and Cascaded Equations Automata for Communicationless Information Theoretically Secure Multi-Party Computation: Extended Abstract,” SCC@ASIACCS, pages 21-29, 2015.

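Overview of the Approach. Example: search for the pattern “LO” in the string “LOXLO”.
Node-update rules (superscripts give the mapper M and the step k): $N_1^{M,k+1} = v_0$, $N_2^{M,k+1} = N_1^{M,k} \cdot v_1$, $N_3^{M,k+1} = N_3^{M,k} + N_2^{M,k} \cdot v_2$.
[Figure: four mappers each hold secret-shares of the letters and an automaton with nodes N1, N2, N3; each sends one value to the reducer, which yields (LO, 2).]
Mapper 1: L = {v3, v4}, O = {v5, v10}, X = {v1, v1}, L = {v4, v5}, O = {v15, v20}; nodes (N1, N2, N3) labeled 2, 1, 1; value v140.
Mapper 2: L = {v5, v6}, O = {v10, v19}, X = {v2, v2}, L = {v6, v7}, O = {v20, v29}; nodes (N1, N2, N3) labeled 3, 4, 8; value v698.
Mapper 3: L = {v7, v9}, O = {v15, v28}, X = {v3, v3}; nodes (N1, N2, N3) labeled 4, 9, 27; value v1964.
Mapper 4: L = {v9, v12}, O = {v20, v37}, X = {v4, v4}; nodes (N1, N2, N3) labeled 5, 16, 64; value v4226.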
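The “LO” in “LOXLO” example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prime field `P`, the degree-1 polynomials, the evaluation points x = 1..4, and the helper names `share`, `reconstruct`, `mapper`, and `count_lo` are all hypothetical. Each letter is shared as two bits (is it “L”, is it “O”), node N1 is kept at a share of 1, and N2 and N3 follow the update rules of the slide. The final N3 is then a polynomial of degree 3, so four mappers are enough to interpolate it.

```python
import random

P = 2**61 - 1  # prime modulus; an assumption, the slides do not fix a field


def share(secret, degree, n):
    """Shamir shares of `secret` at x = 1..n from a fresh random polynomial."""
    coeffs = [secret % P] + [random.randrange(P) for _ in range(degree)]
    return [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
            for x in range(1, n + 1)]


def reconstruct(shares):
    """Lagrange interpolation at 0 from shares taken at x = 1..len(shares)."""
    secret = 0
    for i, y in enumerate(shares, start=1):
        num = den = 1
        for j in range(1, len(shares) + 1):
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        secret = (secret + y * num * pow(den, -1, P)) % P
    return secret


def mapper(letter_shares, n1):
    """One mapper: run the three-node automaton on its shares only."""
    n2 = n3 = 0
    for v_l, v_o in letter_shares:
        n3 = (n3 + n2 * v_o) % P  # N3 accumulates, using N2 of the previous step
        n2 = (n1 * v_l) % P       # N2 becomes "the previous letter was L"
    return n3


def count_lo(text, mappers=4):
    """Share every letter, run all mappers, interpolate the count of "LO"."""
    per_mapper = [[] for _ in range(mappers)]
    for ch in text:
        is_l = share(int(ch == "L"), 1, mappers)
        is_o = share(int(ch == "O"), 1, mappers)
        for m in range(mappers):
            per_mapper[m].append((is_l[m], is_o[m]))
    n1 = share(1, 1, mappers)
    outputs = [mapper(per_mapper[m], n1[m]) for m in range(mappers)]
    return reconstruct(outputs)


print(count_lo("LOXLO"))
```

No mapper sees a letter or the pattern in the clear here; only the party that collects all four outputs can interpolate the count.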

Creating Secret-Shares. Consider only English words. Represent each letter of the alphabet as a unary vector of 26 bits: ‘A’ is represented as $(1_1, 0_2, 0_3, \ldots, 0_{26})$. Make secret-shares of every bit by selecting different polynomials of an identical degree. Since we use different polynomials for creating secret-shares of each bit, multiple occurrences of a word in a database have different secret-shares.
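A minimal Python sketch of this share-creation step, assuming Shamir-style sharing over a prime field. The modulus `P`, the degree, the number of clouds, and the function names are assumptions made for illustration; the slides only say that every bit gets its own polynomial of the same degree.

```python
import random
import string

P = 2**61 - 1  # prime modulus (assumed)


def unary(letter):
    """'A' -> (1, 0, ..., 0), 'B' -> (0, 1, 0, ..., 0), and so on."""
    return [int(letter == c) for c in string.ascii_uppercase]


def share_bit(bit, degree, clouds):
    """Shares of one bit at x = 1..clouds, using a fresh random polynomial."""
    coeffs = [bit] + [random.randrange(1, P) for _ in range(degree)]
    return [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
            for x in range(1, clouds + 1)]


def share_letter(letter, degree=1, clouds=3):
    """Return one list of 26 bit-shares per cloud."""
    per_bit = [share_bit(b, degree, clouds) for b in unary(letter)]
    return [[per_bit[k][c] for k in range(26)] for c in range(clouds)]


first = share_letter("A")
second = share_letter("A")
print(first[0][:3])
print(second[0][:3])
```

Because every call draws new random coefficients, two sharings of the same letter look unrelated, which is the property the slide points out for repeated words.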

Count Operation. String-matching based: matches a value of a relation with a pattern, where both the value and the pattern are in the form of secret-shares. Two phases. Phase 1: privacy-preserving counting in the clouds. Phase 2: result reconstruction at the user side.

Count Operation. Working in the cloud, a mapper: creates an automaton of x+1 nodes, where x is the length of the pattern p; initializes the values of these nodes, so that the first node is assigned the value one (N1 = 1) and all the other nodes are assigned the value zero (Ni = 0).

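Count Operation. Working in the cloud: count ‘John’. [Figure: a relation with the attribute Name and the tuples Adam and John. Each letter of a tuple is multiplied with the corresponding letter of the pattern. For John: v1 = J * J, v2 = o * o, v3 = h * h, v4 = n * n. For Adam: v1 = J * A, v2 = o * d, v3 = a * h, v4 = m * n. The automaton has nodes N1 to N5 with N1 = 1 and N2 = N3 = N4 = 0 in all three states shown, while N5 takes the values 0, 1, and 2.]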

Count Operation Working at the user side Result construction – a simple interpolation operation
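The two phases of the count operation can be sketched end to end as below. This is an illustration under assumptions, not the authors' code: letters are unary vectors over a lowercase alphabet, every bit is shared with a degree-1 polynomial, all names have the same length as the pattern, and the field `P`, the helper names, and the choice of 2x+1 clouds are hypothetical. The letter match v_i is the inner product of two shared unary vectors (degree 2), so the final node has degree 2x and 2x+1 clouds suffice for interpolation.

```python
import random

P = 2**61 - 1  # prime modulus (assumed)
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def share(secret, degree, n):
    coeffs = [secret % P] + [random.randrange(P) for _ in range(degree)]
    return [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
            for x in range(1, n + 1)]


def reconstruct(shares):
    """Phase 2 (user side): Lagrange interpolation at 0."""
    secret = 0
    for i, y in enumerate(shares, start=1):
        num = den = 1
        for j in range(1, len(shares) + 1):
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        secret = (secret + y * num * pow(den, -1, P)) % P
    return secret


def share_word(word, clouds):
    """Per cloud: a list of letters, each letter a list of 26 bit-shares."""
    per_cloud = [[] for _ in range(clouds)]
    for ch in word.lower():
        bit_shares = [share(int(ch == a), 1, clouds) for a in ALPHABET]
        for c in range(clouds):
            per_cloud[c].append([s[c] for s in bit_shares])
    return per_cloud


def mapper_count(words, pattern):
    """Phase 1 (one cloud): automaton of x+1 nodes over all tuples."""
    x = len(pattern)
    nodes = [1] + [0] * x  # N1 = 1, all other nodes 0
    for word in words:
        for i in range(x):
            # v is a share of 1 if the i-th letters agree, else a share of 0
            v = sum(a * b for a, b in zip(word[i], pattern[i])) % P
            if i < x - 1:
                nodes[i + 1] = nodes[i] * v % P
            else:
                nodes[x] = (nodes[x] + nodes[i] * v) % P  # last node accumulates
    return nodes[x]


def count(names, pattern):
    clouds = 2 * len(pattern) + 1
    shared_names = [share_word(n, clouds) for n in names]
    shared_pattern = share_word(pattern, clouds)
    outputs = [mapper_count([w[c] for w in shared_names], shared_pattern[c])
               for c in range(clouds)]
    return reconstruct(outputs)


print(count(["Adam", "John"], "John"))
```

The user-side work is a single interpolation over the values returned by the clouds, which matches the "simple interpolation" of the slide.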

Search and Fetch Operation. Working in the cloud, a mapper performs two phases. Phase 1: finding the addresses of tuples containing the pattern p. Phase 2: fetching all the tuples containing p.

Search and Fetch Operation: Unary Occurrence. Working in the cloud, a mapper does not need to know the address. The string-matching result for each tuple is 0 or 1, in the form of secret-shares. Multiply the result with the tuple, then add the values of each attribute. [Figure: the relation (Name, Department) with the tuples (Adam, CS) and (John, EC), multiplied by the per-tuple match results.]

Search and Fetch Operation Unary Occurrence Working at the user side A simple interpolation
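A minimal sketch of the unary-occurrence fetch, under assumptions that are not in the slides: attribute values are encoded as integers below a prime `P`, all shares use degree-1 polynomials, and the per-tuple match shares are produced directly by `share` as a stand-in for the secret-shared string-matching step. With real string matching the match shares would have a higher degree, so more clouds would be needed than the three used here.

```python
import random

P = 2**61 - 1  # prime modulus (assumed); values must encode to less than P


def share(secret, degree, n):
    coeffs = [secret % P] + [random.randrange(P) for _ in range(degree)]
    return [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
            for x in range(1, n + 1)]


def reconstruct(shares):
    secret = 0
    for i, y in enumerate(shares, start=1):
        num = den = 1
        for j in range(1, len(shares) + 1):
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        secret = (secret + y * num * pow(den, -1, P)) % P
    return secret


def encode(text):
    return int.from_bytes(text.encode(), "big")


def decode(number):
    return number.to_bytes((number.bit_length() + 7) // 8, "big").decode()


def fetch_single(relation, pattern, clouds=3):
    # Stand-in for string matching: a share of 1 for the matching tuple, else 0.
    match = [share(int(row[0] == pattern), 1, clouds) for row in relation]
    cells = [[share(encode(v), 1, clouds) for v in row] for row in relation]
    result = []
    for col in range(len(relation[0])):
        # Cloud side: multiply each match result with the tuple, add per attribute.
        outputs = [sum(match[t][c] * cells[t][col][c]
                       for t in range(len(relation))) % P
                   for c in range(clouds)]
        # User side: a single interpolation per attribute.
        result.append(decode(reconstruct(outputs)))
    return tuple(result)


print(fetch_single([("Adam", "CS"), ("John", "EC")], "John"))
```

Since exactly one tuple matches, the per-attribute sum collapses to that tuple's values, and no cloud learns which address was selected.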

Search and Fetch Operation: Multiple Occurrences. Tradeoff: number of rounds vs. computational load at the user side. Two algorithms: a naïve algorithm and a database-partitioning algorithm.

Search and Fetch Operation: Multiple Occurrences. The first way, the naïve algorithm: requires a lot of computation at the user side, but only 2 rounds. Now the user can know the addresses. [Figure: the Name attribute of the relation (Name, Department) with the tuples (Adam, CS) and (John, EC) is multiplied with the pattern; the result marks John with 1.]

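Search and Fetch Operation: Multiple Occurrences. The first way, the naïve algorithm: but how to fetch? Say there are L occurrences. Create a matrix M of size L × n and multiply it with the relation to obtain the L matching tuples. [Figure: the relation (Name, Department) with the tuples (Adam, CS) and (John, EC) is multiplied by M to obtain the matching tuples.]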
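The matrix-based fetch can be sketched as follows. The details are assumptions: n is taken to be the number of tuples, row i of M holds a 1 in the column of the i-th known address and 0 elsewhere, every entry is secret-shared with a degree-1 polynomial, and the relation, the field `P`, and all helper names are hypothetical.

```python
import random

P = 2**61 - 1  # prime modulus (assumed)


def share(secret, degree, n):
    coeffs = [secret % P] + [random.randrange(P) for _ in range(degree)]
    return [sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
            for x in range(1, n + 1)]


def reconstruct(shares):
    secret = 0
    for i, y in enumerate(shares, start=1):
        num = den = 1
        for j in range(1, len(shares) + 1):
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        secret = (secret + y * num * pow(den, -1, P)) % P
    return secret


def encode(text):
    return int.from_bytes(text.encode(), "big")


def decode(number):
    return number.to_bytes((number.bit_length() + 7) // 8, "big").decode()


def fetch_many(relation, addresses, clouds=3):
    n = len(relation)
    # User side: secret-share the L x n selection matrix M.
    matrix = [[share(int(col == addr), 1, clouds) for col in range(n)]
              for addr in addresses]
    cells = [[share(encode(v), 1, clouds) for v in row] for row in relation]
    fetched = []
    for row in matrix:
        values = []
        for col in range(len(relation[0])):
            # Cloud side: one row of M times one column of the relation.
            outputs = [sum(row[t][c] * cells[t][col][c] for t in range(n)) % P
                       for c in range(clouds)]
            values.append(decode(reconstruct(outputs)))
        fetched.append(tuple(values))
    return fetched


relation = [("Adam", "CS"), ("John", "EC"), ("John", "CS")]  # hypothetical data
print(fetch_many(relation, [1, 2]))
```

Each cloud performs an L × n by n × (number of attributes) product on shares, so the clouds learn neither the addresses nor the fetched tuples.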

Search and Fetch Operation: Multiple Occurrences. The second way: requires less computation at the user side, but more than 2 rounds. It partitions the database to learn the addresses, then fetches the tuples using the solution suggested in the naïve algorithm.

Search and Fetch Operation: Multiple Occurrences. [Figure: the database is partitioned over three question-and-answer rounds (Q&A Round 1, 2, 3); each partition is labeled with its number of occurrences (0, 1, or 2), and partitions with 2 occurrences are partitioned again.]
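The control flow of the partitioning approach might look like the sketch below. It runs on plaintext to stay short: the inner functions `count` and `address` are stand-ins for the secret-shared count operation and for address finding in a partition with a single occurrence, and the fan-out of 2, the stopping rule, and all names are assumptions rather than details taken from the slides.

```python
def find_addresses(names, pattern, fanout=2):
    """Return the addresses of all matches and the number of Q&A rounds used."""

    def count(lo, hi):
        # Stand-in for the privacy-preserving count on tuples lo..hi-1.
        return sum(1 for t in range(lo, hi) if names[t] == pattern)

    def address(lo, hi):
        # Stand-in for address finding when exactly one tuple matches:
        # the sum of (match result * tuple index) over the partition.
        return sum(t * int(names[t] == pattern) for t in range(lo, hi))

    addresses, rounds = [], 0
    pending = [(0, len(names))]
    while pending:
        rounds += 1  # one question-and-answer round covers all pending partitions
        next_pending = []
        for lo, hi in pending:
            occurrences = count(lo, hi)
            if occurrences == 1:
                addresses.append(address(lo, hi))
            elif occurrences > 1:
                # Split the partition further and ask again in the next round.
                step = -(-(hi - lo) // fanout)  # ceiling division
                next_pending += [(s, min(s + step, hi))
                                 for s in range(lo, hi, step)]
        pending = next_pending
    return sorted(addresses), rounds


names = ["Adam", "John", "Eve", "John", "John", "Bob", "Zoe", "John"]
print(find_addresses(names, "John"))
```

More rounds are spent than in the naïve algorithm, but the user only ever interpolates counts and single addresses, which reflects the tradeoff named earlier.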

Other Operations. Equijoin: use two layers of clouds, where the first layer performs the fetch operation and the second layer performs the equijoin operation. Range query: by using 2’s complement, count the occurrences of numbers that lie in the range and then fetch those tuples.
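One way to read the 2’s complement idea is that the sign bit of x - a tells whether x lies below a. The sketch below works on plaintext bits and uses only additions and multiplications, the operations that can also be applied to secret-shares. The bit width, the ripple-carry circuit, and the function names are assumptions; the slides do not spell out the circuit.

```python
def bits(value, width):
    """Little-endian bits of a non-negative integer below 2**(width - 1)."""
    return [(value >> i) & 1 for i in range(width)]


def sign_of_difference(x_bits, a_bits):
    """Sign bit of x - a in 2's complement, computed as x + (not a) + 1."""
    carry = 1  # the "+ 1" of 2's complement negation
    result = 0
    for xb, ab in zip(x_bits, a_bits):
        nb = 1 - ab  # inverted bit of a
        pairs = xb * nb + xb * carry + nb * carry
        triple = xb * nb * carry
        result = xb + nb + carry - 2 * pairs + 4 * triple  # XOR of three bits
        carry = pairs - 2 * triple                         # majority of three bits
    return result  # most significant bit: 1 exactly when x < a


def count_in_range(values, low, high, width=8):
    total = 0
    for x in values:
        below_low = sign_of_difference(bits(x, width), bits(low, width))
        above_high = sign_of_difference(bits(high, width), bits(x, width))
        total += (1 - below_low) * (1 - above_high)  # 1 iff low <= x <= high
    return total


print(count_in_range([3, 10, 15, 20, 42], 10, 20))
```

The resulting count says how many tuples to fetch, after which the fetch step for multiple occurrences applies.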

Conclusion. Privacy-preserving operations based on MapReduce. A way to create secret-shares. Count, search, and fetch operations. Equijoin and range queries.

Presentation is available at http://www.cs.bgu.ac.il/~sharmas/publication.html. Shlomi Dolev (1), Yin Li (2), and Shantanu Sharma (1). (1) Department of Computer Science, Ben-Gurion University of the Negev, Israel, {dolev,sharmas}@cs.bgu.ac.il. (2) Department of Computer Science, Xinyang Normal University, China, yunfeiyangli@gmail.com.