Private and Secure Secret Shared MapReduce
30th Annual IFIP WG 11.3 Conference on Data and Applications Security and Privacy (DBSec 2016), Trento, Italy
Shlomi Dolev1, Yin Li2, and Shantanu Sharma1
1 Ben-Gurion University of the Negev, Israel
2 Xinyang Normal University, China
Outline
- Introduction
- System Settings
- Overview of the Approach
- Count Operation
- Search and Fetch Operation
- Other Operations
- Conclusion
Introduction
Why is it required to ensure privacy?
- Users send data to the clouds
- Curious mappers and reducers can store useful data and learn the given job
Where is it required?
- Banking, financial, retail, and healthcare
Introduction
What do others do?
- Work on encrypted data
- Authentication, and compressing plus encrypting data
- Encrypt data-at-rest
- Secure storage of data in HDFS
- Provide authentication before using a Hadoop cluster
They are making "computationally secure" data. But for how long is it secure?
Make information secure data
System Settings
[Figure: a data owner, a user side, and several clouds. Each cloud holds secret-shares of the database and runs a master process with mappers and reducers.]
- Step 1: Distribute secret-shares
- Step 2: Send queries
- Step 3: Master process; obtaining addresses of outputs
- Step 4: Fetching outputs; interpolation to obtain the final results
Notations: M: Mapper, R: Reducer
Adversarial Setting
Honest-but-curious adversary
- Wants to gain knowledge
- But executes computations honestly
Parameters of Analysis
- Communication cost
- Computational cost
- Number of rounds
Overview of the Approach
Accumulating Automata*
- Make shares of the data (or input split)
- Send these shares to mappers
- Mappers know neither the computation nor the data
- Mappers have a defined accumulating automaton
Example: Search a pattern “LO” in the string “LOXLO”
*S. Dolev, N. Gilboa, X. Li, “Accumulating Automata and Cascaded Equations Automata for Communicationless Information Theoretically Secure Multi-Party Computation: Extended Abstract,” SCC@ASIACCS, pages 21-29, 2015.
Overview of the Approach
Example: Search a pattern “LO” in the string “LOXLO”
Node update rules at each mapper:
$N_1^{M}(k+1) = v_0$
$N_2^{M}(k+1) = N_1^{M}(k) \cdot v_1$
$N_3^{M}(k+1) = N_3^{M}(k) + N_2^{M}(k) \cdot v_2$
Mapper 1: L = {v3,v4}, O = {v5,v10}, X = {v1,v1}, L = {v4,v5}, O = {v15,v20}; N1, N2, N3: 2, 1, 1; output v140
Mapper 2: L = {v5,v6}, O = {v10,v19}, X = {v2,v2}, L = {v6,v7}, O = {v20,v29}; N1, N2, N3: 3, 4, 8; output v698
Mapper 3: L = {v7,v9}, O = {v15,v28}, X = {v3,v3}; N1, N2, N3: 4, 9, 27; output v1964
Mapper 4: L = {v9,v12}, O = {v20,v37}, X = {v4,v4}; N1, N2, N3: 5, 16, 64; output v4226
Reducer: LO, 2
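The reducer step of this example can be checked numerically. The sketch below is not taken from the slides: it assumes that the mapper outputs v140, v698, v1964, and v4226 are the plain numbers 140, 698, 1964, and 4226, that Mapper i holds the evaluation point x = i, and that the result polynomial has degree at most 3. Under those assumptions, Lagrange interpolation at x = 0 yields the count shown on the slide.

```python
from fractions import Fraction

def interpolate_at_zero(points):
    """Lagrange-interpolate the polynomial through `points` and return its value at x = 0."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        weight = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if i != j:
                # Lagrange basis factor (0 - xj) / (xi - xj)
                weight *= Fraction(0 - xj, xi - xj)
        total += yi * weight
    return total

# Assumed pairing: evaluation points 1..4 with the outputs of Mappers 1..4.
mapper_outputs = [(1, 140), (2, 698), (3, 1964), (4, 4226)]
count = interpolate_at_zero(mapper_outputs)
print(count)  # 2, the number of occurrences of "LO" in "LOXLO"
```

Exact rational arithmetic is used here only to keep the check simple; a real deployment would more likely work over a finite field.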
Creating Secret-Shares
- Consider only English words
- Represent each letter in unary form: ‘A’ is represented as (1_1, 0_2, 0_3, ..., 0_26)
- Make secret-shares of every bit by selecting different polynomials of an identical degree
- Since different polynomials are used for creating the secret-shares of each bit, multiple occurrences of a word in a database have different secret-shares
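A minimal sketch of this share-creation step follows. The prime modulus, the evaluation points 1..c, and all function names are assumptions made for illustration; the slides only fix the unary representation and the use of a fresh polynomial of the same degree for every bit.

```python
import random

PRIME = 2**31 - 1  # assumed field modulus; the slides do not specify one
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def unary(letter):
    """Unary vector of 26 bits: 'A' -> (1, 0, 0, ..., 0)."""
    return [1 if c == letter else 0 for c in ALPHABET]

def share_bit(bit, degree, num_clouds, rng):
    """Share one bit with a fresh random polynomial whose constant term is the bit."""
    coeffs = [bit] + [rng.randrange(1, PRIME) for _ in range(degree)]
    return [sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
            for x in range(1, num_clouds + 1)]

def share_letter(letter, degree, num_clouds, rng):
    """Return per-cloud share vectors: result[cloud][bit_position]."""
    per_bit = [share_bit(b, degree, num_clouds, rng) for b in unary(letter)]
    return [list(col) for col in zip(*per_bit)]

def reconstruct_bit(shares):
    """Lagrange interpolation at x = 0; shares[i] belongs to evaluation point i + 1."""
    xs = range(1, len(shares) + 1)
    total = 0
    for xi, yi in zip(xs, shares):
        num, den = 1, 1
        for xj in xs:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

rng = random.Random(0)
first = share_letter("A", degree=1, num_clouds=3, rng=rng)
second = share_letter("A", degree=1, num_clouds=3, rng=rng)
recovered = [reconstruct_bit([cloud[i] for cloud in first]) for i in range(26)]
```

Here `first` and `second` are two sharings of the same letter that differ from each other, while `recovered` equals the unary vector of ‘A’. The modular inverse via `pow(den, -1, PRIME)` needs Python 3.8 or later.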
Count Operation
- String-matching based
- Matches a value of a relation with a pattern, where both the value and the pattern are in secret-shared form
- Two phases
  Phase 1: Privacy-preserving counting in the clouds
  Phase 2: Result reconstruction at the user side
Count Operation
Working in the cloud: a mapper
- Creates an automaton of x+1 nodes, where x is the length of the pattern p
- Initializes the values of these nodes: the first node is assigned the value one (N1 = 1) and all the other nodes are assigned the value zero (Ni = 0)
Count Operation
Working in the cloud: Count ‘John’
Relation (Name): Adam, John
- Against Adam: v1 = J * A, v2 = o * d, v3 = h * a, v4 = n * m
- Against John: v1 = J * J, v2 = o * o, v3 = h * h, v4 = n * n
Node values shown: N1 = 1 and N2 = N3 = N4 = 0 throughout; N5 = 0, then N5 = 1, then N5 = 2
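The counting automaton can be sketched end to end as follows. This is an illustration under stated assumptions rather than the authors' implementation: a prime field, evaluation points 1..c, uppercase names of the same length as the pattern, and helper names such as `share_word` and `mapper_count` are all hypothetical. Each v_i is taken to be the dot product of the shared unary vectors of the two letters, so it is a share of 1 when the letters match and of 0 otherwise.

```python
import random

PRIME = 2**61 - 1  # assumed prime modulus; not specified in the slides
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def share(secret, degree, xs, rng):
    """Evaluate a fresh random polynomial with constant term `secret` at each x."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(degree)]
    return [sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME for x in xs]

def share_word(word, degree, xs, rng):
    """Return shares[cloud][position][bit] of the unary encoding of each letter."""
    shares = [[[0] * len(ALPHABET) for _ in word] for _ in xs]
    for pos, letter in enumerate(word):
        for bit, symbol in enumerate(ALPHABET):
            for cloud, s in enumerate(share(int(symbol == letter), degree, xs, rng)):
                shares[cloud][pos][bit] = s
    return shares

def mapper_count(pattern_sh, values_sh):
    """One cloud's automaton run; returns that cloud's share of the count."""
    accumulator = 0  # last node N_{x+1}, initially 0
    for value_sh in values_sh:
        node = 1  # first node N1 = 1
        for p_bits, v_bits in zip(pattern_sh, value_sh):
            v = sum(p * q for p, q in zip(p_bits, v_bits)) % PRIME  # v_i
            node = node * v % PRIME  # N_{i+1} = N_i * v_i
        accumulator = (accumulator + node) % PRIME  # N_{x+1} += N_x * v_x
    return accumulator

def reconstruct(xs, ys):
    """User side: Lagrange interpolation at x = 0 over the prime field."""
    total = 0
    for xi, yi in zip(xs, ys):
        num, den = 1, 1
        for xj in xs:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

def private_count(pattern, names, degree=1, seed=0):
    rng = random.Random(seed)
    # The result polynomial has degree 2 * degree * len(pattern); one more cloud is needed.
    xs = list(range(1, 2 * degree * len(pattern) + 2))
    pattern_sh = share_word(pattern, degree, xs, rng)
    relation_sh = [share_word(name, degree, xs, rng) for name in names]
    count_shares = [mapper_count(pattern_sh[c], [r[c] for r in relation_sh])
                    for c in range(len(xs))]
    return reconstruct(xs, count_shares)

print(private_count("JOHN", ["ADAM", "JOHN"]))  # 1
```

Multiplying shares raises the polynomial degree, which is why this sketch uses nine clouds for a four-letter pattern shared with degree-1 polynomials.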
Count Operation
Working at the user side
- Result reconstruction: a simple interpolation operation
Search and Fetch Operation
Working in the cloud: a mapper
- Phase 1: Finding addresses of tuples containing p
- Phase 2: Fetching all the tuples containing p
Search and Fetch Operation
Unary Occurrence
Working in the cloud: a mapper
- Does not need to know the address
- Computes match results that are 0 or 1, in secret-shared form
- Multiplies each result with the corresponding tuple
- Adds the values of each attribute
Example relation (Name, Department): (Adam, CS), (John, EC)
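The multiply-and-add step can be illustrated on plain integers. In the scheme itself the same arithmetic would run on secret-shares and the user would interpolate the result; the integer encoding of attribute values and the function names below are assumptions made for this sketch.

```python
def encode(text):
    """Encode an ASCII string as an integer (assumed encoding, for illustration)."""
    return int.from_bytes(text.encode("ascii"), "big")

def decode(number):
    """Inverse of encode()."""
    return number.to_bytes((number.bit_length() + 7) // 8, "big").decode("ascii")

def fetch_single(match_bits, relation):
    """Multiply each tuple by its 0/1 match result and add up every attribute.

    Assumes exactly one entry of match_bits is 1, so the sums equal that tuple.
    """
    num_attrs = len(relation[0])
    result = [0] * num_attrs
    for bit, row in zip(match_bits, relation):
        for a in range(num_attrs):
            result[a] += bit * row[a]
    return result

relation = [[encode("Adam"), encode("CS")], [encode("John"), encode("EC")]]
match_bits = [0, 1]  # result of matching the Name attribute against "John"
fetched = [decode(v) for v in fetch_single(match_bits, relation)]
print(fetched)  # ['John', 'EC']
```

Because every non-matching tuple is multiplied by 0, the cloud never needs to learn which address matched.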
Search and Fetch Operation Unary Occurrence Working at the user side A simple interpolation
Search and Fetch Operation
Multiple Occurrences
- Tradeoff: number of rounds vs. computational load at the user side
- Two ways: a naïve algorithm and a database partitioning algorithm
Search and Fetch Operation
Multiple Occurrences
The first way: Naïve Algorithm
- Needs only 2 rounds, but requires a lot of computation at the user side
- The user learns the addresses of the matching tuples
Example relation (Name, Department): (Adam, CS), (John, EC)
Search and Fetch Operation
Multiple Occurrences
The first way: Naïve Algorithm. How are the tuples fetched?
- Suppose there are L occurrences
- Create an L × n matrix M
- Multiply M with the relation to obtain the matching tuples
Example relation (Name, Department): (Adam, CS), (John, EC)
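One way to read the matrix-based fetch is as a selection matrix: row i of M holds a single 1 in the column of the i-th matching address. The sketch below assumes that n is the number of tuples, that addresses are zero-based, and that attribute values are encoded as integers; the third tuple (John, CS) is hypothetical and added only to obtain L = 2. In the scheme, M would be sent in secret-shared form and the products interpolated at the user side.

```python
def encode(text):
    """Encode an ASCII string as an integer (assumed encoding, for illustration)."""
    return int.from_bytes(text.encode("ascii"), "big")

def decode(number):
    """Inverse of encode()."""
    return number.to_bytes((number.bit_length() + 7) // 8, "big").decode("ascii")

def selection_matrix(addresses, n):
    """Build the L x n matrix M with M[i][addresses[i]] = 1 and 0 elsewhere."""
    return [[1 if col == addr else 0 for col in range(n)] for addr in addresses]

def mat_mul(matrix, relation):
    """Multiply the L x n matrix with the n x m relation, giving L fetched tuples."""
    n, m = len(relation), len(relation[0])
    return [[sum(row[k] * relation[k][j] for k in range(n)) for j in range(m)]
            for row in matrix]

rows = [("Adam", "CS"), ("John", "EC"), ("John", "CS")]  # third row is hypothetical
relation = [[encode(v) for v in row] for row in rows]
M = selection_matrix([1, 2], n=len(relation))  # addresses learned in the first phase
fetched = [[decode(v) for v in row] for row in mat_mul(M, relation)]
print(fetched)  # [['John', 'EC'], ['John', 'CS']]
```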
Search and Fetch Operation
Multiple Occurrences
The second way: Database Partitioning Algorithm
- Requires less computation at the user side, but more than 2 rounds
- Partitions the database to learn the addresses
- Then fetches the tuples using the solution suggested in the naïve algorithm
Search and Fetch Operation
Multiple Occurrences
[Figure: the database is partitioned into blocks over Q&A Rounds 1, 2, and 3; each block is labeled with its number of occurrences (2, 1, or 0).]
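The partitioning idea can be sketched as a round-based search. The details here are assumptions, not taken from the slides: blocks are split in halves, a block with exactly one occurrence yields its address directly, and the plaintext comparisons stand in for the private count and single-occurrence address queries.

```python
def find_addresses(rows, pattern):
    """Return (sorted addresses of matching rows, number of query rounds)."""
    pending = [(0, len(rows))]  # half-open blocks [lo, hi) still to be queried
    addresses, rounds = [], 0
    while pending:
        rounds += 1  # one question-and-answer round covers all pending blocks
        next_pending = []
        for lo, hi in pending:
            # Stand-in for the private count query on this block.
            occurrences = sum(1 for r in rows[lo:hi] if r == pattern)
            if occurrences == 0:
                continue  # nothing to fetch from this block
            if occurrences == 1:
                # Stand-in for the single-occurrence address query.
                addresses.append(next(i for i in range(lo, hi) if rows[i] == pattern))
            elif occurrences == hi - lo:
                addresses.extend(range(lo, hi))  # every row in the block matches
            else:
                mid = (lo + hi) // 2
                next_pending += [(lo, mid), (mid, hi)]  # partition further
        pending = next_pending
    return sorted(addresses), rounds

names = ["John", "Adam", "John", "Eve"]  # hypothetical data
print(find_addresses(names, "John"))  # ([0, 2], 2)
```

The returned addresses could then be fed into the matrix-based fetch of the naïve algorithm; the round count here covers only the address-finding phase.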
Other Operations
Equijoin
- Use two layers of clouds, where the first layer performs the fetch operation and the second layer performs the equijoin operation
Range query
- Uses 2’s complement
- Count the occurrences of numbers that lie in the range and then fetch those tuples
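The slides do not spell out how 2's complement is used. One plausible reading, sketched below on plain integers, is that x lies in [a, b] exactly when the sign bits of x - a and b - x are both 0. The bit width and all names are assumptions, and the test is only valid while both differences fit in the signed range of that width.

```python
WIDTH = 8  # assumed bit width; differences must stay within -128..127

def sign_bit(n):
    """Most significant bit of n in WIDTH-bit 2's complement (1 means negative)."""
    return ((n & ((1 << WIDTH) - 1)) >> (WIDTH - 1)) & 1

def in_range(x, a, b):
    """Return 1 if a <= x <= b, else 0, using only sign bits and arithmetic."""
    return (1 - sign_bit(x - a)) * (1 - sign_bit(b - x))

values = [12, 45, 7, 30]  # hypothetical attribute values
indicators = [in_range(v, 10, 40) for v in values]
count = sum(indicators)  # count step
fetched = [v for v, bit in zip(values, indicators) if bit]  # fetch step
print(count, fetched)  # 2 [12, 30]
```

The indicator is built from subtraction and multiplication only, which is the kind of arithmetic that can also be carried out on secret-shares.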
Conclusion
- Privacy-preserving operations based on MapReduce
- A way to create secret-shares
- Count, search, and fetch operations
- Equijoin and range queries
Presentation is available at http://www.cs.bgu.ac.il/~sharmas/publication.html Shlomi Dolev1, Yin Li2, and Shantanu Sharma1 1 Department of Computer Science, Ben-Gurion University of the Negev, Israel {dolev,sharmas}@cs.bgu.ac.il 2 Department of Computer Science, Xinyang Normal University, China yunfeiyangli@gmail.com