1
Privacy-Preserving Multi-Keyword Ranked Search over Encrypted Cloud Data
Presented by Amarjit Datta
2
Authors and Publication Information
Ning Cao, PhD in ECE from Worcester Polytechnic Institute
Cong Wang, PhD in ECE from Illinois Institute of Technology
Ming Li, PhD in ECE from Worcester Polytechnic Institute
Kui Ren, PhD in ECE from Worcester Polytechnic Institute
Wenjing Lou, PhD in ECE from the University of Florida
3
Table of Contents
Introduction
Problem Domain
Some Important Definitions
MRSE framework and versions
MRSE scheme analysis
MRSE scheme improvements
4
Introduction
Cloud computing is becoming more and more popular nowadays. Why is the cloud so popular?
Minimum startup cost
Pay-as-you-go pricing
Easy scalability
No server administration overhead
5
Introduction
While uploading content to the cloud, there can be many security issues: man-in-the-middle attacks, packet sniffing, IP spoofing, and many more. In this research paper, the authors analyze the privacy issues that can arise after the content has been uploaded to the cloud.
6
Introduction
The cloud server acts as "honest-but-curious":
"Honest" – it follows the designated protocols.
"Curious" – it wants to infer and analyze the data in its storage.
"So we have to search encrypted data hosted in the cloud environment without sharing private information with the cloud."
7
Introduction
So what can data owners do about it?
Data owners can encrypt their files before uploading them to the cloud. But how can they then search the encrypted files in the cloud? Traditional plaintext keyword search won't work.
8
Introduction
We also need search results in ranked order (for example, most relevant first).
Coordinate matching: match as many of the query keywords as possible in each document.
9
Problem Definition
Performing single-keyword search over encrypted data is already widely researched. This paper explores two new use cases:
Multi-keyword search over encrypted cloud data.
Ranked search (sort results by relevance); the paper uses "coordinate matching" for ranking.
Privacy must be preserved.
10
Problem Model
11
Problem Formulation
The data owner has a collection of data documents and their encrypted forms. The data owner creates an encrypted searchable index. Both the encrypted files and the encrypted searchable index are uploaded to the cloud server. To search, data users need a corresponding trapdoor T.
12
Problem Formulation
Based on the amount of information the cloud server knows:
Known ciphertext model – the cloud server only knows the encrypted dataset and the searchable index.
Known background model – the cloud server knows the encrypted dataset and the searchable index, plus additional information (for example, correlations among search queries).
13
This is what we want!
Data privacy – encrypt the data files and the searchable index.
Keyword privacy – hide what users are searching for.
Trapdoor unlinkability – the trapdoor generation function should be randomized rather than deterministic.
14
Let's Check the MRSE Scheme!
The main idea is to confuse the cloud server so that it cannot detect the searched keywords or the document contents. We can do that by adding randomization at different steps.
15
Notations
16
MRSE – Basic Framework
17
MRSE Framework
[Framework diagram: the data owner runs Setup and Build Index, uploads the encrypted files and indexes to the cloud server, and shares the key with data users; a data user uses the key to generate a trapdoor and submits it as a query.]
18
How to Do Ranking? - Similarity Calculation
Di is a binary data vector for document Fi, where each bit Di[j] (0 or 1) represents whether the corresponding keyword Wj exists in that document. Q is a binary query vector indicating the keywords of interest, where each bit Q[j] represents whether the corresponding keyword Wj appears in the query. The similarity score of document Fi to the query is therefore the inner product of their binary column vectors, i.e., Di · Q.
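As a plain (unencrypted) illustration of this coordinate-matching score, the sketch below builds binary vectors over a toy keyword dictionary and ranks documents by the inner product Di · Q. The dictionary, documents, and top_k value are made up for the example.

```python
import numpy as np

# Toy keyword dictionary and documents (illustrative only).
dictionary = ["cloud", "privacy", "search", "rank", "index"]
documents = {
    "F1": {"cloud", "privacy", "search"},
    "F2": {"rank", "index"},
    "F3": {"cloud", "search", "rank", "index"},
}

def binary_vector(keywords, dictionary):
    """Build a 0/1 vector: entry j is 1 iff dictionary[j] is present."""
    return np.array([1 if w in keywords else 0 for w in dictionary])

# Data vectors Di and a query vector Q for the keywords of interest.
D = {doc_id: binary_vector(kw, dictionary) for doc_id, kw in documents.items()}
Q = binary_vector({"cloud", "search"}, dictionary)

# Coordinate matching: similarity score = inner product Di . Q.
scores = {doc_id: int(Di @ Q) for doc_id, Di in D.items()}

# Return the top-k ranked document ids.
top_k = 2
ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
print(scores)   # {'F1': 2, 'F2': 0, 'F3': 2}
print(ranked)   # e.g. ['F1', 'F3']
```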
19
MRSE_I Scheme
20
MRSE_I Scheme
Setup: The data owner randomly generates an (n+2)-bit vector S and two (n+2) x (n+2) invertible matrices M1 and M2. The secret key SK is the 3-tuple {S, M1, M2}. Here n is the number of keyword fields for each record; the two extra dimensions accommodate a dummy random keyword.
Build Index: The data owner generates a binary data vector Di for every document Fi, where each binary bit Di[j] represents whether the corresponding keyword Wj appears in document Fi.
21
MRSE_I Scheme
Trapdoor: With t keywords of interest, a binary query vector Q is generated where each bit Q[j] indicates whether Wj is one of the queried keywords. The trapdoor is generated from this vector.
Query: With the trapdoor, the cloud server computes the similarity score of each document Fi. After sorting all scores, the cloud server returns the top-k ranked id list.
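The sketch below illustrates the split-and-multiply idea behind Setup, Build Index, and Trapdoor: vectors are split according to the bit vector S and multiplied by M1, M2, yet the cloud-side score still equals Di · Q. It is a minimal sketch, not the full scheme: the dummy-keyword dimensions and the random factors r and t are omitted, and the function names (split_data, build_index, trapdoor, score) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # dictionary size (toy example; the real scheme uses n+2 dimensions)

# Setup: secret key SK = {S, M1, M2} with S a random bit vector and
# M1, M2 random invertible matrices.
S = rng.integers(0, 2, size=n)
M1 = rng.normal(size=(n, n))
M2 = rng.normal(size=(n, n))

def split_data(D, S):
    """Split data vector D into (D1, D2): copy where S[j]==0, random split where S[j]==1."""
    r = rng.normal(size=D.shape)
    return np.where(S == 1, r, D), np.where(S == 1, D - r, D)

def split_query(Q, S):
    """Split query vector Q the opposite way: random split where S[j]==0, copy where S[j]==1."""
    r = rng.normal(size=Q.shape)
    return np.where(S == 0, r, Q), np.where(S == 0, Q - r, Q)

def build_index(D):
    """Encrypted index Ii = {M1^T D1, M2^T D2}."""
    D1, D2 = split_data(D, S)
    return M1.T @ D1, M2.T @ D2

def trapdoor(Q):
    """Trapdoor T = {M1^-1 Q1, M2^-1 Q2}."""
    Q1, Q2 = split_query(Q, S)
    return np.linalg.inv(M1) @ Q1, np.linalg.inv(M2) @ Q2

def score(index, trap):
    """Cloud-side score: the matrices cancel, so the result equals D . Q."""
    (I1, I2), (T1, T2) = index, trap
    return I1 @ T1 + I2 @ T2

D = np.array([1, 0, 1, 1, 0], dtype=float)   # document vector
Q = np.array([1, 0, 0, 1, 0], dtype=float)   # query vector
print(score(build_index(D), trapdoor(Q)))    # ~2.0
print(D @ Q)                                  # 2.0
```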
22
MRSE_I - Analysis
Functionality: the random dummy keyword that is introduced can follow a normal distribution, where the standard deviation acts as a flexible tradeoff parameter between search accuracy and security.
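To see why the standard deviation acts as an accuracy/security tradeoff, the short sketch below adds N(0, sigma^2) noise to some exact coordinate-matching scores and compares the resulting rankings. The score values and sigma choices are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Exact coordinate-matching scores for five documents (illustrative values).
exact = np.array([4.0, 3.0, 3.0, 1.0, 0.0])
true_rank = np.argsort(-exact)

for sigma in (0.1, 1.0, 5.0):
    # Each document's final score carries a dummy-keyword term drawn from N(0, sigma^2).
    noisy = exact + rng.normal(0.0, sigma, size=exact.shape)
    noisy_rank = np.argsort(-noisy)
    print(f"sigma={sigma}: ranking {noisy_rank.tolist()} (true {true_rank.tolist()})")

# Small sigma keeps the ranking close to the true order (better accuracy);
# large sigma hides the exact scores better but perturbs the ranking more.
```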
23
MRSE_I - Analysis
Data privacy: preserved by the encryption of the data.
Index privacy: safe as long as the secret key is protected.
Trapdoor unlinkability: with the randomness introduced by the splitting process and the random numbers r and t, the basic scheme can generate two totally different trapdoors for the same query.
24
Improvement of MRSE_I
MRSE_I is secure enough for the known ciphertext model, but it is not sufficient for the known background model. For example, the cloud server can learn document frequency, which can be combined with background information to identify a keyword in a query with high probability.
25
Improvement of MRSE_I - Scale Analysis Attack
Given two correlated trapdoors T1 and T2 for the query keyword sets {K1, K2} and {K1, K2, K3} over the same three documents, the cloud server can deduce whether all three documents contain K3 or none of them do. From this, the cloud server can infer document frequency.
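A toy illustration of what the correlated queries leak, working directly on plaintext coordinate-matching scores rather than the randomized MRSE_I scores (the keyword sets and documents are invented): the per-document score difference between the two queries reveals whether that document contains K3, and hence K3's document frequency.

```python
# Toy illustration: plaintext coordinate-matching scores for two correlated
# queries, {K1, K2} and {K1, K2, K3}.
documents = {
    "F1": {"K1", "K3"},
    "F2": {"K2", "K3"},
    "F3": {"K1", "K2", "K3"},
}

def score(doc_keywords, query):
    # Coordinate matching: number of query keywords present in the document.
    return len(doc_keywords & query)

q1 = {"K1", "K2"}
q2 = {"K1", "K2", "K3"}

for doc_id, kws in documents.items():
    diff = score(kws, q2) - score(kws, q1)
    # diff == 1 exactly when the document contains K3.
    print(doc_id, "contains K3" if diff == 1 else "does not contain K3")
```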
26
MRSE_2 Scheme
U is the number of dummy keywords inserted. In MRSE_I, only one dummy keyword was used per document; in MRSE_2, both Build Index and Query take all U dummy keywords into account.
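A rough, plaintext-level sketch of the dummy-keyword idea (the matrix encryption from MRSE_I is omitted). It assumes, beyond what the slide states, that each data vector is extended with U random dummy entries and that each query activates a random subset of those dummy positions, so the noise added to the score changes from query to query; the subset_size parameter is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, U = 5, 4          # n real keywords, U dummy keywords (toy sizes)
subset_size = 2      # dummy positions activated per query (assumed parameter)

def extend_data(D):
    """Append U random dummy entries to the data vector."""
    return np.concatenate([D, rng.normal(0.0, 0.5, size=U)])

def extend_query(Q):
    """Append U dummy entries, a random subset of which is set to 1."""
    dummies = np.zeros(U)
    dummies[rng.choice(U, size=subset_size, replace=False)] = 1.0
    return np.concatenate([Q, dummies])

D = extend_data(np.array([1, 0, 1, 1, 0], dtype=float))
Q = np.array([1, 0, 0, 1, 0], dtype=float)

# The same query issued twice yields different scores, because a different
# random set of dummy entries contributes each time. This breaks the exact
# score comparisons used by the scale analysis attack.
print(D @ extend_query(Q), D @ extend_query(Q))
```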
27
More Improvement
So far we have used only the number of matched keywords in a document for ranking, but there are other important factors. For example, when a keyword appears in all documents, its importance is low. So taking keyword weight into account while ranking documents is a possible improvement.
28
More Improvement
The MRSE_I_TF scheme is an improved version of MRSE_I that considers the weight of each keyword during the similarity calculation. The MRSE_2_TF scheme incorporates both the idea of MRSE_I_TF (weighted keywords) and MRSE_2 (a list of random dummy keywords).
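The slide does not spell out the weighting formula; a common choice, and a reasonable sketch here, is a TF x IDF style score in which a keyword that appears in every document gets a near-zero weight. The corpus, query, and formula below are illustrative, not taken from the paper.

```python
import math

# Toy corpus: term frequencies per document (illustrative values).
docs = {
    "F1": {"cloud": 5, "privacy": 2},
    "F2": {"cloud": 3, "rank": 4},
    "F3": {"cloud": 1, "privacy": 1, "rank": 2},
}
N = len(docs)

def idf(term):
    """Keywords that appear in every document get weight 0."""
    df = sum(1 for tf in docs.values() if term in tf)
    return math.log(N / df) if df else 0.0

def weighted_score(doc_id, query_terms):
    """Sum of TF x IDF weights over the queried keywords."""
    return sum(docs[doc_id].get(t, 0) * idf(t) for t in query_terms)

query = ["cloud", "privacy"]
scores = {d: weighted_score(d, query) for d in docs}
print(sorted(scores, key=scores.get, reverse=True))
# "cloud" appears in all three documents, so idf("cloud") == 0 and only
# "privacy" contributes to the ranking here.
```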
29
Future Possible Work
For future work, the authors will explore checking the integrity of the rank order in the search results, assuming the cloud server is untrusted.