Presented By Amarjit Datta


Privacy-Preserving Multi-Keyword Ranked Search over Encrypted Cloud Data. Presented by Amarjit Datta.

Authors and Publication Information: Ning Cao (PhD in ECE, Worcester Polytechnic Institute); Cong Wang (PhD in ECE, Illinois Institute of Technology); Ming Li (PhD in ECE, Worcester Polytechnic Institute); Kui Ren (PhD in ECE, Worcester Polytechnic Institute); Wenjing Lou (PhD in ECE, University of Florida).

Table of Contents: Introduction; Problem Domain; Some Important Definitions; MRSE framework and versions; MRSE scheme analysis; MRSE scheme improvements.

Introduction: Cloud computing keeps growing in popularity. Why is the cloud so popular? Minimal startup cost; pay-as-you-go pricing; easy scalability; no server administration overhead.

Introduction: Uploading content to the cloud raises many security issues, for example man-in-the-middle attacks, packet sniffing, IP address spoofing, and many more. In this research paper, the authors analyze the privacy issues that can arise after the content has been uploaded to the cloud.

Introduction: The cloud server is modeled as "honest-but-curious". "Honest": it follows the designated protocols. "Curious": it wants to infer and analyze the data in its storage. So we have to search encrypted data hosted in a cloud environment without sharing private information with the cloud.

Introduction: So what can data owners do about it? A data owner can encrypt the files before uploading them to the cloud. But how can encrypted files in the cloud be searched? Traditional plaintext keyword search won't work.

Introduction: We also need search results in ranked order (for example, most relevant first). Coordinate matching: rank each document by how many of the query keywords it contains, so that documents with as many matches as possible come first.

Problem Definition: Single-keyword search over encrypted data is already widely researched. This paper explores two new use cases: multi-keyword search over encrypted cloud data, and ranked search (results sorted by relevance), for which the paper uses "coordinate matching". Privacy must be preserved throughout.

Problem Model

Problem Formulation: The data owner has a collection of data documents and their encrypted forms. The data owner creates an encrypted searchable index. Both the encrypted files and the encrypted searchable index are copied to the cloud server. To search, data users need the corresponding trapdoor T.

Problem Formulation: Two threat models, based on how much information the cloud server knows. Known ciphertext model: the cloud server knows only the encrypted dataset and the searchable index. Known background model: the cloud server knows the encrypted dataset, the searchable index, and additional information (for example, correlations between search queries).

This is what we want! Data privacy: encrypt the data files and the searchable index. Keyword privacy: hide what users are searching for. Trapdoor unlinkability: the trapdoor generation function should be randomized instead of deterministic.

Let's Check the MRSE Scheme! The main idea is to confuse the cloud server so that it cannot detect the search keywords or the document type. We can do that by applying randomization at different steps.

Notations

MRSE – Basic Framework

MRSE Framework [Diagram. Labels: Setup; Build Index; Trapdoor; Query; Key; Data owner; Data user; Upload encrypted files and indexes.]

How to Do Ranking? Similarity Calculation: Di is a binary data vector for document Fi, where each bit Di[j] is either 0 or 1 and indicates whether the corresponding keyword Wj appears in that document. Q is a binary query vector indicating the keywords of interest, where each bit Q[j] indicates whether the corresponding keyword Wj appears in the query. The similarity score of document Fi to the query is therefore the inner product of their binary column vectors, i.e., Di · Q.
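A minimal plaintext sketch of the similarity calculation above, before any encryption is applied. The function names and the sample vectors are made up for illustration; ties are broken by document id, which is an arbitrary choice.

```python
def similarity(d, q):
    """Inner product of a binary data vector and a binary query vector."""
    return sum(a * b for a, b in zip(d, q))


def rank_top_k(data_vectors, q, k):
    """Return the ids of the k documents with the highest similarity score."""
    scored = [(similarity(d, q), doc_id) for doc_id, d in data_vectors.items()]
    # Highest score first; equal scores are ordered by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical dictionary of four keywords W1..W4 and three documents.
data_vectors = {
    "F1": [1, 0, 1, 1],
    "F2": [0, 1, 0, 0],
    "F3": [1, 1, 0, 1],
}
query = [1, 1, 0, 1]  # the user is interested in W1, W2 and W4
ranking = rank_top_k(data_vectors, query, k=2)
```

With binary vectors the inner product simply counts the query keywords present in the document, which is exactly coordinate matching.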

MRSE_I Scheme

MRSE_I Scheme. Setup: The data owner randomly generates an (n+2)-bit vector S and two (n+2) x (n+2) invertible matrices M1 and M2. The secret key SK is the 3-tuple {S, M1, M2}. Here n is the number of keywords in the dictionary (the length of each data vector); the two extra dimensions hold a random dummy keyword and a constant. Build-Index: The data owner generates a binary data vector Di for every document Fi, where each bit Di[j] indicates whether the corresponding keyword Wj appears in document Fi.

MRSE_I Scheme. Trapdoor: Given t keywords of interest, a binary vector Q is generated in which each bit Q[j] indicates whether keyword Wj is among the keywords of interest. The trapdoor is generated from this vector. Query: With the trapdoor, the cloud server computes the similarity score of each document Fi. After sorting all scores, the cloud server returns the top-k ranked id list.

MRSE_I - Analysis. Functionality: The value of the random dummy keyword can follow a normal distribution whose standard deviation acts as a flexible trade-off parameter between search accuracy and security.
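A rough illustration of that trade-off, assuming the dummy value is simply added to each document's true score before ranking and that accuracy is measured as the fraction of the true top-k documents that survive in the noisy top-k. Both the metric and the names below are illustrative choices, not taken from the slides.

```python
import random


def top_k(scores, k):
    """Indices of the k highest scores (ties broken by lower index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(order[:k])


def precision_at_k(true_scores, sigma, k, rng):
    """Fraction of the true top-k documents still in the top-k after adding
    zero-mean Gaussian noise with standard deviation sigma to every score."""
    noisy = [s + rng.gauss(0, sigma) for s in true_scores]
    return len(top_k(true_scores, k) & top_k(noisy, k)) / k


rng = random.Random(1)
true_scores = [5, 3, 4, 1, 2, 0, 3, 1]
for sigma in (0.0, 0.5, 2.0):
    avg = sum(precision_at_k(true_scores, sigma, 3, rng) for _ in range(200)) / 200
    print(f"sigma={sigma}: average precision {avg:.2f}")
```

A larger sigma hides the true scores better but makes the returned ranking less accurate; sigma = 0 gives exact ranking and no obfuscation.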

MRSE_I - Analysis. Data privacy: preserved by encrypting the data. Index privacy: the index stays confidential as long as the secret key is kept secret. Trapdoor unlinkability: with the randomness introduced by the splitting process and the random numbers r and t, the basic scheme can generate two totally different trapdoors for the same query.

Improvement of MRSE_I: MRSE_I is secure enough for the known ciphertext model, but not for the known background model. For example, the cloud server may learn document frequencies, which can be combined with background information to identify a keyword in a query with high probability.

Improvement of MRSE_I - Scale Analysis Attack: Given two correlated trapdoors T1 and T2 for the query keyword sets {K1, K2} and {K1, K2, K3}, and three documents, the cloud server could deduce whether all three documents contain K3 or none of them does. From this, the cloud server can work out document frequencies.

MRSE_2 Scheme: U is the number of dummy keywords inserted. In MRSE_I, only one dummy keyword was used per document. Both Build Index and Query take U into account.
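A plaintext sketch of how U dummy keywords could enter the score. The slides give no details, so the layout below is an assumption: each data vector carries U dummy values followed by a constant 1, and each query switches on a random subset of V of the U dummy positions. V, the vector layout and the function names are illustrative. The encryption step is omitted, since it leaves the inner product unchanged.

```python
import random


def extend_data_vector(d_bits, dummy_values):
    """Assumed layout: n keyword bits, then U dummy values, then a constant 1."""
    return list(d_bits) + list(dummy_values) + [1]


def extend_query_vector(q_bits, u, v, r, t, rng):
    """Scale the keyword bits by r, switch on v of the u dummy positions
    (chosen at random for every query), and append t."""
    chosen = set(rng.sample(range(u), v))
    dummy_bits = [1 if j in chosen else 0 for j in range(u)]
    q_ext = [r * b for b in q_bits] + [r * b for b in dummy_bits] + [t]
    return q_ext, chosen


def score(d_ext, q_ext):
    return sum(a * b for a, b in zip(d_ext, q_ext))


rng = random.Random(7)
dummy_values = [0.5, -0.25, 0.125, 0.0]  # U = 4 dummy values for one document
d_ext = extend_data_vector([1, 0, 1], dummy_values)
q_ext, chosen = extend_query_vector([1, 1, 0], u=4, v=2, r=2, t=3, rng=rng)
# The score is r * (Di . Q + sum of the selected dummy values) + t.
result = score(d_ext, q_ext)
```

Because a different subset of dummy values is added for each query, the noise on a given document's score changes from query to query, unlike the single fixed dummy value per document in MRSE_I.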

More Improvement: So far, ranking has used only the number of query keywords present in the document. But other factors matter too. For example, when a keyword appears in all documents, its importance is lower. So taking keyword weight into account while ranking documents can be an improvement.

More Improvement: The MRSE_I_TF scheme is an improved version of MRSE_I that considers the weight of each keyword during similarity calculation. The MRSE_2_TF scheme combines the ideas of MRSE_I_TF (weighted keywords) and MRSE_2 (a list of random dummy keywords).
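A plaintext sketch of weighted similarity. The slides do not give the weighting formula, so this assumes a textbook choice: the data vector holds raw term counts and each query keyword is weighted by idf = ln(N / df), where N is the number of documents and df is the number of documents containing the keyword. The names and sample counts are illustrative.

```python
import math


def idf_weights(doc_counts):
    """One weight per keyword: ln(N / df), or 0 if no document contains it."""
    n_docs = len(doc_counts)
    n_keywords = len(doc_counts[0])
    weights = []
    for j in range(n_keywords):
        df = sum(1 for doc in doc_counts if doc[j] > 0)
        weights.append(math.log(n_docs / df) if df else 0.0)
    return weights


def weighted_score(doc, query_bits, idf):
    """Inner product of term counts with the idf-weighted query vector."""
    return sum(c * q * w for c, q, w in zip(doc, query_bits, idf))


# Rows are documents, columns are term counts for keywords W1..W3.
docs = [
    [2, 1, 0],
    [1, 0, 3],
    [1, 0, 0],
]
idf = idf_weights(docs)
query = [1, 1, 0]
scores = [weighted_score(doc, query, idf) for doc in docs]
```

W1 appears in every document, so its weight is 0 and it no longer influences the ranking; only the rarer keyword W2 separates the first document from the others.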

Possible Future Work: The authors plan to explore checking the integrity of the rank order in the search results under the assumption that the cloud server is untrusted.