Regular Expression Search over Encrypted Big Data in the Cloud Mohsen Amini Salehi Visiting Assistant Professor CACS Department Spring ‘15 1.

Slides:



Advertisements
Similar presentations
Uncertainty in Data Integration Ai Jing
Advertisements

1 Ontolog OOR Use Case Review Todd Schneider 1 April 2010 (v 1.2)
GridNets2009, Athens 08/09/09 Business Models, Accounting and Billing Concepts in Grid-aware Networks Serafim Kotrotsos 1, Peter Racz 2, Cristian Morariu.
Cloud Computing Security Monir Azraoui, Kaoutar Elkhiyaoui, Refik Molva, Melek Ӧ nen, Pasquale Puzio December 18, 2013 – Sophia-Antipolis, France.
Cobalt: Separating content distribution from authorization in distributed file systems Kaushik Veeraraghavan Andrew Myrick Jason Flinn University of Michigan.
Efficient Information Retrieval for Ranked Queries in Cost-Effective Cloud Environments Presenter: Qin Liu a,b Joint work with Chiu C. Tan b, Jie Wu b,
多媒體網路安全實驗室 Towards Secure and Effective Utilization over Encrypted Cloud Data 報告人 : 葉瑞群 日期 :2012/05/09 出處 :IEEE Transactions on Knowledge and Data Engineering.
Multilingual Text Retrieval Applications of Multilingual Text Retrieval W. Bruce Croft, John Broglio and Hideo Fujii Computer Science Department University.
Dialogue – Driven Intranet Search Suma Adindla School of Computer Science & Electronic Engineering 8th LANGUAGE & COMPUTATION DAY 2009.
Time-dependent Similarity Measure of Queries Using Historical Click- through Data Qiankun Zhao*, Steven C. H. Hoi*, Tie-Yan Liu, et al. Presented by: Tie-Yan.
Ragib Hasan Johns Hopkins University en Spring 2011 Lecture 10 04/18/2011 Security and Privacy in Cloud Computing.
Integrating Semantics-Based Access Mechanisms with P2P File Systems Yingwu Zhu, Honghao Wang and Yiming Hu.
By ANDREW ZITZELBERGER A Framework for Extraction Ontology Based Information Management.
Using Local Information for Personalized Search Haward Jie CS 290C.
Web Search – Summer Term 2006 II. Information Retrieval (Basics) (c) Wolfgang Hürst, Albert-Ludwigs-University.
Implementation of Simple Cloud-based Distributed File System Group ID: 4 Baolin Wu, Liushan Yang, Pengyu Ji.
Memoplex Browser: Searching and Browsing in Semantic Networks CPSC 533C - Project Update Yoel Lanir.
Jeremy Boyd Director – Mindscape MSDN Regional Director
Cloud based linked data platform for Structural Engineering Experiment Xiaohui Zhang
Document Management Systems For Human Resource Department Infocrew Solutions Pvt.Ltd.
Databases & Data Warehouses Chapter 3 Database Processing.
Object-based Storage Long Liu Outline Why do we need object based storage? What is object based storage? How to take advantage of it? What's.
Chapter 6 Physical Database Design. Introduction The purpose of physical database design is to translate the logical description of data into the technical.
M i SMob i S Mob i Store - Mobile i nternet File Storage Platform Chetna Kaur.
TM 7-1 Copyright © 1999 Addison Wesley Longman, Inc. Physical Database Design.
Data and its manifestations. Storage and Retrieval techniques.
Information Explosion. Reality: New Machine-Generated Data Non-relational and relational data outside of the EDW † Source: Analytics Platforms – Beyond.
Martin-1 CSE 5810 CSE 5810 Individual Research Project: Integration of Named Data Networking for Improved Healthcare Data Handling Robert Martin Computer.
Daniel J. Abadi · Adam Marcus · Samuel R. Madden ·Kate Hollenbach Presenter: Vishnu Prathish Date: Oct 1 st 2013 CS 848 – Information Integration on the.
Autumn Web Information retrieval (Web IR) Handout #1:Web characteristics Ali Mohammad Zareh Bidoki ECE Department, Yazd University
Semantic based P2P System for local e-Government Fernando Ortiz-Rodriguez 1, Raúl Palma de León 2 and Boris Villazón-Terrazas 2 1 1Universidad Tamaulipeca.
Abstract With the advent of cloud computing, data owners are motivated to outsource their complex data management systems from local sites to the commercial.
Distributed DBMSs- Concept and Design Jing Luo CS 157B Dr. Lee Fall, 2003.
1 Database Management Systems (DBMS). 2 Database Management Systems (DBMS) n Overview of: ä Database Management Components ä Database Systems Architecture.
Secure Conjunctive Keyword Search Over Encrypted Data Philippe Golle Jessica Staddon Palo Alto Research Center Brent Waters Princeton University.
Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al.
Scalable Hybrid Keyword Search on Distributed Database Jungkee Kim Florida State University Community Grids Laboratory, Indiana University Workshop on.
Author : Sarang Dharmapurikar, John Lockwood Publisher : IEEE Journal on Selected Areas in Communications, 2006 Presenter : Jo-Ning Yu Date : 2010/12/29.
Data Integrity Proofs in Cloud Storage Author: Sravan Kumar R and Ashutosh Saxena. Source: The Third International Conference on Communication Systems.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
26/05/2005 Research Infrastructures - 'eInfrastructure: Grid initiatives‘ FP INFRASTRUCTURES-71 DIMMI Project a DI gital M ulti M edia I nfrastructure.
Full-Text Support in a Database Semantic File System Kristen LeFevre & Kevin Roundy Computer Sciences 736.
A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching Yao Song 11/05/2015.
Multilingual Information Retrieval using GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of Kaohsiung.
Secure Data Outsourcing
A Multilingual Hierarchy Mapping Method Based on GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of.
INFORMATION STROAGE AND RETRIEVAL SYSTEM By Ms. Preeti Patel Lecturer School of Library And Information Science DAVV, Indore
配色参考方案: 建议同一页面内 不超过四种颜色, 以下是 13 组配色 方案,同一页面 内只选择一组使 用。(仅供参考) 客户或者合作 伙伴的标志放 在右上角. 英文标题 :32-35pt 颜色 : R153 G0 B0 内部使用字体 : FrutigerNext LT Medium 外部使用字体 :
Data Centers and Cloud Computing 1. 2 Data Centers 3.
All Your Queries are Belong to Us: The Power of File-Injection Attacks on Searchable Encryption Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University.
Abstract MarkLogic Database – Only Enterprise NoSQL DB Aashi Rastogi, Sanket V. Patel Department of Computer Science University of Bridgeport, Bridgeport,
Large Scale Semantic Data Integration and Analytics through Cloud: A Case Study in Bioinformatics Tat Thang Parallel and Distributed Computing Centre,
Search can be Your Best Friend You just Need to Know How to Talk to it IW 306 Ágnes Molnár.
A presentation on ElasticSearch
Searchable Encryption in Cloud
Efficient Multi-User Indexing for Secure Keyword Search
Cloud based linked data platform for Structural Engineering Experiment
Stochastic Robust Scheduling in Heterogeneous Computing Systems
Privacy Preserving Ranked Multi-Keyword
CHAPTER 5: PHYSICAL DATABASE DESIGN AND PERFORMANCE
Chair of Tech Committee, BetterGrids.org
Lecture 12: Data Wrangling
Physical Database Design
Paraskevi Raftopoulou, Euripides G.M. Petrakis
Assoc. Prof. Dr. Syed Abdul-Rahman Al-Haddad
Database Systems Instructor Name: Lecture-3.
Combining Keyword and Semantic Search for Best Effort Information Retrieval  Andrew Zitzelberger 1.
Optimal Partitioning of Data Chunks in Deduplication Systems
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Presentation transcript:

Regular Expression Search over Encrypted Big Data in the Cloud Mohsen Amini Salehi Visiting Assistant Professor CACS Department Spring ‘15 1

Research Interests 2 Data Sanitization in Cloud Robust Cloud Resource Allocation for real time processing of big data

Outline Background Security Challenges in the Cloud Our Solution: “RESeED ” Evaluations Future Research Plans 3

Motivation Storage Clouds have emerged in response to the data explosion The need for more data security on Consumer Platforms o o Social Networks o Clouds 4

Motivation Clouds are not trustworthy!  10% reduce in foreign contracts with Cloud providers  Half of businesses are uncomfortable to deal with Cloud Solution?  User-side Encryption!  Clouds become dumb block storage Disadvantages of encryption:  Not transparent to the user  Involves storage and processing overheads  No search capabilities on the stored data

Keyword Search and Beyond Boneh et al. provide keyword search over encrypted data What if we need more than keyword search?  We are looking for all documents authored by Andrew Stuart Tanenbaum A S Tanenbaum Andrew s Tanenbaum Boneh complexity for regular expression: O(2 n ) (n number of tokens) (Andrew|A) \s*(stuart|s) \s*Tanenbaum

Problem Definition How can we perform a regular expression search over the encrypted data in the Cloud? The solution  Should not share any information with the Cloud  Should not require any infrastructural change from the Cloud  Should work in multi-cloud scenarios

Scenario 8

RESeED Architecture 9

Extracting Meta Data Column Store  Map from Token to files in which it appeared Order Store  A fuzzy (i.e., hashed) representation of the file  Used to match the correct order of words in the search query  For each token in a file, create an N byte hash o What is the proper hash-width (N)?

RESeED Architecture 11

RESeED Search Operation To search for a regular expression r: 1.Convert r into an Nondeterministic Finite Automata (NFA) 2.The NFA is partitioned to a set of sub-NFAs based on ω-transformation 3.For each token in Column Store: A.Check if it is matched with a sub-NFA 4.For files that accept all sub-NFAs: A.Using Order Store, verify if they have the expression in the expected order

RESeED Commercialization on Fortivault (Fortinet Gateway) 13

RESeED Demo:

Dropbox as a Dumb Storage 15

Proposal 1: Parallel RESeED How RESeED can handle big-data scale data sets? Proposal: parallelize data processing in RESeED Parallelization levels  Column Store matching  Order Store matching Hadoop/spark based processing for matching against Column store? 16

Parallelizing Bottlenecks 17

Project Proposal: Semantic search over encrypted data Semantic search is required to explore increasingly stored data on the Cloud How can we do semantic search on encrypted data? One current approach: synonym based.  User adds some keywords for each file  System indexes the keywords and their synonyms 18

Our Approach Current system does not consider ranking of the results Current system does not consider multiple keywords in the search query Current system does not differentiate between synonyms and keywords  It does not consider how related a query with a file Current system does not work for big data 19

Thank you for your time. Any Question? 20