University of Maryland

Slides:



Advertisements
Similar presentations
Efficient Information Retrieval for Ranked Queries in Cost-Effective Cloud Environments Presenter: Qin Liu a,b Joint work with Chiu C. Tan b, Jie Wu b,
Advertisements

Stabbing the Sky: Efficient Skyline Computation over Sliding Windows COMP9314 Lecture Notes.
CS7380: Privacy Aware Computing Oblivious RAM 1. Motivation  Starting from software protection Prevent from software piracy A valid method is using hardware.
I/O-Algorithms Lars Arge Aarhus University February 6, 2007.
E.G.M. Petrakissearching1 Searching  Find an element in a collection in the main memory or on the disk  collection: (K 1,I 1 ),(K 2,I 2 )…(K N,I N )
Data Structures Hashing Uri Zwick January 2014.
Indexing Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata.
Layers of a DBMS Query optimization Execution engine Files and access methods Buffer management Disk space management Query Processor Query execution plan.
ObliviStore High Performance Oblivious Cloud Storage Emil StefanovElaine Shi
1 Physical Data Organization and Indexing Lecture 14.
Oblivious Data Structures Xiao Shaun Wang, Kartik Nayak, Chang Liu, T-H. Hubert Chan, Elaine Shi, Emil Stefanov, Yan Huang 1.
Bin Yao Spring 2014 (Slides were made available by Feifei Li) Advanced Topics in Data Management.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
DBI313. MetricOLTPDWLog Read/Write mixMostly reads, smaller # of rows at a time Scan intensive, large portions of data at a time, bulk loading Mostly.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Equivalence Between Priority Queues and Sorting in External Memory
Lecture 1: Basic Operators in Large Data CS 6931 Database Seminar.
Searching Over Encrypted Data Charalampos Papamanthou ECE and UMIACS University of Maryland, College Park Research Supported By.
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Keyword search on encrypted data. Keyword search problem  Linux utility: grep  Information retrieval Basic operation Advanced operations – relevance.
All Your Queries Are Belong to Us: The Power of File-Injection Attacks on Searchable Encryption Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University.
All Your Queries are Belong to Us: The Power of File-Injection Attacks on Searchable Encryption Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University.
1 Query Processing Part 1: Managing Disks. 2 Main Topics on Query Processing Running-time analysis Indexes (e.g., search trees, hashing) Efficient algorithms.
Algorithm Analysis 1.
Practical Private Range Search Revisited
arxiv.org/abs/ y 3-sided x1 x2 x1 x2 top-k
Query Processing Part 1: Managing Disks 1.
Searchable Encryption in Cloud
Tian Xia and Donghui Zhang Northeastern University
Rekeying for Encrypted Deduplication Storage
Module 11: File Structure
Efficient Multi-User Indexing for Secure Keyword Search
Oblivious Parallel RAM: Improved Efficiency and Generic Constructions
Tree-Structured Indexes: Introduction
Indexing & querying text
Lecture 16: Data Storage Wednesday, November 6, 2006.
Tree-Structured Indexes
Database Management Systems (CS 564)
Hybrid Cloud Architecture for Software-as-a-Service Provider to Achieve Higher Privacy and Decrease Securiity Concerns about Cloud Computing P. Reinhold.
Computing Connected Components on Parallel Computers
External Sorting Chapter 13 (Sec ): Ramakrishnan & Gehrke and
RE-Tree: An Efficient Index Structure for Regular Expressions
Fast Searchable Encryption with Tunable Locality
CSCI 104 Log Structured Merge Trees
Chapter 12: Query Processing
Oblivious RAM: A Dissection and Experimental Evaluation
Sorting atomic items Chapter 5 Distribution based sorting paradigms.
Join Processing for Flash SSDs: Remembering Past Lessons
Selectivity Estimation of Big Spatial Data
Lecture 11: DMBS Internals
Database Applications (15-415) DBMS Internals- Part III Lecture 15, March 11, 2018 Mohammad Hammoud.
HashKV: Enabling Efficient Updates in KV Storage via Hashing
Verifiable Oblivious Storage
File Systems Implementation
Advanced Topics in Data Management
STACS arxiv.org/abs/ y 3-sided x1 x2 x1 x2 top-k
Lecture 7: Index Construction
Lecture 2- Query Processing (continued)
File Storage and Indexing
File Storage and Indexing
8th Workshop on Massive Data Algorithms, August 23, 2016
Indexing 4/11/2019.
CENG 351 Data Management and File Structures
Path Oram An Extremely Simple Oblivious RAM Protocol
Efficient Aggregation over Objects with Extent
CSE 190D Database System Implementation
Virtual Memory 1 1.
Efficient Migration of Large-memory VMs Using Private Virtual Memory
Presentation transcript:

University of Maryland Searchable Encryption with Optimal Locality: Achieving Sublogarithmic Read Efficiency Ioannis Demertzis Dimitris Papadopoulos Charalampos Papamanthou University of Maryland Hong Kong UST University of Maryland yannis@umd.edu dipapado@cse.ust.hk cpap@umd.edu

? What is Searchable Encryption (SE)? + search query: Untrusted Cloud ? Client Search pattern: whether a search query is repeated + search query: keyword Setup leakage: total leakage prior to query execution e.g. size of each encrypted file, size of encrypted index Access pattern: encrypted document ids and files that satisfy the search query Security (informal): The adversary does not learn anything beyond the above leakages! 2

X X X X X X X Searchable Encryption – Locality and Read Efficiency Locality is an important efficiency dimension ([CJS+14],[DP17],… Scalable SE requires low locality and read efficiency Locality: #non-continuous reads for each query Read Efficiency: #memory locations per result item locality = 3 & read efficiency = 1 search query: keyword id1 id4 id2 X X X X X X id4 id5 id3 id1 id2 id6 X : false positives locality = 1 & read efficiency = O(N) 3

[ANS+16] – TwoChoiceAlloc * [ANS+16] – OneChoiceAlloc *under some assumptions for the SE scheme Previous Works & Our Result “Cash and Tessaro Eurocrypt 2014” Locality (L): O(1) and Read Efficiency (R): O(1) requires Space (S): ω(Ν) * Schemes with limitation on the maximum keyword-list size General Schemes [ANS+16] – NlogN scheme L: O(1), R: O(1), S: O(NlogN) [ANS+16] – TwoChoiceAlloc * L: O(1), R: O(loglogN), S: O(N) ~ [DP17] – ReadOpt L: O(N1/s), R: O(1), S: O(sN) * keyword lists in the dataset have size less than N1-1/loglogN . [ASS18]** L: O(1), R: O(ω(1)*ε-1(n) + logloglogN) for n = N1-ε(N), S: O(N) [ANS+16] – OneChoiceAlloc L: O(1), R: O(logN), S: O(N) ~ ** keyword lists in the dataset have size less than N/log3N Our Approach L: O(1), R: O(logγN), S: O(N), for γ>2/3 4

locality = 1 & read efficiency = 1 & optimal space Searchable Encryption – Naïve Approach 1 k1= k2= k3= <=4 <=3 locality = 1 & read efficiency = 1 & optimal space 5

Searchable Encryption – Naïve Approach 2 k1= k2= k3= ? 6

… [ANS+16]– OneChoiceAllocation O(N) space, O(1) locality and O(logN) read efficiency ~ k1= k2= k3= k1 … 3 logN loglogN M = N / logN loglogN 7

… [ANS+16]– TwoChoiceAllocation O(N) space, O(1) locality and O(loglogN) read efficiency ~ k1= k2= k3= ** Assuming all the keyword lists in the dataset have size less than N1-1/loglogN ** … c loglogN log2loglogN M = N / loglogN log2loglogN 7

… [ANS+16]– TwoChoiceAllocation O(N) space, O(1) locality and O(loglogN) read efficiency ~ k2= k1= k3= ** Assuming all the keyword lists in the dataset have size less than N1-1/loglogN ** k3 … c loglogN log2loglogN M = N / loglogN log2loglogN 7

[ANS+16]-OneChoiceAlloc Our Approach O(N) space, O(1) locality and O(logγN), for γ>2/3 Read Efficiency ~ [ANS+16]-OneChoiceAlloc Ο(logΝ) [ANS+16] TwoChoiceAlloc ~ O(loglogN) N1-1/loglogN N Keyword-list size 8

[ANS+16]-OneChoiceAlloc Our Approach O(N) space, O(1) locality and O(logγN), for γ>2/3 Read Efficiency ~ [ANS+16]-OneChoiceAlloc Ο(logΝ) [ANS+16] TwoChoiceAlloc ~ O(loglogN) N1-1/loglogN N Keyword-list size 8

[ANS+16]-OneChoiceAlloc Our Approach O(N) space, O(1) locality and O(logγN), for γ>2/3 Read Efficiency ~ [ANS+16]-OneChoiceAlloc Ο(logΝ) Ο(logγΝ) [ANS+16] TwoChoiceAlloc ~ O(loglogN) N1-1/loglogN N1-1/log N 1-γ N Keyword-list size 8

[ANS+16]-OneChoiceAlloc Our Approach O(N) space, O(1) locality and O(logγN), for γ>2/3 Read Efficiency ~ [ANS+16]-OneChoiceAlloc Ο(logΝ) Small Huge Ο(logγΝ) [ANS+16] TwoChoiceAlloc Sequential Scan ~ O(loglogN) N1-1/loglogN N1-1/log N 1-γ N/log N γ N Keyword-list size 8

Our Approach O(N) space, O(1) locality and O(logγN), for γ>2/3 Read Efficiency Focus of this talk! ~ [ANS+16]-OneChoiceAlloc Ο(logΝ) Small Medium Large Huge Ο(logγΝ) [ANS+16] TwoChoiceAlloc Multi-level keyword-size compression Sequential Scan ~ O(loglogN) N1-1/loglogN N1-1/log N 1-γ N/log N 2 N/log N γ N Keyword-list size 8

… Starting Point: Offline Two Choice Allocation (OTA) – [SEK03] OfflineTwoChoiceAlloc for m balls and n bins: MaxFlow( ) … n bins 9

… Starting Point: Offline Two Choice Allocation (OTA) – [SEK03] Γ L OfflineTwoChoiceAlloc for m balls and n bins: Key IDEA: One OTA per size and then Merge!! with probability at least 1 – O(1/n) … Max load <= m/n + 1 Γ L n bins 10

… … … … … Σs( ) = Our Approach: OTA per size + Merge Γ L ? A4s A2s As ks: #keyword lists with size s bs=M/s (#superbuckets) … … A4s Overflow Probability = O(1/bs) … A2s ks/bs + 1 Γ L See Lemma 4 in our paper Σs( ) = … O(N/M + logγΝ) As M = Ν/logγΝ = Ο(logγΝ) M … ? 11

… … … … … … … … … Our Approach: New analysis for OTA Our Approach: Accessing keyword lists **Novel analysis for OTA** The probability that more than O(log2N) lists of size s overflow is negligible! – see Lemma 5 in our paper … … … … A4s B4s k3 … … A2s B2s … … As Bs M Stashes … Ο(logγΝ) 12

Our Approach: New locality-aware ORAM Ο(n1/3log2n) Bandwidth and O(1) Locality We need an ORAM with the following properties: O(1) locality, existing ORAMs with polylogn bandwidth have logn locality Zero failure probability, since it will be applied on only log2n elements o(√n) bandwidth, in order to achieve sublogarithmic read efficiency  o(√ log2n) = o(logn) Α πα: [nα]  [nα] n + n2/3 Square Root ORAM Β πb: [nb]  [nb] n2/3 + n1/3 Hierarchical ORAM C * n1/3 De-amortization techniques from Goodrich et al. [GMO+11] 13

… … … … … … … … … … … Our Approach: OTA Stashes Amax Bmax A4s B4s A2s Important: max ≤ N/log2N for maintaining O(N) index size … … Amax Bmax … … … … A4s B4s … … A2s B2s … … As Bs M Stashes … 14

? Conclusion – Future Work Ο(logγΝ) Read Efficiency Keyword-list size Locality-aware Dynamic SE Read Efficiency Closer to the lower bound Open Question: Closer to the lower bound? New ORAM: Ο(n1/3log2n ) bandwidth, O(1) locality [ANS+16]-OneChoiceAlloc ~ Ο(logΝ) Small Medium Large Huge Ο(logγΝ) OTA-based approach Multi-level keyword-size compression [ANS+16] TwoChoiceAlloc Sequential Scan ~ O(loglogN) New probability bounds for OTA N1-1/loglogN N1-1/log N 1-γ N/log N 2 N/log N γ N Keyword-list size 15

Thank You! Ο(logγΝ) https://eprint.iacr.org/2017/749 Read Efficiency Closer to the lower bound New ORAM: Ο(n1/3log2n ) bandwidth, O(1) locality [ANS+16]-OneChoiceAlloc ~ Ο(logΝ) Small Medium Large Huge Ο(logγΝ) OTA-based approach Multi-level keyword-size compression [ANS+16] TwoChoiceAlloc Sequential Scan ~ O(loglogN) New probability bounds for OTA N1-1/loglogN N1-1/log N 1-γ N/log N 2 N/log N γ N Keyword-list size

[ANS+16]-OneChoiceAlloc [ASS18] in CRYPTO O(N) space, O(1) locality and ω(1)⋅ϵ(n)−1+O(logloglogN) read efficiency where n = N1-ϵ(n) Read Efficiency [ANS+16]-OneChoiceAlloc Ο(logΝ) Ο(logΝ/loglogN) Small Medium Large Huge Ο(logγΝ) [ANS+16] TwoChoiceAlloc ~ O(loglogN) O(logloglogN) N1-1/loglogN N1-1/log N 1-γ N/log N 3 N/log N 2 N/log N γ N Keyword-list size

Studying locality for HDD Access Cost = (seek time) + (rotational delay) + (transfer time) Random I/O Cost Sequential I/O Cost ~4-12 ms ~10 μs for 1 byte

Studying locality for SDD Samsung 960 Pro M.2 NVMe SSD Read Write Sequential Transfer Page size = 2MB 2222.93 MB/sec 1786.72 MB/sec Locality High Random Transfer Page size = 2MB 1339.76 MB/sec 1237.57 MB/sec Random Transfer Page size = 2KB 34.30 MB/sec 150.83 MB/sec Low More detailed analysis  http://www.storagereview.com/samsung_960_pro_m2_nvme_ssd_review

Studying locality for RAM Untrusted Cloud Client Tw Tw1 Tw2 Tw3 search query: keyword keyword id1 id4 id2 id4 id5 id3 id1 id2 id6 Tw search query: keyword Tw id1 id5 id4 id2 id3 id6