Dynamic Symmetric Searchable Encryption with Minimal Leakage and Efficient Updates on Commodity Hardware
Attila A. Yavuz, Oregon State University, attila.yavuz@oregonstate.edu
Jorge Guajardo, Robert Bosch LLC – RTC, USA, Jorge.GuajardoMerchan@us.bosch.com
22nd International Conference on Selected Areas in Cryptography (SAC 2015), August 13, 2015
Challenge: Privacy versus Data Utilization Dilemma
[Figure: a client encrypts sensitive information with standard encryption and outsources the data to storage on the cloud. SEARCH? ANALYZE? CAN’T SEARCH! CAN’T ANALYZE! IMPACT]
One of the biggest challenges in cloud computing is the privacy versus data utilization dilemma. Let's give an example. Consider a cloud-based application in which a client outsources his data to the cloud for financial, maintenance, and service benefits. However, sensitive data such as health, financial, or personal information cannot be stored on the cloud in the clear. Hence, the client encrypts the data before outsourcing it. The problem begins when the client wants to retrieve or analyze the encrypted data on the cloud, because standard encryption techniques do not permit search or analysis of data. This is just one example of the privacy versus data utilization dilemma, wherein we either encrypt and protect privacy but sacrifice search/analysis, or don’t encrypt and lose privacy. Notice that this challenge is so fundamental that it appears everywhere, such as in healthcare systems, where you want to keep medical data encrypted against breaches yet still want privacy-preserving data mining, and similarly in financial applications. Therefore, practical solutions to this challenge will create a significant impact on society in general and on science at large.
Searchable Encryption (Generic Framework)
[Figure: client and cloud. The client extracts keywords w1, …, wn from files f1, …, fn, builds a searchable representation (data structure), and outsources it together with the ciphertexts c1, …, cn. Trapdoors t1, …, tn are used for queries.]
Search keyword w1: the client sends trapdoor t1 to the cloud, which returns the matching ciphertext c1, from which the client recovers f1.
Update file fi: the client sends (zi, V) to the cloud.
Towards addressing this research challenge, I developed and currently execute the research program “Efficient Privacy …..” as a Co-PI, which is funded by Bosch Research Center. In this research program, my focus is on developing efficient and practical searchable encryption methods.
How to Evaluate a DSSE?
No single DSSE is better than all others in all aspects.
Security/performance trade-off:
- Choice of data structure
- Operations on it
Performance:
- Search and update execution times
- Client storage (auxiliary info)
- Server storage (encrypted data structure)
- Communication overhead: number of rounds, network latency
Security:
- Information leakage due to updates
Prior Work on Searchable Encryption (Milestones)
Curtmola et al. (CCS 2006): single linked list
(+) Efficient encrypted searches
(-) No update on files (addition/removal not possible)
Variants of CCS 2006 with various properties: ranked, multi-keyword, wildcard, …
(-) No update and inefficient
Kamara et al. (CCS 2012): multi-linked list
(+) Updates: files can be added/removed
(-) Update leaks information (insecure updates)
Kamara et al. (FC 2013): red-black trees
(+) Secure updates
(-) Searchable words are fixed (cannot add a new keyword later)
(-) Extremely large cloud storage (multiple TBs, impractical)
Wang et al. [TPDS 2012], Cao et al. [Infocom 2011], Sedghi [SCN 10], …
Prior Work on Searchable Encryption (Milestones)
Stefanov et al. (NDSS 2014): multi-arrays + oblivious sort
(+) Higher security, efficient searches
(-) Larger client storage and transmission, high server storage
Cash et al. (NDSS 2014): generic dictionaries
(+) Conjunctive and boolean queries, balanced and efficient search/update
(+) Tests on very large scale DBMS
(-) Database grows linearly with updates, permanent client storage, leaks more than Stefanov et al. (NDSS 2014)
Hahn et al. (CCS 2014)
(+) Update efficient, secure updates, leaks less than the above
(-) Client storage, slower search
Naveed et al. (S&P 2015): blind storage
(+) High security, search/update efficiency
(-) Single keyword only, interactive (e.g., network delays), cannot update file content (files can only be added/removed)
Contribution: A New Dynamic Symmetric SE Scheme
(+) The highest privacy among all compared alternatives
(+) Simple design
(+) Low update communication overhead, one round only
(+) Low server storage: 1 bit per keyword/file pair; no growth with updates, no revocation lists, …
(+) Dynamic keywords, parallelism
(-) Linear search w.r.t. the number of files, O(n/b)/p
(-) O(n+m) client storage due to hash tables (e.g., n=m=10^7, ~160 MB); can be stored on and fetched from the cloud. For comparison, the Monster Inc 2 game on iPhone takes ~200 MB.
(+) Efficient and practical on commodity hardware
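The ~160 MB client storage figure can be reproduced with simple arithmetic. The sketch below assumes 8 bytes per hash table entry; that entry size is an assumption chosen for illustration, not a number stated on the slide.

```python
def client_storage_mb(n_files: int, m_keywords: int, bytes_per_entry: int = 8) -> float:
    """Size of the two client hash tables in MB (10**6 bytes).

    bytes_per_entry is an assumed entry size, not a figure from the slides.
    """
    return (n_files + m_keywords) * bytes_per_entry / 10**6


# n = m = 10^7, as in the slide's example
print(client_storage_mb(10**7, 10**7))  # 160.0
```

Under this assumption the client state grows linearly in n+m, which is the O(n+m) term listed above.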
Our Scheme: Searchable Representation
Searchable representation: binary matrix I
Row i ∈ {1,…,m} corresponds to keyword wi; column j ∈ {1,…,n} corresponds to file fj.
If I[i,j]=1, then keyword wi appears in file fj; otherwise it does not.
Integrates index and inverted index, simple yet efficient:
- Search via row operations (inverted index)
- Update via column operations (index)
[Figure: m × n matrix with keyword rows w1, w2, …, wm and file columns f1, f2, …, fn; cell (i,j).]
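A minimal plaintext sketch of the binary matrix may help fix the row/column roles before encryption enters the picture. The class and method names are hypothetical, and indices are 0-based here rather than 1-based as on the slide.

```python
from typing import List, Set


class MatrixIndex:
    """Plaintext m x n binary matrix: rows are keywords, columns are files."""

    def __init__(self, m: int, n: int) -> None:
        self.m, self.n = m, n
        self.I: List[List[int]] = [[0] * n for _ in range(m)]

    def update_file(self, j: int, keyword_rows: Set[int]) -> None:
        """Column operation: rewrite column j to match the file's keyword rows."""
        for i in range(self.m):
            self.I[i][j] = 1 if i in keyword_rows else 0

    def search(self, i: int) -> List[int]:
        """Row operation: return the columns (files) whose bit is set in row i."""
        return [j for j, bit in enumerate(self.I[i]) if bit == 1]


idx = MatrixIndex(m=3, n=4)
idx.update_file(0, {0, 2})   # file 0 contains keywords 0 and 2
idx.update_file(3, {2})      # file 3 contains keyword 2
print(idx.search(2))         # [0, 3]
```

Deleting a file is the same column operation with an empty keyword set.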
Our Scheme: Map keyword/file to the matrix
Keyword w → {1,…,m} and file f → {1,…,n}: dynamic and efficient
Map a keyword to a row i; map a file to a column j.
Open-address hash tables: collision-free (one-to-one), O(1) access
[Figure: hash table TW with entries (1, t55), (2, t300), …, (m, t2) indexes the rows of the matrix; hash table TF with entries (1, z100), (2, z250), …, (128, zl), …, (257, zr), …, (n, z6) indexes the columns.]
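The mapping can be sketched as a table from keyed tokens to matrix indices. In this sketch a Python dict stands in for the open-address hash table, HMAC-SHA256 stands in for the token function, and the free-slot policy is an assumption; none of these details are taken from the slides.

```python
import hashlib
import hmac
from typing import Dict, List


class SlotTable:
    """Maps keyed tokens to matrix indices (a dict stands in for the hash table)."""

    def __init__(self, key: bytes, capacity: int) -> None:
        self.key = key
        self.slots: Dict[bytes, int] = {}
        # Reversed so that pop() hands out indices 0, 1, 2, ... in order.
        self.free: List[int] = list(range(capacity - 1, -1, -1))

    def token(self, name: str) -> bytes:
        """Keyed token for a keyword or file identifier (hypothetical construction)."""
        return hmac.new(self.key, name.encode("utf-8"), hashlib.sha256).digest()

    def index_of(self, name: str) -> int:
        """Return the index bound to name, assigning a free one on first use."""
        t = self.token(name)
        if t not in self.slots:
            if not self.free:
                raise RuntimeError("table is full")
            self.slots[t] = self.free.pop()
        return self.slots[t]

    def release(self, name: str) -> None:
        """Unbind name so that its index can be reused by another name."""
        self.free.append(self.slots.pop(self.token(name)))


TW = SlotTable(b"hypothetical-keyword-key", capacity=4)
print(TW.index_of("email"), TW.index_of("EU-CMA"), TW.index_of("email"))  # 0 1 0
```

Because a row is bound to a keyword only through the table, a released row can later be linked to a different keyword.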
Our Scheme: Encrypt Searchable Representation (basics)
Derive a row key ri for each row i.
Encrypt each row i with ri (bit by bit, b=1, or with AES in CTR mode, b=128).
[Figure: matrix with columns 1, …, 128, 256, …, n and rows 1, …, m; row i is encrypted under its key, r1, …, rm.]
Achieving dynamic keywords:
- Static schemes derive keys from keywords.
- Break the static relation between keys and keywords.
- Derive keys from a row number and link it to any keyword via the hash table (HT).
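A minimal sketch of the b=1 case, assuming HMAC-SHA256 as a stand-in PRF (the implementation described later in the talk uses CMAC and AES-CTR instead). The function names and the input encoding are hypothetical, and the key renewal discussed on later slides is left out.

```python
import hashlib
import hmac
from typing import List


def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in PRF (assumption: HMAC-SHA256 rather than CMAC)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def row_key(master_key: bytes, i: int) -> bytes:
    """r_i depends only on the row number i, never on the keyword itself."""
    return prf(master_key, b"row|" + i.to_bytes(8, "big"))


def pad_bit(r_i: bytes, j: int) -> int:
    """One pseudorandom bit for cell (i, j), derived from the row key."""
    return prf(r_i, j.to_bytes(8, "big"))[0] & 1


def encrypt_row(r_i: bytes, row: List[int]) -> List[int]:
    """b=1: XOR every bit of the row with its pad bit (same call decrypts)."""
    return [bit ^ pad_bit(r_i, j) for j, bit in enumerate(row)]


master = b"hypothetical-master-key"
r0 = row_key(master, 0)
encrypted = encrypt_row(r0, [1, 0, 1, 1, 0])
print(encrypt_row(r0, encrypted))  # [1, 0, 1, 1, 0]
```

Since the key is tied to the row number, rebinding a row to another keyword in the hash table requires no change to the key derivation.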
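Our Scheme: Search on Encrypted Representation (only basics)
Search keyword w on I’ (client and cloud):
Decrypt the i’th row I’[i,*] with ri to obtain I[i,*].
If I[i,j]=1, then ciphertext cj contains tw.
[Figure: row i of I’ (columns 1, …, 128, …, n) is decrypted into row i of I (columns 1, …, 55, 253, 254, …, n); the matching ciphertexts c1, c55, c253, …, cn are selected.]
Decrypt with k4 and get f1, f55, …, fn.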
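The search step can be sketched as follows for b=1: only row i is decrypted, and the set bits give the matching columns. HMAC-SHA256 again stands in for the cipher, and the row keys are hypothetical byte strings; indices are 0-based.

```python
import hashlib
import hmac
from typing import List


def pad_bit(r_i: bytes, j: int) -> int:
    """One pseudorandom bit for column j under row key r_i (HMAC-SHA256 stand-in)."""
    return hmac.new(r_i, j.to_bytes(8, "big"), hashlib.sha256).digest()[0] & 1


def xor_row(r_i: bytes, row: List[int]) -> List[int]:
    """XOR each bit with its pad bit; the same call encrypts and decrypts."""
    return [bit ^ pad_bit(r_i, j) for j, bit in enumerate(row)]


def search(enc_matrix: List[List[int]], i: int, r_i: bytes) -> List[int]:
    """Decrypt only row i of I' and return the columns j with I[i][j] == 1."""
    row = xor_row(r_i, enc_matrix[i])
    return [j for j, bit in enumerate(row) if bit == 1]


# Toy run: 2 keywords x 5 files.
keys = [b"row-key-0", b"row-key-1"]
I = [[1, 0, 0, 1, 0],
     [0, 1, 0, 0, 0]]
I_enc = [xor_row(keys[i], I[i]) for i in range(2)]
print(search(I_enc, 0, keys[0]))  # [0, 3]
```

The returned column indices identify the encrypted files to fetch; the other rows of the matrix stay encrypted throughout.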
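Our Scheme: Update on Encrypted Representation (b=1)
Add a new file f to I’ (client and cloud):
Replace the j’th column of I’ with the new encrypted column.
[Figure: the client encrypts the new column with E(.) and the cloud writes it into column j of I’ (columns 1, …, j, …, n; rows 1, …, m).]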
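For b=1 the update is a pure column write, which can be sketched as below. HMAC-SHA256 stands in for the cipher and the function names are hypothetical. The sketch deliberately omits key renewal: reusing the same pad for two versions of a column would reveal which bits changed, so a real scheme has to refresh keys between updates.

```python
import hashlib
import hmac
from typing import List


def pad_bit(r_i: bytes, j: int) -> int:
    """One pseudorandom bit for cell (i, j) (HMAC-SHA256 stand-in)."""
    return hmac.new(r_i, j.to_bytes(8, "big"), hashlib.sha256).digest()[0] & 1


def encrypt_column(row_keys: List[bytes], j: int, column: List[int]) -> List[int]:
    """Client side: mask bit i of the new column with the pad of cell (i, j)."""
    return [bit ^ pad_bit(r_i, j) for r_i, bit in zip(row_keys, column)]


def replace_column(enc_matrix: List[List[int]], j: int, enc_column: List[int]) -> None:
    """Cloud side: overwrite column j of I' without decrypting anything."""
    for i, bit in enumerate(enc_column):
        enc_matrix[i][j] = bit


def decrypt_row(r_i: bytes, enc_row: List[int]) -> List[int]:
    """Helper used only to check the result of the update."""
    return [bit ^ pad_bit(r_i, j) for j, bit in enumerate(enc_row)]


keys = [b"r0", b"r1", b"r2"]
# Start from an encrypted all-zero 3 x 4 matrix.
I_enc = [[pad_bit(k, j) for j in range(4)] for k in keys]
# Add a file at column 2 that contains the keywords of rows 0 and 2.
replace_column(I_enc, 2, encrypt_column(keys, 2, [1, 0, 1]))
print([decrypt_row(k, row) for k, row in zip(keys, I_enc)])
# [[0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 1, 0]]
```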
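Our Scheme: Update on Encrypted Representation (b=128)
Add a new file f to I’ (client and cloud):
With b=128, writing only the j’th column overrides the other b-1 columns of the same block. Inconsistency!
[Figure: column j of I’ (columns 1, …, j, …, n; rows 1, …, m) is written with E(.); the neighboring b-1 entries in each row are marked "?".]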
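Our Scheme: Update on Encrypted Representation (b=128)
Add a new file f to I’ (client and cloud):
One round of interaction and key renewal: 1) D(B_j), renew keys; 2) E(B_j’)
[Figure: the block B_j that contains column j of I’ (columns 1, …, j, …, n; rows 1, …, m) is decrypted, updated, and re-encrypted as B_j’.]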
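The two steps can be sketched as a fetch, decrypt, modify, re-encrypt cycle over the block that contains column j. This is an illustration under assumptions, not the authors' code: HMAC-SHA256 produces the per-block pad instead of AES-CTR, a per-block counter models "renew keys", and the block size b is a parameter (limited to 256 bits by the digest length).

```python
import hashlib
import hmac
from typing import List


def block_pad(r_i: bytes, block: int, counter: int, b: int) -> List[int]:
    """b pad bits for one block of row i; the counter models key renewal."""
    msg = block.to_bytes(8, "big") + counter.to_bytes(8, "big")
    digest = hmac.new(r_i, msg, hashlib.sha256).digest()
    return [(digest[k // 8] >> (k % 8)) & 1 for k in range(b)]


def crypt_block(r_i: bytes, block: int, counter: int, bits: List[int]) -> List[int]:
    """XOR a block with its pad; the same call encrypts and decrypts."""
    pad = block_pad(r_i, block, counter, len(bits))
    return [x ^ p for x, p in zip(bits, pad)]


def update_column(enc: List[List[int]], row_keys: List[bytes], counters: List[int],
                  j: int, new_column: List[int], b: int) -> None:
    """Rewrite column j by re-encrypting the whole block B_j under renewed keys."""
    block, offset = divmod(j, b)
    start = block * b
    for i, r_i in enumerate(row_keys):
        fetched = enc[i][start:start + b]                          # fetch B_j
        plain = crypt_block(r_i, block, counters[block], fetched)  # 1) D(B_j)
        plain[offset] = new_column[i]                              # set the new bit
        enc[i][start:start + b] = crypt_block(                     # 2) E(B_j')
            r_i, block, counters[block] + 1, plain)
    counters[block] += 1                                           # renewed keys now current
```

Decrypting and re-encrypting the neighboring b-1 columns together with column j is what avoids the inconsistency of writing a single column into a block cipher output; blocks other than B_j are not touched.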
Search-Update Coordination for High Privacy
Various regions, various distinct keys!
[Figure: matrix I (rows 1, …, m; columns 1, …, j, …, n) split into regions encrypted under distinct keys K_1, K_2, K_3, K_4, K_5, …, K_x. The key K of a region depends on the number of searches on row i and the number of updates on column j. Examples: w=“email”, searched 100 times; w=“EU-CMA”, searched once; F_j, updated 100 times; F_n, updated 1000 times.]
[Figure: sequences of operations (search, update, expose / no expose, re-encrypt, key update, encrypt), tracked with gc and the state bits TW[i].st and TF[j].st.]
Security Analysis of Our DSSE (Very Brief)
Confidentiality focus (integrity/authentication can be added).
Access pattern: file identifiers that satisfy a search query (search results).
Search pattern: history of searches (whether a search token was used in the past).
IND-CKA2 (adaptive chosen keyword attacks): given {I’, c0,…,cn, z0,…,zn, t0,…,tm}, no adversary can learn any information about f0,…,fn and w0,…,wm other than the access and search patterns, even if queries are adaptive.
Leakage functions are critical for updates.
Theorem 1: Our DSSE scheme is (L1,L2)-secure in the random oracle model (ROM) in the sense of IND-CKA2, where L1 and L2 leak the access and search patterns, respectively. Real and simulated views are indistinguishable due to the PRF and the IND-CPA cipher.
High-Level Comparison
Implementation Details of Our DSSE
C/C++, own lines of code: 10528
Tomcrypt API:
- Symmetric key encryption: AES-CTR, 128-bit
- MAC: CMAC-128
- Key derivation function (KDF): CMAC-128
- File encryption: CCM (Counter with CBC-MAC)
Intel AES-NI sample library: AES implemented with assembly language instructions. As the KDF, we further exploit AES-ASM by using CMAC.
Hash tables: Google open-source static C++ data structure
Implementation (Benchmarking Results)
Average time (msec) per operation:
Operation | 1,000,000 keywords, 5,000 files | 200,000 keywords, 50,000 files | 2,000 keywords, 2,000,000 files
Build Index | 822.6 | 493 | 461
Search Keyword | 0.01 | 0.27 | 10.02
Add File | 2772 | 472 | 8.83
Delete File | 2362 | 329 | 8.77
Setup: Enron email dataset, Ubuntu 13.10, 4 GB RAM, Intel i5 processor, 256 GB hard disk.
All operations are practical.
Search takes under a msec in the first two settings, and only about 10 msec for 2 million files.
Updates range from roughly 9 msec to 2.8 sec.
Why is add file more expensive? Because AES is invoked with a separate key for each row, whereas search invokes it with the same key.
Conclusion
A new DSSE with various desirable properties:
(+) The highest level of privacy
(+) Simple yet efficient, compact updates and storage
(+) Keyword updates, parallelism, extendable to multiple-keyword queries
(-) Asymptotically linear search and client storage, but still quite practical on commodity hardware
TAKEAWAYS:
1) Simplicity wins!
2) Asymptotic results are not enough to assess practicality (actual implementation, details, hidden constants).
3) Practical storage at the client is NOT evil (actually beneficial).
91 submissions, 26 long papers, 3 short papers. 6:30 pm, Building 22, Purdy Crawford for Arts.
Thank You!