On the Trade-Offs in Oblivious Execution Techniques Shruti Tople and Prateek Saxena National University of Singapore
Motivation: Privacy-preserving computation is important for health data, financial data, and personal documents. Secure computation: FHE, PHE, garbled circuits. Middle ground: trusted computing (ARM TrustZone, AMD Memory Encryption, Intel SGX).
Problem: Beyond ensuring the security of each low-level operation, a challenge remains: input-oblivious execution, i.e., no leakage about the input via the execution profile or any side channel. We study this problem when the program runs within hardware-isolated enclaves and its input (I) and output (O) are encrypted.
Research Questions: What are the fundamental limitations in ensuring input-obliviousness? Do these limitations manifest in practical applications? What is the privacy vs. performance trade-off?
Contributions: What are the fundamental limitations in ensuring input-obliviousness? Composability (logic reuse) and leakage channel morphism. Do these limitations manifest in practical applications? Yes, based on CoreUtils applications. What is the privacy vs. performance trade-off? 2 applications incur exponential overhead.
Enclaved Execution Setting
Threat Model: The trusted program runs within an enclave on a secure processor, alongside an untrusted program and a compromised OS. The OS can invoke applications and enclaves. Filesystem accesses (reads and writes to encrypted storage) go through the OS syscall interface and filesystem management. Input and output are encrypted. Attacker's knowledge set: the encrypted input and output and their sizes, the enclave program logic, and the execution profile of the enclave.
Leakage Channels: The execution profile consists of the sequence of read/write calls, the size of data bytes, file access patterns, and time intervals. Goal: leak no additional information about the input beyond the attacker's knowledge set.
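One way to make the goal concrete is to model an execution profile as a trace of observable events and ask whether two runs produce the same trace. The sketch below is only an illustration: the `event` struct, its field names, and `same_profile` are assumptions introduced here, not part of the original work.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one observable event; all names are assumptions. */
typedef enum { EV_READ, EV_WRITE } ev_kind;

typedef struct {
    ev_kind kind;       /* read or write call */
    int     file_id;    /* which file was touched (file access pattern) */
    size_t  nbytes;     /* size of the data transferred */
    long    t_interval; /* time since the previous call, in abstract ticks */
} event;

/* Two runs look identical to the observer only if their traces match
   event by event on every channel. */
bool same_profile(const event *a, size_t na, const event *b, size_t nb)
{
    if (na != nb)
        return false;
    for (size_t i = 0; i < na; i++) {
        if (a[i].kind != b[i].kind || a[i].file_id != b[i].file_id ||
            a[i].nbytes != b[i].nbytes || a[i].t_interval != b[i].t_interval)
            return false;
    }
    return true;
}
```

Under this model, a program would be input-oblivious if `same_profile` held for every pair of inputs of the same size.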
Fundamental Limitations
Composability: Composition of leakage from several programs (logic-reuse attack).
1. $ nl input.txt: input.txt holds E(Hello); the output nl_out.txt holds E(1 Hello).
2. $ fold -w on nl_out.txt: the output fold_out.txt holds E(1) E( ) E(H) E(e) E(l) E(o).
3. $ split -l on fold_out.txt: the output is s1.txt, s2.txt, s3.txt, s4.txt, s5.txt, s6.txt, s7.txt.
4. $ comm s5.txt s6.txt: the output is E(l).
Comparing the single-character files shows which positions hold the same character. Leaks the frequency of characters!
Channel Morphism: Fixing one channel exacerbates another!
Original program:
  while ((line2 = getline(inbuffer)) != NULL) {
    if (linecompare(line1, line2) == true)
      match = true;
    else {
      if (match == true) {
        write(repeat_out, line1, 1, strlen(line1));
        match = false;
      } else
        write(uniq_out, line1, 1, strlen(line1));
      line1 = line2;
    }
  }
The timing of the write calls leaks the number of repeated and unique lines in the input file. To hide leakage from this channel, concatenate the lines and shift the write calls outside the while loop.
Transformed program:
  while (…) {
    …
    if (match == true) {
      strcat(r_buf, line1);
      match = false;
    } else
      strcat(u_buf, line1);
  }
  write(repeat_out, r_buf, 1, strlen(r_buf));
  write(uniq_out, u_buf, 1, strlen(u_buf));
Leakage morphs from time to size of data: the calls write a different amount of data to each file.
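To see the morphed channel in isolation, the following sketch tallies how many bytes the two final write calls of a buffered uniq-style program would carry. It is a simplified stand-in under stated assumptions: `buffered_uniq_sizes` is a hypothetical helper, the input is an in-memory array of lines rather than a file, and nothing is actually written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the transformed program: it performs no I/O and
   only tallies the sizes of the two buffers that would be written at the end. */
void buffered_uniq_sizes(const char *const *lines, size_t n,
                         size_t *repeat_bytes, size_t *uniq_bytes)
{
    *repeat_bytes = 0;
    *uniq_bytes = 0;
    size_t i = 0;
    while (i < n) {
        /* Find the end of the run of adjacent lines equal to lines[i]. */
        size_t j = i + 1;
        while (j < n && strcmp(lines[i], lines[j]) == 0)
            j++;
        if (j - i > 1)
            *repeat_bytes += strlen(lines[i]); /* line was repeated */
        else
            *uniq_bytes += strlen(lines[i]);   /* line occurred once */
        i = j;
    }
}
```

For example, the inputs {"aa", "aa", "bb", "cc"} and {"aa", "bb", "cc", "dd"} have the same number of lines and the same total size, yet the first yields 2 repeat bytes and 4 unique bytes while the second yields 0 and 8. The write sizes alone separate the two inputs.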
Exponential Overhead: Eliminating all side channels gives exponentially worse runtime.
Eliminating leakage in the 'split' program:
- Splits on newline ('\n')
- Writes each line to a file
- Leaks the number of lines in the input file
  n_read = safe_read(STDIN_FILENO, buf, bufsize);
  while (true) {
    bp = memchr(bp, '\n', eob - bp + 1);
    if (bp == eob)
      break;
    ++bp;
    if (++n >= n_lines) {
      cwrite(new_file_flag, bp_out, bp - bp_out);
      bp_out = bp;
      new_file_flag = true;
      n = 0;
    }
  }
For a 1 GB input file: best case, only 1 line; worst case, 2^30 lines. Thus, hiding leakage incurs an exponential blow-up in overhead.
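The gap between the best and worst case can be checked on small buffers. The sketch below counts how many per-line output files a split on '\n' would create; `count_split_files` is a hypothetical helper written for this illustration and is not the CoreUtils code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of output files when every line of the
   buffer goes to its own file (one file per line, split on '\n'). */
size_t count_split_files(const char *buf, size_t len)
{
    size_t files = 0;
    const char *p = buf;
    const char *end = buf + len;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        files++;          /* one output file for this line */
        if (nl == NULL)
            break;        /* last line has no trailing newline */
        p = nl + 1;
    }
    return files;
}
```

An 8-byte buffer holding a single line gives 1 file, while 8 newline bytes give 8 files. In general an N-byte input produces anywhere from 1 to N files, which for a 1 GB input is the 1 versus 2^30 range stated above.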
Defenses and Limitations
Deterministic: Ensure the same profile for all inputs of the same size. Techniques: pad with dummy data bytes, add fake instructions, linearly scan all possible files, impose an upper bound on loops. Limitations: worst-case performance; undecidability of static analysis.
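Two of the deterministic techniques, padding with dummy bytes and bounding loops, can be sketched as follows. Both functions are hypothetical helpers introduced for illustration. The sketch assumes the real length is carried inside the encrypted payload so the reader can strip the padding, and it makes no constant-time claim about the compiled code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `len` real bytes into `out` and fill the rest
   with dummy bytes, so the output is always exactly `bound` bytes long.
   Returns 0 on success, -1 if the data does not fit within the bound. */
int pad_to_bound(const char *data, size_t len, char *out, size_t bound)
{
    if (len > bound)
        return -1;
    memcpy(out, data, len);
    memset(out + len, 0, bound - len); /* dummy padding bytes */
    return 0;
}

/* Hypothetical helper: count newlines while always running exactly
   `bound` iterations, whatever the real length is. */
size_t count_newlines_bounded(const char *buf, size_t len, size_t bound)
{
    size_t count = 0;
    for (size_t i = 0; i < bound; i++) {
        char c = (i < len) ? buf[i] : 0; /* dummy value past the real data */
        count += (c == '\n');
    }
    return count;
}
```

Both helpers pay the cost of the bound on every input, which is the worst-case performance limitation listed on this slide.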
Randomization: Make execution profiles indistinguishable. Techniques: addition of noise, random padding, inserting intermittent fake calls. Limitations: requires knowledge of the input distribution; infeasible profiles; insufficient entropy.
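Random padding can be sketched as drawing a padded length from a range above the real length. `random_padded_len` is a hypothetical helper; it uses the C library's `rand()` purely for illustration, which is not a cryptographic source and has modulo bias.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: pick a padded length in [len, len + max_pad].
   rand() is for illustration only; it is not a cryptographic source. */
size_t random_padded_len(size_t len, size_t max_pad)
{
    size_t noise = (size_t)rand() % (max_pad + 1);
    return len + noise;
}
```

The sketch also hints at the entropy limitation: with `max_pad` of 32, real lengths 100 and 200 map to the disjoint ranges [100, 132] and [200, 232], so the padded sizes still tell the two inputs apart.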
Evaluation
Case Studies: To understand the “privacy vs. performance” trade-off in practice, we select CoreUtils: commonly used across platforms; 30 file-based input applications; each program executes within an enclave. Applications: paste, sort, shuf, ptx, expand, pr, unexpand, tac, grep, cut, join, sum, uniq, comm, fold, od, fmt, nl, cksum, md5sum, tr, head, tail, tsort, split, csplit, file, wc, cat, base64.
No Overhead: 6 applications are input-oblivious by default: sum, cksum, cat, base64, od and md5sum.
  while ((bytes_read = fread(buf, 1, BUFLEN, fp)) > 0) {
    unsigned char *cp = buf;
    length += bytes_read;
    while (bytes_read--)
      crc = (crc << 8) ^ crctab[((crc >> 24) ^ *cp++) & 0xFF];
    if (feof(fp))
      break;
  }
  printf("%u, %s", crc, file);  /* file: name of the input file */
Incur Performance Penalty: 11 applications are transformed with constant overhead, the other 11 applications with O(N) overhead, and 2 applications exhibit exponential overhead.
Takeaway: Enclaved execution does not directly enable privacy-preserving computation. Achieving input-obliviousness exhibits fundamental limitations in practical applications. Ensuring privacy incurs exponential overhead in the worst case.
Thanks! Email : shruti90@comp.nus.edu.sg