Securing Cloud-assisted Services


1 Securing Cloud-assisted Services
11:50 – 12:15 (25 mins) N. Asokan @nasokan

2 Services are moving to “the cloud”

3 Services are moving to “the cloud”
Example: cloud-based malware scanning service
Example: cloud storage

4 Cloud-based malware scanning service
Needs to learn about apps installed on client devices
Can therefore infer personal characteristics of users

5 Securing cloud storage
Client-side encryption of user data is desirable
But naïve client-side encryption conflicts with:
- the storage provider’s business requirement: deduplication ([LPA15] ACM CCS ’15)
- the end user’s usability requirement: multi-device access ([P+18] IEEE IC ’18, CeBIT ’16)

6 New privacy and security concerns arise
Example: cloud-based malware scanning service
Example: cloud storage
Naïve solutions conflict with other requirements: privacy, usability, deployability

7 CloSer project: the big picture
Cloud Security Services (CloSer), funded by the Academy of Finland and by Tekes
Academics collaborating with industry
Focus: security, usability, deployability/cost

8 The Circle Game: Scalable Private Membership Test Using Trusted Hardware
Sandeep Tamrakar¹, Jian Liu¹, Andrew Paverd¹, Jan-Erik Ekberg², Benny Pinkas³, N. Asokan¹
1. Aalto University, Finland  2. Huawei (work done while at Trustonic)  3. Bar-Ilan University, Israel

9 Malware checking
On-device checking:
- High communication and computation costs
- Database changes frequently
- Database is revealed to everyone
Cloud-based checking (the user submits h(APK) to the cloud-hosted malware DB):
- Minimal communication and computation costs
- Database can change frequently
- Database is not revealed to everyone
- But: user privacy at risk!

10 Private Membership Test (PMT)
The problem: how to preserve end-user privacy when querying cloud-hosted databases?
[Figure: the user sends query q to a lookup server holding the malware dictionary x1, x2, x3, …, xn]
The server must not learn the content of the client’s query q.
Current solutions (e.g., private set intersection, private information retrieval):
- Single server: expensive in computation and/or communication
- Multiple independent servers: unrealistic in a commercial setting
Can hardware-assisted trusted execution environments provide a practical solution?

11 Trusted Execution Environments are pervasive
Hardware support for:
- Isolated execution: Trusted Execution Environment (TEE)
- Protected storage: sealing
- Ability to report status to a remote verifier: remote attestation
Examples: cryptocards, Trusted Platform Modules, ARM TrustZone, Intel Software Guard Extensions
[Figure: TEE architecture — other software vs. trusted software, protected storage, root of trust]
[EKA14] “Untapped potential of trusted execution environments”, IEEE S&P Magazine, 12:04 (2014)

12 Background: Kinibi on ARM TrustZone
Kinibi*: Trusted OS from Trustonic (*running in the TEE on ARM TrustZone)
- Remote attestation: establish a trusted channel
- Private (on-chip) memory: confidentiality, integrity, and obliviousness against an adversary who can observe shared memory
[Figure: client app and rich OS run in the untrusted Rich Execution Environment (REE); trusted app and trusted OS (Kinibi) run in the Trusted Execution Environment (TEE); the two sides communicate via shared memory on top of the TrustZone hardware extensions]

13 Background: Intel SGX
CPU-enforced TEE (enclave) with remote attestation
Secure memory: confidentiality and integrity; obliviousness only within 4 KB page granularity
[Figure: a user process contains an enclave whose code and data live in the Enclave Page Cache (encrypted and integrity-protected), alongside untrusted app code and data in system memory (REE); the adversary can observe the untrusted side]

14 System model
[Figure: the user sends query q = h(APK) over a secure channel (established with remote attestation) to the lookup server; an untrusted application in the REE passes the query buffer to a trusted application in the TEE, which holds the dictionary X = x1, x2, …, xn obtained from the dictionary provider, computes r ← (q == xi), and returns the response r via the response buffer]
Information leak: memory access patterns

15 Path ORAM (Stefanov et al., ACM CCS 2013)
[Figure: blocks b0…b14 are stored on the untrusted side in a binary tree of paths P0…P7; the TEE keeps a position map (e.g., b0→P3, b1→P0, b4→P7, …) and a stash. To answer “q ∈ b4?”, the TEE runs locate_block(q) = b4, reads the whole path containing b4, and remaps the block to a fresh random path.]
O(log(n)) computational and constant communication overhead per query
Not amenable to simultaneous queries: m queries cost O(m·log(n))

16 Android app landscape
On average a user installs 95 apps (Yahoo Aviate study)
Unique new Android malware samples in 2015 and 2018 (source: G Data)
Current dictionary size < 2^24 entries
Even a comparatively “high” FPR (e.g., ~2^-10) may have negligible impact on privacy (see the sizing sketch below)
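To see what these parameters imply, here is a back-of-the-envelope sizing sketch using the textbook Bloom filter formulas; the concrete figures are illustrative, not taken from the paper:

```python
# Bloom filter sizing for the PMT dictionary: n = 2^24 entries at a
# target false-positive rate p = 2^-10, using the standard formulas
#   m = -n * ln(p) / (ln 2)^2   (filter size in bits)
#   k = (m / n) * ln 2          (number of hash functions)
import math

n = 2**24          # dictionary entries
p = 2**-10         # target false-positive rate

m = -n * math.log(p) / math.log(2) ** 2
k = m / n * math.log(2)

print(f"filter size: {m / 8 / 2**20:.1f} MiB, hash functions: {k:.0f}")
# -> roughly 29 MiB and k = 10: small enough to cycle through a TEE
```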

17 Cloud-scale PMT
Verify Apps: cloud-based service to check for harmful Android apps prior to installation
“… over 1 billion devices protected by Google’s security services, and over 400 million device security scans were conducted per day” (Android Security 2015 Year in Review)
“2 billion+ Android devices checked per day” (cf. < 17 million malware samples)

18 Requirements
Query privacy: the adversary cannot learn/infer query or response content; the user can always choose to reveal query content
Accuracy: no false negatives; however, some false positives are tolerable (i.e., a non-zero false-positive rate)
Response latency: respond quickly to each query
Server scalability: maximize overall throughput (queries per second)

19 Requirements revisited
Query privacy: the adversary cannot learn/infer query or response content; the user can always choose to reveal queries
Accuracy: no false negatives; however, some false positives are tolerable (i.e., a non-zero false-positive rate)
Response latency: respond quickly to each query
Server scalability: maximize overall throughput (queries per second)
Target parameters*: FPR = 2^-10; latency ≈ 1 s; dictionary size = 2^26 entries (~67 million entries)
* parameters suggested by a major anti-malware vendor

20 Carousel design pattern
[Figure: the dictionary X = x1, x2, …, xn from the dictionary provider cycles continuously (“carousels”) through the TEE; the user sends query q = h(APK) to the lookup server, whose untrusted application places it in the query buffer; the trusted application computes r = (q ∈ X) and returns the response via the response buffer]

21 Carousel caveats
Caveat 1: the adversary can measure dictionary processing time → spend equal time processing each dictionary entry
Caveat 2: the adversary can measure query-response time → only respond after one full carousel cycle
Both impact response latency (recall the requirements); therefore, aim to minimize carousel cycle time
The carousel approach has two main caveats. First, the adversary can measure the time taken to process each entry in the dictionary, which means that we must spend equal time processing every entry, whether or not it matches one of the queries. Second, the adversary can measure the time between when a query arrives and when the response is sent; if he knows that only part of the dictionary was processed during that time, he learns something about the query. So we can only respond to a query after it has been in the TEE for a full carousel cycle, irrespective of where its match lies in the dictionary. The challenge is that both of these affect the time taken to respond to individual queries (i.e., the response latency). Therefore, one of the sub-problems we had to solve was: how to satisfy these privacy constraints while minimizing carousel cycle time? A constant-time matching sketch follows below.
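As a concrete illustration of these two fixes, here is a minimal Python sketch of constant-time per-entry matching; this is my illustration under the assumption of fixed-width hash entries, not the paper’s trusted application, which is native code inside the TEE:

```python
# Constant-time per-entry processing: every dictionary entry costs the
# same amount of work whether or not it matches the query, and the
# result is released only after a full carousel cycle.
def constant_time_match(entry: bytes, query: bytes) -> int:
    """Return 1 iff entry == query, touching every byte either way."""
    assert len(entry) == len(query)      # fixed-width entries assumed
    diff = 0
    for a, b in zip(entry, query):
        diff |= a ^ b                    # accumulate differences, no early exit
    return int(diff == 0)

def carousel_cycle(dictionary, query: bytes) -> int:
    found = 0
    for entry in dictionary:             # one full pass over the dictionary
        found |= constant_time_match(entry, query)
    return found                         # respond only after the full cycle
```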

22 How to minimize carousel cycle time?
Represent the dictionary using an efficient data structure
Various existing data structures support membership tests: Bloom filter, Cuckoo hash, …
Experimental evaluation required for the carousel approach (see the toy Bloom filter sketch below)
Our solution to minimizing carousel cycle time (and thus response latency) was to represent the dictionary using an efficient data structure. Intuitively this has several advantages: the dictionary representation can be shorter than the dictionary itself and thus take less time to complete a full cycle, and the representation can be pre-sorted, which decreases processing time per element. We already know of various data structures that support efficient membership tests (e.g., Bloom filter, hash tables, etc.), but the question is: which one performs best in the carousel approach, where the lookup server only has access to a fraction of the data structure at any given time? To answer this, we selected a handful of the most promising data structures, implemented them in the carousel approach, and experimentally evaluated their performance.
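For reference, a toy Bloom filter membership test of the kind evaluated here might look as follows; the hashing scheme and parameters are my assumptions, not the paper’s implementation:

```python
# Toy Bloom filter: no false negatives, tunable false-positive rate.
# The k bit positions are derived via double hashing from one SHA-256 digest.
import hashlib

class BloomFilter:
    def __init__(self, m_bits: int, k_hashes: int):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes):
        d = hashlib.sha256(item).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1   # make h2 odd
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))

bf = BloomFilter(m_bits=242_000_000, k_hashes=10)  # ~29 MiB, per the sizing above
bf.add(b"example-apk-hash")
assert b"example-apk-hash" in bf
```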

23 Carousel design pattern
[Figure: the dictionary provider encodes the dictionary X = x1, x2, …, xn into a representation Y = y1, y2, …, ym, which cycles through the TEE; the user’s query q = h(APK) is likewise encoded into a query representation; the trusted application computes r = (q ∈ Y) and returns the response over a secure channel with remote attestation]

24 Experimental evaluation
Kinibi on ARM TrustZone:
- Samsung Exynos 5250 (Arndale board), 1.7 GHz dual-core ARM Cortex-A15
- Android 4.2.1; ARM GCC compiler and Kinibi libraries
- Maximum TA private memory: 1 MB; maximum shared memory: 1 MB
Intel SGX:
- HP EliteDesk 800 G2 desktop, 3.2 GHz Intel Core i CPU, 8 GB RAM
- Windows 7 (64-bit), 4 KB page size
- Microsoft C/C++ compiler; Intel SGX SDK for Windows
Note: different CPU speeds and architectures

25 Performance: batch queries
[Figure: batch-query throughput for Kinibi on ARM TrustZone and for Intel SGX]

26 Performance: steady state
[Figure: steady-state throughput and breakdown points for Kinibi on ARM TrustZone and for Intel SGX]
Beyond the breakdown point, query response latency increases over time

27 Other applications of PMT
Discovery of leaked passwords ([KLSAP17] PETS 2017)
Private contact discovery in messaging apps (Signal private contact discovery, Sep 2017)
…

28 Carousel approach: pros and cons
+ Better scalability (for malware checking)
+ Possible other application scenarios
- Membership test may not be enough
- Hardware security guarantees may fail

29 Oblivious Neural Network Predictions via MiniONN Transformations
N. Asokan @nasokan (joint work with Jian Liu, Mika Juuti, Yao Lu)
Thanks for the introduction. I am glad to present our work on oblivious neural network predictions.

30 Cloud-assisted malware lookup: recap
[Figure: recap — the user sends query q to the cloud-hosted malware DB and receives a response]

31 Machine learning as a service (MLaaS)
[Figure: the client sends input to the MLaaS server, which returns predictions]
Nowadays, machine learning as a service is a popular paradigm for providing online predictions: the server trains a model, receives inputs from clients, runs the model on each input, and returns the predictions. Take Google Translate as an example: the server uses a model to translate the sentences submitted by clients. Clients therefore have to reveal their inputs and predictions to the server. In some scenarios, such as online medical diagnosis, the input data can be very sensitive, and a malicious server can misuse it — for example, instead of using the input for diagnosis, it can profile users or sell the data.
→ violation of clients’ privacy

32 Running predictions on client-side
[Figure: the model is shipped to and run on the client]
Risks: model theft, evasion, model inversion
But this may not always be acceptable, because the model can be a business asset of the server. A malicious client can steal the model and use it to make money, or adaptively probe the model by generating so-called adversarial examples to evade detection — unacceptable if the model is used for something like malware detection. More seriously, the client can even invert the model and recover training data. How do we solve this problem? What we would like is an approach that allows online prediction without either the server or the client having to disclose their respective sensitive data. In this talk, I describe how to achieve this for deep neural networks, currently the most popular machine learning technique.

33 Oblivious Neural Networks (ONN)
Given a neural network, is it possible to make it oblivious?
- The server learns nothing about the clients’ input
- Clients learn nothing about the model
To this end, we ask the question: given any existing neural network model, is it possible to make it oblivious? What I mean by oblivious is that during the prediction, the server learns nothing about the input and the client learns nothing about the model.

34 Example: CryptoNets
[Figure: the client sends FHE-encrypted input; the server returns FHE-encrypted predictions] (FHE: fully homomorphic encryption)
+ High throughput for batch queries from the same client
- High overhead for single queries: 297.5 s and 372 MB (MNIST dataset)
- Cannot support: high-degree polynomials, comparisons, …
[GDLLNW16] CryptoNets, ICML 2016
We are not the first to consider this problem. Last year, Microsoft Research proposed a solution called CryptoNets: clients encrypt their input using fully homomorphic encryption, and the server runs its model on the encrypted data and returns encrypted predictions. It provides high throughput for batch queries from the same client, but it has two problems. First, high overhead: on the MNIST handwritten-digit dataset, it takes about 5 minutes to identify one character; for comparison, without privacy it takes less than 1 millisecond. Second, it can only support limited neural networks, because homomorphic encryption cannot deal with the comparison operations and high-degree polynomials that are common in neural networks; in their experiments they used a model with a square activation function, which is not typically used.

35 MiniONN: Overview
[Figure: the client sends blinded input, runs oblivious protocols with the server, and receives blinded predictions]
Low overhead: ~1 s per single prediction
Supports all common neural networks
We present a solution called MiniONN, which stands for minimizing the overhead of oblivious neural networks. We minimize overhead by using only lightweight primitives such as additively homomorphic encryption and secure two-party computation. As a result, we achieve low overhead — around 1 s for a single prediction — and support all operations commonly used in neural networks.

36 Example
(Skip to performance)
All operations are in a finite field
The example network has three layers: layers representing linear transformations (multiplication by weight matrices and addition of bias vectors) alternate with layers representing non-linear transformations, called activation layers or pooling layers. Note that the input x and output z are sensitive to the client, whereas the weights W and biases b are sensitive to the server. In order to transform arbitrary neural networks, we need to be able to transform linear transformations, activation functions, and pooling operations — which is what the next few slides do. For now, assume all operations are in a finite field; later I show how to deal with floating-point numbers (a toy encoding sketch follows).
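To make the finite-field assumption concrete, here is a minimal fixed-point encoding sketch; it is entirely my illustration — the modulus, fractional precision, and truncation rule are assumptions, not MiniONN’s actual parameters:

```python
# Toy fixed-point encoding of reals into Z_p: scale by 2^F, reduce mod p,
# and map the upper half of the field to negative values on decode.
P = 2**61 - 1    # assumed prime modulus (hypothetical choice)
F = 16           # fractional bits

def encode(x: float) -> int:
    return round(x * (1 << F)) % P

def decode(a: int) -> float:
    if a > P // 2:           # upper half of the field = negative numbers
        a -= P
    return a / (1 << F)

def mul(a: int, b: int) -> int:
    """Multiply two encodings; rescale by 2^F to keep F fractional bits."""
    sa = a - P if a > P // 2 else a
    sb = b - P if b > P // 2 else b
    return (sa * sb >> F) % P

assert decode((encode(1.5) + encode(-0.25)) % P) == 1.25  # addition is native
assert decode(mul(encode(1.5), encode(2.0))) == 3.0
```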

37 Core idea: use secret sharing for oblivious computation
(Skip to performance)
Core idea: use secret sharing for oblivious computation. The client and server hold shares xc and xs such that xc + xs = x.
That is, at the beginning of each layer, each party holds a share which is indistinguishable from random to that party, and which, added to the other share, recovers the original input. They do the same for all layers and combine the shares at the end. This guarantees that if we can securely evaluate each layer, the combination of all layers is secure as well. Next, I show how to do this for each layer — linear transformations, activation functions, and pooling operations.
Use efficient cryptographic primitives (2PC, additively homomorphic encryption)

38 Secret sharing initial input
(Skip to performance)
To share the initial input x, the client randomly generates its own share xc and calculates the server’s share as xs = x − xc. Note that xc is independent of x, so it can be pre-chosen. (A small sketch follows below.)
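A minimal sketch of this sharing step over Z_p; the modulus is an assumed toy choice:

```python
# Additive secret sharing: x = (x_c + x_s) mod p. The client's share x_c
# is uniform and independent of x, so it can be generated in advance.
import secrets

P = 2**61 - 1   # assumed toy modulus

def share(x: int) -> tuple[int, int]:
    x_c = secrets.randbelow(P)   # client's share, pre-choosable
    x_s = (x - x_c) % P          # server's share, computed once x is known
    return x_c, x_s

x = 42
x_c, x_s = share(x)
assert (x_c + x_s) % P == x      # the shares recombine to the secret
```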

39 Oblivious linear transformation
(Skip to performance)
For the oblivious linear-transformation layer, we start from the basic matrix multiplication y = W·x + b, expand x = xc + xs, and rearrange (see the equation below). The key observation is that W·xs + b can be computed locally by the server. The tricky part is the dot-product W·xc: W is known only to the server and xc only to the client, so they need to calculate it obliviously.
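Written out, with notation as on the slides, the rearrangement is:

```latex
y = Wx + b = W(x^c + x^s) + b
  = \underbrace{W x^s + b}_{\text{computed locally by the server}}
  + \underbrace{W x^c}_{\text{oblivious dot-product}}
```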

40 Oblivious linear transformation: dot-product
(Skip to performance)
Oblivious dot-product via homomorphic encryption with SIMD. The server decrypts the ciphertexts and adds together the plaintexts corresponding to each row; call the result the vector u. Similarly, the client adds the r values corresponding to each row; call the result the vector v. Notice that when you add the vectors u and v, the randomness cancels out: the result is exactly W·xc, i.e., u + v = W·xc. Also notice that u, v, and W·xc are independent of x, so the triple ⟨u, v, xc⟩ can be generated and stored in a precomputation phase. We use the SIMD (single instruction, multiple data) technique to further improve performance: since a ciphertext is much larger than a plaintext, we batch multiple elements into one ciphertext. (The invariant is checked in the small simulation below.)
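The invariant u + v = W·xc can be checked with a small simulation in the clear; the real protocol computes the server’s row sums under additively homomorphic encryption, and the dimensions here are toy assumptions:

```python
# Simulated precomputed triple <u, v, x_c> with u + v = W·x_c (mod p).
# The blinding values r hide each term W[i][j]*x_c[j] from the server;
# the blinding cancels when u and v are added.
import secrets
P = 2**61 - 1

n, m = 3, 2                                                         # toy sizes
W   = [[secrets.randbelow(P) for _ in range(n)] for _ in range(m)]  # server's
x_c = [secrets.randbelow(P) for _ in range(n)]                      # client's
r   = [[secrets.randbelow(P) for _ in range(n)] for _ in range(m)]  # client's blinding

# u[i]: what the server obtains after decrypting and summing row i
u = [sum(W[i][j] * x_c[j] + r[i][j] for j in range(n)) % P for i in range(m)]
# v[i]: the client's matching sum of blinding values, negated
v = [-sum(r[i][j] for j in range(n)) % P for i in range(m)]

Wx_c = [sum(W[i][j] * x_c[j] for j in range(n)) % P for i in range(m)]
assert all((u[i] + v[i]) % P == Wx_c[i] for i in range(m))
```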

41 Oblivious linear transformation
(Skip to performance)
Now the tricky part can simply be calculated as u + v. The server then outputs its portion (its locally computed W·xs + b plus u) to the next layer, and the client outputs its portion (v).

42 Oblivious linear transformation
(Skip to performance)
We rewrite these two portions as ys and yc. We have now seen how to convert linear transformations into oblivious form.

43 Recall: use secret sharing for oblivious computation
(Skip to performance)
Recall: the client and server hold shares that, added together, recover each layer’s input; if we can securely evaluate each layer, the combination of all layers is secure as well. Linear transformations are done; next come activation functions and pooling operations.

44 Oblivious activation/pooling functions
(Skip to performance)
Piecewise-linear functions, e.g., ReLU: y = max(0, x)
Oblivious ReLU: easily computed obliviously by a garbled circuit
Next, let’s see how to do this for activation functions and pooling operations, which are typically non-linear. We classify all such functions into two categories.

45 Oblivious activation/pooling functions
(Skip to performance)
Smooth functions, e.g., sigmoid: y = 1/(1 + e^(−x))
Oblivious sigmoid: approximate by a piecewise-linear function, then compute obliviously by a garbled circuit; empirically, ~14 segments are sufficient
A smooth function can be approximated by high-degree polynomials using a Taylor series, but that involves a large performance overhead and accuracy loss. Instead, we approximate it by a piecewise-linear function. We first divide it into multiple segments: given an input, we find the segment it belongs to, calculate the corresponding garbled circuit, and multiply by one; for the other segments we also calculate the garbled circuit but multiply by zero. Then we add together the results for all segments. (A plain-arithmetic sketch of the approximation follows below.)
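To illustrate the approximation itself — in the clear, without the garbled-circuit machinery, and with segment placement on [−7, 7] as my own assumption — a ~14-segment piecewise-linear sigmoid can be sketched as:

```python
# Piecewise-linear sigmoid with 14 unit-width segments on [-7, 7],
# clamped to 0 and 1 outside that range.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

KNOTS = list(range(-7, 8))              # 15 knots -> 14 segments
VALS  = [sigmoid(k) for k in KNOTS]     # exact sigmoid values at the knots

def sigmoid_pl(x: float) -> float:
    if x <= KNOTS[0]:
        return 0.0
    if x >= KNOTS[-1]:
        return 1.0
    i = int(x - KNOTS[0])               # index of the containing segment
    t = x - KNOTS[i]                    # position within the segment
    return VALS[i] + t * (VALS[i + 1] - VALS[i])  # linear interpolation

# The maximum absolute error over a dense grid stays close to 1%:
err = max(abs(sigmoid_pl(i / 100) - sigmoid(i / 100)) for i in range(-1000, 1001))
print(f"max abs error on [-10, 10]: {err:.4f}")
```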

46 Combining the final result
The client and server can jointly calculate max(y1, y2) (to minimize information leakage).

47 Recall: use secret sharing for oblivious computation
Now, we have seen how to compute each layer obliviously

48 Performance (for single queries)
Model         | Latency (s)  | Msg sizes (MB) | Loss of accuracy
MNIST/Square  | 0.4 (+0.88)  | 44 (+3.6)      | none
CIFAR-10/ReLU | 472 (+72)    | 6226 (+3046)   | none
PTB/Sigmoid   | 4.39 (+13.9) | 474 (+86.7)    | less than 0.5% (cross-entropy loss)
(Pre-computation phase timings in parentheses; PTB = Penn Treebank)
We first do a head-to-head comparison with CryptoNets: we transform the same model, trained on the MNIST dataset with square as the activation function. It takes about 1.3 seconds for a single prediction, roughly 300 times faster than CryptoNets — or about 700 times faster if precomputation time is excluded. We also want to show that our method applies to more challenging datasets: we trained a more complex, 20-layer model on CIFAR-10 with ReLU activations. Even though the cost is high, as far as we know this is the first attempt at transforming such a complex network. Finally, we trained a model on the Penn Treebank (PTB) dataset, a standard language-modeling dataset where the goal is to predict the next word in a sentence given the previous words, using sigmoid activations. During prediction we replace sigmoid with our approximated version, and the cross-entropy loss increases by only 0.5%.

49 MiniONN pros and cons
(Skip to end)
+ Hundreds of times faster than CryptoNets (see performance above)
+ Can transform any given neural network to its oblivious variant
- Still ~1000x slower than without privacy
- Server can no longer filter requests or do sophisticated metering
- Assumes online connectivity to the server
- Reveals the structure (but not the parameters) of the NN

50 Using a client-side TEE to vet input
[Figure: 1. Attest the client’s TEE app; 2. Provision filtering policy; 3. Input enters the client’s TEE; 4. Input + “Input/Metering Certificate” released; 5. MiniONN protocol + “Input/Metering Certificate” run with the server]
MiniONN + policy filtering + advanced metering

51 Using a client-side TEE to run the model
[Figure: 1. Attest the client’s TEE app; 2. Provision model configuration and filtering policy; 3. Input enters the client’s TEE; 4. Predictions + “Metering Certificate” returned; 5. “Metering Certificate” sent to the server]
MiniONN + policy filtering + advanced metering + disconnected operation + performance + better privacy; − harder to reason about model secrecy

52 Using a server-side TEE to run the model
[Figure: 1. Attest the server’s TEE app; 2. Input; 3. Provision model configuration and filtering policy; 4. Prediction]
MiniONN + policy filtering + advanced metering; − disconnected operation; + performance + better model secrecy; − harder to reason about client input privacy

53 ML is very fragile in adversarial settings
MiniONN: efficiently transform any given neural network into oblivious form with no/negligible accuracy loss. Try at:
Trusted Computing can help realize improved security and privacy for ML
ML is very fragile in adversarial settings

54 Conclusions Cloud-assisted services raise new security/privacy concerns But naïve solutions may conflict with privacy, usability, deployability, … Cloud-assisted malware scanning Carousel approach is promising Generalization to privacy-preserving ML predictions [TLPEPA17] Circle Game, ASIACCS 2017 [LJLA17] MiniONN, ACM CCS 2017

