Maria Méndez Real, Vincent Migliore, Vianney Lapotre, Guy Gogniat

Slides:



Advertisements
Similar presentations
Secure Virtual Machine Execution Under an Untrusted Management OS Chunxiao Li Anand Raghunathan Niraj K. Jha.
Advertisements

Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures Pree Thiengburanathum Advanced computer architecture Oct 24,
1 Hardware Support for Isolation Krste Asanovic U.C. Berkeley MURI “DHOSA” Site Visit April 28, 2011.
Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors Abhishek Bhattacharjee Margaret Martonosi.
Secure In-VM Monitoring Using Hardware Virtualization Monirul Sharif, Wenke Lee, Weidong Cui, and Andrea Lanzi Presented by Tyler Bletsch.
Virtualization and Cloud Computing. Definition Virtualization is the ability to run multiple operating systems on a single physical system and share the.
Myoungsoo Jung (UT Dallas) Mahmut Kandemir (PSU)
LAB-STICC CNRS UMR 3192 – UBS – ROMAIN VASLIN – CRYPTARCHI 2008 Memory Security Management for FPGA-based Embedded system Romain Vaslin, Guy Gogniat, Jean-Philippe.
Better than the Two: Exceeding Private and Shared Caches via Two-Dimensional Page Coloring Lei Jin and Sangyeun Cho Dept. of Computer Science University.
1 Virtual Private Caches ISCA’07 Kyle J. Nesbit, James Laudon, James E. Smith Presenter: Yan Li.
1 Introduction to Load Balancing: l Definition of Distributed systems. Collection of independent loosely coupled computing resources. l Load Balancing.
Project supported by YESS 2009 Young Engineering Scientist Symposium « Identity Management » Cryptography for the Security of Embedded Systems Ambient.
Copyright 2013, Toshiba Corporation. DAC2013 Designer/User Track Scalability Achievement by Low-Overhead, Transparent Threads on an Embedded Many-Core.
TitleEfficient Timing Channel Protection for On-Chip Networks Yao Wang and G. Edward Suh Cornell University.
COLLABORATIVE EXECUTION ENVIRONMENT FOR HETEROGENEOUS PARALLEL SYSTEMS Aleksandar Ili´c, Leonel Sousa 2010 IEEE International Symposium on Parallel & Distributed.
Yongzhi Wang, Jinpeng Wei VIAF: Verification-based Integrity Assurance Framework for MapReduce.
Improving Network I/O Virtualization for Cloud Computing.
StimulusCache: Boosting Performance of Chip Multiprocessors with Excess Cache Hyunjin Lee Sangyeun Cho Bruce R. Childers Dept. of Computer Science University.
Architectural Support for Fine-Grained Parallelism on Multi-core Architectures Sanjeev Kumar, Corporate Technology Group, Intel Corporation Christopher.
Virtualization: Not Just For Servers Hollis Blanchard PowerPC kernel hacker.
TILEmpower-Gx36 - Architecture overview & performance benchmarks – Presented by Younghyun Jo 2013/12/18.
m-Privacy for Collaborative Data Publishing
Course Wrap-Up Miodrag Bolic CEG4136. What was covered Interconnection network topologies and performance Shared-memory architectures Message passing.
A Survey of Distributed Task Schedulers Kei Takahashi (M1)
Many-SC Project Runtime Environment (RTE) CSAP Lab 2014/10/28.
A.SATHEESH Department of Software Engineering Periyar Maniammai University Tamil Nadu.
Parallelization of Classification Algorithms For Medical Imaging on a Cluster Computing System 指導教授 : 梁廷宇 老師 系所 : 碩光通一甲 姓名 : 吳秉謙 學號 :
Milestones, Feedback, Action Items Power Aware Distributed Systems Kickoff August 23, 2000.
R ECONFIGURABLE SECURITY SUPPORT FOR EMBEDDED SYSTEMS 1 AKSHATA VARDHARAJ.
An Integrated Framework for Dependable and Revivable Architecture Using Multicore Processors Weidong ShiMotorola Labs Hsien-Hsin “Sean” LeeGeorgia Tech.
Mixed Criticality Systems: Beyond Transient Faults Abhilash Thekkilakattil, Alan Burns, Radu Dobrin and Sasikumar Punnekkat.
m-Privacy for Collaborative Data Publishing
Task Mapping and Partition Allocation for Mixed-Criticality Real-Time Systems Domițian Tămaș-Selicean and Paul Pop Technical University of Denmark.
Euro-Par, HASTE: An Adaptive Middleware for Supporting Time-Critical Event Handling in Distributed Environments ICAC 2008 Conference June 2 nd,
Introduction to Performance Tuning Chia-heng Tu PAS Lab Summer Workshop 2009 June 30,
Optimization of Time-Partitions for Mixed-Criticality Real-Time Distributed Embedded Systems Domițian Tămaș-Selicean and Paul Pop Technical University.
MPSoCSim extension: An OVP Simulator for the Evaluation of Cluster-based Multi and Many-core architectures Philipp Wehner, Jens Rettkowski and Diana Göhringer.
Covert Channels Through Branch Predictors: a Feasibility Study
vCAT: Dynamic Cache Management using CAT Virtualization
University of Maryland College Park
Development of an Embedded Platform for Secure CPS Services
Architecture and Algorithms for an IEEE 802
CPU SCHEDULING.
Introduction to Load Balancing:
Virtualization Virtualization is the creation of substitutes for real resources – abstraction of real resources Users/Applications are typically unaware.
The Multikernel: A New OS Architecture for Scalable Multicore Systems
New Cache Designs for Thwarting Cache-based Side Channel Attacks
Architecture Background
Hardware Support for Embedded Operating System Security
Bruhadeshwar Meltdown Bruhadeshwar
Chapter 6: CPU Scheduling
Virtualization Virtualization is the creation of substitutes for real resources – abstraction of real resources Users/Applications are typically unaware.
Model-Driven Analysis Frameworks for Embedded Systems
Department of Computer Science University of California, Santa Barbara
Module 5: CPU Scheduling
3: CPU Scheduling Basic Concepts Scheduling Criteria
Chapter5: CPU Scheduling
Chapter 6: CPU Scheduling
Multiprocessor and Real-Time Scheduling
Operating System Introduction.
Shielding applications from an untrusted cloud with Haven
Adaptive Data Refinement for Parallel Dynamic Programming Applications
Operating System , Fall 2000 EA101 W 9:00-10:00 F 9:00-11:00
Chapter 6: CPU Scheduling
Module 5: CPU Scheduling
Chapter 6: CPU Scheduling
A Virtual Machine Monitor for Utilizing Non-dedicated Clusters
University of Illinois at Urbana-Champaign
Module 5: CPU Scheduling
EdgeWise: A Better Stream Processing Engine for the Edge
Presentation transcript:

Dynamic Spatially Isolated Secure Zones for NoC-based Many-core Accelerators Maria Méndez Real, Vincent Migliore, Vianney Lapotre, Guy Gogniat Lab-STICC Université de Bretagne-Sud Lorient FRANCE Philipp Wehner, Diana Göhringer Ruhr-University Bochum, GERMANY

Context: NoC-based many-core accelerator V: Victim A: Attacker A controller is in charge of the deployment of the applications on the accelerator Trusted and untrusted processes executing in parallel sharing resources -> cache-based Side-Channel Attacks (SCA)

Context: Cache-based Side-Channel Attacks Cache-based Side-Channel Attacks due to cache sharing Principle: Caches are seen as leakage channels The attacker behaves as a normal process Analyzes its own activity Determine cache lines or sets accessed by the victim Deduce sensitive information Various implementations AES, RSA, ECC Different architectures (Intel, AMD, ARM) [2] [2] “Y. Yarom, et al., “FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack”, in the 23th USENIX Security Simposium, 2014.

Countermeasures against access-driven attacks Application specific Software countermeasures: Changing the implementation of cryptographic algorithms [3] Hardware countermeasures: Disabling cacheability Flushing the cache after each context switch [4] Changing the cache design -> Partitioned cache [5] Two separate virtual worlds on the same processor [6] Too expensive -> not doable Solution at the processor level only [3] J. Blomer and V. Krummel, “Analysis of Countermeasures Against Access Driven Cache Attacks on AES,” Selected Areas in Cryptography,vol. 4876, pp. 96–109, 2007. [4] Guanciale, et al., “Cache Storage Channels: Alias-Driven Attacks and Verified Countermeasures,” in IEEE Symposium on Security and Privacy, 2016. [5] Wang and R. B. Lee, “New Cache Designs for Thwarting Software Cache-based Side Channel Attacks,” in IEEE Symposium on Computer Architecture (ISCA), 2007, pp. 494–505. [6] www.arm.com/products/processors/technologies/trustzone/

Spatial isolation for sensitive applications The NoC is secured How can this be achieved? Expected under utilization of resources, how can the performance overhead be evaluated?

Implementation A dedicated processor behaves as a controller of the platform Evaluation of different deployment strategies for the controller Scheduling Monitoring of the platform state Resource allocation algorithms Secure zones creation strategies SW controller is trusted [7] M.Méndez Real et al., “ALMOS many-core operating system extension with new secure-enable mechanisms for dynamic creation of secure zones,” in Proc. of Euromicro international Conference on Parallel, Distributed and Network-Based Processing (PDP), 2016.

Secure-enable mechanisms Scheduling: Round Robin Monitoring: Quaternary decision tree structure Task mapping: Leverage the data accesses locality C 1 2 C 2 1 c0 c1 c2 c3 M : Memory utilization rate U : Processor utilization S : Secure zone dedicated M U S c0 c2 total C c1 c3

Secure-enable mechanisms Isolated application -> secure zone creation: Static SZ size C 1 2 Idle cluster Active cluster Secure zone cluster M : Memory utilization rate U : Processor utilization S : Secure zone dedicated C 2 1 c0 c1 c2 c3 M U S c0 c2 total C c1 c3

Secure-enable mechanisms Isolated application -> secure zone creation: Static SZ size Dynamic SZ size C 1 2 Idle cluster Active cluster Secure zone cluster M : Memory utilization rate U : Processor utilization S : Secure zone dedicated C 2 1 c0 c1 c2 c3 M U S c0 c2 total C c1 c3

Experimental setup Virtual prototyping: OVP-based MPSoCSim [8] Experimental protocol: 4x4 clusters architecture (60 MB + 1 ARM) Matrix multiplications, 17 parallel tasks for each multiplication 5 applications -> 85 tasks running in parallel Different deployment strategies: Secure zones of fixed size Secure zones with dynamic size Different execution scenarios: Baseline scenario One isolated application arriving at first One isolated application at the middle of the execution Three isolated applications Maximum parallelism (5 clusters) Limited resources (4 clusters) [8] M.Méndez Real et al., “MPSoCSim extension: An OVP Simulator for the Evaluation of Cluster-based Multicore and Many-core architectures,” in Proc. Workshop on Virtual Prototyping of Parallel and Embedded Systems (ViPES) as part of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XV), 2016.

Negligible overhead when no load Results Total execution time normalized to the baseline scenario: Up to 30% overhead Negligible overhead when no load b. One isolated application with the highest priority c. One isolated application with a medium priority d. Three isolated applications

Results Exec. time of non isolated, isolated application (in msec.) and in average: The static (5 clusters) strategy leverages the performance of isolated applications 1. Static SZ size (5 clusters) 2. Static SZ size (4 clusters) 3. Dynamic SZ size The dynamic strategy leverages the performance of non isolated applications b. One isolated application with the highest priority c. One isolated application with a medium priority d. Three isolated applications

The dynamic strategy achieves the highest resource utilization rate Results Resources utilization rate: 1. Static SZ size (5 clusters) 2. Static SZ size (4 clusters) 3. Dynamic SZ size The dynamic strategy achieves the highest resource utilization rate

Conclusion and future work Spatial isolation of sensitive applications VS cache-based SCA Evaluation on different execution scenarios with different strategies through virtual prototyping According to the deployment strategy, isolated or/and non isolated applications performance is leveraged Future work: NoC protection

Thank you for your attention!