1 RAMP Jan’08 Raksha & Atlas: Prototyping & Emulation at Stanford Christos Kozyrakis work done by S. Wee, N. Njoroge, M. Dalton, H. Kannan Computer Systems.

1 RAMP Jan’08
Raksha & Atlas: Prototyping & Emulation at Stanford
Christos Kozyrakis
Work done by S. Wee, N. Njoroge, M. Dalton, H. Kannan
Computer Systems Laboratory, Stanford University

2 Outline
- Raksha: prototyping security architectures
  - Raksha goals
  - Generations of Raksha prototypes
  - Experience & lessons
- Atlas: emulating transactional memory architectures
  - Atlas goals
  - Architecture overview
  - New programmability features
  - Experience & lessons

3 Raksha Goals
- Architectural support for software security
  1. Protect existing software from attacks
     - Prevent buffer overflows, SQL injections, …
     - Based on dynamic information flow tracking (DIFT)
  2. Reduce the trusted code base (TCB) for new software
     - Simplify design & verification of security guarantees
     - Using word-granularity protection on physical memory
- Robust, flexible, practical, end-to-end, fast

4 Raksha Architecture, Version 1
[Figure: pipeline diagram with PC, Decode, RegFile, ALU, I-Cache, D-Cache, Traps, and WB, plus tag stages Policy Decode, Tag ALU, and Tag Check]
- Modified Sparc V8 processor (Leon)
  - 4 programmable security policies using 4 tag bits per word
  - User-level handling of security exceptions
  - +7% logic, +0% clock cycle time over the base design
- Full Linux distribution with > 120 software packages
- 1st DIFT architecture to detect high-level attacks on binaries
  - Have shared this design with 3 other institutions so far…
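To make the DIFT idea concrete, here is a minimal Python sketch of a word-tagged machine, assuming one tag bit per policy (4 bits per word, as on the slide). The propagation rule (OR the source tags on ALU operations) and the check (trap when a tagged word is used as a jump target) are illustrative assumptions, and `TaggedMachine`, `TAINT`, and `SecurityException` are hypothetical names; Raksha's actual policies are programmable and are not reproduced here.

```python
from dataclasses import dataclass, field

TAG_BITS = 4                      # 4 tag bits per word, one per policy (per the slide)
TAG_MASK = (1 << TAG_BITS) - 1
TAINT = 0b0001                    # hypothetical: policy 0 marks untrusted input


class SecurityException(Exception):
    """Raised when a tag check fails; Raksha would trap to a user-level handler."""


@dataclass
class TaggedMachine:
    regs: dict = field(default_factory=dict)   # register name -> (value, tag)
    jump_check_mask: int = TAINT                # policies checked on indirect jumps

    def load_input(self, reg, value):
        # Data from an untrusted source (e.g. the network) is tagged on arrival.
        self.regs[reg] = (value, TAINT & TAG_MASK)

    def load_const(self, reg, value):
        # Constants from the program image start untagged.
        self.regs[reg] = (value, 0)

    def add(self, dst, a, b):
        (va, ta), (vb, tb) = self.regs[a], self.regs[b]
        # Propagation rule (assumed): the result carries the union of source tags.
        self.regs[dst] = ((va + vb) & 0xFFFFFFFF, (ta | tb) & TAG_MASK)

    def jump_register(self, reg):
        value, tag = self.regs[reg]
        # Check rule (assumed): a tagged word must not be used as a code pointer.
        if tag & self.jump_check_mask:
            raise SecurityException(f"tainted jump target {value:#x}")
        return value
```

For example, after `load_const("r1", 0x1000)`, `load_input("r2", 0x40)`, and `add("r3", "r1", "r2")`, jumping through `r1` succeeds while jumping through `r3` raises `SecurityException`, because the untrusted offset's tag flowed into the computed pointer.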

5 Raksha Architecture, Version 2
- Small off-core coprocessor for all DIFT functionality + state
  - Can be reused across multiple chips
- Requires minimal changes to the main processor core
  - <1% for our Sparc V8 processor
- Same security features as the original architecture
  - 8% performance overhead for SpecInt2000
[Figure: processor core (I-Cache, D-Cache, ROB) passes PC, instruction, and address to the DIFT coprocessor (Policy Decode, Tag ALU, Tag Check, Tag Cache, Tag RF, WB), which signals security exceptions; L2 cache shown]
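The decoupling in Version 2 can be sketched as a FIFO between the core and the coprocessor: the core forwards the PC, instruction, and address of each committed instruction, and only the coprocessor keeps tag state and raises exceptions. The record format, the mini-ISA, and the single taint bit below are hypothetical simplifications, not Raksha's interface.

```python
from collections import deque
from typing import NamedTuple, Optional, Tuple


class CommitRecord(NamedTuple):
    """What the core forwards per committed instruction (PC, instruction info, address)."""
    pc: int
    op: str                    # hypothetical mini-ISA: "load", "store", "alu", "jr"
    dst: Optional[str]
    srcs: Tuple[str, ...]
    addr: Optional[int]        # effective address for loads and stores


class DIFTCoprocessor:
    def __init__(self):
        self.fifo = deque()    # decouples the core from tag processing
        self.tag_rf = {}       # tag register file: register -> 1-bit tag
        self.tag_mem = {}      # stands in for tag cache/memory: address -> 1-bit tag

    def enqueue(self, rec: CommitRecord):
        # The core keeps no tags; it only appends committed instructions.
        self.fifo.append(rec)

    def drain(self) -> Optional[int]:
        """Process queued records; return the PC of the first violation, if any."""
        while self.fifo:
            r = self.fifo.popleft()
            if r.op == "load":
                self.tag_rf[r.dst] = self.tag_mem.get(r.addr, 0)
            elif r.op == "store":
                self.tag_mem[r.addr] = self.tag_rf.get(r.srcs[0], 0)
            elif r.op == "alu":
                tag = 0
                for s in r.srcs:
                    tag |= self.tag_rf.get(s, 0)   # propagate: OR of source tags
                self.tag_rf[r.dst] = tag
            elif r.op == "jr" and self.tag_rf.get(r.srcs[0], 0):
                self.fifo.clear()
                return r.pc    # reported to the core as a security exception
        return None
```

Because checks run after the core has moved on, a real design has to decide when the core must wait for the coprocessor to catch up; this sketch simply drains the queue on demand.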

6 Raksha Architecture, Version 3 (Loki)
- Supports fine-grain permission checks on physical memory
  - All words associated with a 32-bit tag
  - Permission table provides access rights for different tags
  - Trusted SW specifies permissions; HW enforces them
- Independent of the OS; checks device accesses as well
- Reduces the TCB of a full OS down to 5 KLOC
  - Invariant: malicious user/kernel code cannot access data without permission
  - Virtual memory & all device drivers outside of the TCB
[Figure: pipeline (PC, Decode, RegFile, ALU, I-Cache, D-Cache, Traps, WB) with a P-cache beside the I-TLB and the D-TLB, and a Check stage]
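A rough model of the Loki check, assuming a single permission table and checking only loads and stores: every word carries a 32-bit tag, trusted software fills the tag-to-permission table, and every access consults it regardless of what the OS does. `TaggedMemory`, `Perm`, `set_tag`, and `set_permissions` are illustrative names, not Loki's interface.

```python
from enum import IntFlag


class Perm(IntFlag):
    NONE = 0
    READ = 1
    WRITE = 2
    EXEC = 4


class AccessFault(Exception):
    """Raised by the (modeled) hardware when a tag's permissions deny an access."""


class TaggedMemory:
    def __init__(self, num_words):
        self.words = [0] * num_words
        self.tags = [0] * num_words        # one 32-bit tag per physical word
        self.permission_table = {}         # tag -> Perm; a missing tag means no access

    # Interface for the trusted software (hypothetical names).
    def set_tag(self, addr, tag):
        self.tags[addr] = tag & 0xFFFFFFFF

    def set_permissions(self, tag, perms):
        self.permission_table[tag] = perms

    # Every load/store below is checked in "hardware", independent of the OS.
    def _check(self, addr, needed):
        allowed = self.permission_table.get(self.tags[addr], Perm.NONE)
        if (allowed & needed) != needed:
            raise AccessFault(f"word {addr}: tag {self.tags[addr]:#x} lacks {needed!r}")

    def load(self, addr):
        self._check(addr, Perm.READ)
        return self.words[addr]

    def store(self, addr, value):
        self._check(addr, Perm.WRITE)
        self.words[addr] = value
```

Keeping the table small and owned by trusted code is what lets the enforced invariant hold even when the kernel or a device driver is compromised.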

7 Experience & Lessons
- HW: a stable starting point is critical
  - Despite deficiencies, Leon has been a reasonable base
  - Good compromise of size, performance, flexibility, support
    - Even for ISA-level research
  - Can we match this with upcoming RAMP models?
- SW: full system is important (full OS + devices)
  - Enables experimentation with a wide range of apps
  - Increases credibility of results
  - What is the OS story for RAMP models?
- System: need a low-cost board option
  - Makes it easier to attract collaborators & disseminate the design
  - What is the replacement plan for XUPv5?

8 Repeat outline (same as slide 2)

9 Atlas Goals
- Fast: at-speed experiments with hardware TM
  - ~100x faster than a simulator
- Comfortable: full-system environment
  - Full Linux OS
  - Integration with standard debugging tools
- Easy to use: rich support for programmability
  - Automatic detection of performance bottlenecks
  - Deterministic replay
  - Automatic detection of atomicity bugs

10 ATLAS Hardware Architecture
- 9-way CMP with hardware support for TM
  - TM support builds upon private caches & the coherence protocol
  - One processor dedicated to system code
- Uses the hard PowerPC cores in the user & control FPGAs of the BEE2
[Figure: eight TCC PPC cores (0 to 7), a Linux PPC handling I/O, a control switch, a user switch, and main memory]
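The TM support in ATLAS follows TCC (Transactional Coherence and Consistency). As a toy model of that idea, assuming lazy versioning and commit-time conflict detection: each CPU buffers its writes privately and tracks its read set; when it wins commit arbitration it publishes its write-set, and any running transaction that read one of those addresses is marked violated and must restart. Class and method names are hypothetical, and arbitration itself is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    cpu: int
    read_set: set = field(default_factory=set)
    write_buffer: dict = field(default_factory=dict)  # speculative writes: addr -> value
    violated: bool = False


class TCCModel:
    """Toy model: writes stay private until commit; commits abort conflicting readers."""

    def __init__(self):
        self.memory = {}
        self.active = {}   # cpu -> running Transaction

    def begin(self, cpu):
        self.active[cpu] = Transaction(cpu)

    def read(self, cpu, addr):
        tx = self.active[cpu]
        if addr in tx.write_buffer:          # reading our own write is not a dependency
            return tx.write_buffer[addr]
        tx.read_set.add(addr)
        return self.memory.get(addr, 0)

    def write(self, cpu, addr, value):
        self.active[cpu].write_buffer[addr] = value

    def commit(self, cpu):
        """Called once `cpu` wins commit arbitration; returns CPUs that must restart."""
        tx = self.active[cpu]
        if tx.violated:
            raise RuntimeError("a violated transaction must restart, not commit")
        del self.active[cpu]
        self.memory.update(tx.write_buffer)   # publish the write-set
        written = set(tx.write_buffer)
        victims = []
        for other in self.active.values():
            if other.read_set & written:      # it read data that is now stale
                other.violated = True
                victims.append(other.cpu)
        return victims
```

Because every instruction runs inside some transaction in TCC, this commit step is the only point where processors exchange data, which is also what the replay and atomicity tools on later slides rely on.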

11 ATLAS Software Architecture
[Figure: software stack, top to bottom: Application (OpenMP+TM); TM API and ATLAS Profiler; ATLAS Runtime System; Linux OS; ATLAS HW on BEE2]
- High-level application development
  - OpenMP + TM, (Java + TM), …
- High-level application debugging
  - GDB-based, for common & new features (e.g., infinite watchpoints)

12 Deterministic Replay with ReplayT
- A critical tool for multiprocessor debugging
  - Small system variations can mask bugs
- ReplayT: record & replay the transaction commit order
  - Sufficient for TCC’s “all transactions, all the time” execution model
    - The serializable commit order captures all thread interactions
  - Minimal runtime & space overhead (1 byte/transaction)
[Figure: in the logging phase, the commit times of T0, T1, T2 (with their write-sets) are recorded in a LOG; in the replay phase, the commit protocol replays the logged commit order. Legend: Computation, Arbitration, Commit, Abort]
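A minimal sketch of the ReplayT idea, assuming the one byte logged per transaction is the ID of the committing processor (the slide gives the size, not the format). During logging, each commit appends one byte; during replay, the commit arbiter grants permission only to the processor named by the next log entry, so the original commit order is reproduced even if processors reach their commit points in a different order. All names are hypothetical.

```python
class CommitLog:
    """Logging phase: one byte per committed transaction (assumed: the CPU ID)."""

    def __init__(self):
        self.entries = bytearray()

    def record(self, cpu_id):
        self.entries.append(cpu_id)       # bytearray enforces 0 <= cpu_id <= 255


class ReplayArbiter:
    """Replay phase: grant commit permission only in the logged order."""

    def __init__(self, log: CommitLog):
        self.log = log.entries
        self.pos = 0

    def may_commit(self, cpu_id):
        return self.pos < len(self.log) and self.log[self.pos] == cpu_id

    def committed(self, cpu_id):
        assert self.may_commit(cpu_id), "commit out of logged order"
        self.pos += 1


def replay(log, ready):
    """`ready` lists one CPU ID per transaction waiting to commit, in the
    (possibly different) order the CPUs reach their commit points in this run."""
    arbiter = ReplayArbiter(log)
    pending = list(ready)
    order = []
    while pending:
        for cpu in pending:
            if arbiter.may_commit(cpu):
                arbiter.committed(cpu)
                order.append(cpu)
                pending.remove(cpu)       # safe: we leave the loop right away
                break
        else:
            raise RuntimeError("no ready transaction matches the next log entry")
    return order
```

For example, with a log recorded as 0, 1, 0, 2, `replay(log, [2, 0, 0, 1])` returns `[0, 1, 0, 2]`: the arbiter holds back CPU 2 until its logged turn.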

13 ReplayT Runtime Overhead (logging phase)
- Average slowdown is 1.05%
- Can continuously log on production runs

14 ReplayT Extensions
- Unique replay
  - Problem: maximize the usefulness of test runs
  - Approach: shuffle the commit order to generate unique scenarios
- Replay with monitoring code
  - Problem: replay accuracy after recompilation
  - Approach: faithfully repeat the commit order even if the binary changes
    - E.g., printf statements inserted for monitoring purposes
- Cross-platform replay
  - Problem: debugging on multiple platforms
  - Approach: support for replaying a log across platforms & ISAs
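One way to read "shuffle the commit order to generate unique scenarios" is sketched below, assuming that any order preserving each thread's own transaction sequence is acceptable to the program (a real run may also have synchronization constraints to respect). Orders are drawn by shuffling a multiset of thread IDs and skipped if already tested. The function names are hypothetical.

```python
import random


def shuffled_commit_order(counts, rng):
    """counts[t] = number of transactions thread t commits in the test run.

    Any permutation of this multiset of thread IDs keeps each thread's own
    transactions in program order: the k-th occurrence of t stands for t's
    k-th transaction.
    """
    pool = [t for t, n in sorted(counts.items()) for _ in range(n)]
    rng.shuffle(pool)
    return tuple(pool)


def next_unique_order(counts, tested, rng, max_tries=1000):
    """Return a commit order not in `tested` (and record it), or None if none found."""
    for _ in range(max_tries):
        order = shuffled_commit_order(counts, rng)
        if order not in tested:
            tested.add(order)
            return order
    return None   # probably every distinct order has been tried
```

Each returned order could then be fed to a replay-style arbiter as if it were a recorded log, so every test run exercises an interleaving not seen before.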

15 Atomicity Bug Detection
- Problem: user breaks an atomic task into two transactions
  - Hard to pinpoint the problem even with replay
- The AVIO proposal [Lu et al., ASPLOS’06]
  - Unserializable access interleavings are likely bugs
  - Whitelist unserializable interleavings from correct runs
    - Performed during application testing
- AVIO challenges
  - Long & intrusive data collection phase
  - Long analysis phase
  - Corner cases (false positives & false negatives)
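The AVIO classification can be sketched in a few lines. For consecutive accesses p and c by one thread to the same address, with a remote access r in between, AVIO identifies four unserializable (p, r, c) patterns: R-W-R, W-W-R, W-R-W, and R-W-W. The training and detection functions below are a simplified reading of the slide's whitelist step, keyed by the instruction addresses of p and c; the trace format and all names are assumptions.

```python
from typing import NamedTuple

# (p, r, c) access kinds with no equivalent serial order.
UNSERIALIZABLE = {("R", "W", "R"), ("W", "W", "R"), ("W", "R", "W"), ("R", "W", "W")}


class Access(NamedTuple):
    thread: int
    pc: int        # instruction address of the access
    kind: str      # "R" or "W"


def triples(trace):
    """Yield (p, r, c) for a single-address trace in global order, using the
    last remote access between p and c as r (a simplifying assumption)."""
    last_index = {}
    for j, c in enumerate(trace):
        i = last_index.get(c.thread)
        if i is not None and j > i + 1:       # some remote access ran in between
            yield trace[i], trace[j - 1], c
        last_index[c.thread] = j


def unserializable(p, r, c):
    return (p.kind, r.kind, c.kind) in UNSERIALIZABLE


def train(correct_traces):
    """Whitelist local instruction pairs seen unserializably interleaved in correct runs."""
    return {(p.pc, c.pc) for t in correct_traces for p, r, c in triples(t)
            if unserializable(p, r, c)}


def detect(trace, whitelist):
    return [(p, r, c) for p, r, c in triples(trace)
            if unserializable(p, r, c) and (p.pc, c.pc) not in whitelist]
```

In a read-modify-write split by another thread's write (R by thread 0, W by thread 1, W by thread 0), `detect` reports one R-W-W triple unless training runs had already whitelisted that instruction pair.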

16 Atomicity Bug Detection on ATLAS
- Based on the general approach of AVIO, but:
- Fast & non-intrusive data collection
  - Single log for each address accessed in a transaction
  - Log collected during deterministic replay
- Fast analysis
  - Interleavings examined at transaction granularity
- More accurate analysis
  - Eliminated false negatives due to intermediate writes
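Moving the same check to transaction granularity can be sketched as follows, assuming the replay log provides committed transactions in commit order with their read and write sets, and treating a transaction that writes an address as a write to it (a simplification). Each address then yields one short sequence of transactions rather than one entry per instruction. This sketch does not model the handling of intermediate writes mentioned on the slide; all names are hypothetical.

```python
from typing import NamedTuple

UNSERIALIZABLE = {("R", "W", "R"), ("W", "W", "R"), ("W", "R", "W"), ("R", "W", "W")}


class CommittedTx(NamedTuple):
    thread: int
    tx_id: int
    reads: frozenset
    writes: frozenset


def kind(tx, addr):
    # Simplifying assumption: a transaction that writes addr counts as a write.
    return "W" if addr in tx.writes else "R"


def suspicious(commit_log):
    """commit_log lists committed transactions in commit order. Returns
    (p, r, c, addr) transaction-ID tuples where same-thread transactions p and c
    touch addr and the last remote transaction r between them makes the
    interleaving unserializable."""
    per_addr = {}
    for tx in commit_log:
        for addr in sorted(tx.reads | tx.writes):
            per_addr.setdefault(addr, []).append(tx)
    found = []
    for addr, txs in per_addr.items():
        last = {}
        for j, c in enumerate(txs):
            i = last.get(c.thread)
            if i is not None and j > i + 1:
                p, r = txs[i], txs[j - 1]
                if (kind(p, addr), kind(r, addr), kind(c, addr)) in UNSERIALIZABLE:
                    found.append((p.tx_id, r.tx_id, c.tx_id, addr))
            last[c.thread] = j
    return found
```

In the slide's example of a task split into two transactions (thread 0 reads `x` in one transaction and writes it in the next, with thread 1's write to `x` committed in between), `suspicious` reports the R-W-W triple; merging the two transactions into one makes the report disappear.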

17 Experience & Lessons
- HW: need multiple grades of hardware modeling
  - Enable fast prototyping of new ISA & HW features
    - Even if timing or other details are not exactly accurate
  - Atlas experience: 40+ tutorial participants enjoyed using new features in a timing-“inaccurate” system
- SW: full system is important (full OS + devices)
  - Enables experimentation with a wide range of apps
- System: need a low-cost board option
  - Makes it easier to attract collaborators & disseminate the design
- Scalability: need access to multiple boards
  - Students will not scale a design until the 2nd board arrives
- ISA: unfortunately, the key to more sharing of HW & SW models
  - Difficult to share across ISAs due to differences in specification, interfaces, etc.
  - Should RAMP simply adopt Sparc?

18 Questions?