2017/4/21 Towards Full Virtualization of Heterogeneous Noc-based Multicore Embedded Architecture 2012 IEEE 15th International Conference on Computational.

Presentation transcript:

2017/4/21 Towards Full Virtualization of Heterogeneous Noc-based Multicore Embedded Architecture 2012 IEEE 15th International Conference on Computational Science and Engineering George Kornaros Marcello Coppola 曾冠維

Outline Introduction System architecture I/OMMU architecture Conclusions

Motivation A virtualization-ready SoC platform must support the necessary extensions across the HW/SW stack: applications, programming model, hypervisor and hardware platform. The authors present the main hardware extensions and architecture of a heterogeneous multicore embedded system supporting virtualization.

vIrtical vIrtical, a system platform architecture, is developed towards a heterogeneous SoC target. It uses three on-chip networks to implement cache coherence, and a specialized I/O memory management unit (IOMMU).

Outline Introduction System architecture I/OMMU architecture Conclusions

Host Processor: ARM’s big.LITTLE 1. Switch or migration mode (Interrupt controller) 2. MP mode

Memory NoC The CCI-400 component implements the AMBA4 ACE protocol (AXI Coherency Extensions) which provides a framework for system level coherence. It implements distributed virtual memory (DVM) mechanisms, useful to support virtualization. The ACE protocol permits cached copies of the same memory location to reside in the local cache of one or more master components.

System NoC: Spidergon STNoC Customized packet-switched communication architectures. Switching, flow control, arbitration and buffering schemes will be inherited from the STNoC architecture. In order to support coherence, STNoC will also transport coherence messages, by encapsulating ACE protocol transactions.

Spidergon NoC (STNoC) The Spidergon network connects a generic even number of nodes N as a bidirectional ring, in both clockwise and anti-clockwise directions, with an additional cross connection for each pair of nodes. A low-cost and flexible architecture.
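The topology described on this slide can be sketched in a few lines. This is a minimal illustration, not code from the paper: it assumes nodes are numbered 0 to N-1 around the ring and that the cross connection links each node to the diametrically opposite one, which is how the Spidergon topology is usually described. The function name `spidergon_neighbors` is hypothetical.

```python
def spidergon_neighbors(node: int, n: int) -> dict:
    """Return the three neighbours of `node` in an n-node Spidergon ring.

    Assumption: nodes are numbered 0..n-1 and the cross link joins
    node i to node (i + n/2) mod n.
    """
    if n <= 0 or n % 2 != 0:
        raise ValueError("Spidergon requires a positive, even number of nodes")
    if not 0 <= node < n:
        raise ValueError("node index out of range")
    return {
        "clockwise": (node + 1) % n,      # next node on the ring
        "anticlockwise": (node - 1) % n,  # previous node on the ring
        "across": (node + n // 2) % n,    # cross connection
    }
```

Every node has exactly three links regardless of N, which is one reason the topology is considered low cost.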

Hardware Accelerator An SMP multicore host processor is coupled to accelerators of different kinds: GPU-like general-purpose programmable many-cores (GPPA) and different types of Hardware Processing Units (HWPU). The host processor can leverage these accelerators to offload data-intensive computational kernels, achieving significant speedup.

The vIrtical system architecture accelerator

Outline Introduction System architecture I/OMMU architecture Conclusions

I/OMMU architecture IOMMU functionality must focus on translating the virtual address space of fully virtualized guest devices to a global physical address space. This address translation is implemented efficiently using a paging scheme supported by an associated I/O translation look-aside buffer (I/O TLB).

IOMMU Internal Organization

Command Processing Engine (CPE) The Command Processing Engine (CPE) is responsible for interfacing and dispatching incoming commands and performing IOMMU component configuration.

I/O translation look-aside buffer (IOTLB) The I/O translation look-aside buffer (IOTLB) accelerates page translation of DMA addresses by avoiding expensive remote loading of page table entries.

The Memory Page Table Walker (MPTW) The Memory Page Table Walker (MPTW) accesses system memory to perform address translation in case of an IOTLB cache miss.
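The interplay between the IOTLB and the MPTW can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the hardware design from the paper: it assumes 4 KiB pages, a single-level page table held in a dictionary (standing in for page tables in system memory), and LRU replacement in the IOTLB. The class name `ToyIOMMU` and all of its fields are hypothetical.

```python
from collections import OrderedDict

PAGE_SHIFT = 12                 # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


class ToyIOMMU:
    """Toy model: IOTLB lookup first, page table walk on a miss."""

    def __init__(self, page_table, tlb_capacity=4):
        self.page_table = page_table   # virtual page number -> physical frame
        self.tlb = OrderedDict()       # small cache of recent translations
        self.tlb_capacity = tlb_capacity
        self.hits = 0
        self.misses = 0

    def _walk(self, vpn):
        # Stands in for the MPTW fetching an entry from system memory.
        if vpn not in self.page_table:
            raise LookupError(f"no mapping for virtual page {vpn:#x}")
        return self.page_table[vpn]

    def translate(self, dma_addr):
        vpn = dma_addr >> PAGE_SHIFT
        offset = dma_addr & (PAGE_SIZE - 1)
        if vpn in self.tlb:
            self.hits += 1
            self.tlb.move_to_end(vpn)          # mark as most recently used
            frame = self.tlb[vpn]
        else:
            self.misses += 1
            frame = self._walk(vpn)            # slow path on an IOTLB miss
            self.tlb[vpn] = frame
            if len(self.tlb) > self.tlb_capacity:
                self.tlb.popitem(last=False)   # evict least recently used
        return (frame << PAGE_SHIFT) | offset
```

Translating the same page twice walks the page table only once; the second access is served from the IOTLB, which is the saving this slide refers to.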

The IOMMU Device and Domain Table (IODT) The IOMMU Device and Domain Table (IODT) contains configuration data for each device in order to provide proper protection for incoming translation requests.

Virtual Machine and Guest OS Protection Protection mechanisms: Multiple isolated domains can be supported by ensuring that all I/O devices are assigned to some domain (possibly a default domain), and that they can access only physical resources allocated to this domain.

Virtual Machine and Guest OS Protection (cont.) Two data structures maintain the information contained in the IODT: (i) the I/O Device Table Control (IODTC) and (ii) the I/O Device Table Domain table (IODTD). The IODTC is indexed using a 4-bit wide Device ID. The IODTD contains 256 entries per device.
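The sizes given on this slide imply at most 16 devices (a 4-bit Device ID) with 256 domain entries each. The sketch below models that layout in software. Only those two sizes come from the slide; the entry fields (`valid`, `domain_id`, `page_table_base`, `enabled`), the way an entry index is chosen, and the class names are assumptions made for illustration.

```python
from dataclasses import dataclass, field

DEVICE_ID_BITS = 4                    # from the slide: 4-bit Device ID
NUM_DEVICES = 1 << DEVICE_ID_BITS     # 16 devices
ENTRIES_PER_DEVICE = 256              # from the slide: 256 IODTD entries per device


@dataclass
class DomainEntry:                    # hypothetical IODTD entry layout
    valid: bool = False
    domain_id: int = 0
    page_table_base: int = 0


@dataclass
class DeviceControl:                  # hypothetical IODTC entry layout
    enabled: bool = False
    domains: list = field(
        default_factory=lambda: [DomainEntry() for _ in range(ENTRIES_PER_DEVICE)]
    )


class DeviceDomainTable:
    def __init__(self):
        self.control = [DeviceControl() for _ in range(NUM_DEVICES)]

    @staticmethod
    def _check(device_id, index):
        if not 0 <= device_id < NUM_DEVICES:
            raise ValueError("Device ID does not fit in 4 bits")
        if not 0 <= index < ENTRIES_PER_DEVICE:
            raise ValueError("domain entry index out of range")

    def assign(self, device_id, index, domain_id, page_table_base):
        self._check(device_id, index)
        self.control[device_id].enabled = True
        self.control[device_id].domains[index] = DomainEntry(
            True, domain_id, page_table_base
        )

    def lookup(self, device_id, index):
        """Return the domain entry, or refuse a request with no valid assignment."""
        self._check(device_id, index)
        ctrl = self.control[device_id]
        entry = ctrl.domains[index]
        if not ctrl.enabled or not entry.valid:
            raise PermissionError("device has no valid domain entry here")
        return entry
```

A translation request from a device that was never assigned to a domain is rejected before any page table is consulted, which mirrors the isolation property described on the previous slide.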

IOMMU device control and domain table. VM: Virtual Machine. AS: Application Space.

The Hardware Monitoring Unit (HMU) The Hardware Monitoring Unit (HMU) includes agents that provide custom circuitry for monitoring particular events.

IOMMU Monitoring Unit Particular events are related to: (i) internal IOMMU activity (counter statistics and error logs); (ii) interface transactions (AXI bus). These agents can be used to estimate key performance metrics, e.g. by analyzing memory access latency structure, throughput and resource utilization, and help optimize the IOMMU architecture by introducing static configuration.
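As a rough illustration of how counter statistics turn into performance metrics, the sketch below accumulates per-transaction events and derives an IOTLB hit rate and a mean access latency. The counters chosen here and the name `MonitorCounters` are hypothetical; the slides do not list which counters the HMU actually exposes.

```python
from dataclasses import dataclass


@dataclass
class MonitorCounters:
    """Hypothetical event counters of the kind a monitoring agent might keep."""
    tlb_hits: int = 0
    tlb_misses: int = 0
    total_latency_cycles: int = 0

    def record(self, hit: bool, latency_cycles: int) -> None:
        # One call per observed translation request.
        if hit:
            self.tlb_hits += 1
        else:
            self.tlb_misses += 1
        self.total_latency_cycles += latency_cycles

    @property
    def transactions(self) -> int:
        return self.tlb_hits + self.tlb_misses

    def hit_rate(self) -> float:
        return self.tlb_hits / self.transactions if self.transactions else 0.0

    def mean_latency(self) -> float:
        return (
            self.total_latency_cycles / self.transactions
            if self.transactions
            else 0.0
        )
```

Metrics like these could then guide a static configuration choice such as the IOTLB size, in the spirit of the optimization this slide mentions.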

The Device Discovery Unit (DDU) The Device Discovery Unit (DDU) is responsible for establishing communication with a newly connected device by exchanging identification information.

The Interrupt Unit (INTR) The Interrupt Unit (INTU) is in charge of generating interrupts in the event of system exceptions.

Functional Behavior and Synchronization

Outline Introduction System architecture I/OMMU architecture Conclusions

Conclusion The work focuses on hardware-assisted virtualization instead of software. A novel hardware I/O memory management unit (IOMMU) is introduced to map DMA virtual addresses to the correct VM’s physical memory locations. High performance is supported by a configurable TLB. Protection is enhanced by an integrated lightweight hardware monitoring unit.

Thank you for listening