12/02/14 Chet Douglas, DCG Crystal Ridge PE SW Architecture

Presentation transcript:

RDMA with byte-addressable PM
RDMA Write Semantics to Remote Persistent Memory
An Intel Perspective when utilizing Intel HW

RDMA with DRAM – Intel HW Architecture

ADR (Asynchronous DRAM Refresh)
- Allows DRAM contents to be saved to NVDIMM on power loss.
- ADR Domain: all data inside the domain is protected by ADR and will make it to NVM before supercap power dies. The integrated memory controller (iMC) is currently inside the ADR Domain.

IIO (Integrated IO Controller)
- Controls IO flow between PCIe devices and main memory.
- Contains internal buffers that are backed by the last-level cache (LLC). "Allocating write transactions" from the PCI Root Port use these LLC-backed internal buffers.
- Data in the internal buffers is naturally aged out of cache into main memory.
- Enabled or disabled via a BIOS setting per PCI Root Port.

DDIO (Data Direct IO)
- Allows bus-mastering PCI and RDMA IO to move data directly in and out of the LLC.
- Enabled or disabled at the platform level via a BIOS setting.

[Diagram labels: ADR Domain, Main Memory, CPU, iMC, IIO, Internal Buffers, LLC, Cores, DDIO, Allocating Write Transactions, PCI Root Port, PCI Functions, RNIC. Flows shown: PCI BM DMA, RNIC RDMA, DDIO ON, DDIO OFF.]

RDMA with byte-addressable PM – Intel HW Architecture
Short Term NVM Considerations: With ADR, No DDIO

- Disable DDIO.
  - Requires BIOS enabling.
- Enable "non-allocating Write" transactions for the PCI Root Port to the IIO.
  - Forces RDMA Write data directly to the iMC.
  - Enable on the PCI Root Port that hosts the RNIC.
- Follow the RDMA Write(s) with an RDMA Read to force any write data remaining in the IIO buffers into the ADR Domain.
- Since RDMA Write and RDMA Read are silent (they complete without software involvement on the target node), there is little or no change to the SW on the node supplying the sink buffers for the RDMA Write.

[Diagram labels: ADR Domain, NVM, CPU, iMC, IIO, Internal Buffers, LLC, Cores, DDIO, Non-Allocating Write Transactions, PCI Root Port, RNIC. Flows shown: RNIC RDMA Write; RNIC RDMA Read (RDMA Write data forced to the ADR Domain by the RDMA Read); ADR (write data forced to persistence by ADR).]
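The write-then-read rule on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions: `TargetNode` and its methods are hypothetical names, the model takes the worst case in which every written block is still sitting in an IIO internal buffer, and it assumes the RDMA Read drains all earlier buffered writes into the iMC before returning.

```python
class TargetNode:
    """Toy model of the target: IIO internal buffers in front of the ADR domain."""

    def __init__(self):
        self.iio_buffers = {}  # address -> data still held in IIO internal buffers
        self.adr_domain = {}   # address -> data that has reached the iMC

    def rdma_write(self, addr, data):
        # Worst case for a non-allocating write: the data has been accepted
        # by the IIO but has not yet reached the iMC.
        self.iio_buffers[addr] = data

    def rdma_read(self, addr):
        # Modelling assumption: the read pushes every earlier buffered write
        # into the iMC (the ADR domain) before it returns data.
        self.adr_domain.update(self.iio_buffers)
        self.iio_buffers.clear()
        return self.adr_domain.get(addr)

    def power_loss(self):
        # ADR protects only what is inside the ADR domain;
        # anything still in the IIO buffers is lost.
        self.iio_buffers.clear()
        return dict(self.adr_domain)


# Without the trailing read, nothing is guaranteed to survive.
lossy = TargetNode()
lossy.rdma_write(0x1000, b"a")
survived_without_read = lossy.power_loss()

# With the trailing read, both writes are inside the ADR domain.
safe = TargetNode()
safe.rdma_write(0x1000, b"a")
safe.rdma_write(0x1040, b"b")
safe.rdma_read(0x1040)
survived_with_read = safe.power_loss()
```

In this model a single read after a batch of writes is enough, which matches the slide's "RDMA Write(s)" wording; real hardware ordering guarantees should be checked against the platform documentation.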

RDMA with byte-addressable PM – Intel HW Architecture
Short Term NVM Considerations: Without ADR, No DDIO

- Disable DDIO.
  - Requires BIOS enabling.
- Enable "non-allocating Write" transactions for the PCI Root Port to the IIO.
  - Forces RDMA Write data directly to the iMC.
  - Enable on the PCI Root Port that hosts the RNIC.
- Follow the RDMA Write(s) with an RDMA Read to force any write data remaining in the IIO buffers to the ADR Domain.
- Follow the RDMA Read with a Send/Receive to get a callback that forces the write data in the iMC to become persistent.
  - ISA: PCOMMIT/SFENCE flushes the iMC and makes the data persistent.

[Diagram labels: ADR Domain, NVM, CPU, iMC, IIO, Internal Buffers, LLC, Cores, DDIO, Non-Allocating Write Transactions, PCI Root Port, RNIC. Flows shown: RNIC RDMA Write; RNIC RDMA Send/Receive (RDMA Write data forced to the iMC by the Send/Receive); Send/Receive callback PCOMMIT/SFENCE.]
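The ordering of operations in this configuration can be sketched as a small simulation. Everything here is illustrative: `Initiator`, `Target`, `on_receive` and the "flush request" message are hypothetical names, the PCOMMIT/SFENCE step is recorded as a log entry rather than executed, and the acknowledgement sent back to the initiator is an assumption that the slide does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    log: list = field(default_factory=list)

    def on_receive(self, message):
        # Hypothetical Send/Receive callback running on the target CPU.
        # The log entry stands in for the real PCOMMIT/SFENCE instructions.
        self.log.append("callback: PCOMMIT/SFENCE")
        return "ack"  # assumed reply so the initiator knows the data is persistent


@dataclass
class Initiator:
    target: Target
    log: list = field(default_factory=list)

    def persist(self, writes):
        """Issue writes, then a read, then a Send, in the order the slide gives."""
        for addr, _data in writes:
            self.log.append(f"RDMA Write {addr:#x}")
        last_addr = writes[-1][0]
        # The read pushes any write data left in the IIO buffers onward.
        self.log.append(f"RDMA Read {last_addr:#x}")
        # The Send triggers the callback that makes iMC contents persistent.
        self.log.append("Send: flush request")
        reply = self.target.on_receive("flush request")
        self.log.append(f"Receive: {reply}")
        return reply == "ack"


target = Target()
initiator = Initiator(target)
ok = initiator.persist([(0x1000, b"a"), (0x1040, b"b")])
```

Unlike the With ADR case, the target CPU now takes part in every persistence point, so the target-side software does change.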

RDMA with byte-addressable PM – Intel HW Architecture
Short Term NVM Considerations: Without ADR, With DDIO

- Use standard "allocating Write" transactions for the PCI Root Port to the IIO.
- Follow the RDMA Write(s) with a Send/Receive to get a local callback that forces the write data from the CPU cache into the iMC and then makes the write data in the iMC persistent.
  - The Send/Receive contains the list of cache lines that were written.
  - ISA: CLFLUSHOPT/SFENCE flushes the CPU cache lines and waits for the flush to complete (this invalidates the cache contents). The list of cache lines from the Send message identifies the cache lines that need to be flushed.
  - ISA: PCOMMIT/SFENCE flushes the iMC and makes the data persistent.
- Internal IIO buffers are flushed as part of CLFLUSHOPT, which is what allows "allocating writes" to be used.

[Diagram labels: ADR Domain, NVM, CPU, iMC, IIO, Internal Buffers, LLC, Cores, DDIO, Allocating Write Transactions, PCI Root Port, RNIC. Flows shown: RNIC RDMA Write; RNIC RDMA Send/Receive (RDMA Write data forced to the iMC by the Send/Receive); Send/Receive callback CLFLUSHOPT/SFENCE; Send/Receive callback PCOMMIT/SFENCE.]
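The slide does not say how the list of written cache lines is built. One plausible way, sketched below, is to expand each written (address, length) range into the aligned cache lines it touches. The 64-byte line size and the helper names `cache_lines` and `flush_list` are assumptions made for illustration.

```python
CACHE_LINE = 64  # bytes; assumed cache line size


def cache_lines(addr, length):
    """Return the start address of every cache line touched by [addr, addr + length)."""
    if length <= 0:
        return []
    first = addr - (addr % CACHE_LINE)                     # round down to a line boundary
    last = (addr + length - 1) // CACHE_LINE * CACHE_LINE  # line holding the final byte
    return list(range(first, last + CACHE_LINE, CACHE_LINE))


def flush_list(writes):
    """Merge the lines of several (addr, length) writes, deduplicated and sorted."""
    lines = set()
    for addr, length in writes:
        lines.update(cache_lines(addr, length))
    return sorted(lines)


# A 2-byte write that straddles a line boundary touches two lines.
straddle = cache_lines(0x103F, 2)

# Overlapping writes collapse into one entry per line.
merged = flush_list([(0x1000, 100), (0x1040, 8)])
```

On the target, the callback would then issue one CLFLUSHOPT per listed line followed by a single SFENCE; sending ranges instead of individual lines would shrink the message at the cost of doing this expansion on the target.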

RDMA with byte-addressable PM – Intel HW Architecture
Long Term NVM Considerations

Just ideas at this point.

- ADR HW: increase the ADR Domain to include the LLC and the IIO internal buffers.
- IIO HW: make the HW aware of persistent memory ranges.
  - If a PCI Read is required: automate the read at the end of the RDMA Write(s). Open points: how to indicate the end of the write(s), and holding off the last write completion until the read completes.
  - With ADR:
    - Force write data to the iMC before completing the write transaction.
    - Utilize a new transaction type to flush a list of persistent memory regions to the iMC before completing the new transaction.
  - Without ADR:
    - Force write data to the iMC and then to persistence before completing the write transaction.
    - Utilize a new transaction type to flush a list of persistent memory regions to the iMC and then to persistence before completing the new transaction.
- DDIO HW: make the HW aware of persistent memory ranges, enabling DDIO for DRAM transactions and disabling it for persistent memory transactions on the fly.
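The last idea, choosing DDIO per transaction based on the target address, amounts to a range lookup. The sketch below shows that decision in software terms only; `PmRangeTable`, its methods, and the example address range are hypothetical, and nothing here reflects how actual hardware would implement it.

```python
from bisect import bisect_right


class PmRangeTable:
    """Hypothetical table of persistent memory ranges, queried per transaction."""

    def __init__(self, ranges):
        # ranges: iterable of (start, end) pairs, end exclusive, assumed non-overlapping
        self._ranges = sorted(ranges)
        self._starts = [start for start, _end in self._ranges]

    def is_persistent(self, addr):
        # Find the last range that starts at or before addr, then check its end.
        i = bisect_right(self._starts, addr) - 1
        return i >= 0 and addr < self._ranges[i][1]

    def use_ddio(self, addr):
        # DDIO stays on for DRAM and is turned off for persistent memory.
        return not self.is_persistent(addr)


# Example layout (made up): persistent memory occupies the second 4 GiB.
table = PmRangeTable([(0x1_0000_0000, 0x2_0000_0000)])
dram_write_uses_ddio = table.use_ddio(0x1000)
pm_write_uses_ddio = table.use_ddio(0x1_0000_0040)
```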