Derek Chiou The University of Texas at Austin


RAMP-White

High-Level Characteristics
- Coherent distributed shared memory machine
- Scalable at the same level as other RAMP machines (eventual target: 1K processors)
- Intended to be ISA/architecture independent
  - Use different cores
  - All RAMP efforts are intended to be ISA independent
- Intended to integrate components from other RAMP participants
  - A testbed for sharing IP
9/20/2018 RAMP-White

Our Additions
- New code in Bluespec rather than Verilog/VHDL
  - More configurable
  - That's what my group is using
- Embedded PowerPC as one core
  - Leon was decided on after we started; it only recently boots Debian
  - Wanted to determine the issues of supporting different cores
  - My research needs fast cores
- Eventually an SMP OS; initially multiple OSes sharing a memory space

Issues
- Architecture
- Implementation
- Operating system
- Sharing IP
- Language maturity
- Infrastructure (CVS, etc.)

Three Stages (for Implementation Ease)
1. Incoherent shared memory
   - No hardware global cache, just global shared memory support
   - Optimal cache for local memory
   - However, software can maintain coherence if necessary
     - Network virtual memory
     - Run a simulator on top of the processor
2. Ring-based coherence (scalable bus)
   - Requires a coherent cache
   - Essentially runs a snoopy protocol
   - A true coherence engine is not required
   - But communication is very restricted
   - Good for testing and for modeling many targets
3. General network-based coherence
   - Requires a general coherence engine
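Stage 1 leaves coherence to software. The minimal Python sketch below assumes write-back caches that never snoop each other and software-issued flush/invalidate operations; the class and method names are hypothetical (loosely analogous to the PowerPC `dcbf`/`dcbi` cache-block instructions) and are not taken from the RAMP-White code. It shows why the writer must flush and the reader must invalidate before shared data becomes visible.

```python
class GlobalMemory:
    """Shared backing store reachable by every node (no hardware coherence)."""

    def __init__(self):
        self.words = {}

    def read(self, addr):
        return self.words.get(addr, 0)

    def write(self, addr, value):
        self.words[addr] = value


class IncoherentNode:
    """Hypothetical node with a private write-back cache that never snoops."""

    def __init__(self, memory):
        self.memory = memory
        self.cache = {}     # addr -> cached value
        self.dirty = set()  # addrs modified locally but not yet written back

    def load(self, addr):
        if addr not in self.cache:           # miss: fetch from global memory
            self.cache[addr] = self.memory.read(addr)
        return self.cache[addr]              # hit: may be stale

    def store(self, addr, value):
        self.cache[addr] = value             # write-back: memory not updated yet
        self.dirty.add(addr)

    def flush(self, addr):
        # Software write-back: push a dirty line out to global memory.
        if addr in self.dirty:
            self.memory.write(addr, self.cache[addr])
            self.dirty.discard(addr)

    def invalidate(self, addr):
        # Software invalidate: discard the local copy (dirty data is lost).
        self.cache.pop(addr, None)
        self.dirty.discard(addr)


mem = GlobalMemory()
a, b = IncoherentNode(mem), IncoherentNode(mem)
ADDR = 0x100
b.load(ADDR)            # b caches the old value
a.store(ADDR, 42)
stale = b.load(ADDR)    # b still sees the old value
a.flush(ADDR)           # writer publishes its data
b.invalidate(ADDR)      # reader drops its stale copy
fresh = b.load(ADDR)    # reader refetches the new value
```

The same flush/invalidate discipline is what a software layer such as network virtual memory would have to enforce on every shared page.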

Generalized Architecture
- IU: Intersection Unit
- NIU: Network Interface Unit
[Figure: node block diagram (Proc, $, Mem, MC, IU, NIU, PLB, OPB bridge), divided into an ISA-dependent part and an ISA-independent part]

Intersection Unit
- Sits between the processor (cache), the PLB bus, and the NIU
- Processor interface: slave (eventually snoop)
- Network interface: master
- Memory interface
- Hooks for a coherency engine
  - The incoherent version is a special case
- Programmable regions
  - Global (local and remote)
  - Local
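The programmable-region idea can be sketched as an address decoder. The Python below is a hypothetical model, not the IU's actual logic: it assumes each region is either node-private or global, and that global pages are interleaved across nodes one page at a time to pick a home node. Requests to local or locally homed global addresses go to the memory controller; everything else goes out through the NIU.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    is_global: bool  # True: shared across nodes; False: private to this node

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


class IntersectionUnitModel:
    """Hypothetical model: route a request to the local MC or to the NIU."""

    def __init__(self, node_id: int, num_nodes: int,
                 regions: List[Region], page_size: int = 4096):
        self.node_id = node_id
        self.num_nodes = num_nodes
        self.regions = regions
        self.page_size = page_size

    def home_node(self, region: Region, addr: int) -> int:
        # Assumption: global pages are interleaved across nodes page by page.
        return ((addr - region.base) // self.page_size) % self.num_nodes

    def route(self, addr: int) -> Tuple[str, int]:
        for region in self.regions:
            if region.contains(addr):
                if not region.is_global:
                    return ("MC", self.node_id)
                home = self.home_node(region, addr)
                # Global data homed on this node is served locally; else remote.
                return ("MC", home) if home == self.node_id else ("NIU", home)
        raise ValueError(f"address {addr:#x} is not in any programmed region")


regions = [
    Region(base=0x0000_0000, size=0x1000_0000, is_global=False),
    Region(base=0x1000_0000, size=0x1000_0000, is_global=True),
]
iu = IntersectionUnitModel(node_id=1, num_nodes=4, regions=regions)
```

Treating the incoherent design as a special case then amounts to a decoder like this with no coherence hooks attached to the global region.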

Network Interface Unit
- Split into two components
  - Message composition/queuing
  - Network transmit/receive
    - Insert/extract for the ring
    - Intended to permit other transmit/receive implementations
- One input/one output
  - Creates a simple unidirectional ring
  - Can interface to more advanced fabrics
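A one-input/one-output NIU naturally forms a unidirectional ring. This Python sketch is a hypothetical cycle-level model, not the Bluespec NIU: each node's transmit queue plays the role of message composition/queuing, a message is extracted when it reaches its destination, and a node inserts a new message only when its input slot was empty (the assumption here is that in-flight ring traffic has priority).

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    src: int
    dest: int
    payload: object


class RingNIU:
    """Hypothetical NIU: transmit queue plus ring insert/extract."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self.tx_queue: deque = deque()  # composed messages awaiting insertion
        self.rx_queue: deque = deque()  # messages extracted for this node


def step(nodes: List[RingNIU], links: List[Optional[Message]]) -> List[Optional[Message]]:
    """Advance one hop; links[i] is the message on node i's single input."""
    n = len(nodes)
    next_links: List[Optional[Message]] = [None] * n
    for i, node in enumerate(nodes):
        incoming = links[i]
        outgoing = None
        if incoming is not None and incoming.dest == node.node_id:
            node.rx_queue.append(incoming)      # extract at the destination
        elif incoming is not None:
            outgoing = incoming                 # forward: ring traffic has priority
        elif node.tx_queue:
            outgoing = node.tx_queue.popleft()  # insert into an empty slot
        next_links[(i + 1) % n] = outgoing      # single output to the next node
    return next_links


def run_until_idle(nodes: List[RingNIU], max_cycles: int = 1000) -> int:
    """Step the ring until no message is in flight or queued; return cycles used."""
    links: List[Optional[Message]] = [None] * len(nodes)
    for cycle in range(1, max_cycles + 1):
        links = step(nodes, links)
        if all(m is None for m in links) and not any(nd.tx_queue for nd in nodes):
            return cycle
    raise RuntimeError("ring did not drain")


nodes = [RingNIU(i) for i in range(4)]
nodes[0].tx_queue.append(Message(0, 2, "hello"))
nodes[3].tx_queue.append(Message(3, 1, "world"))
cycles = run_until_idle(nodes)
```

Replacing `step` with a different routing function is the kind of swap the split between queuing and transmit/receive is meant to allow.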

Operating System
- Started by looking at PowerPC
  - Wanted an SMP OS
  - Knew we didn't have a coherent cache
  - But we are also missing TLB invalidation and OpenPIC (interprocessor interrupts, bring-up)
  - But we do have load-reservation/store-conditional instructions
- Leon is SMP-capable, so it should avoid these issues
- Starting with separate OSes
  - A region of memory is global (there is no Block Address Translation (BAT), so we need to manage global pages)
  - mmap
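Load-reservation/store-conditional is the primitive an OS would build locks and atomics on. The Python model below is a hypothetical sketch of the semantics (PowerPC exposes them as `lwarx`/`stwcx.`), not of the RAMP-White hardware. It assumes reservations are tracked at a single point that sees every store to the address; the slides do not say whether the PowerPC reservation would observe remote stores without a coherent cache.

```python
class ReservationMemory:
    """Hypothetical memory that tracks one reservation per CPU."""

    def __init__(self):
        self.words = {}
        self.reservations = {}  # cpu id -> reserved address

    def load_reserve(self, cpu, addr):
        self.reservations[cpu] = addr
        return self.words.get(addr, 0)

    def store(self, cpu, addr, value):
        # Any ordinary store kills reservations on the same address.
        self._kill(addr)
        self.words[addr] = value

    def store_conditional(self, cpu, addr, value):
        if self.reservations.get(cpu) != addr:
            self.reservations.pop(cpu, None)  # reservation is consumed on failure
            return False
        self._kill(addr)                      # success also clears competitors
        self.words[addr] = value
        return True

    def _kill(self, addr):
        for cpu, reserved in list(self.reservations.items()):
            if reserved == addr:
                del self.reservations[cpu]


def atomic_add(mem, cpu, addr, delta):
    """Classic LL/SC retry loop; returns the value before the add."""
    while True:
        old = mem.load_reserve(cpu, addr)
        if mem.store_conditional(cpu, addr, old + delta):
            return old


mem = ReservationMemory()
X = 0x40
v0 = mem.load_reserve(0, X)                # CPU 0 reserves X
v1 = mem.load_reserve(1, X)                # CPU 1 reserves X
ok1 = mem.store_conditional(1, X, v1 + 1)  # CPU 1 wins and kills CPU 0's reservation
ok0 = mem.store_conditional(0, X, v0 + 1)  # CPU 0 fails instead of losing CPU 1's update
old = atomic_add(mem, 0, X, 1)             # CPU 0 retries successfully
```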

Status: Hari Angepat
- Bluespec learned
- NIU code complete and unit tested
- IU code complete; being tested on XUP (2 PowerPC processors)
  - Supports the processor slave, PLB master, and NIU interfaces
  - Hardware intended to target different ISAs
- Some preliminary OS work
  - SMP Linux investigation
  - Multi-image mmap interface currently targeted
- Targets Phase 1 (incoherent shared memory)
  - 2 IUs, 1 MC with an arbiter

Our Long-Term Plans
- Phase 1 (XUP): complete by the end of 1Q07
  - With multi-OS support (with help from Stanford?)
- Phase 2 (one BEE2 board): hopefully 2Q07
  - Larger scalability, BEE2, Berkeley MC, Leon?, RDL?
- Phase 3: hopefully 4Q07
  - Arbitrary network, cache coherency engine, SMP OS?, Leon?, RDL?
- x86 CMP/SMP on top of RAMP-White
  - Fully cycle-accurate (separate timing model)
  - RAMP-White executes the functional model in parallel
  - Heterogeneous hosts!
  - Start with Phase 1 (separate team)
  - For Phase 3, tie the target coherence system to RAMP-White
    - Cache maintained by target coherence, not by host coherence
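The split between a functional model and a separate timing model can be illustrated in a few lines. In this hypothetical Python sketch the functional model executes a toy accumulator ISA (invented here) purely for correctness and emits a trace, while the timing model consumes that trace and charges cycles using an assumed direct-mapped cache with made-up hit/miss latencies. On a real system the two would run concurrently on different hosts; here a generator simply interleaves them.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class TraceEntry:
    op: str                     # "alu", "load", or "store"
    addr: Optional[int] = None  # memory address for loads/stores


def functional_model(program, memory) -> Iterator[TraceEntry]:
    """Execute a toy accumulator ISA for correctness only; no notion of time."""
    acc = 0
    for op, arg in program:
        if op == "li":      # load immediate
            acc = arg
            yield TraceEntry("alu")
        elif op == "add":   # acc += mem[arg]
            acc += memory.get(arg, 0)
            yield TraceEntry("load", arg)
        elif op == "st":    # mem[arg] = acc
            memory[arg] = acc
            yield TraceEntry("store", arg)
        else:
            raise ValueError(f"unknown op {op!r}")


class TimingModel:
    """Assumed in-order core with a direct-mapped, write-allocate cache."""

    def __init__(self, lines=4, line_bytes=16, hit=1, miss=10):
        self.lines, self.line_bytes = lines, line_bytes
        self.hit, self.miss = hit, miss
        self.tags = [None] * lines

    def access(self, addr):
        index = (addr // self.line_bytes) % self.lines
        tag = addr // (self.line_bytes * self.lines)
        if self.tags[index] == tag:
            return self.hit
        self.tags[index] = tag  # allocate on miss (loads and stores)
        return self.miss

    def consume(self, trace):
        cycles = 0
        for entry in trace:
            cycles += 1 if entry.op == "alu" else self.access(entry.addr)
        return cycles


program = [("li", 5), ("st", 0x00), ("add", 0x00), ("add", 0x04), ("st", 0x40)]
memory = {}
cycles = TimingModel().consume(functional_model(program, memory))
```

Because the functional model never consults the timing model, either side can be replaced (for example, a different cache configuration) without touching the other.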

Sharing IP: Some Preliminary Experience
- We looked at RAMP-Red XUP
  - Used some code (PLB master)
  - Red-BEE is not ready to distribute
- Looking for switch code
- Berkeley's code is in the CVS repository
  - But we can't use the memory controller because we don't have a BEE2 board yet
- Bluespec
  - We are writing almost all of our own code right now
- Would like to steal software
  - OS (kernel proxy)
  - SMP OS port
- Naming
  - MPI reference design in the BEE2 repository: is that RAMP-Blue?
- A central CVS repository for RAMP code?

Sharing Over the Long Term
- Processor is shared: Leon, PowerPC, MicroBlaze
- Everything else
  - MC can be shared: Xilinx or Berkeley
  - Coherent cache can be shared: transactional/traditional; borrow Stanford's?
  - Coherency engine can be shared: CMU/Stanford
  - IU functionality can be shared: trying to make ours general
  - NIU can be shared: borrow half from Berkeley?
  - Network can be shared: borrow Berkeley's?
[Figure: node block diagram (Proc, $, Mem, MC, IU, NIU, CCE, Peripherals)]

Conclusions
- RAMP-White is started
- Have a clear first direction
- Hari has been working full time for one semester
- Architecture looks fairly flexible
- Would like to discuss how to share IP better so we don't reinvent the wheel