WildFire: A Scalable Path for SMPs Erik Hagersten and Michael Koster Sun Microsystems Inc. Presented by Terry Arnold II.


Introduction
- What was the goal?
- How did they achieve it?
  - CMR
  - HAS
- Competitive Comparisons
- Results
- Questions

The Goal
- In the past, people have been skeptical that SMPs can continue to scale, because of their bandwidth limitations
- The trend has been to switch to cc-NUMA
- The goal: improve the scalability of SMP technologies

cc-NUMA issues
- Great scalability, but less than optimal “access patterns”
- Requires heavy software optimization for capacity and conflict misses
- Non-trivial scheduling, etc. (resource and memory management)

How?
- The answer is the same as the answer to all engineering problems, that is, throwing new acronyms at the problem
- Coherent Memory Replication (CMR)
- Hierarchical Affinity Scheduling (HAS)
- Both of these exploit locality as a means of increasing performance (that is, for OLTP workloads)

The Overview

The Acronyms: CMR
- S-COMA with fixed home locations for each address
- Shadow physical pages
- Coherence maintained in hardware at 64-byte granularity
- Pages start out as cc-NUMA and are changed to CMR based on hardware counters that monitor memory access patterns
- Limitations: memory-resident pages and large physical pages can only be replicated explicitly
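The counter-driven switch from cc-NUMA to CMR described on this slide can be sketched roughly as follows. This is a minimal illustration, not the WildFire implementation: the `Page` class, the `record_remote_miss` function, the `explicit_only` flag, and the threshold value are all hypothetical names and numbers chosen for the example. The slide says only that hardware counters monitor access patterns, so counting remote misses per page is an assumption.

```python
from dataclasses import dataclass

CC_NUMA = "cc-NUMA"
CMR = "CMR"


@dataclass
class Page:
    """Hypothetical per-page state; field names are assumptions for this sketch."""
    mode: str = CC_NUMA          # every page starts out as cc-NUMA
    remote_misses: int = 0       # stands in for a hardware access counter
    explicit_only: bool = False  # memory-resident or large page: never auto-replicated


def record_remote_miss(page: Page, threshold: int) -> str:
    """Count one remote miss and switch the page to CMR once the count reaches threshold."""
    if page.mode == CMR:
        return page.mode  # already replicated locally, nothing to do
    page.remote_misses += 1
    if page.remote_misses >= threshold and not page.explicit_only:
        page.mode = CMR
    return page.mode


# Usage: a page becomes replicated after three remote misses (threshold is illustrative).
page = Page()
modes = [record_remote_miss(page, threshold=3) for _ in range(3)]

# A memory-resident page stays cc-NUMA no matter how often it misses.
pinned = Page(explicit_only=True)
for _ in range(5):
    record_remote_miss(pinned, threshold=3)
```

The `explicit_only` check mirrors the limitation on the slide: such pages would need an explicit replication request, which this sketch does not model.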

The Acronyms: HAS
- Schedules a process on the first available choice, in this order:
  - The last processor it ran on
  - A processor on the same node
  - A processor on a remote node (only when the load imbalance exceeds a “threshold”)
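The three-level preference order above can be sketched as a small selection function. This is an illustration under stated assumptions rather than the actual scheduler: `Cpu`, `pick_cpu`, and the `imbalance` and `threshold` parameters are hypothetical, and the slide does not say how the load imbalance is measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cpu:
    """Hypothetical processor descriptor for this sketch."""
    cpu_id: int
    node: int
    idle: bool


def pick_cpu(last_cpu_id: int, cpus: List[Cpu],
             imbalance: int, threshold: int) -> Optional[int]:
    """Return the id of the processor to run on, or None to keep waiting."""
    last = next(c for c in cpus if c.cpu_id == last_cpu_id)

    # 1. Prefer the processor the process last ran on (warm caches).
    if last.idle:
        return last.cpu_id

    # 2. Otherwise prefer any idle processor on the same node (local memory).
    for c in cpus:
        if c.node == last.node and c.idle:
            return c.cpu_id

    # 3. Move to a remote node only when the imbalance exceeds the threshold.
    if imbalance > threshold:
        for c in cpus:
            if c.node != last.node and c.idle:
                return c.cpu_id

    return None


# Usage: CPU 0 is busy, CPU 1 shares its node, CPU 2 sits on a remote node.
cpus = [Cpu(0, node=0, idle=False), Cpu(1, node=0, idle=True), Cpu(2, node=1, idle=True)]
same_node_choice = pick_cpu(0, cpus, imbalance=0, threshold=5)

cpus[1].idle = False
stay_put = pick_cpu(0, cpus, imbalance=0, threshold=5)
go_remote = pick_cpu(0, cpus, imbalance=10, threshold=5)
```

Returning `None` when only remote processors are idle and the imbalance is small is a design choice of the sketch: it keeps the process near its memory instead of migrating it eagerly.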

Implementation
- 2 ASICs: NIAC (coherence) and NIDC (bit-sliced interconnect)
- These improve upon the latency of a switch
- NIAC: Interface and Global-Coherence Layer
- Translators and Counters

Competition
- The SGI Origin and Sequent’s NUMA-Q

Results 1

Results 2

Questions?
- Is this “solution” too dependent on the software (kernel modifications)?
- How compatible are CMR and HAS with the other DSM solutions?