CMP research colloquium, November 5, 2007
Improving Memory Power and Performance for CMPs
John Carter
Students: Devyani Ghosh, Kshitij Sudan, Aniruddha Udipi

Scaling the CMP Memory Wall
- How should the CMP memory hierarchy be organized?
  - What are the power, performance, reliability, verifiability, and cost tradeoffs?
  - Tiled vs. hierarchical vs. NUCA vs. 3-D cache organizations
- What resource allocation and caching policies should be used?
  - How (if at all) should cache be shared? Dynamic vs. static.
  - Is the benefit of cooperation sufficient given the extra design/verification effort?
- Can we move computation to data?
  - Caches are not always useful; moving data is often a waste.
  - Perform “add (x1), #1 → r3” wherever (x1) is located.
- Enhanced coherence protocols
  - Support for updates, speculative use of data, etc.
  - Idea: use machine learning to determine “optimal” protocol behavior.
Student projects: Devyani, Aniruddha, Kshitij
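
The “move computation to data” question can be made concrete with a toy traffic model. This is a sketch, not something from the talk: the message sizes (HEADER, WORD, LINE) and both functions are hypothetical, and the model ignores latency, invalidation acknowledgements, and contention. It counts interconnect bytes for a sequence of increments to one address (x1) under two policies: fetch the line into the requesting core's cache, or ship the add to wherever (x1) lives.

```python
# Hypothetical message sizes in bytes, chosen only for illustration.
HEADER = 8   # control header carried by every message
WORD = 8     # one operand or result value
LINE = 64    # one cache line


def bytes_move_data(requesters):
    """Policy 1 (move data): the requesting core takes exclusive ownership
    of x1's line; a change of owner costs a request plus a full-line reply.
    Repeated accesses by the current owner hit locally and cost nothing."""
    owner = None
    total = 0
    for core in requesters:
        if core != owner:
            total += HEADER + (HEADER + LINE)  # request + line-carrying reply
            owner = core
    return total


def bytes_move_compute(requesters):
    """Policy 2 (move computation): the add executes where x1 lives; the
    requester sends the operand and receives the result (r3). Nothing is
    cached, so every operation generates traffic."""
    total = 0
    for _ in requesters:
        total += (HEADER + WORD) + (HEADER + WORD)  # operand out, result back
    return total


ping_pong = [0, 1] * 4   # two cores alternately incrementing x1
single = [0] * 8         # one core incrementing x1 repeatedly
print(bytes_move_data(ping_pong), bytes_move_compute(ping_pong))
print(bytes_move_data(single), bytes_move_compute(single))
```

Under these assumed sizes, increments that ping-pong between two cores favor shipping the operation (256 vs. 640 bytes for eight operations), while repeated increments from a single core favor caching (80 vs. 256 bytes). That split is one way to read “caches are not always useful.”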

Tiled Cache Organization
- Cookie-cutter design
  - N copies of the CPU design
  - Easiest to build/verify/…
- Resource sharing?
- Interconnect bottleneck?
- Memory controllers?
- Scalability?
[Figure: eight identical tiles, CPU 0 through CPU 7, each with its own L1I, L1D, and L2, connected by a shared interconnect.]
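
The slide's “Interconnect bottleneck?” and “Scalability?” questions can be illustrated with a back-of-the-envelope calculation. The sketch assumes, hypothetically, that the tiles sit on a k × k 2-D mesh with minimal routing and uniform random traffic; the slide itself does not name the interconnect topology. Average hop count grows linearly with k, while the number of links crossing the mesh bisection grows only as k for k² tiles.

```python
import itertools


def mesh_avg_hops(k):
    """Average hop count between two distinct tiles on a k x k mesh,
    assuming minimal routing so hops = |dx| + |dy| (hypothetical topology)."""
    if k < 2:
        raise ValueError("need at least a 2 x 2 mesh")
    tiles = [(x, y) for x in range(k) for y in range(k)]
    pairs = list(itertools.permutations(tiles, 2))  # ordered, distinct pairs
    total = sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in pairs)
    return total / len(pairs)


def bisection_links_per_tile(k):
    """Links cut by a vertical bisection of a k x k mesh (k even: one link
    per row), divided by the number of tiles."""
    return k / (k * k)


for k in (2, 4, 8):
    print(k * k, "tiles:", round(mesh_avg_hops(k), 2), "avg hops,",
          bisection_links_per_tile(k), "bisection links per tile")
```

Under these assumptions, growing from 4 to 64 tiles raises the average distance from about 1.33 to 5.33 hops while cutting bisection links per tile from 0.5 to 0.125, so replicating the tile does not replicate the interconnect capacity each tile sees.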

Hierarchical Cache Organization
- Scalable hierarchy
  - More complex protocols
  - Better scaling
- Resource sharing?
- What hierarchy?
- Memory controllers?
[Figure: CPU 0 through CPU 7, each with private L1I and L1D; CPUs 0 to 3 share one L2 and CPUs 4 to 7 share another; the L2s connect through an interconnect to a shared L3.]
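
The “more complex protocols, better scaling” tradeoff can be illustrated by counting invalidation messages on a write. This is a hypothetical sketch assuming an inclusive two-level directory and cores numbered contiguously within each cluster; the talk does not specify the protocol. With the hierarchy, traffic on the global interconnect drops to one message per sharing cluster, at the price of a second level of directory state that must be kept consistent.

```python
def flat_invalidations(sharers):
    """Flat (single-level) directory: one global invalidation per core that
    holds the line (the writing core is assumed to be excluded from sharers)."""
    return len(sharers)


def hierarchical_invalidations(sharers, cores_per_cluster):
    """Hypothetical two-level directory: the L3 directory tracks which
    clusters hold the line and sends one global message per cluster; each
    cluster's L2 tracks its own cores and invalidates them locally.
    Returns (global_msgs, local_msgs)."""
    clusters = {core // cores_per_cluster for core in sharers}
    return len(clusters), len(sharers)


# Eight cores in two clusters of four, matching the slide's figure.
sharers = {0, 1, 2, 3, 5}
print(flat_invalidations(sharers))              # 5 global messages
print(hierarchical_invalidations(sharers, 4))   # (2, 5): 2 global, 5 local
```

The local messages stay inside a cluster, so the shared interconnect only carries the per-cluster traffic; the cost is that every cluster-level L2 must now act as a directory as well as a cache.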

NUCA Cache Organization
- Malleable by design
  - Sea of caches
  - Islands of CPUs
  - Memory controller atolls?
- Static/dynamic allocation of cache to cores
  - Who needs more capacity?
  - Which cores are cooperating?
  - Migrate/replicate data
3-D (2.5-D) stacking is possible for any of these designs.
[Figure: eight CPUs, CPU 0 through CPU 7, each with its own L1 I$ and L1 D$, embedded in a sea of cache.]
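
“Who needs more capacity?” is the central question for dynamic allocation of NUCA cache banks to cores. One common way to answer it, shown here as an assumption rather than the talk's method, is greedy marginal-utility partitioning: repeatedly give the next bank to the core whose miss count would drop the most. The function name and the miss curves are hypothetical; in hardware the curves would have to be estimated by some monitoring mechanism.

```python
def allocate_banks(miss_curves, total_banks):
    """Greedy marginal-utility bank allocation (hypothetical sketch).

    miss_curves[c][b] is the miss count for core c when given b banks,
    for b = 0 .. len(curve) - 1. Each step grants one bank to the core
    with the largest miss reduction; ties go to the lower-numbered core."""
    alloc = [0] * len(miss_curves)
    for _ in range(total_banks):
        best, best_gain = None, -1
        for c, curve in enumerate(miss_curves):
            if alloc[c] < len(curve) - 1:  # core can still use another bank
                gain = curve[alloc[c]] - curve[alloc[c] + 1]
                if gain > best_gain:
                    best, best_gain = c, gain
        if best is None:  # no core can use more banks
            break
        alloc[best] += 1
    return alloc


# Made-up curves: core 0's working set fits in one bank; core 1 keeps benefiting.
curves = [
    [100, 40, 35, 34, 34],
    [100, 80, 60, 40, 20],
]
print(allocate_banks(curves, 4))  # [1, 3]
```

With these made-up curves the greedy split [1, 3] yields 40 + 40 = 80 misses, versus 35 + 60 = 95 for a static even split [2, 2]. Greedy allocation is optimal when gains diminish monotonically (convex miss curves); for curves with plateaus followed by sudden drops it can miss better allocations, which is why lookahead variants exist.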