1
Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing
Luiz A. Barroso et al. (Compaq Computer Corporation)
Presented by: Nick Kirchem Feb 13, 2004
2
Target and Motivation
Commercial applications (databases, OLTP): the most important market for high-performance servers
Data-dependent computation (low ILP): little is gained from complex multiple-issue, out-of-order processors
Complexity of current processors: long design times, high development costs
A better use of transistors?
3
Project Goals
Design a chip-multiprocessing (CMP) system
Integrate 8 simple processor cores on a single chip
Exploit thread-level parallelism instead of ILP
High performance, low cost
Achieve superior performance on commercial workloads
Small team, modest investment, short design time
4
Architecture Overview
5
Architecture Elements
Simple in-order processor cores (500 MHz)
No I/O capability on the processing chip (I/O is handled by separate I/O nodes)
Up to 1024 nodes in a system
Per-core L1 instruction and data caches (64 KB each, 2-way set-associative)
One logical L2 cache (1 MB), interleaved across banks
Intra-chip switch: unidirectional crossbar; transaction-based, atomic transfers; bandwidth ~3x memory bandwidth
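The cache parameters above translate directly into an address breakdown. The sketch below is illustrative only: the 64-byte line size, the 8-bank L2, and the choice of low-order line-address bits for bank selection are assumptions, not figures from the slide; the 64 KB / 2-way L1 and the 1024-node limit come from the slide.

```python
# Sketch of a Piranha-style cache address decomposition.
# ASSUMPTIONS (not from the slide): 64-byte lines, 8 L2 banks,
# bank chosen by the low-order bits of the line address.

LINE_BYTES = 64          # assumed cache line size
L1_BYTES = 64 * 1024     # 64 KB L1 (from the slide)
L1_WAYS = 2              # 2-way set-associative (from the slide)
L2_BANKS = 8             # assumed number of interleaved L2 banks
MAX_NODES = 1024         # up to 1024 nodes (from the slide)


def log2_exact(n: int) -> int:
    """Return log2(n), requiring n to be a power of two."""
    assert n > 0 and n & (n - 1) == 0, f"{n} is not a power of two"
    return n.bit_length() - 1


OFFSET_BITS = log2_exact(LINE_BYTES)              # byte-within-line bits
L1_SETS = L1_BYTES // (L1_WAYS * LINE_BYTES)      # sets per L1 cache
L1_INDEX_BITS = log2_exact(L1_SETS)               # set-index bits
BANK_BITS = log2_exact(L2_BANKS)                  # bank-select bits
NODE_ID_BITS = log2_exact(MAX_NODES)              # bits to name a node


def l1_set_index(addr: int) -> int:
    """Set within one L1 cache that a physical address maps to."""
    return (addr >> OFFSET_BITS) & (L1_SETS - 1)


def l2_bank(addr: int) -> int:
    """L2 bank owning the line (low-order line-address interleaving)."""
    return (addr >> OFFSET_BITS) & (L2_BANKS - 1)


if __name__ == "__main__":
    print("L1 sets:", L1_SETS, "index bits:", L1_INDEX_BITS)
    print("node-id bits:", NODE_ID_BITS)
    for addr in (0x0000, 0x0040, 0x0080, 0x01C0, 0x0200):
        print(hex(addr), "-> L1 set", l1_set_index(addr),
              "L2 bank", l2_bank(addr))
```

Under these assumptions, consecutive lines fall in consecutive banks, so a sequential scan spreads its misses across all banks; the actual Piranha bank-selection function may differ.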
6
Intra-Chip Cache Coherence
MESI protocol
No inclusion between L1 and L2 (1 MB aggregate L1, 1 MB L2)
However, the L2 keeps a copy of the L1 tags and state, so no snooping is required at the L1s
L1s are filled directly from memory (the L2 acts as a victim cache)
Coherence is handled by the L2 controllers, which can service a request directly, forward it to the owner L1, forward it to a protocol engine, or obtain the data from memory
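The four outcomes listed above can be sketched as a decision made by an L2 bank using its duplicate copy of the L1 tags. This is a hypothetical, heavily simplified model: the names `Action`, `L2Bank` and `handle_read_miss`, the read-miss-only scope, the home-node function, and the priority order are illustrative assumptions, not the actual Piranha controller logic.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Action(Enum):
    SERVICE_FROM_L2 = auto()             # L2 holds the line: reply directly
    FORWARD_TO_OWNER_L1 = auto()         # another on-chip L1 holds it M/E
    FORWARD_TO_PROTOCOL_ENGINE = auto()  # line's home is another node
    FETCH_FROM_MEMORY = auto()           # local line not cached on chip


@dataclass
class L2Bank:
    """Hypothetical model of one L2 bank's view of on-chip state."""
    node_id: int
    l2_lines: set = field(default_factory=set)     # line addresses in L2
    # Duplicate L1 tags: line address -> {core_id: MESI state letter}
    dup_l1_tags: dict = field(default_factory=dict)

    def home_node(self, line: int) -> int:
        # ASSUMPTION: home node taken from high-order line-address bits.
        return line >> 20

    def handle_read_miss(self, core: int, line: int) -> Action:
        holders = self.dup_l1_tags.get(line, {})
        # The duplicate tags reveal an M/E copy in another L1 without
        # snooping the L1s themselves; in this sketch that L1 supplies data.
        for other, state in holders.items():
            if other != core and state in ("M", "E"):
                return Action.FORWARD_TO_OWNER_L1
        if line in self.l2_lines:
            return Action.SERVICE_FROM_L2
        if self.home_node(line) != self.node_id:
            return Action.FORWARD_TO_PROTOCOL_ENGINE
        # Data goes straight to the requesting L1; the L2 (a victim cache)
        # is not allocated on this path.
        return Action.FETCH_FROM_MEMORY
```

For instance, a read miss by core 0 on a line that core 1 holds in state M is forwarded to core 1's L1 rather than sent off chip.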
7
Inter-Node Coherence
Protocol engines (microprogrammable controllers)
Home engine: exports local memory to other nodes
Remote engine: imports memory from other nodes
Directory storage: ECC is computed at a coarse granularity and the freed bits hold directory information, so there is no memory space overhead
Directory granularity = 1 node (not individual processors)
Interconnect: input/output queues and a router (point-to-point, 4 links)
No NAKs: deadlock is avoided by sufficient buffering and by guaranteeing that forwarded requests can always be serviced
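The directory-storage trick can be quantified with a little arithmetic. The sketch assumes standard SEC-DED (single-error-correct, double-error-detect) Hamming codes, a baseline of ECC per 64-bit word, coarse ECC per 256 bits, and a 64-byte line; these granularities are assumptions chosen for illustration.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a SEC-DED Hamming code protecting `data_bits` bits.

    Single-error correction needs the smallest r with
    2**r >= data_bits + r + 1; one extra overall-parity bit
    adds double-error detection.
    """
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


def freed_bits_per_line(line_bits: int, fine: int, coarse: int) -> int:
    """Check bits freed when ECC granularity grows from `fine` to `coarse`."""
    fine_total = (line_bits // fine) * secded_check_bits(fine)
    coarse_total = (line_bits // coarse) * secded_check_bits(coarse)
    return fine_total - coarse_total


if __name__ == "__main__":
    LINE_BITS = 64 * 8  # assumed 64-byte line
    print(secded_check_bits(64))    # 8, the familiar (72,64) code
    print(secded_check_bits(256))   # 10
    print(freed_bits_per_line(LINE_BITS, 64, 256))  # 8*8 - 2*10 = 44
```

Under these assumptions each line gains 44 bits for directory state without any extra DRAM. Recording sharers per node rather than per processor divides the number of possible sharers by the number of cores per chip, which helps the directory fit in those bits.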
8
Performance Evaluation
Workloads: OLTP (TPC-B) and DSS (TPC-D) on an Oracle database, simulated in the SimOS-Alpha environment
Compared: Piranha at 500 MHz, a full-custom Piranha at 1.25 GHz, and a next-generation out-of-order microprocessor (OOO) at 1 GHz
Single-chip evaluation
OOO outperforms P1 (a single Piranha core) by 2.3x
P8 outperforms OOO by 3x
Speedup of P8 over P1 = 7x
Multi-chip configurations
Four chips (only 4 CPUs per chip?!)
Results show that Piranha scales better than OOO
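Taking the slide's rounded single-chip figures at face value, the three numbers are mutually consistent: chaining the two relative speedups recovers the reported P8-over-P1 speedup, which corresponds to a parallel efficiency of roughly 7/8 on eight cores.

```latex
\[
\frac{\mathrm{perf}(P8)}{\mathrm{perf}(P1)}
  = \frac{\mathrm{perf}(P8)}{\mathrm{perf}(\mathrm{OOO})}
    \cdot \frac{\mathrm{perf}(\mathrm{OOO})}{\mathrm{perf}(P1)}
  \approx 3 \times 2.3 = 6.9 \approx 7,
\qquad
\text{efficiency} \approx \frac{7}{8} \approx 0.88
\]
```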
9
Questions/Concerns
Would the Piranha design still be worthwhile against a well-designed SMT processor (with 4 or 8 threads)?
Is reliability better or worse with multiple processors per chip?
Power consumption?