UltraSparc IV
Tolga TOLGAY
OUTLINE
  Introduction
  History
  What is new?
  Chip Multithreading
  Pipeline
  Cache
  Branch Prediction
  Conclusion
INTRODUCTION
  SPARC = Scalable Processor Architecture
  An open processor architecture
  Sun UltraSparc (SPARC V9):
    RISC architecture
    64-bit addresses and data
    Superscalar
HISTORY
  SPARC development begins – 1984
  First SPARC processor – 1986
  SuperSparc – 1992
  UltraSparc I – 1995
  UltraSparc II – 1997
  UltraSparc III – 2001
  UltraSparc IV – 2004
  UltraSparc IV+ – 2005
  UltraSparc T1 – 2005
WHAT IS NEW?
What UltraSparc IV offers that is new:
  CMT (Chip Multithreading)
  New registers added for the CMT enhancement
  MCU registers and Sun Fireplane interconnect registers are shared
  Enhancements to the Floating Point Unit
  16 MB L2 cache with a 128-byte line size, shared by the two cores
  L2 cache uses an LRU replacement strategy
  New write-cache index-hashing feature
Chip Multithreading (CMT)
  Two UltraSparc III cores on one die
  The two mirrored cores share:
    System bus
    DRAM controller
    Off-die L2 cache
    Fireplane registers
  Also called Chip Multiprocessing
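To make the sharing arrangement concrete, here is a minimal C sketch of how the die could be modelled: per-core state is duplicated, while the system bus, memory controller, L2 cache and Fireplane registers exist once per die and are referenced by both cores. All type and field names are illustrative assumptions for this sketch, not Sun data structures.

```c
#include <stdio.h>

/* Illustrative model only: the names below are invented for this sketch. */
struct shared_resources {            /* one instance per die               */
    unsigned long fireplane_regs[4]; /* shared interconnect registers      */
    unsigned long mcu_config;        /* shared DRAM (memory) controller    */
    unsigned char *l2_cache;         /* shared off-die 16 MB L2 cache      */
};

struct core {                        /* two instances per die              */
    unsigned long pc;
    unsigned long int_regs[32];
    double        fp_regs[32];
    struct shared_resources *shared; /* both cores point at the same block */
};

int main(void) {
    struct shared_resources die_shared = {0};
    struct core core0 = { .shared = &die_shared };
    struct core core1 = { .shared = &die_shared };

    /* Both cores see the same shared block. */
    printf("cores share resources: %s\n",
           core0.shared == core1.shared ? "yes" : "no");
    return 0;
}
```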
Chip Multithreading (figure)
  The aim is to increase performance without increasing clock speed
  Mirroring the cores creates a hot spot where the floating-point units meet
  How the hot spot is avoided:
    Heat towers in the copper interconnect
Chip Multithreading (figure)
Core
More core improvements:
  Improved instruction fetch and store bandwidth
  Improved data prefetching
  The FPU can handle more unexpected and underflow cases itself, reducing exceptions
  The on-die cache is enhanced with a hashed index to better handle multiple writes
Pipeline
  Because the UltraSparc IV contains two UltraSparc III cores, it uses the same pipeline:
    4-way superscalar architecture
    14-stage pipeline
Pipeline Stages (figure)
Pipeline Stage  Definition
A               Address Generation
P               Preliminary Fetch
F               Fetch Instructions from I-Cache
B               Branch Target Computation
I               Instruction Group Formation
J               Grouping
R               Register Access
E               Execute
C               Cache
M               Miss Detect
W               Write
X               Extend
T               Trap
D               Done
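The 14 stages in the table can be written down as a simple enumeration, and the path of one instruction through them modelled as a counter that advances one stage per cycle. This is only a didactic sketch of the A-P-F-B-I-J-R-E-C-M-W-X-T-D ordering, not a cycle-accurate model of the machine.

```c
#include <stdio.h>

/* The 14 UltraSparc III/IV pipeline stages, in order (sketch only). */
enum stage { A, P, F, B, I, J, R, E, C, M, W, X, T, D, NSTAGES };

static const char *stage_name[NSTAGES] = {
    "A: Address Generation", "P: Preliminary Fetch",
    "F: Fetch", "B: Branch Target Computation",
    "I: Instruction Group Formation", "J: Grouping",
    "R: Register Access", "E: Execute",
    "C: Cache", "M: Miss Detect",
    "W: Write", "X: Extend",
    "T: Trap", "D: Done"
};

int main(void) {
    /* Walk a single instruction through the pipeline, one stage per cycle. */
    for (int cycle = 0; cycle < NSTAGES; cycle++)
        printf("cycle %2d  %s\n", cycle + 1, stage_name[cycle]);
    return 0;
}
```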
Pipeline Stages (figure)
Pipeline Stages
Stage A : Address Generation
  Generates and selects the fetch address
  The address can be selected from several sources
Stage P : Preliminary Fetch
  Starts the fetch from the I-Cache
  Accesses the branch predictor
Stage F : Fetch
  Second half of the I-Cache access
  At the end of the stage, up to four instructions may be latched
Stage B : Branch Target Computation
  Analyzes the instructions
  Calculates the branch target address
Pipeline Stages
Stage I : Instruction Group Formation
  Instructions are grouped into the instruction queue
Stage J : Instruction Group Staging
  A group of instructions is dequeued and sent to the R-stage
Stage R : Dispatch and Register Access
  Dependency calculation
  Dependency resolution
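As a rough illustration of the dependency calculation done around the R-stage, the sketch below checks a candidate group of instructions for read-after-write hazards by comparing each instruction's source registers with the destinations of earlier instructions in the same group. The instruction encoding and the all-or-nothing rule are simplifying assumptions for this sketch, not the actual UltraSparc grouping logic.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified instruction: one destination and two source registers.
 * Register number -1 means "not used". Purely illustrative. */
struct insn { int rd, rs1, rs2; };

/* True if insn b reads a register written by insn a (RAW hazard). */
static bool raw_hazard(const struct insn *a, const struct insn *b) {
    return a->rd >= 0 && (a->rd == b->rs1 || a->rd == b->rs2);
}

/* A candidate group can issue together only if no instruction
 * depends on an earlier instruction in the same group. */
static bool group_independent(const struct insn *g, int n) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (raw_hazard(&g[i], &g[j]))
                return false;
    return true;
}

int main(void) {
    struct insn group[2] = {
        { .rd = 3, .rs1 = 1, .rs2 = 2 },   /* writes r3             */
        { .rd = 5, .rs1 = 3, .rs2 = 4 },   /* reads  r3: RAW hazard */
    };
    printf("group independent: %s\n",
           group_independent(group, 2) ? "yes" : "no");
    return 0;
}
```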
Pipeline Stages
Stage E : Integer Instruction Execution
  First stage of the execution pipelines
  Integer instructions -> A0 and A1 pipelines
  Branch instructions -> branch pipeline
  Other instructions -> MS pipeline
Stage C : Cache
  Integer pipelines write their results back
  SIU results are produced
  First stage for floating-point instructions
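The E-stage steering described above can be pictured as a simple classification step: integer ALU operations go to the A0/A1 pipelines, branches to the branch pipeline, and everything else to the MS pipeline. The coarse instruction classes and the round-robin choice between A0 and A1 are assumptions made for this sketch, not the documented dispatch rules.

```c
#include <stdio.h>

/* Coarse instruction classes and execution pipelines (illustrative only). */
enum insn_class { CLASS_INT_ALU, CLASS_BRANCH, CLASS_OTHER };
enum exec_pipe  { PIPE_A0, PIPE_A1, PIPE_BR, PIPE_MS };

/* Route an instruction to a pipeline; alternate A0/A1 for integer ops. */
static enum exec_pipe dispatch(enum insn_class c) {
    static int alu_toggle;
    switch (c) {
    case CLASS_INT_ALU: return (alu_toggle++ & 1) ? PIPE_A1 : PIPE_A0;
    case CLASS_BRANCH:  return PIPE_BR;
    default:            return PIPE_MS;   /* loads, stores, FP, etc. */
    }
}

int main(void) {
    const char *name[] = { "A0", "A1", "BR", "MS" };
    enum insn_class prog[] = { CLASS_INT_ALU, CLASS_INT_ALU,
                               CLASS_BRANCH, CLASS_OTHER };
    for (unsigned i = 0; i < sizeof prog / sizeof prog[0]; i++)
        printf("insn %u -> %s pipeline\n", i, name[dispatch(prog[i])]);
    return 0;
}
```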
Pipeline Stages
Stage M : Miss
  Data cache misses are determined
  Second stage for floating-point instructions
Stage W : Write
  MS-pipeline results are written
  Third stage for floating-point instructions
  D-cache miss requests are sent to the L2 cache
Stage X : Extend
  Final stage for floating-point instructions
  Floating-point results are ready for bypass
Pipeline Stages
Stage T : Trap
  Traps are signalled
  After a trap, the affected instructions invalidate their results
Stage D : Done
  Integer results are written to the architectural register file
  Floating-point results are written to the floating-point register file
  Results become visible to any traps generated by younger instructions
Pipeline Rules
Grouping rules:
  A group is a collection of instructions that do not prevent each other from executing in parallel
  Groups are formed before the R-stage
  Grouping is needed because:
    The execution order must be maintained
    Each pipeline runs only a subset of instructions
    Some instructions require helpers
Execution order: in-order execution
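Putting the grouping rules together, the sketch below forms groups of at most four instructions in program order and ends a group early when an instruction would need an execution slot that is already taken. The four-wide limit follows from the 4-way superscalar width stated earlier; the per-slot counts and the conflict rule are simplifications assumed for this sketch (dependencies and helpers are ignored here).

```c
#include <stdio.h>

/* Each instruction needs one execution slot; slots per group are limited.
 * The slot counts below are assumptions for the sketch, not Sun's rules. */
enum slot { SLOT_ALU, SLOT_BR, SLOT_MS, NSLOTS };
static const int slots_available[NSLOTS] = { 2, 1, 1 }; /* A0+A1, BR, MS */

/* Form one group: take instructions in order (in-order issue), stop at
 * four instructions or when a needed slot is exhausted.  Returns the
 * number of instructions placed in the group. */
static int form_group(const enum slot *insn, int n) {
    int used[NSLOTS] = {0};
    int taken = 0;
    while (taken < n && taken < 4) {
        enum slot s = insn[taken];
        if (used[s] == slots_available[s])
            break;                  /* slot conflict ends the group */
        used[s]++;
        taken++;
    }
    return taken;
}

int main(void) {
    enum slot prog[] = { SLOT_ALU, SLOT_ALU, SLOT_ALU, SLOT_BR, SLOT_MS };
    int n = (int)(sizeof prog / sizeof prog[0]);
    for (int i = 0; i < n; ) {
        int g = form_group(prog + i, n - i);
        printf("group of %d instruction(s) starting at %d\n", g, i);
        i += g;
    }
    return 0;
}
```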
Cache Organization
  Cache sizes are doubled because of the dual core:
    Data cache: 64 KB x 2
    Instruction cache: 32 KB x 2
    L2 cache: 16 MB, off-chip, shared
    No L3 cache
Cache Organization (figure)
Cache Organization
Data Cache
  64 KB Level 1 cache per core
Instruction Cache
  32 KB Level 1 cache per core
  4-way associative
Cache Organization
Prefetch Cache
  One of the L1 caches
  2 KB SRAM: 32 entries x 64 bytes
  Uses an LRU replacement algorithm
  The aim is to fetch data before it is needed
  Reduces main-memory access latency
  Two ports read 8 bytes each, one port writes 16 bytes, per cycle
  Hardware prefetch
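A minimal sketch of LRU replacement over the 32 lines of 64 bytes mentioned above: each line records when it was last used, and on a miss the line with the oldest timestamp is evicted. The real prefetch cache will not be built from timestamps; this only illustrates the LRU policy itself, with a fully associative lookup assumed for simplicity.

```c
#include <stdio.h>
#include <stdint.h>

#define NLINES    32   /* 32 entries ...         */
#define LINE_SIZE 64   /* ... of 64 bytes = 2 KB */

struct line { int valid; uint64_t tag; uint64_t last_used; };

static struct line cache[NLINES];
static uint64_t now;   /* increments on every access */

/* Fully associative lookup with LRU replacement (illustrative only). */
static int cache_access(uint64_t addr) {
    uint64_t tag = addr / LINE_SIZE;
    now++;

    /* Hit: refresh the line's recency. */
    for (int i = 0; i < NLINES; i++)
        if (cache[i].valid && cache[i].tag == tag) {
            cache[i].last_used = now;
            return 1;
        }

    /* Miss: evict the least recently used (or any invalid) line. */
    int victim = 0;
    for (int i = 1; i < NLINES; i++)
        if (!cache[i].valid ||
            (cache[victim].valid && cache[i].last_used < cache[victim].last_used))
            victim = i;
    cache[victim] = (struct line){ .valid = 1, .tag = tag, .last_used = now };
    return 0;
}

int main(void) {
    /* Touch 32 distinct lines twice: all misses, then all hits. */
    int hits = 0, total = 0;
    for (int pass = 0; pass < 2; pass++)
        for (uint64_t a = 0; a < 32 * LINE_SIZE; a += LINE_SIZE, total++)
            hits += cache_access(a);
    printf("%d hits out of %d accesses\n", hits, total);
    return 0;
}
```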
Cache Organization
Write Cache
  Reduces the bandwidth consumed by store traffic
  2 KB cache
  Handles multiprocessor and on-chip cache consistency
  Improves error recovery
  Optionally uses a hashed index
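The slide does not say which hash the write cache uses, so the sketch below only shows the general idea of index hashing: instead of taking the index bits directly from the address, a higher-order slice is XOR-folded into them, so stores whose addresses share the same low-order bits (a common pattern for strided writes) spread across different sets. The line size, index width, and bit choices here are arbitrary assumptions.

```c
#include <stdio.h>
#include <stdint.h>

#define LINE_BITS  6          /* assume 64-byte lines (illustrative)   */
#define INDEX_BITS 5          /* 32 sets -> a 2 KB direct-mapped toy   */
#define INDEX_MASK ((1u << INDEX_BITS) - 1)

/* Conventional indexing: take the bits just above the line offset. */
static unsigned plain_index(uint64_t addr) {
    return (unsigned)(addr >> LINE_BITS) & INDEX_MASK;
}

/* Hashed indexing: XOR a higher-order slice of the address into the
 * index so that same-low-bits addresses land in different sets. */
static unsigned hashed_index(uint64_t addr) {
    unsigned lo = (unsigned)(addr >> LINE_BITS) & INDEX_MASK;
    unsigned hi = (unsigned)(addr >> (LINE_BITS + INDEX_BITS)) & INDEX_MASK;
    return lo ^ hi;
}

int main(void) {
    /* Stores striding by 2 KB all collide under the plain index,
     * but are spread out by the hashed index. */
    for (int i = 0; i < 4; i++) {
        uint64_t addr = (uint64_t)i * 2048;
        printf("addr %5llu  plain %2u  hashed %2u\n",
               (unsigned long long)addr, plain_index(addr), hashed_index(addr));
    }
    return 0;
}
```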
Cache Organization
L2 Cache
  16 MB SRAM shared by the two cores
  Separate L2 cache tags
  Two-way set associative
  LRU replacement policy
  128-byte line size
  The UltraSparc IV+ instead has an on-die Level 2 cache with an off-die Level 3 cache
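From the figures on this slide the L2 address breakdown follows directly: 16 MB / (2 ways x 128-byte lines) = 65,536 sets, i.e. a 7-bit line offset and a 16-bit set index, with the remaining address bits forming the tag. The sketch below just performs that arithmetic; the 43-bit physical address width used for the tag calculation is an assumption, not something stated on the slide.

```c
#include <stdio.h>

int main(void) {
    /* Parameters taken from the slide. */
    const unsigned long long cache_bytes = 16ULL * 1024 * 1024; /* 16 MB */
    const unsigned line_bytes = 128;                            /* line  */
    const unsigned ways       = 2;                              /* 2-way */
    const unsigned paddr_bits = 43;   /* assumed physical address width  */

    unsigned long long sets = cache_bytes / (line_bytes * ways);

    /* Count the bits needed for the line offset and the set index. */
    unsigned offset_bits = 0, index_bits = 0;
    for (unsigned v = line_bytes; v > 1; v >>= 1) offset_bits++;
    for (unsigned long long v = sets; v > 1; v >>= 1) index_bits++;

    printf("sets        : %llu\n", sets);
    printf("offset bits : %u\n", offset_bits);
    printf("index bits  : %u\n", index_bits);
    printf("tag bits    : %u\n", paddr_bits - offset_bits - index_bits);
    return 0;
}
```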
Branch Prediction
Branch predictor:
  Small SRAM accessed in a single cycle
  Its output feeds the P-stage
  The branch determination is made in the B-stage
  If mispredicted, fetch returns to the A-stage
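The slide only says the predictor is a small, single-cycle SRAM read in the P-stage and checked in the B-stage. As a generic illustration of how such a table-based predictor works (not Sun's actual scheme), the sketch below keeps a small table of 2-bit saturating counters indexed by low PC bits: a counter value of 2 or 3 predicts taken, and the B-stage outcome nudges the counter up or down. The table size and index function are arbitrary.

```c
#include <stdio.h>
#include <stdint.h>

#define TABLE_SIZE 1024                 /* size chosen arbitrarily            */

static uint8_t counters[TABLE_SIZE];    /* 2-bit saturating counters, 0..3
                                           (generic scheme, not Sun's design) */

static unsigned idx(uint64_t pc) { return (unsigned)(pc >> 2) % TABLE_SIZE; }

/* P-stage: read the table and predict taken if the counter is >= 2. */
static int predict(uint64_t pc) { return counters[idx(pc)] >= 2; }

/* B-stage: once the branch is resolved, train the counter. */
static void update(uint64_t pc, int taken) {
    uint8_t *c = &counters[idx(pc)];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

int main(void) {
    /* A loop branch taken 9 times, then not taken, repeated three times:
     * after warm-up the predictor gets the taken iterations right. */
    uint64_t pc = 0x1000;
    int correct = 0, total = 0;
    for (int rep = 0; rep < 3; rep++)
        for (int i = 0; i < 10; i++, total++) {
            int taken = (i < 9);
            correct += (predict(pc) == taken);
            update(pc, taken);
        }
    printf("correct %d / %d predictions\n", correct, total);
    return 0;
}
```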
Conclusion
  UltraSparc IV is a milestone: it is the first dual-core chip in the UltraSparc family
  Sun continues to develop UltraSparc:
    UltraSparc IV+
    UltraSparc T1
References
  UltraSparc IV User’s Manual, Sun Microsystems
  UltraSparc IV Whitepaper, Sun Microsystems
  "UltraSparc IV Mirrors Predecessor," Kevin Krewell
  "Implementation and Productization of a 4th Generation 1.8 GHz Dual-Core SPARC V9 Microprocessor," Anand Dixit, Jason Hart, et al.
  UltraSparc III User’s Manual, Sun Microsystems
References
Web sites:
  http://web.cs.unlv.edu/cs219/group3/index.html
  http://bwrc.eecs.berkeley.edu/CIC/archive/cpu_history.html#SPARC
  http://www.arcade-eu.org/overview/2005/sparcIV.html
  http://www.top500.org/orsc/2006/sparcIV.htm
  http://www.sparc.org/history.html
Questions...