Published by Clinton Sims. Modified over 9 years ago.
1
One-Chip TeraArchitecture
19 March 2009
Gheorghe Stefan
http://arh.pub.ro/gstefan/
2
Outline
  One-Chip Parallel Engines - an Emergent Market
  One-Chip Parallel Architecture & its Performance
  Integral Parallel Architecture
  A Case Study: BA1024
  Concluding Remarks
3
One-Chip Parallel Engines - an Emergent Market
Parallelism is ubiquitous:
  Instruction Level Parallelism
  Multi-Threaded Execution
  Multi-Core
  Many-Core Engines
  Many-Computer (Message Passing Interface)
4
(Performance / Power or Price) & Market
1. Performance-only approach: supercomputing
2. Performance/Price approach: high-end PCs
3. Performance/Price & Performance/Power approach: embedded computing
5
SoC Market & Programmability
SoC in the nanometer era asks for:
  High complexity
  High intensity
  High flexibility
In the one-giga-gate-per-chip era, complexity cannot keep pace with size
So the key word is PROGRAMMABILITY
6
Embedded Parallel Computing
“Flexible & Feasible ASIC” = programmable parallel engine
An ASIC is a circuit = an inherently heterogeneous parallel system
Flexibility = programmability
Feasibility = segregating all kinds of simple parallel structures from the complex program
7
One-Chip Parallel Architecture & its Performance
Because a programmable structure competes with the ASIC philosophy, a one-chip parallel architecture must be an integral parallel architecture.
Performance is evaluated according to the weight of each type of instruction: float, pointer, word, half-word, byte.
Examples:
  On a half-word machine, a word instruction executes in 2 cycles
  On a word machine, a float instruction executes in 20 cycles
9
Weighted Tera Instructions Per Second (TIPS)
Cycles per operation:
  Float op: 20 cycles
  Word op: 2 cycles
  Half-word op: 1 cycle
  Byte op: 0.5 cycles
Medium intense float:
  Float op: 10%
  Word op: 13%
  Half-word op: 30%
  Byte op: 37%
  1 TIPS = 2.75 TOPS
High intense float:
  Float op: 25%
  Word op: 35%
  Half-word op: 12%
  Byte op: 28%
  1 TIPS = 5.96 TOPS
Then: 1 TIPS = 3 to 6 TOPS
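The TIPS-to-TOPS factors on this slide can be reproduced as a weighted sum of cycle costs over the instruction mix. The following is a minimal sketch, assuming that one "OP" corresponds to one half-word cycle; the names `CYCLES`, `MEDIUM_FLOAT_MIX`, `HIGH_FLOAT_MIX` and `tops_per_tips` are illustrative and not from the presentation. The medium mix is used exactly as listed, although its shares add up to 90%.

```python
# Cycle cost of each operation class, as listed on the slide.
CYCLES = {"float": 20.0, "word": 2.0, "half_word": 1.0, "byte": 0.5}

# Instruction mixes from the slide, as fractions of all instructions.
# Note: the medium mix as listed sums to 0.90, not 1.00.
MEDIUM_FLOAT_MIX = {"float": 0.10, "word": 0.13, "half_word": 0.30, "byte": 0.37}
HIGH_FLOAT_MIX = {"float": 0.25, "word": 0.35, "half_word": 0.12, "byte": 0.28}


def tops_per_tips(mix, cycles=CYCLES):
    """Weighted cycles per instruction, i.e. simple ops needed per instruction."""
    return sum(share * cycles[kind] for kind, share in mix.items())


medium = tops_per_tips(MEDIUM_FLOAT_MIX)
high = tops_per_tips(HIGH_FLOAT_MIX)
print(f"medium float mix: 1 TIPS = {medium:.3f} TOPS")
print(f"high float mix:   1 TIPS = {high:.2f} TOPS")
```

Under these assumptions the two factors come out at about 2.75 and 5.96, matching the slide.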
10
Integral Parallel Architecture (IPA)
Computation is:
  Complex (control intensive)
  Intense (data intensive)
Parallelism is:
  data parallelism (almost SIMD)
  time parallelism (a sort of MIMD)
  speculative parallelism (a true MISD)
11
Complex vs. Intense
Intense computation:
  high-latency functional pipe
  array computation
  buffer based hierarchy
  400 GOPS (half-word ops)
  0.5 cm², 6 W, 0.4 GHz
  800 GOPS/cm²
  66.6 GOPS/W
Complex computation:
  OS oriented
  multi-threading
  cache based hierarchy
  4 GIPS & 2 GFLOPS
  1.5 cm², 50 W, 2 GHz
  (2.6 GIPS + 1.3 GFLOPS)/cm²
  (0.08 GIPS + 0.04 GFLOPS)/W
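The per-area and per-watt figures follow from dividing each throughput by the quoted area and power. A minimal sketch of that arithmetic, where the helper name `density` is hypothetical:

```python
def density(throughput, area_cm2, power_w):
    """Return (throughput per cm^2, throughput per watt)."""
    return throughput / area_cm2, throughput / power_w


# Intense engine: 400 GOPS, 0.5 cm^2, 6 W (figures from the slide).
intense_per_cm2, intense_per_w = density(400.0, 0.5, 6.0)

# Complex engine: 4 GIPS and 2 GFLOPS, 1.5 cm^2, 50 W (figures from the slide).
gips_per_cm2, gips_per_w = density(4.0, 1.5, 50.0)
gflops_per_cm2, gflops_per_w = density(2.0, 1.5, 50.0)

print(f"intense: {intense_per_cm2:.0f} GOPS/cm^2, {intense_per_w:.1f} GOPS/W")
print(f"complex: {gips_per_cm2:.2f} GIPS/cm^2, {gflops_per_cm2:.2f} GFLOPS/cm^2")
print(f"complex: {gips_per_w:.2f} GIPS/W, {gflops_per_w:.2f} GFLOPS/W")
```

With the slide's inputs, 400 GOPS at 6 W works out to roughly 66.7 GOPS/W, which is in line with the "> 60 GOPS/Watt" quoted for BA1024.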
12
Embedded Parallel Organization
Coarse-grain Multi-Core Engine for complex computation & fine-grain Many-Core Engine for intense computation
Multi-Core: 2 to 16 multi-threaded complex processors
Many-Core: 256 to 4096 small & simple execution units (EU) or processing elements (PE)
13
Chip Organization
[Block diagram with the labels: Multi-Core (2 - 16), Cache, Many-Core (64 - 4096), Buffer, Interconnection Fabric, DDR SDRAM Interface]
14
A Case Study: BA1024
The organization of BA1024:
  multi-core area of 4 MIPS
  many-core data parallel area of 1024 simple PEs
  speculative time parallel pipe of 8 PEs
  interfaces (DDR, PCI, video & audio interfaces for 2 HDTV channels)
15
Overall performance of BA1024
  400 GOP/sec
  6.4 GB/sec: external bandwidth
  800 GB/sec: internal bandwidth
  > 60 GOPS/Watt
  > 8 GOPS/mm²
  65 nm, standard process
Note: 1 OP = 16-bit simple integer operation (excluding multiplication)
16
Full Vector Operations
[Figure: Line i, Line j and Line k; index labels 0, 511, 1023; operators +, -, *, XOR, etc.]
Line k = Line i OP Line j
Line k = Line i OP scalar value (repeated for all elements)
16-bit data operand
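The two forms of full vector operation can be modelled in software as element-wise operations over whole lines. This is a minimal sketch, not the BA1024 instruction set: the function names are hypothetical, the line length of 1024 is assumed from the 1024 PEs, and wrap-around at 16 bits is an assumption (the slide does not say how overflow is handled).

```python
import operator

N_ELEMENTS = 1024  # elements per line, assumed to be one per PE
MASK16 = 0xFFFF    # 16-bit data operands; wrap-around on overflow is assumed


def vector_op(op, line_i, line_j):
    """Line k = Line i OP Line j, element by element, wrapped to 16 bits."""
    return [op(a, b) & MASK16 for a, b in zip(line_i, line_j)]


def scalar_op(op, line_i, scalar):
    """Line k = Line i OP scalar, with the scalar repeated for all elements."""
    return [op(a, scalar) & MASK16 for a in line_i]


line_i = list(range(N_ELEMENTS))
line_j = [3] * N_ELEMENTS
line_k = vector_op(operator.add, line_i, line_j)   # vector-vector form
line_x = scalar_op(operator.xor, line_i, 0x00FF)   # vector-scalar form
```

On the chip all elements of a line would be processed in parallel; the list comprehension only models the result, not the timing.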
17
Conditioned Operations Based
[Figure: Line i, Line j and Line k; index labels 0, 511, 1023; operators +, -, *, XOR, etc.]
This enables selective processing based on data content.
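Selective processing of this kind is commonly modelled as a predicated (masked) vector operation: each element position is updated only if a condition on the data holds. The slide does not say how the condition is formed, so the sketch below assumes a per-element predicate on Line i; the names `conditioned_op` and `condition` are hypothetical.

```python
MASK16 = 0xFFFF  # 16-bit data operands; wrap-around on overflow is assumed


def conditioned_op(op, line_i, line_j, line_k, condition):
    """Write Line i OP Line j into Line k only where condition(a) holds.

    Positions where the condition fails keep their previous Line k value.
    """
    return [
        (op(a, b) & MASK16) if condition(a) else old
        for a, b, old in zip(line_i, line_j, line_k)
    ]


line_i = [1, 200, 3, 400]
line_j = [10, 10, 10, 10]
line_k = [0, 0, 0, 0]

# Add only in the positions where the Line i element exceeds 100.
result = conditioned_op(lambda a, b: a + b, line_i, line_j, line_k,
                        lambda a: a > 100)
print(result)
```

Here only the second and fourth positions are updated; the others retain the old contents of Line k.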
18
Multi-Core Organization
Multi-threaded programming model
Each core supports:
  block multi-threading
  interleaved multi-threading
The number of cores is limited by random access to the external memory
19
Extrapolating BA1024 performance
Medium float environment:
  45 nm, standard process
  1 cm²
  4096 EUs
  0.7 GHz
  2.8 TOPS = 1 TIPS
  ~ 25 W
High float environment:
  45 nm, standard process
  1 cm²
  4096 EUs
  1.5 GHz
  6 TOPS = 1 TIPS
  ~ 50 W
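The extrapolated throughput is consistent with a simple peak estimate of execution units times clock frequency. The sketch below assumes one 16-bit operation per EU per cycle, which the slides do not state explicitly; `peak_tops` is a hypothetical helper, and the TIPS conversion reuses the 2.75 and 5.96 factors from the weighted-TIPS slide.

```python
def peak_tops(execution_units, clock_ghz, ops_per_cycle=1.0):
    """Peak throughput in TOPS, assuming each EU completes ops_per_cycle ops per cycle."""
    return execution_units * clock_ghz * ops_per_cycle / 1000.0


# Medium float environment: 4096 EUs at 0.7 GHz, about 25 W.
medium_tops = peak_tops(4096, 0.7)
medium_tips = medium_tops / 2.75
medium_gops_per_w = medium_tops * 1000.0 / 25.0

# High float environment: 4096 EUs at 1.5 GHz, about 50 W.
high_tops = peak_tops(4096, 1.5)
high_tips = high_tops / 5.96
high_gops_per_w = high_tops * 1000.0 / 50.0

print(f"medium: {medium_tops:.2f} TOPS, {medium_tips:.2f} TIPS, {medium_gops_per_w:.0f} GOPS/W")
print(f"high:   {high_tops:.2f} TOPS, {high_tips:.2f} TIPS, {high_gops_per_w:.0f} GOPS/W")
```

Under that assumption both configurations land slightly above 1 TIPS, close to the rounded 2.8 TOPS and 6 TOPS on the slide.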
20
Concluding Remarks
1. Segregating the complex from the intense is the key
2. Using all forms of parallelism allows competition with the ASIC approach
3. Implementation issues limit the true scalability
4. The organization must be kept as simple as possible so that it can be easily hidden from the user
5. "The Landscape of Parallel Computing Research: A View from Berkeley" is a good tool for evaluating our approach
21
Main technical contributors to the project:
  Emanuele Altieri, BrightScale Inc., CA
  Frank Ho, BrightScale Inc., CA
  Mihaela Malita, St. Anselm College, NH
  Bogdan Mitu, BrightScale Inc., CA
  Marius Stoian, PUB, Romania
  Dominique Thiebaut, Smith College, MA
  Dan Tomescu, BrightScale Inc., CA