CISC 879 : Advanced Parallel Programming Vaibhav Naidu Dept. of Computer & Information Sciences University of Delaware Importance of Single-core in Multicore Era.


CISC 879 : Advanced Parallel Programming Vaibhav Naidu Dept. of Computer & Information Sciences University of Delaware Importance of Single-core in Multicore Era Toshinori Sato, Hideki Mori, Rikiya Yano, Takanori Hayashida - Fukuoka University, Japan Published: Thirty-fifth Australasian Computer Science Conference

Outline: Introduction; Motivation; Searching for the Best Multicore; Single-core Performance Improvement; Results; Conclusion

Introduction Pollack's rule: processor performance is roughly proportional to the square root of the processor's area. Amdahl's law: the speedup from using multiple processors in parallel is limited by the time needed for the sequential fraction of the program.
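The two rules on this slide can be sketched numerically. This is an illustrative model only; the function names and sample numbers are assumptions, not figures from the paper:

```python
import math

def pollack_perf(area):
    # Pollack's rule: single-core performance scales with sqrt(core area)
    return math.sqrt(area)

def amdahl_speedup(p, n):
    # Amdahl's law: p = parallelizable fraction, n = number of processors
    return 1.0 / ((1.0 - p) + p / n)

# Quadrupling core area only doubles performance under Pollack's rule
print(pollack_perf(4.0))          # 2.0

# Even with 64 cores, a 10% sequential fraction caps the speedup
print(amdahl_speedup(0.9, 64))    # ≈ 8.77
```

Together, the two laws frame the whole design-space question: growing one core pays off sublinearly, while adding cores pays off only as far as the sequential fraction allows.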

Motivation Spending an increasing transistor budget simply on more cores per chip might not be the best choice. What would be the best configuration for a multicore processor? How do we improve the performance of the single core?

Searching for the Best Multicore As the number of transistors on a chip increases, the flexibility in choosing a processor configuration also increases. With this flexibility, we do not know which configuration is best, how many cores it should have, and so on.

Searching for the Best Multicore Processor topologies:

Searching for the Best Multicore Processor topologies: 1. Single-core: one option for better performance in the future is to increase the size of the core, so that all transistors on the chip are utilized by a single core. 2. Many-core: the core microarchitecture is fixed, and multiple copies of the core are integrated on the chip.

Searching for the Best Multicore 3. Heterogeneous multicore: only one core becomes large, and the other cores remain small. 4. Scalable multicore: a collection of small cores that can logically fuse together to compose a high-performance large core. 5. Dynamically configurable: the processor cores can be combined at runtime to form a larger core.

Single-core vs. Many-core Single-core: as the core becomes larger, the area-performance ratio meets diminishing returns (Pollack's rule). Many-core: if the amount of parallelizable code is small, the speedup is limited (Amdahl's law).
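The tradeoff on this slide can be made concrete with a toy model: single-core performance follows Pollack's rule over the whole area budget, while many-core performance follows Amdahl's law for a given parallel fraction p. This is a sketch under assumed parameters, not the paper's simulation data:

```python
import math

def single_core_perf(area):
    # One large core spending the whole area budget (Pollack's rule)
    return math.sqrt(area)

def many_core_perf(area, p):
    # 'area' copies of the baseline core; Amdahl's law with parallel fraction p
    return 1.0 / ((1.0 - p) + p / area)

# Mostly parallel code (p = 0.9): the many-core wins at every budget
for area in (4, 16, 64):
    print(area, single_core_perf(area), many_core_perf(area, 0.9))

# Mostly sequential code (p = 0.5): the single large core wins instead
for area in (4, 16, 64):
    print(area, single_core_perf(area), many_core_perf(area, 0.5))
```

The crossover depends entirely on p, which is why neither extreme topology is best in general.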

Single-core vs. Many-core [Chart: x-axis — multiples of the baseline processor's area; y-axis — performance improvement rate]

Single-core vs. Heterogeneous Multicore Heterogeneous multicores are widely studied for improving energy efficiency. Parallelized portions are executed by multiple small cores, while hard-to-parallelize portions are executed by one big, powerful core. Interestingly, the performance is roughly equivalent regardless of the big core's size.
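The big-core-plus-small-cores arrangement resembles the asymmetric-multicore model of Hill and Marty: the serial fraction runs on the big core alone, and the parallel fraction runs on the big core plus all the small ones. The sketch below is illustrative; the budget and parameter values are assumptions, not the paper's:

```python
import math

def perf(r):
    # Pollack's rule: a core built from r baseline-core units
    return math.sqrt(r)

def asymmetric_speedup(n, r, p):
    # Total budget of n baseline units: one big core of size r,
    # plus n - r small baseline cores.
    serial = (1.0 - p) / perf(r)            # serial part on the big core
    parallel = p / (perf(r) + (n - r))      # parallel part on everything
    return 1.0 / (serial + parallel)

# With a 16-unit budget and 90% parallel code, compare big-core sizes
for r in (1, 4, 8):
    print(r, asymmetric_speedup(16, r, 0.9))
```

In this model the speedup is fairly flat once the big core is moderately sized, which is consistent with the slide's observation that performance changes little with the big core's size.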

Single-core vs. Heterogeneous Multicore [Chart: x-axis — multiples of the baseline processor's area; y-axis — performance improvement rate]

Heterogeneous Multicore vs. Scalable Homogeneous Scalable homogeneous multicores have a smaller number of larger cores. Using 3 large cores is sometimes preferable to using 6 small cores.

Heterogeneous Multicore vs. Scalable Homogeneous [Chart: x-axis — multiples of the baseline processor's area; y-axis — performance improvement rate]

Heterogeneous vs. Dynamically Configurable Dynamically configurable processors configure each core, and the size of each core, at runtime.

Heterogeneous vs. Dynamically Configurable [Chart: x-axis — multiples of the baseline processor's area; y-axis — performance improvement rate]

Heterogeneous vs. Dynamically Configurable Dynamic reconfiguration suffers roughly a 25% penalty (the 0.8 DC-n and 0.8 DC-8 configurations). As the number of cores increases, it becomes difficult to combine all cores due to the growing complexity of the interconnect. The red dashed line represents current technology. The 0.8 DC-8 is the most practical dynamically configurable processor, and its performance is not as good as the heterogeneous multicore's.
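The reconfiguration penalty can be folded into the same kind of toy model: a core fused from several baseline cores delivers only a fraction of its ideal Pollack-rule performance. The 0.8 factor below mirrors the slide's 0.8 DC-n / 0.8 DC-8 labels, but the model and all other numbers are illustrative assumptions, not the paper's simulation:

```python
import math

def fused_core_perf(r, penalty=0.8):
    # A core dynamically composed from r baseline cores pays a
    # reconfiguration penalty on top of Pollack's-rule scaling
    return penalty * math.sqrt(r)

def dc_speedup(n, r, p, penalty=0.8):
    # Serial fraction on a fused core of r units; parallel fraction
    # on all n cores running unfused
    serial = (1.0 - p) / fused_core_perf(r, penalty)
    parallel = p / n
    return 1.0 / (serial + parallel)

print(dc_speedup(16, 8, 0.9))               # with the fusion penalty
print(dc_speedup(16, 8, 0.9, penalty=1.0))  # ideal, penalty-free fusion
```

Even this simple model shows why the penalty matters: it falls entirely on the serial fraction, which is exactly the part the fused core exists to accelerate.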

Single-core Performance Improvement Increasing the clock frequency has been the easiest way to improve performance, but it requires raising the supply voltage, which causes serious power and temperature problems. The paper proposes a technique to increase the clock frequency without increasing the supply voltage.

Cool Turbo Boost Intel's Turbo Boost Technology raises the supply voltage and thereby the clock frequency. Cool Turbo Boost Technology does not require an increase in supply voltage: when the hardware becomes small and simple, there is an opportunity to raise its clock frequency (as in the Intel Atom).

Cool Turbo Boost Datapath: the collection of functional units (such as arithmetic logic units and multipliers) that perform data-processing operations, together with registers and buses. When the datapath becomes small, its computing performance degrades. If the performance loss is not compensated by the clock frequency boost, overall processor performance is diminished.

Cool Turbo Boost Instruction-level parallelism (ILP): the number of operations in a program that can be performed simultaneously. When ILP is small, a small datapath is enough; otherwise, the datapath should not be reduced. Hence, the datapath is dynamically configured according to the ILP in each program phase.

Cool Turbo Boost Multiple Clustered-Core Processor (MCCP): configures its datapath according to the ILP and thread-level parallelism (TLP) in the program. The authors configure the MCCP so that its clock frequency is increased whenever it configures its datapath small.
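The phase-by-phase decision reduces to a frequency-versus-IPC tradeoff: narrowing the datapath costs some IPC but permits a higher clock. All numbers below are illustrative assumptions, not the paper's measurements:

```python
def core_performance(ipc, freq_ghz):
    # Throughput in billions of instructions per second
    return ipc * freq_ghz

def boosted_perf(base_ipc, base_freq, ipc_loss, boost_ratio):
    # Narrow-datapath mode: lose a fraction of IPC, gain clock frequency
    return core_performance(base_ipc * (1.0 - ipc_loss), base_freq * boost_ratio)

base = core_performance(2.0, 1.0)

# Low-ILP phase: narrowing costs little IPC, so a 1.4x boost wins
print(boosted_perf(2.0, 1.0, 0.05, 1.4) / base)  # ≈ 1.33, boost pays off

# High-ILP phase: a 40% IPC loss wipes out the same boost
print(boosted_perf(2.0, 1.0, 0.40, 1.4) / base)  # ≈ 0.84, keep the wide datapath
```

This is why the datapath must be configured per phase: the same boosting ratio helps low-ILP code and hurts high-ILP code.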

Results Six programs from SPECint2000 are used, each executed for 2 billion instructions. [Charts: narrow-datapath results and Cool Turbo Boost results; x-axis — boosting ratio; y-axis — normalized single-core performance]

Results The average performance loss of the narrow datapath alone is 36.1%, while with Cool Turbo Boost it is only 4.2%. When the boosting ratio reaches 1.4 and 1.6, performance improves by 5.0% and 8.7% on average, respectively. For the parser group (gzip, vpr, and parser) the performance of Cool Turbo Boost is not good, whereas for the vortex group (gcc and vortex) performance is better regardless of the boosting ratio.

Conclusion The paper investigates the best multicore configuration for the near future; the winner is the heterogeneous multicore. It shows that single-core performance is the key to improving the performance of the heterogeneous multicore in the near future. The average performance improvement from Cool Turbo Boost Technology is only about 5%, so further studies are needed in this area.

Questions?

Thank you