© 2000 Morgan Kaufman Overheads for Computers as Components Energy/power optimization  Energy: ability to do work.  Most important in battery-powered.

Slides:

Advertisements

Similar presentations

Instruction Level Parallelism and Superscalar Processors

Advertisements

CH14 Instruction Level Parallelism and Superscalar Processors

Code Optimization and Performance Chapter 5 CS 105 Tour of the Black Holes of Computing.

Compiler Support for Superscalar Processors. Loop Unrolling Assumption: Standard five stage pipeline Empty cycles between instructions before the result.

Instruction Set Design

Computer Organization and Architecture

CSCI 4717/5717 Computer Architecture

Instruction-Level Parallel Processors {Objective: executing two or more instructions in parallel} 4.1 Evolution and overview of ILP-processors 4.2 Dependencies.

© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CPUs CPU performance CPU power consumption. 1.

ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto

Compiler techniques for exposing ILP

1 Lecture 5: Static ILP Basics Topics: loop unrolling, VLIW (Sections 2.1 – 2.2)

Computer Architecture Lecture 7 Compiler Considerations and Optimizations.

1 Lecture: Static ILP Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2)

1 Lecture: Static ILP Topics: loop unrolling, software pipelines (Sections C.5, 3.2)

Chapter 4 Advanced Pipelining and Intruction-Level Parallelism Computer Architecture A Quantitative Approach John L Hennessy & David A Patterson 2 nd Edition,

Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.

1 Lecture 10: Static ILP Basics Topics: loop unrolling, static branch prediction, VLIW (Sections 4.1 – 4.4)

© 2000 Morgan Kaufman Overheads for Computers as Components CPUs zCPU performance zCPU power consumption.

Shortcomings of The Simple CPUs

Parallell Processing Systems1 Chapter 4 Vector Processors.

Evaluating Performance and Power of Object-oriented vs. Procedural Programming in Embedded Processors A. Chatzigeorgiou, G. Stephanides Department of Applied.

Memory Consistency in Vector IRAM David Martin. Consistency model applies to instructions in a single instruction stream (different than multi-processor.

Computational Astrophysics: Methodology 1.Identify astrophysical problem 2.Write down corresponding equations 3.Identify numerical algorithm 4.Find a computer.

Chapter 12 Pipelining Strategies Performance Hazards.

1 Tuesday, September 19, 2006 The practical scientist is trying to solve tomorrow's problem on yesterday's computer. Computer scientists often have it.

1 Lecture 5: Pipeline Wrap-up, Static ILP Basics Topics: loop unrolling, VLIW (Sections 2.1 – 2.2) Assignment 1 due at the start of class on Thursday.

Optimization Compiler Optimization – optimizes the speed of compilation Execution Optimization – optimizes the speed of execution.

Operating Systems Béat Hirsbrunner Main Reference: William Stallings, Operating Systems: Internals and Design Principles, 6 th Edition, Prentice Hall 2009.

CS 7810 Lecture 24 The Cell Processor H. Peter Hofstee Proceedings of HPCA-11 February 2005.

Processor Structure & Operations of an Accumulator Machine

Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.

Dong Hyuk Woo Nak Hee Seong Hsien-Hsin S. Lee

1 Advance Computer Architecture CSE 8383 Ranya Alawadhi.

Chapter 1 Computer System Overview Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design Principles,

Memory/Storage Architecture Lab Computer Architecture Pipelining Basics.

3 rd Nov CSV881: Low Power Design1 Power Estimation and Modeling M. Balakrishnan.

© 2000 Morgan Kaufman Overheads for Computers as Components Program design and analysis zOptimizing for execution time. zOptimizing for energy/power. zOptimizing.

© 2005 ECNU SEIPrinciples of Embedded Computing System Design1 Program design and analysis zOptimizing for execution time. zOptimizing for energy/power.

The Central Processing Unit (CPU) and the Machine Cycle.

Lu Hao Profiling-Based Hardware/Software Co- Exploration for the Design of Video Coding Architectures Heiko Hübert and Benno Stabernack.

Assembly Code Optimization Techniques for the AMD64 Athlon and Opteron Architectures David Phillips Robert Duckles Cse 520 Spring 2007 Term Project Presentation.

Overview of Super-Harvard Architecture (SHARC) Daniel GlickDaniel Glick – May 15, 2002 for V (Dewar)

Chapter 2 Data Manipulation. © 2005 Pearson Addison-Wesley. All rights reserved 2-2 Chapter 2: Data Manipulation 2.1 Computer Architecture 2.2 Machine.

© 2000 Morgan Kaufman Overheads for Computers as Components Program design and analysis zCompilation flow. zBasic statement translation. zBasic optimizations.

Performance Tuning John Black CS 425 UNR, Fall 2000.

© 2000 Morgan Kaufman Overheads for Computers as Components CPUs zCPU performance: How fast it can execute instructions  increasing throughput by pipelining.

Power Analysis of Embedded Software : A Fast Step Towards Software Power Minimization 指導教授 : 陳少傑教授組員 : R 張馨怡 R 林秀萍.

Computer operation is of how the different parts of a computer system work together to perform a task.

Chapter 11 System Performance Enhancement. Basic Operation of a Computer l Program is loaded into memory l Instruction is fetched from memory l Operands.

CPU (Central Processing Unit). The CPU is the brain of the computer. Sometimes referred to simply as the processor or central processor, the CPU is where.

Introduction Why low power?

Code Optimization.

CS2100 Computer Organization

Introduction To Computer Systems

Computer architecture and computer organization

The University of Adelaide, School of Computer Science

Chapter 14 Instruction Level Parallelism and Superscalar Processors

Lecture: Static ILP Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2)

Instruction Level Parallelism and Superscalar Processors

STUDY AND IMPLEMENTATION

Lecture: Static ILP Topics: loop unrolling, software pipelines (Sections C.5, 3.2)

Lecture: Static ILP Topics: loop unrolling, software pipelines (Sections C.5, 3.2) HW3 posted, due in a week.

Overheads for Computers as Components 2nd ed.

Compiler Code Optimizations

Lecture 5: Pipeline Wrap-up, Static ILP

Out-of-Order Execution (OoOE)

* M. R. Smith 07/16/96 This presentation will probably involve audience discussion, which will create action items. Use PowerPoint.

Presentation transcript:

© 2000 Morgan Kaufman Overheads for Computers as Components Energy/power optimization  Energy: ability to do work.  Most important in battery-powered systems.  Power: energy per unit time.  Important even in wall-plug systems---power becomes heat.

© 2000 Morgan Kaufman Overheads for Computers as Components Measuring energy consumption  Execute a small loop, measure current: while (TRUE) a(); I

© 2000 Morgan Kaufman Overheads for Computers as Components Sources of energy consumption  Relative energy per operation (Catthoor et al):  memory transfer: 33  external I/O: 10  SRAM write: 9  SRAM read: 4.4  multiply: 3.6  add: 1

© 2000 Morgan Kaufman Overheads for Computers as Components Cache behavior is important  Energy consumption has a sweet spot as cache size changes:  cache too small: program thrashes, burning energy on external memory accesses;  cache too large: cache itself burns too much power.

© 2000 Morgan Kaufman Overheads for Computers as Components Optimizing for energy  First-order optimization:  high performance = low energy.  Not many instructions trade speed for energy.

© 2000 Morgan Kaufman Overheads for Computers as Components Optimizing for energy, cont’d.  Use registers efficiently.  Identify and eliminate cache conflicts.  Moderate loop unrolling eliminates some loop overhead instructions.  Eliminate pipeline stalls.  Inlining procedures may help: reduces linkage, but may increase cache thrashing.