Outline
Announcements:
–HW III due Friday!
–HW II returned soon
Software performance
Architecture & performance
Measuring performance

Software Performance
Factors influencing speed:
–Hardware: clock speed, memory size/speed, bus speed
–Algorithmic scaling: what happens as n gets big?
–Interaction between algorithm and hardware: compiler optimizations vs. hand tuning

Algorithm Performance
There are often several algorithms for solving the same problem, each with its own performance characteristics. Much of computer science is concerned with finding the fastest algorithm for a given problem.
Typically, we're interested in how an algorithm scales:
–How it performs as the problem size increases
–To determine this, we need an accounting system (or model) of how a program runs
–The simplest model assumes all commands take the same amount of time, so we just count commands
–We could use more complicated models that weight commands differently

Algorithmic Performance of Linear Search
Linear Search
–Inputs: integer array x of length n, value k
–Output: j s.t. x[j]==k, or j=-9 if k not in x

int LinSearch(int x[], int n, int k){
    int j=0;
    while(j<n && x[j]!=k){   /* test j<n first so we never read past the array */
        j=j+1;
    }
    if(j==n){
        return(-9);
    }else{
        return(j);
    }
}

Algorithmic Performance of Binary Search
Binary Search
–Inputs: SORTED integer array x, value k, integers st>=0 and en<=n-1
–Output: j s.t. x[j]==k, or j=-9 if k not in x[st:en]

int BinSearch(int x[], int st, int en, int k){
    int mid, ans;
    if(st > en){                 //empty range: k is not present
        return(-9);
    }
    mid=st+(en-st)/2;            //middle of the current range
    if(x[mid] == k){
        ans=mid;
    } else if(x[mid] < k){
        ans=BinSearch(x, mid+1, en, k);  //search on right
    } else {
        ans=BinSearch(x, st, mid-1, k);  //search on left
    }
    return(ans);
}

Comparing Linear and Binary Search
Linear Search
–at most n iterations through the loop
–works on any array
Binary Search
–at most about log2(n) recursive calls (log2(n) < n)
–faster, but only if x is already sorted
–sorting first takes approximately n^2 steps (for simple sorts)
So, if you have to perform fewer than about n searches, use linear search. If you have to perform more than about n searches on the same array, sort it first, then use BinSearch.
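The max-n vs. log2(n) comparison above can be checked by counting comparisons instead of timing. The sketch below uses made-up helper names (lin_count, bin_count) but the same loop and recursion structure as the two search routines:

```c
#include <assert.h>

/* Count comparisons made by linear search for k in x[0..n-1]. */
static int lin_count(const int x[], int n, int k) {
    int steps = 0, j = 0;
    while (j < n && x[j] != k) { j = j + 1; steps = steps + 1; }
    return steps + 1;               /* +1 for the final (successful) test */
}

/* Count probes made by binary search for k in sorted x[st..en]. */
static int bin_count(const int x[], int st, int en, int k) {
    if (st > en) return 0;          /* empty range: k not present */
    int mid = st + (en - st) / 2;
    if (x[mid] == k) return 1;
    if (x[mid] < k) return 1 + bin_count(x, mid + 1, en, k);
    return 1 + bin_count(x, st, mid - 1, k);
}

/* Worst case on n=1000: the target sits at the far end of the array. */
static int worst_linear(void) {
    int x[1000];
    for (int i = 0; i < 1000; i++) x[i] = i;
    return lin_count(x, 1000, 999);
}

static int worst_binary(void) {
    int x[1000];
    for (int i = 0; i < 1000; i++) x[i] = i;
    return bin_count(x, 0, 999, 999);
}
```

For n = 1000, the linear count is 1000 while binary search needs at most ceil(log2(1000)) = 10 probes.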

Interaction between algorithm and hardware
There are several ways to implement the same algorithm. Their performance can be very different if they compile to different machine code. The differences are largely due to interactions with the memory hierarchy.

Computer Architecture
CPU (Control Unit, Arithmetic/Logic Unit, Registers) → L1 cache → L2 cache → Memory (RAM) → Disk, connected by the bus
Von Neumann memory hierarchy:
–fast → slow
–expensive → cheap
–Registers → L1 → L2 → RAM → Disk

Architecture and Performance
Avoid using the disk!
–Minimize reading and writing
–Buy more RAM so you don't use virtual memory
Buy a system with lots of fast cache
Buy a system with lots of RAM
Then, if you have money left, buy a fast chip

Rules of Thumb
Avoid big jumps in memory
–Data is moved from RAM to cache in chunks
–Accessing arrays sequentially maximizes cache reuse
–Precomputing helps reuse as well
–Special implications for 2 (or higher) D arrays

Row major (C, C++, Java): loop over the last index innermost

for(j=0;j<2;j++){
    for(k=0;k<3;k++){
        A[j][k]=...
    }
}

Column major (FORTRAN, Matlab): loop over the first index innermost

do k=1,3
    do j=1,2
        A(j,k)=...
    enddo
enddo
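As a sketch of why traversal order matters, the two functions below sum the same C (row-major) array in the two orders. Both compute the same value, but sum_row_major touches memory sequentially while sum_col_major jumps COLS doubles at every step. The names and array sizes here are invented for illustration:

```c
#define ROWS 256
#define COLS 256

static double A[ROWS][COLS];

/* Row-major traversal: the inner loop walks consecutive addresses,
   so every byte of each fetched cache line gets used. */
static double sum_row_major(void) {
    double s = 0.0;
    for (int j = 0; j < ROWS; j++)
        for (int k = 0; k < COLS; k++)
            s += A[j][k];
    return s;
}

/* Column-major traversal of the same array: the inner loop strides by
   COLS*sizeof(double) bytes, touching a new cache line on each step. */
static double sum_col_major(void) {
    double s = 0.0;
    for (int k = 0; k < COLS; k++)
        for (int j = 0; j < ROWS; j++)
            s += A[j][k];
    return s;
}

/* Both orders produce the same answer; only the speed differs. */
static int traversals_agree(void) {
    for (int j = 0; j < ROWS; j++)
        for (int k = 0; k < COLS; k++)
            A[j][k] = 1.0;
    return sum_row_major() == (double)(ROWS * COLS)
        && sum_col_major() == (double)(ROWS * COLS);
}
```

On most machines the row-major version runs noticeably faster for arrays larger than the cache, even though the arithmetic is identical.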

Rules of thumb
Minimize computation (precomputing)
–Compute an expression once, save its value, and reuse it
Minimize division
–Processors have fast add and multiply units, but division is much slower
–Write 0.5*x rather than x/2
Minimize function/subroutine calls (inlining)
–There is overhead associated with calling functions
–This goes against good programming practice, which encourages modularity
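A minimal sketch of the division rule, under invented names: scale_div divides inside the loop, while scale_mul divides once to form a reciprocal and multiplies thereafter. (Caveat: for divisors that are not powers of two, the two versions can differ in the last bit.)

```c
/* One divide per element: the slow way. */
static void scale_div(double x[], int n, double d) {
    for (int i = 0; i < n; i++)
        x[i] = x[i] / d;
}

/* Divide once up front, then multiply: the loop body is cheaper. */
static void scale_mul(double x[], int n, double d) {
    double inv = 1.0 / d;              /* precomputed reciprocal */
    for (int i = 0; i < n; i++)
        x[i] = x[i] * inv;
}

/* With d = 2.0 (a power of two) both versions are bit-exact. */
static int halving_matches(void) {
    double a[4] = {2.0, 4.0, 6.0, 8.0};
    double b[4] = {2.0, 4.0, 6.0, 8.0};
    scale_div(a, 4, 2.0);
    scale_mul(b, 4, 2.0);
    for (int i = 0; i < 4; i++)
        if (a[i] != b[i]) return 0;
    return a[0] == 1.0 && a[3] == 4.0;
}
```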

RAD1D example
RAD1D must create an array sigma where sigma = k*dt/dx^2

Naïve implementation:
for(j=0;j<m;j++){
    sigma[j]=k[kJ][j]*dt/pow(dx,2);
}

Replace the call to pow:
for(j=0;j<m;j++){
    sigma[j]=k[kJ][j]*dt/(dx*dx);
}

Precompute dt/dx^2:
overdx2=1/dx;
overdx2=dt*overdx2*overdx2;   // dt/dx^2
for(j=0;j<m;j++){
    sigma[j]=k[kJ][j]*overdx2;
}

Improving performance
To improve algorithmic performance:
–Take more CS!
To improve interaction with hardware:
–Check out compiler optimizations
–Then, start hand-tuning the code

Compiler Optimization
A good compiler can take care of lots of things automatically:
–some precomputing
–some inlining (for small functions)
–other things, like loop unrolling:

for(j=0;j<n;j++){
    a=a+b[j]*c[j];
}

becomes (assuming n is a multiple of 4):

a2=a3=a4=0;
for(j=0;j<n;j=j+4){
    a =a +b[j]*c[j];
    a2=a2+b[j+1]*c[j+1];
    a3=a3+b[j+2]*c[j+2];
    a4=a4+b[j+3]*c[j+3];
}
a=a+a2+a3+a4;

The four partial sums are independent and can be computed at the same time.
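Hand-written unrolling needs a cleanup loop when n is not a multiple of 4. A sketch under invented names dot_plain and dot_unrolled:

```c
/* Straightforward dot product. */
static double dot_plain(const double b[], const double c[], int n) {
    double a = 0.0;
    for (int j = 0; j < n; j++)
        a += b[j] * c[j];
    return a;
}

/* Unrolled by 4: four independent partial sums keep the multiply
   unit busy; the tail loop handles the leftover n % 4 elements. */
static double dot_unrolled(const double b[], const double c[], int n) {
    double a = 0.0, a2 = 0.0, a3 = 0.0, a4 = 0.0;
    int j;
    for (j = 0; j + 3 < n; j += 4) {
        a  += b[j]     * c[j];
        a2 += b[j + 1] * c[j + 1];
        a3 += b[j + 2] * c[j + 2];
        a4 += b[j + 3] * c[j + 3];
    }
    for (; j < n; j++)                  /* cleanup when n % 4 != 0 */
        a += b[j] * c[j];
    return a + a2 + a3 + a4;
}

/* Integer-valued doubles sum exactly, so the two versions match. */
static int unroll_matches(void) {
    double b[7] = {1, 2, 3, 4, 5, 6, 7};
    double c[7] = {1, 1, 1, 1, 1, 1, 1};
    return dot_plain(b, c, 7) == 28.0 && dot_unrolled(b, c, 7) == 28.0;
}
```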

Measuring Performance
Before we start messing with working code, we should identify where the code is slow. This is called profiling.
–The goal is to identify bottlenecks: places in the code that limit performance
–We can use profiling tools like prof (or gprof), or insert timing calls ourselves
–It is important to check performance on problems of different sizes
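If a profiler is not handy, timing calls can be inserted by hand with the standard clock() function. Here work() is a made-up stand-in for the suspect section of code:

```c
#include <time.h>

static volatile double sink;   /* keeps the compiler from deleting work() */

/* Stand-in for a suspect section of the program. */
static void work(int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += i * 0.5;
    sink = s;
}

/* Bracket the section with clock() calls; returns CPU seconds used. */
static double time_work(int n) {
    clock_t start = clock();
    work(n);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Running time_work on several sizes (say n, 2n, 4n) shows how the cost scales, which is exactly the different-problem-sizes check the slide recommends.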

My Advice
Before you do anything, make sure your code works. Well-tuned incorrect code is still incorrect; it is better to solve your problem slowly than not at all!
Look for algorithmic improvements.
Call numerical libraries rather than writing your own:
–Often highly optimized
–Take CIS 404
Try compiler options: read your compiler's manual to learn what they do.
Last, try hand-tuning the slow sections.