Outline Announcements: –HW III due Friday! Validating Model Problem Software performance Measuring performance Improving performance.

Presentation transcript:

Outline
Announcements:
  – HW III due Friday!
Validating Model Problem
Software performance
Measuring performance
Improving performance

Validating Model Problem
Our solution C is only an approximation of the true solution
The accuracy of the approximation will depend on
  – dt
  – dx
  – “smoothness” of the initial conditions
Two things to watch for:
  – lambda 1
  – decreasing dx will make solutions more accurate
Try at coarse (m=10) and fine resolution (m=100)

Software Performance
Factors influencing speed
  – Hardware: clock speed, memory size/speed, bus speed
  – Algorithmic scaling: what happens as n gets big?
  – Interaction between algorithm and hardware: compiler optimizations vs. hand tuning

Computer Architecture
[Diagram: CPU (Control Unit, Arith./Logic Unit, Registers), L1 cache, L2 cache, Memory (RAM), Disk, Bus]
Von Neumann described a memory hierarchy
  – Fast -> slow
  – $$$ -> cheap
  – Register -> L1 -> L2 -> RAM -> Disk

Architecture and Performance
Avoid using the disk!
  – Minimize reading and writing
  – Buy more RAM so you don’t use virtual memory
Buy a system with lots of fast cache
Buy a system with lots of RAM
Then, if you have money left, buy a fast chip

Algorithm Performance
There are often several algorithms for solving the same problem, each with its own performance characteristics
Much of computer science is concerned with finding the fastest algorithm for a given problem
Typically, we’re interested in how an algorithm scales
  – How it performs as the problem size increases
  – To determine this, we need an accounting system (or model) of how a program runs
  – The simplest model assumes all commands take the same amount of time, so we just need to count commands
  – We could use more complicated models that weight commands
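As a rough illustration of the "count the commands" model, the sketch below counts how many array elements a simple left-to-right scan inspects before it stops. The function name count_linear_steps is a hypothetical helper, not part of the lecture, and counting one step per comparison is only one possible accounting choice.

```c
#include <assert.h>

/* Hypothetical helper: returns how many elements of x a left-to-right
   scan inspects while looking for k. Under the "every command costs the
   same" model, this count stands in for running time. */
long count_linear_steps(const int x[], int n, int k)
{
    assert(n >= 0);
    long steps = 0;
    for (int j = 0; j < n; j++) {
        steps++;              /* one comparison of x[j] against k */
        if (x[j] == k) {
            break;            /* found: stop counting */
        }
    }
    return steps;
}
```

For an array of length n, the count is n when k is absent, so the worst case grows in direct proportion to the problem size.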

Algorithmic Performance of Linear Search
Linear Search
  – Inputs: integer array x of length n, value k
  – Output: j s.t. x[j]==k, or j=-9 if k not in x

    int LinSearch(int x[], int n, int k){
      int j=0;
      /* test j<n first so x[j] is never read past the end of the array */
      while(j<n && x[j] != k){
        j=j+1;
      }
      if(j==n){
        return(-9);   /* k not in x */
      }else{
        return(j);
      }
    }

Algorithmic Performance of Binary Search
Binary Search
  – Inputs: SORTED integer array x of length n, value k, integers st>=0 and en<=n-1 (both bounds inclusive)
  – Output: j s.t. x[j]==k, or j=-9 if k not in x[st:en]

    int BinSearch(int x[], int st, int en, int k){
      int mid, ans;
      if(st > en){
        ans=(-9);                          //empty range: k not found
      } else {
        mid=st+(en-st)/2;                  //middle of x[st:en]
        if(x[mid] == k){
          ans=mid;
        } else if(x[mid] < k){
          ans=BinSearch(x, mid+1, en, k);  //Search on right
        } else {
          ans=BinSearch(x, st, mid-1, k);  //Search on left
        }
      }
      return(ans);
    }
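The recursive search above can also be written as a loop, which avoids the overhead of repeated function calls. The following is a minimal sketch under that assumption; the name BinSearchIter is hypothetical, and it searches the whole array rather than a sub-range.

```c
#include <assert.h>

/* Iterative binary search on a SORTED array x of length n.
   Returns an index j with x[j] == k, or -9 if k is absent
   (the same "not found" value the slide uses). */
int BinSearchIter(const int x[], int n, int k)
{
    assert(n >= 0);
    int st = 0;        /* first index still in play */
    int en = n - 1;    /* last index still in play (inclusive) */
    while (st <= en) {
        int mid = st + (en - st) / 2;  /* middle; avoids overflow of st+en */
        if (x[mid] == k) {
            return mid;
        } else if (x[mid] < k) {
            st = mid + 1;              /* search on right */
        } else {
            en = mid - 1;              /* search on left */
        }
    }
    return -9;                         /* range became empty: not found */
}
```

Each pass through the loop halves the range still in play, so the number of passes grows like log2(n).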

Comparing Linear and Binary Search
Linear Search
  – max n iterations through loop
  – will work on any array
Binary Search
  – max about log2(n) recursions (log2(n) < n)
  – faster if x is already sorted
  – but sorting takes approx. n^2 steps with a simple sort
So, if you have to perform < n searches, use linear search
If you have to perform > n searches, sort first, then use BinSearch
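The break-even argument can be checked with a small cost model. This sketch assumes the slide's step counts (n per linear search, about log2(n) per binary search, n^2 for the sort) and treats every step as equally expensive; the function names are hypothetical.

```c
#include <assert.h>

/* Smallest p with 2^p >= n, i.e. ceil(log2(n)) for n >= 1. */
static long ceil_log2(long n)
{
    long p = 0;
    long pow2 = 1;
    while (pow2 < n) {
        pow2 *= 2;
        p++;
    }
    return p;
}

/* Cost of m linear searches, each taking at most n steps. */
long linear_total(long n, long m)
{
    assert(n >= 1 && m >= 0);
    return m * n;
}

/* Cost of one n^2 sort followed by m binary searches. */
long sorted_total(long n, long m)
{
    assert(n >= 1 && m >= 0);
    return n * n + m * ceil_log2(n);
}
```

With n = 1000, ten searches cost 10,000 steps by linear search but 1,000,100 after sorting, while 2000 searches cost 2,000,000 versus 1,020,000, which matches the rule of switching at roughly n searches.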

Interaction between algorithm and hardware
There are several ways to implement the same algorithm
Their performance could be very different if they get compiled in different ways
The differences are due to interactions with the memory hierarchy

Rules of thumb
Minimize computation (precomputing)
    s=sqrt(b*b-4*a*c);
    x1=(-b+s)/(2*a); x2=(-b-s)/(2*a);
  – better than
    x1=(-b+sqrt(b*b-4*a*c))/(2*a); etc.
Minimize division
    overdx2=1/dx;
    overdx2=overdx2*overdx2; //1/dx^2
    for(j=0;j<m;j++){ sigma[j]=k[kJ][j]*dt*overdx2; }
Minimize function/subroutine calls (inlining)
  – There is overhead associated with calling functions
  – This goes against good programming, which encourages modularity

Rules of Thumb
Avoid big jumps in memory
  – Data is moved from RAM to cache in chunks
  – Accessing arrays sequentially maximizes cache-reuse
  – Special implication for 2 (or higher) D arrays
Memory is sequential:
    2D array:             A00 A01 A02
                          A10 A11 A12
    Row-major layout:     A00 A01 A02 A10 A11 A12
    Column-major layout:  A00 A10 A01 A11 A02 A12
Row Major: C, C++, Java
    for(j=0;j<2;j++){
      for(k=0;k<3;k++){
        A[j][k]=…
      }
    }
Column Major: FORTRAN, Matlab
    do k=1,3
      do j=1,2
        A(j,k)=…
      enddo
    enddo
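To see the effect of traversal order in C, the sketch below sums the same 2D array twice: once with the last index varying fastest (sequential in memory) and once with the first index varying fastest (a jump of one full row per access). The array size NMAX, the fill values, and all function names are assumptions chosen for illustration; any speed difference depends on the machine and compiler.

```c
#include <assert.h>

#define NMAX 512

static double A[NMAX][NMAX];

/* Fill the top-left n x n block with easily checked values. */
void fill(int n)
{
    assert(n >= 0 && n <= NMAX);
    for (int j = 0; j < n; j++)
        for (int k = 0; k < n; k++)
            A[j][k] = (double)(j + k);
}

/* Inner loop over k: consecutive accesses are adjacent in memory. */
double sum_row_order(int n)
{
    assert(n >= 0 && n <= NMAX);
    double s = 0.0;
    for (int j = 0; j < n; j++)
        for (int k = 0; k < n; k++)
            s += A[j][k];
    return s;
}

/* Inner loop over j: consecutive accesses are NMAX doubles apart. */
double sum_col_order(int n)
{
    assert(n >= 0 && n <= NMAX);
    double s = 0.0;
    for (int k = 0; k < n; k++)
        for (int j = 0; j < n; j++)
            s += A[j][k];
    return s;
}
```

Both functions return the same sum; timing each one on a large n is a simple way to observe how much the access pattern alone can matter.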

Improving performance
To improve algorithmic performance
  – Take more CS!
To improve interaction with hardware
  – Check out compiler optimizations
  – Then, start hand-tuning the code

Compiler Optimization
A good compiler can take care of lots of things automatically
  – some precomputing
  – some inlining (for small functions)
  – other things like loop unrolling:
    for(j=0;j<100;j++){
      for(k=0;k<20;k++){
        A[j][k]=…
      }
    }
    becomes
    for(j=0;j<100;j++){
      for(k=0;k<20;k+=4){
        A[j][k]=…
        A[j][k+1]=…
        A[j][k+2]=…
        A[j][k+3]=…
      }
    }
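A compilable version of the unrolling example might look like the sketch below. The slide leaves the right-hand side of the assignment unspecified, so the value j*COLS + k used here is purely an assumption, as are the function names. Unrolling by 4 in this simple form only works when the inner trip count is a multiple of 4.

```c
#include <assert.h>

#define ROWS 100
#define COLS 20

/* Straightforward version: one assignment per inner iteration. */
void fill_plain(double A[ROWS][COLS])
{
    for (int j = 0; j < ROWS; j++) {
        for (int k = 0; k < COLS; k++) {
            A[j][k] = (double)(j * COLS + k);   /* assumed value */
        }
    }
}

/* Unrolled by 4: a quarter of the loop tests and increments. */
void fill_unrolled(double A[ROWS][COLS])
{
    assert(COLS % 4 == 0);   /* otherwise a cleanup loop would be needed */
    for (int j = 0; j < ROWS; j++) {
        for (int k = 0; k < COLS; k += 4) {
            A[j][k]     = (double)(j * COLS + k);
            A[j][k + 1] = (double)(j * COLS + k + 1);
            A[j][k + 2] = (double)(j * COLS + k + 2);
            A[j][k + 3] = (double)(j * COLS + k + 3);
        }
    }
}
```

Both functions produce identical arrays. Since an optimizing compiler may already perform this transformation, it is worth measuring before unrolling by hand.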

Measuring Performance
Before we start messing with working code, we should identify where code is slow
This is called profiling
  – Goal is to identify bottlenecks: places in the code that limit performance
We can use profiling tools like prof (gprof) or insert timing calls
Important to check performance on problems of different sizes
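Inserting timing calls can be as simple as reading the clock before and after the region of interest. The sketch below uses the standard C clock() function, which reports processor time rather than wall-clock time; the workload function work is a hypothetical stand-in for real code.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical workload: sums i*0.5 for i = 0..n-1.
   volatile keeps the compiler from optimizing the loop away. */
double work(long n)
{
    volatile double s = 0.0;
    for (long i = 0; i < n; i++) {
        s += (double)i * 0.5;
    }
    return s;
}

/* Returns the processor time, in seconds, spent in work(n). */
double time_work(long n)
{
    assert(n >= 0);
    clock_t t0 = clock();          /* timing call before the region */
    work(n);
    clock_t t1 = clock();          /* timing call after the region */
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling time_work with several values of n (for example 10^5, 10^6, and 10^7) gives a rough picture of how the cost scales with problem size; for very short runs the clock resolution may be too coarse to report anything but zero.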

My Advice
Before you do anything, make sure your code works
  – Well-tuned incorrect code is still incorrect
  – It is better to solve your problem slowly than not at all!
Look for algorithmic improvements
Try compiler options
  – Read your compiler’s manual to learn about what they do
Last but not least, try hand tuning