CSCE 313: Embedded Systems Scaling Multiprocessors Instructor: Jason D. Bakos.

Slides:



Advertisements
Similar presentations
Lesson 6D – Loops and String Arrays – split process By John B. Owen All rights reserved ©2011.
Advertisements

DSPs Vs General Purpose Microprocessors
MATH 224 – Discrete Mathematics
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) Parallelism & Locality Optimization.
Linear Separators.
CSCE 313: Embedded Systems Performance Counters Instructor: Jason D. Bakos.
Analysis of Algorithms (Chapter 4)
Hardware/software Interfacing. Page 2 Interrupt handling and using internal timer Two way for processor to accept external input: Waiting for input: Processor.
CSCE 313: Embedded Systems Multiprocessor Systems
CS 106 Introduction to Computer Science I 10 / 15 / 2007 Instructor: Michael Eckmann.
Chapter 2: Algorithm Discovery and Design
Real-Time Face Detection and Tracking Using Multiple Cameras RIT Computer Engineering Senior Design Project John RuppertJustin HnatowJared Holsopple This.
College Algebra Fifth Edition James Stewart Lothar Redlin Saleem Watson.
Mandelbrot Fractals Betsey Davis MathScience Innovation Center.
HONR 300/CMSC 491 Computation, Complexity, and Emergence Mandelbrot & Julia Sets Prof. Marie desJardins February 22, 2012 Based on slides prepared by Nathaniel.
1 GEM2505M Frederick H. Willeboordse Taming Chaos.
Erdem Alpay Ala Nawaiseh. Why Shadows? Real world has shadows More control of the game’s feel  dramatic effects  spooky effects Without shadows the.
Algebra Review. Polynomial Manipulation Combine like terms, multiply, FOIL, factor, etc.
Lab 2: Capturing and Displaying Digital Image
1 CS4402 – Parallel Computing Lecture 7 Parallel Graphics – More Fractals Scheduling.
COMP 175: Computer Graphics March 24, 2015
Scan Conversion Line and Circle
CSCE 313: Embedded Systems Final Project Instructor: Jason D. Bakos.
CS 376 Introduction to Computer Graphics 02 / 12 / 2007 Instructor: Michael Eckmann.
Computer Graphics Bing-Yu Chen National Taiwan University.
ECE291 Computer Engineering II Lecture 9 Josh Potts University of Illinois at Urbana- Champaign.
Today  Table/List operations  Parallel Arrays  Efficiency and Big ‘O’  Searching.
Analysis of Algorithms
Ch 9 Infinity page 1CSC 367 Fractals (9.2) Self similar curves appear identical at every level of detail often created by recursively drawing lines.
CS 638, Fall 2001 Multi-Pass Rendering The pipeline takes one triangle at a time, so only local information, and pre-computed maps, are available Multi-Pass.
Vectors and Matrices In MATLAB a vector can be defined as row vector or as a column vector. A vector of length n can be visualized as matrix of size 1xn.
1 2-Hardware Design Basics of Embedded Processors (cont.)
Field Trip #33 Creating and Saving Fractals. Julia Set We consider a complex function, f(z) For each point on the complex plane (x,y), where z = x + iy,
Term 2, 2011 Week 1. CONTENTS Problem-solving methodology Programming and scripting languages – Programming languages Programming languages – Scripting.
An Introduction to Programming with C++ Sixth Edition Chapter 7 The Repetition Structure.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
College Algebra Sixth Edition James Stewart Lothar Redlin Saleem Watson.
1 Perception and VR MONT 104S, Fall 2008 Lecture 21 More Graphics for VR.
CS 325 Introduction to Computer Graphics 03 / 22 / 2010 Instructor: Michael Eckmann.
10/19/04© University of Wisconsin, CS559 Fall 2004 Last Time Clipping –Why we care –Sutherland-Hodgman –Cohen-Sutherland –Intuition for Liang-Barsky Homework.
Today we will learn MATLAB Click Start  All programm  Class Software  Matlab This command window will be seen with a prompt sign >> Any command can.
Processor Memory Processor-memory bus I/O Device Bus Adapter I/O Device I/O Device Bus Adapter I/O Device I/O Device Expansion bus I/O Bus.
Real-Time Dynamic Shadow Algorithms Evan Closson CSE 528.
1 Design of an MIMD Multimicroprocessor for DSM A Board Which turns PC into a DSM Node Based on the RM Approach 1 The RM approach is essentially a write-through.
Lecture 11 Text mode video
Prof. Amr Goneid, AUC1 Analysis & Design of Algorithms (CSCE 321) Prof. Amr Goneid Department of Computer Science, AUC Part 3. Time Complexity Calculations.
Overview of microcomputer structure and operation
Computer Graphics CC416 Lecture 04: Bresenham Line Algorithm & Mid-point circle algorithm Dr. Manal Helal – Fall 2014.
Lab 4 HW/SW Compression and Decompression of Captured Image
HONR 300/CMSC 491 Computation, Complexity, and Emergence
- Introduction - Graphics Pipeline
CSCE 313: Embedded Systems Final Project Notes
Sketch these graphs on your whiteboards
Virtual Memory - Part II
Embedded Systems Design
MATLAB DENC 2533 ECADD LAB 9.
Using a Stack Chapter 6 introduces the stack data type.
A segmentation and tracking algorithm
CS4230 Parallel Programming Lecture 12: More Task Parallelism Mary Hall October 4, /04/2012 CS4230.
Using a Stack Chapter 6 introduces the stack data type.
#31 He was seized by the fidgets,
Recall: ROM example Here are three functions, V2V1V0, implemented with an 8 x 3 ROM. Blue crosses (X) indicate connections between decoder outputs and.
Using a Stack Chapter 6 introduces the stack data type.
Using a Stack Chapter 6 introduces the stack data type.
Vectors and Matrices In MATLAB a vector can be defined as row vector or as a column vector. A vector of length n can be visualized as matrix of size 1xn.
CSC1401 Viewing a picture as a 2D image - 2
CSCE 313: Embedded Systems Final Project
CSCE 313: Embedded Systems Scaling Multiprocessors
CSCE 313: Embedded Systems Performance Analysis and Tuning
Paging Andrew Whitaker CSE451.
Presentation transcript:

CSCE 313: Embedded Systems Scaling Multiprocessors Instructor: Jason D. Bakos

Design Space Exploration In Lab 3 we explored the performance impact of cache size In Lab 4 we explored the performance impact of data-level parallelism for two processors In Lab 5 we will explore the performance impact of scaling up the number of processors beyond two We’ll also implement a new application CSCE 313 2

Processor Scaling In Lab 4 some groups encountered resource limits on the FPGA, maxing out the number of M4K RAMS –This was due to instancing two processors with large caches Our FPGA has 105 M4K RAMs (each holds 512 KB) Each processor requires 2 M4Ks for the register file, 1 for the debugger, and 8 for 1KB/1KB I/D caches My SOPC System, without character buffer, requires ~16 M4Ks without processors –Overhead required for Avalon bus, mailbox memory, video FIFO, etc. I was able to build a four processor system with only 67 M4Ks CSCE 313 3

Resource Usage CSCE 313 4

Fractals CSCE 313 5

Fractals CSCE 313 6

Mandelbrot Set CSCE 313 7

Mandelbrot Set Basic idea: –Any arbitrary complex number c is either in the set or not Recall complex numbers have a real and imaginary part e.g i i = sqrt(-1) –Plot all c’s in the set, set x=Real(c), y=Imag(c) Black represents points in the set Colored points according to how “close” that point was to being in set CSCE 313 8

Mandelbrot Set Definition: –Consider complex polynomial P c (z) = z 2 + c –c is in the set if the sequence: P c (0), P c (P c (0)), P c (P c (P c (0))), P c (P c (P c (P c (0)))), … –…does NOT diverge to infinity –All points in the set are inside radius = 2 around (0,0) CSCE 313 9

Mandelbrot Set How do you square a complex number? –Answer: treat it as polynomial, but keep in mind that i 2 = -1 –Example: (3 + 2i) 2 = (3 + 2i) (3 + 2i) = 9 + 6i + 6i + 4i 2 = i – 4 = i (x + yi) 2 = (x + yi) (x + yi) = x 2 + 2xyi – y 2 = (x 2 - y 2 ) + 2xyi CSCE

Example 1 Is ( i) in the Mandelbrot set? –P ( i) (0) = 0 2 +( i) = i –P ( i) (P ( i) (0)) = ( i) 2 + ( i) = i –P ( i) (P ( i) (P ( i) (0))) = ( i) 2 + ( i) = i –… = i (outside) –… = i (outside) Color should reflect 4 iterations CSCE

Example 2 Is ( i) in the Mandelbrot set? –Iteration 1 => i –Iteration 2 => i –Iteration 3 => i –Iteration 4 => i –Iteration 5 => i –Iteration 6 => i –Iteration 7 => i –Iteration 8 => i –… –Iteration 1000 => –(Looks like it is) CSCE

Mandelbrot Set Goal of next lab: –Use the DE2 board to plot a Mandelbrot fractal over VGA and zoom in as far as possible to “reveal” infinitely repeating structures –Problems to solve: 1.How to discretize a complex space onto a 320x240 discrete pixel display 2.How to determine if a discretized point (pixel) is in the set 3.How to zoom in 4.What happens numerically as we zoom in? 5.How to parallelize the algorithm for multiple processors CSCE

Plotting the Space Keep track of a “zoom” window in complex space using the four double precision variables: –min_x, max_x, min_y, max_y Keep a flattened 240x320 pixel array, as before –Each pixel [col,row] can be mapped to a point in complex space [x,y] using: x = col / 320 x (max_x – min_x) + min_x y = (239-row) / 240 x (max_y – min_y) + min_y Keep track of your zoom origin in complex space, or target point: –target_x, target_y CSCE

Algorithm For each pixel (row,col): –Check to see if the corresponding point in complex space is in Modelbrot set: Transform (row,col) into c = (x0,y0) initialize z = 0 => x=0, y=0 // recall that series begins with P c (0) set iteration = 0 while ((x*x + y*y) <= 4) and (iteration < 500) // while (x,y) is inside radius=2 (otherwise we know the series has diverged) –xtemp = x*x – y*y + x0 –y = 2*x*y + y0 –x = xtemp –iteration++ if iteration == 500 then color=black, else color=(some function of iteration) CSCE

Zooming Goal: –We want to zoom in to show the details on the fractal –Problem: on which point to we zoom? –Set the initial frame to encompass: -2.5 <= x <= 1 -1 <= y <= 1 (This is the typical window from which the Mandelbrot fractal is shown) –During the first frame rendering, find the first pixel that has greater than 450 iterations Set this only once! –This will identify a colorful and featureful area –Set this point (in complex space) as your target_x and target_y CSCE

Zooming To zoom in: –min_x=target_x – 1/(1.5^zoom) –max_x=target_x + 1/(1.5^zoom) –min_y=target_y -.75/(1.5^zoom) –max_y=target_y +.75/(1.5^zoom) In the outer loop, increment zoom from 1 to 100 (or more) Fractals are deliberately made colorful, but the way you set the colors is arbitrary –Here’s one sample technique: –color [R,G,B] = [iteration*8/zoom, iteration*4/zoom, iteration*2/zoom] –This creates a yellowish brown hue that dampens as you zoom in –Make sure you saturate the colors CSCE

Numerical Precision You’ll notice as you zoom in that picture definition quickly degrades This is because double precision values have a precision of and zooming in at a quadratic rate reaches this quickly –In other words, the difference in the complex space between pixels approaches this value –Note: = 2.2 x Inter-pixel distance from 0 to 200 iterations using specified zoom CSCE

Parallelizing The frame rate will depend on how many of the pixels in the frame are in the Mandelbrot set, since these pixels are expensive (requires 1000 loop iterations each) –Lighter-colored pixels are also expensive, though less so To speed things up, use multiple processors Use the data parallel approach and write to the frame buffer in line CSCE

Notes Make sure you enable hardware floating point and hardware multiply on each processor Use 1KB instruction and data cache per processor Implement on one processor first You don’t need the key inputs, LCD, LEDs, or Flash memory, so feel free to delete their interfaces For recording performance, measure the average time for rendering each frame (including sending the pixels to the pixel buffer) You’ll need math.h for exponentiation, but only use it for the zooming You’ll also need four mailboxes to implement your barriers CSCE