ESE532: System-on-a-Chip Architecture

ESE532: System-on-a-Chip Architecture Day 3: January 23, 2017 Parallelism Overview Penn ESE532 Spring 2017 -- DeHon

Today Parallelism in Tasks Types of Parallelism Compute Models System Architectures Penn ESE532 Spring 2017 -- DeHon

Message Many useful models for parallelism Help conceptualize One size does not fit all But maybe 6-10 do? Match to problem Penn ESE532 Spring 2017 -- DeHon

Preclass 1 How do 6 people collaborate on sphere building? Penn ESE532 Spring 2017 -- DeHon

Preclass 2 How do 12 people collaborate on sphere building? Penn ESE532 Spring 2017 -- DeHon

Preclass 3 How do 6 people collaborate on building 3 spheres? (alternate solution?) Penn ESE532 Spring 2017 -- DeHon

In Class Exercise Distribute 24 piece sets for building Red and Yellow Sphere [if have more than 24 people, have pairs build a different model] Follow instructions from slides to come Penn ESE532 Spring 2017 -- DeHon

Step 1: Build half of L1 Penn ESE532 Spring 2017 -- DeHon

Step 2: Build half of L2 Penn ESE532 Spring 2017 -- DeHon

Step 3: Pass half to builder with 2x2 plate Penn ESE532 Spring 2017 -- DeHon

Step 4: Build L3 Penn ESE532 Spring 2017 -- DeHon

Step 5: Build L5 (ends) (if have pieces) Penn ESE532 Spring 2017 -- DeHon

Step 6: Pass both “L5: ends” to builder with side Penn ESE532 Spring 2017 -- DeHon

Step 7: Build half of L7; install one side Penn ESE532 Spring 2017 -- DeHon

Step 8: Pass assembly to builder with unused side Penn ESE532 Spring 2017 -- DeHon

Step 9: finish L7 Penn ESE532 Spring 2017 -- DeHon

Step 10: Pass assembly to builder with unused side Penn ESE532 Spring 2017 -- DeHon

Step 11: add 3rd side Penn ESE532 Spring 2017 -- DeHon

Step 12: Pass assembly to builder with unused side Penn ESE532 Spring 2017 -- DeHon

Step 13: add final side Penn ESE532 Spring 2017 -- DeHon

Finish Check status of all builds Penn ESE532 Spring 2017 -- DeHon

Types of Parallelism Penn ESE532 Spring 2017 -- DeHon

Types of Parallelism What kind of parallelism did we see for steps 1—3? Penn ESE532 Spring 2017 -- DeHon

Types of Parallelism What parallelism when some folks built different model? Penn ESE532 Spring 2017 -- DeHon

Types of Parallelism What could we build independently here? Kind of parallelism? Penn ESE532 Spring 2017 -- DeHon

Type of Parallelism
Latency multiply = 1, latency add = 1 (different from Day 2)

  cycle   mpy     add
  1       B,x
  2       x,x     (Bx)+C
  3       A,x^2
  4               Ax^2+(Bx+C)

Kind of Parallelism?
Penn ESE532 Spring 2017 -- DeHon

Types of Parallelism Data Level – Perform same computation on different data items Thread or Task Level – Perform separable (perhaps heterogeneous) tasks independently Instruction Level – Within a single sequential thread, perform multiple operations on each cycle. Penn ESE532 Spring 2017 -- DeHon
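
A minimal C sketch (mine, not from the slides) of how the first and third kinds look in code: data-level parallelism repeats one computation over independent data items, while instruction-level parallelism comes from independent operations inside a single evaluation. Task-level parallelism would be separate functions (for example, audio and video decode) sharing no data.

  /* Hypothetical sketch (not from the slides) of data-level and
   * instruction-level parallelism on y = A*x^2 + B*x + C.        */
  #include <stddef.h>

  #define N 100

  /* Data level: the same computation on different data items.
   * No iteration touches another's data, so the items can be
   * spread across as many processing elements as we have.        */
  void quad_data_level(double y[N], const double x[N],
                       double A, double B, double C) {
      for (size_t i = 0; i < N; i++)
          y[i] = A*x[i]*x[i] + B*x[i] + C;
  }

  /* Instruction level: inside one sequential evaluation, x*x and
   * B*x have no dependence on each other, so a machine with two
   * functional units can issue them in the same cycle.            */
  double quad_ilp(double x, double A, double B, double C) {
      double t1 = x * x;       /* independent of ...              */
      double t2 = B * x;       /* ... this multiply               */
      return A*t1 + (t2 + C);  /* the adds wait on their inputs   */
  }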

Parallel Compute Models Penn ESE532 Spring 2017 -- DeHon

Sequential Control Flow Model of correctness is sequential execution. Examples: C (Java, …), FSM / FA. Control flow: program is a sequence of operations; operation reads inputs and writes outputs into common store; one operation runs at a time and defines its successor. Penn ESE535 Spring 2015 -- DeHon

Parallelism can be explicit
Sphere Build example, Step 2: coordinate data parallel operations
Multiply, add for quadratic equation: coordinate ILP

  cycle   mpy     add
  1       B,x
  2       x,x     (Bx)+C
  3       A,x^2
  4               Ax^2+(Bx+C)

Penn ESE532 Spring 2017 -- DeHon

Parallelism can be implicit
Sequential expression; infer data dependencies:
  T1 = x*x
  T2 = A*T1
  T3 = B*x
  T4 = T2+T3
  Y  = C+T4
Or: Y = A*x*x + B*x + C
Penn ESE532 Spring 2017 -- DeHon

Implicit Parallelism d=(x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) What parallelism exists here? Penn ESE532 Spring 2017 -- DeHon
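
One way to see the answer (my sketch; the names dx and dy are not in the slide): the x terms and the y terms form two independent chains, so the two subtracts can go in parallel, the two multiplies can go in parallel, and only the final add must wait for both.

  /* Hypothetical decomposition of d = (x1-x2)*(x1-x2) + (y1-y2)*(y1-y2)
   * to expose its instruction-level parallelism.                       */
  double dist2(double x1, double x2, double y1, double y2) {
      double dx  = x1 - x2;   /* independent of dy: can run in parallel  */
      double dy  = y1 - y2;
      double dx2 = dx * dx;   /* independent of dy2: can run in parallel */
      double dy2 = dy * dy;
      return dx2 + dy2;       /* the only step that must wait for both   */
  }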

Parallelism can be implicit
Sequential expression; infer data dependencies:
  for (i = 0; i < 100; i++)
    y[i] = A*x[i]*x[i] + B*x[i] + C;
Why can these operations be performed in parallel?
Penn ESE532 Spring 2017 -- DeHon
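
Each iteration reads only x[i] and writes only y[i], so no iteration depends on another. A hypothetical way to say that explicitly to a compiler (assuming OpenMP support; the pragma is my addition, not part of the course loop):

  /* The slide's loop with an OpenMP work-sharing pragma added (my
   * assumption of an OpenMP-capable compiler; ignored otherwise).
   * Legal only because the iterations are independent.            */
  void quad_all(double *y, const double *x, int n,
                double A, double B, double C) {
      #pragma omp parallel for
      for (int i = 0; i < n; i++)
          y[i] = A*x[i]*x[i] + B*x[i] + C;
  }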

Term: Operation Operation – logic computation to be performed Penn ESE535 Spring 2015 -- DeHon

Dataflow / Control Flow Control flow (e.g. C): program is a sequence of operations; operation reads inputs and writes outputs into common store; one operation runs at a time and defines its successor. Dataflow: program is a graph of operations; operation consumes tokens and produces tokens; all operations run concurrently. Penn ESE535 Spring 2015 -- DeHon

Token Data value with presence indication May be conceptual Only exist in high-level model Not kept around at runtime Or may be physically represented One bit represents presence/absence of data Penn ESE535 Spring 2015 -- DeHon

Token Examples? What are familiar cases where data may come with presence tokens? Network packets Memory references from processor Variable latency depending on cache presence Start bit on serial communication Penn ESE535 Spring 2015 -- DeHon

Operation Takes in one or more inputs Computes on the inputs Produces results Logically self-timed “Fires” only when input set present Signals availability of output Penn ESE535 Spring 2015 -- DeHon
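
A small C sketch (my illustration, not from the slides) of the firing rule: the operation fires only when every input carries a token, consumes the tokens, and marks its output present.

  /* Hypothetical token-based "operation": one presence bit travels
   * with each data value, and the operation is self-timed.         */
  #include <stdbool.h>

  typedef struct {
      double value;
      bool   present;   /* one bit indicating presence of data */
  } token_t;

  /* Returns true only if the operation actually fired. */
  bool add_op(token_t *a, token_t *b, token_t *out) {
      if (!(a->present && b->present))
          return false;                    /* input set incomplete       */
      out->value   = a->value + b->value;  /* compute on the inputs      */
      out->present = true;                 /* signal output availability */
      a->present = b->present = false;     /* tokens are consumed        */
      return true;
  }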

Dataflow Graph Represents, abstractly: computation sub-blocks; linkage controlled by data presence. Penn ESE535 Spring 2015 -- DeHon

Dataflow Graph Example Penn ESE535 Spring 2015 -- DeHon

Sequential / FSM
FSM is a degenerate dataflow graph where there is exactly one token (states S1..S4)

  state   mpy     add            next
  S1      B,x                    x present --> S2, else S1
  S2      x,x     (Bx)+C         S3
  S3      A,x^2                  S4
  S4              Ax^2+(Bx+C)

Penn ESE532 Spring 2017 -- DeHon

Communicating Threads Computation is a collection of sequential/control-flow “threads” Threads may communicate Through dataflow I/O (Through shared variables) View as hybrid or generalization CSP – Communicating Sequential Processes  canonical model example Penn ESE532 Spring 2017 -- DeHon

Video Decode Parse → Audio, Video → Sync to HDMI. Why might we need to synchronize to send to HDMI? Penn ESE532 Spring 2017 -- DeHon

Compute Models Penn ESE532 Spring 2017 -- DeHon

System Architectures Penn ESE532 Spring 2017 -- DeHon

System Architecture Hypothesis There are a small number of useful system architectures. These architectures: give guidance for organizing resources; make designs manageable; allow sharing lessons between applications; provide a basis for scalability; point toward efficient solutions. FPT Tutorial: DeHon 2005

Unconstrained Model Multithreaded programming (equivalently Communicating Sequential Processes) Application is collection of threads Communicate with each other May or may not have shared memory Programmer responsible for Synchronization Parallelism Data layout Communications… FPT Tutorial: DeHon 2005

Architectural Restrictions Sequential Control; Data Parallel – all parallel processing does the same thing; Lock-Step – all parallel processing does different things at synchronized time (e.g. VLIW); Bulk Synchronous – periodic barrier synchronization; Instruction Augmentation – control accelerators from sequential instruction stream FPT Tutorial: DeHon 2005
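
The bulk-synchronous item, for instance, can be pictured roughly as below (my sketch, assuming POSIX barriers; compute_local_slice is a placeholder): every worker computes on its own slice, then all meet at a barrier before the next superstep.

  /* Hypothetical bulk-synchronous supersteps with POSIX threads.
   * Compile with -pthread.                                        */
  #include <pthread.h>

  #define NWORKER 4
  static pthread_barrier_t step;

  static void compute_local_slice(int id, int superstep) {
      (void)id; (void)superstep;         /* placeholder local work */
  }

  static void *bsp_worker(void *arg) {
      int id = *(int *)arg;
      for (int s = 0; s < 10; s++) {
          compute_local_slice(id, s);    /* local computation       */
          pthread_barrier_wait(&step);   /* periodic barrier sync   */
      }
      return NULL;
  }

  int main(void) {
      pthread_t t[NWORKER];
      int ids[NWORKER];
      pthread_barrier_init(&step, NULL, NWORKER);
      for (int i = 0; i < NWORKER; i++) {
          ids[i] = i;
          pthread_create(&t[i], NULL, bsp_worker, &ids[i]);
      }
      for (int i = 0; i < NWORKER; i++)
          pthread_join(t[i], NULL);
      pthread_barrier_destroy(&step);
      return 0;
  }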

Very Long Instruction Word (VLIW) Penn ESE532 Spring 2017 -- DeHon

Very Long Instruction Word (VLIW)

  cycle   mpy     add
  1       B,x
  2       x,x     (Bx)+C
  3       A,x^2
  4               Ax^2+(Bx+C)

Penn ESE532 Spring 2017 -- DeHon

Instruction Augmentation Co-Processor Penn ESE532 Spring 2017 -- DeHon

Architectural Restrictions (2) Dataflow interactions Allow multithreaded operation Use data presence for synchronization E.g. Pipe-and-filter / Streaming Dataflow Synchronous Dataflow (SDF) FPT Tutorial: DeHon 2005

Producer-Consumer Parallelism Stock predictions → encrypt Can run concurrently Just let consumer know when producer is sending data Penn ESE535 Spring 2015 -- DeHon
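
A hedged POSIX-threads sketch of the idea (the names fifo_put/fifo_get are mine): the producer and consumer run concurrently, and a condition variable plays the role of letting the consumer know that data has arrived.

  /* Hypothetical bounded FIFO between a producer thread (e.g. the
   * stock-prediction stage) and a consumer thread (e.g. encrypt).
   * Compile with -pthread.                                         */
  #include <pthread.h>

  #define DEPTH 16

  typedef struct {
      double          buf[DEPTH];
      int             head, tail, count;
      pthread_mutex_t lock;
      pthread_cond_t  not_empty, not_full;
  } fifo_t;

  void fifo_init(fifo_t *f) {
      f->head = f->tail = f->count = 0;
      pthread_mutex_init(&f->lock, NULL);
      pthread_cond_init(&f->not_empty, NULL);
      pthread_cond_init(&f->not_full, NULL);
  }

  void fifo_put(fifo_t *f, double v) {          /* producer side     */
      pthread_mutex_lock(&f->lock);
      while (f->count == DEPTH)                 /* wait for space    */
          pthread_cond_wait(&f->not_full, &f->lock);
      f->buf[f->tail] = v;
      f->tail = (f->tail + 1) % DEPTH;
      f->count++;
      pthread_cond_signal(&f->not_empty);       /* "data is present" */
      pthread_mutex_unlock(&f->lock);
  }

  double fifo_get(fifo_t *f) {                  /* consumer side     */
      pthread_mutex_lock(&f->lock);
      while (f->count == 0)                     /* wait for data     */
          pthread_cond_wait(&f->not_empty, &f->lock);
      double v = f->buf[f->head];
      f->head = (f->head + 1) % DEPTH;
      f->count--;
      pthread_cond_signal(&f->not_full);
      pthread_mutex_unlock(&f->lock);
      return v;
  }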

Pipeline Parallelism ME → DCT → VQ → code: can potentially all run in parallel. Like a physical pipeline. Useful to think about stream of data between operators. Penn ESE535 Spring 2015 -- DeHon

Architectural Restrictions (3) Regular Communication Patterns: Systolic; Cellular Automata – regular grid of homogeneous FSMs FPT Tutorial: DeHon 2005

Architectural Restrictions (4) Memory/Data Centric Computation is collection of objects in memory Each object triggered by input changes Compute and potentially trigger other objects E.g. Repository models GraphStep App: network flow, routing… FPT Tutorial: DeHon 2005

Work Farm Central controller farms out work Penn ESE532 Spring 2017 -- DeHon
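
A possible rendering in C11 atomics plus threads (my sketch; do_item and the counts are placeholders): a shared counter stands in for the central controller, and each worker grabs the next unclaimed item until the pool is empty.

  /* Hypothetical work farm: a shared atomic counter hands out work
   * indices; workers pull items until none remain.
   * Compile with -std=c11 -pthread.                                */
  #include <stdatomic.h>
  #include <pthread.h>

  #define NWORK   1000
  #define NWORKER 4

  static atomic_int next_item = 0;      /* the "central controller" */
  static double     result[NWORK];

  static void do_item(int i) {          /* placeholder unit of work */
      result[i] = 2.0 * i;
  }

  static void *worker(void *arg) {
      (void)arg;
      for (;;) {
          int i = atomic_fetch_add(&next_item, 1);  /* claim next item */
          if (i >= NWORK)
              break;                                /* no work left    */
          do_item(i);
      }
      return NULL;
  }

  int main(void) {
      pthread_t t[NWORKER];
      for (int i = 0; i < NWORKER; i++)
          pthread_create(&t[i], NULL, worker, NULL);
      for (int i = 0; i < NWORKER; i++)
          pthread_join(t[i], NULL);
      return 0;
  }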

System Architecture Taxonomy Penn ESE532 Spring 2017 -- DeHon

System Architecture Taxonomy Further down the hierarchy More restricted the model More guidance provided More efficient potential implementation More amenable to analysis  tools and optimizations Restrictions provide power FPT Tutorial: DeHon 2005

Value of Multiple Architectures When you have a big enough hammer, everything looks like a nail. Many stuck on single model Try to make all problems look like their nail Value to diversity / heterogeneity One size does not fit all Penn ESE532 Spring 2017 -- DeHon

System Architecture Hypothesis There are a small number of useful system architectures. These architectures: give guidance for organizing resources; make designs manageable; allow sharing lessons between applications; provide a basis for scalability; point toward efficient solutions. FPT Tutorial: DeHon 2005

System Architectures Penn ESE532 Spring 2017 -- DeHon

Model → Architecture not 1:1 Penn ESE532 Spring 2017 -- DeHon

Big Ideas Many parallel compute models: Sequential, Dataflow, CSP Useful System Architectures: Streaming Dataflow, VLIW, co-processor, work farm, SIMD, Vector, CA, FSMD, … Find natural parallelism in problem Mix-and-match Penn ESE532 Spring 2017 -- DeHon

Admin HW1 FAQ: roundup of problems and solutions Reading for Day 4 on web Talk on Thursday by Ed Lee (UCB), 3pm in Wu and Chen HW2 due Friday Penn ESE532 Spring 2017 -- DeHon