StreamIt: A Language for Streaming Applications

Slides:



Advertisements
Similar presentations
Accelerators for HPC: Programming Models Accelerators for HPC: StreamIt on GPU High Performance Applications on Heterogeneous Windows Clusters
Advertisements

Bottleneck Elimination from Stream Graphs S. M. Farhad The University of Sydney Joint work with Yousun Ko Bernd Burgstaller Bernhard Scholz.
A Dataflow Programming Language and Its Compiler for Streaming Systems
The Raw Architecture Signal Processing on a Scalable Composable Computation Fabric David Wentzlaff, Michael Taylor, Jason Kim, Jason Miller, Fae Ghodrat,
An Empirical Characterization of Stream Programs and its Implications for Language and Compiler Design Bill Thies 1 and Saman Amarasinghe 2 1 Microsoft.
DATAFLOW PROCESS NETWORKS Edward A. Lee Thomas M. Parks.
11 1 Hierarchical Coarse-grained Stream Compilation for Software Defined Radio Yuan Lin, Manjunath Kudlur, Scott Mahlke, Trevor Mudge Advanced Computer.
TM Pro64™: Performance Compilers For IA-64™ Jim Dehnert Principal Engineer 5 June 2000.
SCORE - Stream Computations Organized for Reconfigurable Execution Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, Yury Markovskiy Andre DeHon, John.
A Systolic FFT Architecture for Real Time FPGA Systems.
Phased Scheduling of Stream Programs Michal Karczmarek, William Thies and Saman Amarasinghe MIT LCS.
University of Michigan Electrical Engineering and Computer Science MacroSS: Macro-SIMDization of Streaming Applications Amir Hormati*, Yoonseo Choi ‡,
Active Messages: a Mechanism for Integrated Communication and Computation von Eicken et. al. Brian Kazian CS258 Spring 2008.
Static Translation of Stream Programming to a Parallel System S. M. Farhad PhD Student Supervisor: Dr. Bernhard Scholz Programming Language Group School.
Design of Fault Tolerant Data Flow in Ptolemy II Mark McKelvin EE290 N, Fall 2004 Final Project.
Models of Computation for Embedded System Design Alvise Bonivento.
February 12, 2009 Center for Hybrid and Embedded Software Systems Model Transformation Using ERG Controller Thomas H. Feng.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 8: February 11, 2009 Dataflow.
Static Translation of Stream Programming to a Parallel System S. M. Farhad PhD Student Supervisor: Dr. Bernhard Scholz Programming Language Group School.
Optimus: Efficient Realization of Streaming Applications on FPGAs University of Michigan: Amir Hormati, Manjunath Kudlur, Scott Mahlke IBM Research: David.
University of Michigan Electrical Engineering and Computer Science Amir Hormati, Mehrzad Samadi, Mark Woh, Trevor Mudge, and Scott Mahlke Sponge: Portable.
1 Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs Michael Gordon, William Thies, and Saman Amarasinghe Massachusetts.
Programming the Cell Multiprocessor Işıl ÖZ. Outline Cell processor – Objectives – Design and architecture Programming the cell – Programming models CellSs.
Voicu Groza, 2008 SITE, HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Hardware/Software Codesign of Embedded Systems Voicu Groza SITE Hall, Room.
Orchestration by Approximation Mapping Stream Programs onto Multicore Architectures S. M. Farhad (University of Sydney) Joint work with Yousun Ko Bernd.
Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 2012 Authors:
Communication Overhead Estimation on Multicores S. M. Farhad The University of Sydney Joint work with Yousun Ko Bernd Burgstaller Bernhard Scholz.
Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.
StreamX10: A Stream Programming Framework on X10 Haitao Wei School of Computer Science at Huazhong University of Sci&Tech.
A Reconfigurable Architecture for Load-Balanced Rendering Graphics Hardware July 31, 2005, Los Angeles, CA Jiawen Chen Michael I. Gordon William Thies.
1 Optimizing Stream Programs Using Linear State Space Analysis Sitij Agrawal 1,2, William Thies 1, and Saman Amarasinghe 1 1 Massachusetts Institute of.
StreamIt: A Language for Streaming Applications William Thies, Michal Karczmarek, Michael Gordon, David Maze, Jasper Lin, Ali Meli, Andrew Lamb, Chris.
Profile Guided Deployment of Stream Programs on Multicores S. M. Farhad The University of Sydney Joint work with Yousun Ko Bernd Burgstaller Bernhard Scholz.
Gedae, Inc. Gedae: Auto Coding to a Virtual Machine Authors: William I. Lundgren, Kerry B. Barnes, James W. Steed HPEC 2004.
Orchestration by Approximation Mapping Stream Programs onto Multicore Architectures S. M. Farhad (University of Sydney) Joint work with Yousun Ko Bernd.
A Common Machine Language for Communication-Exposed Architectures Bill Thies, Michal Karczmarek, Michael Gordon, David Maze and Saman Amarasinghe MIT Laboratory.
Michael I. Gordon, William Thies, and Saman Amarasinghe
Michael Gordon, William Thies, Michal Karczmarek, Jasper Lin, Ali Meli, Andrew Lamb, Chris Leger, Jeremy Wong, Henry Hoffmann, David Maze, Saman Amarasinghe.
Flexible Filters for High Performance Embedded Computing Rebecca Collins and Luca Carloni Department of Computer Science Columbia University.
StreamIt on Raw StreamIt Group: Michael Gordon, William Thies, Michal Karczmarek, David Maze, Jasper Lin, Jeremy Wong, Andrew Lamb, Ali S. Meli, Chris.
Mapping of Regular Nested Loop Programs to Coarse-grained Reconfigurable Arrays – Constraints and Methodology Presented by: Luis Ortiz Department of Computer.
A Compiler Infrastructure for Stream Programs Bill Thies Joint work with Michael Gordon, Michal Karczmarek, Jasper Lin, Andrew Lamb, David Maze, Rodric.
High-Bandwidth Packet Switching on the Raw General-Purpose Architecture Gleb Chuvpilo Saman Amarasinghe MIT LCS Computer Architecture Group January 9,
CS 5991 Presentation Ptolemy: A Framework For Simulating and Prototyping Heterogeneous Systems.
Linear Analysis and Optimization of Stream Programs Masterworks Presentation Andrew A. Lamb 4/30/2003 Professor Saman Amarasinghe MIT Laboratory for Computer.
Static Translation of Stream Program to a Parallel System S. M. Farhad The University of Sydney.
A Parallel Communication Infrastructure for STAPL
Efficient Evaluation of XQuery over Streaming Data
Support for Program Analysis as a First-Class Design Constraint in Legion Michael Bauer 02/22/17.
Static Memory Management for Efficient Mobile Sensing Applications
Self Healing and Dynamic Construction Framework:
Conception of parallel algorithms
SOFTWARE DESIGN AND ARCHITECTURE
Parallel Programming By J. H. Wang May 2, 2017.
A Common Machine Language for Communication-Exposed Architectures
Linear Filters in StreamIt
Embedded Systems Design
Programming Models for SimMillennium
Optimization Code Optimization ©SoftMoore Consulting.
Parallel Algorithm Design
Teleport Messaging for Distributed Stream Programs
Introduction to cosynthesis Rabi Mahapatra CSCE617
Cache Aware Optimization of Stream Programs
High Performance Stream Processing for Mobile Sensing Applications
StreamIt: High-Level Stream Programming on Raw
Chapter 20 Object-Oriented Analysis and Design
Research Opportunities in IP Wide Area Storage
The Vector-Thread Architecture
Mattan Erez The University of Texas at Austin
Prof. Onur Mutlu Carnegie Mellon University
Presentation transcript:

StreamIt: A Language for Streaming Applications William Thies, Michal Karczmarek, Michael Gordon, David Maze, Jasper Lin, Ali Meli, Andrew Lamb, Chris Leger and Saman Amarasinghe MIT Laboratory for Computer Science New England Programming Languages and Systems Symposium August 7, 2002

Streaming Application Domain Based on streams of data Increasingly prevalent and important Embedded systems Cell phones, handheld computers, DSP’s Desktop applications Streaming media – Real-time encryption Software radio - Graphics packages High-performance servers Software routers Cell phone base stations HDTV editing consoles

Synchronous Dataflow (SDF) Application is a graph of nodes Nodes send/receive items over channels Nodes have static I/O rates Can construct a static schedule

Prototyping Streaming Apps. Modeling Environments: Ptolemy (UC Berkeley) COSSAP (Synopsys) SPW (Cadence) ADS (Hewlett Packard) DSP Station (Mentor Graphics)

Programming Streaming Apps. Compiler- Conscious Language Design C / C++ / Assembly Performance Synchronous Dataflow - LUSTRE - SIGNAL - Silage - Lucid Programmability

The StreamIt Language Also a synchronous dataflow language Goals: With a few extra features Goals: High performance Improved programmer productivity Language Contributions: Structured model of streams Messaging system for control Automatic program morphing ENABLES Compiler Analysis & Optimization

Outline Design of StreamIt Structured Streams Messaging Morphing Results Conclusions

Outline Design of StreamIt Structured Streams Messaging Morphing Results Conclusions

Representing Streams Conventional wisdom: streams are graphs Graphs have no simple textual representation Graphs are difficult to analyze and optimize

Representing Streams Conventional wisdom: streams are graphs Graphs have no simple textual representation Graphs are difficult to analyze and optimize Insight: stream programs have structure unstructured structured

Structured Streams Hierarchical structures: Pipeline SplitJoin Feedback Loop Basic programmable unit: Filter

Structured Streams Hierarchical structures: Pipeline SplitJoin Feedback Loop Basic programmable unit: Filter Splits / Joins are compiler-defined

Representing Filters Autonomous unit of computation No access to global resources Communicates through FIFO channels - pop() - peek(index) - push(value) Peek / pop / push rates must be constant Looks like a Java class, with An initialization function A steady-state “work” function Message handler functions

Filter Example: LowPassFilter float->float filter LowPassFilter (float N) { float[N] weights; init { weights = calcWeights(N); } work push 1 pop 1 peek N { float result = 0; for (int i=0; i<weights.length; i++) { result += weights[i] * peek(i); push(result); pop();

Filter Example: LowPassFilter float->float filter LowPassFilter (float N) { float[N] weights; init { weights = calcWeights(N); } work push 1 pop 1 peek N { float result = 0; for (int i=0; i<weights.length; i++) { result += weights[i] * peek(i); push(result); pop(); N

Filter Example: LowPassFilter float->float filter LowPassFilter (float N) { float[N] weights; init { weights = calcWeights(N); } work push 1 pop 1 peek N { float result = 0; for (int i=0; i<weights.length; i++) { result += weights[i] * peek(i); push(result); pop(); N

Filter Example: LowPassFilter float->float filter LowPassFilter (float N) { float[N] weights; init { weights = calcWeights(N); } work push 1 pop 1 peek N { float result = 0; for (int i=0; i<weights.length; i++) { result += weights[i] * peek(i); push(result); pop(); N

Filter Example: LowPassFilter float->float filter LowPassFilter (float N) { float[N] weights; init { weights = calcWeights(N); } work push 1 pop 1 peek N { float result = 0; for (int i=0; i<weights.length; i++) { result += weights[i] * peek(i); push(result); pop(); N

Pipeline Example: FM Radio pipeline FMRadio { add DataSource(); add LowPassFilter(); add FMDemodulator(); add Equalizer(8); add Speaker(); } DataSource LowPassFilter FMDemodulator Equalizer Speaker

Pipeline Example: FM Radio pipeline FMRadio { add DataSource(); add LowPassFilter(); add FMDemodulator(); add Equalizer(8); add Speaker(); } DataSource LowPassFilter FMDemodulator Equalizer Speaker

SplitJoin Example: Equalizer pipeline Equalizer (int N) { add splitjoin { split duplicate; float freq = 10000; for (int i = 0; i < N; i ++, freq*=2) { add BandPassFilter(freq, 2*freq); } join roundrobin; add Adder(N); duplicate BPF BPF BPF roundrobin (1) Adder

Why Structured Streams? Compare to structured control flow Tradeoff: PRO: - more robust - more analyzable CON: - “restricted” style of programming GOTO statements If / else / for statements

Structure Helps Programmers Modules are hierarchical and composable Each structure is single-input, single-output Encapsulates common idioms Good textual representation Enables parameterizable graphs

N-Element Merge Sort (3-level)

N-Element Merge Sort (K-level) pipeline MergeSort (int N, int K) { if (K==1) { add Sort(N); } else { add splitjoin { split roundrobin; add MergeSort(N/2, K-1); joiner roundrobin; } add Merge(N);

Structure Helps Compilers Enables local, hierarchical analyses Scheduling Optimization Parallelization Load balancing

Structure Helps Compilers Enables local, hierarchical analyses Scheduling Optimization Parallelization Load balancing Examples: … Pipeline Fusion … SplitJoin Fusion Pipeline Fission SplitJoin Fission

Structure Helps Compilers Enables local, hierarchical analyses Scheduling Optimization Parallelization Load balancing Examples: … Filter Hoisting

Structure Helps Compilers Enables local, hierarchical analyses Scheduling Optimization Parallelization Load balancing Disallows non-sensical graphs Simplifies separate compilation All blocks single-input, single-output

CON: Restricts Coding Style Some graphs need to be re-arranged Example: FFT roundrobin (2) push(pop()) push(pop() * w[…]) duplicate push(pop() + pop()) push(pop() – pop()) roundrobin (1) Bit-reverse order Butterfly (2 way) Butterfly (4 way) Butterfly (8 way)

Outline Design of StreamIt Structured Streams Messaging Morphing Results Conclusions

Control Messages Structures for regular, high-bandwidth data But also need a control mechanism for irregular, low-bandwidth events Change volume on a cell phone Initiate handoff of stream Adjust network protocol

Supporting Control Messages Option 1: Embed message in stream PRO: - message arrives with data CON: - complicates filter code - complicates structure - runtime overhead Option 1: Embed message in stream PRO: - method arrives with data CON: - complicates filter code - complicates structure Option 2: Synchronous method call PRO: - delivery transparent to user CON: - timing is unclear - limits parallelism

StreamIt Messaging System Looks like method call, but semantics differ No return value Asynchronous delivery Can broadcast to multiple targets void raiseVolume(int v) { myVolume += v; }

StreamIt Messaging System Looks like method call, but semantics differ Timed relative to data User gains precision; compiler gains flexibility Looks like method call, but semantics differ No return value Asynchronous delivery Can broadcast to multiple targets TargetFilter x; work { … if (lowVolume()) x.raiseVolume(10) at 100; }

Message Timing A sends message to B with zero latency A B

Message Timing A sends message to B with zero latency A B

Message Timing A sends message to B with zero latency A B

Message Timing A sends message to B with zero latency A B

Message Timing A sends message to B with zero latency A B

Message Timing A sends message to B with zero latency A B

Message Timing A sends message to B with zero latency A B

Message Timing A sends message to B with zero latency A B

Message Timing A sends message to B with zero latency A B

Message Timing A sends message to B with zero latency A B

Message Timing A sends message to B with zero latency A B

Message Timing A sends message to B with zero latency A B

Message Timing A sends message to B with zero latency A Distance between wavefronts might have changed B

Message Timing A sends message to B with zero latency A B

General Message Timing Latency of N means: Message attached to wavefront that sender sees in N executions A B

General Message Timing Latency of N means: Message attached to wavefront that sender sees in N executions Examples: A  B, latency 1 A B

General Message Timing Latency of N means: Message attached to wavefront that sender sees in N executions Examples: A  B, latency 1 A B

General Message Timing Latency of N means: Message attached to wavefront that sender sees in N executions Examples: A  B, latency 1 B  A, latency 25 A B

General Message Timing Latency of N means: Message attached to wavefront that sender sees in N executions Examples: A  B, latency 1 B  A, latency 25 A B

Rationale Better for the programmer Better for the compiler Simplicity of method call Precision of embedding in stream Better for the compiler Program is easier to analyze No code for timing / embedding No control channels in stream graph Can reorder filter firings, respecting constraints Implement in most efficient way

Outline Design of StreamIt Structured Streams Messaging Morphing Results Conclusions

Dynamic Changes to Stream Stream structure needs to change Examples Switch radio from AM to FM Change from Bluetooth to 802.11 Program “Morphing”

Dynamic Changes to Stream Stream structure needs to change Challenges for programmer: Synchronizing the beginning, end of morphing Preserving live data in the system Efficiency

Morphing in StreamIt Send message to “init” to morph a structure pipeline Equalizer (int N) { … }

Morphing in StreamIt Send message to “init” to morph a structure pipeline Equalizer (int N) { … } myEqualizer.init (6);

Morphing in StreamIt Send message to “init” to morph a structure When message arrives, structure is replaced Live data is automatically drained pipeline Equalizer (int N) { … } myEqualizer.init (6);

Rationale Programmer writes “init” only once No need for complicated transitions Compiler optimizes each phase separately Benefits from anticipation of phase changes

Outline Design of StreamIt Structured Streams Messaging Morphing Results Conclusions

Implementation Basic StreamIt implementation complete Backends: Uniprocessor Raw: A tiled architecture with fine-grained, programmable communication Extended KOPI, open-source Java compiler

Results Developed applications in StreamIt GSM Decoder - FFT FM Radio - 3GPP Channel Decoder Radar - Bitonic Sort Load-balancing transformations improve performance on RAW … Vertical Fusion … Horizontal Fusion Vertical Fission Horizontal Fission

Example: Radar App. (Original) Splitter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter Joiner Splitter Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Joiner

Example: Radar App. (Original)

Example: Radar App. (Original) Splitter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter Joiner Splitter Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Joiner

Example: Radar App. (Original) Splitter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter Joiner Splitter Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Joiner

Example: Radar App. Joiner Joiner Splitter Splitter Detector FirFilter Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Joiner

Example: Radar App. Joiner Joiner Splitter Splitter Detector FirFilter Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Joiner

Example: Radar App. Joiner Joiner Splitter Splitter Detector FirFilter Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Detector Magnitude FirFilter Vector Mult Vector Mult FirFilter Magnitude Detector Joiner

Example: Radar App. Joiner Joiner Splitter Splitter FIRFilter Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Joiner

Example: Radar App. Joiner Joiner Splitter Splitter FIRFilter Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Joiner

Example: Radar App. Joiner Joiner Splitter Splitter FIRFilter Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Joiner

Example: Radar App. Joiner Joiner Splitter Splitter FIRFilter Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Joiner

Example: Radar App. Joiner Joiner Splitter Splitter FIRFilter Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Joiner

Example: Radar App. Joiner Joiner Splitter Splitter FIRFilter Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Joiner

Example: Radar App. Joiner Joiner Splitter Splitter FIRFilter Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Joiner

Example: Radar App. (Balanced) Splitter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter FIRFilter Joiner Splitter Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Vector Mult FIRFilter Magnitude Detector Joiner

Example: Radar App. (Balanced)

Outline Design of StreamIt Structured Streams Messaging Morphing Results Conclusions

Conclusions Compiler-conscious language design can improve both programmability and performance Structure enables local, hierachical analyses Messaging simplifies code, exposes parallelism Morphing allows optimization across phases Goal: Stream programming at high level of abstraction without sacrificing performance

For More Information StreamIt Homepage http://compiler.lcs.mit.edu/streamit