ILPc: A novel approach for scalable timing analysis of synchronous programs Hugh Wang Partha S. Roop Sidharta Andalam.

Slides:

Advertisements

Similar presentations

Dataflow Analysis for Datarace-Free Programs (ESOP 11) Arnab De Joint work with Deepak DSouza and Rupesh Nasre Indian Institute of Science, Bangalore.

Advertisements

Approximation of the Worst-Case Execution Time Using Structural Analysis Matteo Corti and Thomas Gross Zürich.

Making Time-stepped Applications Tick in the Cloud Tao Zou, Guozhang Wang, Marcos Vaz Salles*, David Bindel, Alan Demers, Johannes Gehrke, Walker White.

College of Information Technology & Design

Approximating the Worst-Case Execution Time of Soft Real-time Applications Matteo Corti.

Xianfeng Li Tulika Mitra Abhik Roychoudhury

Greta YorshEran YahavMartin Vechev IBM Research. { ……………… …… …………………. ……………………. ………………………… } T1() Challenge: Correct and Efficient Synchronization { ……………………………

A Program Transformation For Faster Goal-Directed Search Akash Lal, Shaz Qadeer Microsoft Research.

Introducing Formal Methods, Module 1, Version 1.1, Oct., Formal Specification and Analytical Verification L 5.

Combining Statistical and Symbolic Simulation Mark Oskin Fred Chong and Matthew Farrens Dept. of Computer Science University of California at Davis.

Optimizing single thread performance Dependence Loop transformations.

Precision Timed Embedded Systems Using TickPAD Memory Matthew M Y Kuo* Partha S Roop* Sidharta Andalam † Nitish Patel* *University of Auckland, New Zealand.

Formal Semantics of Programming Languages 虞慧群 Topic 6: Advanced Issues.

Lecture 8: Three-Level Architectures CS 344R: Robotics Benjamin Kuipers.

Constraint Systems used in Worst-Case Execution Time Analysis Andreas Ermedahl Dept. of Information Technology Uppsala University.

Analyzing and Verifying Esterel Programs Taisook Han , Division of Computer Science, KAIST.

Helper Threads via Virtual Multithreading on an experimental Itanium 2 processor platform. Perry H Wang et. Al.

Parametric Throughput Analysis of Synchronous Data Flow Graphs

Syntax-driven partitioning for model-checking of Esterel programs Eric Vecchié - INRIA Tick.

SKELETON BASED PERFORMANCE PREDICTION ON SHARED NETWORKS Sukhdeep Sodhi Microsoft Corp Jaspal Subhlok University of Houston.

What is an Algorithm? (And how do we analyze one?)

Department of Electrical and Computer Engineering M.A. Basith, T. Ahmad, A. Rossi *, M. Ciesielski ECE Dept. Univ. Massachusetts, Amherst * Univ. Bretagne.

Lock vs. Lock-Free memory Fahad Alduraibi, Aws Ahmad, and Eman Elrifaei.

Scheduling for Embedded Real-Time Systems Amit Mahajan and Haibo.

Esterel Overview Roberto Passerone ee249 discussion section.

A High Performance Application Representation for Reconfigurable Systems Wenrui GongGang WangRyan Kastner Department of Electrical and Computer Engineering.

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming with MPI and OpenMP Michael J. Quinn.

Solving the Protein Threading Problem in Parallel Nocola Yanev, Rumen Andonov Indrajit Bhattacharya CMSC 838T Presentation.

Compiling Esterel into Static Discrete-Event Code Stephen A. Edwards Columbia University Computer Science Department New York, USA

Copyright © 2001 Stephen A. Edwards All rights reserved Esterel and Other Projects Prof. Stephen A. Edwards Columbia University, New York

Computer Science 12 Design Automation for Embedded Systems ECRTS 2011 Bus-Aware Multicore WCET Analysis through TDMA Offset Bounds Timon Kelter, Heiko.

Parallel Programming and Timing Analysis on Embedded Multicores Eugene Yip The University of Auckland Supervisors:Advisor: Dr. Partha RoopDr. Alain Girault.

Design Space Exploration

CuMAPz: A Tool to Analyze Memory Access Patterns in CUDA

Parallel Programming and Timing Analysis on Embedded Multicores Eugene Yip The University of Auckland Supervisors:Advisor: Dr. Partha RoopDr. Alain Girault.

Presented By HaeJoon Lee Yanyan Shen, Beng Chin Ooi, Bogdan Marius Tudor National University of Singapore Wei Lu Renmin University Cang Chen Zhejiang University.

Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 2012 Authors:

Programming and Timing Analysis of Parallel Programs on Multicores Eugene Yip, Partha Roop, Morteza Biglari-Abhari, Alain Girault ACSD

Reporter: PCLee. Although assertions are a great tool for aiding debugging in the design and implementation verification stages, their use.

“Software” Esterel Execution (work in progress) Dumitru POTOP-BUTUCARU Ecole des Mines de Paris

Fast Simulation Techniques for Design Space Exploration Daniel Knorreck, Ludovic Apvrille, Renaud Pacalet

Performance Prediction for Random Write Reductions: A Case Study in Modelling Shared Memory Programs Ruoming Jin Gagan Agrawal Department of Computer and.

A Graph Based Algorithm for Data Path Optimization in Custom Processors J. Trajkovic, M. Reshadi, B. Gorjiara, D. Gajski Center for Embedded Computer Systems.

1 Workshop Topics - Outline Workshop 1 - Introduction Workshop 2 - module instantiation Workshop 3 - Lexical conventions Workshop 4 - Value Logic System.

Lecture 9 TTH 03:30AM-04:45PM Dr. Jianjun Hu CSCE569 Parallel Computing University of South Carolina Department of.

Zheng Wu. Background Motivation Analysis Framework Intra-Core Cache Analysis Cache Conflict Analysis Optimization Techniques WCRT Analysis Experiment.

A Framework on Synchronization Verification in System-Level Design Thanyapat Sakunkonchak Satoshi Komatsu Masahiro Fujita Fujita Laboratory University.

Computing Simulation in Orders Based Transparent Parallelizing Pavlenko Vitaliy Danilovich, Odessa National Polytechnic University Burdeinyi Viktor Viktorovych,

A Methodology for Creating Fast Wait-Free Data Structures Alex Koganand Erez Petrank Computer Science Technion, Israel.

DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.

CIS 540 Principles of Embedded Computation Spring Instructor: Rajeev Alur

Sunpyo Hong, Hyesoon Kim

Programming Multi-Core Processors based Embedded Systems A Hands-On Experience on Cavium Octeon based Platforms Lab Exercises: Lab 1 (Performance measurement)

GPGPU Performance and Power Estimation Using Machine Learning Gene Wu – UT Austin Joseph Greathouse – AMD Research Alexander Lyashevsky – AMD Research.

A Study of Data Partitioning on OpenCL-based FPGAs Zeke Wang (NTU Singapore), Bingsheng He (NTU Singapore), Wei Zhang (HKUST) 1.

1 ”MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs” John A. Stratton, Sam S. Stone and Wen-mei W. Hwu Presentation for class TDT24,

Optimistic Hybrid Analysis

Chih-Fan Lai1, J.-H. Roland Jiang1, and Kuo-Hua Wang2

Amir Kamil and Katherine Yelick

Year 2 Updates.

CSCI1600: Embedded and Real Time Software

Loop Control Structure.

Software Testing (Lecture 11-a)

Loops CIS 40 – Introduction to Programming in Python

Amir Kamil and Katherine Yelick

ECE 498AL Lecture 10: Control Flow

IntScope: Automatically Detecting Integer overflow vulnerability in X86 Binary Using Symbolic Execution Tielei Wang, TaoWei, ZhingiangLin, weiZou Purdue.

Implementation of a De-blocking Filter and Optimization in PLX

CSCI1600: Embedded and Real Time Software

Pointer analysis John Rollinson & Kaiyuan Li

Presentation transcript:

ILPc: A novel approach for scalable timing analysis of synchronous programs Hugh Wang Partha S. Roop Sidharta Andalam

Outline Timing analysis of concurrent programs Problem statement Our approach - ILPc Results Conclusions

Acronyms ILP – Integer Linear Program ILPs – ILP sequential ILPc – ILP concurrent EOT = End of Tick – also known as “pause” TCCFG – Timed concurrent control flow graph (Intermediate format)

Synchronous languages abort [loop foo1(); pause; foo2(); pause; end loop ] || [loop foo3(); pause; foo4(); pause; end loop ] when s

Synchronous languages abort [loop foo1(); pause; foo2(); pause; end loop ] || [loop foo3(); pause; foo4(); pause; end loop ] when s

Synchronous languages abort [loop foo1(); pause; foo2(); pause; end loop ] || [loop foo3(); pause; foo4(); pause; end loop ] when s abort start abort end

Synchronous languages abort [loop foo1(); pause; foo2(); pause; end loop ] || [loop foo3(); pause; foo4(); pause; end loop ] when s abort condition

Synchronous languages abort [loop foo1(); pause; foo2(); pause; end loop ] || [loop foo3(); pause; foo4(); pause; end loop ] when s abort condition False True

Synchronous languages abort [loop foo1(); pause; foo2(); pause; end loop ] || [loop foo3(); pause; foo4(); pause; end loop ] when s fork

Synchronous languages abort [loop foo1(); pause; foo2(); pause; end loop ] || [loop foo3(); pause; foo4(); pause; end loop ] when s

Synchronous languages abort [loop foo1(); pause; foo2(); pause; end loop ] || [loop foo3(); pause; foo4(); pause; end loop ] when s

Synchronous languages Time Read Inputs Emit outputs Computation

Synchronous languages Time Tick 1Tick 4Tick 2Tick 3

Synchronous languages Time Worst Case Reaction Time analysis ◦ WCRT analysis Synchrony hypothesis ◦ Reactive system operates infinitely fast compared to the environment. Validation ◦ Min(inter-arrival-time of events) > Max (Reaction Time) Longest tick

Outline Timing analysis of concurrent programs Problem statement Our approach - ILPc Results Conclusions

Motivating example A1 A2 A3 T1 B1 B2 T2 C1 C2 T3 T1 || T2 || T3

Motivating example A1 A2 A3 T1 B1 B2 T2 C1 C2 T Execution Time: 25

Motivating example A1 A2 A3 T1 B1 B2 T2 C1 C2 T Execution Time: 28

Motivating example A1 A2 A3 T1 B1 B2 T2 C1 C2 T Execution Time: 35

Conventional approaches Max-plus M. Boldt and C. Traulsen and R. Hanxleden. Worst Case Reaction Time Analysis of Concurrent Reactive Programs. ENTCS, 203(4), L. Ju, B. K. Huynh, A. Roychoudhury, and S. Chakraborty. Performance debugging of Esterel specifications. CODES-ISSS 2008.

Conventional approaches State exploration L. Ju, B. K. Huynh, S. Chakraborty, and A. Roychoudhury. Context- sensitive timing analysis of Esterel programs. DAC, S. Andalam, P. S. Roop, and A. Girault. Pruning infeasible paths for tight WCRT analysis of synchronous programs. DATE, M. Kuo, R. Sinha, and P. S. Roop. Efficient WCRT analysis of synchronous programs using reachability. DAC, ILPs Model checking Reachability

Motivating example A1 A2 A3 T1 B1 B2 T2 C1 C2 T WCRT analysis -Max-Plus WCRT = Max(T1) + Max(T2) + Max(T3) = A3 + B1 + C2 = = 40 cycles -State Exploration A1+B1+C1 = 25 A2+B2+C2 = 28 A3+B1+C1 = 35 A1+B2+C2 = 23 A2+B1+C1 = 30 A3+B2+C2 = 33 WCRT = Max(25,28,35,23,30,33) = 35 cycles

Tradeoff Precision Analysis Time Max-plus State exploration

Motivating example A1 A2 A3 T1 B1 B2 T2 C1 C2 T

Motivating example A1 A2 A3 T1 B1 B2 T2 C1 C2 T WCRT analysis -Max-Plus WCRT = Max(T1) + Max(T2) + Max(T3) = A3 + B2 + C2 = = 45 cycles -State Exploration A1+B1+C1 = 25 A2+B2+C2 = 40 A3+B1+C1 = 35 A1+B2+C2 = 35 A2+B1+C1 = 30 A3+B2+C2 = 45 WCRT = Max(25,40,35,35,30,45) = 45 cycles

The problem statement Precision Analysis Time Max-plus State exploration

Our approach Our approach (ILPc) ◦ Inspired by counter example guided model checking ◦ Also has some ideas similar to local model checking Success Fail

Outline Timing analysis of concurrent programs Problem statement Our approach - ILPc Results Conclusions

Overview of ILPc TCCFGILP model

E1 E2 E5 E4 E3 E6 E7 E12 E11E8 E9 E10 E17 E13 E14 E15 E16

ILP model E1 E2 E5 E4 E3 E6 E7 E12 E11E8 E9 E10 E17 E13 E14 E15 E16

ILP model E1 E2 E5 E4 E3 E6 E7 E12 E11E8 E9 E10 E17 E13 E14 E15 E16

ILP Compared with the conventional ILP ◦ Directly capture features of synchronous languages. ◦ More precise WCRT estimates with minimum overhead.

Overview of ILPc TCCFGILP model Tick expressions

Tick Expressions 0 (1,3,5…)

Overview of ILPc ILP solver TCCFG Ticks can be aligned? ILP model Tick expressions WCRT & Execution path

Verifying tick alignment 0

Overview of ILPc ILP solver TCCFG Ticks can be aligned? ILP model Tick expressions WCRT & Execution path Refinement Success Fail

Outline Timing analysis of concurrent programs Problem statement Our approach - ILPc Results Conclusions

Benchmarking Compared with 3 existing approaches [1,2,3] Conducted in 2 phases ◦ Phase 1: Theoretical performance ◦ Phase 2: Real-world applications Benchmark computer ◦ Windows based ◦ Quad-core 1.6 GHz CPU ◦ 8 GB memory [1] L. Ju, B. K. Huynh, S. Chakraborty, and A. Roychoudhury. Context- sensitive timing analysis of Esterel programs, DAC, [2] S. Andalam, P. S. Roop, and A. Girault. Pruning infeasible paths for tight WCRT analysis of synchronous programs, DATE [3] M. Kuo, R. Sinha, and P. S. Roop. Efficient WCRT analysis of synchronous programs using reachability, DAC, 2011.

Scalability Analysis time vs. program states. Analysis time of ILPc ◦ Time taken for each iteration.  Number of program states. ◦ The number of iteration.  Structure and cost distribution.

Benchmarking: Phase 1 Two sets of benchmarks ◦ Set A - Maximum number of required iterations to find the WCRT. ◦ Set B - Minimum number of required iterations to find the WCRT. (Iteration = 1)

Benchmarking: Phase 1 Set A A1 A2 End … 5 0

Benchmarking: Phase … Set A A1 A2 End Set B …

Benchmarking: Phase 1 Set A

Benchmarking: Phase 1 Set B

Benchmarking: Phase 1 Same precision. Analysis time of ILPc depends heavily on the number of iterations rather than the number of program states. On average, analysis time of ILPc should be between the worst case and best case scenarios.

Benchmarking: Phase2 NameWCRTThreads 1ChannelProtocol9977 2Flasher6177 3RobotSonar DrillStation CruiseControl RailroadCrossing WaterMonitor Small Large L. H. Yoong and G. D. Shaw. Auckland function block benchmark. University of Auckland,

Benchmarking: Phase2 Benchmarks 1-4 (less than 20 threads)

Benchmarking: Phase2 Benchmarks 5-7 (more than 20 threads) >1 hr Out of Memory

Benchmarking: Phase2

Results summary Same precision as state exploration approaches. Orders of magnitude faster compared to state exploration approaches for synchronous benchmarks for industrial automation applications.

Conclusions We proposed ILPc tailored for timing analysis of concurrent programs. ILPc paves the way for scalable timing analysis through an iterative refinement process. Future work: ◦ Consideration of complex architectural features. ◦ ILPc variant for the analysis of parallel programs.

Thank you!