

Validating High-Level Synthesis Sudipta Kundu, Sorin Lerner, Rajesh Gupta Department of Computer Science and Engineering, University of California, San Diego

High-Level Synthesis (HLS)

Flow: Algorithmic Design (C/C++, SystemC) -> Scheduled Design -> Register Transfer Level Design (RTL; Verilog, VHDL). Each design in the flow must be functionally equivalent to the one before it.

The RTL design consists of a controller (states S0, S1, S2, S3, S4, with branches on f and !f) and a data path.

Design goals: power, performance, area.

Example algorithmic input:
    ....
    x = a * b;
    c = a < b;
    if (c) then a = b - x;
    else a = b + x;
    a = a + x;
    b = b * x;
    ....

The Problem

Is the Specification "functionally equivalent" to the Implementation?

Input Program (Specification) -> Transformations -> Transformed Program (Implementation)

Two ways to answer the question:
- Checker, once and for all: shows that the HLS tool always produces correct results.
- Translation Validation (TV): an equivalence checker that runs for each translation. It does not guarantee that the HLS tool is bug free. It does guarantee that any errors in translation will be caught when the tool runs. TV has been applied to optimizing compilers [Pnueli et al. 98] [Necula 00] [Zuck et al. 05].

Contributions

- Developed TV techniques for a new setting: HLS. The algorithm uses a bisimulation relation approach.
- Implemented TV for a realistic HLS tool: SPARK.
  - Widely used: 4,000 downloads, over 100 active users.
  - Large software: around 125,000 LoC.
- Ran it on 12 benchmarks.
  - Modular: works on one procedure at a time.
  - Practical: took on average 6 seconds to run per procedure.
  - Easy to implement: only a fraction of the development cost of SPARK.
  - Useful: found 2 previously unknown bugs.

Outline

1. Motivation and Problem Definition
2. Our Approach using an Example
   - Definition of Equivalence
   - Translation Validation Algorithm
     - Generate Constraints
     - Solve Constraints
3. Experiments and Results
   - SPARK: Parallelizing HLS Framework
4. Conclusion

An Example of HLS

Original program:
    int SumTo10 (int p) {
        int sum = 0, k = p;
        while (k < 10)
            sum += ++k;
        return sum;
    }

The program computes sum = ∑ k, for k from p+1 to 10.

Specification (control flow graph over program points a0 to a6):
    a0 -> a1   i1: sum = 0
    a1 -> a2   i2: k = p
    a2 -> a3   i3: (k < 10)
    a3 -> a4   i4: k = k + 1
    a4 -> a2   i5: sum = sum + k
    a2 -> a5   i6: ¬(k < 10)
    a5 -> a6   i7: return sum

Resource Allocation: ++ <
At this step the allocated design has the same control flow graph as the specification.

An Example of HLS: Loop Pipelining

The original program and the specification are unchanged from the previous slide.

Loop pipelining introduces a temporary t. The increment i4: k = k + 1 is split into
    t = k + 1
    i4: k = t
while i1: sum = 0, i2: k = p, i3: (k < 10), i5: sum = sum + k, i6: ¬(k < 10) and i7: return sum stay as they were.

An Example of HLS: Copy Propagation

The original program and the specification are unchanged from the previous slide.

Copy propagation rewrites the uses of k introduced by loop pipelining:
    t = k + 1 (before the loop)    becomes   t = p + 1
    t = k + 1 (inside the loop)    becomes   t = t + 1
    i5: sum = sum + k              becomes   i5: sum = sum + t
The instructions i1: sum = 0, i2: k = p, i3: (k < 10), i4: k = t, i6: ¬(k < 10) and i7: return sum are unchanged.

An Example of HLS: Scheduling

The original program and the specification are unchanged from the previous slide.

Scheduling groups instructions into states:
    entry state:  i1: sum = 0;  i2: k = p;  i41: t = p + 1
    loop test:    i3: (k < 10)  /  i6: ¬(k < 10)
    loop body:    i4: k = t;  i5: sum = sum + t;  i42: t = t + 1
    exit:         i7: return sum

The slide marks a Read After Write dependency among the loop-body instructions.

An Example of HLS: Specification ≡ Implementation

Specification (program points a0 to a6, instructions i1 to i7): as on the earlier slides.

Implementation (the scheduled design, re-labeled with program points b0 to b4):
    b0 -> b1   j1: sum = 0;  j2: k = p;  j41: t = p + 1
    b1 -> b2   j3: (k < 10)
    b2 -> b1   j4: k = t;  j5: sum = sum + t;  j42: t = t + 1
    b1 -> b3   j6: ¬(k < 10)
    b3 -> b4   j7: return sum
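The two control flow graphs above can be transcribed into ordinary code to get a feel for why they agree. The following is a minimal sketch, not part of the original slides: the function names are hypothetical, the loop-body instructions of the implementation are executed in the order j4, j5, j42, and comparing outputs on a handful of inputs is only a sanity check, not the equivalence proof that translation validation provides.

```python
def sum_to_10_spec(p: int) -> int:
    """Specification: i1..i7 from the slides."""
    total, k = 0, p            # i1: sum = 0; i2: k = p
    while k < 10:              # i3 / i6
        k = k + 1              # i4
        total = total + k      # i5
    return total               # i7


def sum_to_10_impl(p: int) -> int:
    """Implementation after pipelining, copy propagation and scheduling."""
    total, k, t = 0, p, p + 1  # j1; j2; j41: t = p + 1
    while k < 10:              # j3 / j6
        k = t                  # j4
        total = total + t      # j5
        t = t + 1              # j42
    return total               # j7


if __name__ == "__main__":
    # Spot check on a small, arbitrary range of inputs (assumption: ints only).
    for p in range(-5, 15):
        assert sum_to_10_spec(p) == sum_to_10_impl(p)
    print(sum_to_10_spec(7), sum_to_10_impl(7))
```

Testing like this can miss bugs that only show up on untested inputs, which is the gap the bisimulation-based check is meant to close.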

Definition of Equivalence

Specification ≡ Implementation means that both have the same set of execution sequences of visible instructions.

Visible instructions are function calls and return statements.
- Two function calls are equivalent if the state of the globals and the arguments are the same.
- Two returns are equivalent if the state of the globals and the returned values are the same.

Technique for Proving Equivalence

Bisimulation relation: relates a given program state in the implementation with the corresponding state in the specification, and vice versa. It satisfies the following properties:
1. The start states are related.
2. If states s1 (specification) and s'1 (implementation) are related and s1 steps to s2 on a visible instruction i, then s'1 can step to some s'2 on an equivalent visible instruction i such that s2 and s'2 are related, and vice versa.

Theorem: If there exists a bisimulation relation between the specification and the implementation, then they are equivalent.

Our Approach

Split the program state space into two parts:
- Control flow state, which is finite. It is explored by traversing the CFGs.
- Dataflow state, which may be infinite. It is explored using an Automated Theorem Prover (ATP).

Bisimulation relation: a set of entries of the form (p1, p2, Ф).
    p1: program point in the Specification.
    p2: program point in the Implementation.
    Ф: formula that relates the data.

Bisimulation Relation

For the running example (Specification with points a0 to a6, Implementation with points b0 to b4), the bisimulation relation has these entries, where subscript s marks a specification variable and subscript i an implementation variable:
    (a0, b0, p_s = p_i)
    (a2, b1, k_s = k_i ∧ sum_s = sum_i ∧ (k_s + 1) = t_i)
    (a5, b3, sum_s = sum_i)

Each entry has the form (p1, p2, Ф): p1 is a program point in the Specification, p2 is a program point in the Implementation, and Ф is a formula that relates the data.
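One way to picture the entries (p1, p2, Ф) is as a table from pairs of program points to predicates over the two program states. The sketch below is an illustration under stated assumptions, not the authors' implementation: the names RELATION, spec_trace, impl_trace and relation_holds_on are hypothetical, and the relation is only checked dynamically along concrete runs, whereas the actual approach establishes it for all states with a theorem prover.

```python
from typing import Callable, Dict, Iterator, Tuple

State = Dict[str, int]
Formula = Callable[[State, State], bool]

# Entries (p1, p2, phi) transcribed from the slide; s = specification state,
# i = implementation state.
RELATION: Dict[Tuple[str, str], Formula] = {
    ("a0", "b0"): lambda s, i: s["p"] == i["p"],
    ("a2", "b1"): lambda s, i: (s["k"] == i["k"]
                                and s["sum"] == i["sum"]
                                and s["k"] + 1 == i["t"]),
    ("a5", "b3"): lambda s, i: s["sum"] == i["sum"],
}


def spec_trace(p: int) -> Iterator[Tuple[str, State]]:
    """Yield (program point, state) each time the spec reaches a0, a2 or a5."""
    s: State = {"p": p}
    yield "a0", dict(s)
    s["sum"], s["k"] = 0, p            # i1; i2
    yield "a2", dict(s)
    while s["k"] < 10:                 # i3
        s["k"] += 1                    # i4
        s["sum"] += s["k"]             # i5
        yield "a2", dict(s)
    yield "a5", dict(s)                # after i6


def impl_trace(p: int) -> Iterator[Tuple[str, State]]:
    """Yield (program point, state) each time the impl reaches b0, b1 or b3."""
    i: State = {"p": p}
    yield "b0", dict(i)
    i["sum"], i["k"], i["t"] = 0, p, p + 1   # j1; j2; j41
    yield "b1", dict(i)
    while i["k"] < 10:                 # j3
        i["k"] = i["t"]                # j4
        i["sum"] += i["t"]             # j5
        i["t"] += 1                    # j42
        yield "b1", dict(i)
    yield "b3", dict(i)                # after j6


def relation_holds_on(p: int) -> bool:
    """Run both programs in lockstep and check every visited entry."""
    spec, impl = list(spec_trace(p)), list(impl_trace(p))
    if len(spec) != len(impl):
        return False
    for (ps, s), (pi, i) in zip(spec, impl):
        phi = RELATION.get((ps, pi))
        if phi is None or not phi(s, i):
            return False
    return True
```

For example, relation_holds_on(7) walks through (a0, b0), several visits to (a2, b1), and finally (a5, b3), evaluating the corresponding formula at each stop.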

Translation Validation Algorithm

Two-step approach:
- Generate Constraints: traverses the CFGs simultaneously and generates the constraints required for the visible instructions to be matched.
- Solve Constraints: solves the constraints using a fixpoint algorithm.

For loops, the algorithm iterates to a fixed point. It may not terminate in general, but in practice it can find the bisimulation relation.

Generate Constraints

Specification and Implementation: the CFGs of the running example (points a0 to a6 with instructions i1 to i7, and points b0 to b4 with instructions j1 to j7, j41, j42).

Constraint variables: Ψ(a0, b0), Ψ(a2, b1), Ψ(a5, b3).

Constraints:
    Ψ(a0, b0)  --[ i1 i2     /  j1 j2 j41    ]-->  Ψ(a2, b1)
    Ψ(a2, b1)  --[ i3 i4 i5  /  j3 j4 j5 j42 ]-->  Ψ(a2, b1)
    Ψ(a2, b1)  --[ i6        /  j6           ]-->  Ψ(a5, b3)
    Ψ(a5, b3) ⇒ sum_s = sum_i
    Ψ(a2, b1) ⇒ k_s = k_i

Sources of constraints listed on the slide: branch correlation, strongest postcondition, structure of the code.

Solve Constraints

Constraints (from the previous slide):
    Ψ(a0, b0)  --[ i1 i2     /  j1 j2 j41    ]-->  Ψ(a2, b1)
    Ψ(a2, b1)  --[ i3 i4 i5  /  j3 j4 j5 j42 ]-->  Ψ(a2, b1)
    Ψ(a2, b1)  --[ i6        /  j6           ]-->  Ψ(a5, b3)
    Ψ(a5, b3) ⇒ sum_s = sum_i
    Ψ(a2, b1) ⇒ k_s = k_i

Initial values:
    Ψ(a0, b0) : p_s = p_i
    Ψ(a2, b1) : True
    Ψ(a5, b3) : True

The initial values violate the last two constraints, so they are strengthened:
    Ψ(a5, b3) : sum_s = sum_i
    Ψ(a2, b1) : k_s = k_i

Checking the loop constraint on Ψ(a2, b1) : k_s = k_i, with the loop bodies
    i3: (k_s < 10)    i4: k_s = k_s + 1    i5: sum_s = sum_s + k_s
    j3: (k_i < 10)    j4: k_i = t_i        j5: sum_i = sum_i + t_i    j42: t_i = t_i + 1
gives the weakest precondition
    WP(Ψ(a2, b1)) : (k_s + 1) = t_i
The query ATP[ Ψ(a2, b1) ⇒ WP(Ψ(a2, b1)) ] fails, so the entry is strengthened to
    Ψ(a2, b1) : k_s = k_i ∧ WP(Ψ(a2, b1))  =  k_s = k_i ∧ (k_s + 1) = t_i

The exit constraint then fails in the same way, and the entry is strengthened once more:
    Ψ(a2, b1) : k_s = k_i ∧ (k_s + 1) = t_i ∧ sum_s = sum_i

After this step all constraints hold, and the fixpoint is reached.
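The strengthening loop above can be sketched in a few lines. This is a rough illustration under loud assumptions, not the algorithm as implemented for SPARK: formulas are kept as lists of Python predicates, the weakest precondition over a path of assignments is computed by running the path and then evaluating the target formula, and the theorem prover is replaced by an exhaustive check over a tiny value range (DOMAIN), which is not a proof. All names (VARS, PATHS, valid, solve) are hypothetical.

```python
from itertools import product

# One joint state holds both programs' variables (suffix s = spec, i = impl).
VARS = ("ps", "ks", "sums", "pi", "ki", "sumi", "ti")
DOMAIN = range(8, 12)  # assumption: tiny range straddling the loop bound 10


def valid(pred) -> bool:
    """Stand-in for the ATP: checks pred on every state drawn from DOMAIN."""
    return all(pred(dict(zip(VARS, vals)))
               for vals in product(DOMAIN, repeat=len(VARS)))


def entry_step(s):   # i1 i2 / j1 j2 j41
    n = dict(s)
    n.update(sums=0, ks=s["ps"], sumi=0, ki=s["pi"], ti=s["pi"] + 1)
    return n


def loop_step(s):    # i4 i5 / j4 j5 j42
    n = dict(s)
    n["ks"] = s["ks"] + 1
    n["sums"] = s["sums"] + n["ks"]
    n["ki"] = s["ti"]
    n["sumi"] = s["sumi"] + s["ti"]
    n["ti"] = s["ti"] + 1
    return n


def exit_step(s):    # i6 / j6 carry no assignments
    return dict(s)


# (source entry, destination entry, branch guards, assignments on the path)
PATHS = [
    ("a0b0", "a2b1", lambda s: True, entry_step),
    ("a2b1", "a2b1", lambda s: s["ks"] < 10 and s["ki"] < 10, loop_step),
    ("a2b1", "a5b3", lambda s: s["ks"] >= 10 and s["ki"] >= 10, exit_step),
]

# Seed formulas: start assumption, branch correlation, matching returns.
INITIAL = {
    "a0b0": [lambda s: s["ps"] == s["pi"]],
    "a2b1": [lambda s: s["ks"] == s["ki"]],
    "a5b3": [lambda s: s["sums"] == s["sumi"]],
}


def solve(initial, paths, start="a0b0", max_rounds=10):
    """Strengthen entries with WP until every path constraint is valid."""
    psi = {k: list(v) for k, v in initial.items()}
    for _ in range(max_rounds):
        changed = False
        for src, dst, guard, step in paths:
            target = list(psi[dst])  # snapshot of the destination formula

            def wp(s, target=target, step=step):
                return all(c(step(s)) for c in target)

            def implication(s, src=src, guard=guard, wp=wp):
                return not (all(c(s) for c in psi[src]) and guard(s)) or wp(s)

            if not valid(implication):
                if src == start:
                    return None      # cannot strengthen the start entry
                psi[src].append(wp)  # psi[src] := psi[src] AND WP(psi[dst])
                changed = True
        if not changed:
            return psi               # fixpoint reached
    return None                      # gave up; may not terminate in general


if __name__ == "__main__":
    result = solve(INITIAL, PATHS)
    print("fixpoint found" if result is not None else "no relation found")
```

On this example the loop entry ends up with three conjuncts, mirroring k_s = k_i, (k_s + 1) = t_i and sum_s = sum_i from the slide. A real implementation would hand the implication to a theorem prover instead of enumerating states, which is what makes the result hold for all inputs.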

SPARK: Parallelizing HLS Framework

Flow: C Program -> SPARK Intermediate Representation (IR) -> Pre-Synthesis Optimization -> Scheduling and Allocation -> Scheduled IR -> Binding -> RTL.

Input restrictions: no pointers, no recursion, no goto.

Transformations: Code Motion, CSE, IVA, Copy Propagation, Dead Code Elimination, Percolation, Trailblazing, Chaining Across Conditions, Dynamic CSE.

Heuristics: HTG Scheduling Walker, Candidate Op Walker, Get Available Ops, Loop Pipelining.

Validation Engine: the component added to this flow.

Results

Benchmark         No. of simulation    No. of calls to     Time
                  relation entries     theorem prover      (sec:msec)
1. incrementer    6                    9                   00:5
2. integer-sum    6                    20                  00:8
3. array-sum      6                    24                  00:8
4. diffeq         7                    41                  01:6
5. waka           11                   79                  02:6
6. pipelining     12                   75                  02:3
7. rotor          14                   71                  02:5
8. parker                                                  :2
9. s2r                                                     :7
10. findmin                                                :8

Bugs Found in SPARK

Array Copy Propagation (did not copy array index):
    Before scheduling:           a[0] := b[1]; c := a[0];
    After scheduling (buggy):    a[0] := b[1]; c := b[0];
    After scheduling (correct):  a[0] := b[1]; c := b[1];

Code Motion (Read After Write dependency):
    Code fragment, as it appears on the slide:
        ret[1] := blk[0] << 3; ret[0] := ret[1];
        ret[1] := blk[0] << 3; ret[0] := ret[1];
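The array copy propagation bug is easy to reproduce on concrete data. The sketch below transcribes the three fragments from the slide into small functions; the function names and the sample array are hypothetical, chosen only for illustration.

```python
def before_scheduling(b):
    a = [0]
    a[0] = b[1]
    c = a[0]
    return c


def after_scheduling_buggy(b):
    a = [0]
    a[0] = b[1]
    c = b[0]   # copy propagated, but the array index was not copied
    return c


def after_scheduling_correct(b):
    a = [0]
    a[0] = b[1]
    c = b[1]   # a[0] replaced by the value it was assigned from
    return c


if __name__ == "__main__":
    b = [3, 7]  # any array with b[0] != b[1] exposes the difference
    print(before_scheduling(b),
          after_scheduling_buggy(b),
          after_scheduling_correct(b))
```

With an input where b[0] equals b[1] the buggy and correct versions agree, which is one reason such a bug can slip past testing and why a per-translation equivalence check is useful.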

Related Work

Translation Validation
- Sequential programs [Pnueli et al. 98] [Necula 00] [Zuck et al. 05]
- CSP programs [Kundu et al. 07]

HLS Verification
- Scheduling step
  - Correctness-preserving transformation [Eveking 99]
  - Symbolic simulation [Ashar 99]
  - Formal assertions [Narasimhan 01]
- Relational approaches for equivalence of FSMDs [Kim 04, Karfa 06]

Conclusion and Future Directions

- Presented an automated algorithm for translation validation of the HLS process.
- Implemented it for an HLS tool called SPARK.
  - Modular: works on one procedure at a time.
  - Practical: took on average 6 seconds to run per procedure.
  - Easy to implement: only a fraction of the development cost of SPARK.
  - Useful: found 2 previously unknown bugs.
- Future work: validate the remaining phases of SPARK: parsing, binding, and code generation.

Thank You