
Torino, Italy – June 27th, 2013
A2B: An Integrated Framework for Designing Heterogeneous and Reconfigurable Systems
C. Pilato, R. Cattaneo, G. Durelli, A.A. Nacci, M.D. Santambrogio, D. Sciuto
Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Italy
NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2013)

Motivations
- The design of reconfigurable systems is a difficult task:
  - interactions between the different design phases have to be taken into account;
  - decisions in the frontend phase may heavily affect the backend implementation, so an iterative exploration is required. E.g., mapping the tasks onto reconfigurable regions and floorplacing them may produce low-quality solutions because of a wrong partitioning or a wrong assignment of implementations.
- Currently, the optimal design methodology (and the number of its iterations) is not known in advance.
- A2B is an ongoing project at Politecnico di Milano to assist the design of such complex systems.

Agenda
- Framework overview
  - Design space exploration
  - Solution generation
- Preliminary results – test case
- Conclusions and future work

Framework Overview
- Inputs:
  - information about the target device (.xml);
  - application source files (.c) plus custom pragmas for additional information (e.g., task-level parallelism, kernels).
- Decision making (exploration):
  - task graph generation;
  - library generation;
  - mapping, scheduling and floorplacing;
  - architectural modification.
- Refinement (evaluation):
  - specification of the platform details;
  - code generation for the target platform.
- Output: project files ready for synthesis with the back-end tools.

XML Exchange Format
- The entire project can be represented through a single XML file:
  - Architecture: the components' characteristics (e.g., reconfigurable regions), …
  - Applications: source code files and profiling information;
  - Library: task implementations with their characterization (time, resources, …);
  - Partitions: task graph, mapping and scheduling, …
- This allows a modular organization of the framework, as well as the sharing of information among the different phases:
  - the phases can be applied in any order to progressively optimize the design;
  - the designer can perform as many iterations as needed to refine the solution.
- Specific details of the target architecture are taken into account only in the refinement phase (interactions with the back-end tools).
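The idea that every phase reads and updates one shared project description, so that phases can be chained in any order, can be sketched in C. This is a minimal illustration only: the `Project` struct stands in for the XML file, and every name (`Project`, `Phase`, `run_flow`, `gen_task_graph`, `gen_library`, `map_schedule`, `add_region`) is hypothetical, not part of A2B. The phase bodies are placeholders that only touch counters.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the four sections of the XML project file. */
typedef struct {
    int num_regions;   /* Architecture: reconfigurable regions available */
    int num_tasks;     /* Partitions: tasks in the task graph */
    int num_impls;     /* Library: characterized task implementations */
    int mapped;        /* Partitions: 1 once a valid mapping/schedule exists */
} Project;

/* Every phase reads and updates the same shared description. */
typedef void (*Phase)(Project *p);

/* Placeholder phases: real ones would analyze code, run HLS, solve M/S/FP, etc. */
static void gen_task_graph(Project *p) { p->num_tasks = 4; }
static void gen_library(Project *p)    { p->num_impls = 4 * p->num_tasks; } /* 4 implementations per task */
static void map_schedule(Project *p)   { p->mapped = (p->num_tasks > 0 && p->num_regions > 0); }
static void add_region(Project *p)     { p->num_regions++; p->mapped = 0; } /* invalidates the old mapping */

/* Apply a designer-chosen sequence of phases, in any order and with repetitions. */
void run_flow(Project *p, const Phase flow[], size_t n) {
    for (size_t i = 0; i < n; i++)
        flow[i](p);
}
```

Because each phase only depends on the shared description, an architectural change (here `add_region`) can simply be followed by another mapping pass, which mirrors the iterative refinement described on the slide.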

Task Graph Generation
- Application source code files can be analyzed to extract the task graphs; profiling information can drive their generation.
- The task graph is then specified in the XML file as processing nodes connected by data transfers:
  - currently, task graphs are designed by hand; methodologies for their automatic extraction will be investigated in the future;
  - transformations can improve the description by splitting or merging tasks.

Example of an annotated task (threshold stage):

```c
#define BPP 3 /* bytes per output pixel; three channels are written below */

/* A2B task annotation (custom pragma); not a standard OpenMP placement at file scope */
#pragma omp task
void threshold(unsigned char *original1, unsigned char *result,
               unsigned char thresh, int *p)
{
    /* p packs the image width and the region of interest to process */
    int DIMH  = p[0];
    int minH1 = p[1], maxH1 = p[2];
    int minV1 = p[3], maxV1 = p[4];

    for (int v = minV1; v < maxV1; v++)
        for (int h = minH1; h < maxH1; h++) {
            /* white (255) above the threshold, black (0) otherwise, on all BPP channels */
            unsigned char value = original1[v * DIMH + h] > thresh ? 255 : 0;
            result[v * DIMH * BPP + h * BPP]     = value;
            result[v * DIMH * BPP + h * BPP + 1] = value;
            result[v * DIMH * BPP + h * BPP + 2] = value;
        }
}
```
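A task graph made of processing nodes connected by data transfers can be held in a small adjacency structure; any later phase (e.g., scheduling) then needs a dependency-respecting order of the tasks. The sketch below uses Kahn's topological sort for that purpose. The types `Transfer` and `TaskGraph`, the function `topo_order`, and the example task names are hypothetical and chosen only for illustration; A2B's actual in-memory representation is not described in the slides.

```c
#define MAX_TASKS 8

/* One data transfer: an edge from a producer task to a consumer task. */
typedef struct {
    int src, dst;      /* task indices */
    unsigned bytes;    /* size of the transferred data (illustrative) */
} Transfer;

typedef struct {
    int num_tasks;
    const char *names[MAX_TASKS];
    int num_edges;
    Transfer edges[MAX_TASKS * MAX_TASKS];
} TaskGraph;

/* Kahn's algorithm: writes a topological order of the tasks into `order`.
   Returns the number of tasks emitted; a value below num_tasks means a cycle. */
int topo_order(const TaskGraph *g, int order[]) {
    int indeg[MAX_TASKS] = {0};
    int queue[MAX_TASKS], head = 0, tail = 0, count = 0;

    for (int e = 0; e < g->num_edges; e++)
        indeg[g->edges[e].dst]++;
    for (int t = 0; t < g->num_tasks; t++)
        if (indeg[t] == 0)
            queue[tail++] = t;              /* tasks with no pending inputs */

    while (head < tail) {
        int t = queue[head++];
        order[count++] = t;
        for (int e = 0; e < g->num_edges; e++)
            if (g->edges[e].src == t && --indeg[g->edges[e].dst] == 0)
                queue[tail++] = g->edges[e].dst;   /* all inputs now available */
    }
    return count;
}
```

For a diamond-shaped graph where one task feeds two others that both feed a final task, `topo_order` places the source first and the sink last; splitting or merging tasks only changes the node and edge lists, not the algorithm.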

Library Generation: a Collection of Implementations
- An LLVM-based compiler extracts the dataflow graph of each task:
  - estimation of the required resources (including bit-width analysis);
  - [IMP] interaction with HLS tools to obtain more accurate results.
- The generated implementations are then stored in the XML file, offering alternatives to the mapping phase and information to the floorplacer.
- Joint effort of Politecnico di Milano and Imperial College London to integrate high-level analysis techniques into the toolchain:
  - R. Cattaneo, X. Niu, C. Pilato, T. Becker, W. Luk, M. D. Santambrogio, "A Framework for Effective Exploitation of Partial Reconfiguration in Dataflow Computing" (to appear in ReCoSoC13).
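Bit-width analysis lets a resource estimator size hardware operators by the value ranges they actually carry instead of by the C types. The sketch below shows only the simplest building block of such an analysis, assuming unsigned value ranges `[0, max]`; the function name `bits_unsigned` is hypothetical, and A2B's actual analysis is not described in the slides. For instance, a pixel index into a hypothetical 1920x1080 frame never exceeds 2073599, so a 32-bit `int` index could be narrowed to 21 bits in hardware.

```c
#include <stdint.h>

/* Minimum number of bits needed to represent every unsigned value in [0, max]. */
unsigned bits_unsigned(uint64_t max) {
    unsigned bits = 1;                       /* at least one bit, even when max == 0 */
    while (bits < 64 && (max >> bits) != 0)  /* guard avoids an undefined 64-bit shift */
        bits++;
    return bits;
}
```

The same helper covers derived ranges: the sum of two values bounded by `a` and `b` needs `bits_unsigned(a + b)` bits, which is how narrowed widths propagate through a dataflow graph.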

Mapping, Scheduling and Floorplacing
- We generate one or more configurations in which each task of the applications is analyzed and assigned, via mapping, scheduling and floorplanning (M/S/FP), to:
  - an available and admissible implementation;
  - a component of the architecture (e.g., a processor or a reconfigurable region).
- This allows us to:
  - share implementations across different tasks (hardware sharing);
  - move a task implementation to another processing element at run time (task relocation).
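One ingredient of the assignment step is deciding which library implementations are admissible for a given component. The sketch below assumes the simplest possible rule: an implementation is admissible if its estimated area fits the component's capacity, and among those the fastest one is chosen. The `Impl` fields, the `pick_impl` function, and the greedy rule itself are hypothetical illustrations; the actual M/S/FP algorithm used by A2B is not given in the slides.

```c
#include <stddef.h>

/* Hypothetical characterization of one task implementation, as stored in the library. */
typedef struct {
    unsigned parallelism;  /* pixels processed at once */
    unsigned latency;      /* estimated execution time, arbitrary units */
    unsigned area;         /* estimated resource usage, e.g. LUTs */
} Impl;

/* Return the index of the fastest implementation that fits in `capacity`,
   or -1 if no implementation is admissible for this component. */
int pick_impl(const Impl impls[], size_t n, unsigned capacity) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (impls[i].area > capacity)
            continue;                                   /* does not fit: not admissible */
        if (best < 0 || impls[i].latency < impls[best].latency)
            best = (int)i;
    }
    return best;
}
```

A real mapper would evaluate these choices jointly across tasks and regions (and account for sharing one region among several tasks); the greedy per-task choice is kept here only to make the notion of admissibility concrete.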

Architecture Exploration
- An additional step can be included to explore the target architecture:
  - adding/removing processing elements (reconfigurable regions);
  - modifying their parameters;
  - determining the proper interconnection topology.
- It can iteratively affect:
  - task graph transformations and library generation;
  - mapping and floorplacing, through modifications to the computational resources (especially the number of reconfigurable regions).
- It allows a progressive and iterative refinement of the solution and a concurrent customization of both architecture and application; e.g., mapping and floorplacing can suggest which resources should be added.

Supported Platforms
- Virtex-5 XC5VLX110T (XUPV5 board, embedded system):
  - two XCF32P Platform Flash PROMs (32 Mbit each);
  - SystemACE™ CompactFlash configuration controller;
  - 64-bit wide 256 MB DDR2 small outline DIMM (SODIMM).
- Maxeler MaxWorkstation (HPC system):
  - Intel i7, 16 GB RAM, 500 GB HDD;
  - Max3 dataflow engine (DFE): Virtex-6 SX475T FPGA, 24 GB memory;
  - DFE connected to the CPU via PCI Express.
(Figure: XUPV5 with CPU0, CPU1, reconfigurable area and DDR2 (256 MB); MaxWorkstation with CPU and DRAM (16 GB), plus the MAX3 DFE with interface FPGA, compute FPGA and DRAM (24 GB).)

Target-Dependent Code Generation
- Inputs (.c and .xml files): source code for the CPU, DFGs for the HW tasks, mapping configurations.
- CPU tasks: source code → CPU compiler → executable binary.
- FPGA-based embedded system: DFG → C (DFG-C) → HLS (C to VHDL), or manual VHDL implementations → bitstream generation → .bit.
- MaxWorkstation: DFG → MaxJ (DFG-MaxJ) → HLS (MaxJ to VHDL), or manual MaxJ implementations → bitstream generation (MaxIDE) → .bit.
- The code can always be further optimized by hand, e.g., the glue code for data transfers.
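The branching of the code-generation flow can be summarized as a small dispatch on the target platform and on where a task was mapped. This is a sketch of the diagram only: the enum values and `select_flow` are hypothetical names, and preferring a hand-written implementation over the generated one when both exist is an assumption made for illustration.

```c
#include <stdbool.h>

typedef enum { PLATFORM_EMBEDDED, PLATFORM_MAXELER } Platform;
typedef enum { TARGET_CPU, TARGET_HW } Target;
typedef enum {
    FLOW_CPU_COMPILER,  /* C source -> CPU compiler -> executable */
    FLOW_HLS_C_VHDL,    /* DFG -> C -> HLS (C to VHDL) -> bitstream */
    FLOW_MANUAL_VHDL,   /* hand-written VHDL -> bitstream */
    FLOW_HLS_MAXJ,      /* DFG -> MaxJ -> HLS (MaxJ to VHDL) -> bitstream (MaxIDE) */
    FLOW_MANUAL_MAXJ    /* hand-written MaxJ -> bitstream (MaxIDE) */
} Flow;

/* Choose the back-end flow for one task (hypothetical policy: manual code wins). */
Flow select_flow(Platform platform, Target target, bool has_manual_impl) {
    if (target == TARGET_CPU)
        return FLOW_CPU_COMPILER;
    if (platform == PLATFORM_MAXELER)
        return has_manual_impl ? FLOW_MANUAL_MAXJ : FLOW_HLS_MAXJ;
    return has_manual_impl ? FLOW_MANUAL_VHDL : FLOW_HLS_C_VHDL;
}
```

Keeping this decision in one place reflects the slide's point that target-specific details only enter during refinement, after mapping has been decided.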

Graphical User Interface (GUI)
- A practical GUI supports the designer, limits errors in the interactions with the XML file, and allows custom design methodologies.

Preliminary Results: Edge Detection
- Edge detection application:
  - 4 stages of computation;
  - C description with custom #pragmas;
  - (Figure: extracted task graph and corresponding DFG of the first stage, Scale, with 1x parallelism.)
- We generate 4 implementations, with different levels of parallelism and resource consumption, for each of the 4 tasks of the application ("parallelism X": X pixels processed at once).
- Maxeler backend.
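"Parallelism X" can be pictured as restructuring a per-pixel loop so that X independent pixels are handled per step; in hardware, the inner loop becomes X replicated datapaths, trading area for throughput. The sketch below applies this to a simple threshold over a grayscale row. `PAR` and `threshold_par` are hypothetical names, and the code is a software illustration of the idea, not the generated MaxJ implementation.

```c
#include <stddef.h>

#define PAR 4  /* hypothetical parallelism: pixels processed per step */

/* Threshold a grayscale row, PAR pixels per outer iteration.
   A hardware implementation would unroll the inner loop into PAR comparators. */
void threshold_par(const unsigned char *in, unsigned char *out,
                   size_t n, unsigned char t) {
    size_t i = 0;
    for (; i + PAR <= n; i += PAR)
        for (size_t k = 0; k < PAR; k++)   /* PAR independent lanes */
            out[i + k] = in[i + k] > t ? 255 : 0;
    for (; i < n; i++)                     /* leftover pixels when n % PAR != 0 */
        out[i] = in[i] > t ? 255 : 0;
}
```

Generating the four library variants then amounts to instantiating this structure with different `PAR` values, each with its own area and latency estimate.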

Experimental Results / 1
- Static vs. reconfigurable design (both extracted using the framework).
- We limit the available area to 10k LUTs and implement the best-performing design.
- Reconfigurable design (parallelism 8):
  Task area occupation: S = 664, B = 64, E = 7680, T = 7376
  Final area occupation per region (each region is sized for the largest task it hosts):
    R0 = max(664, 64) = 664
    R1 = max(7680, 7376) = 7680
  Total area consumption = 8344
- Static design (parallelism 4): one IP per task (IP0: S, IP1: B, IP2: E, IP3: T)
  Total area consumption = 7876
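The area figures above follow from two simple rules: a reconfigurable region must fit the largest task it hosts, while a static design pays for every task side by side. The sketch below reproduces the reconfigurable total from the slide's numbers; `region_area` and `static_area` are hypothetical helper names. Applying the static rule to the same parallelism-8 implementations gives 664 + 64 + 7680 + 7376 = 15784, which would exceed the 10k LUT budget; this is consistent with the static design using parallelism 4 instead.

```c
#include <stddef.h>

/* Final area of a reconfigurable region: the largest task it must host. */
unsigned region_area(const unsigned task_area[], size_t n) {
    unsigned max = 0;
    for (size_t i = 0; i < n; i++)
        if (task_area[i] > max)
            max = task_area[i];
    return max;
}

/* Static design: every task gets its own IP, so the areas add up. */
unsigned static_area(const unsigned task_area[], size_t n) {
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += task_area[i];
    return sum;
}
```

With R0 holding the tasks of 664 and 64 and R1 those of 7680 and 7376, `region_area` gives 664 and 7680, for a total of 8344, matching the slide.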

Experimental Results / 2
- Reconfiguration time is automatically masked (when possible).
- Partial reconfiguration improves application performance via resource multiplexing.
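One common way to mask reconfiguration time is prefetching: while a task runs in one region, the bitstream of the next task is loaded into another region. The sketch below models this for a chain of dependent tasks alternating between two regions, with a single configuration port that loads one bitstream at a time. The model, the function names `masked_time` and `serial_time`, and the timings in the example are assumptions for illustration; the slides do not state how A2B schedules reconfigurations.

```c
#include <stddef.h>

/* Completion time when task i runs in region i % 2 and the next bitstream is
   loaded while the previous task executes (single configuration port). */
unsigned masked_time(const unsigned exec[], const unsigned reconf[], size_t n) {
    unsigned region_free[2] = {0, 0};   /* when each region finished its last task */
    unsigned port_free = 0;             /* when the configuration port is idle again */
    unsigned prev_end = 0;              /* end of the previous (producer) task */
    for (size_t i = 0; i < n; i++) {
        size_t r = i % 2;
        unsigned rs = region_free[r] > port_free ? region_free[r] : port_free;
        unsigned re = rs + reconf[i];   /* bitstream of task i loaded */
        port_free = re;
        unsigned es = re > prev_end ? re : prev_end;  /* wait for config and data */
        prev_end = es + exec[i];
        region_free[r] = prev_end;
    }
    return prev_end;
}

/* Baseline without overlap: every reconfiguration is fully exposed. */
unsigned serial_time(const unsigned exec[], const unsigned reconf[], size_t n) {
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += reconf[i] + exec[i];
    return total;
}
```

With four tasks of 10 time units each and reconfigurations of 4 units, only the first reconfiguration is exposed: the masked schedule ends at 44 instead of 56. When a reconfiguration takes longer than the task it overlaps, masking is only partial, which matches the "when possible" on the slide.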

Conclusions and Future Work
- A2B is a modular framework to design reconfigurable systems:
  - alternative methods can easily be plugged in for each phase;
  - both application and architecture can be progressively refined.
- A2B is becoming part of a larger project (ASAP – Advanced Synthesis of Applications and Platforms):
  - refinement will also include the generation of SystemC TLM models of the target system for (co-)simulation and early validation;
  - more architectural templates;
  - closer interaction with actual synthesis (e.g., high-level synthesis);
  - automated methodologies to accelerate the design.

Thank you!
Riccardo Cattaneo
Research partially funded by the European Community’s Seventh Framework Programme, FASTER project.