Portable and Predictable Performance on Heterogeneous Embedded Manycores (ARTEMIS 295440). ARTEMIS 2nd Project Review, October 2014. Aerospace Demonstrator.

Portable and Predictable Performance on Heterogeneous Embedded Manycores (ARTEMIS 295440)
ARTEMIS 2nd Project Review, October 2014
Aerospace Demonstrator
Ricardo Moreno (TAS-E)
ARTEMIS PaPP Review 2013

Contents
- Rationale
- Hardware platform
- Use case: algorithm CCSDS 122
- Parallelization with OpenMP
- Results and demonstration
- Contribution to PaPP objectives
- Next year goals

Rationale
- The quality and size of images taken from satellites keep growing, but downlink bandwidth remains limited by technology.
- Solution: compress on-board data prior to transmission to ground.
- At the same time, deal with:
  - strong dependability and safety constraints
  - a multi-client and tedious certification process
  - deterministic system behaviour
- Goal: increase performance and maintain quality.

Hardware platform: multicore Leon

Algorithm CCSDS 122: overview
- Lossy and lossless payload data compressor for two-dimensional imaging instruments, and potentially for multispectral and hyperspectral imagers and sounders
- Two parts: DWT (discrete wavelet transform) + BPE (bit-plane encoder)

Algorithm CCSDS 122: OpenMP benefits
- Shared memory: Pthreads and OpenMP preferred over MPI or MCAPI
- OpenMP preferred:
  - fewer modifications to the original source
  - easier to synchronize tasks
- Hybrid task/data parallelism solution:
  - split into two tasks: DWT + BPE
  - data parallelism exploited within each task
  - pipelining not implemented because the execution times of the two tasks are unbalanced

Parallelization with OpenMP: First approach
[Figure; surviving labels: FIR, ¼, DWT]

Parallelization with OpenMP: First approach
- Each 1D DWT iteration is parallelized using OpenMP
- Based on pragmas (for loops)
- DWT example: [example not captured in the transcript]
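The slide's own DWT example did not survive in the transcript. As a rough illustration of pragma-based loop parallelization, here is a minimal sketch in C. It makes several assumptions that are not from the slides: the function names `dwt1d_row` and `dwt_rows` are hypothetical, the image is stored row-major, and an unnormalized Haar step (sum and difference of sample pairs) stands in for the CCSDS 122 wavelet filter, which is more elaborate. Each row is transformed independently, so one `parallel for` over rows is enough and no extra synchronization is needed.

```c
#include <assert.h>
#include <stddef.h>

/* One level of an unnormalized Haar-style 1D transform on a single row.
 * Illustrative stand-in only; this is NOT the CCSDS 122 wavelet filter.
 * in:  n samples (n must be even)
 * out: n/2 low-pass sums followed by n/2 high-pass differences */
static void dwt1d_row(const int *in, int *out, size_t n)
{
    assert(n % 2 == 0);
    size_t half = n / 2;
    for (size_t i = 0; i < half; i++) {
        int a = in[2 * i];
        int b = in[2 * i + 1];
        out[i] = a + b;          /* low-pass (sum) */
        out[half + i] = a - b;   /* high-pass (difference) */
    }
}

/* Horizontal pass over a row-major image: rows are independent,
 * so the loop over rows is distributed across threads.
 * Without OpenMP enabled the pragma is ignored and the loop runs serially. */
void dwt_rows(const int *img, int *out, size_t rows, size_t cols)
{
    #pragma omp parallel for schedule(static)
    for (long r = 0; r < (long)rows; r++) {
        size_t offset = (size_t)r * cols;
        dwt1d_row(img + offset, out + offset, cols);
    }
}
```

Built with GCC or Clang, the `-fopenmp` flag enables the pragma; the serial source is otherwise unchanged, which matches the "fewer modifications" argument for OpenMP made earlier in the slides.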

Parallelization with OpenMP: Results I
- First test on an x86 platform: Intel 3770K quad-core
- No BPE parallelization
- Real images from satellite: [results not captured in the transcript]

Parallelization with OpenMP: Results II
- Amdahl's law
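For reference, Amdahl's law in its usual form is given below. The symbols are introduced here for illustration and do not appear on the slides: p is the fraction of run time that is parallelized, and N is the number of cores.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

With the BPE left serial, as in the first x86 test, its whole run time counts toward the serial fraction 1 - p, so the speedup on four cores stays below 4. As a purely illustrative figure (not a measured result), p = 0.5 and N = 4 give S = 1 / (0.5 + 0.125) = 1.6.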

Demonstrator: PaPP development system

Image generation

TAS-E board: quad-core Leon3 SoC synthesized in FPGA
- Real target attached to development PC

Image download and application execution

Contribution to PaPP objectives
- Primary objective 2: Portability of performance across at least two hardware platforms for the application use cases.
  - Portability from x86_64 to the Leon platform
- Primary objective 3: Portability of the software stack across application domains.
  - Portability to the aerospace domain
- Primary objective 4: Software developer productivity is increased.
  - OpenMP demands less specialized programmer skill than other parallel programming paradigms

Next year steps
- Parallelization of BPE (second part of the algorithm)
- Use of OpenMP tasks (currently only for-loop parallelization)
- Better integration with WP3 tools for performance predictability
- Evaluation of results
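To show how OpenMP tasks differ from for-loop parallelization, the following is a minimal sketch in C, not the project's planned implementation. It assumes the work splits into independent units; the names `segment_max_abs` and `process_segments_tasks` are hypothetical, and the per-unit work (largest coefficient magnitude) is a placeholder rather than real BPE logic. One thread creates a task per unit and the rest of the team executes them, which can balance uneven unit costs better than a static loop split.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-segment work: largest coefficient magnitude.
 * Placeholder only; it does not implement any part of the BPE. */
static int segment_max_abs(const int *coef, size_t len)
{
    int m = 0;
    for (size_t i = 0; i < len; i++) {
        int v = coef[i] < 0 ? -coef[i] : coef[i];
        if (v > m)
            m = v;
    }
    return m;
}

/* One task per segment: a single thread spawns the tasks and any
 * thread of the team may run them. Each task writes only its own
 * result slot, so no locking is needed. */
void process_segments_tasks(const int *coef, size_t seg_len,
                            size_t nseg, int *result)
{
    assert(seg_len > 0);
    #pragma omp parallel
    #pragma omp single
    {
        for (size_t s = 0; s < nseg; s++) {
            #pragma omp task firstprivate(s) shared(coef, result)
            result[s] = segment_max_abs(coef + s * seg_len, seg_len);
        }
        /* Wait for all spawned tasks before leaving the region. */
        #pragma omp taskwait
    }
}
```

As with the for-loop form, the code still runs serially when OpenMP is disabled, because the pragmas are then ignored.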