TRACEREP: GATEWAY FOR SHARING AND COLLECTING TRACES IN HPC SYSTEMS Iván Pérez Enrique Vallejo José Luis Bosque University of Cantabria TraceRep IWSG'15.

Presentation transcript:

TRACEREP: GATEWAY FOR SHARING AND COLLECTING TRACES IN HPC SYSTEMS
Iván Pérez, Enrique Vallejo, José Luis Bosque
University of Cantabria
IWSG'15

Overview
- HPC Traces - Introduction
  - Traces for Application Developers
  - Traces for Computer Architects
  - Traces - Objections
  - Goals
- BSC Trace Tools
  - Extrae
  - Paraver
- TraceRep
  - Architecture
  - Design
  - Implementation
  - Limitations
  - Snapshots
- Conclusions and Future Work

1. HPC Traces – Introduction
HPC traces are sequences of events and messages recorded during the execution of a parallel HPC program.
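As a rough illustration of that definition, a trace can be modeled as a time-ordered list of records, one per event or message. The record layout below (field names, event labels, time unit) is hypothetical and is not the format written by Extrae or read by Paraver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    time_ns: int         # timestamp since program start (assumed unit)
    rank: int            # process that emitted the record
    kind: str            # "event" or "message"
    label: str           # e.g. "compute_begin", "MPI_Send"
    peer: int = -1       # destination rank for messages, -1 for events
    size_bytes: int = 0  # payload size for messages


def merge_per_rank(streams):
    """Merge per-rank record lists into one globally time-ordered trace."""
    merged = [rec for stream in streams for rec in stream]
    return sorted(merged, key=lambda r: (r.time_ns, r.rank))


# Made-up records for two processes.
rank0 = [
    TraceRecord(0, 0, "event", "compute_begin"),
    TraceRecord(120, 0, "message", "MPI_Send", peer=1, size_bytes=4096),
]
rank1 = [
    TraceRecord(0, 1, "event", "compute_begin"),
    TraceRecord(150, 1, "event", "wait_begin"),
]

trace = merge_per_rank([rank0, rank1])
for rec in trace:
    print(rec.time_ns, rec.rank, rec.kind, rec.label)
```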

1.1. Traces for Application Developers
Traces support the evaluation, tuning and optimization of applications.
[Figure labels: Computation, Synchronization Waits, Point-to-Point Messages, Load Imbalance]
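To make the load-imbalance idea concrete, the sketch below sums the time each process spends in each state and reports a simple load-balance ratio (average compute time divided by maximum compute time, so 1.0 means perfectly balanced). The interval tuples and the metric are assumptions chosen for illustration; they are not taken from the slides or from any BSC tool.

```python
from collections import defaultdict

# (rank, state, start, end) intervals; the values are made up.
intervals = [
    (0, "compute", 0, 80), (0, "sync_wait", 80, 100),
    (1, "compute", 0, 100),
    (2, "compute", 0, 60), (2, "sync_wait", 60, 100),
]


def time_per_state(intervals):
    """Accumulate, for every rank, the total time spent in each state."""
    totals = defaultdict(lambda: defaultdict(int))
    for rank, state, start, end in intervals:
        totals[rank][state] += end - start
    return totals


def load_balance(totals):
    """Average compute time divided by the maximum compute time."""
    compute = [states.get("compute", 0) for states in totals.values()]
    return (sum(compute) / len(compute)) / max(compute)


totals = time_per_state(intervals)
balance = load_balance(totals)
print(f"load balance: {balance:.2f}")
```

A value well below 1.0 suggests that some processes wait at synchronization points while the slowest one is still computing.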

1.2. Traces for Computer Architects
- Evaluate computer architectures.
- Workloads for feeding simulators.
[Figure labels: Application Binaries, Application Execution, Extraction Tool, Simulator, Hardware model 1, Hardware model 2, Hardware model 3, Stats 1, Stats 2, Stats 3]
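The workflow on this slide, one trace replayed against several hardware models to obtain one set of statistics per model, can be sketched as follows. The latency/bandwidth cost model, the class and function names, and all numbers are assumptions for illustration only; real simulators are far more detailed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareModel:
    name: str
    latency_us: float              # assumed per-message network latency
    bandwidth_bytes_per_us: float  # assumed network bandwidth


def simulate(trace, model):
    """Replay the messages of a trace on one model and return its stats."""
    total = 0.0
    messages = 0
    for kind, size_bytes in trace:
        if kind == "message":
            total += model.latency_us + size_bytes / model.bandwidth_bytes_per_us
            messages += 1
    return {"model": model.name, "messages": messages, "comm_time_us": total}


# The same trace is fed to every hardware model.
trace = [("compute", 0), ("message", 1000), ("message", 4000)]
models = [
    HardwareModel("model-1", latency_us=1.0, bandwidth_bytes_per_us=1000.0),
    HardwareModel("model-2", latency_us=0.5, bandwidth_bytes_per_us=2000.0),
    HardwareModel("model-3", latency_us=2.0, bandwidth_bytes_per_us=500.0),
]
stats = [simulate(trace, m) for m in models]
for s in stats:
    print(s)
```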

1.3. Traces - Objections
- Complexity of tools and environment.
- Limited access to HPC clusters.
- Traces can reach very large sizes.
- Traces are often not shared between researchers.
- Traces are hard to obtain and distribute.
- The tracing effort is not recognized.

1.4. TraceRep - Goals
- User-friendly interface to collect traces.
- Support for multiple clusters.
- Easy to incorporate new clusters.
- Public trace repository:
  - Computer architects can access traces of parallel applications for their experiments.
  - Users can upload their own traces for the community.
- Author encouragement:
  - Authorship: users can set Creative Commons licenses that protect the authorship of their traces.
  - Citation of related work: users can add a citation (.bib file) of a paper that studied the traced application, so it can be cited when the trace is used.
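The authorship and citation goals suggest that each repository entry carries a license and an optional citation alongside the trace itself. The sketch below shows one possible shape for such an entry; the class, its fields and the example values are hypothetical and do not describe TraceRep's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    title: str
    author: str
    license: str          # e.g. a Creative Commons license name
    bibtex_key: str = ""  # key of the uploaded .bib entry, if any


def attribution(entry):
    """Build the attribution line a user of the trace would reproduce."""
    text = f"{entry.title} by {entry.author}, licensed under {entry.license}"
    if entry.bibtex_key:
        text += f"; please cite [{entry.bibtex_key}]"
    return text


# Made-up entry for illustration.
entry = TraceEntry("Example stencil trace", "A. Researcher", "CC BY 4.0", "researcher2015")
print(attribution(entry))
```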

2.1. Extrae
- Collects information during the program execution and generates traces: runtime entries and exits, hardware counters, user functions, periodic samples…
- Supported programming models: MPI, OpenMP, CUDA, OpenCL, pthreads, OmpSs, Java, Python.
- Supported platforms: Linux clusters, BlueGene/Q, Cray, NVIDIA GPUs, Intel Xeon Phi, ARM, Android.
[Figure: Extrae configuration file]

2.2. BSC Tools - Paraver
A very flexible visualization tool for trace files.

3.1. TraceRep - Architecture

3.2. TraceRep - Design

3.3. TraceRep - Implementation
- Drupal’s modules covered most of the features.
- The trace extraction service has implementations on both sides:
  - Gateway side: a new Drupal module.
  - Cluster side: Python scripts adapted to the specific cluster.
[Figure labels: Drupal, Cluster, Trace Extraction, Experiment, Periodic Task, Cluster Filesystem, TraceRep directory, Compilation Tools, Extrae, Resource Manager, Makefile, Scripts, "Is the experiment over?"]
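The "Periodic Task" and "Is the experiment over?" labels in the figure suggest a polling loop. The sketch below shows one pass of such a task, assuming that a cluster-side script signals completion by creating a marker file in the experiment's directory. The marker name, the directory layout and the function names are hypothetical; the slides do not say how TraceRep detects completion.

```python
import tempfile
from pathlib import Path

DONE_MARKER = "experiment.done"  # hypothetical marker written by a cluster-side script


def experiment_is_over(exp_dir: Path) -> bool:
    """An experiment counts as finished once its marker file exists."""
    return (exp_dir / DONE_MARKER).exists()


def periodic_task(pending):
    """One polling pass: split experiments into finished and still running."""
    finished = [d for d in pending if experiment_is_over(d)]
    running = [d for d in pending if not experiment_is_over(d)]
    return finished, running


# Demo with two temporary experiment directories; only the first has finished.
with tempfile.TemporaryDirectory() as root:
    exp_a = Path(root) / "exp-a"
    exp_b = Path(root) / "exp-b"
    exp_a.mkdir()
    exp_b.mkdir()
    (exp_a / DONE_MARKER).touch()

    finished, running = periodic_task([exp_a, exp_b])
    finished_names = [d.name for d in finished]
    running_names = [d.name for d in running]

print("finished:", finished_names, "running:", running_names)
```

Polling keeps the gateway decoupled from each cluster's resource manager, at the cost of a delay of up to one polling interval before a finished trace is picked up.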

3.4. TraceRep – Current prototype limitations
- Security: TraceRep users upload code to the HPC clusters.
  Alternatives:
  - Restricted privileges for the TraceRep user account.
  - Require a per-user cluster account to extract traces.
- Compilation:
  - Paths to compilers and libraries can vary from cluster to cluster.
  - Compilation constraints: a generic Makefile is currently used for all source codes, so applications that use complex build tools are currently not supported.
  Alternative: provide a unified environment for compilation.
- Storage: storage on the gateway server is limited (a limitation of the service used).
  Alternative: $$$
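One way to cope with compiler and library paths that differ between clusters while still using a single generic Makefile is to keep a small profile per cluster and pass its values as variable overrides on the make command line. The sketch below only builds that command; the cluster names, paths and Makefile variable names are made up, and this is not how the TraceRep prototype is known to work.

```python
# Hypothetical per-cluster profiles; every name and path here is invented.
CLUSTERS = {
    "cluster-a": {"CC": "/opt/mpi/bin/mpicc", "EXTRAE_HOME": "/opt/extrae"},
    "cluster-b": {"CC": "/usr/local/bin/mpicc", "EXTRAE_HOME": "/apps/extrae"},
}


def make_command(cluster, target="all"):
    """Build a make invocation that overrides Makefile variables per cluster."""
    profile = CLUSTERS[cluster]
    overrides = [f"{name}={value}" for name, value in sorted(profile.items())]
    return ["make", target, *overrides]


print(" ".join(make_command("cluster-a")))
print(" ".join(make_command("cluster-b")))
```

The resulting list could be handed to `subprocess.run` by a cluster-side script, so adding a cluster would only require a new profile entry.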

3.5. Snapshots

4. TraceRep – Conclusions
- Traces are very useful for HPC parallel application developers and computer architects.
- TraceRep provides a user-friendly interface to collect and share traces.
- It encourages users to share traces through trace licensing and citations.
- Some limitations regarding security, compilation and storage must still be addressed.

4. TraceRep – Future work
- Alternative frameworks to replace the Drupal prototype: Liferay [1], Apache Airavata [2].
- Improve the compilation toolchain to present a consistent view across different clusters and to support more complex codes.
- Exploiting the advanced features of Paraver is complex; we are looking for a way to integrate Paraver into TraceRep.

[1] “Liferay,” Available:
[2] “Apache Airavata architecture overview,” Available:

Thank you for your attention