Integration and Application of the TAU Performance System in Parallel Java Environments
Sameer Shende, Allen D. Malony
Computer & Information Science Department, Computational Science Institute, University of Oregon

SMPAG Java Interest Group May 24, 2002

Java HPC and Performance Technology
- Interest in performance tools for Java HPC
  - Shared- and distributed-memory parallelism
  - Multi-level (semantic) performance views
- Java environment challenges performance technology
  - Language and packages
    - object-oriented, interfaces, RMI, reflection, …
  - Java Virtual Machine (JVM) execution model
    - thread mapping, scheduling, SMP execution, event access
  - Just-In-Time (JIT) compilation and dynamic loading
  - Java Native Interface (JNI)
    - inter-language execution, non-Java events / execution
  - Portability of performance tools and methods

Research Problems
- General
  - How to create robust and ubiquitous performance technology for the analysis and tuning of parallel high-performance software and systems in the presence of (evolving) complexity challenges?
- Specific
  - Can performance technology developed for use in HPC environments be successfully applied to parallel Java environments, and how are the new performance instrumentation, measurement, and analysis problems addressed?

Talk Outline
- Java HPC and Performance Technology
- TAU Performance System
  - Computation model for performance technology
  - TAU performance system toolkit
- Target HPC Java Environment
  - SMP clusters and distributed computing
  - Multi-threading + MPI message passing
- Integration (Adaption) of TAU Performance System
  - User-level, JVM-level, JNI-level, inter-language
- Example “Mixed-Mode” Application
- Conclusions

TAU Performance System
- Tuning and Analysis Utilities
- Performance system framework
  - scalable parallel and distributed HPC
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated performance toolkit
  - instrumentation, measurement, analysis, visualization
  - Portable facility based on open software approach
- Robust and widely applied

General Complex System Computation Model
- Node: physically distinct shared memory machine
  - Message passing node interconnection network
- Context: distinct virtual memory space within node
- Thread: execution threads (user/system) in context
[Figure: physical view and model view of the computation model; labels: memory, Node, VM space, Context, SMP, Threads, node memory, Interconnection Network, Inter-node message communication]
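
As a rough illustration of the node / context / thread model, the sketch below keys profile data by a three-part location and sums it per node. All names in it (`ComputationModelSketch`, `Location`, `totalForNode`) are hypothetical and are not part of TAU; the microsecond values are made-up sample data.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical sketch: addressing profile data by (node, context, thread). */
final class ComputationModelSketch {

    /** Identifies one thread of execution in the node / context / thread model. */
    static final class Location implements Comparable<Location> {
        final int node;     // physically distinct shared-memory machine
        final int context;  // distinct virtual memory space within the node
        final int thread;   // execution thread within the context

        Location(int node, int context, int thread) {
            this.node = node;
            this.context = context;
            this.thread = thread;
        }

        @Override
        public int compareTo(Location o) {
            if (node != o.node) return Integer.compare(node, o.node);
            if (context != o.context) return Integer.compare(context, o.context);
            return Integer.compare(thread, o.thread);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Location && compareTo((Location) o) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(node, context, thread);
        }
    }

    /** Accumulated time in microseconds per location (illustrative store). */
    private final Map<Location, Long> micros = new TreeMap<>();

    void add(Location where, long elapsedMicros) {
        micros.merge(where, elapsedMicros, Long::sum);
    }

    /** Sums the time of every thread that lives on the given node. */
    long totalForNode(int node) {
        long sum = 0;
        for (Map.Entry<Location, Long> e : micros.entrySet()) {
            if (e.getKey().node == node) sum += e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        ComputationModelSketch profile = new ComputationModelSketch();
        profile.add(new Location(0, 0, 0), 120);
        profile.add(new Location(0, 0, 1), 80);
        profile.add(new Location(1, 0, 0), 50);
        System.out.println("node 0 total: " + profile.totalForNode(0)); // prints "node 0 total: 200"
    }
}
```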

TAU Performance System Framework
[Figure: framework diagram; recoverable labels: EPILOG, Paraver]

Target HPC Java Environment
- Hybrid, multi-language scientific applications
  - Java + {C, C++, Fortran} libraries
  - Numerical, system, communications support
  - Performance optimization
- Mixed-mode parallelism
  - Multi-threaded shared memory parallelism
  - Distributed memory parallelism using communications
- Cluster of SMP nodes
  - Scalable parallelism
  - Distributed

Performance Technology Issues
- Object-oriented programming
  - Object-based performance analysis
  - High-level classes and performance mapping
- Multi-level performance events
  - User / source / byte code / VM / OS / libraries / external
  - Multiple performance instrumentation strategies
  - Integration of performance measurements
- Mixed-mode parallel computation
  - Multi-threading performance measurement
  - Cross-mode performance correspondence
- Hybrid, multi-language performance measurement

Java Source-Level Instrumentation
- TAU Java package
- User-defined events
- TAU.Profile class for new “timers”
  - Start/Stop
- Performance data output at end
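
The start/stop timer idea can be sketched in plain Java as follows. This is a minimal stand-in under stated assumptions, not the `TAU.Profile` API: the class `SketchTimer`, its methods, and the report format are all hypothetical, and the data is kept on the Java side rather than passed through JNI. Like the source-level approach described here, it records no thread or node identity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a source-level timer; not the TAU.Profile API. */
final class SketchTimer {
    // Shared registry so all timers can be reported together at program end.
    private static final Map<String, SketchTimer> REGISTRY = new LinkedHashMap<>();

    private final String name;
    private boolean running = false;
    private long startNanos;
    private long totalNanos = 0;
    private int calls = 0;

    private SketchTimer(String name) {
        this.name = name;
    }

    /** Returns the timer registered under this name, creating it on first use. */
    static synchronized SketchTimer named(String name) {
        return REGISTRY.computeIfAbsent(name, SketchTimer::new);
    }

    void start() {
        if (running) throw new IllegalStateException(name + " is already running");
        running = true;
        startNanos = System.nanoTime();
    }

    void stop() {
        if (!running) throw new IllegalStateException(name + " is not running");
        totalNanos += System.nanoTime() - startNanos;
        running = false;
        calls++;
    }

    int calls() { return calls; }

    long totalNanos() { return totalNanos; }

    /** Prints one line per timer; intended to be called once at program end. */
    static synchronized void report() {
        for (SketchTimer t : REGISTRY.values()) {
            System.out.printf("%-12s calls=%d total=%.3f ms%n",
                    t.name, t.calls, t.totalNanos / 1e6);
        }
    }

    public static void main(String[] args) {
        SketchTimer timer = SketchTimer.named("compute");
        double sink = 0;
        for (int i = 0; i < 3; i++) {
            timer.start();
            for (int k = 1; k < 100_000; k++) sink += Math.sqrt(k); // measured work
            timer.stop();
        }
        System.out.println("checksum: " + sink);
        report();
    }
}
```

A single timer instance here is not safe to share between threads; that gap is one reason the later slides move to per-thread measurement.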

TAU Java Source Instrumentation Architecture
- Any code section can be measured
- Portability
- Measurement options
  - Profiling, tracing
- Limitations
  - Source access only
  - Lack of thread information
  - Lack of node information
[Figure: architecture diagram; labels: Java program, TAU.Profile class (init, data, output), TAU package, Profile database stored in JVM heap, TAU as dynamic shared object, JNI C bindings, Profile DB, JNI, TAU]

Multi-Threading Performance Measurement
- General issues
  - Thread identity and per-thread data storage
  - Performance measurement support and synchronization
- Fine-grained parallelism
  - different forms and levels of threading
  - greater need for efficient instrumentation
- TAU general threading and measurement model
  - Common thread layer and measurement support
  - Interface to system specific libraries (reg, id, sync)
- Target different thread systems with core functionality
  - Pthreads, Windows, Java, OpenMP
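
The "thread identity and per-thread data storage" issue can be illustrated with a small Java sketch. It assumes a design in which each thread registers once, receives a dense integer id, and then updates only its own table, so the measurement path needs no lock. The class `PerThreadProfileSketch` and its methods are hypothetical and do not describe TAU's actual thread layer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of per-thread profile storage; not TAU's thread layer. */
final class PerThreadProfileSketch {
    private final AtomicInteger nextId = new AtomicInteger();
    private final Map<Integer, Map<String, Long>> tables = new ConcurrentHashMap<>();

    // Registration: the first event on a thread assigns its id and creates its table.
    private final ThreadLocal<Integer> myId = ThreadLocal.withInitial(() -> {
        int id = nextId.getAndIncrement();
        tables.put(id, new HashMap<>());
        return id;
    });

    /** Counts one occurrence of an event on the calling thread's private table. */
    void count(String event) {
        tables.get(myId.get()).merge(event, 1L, Long::sum);
    }

    int threadCount() {
        return tables.size();
    }

    /** Merges all per-thread tables; call only after the worker threads have been joined. */
    Map<String, Long> merged() {
        Map<String, Long> out = new TreeMap<>();
        for (Map<String, Long> table : tables.values()) {
            for (Map.Entry<String, Long> e : table.entrySet()) {
                out.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return out;
    }

    /** Runs the given number of workers, each counting "step" events, and waits for them. */
    static PerThreadProfileSketch runWorkers(int threads, int eventsPerThread) {
        PerThreadProfileSketch profile = new PerThreadProfileSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < eventsPerThread; k++) profile.count("step");
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return profile;
    }

    public static void main(String[] args) {
        PerThreadProfileSketch profile = runWorkers(4, 1000);
        System.out.println(profile.threadCount() + " threads, events: " + profile.merged());
    }
}
```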

Virtual Machine Performance Instrumentation
- Integrate performance system with VM
  - Captures robust performance data (e.g., thread events)
  - Maintain features of environment
    - portability, concurrency, extensibility, interoperation
  - Allow use in optimization methods
- JVM Profiling Interface (JVMPI)
  - Generation of JVM events and hooks into JVM
  - Profiler agent (TAU) loaded as shared object
    - registers events of interest and address of callback routine
  - Access to information on dynamically loaded classes
  - No need to modify Java source, bytecode, or JVM

JVMPI Events
- Method transition events
- Memory events
- Heap arena events
- Garbage collection events
- Class events
- Global reference events
- Monitor events
- Monitor wait events
- Thread events
- Dump events
- Virtual machine events

TAU Java JVM Instrumentation Architecture
[Figure: architecture diagram; labels: JVMPI, Thread API, Event notification, Java program, Profile DB, JNI, TAU]
- Robust set of events
- Portability
- Access to thread info
- Measurement options
- Limitations
  - Overhead
  - Many events
  - Event control
  - No user-defined events

Java Multi-Threading Performance (Test Case)
- Profile and trace Java (JDK 1.2+) applications
- Observe user-level and system-level threads
- Observe events for different Java packages
  - /lang, /io, /awt, …
- Test application
  - SciVis, NPAC, Syracuse University

  % ./configure -jdk=
  % setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH\: / /lib
  % java -XrunTAU svserver

TAU Profiling of Java Application (SciVis)
[Figure: profile display; annotations: Profile for each Java thread; Captures events for different Java packages; 24 threads of execution!]

TAU Tracing of Java Application (SciVis)
[Figure: trace display; annotations: Performance groups; Timeline display; Parallelism view]

Vampir Dynamic Call Tree View (SciVis)
[Figure: call tree display; annotations: Per thread call tree; Annotated performance; Expanded call tree]

Message Communications Performance
- Explicit message communications libraries for Java
- MPI performance measurement
  - MPI profiling interface - link-time interposition library
  - TAU wrappers in native profiling interface library
  - Send/Receive events and communication statistics
- mpiJava (Syracuse, JavaGrande, 1999)
  - Java wrapper package
  - JNI C bindings to MPI communication library
  - Dynamic shared object (libmpijava.so) loaded in JVM
  - prunjava calls mpirun to distribute program to nodes
  - Contrast to Java RMI-based schemes (MPJ, CCJ)

TAU Java Instrumentation Architecture
- No source instrumentation
- Portability
- Measurement options
- Limitations
  - MPI events only
  - No mpiJava events
  - Node info only
  - No thread info
[Figure: architecture diagram; labels: Java program, mpiJava package, Native MPI library, JNI, TAU package, Profile DB, TAU, MPI profiling interface, TAU wrapper, Native MPI library]

Mixed-mode Parallel Programs (Java + MPI)
- Java threads and MPI communications
  - Shared-memory multi-threading events
  - Message communications events
- Unified performance measurement and views
  - Integration of performance mechanisms
  - Integrated association of performance events
    - thread event and communication events
    - user-defined (source-level) performance events
    - JVM events
- Support for performance measurement scaling
- Support for performance data access

Instrumentation and Measurement Cooperation
- Problem
  - JVMPI doesn’t see MPI events (e.g., rank (node))
  - MPI profiling interface doesn’t see threads
  - Source instrumentation doesn’t see either!
- Need cooperation between interfaces
  - MPI exposes rank, gets thread information
  - JVMPI exposes thread information, gets rank
  - Source instrumentation gets both
  - Post-mortem matching of sends and receives
- Selective instrumentation
  - java -XrunTAU:exclude=java/io,sun
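
One way to picture selective instrumentation with an option such as `exclude=java/io,sun` is a prefix filter over slash-separated class names. The sketch below is only that picture: the class `ExcludeFilterSketch` is hypothetical, and the matching rule (whole path components, so `sun` does not exclude `sunrise/App`) is an assumption rather than documented TAU behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of prefix-based event exclusion; the matching rule is assumed. */
final class ExcludeFilterSketch {
    private final List<String> excludedPrefixes = new ArrayList<>();

    /** Parses an option string such as "exclude=java/io,sun"; anything else excludes nothing. */
    ExcludeFilterSketch(String options) {
        String key = "exclude=";
        if (options != null && options.startsWith(key)) {
            for (String prefix : options.substring(key.length()).split(",")) {
                if (!prefix.isEmpty()) excludedPrefixes.add(prefix);
            }
        }
    }

    /** Returns true if the class (slash form, e.g. "java/io/File") should be measured. */
    boolean shouldInstrument(String className) {
        for (String prefix : excludedPrefixes) {
            // Match whole path components only (assumed rule).
            if (className.equals(prefix) || className.startsWith(prefix + "/")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ExcludeFilterSketch filter = new ExcludeFilterSketch("exclude=java/io,sun");
        String[] samples = {"java/io/File", "java/lang/String", "sun/misc/Unsafe", "sunrise/App"};
        for (String name : samples) {
            System.out.println(name + " -> " + filter.shouldInstrument(name));
        }
    }
}
```

Filtering at this level reduces the number of events that reach the measurement layer, which addresses the overhead and "many events" limitations listed for JVM-level instrumentation.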

TAU Java Instrumentation Architecture
[Figure: combined architecture diagram; labels: JVMPI, Thread API, Event notification, Java program, TAU package, mpiJava package, MPI profiling interface, TAU wrapper, Native MPI library, Profile DB, JNI, TAU]

Parallel Java Game of Life (Profile)
- mpiJava testcase
- 4 nodes, 28 threads
[Figure: profile displays for Node 0, Node 1, Node 2; annotations: Thread 4 executes all MPI routines; Merged Java and MPI event profiles]

Parallel Java Game of Life (Trace)
- Integrated event tracing
- Merged trace viz
- Node process grouping
- Thread message pairing
- Vampir display
- Multi-level event grouping
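
Pairing messages in a merged trace can be sketched as a post-mortem pass that matches each send with a receive. The version below assumes that messages with the same source, destination, and tag are matched in first-in, first-out order, and it tolerates a receive appearing before its send in the merged stream. The event fields and the class `MessagePairingSketch` are hypothetical and do not reflect TAU's or Vampir's trace formats.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of post-mortem send/receive matching in a merged trace. */
final class MessagePairingSketch {

    static final class Event {
        final boolean isSend;
        final int src, dst, tag;
        final long time;

        Event(boolean isSend, int src, int dst, int tag, long time) {
            this.isSend = isSend;
            this.src = src;
            this.dst = dst;
            this.tag = tag;
            this.time = time;
        }
    }

    static final class Pair {
        final Event send, recv;

        Pair(Event send, Event recv) {
            this.send = send;
            this.recv = recv;
        }
    }

    /** Matches sends and receives FIFO per (src, dst, tag); unmatched events are dropped. */
    static List<Pair> match(List<Event> trace) {
        Map<String, Deque<Event>> pendingSends = new HashMap<>();
        Map<String, Deque<Event>> pendingRecvs = new HashMap<>();
        List<Pair> pairs = new ArrayList<>();
        for (Event e : trace) {
            String key = e.src + ">" + e.dst + "#" + e.tag;
            Map<String, Deque<Event>> mine = e.isSend ? pendingSends : pendingRecvs;
            Map<String, Deque<Event>> other = e.isSend ? pendingRecvs : pendingSends;
            Deque<Event> waiting = other.get(key);
            if (waiting != null && !waiting.isEmpty()) {
                Event partner = waiting.removeFirst();  // oldest unmatched counterpart
                pairs.add(e.isSend ? new Pair(e, partner) : new Pair(partner, e));
            } else {
                mine.computeIfAbsent(key, k -> new ArrayDeque<>()).addLast(e);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Event> trace = new ArrayList<>();
        trace.add(new Event(true, 0, 1, 7, 10));   // send  0 -> 1
        trace.add(new Event(false, 0, 1, 7, 15));  // recv  0 -> 1
        trace.add(new Event(true, 0, 1, 7, 20));   // send  0 -> 1
        trace.add(new Event(false, 0, 1, 7, 30));  // recv  0 -> 1
        trace.add(new Event(true, 1, 0, 7, 40));   // send  1 -> 0, never received
        for (Pair p : match(trace)) {
            System.out.println("send@" + p.send.time + " -> recv@" + p.recv.time);
        }
    }
}
```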

Node / Thread Event Timeline
- Temporal event behavior
- Event relationships

Integrated Performance View (Callgraph)
- Source level
- MPI level
- Java packages level

Conclusion
- Integrate robust and portable performance system (TAU) in Java HPC environment
- Apply performance system to observe multiple levels of Java HPC operation
- Leverage performance system framework based on common performance measurement API
- Key: define multi-level events and define associations
- Opportunities for improvement and application
  - JVM instrumentation and JIT (dynamic compilation)
  - Runtime access to performance data
  - Java scientific packages, communication libraries (CCJ, MPJ, RMI), parallel compilers (JOMP), applications, …

More Information and Acknowledgments
- URLs
  - TAU:
- Grant support (TAU)
  - DOE 2000 ACTS
  - DOE ASCI Level 3 (LANL, LLNL)
  - DARPA

TAU Distributed Monitoring Framework
- Extend usability of TAU performance analysis
- Access TAU performance data during execution
- Framework model
  - each application context is a performance data server
  - monitor agent thread is created within each context
  - client processes attach to agents and request data
  - server thread synchronization for data consistency
  - pull mode of interaction
- Distributed TAU performance data space
- “A Runtime Monitoring Framework for the TAU Profiling System” (ISCOPE ’99)
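
The pull mode of interaction in the framework model can be sketched within a single JVM as follows. An in-process queue stands in for the connection between a client process and a monitor agent, which is a simplification: the framework described here has separate client processes attaching to agents. The class `PullMonitorSketch` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical in-process sketch of a pull-mode monitor agent. */
final class PullMonitorSketch {
    private final Map<String, Long> profile = new HashMap<>();
    // Each request carries the queue on which the client expects its reply.
    private final BlockingQueue<BlockingQueue<Map<String, Long>>> requests =
            new ArrayBlockingQueue<>(16);

    /** Called by application threads while they run. */
    void record(String event, long micros) {
        synchronized (profile) {
            profile.merge(event, micros, Long::sum);
        }
    }

    /** Starts the agent: a daemon thread that answers each request with a consistent copy. */
    Thread startAgent() {
        Thread agent = new Thread(() -> {
            try {
                while (true) {
                    BlockingQueue<Map<String, Long>> replyTo = requests.take();
                    Map<String, Long> copy;
                    synchronized (profile) {   // same lock as record(), so the copy is consistent
                        copy = new HashMap<>(profile);
                    }
                    replyTo.put(copy);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // interrupting the agent shuts it down
            }
        }, "monitor-agent");
        agent.setDaemon(true);
        agent.start();
        return agent;
    }

    /** Client side of the pull: send a request, then block until the agent replies. */
    Map<String, Long> pull() {
        try {
            BlockingQueue<Map<String, Long>> reply = new ArrayBlockingQueue<>(1);
            requests.put(reply);
            return reply.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the agent", e);
        }
    }

    public static void main(String[] args) {
        PullMonitorSketch context = new PullMonitorSketch();
        context.startAgent();
        context.record("compute", 120);
        context.record("compute", 30);
        System.out.println("pulled: " + context.pull());
    }
}
```

Because the agent only responds to requests, the application pays for data transfer only when a client asks, which is the usual argument for pull over push in this setting.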

TAU Distributed Monitor Architecture
- Each context has a monitor agent
- Client in separate thread directs agent
- Pull model of interaction
[Figure: monitor architecture diagram; label: TAU profile database]

Java Implementation of TAU Monitor
- Motivations
  - More portable monitor middleware system (RMI)
  - More flexible and programmable server interface (JNI)
  - More robust client development (EJB, JDBC, Swing)