I/O Coordination ER25948 Patricia J Teller and Sarala Arunagiri The University of Texas-El Paso 1.

Slides:



Advertisements
Similar presentations
Phillip Dickens, Department of Computer Science, University of Maine. In collaboration with Jeremy Logan, Postdoctoral Research Associate, ORNL. Improving.
Advertisements

Experiments on Query Expansion for Internet Yellow Page Services Using Log Mining Summarized by Dongmin Shin Presented by Dongmin Shin User Log Analysis.
Abhinav Bhatele, Laxmikant V. Kale University of Illinois at Urbana-Champaign Sameer Kumar IBM T. J. Watson Research Center.
Workshop on HPC in India Grid Middleware for High Performance Computing Sathish Vadhiyar Grid Applications Research Lab (GARL) Supercomputer Education.
Mohamed Hefeeda 1 School of Computing Science Simon Fraser University, Canada Multimedia Streaming in Dynamic Peer-to-Peer Systems and Mobile Wireless.
Using Metacomputing Tools to Facilitate Large Scale Analyses of Biological Databases Vinay D. Shet CMSC 838 Presentation Authors: Allison Waugh, Glenn.
MASPLAS ’02 Creating A Virtual Computing Facility Ravi Patchigolla Chris Clarke Lu Marino 8th Annual Mid-Atlantic Student Workshop On Programming Languages.
Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal and MULTICLUSTAL Arunesh Mishra CMSC 838 Presentation Authors : Dmitri Mikhailov,
1 | Program Name or Ancillary Texteere.energy.gov Water Power Peer Review 51-Mile Hydroelectric Power Project PI: Jim Gordon Earth By Design, Inc.
UNIVERSITY of MARYLAND GLOBAL LAND COVER FACILITY High Performance Computing in Support of Geospatial Information Discovery and Mining Joseph JaJa Institute.
Abstract Cloud data center management is a key problem due to the numerous and heterogeneous strategies that can be applied, ranging from the VM placement.
TeraGrid Gateway User Concept – Supporting Users V. E. Lynch, M. L. Chen, J. W. Cobb, J. A. Kohl, S. D. Miller, S. S. Vazhkudai Oak Ridge National Laboratory.
CS492: Special Topics on Distributed Algorithms and Systems Fall 2008 Lab 3: Final Term Project.
Preserving the Scientific Record: Establishing Relationships with Archives Matthew Mayernik National Center for Atmospheric Research Version 1.0 Review.
Scaling to New Heights Retrospective IEEE/ACM SC2002 Conference Baltimore, MD.
C HU H AI C OLLEGE O F H IGHER E DUCATION D EPARTMENT O F C OMPUTER S CIENCE Preparation of Final Year Project Report Bachelor of Science in Computer Science.
Department of Computer Science Mining Performance Data from Sampled Event Traces Bret Olszewski IBM Corporation – Austin, TX Ricardo Portillo, Diana Villa,
The Center for Autonomic Computing is supported by the National Science Foundation under Grant No NSF CAC Seminannual Meeting, October 5 & 6,
Computer Science Department University of Texas at El Paso PCAT Performance Counter Assessment Team PAPI Development Team SC 2003, Phoenix, AZ – November.
Education 793 Class Notes Welcome! 3 September 2003.
CSC-115 Introduction to Computer Programming
CC&E Best Data Management Practices, April 19, 2015 Please take the Workshop Survey 1.
Performance Model & Tools Summary Hung-Hsun Su UPC Group, HCS lab 2/5/2004.
Presented by Reliability, Availability, and Serviceability (RAS) for High-Performance Computing Stephen L. Scott and Christian Engelmann Computer Science.
1/20 Optimization of Multi-level Checkpoint Model for Large Scale HPC Applications Sheng Di, Mohamed Slim Bouguerra, Leonardo Bautista-gomez, Franck Cappello.
Supercomputing Cross-Platform Performance Prediction Using Partial Execution Leo T. Yang Xiaosong Ma* Frank Mueller Department of Computer Science.
1 Metrics for the Office of Science HPC Centers Jonathan Carter User Services Group Lead NERSC User Group Meeting June 12, 2006.
Profiling Memory Subsystem Performance in an Advanced POWER Virtualization Environment The prominent role of the memory hierarchy as one of the major bottlenecks.
Instrumentation of the SAM-Grid Gabriele Garzoglio CSC 426 Research Proposal.
DV/dt - Accelerating the Rate of Progress towards Extreme Scale Collaborative Science DOE: Scientific Collaborations at Extreme-Scales:
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
Alessandra CiocioMay 11, CSAC meeting1 Mid-Range Computing Working Group Report CSAC and ITSD are working in partnership to determine the value.
Supporting Molecular Simulation-based Bio/Nano Research on Computational GRIDs Karpjoo Jeong Konkuk Suntae.
Presented by Scientific Data Management Center Nagiza F. Samatova Network and Cluster Computing Computer Sciences and Mathematics Division.
The Research Alliance in Math and Science program is sponsored by the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department.
Computer and Computational Sciences Division Los Alamos National Laboratory On the Feasibility of Incremental Checkpointing for Scientific Computing Jose.
Diskless Checkpointing on Super-scale Architectures Applied to the Fast Fourier Transform Christian Engelmann, Al Geist Oak Ridge National Laboratory Februrary,
Scalable Systems Software for Terascale Computer Centers Coordinator: Al Geist Participating Organizations ORNL ANL LBNL.
Chapter 3 System Performance and Models Introduction A system is the part of the real world under study. Composed of a set of entities interacting.
Allen D. Malony Department of Computer and Information Science TAU Performance Research Laboratory University of Oregon Discussion:
GEM: A Framework for Developing Shared- Memory Parallel GEnomic Applications on Memory Constrained Architectures Mucahid Kutlu Gagan Agrawal Department.
1/22 Optimization of Google Cloud Task Processing with Checkpoint-Restart Mechanism Speaker: Sheng Di Coauthors: Yves Robert, Frédéric Vivien, Derrick.
DER Program Weibull Strength Parameter Requirements for Si 3 N 4 Turbine Rotor Reliability Steve Duffy & Eric Baker Connecticut Reserve Technologies, LLC.
Scalable Information Infrastructure and the Research University Community SC99 19 November 1999 Portland, Oregon Douglas Van Houweling President & CEO,
OPERATING SYSTEMS CS 3530 Summer 2014 Systems and Models Chapter 03.
The 2006 REU Summer Symposium marked the end of PSU's 8-week summer undergraduate research program where each participant gave a 15-minute power point.
COMP381 by M. Hamdi 1 Clusters: Networks of WS/PC.
C HU H AI C OLLEGE O F H IGHER E DUCATION D EPARTMENT O F C OMPUTER S CIENCE Preparation of Final Year Project Report Bachelor of Science in Computer Science.
Supercomputing 2006 Scientific Data Management Center Lead Institution: LBNL; PI: Arie Shoshani Laboratories: ANL, ORNL, LBNL, LLNL, PNNL Universities:
Lawrence Livermore National Laboratory 1 Science & Technology Principal Directorate - Computation Directorate Scalable Fault Tolerance for Petascale Systems.
Data Management Practices for Early Career Scientists: Closing Robert Cook Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN.
PEER 2003 Meeting 03/08/031 Interdisciplinary Framework Major focus areas Structural Representation Fault Systems Earthquake Source Physics Ground Motions.
Building PetaScale Applications and Tools on the TeraGrid Workshop December 11-12, 2007 Scott Lathrop and Sergiu Sanielevici.
Presented by Robust Storage Management On Desktop, in Machine Room, and Beyond Xiaosong Ma Computer Science and Mathematics Oak Ridge National Laboratory.
ORNL Site Report ESCC Feb 25, 2014 Susan Hicks. 2 Optical Upgrades.
Presented by SciDAC-2 Petascale Data Storage Institute Philip C. Roth Computer Science and Mathematics Future Technologies Group.
Quarterly Meeting Spring 2007 NSTG: Some Notes of Interest Adapting Neutron Science community codes for TeraGrid use and deployment. (Lynch, Chen) –Geared.
Advanced Algorithms Analysis and Design
CI Updates and Planning Discussion
OPERATING SYSTEMS CS 3502 Fall 2017
Performance Technology for Scalable Parallel Systems
Building a CMMI Data Infrastructure
CHU HAI COLLEGE OF HIGHER EDUCATION DEPARTMENT OF COMPUTER SCIENCE Preparation of Mid-Term Progress Report Bachelor of Science in Computer Science.
Parallelizing Incremental Bayesian Segmentation (IBS)
NSF : CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science PI: Geoffrey C. Fox Software: MIDAS HPC-ABDS.
Unistore: Project Updates
Milind A. Bhandarkar Adaptive MPI Milind A. Bhandarkar
Objective of This Course
ArcGIS Interface to the WRAP Model
Parallel Exact Stochastic Simulation in Biochemical Systems
Presentation transcript:

I/O Coordination ER25948 Patricia J Teller and Sarala Arunagiri The University of Texas-El Paso 1

I/O Coordination ER Extend scalability of checkpoint/restart 2. Reduce I/O system stress & resultant failures 3. Increase meaningful utilization of allocated HEC resources → enhance HEC system performance 2

I/O Coordination ER25948  Problem description:  Via analytical and parametric models, which used parameter sets that describe a few high-end computer systems and a hypothetical application, we demonstrated that there exist potential opportunities for periodic-checkpointing applications to improve their performance by intelligent selection of the checkpoint interval, i.e., the frequency with which checkpointing is performed.  Selection of the checkpoint interval involves a trade-off between execution time and the number of checkpoint I/O operations performed.  Challenge: Prove that such opportunities exist for all values of the models’ parameters  Using our analytical models, in collaboration with an ORNL scientist (Dr. Nagaswara S. V. Rao) mathematically proved that “for all values of the parameters of our analytical models of a periodic-checkpointing application’s execution time and the number of checkpoint I/O operations, regions of opportunity exist that potentially enhance system performance”. Solved a problem identified in our proposal Related paper (for publication) in progress 3

I/O Coordination ER25948  Performed bottleneck analysis of RAID storage systems  Designed, developed, and implemented FAIRIO, an I/O scheduling algorithm that provides differentiated service with respect to disk time utilization of RAID storage utilities  Demonstrated the provisioning of RAID I/O differentiated service under various conditions by performing a set of experiments using a modified version of DiskSim 3.0  Submitting a related paper for publication 4 Designed FAIRIO, an I/O scheduling algorithm that provides differentiated service w.r.t. disk-time utilization of RAID storage utilities Paper being submitted (September) for publication

I/O Coordination ER25948  Established a collaborative working relationship with Texas Advanced Computing Center (TACC), Austin, TX – Drs. Jim Browne, John Hammond, and Dan Stanzione – and John Phillips, NAMD developer  Designing experiments regarding periodic-checkpointing applications, the first two of which will be NAMD and RAxML  NAMD, recipient of a 2002 Gordon Bell Award, is a parallel molecular dynamics code designed for high- performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of processors on high-end parallel platforms and tens of processors on commodity clusters using gigabit ethernet.  RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood-based inference of large phylogenetic trees – it is highly parallelizable.  Analyzing data regarding the experimental system, in particular, MTTI and statistics re: I/O contention  Instrumenting NAMD and RAxMLto record information that can be used to characterize solution time, productive I/O, and defensive I/O  Once the experimental procedure has been tested, we will collaborate with the bigger users of TACC systems to conduct experiments using their codes (we will target codes of interest to DoE) and then will establish collaboration with DoE Labs (likely ORNL, who we have already visited, and LLNL) 5 Almost ready to commence experiments using NAMD and RAxML that, hopefully, will validate the results of our analytical models

I/O Coordination ER25948  Posters  "Using Mathematical Modeling to Enhance the Scalability of Checkpointing Applications," Sarala Arunagiri, John Daly, Patricia J. Teller, and Ricardo Portillo, poster presented by Sarala Arunagiri at the 2010 DOE Applied Mathematics Program Meeting, 3-4 May 2010, Berkeley, CA.  Presentations  “HEC I/O Coordination based on Judicious Checkpointing and I/O QoS,” Patricia J. Teller (PI), DoE Joint Mathematics/Computer Science Institute Kickoff Meeting, SC09, Portland, Oregon, 17 November  “HEC I/O Coordination based on Judicious Checkpointing and I/O QoS (Extended version),” Patricia J. Teller (PI), The National Center for Computational Sciences (NCCS) Seminar Series, Oak Ridge National Laboratory, Oak Ridge, TN, 10 October Note that this visit to Oak Ridge National Laboratory, which was made by both Drs. Teller and Arunagiri (Co-PI) included meetings with many different research groups. We are near a point in our research where it makes sense to follow-up on this visit and suggest possible collaborative activities.  Publications in preparation  “FAIRIO: An Algorithm for I/O Performance Differentiation,” Sarala Arunagiri, Yipkei Kwok, Seetharami R. Seelam, Patricia J. Teller, and Ricardo Portillo, submitting in September