SDN-SF LANL Tasks

LANL Research Tasks
- Explore parallel file system networking (e.g., LNet peer credits) in order to give preferential treatment to isolated routes within the storage area network. This may be done by adding additional virtual routes to existing LNet routers. Further, LANL will expose this tuning as a service that can be invoked by a custom software-based network controller (a sketch of such a service follows this list).
- Explore dynamically adjusting the Lustre network request scheduler to allow preferential en-queuing of storage operations.
- Replicate the SDN-SF rack at the local site and collaborate with ORNL in developing and customizing the emulation test bed.
- Evaluate the role of Data Center TCP in ensuring high-performance flows can be constructed within the data center infrastructure, and extend the concepts within Data Center TCP to high-speed networking technology. In particular, LANL plans to develop a practical reservation-aware congestion control for Data Center TCP, and then extend Data Center TCP techniques to the Lustre networking (LNet) protocol to alleviate bottlenecks (e.g., the parking lot problem).
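As a rough sketch of what "exposing this tuning as a service" could look like, the example below wraps a hypothetical site helper script behind a small HTTP endpoint that a software-based network controller could call. The endpoint path, request format, and helper-script name are assumptions made for illustration, not the project's actual interface.

```python
# Minimal sketch of exposing LNet tuning as a controller-invokable service.
# Everything here is illustrative: the endpoint, the request format, and the
# site-specific helper it shells out to are assumptions, not the real design.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site-specific helper that applies LNet credit settings
# (e.g. by updating module options and restarting routers in a safe order).
APPLY_TUNING = ["/usr/local/sbin/apply-lnet-tuning"]

class TuningHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/lnet/credits":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            req = json.loads(self.rfile.read(length))
            route = req["route"]                 # e.g. an isolated SAN route
            peer_credits = int(req["peer_credits"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected JSON with 'route' and 'peer_credits'")
            return
        # Delegate the actual change to the (hypothetical) helper script so the
        # service itself stays small and auditable.
        result = subprocess.run(
            APPLY_TUNING + ["--route", route, "--peer-credits", str(peer_credits)],
            capture_output=True, text=True)
        self.send_response(200 if result.returncode == 0 else 500)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"stdout": result.stdout,
                                     "stderr": result.stderr}).encode())

if __name__ == "__main__":
    # The network controller would POST desired credit settings to this port.
    HTTPServer(("0.0.0.0", 8642), TuningHandler).serve_forever()
```

A controller would then request preferential treatment for an isolated route with a single POST carrying the route name and the desired peer-credit setting.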

LANL Task Timetable
- Year 1: I/O and File System Orchestrator module; Lustre performance optimization for intra-datacenter transfers
- Year 2: I/O and FS testing; SDN-SF site installation
- Year 3: I/O testing with remote computation

LANL Year 1 Overview
- Frequent data movement within the data center:
  - We have examined ~30 scientist allocations for LANL's data center.
  - They cover 3 basic types of science: simulation, uncertainty quantification, and high-throughput computing.
  - Each generates massive, long-lived data sets that flow throughout the data center (published in the workflow report with APEX).
- Goal: develop an I/O and File System Orchestrator module to improve/reserve storage performance during inter- and intra-datacenter transfers (a sketch of one possible reservation interface follows).
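To make the orchestrator goal concrete, here is a minimal sketch of what a storage-performance reservation interface might look like. The class names, fields, path labels, and capacity figures are illustrative assumptions rather than the module's actual design.

```python
# Illustrative-only sketch of a reservation interface an I/O and file system
# orchestrator module might present; names and numbers are assumptions.
import itertools
import time
from dataclasses import dataclass
from typing import Dict

@dataclass
class Reservation:
    res_id: int
    src: str             # e.g. "platform-storage"
    dst: str             # e.g. "campaign-storage"
    gbytes_per_s: float  # requested bandwidth
    expires: float       # wall-clock expiry

class Orchestrator:
    """Tracks per-path bandwidth reservations for data-center transfers."""
    def __init__(self, path_capacity: Dict[tuple, float]):
        self.capacity = path_capacity        # (src, dst) -> GB/s available
        self.reservations: Dict[int, Reservation] = {}
        self._ids = itertools.count(1)

    def reserve(self, src, dst, gbytes_per_s, duration_s):
        """Grant a reservation if the path still has headroom, else refuse."""
        now = time.time()
        in_use = sum(r.gbytes_per_s for r in self.reservations.values()
                     if (r.src, r.dst) == (src, dst) and r.expires > now)
        if in_use + gbytes_per_s > self.capacity.get((src, dst), 0.0):
            return None
        res = Reservation(next(self._ids), src, dst, gbytes_per_s, now + duration_s)
        self.reservations[res.res_id] = res
        return res

    def release(self, res_id):
        self.reservations.pop(res_id, None)

# Example: a transfer session asks for 5 GB/s from platform to campaign storage.
orc = Orchestrator({("platform-storage", "campaign-storage"): 20.0})
grant = orc.reserve("platform-storage", "campaign-storage", 5.0, duration_s=3600)
```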

LANL Year 1 Progress to Date: Measuring Existing Transfers
- LANL uses a cluster of file transfer agents (FTAs) to move data between platforms, file systems, the archive, the DTN staging area, etc.
  - Transfers use a scheduled pftool session, an MPI-based data mover.
- pftool on the production FTAs has been instrumented with Darshan, an I/O tracing framework.
  - This captures and profiles all data movement between storage systems within LANL's center (a rough post-processing sketch follows this list).
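As one illustration of how the Darshan captures could be turned into a profile of data movement between storage systems, the sketch below aggregates `darshan-parser` text output into per-storage-system byte counts. It assumes the parser places a counter's value and file path right after the counter name (column layout can differ across Darshan versions), and the path-prefix-to-storage-system mapping is hypothetical.

```python
# Sketch: roll up Darshan-captured pftool I/O into per-storage-system totals.
# Column layout of darshan-parser output is assumed, not guaranteed; treat
# this as illustrative rather than production code.
import subprocess
import sys
from collections import defaultdict

# Hypothetical mapping from path prefix to storage system in LANL's center.
PREFIXES = {
    "/lustre/scratch": "platform storage",
    "/campaign": "campaign storage",
    "/archive": "tape archive staging",
}
COUNTERS = ("POSIX_BYTES_READ", "POSIX_BYTES_WRITTEN")

def classify(path):
    for prefix, system in PREFIXES.items():
        if path.startswith(prefix):
            return system
    return "other"

def summarize(logfile):
    totals = defaultdict(int)
    out = subprocess.run(["darshan-parser", logfile],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        for counter in COUNTERS:
            if counter in fields:
                i = fields.index(counter)
                if i + 2 < len(fields):            # value and path follow
                    value, path = int(fields[i + 1]), fields[i + 2]
                    totals[(classify(path), counter)] += value
    return totals

if __name__ == "__main__":
    for (system, counter), nbytes in sorted(summarize(sys.argv[1]).items()):
        print(f"{system:25s} {counter:20s} {nbytes / 2**30:10.2f} GiB")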

[Diagram: LANL's Turquoise Enclave, showing the Mustang, Pinto, and Wolf compute clusters, the FTA cluster, platform storage, campaign storage, WAN staging, the tape archive, and the L1/L2/L3 systems, all connected by the I/O backbone network.]

[Diagram: the same Turquoise Enclave layout, annotated with where the Darshan instrumentation captures transfers.]

[Diagram: Simulation science pipeline and its data center transfers. Phases S1–S5 run from setup/parameterize/create geometry through simulate physics, viz, down-sample, and post-process. Data products include the initial and simulation input decks, initial state, checkpoint dumps (written roughly every Γ*JMTTI between job begin and job end, 4–8× per week), timestep data sets (5–10× per week), and sampled and analysis data sets. Retention ranges from temporary (checkpoint dumps) through campaign (timestep data sets) to forever (input decks and analysis data sets).]

[Diagram: HTC and UQ science pipelines (data-intensive science workflow) and their data center transfers. Phases H1–H3 (HTC) and U1–U3 (UQ) run from generating and/or gathering input data through HTC analysis or UQ simulation to analysis. Data products include shared and private inputs, file-based communication, checkpoint dumps (4–8× per week), and analysis data sets; retention again ranges from temporary through campaign to forever.]

LANL Year 1: Develop Orchestration
- The FTA cluster provides a natural mechanism to orchestrate science flows within the data center:
  - Collect data describing the quantities of flows for provisioning.
  - Develop techniques for guaranteeing flow QoS using the FTAs, scheduler, and pftool.
- There is an opportunity to re-play pftool traces to measure approaches for limiting performance variability (see the replay sketch after this list):
  - Measure multiple pilot approaches.
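A trace replay harness for comparing pilot approaches could start as simply as the sketch below, which re-issues recorded transfers under different concurrency caps and reports the achieved bandwidth. The CSV trace format, the use of plain file copies instead of a real pftool session, and the candidate caps are all assumptions made for illustration.

```python
# Sketch of a trace replay harness for comparing transfer-concurrency policies.
# Trace columns (src,dst,bytes) and plain file copies standing in for pftool
# workers are illustrative assumptions.
import csv
import shutil
import time
from concurrent.futures import ThreadPoolExecutor

def replay(trace_csv, max_workers):
    """Re-issue the copies recorded in a trace with a bounded thread pool."""
    with open(trace_csv, newline="") as f:
        entries = list(csv.DictReader(f))        # columns: src,dst,bytes
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda e: shutil.copyfile(e["src"], e["dst"]), entries))
    elapsed = time.monotonic() - start
    moved = sum(int(e["bytes"]) for e in entries)
    return moved / elapsed / 2**30               # GiB/s under this policy

if __name__ == "__main__":
    # Compare a few candidate caps on concurrent transfer-I/O threads.
    for workers in (4, 16, 64):
        print(f"{workers:3d} workers: {replay('fta_trace.csv', workers):.2f} GiB/s")
```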

LANL Year 1: Develop Orchestration (continued)
- Candidate pilot approaches are possible because the FTAs control data transfers:
  - File-mix (small/large) co-scheduling: small files are limited by MDS throughput but may still generate significant interference (see the co-scheduling sketch after this list).
  - Manage the total number of transfer-I/O threads.
  - Network and storage watermarking.
- Measurements via Darshan are critical:
  - PFS modifications can be years-long efforts, and it takes even longer for them to reach production use.
  - Currently, LNet route changes are expensive; would making them cheaper be worth the effort?
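As a concrete illustration of the file-mix co-scheduling idea, the sketch below keeps small-file and large-file transfers in separate queues with independent thread caps, so metadata-bound small-file traffic and bandwidth-bound bulk traffic are throttled independently. The size threshold, the caps, and the use of plain file copies in place of pftool workers are illustrative assumptions.

```python
# Sketch of the "file mix co-scheduling" pilot: separate small-file and
# large-file queues with independent concurrency caps. Thresholds and caps
# are assumptions for illustration only.
import os
import queue
import shutil
import threading

SMALL_FILE_BYTES = 1 << 20          # treat files under 1 MiB as "small"
CAPS = {"small": 8, "large": 32}    # per-class concurrent transfer threads

queues = {"small": queue.Queue(), "large": queue.Queue()}

def classify(src):
    return "small" if os.path.getsize(src) < SMALL_FILE_BYTES else "large"

def worker(q):
    while True:
        src, dst = q.get()
        try:
            shutil.copyfile(src, dst)   # stand-in for a pftool worker
        finally:
            q.task_done()

def start_workers():
    for cls, cap in CAPS.items():
        for _ in range(cap):
            threading.Thread(target=worker, args=(queues[cls],), daemon=True).start()

def submit(src, dst):
    queues[classify(src)].put((src, dst))

# Usage: start_workers(); submit(...) per file; queues[...].join() to wait.
```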

Deliverables
- Year 1:
  - Share pftool data across the complex (anonymize data from some LANL enclaves; one possible anonymization approach is sketched below).
  - Comparison of performance isolation/maximization techniques.
- Year 2:
  - Integrate the SDN rack into Darwin.
  - Orchestrate the techniques identified as valuable; the goal is to feed this information into the PFS community so that feasible algorithms are implemented in the NRS, etc.
- Year 3:
  - Isolation techniques for remote burst buffer access. Burst buffer software is immature, so influence is possible; all are based on PLFS. There is a chance to get small QoS hooks added; we just need to know what to request.
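One way the "anonymize before sharing" deliverable could be approached is sketched below: user names and path components in transfer records are replaced with keyed hashes, so traffic patterns survive while identities and project names do not. The record columns and the environment variable holding the site key are assumptions.

```python
# Sketch of anonymizing transfer records before sharing them outside an
# enclave. Assumed CSV columns: user,src,dst,bytes; the key source is also
# an assumption.
import csv
import hashlib
import hmac
import os
import sys

SECRET = os.environ.get("ANON_KEY", "change-me").encode()  # site-held key

def mask(token: str) -> str:
    return hmac.new(SECRET, token.encode(), hashlib.sha256).hexdigest()[:12]

def anon_path(path: str) -> str:
    parts = path.strip("/").split("/")
    # Keep the first component so the storage-system mix is still recognizable.
    return "/" + "/".join(parts[:1] + [mask(p) for p in parts[1:]])

def main(infile, outfile):
    with open(infile, newline="") as fin, open(outfile, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row["user"] = mask(row["user"])
            row["src"] = anon_path(row["src"])
            row["dst"] = anon_path(row["dst"])
            writer.writerow(row)

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```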

Closing Questions