International Data Placement Laboratory


International Data Placement Laboratory
Philip Papadopoulos, Ph.D.
University of California, San Diego
San Diego Supercomputer Center / Qualcomm Institute

Partners
- University of Wisconsin, Madison: Miron Livny, Greg Thain
- Beihang University: Prof. Depei Qian, Dr. Hailong Yang, Guang Wei, Zhongzi Luan
- CNIC: Dr. Luo Ze, Dr. Gang Qin

Existential Problems
- We know the "speed of light" through the network via network measurement tools/suites like perfSONAR.
- Many researchers only really care about two questions: Can I move my data reliably from point A to point B? Will it complete in a timely manner?
- D2D: Disk-to-disk AND Day-to-Day.
- "Security" is more involved than for memory-to-memory networking tests -- touching disks is inherently more invasive.
- Is network measurement always a good proxy for disk-to-disk performance?

iDPL – Data Placement Lab
- Proof-of-principle project (EAGER, NSF ACI #1339508)
- Routinely measure end-to-end and disk-to-disk performance among a set of international endpoints
- Compare the performance of different data movement protocols: raw socket, scp, FDT, UDT, GridFTP, iRODS, ...
- Correlate to raw network performance
- IPv4 and IPv6 whenever possible
- Re-use as much existing "command-and-control" infrastructure as possible
- Pay attention to some routine security concerns

Known Network Traversals
Participants:
- University of Wisconsin, Madison
- University of California, San Diego
- University of Arizona (iRODS testing)
- CNIC – Computer Network Information Center, Beijing
- Beihang University, Beijing
Networks traversed: Internet2, CSTNET, CERNET, CENIC, PacificWave, Big Ten OmniPOP, Sun Corridor Network, and campus networks.

High-Level Structure
Test Manifest:
- Network test (iperf)
- Network test (iperf, IPv6)
- Move file via raw socket
- Move file via FDT (IPv6)
- Move file via UDT
- Move file via GridFTP
Submit the test as an HTCondor job (see the sketch below) and let HTCondor handle errors, recovery, reporting, and iteration. Repeat the test every N hours.
Separate concerns – let HTCondor do what it does well: scheduling, recovery, and returning output to the submitter.
Wisconsin -> Beihang is a different experiment than Beihang -> Wisconsin.
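A minimal sketch of how one such manifest might be packaged as an HTCondor job. The run_placement.sh wrapper, argument names, and site strings are illustrative assumptions, not the actual iDPL interface:

```python
#!/usr/bin/env python
# Hypothetical sketch: package one placement test as an HTCondor job.
import subprocess

# One ordered manifest per (source, destination) pair; the reverse direction
# is a separate experiment with its own manifest.
MANIFEST = [
    "iperf",     # raw network test, IPv4
    "iperf6",    # raw network test, IPv6
    "netcat",    # move file via raw socket
    "fdt6",      # move file via FDT over IPv6
    "udt",       # move file via UDT
    "gridftp",   # move file via GridFTP
]

def submit(src_site, dst_site):
    """Write a minimal HTCondor submit description and queue one run."""
    sub = "\n".join([
        "universe   = vanilla",
        "executable = run_placement.sh",
        "arguments  = --from %s --to %s --manifest %s"
        % (src_site, dst_site, ",".join(MANIFEST)),
        "output     = placement.$(Cluster).out",
        "error      = placement.$(Cluster).err",
        "log        = placement.log",
        "queue",
        "",
    ])
    with open("placement.sub", "w") as f:
        f.write(sub)
    subprocess.check_call(["condor_submit", "placement.sub"])

if __name__ == "__main__":
    submit("wisc.edu", "buaa.edu.cn")
```

HTCondor then owns error handling, retries, and shipping results back to the submit node; the periodic (every N hours) re-submission sits outside this sketch, e.g. in DAGMan.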

HTCondor: Generic Execution Pool
Transpacific HTCondor execution pool:
- Use HTCondor to assemble resources into a common execution environment
- Sites must trust each other enough to support remote execution
- NO common username is required
- A pool password limits who can be added to the global execution pool
Current configuration:
- Job submission at 4 sites
- Host-based firewalls limit access

Sample Results 1: UCSD -> Beijing (IPv4 & IPv6)
Raw network, IPv4 and IPv6, US -> China (Beihang, CNIC)
- IPv6 to BUAA has very good throughput, but is highly variable (testing at 1 Gbit/s)
- CNIC IPv6 performance was missing for a period (IP address changed); it is now highly variable, about 1/10th of the peak to BUAA
- BUAA and CNIC are less than 5 km apart in Beijing, yet sit on two different networks

Sample Results 2: UCSD -> BUAA (IPv4 & IPv6)
- Highly variable raw network performance
- Netcat (raw socket) mirrors the network – good correlation
- FDT stable, but low performance; still investigating
- SCP similar to FDT
- IPv4 essentially flatlined

Sample Results 3: 24-Day Trace, Wisconsin -> University of Arizona
- Y-axis scale is 1/10th of the left graph
- Highly variable raw network performance
- iRODS and iRODSPut achieve 5-15% of the raw network
- Raw socket ~90 MB/sec, 35-100% of the network (likely limited by disk performance)

Some Details on Implementation
Custom Python classes and a driver program define placement experiments:
- Needs to handle what happens when something goes wrong: disk space, firewalls, a broken stack after a software upgrade, ...
- Current logic: if any step in the Test Manifest times out, abort and retry next hour (recording what you can)
- On GitHub, of course
Some design tenets (see the sketch below):
- "Servers" are set up and torn down in user space for each experiment (e.g., a GridFTP server does not need to be installed by root to support GridFTP)
- Servers are set up as one-shot, whenever possible
- As secure as HTCondor
- Record everything (even if most of it is redundant)
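To illustrate the "one-shot, user-space server" and per-step timeout tenets, here is a rough Python sketch; the commands, timeout value, and helper names are invented for illustration and are not the project's code:

```python
#!/usr/bin/env python
# Illustrative sketch (not the actual iDPL code): run one manifest step with a
# one-shot, user-space server and a hard timeout. On timeout, record what we
# can, tear the server down, and let the next hourly attempt retry.
import subprocess
import time

STEP_TIMEOUT = 600  # seconds; an assumed value, not the project's setting

def run_step(server_cmd, client_cmd):
    """Start a throwaway server, run the client against it, always clean up."""
    server = subprocess.Popen(server_cmd)   # e.g. a one-shot netcat/FDT/GridFTP server
    time.sleep(1)                           # crude: give the server a moment to listen
    try:
        result = subprocess.run(client_cmd, capture_output=True,
                                timeout=STEP_TIMEOUT)
        return result.returncode, result.stdout
    except subprocess.TimeoutExpired:
        return None, b"step timed out; abort manifest, retry next hour"
    finally:
        server.terminate()                  # user-space teardown, no root needed
        server.wait()

if __name__ == "__main__":
    # One-shot raw-socket transfer: "nc -l" exits after a single connection.
    # (The -N "close after EOF" flag varies by netcat flavor.)
    rc, out = run_step(["nc", "-l", "5002"],
                       ["bash", "-c", "nc -N localhost 5002 < testfile"])
    print(rc, out)
```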

Partial Class Hierarchy (custom software)
Key observation: the different placement algorithms follow roughly the same pattern:
- Client (data flows from the client) and Server (data flows to the server)
- Setup exchange: the port utilized (e.g., the iperf server is on port 5002) and public key credentials (e.g., an ssh daemon that only accepts connections with a specific key)
- Completion exchange: MD5 sum (or other hash) of the sent file
Movers (subclasses of a common Mover base): iPerf, iPerfV6, Netcat, NetcatV6, FDT, FDTv6, SCP, SCPv6, GitClone, GitCloneV6
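The shared pattern could be expressed as a small class hierarchy along these lines; the class and method names here are hypothetical, not the actual placement.py classes:

```python
# Hypothetical sketch of the common "Mover" pattern; not the iDPL hierarchy itself.
import hashlib

class Mover(object):
    """Base class: every placement method does setup, transfer, completion."""
    default_port = 5002

    def setup(self):
        """Setup exchange: advertise the port (and any credentials) to the peer."""
        return {"port": self.default_port}

    def server(self, path):
        """Run the receiving side (data flows to the server)."""
        raise NotImplementedError

    def client(self, path, peer):
        """Run the sending side (data flows from the client)."""
        raise NotImplementedError

    def completion(self, path):
        """Completion exchange: hash of the file so both sides can compare."""
        with open(path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest()

class Netcat(Mover):
    def client(self, path, peer):
        # e.g. shell out to: nc <peer> <port> < path
        pass

class NetcatV6(Netcat):
    pass  # same protocol, IPv6 addressing

class SCP(Mover):
    def setup(self):
        # also hand over a one-time public key for a throwaway sshd
        info = super(SCP, self).setup()
        info["pubkey"] = "ssh-rsa AAAA... (one-shot key)"
        return info
```

Each IPv6 variant, GridFTP, UDT, and the Git-clone movers would slot in the same way, overriding only the pieces that differ.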

HTCondor Chirp – A Per-Job Scratchpad
The client and server halves of job 36782 each run placement.py in their own HTCondor slot, as different local users (e.g., "userAlpha" and "userBeta"), and share a Chirp scratchpad specific to that job.
- Each side holds the Test Manifest; placement.py writes and reads rendezvous values (e.g., SCPport = 5003, MD5 = 0x12798ff) through the scratchpad.
- Chirp provides generic rendezvous information.
- All Chirp information is logged; no private information (e.g., temporary passwords) is written into the scratchpad.
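As a rough illustration of the rendezvous idea (not the actual iDPL code), the two halves could exchange values through HTCondor's condor_chirp tool; the attribute names and polling interval are made up, and the job typically needs +WantIOProxy = True in its submit description for Chirp to be available:

```python
#!/usr/bin/env python
# Sketch of Chirp-based rendezvous; attribute names (SCPport, SCPmd5) are
# illustrative, not the project's actual keys.
import subprocess
import time

def advertise(attr, value):
    """One side: publish a value into the job's ClassAd via Chirp."""
    subprocess.check_call(["condor_chirp", "set_job_attr", attr, str(value)])

def wait_for(attr, interval=15):
    """Other side: poll the shared job ClassAd until the peer publishes attr."""
    while True:
        out = subprocess.run(["condor_chirp", "get_job_attr", attr],
                             capture_output=True, text=True)
        value = out.stdout.strip()
        if out.returncode == 0 and value and value.lower() != "undefined":
            return value
        time.sleep(interval)

# Server half:  advertise("SCPport", 5003)
# Client half:  port = wait_for("SCPport")
# After the transfer, the hash goes the other way, e.g. advertise("SCPmd5", md5hex)
```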

Commonly Available Components
- HTCondor for job launching and job control
  - The Directed Acyclic Graph scheduler (DAGMan) enables periodic submission
  - HTCondor's Chirp mechanism enables the rendezvous for port advertisement
  - Well-understood user/security model
  - Scales well, but iDPL uses it in a non-HTC mode
- Graphite, Carbon, and the Whisper database
  - Time-series display; open-sourced from Orbitz
  - Used at multiple large-scale data centers
- Python > 2.6.x
- Git – software revision AND raw data stewardship
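For the Graphite piece, results can be pushed to Carbon with its plaintext protocol (one "metric value timestamp" line per sample over TCP, port 2003 by default). A small sketch, with a made-up host name and metric path:

```python
#!/usr/bin/env python
# Sketch: push one throughput sample into Carbon via Graphite's plaintext protocol.
import socket
import time

CARBON_HOST = "graphite.example.org"   # assumption: your Carbon relay/cache
CARBON_PORT = 2003

def record(metric, value, when=None):
    """Send a single 'metric value timestamp' line to Carbon."""
    line = "%s %f %d\n" % (metric, value, when or int(time.time()))
    with socket.create_connection((CARBON_HOST, CARBON_PORT), timeout=10) as s:
        s.sendall(line.encode("ascii"))

# e.g. 92.4 MB/s for the netcat mover from UCSD to BUAA (hypothetical metric path)
record("idpl.ucsd_to_buaa.netcat.MBps", 92.4)
```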

Sites
- Code: github.com/idpl/placement
- Graphite server (just a VM): http://vi-1.rocksclusters.org
- HTCondor: https://research.cs.wisc.edu/htcondor
Others:
- FDT: http://monalisa.caltech.edu/FDT/
- Graphite/Carbon/Whisper: http://graphite.readthedocs.io/
- GridFTP: http://toolkit.globus.org/toolkit/docs/latest-stable/gridftp/
- iRODS: https://irods.org/
- UDT: http://udt.sourceforge.net/