International Data Placement Laboratory


1 International Data Placement Laboratory
Philip Papadopoulos, Ph.D.
University of California, San Diego
San Diego Supercomputer Center / Qualcomm Institute

2 Partners
University of Wisconsin, Madison: Miron Livny, Greg Thain
Beihang University: Prof. Depei Qian, Dr. Hailong Yang, Guang Wei, Zhongzi Luan
CNIC: Dr. Luo Ze, Dr. Gang Qin

3 Existential Problems
We know the “speed of light” through the network via network measurement tools/suites like perfSONAR.
Many researchers only really care about:
Can I move my data reliably from point A to point B?
Will it complete in a timely manner?
D2D: Disk-to-Disk AND Day-to-Day.
“Security” is more involved than memory-to-memory networking tests -- touching disks is inherently more invasive.
Is network measurement always a good proxy for disk-to-disk performance?

4 iDPL – Data Placement Lab
Proof-of-principle project (EAGER, NSF ACI #)
Routinely measure end-to-end and disk-to-disk performance among a set of international endpoints
Compare the performance of different data movement protocols: raw socket, scp, FDT, UDT, GridFTP, iRODS, …
Correlate to raw network performance
IPv4 and IPv6 whenever possible
Re-use as much existing “command-and-control” infrastructure as possible
Pay attention to some routine security concerns

5 Known Network Traversals
Participants:
University of Wisconsin, Madison
University of California, San Diego
University of Arizona (iRODS testing)
CNIC – Computer Network Information Center, Beijing
Beihang University, Beijing
Known network traversals: Internet2, CSTNET, CERNET, CENIC, PacificWave, Big10 OmniPOP, Sun Corridor Network, campus networks

6 Repeat Test every N hours
High-level structure – Test Manifest:
Network test (iperf)
Network test (iperf, IPv6)
Move file via raw socket
Move file via FDT (IPv6)
Move file via UDT
Move file via GridFTP
Submit the test as an HTCondor job; let HTCondor handle errors, recovery, reporting, and iteration, repeating the test every N hours.
Separate concerns – let HTCondor do what it does well: scheduling, recovery, output back to the submitter.
Wisconsin → Beihang is a different experiment than Beihang → Wisconsin.
(A Python sketch of such a manifest follows.)
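A minimal sketch, assuming the manifest can be expressed as an ordered list of steps in Python. The step names, option dictionaries, and the run_step hook are hypothetical illustrations, not the actual iDPL classes:

    # Hypothetical manifest: ordered (step, options) pairs mirroring the slide.
    MANIFEST = [
        ("iperf",   {"ipv6": False}),    # raw network test, IPv4
        ("iperf",   {"ipv6": True}),     # raw network test, IPv6
        ("netcat",  {"size_mb": 1024}),  # move file via raw socket
        ("fdt",     {"ipv6": True}),     # move file via FDT over IPv6
        ("udt",     {"size_mb": 1024}),  # move file via UDT
        ("gridftp", {"size_mb": 1024}),  # move file via GridFTP
    ]

    def run_manifest(manifest, run_step):
        """Run steps in order; record what we can, abort on the first failure."""
        results = {}
        for name, opts in manifest:
            ok, measurement = run_step(name, opts)
            results[(name, str(opts))] = measurement
            if not ok:   # per the slides: abort, let HTCondor retry next hour
                break
        return results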

7 HTCondor: Generic Execution Pool
Transpacific HTCondor execution pool:
Use HTCondor to assemble resources into a common execution environment (see the sketch below)
Sites must trust each other enough to support remote execution
NO common username is required
A pool password limits who can be added to the global execution pool
Current configuration:
Job submission at 4 sites
Host-based firewalls limit access
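For illustration only, a hedged sketch using the HTCondor Python bindings to list the execute nodes that have joined such a pool; the collector hostname is a placeholder, and this is not iDPL code:

    import htcondor

    # Placeholder collector host for the shared pool.
    coll = htcondor.Collector("collector.example.org")
    # Each startd that has joined the pool advertises itself to the collector.
    for ad in coll.locateAll(htcondor.DaemonTypes.Startd):
        print(ad.get("Machine"), ad.get("CondorVersion"))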

8 Sample Results 1: UCSD → Beijing (v4 & v6)
Raw network, IPv4 and IPv6, US → China (Beihang, CNIC):
IPv6 to BUAA has very good throughput, but is highly variable; testing at 1 Gbit/s
CNIC IPv6 performance data went missing (IP address changed); now highly variable, about 1/10th of the peak seen to BUAA
BUAA and CNIC are less than 5 km apart in Beijing, but on two different networks

9 Sample Results 2: UCSD → BUAA (v4 & v6)
Highly variable raw network performance
Netcat (raw socket) mirrors the network – good correlation (see the sketch below)
FDT stable, but low performance; still investigating
SCP similar to FDT
IPv4 essentially flatlined
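The "good correlation" claim is the kind of thing a Pearson correlation between paired throughput samples can quantify. A sketch with invented, illustrative numbers (not measured iDPL data):

    import numpy as np

    # Paired samples from the same measurement windows (illustrative values).
    iperf_mbps  = np.array([812.0, 95.0, 640.0, 130.0, 710.0])
    netcat_mbps = np.array([790.0, 88.0, 602.0, 121.0, 685.0])

    # Values near 1.0 mean the raw-socket mover tracks the raw network.
    r = np.corrcoef(iperf_mbps, netcat_mbps)[0, 1]
    print("Pearson r = %.2f" % r)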

10 Sample Results 3 – 24-Day Trace, Wisconsin → UA
Y-axis scale is 1/10th of the left graph
Highly variable raw network performance
iRODS and iRODSPut reach a fraction of the raw network throughput
Raw socket ~90 MB/sec, a large fraction of the network rate (likely limited by disk performance)

11 Some Details on Implementation
Custom Python classes and a driver program define placement experiments.
Must handle what happens when something goes wrong: disk space, firewalls, a broken stack after a software upgrade, …
Current logic: if any step in the Test Manifest times out, abort and retry next hour; record what you can (sketched below).
On GitHub, of course.
Some design tenets:
“Servers” are set up and torn down in user space for experiments (e.g., a GridFTP server does not need to be installed by root to support GridFTP)
Servers are set up as one-shot, whenever possible
As secure as HTCondor
Record everything (even if most of it is redundant)
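A minimal sketch of the abort-and-retry timeout logic, assuming each manifest step is an external command; the command, host, and timeout value are illustrative, not the actual iDPL implementation:

    import subprocess

    def run_step(cmd, timeout_s=300):
        """Run one manifest step; on timeout, record what we can and signal abort."""
        try:
            out = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
            return out.returncode == 0, out.stdout
        except subprocess.TimeoutExpired as e:
            # Record any partial output, then let the caller abort the manifest.
            return False, e.stdout

    ok, log = run_step(["iperf", "-c", "server.example.org"])  # placeholder host
    if not ok:
        print("step failed or timed out; abort manifest, retry next hour")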

12 Partial Class Hierarchy
Custom software. Key observation: different placement algorithms follow roughly the same pattern (sketched below).
Client (data flows from the client), Server (data flows to the server)
Setup exchange: port utilized (e.g., the iperf server is on port 5002); public-key credentials (e.g., an ssh daemon that only accepts connections with a specific key)
Completion exchange: MD5 sum (or other hash) of the sent file
Movers: FDT, FDT v6, SCP, SCP v6, Netcat, Netcat v6, iPerf, iPerf v6, Git Clone, Git Clone v6
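A hedged sketch of that shared pattern: a base class with a setup exchange, a transfer, and a completion exchange that hashes the sent file. The class and method names are invented for illustration and do not reproduce the actual iDPL hierarchy:

    import hashlib

    class Mover:
        """Common shape of a placement algorithm (illustrative only)."""
        def setup(self):
            """Setup exchange: advertise ports, exchange credentials."""
            raise NotImplementedError
        def move(self, path):
            """Transfer: data flows from the client to the server."""
            raise NotImplementedError
        def complete(self, path):
            """Completion exchange: both sides compare a hash of the file."""
            with open(path, "rb") as f:
                return hashlib.md5(f.read()).hexdigest()

    class NetcatMover(Mover):
        def setup(self):
            self.port = 5002   # advertised to the peer (via Chirp in iDPL)
        def move(self, path):
            pass               # exec netcat against self.port (omitted)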

13 HTCondor Chirp – Per Job Scratchpad
Client and server each execute in an HTCondor slot, under different local users (e.g., “userAlpha” on the client, “userBeta” on the server); on each side, placement.py works through the Test Manifest.
Chirp provides a common scratchpad specific to each job (e.g., Job 36782) for generic rendezvous information: the server writes attributes such as SCPport = 5003 and MD5 = 0x12798ff, and the client reads them back (sketched below).
All Chirp information is logged; no private information (e.g., temporary passwords) is written into the scratchpad.
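A minimal sketch of rendezvous through the per-job scratchpad using condor_chirp's set_attr and get_attr subcommands; the attribute name and port come from the slide, while the wrapper functions are illustrative, not iDPL code:

    import subprocess

    def chirp_write(attr, value):
        """Server side: publish a rendezvous attribute into the job ad."""
        subprocess.run(["condor_chirp", "set_attr", attr, str(value)], check=True)

    def chirp_read(attr):
        """Client side: read a rendezvous attribute back from the job ad."""
        out = subprocess.run(["condor_chirp", "get_attr", attr],
                             capture_output=True, check=True)
        return out.stdout.decode().strip()

    chirp_write("SCPport", 5003)   # server advertises its listening port
    print(chirp_read("SCPport"))   # client discovers where to connect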

14 Commonly Available Components
Use HTCondor for job launching and job control:
Its directed acyclic graph scheduler (DAGMan) enables periodic submission
HTCondor’s Chirp mechanism enables “rendezvous” for port advertisement
Well-understood user/security model
Scales well, but iDPL uses it in a non-HTC mode
Graphite, Carbon, and the Whisper database (see the sketch after this list):
Time-series display; open-sourced from Orbitz
Used at multiple large-scale data centers
Python > 2.6.x
Git – software revision AND raw-data stewardship
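Carbon ingests samples over a simple plaintext protocol ("metric.path value timestamp" on TCP port 2003). A sketch of pushing one throughput sample; the hostname and metric path are placeholders, not iDPL's actual naming scheme:

    import socket
    import time

    def send_metric(path, value, host="graphite.example.org", port=2003):
        """Send one sample to Carbon using the plaintext line protocol."""
        line = "%s %s %d\n" % (path, value, int(time.time()))
        with socket.create_connection((host, port)) as s:
            s.sendall(line.encode())

    send_metric("idpl.ucsd_to_buaa.netcat.mbps", 412.5)  # placeholder metric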

15 Sites
Code: github.com/idpl/placement
Graphite server (just a VM)
Other components: HTCondor, FDT, Graphite/Carbon/Whisper, GridFTP, iRODS, UDT

