The Data Logistics Toolkit
  Martin Swany
  Professor, School of Informatics and Computing
  Executive Associate Director, Center for Research in Extreme Scale Computing (CREST)
  Indiana University
The Data Logistics Toolkit
  Logistics: the management of the flow of resources from the point of origin to the point of consumption
  The DLT integrates local and distributed storage infrastructure, file transfer software, and performance monitoring and tuning
  The DLT software distribution supports the creation of network-optimized data nodes
DLT Overview
  Set of packages with configuration scripts, etc.
  Allows the configuration of
    – DTN with GridFTP
    – IBP storage depot for content distribution
    – Phoebus WAN accelerator
    – On-ramp for Internet2 AL2S using XSP
  Includes Periscope/perfSONAR monitoring
  Automatic network tuning
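The slide does not say what the automatic network tuning adjusts. A common starting point when tuning a data transfer node is to size TCP buffers to the bandwidth-delay product (BDP) of the path. The sketch below computes it. The function name and the example link speed and RTT are illustrative assumptions, not DLT settings.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return round(bandwidth_bps * rtt_s / 8)

# Illustrative values (assumptions, not taken from the DLT):
# a 10 Gbit/s path with a 100 ms round-trip time.
buffer_needed = bdp_bytes(10e9, 0.100)
print(f"{buffer_needed / 1e6:.0f} MB of TCP buffer per flow")
```

A single flow whose socket buffer is smaller than this value cannot fill the path, whatever the link capacity.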
DTN with AL2S On-Ramp
  Working with the Globus team at U. Chicago and Argonne
  Leveraging our eXtensible Session Protocol (XSP) to create end-to-end “sessions”
    – user-network interface (UNI)
  XSP daemon acts as network controller
    – signals AL2S/OESS, OSCARS, OpenFlow
  GridFTP XIO driver, updating to use the Globus Transfer Network Controller API
  Generic, transparent on-ramp to circuit networks like AL2S
WAN Acceleration
  A key reason the Science DMZ model “works” is the separation of lossy access networks from high-bandwidth, long-latency links
  Termination of TCP connections in “middleboxes” can increase throughput by reducing the RTT
  Protocol translation
  Storage in the network to buffer and burst
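One way to see why terminating TCP at a middlebox helps is the Mathis et al. steady-state bound, throughput ≤ (MSS / RTT) · (C / √p). This model is brought in here for illustration and is not named on the slides. The sketch assumes a 1460-byte MSS, a 100 ms WAN RTT, and a loss-free WAN segment that can run at link rate. The 4 ms LAN RTT, 0.001% edge loss and 10G path come from the Phoebus-SLaBS performance slide.

```python
import math

def mathis_bound_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. upper bound on steady-state TCP throughput, in bits/s."""
    c = math.sqrt(1.5)  # constant for periodic loss without delayed ACKs
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

LINK_BPS = 10e9      # dedicated 10G path
MSS = 1460           # assumption: standard Ethernet MSS
EDGE_RTT = 0.004     # 4 ms LAN RTT
EDGE_LOSS = 1e-5     # 0.001% edge loss
WAN_RTT = 0.100      # assumption: 100 ms WAN RTT

# One end-to-end connection: edge loss is recovered over the full RTT.
direct = min(LINK_BPS, mathis_bound_bps(MSS, EDGE_RTT + WAN_RTT, EDGE_LOSS))

# Split at a middlebox: the lossy edge segment only sees the LAN RTT,
# and the WAN segment is assumed loss-free, so it is limited by link rate.
split = min(LINK_BPS, mathis_bound_bps(MSS, EDGE_RTT, EDGE_LOSS))

print(f"direct: {direct / 1e6:.1f} Mbit/s, split: {split / 1e6:.1f} Mbit/s")
```

Under these assumptions the bound improves by the ratio of the two RTTs, (4 + 100) / 4 = 26, which is the effect the slide attributes to reducing the RTT.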
Distributed Storage for Content Distribution
  IBP provides a primitive, scalable, in-network storage service
  File-like abstractions can be built on top of this
  Uses a data structure known as an exNode (like a Unix inode) to track allocations
  These basic building blocks can be used to build various instances
    – Parallel filesystem
    – Distributed RAID-like storage
    – Content distribution network
    – BitTorrent-like peer-to-peer transfers
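The inode analogy can be made concrete with a small sketch. Where an inode maps file offsets to disk blocks, an exNode maps byte ranges of a logical file to allocations held on network depots. The class and field names, depot addresses and allocation handles below are hypothetical and do not reflect the actual exNode schema or IBP capability format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Extent:
    offset: int      # byte offset within the logical file
    length: int      # number of bytes held by this allocation
    depot: str       # hypothetical depot address
    allocation: str  # placeholder handle, not a real IBP capability

@dataclass
class ExNode:
    name: str
    size: int
    extents: List[Extent] = field(default_factory=list)

    def covering(self, offset: int, length: int) -> List[Extent]:
        """Return every extent that overlaps the requested byte range."""
        end = offset + length
        return [e for e in self.extents
                if e.offset < end and offset < e.offset + e.length]

    def is_complete(self) -> bool:
        """True if the extents cover the whole file with no gaps."""
        pos = 0
        for e in sorted(self.extents, key=lambda e: e.offset):
            if e.offset > pos:
                return False
            pos = max(pos, e.offset + e.length)
        return pos >= self.size

# A 300-byte file striped over two depots, with the first stripe replicated.
exnode = ExNode("scene.tif", 300, [
    Extent(0, 150, "depot-a.example:6714", "alloc-1"),
    Extent(0, 150, "depot-b.example:6714", "alloc-2"),
    Extent(150, 150, "depot-b.example:6714", "alloc-3"),
])
replicas = exnode.covering(100, 10)
```

Because several extents may cover the same range, one structure can describe striping (parallel filesystem, RAID-like layouts) and replication (content distribution) at once.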
Architecture
  Unified Network Information Service (UNIS)
    – Descendant of perfSONAR Lookup and Topology Services
    – Network and service “graph”
  Intelligent Data Movement Service (IDMS)
    – Data dispatcher
    – Operates on UNIS data
    – Spawn storage services dynamically in GENI
  Periscope/perfSONAR
    – Monitoring for operational integrity and optimization, BLiPP
  Storage Services
    – IBP, prototype based on Ceph
  Other services
    – Data transfer (GridFTP), WAN acceleration
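To illustrate how a dispatcher might operate on a network and service graph, the sketch below picks the storage depot with the lowest total measured latency from a client, using Dijkstra's algorithm. It is a toy model: the topology, node names, latency values and function names are hypothetical, and it does not represent the UNIS data model or the IDMS selection policy.

```python
import heapq

def shortest_latencies(graph, source):
    """Dijkstra: lowest total latency (ms) from source to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, latency_ms in graph.get(node, {}).items():
            nd = d + latency_ms
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

def pick_depot(graph, client, depots):
    """Choose the reachable depot with the lowest latency, or None."""
    dist = shortest_latencies(graph, client)
    reachable = [d for d in depots if d in dist]
    return min(reachable, key=lambda d: dist[d]) if reachable else None

# Hypothetical topology: directed edges annotated with measured latency in ms.
topology = {
    "client":   {"router-a": 2, "router-b": 10},
    "router-a": {"depot-1": 30, "router-b": 3},
    "router-b": {"depot-2": 5},
}
latencies = shortest_latencies(topology, "client")
chosen = pick_depot(topology, "client", ["depot-1", "depot-2"])
```

In this example the dispatcher selects depot-2, reached through router-a and router-b, rather than the depot attached to the client's nearest router.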
Earth Observation Depot Network (EODN): An open, community-specific content distribution network for remote sensing data
Landsat data
  Landsat 8 launched February 11, 2013
  Covers the entire land surface of the Earth every 16 days
    – 8 day offset from Landsat 7
    – ~700 scenes each day
  Each scene contains a GeoTIFF product: high-resolution sensor images
    – ~1GB compressed, 2GB uncompressed
  Traditionally used for environmental monitoring and land use and land cover change studies
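The figures on this slide imply a steady data volume that a distribution network has to absorb. The arithmetic below works it out, assuming decimal units (1 GB = 10^9 bytes) and treating the approximate per-scene sizes as exact. The variable names are illustrative.

```python
SCENES_PER_DAY = 700   # ~700 scenes each day
COMPRESSED_GB = 1      # ~1 GB per scene, compressed
UNCOMPRESSED_GB = 2    # ~2 GB per scene, uncompressed
REVISIT_DAYS = 16      # full land-surface coverage period

daily_compressed_gb = SCENES_PER_DAY * COMPRESSED_GB
daily_uncompressed_gb = SCENES_PER_DAY * UNCOMPRESSED_GB
cycle_scenes = SCENES_PER_DAY * REVISIT_DAYS
cycle_compressed_tb = daily_compressed_gb * REVISIT_DAYS / 1000

# Average rate needed just to keep up with newly acquired compressed data.
sustained_mbps = daily_compressed_gb * 1e9 * 8 / 86400 / 1e6

print(f"{daily_compressed_gb} GB/day compressed, "
      f"{daily_uncompressed_gb} GB/day uncompressed")
print(f"{cycle_scenes} scenes and {cycle_compressed_tb:.1f} TB per 16-day cycle")
print(f"about {sustained_mbps:.1f} Mbit/s sustained")
```

That is roughly 700 GB of new compressed data per day, or about 11 TB per coverage cycle, before counting any redistribution to multiple consumers.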
[Figure: EODN data flow. Components shown: Landsat Ground Network, EODN Harvester, EODN (DLT) sites WISC, IU, NYSER and MIZZ, UNIS, DMS, RealEarth (UW-Madison), EODN Client, web GUI. Labeled steps: (1) subscribe, (2) harvest, (3) stage sensing data, (4) publish, (5) fast download, (6) Processing…, (7) WMS upload. An additional label reads "discover / measure".]
Cisco Appliance Platform
  In collaboration with Internet2, Cisco and Fusion-io
  Cisco C220 server
    – 2x Intel® Xeon® E5-2680, 16 cores, 64GB DDR3 RAM
    – Fusion-io ioDrive2 1.2 TB
  CentOS 6.4 Linux with DLT RPMs and tuning for data transfer throughput
Acknowledgements
  Staff Scientist Dr. Ezra Kissel leads the DLT development efforts, PI of the GENI IDMS effort
  CC-NIE integration project with U. Tennessee and Vanderbilt U.
  CC-NIE integration project with the Globus team at U. Chicago and Argonne Nat’l Lab
  EODN development with AmericaView, U. Wisconsin
Phoebus-SLaBS performance
  GridFTP transfers over dedicated 10G path, increasing WAN latency, 4 ms LAN RTT and 0.001% edge loss