Download presentation
Presentation is loading. Please wait.
Published byDorte Møller Modified over 6 years ago
1
BigData Express: Toward Predictable, Schedulable, and High-performance Data Transfer
Wenji Wu, April 4, 2019
2
Why BigData Express? Targeted at optimizing data transfers in high-speed networks Large-scale data movement of Big Data Science High-speed network environments (40/100GE+) Builds on Multicore-Aware Data Transfer Middleware (MDTM) mdtmFTP: a high-performance data transfer tool Pipelined I/O-centric design to streamline data transfer MDTM optimizes use of underlying multicore system Extremely efficient in transferring of Lots Of Small Files (LOSF) Orchestrates system (DTN), storage, & network (SDN) resources To provide full end-to-end performance optimization
3
BigData Express BigData Express: a schedulable, predictable, and high-performance data transfer service A peer-to-peer, scalable, and extensible data transfer model A visually appealing, easy-to-use web portal A high-performance data transfer engine On-demand provisioning of end-to-end network paths with guaranteed QoS Robust and flexible error handling CILogon-based security A DOE/SC/ASCR-sponsored research project Software is available at:
4
BigData Express Major Components
BDE Web Portal Allow users to access BigData Express data transfer services BDE Scheduler DTN as a service Co-scheduling of DTN, storage, and network BDE AmoebaNet Network as a service mdtmFTP a high-performance data transfer engine 1. BDE Web Portal - Providing a visual interface to allow users to submit data transfer requests and obtain data transfer status - Launching data transfer tasks on users/applications’ behalves - Providing HTTPS communication channels among BigData Express sites to support BigData Express mechanisms - Allowing system admins to manage and monitor BigData Express sites - A common single-point sign-on service - CILogon service – to obtain X.509 certificates for secure access to BigData Express sites 2. BDE Scheduler Orchestrate resources at local sites DTNs Storage Local network resources Negotiate and broker resources with remote sites 3. AmoebaNet A SDN-enabled network service that provide “Application-aware” network Allow application to program network at run-time for optimum performance Network as a service Note: SDN is software defined networking. Originally designed for BigData Express. Can be independently deployed to support any applications AmoebaNet key features: Providing AMQP interface to allow application to program network at run-time for optimum performance JSON-RPC style communication Dynamic and flexible interaction with network Qos-based routing and path calculation Bandwidth Delay Fast provisioning end-to-end network path with guaranteed Qos VLAN-based layer-2 path TCP/IP-based layer-3 path REST-based network initialization and configuration 4. mdtmFTP is a high performance data transfer tool developed by FNAL network research group Pipelined I/O-centric design to streamline data transfer Multicore-aware data transfer middleware (MDTM) optimizes use of underlying multicore system Extremely efficient in transferring of Lots Of Small Files Various optimization mechanisms Zero copy Asynchronous I/O Batch processing
5
BigData Express Major Components (cont.)
DTN Agents Manage and configure DTNs Collect and report the DTN configuration and status Storage Agents Manage and configure storage systems Data Transfer Launching Agent Launch data transfer jobs Support different data transfer protocols
6
BigData Express -- Distributed
BigData Express adopts a distributed, peer-to-peer model. A BigData Express site can independently provide data transfer service. Essentially, BigData Express orchestrates resources (DTNs, storage, and networking) within a site, and negotiates/brokers resources across sites. A Peer-to-Peer model
7
BigData Express -- Flexible
Flexible to set up data transfer federations Providing inherent support for incremental deployment BigData Express is flexible to set up various data transfer federations.
8
BigData Express -- Scalable
BigData Express has a scalable software architecture. At a BigData Express site, BigData Express scheduler manages site resources through various agents DTN agents SDN agents Storage agents Data transfer launching agent Use MQTT as message bus to enable communication between BigData Express scheduler and agents BigData Express scheduler manages site resources through agents Use MQTT as message bus
9
BigData Express -- Extensible
Extensible Plugin framework to support various data transfer protocols mdtmFTP, GridFTP, SRM, XrootD, …
10
BigData Express -- End-to-End Data Transfer Model
Application-aware network service On-demand programming Fast-provisioning of end-to-end network paths with guaranteed QoS Distributed resource negotiation & brokering BigData Express intelligently programs networks at run-time to suit data transfer requirements, It fast provisions end-to-end network paths with guaranteed QoS. BigData Express typically performs the following operations to provision an end-to-end network path: Estimate and calculate DTN-to-DTN traffic matrix, and the related QoS requirements (e.g. throughput, delay). Calculate site-to-site traffic matrix, and the related QoS requirements (e.g., throughput, delay). Call ESnet OSCARS or Internet2 AL2S circuit service to set up or modify point-to-point layer 2 circuits between sites. Call AmoebaNet at each site to program LAN or campus network. Traffic between systems in physically disperse sites are multiplexed/de-multiplexed to/from the corresponding point-to-point layer 2 circuits between sites.
11
BigData Express – High Performance Data Transfer (I)
mdtmFTP FDT GridFTP BBCP Large file data transfer (1 X 100G) 74.18 79.89 91.18 Poor performance Folder data transfer (30 x 10G) 192.19 217 320.17 Folder data transfer (Linux ) 10.51 - Time-to-completion (Seconds) – Client/Server mode Lower is better mdtmFTP FDT GridFTP BBCP Large file data transfer (1 X 100G) 34.976 N/A 106.84 Folder data transfer (30 x 10G) 95.61 - Folder data transfer (Linux ) 9.68 Time-to-completion (Seconds) – 3rd party mode Lower is better Note 1: “-” indicates inability to get transfer to work Note 2: BBCP performance is very poor, we do not list its results here Note 3: BBCP and FDT support 3rd party data transfer. But BBCP and FDT couldn’t run 3rd party data transfer on ESNET testbed due to testbed limitation mdtmFTP is faster than existing data transfer tools, ranging from 8% to 9500%! @ESnet 100GE SDN Testbed,
12
BigData Express – High Performance Data Transfer (II)
StarLight 100GE Testbed mdtmFTP is faster than GridFTP, ranging from 40% to 114%! @StarLight 100GE Testbed
13
BigData Express -- Three Types of Data Transfer
Real-time data transfer Deadline-bound data transfer Best-effort data transfer BigData Express prioritizes data transfer jobs, and schedules resources accordingly. Real-time data transfer. The data transfer task is on the critical path for the end-user experience or a real-time experimental control loop. Scientific applications, such as real-time data analysis, remote visualization, and real-time experimental control, are highly sensitive to data transfer delays. Even small increases in data transfer time will degrade the user experience or result in inaccurate scientific results. Deadline-bound data transfer. The data transfer task is not on the critical path for the end-user experience or a real-time experimental control loop, but it does have an explicit deadline. For example, job startup and scratch storage space purge deadlines in supercomputer centers require deadline-bound data movement. Background data transfer. The data transfer task has a long deadline or no explicit deadline. For example, replicating data from one data center to another data center for long-term storage is a background data transfer task.
14
BigData Express vs. Globus Online
Features BigData Express Globus Online Architecture Distributed service Flexible to set up data transfer federations Centralized service Supported Protocols Extensible plugin framework to support multiple protocols: mdtmFTP GridFTP, XrootD, SRM (coming soon) GridFTP SDN Support Yes, Network as a service Fast-provisioning end-to-end network paths with guaranteed QoS Not in production Supported Data Transfers Real-time data transfer Deadline-bound data transfer Best-effort data transfer Error Handling Checksum Retransmit Globus online is a popular and widely used data transfer service developed and operated by the University of Chicago and Argonne National Laboratory. Here is the comparison between BigData Express and Globus Online.
15
BigData Express SC18 DEMO
16
BigData Express -- Deployment
Asia KISTI, South Korea KSTAR, South Korea Europe University of Amsterdam, Netherlands North America Fermilab StarLight, Northwestern University UMD/MAX, University of Maryland, College Park BigData Express prioritizes data transfer jobs, and schedules resources accordingly. Real-time data transfer. The data transfer task is on the critical path for the end-user experience or a real-time experimental control loop. Scientific applications, such as real-time data analysis, remote visualization, and real-time experimental control, are highly sensitive to data transfer delays. Even small increases in data transfer time will degrade the user experience or result in inaccurate scientific results. Deadline-bound data transfer. The data transfer task is not on the critical path for the end-user experience or a real-time experimental control loop, but it does have an explicit deadline. For example, job startup and scratch storage space purge deadlines in supercomputer centers require deadline-bound data movement. Background data transfer. The data transfer task has a long deadline or no explicit deadline. For example, replicating data from one data center to another data center for long-term storage is a background data transfer task.
17
Next Stage R&D Plan – Functional Perspective
Centralized Configuration Management capabilities Centralized User Account Management capabilities BigData Express APIs allow scientific applications to access BigData Express services allow scientific applications to keep track of data transfer status Enhance error handling Enhance BigData Express to support Rucio (HEP) Rucio interface, Function enhancement, support multiple protocols Enhance BigData Express to support the integration of BigData Express and XrootD Develop XrootD plugin Add data streaming capabilities to BigData Express ADIOS interfaces Enhance mdtmFTP with 3rd party data streaming capabilities
18
More information about BigData Express
Contact:
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.