StorNet: Co-Scheduling Network and Storage with TeraPaths and SRM
Dantong Yu (BNL), ESCC meeting, JTW2010

Presentation transcript:

Outline
- Project motivation
- Proposed services and contributions
- Approach and architecture
  - SRM
  - TeraPaths
- Required new functionalities
  - Gap between existing work and project goals
- Plans
- Feedback collection

Motivation
- End-to-end scheduling of data movement requires:
  - Availability of network bandwidth on the backbone wide area network (WAN)
  - Availability of local area network (LAN) bandwidth from end machines to the border nodes of the WAN
- But also:
  - Availability of data to be moved at the source
  - Availability of storage space at the target
  - Availability of bandwidth at the source storage system
  - Availability of bandwidth at the target storage system
- Why is that hard?
  - Need to coordinate source and target bandwidth to match resource availability windows
  - Also need to coordinate these with network bandwidth
- Project participants:
  - LBNL: Arie Shoshani, Alex Sim, Junmin Gu, Viji Natarajan
  - BNL: Dantong Yu, Dimitrios Katramatos, and Xin Liu

Use Cases
- LHC data transfer between Tier 1 and Tier 2 sites
- Nuclear physics: STAR data transfer between BNL and LBNL
- Climate modeling: Earth System Grid (ESG) and BNL Fast-physics System Testbed and Research (FASTER)
- We are collecting user requirements from sites. If you are interested in high-performance data transfer and in evaluating our tool, please let us know.

Services and Contributions
- SRM services:
  - Processing of service requests and subsequent coordination of network planes
- Network services:
  - Establishment of end-to-end virtual paths connecting two storage locations
- Service status:
  - SRM data transfer progress and performance
  - End-to-end virtual path status and performance
- An integrated end-to-end data transfer service with a negotiated transfer completion timeline
- Co-scheduling of storage and bandwidth to improve resource utilization and user experience
  - An integrated approach
- Bridge the gap between dynamic circuit networks and data-intensive users
  - Current status: long-duration dynamic circuits are used statically
  - Adds burden to LAN network admins (BGP setup between 1 n)
  - Wasteful and expensive

View of the Network
[Figure: Sites A, B, C, and D connected through WAN 1, WAN 2, and WAN 3, with TeraPaths domain controllers at the sites and WAN controllers in the backbone. Legend: MPLS tunnel, dynamic circuit, domain control.]

Approach and Architecture
- Leverage existing technologies
  - TeraPaths on top of OSCARS
  - Storage Resource Managers (SRMs) on top of TeraPaths
  - Use the Berkeley Storage Manager (BeStMan) implementation of SRM

SRM and TeraPaths
- SRMs are middleware components that provide dynamic space allocation and file management for storage components and coordinate data transfer
- SRM is a functional definition
  - Multiple implementations interoperate
  - Berkeley Storage Manager (BeStMan) is the Berkeley implementation of SRM
- TeraPaths provides QoS guarantees at the individual data flow level
  - From end host to end host, transparently
  - Improvement over best effort
  - Different data flows have varying priority/importance
    - Video streams, critical data, long-duration transfers
- It schedules network utilization in “high impact” domains
  - Regulate and classify (prioritize) traffic according to policy
  - Dynamically establish flow-based SLAs

What’s missing in these tools to achieve the goal
- BeStMan needs to be enhanced to:
  - Keep track of bandwidth commitments for multiple requests
  - Coordinate between source and target BeStMans for storage space and bandwidth
  - Provide advance reservation for future time-window commitments
  - Communicate and coordinate with the underlying TeraPaths
- TeraPaths needs to be enhanced to:
  - Receive bandwidth requests from BeStMan in the form (volume, max-bandwidth, max-completion-time)
  - Negotiate with OSCARS for the “best” time window
    - “Best” can be the earliest completion time or the shortest transfer time
  - On success, commit the reservation and return to BeStMan
  - On failure, find the closest solution to suggest to BeStMan
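The (volume, max-bandwidth, max-completion-time) request form can be illustrated with a small sketch. The class name, field names, and units below are assumptions made for illustration; they are not the BeStMan or TeraPaths interface. The sketch only shows the arithmetic that relates the three values: the lowest rate that still meets the deadline and the shortest possible transfer time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BandwidthRequest:
    """Hypothetical container for (volume, max-bandwidth, max-completion-time)."""
    volume_bits: float          # amount of data to move, in bits (assumed unit)
    max_bandwidth_bps: float    # highest rate the storage endpoints can sustain
    max_completion_s: float     # deadline, in seconds from the start of the window

    def min_bandwidth_bps(self) -> float:
        """Lowest constant rate that still finishes by the deadline."""
        return self.volume_bits / self.max_completion_s

    def shortest_transfer_s(self) -> float:
        """Transfer time when running at the maximum bandwidth."""
        return self.volume_bits / self.max_bandwidth_bps

    def is_feasible(self) -> bool:
        """True if the deadline can be met without exceeding the maximum bandwidth."""
        return self.min_bandwidth_bps() <= self.max_bandwidth_bps


# 1 TB (8e12 bits) over a path capped at 1 Gbps, due within 10,000 seconds.
req = BandwidthRequest(volume_bits=8e12, max_bandwidth_bps=1e9, max_completion_s=10_000)
print(req.min_bandwidth_bps(), req.shortest_transfer_s(), req.is_feasible())
```

Any reserved bandwidth between `min_bandwidth_bps()` and `max_bandwidth_bps` satisfies the request, which is the room a scheduler would have when looking for a “best” time window.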

Development plan for the next twelve months
- The first six months:
  - April 1: interface implementation between BeStMan at LBNL and TeraPaths at BNL
  - June 1: prototype testbed at BNL and U. Mich
- The next six months:
  - Testing goals: end-to-end transfer with reserved bandwidth
    - Basic test: small amount of data
    - Scaling test: large amount of data with many files

Research Challenges
- Reservation negotiation at 3 levels
  - BeStMan to TeraPaths to OSCARS
  - At each level, policy rules affect availability
  - The goal is to generate an availability graph that expresses the availability of the overall system and to find a solution by fitting the request, or by modifying and then fitting it
- Optimization of transit circuit reservations
  - Consolidate circuits with common source and destination and share their resources among multiple end-site reservations
- Combination of the two
  - Modify the request to OSCARS to obtain a transit circuit that accommodates multiple reservations
  - Modify an existing OSCARS reservation?
  - Fall back to satisfying the original request only if not enough resources are available
- Details are described in our journal paper submission.

Summary
- StorNet integrates network scheduling with storage scheduling
- New functionalities and interfaces are being developed to allow BeStMan to interoperate with TeraPaths
- Ongoing research work on reservation negotiation and circuit reservation optimization

SRM functionality
- Space reservation
  - Negotiate and assign space to users
  - Manage “lifetime” of spaces
  - Release and compact space
- File management
  - Assign space for putting files into SRM
  - Pin files in storage when requested until they are released
  - Manage “lifetime” of files
  - Manage action when pins expire (depends on file types)
- Get files from remote locations when necessary
  - Purpose: to simplify the client’s task
  - srmCopy: in “pull” and “push” modes

TeraPaths
- TeraPaths is a DOE/Office of Science project on end-to-end QoS (BNL, Michigan, and Stony Brook)
- It provides QoS guarantees at the individual data flow level
  - From end host to end host, transparently
- Because not all data flows are the same...
  - Default “best effort” network behavior treats all data flows as equal
  - Capacity is limited
  - Congestion causes bandwidth and latency variations
  - Performance and service disruption problems, unpredictability
  - Data flows have varying priority/importance
    - Video streams, critical data, long-duration transfers
- It schedules network utilization in “high impact” domains
  - Regulate and classify (prioritize) traffic according to policy
  - Dynamically establish flow-based SLAs

L2 vs. L3 (1/2)
- MPLS tunnel starts and ends within the WAN domain
  - Packets are admitted into the tunnel based on flow ID information (IP src, port src, IP dst, port dst)
  - WAN admission is performed at the first router of the tunnel (ingress)
[Figure labels: WAN, border router, MPLS tunnel ingress/egress router.]

L2 vs. L3 (2/2)
- Dynamic circuit appears as a VLAN connecting end-site border routers with a single hop
  - Cannot use flow ID data directly
  - Flow must be directed to the proper VLAN
  - WAN admission is performed within the end-site LAN
  - Select VLAN with Policy Based Routing (PBR) at both ends
- Route can be selected on a per-flow basis
[Figure labels: WAN, switch, border router.]
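The per-flow VLAN selection described for the L2 case can be pictured as a lookup from flow ID to VLAN. The sketch below is a toy model under that assumption: `FlowID`, `VlanPolicy`, the addresses, and the VLAN number are all hypothetical, and a real site would express this as Policy Based Routing rules on its routers, not in Python.

```python
from typing import NamedTuple, Optional


class FlowID(NamedTuple):
    """Flow ID fields named on the slide: IP src, port src, IP dst, port dst."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class VlanPolicy:
    """Toy stand-in for end-site policy rules that steer flows into VLANs."""

    def __init__(self) -> None:
        self._rules: dict[FlowID, int] = {}

    def add_rule(self, flow: FlowID, vlan_id: int) -> None:
        """Direct one reserved flow to the VLAN of its dynamic circuit."""
        self._rules[flow] = vlan_id

    def select_vlan(self, flow: FlowID) -> Optional[int]:
        """Return the VLAN for a reserved flow, or None for default routing."""
        return self._rules.get(flow)


policy = VlanPolicy()
reserved = FlowID("192.0.2.10", 50000, "198.51.100.20", 2811)
policy.add_rule(reserved, vlan_id=3001)

print(policy.select_vlan(reserved))                                     # 3001
print(policy.select_vlan(FlowID("192.0.2.11", 50000, "198.51.100.20", 2811)))  # None
```

Flows without a rule fall through to `None`, which here stands for ordinary best-effort routing.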

Multi-Layer Capability View
[Figure: layered diagram covering Layer 2 (VLANs, Layer 2 control, TeraPaths control), Layer 3 (IP, MPLS, QoS), Layer 4 (TCP, UDP), and the application/middleware layer (applications, SRM/GridFTP). Each layer is crossed by the AA, control, data, service, and management planes, with TeraPaths services and security at every layer. Legend: not in implementation.]

Multiple-Layer Architecture View
[Figure: stacked planes: BeStMan/Application Plane, TeraPaths Service Plane, TeraPaths Management Plane, TeraPaths Control Plane, and Generic Data Plane Layer, alongside the AA Plane.]

Example Use Case: BeStMan in “pull” mode
1) Target BeStMan (T-BeStMan) gets request (userID (credential, priority), files/directory, maxCompletionTime)
2) T-BeStMan checks whether it has any of the files and pins them (until maxCompletionTime)
3) T-BeStMan contacts source BeStMan (S-BeStMan) (get volumeOfRestOfFiles, get S-maxBandwidth) -> sent, get response
4) T-BeStMan allocates space (for volume) and finds its own T-maxBandwidth
5) T-BeStMan determines desiredMaxBandwidth = min(T-maxBandwidth, S-maxBandwidth)
6) T-BeStMan calls local TeraPaths for “reserve and commit” (userID, desiredBeginTime=now, volume, desiredMaxBandwidth, maxCompletionTime)
7) TeraPaths checks validity of userID, priority, and authorization, and negotiates with OSCARS
8) TeraPaths returns (a) (reservationID, reservedBeginTime, reservedEndTime, reservedBandwidth), or (b) “can’t do it by maxCompletionTime, but here is a new (longer) completion time”
9) T-BeStMan informs user:
   case a) “Here is your reservation. OK?” If yes, no action; if no, issue cancel reservation to TeraPaths
   case b) “Can’t do it. Do you wish to use the extended maxCompletionTime?” If no, cancel; if yes, accept
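Steps 5, 8, and 9 of the pull-mode exchange can be sketched as follows. This is a minimal illustration, not the BeStMan or TeraPaths API: the `Reserved` and `CounterOffer` types, the function names, and the returned action strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Reserved:
    """Outcome (a) of step 8: TeraPaths committed a reservation."""
    reservation_id: str
    reserved_begin: float
    reserved_end: float
    reserved_bandwidth: float


@dataclass(frozen=True)
class CounterOffer:
    """Outcome (b) of step 8: only a longer completion time is possible."""
    new_completion_time: float


def desired_max_bandwidth(t_max_bandwidth: float, s_max_bandwidth: float) -> float:
    """Step 5: the transfer cannot run faster than the slower storage system."""
    return min(t_max_bandwidth, s_max_bandwidth)


def step9_action(reply: Union[Reserved, CounterOffer], user_says_yes: bool) -> str:
    """Step 9: map the TeraPaths reply and the user's answer to an action."""
    if isinstance(reply, Reserved):
        # Case a: the reservation already exists, so "no" means cancelling it.
        return "keep reservation" if user_says_yes else "cancel reservation"
    # Case b: nothing is held yet; "yes" accepts the extended completion time.
    return "accept extended time" if user_says_yes else "cancel request"


print(desired_max_bandwidth(t_max_bandwidth=4e9, s_max_bandwidth=2.5e9))
print(step9_action(Reserved("r-1", 0.0, 3600.0, 2.5e9), user_says_yes=False))
print(step9_action(CounterOffer(new_completion_time=7200.0), user_says_yes=True))
```

Modeling the two outcomes as separate types keeps the asymmetry visible: in case (a) declining requires an explicit cancel call, while in case (b) declining simply ends the request.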

New APIs, Functionality, and Communication Flows to Be Developed
[Figure: pulling client and pushing client; target BeStMan and source BeStMan, each with space management and bandwidth management; TeraPaths with bandwidth coordination and reservation; arrows for data flow and control flow.]
- Note: push and pull modes are needed because of security limitations
- Communication flows: client-to-BeStMan, BeStMan-to-BeStMan, BeStMan-to-TeraPaths

Control and data flows
[Figure: application, BeStMan, TeraPaths, IDC (OSCARS), and the WAN, connected by numbered control-flow and data-flow arrows.]

Distributed Reservation Negotiation
- End-to-end paths comprise multiple segments
- Each segment is established by a reservation
- Domains have to agree on parameter ranges
- Each domain is characterized by a resource availability graph, e.g., a bandwidth availability graph (BAG)
- The availability of all domains can be established by calculating the minimum availability graph
- Each new reservation has to fit in the available area
- Reservations that don’t fit have to be modified
- If no modification makes a reservation fit, it is rejected
- TeraPaths currently modifies only the start time, on an individual-site basis, and iterates with counteroffers
- OSCARS is tried if/after the end sites agree
- Will extend to modify start time, end time, and bandwidth, using end-to-end BAGs if applicable, or a combination of BAGs and trial and error otherwise
- Ongoing collaboration with the OSCARS team to move from trial and error to BAGs
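The minimum availability graph across domains can be sketched as a pointwise minimum of step functions. The representation below is an assumption made for illustration: each BAG is a sorted list of (time, available bandwidth) breakpoints, each value holds until the next breakpoint, and the last value extends indefinitely. The function names are hypothetical.

```python
from bisect import bisect_right


def value_at(bag, t):
    """Available bandwidth of a step-function BAG at time t."""
    times = [bp[0] for bp in bag]
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("time precedes the first breakpoint")
    return bag[i][1]


def intersect_bags(*bags):
    """Pointwise minimum of several BAGs, starting where all of them are defined."""
    start = max(bag[0][0] for bag in bags)
    times = sorted({start} | {t for bag in bags for t, _ in bag if t > start})
    combined = []
    for t in times:
        bw = min(value_at(bag, t) for bag in bags)
        # Skip breakpoints that do not change the combined value.
        if not combined or combined[-1][1] != bw:
            combined.append((t, bw))
    return combined


domain_a = [(0, 10), (5, 4), (8, 10)]
domain_b = [(0, 6), (3, 8), (6, 2)]
print(intersect_bags(domain_a, domain_b))
```

For the two example domains the result is `[(0, 6), (3, 8), (5, 4), (6, 2)]`: at every instant the end-to-end path can offer only what the most constrained domain has available.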

Combination of Algorithms
- Obtain and intersect BAGs (bandwidth) from end sites, fit request, optimize/consolidate multiple end-site reservations, submit an OSCARS reservation that accommodates them all
- Obtain and intersect BAGs from all domains, fit request, optimize/consolidate, then fit the resulting (bigger) request to the transit domain BAG and submit to OSCARS
- If a sufficiently bigger slot has already been reserved, the request can be serviced without further negotiation with the transit domain

Research Plan: Bandwidth Allocation and Circuit Assignment
- Given
  - Offline case: a set of reservation requests
- Decision to make
  - Allocate bandwidths to circuits (VLANs)
  - Assign reservation requests to circuits
- Objective
  - Maximize the number of requests that can be satisfied
- Major constraints
  - Each reservation is assigned to one circuit
  - The total capacity the WAN provides
  - The bandwidth utilization must be higher than a given value
  - The number of available circuit IDs is constrained by a given value

Preliminary Results
- Algorithm sketch
  - Order requests
  - Use consolidation when possible (bandwidth utilization is high enough)
  - Assign a new circuit when necessary (if circuit IDs and bandwidth are available)
[Figure: bandwidth vs. time, showing a rejected request.]
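One possible reading of this greedy sketch is shown below. It is not the authors' algorithm: it assumes requests are ordered by start time, that a circuit's WAN reservation spans its members' whole time window at their peak concurrent bandwidth, that utilization is the ratio of reserved user volume to that circuit volume, and that `capacity` caps each circuit individually (the total WAN capacity constraint is not modeled). All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    start: float
    end: float
    bandwidth: float


@dataclass
class Circuit:
    members: list = field(default_factory=list)

    def peak(self) -> float:
        """Peak concurrent bandwidth; it can only rise at a member's start time."""
        return max(
            sum(r.bandwidth for r in self.members if r.start <= t < r.end)
            for t in (m.start for m in self.members)
        )

    def utilization(self) -> float:
        """User-reserved volume divided by the volume of the covering circuit."""
        begin = min(r.start for r in self.members)
        end = max(r.end for r in self.members)
        used = sum(r.bandwidth * (r.end - r.start) for r in self.members)
        return used / (self.peak() * (end - begin))


def assign(requests, max_circuits, capacity, min_utilization):
    """Greedy pass: consolidate when utilization stays high, else open a circuit."""
    circuits, rejected = [], []
    for req in sorted(requests, key=lambda r: r.start):
        for circuit in circuits:
            trial = Circuit(circuit.members + [req])
            if trial.peak() <= capacity and trial.utilization() >= min_utilization:
                circuit.members.append(req)
                break
        else:
            # No existing circuit could absorb the request.
            if len(circuits) < max_circuits and req.bandwidth <= capacity:
                circuits.append(Circuit([req]))
            else:
                rejected.append(req)
    return circuits, rejected


reqs = [Request(0, 10, 4), Request(0, 10, 4), Request(100, 110, 4)]
circuits, rejected = assign(reqs, max_circuits=1, capacity=10, min_utilization=0.8)
print(len(circuits), len(circuits[0].members), rejected)
```

In the example the two overlapping requests share one circuit at full utilization, while the late request would leave the circuit idle for most of its window and is rejected because no circuit ID is left.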

Preliminary Results
- Online case
  - Choose an “optimization window” near the new request and perform reservation consolidation

Bandwidth Reservation Requests and Bandwidth Availability Graph (BAG)
[Figure: six reservation requests with start times t_s1 to t_s6 and end times t_e1 to t_e6, and the resulting BAG over breakpoints t_1 to t_12. Axes: time, bandwidth. Regions: max, reserved, available.]

Find Resources for New Request
[Figure: BAG over breakpoints t_1 to t_12 with maximum bandwidth “max”. Panel (a): new request. Panel (b): new (modified) request. Labels: T_Smin, T_Smax, T_S. Axes: time, bandwidth.]
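Finding resources for a new request by shifting its start time can be sketched as a search over a BAG. The sketch assumes the BAG is a sorted list of (time, available bandwidth) breakpoints whose values hold until the next breakpoint, and it reads T_Smin and T_Smax as the earliest and latest acceptable start times, which is an interpretation of the figure labels. Function names are hypothetical.

```python
def min_available(bag, start, end):
    """Minimum available bandwidth of a step-function BAG over [start, end)."""
    if start < bag[0][0]:
        return 0.0  # nothing is known to be available before the BAG begins
    values = []
    for i, (t, bw) in enumerate(bag):
        segment_end = bag[i + 1][0] if i + 1 < len(bag) else float("inf")
        if t < end and segment_end > start:
            values.append(bw)
    return min(values)


def earliest_fit(bag, bandwidth, duration, ts_min, ts_max):
    """Earliest start in [ts_min, ts_max] where the request fits, else None.

    The earliest feasible start is either ts_min or a breakpoint where
    availability changes, so only those candidates need to be checked.
    """
    candidates = sorted({ts_min} | {t for t, _ in bag if ts_min < t <= ts_max})
    for start in candidates:
        if min_available(bag, start, start + duration) >= bandwidth:
            return start
    return None


bag = [(0, 2), (4, 10), (10, 3), (12, 10)]
print(earliest_fit(bag, bandwidth=5, duration=4, ts_min=0, ts_max=20))  # 4
print(earliest_fit(bag, bandwidth=5, duration=7, ts_min=0, ts_max=20))  # 12
print(earliest_fit(bag, bandwidth=5, duration=7, ts_min=0, ts_max=11))  # None
```

A `None` result corresponds to the case where no start-time modification makes the request fit, so the request would have to be changed in another way or rejected.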

End-to-End Bandwidth Availability Graph
[Figure: panel (a) shows the BAGs of Domain A and Domain B with maxima “max A” and “max B”; panel (b) shows the combined BAG (intersection), bounded by “max B”. Breakpoints t_1 to t_14. Axes: time, bandwidth.]

Reservation Consolidation
- Motivation
  - To cope with the limited number of VLAN (circuit) IDs
  - To reduce WAN operations (tear-down and setup)
- Idea
  - Use one VLAN (WAN reservation) for multiple reservations
  - However, bandwidth will not be fully utilized
- Bandwidth utilization = sum of user reservations / WAN reservation
[Figure: consolidated reservations, bandwidth vs. time.]
[Table residue: columns are Bandwidth utilization, VLAN ID consumption, and Capacity consumption; surviving cell values are “high”, “low”, “high”; the row layout is not recoverable.]