StorNet: Co-Scheduling Network and Storage with TeraPaths and SRM Dantong Yu (BNL) ESCC meeting JTW2010 1.

2 Outline
- Project motivation
- Proposed services and contributions
- Approach and architecture
  - SRM
  - TeraPaths
- Required new functionalities
  - Gap between existing work and project goals
- Plans
- Feedback collection

3 Motivation
End-to-end scheduling of data movement requires:
- Availability of network bandwidth on the backbone wide area network (WAN)
- Availability of local area network (LAN) bandwidth from the end machines to the border nodes of the WAN
But also:
- Availability of the data to be moved at the source
- Availability of storage space at the target
- Availability of bandwidth at the source storage system
- Availability of bandwidth at the target storage system
Why is that hard?
- Source and target bandwidth must be coordinated to match their resource availability windows
- These must, in turn, be coordinated with the network bandwidth
Project participants:
- LBNL: Arie Shoshani, Alex Sim, Junmin Gu, Viji Natarajan
- BNL: Dantong Yu, Dimitrios Katramatos, and Xin Liu

4 Use Cases
- LHC data transfer between Tier 1 and Tier 2 sites
- Nuclear physics: STAR data transfer between BNL and LBL
- Climate modeling: Earth System Grid (ESG) and BNL Fast-physics System Testbed and Research (FASTER)
We are collecting user requirements from sites. If you are interested in high-performance data transfer and in evaluating our tool, please let us know.

5 Services and Contributions
- SRM services: processing of service requests and subsequent coordination of the network planes
- Network services: establishment of end-to-end virtual paths connecting two storage locations
- Service status:
  - SRM data transfer progress and performance
  - End-to-end virtual path status and performance
An integrated end-to-end data transfer service with a negotiated transfer completion timeline:
- Co-scheduling of storage and bandwidth to improve resource utilization and user experience
An integrated approach:
- Bridge the gap between dynamic circuit networks and data-intensive users
- Current status: long-duration dynamic circuits are used statically
  - Adds burden on LAN network administrators (BGP setup between 1 and n)
  - Wasteful and expensive

6 View of the Network
[Figure: Sites A, B, C, and D, each under TeraPaths domain control, interconnected through WAN domains 1, 2, and 3 (each with a WAN controller) by MPLS tunnels and dynamic circuits.]

7 Approach and Architecture
Leverage existing technologies:
- TeraPaths on top of OSCARS
- Storage Resource Managers (SRMs) on top of TeraPaths
- Use the Berkeley Storage Manager (BeStMan) implementation of SRM

8 SRM and TeraPaths
- SRMs are middleware components that provide dynamic space allocation and file management for storage components and coordinate data transfer
  - SRM is a functional definition; multiple implementations interoperate
  - The Berkeley Storage Manager (BeStMan) is the Berkeley implementation of SRM
- TeraPaths provides QoS guarantees at the individual data flow level
  - From end host to end host, transparently
  - An improvement over best-effort service: different data flows have varying priority/importance (video streams, critical data, long-duration transfers)
- It schedules network utilization in "high impact" domains
  - Regulates and classifies (prioritizes) traffic according to policy
  - Dynamically establishes flow-based SLAs

9 What's Missing in These Tools to Achieve the Goal
BeStMan needs to be enhanced to:
- Keep track of bandwidth commitments for multiple requests
- Coordinate storage space and bandwidth between the source and target BeStMans
- Provide advance reservation for future time-window commitments
- Communicate and coordinate with the underlying TeraPaths
TeraPaths needs to be enhanced to:
- Receive bandwidth requests from BeStMan in the form (volume, max-bandwidth, max-completion-time)
- Negotiate with OSCARS for the "best" time window, where "best" can mean the earliest completion time or the shortest transfer time
  - If successful, commit the reservation and return it to BeStMan
  - If unsuccessful, find the closest solution and suggest it to BeStMan
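The TeraPaths side of this enhancement can be sketched as follows. This is a minimal illustration, not the StorNet implementation: the `BandwidthRequest` and `Window` types and the `choose_window` helper are hypothetical, and we assume the network has already produced a list of candidate windows (begin time, grantable bandwidth). The two policies mirror the slide's two meanings of "best", and a failed request returns the closest (earliest-finishing) offer as a suggestion rather than a commitment.

```python
from dataclasses import dataclass


@dataclass
class BandwidthRequest:
    volume: float               # amount of data to move (hypothetical units)
    max_bandwidth: float        # rate the storage endpoints can sustain
    max_completion_time: float  # deadline, on the same clock as Window.begin


@dataclass
class Window:
    begin: float      # offered start time (assumed to come from OSCARS negotiation)
    bandwidth: float  # bandwidth the network can grant in this window


def completion(req, w):
    # Never send faster than the storage systems can sustain.
    rate = min(w.bandwidth, req.max_bandwidth)
    return w.begin + req.volume / rate


def choose_window(req, offers, policy="earliest_completion"):
    """Return (window, committed); committed=False marks a closest-solution suggestion."""
    if not offers:
        return None, False
    if policy == "earliest_completion":
        key = lambda w: completion(req, w)
    else:  # "shortest_transfer": ties broken by the earlier start
        key = lambda w: (completion(req, w) - w.begin, w.begin)
    feasible = [w for w in offers if completion(req, w) <= req.max_completion_time]
    if feasible:
        return min(feasible, key=key), True
    # Failure: suggest the offer that finishes soonest as the closest solution.
    return min(offers, key=lambda w: completion(req, w)), False
```

For a 100-unit request capped at 10 units/s with offers `Window(0, 8)` and `Window(3, 10)`, the earliest-completion policy picks the first (done at 12.5) and the shortest-transfer policy picks the second (10 s of transfer).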

10 Development Plan for the Next Twelve Months
The first six months:
- April 1, 2010: interface implementation between BeStMan at LBNL and TeraPaths at BNL
- June 1, 2010: prototype testbed at BNL and the University of Michigan
The next six months:
- Testing goals: end-to-end transfer with reserved bandwidth
  - Basic test: small amount of data
  - Scaling test: large amount of data with many files

11 Research Challenges
- Reservation negotiation at three levels: BeStMan to TeraPaths to OSCARS
  - At each level, policy rules affect availability
  - The goal is to generate an availability graph that expresses the availability of the overall system, and to find a solution by fitting the request, or by modifying it and then fitting it
- Optimization of transit circuit reservations
  - Consolidate circuits with a common source and destination, and share their resources among multiple end-site reservations
- Combination of the two
  - Modify the request to OSCARS to obtain a transit circuit that accommodates multiple reservations
  - Modify an existing OSCARS reservation?
  - Fall back to satisfying only the original request if not enough resources are available
Details are described in our journal paper submission.

12 Summary
- StorNet integrates network scheduling with storage scheduling
- New functionalities and interfaces are being developed to allow BeStMan to interoperate with TeraPaths
- Ongoing research work on reservation negotiation and circuit reservation optimization

13 SRM Functionality
- Space reservation
  - Negotiate and assign space to users
  - Manage the "lifetime" of spaces
  - Release and compact space
- File management
  - Assign space for putting files into SRM
  - Pin files in storage when requested, until they are released
  - Manage the "lifetime" of files
  - Manage the action taken when pins expire (depends on the file type)
- Get files from remote locations when necessary
  - Purpose: to simplify the client's task
  - srmCopy: in "pull" and "push" modes
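To make the pin-lifetime bookkeeping concrete, here is a toy sketch assuming pins are keyed by file name and time is a plain number. `PinManager` is hypothetical and is not part of BeStMan or the SRM interface. Expired pins are handed back to the caller, since the action taken on expiry depends on the file type.

```python
import heapq


class PinManager:
    """Toy model of SRM-style file pinning with lifetimes (hypothetical class)."""

    def __init__(self):
        self.expiry = {}  # file name -> absolute expiry time of its current pin
        self._heap = []   # (expiry, file) entries; stale ones are skipped lazily

    def pin(self, name, now, lifetime):
        # Pinning again replaces (e.g. extends) the previous lifetime.
        self.expiry[name] = now + lifetime
        heapq.heappush(self._heap, (now + lifetime, name))

    def release(self, name):
        # An explicit release ends the pin before its lifetime runs out.
        self.expiry.pop(name, None)

    def expire(self, now):
        """Drop pins whose lifetime has passed and return the affected files."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            t, name = heapq.heappop(self._heap)
            if self.expiry.get(name) == t:  # ignore entries superseded by re-pin/release
                del self.expiry[name]
                expired.append(name)
        return expired
```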

14 TeraPaths
- TeraPaths is a DOE Office of Science project on end-to-end QoS (BNL, Michigan, and Stony Brook)
- It provides QoS guarantees at the individual data flow level
  - From end host to end host, transparently
- Because not all data flows are the same:
  - Default "best effort" network behavior treats all data flows as equal
  - Capacity is limited
  - Congestion causes bandwidth and latency variations
  - Performance and service disruption problems, unpredictability
  - Data flows have varying priority/importance: video streams, critical data, long-duration transfers
- It schedules network utilization in "high impact" domains
  - Regulates and classifies (prioritizes) traffic according to policy
  - Dynamically establishes flow-based SLAs

15 L2 vs. L3 (1/2)
- The MPLS tunnel starts and ends within the WAN domain
- Packets are admitted into the tunnel based on flow ID information (IP src, port src, IP dst, port dst)
- WAN admission is performed at the first router of the tunnel (ingress)
[Figure: end-site WAN border routers connected through the MPLS tunnel ingress/egress routers.]
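A minimal model of this L3 admission step, assuming an exact match on the 4-tuple and a dictionary standing in for the router's classifier. `FlowID` and `IngressRouter` are hypothetical names; real routers implement this with filters or access lists, not Python.

```python
from typing import NamedTuple, Optional


class FlowID(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class IngressRouter:
    """Sketch of flow-based admission at the MPLS tunnel ingress (hypothetical)."""

    def __init__(self):
        self.admitted = {}  # FlowID -> name of the tunnel (LSP) reserved for it

    def admit(self, flow: FlowID, tunnel: str) -> None:
        self.admitted[flow] = tunnel

    def classify(self, flow: FlowID) -> Optional[str]:
        # Exact match on the 4-tuple; any other flow stays best-effort (None).
        return self.admitted.get(flow)
```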

16 L2 vs. L3 (2/2)
- The dynamic circuit appears as a VLAN connecting the end-site border routers with a single hop
  - Flow ID data cannot be used directly
  - Flows must be directed to the proper VLAN
  - WAN admission is performed within the end-site LAN
  - The VLAN is selected with Policy Based Routing (PBR) at both ends
- The route can be selected on a per-flow basis
[Figure: WAN switch and end-site border routers.]
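Because the circuit is visible only as a VLAN, the end-site router has to map flows onto it. The sketch below assumes first-match policy rules on source prefix, destination prefix, and an optional destination port; `PbrRule` and `select_vlan` are hypothetical, and an equivalent rule set would be installed at both ends.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class PbrRule:
    src: str                 # source prefix, e.g. "10.0.0.0/24"
    dst: str                 # destination prefix
    dst_port: Optional[int]  # None matches any destination port
    vlan: int                # VLAN ID of the dynamic circuit to use


def select_vlan(rules, src_ip, dst_ip, dst_port) -> Optional[int]:
    """First matching rule wins; None means the default routed (best-effort) path."""
    s, d = ipaddress.ip_address(src_ip), ipaddress.ip_address(dst_ip)
    for rule in rules:
        if (s in ipaddress.ip_network(rule.src)
                and d in ipaddress.ip_network(rule.dst)
                and rule.dst_port in (None, dst_port)):
            return rule.vlan
    return None
```

Ordering the more specific rule (with a port) first lets a single transfer be steered onto its own circuit while other traffic between the same subnets uses a second VLAN.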

17 Multi-Layer Capability View
[Figure: Layer 2, Layer 3, Layer 4, and application/middleware layers shown across the service, control, data, management, and AA planes, with security at each layer. Components include SRM/GridFTP applications, TeraPaths services, TeraPaths control, Layer 2 VLAN control, MPLS, IP, QoS, TCP, and UDP. Management is marked as not in the implementation.]

18 Multiple-Layer Architecture View
- BeStMan/application plane
- TeraPaths service plane
- TeraPaths management plane
- TeraPaths control plane
- Generic data plane layer
- AA plane

19 Example Use Case: BeStMan in "Pull" Mode
1) The target BeStMan (T-BeStMan) receives a request (userID (credential, priority), files/directory, maxCompletionTime).
2) T-BeStMan checks whether it already has any of the files, and pins them (until maxCompletionTime).
3) T-BeStMan contacts the source BeStMan (S-BeStMan) to get volumeOfRestOfFiles and S-maxBandwidth (request sent, response received).
4) T-BeStMan allocates space (for the volume) and finds its own T-maxBandwidth.
5) T-BeStMan determines desiredMaxBandwidth = min(T-maxBandwidth, S-maxBandwidth).
6) T-BeStMan calls the local TeraPaths to "reserve and commit" (userID, DesiredBeginTime=now, volume, desiredMaxBandwidth, maxCompletionTime).
7) TeraPaths checks the validity of the userID, priority, and authorization, and negotiates with OSCARS.
8) TeraPaths returns either (a) (reservationID, reservedBeginTime, reservedEndTime, reservedBandwidth), or (b) "can't do it by maxCompletionTime, but here is a new (longer) completion time."
9) T-BeStMan informs the user:
   case (a) "Here is your reservation. OK?" If yes, no action; if no, issue a cancel-reservation to TeraPaths.
   case (b) "Can't do it; do you wish to use the extended maxCompletionTime?" If no, cancel; if yes, accept.
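Steps 2 to 8 can be condensed into a sketch like the one below. It assumes a flat file list with known sizes and a TeraPaths stub that can grant at most `network_bw` starting immediately, and it omits space allocation (step 4), authorization (step 7), and the user's decision (step 9). All function and type names are hypothetical, not the BeStMan or TeraPaths APIs.

```python
from dataclasses import dataclass


@dataclass
class Reservation:  # case (a)
    reservation_id: int
    begin: float
    end: float
    bandwidth: float


@dataclass
class CounterOffer:  # case (b): a new, longer completion time
    new_completion_time: float


def reserve_and_commit(volume, desired_bw, begin, max_completion, network_bw, next_id=1):
    """Hypothetical TeraPaths stub: grant min(desired, network) bandwidth from `begin`."""
    bw = min(desired_bw, network_bw)
    end = begin + volume / bw
    if end <= max_completion:
        return Reservation(next_id, begin, end, bw)
    return CounterOffer(end)


def pull_request(files, local_files, file_sizes, t_max_bw, s_max_bw,
                 now, max_completion, network_bw):
    pinned = [f for f in files if f in local_files]        # step 2: pin local copies
    remaining = [f for f in files if f not in local_files]
    volume = sum(file_sizes[f] for f in remaining)         # step 3: volume at the source
    if volume == 0:
        return pinned, None                                # nothing left to transfer
    desired_bw = min(t_max_bw, s_max_bw)                   # step 5
    result = reserve_and_commit(volume, desired_bw, now,   # steps 6 to 8
                                max_completion, network_bw)
    return pinned, result
```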

20 New APIs, Functionality, and Communication Flows to Be Developed
[Figure: a pulling client and a pushing client interact with the target BeStMan and the source BeStMan (each with space management and bandwidth management), which coordinate with TeraPaths (bandwidth coordination and reservation); data flows and control flows are drawn separately.]
- Flows: client-to-BeStMan, BeStMan-to-BeStMan, BeStMan-to-TeraPaths
Note: push and pull modes are needed because of security limitations.

21 Control and Data Flows
[Figure: numbered control flows (1 to 5) among the application, BeStMan, the TeraPaths instances, and the IDC (OSCARS) across the WAN, alongside the data flow.]

22 Distributed Reservation Negotiation
- End-to-end paths comprise multiple segments
  - Each segment is established by a reservation
  - Domains have to agree on parameter ranges
- Each domain is characterized by a resource availability graph, e.g., for bandwidth
  - The availability of all domains can be established by calculating the minimum availability graph
  - Each new reservation has to fit in the available area
  - Reservations that don't fit have to be modified
  - If no modification makes a reservation fit, it is rejected
- TeraPaths currently modifies only the start time, on an individual-site basis, and iterates with counter-offers
  - OSCARS is tried if/after the end sites agree
- We will extend this to modify the start time, end time, and bandwidth, using end-to-end bandwidth availability graphs (BAGs) where applicable, or a combination of BAGs and trial and error otherwise
- Ongoing collaboration with the OSCARS team to move from trial and error to BAGs
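The current start-time-only negotiation can be modeled as a fixed-point iteration: propose a start time, let every site counter-offer its earliest feasible start at or after it, adopt the latest counter-offer, and repeat until all sites agree. The sketch assumes each site is described by a list of free intervals in which the requested bandwidth is available; the helper names are hypothetical. Because the proposal only moves forward, each time to one of finitely many interval starts, the loop terminates.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # [start, end) during which the site can carry the request


def earliest_start(free: List[Interval], t: float, duration: float) -> Optional[float]:
    """Earliest s >= t such that [s, s + duration) lies inside one free interval."""
    for lo, hi in sorted(free):
        s = max(t, lo)
        if s + duration <= hi:
            return s
    return None


def negotiate(sites: List[List[Interval]], desired_start: float,
              duration: float) -> Optional[float]:
    """Iterate counter-offers on the start time until every site accepts (or one refuses)."""
    t = desired_start
    while True:
        offers = [earliest_start(free, t, duration) for free in sites]
        if any(o is None for o in offers):
            return None          # some domain cannot fit the request at all: reject
        latest = max(offers)
        if latest == t:
            return t             # every site accepts t: agreement
        t = latest               # adopt the latest counter-offer and try again
```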

23 Combination of Algorithms
- Obtain and intersect BAGs (bandwidth) from the end sites, fit the request, optimize/consolidate multiple end-site reservations, and submit an OSCARS reservation that accommodates them all
- Obtain and intersect BAGs from all domains, fit the request, optimize/consolidate, then fit the resulting (bigger) request to the transit-domain BAG and submit it to OSCARS
- If a sufficiently bigger slot has already been reserved, the request can be serviced without further negotiation with the transit domain

24 Research Plan: Bandwidth Allocation and Circuit Assignment
- Given: (offline case) a set of reservation requests
- Decisions to make:
  - Allocate bandwidth to circuits (VLANs)
  - Assign reservation requests to circuits
- Objective: maximize the number of requests that can be satisfied
- Major constraints:
  - Each reservation is assigned to one circuit
  - The total capacity the WAN provides
  - The bandwidth utilization must be higher than a given value
  - The number of available circuit IDs is limited to a given value
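One possible integer-programming statement of this problem is given below. It is our own formulation, not taken from the project: all symbols are hypothetical, and to keep the model linear we assume each candidate circuit c has a fixed time window [T^s_c, T^e_c), so only its bandwidth B_c and the assignments are decided. Request r needs bandwidth b_r over [t^s_r, t^e_r); x_{rc} = 1 assigns r to c; y_c = 1 means circuit c consumes one of the N_ID available circuit IDs; u_min is the required utilization.

```latex
\begin{aligned}
\max_{x,\,y,\,B}\quad & \sum_{r\in R}\sum_{c\in\mathcal{C}} x_{rc} \\
\text{s.t.}\quad
& \sum_{c\in\mathcal{C}} x_{rc} \le 1 && \forall r \quad \text{(at most one circuit per reservation)}\\
& x_{rc} = 0 \ \text{ if } [t^s_r, t^e_r) \not\subseteq [T^s_c, T^e_c) && \forall r, c\\
& \sum_{r\in R} b_r\, x_{rc}\, \mathbb{1}[t^s_r \le t < t^e_r] \le B_c && \forall c, t\\
& \sum_{c\in\mathcal{C}} B_c\, \mathbb{1}[T^s_c \le t < T^e_c] \le C_{\mathrm{WAN}} && \forall t \quad \text{(WAN capacity)}\\
& \sum_{r\in R} b_r\,(t^e_r - t^s_r)\, x_{rc} \ge u_{\min}\, B_c\,(T^e_c - T^s_c) && \forall c \quad \text{(utilization)}\\
& B_c \le C_{\mathrm{WAN}}\, y_c, \qquad \sum_{c\in\mathcal{C}} y_c \le N_{\mathrm{ID}} && \text{(circuit IDs)}\\
& x_{rc},\, y_c \in \{0,1\}, \quad B_c \ge 0
\end{aligned}
```

The "for all t" constraints only need to be checked at request and circuit start times, since those are the only instants where the load can increase.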

25 Preliminary Results
Algorithm sketch:
- Order the requests
- Use consolidation when possible (when bandwidth utilization is high enough)
- Assign a new circuit when necessary (if circuit IDs and bandwidth are available)
[Figure: requests 1 to 5 plotted as time vs. bandwidth; one request is rejected.]
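The three-step heuristic can be sketched as a single greedy pass. The assumptions are ours, not the slide's: requests are ordered by start time; a circuit's window is the span of its members and its bandwidth is their peak load; utilization is reserved bandwidth-time divided by circuit bandwidth-time; and the WAN capacity is a single number. `Request`, `Circuit`, and `assign` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    rid: int
    start: float
    end: float  # half-open interval [start, end)
    bw: float


def peak_load(items):
    """Peak total bandwidth of (start, end, bw) intervals; load only rises at a start."""
    starts = {s for s, _, _ in items}
    return max((sum(bw for s, e, bw in items if s <= t < e) for t in starts), default=0)


@dataclass
class Circuit:
    """One WAN reservation (VLAN) carrying one or more user reservations."""
    members: list = field(default_factory=list)

    @property
    def start(self):
        return min(r.start for r in self.members)

    @property
    def end(self):
        return max(r.end for r in self.members)

    @property
    def bw(self):  # sized to the peak load of its members
        return peak_load([(r.start, r.end, r.bw) for r in self.members])

    def utilization(self):
        used = sum(r.bw * (r.end - r.start) for r in self.members)
        return used / (self.bw * (self.end - self.start))


def fits(circuits, capacity):
    return peak_load([(c.start, c.end, c.bw) for c in circuits]) <= capacity


def assign(requests, capacity, max_circuits, min_util):
    circuits, rejected = [], []
    for r in sorted(requests, key=lambda q: (q.start, q.rid)):  # 1) order requests
        placed = False
        for c in circuits:  # 2) consolidate if utilization stays high enough
            trial = Circuit(c.members + [r])
            others = [x for x in circuits if x is not c]
            if trial.utilization() >= min_util and fits(others + [trial], capacity):
                c.members.append(r)
                placed = True
                break
        if not placed and len(circuits) < max_circuits:  # 3) new circuit if IDs remain
            new = Circuit([r])
            if fits(circuits + [new], capacity):
                circuits.append(new)
                placed = True
        if not placed:
            rejected.append(r.rid)
    return circuits, rejected
```

With capacity 10, two circuit IDs, and a 0.6 utilization floor, requests (0,4,5) and (4,8,5) share one circuit, a small late request (10,20,1) gets its own circuit because merging it would drop utilization to 0.5, and a wide request (0,20,6) is rejected for lack of capacity.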

26 Preliminary Results
Online case:
- Choose an "optimization window" near the new request and perform reservation consolidation
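Selecting the reservations that fall inside the optimization window is the first step of the online case. A minimal sketch, assuming the window is the new request's interval widened by a fixed `margin` on each side (a hypothetical parameter); the selected reservations would then be re-consolidated, for example with the offline heuristic of the previous slide.

```python
from typing import List, Tuple

Reservation = Tuple[int, float, float]  # (id, start, end), hypothetical representation


def optimization_window(existing: List[Reservation], new_start: float,
                        new_end: float, margin: float) -> List[int]:
    """IDs of existing reservations overlapping [new_start - margin, new_end + margin)."""
    lo, hi = new_start - margin, new_end + margin
    return [rid for rid, s, e in existing if s < hi and e > lo]
```

A larger margin gives the consolidation step more freedom at the cost of touching more existing reservations.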

27 Bandwidth Reservation Requests and Bandwidth Availability Graph (BAG)
[Figure: six reservation requests with start times t_s1 to t_s6 and end times t_e1 to t_e6, plotted as time vs. bandwidth below the link maximum; the resulting BAG is a step function over breakpoints t1 to t12 showing reserved and available bandwidth.]
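The BAG is the link maximum minus the reserved bandwidth at each instant. The sketch below computes it as a step function, assuming half-open reservation intervals and returning only the breakpoints where availability changes; `build_bag` is a hypothetical helper.

```python
def build_bag(max_bw, reservations):
    """Return BAG breakpoints [(t, available)] with available = max_bw - reserved(t).

    reservations: (start, end, bw) tuples over half-open intervals [start, end).
    Each breakpoint's value holds until the next breakpoint; after the last
    reservation ends, availability is back to max_bw.
    """
    times = sorted({t for s, e, _ in reservations for t in (s, e)})
    bag = []
    for t in times:
        reserved = sum(bw for s, e, bw in reservations if s <= t < e)
        avail = max_bw - reserved
        if not bag or bag[-1][1] != avail:  # merge equal neighbouring steps
            bag.append((t, avail))
    return bag
```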

28 Find Resources for a New Request
[Figure: (a) a new request placed on the BAG (time vs. bandwidth, breakpoints t1 to t12, link maximum) with its allowed start-time window from T_Smin to T_Smax; (b) the new request modified to fit the available area.]
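Fitting a new request then amounts to searching its start-time window. A sketch, assuming the BAG is a sorted list of (time, available) breakpoints with zero availability before the first one. Since availability only changes at breakpoints, only `ts_min` and the breakpoints inside the window need to be tried. The function names are hypothetical, and a `None` result corresponds to the "modify or reject" case.

```python
def available_at(bag, t):
    """Availability at time t for sorted breakpoints [(t_i, a_i)]; 0 before the first."""
    value = 0
    for ti, ai in bag:
        if ti > t:
            break
        value = ai
    return value


def find_start(bag, bw, duration, ts_min, ts_max):
    """Earliest start s in [ts_min, ts_max] with availability >= bw over [s, s + duration)."""
    # The earliest feasible start is either ts_min or a breakpoint inside the window.
    candidates = sorted({ts_min} | {t for t, _ in bag if ts_min < t <= ts_max})
    for s in candidates:
        inner = [a for t, a in bag if s < t < s + duration]  # steps entered during the transfer
        if min([available_at(bag, s)] + inner) >= bw:
            return s
    return None  # nothing fits in the window: modify the request or reject it
```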

29 End-to-End Bandwidth Availability Graph
[Figure: (a) the BAGs of Domain A (maximum max A) and Domain B (maximum max B) over breakpoints t1 to t14; (b) the combined BAG, their intersection.]
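The combined BAG is the pointwise minimum of the domain BAGs. A sketch using a hypothetical breakpoint representation (sorted (time, available) pairs, zero availability before the first breakpoint):

```python
def available_at(bag, t):
    """Availability at time t for sorted breakpoints [(t_i, a_i)]; 0 before the first."""
    value = 0
    for ti, ai in bag:
        if ti > t:
            break
        value = ai
    return value


def intersect(bag_a, bag_b):
    """Pointwise minimum of two piecewise-constant BAGs, as breakpoints [(t, available)]."""
    times = sorted({t for t, _ in bag_a} | {t for t, _ in bag_b})
    combined = []
    for t in times:
        a = min(available_at(bag_a, t), available_at(bag_b, t))
        if not combined or combined[-1][1] != a:  # merge equal neighbouring steps
            combined.append((t, a))
    return combined
```

More than two domains can be combined with `functools.reduce(intersect, bags)`, since the pointwise minimum is associative.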

30 Reservation Consolidation
Motivation:
- To cope with the limited number of VLAN (circuit) IDs
- To reduce WAN operations (teardown and setup)
Idea:
- Use one VLAN (WAN reservation) for multiple reservations
- However, bandwidth will not be fully utilized
Bandwidth utilization = sum of user reservations / WAN reservation
[Figure: time vs. bandwidth, comparing bandwidth utilization, VLAN ID consumption, and capacity consumption (high/low).]

