
1 Run II Networking Assessment Andrey Bobyshev, CD/LCS/NVS/Network Service June 24, 2011

2 Outline of discussion
– Objectives and why we need to assess Run II networking now
– Review of the current state
– Issues, short- and long-term fixes, a timeline
– Short/long-term needs and requirements for Run II and REX-supported experiments
– CMS expertise and experience in large-scale data movement; design solutions applicable to Run II networking
– General design solutions for networking in science data centers

3 Questions to REX
– Current and projected needs, FY11 server acquisition planning
– Do you see the need for major network upgrades in the near future? If so, how would they be funded?
– Any projections for growth of data movement?
– Is there anything from the CMS experience of interest to REX?
– Dependencies between computing resources located at different buildings/data centers

4 Why do we need to assess Run II networking?
– Short-term plan (~1 year): reprocess ~20% of raw data with improved algorithms
– Continue analysis for about 5 years; the Computing Sector is committed to providing and supporting resources at the current level
– Institutional Review of Fermilab, June 06-09, 2011: https://indico.fnal.gov/conferenceDisplay.py?confid=4263
– Computing Strategy, Victoria White
– CDF: Operations, Physics and Analysis Plan, Raymond Culbertson
– D0: Operations, Physics and Analysis Plan, Marco Verzocchi
– 5th Workshop on Data Preservation and Long Term Analysis in HEP, Qizhong Li, Fermilab, May 17, 2011

5

6

7 D0 Module “AS IS”

8

9

10 CDF Module “AS IS”

11 Main killers of science data movement
– Oversubscription… If you can live with oversubscription while moving science data, why not use 100 Mbps switches – lots of money could be saved. If we connect computing resources for science data movement at 1/10G, we want to and need to use the full capacity of the provisioned bandwidth end-to-end.
– Firewalls
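To make the oversubscription point concrete, here is a minimal sketch of how the ratio is computed. The port counts are illustrative assumptions, not the actual FCC3 inventory; slide 12 quotes a 28:1 ratio for the N5K aggregation layer.

```python
# Oversubscription ratio of an access switch: total downlink capacity
# offered to servers divided by the uplink capacity toward the core.
# Port counts below are illustrative, not the actual FCC3 inventory.

def oversubscription(downlink_ports: int, downlink_gbps: float,
                     uplink_ports: int, uplink_gbps: float) -> float:
    """Return the downlink:uplink capacity ratio."""
    down = downlink_ports * downlink_gbps
    up = uplink_ports * uplink_gbps
    return down / up

# A FEX with 48 x 1GE server ports and a single 10GE uplink:
print(oversubscription(48, 1, 1, 10))   # 4.8  -> "4.8:1"

# 28 x 10GE server-facing ports funneled into one 10GE uplink
# reproduces the 28:1 figure quoted for the N5K aggregation:
print(oversubscription(28, 10, 1, 10))  # 28.0 -> "28:1"
```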

12 FCC3 CR: currently provisioned bandwidth and suggested BW as a short-term solution
r-s-dist-fcc3-1 | current BW avail | active connections (fcdfcacheN, d0srvN) | BW needed/GE | suggested BW
– s-access-fcc3-2/N5K: 10 | 28:1 | 240 of 160* | 80
– FEX102: 10 | 48 40 30 46
– FEX103: 10 | 48 40 30 23 11
– FEX105: 10 | 48 40 30 5
– FEX106: 10 | 48 40 30 14
– FEX107: 10 | 48 40 30
– FEX108: 10 | 48 40 30 4
– Total for all FEXs: 60 | 240 | 79
– s-access-fcc3-5/C4948E: 10 | 7 (more to come) | 40 30 | 7
– s-access-fcc3-4/C4948E: 1 | 40 30
– N7K/FCC3 non-blocking ports needed
* Maximum N5K aggregation is 160G

13 s-access-fcc3-2 / Nexus N5K-C5548P

14 Current issues / short-term solutions
– Clean up routing
– CDF switches at GCC have saturated links
– Reduce oversubscription for CDF/D0 access switches/FEXs at FCC3, ~30GE/FEX, 60-80GE/N5K – this will leave just a few ports available on the N5K; brief disruptions for FEX-connected nodes when upgrading
– Eliminate/reduce oversubscription on major uplinks; requires upgrade of the distribution Nexuses and possibly the core 6506 routers
– Establish links between distribution switches at different buildings for D0/CDF traffic
– Relocate one N7K from GCC-A to GCC-B or FCC2, unless D0 & CDF hardware need to be kept separate (but they are already mixed at FCC3)

15 USCMS Tier1 network design patterns
– No oversubscription, or the minimum possible level (80GE for a C6509 with 287 1GE connections)
– 3 VLANs/subnets are CMS-wide, so migrating end systems is transparent; we don't burden users with re-addressing, configuration changes, security policy changes, etc.
– Maximal use of switching fabric rather than inter-switch links
– Provisioning BW where it is needed
– Virtual port channel (vPC)
– Redundancy at L2 (vPC, access switches are connected to both Nexuses) and at L3 (GLBP); a conceptual sketch of the GLBP idea follows below
– A common misleading statement from networking vendors is that L2 and L3 performance is the same for modern routers/switches – not quite true; it will never be the same, by design – much more work has to be done to process an IP packet than an Ethernet frame. All L3 speed-up techniques are based on creating L2 shortcuts for selected L3 traffic. Yes, you might expect a performance improvement for selected traffic patterns, but there are consequences too…
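The L3 redundancy item above (GLBP) shares one virtual gateway IP among several forwarders by handing different hosts different virtual MACs in ARP replies. Below is a conceptual sketch of that idea, not the Cisco implementation; the gateway address, MACs, and host names are made up for illustration.

```python
from itertools import cycle

# Conceptual model of GLBP-style gateway load balancing: one virtual
# gateway IP, several forwarders, and round-robin assignment of
# forwarder virtual MACs in ARP replies.  Illustration only, not the
# actual protocol state machine.

VIRTUAL_GATEWAY_IP = "131.225.0.1"          # assumed example address
FORWARDER_MACS = ["0007.b400.0101",         # forwarder 1 (e.g. Nexus A)
                  "0007.b400.0102"]         # forwarder 2 (e.g. Nexus B)

_round_robin = cycle(FORWARDER_MACS)

def arp_reply(host: str) -> str:
    """The active virtual gateway answers each host's ARP request for
    the shared IP with the next forwarder's virtual MAC."""
    mac = next(_round_robin)
    print(f"{host}: {VIRTUAL_GATEWAY_IP} is-at {mac}")
    return mac

for worker in ["wn001", "wn002", "wn003", "wn004"]:
    arp_reply(worker)

# Hosts end up split across both forwarders; if one Nexus fails, the
# surviving one takes over its virtual MAC, so hosts need no change.
```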

16 2-year upgrade is completed: Cisco 6509-based core -> Cisco Nexus 7000-based core

17 1G / 10G link aggregation
– N x 1G -> [1-8] x 10G: works
– N x 10G => [1-8] x 10G: not good – you can aggregate up to 8 x 10GE into an 8x10G channel (on some platforms 16x10G), but then oversubscription starts to play a role (see the sketch below)
– Aggregation of same-BW links – not OK
– Aggregation of lower-BW links into higher-BW links – OK
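The reason N x 10G into an 8 x 10G channel does not help a single fast transfer is that a port channel hashes each flow onto exactly one member link, so no single flow can exceed one member's speed, while many slower flows spread across the members. A sketch of that hashing behavior follows; the hash policy is simplified and the addresses are made up.

```python
import hashlib

# A port channel distributes traffic by hashing each flow's 5-tuple
# onto one member link; a single flow therefore never gets more than
# one member's bandwidth.  Member speeds and the hash are illustrative.

MEMBERS_GBPS = [10, 10, 10, 10]   # e.g. a 4 x 10GE channel

def member_for_flow(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """Pick the member link for a flow (simplified hash policy)."""
    key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}-{proto}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % len(MEMBERS_GBPS)

# One host-to-host transfer maps to exactly one member:
link = member_for_flow("10.0.1.5", "10.0.2.9", 40000, 1094)
print(f"single flow pinned to member {link}, capped at {MEMBERS_GBPS[link]} Gbps")

# Many 1G flows, however, spread across members and can fill the channel:
links = {member_for_flow(f"10.0.1.{i % 250}", "10.0.2.9", 40000 + i, 1094)
         for i in range(200)}
print(f"200 x 1G flows use members {sorted(links)}")
```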

18 Model of USCMS-T1 network traffic (diagram)
Elements: cmsstor/dCache nodes, Federated File System, T0, data processing (~1600 worker nodes), EnStore tape robots, BlueArc NAS, CMS-LPC/SLB clusters, Tier2s/Tier1s, interactive users
Labeled flow bandwidths: 2.2 Gbps, 3.2 Gbps, 30-80 Gbps (worker nodes), 10-20 Gbps, 1 Gbps, 3-10 Gbps; QoS applied
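A small sketch that totals the bandwidths labeled on this slide to show the rough scale of demand the Tier1 core must carry. The pairing of each number with a specific flow is my reading of the diagram text, not stated explicitly on the slide, and ranges are treated as (low, high) pairs by assumption.

```python
# Flow bandwidths (Gbps) labeled on the USCMS-T1 traffic-model slide.
# The flow names in the comments are an assumed pairing of labels to
# flows; the point is the rough aggregate, not exact attribution.
flow_gbps = [
    (2.2, 2.2),    # T0 import to cmsstor/dCache
    (3.2, 3.2),    # Federated File System
    (30, 80),      # data processing, ~1600 worker nodes
    (10, 20),      # Tier2/Tier1 WAN transfers
    (1, 1),        # interactive users
    (3, 10),       # remaining labeled flow (NAS / LPC clusters)
]

low = sum(lo for lo, hi in flow_gbps)
high = sum(hi for lo, hi in flow_gbps)
print(f"sum of labeled flows: {low:.1f} - {high:.1f} Gbps")
# -> roughly 49 - 116 Gbps in aggregate, dominated by worker-node traffic
```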

19 Classes of traffic in USCMS-T1
– Network use (2%) – 1.6 Gbps
– Interactive (2%) – 1.6 Gbps
– Real-time (database, monitoring) (2%) – 1.6 Gbps
– Critical (34%) – 27.2 Gbps
– NAS (10%) – 8 Gbps
– Best effort (50%) – 40 Gbps
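The percentages and Gbps figures on this slide are mutually consistent with an aggregate of about 80 Gbps (e.g. 27.2 Gbps / 34% = 80 Gbps); that total is inferred from the slide's own numbers rather than stated on it. A small check:

```python
# Traffic classes in USCMS-T1 and their share of the provisioned
# bandwidth.  The 80 Gbps aggregate is inferred from the slide's own
# numbers (27.2 Gbps / 0.34 = 80), not stated explicitly.
TOTAL_GBPS = 80

classes = {
    "Network use": 0.02,
    "Interactive": 0.02,
    "Real-time (database, monitoring)": 0.02,
    "Critical": 0.34,
    "NAS": 0.10,
    "Best effort": 0.50,
}

for name, share in classes.items():
    print(f"{name:<35} {share * TOTAL_GBPS:5.1f} Gbps")
# -> 1.6, 1.6, 1.6, 27.2, 8.0, 40.0 Gbps, matching the slide's figures

assert abs(sum(classes.values()) - 1.0) < 1e-9   # shares cover the full pipe
```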

20

21

22

23

24

25 ITIL and USCMS-T1 best practices (table columns: ITIL process | work practices | USCMS network service)

26 Interaction on design process
– Gathering new requirements from customers for next year, projections for 2-3 years
– Preliminary design: consider several options, research new solutions

27 Interaction on upgrade processes (fragments from Q&A assessment documents)
– Discussion of why we need to upgrade, planning according to the experiment's schedule, risk assessment, urgency/importance: 1 – low, 10 – high
– Planning the upgrade: checklists, verification procedures, tests, configuration changes, step-by-step plan, fallback plan – a CRQ also needs to be submitted now
– Notify users when the upgrade is completed, provide information on any issues
– Post-implementation monitoring, ask users to report any suspicious behavior

28 Interaction with experiments (fragments from the USCMS-T1 Q&A assessment document)
– Informal face-to-face meetings, 1-2 times per week: current status, problems or issues, preliminary discussions of upgrade plans
– Detailed design note and procurement plan: yearly plan for design or major upgrades, projections for the next 2-3 years
– Intermittent status reports: current status, revised plans
– Project proposals – identify issues that need substantial effort to develop or integrate
– CRQs, Service Desk tickets…
– AIM/e-mail – staying in contact on urgent issues
– Survey "How we are doing" – informal, so it is aimed to be honest
– There is no "network problem" or "user problem"… there is a problem – troubleshooting and solution work proceed in parallel until it is resolved

29 General considerations on data center design for science data movement
We have the FCC3, FCC2, FCC1, GCC-A, GCC-B and LCC computing rooms – is it one distributed data center or multiple data centers? The answer will dictate the design approach.

30

31 The following six slides are from the presentation by Eli Dart, Network Engineer, ESNet Engineering Group, given at the Winter 2011 Joint Techs, Clemson, SC, February 1, 2011…

32

33

34

35

36

37

38 Drawings on the following slides are from various Cisco Systems, Inc. design documents…

39

40

41 VLANs are extended between DCs, but placement of primary/secondary roots is not optimal for DC2

42 Same as on the previous slide, but the issue of optimal placement of the primary/secondary root is addressed by BPDU filtering
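Slides 41-42 concern where the spanning-tree primary and secondary roots sit when VLANs are stretched between data centers. Below is a conceptual sketch of root election (the lowest bridge priority, then the lowest MAC, wins), illustrating why DC2 traffic can end up crossing the inter-DC link when both low-priority bridges live in DC1. Switch names, priorities, and the two-DC layout are illustrative only, not taken from the Cisco drawings.

```python
# Conceptual sketch of spanning-tree root election: the switch with the
# lowest (priority, MAC) bridge ID becomes root, and every other switch
# forwards along its shortest path toward that root.

switches = {
    # name:        (priority, MAC,                  data center)
    "dc1-agg-a":   (4096,  "00:01:00:00:00:01",     "DC1"),
    "dc1-agg-b":   (8192,  "00:01:00:00:00:02",     "DC1"),
    "dc2-agg-a":   (32768, "00:02:00:00:00:01",     "DC2"),
    "dc2-agg-b":   (32768, "00:02:00:00:00:02",     "DC2"),
}

root = min(switches, key=lambda name: (switches[name][0], switches[name][1]))
print(f"elected root: {root} in {switches[root][2]}")   # dc1-agg-a in DC1

# With both low-priority bridges in DC1, DC2's intra-DC traffic may be
# forwarded toward a root across the DC interconnect.  Filtering BPDUs
# on the interconnect (as on slide 42) splits the spanning-tree domains
# so each DC elects a local root instead.
```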

43

44

45

46 The following slide is from an Arista Networks white paper on modern network architectures – consolidating several layers of a multi-tier architecture into fewer layers

47

