Published by Katherine Byrd. Modified over 9 years ago
1
Working group for optimized Computing Capacity Lifecycle Planning
Created after the ISM meeting of 16th June
Members: Tim B, Eric G, Helge, Massimo, Carles, Benoit, Bernd, Olof
Also: Eric S, Artur, Arne
Mandate: To look at the current process and recent difficulties, including having multiple budget codes. Based on this, make a proposal for a revised process that eliminates the recent issues and makes the process as efficient as possible. This process should be, as far as possible, consistent for all hardware and services in the IT Computing Facilities.
Activity: 6 meetings in total, 2 to define the problems and 4 to find solutions and agree on recommendations.
Output: report with recommendations
2
WG Topics
Requirements, Technology survey, Decommissioning, Schedule, Capacity, Procurement, Life-cycle, Technical, Budgeting, Accounting, Funding & Chargeback(?), Commissioning, Allocation & Repurposing
3
ID | Who | Recommendation
R01 | CF | Procurement team gives a yearly public presentation covering technology trends that are relevant for CERN IT and their implications for the services.
R02 | CF | Procurement team starts every tender cycle by organizing a requirements meeting covering both technical and capacity requirements.
R03 | CF | Operation team maintains a global table for tracking the deliveries in the Computer Facilities Capacity Coordination Meeting (CFCCM).
R04 | CF | Procurement team opens a SNOW ticket to the intended customer service FE for hand-over of allocated systems for commissioning.
R05 | CS | Investigate technical options for separating the logical network from the infrastructure.
R06 | All | Every use case for a private network (e.g. Oracle databases, Drupal) should be re-evaluated in view of its cost.
R07 | CS | Enhance the LANDB interface to better support bulk renumbering of IP services.
R08 | CS | Add information about blocking factor and number of fibres at switch level in LANDB.
R09 | CS | Replace cross-charging for network switches with an explicit budget transfer.
R10 | CF | Define a process for review and approval of requests for using the Barn.
R11 | CF | Establish a host-based replacement process as outlined in section 5.
R12 | Bernd | Propose and agree with IT management on a standing justification for adding an option for 20% additional volume to future FC papers.
R13 | CF | Move to a standard scheme with two procurement cycles per year, targeting the June and December FC meetings.
R14 | OIS | Test and certify Windows installation on standard bulk hardware.
R15 | All | Review clusters for potential candidates for best-effort production hardware usage, as defined in the host-by-host replacement proposal.
R16 | DHO | Determine the necessary staffing to implement and operate the processes once recommendations R01 to R15 are all agreed.
5
R03: CFCC tracking table for installations, acceptance and commissioning https://espace2013.cern.ch/CFCCM/Shared%20Documents/Installations%20List/CFCCM_Installations_list.xlsx?Web=1 (Access restricted to e-group: it-service-cfccm)
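A minimal sketch of what one row of such a tracking table could hold, assuming each delivery moves in order through the stages named in the slide title (installation, acceptance, commissioning). The class, field and stage names are hypothetical; the layout of the actual spreadsheet is not described here.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical ordered stages of a delivery, following the slide title."""
    DELIVERED = 0
    INSTALLED = 1
    ACCEPTED = 2
    COMMISSIONED = 3


@dataclass
class Delivery:
    """One row of a hypothetical CFCCM-style tracking table."""
    order_id: str
    n_hosts: int
    owner_fe: str  # customer Functional Element receiving the systems
    stage: Stage = Stage.DELIVERED
    history: list = field(default_factory=list)  # stages already passed

    def advance(self) -> Stage:
        """Move to the next stage; stages are never skipped."""
        if self.stage is Stage.COMMISSIONED:
            raise ValueError(f"{self.order_id} is already commissioned")
        self.history.append(self.stage)
        self.stage = Stage(self.stage + 1)
        return self.stage


def pending(table, stage: Stage) -> list:
    """Deliveries that have not yet reached the given stage."""
    return [d for d in table if d.stage < stage]
```

Ordering the stages as an `IntEnum` keeps "what is still pending" a simple comparison rather than a lookup table.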
6
R13: Typical procurement cycle with FC (figure: timeline in months from N to N+12; phases shown: spec work, dispatch tender, waiting for bids, bid evaluation, waiting for FC approval, waiting for delivery, acceptance, dispatching orders; procurement team involvement marked as dedicated, assisting or consulting)
7
R13: synchronized to FC meetings (figure: monthly timeline over years N and N+1 marking the March, June, September and December FC meetings and invoicing)
8
R13: 2x cycles June & December FC (figure: monthly timeline over years N and N+1 marking the June and December FC meetings and invoicing)
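Under the R13 scheme only the June and December FC meetings are targeted. A small sketch of that rule, assuming month granularity (exact meeting days ignored); the function names are hypothetical:

```python
from datetime import date

# R13: procurement cycles target only the June and December FC meetings.
TARGET_FC_MONTHS = (6, 12)


def next_target_fc(today: date) -> tuple:
    """(year, month) of the next targeted FC meeting, at month granularity.

    A meeting in the current month counts as "next"; exact meeting days are ignored.
    December is always a target, so a match exists within the same calendar year.
    """
    return next((today.year, m) for m in TARGET_FC_MONTHS if m >= today.month)


def target_fc_meetings(first_year: int, last_year: int) -> list:
    """All targeted FC meetings between two years, inclusive: two per year."""
    return [(y, m) for y in range(first_year, last_year + 1) for m in TARGET_FC_MONTHS]
```

For example, a cycle being planned in January 2016 would aim at `(2016, 6)`, and one planned in July would aim at `(2016, 12)`.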
9
R11: 1-to-1 host replacement
- Once a year: tender for 1-to-1 replacement of hosts (and storage) whose warranty expires in the next 12 months
- One year later: replacement capacity ready for commissioning
- Inform the owner Functional Element (Cloud Infrastructure, EOS, …):
  - List of reliable production hosts to be replaced
  - List of replacement hosts
- Allow one year for migration
- Old host re-purposed for "best effort" production
- Host replacement accounts for some capacity growth
- Additional growth added on top or tendered separately
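The yearly selection step can be sketched as a filter over a host inventory: every host whose warranty expires within the 12 months after the tender date goes into the 1-to-1 tender, grouped by owner Functional Element so that each FE receives its own list. The `Host` record, its fields and the helper names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Host:
    """Hypothetical inventory record."""
    name: str
    owner_fe: str  # owner Functional Element, e.g. "Cloud Infrastructure", "EOS"
    warranty_end: date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to 28 so the result is always valid."""
    y, m = divmod(d.month - 1 + months, 12)
    return date(d.year + y, m + 1, min(d.day, 28))


def hosts_to_replace(inventory, tender_date: date) -> dict:
    """Per owner FE, the hosts whose warranty expires in the 12 months after the tender."""
    horizon = add_months(tender_date, 12)
    per_fe = defaultdict(list)
    for host in inventory:
        # Hosts already out of warranty are assumed to have been in an earlier tender.
        if tender_date <= host.warranty_end < horizon:
            per_fe[host.owner_fe].append(host.name)
    return dict(per_fe)
```

The same per-FE lists are what the owner FE would be informed about, alongside the list of replacement hosts once they are delivered.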
10
R11: typical 1-to-1 cycle (figure: monthly timeline over 2015, 2016 and 2017; replacement capacity = 1294 systems with age between 2 and 3 years; tender at the December FC; phase 1: procurement, ending when replacement capacity is available; phase 2: commission & repurpose, during which services are notified to migrate, ending when all old capacity is repurposed)
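The two phases can be expressed as simple date arithmetic, assuming, as stated for R11, one year from tender to available replacement capacity and one further year for migration. Month granularity and the function name are assumptions; the figure itself does not give exact dates.

```python
def one_to_one_milestones(tender_year: int, tender_month: int = 12) -> dict:
    """(year, month) milestones of a 1-to-1 replacement cycle.

    Phase 1 (procurement): tender -> replacement capacity available (+12 months).
    Phase 2 (commission & repurpose): available -> all old capacity repurposed (+12 months).
    The default tender month is December, matching the December FC.
    """
    def plus_months(year: int, month: int, n: int) -> tuple:
        y, m = divmod(month - 1 + n, 12)
        return (year + y, m + 1)

    available = plus_months(tender_year, tender_month, 12)
    repurposed = plus_months(*available, 12)
    return {
        "tender": (tender_year, tender_month),
        "available": available,
        "all_repurposed": repurposed,
    }
```

With a hypothetical tender at the December 2015 FC, this gives replacement capacity in December 2016 and all old capacity repurposed by December 2017.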
11
R11: Host lifecycle (figure: host age from new through 1, 2, 3, 4 and 5 years to older; phases: commissioning, reliable production, best effort production, re-purpose; milestones: tender for 1:1 replacement, original warranty expires, replacement available)
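The lifecycle can be sketched as a function from host age to phase. The thresholds are assumptions: a 3-year original warranty (consistent with replacing systems aged 2 to 3 years at tender time, but not stated explicitly), best-effort production starting one year after warranty expiry (as stated for R15), and decommissioning at about 5 years. The commissioning length is a placeholder.

```python
# Assumed thresholds in years (hypothetical values, see above).
COMMISSIONING_YEARS = 0.25  # placeholder length of the commissioning phase
WARRANTY_YEARS = 3.0        # assumed original vendor warranty
MIGRATION_YEARS = 1.0       # best effort starts one year after warranty expiry
DECOMMISSION_YEARS = 5.0    # "~>5 years feels about right"


def lifecycle_phase(age_years: float) -> str:
    """Map a host's age to its lifecycle phase under the assumed thresholds."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years < COMMISSIONING_YEARS:
        return "commissioning"
    if age_years < WARRANTY_YEARS + MIGRATION_YEARS:
        return "reliable production"
    if age_years < DECOMMISSION_YEARS:
        return "best effort production"
    return "re-purpose or decommission"
```

Keeping the thresholds as named constants makes it easy to revisit them if the warranty length or the decommissioning age changes.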
12
R11: 1-to-1 host replacement
- Once a year: tender for 1-to-1 replacement of hosts (and storage) whose warranty expires in the next 12 months
- One year later: replacement capacity ready for commissioning
- Inform the owner Functional Element (Cloud Infrastructure, EOS, …):
  - List of reliable production hosts to be replaced
  - List of replacement hosts
- Allow one year for migration
- Old host re-purposed for "best effort" production
- Host replacement implies significant capacity growth
- Additional growth on top or tendered separately
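Why a 1-to-1 replacement implies capacity growth can be shown with a short calculation: each retiring host is swapped for one new host, and newer hosts are typically more capable, so total capacity rises even though the host count stays the same. The per-host capacities (arbitrary units, e.g. cores) and the function name are hypothetical.

```python
def replacement_growth(old_capacity: list, new_capacity_per_host: float) -> tuple:
    """Capacity growth from replacing each old host with exactly one new host.

    old_capacity: per-host capacity of the hosts being replaced (hypothetical units).
    new_capacity_per_host: capacity of one replacement host (same model for all).
    Returns (old_total, new_total, relative_growth).
    """
    if not old_capacity:
        raise ValueError("no hosts to replace")
    old_total = sum(old_capacity)
    new_total = new_capacity_per_host * len(old_capacity)  # 1-to-1: same host count
    return old_total, new_total, (new_total - old_total) / old_total
```

For example, replacing four hosts of 16, 16, 24 and 24 units with four 32-unit hosts gives 128 units instead of 80, i.e. 60% growth before any additional growth is added on top.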
13
R11: Decommissioning
- When to decide obsolescence? It is difficult to define generalized criteria for:
  - Inefficiency (power consumption per capacity, physical space)
  - Failure rates
  - Parts availability
  - Maintenance effort (firmware support and security)
- ~>5 years "feels" about right
- Decommissioning process?
  - Big-bang retirement campaigns?
  - Establish a background activity, e.g. trickle 100-200 servers / month?
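Of the criteria listed, inefficiency is the easiest to quantify. A minimal sketch, assuming power draw in watts and capacity in some benchmark unit (both hypothetical inputs), compares the power per unit of capacity of an old host with that of a current replacement. It does not capture failure rates, parts availability or maintenance effort, which are the harder parts to generalize.

```python
def watts_per_unit(power_w: float, capacity: float) -> float:
    """Power consumption per unit of capacity (lower is better)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return power_w / capacity


def inefficiency_ratio(old_power_w: float, old_capacity: float,
                       new_power_w: float, new_capacity: float) -> float:
    """How many times more power the old host needs per unit of capacity."""
    return watts_per_unit(old_power_w, old_capacity) / watts_per_unit(new_power_w, new_capacity)
```

With hypothetical figures of 300 W for 100 units (old) versus 300 W for 300 units (new), the old host uses three times as much power per unit of capacity.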
14
- Decommissioning at age > 5 years
- 200 servers/month
- Adding ~2300 servers in pipeline
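The trickle approach can be sketched as a monthly queue: hosts that pass the age threshold join a decommissioning backlog, and at most a fixed number leave it each month. The 200 servers/month rate comes from the slide; the backlog sizes and monthly arrivals used with it are hypothetical, as are the function names.

```python
import math


def months_to_drain(backlog: int, rate_per_month: int) -> int:
    """Months needed to retire a fixed backlog at a constant monthly rate."""
    if rate_per_month <= 0:
        raise ValueError("rate must be positive")
    return math.ceil(backlog / rate_per_month)


def simulate_backlog(initial: int, arrivals, rate_per_month: int = 200) -> list:
    """Backlog at the end of each month.

    arrivals: hypothetical number of servers crossing the age threshold each month.
    """
    backlog, history = initial, []
    for new in arrivals:
        backlog += new
        backlog -= min(backlog, rate_per_month)  # retire at most the monthly rate
        history.append(backlog)
    return history
```

If the ~2300 servers were read as a one-off backlog (an assumption), `months_to_drain(2300, 200)` gives 12 months; the simulation shows how continuing arrivals stretch that.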
15
R15: Reliable production vs best-effort
- The move from reliable production to best-effort production is expected one year after the expiry of the original vendor warranty
- Best-effort production consists of clusters of hosts that are available for re-purposing or decommissioning. It may include new hardware that has not yet been commissioned in reliable production
- Examples of best-effort production clusters are OpenStack cells with short turn-over VMs, such as batch worker nodes
- FEs using bare hardware should review their clusters for potential candidates
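For the R15 review, an FE could flag best-effort candidates in a cluster by checking how long ago each host's original warranty expired. A minimal sketch, assuming a mapping from host name to warranty end date (hypothetical input) and the one-year rule above:

```python
from datetime import date, timedelta

# One year after warranty expiry; 365 days approximates a year (leap days ignored).
GRACE = timedelta(days=365)


def best_effort_candidates(warranty_end_by_host: dict, today: date) -> list:
    """Hosts whose original warranty expired at least one year before `today`."""
    return sorted(
        name for name, end in warranty_end_by_host.items()
        if today - end >= GRACE
    )
```

This only covers the age rule; new hardware placed in best-effort production before commissioning would have to be flagged separately.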
16
Summary
- Working group completed its mission
- Good and focussed discussions (thanks to all involved)
- Conclusions presented (and approved) at the ISM in December
- Only a subset presented here
- More details in the report attached to the agenda
- Implementation: monitor progress?
17
Recommendations R01 to R16 (as listed on slide 3) @ ITTF tomorrow