SNS Reliability and Maintenance Programs


1 SNS Reliability and Maintenance Programs
George Dodson Research Accelerator Division Spallation Neutron Source

2 Topics
• Vision and Goals
• Enablers
• Performance Metrics
• Management Information Systems
• Continuous Improvement
• RAMI Modeling
• Maintenance Management
• Spares/Obsolescence/Vulnerability
• Configuration Control

3 Vision
The vision of the SNS Reliability and Maintenance Programs is an efficient, effective, reliable science facility throughout the lifetime of the SNS, currently expected to be ~40 years.
Goals
The goals for Accelerator systems include 4500 hours of neutron production beam at greater than 90% availability, at or close to the nominal power delivery capacity of the SNS.
As the funding landscape shifts, achieving these goals will become more challenging. Increasingly greater demands are being placed on facility staff even as those staffs are becoming leaner and, in some cases, less experienced due to retirements. As time passes, conditions change. Older equipment becomes obsolete and new equipment is added on a continuous basis. As a result, facilities are being operated and maintained under continually changing conditions. These changes will produce a new dynamic for our organization that adds to the facility maintenance challenges that we will face.
Our goals can be met in this challenging environment by developing best practices associated with an Integrated Maintenance Program structure and functionality. We must develop a maintenance process that identifies causes of potential equipment failures, effectively monitors and assesses equipment condition, and proactively plans for equipment maintenance. This organization will utilize our staff more effectively, increasing their proficiency by applying standard processes, facilitating peer collaboration, completing databases to support condition-based maintenance, and documenting case histories.

4 SNS Accelerator Complex
• Front-End: Produce a 1-msec long, chopped, H- beam
• LINAC: Accelerates the beam to 1 GeV
• Accumulator Ring: Compress 1 msec long pulse to 700 nsec
• H- stripped to protons
• Deliver beam to Target
Diagram labels: Ion Source, RFQ, DTL, CCL, SRF β=0.61, SRF β=0.81; beam energies 2.5 MeV, 87 MeV, 186 MeV, 387 MeV, 1000 MeV.
Pulse structure (current vs. time): chopper system makes gaps; 945 ns; 1ms mini-pulse; 1 ms macropulse.

5 SNS Goals

Year     Neutron Production Availability    Integrated Beam Power (MW-hrs)
         Commitment   Actual                Commitment   Actual
FY2007   68.0%        65.7%                 117          159
FY2008   74.0%        72.0%                 877          945
FY2009   80.0%        80.7%                 2031         2166
FY2010   85.0%        85.6%                 -            -
FY2011   88.0%        92.0%                 -            -
FY2012   90.0%        92.7 (94.0%)          -            -
FY2013   -            72.4 (89.4%)          -            -

Year     Neutron Production Hours    Total Operating Hours
         Commitment   Actual         Commitment   Actual
FY2007   1500         2078           3500         3779
FY2008   2700         2807           4000         4032
FY2009   -            3553           4500         4916
FY2010   3900         4250           4800         5310
FY2011   4300         5437           5000         5941
FY2012   -            5098           -            5746
FY2013   -            4202           -            5120

6 Enablers
The SNS Reliability and Maintenance Program is a facility-wide program for achieving the SNS primary beam delivery goals while maintaining and improving SNS Facilities in a cost-effective manner over the lifetime of the facility. The core of this program is a Reliability Centered Maintenance program. It is surrounded by a number of linked Management Information Systems (MIS), Other Systems and specific Policies and Procedures using applicable industrial standards. These systems include:
• A Beam-time/Downtime Tracking System and Electronic Logbook
• A Performance Metrics Reporting System
• A Computerized Maintenance Management System (CMMS)
• A Document Control System (DCS) linked to the CMMS
• A Work Request/Planning/Scheduling System in or linked to the CMMS
• A Reliability (RAMI) Modeling System
• A Spares Plan linked to an Equipment Obsolescence Plan
• A Vulnerability analysis of “single point” and/or “long time to recover” failures
• A process for driving continued improvement in Equipment Design and Operation
• A Configuration Control System to keep you from doing STUPID THINGS

7 Major Components
SNS Integrated Maintenance Program, centered on Reliability Centered Maintenance:
• Document Control
• Fault Reporting
• Work Planning - Scheduling
• CMMS
• Performance Metrics
• RAMI Model
• Configuration Control for Upgrades and New Equipment/Systems
• Spares Plan
• Equipment Obsolescence Plan
• Testing and Inspection
• FMEA
• Equipment Design Considerations
• Equipment Operations Considerations
Goals:
• Reactive Maintenance < 10%
• Preventative Maintenance 25-35%
• Predictive Maintenance 45-55%

8 Management Information Systems (Oracle): Acquire the Data
• Beam Time Accounting: Operations Administration System (OAS)
  • Shift by Shift account of downtime
• Electronic Logbook
  • Narrative account of shift activities including threaded discussion of breakdown and repair
• CMMS – DataStream 7i (Infor)
  • Equipment Tracking
    • Asset Structure tables with parent-child relationships
    • “Cradle to Grave” tracking by position, location, asset
    • Asset status (Installed, In-Repair, Spare, Disposed Of)
  • Work Control
• Use the same “Data Structures” for each: System, Sub-System, Sub-Sub-System, Sub-Sub-Sub-System, Asset, Position, Location
• All 3 MIS Systems “Tied Together” through the Work Order Numbers

9 OAS Shift Closeout

10 Operations Metrics Report for September 23-29, 2013 (Run FY13-2)
Research Accelerator Division Spallation Neutron Source

11

12 Operating Statistics – September 23-29, 2013

13 Unscheduled Downtime – September 23-29, 2013

14 Unscheduled downtime for the last week ≥ 0.2 hrs.
Unscheduled downtime by number of occurrences >1 (beam and non-beam downtime combined)

15 MPS trip summary

16 Hours / week - Target / Down / AP

17 Operating Statistics – FY13 to date

18 Down Time – Pareto Chart for FY 13 to date

19 RTBT_Diag:BCM25I:Power60: Beam power on Target (60 sec. average) for the last week, MW peak

20 Energy and power on target from October 2006

21 Beam hrs. to Target & Avg. kW/hr as of Sept. 29, 2013

22 NP availability by week

23

24

25

26

27

28 Machine Issues
• Ion source: Arcing causing 13 MHz and Edmp power supply trips
• RFQ Chiller 2 PID tuning (0.5 C overshoot when RF is turned off and back on)
• Cryopump regen
• Verify all warm linac arc detectors are working properly
• No ion pump faults in DTL2 without RF
• DTL3 winair arcs and vacuum burst in the tank
• If venting is necessary during 2 week shutdown then replace DTL2 IP202
• CCL2 klystron window arcs (not sure there is enough time)
  • Arcs have returned after waveguide polishing
• CCL2 modulator
  • Still tripping (last trip was 9/30 on DFDC B flux saturated fault)
• DTL6 tank turbo pump is off

29 Analysis Identifies Problem Areas
• Operations Administration System (OAS) Shift Reports: E-Log entries and OAS Downtime are reported.
• Work Orders are created in the CMMS and entered in the E-Log.
• Downtime linked to a Work Order Number in the OAS is reported in the Metrics.
Diagram labels: Performance Metrics; Fault Reporting; Electronic Logbook (E-Log); Weekly Metrics and Machine Health Report; List of Machine “Issues”; Operational and Design Considerations.
• Downtime and Trip Rates are evaluated in the Weekly Machine Health Report, which shows the trend from the past week, 2 weeks ago and 3 weeks ago.
• Failure precursors are identified by increased trip rates.
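
The three-week trend comparison described on this slide could be automated roughly as follows. This is a minimal sketch, assuming weekly trip counts per system are already available; the function name `flag_rising_trip_rates`, the `min_increase` threshold and the sample counts are illustrative assumptions, not part of the SNS Machine Health Report.

```python
def flag_rising_trip_rates(weekly_trips, min_increase=1):
    """Return systems whose trip count rose in each of the last two week-to-week steps.

    weekly_trips maps a system name to trip counts ordered oldest to newest:
    [3 weeks ago, 2 weeks ago, past week].
    """
    flagged = []
    for system, counts in weekly_trips.items():
        three_ago, two_ago, last = counts[-3:]
        if two_ago - three_ago >= min_increase and last - two_ago >= min_increase:
            flagged.append(system)
    return sorted(flagged)


# Hypothetical trip counts, for illustration only.
example = {
    "CCL2 modulator": [2, 5, 9],
    "DTL3 RF": [4, 4, 3],
    "Ion source": [1, 2, 2],
}
print(flag_rising_trip_rates(example))
```

Requiring two consecutive increases is one possible definition of a "precursor"; a real report would probably normalize by beam hours and tune the threshold per system.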

30 Management Information Systems (Oracle): Acquire the Data
• Beam Time Accounting: Operations Administration System (OAS)
  • Shift by Shift account of downtime
• Electronic Logbook
  • Narrative account of shift activities including threaded discussion of breakdown and repair
• CMMS – DataStream 7i (Infor)
  • Equipment Tracking
    • Asset Structure tables with parent-child relationships
    • “Cradle to Grave” tracking by position, location, asset
    • Asset status (Installed, In-Repair, Spare, Disposed Of)
  • Work Control
• Use the same “Data Structures” for each: System, Sub-System, Sub-Sub-System, Sub-Sub-Sub-System, Asset, Position, Location
• All 3 MIS Systems “Tied Together” through the Work Order Numbers

31 What Equipment Must be Tracked?
1. Is the equipment safety-related?
2. Is the cost of the equipment $2500 or more?
3. Is the equipment categorized as a Quality Level 1 or Level 2 item (Safety Related)?
4. Does the equipment require preventative/predictive maintenance?
5. Does the equipment require periodic calibration?
6. Does the equipment contain electrical components which are categorized as “unlisted electrical equipment” and require inspection and approval?
What to record:
• Manufacturer, Model, Version and Serial Number
• When was it built
• When did it arrive
• When and where was it installed (position, location)
• When it was maintained and who maintained it
• When did it fail, what was the root cause, who repaired it
• Where is it, where has it been and when (position and location)

32 Cradle-to Grave Equipment Tracking Data in the CMMS
Diagram labels:
• Devices (Position/Location), example: CCL_Vac:IP204
• MIS Database of Equipment and Spares (Assets)
• Receiving
• Tracking ID Number (barcode #)
• Vendor Data (Traveler)
• Test Data
• Installation Data
• Vendor Documents
• Maintenance History
• Fault
• EPICS Control System
Data are in Document Control System by Tracking Number.
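
A cradle-to-grave record of this kind can be pictured as an asset object that accumulates status and position changes, each tied to a work order number. The sketch below is illustrative only: the class, field names, tracking ID and work order numbers are assumptions and do not reflect the actual DataStream 7i schema; only the position name `CCL_Vac:IP204` and the four status values come from the slides.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Asset:
    tracking_id: str          # barcode number assigned at receiving (hypothetical format)
    model: str
    serial_number: str
    status: str = "Spare"     # Installed, In-Repair, Spare, Disposed Of
    history: list = field(default_factory=list)  # (date, status, position, work order)

    def move(self, when, status, position, work_order):
        """Record a status/position change against a work order number."""
        self.status = status
        self.history.append((when, status, position, work_order))

    def current_position(self):
        """Position of the asset if it is installed, otherwise None."""
        if self.status == "Installed" and self.history:
            return self.history[-1][2]
        return None


# Hypothetical asset and work orders, for illustration only.
pump = Asset("000123", "ion pump", "SN-42")
pump.move(date(2013, 9, 30), "Installed", "CCL_Vac:IP204", "WO-1001")
print(pump.current_position())
pump.move(date(2014, 1, 15), "In-Repair", None, "WO-1002")
print(pump.current_position())
```

Keeping the full history list, rather than only the current state, is what allows "where has it been and when" questions to be answered later.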

33 CMMS
• Inventory Control
• Cradle-to-Grave Asset History
• Cradle-to-Grave Equipment Tracking
• Equipment Status
• Position-Location History
• Spares and Parts Management
• Warranty Information Tracking
• Work Requests/Authorizations
• Work Prioritization and Scheduling
• Work Planning
• Automated Time-Based PMs
• Resource Allocation and Scheduling
• Inspections/Testing Based PMs
• Automated Meter-Based PMs
• Work Documentation
• Post Maintenance Testing
• Work Execution
• Equipment Swaps
• Equipment Repair
• Maintenance Costs Tracking
• Maintenance Hours Tracking

34 Data Management Analyze and Use the Data
• Build a robust data system for tracking and trending, including MTTF, MTTR, Spares Inventory, Fault Tracking, etc.
• Compare MTBF/MTTR data with the Reliability Model and industrial standards, with an eye to the root cause of failures with higher than expected failure rates.
• Go after the highest sources of downtime.
• Effectively utilize Control System Monitoring Data: filtering and pattern analysis to Detect the Onset of Pre-Failure Behavior so that you can replace the component in a Maintenance Period.
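
The MTBF and MTTR figures mentioned on this slide follow directly from operating hours and repair durations. A minimal sketch, assuming one repair duration per failure is available from the downtime records; the function name and the sample numbers are illustrative, not SNS data.

```python
def reliability_metrics(operating_hours, repair_hours):
    """Compute MTBF, MTTR and availability for one system.

    operating_hours: total hours the system was up during the period.
    repair_hours: one entry per failure, the hours taken to restore service.
    """
    failures = len(repair_hours)
    if failures == 0:
        return {"mtbf": float("inf"), "mttr": 0.0, "availability": 1.0}
    mtbf = operating_hours / failures
    mttr = sum(repair_hours) / failures
    return {"mtbf": mtbf, "mttr": mttr, "availability": mtbf / (mtbf + mttr)}


# Hypothetical numbers: 4500 hours up, four failures.
print(reliability_metrics(4500.0, [2.0, 0.5, 6.0, 1.5]))
```

Computed per system, these values can be compared against the model inputs and sorted by total repair hours to find the largest downtime contributors.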

35 Modeling: Predict the Performance Data
Modeling sets Your Expectations for Reliability/Availability for a given design:
• Static Model
• Markov Chain Model
  • R(t) is Constant
  • MTBF/MTTR inputs from Vendor Information and Industrial Standards
• Monte Carlo Model (many commercial models available)
  • R(t) is an input function. You get to pick where you are on the function.
• Use Actual Performance Data to Validate the Model
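
For constant failure and repair rates, the expected availability of a design can be estimated from MTBF/MTTR inputs with a few lines of arithmetic. The sketch below shows a generic steady-state block calculation (series blocks multiply, redundant blocks fail only together); it is an assumed textbook formulation, not the BlockSim model used at SNS, and all MTBF/MTTR values are made up.

```python
from math import prod


def availability(mtbf, mttr):
    """Steady-state availability of one block with constant failure and repair rates."""
    return mtbf / (mtbf + mttr)


def series(blocks):
    """All blocks must work: availabilities multiply."""
    return prod(blocks)


def parallel(blocks):
    """Redundant blocks: the group is down only if every block is down."""
    return 1.0 - prod(1.0 - a for a in blocks)


# Hypothetical MTBF/MTTR values in hours, for illustration only.
ion_source = availability(500.0, 4.0)
rf_station = availability(2000.0, 2.0)
pumps = parallel([availability(1000.0, 24.0)] * 2)   # two pumps, one acting as hot spare
print(round(series([ion_source, rf_station, pumps]), 4))
```

Swapping measured MTBF/MTTR values in for the vendor figures is one simple way to validate such a model against actual performance.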

36 ReliaSoft BlockSim7 – Full Accelerator Complex

37 Front End Ion Source

38 Antenna and Front End Simulation

39 Use the Model
• Model subsystems, systems, eventually the whole machine
• Initially use vendor data and commercial standards for MTBF
• Play “what if’s” with redundant systems (Hot Spares)
• Be certain that what you are building meets the customer’s requirements
• As equipment breaks you can immediately assess the impact of the measured lifetime on overall availability
• Use Weibull distributions with guesses at failure onset, failure rate after onset, initial stock of spares and resupply rate to predict Mean Time to Out of Stock.
• With actual performance data, carefully monitor transitions in performance data from Infant Mortality to Reliable Operation to the onset of Terminal Mortality to refine model parameters and your spares inventory
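
The Mean Time to Out of Stock estimate described on this slide can be approximated with a small Monte Carlo simulation. The sketch below is one possible formulation under stated assumptions: lifetimes are an onset delay plus a Weibull draw, every failure consumes one spare and triggers a one-for-one reorder with a fixed lead time, and runs are censored at a horizon. All parameter names and values are hypothetical.

```python
import heapq
import random


def time_to_out_of_stock(n_units, onset, scale, shape, spares, lead_time, horizon, rng):
    """Simulate one history; return the time of the first failure with no spare on the shelf."""
    def lifetime():
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
        return onset + rng.weibullvariate(scale, shape)

    events = [(lifetime(), "fail") for _ in range(n_units)]
    heapq.heapify(events)
    stock = spares
    while events:
        t, kind = heapq.heappop(events)
        if t > horizon:
            return horizon              # censored: no stock-out within the horizon
        if kind == "arrive":
            stock += 1
            continue
        if stock == 0:
            return t                    # failure with an empty shelf
        stock -= 1
        heapq.heappush(events, (t + lifetime(), "fail"))    # replacement unit starts aging
        heapq.heappush(events, (t + lead_time, "arrive"))   # reorder one spare
    return horizon


def mean_time_to_out_of_stock(trials=2000, seed=1, **kw):
    """Average over many simulated histories (censored runs count as the horizon)."""
    rng = random.Random(seed)
    return sum(time_to_out_of_stock(rng=rng, **kw) for _ in range(trials)) / trials


# Hypothetical inputs in hours, for illustration only.
print(mean_time_to_out_of_stock(n_units=20, onset=5000.0, scale=40000.0, shape=2.0,
                                spares=2, lead_time=2000.0, horizon=200000.0))
```

Because censored runs are counted at the horizon, the result is a lower bound when many runs never stock out; rerunning with different `spares` values gives a quick "what if" comparison.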

40 Maintenance Management
• Predictive/Preventive maintenance schedules based on accepted practices for standard equipment and experience/MTTF data for specialized equipment
  • Manufacturer data is NOT always the best
  • EPRI Database
• Proactive replacement of equipment showing pre-failure behavior
• Effective use of scheduled and discretionary weekly maintenance opportunities
• Avoid “run to failure”: “replace/repair when possible”
• Spares inventory: not too big, not too small, just right!
• Proactive replacement of equipment at a pre-determined % of measured lifetime (mature facilities with lots of data)

41 Configuration Control
One of the worst things that you can do at a mature, operating facility is allow changes to the design basis that, through the Law of Unintended Consequences, cause a failure that prevents the facility from operating. Corollary – Smart People Sometimes Do Dumb Things.

42 Work Control
The SNS Work Control System is based around Safety, then Complexity.
Regardless of the work being performed, the basic approach is the same:
• Define the Scope of Work
• Analyze the Hazards
• Develop and implement Hazard Controls
• Perform the Work
• Perform Post Work Testing
• Provide feedback and continuous improvement
Work is requested, approved, planned, executed, completed and closed out using the CMMS.
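
The work order life cycle named on this slide (requested, approved, planned, executed, completed, closed out) behaves like a simple linear state machine. The sketch below illustrates that idea only; the class and its methods are hypothetical and are not the CMMS interface.

```python
WORK_ORDER_STATES = ["requested", "approved", "planned", "executed", "completed", "closed"]


class WorkOrder:
    def __init__(self, number, scope):
        self.number = number      # hypothetical work order number
        self.scope = scope        # short description of the scope of work
        self.state = WORK_ORDER_STATES[0]

    def advance(self):
        """Move to the next state; a closed work order cannot advance further."""
        index = WORK_ORDER_STATES.index(self.state)
        if index == len(WORK_ORDER_STATES) - 1:
            raise ValueError(f"work order {self.number} is already closed")
        self.state = WORK_ORDER_STATES[index + 1]
        return self.state


order = WorkOrder("WO-1001", "Replace ion pump")
while order.state != "closed":
    print(order.number, order.advance())
```

Allowing only the next state in sequence is what prevents, for example, work being executed before it has been approved and planned.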

43 Work Levels
Class 1 Safety Systems (Personnel Safety)
Documentation:
• Work Request: X X **
• Initial Job Hazard Analysis (JHA): X**** X ***
• Permits (e.g. Penetration Permit, etc.) and/or complex LO/TO – as appropriate
• Complex Work Information (i.e. Procedures, Instructions, Drawings, Specifications, DCN/DCD, etc.)
• Equivalency Evaluation – as appropriate (see section 7.3): X *
• Unreviewed Safety Issue Determination Screen (USID Screen or Determination that a Screening is not needed)
• Inspection/Test/Acceptance Criteria and results, as appropriate
• Approved Design Criteria Document (DCD) or Design Change Notice (DCN), as appropriate

Level  If Work Involves
1  Changes to a Credited Engineered Control
2  Work on a Credited Engineered Control but no change to the control
3  Work done in accordance with:
   • Approved procedures (Operations Procedure Manual, Internal Operating Procedures, Complex Work Instructions, etc.)
   • Drawings or specifications
   • Permits (except Radiological Work Permits – see section 3.4)
   • Complex LO/TO (multiple energy source and/or stored hazardous energy release required)
   Also:
   • Change to a component or system that may involve an unplanned operational impact (i.e. loss of an unrelated mission-critical system)
   • Requiring engineering evaluation/concurrence
   • Unique or unusual hazards
   • Special waste disposal requirements
   • Requirements for access to radiological areas with Facilities and Operations (F&O) support/resources
4  Skill of the Worker, routine activities including simple LO/TO (single energy source with no stored hazardous energy release required)
5  Operational adjustments outside of Control Rooms, Evaluation, Inspection and Troubleshooting

44

45 Configuration Control Policy
Configuration management (CM) is defined as a process for establishing and maintaining consistency of a configuration item’s performance, functional and physical attributes, and its documented configuration with its requirements, design and operation information throughout its lifetime.
Configuration management control begins with baselining of requirements through the Design Criteria Document (DCD) and Design Change Notice (DCN) processes, and ends with decommissioning of equipment.
Responsibility for Configuration Control of Systems, Structures, Components and Software (SSCS) resides (at the SNS) with the System Engineer.

46 Configuration Control Objectives
• To document and provide full evidence of an SSCS’s previous history (when available) and present configuration, including the status of compliance of an item to its physical and functional requirements.
• To ensure that staff who operate, use, repair or maintain an SSCS, or who have the potential to affect its configuration, use correct, accurate, and current documentation.
• To ensure that new designs and changes to existing designs for systems, structures, components and software utilize best engineering practice, follow from an approved set of specifications, and are appropriately documented.
• To ensure that the deployment of a new SSCS or a change to an existing SSCS is authorized.
• To ensure that the impact on performance due to the deployment of a new SSCS or a change to an existing SSCS is fully understood, and that the risks associated with the deployment are considered.
SNS Procedures:
• OPM 9.A-1 SNS Configuration Management Policy
• OPM 9.A-2 Design Development Policy
• OPM 9.A-3 SNS System, Structure, Component or Software Change Procedure

47 Spares – Cold Spares
Critical Equipment is equipment which is essential to the facility mission, which is traditionally defined as greater than the nominal beam delivery at greater than ~90% availability for some number of operating hours per year. Spares must be identified for critical equipment.
Classes of Spares
1. A “true spare” consisting of a “like for like or equivalent”, “on the shelf, tested and ready to go”, “plug compatible” replacement unit.
2. A “like for like or equivalent” that is installed in some other system that is not required for operation of the accelerator systems, e.g. a Test Stand, and that must be removed from where it is being used so that it can be used as a replacement for the failed unit.
3. A system, structure or component that must be modified to be used as a spare.
4. A system, structure or component that must be purchased to be used as a spare.
Only a Class 1 “true spare” will not contribute to down time. In all other classes, demounting, modification or procurement of the replacement will necessarily contribute to downtime. Class 4 is referred to as an “out of stock” condition.
The number of spares should be based on a calculation but should never be 1 (or you are guaranteed to break it while installing it).
SNS OPM 9B.-1 RAD Spares Management Policy (DRAFT)
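
One common way to base the spares count on a calculation is a Poisson demand model: choose the smallest stock that covers the failures expected during one resupply interval with a chosen confidence. The sketch below is an assumed textbook approach, not the method in the SNS spares policy; the function name, the 95% default and the input values are illustrative, and the floor of 2 is one reading of the advice never to hold exactly 1.

```python
from math import exp


def spares_needed(n_installed, mtbf_hours, resupply_hours, confidence=0.95, minimum=2):
    """Smallest stock covering demand during one resupply interval with the given confidence.

    Assumes failures arrive as a Poisson process with a constant rate
    (an assumption of this sketch, reasonable only for modest expected demand).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    expected = n_installed * resupply_hours / mtbf_hours   # mean failures per resupply interval
    term = exp(-expected)      # P(0 failures)
    cumulative = term
    k = 0
    while cumulative < confidence:
        k += 1
        term *= expected / k   # P(k failures) computed from P(k-1 failures)
        cumulative += term
    return max(k, minimum)


# Hypothetical: 20 installed units, 20000 h MTBF, 2000 h to obtain a replacement.
print(spares_needed(20, 20000.0, 2000.0))
```

With these made-up inputs the expected demand is 2 failures per resupply interval, and covering it at 95% confidence takes 5 spares.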

48 Obsolescence
You probably don’t want to think about this now, but the Mean Time To Obsolescence (MTTO) is on the order of 3 years for some classes of electronics. With the manufacturing world changing rapidly, companies go out of business or are bought up and their product lines discontinued at an alarming rate. When they do, your new replacements and product support may go to zero.
Obsolescence Definitions:
• Supported: Identical New Items/Repair/Parts are available from the OEM
• Obsolescent: New Items/Repair/Parts will no longer be supplied by the OEM after a given date. Sometimes you are even notified in advance!
• Obsolete: New Items/Repair/Parts are no longer available from the OEM
Obsolescence issues should be considered in the item life cycle to avoid risk. This means:
• Assess the impact, cost and probability of obsolescence
• Derive a Strategy
  • Reactive – do nothing until the need arises – Emulate/Partial Redesign/Replace
  • Proactive – adopt a proactive strategy – Partial Redesign/Technology Transparency/Contract Support/Lifetime Buy
• Periodically review and monitor the situation and act accordingly.

49 Types of Maintenance
Reactive Maintenance
Reactive maintenance is basically the “run it till it breaks” maintenance mode. No actions or efforts are taken to maintain the equipment as the designer originally intended to ensure design life is reached.
Advantages
• Low initial cost.
• Less staff.
Disadvantages
• Increased unplanned downtime of equipment.
• Increased labor cost due to overtime needed for call-in repairs.
• Possible secondary equipment or process damage from equipment failure.
• Inefficient use of staff resources.

50 Preventive Maintenance
Preventive maintenance can be defined as follows: Actions performed on a time- or machine-run-based schedule that detect, preclude, or mitigate degradation of a component or system with the aim of sustaining or extending its useful life through controlling degradation to an acceptable level.
Advantages
• Cost effective in many capital-intensive processes.
• Flexibility allows for the adjustment of maintenance periodicity.
• Increased component life cycle.
• Energy savings.
• Reduced equipment or process failure.
• Estimated 12% to 18% cost savings over reactive maintenance program.
Disadvantages
• Catastrophic failures still likely to occur.
• Labor intensive.
• Includes performance of unneeded maintenance.
• Potential for incidental damage to components in conducting unneeded maintenance.

51 Predictive Maintenance
Predictive maintenance can be defined as follows: Measurements that detect the onset of system degradation (lower functional state), thereby allowing causal stressors to be eliminated or controlled prior to any significant deterioration in the component physical state. Results indicate current and future functional capability.
Advantages
• Increased component operational life/availability.
• Allows for preemptive corrective actions.
• Decrease in equipment or process downtime.
• Decrease in costs for parts and labor.
• Improved worker and environmental safety.
• Improved worker morale.
• Estimated 8% to 12% cost savings over preventive maintenance program.
Disadvantages
• Increased investment in diagnostic equipment.
• Increased investment in staff training.
• Savings potential not readily seen by management.

52 Reliability Centered Maintenance
Reliability centered maintenance (RCM) is a systematic approach to evaluate a facility’s equipment and resources to best mate the two and result in a high degree of facility reliability and cost-effectiveness.
The RCM methodology recognizes that all equipment in a facility is not of equal importance to either the process or facility safety. It recognizes that equipment design and operation differ and that some equipment will have a higher probability of failure from different degradation mechanisms than others.
It also approaches the structuring of a maintenance program recognizing that a facility does not have unlimited financial and personnel resources and that the use of both needs to be prioritized and optimized.

53 Advantages
• Can be the most efficient maintenance program.
• Lower costs by eliminating unnecessary maintenance or overhauls.
• Minimize frequency of overhauls.
• Reduced probability of sudden equipment failures.
• Able to focus maintenance activities on critical components.
• Increased component reliability.
• Incorporates root cause analysis.
Disadvantages
• Can have significant startup cost, training, equipment, etc.
• Savings potential not readily seen by management.

54 Industrial Standards for Reliability Centered Maintenance (RCM)
Reliability Centered Maintenance:
• Reactive Maintenance < 10%
• Preventative Maintenance 25-35%
• Predictive Maintenance 45-55%
• Testing and Inspection
• FMEA
Accelerator Systems have more Reactive Maintenance due to the high percentage of digital electronic systems, which fail with no precursor events.
References:
• DOE EERE O&M Best Practices Guide Rev. 3
• NASA RCM Guide 2008
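
A facility's actual maintenance mix can be checked against these bands once work is categorized. The sketch below assumes the shares are measured in labor hours per maintenance type (the slide does not say which measure is used) and treats each band as inclusive; the function name and sample hours are hypothetical.

```python
TARGETS = {                      # fraction of total maintenance effort, bands from the slide
    "reactive": (0.0, 0.10),     # slide gives "< 10%"; treated as inclusive here
    "preventive": (0.25, 0.35),
    "predictive": (0.45, 0.55),
}


def maintenance_mix(hours_by_type):
    """Return each type's share of total hours and whether it falls in the target band."""
    total = sum(hours_by_type.values())
    report = {}
    for kind, (low, high) in TARGETS.items():
        share = hours_by_type.get(kind, 0.0) / total
        report[kind] = (share, low <= share <= high)
    return report


# Hypothetical labor hours for one quarter, for illustration only.
for kind, (share, ok) in maintenance_mix(
        {"reactive": 200.0, "preventive": 300.0, "predictive": 500.0}).items():
    print(f"{kind}: {share:.0%} {'within target' if ok else 'outside target'}")
```

In this made-up example the reactive share is 20%, which is the kind of excess the slide attributes to electronics that fail without precursor events.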

55 Since 2006 operational performance improvement at SNS has been dramatic
Without Target Failures

56 FY07-FY13 Downtime by group

57

58 In the end, you have to satisfy your customer.
SNS Management in FY10 decided to emphasize availability improvement while holding proton beam power at or near 1 MW.
• Resources were allocated to address major contributors to down time, particularly the HVCM:
  • Replacement of some highly stressed oil filled capacitors with less lossy solid units, which led to fewer and lower consequence capacitor failures and easier fault recovery.
  • IGBT drive gate synchronization turn off that reduced IGBT failures by more than a factor of 10.
• The single largest downtime contributor to RF systems, the MEBT RF Power Amplifiers, were replaced with new solid state devices.
• The 2 MHz RF amplifier that drives the ion source plasma was moved from the 65 kV floating deck to ground potential and is now powered through an isolation transformer, an improvement that allows for better diagnosis of failures and quicker repair.
SNS Management, following the 2 unexpected target failures at the end of FY12, decided to emphasize target availability by holding proton beam power at or near 850 kW, which was considered to be a safe power level for extended running.
• Extensive analysis of the targets was done.
• Orders were placed for new “jet flow” targets and the original style targets, but ALL will have removable water shrouds to allow for inspection of the failure location and mode.

59 Summary
• The SNS has evolving Reliability and Integrated Maintenance Management Programs.
• We are making progress.
• We are no longer a “young” facility and we may soon reach Terminal Mortality for many systems.
• The final goal is 95% availability. A Plan has been developed. It may be too costly to be implemented.
• Why? Going from 90% to 95% is only another 5.5% in beam delivery, but it is a factor of 2 in downtime reduction. Diminishing returns!
• The facility Science impact will likely be larger from another beamline instrument (Spectrometer).
• We will likely make more modest evolutionary (not revolutionary) changes to our operating base.
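
The diminishing-returns argument for going from 90% to 95% availability can be checked with two ratios: delivered beam time scales with availability, while downtime scales with one minus availability. A short sketch of that arithmetic (the function name is illustrative):

```python
def availability_step(current, target):
    """Relative gain in delivered beam time and the required downtime reduction factor."""
    beam_gain = target / current - 1.0
    downtime_factor = (1.0 - current) / (1.0 - target)
    return beam_gain, downtime_factor


gain, factor = availability_step(0.90, 0.95)
print(f"{gain:.1%} more beam, downtime cut by a factor of {factor:.1f}")
```

The beam-time gain works out to 0.95/0.90 - 1, about 5.6% (5.5% when truncated), while unscheduled downtime has to fall from 10% to 5% of scheduled hours, a factor of 2.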

60 Backup Slides

61 Power plots from Sep. 29

