The use of recorded information
This is a continuous process!
By assessing:
Failure data
Inspection findings
In order to:
Adjust PM tasks and PM task intervals
Assess desirability of additional PM tasks
Eliminate unnecessary (over-intensive) PM tasks
Improve failure response
Redesign and design
To improve OEE and reliability, safely, at lowest cost
The functions of a CMMS knowledge base
1. To determine the types of failures the equipment is actually exposed to, as well as their frequencies.
2. To expose the consequences of each failure, ranging from direct safety hazards through serious operational consequences, high repair costs, and long out-of-service times for repair, to a deferred need to correct inexpensive functional failures.
3. To confirm that functions originally classified as evident (during RCM analysis) are in fact evident to operating personnel during the normal performance of duties.
4. To identify the circumstances of failure in order to determine whether the failure occurred during normal operation or was due to some external factor (accidental damage).
5. To confirm that on-condition (CBM) inspections are really measuring the reduction in resistance to a particular failure mode.
6. To inform us of the actual rates of reduction in failure resistance in order that we may determine optimum inspection intervals.
The Purpose of a Reliability-Centered Knowledge Base
7. To record the mechanism involved in certain failure modes in order to identify new forms of on-condition inspection (CBM) or parts that require design improvement, or improve diagnostic response.
8. To identify those tasks assigned originally as default actions but that do not prove applicable and effective.
9. To identify maintenance packages that are generating few trouble reports and are candidates for longer interval schedules.
10. To identify items that are not generating trouble reports.
11. To record the working ages of assets and components at which failures occur.
In summary, the purpose of a Reliability-Centered Knowledge Base is to use all of the above to IMPROVE ASSET OEE AT LOWEST COST, SAFELY.
The UML Context Diagram (Unified Modeling Language)
So we want to build some kind of system. We might call it a reliability information system. That system must perform the functions that we previously identified. They are represented by the ovals, called use cases. And there are several actors who interact with our knowledge system. We recognize, among others, the maintainer himself, as well as the operator, the maintenance engineer, and so on. The actors may also be other systems: the CMMS, or even some kind of prognostic agent that uses historical information to recommend actions in the current situation. This diagram is called a UML context diagram. UML stands for Unified Modeling Language. It is a diagrammatic language used by IT developers to communicate the details of complex business solutions.
Data model
Use Case Diagram - Complete the work order form
The UML use case diagram
Simplified guidelines and training document; accessible examples; supervision, discussion; support; evolving failure codes; revision and audit capability.
But there's a problem. That problem can be addressed in yet another UML diagram, called a use case diagram. The maintainer is expected to perform the function "Complete the work order form". But in the previous entity relationship diagram we said that each work order should refer to a record in the RCM knowledge base. What if no such record exists? Either the RCM team has not gotten to that item yet, or they simply overlooked the particular failure mode that has occurred. We are going to have to extend the use case with another use case: "Insert a record in RCM_Table". Now here's the next question. Can we honestly expect the maintainer, working alone and with little time, to add an RCM record, when a dedicated RCM team develops that knowledge with adequate time and resources? Obviously the answer is "no". Yet somehow we must capture this valuable knowledge at this opportune moment. How does our proposed system solve that problem? Here are a few ways... Now we’re not going to solve all of these system requirements during this half-hour talk. But let me provide some ideas in another UML diagram, called a sequence diagram.
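The extended use case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, fields, and the "draft" status are hypothetical and are not taken from any CMMS or RCM product. The only name from the talk is the idea of inserting a record in RCM_Table. The sketch shows a work order being linked to an existing RCM record, or to a newly inserted draft record that still awaits verification.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (item, failure mode); a hypothetical record key


@dataclass
class RcmRecord:
    item: str
    failure_mode: str
    status: str = "verified"  # assumed states: "draft" or "verified"


@dataclass
class WorkOrder:
    number: str
    item: str
    failure_mode: str
    rcm_key: Optional[Key] = None  # link to the RCM record, set on completion


class RcmTable:
    """In-memory stand-in for the RCM knowledge base table."""

    def __init__(self) -> None:
        self._records: Dict[Key, RcmRecord] = {}

    def __len__(self) -> int:
        return len(self._records)

    def find(self, item: str, failure_mode: str) -> Optional[RcmRecord]:
        return self._records.get((item, failure_mode))

    def insert(self, record: RcmRecord) -> Key:
        key = (record.item, record.failure_mode)
        self._records[key] = record
        return key


def complete_work_order(wo: WorkOrder, table: RcmTable) -> RcmRecord:
    """Use case: complete the work order form.

    Extended by "insert a record" when no matching RCM record exists.
    The new record is only a draft until someone verifies it.
    """
    record = table.find(wo.item, wo.failure_mode)
    if record is None:
        record = RcmRecord(wo.item, wo.failure_mode, status="draft")
        table.insert(record)
    wo.rcm_key = (record.item, record.failure_mode)
    return record
```

Marking the new record as a draft keeps the maintainer's entry quick while leaving a clear queue of records for later review.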
UML sequence diagram
A sequence diagram has the objects at the top, and time proceeds downwards. So here we have an object of the Maintainer class, say Joe. He performs the extended use case that we just described, and so a new record object is created slightly lower on the time line. That object sends a message invoking the "verify" operation of an object of the "maintenance supervisor" class. The supervisor verifies the record and edits it with tracked changes, whereupon the record messages the Maintainer object to review the changes. Finally, the Maintainer object may initiate a face-to-face discussion with his supervisor, perhaps at their daily meeting, regarding the updated record. … So eventually we grow our reliability knowledge system. So what? What are we supposed to do with that knowledge that we have so painstakingly inserted into our knowledge base? ...
Anticipated vs. Actual Experience
Compared failure modes in FMEA to those encountered in the field (turbofan engine).
[Figure: failure modes "anticipated" (FMEA): 610; failure modes "experienced" (field experience, LRUs primarily): 727; overlap: 142 failure modes.]
Extracted from CaseBank study published in IEEE paper "Comparison of FMEA and Field Experience for a Turbofan Engine with Application to Case-Based Reasoning".
This illustrates data from a CaseBank paper published by IEEE. The main points of this illustration are:
Many of the anticipated failures have not yet been recorded in field experience.
There is a significant need to capture and integrate field experience in a diagnostic system.
Part of the reason for the discrepancy between FMEA and field experience is that:
Many FMEA items are gas-path components, not "line-replaceable units", and so would not be recorded as a field experience failure in any event.
Many of the field experience causes would not appear in the FMEA in any event (accessories, off-engine components, cross-system effects where the root cause is a component in another system, and operation or installation error).
Process Linked to RCKB
[Diagram labels: KPI; Quality Loss; Machine Malfunction; Incident 1 (Function, Failure, Cause, Effects, Consequences), Incident 2, …, Incident "n"; Improved maintenance policies. Tools shown: SpotLight CBR, OSISoft PI, etc.; IVARA EXP, EXAKT, ABB Real-TPI, DLI ExpertAlert.]
Reliability terminology
f(t) is the probability density function (PDF).
F(t) is the cumulative distribution function (CDF). It is the area under the f(t) curve from 0 to t: $F(t) = \int_0^t f(u)\,du$. (Sometimes called unreliability or the cumulative probability of failure.)
R(t) is the survival function. (Also called the reliability function.) $R(t) = 1 - F(t)$
h(t) is the hazard function. (At various times called the hazard rate, conditional failure rate, instantaneous failure probability, instantaneous failure rate, failure rate, the inverse of failure resistance, failure risk, and risk.) $h(t) = f(t)/R(t)$
MTTF is the average time to failure. (Also called the mean time to failure, expected time to failure, average life.) $\mathrm{MTTF} = \int_0^\infty t\,f(t)\,dt = \int_0^\infty R(t)\,dt$
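To make these definitions concrete, here is a minimal Python sketch that evaluates them for a two-parameter Weibull life distribution. The choice of Weibull and the parameter names `beta` (shape) and `eta` (scale) are assumptions made for illustration; the slide itself does not fix a distribution.

```python
import math


def pdf(t: float, beta: float, eta: float) -> float:
    """f(t): Weibull probability density function."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))


def cdf(t: float, beta: float, eta: float) -> float:
    """F(t): cumulative probability of failure by age t."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def survival(t: float, beta: float, eta: float) -> float:
    """R(t) = 1 - F(t)."""
    return 1.0 - cdf(t, beta, eta)


def hazard(t: float, beta: float, eta: float) -> float:
    """h(t) = f(t) / R(t)."""
    return pdf(t, beta, eta) / survival(t, beta, eta)


def mttf_closed_form(beta: float, eta: float) -> float:
    """MTTF of a Weibull distribution: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


def mttf_numeric(beta: float, eta: float, upper: float, steps: int = 20000) -> float:
    """MTTF as the area under R(t) from 0 to `upper`, by the trapezoid rule."""
    h = upper / steps
    total = 0.5 * (survival(0.0, beta, eta) + survival(upper, beta, eta))
    for i in range(1, steps):
        total += survival(i * h, beta, eta)
    return total * h
```

With `beta = 1` the hazard is constant at `1/eta`, which is the age-independent case. With `beta > 1` the hazard rises with age, and with `beta < 1` it falls (infant mortality).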
The effects of gradual improvement
[Figure, Exhibit 5.3: conditional probability of failure for 100-hour intervals (0.1 to 0.4) versus operating age since last shop visit (flight hours, 2000 to 8000). Curves: June to August 1964; August to Oct. 1964; Oct. to December 1964; January to Feb. 1966; May to July 1967; October to December 1971.]
The conditional probability of failure graphs show how age exploration and the resulting engineering and maintenance improvements gradually overcame the dominant failure modes on the JT8D engine. The curve continued to flatten until it eventually showed no relationship of engine reliability to operating age. This is indeed a typical cycle for all new equipment. The question is how quickly the optimum reliability state is achieved. Age exploration as a basis for an effective maintenance information strategy will make sure that this happens in minimal time and at lowest cost.
Questions What is the optimum reliability state? How quickly can we achieve the optimum reliability state? What actions do we take to accelerate the process? and How do we measure our progress to that end?
Decreasing failure rate
[Figure, Exhibit 5-4: failure rate (failures per 1000 hours, 0.1 to 2.0) by year, 1963 to 1972. Series: Experience; Forecast. Marker: Date of forecast. Axis label as shown on the slide: Operating age since last shop visit (flight hours).]
Improvements in reliability as a result of applying a maintenance information strategy based on the principles of age exploration may be expressed as a decrease in failure rate. The graph shows the actual failure rates of the JT8D engine compared to a forecast improvement in reliability, which is characteristically exponential when age exploration is used. The temporary deviation from the forecast level in this case was the result of a new dominant failure mode, which took several years to resolve by redesign.
Refining the maintenance program
RCM default decisions made in the absence of information
Analysis of results of scheduled tasks
Analysis of unanticipated failures
The maintenance tasks added in response to unanticipated failures are only one aspect of the age-exploration process. Once the maintenance program goes into effect, the results of the scheduled tasks provide the basis for adjusting the initial conservative task intervals, and as further data becomes available the default decisions made in the absence of information are gradually eliminated from the program.
Improvement thru analysis No other way
[Figure, Exhibit 5-6: conditional probability of failure for 200-hour intervals (0.1 to 0.4) versus operating age since last shop visit (flight hours, 1000 to 4000). Curves: total failures (FF, PF); functional failures (FF). Band between the curves: potential failures (PF).]
This graph shows the results of an age-reliability analysis to decide whether a complete overhaul of a turbine engine would be applicable. The upper curve shows the conditional probability of all removals, including both functional failures and potential failures. The lower curve shows the conditional probability of functional failures as reported by operating personnel. The distance between these two curves at any age represents the conditional probability of potential failures detected by on-condition inspections. It is functional failures that have safety or major operational consequences. Where on-condition maintenance is in force, the conditional probability of functional failures is constant. Functional failures are, as we can see, independent of the equipment’s working age. Scheduled overhaul will, therefore, not be applicable. We do not want to reduce the incidence of potential failures except by redesign, since the inspections that find them are clearly effective in reducing the number of functional failures.
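A rough Python sketch of how such curves might be computed from recorded lives follows. It is a simplified actuarial estimate, not the method used to produce the exhibit: each life is a hypothetical pair of (age at which the life ended, event type), and a unit that ends within an interval is counted as exposed for the whole interval.

```python
from typing import Dict, Iterable, List, Tuple

Life = Tuple[float, str]  # (age when the life ended, event type such as "FF", "PF", "S")


def conditional_probability(
    lives: List[Life], width: float, kinds: Iterable[str] = ("FF", "PF")
) -> Dict[float, float]:
    """Estimate the conditional probability of failure per age interval.

    For each interval [start, start + width) the estimate is
    (lives ending in the interval with an event in `kinds`)
    divided by (lives that reached the start of the interval).
    """
    kinds = set(kinds)
    max_age = max(age for age, _ in lives)
    result: Dict[float, float] = {}
    start = 0.0
    while start <= max_age:
        end = start + width
        entering = sum(1 for age, _ in lives if age >= start)
        failed = sum(1 for age, ev in lives if start <= age < end and ev in kinds)
        if entering:
            result[start] = failed / entering
        start = end
    return result
```

Calling the function once with `kinds=("FF", "PF")` and once with `kinds=("FF",)` gives the upper and lower curves; their difference at each age is the share attributable to potential failures.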
[Figure, Exhibit 5-7: conditional probability of failure versus working age. Curves: total failures; verified failures; failure mode A (infant mortality); failure mode B; failure mode C. Band between total and verified: unverified failures.]
This graph shows the various age-reliability relationships that can be developed for an item subject to several different failure modes. The upper curve shows the conditional probability for all reported failures. It represents the actual reliability of the item. The lower curve represents the failures whose causes have been verified as failure modes A, B, or C. The distance between these two curves represents the probability of unverified removals other than from causes A, B, or C.
To determine how we might improve the reliability of this item, we must examine the contributions of each failure mode to the total verified failures. For example, failure modes A and B show no increase with increasing age; hence any attempt to reduce the adverse age relationship must be directed at failure mode C. There is also a fairly high conditional probability of failure immediately after a shop visit as a result of high infant mortality from failure mode A. The high incidence of early failures from this failure mode could be due to a problem in shop procedures. If so, the difficulty might be overcome by changing shop specifications either to improve quality control or to break in a repaired unit before it is returned to service.
An actuarial analysis such as this can direct improvements toward a great many different areas by indicating which factors are actually involved in the failure behavior of the item. For example, the analysis of a generator failure showed that its conditional probability of failure did not increase with age until bearing failures started at an age of 2000 hours. This failure mode usually results in the destruction of the generator. A generator rework task at 4000-hour intervals was then cost effective.
CBM Effectiveness Comparison CBM effectiveness is related, ultimately, to how "good" the condition data is.
Elimination of CBM
[Figure, Exhibit 11-13: number of functional failures per quarter (ticks at 5 and 10) by calendar quarter, 1971 to 1973. Annotations: start of borescope inspection, 125-cycle intervals; inspection interval reduced to 30 cycles; modification started; modification completed; inspection requirement removed.]
History of the C-sump problem in the General Electric CF6-6 engine on the Douglas DC-10. The on-condition task instituted to control this problem had to be reduced to 30-cycle intervals in order to prevent all functional failures. The precise cause of this failure was never pinpointed; however, both the inspection task and the redesigned part covered both possibilities. Once modification of all in-service engines was complete, no further potential failures were found, and the inspection requirement was eventually eliminated.
4 Acquiring Maintenance Information
Using living RCM software
MIMS – small mod, big impact
Event type
FF - the ending and renewal of a component (failure mode) due to a functional failure.
PF - the ending and renewal of a component (failure mode) due to having detected a potential failure in time to avoid the more dire consequences of an FF.
S - the ending and renewal of a component (failure mode) for any reason other than (functional or potential) failure, for example preventively replacing the component.
B - the beginning of the life of a component in the item (if not FF, PF, or S).
BSA - the beginning of a period of temporary removal (suspended animation) of a component from the item.
ESA - the return of the same component to the item after a period of suspended animation.
SA - the beginning and ending of a period of suspended animation if reported on the same work order.
MR - the minor (non-rejuvenating) repair of the item. It does not renew any components. Sometimes it will impact the monitored data. Examples are a calibration, a shaft alignment, an oil change, or the balancing of an impeller.
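The event types above can be sketched as a small Python data structure, together with a helper that turns a sequence of events for one failure mode into component lives. This is an illustration only: the function name and the input layout are hypothetical, and the sketch deliberately ignores the suspended-animation events (BSA, ESA, SA), which a fuller treatment would subtract from the working age.

```python
from enum import Enum
from typing import List, Tuple


class EventType(Enum):
    FF = "FF"    # ending and renewal due to functional failure
    PF = "PF"    # ending and renewal due to a detected potential failure
    S = "S"      # ending and renewal for any reason other than failure
    B = "B"      # beginning of a component life
    BSA = "BSA"  # beginning of suspended animation
    ESA = "ESA"  # end of suspended animation
    SA = "SA"    # suspended animation begun and ended on one work order
    MR = "MR"    # minor repair; renews nothing


# Events that end one life and, through renewal, begin the next.
ENDINGS = {EventType.FF, EventType.PF, EventType.S}


def lifetimes(events: List[Tuple[float, EventType]]) -> List[Tuple[float, str]]:
    """Return (life length, ending event) pairs for one failure mode.

    `events` must be sorted by working age. MR and the suspended-animation
    events are skipped in this simplified sketch.
    """
    lives: List[Tuple[float, str]] = []
    start = None
    for age, ev in events:
        if ev is EventType.B:
            start = age
        elif ev in ENDINGS:
            if start is not None:
                lives.append((age - start, ev.value))
            start = age  # renewal: the replacement's life begins here
    return lives
```

Lives ending in FF or PF are failures for analysis purposes, while lives ending in S are suspensions (censored observations).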
Creating the Events table with Living RCM
Generate a report of all work orders related to the item during the sample period.
Update each work order with the significant failure mode(s) and event type.
Generate the Events table using the CMMS report writer and feed it to the LRCM software.
Perform reliability analysis (e.g. Pareto, Weibull, Jackknife, EXAKT).
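As an illustration of the last step, a Pareto analysis over an Events table can be sketched in Python as follows. The column names `failure_mode` and `event_type` are assumptions for the example and do not describe the actual LRCM table layout.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def pareto(
    events: Iterable[Dict[str, str]], failure_types: Tuple[str, ...] = ("FF", "PF")
) -> List[Tuple[str, int, float]]:
    """Rank failure modes by number of failure events.

    Returns (failure mode, count, cumulative percent) rows, largest first.
    Only events whose type is in `failure_types` are counted.
    """
    counts = Counter(
        row["failure_mode"] for row in events if row["event_type"] in failure_types
    )
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    rows: List[Tuple[str, int, float]] = []
    running = 0
    for mode, n in ranked:
        running += n
        rows.append((mode, n, round(100.0 * running / total, 1)))
    return rows
```

The cumulative percent column shows how few failure modes account for most of the recorded failures, which is where further analysis (for example Weibull) is usually most worthwhile.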
A policy agent?
[Diagram labels: CBM databases (vibration, oil analysis, control system historian); CMMS and process databases (events, failures, replacements, minor repairs, mission requirements …); Agent; the "right" decisions.]
The demonstration
[Diagram with two movie links. Labels: Drive; Unbalanced rotor; Switch; Ramp; Tee; Accelerometer; Spring; ICHM 20/20; Agent; Decision.]
6 Deciding on CBM
Preferred policy for 3 reasons - it is often the strategy that is the:
Least costly
Least intrusive
Least tolerant of failure
RCM Guide
[Diagram: stages Equipment, Analysis, Decision, Resourcing, Yearly Sched., Quarterly Sched., Monthly Sched. Steps 1 to 7: Equipment, Function, Failed state, Cause, Effects, Conseq, Task. Resourcing items: training, training resources, procurement, stocking, lead times, purchasing decisions, scheduling (one time), labor, skills, parts, consumables, outside services, tools, test equip. Task: tools, materials, safety procedures. Planning (one time).]
Since the consequences of failure must be fully understood, the only way to do so is through a rigorous analytical and decision process.
The RCM decision algorithm favors CBM