Survival Analysis for Risk-Ranking of ESP System Performance Teddy Petrou, Rice University August 17, 2005.

Presentation transcript:

Survival Analysis for Risk-Ranking of ESP System Performance Teddy Petrou, Rice University August 17, 2005

Presentation Outline
- ESP Overview
- Survival Analysis Review
- Dataset Explanation
- Problems
- Modeling Process and Improvements
- NPV Calculations for ESPs
- Conclusions

ESP Overview
- More than 60 percent (and rising) of producing oil wells require some type of assisted lift to produce the recoverable oil.
- Electrical submersible pumps (ESPs) are typically used where there is insufficient pressure to lift the fluids to the surface (typically in older, more watered-out wells).
- ESPs provide cost-effective production by boosting fluid production from these less efficient, older reservoirs.

Survival Analysis (SA) Review
- Survival analysis refers to the statistical procedures for modeling the time until an event occurs.
- Censoring occurs when a pump has yet to fail at the time of data collection.

Survival Analysis Benefits
- Provides insight into which explanatory variables significantly affect run times.
- Predicts run times of ESPs given various values of the explanatory variables.
- Generates estimated survival curves, which can be used to:
  - Produce a bond-type risk-ranking scheme
  - Provide annuity-type NPV calculations for ESP value
  - Simulate sample reservoir ESP usage
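One way to read "annuity-type NPV" is as a stream of equal periodic cash flows, each weighted by the probability that the pump is still running in that period. The sketch below follows that reading; the function name, the cash flow, the discount rate, the installation cost, and the survival probabilities are all hypothetical, and the slides do not give the formula actually used.

```python
def survival_weighted_npv(survival, cash_flow, rate, install_cost=0.0):
    """Discounted value of a constant cash flow that is only received
    while the pump is still running.

    survival     : S(1), S(2), ... survival probability at the end of each period
    cash_flow    : net cash flow per period while the pump runs (hypothetical)
    rate         : discount rate per period (hypothetical)
    install_cost : up-front cost paid at time 0 (hypothetical)
    """
    npv = -install_cost
    for period, s in enumerate(survival, start=1):
        # Each period's cash flow is weighted by S(period) and discounted.
        npv += s * cash_flow / (1.0 + rate) ** period
    return npv


# Illustrative survival curve for four periods (made-up numbers).
example_survival = [0.95, 0.85, 0.70, 0.50]
example_npv = survival_weighted_npv(example_survival, cash_flow=100.0,
                                    rate=0.10, install_cost=150.0)
```

Ranking pumps by a value like `example_npv` would then reflect both how long a configuration tends to run and what each running period is worth.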

Survivor and Hazard Functions
- Survivor function S(t): gives the probability that an individual survives longer than time t.
- Hazard function h(t): gives the instantaneous potential per unit time for failure, given that the pump has survived up to time t.
- The models applied are defined in terms of the hazard function.
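In the usual textbook notation, with T the random time to failure, the two functions and the link between them can be written as follows. The notation is the standard one and is not taken from the slides.

```latex
S(t) = P(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)
```

The last identity is why a model stated in terms of the hazard also determines the survival curve.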

Generating Survival Curves
Three main methods:
- Non-parametric (Kaplan-Meier)
- Parametric (exponential, Weibull, etc.)
- Semi-parametric (Cox Proportional Hazards)
  - Factors and covariates are compared to a baseline hazard function.
  - Allows us to determine which combination of potential explanatory variables is most significant.
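As an illustration of the non-parametric method, here is a minimal Kaplan-Meier sketch in plain Python. The run times and failure flags are made up for the example, and the function name is hypothetical; a real analysis would normally use a statistics package.

```python
from collections import Counter


def kaplan_meier(durations, failed):
    """Return [(t, S(t))] at each distinct failure time.

    durations : run time of each pump
    failed    : True if the pump failed, False if it was right-censored
    """
    deaths = Counter(t for t, f in zip(durations, failed) if f)
    leaving = Counter(durations)  # failures and censorings leaving the risk set
    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Pumps that failed or were censored at t are no longer at risk.
        at_risk -= leaving[t]
    return curve


# Made-up run times (days) and failure indicators for six pumps.
run_days = [100, 200, 200, 350, 400, 500]
has_failed = [True, True, False, True, False, False]
km_curve = kaplan_meier(run_days, has_failed)
```

Censored pumps never lower the curve directly; they only shrink the risk set used at later failure times.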

Formulation of Cox Proportional Hazards Model
Given two pumps (R and C) made by two different manufacturers, their hazard functions would be related by $h_R(t) = \psi\, h_C(t)$, where $\psi$ is a constant known as the relative risk. If $\psi$ is less than 1, then pump R would be less likely to fail at any given time. Since the relative hazard cannot be negative, we let $\psi = e^{\beta}$. The comparative baseline level can be arbitrarily chosen. If a different baseline level is chosen, the parameters would change, but all statistical significance tests would remain the same.
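The effect of changing the baseline can be seen with a small numeric sketch. It assumes the relative risk is written as the exponential of a coefficient beta, and the coefficient value is hypothetical.

```python
import math


def relative_risk(beta):
    """Relative risk implied by a Cox coefficient: exp(beta) is always positive."""
    return math.exp(beta)


# Hypothetical coefficient for pump R with pump C as the baseline.
beta_r_vs_c = -0.4
psi_r_vs_c = relative_risk(beta_r_vs_c)   # below 1: R is less likely to fail

# Choosing R as the baseline instead flips the sign of the coefficient,
# so the relative risk becomes the reciprocal.
psi_c_vs_r = relative_risk(-beta_r_vs_c)
```

The parameter changes sign, but the comparison it encodes between the two pumps is the same in both parameterizations.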

Step-wise modeling overview
1. Data transformation with expert collaboration
2. Step-wise model selection with factor collapsing
3. Model verification and validation
4. Model implementation
Once all steps are complete, an automated process can be set up for quick statistical ESP analysis.

Data Introduction
- The data contains nearly different records of ESPs from around the world.
- There are 58 explanatory variables consisting of factors and covariates.
Problems with large data:
- Difficult to find the correct model
- Very time consuming
- Inconsistencies abound
Problems with this data:
- High correlation (multicollinearity)
- Low-failure occurrences
- Missing data
Pragmatic approach:
- Different subsets of data were chosen.

Highly Correlated Data
- The best way to alleviate multicollinearity issues is to work with someone who has expert knowledge of the database to remove redundant explanatory variables.
- In the absence of an expert, sifting through the data by hand is a must.
- Producing a cross-table of the data is one method of finding variables that are highly correlated.
- Example: a cross-table of SYSMFG against PMPMFG shows a perfect one-to-one correlation, so one of the two variables must be removed.
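A cross-table check of this kind can be sketched in plain Python. SYSMFG and PMPMFG are the variable names from the slide; the records, the manufacturer labels, and the helper names are hypothetical.

```python
from collections import defaultdict


def cross_table(rows, a, b):
    """Count how often each level of factor `a` occurs with each level of `b`."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[a]][row[b]] += 1
    return table


def is_one_to_one(rows, a, b):
    """True if every level of `a` pairs with exactly one level of `b`, and vice versa."""
    forward = cross_table(rows, a, b)
    backward = cross_table(rows, b, a)
    return (all(len(cols) == 1 for cols in forward.values())
            and all(len(cols) == 1 for cols in backward.values()))


# Hypothetical records: the system and pump manufacturers always coincide.
records = [
    {"SYSMFG": "A", "PMPMFG": "A"},
    {"SYSMFG": "A", "PMPMFG": "A"},
    {"SYSMFG": "B", "PMPMFG": "B"},
    {"SYSMFG": "C", "PMPMFG": "C"},
]
redundant = is_one_to_one(records, "SYSMFG", "PMPMFG")
```

When `redundant` is true, either variable carries all the information in the other, so keeping both only adds collinearity.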

Removing Data
- Variables exhibiting near one-to-one correlation were removed.
- Many other variables were subsets of one another. It may be possible to replace some variables with the variables that are subsets of them.
- Knowing one variable's level can possibly give information about 15 others.
- Reducing the data will help with model interpretation as well as computing time.

Transforming Low Counts and Missing Data
- Each factor in the data can comprise several levels.
- Levels with low counts can severely skew the model-building process. To alleviate this problem, all levels were required to have at least 15 records.
- Missing data was also an issue. Several variables had more than half their values recorded as ‘NA’. If the NA group contained more than 15 entries, it was changed to a level named ‘Unknown’.
- Again, collaboration with an expert is needed to investigate the cause of the missing entries.
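The two rules on this slide can be sketched as a single cleaning pass over one factor. The threshold of 15 and the 'Unknown' level come from the slide. Merging low-count levels into an 'Other' level, and leaving a small NA group as missing, are assumptions made for the example, since the slide does not say what was done in those cases.

```python
from collections import Counter

MIN_COUNT = 15  # threshold stated on the slide


def clean_factor(values, min_count=MIN_COUNT):
    """Relabel one factor column; None stands for an 'NA' entry."""
    counts = Counter(values)
    cleaned = []
    for v in values:
        if v is None:
            # A large NA group becomes its own level; a small one stays missing
            # (assumption: the slide does not cover the small-group case).
            cleaned.append("Unknown" if counts[None] > min_count else None)
        elif counts[v] < min_count:
            # Assumption: low-count levels are pooled into a single 'Other' level.
            cleaned.append("Other")
        else:
            cleaned.append(v)
    return cleaned


# Hypothetical factor with one common level, one rare level, and many NAs.
raw_levels = ["A"] * 20 + ["B"] * 3 + [None] * 16
cleaned_counts = Counter(clean_factor(raw_levels))
```

An expert may prefer to merge a rare level into a specific neighbouring level instead of a catch-all, which would replace the 'Other' branch with a lookup.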

Data With No Failures
- Right-censored data can make for difficult analysis.
- A factor level with no failures essentially implies that an ESP will never fail, so it provides no information about the failure rate.
- To alleviate this problem, such levels can be eliminated from the data altogether or combined with another level with help from an expert.
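Finding the levels that need this treatment is a simple tally of failures per level. The sketch below uses made-up level names and a hypothetical helper name.

```python
from collections import defaultdict


def levels_without_failures(levels, failed):
    """Return the factor levels for which every record is right-censored."""
    failures = defaultdict(int)
    for level, f in zip(levels, failed):
        failures[level] += int(f)
    return sorted(level for level, n in failures.items() if n == 0)


# Hypothetical factor: level "Z" has only censored (still running) pumps.
pump_levels = ["X", "X", "Y", "Z", "Z", "Y"]
pump_failed = [True, False, True, False, False, False]
no_failure_levels = levels_without_failures(pump_levels, pump_failed)
```

The returned levels are the candidates to drop or to merge with another level after consulting an expert.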

Model Selection
- Once a ‘good’ set of data is produced, a step-wise procedure will add or remove variables one at a time until a statistically ‘best’ model is found.
- Different combinations of explanatory variables will affect the selection procedure.
- The step-wise procedure is conservative and will tend to keep variables in the model that might not be necessary.
- Once this model is found, each variable is examined individually and a decision is made whether or not to drop it.

Factor Collapsing
Once a final model is chosen, a procedure to combine levels with similar hazards is begun.

Model Validation
A valid model is one that is consistent, reliable, and not sensitive to small changes in the data.
Methods to check validity:
- Randomly split the data, fit a new model to each half, and compare.
- Randomly split the data, apply the model found for the first half to the second half, and compare coefficients.
- Use a bootstrapping method to obtain many different sets of data and apply the model-building procedure to each.
- Obtain new data, repeat the model-building procedure, and compare. This method could be useful to see how the model changes over time.
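The split-half and bootstrap checks share the same scaffolding: resample the records, refit, and compare the estimates. The sketch below shows that scaffolding with a deliberately simple stand-in model, a constant (exponential) hazard estimated as failures divided by total run time, in place of the Cox model used in the presentation. The data are synthetic and all names are hypothetical.

```python
import random


def exponential_hazard(records):
    """Constant-hazard estimate: failures / total run time.
    Censored runs add to the run time but not to the failure count."""
    total_time = sum(t for t, _ in records)
    failures = sum(1 for _, f in records if f)
    return failures / total_time


def split_half(records, rng):
    """Randomly split the records into two halves."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def bootstrap_estimates(records, fit, n_boot, rng):
    """Refit the model on n_boot resamples drawn with replacement."""
    n = len(records)
    return [fit([rng.choice(records) for _ in range(n)]) for _ in range(n_boot)]


rng = random.Random(0)
# Synthetic (run time, failed) records, for illustration only.
synthetic = [(rng.expovariate(0.01), rng.random() < 0.7) for _ in range(200)]

first_half, second_half = split_half(synthetic, rng)
half_estimates = (exponential_hazard(first_half), exponential_hazard(second_half))
boot = bootstrap_estimates(synthetic, exponential_hazard, n_boot=50, rng=rng)
boot_spread = max(boot) - min(boot)
```

A model that is not sensitive to small changes in the data should give similar values in `half_estimates` and a narrow `boot_spread`; for a Cox model the same comparison would be made coefficient by coefficient.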


Conclusions
- Pragmatic risk-ranking and valuation tools for ESPs have been created.
- Pragmatic tools have been developed for dealing with large, sparse, and inconsistent data, as well as for modeling this data in a consistent fashion.
