Lecture Notes 9 Prediction Limits

Zhangxi Lin, ISQS 7342-001, Texas Tech University
Note: Most slides in this file are sourced from SAS® Course Notes.

Section 3.1 Profit Variability

Random Profit Consequences
[Figure: a primary decision leads to a random outcome $Y$ and deterministic costs, giving a random profit]
$\text{Profit}_i = Y_i - \text{costs}_i$

Conditional Profits
[Figure: profit distributions conditional on the primary decision; $Y$ random, costs deterministic]
$\text{Profit}_i = Y_i - \text{costs}_i$

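Expected Profit Consequence
[Figure: primary decision with a random primary outcome and a secondary outcome with density $d(y \mid x_i)$; costs deterministic]
$\mathrm{EPC}_i = \hat{E}(Y_i) - \text{costs}_i = \hat{p}(x_i)\,\hat{D}(x_i) - \text{costs}_i$
Here $\hat{p}(x_i)$ is the estimated probability of the primary outcome and $\hat{D}(x_i)$ is the estimated expected value of the secondary outcome given that the primary outcome occurs.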
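As a minimal sketch of the EPC formula, the Python below computes $\hat{p}(x_i)\,\hat{D}(x_i) - \text{costs}_i$ for a few cases. The probabilities, expected secondary outcomes, and the 0.68 per-case cost are hypothetical illustration values, and the "solicit if EPC > 0" rule is an assumed decision rule, not one stated on the slides.

```python
def expected_profit(p_hat: float, d_hat: float, cost: float) -> float:
    """EPC_i = p_hat(x_i) * D_hat(x_i) - costs_i for a single case."""
    return p_hat * d_hat - cost

# Hypothetical cases: (primary-outcome probability, expected secondary outcome, cost)
cases = [(0.05, 15.0, 0.68), (0.02, 20.0, 0.68), (0.10, 8.0, 0.68)]
epcs = [expected_profit(p, d, c) for p, d, c in cases]

for (p, d, c), epc in zip(cases, epcs):
    # Assumed rule for illustration: act only when the expected profit is positive
    decision = "solicit" if epc > 0 else "skip"
    print(f"p={p:.2f} D={d:5.2f} cost={c:.2f} -> EPC={epc:+.3f} ({decision})")
```

A case with a low probability can still be worth acting on if its expected secondary outcome is large enough; the product, not either factor alone, drives the decision.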

Predicted Profit Plots
[Figure: scaled total profit ($4,000 to $16,000) versus % selected (10% to 90%) for N = 96,367, showing cumulative $\sum \mathrm{EPC}_i$ and the overall average profit]

Predicted and Observed Profit Plots
[Figure: scaled total profit ($4,000 to $16,000) versus % selected, comparing $\sum \mathrm{EPC}_i$ with cumulative observed profit $\sum \mathrm{OP}_i$ (training) and the overall average profit]

Predicted and Observed Profit Plots
[Figure: scaled total profit ($4,000 to $16,000) versus % selected, comparing $\sum \mathrm{EPC}_i$ with $\sum \mathrm{OP}_i$ (training), $\sum \mathrm{OP}_i$ (validation), and the overall average profit]

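Predicted and Observed Profit Plots
[Figure: scaled total profit ($4,000 to $16,000) versus % selected for N = 96,367, showing $\sum \mathrm{EPC}_i$, $\sum \mathrm{OP}_i$ (training), $\sum \mathrm{OP}_i$ (validation), and the overall average profit]
The scaled total profit is a sum of independent, but not identically distributed, random variables. Independence gives $\operatorname{Var}\left(\sum_i \text{Profit}_i\right) = \sum_i \operatorname{Var}(\text{Profit}_i)$, and under the Lyapunov conditions the central limit theorem makes the sum approximately normal.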
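The additivity of variances can be checked by simulation. This sketch assumes each case responds with its own hypothetical probability $p_i$ and, when it responds, yields a fixed hypothetical amount $g_i$ (no variability in the secondary outcome), so the cases are independent but not identically distributed. The per-case variance is then $p_i[g_i^2 - p_i g_i^2]$, and the simulated variance of the total should match the sum of those variances up to Monte Carlo error.

```python
import numpy as np

# Hypothetical inputs: per-case response probabilities p_i and fixed amounts g_i,
# so every case has its own (non-identical) distribution.
rng = np.random.default_rng(0)
n_cases = 500
p = rng.uniform(0.01, 0.15, size=n_cases)
g = rng.uniform(5.0, 50.0, size=n_cases)
cost = 0.68  # hypothetical deterministic cost per case

# Per-case Var(Profit_i) = p_i * [E(D_i^2) - p_i * D_i^2], with E(D_i^2) = g_i^2 here
var_i = p * (g**2 - p * g**2)
var_sum_theory = var_i.sum()
mean_sum_theory = (p * g - cost).sum()

# Monte Carlo replicates of the total profit over all cases
n_reps = 10_000
responses = rng.random((n_reps, n_cases)) < p  # Bernoulli(p_i), broadcast across rows
totals = (responses * g - cost).sum(axis=1)

print(f"sum of per-case variances:   {var_sum_theory:.1f}")
print(f"simulated variance of total: {totals.var(ddof=1):.1f}")
```

Only independence is used for the variance identity; the Lyapunov conditions matter for the separate claim that the total is approximately normal, which is what justifies the ±2 standard deviation limits later on.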

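Beyond Expectations: Variability in Profit
$\text{Profit}_i = Y_i - \text{costs}_i$
$\mathrm{EPC}_i = \hat{E}(Y_i) - \text{costs}_i = \hat{p}(x_i)\,\hat{D}(x_i) - \text{costs}_i$
Because the costs are deterministic, $\operatorname{Var}(\text{Profit}_i) = \operatorname{Var}(Y_i) = E(Y_i^2) - \bigl(E(Y_i)\bigr)^2$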

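Beyond Expectations: Variability in Profit
$\text{Profit}_i = Y_i - \text{costs}_i$
$E(\text{Profit}_i) = E(Y_i) - \text{costs}_i = p(x_i)\,D(x_i) - \text{costs}_i$
$\operatorname{Var}(\text{Profit}_i) = \operatorname{Var}(Y_i) = p_i\left[E(D_i^2) - p_i D_i^2\right]$
The second moment $E(D_i^2)$ of the secondary outcome still needs to be estimated.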
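One way to see where the variance formula comes from is the short derivation below. It assumes (this decomposition is not spelled out on the slides) that $Y_i = R_i G_i$, where $R_i \sim \text{Bernoulli}(p_i)$ indicates the primary outcome and, given $R_i = 1$, the secondary outcome $G_i$ has mean $D_i$ and second moment $E(D_i^2)$.

```latex
% Assumption: Y_i = R_i G_i, R_i ~ Bernoulli(p_i); given R_i = 1,
% G_i has mean D_i and second moment E(D_i^2).
\begin{aligned}
E(Y_i)   &= p_i D_i \\
E(Y_i^2) &= E(R_i^2 G_i^2) = p_i\,E(D_i^2) \qquad (R_i^2 = R_i) \\
\operatorname{Var}(Y_i) &= E(Y_i^2) - \bigl(E(Y_i)\bigr)^2
  = p_i\,E(D_i^2) - p_i^2 D_i^2
  = p_i\bigl[E(D_i^2) - p_i D_i^2\bigr]
\end{aligned}
```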

Some Second Moment Estimates  Normal* σ2 + Di2  Poisson Di + Di2  Gamma Di2 ·(1+1/σshape)  Lognormal Di2 ·exp(σ2) Distribution Estimate ^

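Some Profit Variance Estimates
Estimated $\operatorname{Var}(\text{Profit}_i)$ by distribution:
Normal*: $\hat{p}_i \hat{D}_i^2 \left[1 - \hat{p}_i + \hat{\sigma}^2/\hat{D}_i^2\right]$
Poisson: $\hat{p}_i \hat{D}_i^2 \left[1 - \hat{p}_i + 1/\hat{D}_i\right]$
Gamma: $\hat{p}_i \hat{D}_i^2 \left[1 - \hat{p}_i + 1/\hat{\sigma}_{\text{shape}}\right]$
Lognormal: $\hat{p}_i \hat{D}_i^2 \left[1 - \hat{p}_i + \exp(\hat{\sigma}^2) - 1\right]$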
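A quick consistency check: each row is $\hat{p}_i[E(D_i^2) - \hat{p}_i \hat{D}_i^2]$ rewritten as $\hat{p}_i \hat{D}_i^2[1 - \hat{p}_i + k]$, where $k = E(D_i^2)/\hat{D}_i^2 - 1$. The sketch below implements the table form and the general form side by side, using the same assumed parameter meanings as the second-moment table; the function names and the example case are hypothetical.

```python
import math
from typing import Optional

def profit_variance(dist: str, p_hat: float, d_hat: float,
                    param: Optional[float] = None) -> float:
    """Table form: p_hat * d_hat^2 * [1 - p_hat + k]."""
    if dist == "normal":
        k = param / d_hat**2          # param: variance sigma^2
    elif dist == "poisson":
        k = 1.0 / d_hat
    elif dist == "gamma":
        k = 1.0 / param               # param: shape parameter
    elif dist == "lognormal":
        k = math.exp(param) - 1.0     # param: log-scale variance sigma^2
    else:
        raise ValueError(f"unknown distribution: {dist!r}")
    return p_hat * d_hat**2 * (1.0 - p_hat + k)

def variance_from_moments(p_hat: float, d_hat: float, second_moment: float) -> float:
    """General form: p_i * [E(D_i^2) - p_i * D_i^2]."""
    return p_hat * (second_moment - p_hat * d_hat**2)

# Hypothetical case: 5% response probability, predicted mean 15, gamma shape 2
v_table = profit_variance("gamma", 0.05, 15.0, 2.0)
v_general = variance_from_moments(0.05, 15.0, 15.0**2 * (1.0 + 1.0 / 2.0))
print(v_table, v_general)  # both forms agree
```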

Profit Plots with Tolerance Limits
[Figure: scaled total profit ($4,000 to $16,000) versus % selected, showing $\sum \mathrm{OP}_i$ and $\sum \mathrm{EPC}_i$ with limits $\sum \mathrm{EPC}_i \pm 2\sqrt{\sum \operatorname{Var}(\text{Profit}_i)}$, plus the overall average profit]
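The limit curves can be computed by ranking cases by EPC and accumulating both the expected profits and the variances. This is a sketch assuming independent cases (so variances add) and a multiplier of 2, as on the slide; the function name and the five input cases are hypothetical, and dividing the depth by N would give the "% selected" axis.

```python
import numpy as np

def cumulative_profit_limits(epc, var_profit, z=2.0):
    """Rank cases by descending EPC; return cumulative sum(EPC) and its limits
    sum(EPC) -/+ z * sqrt(sum(Var(Profit))) at each selection depth."""
    epc = np.asarray(epc, dtype=float)
    var_profit = np.asarray(var_profit, dtype=float)
    order = np.argsort(-epc)                          # best prospects first
    cum_epc = np.cumsum(epc[order])
    half_width = z * np.sqrt(np.cumsum(var_profit[order]))
    return cum_epc, cum_epc - half_width, cum_epc + half_width

# Hypothetical EPC_i and Var(Profit_i) values for five cases
epc = [0.12, -0.28, 0.07, 0.40, 0.01]
var_profit = [4.0, 1.0, 9.0, 16.0, 0.25]
total, lower, upper = cumulative_profit_limits(epc, var_profit)

for depth, (t, lo, hi) in enumerate(zip(total, lower, upper), start=1):
    print(f"top {depth}: {t:+.2f}  [{lo:+.2f}, {hi:+.2f}]")
```

Even in this toy case the half-width dwarfs the expected profit, which is the kind of perspective the limits are meant to add.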

Profit Plots with Tolerance Limits
[Figure: scaled total profit ($4,000 to $16,000) versus % selected, showing $\sum \mathrm{EPC}_i$ with tolerance limits and $\sum \mathrm{OP}_i$ for the training, validation, and score data, plus the overall average profit]

1998 KDD-Cup Results
Rank   Total Profit   Overall Avg. Profit
1.     $14,712        $0.153
2.     $14,662        $0.152
3.     $13,954        $0.145
4.     $13,825        $0.143
5.     $13,794        $0.143
6.     $13,598        $0.141
7.     $13,040        $0.135
8.     $12,298        $0.128
9.     $11,423        $0.119
10.    $11,276        $0.117
11.    $10,720        $0.111
12.    $10,706        $0.111
13.    $10,112        $0.105
14.    $10,049        $0.104
15.    $9,741         $0.101
16.    $9,464         $0.098
17.    $5,683         $0.059
18.    $5,484         $0.057
19.    $1,925         $0.020
20.    $1,706         $0.018
"Solicit everyone" model: $10,560 total profit, $0.110 average profit
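The average-profit column appears to be the total profit divided by the N = 96,367 cases shown on the profit plots, rounded to three decimals. The sketch below checks that relationship for a few rows and the "solicit everyone" baseline; treating 96,367 as the denominator for this table is an inference from the matching figures, not something the slide states.

```python
# Assumed denominator: the N = 96,367 shown on the profit plots
N = 96_367

def average_profit(total_profit: float, n: int = N) -> float:
    """Overall average profit per case, rounded to three decimals as in the table."""
    return round(total_profit / n, 3)

# A few rows from the table, plus the "solicit everyone" baseline
rows = {"rank 1": 14_712, "rank 5": 13_794, "rank 20": 1_706, "solicit everyone": 10_560}
averages = {label: average_profit(total) for label, total in rows.items()}

for label, avg in averages.items():
    print(f"{label:>16}: ${rows[label]:,} total -> ${avg:.3f} per case")
```

On this scale the top twelve entries beat the baseline by at most about four cents per case, which is why quantifying the uncertainty around each total matters for comparing them.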

Prediction Limits: The Good
Quantify uncertainty in expected profit estimates
Lend perspective to model comparisons
Give insight into model fit

Prediction Limits: The Bad
Do not account for model variability
Are skewed by outlying predictions

Model Variability
[Figure: overall average profit and $\sum \mathrm{EPC}_i$ versus % selected for several fits with the same model specification and the same training data but different parameter initializations]

Prediction Limits: The Ugly
Require scaling adjustments for sampling
Surprise analysts and management

Scaling Prediction Limits (More CLT)
[Figure: scaled total profit ($40,000 to $160,000) versus % selected for N = 963,670, with the overall average profit]
Overall average profit limits scale by $1/\sqrt{N}$; total profit limits scale by $\sqrt{N}$. The standard deviation of a sum of N independent terms grows like $\sqrt{N}$, while that of their average shrinks like $1/\sqrt{N}$.
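The scaling rule can be illustrated by comparing limit half-widths at N = 96,367 and N = 963,670 (ten times larger). This sketch makes the simplifying assumption that every case has the same hypothetical per-case standard deviation of 4.0; with heterogeneous cases the same $\sqrt{N}$ and $1/\sqrt{N}$ ratios hold whenever the per-case variances are, on average, unchanged.

```python
import math

def limit_half_widths(sd_per_case: float, n: int, z: float = 2.0):
    """Half-widths of +/- z*SD limits for total and average profit over n
    independent cases sharing a common per-case SD (simplifying assumption)."""
    sd_total = math.sqrt(n) * sd_per_case     # SD of a sum grows like sqrt(n)
    sd_average = sd_per_case / math.sqrt(n)   # SD of a mean shrinks like 1/sqrt(n)
    return z * sd_total, z * sd_average

small = limit_half_widths(sd_per_case=4.0, n=96_367)
large = limit_half_widths(sd_per_case=4.0, n=963_670)
print(large[0] / small[0])  # total-profit limits widen by sqrt(10)
print(large[1] / small[1])  # average-profit limits narrow by 1/sqrt(10)
```

With ten times as many cases the total profit itself grows tenfold but its limits widen only by about 3.16, so the limits look relatively tighter on the total scale and tighter in absolute terms on the average scale.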