
CH1. What is what
CH2. A simple SPF
CH3. EDA
CH4. Curve fitting
CH5. A first SPF
CH6. Which fit is fitter
CH7. Choosing the objective function
CH8. Theoretical stuff
CH9. Adding variables
CH11. Choosing a model equation

1 6. Which fit is fitter
In this session:
1. What makes for a good fit
2. Introducing the CURE plot
3. Eliminating ‘overall bias’
4. The bias of a fit
5. Using the CURE plot

2 What makes for a good fit?
Common ‘goodness-of-fit’ measures are R², χ², AIC, ... These are ‘overall’ (single-number) measures. For SPFs used in applications they are insufficient.
Recall the two perspectives on SPFs, E{  } and  = f(Traits, parameters): the applications-centered perspective and the cause-and-effect-centered perspective.

3 One judges the fit of a model by its residuals. For SPFs used in applications, a fit is considered good only if the residuals are closely packed around 0 everywhere. (Figure annotation: ‘Perhaps acceptable’.)
SPF Workshop February 2014, UBCO

4 The main figure of merit for SPFs: unbiased everywhere.
(Figure annotations: ‘Fitted is too large’, ‘Fitted is too small’, ‘But this one is not!’)

5 The usual residual plot: is it informative?

6 But what if the same residuals are cumulated? From the spreadsheet: compute residual → cumulate → plot.
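The three spreadsheet steps (compute residual, cumulate, plot) can be sketched in Python as below. This is an illustrative sketch, not the workshop spreadsheet: the function name `cure_points` and the data are hypothetical, and rows are sorted by the covariate (here segment length) before the residuals are cumulated, which is what makes the running sum meaningful.

```python
from itertools import accumulate


def cure_points(x, observed, fitted):
    """Return (sorted x, cumulative residuals) for a CURE plot.

    Residual = observed - fitted, as on the slides. Rows are sorted by the
    covariate x (e.g. segment length) before the residuals are cumulated.
    """
    rows = sorted(zip(x, observed, fitted), key=lambda r: r[0])
    xs = [r[0] for r in rows]
    residuals = [obs - fit for _, obs, fit in rows]
    return xs, list(accumulate(residuals))


# Hypothetical data: segment lengths (miles), crash counts, fitted values
lengths = [0.4, 0.1, 0.9, 0.25]
crashes = [1, 0, 3, 0]
fitted = [0.7, 0.2, 1.5, 0.6]
xs, cure = cure_points(lengths, crashes, fitted)
# Plotting cure against xs gives the CURE plot.
```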

7 The CURE plot (residual = observed - fitted). Now one can see!
0-A, B-C, E-F: observed > fitted; not good.
A-B, D-E: fitted > observed; bad.
Where the drop is precipitous there may be outliers.

8 Benefits:
1. Chaos is replaced by clarity.
2. We can recognize a good model.
3. The cost of parameterization is clear.

(2) What should a good CURE plot look like? It should:
- not have long up or down runs;
- not have vertical drops;
- meander around the horizontal axis.
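The criteria above are judged by eye. As a rough numerical companion, and an assumption rather than a method from the slides, one could report the longest run of consecutive same-sign residuals (a crude proxy for a long up or down run) and the largest single absolute residual (the biggest vertical jump or drop, a possible outlier). The helper `cure_diagnostics` is hypothetical and expects residuals already sorted by the covariate.

```python
def cure_diagnostics(residuals):
    """Crude CURE-plot summary for residuals sorted by the covariate.

    Returns (longest run of consecutive same-sign residuals,
             largest absolute residual, i.e. the biggest vertical step).
    Zero residuals break a run.
    """
    longest = current = 0
    prev_sign = 0
    for r in residuals:
        sign = (r > 0) - (r < 0)
        if sign != 0 and sign == prev_sign:
            current += 1          # run continues in the same direction
        else:
            current = 1 if sign != 0 else 0  # new run starts (or none)
        prev_sign = sign
        longest = max(longest, current)
    biggest_step = max((abs(r) for r in residuals), default=0.0)
    return longest, biggest_step
```

Count data make residuals change sign often, so in practice a drift in the CURE plot matters more than strict same-sign runs; treat this only as a screening aid.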

9 (3) The cost of parametric curve fitting is now manifest: imposing the function 1.675×(Segment Length) on the data causes bias almost everywhere! (Figure annotation: ‘No bias’.) Biased estimates lead to bad decisions, and bad decisions have real costs.

10 How much bias is there?
The table lists, for each interval (Origin to A, A to B, B to C, ...): accumulated accidents, fitted accidents, bias, and bias/fitted accidents.
TAB = Total Accumulated Bias = 303 + |-688| + ...
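The table can be reproduced along these lines. This sketch assumes that the bias in an interval is accumulated (observed) accidents minus fitted accidents within it, and that TAB sums the absolute interval biases, as the 303 + |-688| + ... pattern suggests. The breakpoints stand in for points A, B, C, ... read off a CURE plot; the function name and the data are hypothetical.

```python
def interval_bias(x, observed, fitted, breakpoints):
    """Per-interval bias and Total Accumulated Bias (TAB).

    Intervals are [b_k, b_{k+1}) over the covariate x, with the first one
    starting at the origin (-inf here). Each row is
    (lower, upper, observed sum, fitted sum, bias, bias / fitted sum).
    """
    edges = [float("-inf")] + sorted(breakpoints) + [float("inf")]
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        obs = sum(o for xi, o in zip(x, observed) if lo <= xi < hi)
        fit = sum(f for xi, f in zip(x, fitted) if lo <= xi < hi)
        bias = obs - fit
        ratio = bias / fit if fit else float("nan")
        rows.append((lo, hi, obs, fit, bias, ratio))
    tab = sum(abs(row[4]) for row in rows)  # sum of |bias| over intervals
    return rows, tab


# Hypothetical data with two breakpoints (playing the role of A and B)
rows, tab = interval_bias(
    x=[0.1, 0.2, 0.5, 0.6, 1.2],
    observed=[0, 1, 2, 0, 5],
    fitted=[0.2, 0.3, 1.5, 1.5, 3.0],
    breakpoints=[0.3, 1.0],
)
```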

11 Levelling the playing field
When the scale parameter is determined by ‘Solver’, the sum of fitted values is usually not the same as the sum of crash counts. This is a blemish. To remove it, add a constraint requiring the sum of fitted values to equal the sum of crash counts. Open spreadsheet #7, OLS with constraint.
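The slides impose the constraint through Excel's Solver. The Python sketch below swaps in a plain grid search and assumes a hypothetical two-parameter model μ = α·L^β (L = segment length); the model form, the function name `constrained_fit`, and the search range for β are illustrative assumptions. For any fixed β, the constraint Σμ = Σ(crash counts) pins the scale parameter to α = Σy / ΣL^β, so only β has to be searched while the sum of squared differences (SSD) is minimized.

```python
def constrained_fit(lengths, crashes, betas=None):
    """Fit mu = alpha * L**beta by least squares subject to sum(mu) == sum(y).

    Hypothetical model form. For each candidate beta the constraint fixes
    alpha = sum(y) / sum(L**beta); the beta with the smallest SSD wins.
    Returns (alpha, beta, ssd).
    """
    if betas is None:
        betas = [0.1 + 0.001 * k for k in range(1901)]  # beta in [0.1, 2.0]
    total = sum(crashes)
    best = None
    for beta in betas:
        powered = [length ** beta for length in lengths]
        alpha = total / sum(powered)  # enforces sum of fitted == sum of counts
        ssd = sum((y - alpha * p) ** 2 for y, p in zip(crashes, powered))
        if best is None or ssd < best[2]:
            best = (alpha, beta, ssd)
    return best
```

Whatever β is chosen, the fitted values sum to the observed total, which removes the overall bias by construction.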

12 How to add constraints (screenshot; annotation: ‘click’)

13 With constraint: now click ‘Solve’ to get the result shown (screenshot).

14 When is a CURE plot good enough?
Open (again) #7 OLS with constraint, and open #8 CURE computations. After Solver has been run with the constraint, you should see the screen shown. Copy the values in columns A, B, D and E into the CURE spreadsheet.

15 Copied. Important step: on the ‘DATA’ tab choose ‘Sort’ and sort in ascending order by ‘miles’.

16 Now add columns E, F, and G (the cell formulas shown include =C4-D4 and =F3+E4). Note that for the last row (n = 5323) the cumulated residual is 0. Why?

17 Below is a plot of the cumulative residuals (column F) against segment length (column B), truncated at 3 miles.
An upward drift means that in this range ‘observed’ tends to be consistently larger than ‘fitted’. A vertical gap indicates a possible ‘outlier’.
The question was when a CURE plot is good enough.

18 Computing the limits which a random walk should seldom exceed (details in the text). The limits, +2σ′ and -2σ′, are computed using the last ‘cumulated squared residual’.
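The slide defers the details to the text. The sketch below assumes the random-walk-bridge form often used for CURE limits, σ′(n) = √(s²(n)·(1 - s²(n)/s²(N))), where s²(n) is the cumulated squared residual up to row n and s²(N) is the last cumulated squared residual; check this against the text before relying on it. The function name is hypothetical, and residuals must already be sorted by the covariate.

```python
from itertools import accumulate
from math import sqrt


def cure_limits(residuals):
    """Return the +2*sigma' limit at each row (the lower limit is its negative).

    Assumed form: sigma'(n) = sqrt(s2(n) * (1 - s2(n) / s2(N))), with s2(n)
    the cumulated squared residual and s2(N) its last value.
    """
    s2 = list(accumulate(r * r for r in residuals))
    if not s2 or s2[-1] == 0:
        return [0.0] * len(s2)  # no variation: limits collapse to zero
    total = s2[-1]
    return [2.0 * sqrt(max(v * (1.0 - v / total), 0.0)) for v in s2]
```

Because the factor (1 - s²(n)/s²(N)) vanishes at the last row, the limits pinch to zero there, matching a CURE plot that must end at 0 once the constraint is imposed.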

19 Guidance:
Rule of thumb: 95% within ±2σ′.
40% within ±0.5σ′: stop, you are in danger of overfitting.
This fit does not pass muster.
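Applying the guidance means counting how much of the CURE plot lies inside each band. This sketch takes the cumulative residuals and the +2σ′ limit at each point (so σ′ = limit/2 and 0.5σ′ = limit/4) and reports both fractions; the function names are hypothetical. Only the 95% rule is turned into a pass/fail check; the fraction inside ±0.5σ′ is simply returned for comparison with the slide's 40% figure.

```python
def band_fractions(cure, upper2):
    """Fractions of CURE points within +-2 sigma' and within +-0.5 sigma'.

    upper2[i] is the +2*sigma' limit at point i, so 0.5*sigma' = upper2[i] / 4.
    Where sigma' = 0 a point counts as inside only if its value is exactly 0.
    """
    n = len(cure)
    within2 = sum(abs(c) <= u for c, u in zip(cure, upper2)) / n
    within_half = sum(abs(c) <= u / 4 for c, u in zip(cure, upper2)) / n
    return within2, within_half


def passes_rule_of_thumb(cure, upper2, target=0.95):
    """True if at least `target` of the points lie within +-2 sigma'."""
    return band_fractions(cure, upper2)[0] >= target
```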

20 Which fit is better? The objective functions compared are ∑ squared differences and ∑ absolute differences (one CURE plot in red, one in black). The steeper the run, the larger the bias; red increased the A-to-B bias. Black is better.
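The comparison can be sketched for the one-parameter form μ = β·L seen on slide 9. Under that assumed model, minimizing ∑ squared differences gives β = ∑Ly / ∑L², and minimizing ∑ absolute differences is the same as minimizing ∑L·|y/L - β|, which an L-weighted median of the ratios y/L solves. The largest absolute excursion of the CURE plot is used here as one simple, assumed summary for comparing the two fits; the slides compare the plots visually. All names and data are hypothetical.

```python
def fit_ssd(lengths, crashes):
    """Least squares for mu = beta * L (no intercept)."""
    num = sum(l * y for l, y in zip(lengths, crashes))
    return num / sum(l * l for l in lengths)


def fit_sad(lengths, crashes):
    """Least absolute differences for mu = beta * L (requires L > 0).

    sum|y - beta*L| = sum L*|y/L - beta|, minimized by the L-weighted
    median of the ratios y/L.
    """
    pairs = sorted((y / l, l) for l, y in zip(lengths, crashes))
    half = sum(lengths) / 2
    acc = 0.0
    for ratio, weight in pairs:
        acc += weight
        if acc >= half:
            return ratio


def max_abs_cure(lengths, crashes, beta):
    """Largest |cumulative residual| with rows sorted by segment length."""
    total, worst = 0.0, 0.0
    for l, y in sorted(zip(lengths, crashes)):
        total += y - beta * l
        worst = max(worst, abs(total))
    return worst


L, y = [1, 1, 1, 1, 2], [0, 1, 1, 1, 10]  # hypothetical data
b_ssd, b_sad = fit_ssd(L, y), fit_sad(L, y)
```

With these data the squared-difference fit (β = 2.875) produces a larger CURE excursion (8.5) than the absolute-difference fit (β = 1.0, excursion 7.0); with other data the ranking can reverse.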

21 Summary for section 6 (Which fit is fitter?)
1. For SPFs, the main figure of merit is whether the fit is unbiased everywhere;
2. For applications, ‘overall’ measures such as R², χ², AIC, ... are of limited use;
3. The usual plot of residuals is not informative; the CURE plot opens one’s eyes;
4. We showed how to compute bias and Total Accumulated Bias. The cost of parametric curve fitting was manifest;
5. It is clear what a good CURE plot should look like;
6. By adding a constraint we eliminated overall bias;

22
7. We computed ±2σ′ limits and provided guidance on when a CURE plot is acceptable and when overfitting is a danger;
8. We showed how to decide which of two CURE plots is better;
9. All fits were bad, perhaps partly because minimizing the SSD (sum of squared differences) is not appropriate when crash count distributions are not symmetrical.
What should be optimized? Next.