Introduction Second report for TEGoVA ‘Assessing the Accuracy of Individual Property Values Estimated by Automated Valuation Models’ Objective.


Introduction
Second report for TEGoVA: ‘Assessing the Accuracy of Individual Property Values Estimated by Automated Valuation Models’.
Objective: to identify criteria which will give an appraiser information on the uncertainty attached to AVM-generated valuations of individual properties.
We need statistics to help interpret the results (the valuation) of an AVM.
Ten items of information are requested for inclusion in AVM reports.
Refer to the Appendix for statistical definitions.

Recommendation 1
Confidence intervals for the AVM valuation at: i) the 50% level and ii) the 95% level.

A little statistics: the normal distribution http://www.muelaner.com/wp-content/uploads/2013/07/Standard_deviation_diagram.png

The empirical rule for a normal distribution
The empirical rule approximates the variation of data in a bell-shaped distribution.
Approximately 50% of the data in a bell-shaped distribution lies within 0.67 standard deviations of the mean, or µ ± 0.67σ.
Approximately 68% of the data in a bell-shaped distribution lies within 1 standard deviation of the mean, or µ ± 1σ.
Approximately 95% of the data in a bell-shaped distribution lies within 2 standard deviations of the mean, or µ ± 2σ.
Approximately 99.7% of the data in a bell-shaped distribution lies within 3 standard deviations of the mean, or µ ± 3σ.
Note: although the normal distribution is convenient and simple to use, other distributions or measures reflecting the ‘uncertainty’ surrounding valuations can be employed.
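The percentages in the empirical rule can be checked numerically. The following is a minimal sketch using Python's standard library; the function name `coverage` is an illustrative choice and is not part of the report.

```python
from statistics import NormalDist


def coverage(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


# Multipliers quoted in the empirical rule, plus 1.96, which the
# Appendix uses for the 95% confidence interval.
for k in (0.67, 1.0, 1.96, 2.0, 3.0):
    print(f"within ±{k} standard deviations: {coverage(k):.1%}")
```

The rule's round figures are approximations: 2 standard deviations cover slightly more than 95%, which is why the 95% confidence interval in the Appendix uses 1.96 rather than 2.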

16.17 10.61 14.03 8.26 11.08 8.4 11.91 7.44 14.34 12.44 12.13 10.16 10.2 9.28 10.44 8.91

Recommendation 2
A clear explanation, accompanied by a ‘legend’, of the ‘confidence score’ or ‘confidence level’.
‘Confidence score’ and ‘confidence level’ represent the AVM vendor’s assessment of how accurate, in their view, the AVM valuation is.
They provide useful information on how much confidence the AVM vendor has in the resulting valuation figure.
Each vendor will have their own definition: for example, it could be on a scale of 1-9, or (a, b, c, d, e, etc.), or some other scale.
The basis of the calculations will differ between AVM vendors.
Some vendors base their definition on Forecast Standard Deviation (FSD).
Recommendation 2 asks for a clear explanation together with a legend showing the range of values.

Forecast Standard Deviation (FSD)
FSD is a statistical measure which provides the probability that the sale price falls within a range of the estimated AVM value.
The lower the FSD, the smaller the range within which the sale price may lie.
Example: if the FSD is 10%, there is a 68% (roughly two-thirds) probability that the sale price will fall within +/-10% of the AVM estimate.
If the AVM returns an estimate of €100,000, there is a 68% chance that the sale price will lie in the range €90,000 to €110,000.
Target analogy for the FSD scale: the FSD tells you, with 68% statistical certainty, into which ring the sale price is likely to fall.
The FSDs for the four models are:
Model      FSD (%)
Model 1    9.5
Model 2    10.9
Model 3    8.8
Model 4    6.5
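The arithmetic behind the FSD example can be sketched as follows. The function name `fsd_range` is hypothetical, and applying the four model FSDs to a single €100,000 estimate is an illustration only, not a result from the report.

```python
def fsd_range(avm_value: float, fsd_percent: float) -> tuple[float, float]:
    """Return the range of +/- one FSD around an AVM estimate.

    Under the normality assumption used in the slides, the sale price
    falls inside this range with roughly 68% probability.
    """
    half_width = avm_value * fsd_percent / 100.0
    return avm_value - half_width, avm_value + half_width


# The worked example from the slide: FSD of 10% on a €100,000 estimate.
low, high = fsd_range(100_000, 10)
print(f"FSD 10%: €{low:,.0f} to €{high:,.0f}")

# FSDs quoted for the four models, applied to the same illustrative estimate.
model_fsd = {"Model 1": 9.5, "Model 2": 10.9, "Model 3": 8.8, "Model 4": 6.5}
for name, fsd in model_fsd.items():
    low, high = fsd_range(100_000, fsd)
    print(f"{name}: €{low:,.0f} to €{high:,.0f}")
```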

Evaluating AVM valuations
The statistical measures are based on the AVM ‘error’.
This measures the difference between the AVM Value and the Sale Price: error = AVM Value – Sale Price.
Various accuracy measures can be calculated from the ‘error’.
In Europe, appraiser valuations are typically used in place of the Sale Price.
Refer to Table 2 in the Appendix for the different accuracy measures.

TABLE 1: Summary of required information
1: The i) 50% and ii) 95% confidence intervals of the AVM valuation
2: A clear explanation, accompanied by a ‘legend’, of the ‘confidence score’ or ‘confidence level’
3: Confirmation that comparables have been used in the AVM valuation. If not, what method was used in the AVM valuation?
4: The standard deviation and the skewness of the comparable sales prices, or appraised values, used in the AVM valuation
5: The AVM model’s overall accuracy, based on the comparable sales sample, using: i) Mean Absolute Error, ii) Median Absolute Error, iii) ‘Error Buckets’ for the percentage of valuations lying within +/- 5%, +/- 10% and +/- 20% of the Sales Price
6: The number and the overall geographic distribution of the comparables used in the AVM valuation
7: The range of comparable sales prices used in the AVM valuation
8: Confirmation of the earliest and most recent sales dates of the comparables used in the AVM valuation
9: If ‘adjusted’ comparable sales prices have been used, an explanation of how they were adjusted
10: Confirmation of the benchmark (sales prices or valuations) used in arriving at the figures in 4 and 5 above

Conclusions
The recommendations are not about judging the accuracy of AVM models per se.
They are directed towards assessing the accuracy reported for a specific situation, i.e. the property in question.
For different properties, the same model could lead to different conclusions about accuracy.

Appendix Statistical measures

Table 2: Definitions and measures of AVM accuracy
Each entry gives the measure, what is being measured, and how it is calculated.
Absolute Error. The difference between the sales price and the original valuation, ignoring whether it is a positive or negative amount. Calculation: the value of |AVM Value - Sale Price|.
Mean Absolute Error. The average of the Absolute Errors. Calculation: $\sum \text{Absolute Error} / N$, where N = the number of valuations.
Absolute Percentage Error. Absolute Error as a % of Sale Price. Calculation: 100*Absolute Error/Sale Price.
Median Absolute Error. The value which splits the errors, such that 50% are less than the middle value and 50% are greater than the middle value. Calculation: order the Absolute Errors from the lowest value to the highest value, then select the middle value, which is known as the median. (If there is an even number of values, take the average of the two middle values.) The Absolute Errors are thus split into two groups containing an equal number of errors.
Percentage Error. Calculation: 100*(AVM Value – Sale Price)/Sale Price.
Mean Percentage Error. The average of the Percentage Errors. Calculation: $\sum \text{Percentage Error} / N$, where N = the number of AVM valuations.
Standard Deviation of the Percentage Errors. The spread of the Percentage Errors around the Mean Percentage Error. Calculation: $\text{Variance} = \frac{1}{N}\sum_{i=1}^{N}\left(\text{Percentage Error}_i - \text{Mean Percentage Error}\right)^2$ and $\text{Standard Deviation} = \sqrt{\text{Variance}}$.
Forecast Standard Deviation (FSD). A measure of the spread surrounding the AVM valuation figure. This is the ‘uncertainty value’ which surrounds the AVM valuation, based on the model’s standard deviation error.
Confidence Interval. The range within which the sales price is likely to be. Calculation: for a 95% Confidence Interval, the range (AVM Value - 1.96*SD) to (AVM Value + 1.96*SD), where SD = standard deviation error.
Vendor’s bespoke ‘Confidence Score’ or ‘Confidence Level’. The vendor’s assessment of how accurate the AVM valuation actually is. Vendors may or will have their own scale, which needs to be clearly explained in the report.
‘Error Buckets’. The % of AVM valuations that fall within +/-5%, +/-10% and +/-20% of the sales price. Calculation: the percentage of comparable valuations falling within the Error Buckets sales price ranges.
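The measures in Table 2 can be computed in a few lines. The sketch below is one possible reading of the definitions, not a vendor implementation: the function names and the four AVM/sale pairs are invented for illustration, the standard deviation divides by N as in the table, and the confidence interval assumes the standard deviation is expressed in the same currency units as the valuation.

```python
from statistics import mean, median, pstdev


def accuracy_report(avm_values, sale_prices, buckets=(5, 10, 20)):
    """Compute the Table 2 accuracy measures for paired AVM values and sale prices."""
    errors = [a - s for a, s in zip(avm_values, sale_prices)]  # AVM Value - Sale Price
    abs_errors = [abs(e) for e in errors]
    pct_errors = [100 * e / s for e, s in zip(errors, sale_prices)]
    report = {
        "mean_absolute_error": mean(abs_errors),
        "median_absolute_error": median(abs_errors),
        "mean_percentage_error": mean(pct_errors),
        # pstdev divides by N, matching the variance formula in Table 2
        "sd_percentage_error": pstdev(pct_errors),
    }
    # Error buckets: share of valuations within +/- b% of the sale price
    for b in buckets:
        within = sum(1 for p in pct_errors if abs(p) <= b)
        report[f"within_{b}pct"] = 100 * within / len(pct_errors)
    return report


def confidence_interval(avm_value, sd, z=1.96):
    """Range AVM Value +/- z*SD; z = 1.96 gives the 95% interval, 0.67 roughly 50%."""
    return avm_value - z * sd, avm_value + z * sd


# Hypothetical data, for illustration only
avm = [105_000, 98_000, 210_000, 150_000]
sales = [100_000, 100_000, 200_000, 160_000]
for key, value in accuracy_report(avm, sales).items():
    print(f"{key}: {value:.2f}")
print(confidence_interval(100_000, 10_000))
```

Where appraiser valuations are the benchmark rather than sale prices, the same functions apply with the valuations passed as `sale_prices`.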

Mean (Arithmetic Mean)
The mean (arithmetic mean) of the data values: the sample mean (sample size n) and the population mean (population size N).
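The standard textbook definitions of the two means are given below for reference. The symbols $\bar{x}$, $\mu$, $n$ and $N$ follow common convention and are assumed here rather than taken from the slide.

```latex
% Sample mean of n observations x_1, ..., x_n
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

% Population mean of N values x_1, ..., x_N
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
```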

Mean (Arithmetic Mean)
The most common measure of central tendency.
Affected by extreme values (outliers).
Figure: two number-line plots, the first with Mean = 5 and the second with Mean = 7.

Advantages of using the mean
A mean can be calculated for every set of data.
Every data point has an equal weight in determining the value of the arithmetic mean.

Problems with the mean
The mean is based on all the observations in the set of data but can be greatly affected by any extreme value or values.
For example, in the sample 1, 2, 3, 4, 5 and 99 the mean is 19, although 5 of the 6 observations are 5 or below.
The mean cannot be determined for open-ended frequency distributions without additional information.

The Median
The median is the middle value in an ordered sequence of data.
Half the observations will be above the median and half below.
The median is thus unaffected by extreme observations in the data set.
In cases where there are outliers, the median is often the best measure.
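A short sketch comparing the two measures on the sample 1, 2, 3, 4, 5 and 99 used earlier; it relies only on Python's standard `statistics` module, and the variable names are illustrative.

```python
from statistics import mean, median

sample = [1, 2, 3, 4, 5, 99]   # the outlier example from the slides
without_outlier = sample[:-1]  # the same sample with 99 removed

print("mean with outlier:     ", mean(sample))
print("median with outlier:   ", median(sample))
print("mean without outlier:  ", mean(without_outlier))
print("median without outlier:", median(without_outlier))
```

Removing the single extreme value moves the mean from 19 to 3, while the median only moves from 3.5 to 3.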

The Median
To compute the median we have to order the data.
We then apply the positioning point formula, (n + 1)/2, which locates the median value in an ordered array.
The computation depends on whether the sample size is an odd number or an even number.

The Median
Rule for odd samples: the median is the numerical value corresponding to the positioning point, i.e. the (n + 1)/2 ordered observation.
Rule for even samples: the positioning point will lie between the two middle observations in the ordered array. The median is then the average of the numerical values corresponding to these middle observations.

Example - odd sample
Take an odd-sized sample such as: 24 78 92 115 290
The positioning point formula gives: (5 + 1)/2 = 3
The median is the third observation, or 92.

Example - even sample
Take an even-sized sample such as this: 5.5 9.6 8.4 3.5 7.5 6.5
We order the array as: 3.5 5.5 6.5 7.5 8.4 9.6
The positioning point is (6 + 1)/2 = 3.5
The median thus lies between the third and the fourth observations. We average the 3rd and 4th observations to get (6.5 + 7.5)/2 = 7.0 = median value.
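The positioning point procedure from the two examples can be written out directly. This is a minimal sketch; the function name `median_by_position` is hypothetical, and in practice `statistics.median` gives the same result.

```python
def median_by_position(values):
    """Median via the positioning point (n + 1)/2 in the ordered array."""
    if not values:
        raise ValueError("median of an empty sample is undefined")
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based positioning point
    if n % 2 == 1:
        # Odd sample: the positioning point is a whole number
        return ordered[int(position) - 1]
    # Even sample: the point lies between the two middle observations
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([24, 78, 92, 115, 290]))          # odd-sized example
print(median_by_position([5.5, 9.6, 8.4, 3.5, 7.5, 6.5]))  # even-sized example
```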

The Median
A robust measure of central tendency.
Not affected by extreme values.
In an ordered array, the median is the “middle” number.
If n or N is odd, the median is the middle number.
If n or N is even, the median is the average of the two middle numbers.
Figure: two number-line plots, both with Median = 5.

Comparing Standard Deviations
Figure: three data sets plotted on the same axis, from 11 to 21.
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57

Range
A measure of variation.
The difference between the largest and the smallest observations.
Ignores the way in which data are distributed.
Figure: two data sets plotted on an axis from 7 to 12, each with Range = 12 - 7 = 5.

Variance
An important measure of variation.
Shows variation about the mean.
Computed as either a sample variance or a population variance.
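For reference, the standard textbook definitions of the two variances are given below. The notation is assumed, not taken from the slide: $\bar{x}$ and $n$ denote the sample mean and sample size, $\mu$ and $N$ the population mean and population size.

```latex
% Sample variance (divides by n - 1)
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2

% Population variance (divides by N)
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2
```

The population form, dividing by N, is the one that matches the variance of the Percentage Errors in Table 2.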

Standard Deviation
Simply the square root of the variance.
Shows variation about the mean.
The sample standard deviation and the population standard deviation are the square roots of the corresponding variances.
Measured in the same units as the original data and the mean, and useful for making probability statements when using distributions.
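The sample and population versions can be compared with Python's standard `statistics` module. The data below are an assumption: the underlying data points are not given in the transcript, so the values were chosen to have a mean of 15.5 and a sample standard deviation of about 3.338, matching the summary figures quoted for Data A.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Assumed data points, chosen to match the Data A summary figures
data = [11, 12, 13, 16, 16, 17, 18, 21]

print("mean:               ", mean(data))
print("sample variance:    ", variance(data))   # divides by n - 1
print("sample std dev:     ", stdev(data))      # square root of the sample variance
print("population variance:", pvariance(data))  # divides by N
print("population std dev: ", pstdev(data))     # square root of the population variance
```

The sample standard deviation is always at least as large as the population version on the same data, because it divides by n - 1 rather than N.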

The shape of a distribution: skewness
Describes the amount of asymmetry in the distribution: is it symmetric or skewed?
Skewness statistic: < 0 (left-skewed), 0 (symmetric), > 0 (right-skewed).
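The slides do not say which skewness formula is intended. The sketch below assumes the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment), which is one common choice; statistical packages often apply a small-sample correction and will return slightly different values.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed definition)."""
    n = len(values)
    if n == 0:
        raise ValueError("skewness of an empty sample is undefined")
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - centre) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


print(skewness([1, 2, 3, 4, 5]))       # symmetric sample
print(skewness([1, 2, 3, 4, 5, 99]))   # long right tail
print(skewness([-99, 1, 2, 3, 4, 5]))  # long left tail
```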