
1 Local and Global Scores in Selective Editing. Dan Hedlin, Statistics Sweden

2 Local score. A common local (item) score for item j in record k is \(\delta_{kj} = w_k\,|\hat{y}_{kj} - z_{kj}|\,/\,\sigma_j\), where \(w_k\) is the design weight, \(\hat{y}_{kj}\) the predicted value, \(z_{kj}\) the reported value, and \(\sigma_j\) a standardisation measure.
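As a rough illustration of how such local scores might be computed, here is a minimal Python sketch; all names and numbers are invented for the example:

```python
import numpy as np

# Illustrative values only (not from the slides): design weight w_k,
# predicted values yhat_kj, reported values z_kj, standardisation sigma_j.
w_k = 1.8
yhat_k = np.array([120.0, 45.0, 300.0])  # predicted values, p = 3 items
z_k = np.array([150.0, 44.0, 290.0])     # reported values
sigma = np.array([25.0, 10.0, 60.0])     # standardisation measure per item

# Local (item) score: delta_kj = w_k * |yhat_kj - z_kj| / sigma_j
delta_k = w_k * np.abs(yhat_k - z_k) / sigma
print(delta_k)  # [2.16, 0.18, 0.3]
```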

3 Global score. What function of the local scores should be used to form a global (unit) score? Assume the same number of items in all records: p items, j = 1, 2, …, p. Let a local score be denoted by \(\delta_{kj}\) and a global score by \(\delta_k\).

4 Common global score functions. In the editing literature: the sum function \(\delta_k = \sum_{j=1}^{p} \delta_{kj}\); the Euclidean score \(\delta_k = \bigl(\sum_{j=1}^{p} \delta_{kj}^{2}\bigr)^{1/2}\); the max function \(\delta_k = \max_j \delta_{kj}\).
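A small sketch of the three functions, reusing the illustrative local scores from the example above:

```python
import numpy as np

def global_scores(delta_k: np.ndarray) -> dict:
    """The three common global (unit) score functions applied to the
    vector of local scores (delta_k1, ..., delta_kp) of one record."""
    return {
        "sum": float(np.sum(delta_k)),
        "euclidean": float(np.sqrt(np.sum(delta_k ** 2))),
        "max": float(np.max(delta_k)),
    }

# Continuing the illustrative local scores from the previous example:
print(global_scores(np.array([2.16, 0.18, 0.30])))
```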

5 Farwell (2004): ”Not only does the Euclidean score perform well with a large number of key items, it appears to perform at least as well as the maximum score for small numbers of items.”

6 Unified by… Minkowski's distance: \(\delta_k = \bigl(\sum_{j=1}^{p} \delta_{kj}^{\lambda}\bigr)^{1/\lambda}\). This is the sum function if \(\lambda = 1\), the Euclidean score if \(\lambda = 2\), and the maximum function as \(\lambda \to \infty\).
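The unified form is easy to implement; in this sketch the parameter is written lam for \(\lambda\), and the numbers are the same illustrative ones as before:

```python
import numpy as np

def minkowski_score(delta_k, lam: float) -> float:
    """Minkowski global score: (sum_j delta_kj ** lam) ** (1 / lam)."""
    delta_k = np.asarray(delta_k, dtype=float)
    return float(np.sum(delta_k ** lam) ** (1.0 / lam))

delta_k = [2.16, 0.18, 0.30]
print(minkowski_score(delta_k, 1))    # sum function
print(minkowski_score(delta_k, 2))    # Euclidean score
print(minkowski_score(delta_k, 200))  # close to the maximum, 2.16
```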

7 NB the extreme choices are the sum and the max. There is an infinite number of choices in between; \(\lambda = 20\) will suffice as an approximation to the maximum unless the local scores in the same record are of similar size.
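A quick numerical check of this remark, using made-up local scores:

```python
import numpy as np

def minkowski_score(delta_k, lam):
    delta_k = np.asarray(delta_k, dtype=float)
    return float(np.sum(delta_k ** lam) ** (1.0 / lam))

# Distinct local scores: lambda = 20 is already very close to the maximum.
print(minkowski_score([2.16, 0.18, 0.30], 20))  # about 2.16
# Local scores of similar size: the approximation to the maximum is rougher.
print(minkowski_score([1.00, 0.99, 0.98], 20))  # about 1.05, maximum is 1.00
```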

8 Global score as a distance. The axioms of a distance are sensible properties, such as non-negativity, and also the triangle inequality. One can show that a global score function that does not satisfy the triangle inequality yields inconsistencies.

9 Hence a global score function should be a distance. Minkowski's distance appears to be adequate for practical purposes. Minkowski's distance does not satisfy the triangle inequality if \(\lambda < 1\); hence it is not a distance for \(\lambda < 1\).
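A concrete counterexample for \(\lambda = 0.5\), checked numerically with three points in the plane (this check is an added illustration, not from the slides):

```python
import numpy as np

def minkowski_dist(x, y, lam):
    """Minkowski 'distance' between points x and y with parameter lam."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** lam) ** (1.0 / lam))

a, b, c = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
lam = 0.5
# The triangle inequality requires d(a, c) <= d(a, b) + d(b, c); it fails here:
print(minkowski_dist(a, c, lam))                              # 4.0
print(minkowski_dist(a, b, lam) + minkowski_dist(b, c, lam))  # 2.0
```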

10 Parametrised by \(\lambda\). Advantages: a unified global score simplifies presentation and software implementation. It also gives structure: it orders the feasible choices, from the smallest, \(\lambda = 1\), to the largest, \(\lambda \to \infty\).

11 Turning to geometry…

12 Sum function = city block distance. p = 3, i.e. three items.

13 Euclidean distance

14 Supremum (maximum, Chebyshev) distance

15 Imagine questionnaires with three items. [Figure: record k in three-dimensional item space and its Euclidean distance.]

17 The Euclidean function, two items. A sphere in 3D. Threshold \(\theta\).

18 The max function. A cube in 3D. Same threshold \(\theta\).

19 The sum function An octahedron in 3D
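For the same threshold \(\theta\) the three acceptance regions have quite different sizes in 3D; the following short calculation is added here as an illustration of how much they differ:

```latex
% Volumes of the 3D acceptance regions {global score <= theta}
\begin{aligned}
\text{sum (octahedron):}   &\quad \operatorname{vol}\{\,|x|+|y|+|z| \le \theta\,\}          = \tfrac{4}{3}\,\theta^{3},\\
\text{Euclidean (sphere):} &\quad \operatorname{vol}\{\,x^{2}+y^{2}+z^{2} \le \theta^{2}\,\} = \tfrac{4}{3}\pi\,\theta^{3},\\
\text{maximum (cube):}     &\quad \operatorname{vol}\{\,\max(|x|,|y|,|z|) \le \theta\,\}     = 8\,\theta^{3}.
\end{aligned}
```

The octahedron is the smallest of the three regions, so with a common threshold the sum function accepts the fewest records and sends the most to editing, which is the point of the next slide.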

21 The sum function will always select more records for editing than any other choice with the same threshold.
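This follows because, for non-negative local scores, the sum is never smaller than the Euclidean score, which in turn is never smaller than the maximum. A simulated illustration with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
# 10 000 hypothetical records with p = 3 non-negative local scores each.
scores = rng.exponential(scale=1.0, size=(10_000, 3))

theta = 2.5  # the same threshold for all three global score functions
n_sum = int((scores.sum(axis=1) > theta).sum())
n_euc = int((np.sqrt((scores ** 2).sum(axis=1)) > theta).sum())
n_max = int((scores.max(axis=1) > theta).sum())

# For non-negative scores, sum >= Euclidean >= max, so n_sum >= n_euc >= n_max.
print(n_sum, n_euc, n_max)
```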

22 Three editing situations:
1. Large errors remain in the data, such as unit errors.
2. No large errors, but there may be bias due to many small errors in the same direction.
3. Little bias, but there may be many errors.

23 Can show that if…
1. we are in editing situation 3,
2. the variance of the error is …, and
3. the local score is …,
then the Euclidean global score will minimise the sum of the variances of the remaining error in estimates of the total.

24 Summary. Minkowski's distance unifies many reasonable global score functions, parametrised by a single parameter \(\lambda\). The sum and the maximum functions are the two extreme choices. The Euclidean unit score function is a good choice under certain conditions.