Fundamentals of Data Analysis Lecture 9 Management of data sets and improving the precision of measurement.

Program for today:
- untypical observations;
- deviating results: how to detect and eliminate them;
- processing of data sets.

Introduction Measurement is an experiment performed with appropriate methods, using appropriate tools, organized into an appropriate system. A measurement can also be seen as the process of obtaining information about the measured object. We need to keep in mind the three aspects of a signal: its content (the information it carries), its carrier (the phenomenon or object mentioned earlier), and its code (i.e., the way information is assigned to the features of the carrier).

Introduction We understand the precision of measurement as the ability of the measurements to detect the real effects of the factors under study. In general, an experiment is more precise the smaller the differences in effects it can detect. The greater the variation of the measured quantity under the same factor, the greater the error associated with the difference between two averages, and the less precise the experiment is at detecting variations in the measured value caused by those factors.

Introduction The precision of measurement to aim for in an experiment depends on its purpose. In general this statement is true, but in many physical experiments, especially when measuring fundamental physical quantities, we cannot predict what precision will ultimately be sufficient. One should strive for the highest accuracy allowed by the phenomenon under investigation and by the available measurement technique.

Introduction Working with even simple data sets is an important task for many engineers and scientists. The data should be reviewed and verified for errors and for consistency of the values. In many cases it is necessary to answer the question of whether all the values belong together, or whether some results deviate strongly from the others.

Untypical observations With a data set consisting of a group of measurements that should ideally have the same value, it is difficult to say how much an individual value must differ in order to be recognized as "foreign" rather than as an extreme deviation from the mean. It is suspicious when, after ordering the data from smallest to largest, one or both of the extreme values differ markedly from the average. The situation is similar when we find points on a graph that depart significantly from an otherwise smooth curve.

Untypical observations Deviating measurement values are interesting for another reason as well: they can point both to errors or carelessness of the people carrying out the measurements and to failures of the measuring equipment. On this basis it is possible to determine how to improve the measurement system or the measurement methodology. Each occurrence of a deviating value should trigger a critical review of the entire measurement process that produced the incorrect result. The first step is, of course, to check the calculations, then to look for transcription and decoding errors, and only at the end to search for possible damage in the measurement system.

Thick Error Law If we know what the value of the standard deviation σ should be for our measurements, we can easily determine which measurements should be rejected by calculating, for each point, the value of the expression M = |x_i − x̄| / σ, where x_i is the "suspicious" value and x̄ is the mean of the series. If we obtain M > 4, the point should be rejected.
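The rule above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name and the sample values are my own, and σ is assumed to be known beforehand, as the slide requires.

```python
# "Thick error" (gross error) rule: reject points with
# M = |x_i - mean| / sigma greater than the threshold (4 by default).
# Assumes sigma is known a priori; data below is illustrative.

def gross_error_check(values, sigma, threshold=4.0):
    """Return the values whose M = |x_i - mean| / sigma exceeds the threshold."""
    mean = sum(values) / len(values)
    return [x for x in values if abs(x - mean) / sigma > threshold]

measurements = [10.1, 9.9, 10.0, 10.2, 9.8, 14.5]  # 14.5 is a planted outlier
suspects = gross_error_check(measurements, sigma=0.3)
print(suspects)  # the planted outlier is flagged
```

Note that the outlier itself inflates the mean, which is one reason the slides later recommend recomputing the standard deviation after rejecting suspicious points.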

Thick Error Law The law of thick (gross) errors is a simplified, coarse t-test. Its confidence level can be estimated as high when the standard deviation has been determined properly; in other cases it will be lower. This test is rarely used, usually only when no other procedure is available. Often the standard deviation is in this case calculated after rejecting all suspicious points, and the verification is then repeated.

Thick Error Law This test is particularly useful when preparing a graph of a set of measurement results. After rejecting the bad points we can draw, graphically or by the method of least squares, the curve that best fits the measurement points. The standard deviations of the points from the fitted curve should then make it possible to check whether the rejected points fall within the range M < 4. If so, we need to return them to the set and redraw the curve.

Testing of data sets - Dixon’s test This test is also used to find abnormal values in a series of measurements, and it is easy to apply because of the simplicity of the necessary calculations. It is assumed that we know neither the mean value of the measured series nor its standard deviation, and that the set of measured values is our only source of information. We do, however, accept the assumption that the values are normally distributed.

Testing of data sets - Dixon’s test To perform Dixon’s test, the measurement points are arranged from smallest to largest; then, for the assumed confidence level, Dixon’s critical ratio r is calculated, whereby different equations are used depending on the number of measurement points in the series. Then the critical value r_cr for the assumed confidence level is looked up in tables, and if the calculated value is greater than the critical value shown in the table, the suspect point is removed.

Testing of data sets - Dixon’s test

After rejecting a point, make sure that no further point is suspect, even though the occurrence of two erroneous points in one series of measurements is unlikely. According to Dixon's test, a larger number of points differing significantly from the other measurements indicates the use of a wrong measurement method. However, always keep in mind that the data represent someone's time and money and cannot be lightly cast aside, even though applying the tests is quick and easy.
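The procedure can be sketched as follows. This uses the simplest of Dixon's ratio equations (the one for small samples, where the ratio is the gap between the suspect point and its neighbour divided by the total range); the critical values below are the commonly tabulated 95%-confidence values for n = 3..10 and are illustrative only, not a substitute for the full tables mentioned above.

```python
# Dixon's test, simplest ratio variant: gap / range, computed for both
# the smallest and the largest point of the sorted series.
# Q_CRIT_95 holds commonly tabulated 95% critical values (illustrative).

Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}

def dixon_q(values):
    """Return (ratio for the smallest point, ratio for the largest point)."""
    xs = sorted(values)
    span = xs[-1] - xs[0]
    return (xs[1] - xs[0]) / span, (xs[-1] - xs[-2]) / span

measurements = [0.160, 0.177, 0.181, 0.181, 0.182,
                0.183, 0.184, 0.186, 0.187, 0.189]
q_low, q_high = dixon_q(measurements)
crit = Q_CRIT_95[len(measurements)]
print(q_low > crit, q_high > crit)  # only the smallest point is rejected
```

After removing a rejected point the test would be repeated on the shortened series, in line with the caution above about checking for a second suspect point.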

Testing of data sets – Grubbs’ test The procedure is as follows:
1) sort the measurement points from the smallest to the largest;
2) decide which point, the first or the last, is suspect;
3) calculate the mean value m and the standard deviation s using all the data;
4) calculate the parameter T as follows:
a) when the point x_1 is suspect: T = (m − x_1) / s;
b) when the point x_n is suspect: T = (x_n − m) / s.

Testing of data sets – Grubbs’ test 5) choose the confidence level for the test and compare the calculated value with the critical value T_cr given in tables. If the calculated value exceeds the critical one, remove the measurement point from the series.

Testing of data sets – Grubbs’ test This test can be used if only one of the values, the smallest or the largest, is suspected; if both values x_1 and x_n are suspected, we should use a different version of the test. Instead of the parameter T we calculate the range R = x_n − x_1 and the standard deviation σ, and compare the ratio R/σ with the corresponding tables of critical values. If the calculated value is greater than the critical one, we must reject both suspect values.

Testing of data sets – Grubbs’ test In the event that after ordering we find that the two extreme values are suspicious, we can use yet another variant of Grubbs’ test. This time we calculate the sum of squared deviations from the mean for the entire sample, with the suspected values included: S² = Σ (x_i − m)² for i = 1, …, n, and without them: S₁² = Σ (x_i − m₁)² for i = 2, …, n − 1, where m₁ is the mean of the sample with both extreme points excluded.

Testing of data sets – Grubbs’ test Then we calculate the ratio of these values, S₁²/S², and compare the result with the corresponding tables. This time both values have to be eliminated when the calculated value is lower than the critical value for the given confidence level, since outliers at both ends make S₁² small relative to S².
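This two-extreme variant can be sketched as follows. The function name and data are illustrative, and no critical-value table is reproduced here; the example only shows that planted outliers at both ends drive the ratio toward zero, which is the direction in which rejection occurs.

```python
# Two-extreme Grubbs variant: ratio of the sum of squared deviations
# computed WITHOUT the two extreme points (S1^2, with its own mean)
# to the sum computed WITH them (S^2, full-sample mean).
# A small ratio casts doubt on both extremes; critical values come from tables.

def grubbs_double_ratio(values):
    xs = sorted(values)
    def ss(v):
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v)
    return ss(xs[1:-1]) / ss(xs)

data = [1.0, 10.1, 10.0, 10.2, 9.9, 10.1, 19.0]  # both extremes planted
print(grubbs_double_ratio(data))  # a very small ratio: both extremes suspect
```

On clean data the same ratio stays well away from zero, so the comparison with the tabulated critical value separates the two cases.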

To be continued!