Warsaw Summer School 2014, OSU Study Abroad Program Variability Standardized Distribution.



Variability, VR Variability (VR), also known as spread, width, or dispersion, describes how spread out or how closely clustered a set of data is. The textbook discusses the following measures: the range, the mean deviation, the variance, and the standard deviation. These are appropriate for metric variables. However, we will start with nonmetric variables (nominal and ordinal).

Variability, VR Nominal variables A comparison of the observed frequency distribution with the uniform distribution (all categories represented by the same number of cases). Index of dissimilarity:
$\mathrm{DISS} = \frac{1}{2} \sum_{k=1}^{n} |p_k - u_k|$
where $p_k$ = percentage of observed cases in category $k$, $u_k$ = percentage of cases in category $k$ under the uniform distribution, and $n$ = number of categories.
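The index of dissimilarity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `index_of_dissimilarity` and the example counts are hypothetical and do not come from the slides.

```python
def index_of_dissimilarity(counts):
    """DISS = 1/2 * sum(|p_k - u_k|), with p_k and u_k expressed in percent."""
    total = sum(counts)
    n_categories = len(counts)
    uniform_pct = 100.0 / n_categories  # u_k is the same for every category
    return 0.5 * sum(abs(100.0 * c / total - uniform_pct) for c in counts)


# Hypothetical category counts for a nominal variable with four categories.
evenly_spread = index_of_dissimilarity([25, 25, 25, 25])  # matches the uniform distribution
concentrated = index_of_dissimilarity([100, 0, 0, 0])     # all cases in one category
print(evenly_spread, concentrated)
```

With percentages, the index is 0 when the observed distribution is uniform and grows as cases concentrate in fewer categories.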

Variability, VR Ordinal variables A comparison of the values for all cases with the median value. Absolute deviation from the median:
$VM = \frac{\sum_{i=1}^{N} |x_{ij} - Md_j|}{N}$
where $x_{ij}$ refers to the value of case $i$ on variable $j$ and $Md_j$ refers to the median of variable $j$.
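A minimal sketch of the absolute deviation from the median, assuming the ordinal categories are coded as integers. The function name and the sample ratings are hypothetical.

```python
from statistics import median


def mean_abs_dev_from_median(values):
    """VM = sum(|x_i - Md|) / N for a single variable."""
    md = median(values)
    return sum(abs(x - md) for x in values) / len(values)


# Hypothetical ordinal ratings coded 1..5.
ratings = [1, 2, 2, 3, 5]
vm = mean_abs_dev_from_median(ratings)
print(vm)
```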

Variability, VR Metric variables (interval and ratio) A comparison of the values for all cases with the mean value. Variance:
$\mathrm{VAR} = s^2 = \frac{\sum_{i=1}^{N} (x_{ij} - \bar{X}_j)^2}{N}$
where $x_{ij}$ refers to the value of case $i$ on variable $j$ and $\bar{X}_j$ refers to the mean of variable $j$.

Variability, VR Standard deviation The standard deviation ($s$) is the square root of the variance: $s = \sqrt{s^2}$.
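The variance and standard deviation as defined on these slides divide by N (not by N - 1, which is the usual sample estimator). A small sketch with hypothetical scores; the function names are illustrative only.

```python
import math


def variance(values):
    """VAR = s^2 = sum((x_i - mean)^2) / N, dividing by N as on the slides."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def standard_deviation(values):
    """s = square root of the variance."""
    return math.sqrt(variance(values))


# Hypothetical scores on a metric variable.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(scores), standard_deviation(scores))
```

The standard library functions `statistics.pvariance` and `statistics.pstdev` use the same divide-by-N definition.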

Variability, VR
Scale | Index of dissimilarity | Deviation from the median | Variance / standard deviation
Nominal | Yes | No | No
Ordinal | Yes | Yes | No
Interval | Yes | Yes | Yes

Basic characteristics of the distribution
Mean: $\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$, where $X_i$ means “the value for each case,” $\Sigma$ means “add all of these up,” and $i$ runs over all cases from $i = 1$ to $i = N$.
Standard deviation: $s = \sqrt{\mathrm{VAR}} = \sqrt{s^2}$, where $\mathrm{VAR} = s^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}$.

Z-scores Definition: a z-score measures the difference between a raw value (a variable value $X_i$) and the mean, using the standard deviation of the distribution as the unit of measure. Z score = (Value for a given case - Mean) / Standard deviation:
$z(X_i) = \frac{X_i - \bar{X}}{\sqrt{s^2}}$
where $\bar{X}$ is the mean of $X$ and $s^2$ is its variance.

Z-scores A z-score specifies the precise location of each value $X_i$ within a distribution. The sign of the z-score (+ or -) signifies whether the score is above the mean (positive) or below the mean (negative). The numerical value of the z-score specifies the distance from the mean expressed in units of the standard deviation.

Z-scores Raw-values and z-scores 1. Shape. The shape of the z-score distribution is the same as the shape of the raw-score distribution. 2. The mean. When raw scores are transformed into z-scores, the resulting z-score distribution will always have a mean of zero. This fact makes the mean a convenient reference point. 3. The standard deviation. When raw scores are transformed into z-scores, the resulting z-score distribution will always have a standard deviation of one.
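The mean and standard deviation properties of z-scores can be checked numerically. The sketch below uses hypothetical raw scores and the divide-by-N standard deviation (`statistics.pstdev`); the function name `z_scores` is illustrative.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Transform raw values into z-scores: (x - mean) / standard deviation."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation (divides by N)
    return [(x - m) / s for x in values]


# Hypothetical raw scores.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(raw)
print(z)
print(mean(z), pstdev(z))  # mean close to 0, standard deviation close to 1
```

Because the transformation only shifts and rescales the values, their order and relative spacing, and therefore the shape of the distribution, are unchanged.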

Shape Three properties of the distribution’s shape (metric variables): MODALITY KURTOSIS SKEWNESS

Shape MODALITY: the number of frequency peaks. Unimodal: one clear frequency peak. Bimodal: two clear frequency peaks. Multimodal: three or more clear frequency peaks.

Shape KURTOSIS: Peakedness of a distribution Mesokurtosis (a “normal” distribution): a moderate peakedness. Platykurtosis (a flat distribution): a fairly flat lump of values in the center. Leptokurtosis (a peaked distribution): a high peak of values in the center.

Shape SKEWNESS: Departure from symmetry in a distribution. Skewness depends on the rightmost or leftmost extreme scores, called the tails of the distribution. Zero skewness: the right tail and the left tail are symmetric. Positive skewness: the right tail contains extreme scores (far from symmetric). Negative skewness: the left tail contains extreme scores (far from symmetric).

CT = central tendency, VR = variability How do CT and VR help us to assess the distribution’s shape? Modality: Comparing the frequency of the mode with the frequencies of other values tells us whether we have one clear frequency peak, two peaks, or more than two peaks.

Multimodal distribution

Kurtosis Kurtosis: Comparison of the standard deviation (SD) with the mean (M) shows whether we have a peaked distribution or not: SD >> M = peaked distribution; SD << M = flat distribution. Notation: A >> B means A is many times greater than B; A << B means A is many times smaller than B.


Skewness Skewness:
- Whenever Mean > Median > Mode → positively skewed distribution.
- Whenever Mean < Median < Mode → negatively skewed distribution.
- Whenever Mean = Median = Mode → symmetric distribution (zero skewness), as in the normal distribution.
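The comparison of mean, median, and mode can be sketched as a small rule-of-thumb check. The function name, the return labels, and the data are hypothetical; real data often fit none of the three orderings exactly, which the sketch reports as "undetermined".

```python
from statistics import mean, median, mode


def skew_direction(values):
    """Apply the Mean / Median / Mode ordering rule of thumb."""
    m, md, mo = mean(values), median(values), mode(values)
    if m > md > mo:
        return "positive"
    if m < md < mo:
        return "negative"
    if m == md == mo:
        return "zero"  # mean, median, and mode coincide
    return "undetermined"  # the rule of thumb does not apply


# Hypothetical data with a long right tail (one extreme high value).
right_tailed = [1, 2, 2, 2, 3, 3, 4, 10]
left_tailed = [-x for x in right_tailed]        # mirror image: long left tail
balanced = [1, 2, 2, 3, 3, 3, 4, 4, 5]          # symmetric around 3
print(skew_direction(right_tailed), skew_direction(left_tailed), skew_direction(balanced))
```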


Normal distribution Normal distribution: –Unimodal –Mesokurtic (a moderate peakedness) –Symmetric (zero skewness)


Normal distribution refers to the distribution in which the mean and standard deviation are given and the shape of the distribution is derived from a mathematical equation. This distribution is: (1) unimodal, (2) symmetric, (3) mesokurtic.

Normal distribution The normal distribution has been known by many different names: the law of error, the law of facility of errors, or Gaussian law. The name “normal distribution” was coined by Galton. The term was derived from the fact that this distribution was seen as typical, common, normal.

Normal distribution Why is the normal distribution important in science? There are two main reasons: (1) It provides a tool for analyzing data (in descriptive statistics). (2) It provides a tool for deciding about errors that we might commit in testing hypotheses (in inferential statistics).

Normal distribution Every normal curve (regardless of its mean or standard deviation) conforms to the following rule: About 68% of the area under the curve falls within 1 standard deviation of the mean. About 95% of the area under the curve falls within 2 standard deviations of the mean. About 99.7% of the area under the curve falls within 3 standard deviations of the mean.
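These three percentages can be reproduced from the normal curve itself. The sketch below relies on the standard identity that the area within k standard deviations of the mean equals erf(k / √2); the function name `area_within` is illustrative.

```python
import math


def area_within(k):
    """Share of the area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 1))  # percent of the area within k SDs
```

The result does not depend on the particular mean or standard deviation, which is why the rule holds for every normal curve.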


If we know that a given metric variable is normally distributed, we also know a lot about the place of particular values in this distribution. Let us assume that we measure the IQ of a large population and obtain a distribution that is unimodal, symmetric, and mesokurtic. The results are a mean of 100 and a standard deviation of 15.

Normal distribution How far apart are two people, A and B, where A = 115 and B = 99? The difference, 16 points, tells us only a little about the meaning of this result. The important information is how many people (as a percentage of all) have values between 99 and 115. Note: 99 is close to the mean, and 115 is one standard deviation above the mean. Let’s look at the graph.

Normal distribution

Consider a person C who also differs from A by one standard deviation, but on the plus side: C = 130. What percentage of people is between them? For a given normal distribution, for any pair of people K and L, we can say what percentage of people is between them, i.e., has values $X_M$ of the variable such that $X_K < X_M < X_L$, for $X_K < X_L$.
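The share of people between two values can be computed from the normal cumulative distribution function. This is a sketch assuming the IQ example above (mean 100, standard deviation 15) and an exactly normal distribution; the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Proportion of a normal distribution lying below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2))))


def share_between(low, high, mean=100.0, sd=15.0):
    """Proportion of cases with values between low and high."""
    return normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)


between_b_and_a = share_between(99, 115)   # B = 99, A = 115
between_a_and_c = share_between(115, 130)  # A = 115, C = 130
print(round(100 * between_b_and_a, 1), round(100 * between_a_and_c, 1))
```

Under these assumptions roughly 37% of people fall between 99 and 115, but only about 14% fall between 115 and 130, although the two gaps are nearly the same size in raw points.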

Z-scores For finding points in the normal distribution the mean value and the standard deviation are crucial. Z score = (Value – Mean) / Standard deviation
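Applied to the IQ example (mean 100, standard deviation 15), the formula gives the following z-scores. The helper name `z_score` is illustrative.

```python
def z_score(value, mean, sd):
    """Z score = (Value - Mean) / Standard deviation."""
    return (value - mean) / sd


iq_mean, iq_sd = 100, 15
z_a = z_score(115, iq_mean, iq_sd)  # one standard deviation above the mean
z_b = z_score(99, iq_mean, iq_sd)   # slightly below the mean
z_c = z_score(130, iq_mean, iq_sd)  # two standard deviations above the mean
print(z_a, round(z_b, 3), z_c)
```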

Z-scores Why is the knowledge about z-scores so important? For two reasons: - first, in evaluating individual scores we rely on deviations from the average; - second, in evaluating individual scores we want to take into account how the scores are spread out.

Probability The second use of the normal curve deals with deciding about errors that we might commit in testing hypotheses. It is about probabilities of committing errors.

Two important properties: The probability that X is greater than a equals the area under the normal curve bounded by a and plus infinity (non-shaded area). The probability that X is less than a equals the area under the normal curve bounded by a and minus infinity (shaded area).

For the probability distribution of the observed variable we use μ for the mean and σ for the standard deviation (in place of s). The standardized normal probability distribution is expressed in z-scores: $z = \frac{X_i - \mu}{\sigma}$. The idea of standard scores integrates our knowledge of central tendency (μ) and variability (σ).

The Appendix shows the proportion of the area above and below the z-score:
- Column A = z-score
- Column B = area between the mean and z (proportion)
- Column C = area beyond z (proportion)
Note: Column B + Column C = .5000
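The two area columns of such a table can be reproduced for any z-score. The sketch below assumes the standard normal distribution; the function name `table_row` is hypothetical, and the printed values may differ from a printed table in the last rounded digit.

```python
import math


def table_row(z):
    """Return (column B, column C) for a given z-score (column A)."""
    area_below = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # proportion below z
    col_b = abs(area_below - 0.5)  # area between the mean and z
    col_c = 0.5 - col_b            # area beyond z (in the tail)
    return col_b, col_c


for z in (0.5, 1.0, 1.96):
    b, c = table_row(z)
    print(f"{z:.2f}  {b:.4f}  {c:.4f}")
```

For a positive z, column C is the probability that a value lies above z, and 0.5 + column B is the probability that it lies below z.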