Data observation and Descriptive Statistics

Data observation and Descriptive Statistics: We organize data so that they are easier to read and understand.

Organizing Data: Frequency distribution: a table that contains all the scores along with the frequency (number of times) each score occurs. Relative frequency: the proportion of the total observations at each score (f / n).

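Frequency distribution
Amount     f (frequency)   rf (relative frequency)
$0.00      2               0.125
$0.13      1               0.0625
$0.93      1               0.0625
$1.00      1               0.0625
$10.00     1               0.0625
$32.00     1               0.0625
$45.53     1               0.0625
$56.00     1               0.0625
$60.00     1               0.0625
$63.25     1               0.0625
$74.93     1               0.0625
$80.00     1               0.0625
$85.28     1               0.0625
$115.35    1               0.0625
$120.00    1               0.0625
Total      n = 16          1.00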
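To make the tally concrete, here is a minimal Python sketch that builds the frequency distribution from the 16 amounts. The function name `frequency_table` is our own illustration, not something from the slides; `collections.Counter` does the counting.

```python
from collections import Counter

# The 16 amounts (in dollars) from the class exercise; $0.00 appears twice.
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]


def frequency_table(scores):
    """Return (score, f, rf) rows sorted by score."""
    n = len(scores)
    counts = Counter(scores)  # f: number of times each score occurs
    # rf: proportion of all n observations that have this score
    return [(score, f, f / n) for score, f in sorted(counts.items())]


table = frequency_table(amounts)
for score, f, rf in table:
    print(f"${score:>7.2f}   f = {f}   rf = {rf:.4f}")
print("n =", sum(f for _, f, _ in table), "| total rf =", sum(rf for _, _, rf in table))
```

The relative frequencies always add up to 1, which is a quick check that no score was missed.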

Organizing Data: Class interval frequency distribution: scores are grouped into intervals and presented along with the frequency of scores in each interval. This appears more organized, but it does not show the exact scores within each interval. To calculate the width of each interval: (highest score - lowest score) / number of intervals. Ex: (120 - 0) / 5 = 24.

Class interval frequency distribution
Interval      f (frequency)   rf (relative frequency)
$0-$24        6               .375
$25-$48       2               .125
$49-$73       3               .1875
$74-$98       3               .1875
$99-$124      2               .125
Total         n = 16          1.00
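The same tally for grouped data can be sketched in Python. The interval limits are copied from the class interval table and treated as inclusive on both ends (an assumption; the slides do not say how boundaries are handled), and `width` follows the slide's formula.

```python
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]

# Interval width, following the slide: (highest - lowest) / number of intervals
num_intervals = 5
width = (max(amounts) - min(amounts)) / num_intervals  # (120 - 0) / 5 = 24.0

# Interval limits copied from the slide's table; both ends treated as inclusive.
intervals = [(0, 24), (25, 48), (49, 73), (74, 98), (99, 124)]

n = len(amounts)
grouped = []
for low, high in intervals:
    f = sum(1 for x in amounts if low <= x <= high)  # scores inside this interval
    grouped.append((low, high, f, f / n))

print("width =", width)
for low, high, f, rf in grouped:
    print(f"${low}-${high}   f = {f}   rf = {rf:.4f}")
```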

Graphs: Bar graphs are used for data collected on a nominal scale (qualitative or categorical variables). Each bar represents a separate (discrete) category, so the bars do not touch. The bars along the x-axis can be placed in any order.

Bar Graph

Graphs: Histograms are used to illustrate quantitative variables, where the scores represent changes in quantity. The bars touch each other and represent a variable with increasing values. The values of the variable being measured have a specific order that cannot be changed.

Histogram: Notice that the values on the x-axis have a specific order and cannot be rearranged.

Frequency polygon: a line graph for quantitative variables that represent continuous data (e.g., time, age, weight).

Frequency Polygon: AGE scores: 22.06, 24.05, 25.04, 25.07, 26.03, 26.11, 27.03, 27.11, 29.03, 29.05, 34, 37.1, 53. Make the graph in class. Y-axis: frequency. X-axis: the scores. Plot each score as a point, then connect the points.

Descriptive Statistics: numerical measures that describe the central tendency of a distribution, the width (spread) of a distribution, and the shape of a distribution.

Central tendency: describes the “middleness” of a data set. Measures: mean, median, and mode.

Mean: the arithmetic average, used for interval and ratio data. Formula for the population mean (µ, pronounced “mu”): $\mu = \frac{\sum X}{N}$. Formula for the sample mean: $\bar{X} = \frac{\sum X}{n}$.

Mean: for the 16 amounts in the frequency distribution, ΣX = $744.40 and n = 16, so $\bar{X} = \frac{744.40}{16} \approx 46.53$, i.e. a mean of $46.53.
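A short Python sketch of the sample mean formula applied to the same 16 amounts. `mean` is a hypothetical helper name; the standard library's `statistics.mean` would give the same result.

```python
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]


def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


total = sum(amounts)
x_bar = mean(amounts)
print(f"sum = {total:.2f}, n = {len(amounts)}, mean = {x_bar:.3f}")
```

The unrounded mean is 46.525, which the slides round to $46.53.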

Mean: not a good indicator of central tendency if the distribution has extreme (very high or very low) scores. High scores pull the mean higher; low scores pull the mean lower.

Median: the middle score of a distribution once the scores are arranged in increasing or decreasing order. Used when the mean might not be a good indicator of central tendency. Used with ratio, interval, and ordinal data.

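Median: the 16 amounts in increasing order are $0.00, $0.00, $0.13, $0.93, $1.00, $10.00, $32.00, $45.53, $56.00, $60.00, $63.25, $74.93, $80.00, $85.28, $115.35, $120.00. With an even number of scores, the median is the average of the two middle (8th and 9th) scores: ($45.53 + $56.00) / 2 ≈ $50.77.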
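The median rule (the middle score for an odd count, the average of the two middle scores for an even count) can be sketched as follows. The helper `median` is our own; it is cross-checked against `statistics.median`.

```python
import statistics

amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]


def median(scores):
    """Middle score after sorting; average of the two middle scores if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


med = median(amounts)
print(f"median = {med:.3f}")  # average of the 8th and 9th scores
print("matches statistics.median:", med == statistics.median(amounts))
```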

Mode: the score that occurs in the distribution with the greatest frequency. Number of modes: 0 = no mode; 1 = unimodal; 2 = bimodal distribution; 3 = trimodal distribution.

Mode: in the frequency distribution, $0.00 occurs twice (f = 2, rf = 0.125) and every other amount occurs once, so the mode is $0.00 and the distribution is unimodal.
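A sketch of finding the mode(s) and labeling a distribution by its number of modes. The helper `modes` is hypothetical, and treating a set in which every score occurs only once as having no mode is an assumption chosen to match the slide's "no mode" case.

```python
from collections import Counter

amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]


def modes(scores):
    """Return the most frequent score(s); [] when every score occurs only once."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:  # assumption: no repeated scores is treated as "no mode"
        return []
    return sorted(score for score, f in counts.items() if f == top)


LABELS = {0: "no mode", 1: "unimodal", 2: "bimodal", 3: "trimodal"}

found = modes(amounts)
print(found, LABELS.get(len(found), "more than three modes"))
```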

Measures of Variability: Range: from the lowest to the highest score. Variance: the average squared deviation from the mean. Standard deviation: the typical amount the scores deviate from the sample mean; the square root of the variance.

Measures of Variability: indicate the degree to which the scores in a distribution are clustered or spread out. Ex: two distributions of teacher-to-student ratios. Which college has more variation?
College A    College B
4            16
12           19
41           22
Sum = 57     Sum = 57
Mean = 19    Mean = 19

Range: the difference between the highest and lowest scores, Range = X(highest) - X(lowest). It provides limited information about variation: it is influenced by the highest and lowest scores and says nothing about the variation of scores that are not at the extremes. Examples: College A: range = 41 - 4 = 37. College B: range = 22 - 16 = 6.
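The college example can be checked in a few lines of Python; `score_range` is a hypothetical helper name. Both colleges have the same sum and mean, yet very different ranges.

```python
college_a = [4, 12, 41]
college_b = [16, 19, 22]


def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)


for name, scores in (("College A", college_a), ("College B", college_b)):
    print(f"{name}: sum = {sum(scores)}, mean = {sum(scores) / len(scores)}, "
          f"range = {score_range(scores)}")
```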

Variance: the limitations of the range require a more precise way to measure variability. Deviation: the degree to which a score in a distribution varies from the mean. The typical measure of variability is the standard deviation (SD), and the variance is the first step in calculating it.

Variance: X = number of therapy sessions each student attended (1, 2, 4, 4, 10); M = 4.2. The “deviation” of each score is X - M: -3.2, -2.2, -0.2, -0.2, 5.8. The sum of the deviations is 0.

Variance: to eliminate the negative signs, we square the deviations (10.24, 4.84, 0.04, 0.04, 33.64). Summing the squared deviations gives the sum of squares, or SS.

Variance: take the average of the SS, that is, the average of the squared deviations from the mean: $SD^2 = \frac{\sum (X - M)^2}{N}$. Ex: SS = 48.80 and N = 5, so $SD^2 = \frac{48.80}{5} = 9.76$.

Standard Deviation: the typical amount that the scores vary, or deviate, from the sample mean. It is the square root of the variance: $SD = \sqrt{\frac{\sum (X - M)^2}{N}}$. Because we take the square root, the SD is back in the same units as the original scores, which makes it a more representative description of how the scores are spread.

Standard Deviation: X = 1, 2, 4, 4, 10; M = 4.2; SD = 3.12 (standard deviation); SD² = 9.76 (variance). Always ask yourself: do these data (mean and SD) make sense based on the raw scores?
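The whole variance and standard deviation calculation for the therapy-session data can be sketched step by step in Python. Like the slides, this version divides SS by N; the variable names are our own.

```python
import math
import statistics

sessions = [1, 2, 4, 4, 10]  # therapy sessions per student, from the slides

n = len(sessions)
m = sum(sessions) / n                   # M = 4.2
deviations = [x - m for x in sessions]  # X - M; these sum to 0 (up to float rounding)
squared = [d ** 2 for d in deviations]  # squaring removes the negative signs
ss = sum(squared)                       # sum of squares, SS = 48.80
variance = ss / n                       # SD^2 = SS / N = 9.76
sd = math.sqrt(variance)                # SD = sqrt(9.76), about 3.12

print(f"M = {m:.1f}, SS = {ss:.2f}, SD^2 = {variance:.2f}, SD = {sd:.2f}")
# statistics.pvariance / pstdev also divide by N, so they should agree
print(math.isclose(variance, statistics.pvariance(sessions)),
      math.isclose(sd, statistics.pstdev(sessions)))
```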

Population Standard Deviation: the typical amount that the scores in a distribution vary from the mean. Population standard deviation (σ, pronounced “sigma”): $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

Sample Standard Deviation: a sample is a subset of the population, and we use the sample SD to estimate the population SD. Because samples are smaller than populations, a sample tends to show less variability than the population. To correct for this, we divide the sum of squares by N - 1 instead of N. This increases the standard deviation of the sample and provides a better estimate of the population standard deviation. When we run experiments, we want to make sure that our results generalize to the population at large; the same goes for the statistical procedures that we perform on the data of our sample. Differences between the population and sample SD formulas: σ becomes s, µ becomes X̄ (“X bar”), and we divide by N - 1 instead of N.
Population standard deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Sample standard deviation (estimator; dividing by N - 1 makes the variance estimate unbiased): $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$

Sample Standard Deviation:
X          X - mean    (X - mean)²
$0.00      -$46.53     2,165.04
$0.00      -$46.53     2,165.04
$0.13      -$46.40     2,152.96
$0.93      -$45.60     2,079.36
$1.00      -$45.53     2,072.98
$10.00     -$36.53     1,334.44
$32.00     -$14.53     211.12
$45.53     -$1.00      1.00
$56.00     $9.47       89.68
$60.00     $13.47      181.44
$63.25     $16.72      279.56
$74.93     $28.40      806.56
$80.00     $33.47      1,120.24
$85.28     $38.75      1,501.56
$115.35    $68.82      4,736.19
$120.00    $73.47      5,397.84
Mean = $46.53    N = 16    SS = 26,295.02
Variance = SS / (N - 1) = 26,295.02 / 15 ≈ 1,753 (in squared dollars); SD = √1,753 ≈ $41.87. The standard deviation tells us that the amount of money you had typically falls about $41.87 from the mean of $46.53.
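A Python sketch comparing the N - 1 (sample) and N (population) versions on the money data. It uses the exact mean rather than the rounded $46.53, so SS may differ from the table in the last decimal place; `statistics.stdev` (N - 1) and `statistics.pstdev` (N) serve as cross-checks.

```python
import math
import statistics

amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]

n = len(amounts)
m = sum(amounts) / n                     # exact mean, 46.525 (slides round to 46.53)
ss = sum((x - m) ** 2 for x in amounts)  # sum of squares, about 26,295.02

pop_sd = math.sqrt(ss / n)         # divide by N: describes these 16 scores only
sample_var = ss / (n - 1)          # divide by N - 1: estimates the population variance
sample_sd = math.sqrt(sample_var)  # about 41.87

print(f"SS = {ss:,.2f}")
print(f"sample variance = {sample_var:,.2f}, sample SD = {sample_sd:.2f}")
print(f"population-formula SD = {pop_sd:.2f} (smaller, as expected)")

# Cross-check against the standard library (stdev uses N - 1, pstdev uses N)
assert math.isclose(sample_sd, statistics.stdev(amounts))
assert math.isclose(pop_sd, statistics.pstdev(amounts))
```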

Types of Distributions: refers to the shape of the distribution. Three types: normal distribution, positively skewed distribution, and negatively skewed distribution.

Normal Distribution: a specific frequency distribution that is bell shaped, symmetrical, and unimodal. Many variables found in nature approximate a normal distribution when samples are large: when the distribution of scores is very large and the scores are plotted on a line graph, the distribution tends to approximate a normal distribution. A true normal distribution is a theoretical concept that doesn't exist in the real world, so we use the term “approximately normal” when describing our results.

Normal Distribution: the mean, median, and mode are equal and located in the center.

Normal Distribution

Skewed distributions: when our data are not symmetrical. Positively skewed distribution; negatively skewed distribution. Memory hint: the skew is where the tail is; the tail also looks like a skewer and points in the direction of the skew (positive or negative).

Skewed Distributions: notice how the mean is pulled toward the extreme scores.

Kurtosis: how flat or peaked a distribution is (tall and skinny versus short and wide). Mesokurtic: normal. Leptokurtic: tall and thin. Platykurtic: short and fat (squatty like a platypus!).

Kurtosis: Mesokurtic: a peak of medium height. Leptokurtic: tall and thin, with scores concentrated in the middle. Platykurtic: short and broad.

Skewness, Number of Modes, and Kurtosis in Distribution of Housing Prices

z-Scores: so far we know how to describe how spread out a distribution is and what shape it has. However, we might want to describe how an individual's score compares to the rest of its distribution. In which country (US vs. England) is Homer Simpson considered overweight? How can we make this comparison? We need to convert weight in pounds and kilograms to a standardized scale, putting the two variables on the same scale. z-scores transform raw scores into standardized scores, allowing scores from different distributions to be compared under standardized conditions: $z = \frac{X - \mu}{\sigma}$. A z-score tells us the number of standard deviations a score is from the mean.

z-Scores: let's say that I asked my other class to do the same exercise and report how much money they had in their pockets on the same day. In which class did I have more money in comparison to the distribution of the other students? Because the mean of each class is different, I need to convert the amount of money I had into a standard measure that lets me compare both of my scores directly; that is, I need to convert my raw scores from each class into the same “language.” So first, I convert each raw score into a z-score, the number of standard deviations my raw score is from the mean of its distribution. Sample z-score: $z = \frac{X - M}{s}$. Class 1: M = $46.53, SD = $41.87, X = $54.76, so z = (54.76 - 46.53) / 41.87 ≈ .20. Class 2: z = 1.94. When we convert raw scores from different distributions to z-scores, these scores become part of the same z distribution, so we can compare them directly: relative to each class, I had more money in Class 2. z-scores are used to transform raw scores into standard scores for the purposes of comparison.

z Distribution Characteristics (regardless of the original distributions): the z-score at the mean equals 0, and the standard deviation equals 1.
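A minimal sketch of the z-score formula, assuming the rounded Class 1 values from the slides (Class 2's mean and SD are not given, so only Class 1 is computed). The second part converts all 16 amounts to z-scores to show that the result has mean 0 and SD 1; `z_score` is a hypothetical helper.

```python
import math
import statistics

amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Class 1 example, using the slide's rounded M and SD
z_class1 = z_score(54.76, 46.53, 41.87)
print(f"Class 1: z = {z_class1:.2f}")

# Converting a whole distribution: the z-scores have mean 0 and SD 1
m = statistics.mean(amounts)
s = statistics.stdev(amounts)  # sample SD, as in the sample z formula
zs = [z_score(x, m, s) for x in amounts]
print("mean of z is 0:", math.isclose(statistics.mean(zs), 0.0, abs_tol=1e-9))
print("SD of z is 1:", math.isclose(statistics.stdev(zs), 1.0))
```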

z distribution of exam scores

Standard normal distribution: if a z distribution is normal, we refer to it as a standard normal distribution. It provides information about the proportion of scores that are higher or lower than any other score in the distribution. In the standard normal distribution, the mean z-score is 0 and the standard deviation is 1, derived theoretically. The total area under the curve represents the total proportion of scores (100%). 50% of the scores are above the mean and 50% are below the mean. About 34% of scores fall between the mean and 1 standard deviation above the mean, and about 47.7% fall between the mean and 2 standard deviations above the mean. Because a normal curve is symmetrical, the same is true below the mean: about 34% of scores fall between the mean and 1 SD below the mean.
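The proportions quoted for the standard normal distribution can be reproduced with Python's `statistics.NormalDist` (available since Python 3.8), whose `cdf` method gives the proportion of scores below a z-score. This is a sketch standing in for the printed table, not a replacement for it; `area_between` is a hypothetical helper.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution


def area_between(z1, z2):
    """Proportion of scores between two z-scores (area under the curve)."""
    return std_normal.cdf(z2) - std_normal.cdf(z1)


print(f"below the mean:    {std_normal.cdf(0):.4f}")
print(f"mean to +1 SD:     {area_between(0, 1):.4f}")
print(f"mean to +2 SD:     {area_between(0, 2):.4f}")
print(f"-1 SD to the mean: {area_between(-1, 0):.4f}")  # same as +1 SD, by symmetry
```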

Standard Normal Curve Table: statisticians have figured out the proportion of scores that fall between any two z-scores and published them in the standard normal curve table (Appendix A). The first column of the table is the z-score. The second column is the proportion of scores that fall between the mean and that z-score. The third column is the proportion of scores that fall beyond that z-score (in the tail). Percentile rank = the proportion of scores at or below a given raw score. What is the percentile rank of a z-score of 1? The 50% below the mean plus the .3413 between the mean and z = 1 gives .8413, about the 84th percentile. Ex: SAT score = 1350, M = 1120, s = 340: z = (1350 - 1120) / 340 ≈ 0.68, which corresponds to about the 75th percentile.
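A sketch of the table lookups using `statistics.NormalDist`: `cdf(z) - 0.5` corresponds to the second column (mean to z, for positive z), `1 - cdf(z)` to the third column (beyond z), and `100 * cdf(z)` to the percentile rank. `table_row` and `percentile_rank` are hypothetical helpers, and the SAT numbers are the slide's; a normal distribution is assumed.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1


def table_row(z):
    """Mimic one row of the table for a positive z: (z, mean-to-z, beyond z)."""
    return z, std_normal.cdf(z) - 0.5, 1 - std_normal.cdf(z)


def percentile_rank(x, mean, sd):
    """Percentage of scores at or below x, assuming a normal distribution."""
    z = (x - mean) / sd
    return 100 * std_normal.cdf(z)


z, mean_to_z, beyond_z = table_row(1.0)
print(f"z = {z}: mean to z = {mean_to_z:.4f}, beyond z = {beyond_z:.4f}, "
      f"percentile rank = {100 * (0.5 + mean_to_z):.1f}")

sat_rank = percentile_rank(1350, 1120, 340)  # SAT example from the slide
print(f"SAT 1350: z = {(1350 - 1120) / 340:.2f}, percentile rank = {sat_rank:.1f}")
```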

Percentile Rank: the percentage of scores that your score is higher than. 89th percentile rank for height: you are taller than 89% of the students in the class (you are tall!). Homer Simpson: 4th percentile rank for intelligence: he is smarter than 4% of the population (or 96% of the population is smarter than Homer). GRE score: 88th percentile rank. Reading scores of grammar school: 18th percentile rank. Examples using the standard normal curve table: from class data, we got the z-score for each of your raw scores; let's use your z-scores to learn how to use this table. Saying that you did better than a given percentage of the class refers to your percentile rank. Try to work backwards from a percentile rank to a z-score: find the proportion under the curve, look up the corresponding z-score, and apply the z formula using the mean and SD of the class data to get the raw score.
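Working backwards can be sketched with `NormalDist().inv_cdf`, which returns the z-score below which a given proportion of scores falls; rearranging the z formula gives X = M + z * s. The helper `raw_score_for_percentile` is hypothetical, and the example reuses the slide's SAT mean and SD, assuming the scores are normally distributed.

```python
from statistics import NormalDist


def raw_score_for_percentile(pct, mean, sd):
    """Work backwards: percentile rank -> z-score -> raw score (normality assumed)."""
    z = NormalDist().inv_cdf(pct / 100)  # z with pct% of scores at or below it
    x = mean + z * sd                    # rearranged z formula: X = M + z * s
    return z, x


# Reusing the slide's SAT mean and SD: which score sits at the 75th percentile?
z, x = raw_score_for_percentile(75, 1120, 340)
print(f"z = {z:.2f}, raw score = {x:.0f}")
```

The result lands close to the slide's SAT score of 1350, which is the forward calculation run in reverse.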

Review: Data organization: frequency distribution, bar graph, histogram, and frequency polygon. Descriptive statistics: central tendency = the middleness of a distribution (mean, median, and mode); measures of variation = the spread of a distribution (range, standard deviation). Distributions can be normal or skewed (positively or negatively). z-scores: a method of transforming raw scores into standard scores for comparisons. z distribution: mean z-score = 0 and standard deviation = 1. Normal curve table: shows the proportion of scores that fall between the mean and a given z-score, or beyond it.