Basic Concepts Chapter 1 College of Education and Health Professions.

Presentation transcript:

Basic Concepts Chapter 1 College of Education and Health Professions

Statistical Inference
A population is a group with a common characteristic. A population is usually large, and it is difficult to measure all of its members. To make inferences about a population we take a representative (RANDOM) sample. In a random sample each member of the population is equally likely to be selected. A sample cannot accurately represent the population unless it is drawn without BIAS. In a bias-free sample, the selection of one member does not affect the selection of future subjects.

Types of Variables
Continuous variable – can assume any value within a range (height, weight)
Discrete variable – limited to certain values, such as integers or whole numbers (a family cannot have 2.5 children)

Level of Measurement
Nominal Scale: mutually exclusive categories with no order (male, female)
Ordinal Scale: gives a rank order to the variable, but it DOES NOT indicate how much better one score is than another (a pain rating of 2 is not twice a rating of 1)
Interval Scale: has equal units, but zero is not an absence of the variable (temperature)
Ratio Scale: based on order, has equal distance between scale points, and zero is an absence of the variable

Independent & Dependent Variables
Independent Variable: the variable being controlled or used to form groups (gender, group, class).
Dependent Variable: the outcome that is measured; its value is expected to depend on the independent variable (math score, height, weight).
The INDEPENDENT VARIABLE is controlled by the researcher
– Effects of exercise on body fat.
– Effects of type of instruction on learning.
The DEPENDENT VARIABLE is the variable being studied
– Body fat
– Learning

Parameters and Statistics

Describing and Exploring Data Chapter 2 College of Education and Health Professions

A frequency distribution organizes the data in a logical order. Most of the scores are between 46 and 67.

Histogram with the normal curve superimposed.

Measures of Central Tendency
Mean, Median and Mode describe the middle or central characteristics of the data. The mean is the arithmetic average. The mode is the most frequent score. The median is the middle score. In a normal distribution the mean, median and mode are nearly the same score.
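As a minimal sketch of the three measures, the following uses Python's standard `statistics` module. The scores are hypothetical and are not taken from the slides.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [46, 52, 55, 55, 58, 61, 67]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle score of the sorted data
mode = statistics.mode(scores)      # most frequent score

print(f"mean={mean:.2f} median={median} mode={mode}")
```

When the three values sit close together, as they do here, the distribution is at least roughly symmetric; a large gap between mean and median hints at skew.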

Measures of Variability
Range: highest score minus lowest score. It relies entirely on the two most extreme scores.
Interquartile Range: the spread of the middle 50% of the scores.
Variance: sum of squared deviations from the mean divided by (N-1) [data in squared units].
Standard Deviation: square root of the variance [data in original units].

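Formulas for Mean, Variance and SD
Mean: $\bar{X} = \frac{\sum X}{N}$
Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
Standard Deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$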
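The definitions above (squared deviations from the mean, divided by N-1, then a square root) can be sketched directly in Python. The function name and the scores are hypothetical; the result is cross-checked against the standard library.

```python
import math
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
variance = sample_variance(scores)  # in squared units
sd = math.sqrt(variance)            # back in the original units
score_range = max(scores) - min(scores)

print(variance, sd, score_range)
# statistics.variance uses the same N - 1 denominator.
print(statistics.variance(scores))
```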

Degrees of Freedom
Df is the number of values that are free to vary. For example, the sum of three numbers is 10. The df is two: you can pick any two numbers. Once you pick the first two numbers, the third number is fixed; it is not free to vary (first + second + ? = 10).
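The same idea in a few lines of Python, as an illustration only: once the total and two of the numbers are chosen, the third is determined. The function name is hypothetical.

```python
def third_number(first, second, total=10):
    """The only value that makes the three numbers sum to `total`."""
    return total - first - second

# Any two numbers can be picked freely; the third is then fixed.
for first, second in [(3, 5), (-4, 20), (0.5, 0.5)]:
    print(first, second, third_number(first, second))
```

The sample variance divides by N-1 for the same reason: deviations from the sample mean must sum to zero, so only N-1 of them are free to vary.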

SPSS Box Plots
The top of the box is the upper fourth or 75th percentile. The bottom of the box is the lower fourth or 25th percentile. 50% of the scores fall within the box or interquartile range. The horizontal line is the median. The ends of the whiskers represent the largest and smallest values that are not outliers. An outlier, O, is defined as a value that lies more than 1.5 box-lengths below the bottom (or above the top) of the box. An extreme value, E, is defined as a value that lies more than 3 box-lengths below the bottom (or above the top) of the box. Normally distributed scores typically have whiskers that are about the same length and the box is typically smaller than the whiskers.
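A rough sketch of the box-length rule, measuring each value's distance beyond the nearest edge of the box. It assumes the `inclusive` quartile method from Python's `statistics` module, which is not necessarily the calculation SPSS uses, so borderline cases may be labelled differently. The function name and data are hypothetical.

```python
import statistics

def classify_boxplot_values(values):
    """Label each value 'ok', 'outlier' (O) or 'extreme' (E)."""
    # Assumption: 'inclusive' quartiles; SPSS may compute hinges differently.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1
    labelled = []
    for v in values:
        # Distance beyond the nearest box edge (0 if inside the box).
        distance = max(q1 - v, v - q3, 0)
        if distance > 3 * box_length:
            label = "extreme"
        elif distance > 1.5 * box_length:
            label = "outlier"
        else:
            label = "ok"
        labelled.append((v, label))
    return labelled

scores = [10, 12, 13, 14, 15, 16, 17, 18, 20, 30, 45]  # hypothetical
for value, label in classify_boxplot_values(scores):
    if label != "ok":
        print(value, label)
```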

A normal distribution is symmetrical, unimodal and mesokurtic. Platykurtic is flat. Leptokurtic is peaked.

Tests for Normality

–The significance value is not less than 0.05, so normality is not rejected and the data are treated as normal.

Tests for Normality: Normal Probability Plot or Q-Q Plot –If the data are normal the points cluster around a straight line
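The points of such a plot can be sketched without a plotting library: each sorted score is paired with the value a normal distribution (same mean and SD) would place at that rank. The plotting position (i + 0.5)/n used here is one common convention and is an assumption; SPSS may use a different one. The function name and data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Return (expected normal value, observed value) pairs."""
    xs = sorted(data)
    n = len(xs)
    normal = NormalDist(mean(xs), stdev(xs))
    # Assumed plotting position: (i + 0.5) / n for i = 0 .. n-1.
    return [(normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

scores = [1, 2, 3, 4, 5]  # hypothetical data
for expected, observed in qq_points(scores):
    print(f"{expected:6.2f}  {observed}")
```

If the data are normal, the two columns track each other closely, which is what clustering around a straight line means on the plot.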

Tests for Normality: Boxplots
–Bar is the median, box extends from the 25th to the 75th percentile, whiskers extend to the largest and smallest values within 1.5 box lengths
–Outliers are labeled with O, extreme values are labeled with a star

Outliers and Extreme Scores Chapter 2 College of Education and Health Professions

Choosing a Z Score to Define Outliers
Table columns: Z Score, % Above, % +/- Above
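The percentages in a table of this kind follow from the standard normal curve. A small sketch, assuming normally distributed scores, that computes the share of scores above a given z and the share beyond ±z; the list of z values is illustrative and the function name is hypothetical.

```python
from statistics import NormalDist

def tail_percentages(z):
    """Percent of a normal distribution above z, and beyond +/- z."""
    above = (1 - NormalDist().cdf(z)) * 100
    return above, 2 * above

print("Z Score  % Above  % +/- Above")
for z in (2.0, 3.0, 3.1, 3.2, 3.3):
    above, both = tail_percentages(z)
    print(f"{z:7.1f}  {above:7.4f}  {both:11.4f}")
```

A larger cutoff flags fewer valid scores by chance, at the cost of letting more genuine outliers through.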

Decisions for Extremes and Outliers
1. Check your data to verify all numbers are entered correctly.
2. Verify your devices (data testing machines) are working within manufacturer specifications.
3. Use non-parametric statistics; they don’t require a normal distribution.
4. Develop a criterion to use to label outliers and remove them from the data set. You must report it in your methods section.
   1. If you remove outliers, consider including a statistical analysis of the results with and without the outlier(s). In other words, report both; see Stevens (1990), Detecting outliers.
5. Do a log transformation.
   1. If your data have negative numbers, you must shift the numbers to the positive scale (e.g., add 20 to each).
   2. Try a natural log transformation first; in SPSS use LN().
   3. Try a log base 10 transformation; in SPSS use LG10().

Transform - Compute

Add 10 to each data point, then log transform. Try the natural log first; as a last option, use Log10.

Add 10 to each data point, since you cannot take the log of zero or a negative number.

First try a Natural Log Transformation

If Natural Log doesn’t work try Log10 Transformation.
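Outside SPSS, the shift-then-log sequence can be sketched in Python. Instead of a fixed constant such as 10, this version assumes a shift just large enough to make the smallest value equal to 1; that choice, the function name and the data are all illustrative assumptions.

```python
import math

def shift_and_log(data, base10=False):
    """Shift data onto the positive scale if needed, then log transform."""
    lowest = min(data)
    # Assumption: shift so the smallest value becomes 1 (log of 1 is 0).
    shift = 0 if lowest > 0 else 1 - lowest
    log = math.log10 if base10 else math.log  # math.log is the natural log
    return shift, [log(x + shift) for x in data]

data = [-9, 0, 90]  # hypothetical, includes a negative and a zero
shift, natural = shift_and_log(data)
_, base_ten = shift_and_log(data, base10=True)
print(shift, natural, base_ten)
```

For this data the computed shift happens to be 10, matching the constant used in the slides.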

Outlier Criteria: 1.5 * Interquartile Range from the Median
Milner CE, Ferber R, Pollard CD, Hamill J, Davis IS. Biomechanical Factors Associated with Tibial Stress Fracture in Female Runners. Med Sci Sport Exer. 38(2).
Statistical analysis. Boxplots were used to identify outliers, defined as values >1.5 times the interquartile range away from the median. Identified outliers were removed from the data before statistical analysis of the differences between groups. A total of six data points fell outside this defined range and were removed as follows: two from the RTSF group for BALR, one from the CTRL group for ASTIF, one from each group for KSTIF, and one from the CTRL group for TIBAMI.

Outlier Criteria: 1.5 * Interquartile Range from the Median (using SPSS)

Outliers = values more than 1.5 * 3.39 (the interquartile range) above or below the median
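A minimal sketch of this criterion: limits are set at the median plus and minus 1.5 times the interquartile range. Note that this differs from the box-plot rule, which measures from the edges of the box. The quartile method (`inclusive`) is an assumption, and the function names and sample data are hypothetical; the first call reuses the median of 27 and interquartile range of 7 reported later in the slides for Newcomb's data.

```python
import statistics

def limits(center, iqr, k=1.5):
    """Lower and upper limits at center -/+ k * IQR."""
    return center - k * iqr, center + k * iqr

def median_iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR away from the median."""
    # Assumption: 'inclusive' quartiles; other methods shift the limits.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    low, high = limits(statistics.median(values), q3 - q1, k)
    return [v for v in values if v < low or v > high]

print(limits(27, 7))  # median and IQR quoted in the slides
print(median_iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```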

Outlier Criteria: ± 3 standard deviations from the mean
Tremblay MS, Barnes JD, Copeland JL, Esliger DW. Conquering Childhood Inactivity: Is the Answer in the Past? Med Sci Sport Exer. 37(7).
Data analyses. The normality of the data was assessed by calculating skewness and kurtosis statistics. The data were considered within the limits of a normal distribution if the dividend of the skewness and kurtosis statistics and their respective standard errors did not exceed ± 2.0. If the data for a given variable were not normally distributed, one of two steps was taken: either a log transformation (base 10) was performed or the outliers were identified (± 3 standard deviations from the mean) and removed. Log transformations were performed for push-ups and minutes of vigorous physical activity per day. Outliers were removed from the data for the following variables: sitting height, body mass index (BMI), handgrip strength, and activity counts per minute.
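The skewness and kurtosis check quoted above divides each statistic by its standard error and compares the result with ± 2.0. The sketch below assumes the bias-adjusted sample formulas and standard errors commonly reported by statistics packages; the cited paper does not state which formulas it used, and the function names and data are hypothetical.

```python
import math

def skew_kurtosis_ratios(xs):
    """Return (skewness / SE, kurtosis / SE) for a sample; needs n >= 4."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    z3 = sum(((x - mean) / s) ** 3 for x in xs)
    z4 = sum(((x - mean) / s) ** 4 for x in xs)
    # Assumed: bias-adjusted sample skewness and excess kurtosis.
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt

def within_normal_limits(xs, limit=2.0):
    """True if both ratios stay inside +/- limit."""
    return all(abs(r) <= limit for r in skew_kurtosis_ratios(xs))

print(within_normal_limits([1, 2, 3, 4, 5, 6, 7]))             # symmetric
print(within_normal_limits([1, 1, 1, 1, 1, 1, 1, 1, 1, 20]))   # skewed
```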

Computing and Saving Z Scores Check this box and SPSS creates and saves the z scores for all selected variables. The z scores in this case will be named zLight1…

Computing and Saving Z Scores Now you can identify the raw scores more than 3 SDs above or below the mean and remove them if you want to remove outliers.
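The same step outside SPSS, as a sketch: compute a z score for every raw score and flag those beyond ± 3. It assumes the sample SD (N-1 denominator). The data are hypothetical and are not Newcomb's measurements.

```python
import statistics

def z_scores(xs):
    """z = (score - mean) / sample SD, for every score."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)  # N - 1 denominator
    return [(x - mean) / sd for x in xs]

# Hypothetical data: 21 typical scores plus two low values.
data = [24, 25, 26, 27, 28, 29, 30] * 3 + [-44, -2]

flagged = [x for x, z in zip(data, z_scores(data)) if abs(z) > 3]
print(flagged)
```

In this example only the most extreme score is flagged, because it inflates the SD enough to hide the second low value.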

Comparison of Outlier Methods: Median ± 1.5 * Interquartile Range
27 ± 1.5 * 7 gives a range of 16.5 – 37.5
TABLE 1.1 Newcomb's measurements of the passage time of light

Comparison of Outlier Methods ± 2 SDs (Notice the 16’s are not removed) TABLE 1.1 Newcomb's measurements of the passage time of light

Comparison of Outlier Methods ± 3 SDs (Step 1, then run Z scores again) TABLE 1.1 Newcomb's measurements of the passage time of light

Comparison of Outlier Methods ± 3 SDs (Step 2: after the -44 has been removed, the -2 lies more than 3 SDs from the new mean, so it is also removed) TABLE 1.1 Newcomb's measurements of the passage time of light

Conclusions?
The median ± 1.5 * interquartile range appears to be too liberal. ± 2 SDs may also be too liberal, and statisticians may not approve. An iterative process where you remove points more than 3 SDs above or below the mean and then re-check the distribution may be the most conservative and acceptable method. Choosing 3.1, 3.2, or 3.3 SDs as the cutoff increases the protection against removing a score that is potentially valid and should be retained.
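The iterative process can be sketched as a loop: compute the mean and SD, remove anything beyond the cutoff, and repeat until nothing more is flagged. The function name is hypothetical, and the data are invented to show a two-step removal; they are not Newcomb's measurements.

```python
import statistics

def remove_outliers_iteratively(xs, cutoff=3.0):
    """Repeatedly drop scores more than `cutoff` SDs from the mean."""
    kept = list(xs)
    removed = []
    while len(kept) > 2:
        mean = statistics.mean(kept)
        sd = statistics.stdev(kept)  # recomputed on every pass
        if sd == 0:
            break
        flagged = [x for x in kept if abs(x - mean) / sd > cutoff]
        if not flagged:
            break
        removed.extend(flagged)
        kept = [x for x in kept if abs(x - mean) / sd <= cutoff]
    return kept, removed

# Hypothetical data: 21 typical scores plus two low values.
data = [24, 25, 26, 27, 28, 29, 30] * 3 + [-44, -2]
kept, removed = remove_outliers_iteratively(data)
print(len(kept), removed)
```

Passing `cutoff=3.3` instead of the default applies the more protective threshold mentioned in the conclusions.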

Data Transformations and Their Uses
Data Transformation: Can Correct For
Log Transformation (log(X)): Positive Skew, Unequal Variances
Square Root Transformation (sqrt(X)): Positive Skew, Unequal Variances
Reciprocal Transformation (1/X): Positive Skew, Unequal Variances
Reverse Score Transformation: Negative Skew. All of the above can correct for negative skew, but you must first reverse the scores. Just subtract each score from the highest score in the data set + 1.
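A short sketch of the reverse-score step followed by one of the transformations. The function name and data are hypothetical, and the base-10 log is an assumption, since the table does not say which log is meant.

```python
import math

def reverse_scores(xs):
    """Subtract each score from (highest score + 1)."""
    top = max(xs) + 1
    return [top - x for x in xs]

# Transformations listed in the table; all require positive inputs.
transforms = {
    "log": math.log10,            # assumed base 10
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1 / x,
}

scores = [1, 5, 9]                # hypothetical, negatively skewed in practice
reversed_scores = reverse_scores(scores)
print(reversed_scores)
print([transforms["sqrt"](x) for x in reversed_scores])
```

Reversing turns negative skew into positive skew, so the smallest reversed score is always 1 and every transformation in the table is defined. Remember that high and low scores swap meaning after reversal.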