Psy B07 Chapter 2, Slide 1: DESCRIBING AND EXPLORING DATA

Slide 2: Outline
- Plotting data
- Grouping data
- Terminology
- Notation
- Measures of Central Tendency
- Measures of Variability
- Properties of a Statistic

Slide 3: Plotting Data
- Once a bunch of data has been collected, the raw numbers must be manipulated in some fashion to make them more informative.
- Several options are available, including plotting the data or calculating descriptive statistics.

Slide 4: Plotting Data
- Raw data of typical age and weight in a second-year course (made-up data)
[Table: age and weight for each student; values not reproduced]

Slide 5: Plotting Data
- Often, the first thing one does with a set of raw data is to plot frequency distributions.
- Usually this is done by first creating a table of the frequencies broken down by the values of the relevant variable; the frequencies in the table are then plotted in a histogram.

Slide 6: Plotting Data
- Example: Typical age in a second-year course
- Note: The frequencies in the adjacent table were calculated by simply counting the number of subjects having the specified value for the age variable.
[Frequency table of age not reproduced]
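A minimal Python sketch of the counting step, using the standard library's `collections.Counter`. The ages are hypothetical stand-ins, since the slide's made-up table is not reproduced here; the row of asterisks is only a crude text version of the histogram.

```python
from collections import Counter

# Hypothetical ages for a small class (not the slide's data).
ages = [19, 20, 20, 21, 19, 20, 22, 20, 21, 19, 23, 20]

# Count how many subjects have each age value.
freq = Counter(ages)

# Print the frequency table in ascending order of age,
# with a text "bar" whose length equals the frequency.
for age in sorted(freq):
    print(f"{age:>3}  {freq[age]:>2}  {'*' * freq[age]}")
```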

Slide 7: Plotting Data

Slide 8: Grouping Data
- Plotting is easy when the variable of interest has a relatively small number of values (as our age variable did).
- However, the values of a variable are sometimes more continuous, and plotting each value separately in the manner above then produces an uninformative frequency plot.

Slide 9: Grouping Data
- For example, our weight variable ranges from 100 lb. to 200 lb. If we used the technique described above, we would end up with about 100 bars, most of which would have a frequency of less than 2 or 3 (and many a frequency of zero).
- We can get around this problem by grouping our values into bins. Aim for around 10 bins with natural split points.
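The grouping idea can be sketched in Python as below. The weights are hypothetical, and the choice of 10-lb bins starting at 100 lb is an assumption that happens to give 10 bins for this range; each weight is mapped to the lower edge of its bin and the bins are then counted.

```python
from collections import Counter

# Hypothetical weights in pounds (not the slide's data), spanning about 100 to 200 lb.
weights = [104, 118, 121, 125, 133, 137, 139, 142, 145, 148,
           151, 155, 158, 162, 166, 171, 178, 184, 192, 197]

LOW = 100        # lower edge of the first bin (assumed)
BIN_WIDTH = 10   # 10-lb bins give about 10 groups over a 100-lb range

# Map each weight to the lower edge of its bin (e.g. 137 -> 130), then count per bin.
bins = Counter(LOW + (w - LOW) // BIN_WIDTH * BIN_WIDTH for w in weights)

for edge in sorted(bins):
    print(f"{edge}-{edge + BIN_WIDTH - 1} lb  {'*' * bins[edge]}")
```

Changing `BIN_WIDTH` to 5 or 25 and rerunning is a quick way to see how bin width changes the "look" of the same data.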

Slide 10: Grouping Data

Slide 11: Grouping Data
- Check out this demo, which clearly shows how the bin width you select can affect the "look" of the data.
- Here is another similar demonstration of the effects of bin width.
- See the section in the text on cumulative frequency distributions.

Slide 12: Terminology
- Frequency histograms often have a roughly symmetrical bell shape; such distributions are called normal or Gaussian.

Slide 13: Terminology
- Sometimes the bell shape is not symmetrical.
- The term positive skew refers to the situation where the "tail" of the distribution is to the right; negative skew is when the "tail" is to the left.

Slide 14: Terminology

Slide 15: Notation
- Variables
- When we describe a set of data corresponding to the values of some variable, we will refer to that set using a letter such as X or Y.
- When we want to talk about specific data points within that set, we specify those points by adding a subscript to the letter, as in X₁.

Slide 16: Notation
X₁ = 5, X₂ = 8, X₃ = 12, X₄ = 3, X₅ = 6, X₆ = 8, X₇ = 7

Slide 17: Notation
- The Greek letter sigma, which looks like Σ, means "add up" or "sum" whatever follows it.
- Thus, ΣXᵢ means "add up all the Xᵢs."
- If we use the Xᵢs from the previous example, ΣXᵢ = 49 (or just ΣX).
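In Python, ΣXᵢ is simply `sum()` over the list. A short sketch using the seven values from the previous slide; note that Python counts positions from 0, so the subscript 1 corresponds to index 0.

```python
# The seven data points X1 ... X7 from the notation slide.
X = [5, 8, 12, 3, 6, 8, 7]

# Python lists are 0-indexed, so X1 is X[0] and X7 is X[6].
x1 = X[0]

# ΣX: add up every X_i.
sum_x = sum(X)
print(x1, sum_x)   # 5 49
```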

Slide 18: Nasty Example
[Table of paired X and Y scores not reproduced]

Slide 19: Nasty Example
- ΣX = 360
- ΣY = 336
- Σ(X - Y) = 24
- ΣX² = ?
- (ΣX)² = 360² = 129,600

Slide 20: Your turn
- Σ(XY) = ?
- (Σ(X - Y))² = 576
- Σ(X² - Y²) = 2956
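The slide's X and Y table is not reproduced, so this sketch checks the same kinds of sums on four hypothetical pairs. It illustrates the rules the exercise is testing: a sum distributes over differences, but ΣX² (square, then sum) is not the same as (ΣX)² (sum, then square).

```python
# Hypothetical paired scores (not the slide's data).
X = [4, 7, 2, 9]
Y = [3, 5, 2, 6]

sum_x_sq = sum(x ** 2 for x in X)                          # ΣX²: square first, then sum
sq_sum_x = sum(X) ** 2                                     # (ΣX)²: sum first, then square
sum_xy = sum(x * y for x, y in zip(X, Y))                  # Σ(XY)
sum_diff = sum(x - y for x, y in zip(X, Y))                # Σ(X - Y)
sum_sq_diff = sum(x ** 2 - y ** 2 for x, y in zip(X, Y))   # Σ(X² - Y²)

# Sums distribute over differences ...
assert sum_diff == sum(X) - sum(Y)
assert sum_sq_diff == sum_x_sq - sum(y ** 2 for y in Y)
# ... but squaring and summing do not commute.
print(sum_x_sq, sq_sum_x)   # 150 484
```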

Slide 21: Notation
- Sometimes things are more complicated because a letter (e.g., X) is used to refer to an entire data set (as opposed to a single variable), and multiple subscripts are used to specify individual data points.

Slide 22: Notation
[Data table for X not reproduced]
- X₂₄ = 3
- ΣX or ΣXᵢⱼ = 61
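Double subscripts map naturally onto a list of rows. This sketch uses a small hypothetical 2 × 4 table (the slide's own table, whose total is 61, is not reproduced); the first subscript picks the row and the second the column.

```python
# Hypothetical 2 x 4 data set: i indexes rows, j indexes columns.
X = [
    [5, 2, 7, 4],
    [6, 1, 8, 3],
]

# X_ij in 1-based notation is X[i - 1][j - 1] with 0-based lists,
# so X_24 (row 2, column 4) is X[1][3].
x_24 = X[1][3]

# ΣX_ij: sum over every row i and every column j.
grand_total = sum(sum(row) for row in X)
print(x_24, grand_total)   # 3 36
```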

Slide 23: Measures of Central Tendency
- While distributions provide an overall picture of some data set, it is sometimes desirable to represent the entire data set using descriptive statistics.
- The first descriptive statistics we will discuss are those used to indicate where the centre of the distribution lies.

Slide 24: Measures of Central Tendency

Slide 25: Measures of Central Tendency
- There are, in fact, three common measures of central tendency.
- The first of these is called the mode.
- The mode is simply the value of the relevant variable that occurs most often (i.e., has the highest frequency) in the sample.

Slide 26: Measures of Central Tendency
- Note that if you have plotted a frequency histogram, you can often identify the mode simply by finding the value with the highest bar.
- However, that will not work if the values were grouped before plotting the histogram (although you can still use the histogram to identify the modal group, just not the modal value).

Slide 27: Measures of Central Tendency
- To find the mode, create a non-grouped frequency table as described previously, then identify the value with the greatest frequency.
- Example: Class height.
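The class-height data are not available, so this sketch applies the same two steps to the seven data points from the notation slide. Keeping every value tied for the top frequency is a choice made here so that a data set with more than one mode is reported in full.

```python
from collections import Counter

X = [5, 8, 12, 3, 6, 8, 7]   # data points from the notation slide

# Step 1: non-grouped frequency table. Step 2: find the highest frequency.
freq = Counter(X)
top = max(freq.values())

# Report every value tied for the highest frequency (data can be multimodal).
modes = sorted(value for value, count in freq.items() if count == top)
print(modes)   # [8]
```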

Slide 28: Measures of Central Tendency
- A second measure of central tendency is called the median.
- The median is the point corresponding to the score that lies in the middle of the distribution (i.e., there are as many data points above the median as below it).

Slide 29: Measures of Central Tendency
- To find the median, the data points must first be sorted into either ascending or descending numerical order.
- The position of the median can then be calculated using the following formula:
Median location = (N + 1) / 2

Slide 30: Measures of Central Tendency
1) If there is an odd number of data points: (1, 3, 3, 4, 4, 5, 6, 7, 12)
Median location = (9 + 1) / 2 = 5
The median is the item in the fifth position of the ordered data set; therefore the median is 4.

Slide 31: Measures of Central Tendency
2) If there is an even number of data points: (1, 3, 3, 3, 5, 5, 6, 7)
Median location = (8 + 1) / 2 = 4.5
We take the average of the two adjacent values, the 4th (3) and the 5th (5), which in this case gives us (3 + 5) / 2 = 4.
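The location formula and both worked examples translate directly into a small Python function. This is a sketch for illustration only (Python's `statistics.median` already does the same job); `median` here is a hypothetical helper, not part of any library.

```python
def median(values):
    """Median via the location formula (N + 1) / 2 on sorted data."""
    data = sorted(values)              # step 1: sort into ascending order
    n = len(data)
    location = (n + 1) / 2             # 1-based position of the median
    if location.is_integer():          # odd N: a single middle value
        return data[int(location) - 1]
    # Even N: average the two values on either side of the location.
    lower = data[int(location) - 1]
    upper = data[int(location)]
    return (lower + upper) / 2

print(median([1, 3, 3, 4, 4, 5, 6, 7, 12]))   # odd example -> 4
print(median([1, 3, 3, 3, 5, 5, 6, 7]))       # even example -> 4.0
```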

Slide 32: Measures of Central Tendency
- Finally, the most commonly used measure of central tendency is called the mean (denoted X̄ for a sample, and μ for a population).
- The mean is what most of us call the average, and it is calculated in the following manner:
X̄ = ΣX / N

Slide 33: Measures of Central Tendency
- For example, given the data set that we used to calculate the median (odd-number example), the corresponding mean would be:
X̄ = (1 + 3 + 3 + 4 + 4 + 5 + 6 + 7 + 12) / 9 = 45 / 9 = 5

Slide 34: Measures of Central Tendency
- When a distribution is fairly symmetrical, the mean, median, and mode will be quite similar.
- However, when the underlying distribution is not symmetrical, the three measures of central tendency can be quite different.
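Python's standard `statistics` module can compute all three measures. Using the odd-N example from the median slide, whose single large score (12) stretches the right tail, the mean is pulled above the median, and the mode turns out not to be unique.

```python
import statistics

skewed = [1, 3, 3, 4, 4, 5, 6, 7, 12]   # odd-N example from the median slide

center_mean = statistics.mean(skewed)         # pulled toward the long right tail
center_median = statistics.median(skewed)     # middle score of the sorted data
center_modes = statistics.multimode(skewed)   # all values tied for highest frequency

print(center_mean, center_median, center_modes)   # 5 4 [3, 4]
```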

Slide 35: Measures of Central Tendency
- This raises the issue of which measure is best.
- Note that if you were calculating these values, you would show all your steps (it's good to be a prof!).

Slide 36: Measures of Central Tendency
- Here is a demonstration that allows you to change a frequency histogram while simultaneously noting the effects of those changes on the mean versus the median.
- As you use the demo, you should easily be able to think about how these changes also affect the mode, right?

Slide 37: Measures of Variability
- In addition to knowing where the centre of the distribution is, it is often helpful to know the degree to which individual values cluster around the centre.
- This is known as variability.

Slide 38: Measures of Variability
- There are various measures of variability, the most straightforward being the range of the sample:
Range = highest value - lowest value
- While the range provides a good first pass at variability, it is not the best measure because of its sensitivity to extreme scores (see text).
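A quick check of the range, and of its sensitivity to a single extreme score, using the data set from the average-deviation example. Dropping the last value is done here purely for illustration.

```python
X = [2, 3, 3, 4, 4, 6, 6, 12]   # sorted example data from the variability slides

data_range = max(X) - min(X)    # highest value minus lowest value
print(data_range)               # 10

# Drop the single extreme score (12) to see how much it drives the range.
without_extreme = X[:-1]
trimmed_range = max(without_extreme) - min(without_extreme)
print(trimmed_range)            # 4
```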

Slide 39: Measures of Variability
- One approach to estimating variability is to directly measure the degree to which individual data points differ from the mean and then average those deviations.
- This is known as the average deviation:
Average deviation = Σ(X - X̄) / N

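Slide 40: Measures of Variability
- However, if we try to do this with real data, the result will always be zero.
Example: (2, 3, 3, 4, 4, 6, 6, 12)
X̄ = 40 / 8 = 5
Deviations from the mean: -3, -2, -2, -1, -1, +1, +1, +7, so Σ(X - X̄) = 0
- This happens with any data set, because Σ(X - X̄) = ΣX - N·X̄ = ΣX - ΣX = 0.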
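The cancellation is easy to verify numerically. This sketch computes the deviations for the example data set and averages them; the positive and negative deviations cancel exactly.

```python
X = [2, 3, 3, 4, 4, 6, 6, 12]
N = len(X)
mean = sum(X) / N                       # 40 / 8 = 5.0

deviations = [x - mean for x in X]      # -3, -2, -2, -1, -1, 1, 1, 7
average_deviation = sum(deviations) / N
print(average_deviation)                # 0.0
```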

Slide 41: Measures of Variability
- One way to get around the problem with the average deviation is to use the absolute values of the differences, instead of the differences themselves.
- The absolute value of some number is just the number without any sign. For example: |-3| = 3 and |+3| = 3

Slide 42: Measures of Variability
- Thus, we could rewrite and solve our average deviation question as follows:
MAD = Σ|X - X̄| / N = (3 + 2 + 2 + 1 + 1 + 1 + 1 + 7) / 8 = 18 / 8 = 2.25
- Therefore, this data set has a mean of 5 and a mean absolute deviation (MAD) of 2.25.
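The same calculation as a Python sketch. Here MAD means the mean absolute deviation used on this slide; some software uses the same abbreviation for the median absolute deviation, which is a different statistic.

```python
X = [2, 3, 3, 4, 4, 6, 6, 12]   # example data from the average-deviation slide
N = len(X)
mean = sum(X) / N               # 5.0

# Mean absolute deviation: average distance from the mean, ignoring sign.
mad = sum(abs(x - mean) for x in X) / N
print(mean, mad)                # 5.0 2.25
```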

Slide 43: Measures of Variability
- Although the MAD is an acceptable measure of variability, the most commonly used measure is the variance (denoted s² for a sample and σ² for a population) and its square root, termed the standard deviation (denoted s for a sample and σ for a population).

Slide 44: Measures of Variability
- The computation of variance is also based on the basic notion of the average deviation. However, instead of getting around the "zero problem" by using absolute deviations (as in the MAD), the "zero problem" is eliminated by squaring the differences from the mean.

Slide 45: Measures of Variability
- Example: (2, 3, 4, 4, 4, 5, 6, 12)

Slide 46: Measures of Variability
- To convert the variance into the standard deviation (SD), we simply take its square root:
s = √s²
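A sketch of the variance and SD for the example data on the previous slide. The slides do not show which denominator was used at this point; this sketch assumes the sample formula with N - 1, which the later slides on population vs. sample explain.

```python
import math

X = [2, 3, 4, 4, 4, 5, 6, 12]   # example data from the variance slide
N = len(X)
mean = sum(X) / N               # 40 / 8 = 5.0

# Squaring each deviation stops positive and negative deviations from cancelling.
ss = sum((x - mean) ** 2 for x in X)   # sum of squared deviations

s2 = ss / (N - 1)    # sample variance (assumed N - 1 denominator)
s = math.sqrt(s2)    # standard deviation = square root of the variance
print(ss, round(s2, 2), round(s, 2))   # 66.0 9.43 3.07
```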

Slide 47: Measures of Variability
- This demonstration allows you to play with the mean and standard deviation of a distribution. Note that changing the mean of the distribution simply moves the entire distribution to the left or right without changing its shape. In contrast, changing the standard deviation alters the spread of the data but does not affect where the distribution is "centered".

Slide 48: Measures of Variability
- Population vs. Sample
- As mentioned, we usually deal with statistics, not parameters. σ² and σ are parameters; their counterparts when dealing with samples are s² and s. The formulae are slightly different:
σ² = Σ(X - μ)² / N
s² = Σ(X - X̄)² / (N - 1)
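Python's `statistics` module provides both versions: `pvariance`/`pstdev` divide by N (population) and `variance`/`stdev` divide by N - 1 (sample). A sketch applying both to the variance example data.

```python
import statistics

X = [2, 3, 4, 4, 4, 5, 6, 12]   # example data from the variance slide

# Population formula (divide by N): use when X is the entire population.
sigma2 = statistics.pvariance(X)
sigma = statistics.pstdev(X)

# Sample formula (divide by N - 1): use when X is a sample from a larger population.
s2 = statistics.variance(X)
s = statistics.stdev(X)

print(sigma2, s2)                     # 66/8 versus 66/7
print(round(sigma, 2), round(s, 2))
```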

Slide 49: Properties of a Statistic
- So, the mean (X̄) and variance (s²) are the descriptive statistics most commonly used to represent the data points of a sample.
- The real reason that they are the preferred measures of central tendency and variability is certain properties they have as estimators of their corresponding population parameters, μ and σ².

Slide 50: Properties of a Statistic
- Four properties are considered desirable in a population estimator: sufficiency, unbiasedness, efficiency, and resistance.
- Both the mean and the variance are the best estimators in their class in terms of the first three of these four properties.
- To understand these properties, you first need to understand a concept in statistics called the sampling distribution.

Slide 51: Properties of a Statistic
- We will discuss sampling distributions off and on throughout the course; for now I only want to touch on the notion.
- Basically, the idea is this: in order to examine the properties of a statistic, we often want to take repeated samples from some population of data and calculate the relevant statistic on each sample. We can then look at the distribution of the statistic across these samples and ask a variety of questions about it.
- Check out this demonstration, which I hope makes the concept of sampling distributions clearer.
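A minimal simulation of a sampling distribution, assuming a hypothetical normally distributed population (mean 100, SD 15) and samples of 25 scores; all of these numbers are illustrative choices. With a fixed seed the run is reproducible.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 scores (normal, mean 100, SD 15).
population = [random.gauss(100, 15) for _ in range(10_000)]
mu = statistics.fmean(population)

# Take many independent samples of n scores and record each sample's mean.
n, reps = 25, 2_000
sample_means = [statistics.fmean(random.sample(population, n)) for _ in range(reps)]

# These means form an empirical sampling distribution of X̄: its centre sits
# near mu, and its spread shows how much X̄ varies from sample to sample.
print(round(mu, 2), round(statistics.fmean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

Swapping `statistics.fmean` for another statistic (e.g. `statistics.median`) in the list comprehension gives that statistic's sampling distribution instead.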

Slide 52: Properties of a Statistic
1) Sufficiency
- A sufficient statistic is one that makes use of all of the information in the sample to estimate its corresponding parameter.

Slide 53: Properties of a Statistic
2) Unbiasedness
- A statistic is said to be an unbiased estimator if its expected value (i.e., the mean of the statistic calculated over a large number of repeated samples, such as the mean of many sample means) is equal to the population parameter it is estimating.
- Explanation of N - 1 in the s² formula.

Slide 54: Properties of a Statistic
- Using this procedure, the mean can be shown to be an unbiased estimator (see p. 47).
- However, if the σ² formula (dividing by N) is used to calculate s², it turns out to underestimate σ².

Slide 55: Properties of a Statistic
- The reason for this bias is that, when we calculate s², we use X̄, an estimator of the population mean, in place of μ.
- The chances of X̄ being EXACTLY the same as μ are virtually nil, which results in the bias: X̄ is the value that makes the sum of squared deviations in the sample as small as possible, so Σ(X - X̄)² is never larger than Σ(X - μ)², and dividing it by N systematically underestimates σ².
- To compensate, we use N - 1.
- Note that this is only true when calculating s²; if you have a measurable population and you want to calculate σ², you use N in the denominator, not N - 1.
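The bias can be seen by simulation. This sketch assumes a hypothetical normal population with variance 4 and repeatedly draws samples of 5 scores, computing the variance both ways; the N-denominator average should land near 4 × (N - 1)/N = 3.2, while the N - 1 version should land near 4. The helper `variance_estimate` and its `ddof` parameter are names chosen for this sketch.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

SIGMA2 = 4.0        # hypothetical population variance (population SD = 2)
N_PER_SAMPLE = 5    # small samples make the bias easy to see
REPS = 20_000       # number of repeated samples

def variance_estimate(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by N - ddof."""
    m = sum(xs) / len(xs)
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (len(xs) - ddof)

divide_by_n, divide_by_n_minus_1 = [], []
for _ in range(REPS):
    xs = [random.gauss(0.0, SIGMA2 ** 0.5) for _ in range(N_PER_SAMPLE)]
    divide_by_n.append(variance_estimate(xs, ddof=0))          # σ² formula applied to a sample
    divide_by_n_minus_1.append(variance_estimate(xs, ddof=1))  # s² formula

avg_n = sum(divide_by_n) / REPS
avg_n_minus_1 = sum(divide_by_n_minus_1) / REPS
print(round(avg_n, 2), round(avg_n_minus_1, 2))
```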

Slide 56: Properties of a Statistic
- Degrees of Freedom
- The mean of 6, 8, and 10 is 8.
- If I allow you to change as many of these numbers as you want BUT the mean must stay 8, how many of the numbers are you free to vary?

Slide 57: Properties of a Statistic
- Only two of the three numbers are free to vary; once they are chosen, the third is determined by the fixed mean.
- The point of this exercise is that when the mean is fixed, it removes a degree of freedom from your sample; this is like actually subtracting 1 from the number of observations in your sample.
- It is for exactly this reason that we use N - 1 in the denominator when we calculate s² (i.e., the calculation requires that the mean be fixed first, which effectively removes, or fixes, one of the data points).
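The degrees-of-freedom exercise as a tiny sketch: with the mean of three scores fixed at 8, two values can be chosen freely (11 and 2 here are arbitrary), and the third is then forced.

```python
# Three scores whose mean must stay at 8, so their total must stay at 3 * 8 = 24.
target_mean, n = 8, 3

# The first n - 1 = 2 scores can be changed freely (these values are arbitrary) ...
a, b = 11, 2
# ... but the last score is then forced by the fixed mean.
c = n * target_mean - a - b

print(a, b, c, (a + b + c) / n)   # 11 2 11 8.0
```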

Slide 58: Properties of a Statistic
3) Efficiency
- The efficiency of a statistic is reflected in the variance of its values across a bunch of independently chosen samples (e.g., the variance of the sample means). The smaller this variance, the more efficient the statistic is said to be.
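A simulation sketch comparing two estimators of the centre of a hypothetical normal population (mean 50, SD 10): the sample mean and the sample median. For normal data the mean is the more efficient of the two, so its values should vary less across samples; the population and the sample size are assumptions made for illustration.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

n, reps = 25, 2_000   # sample size and number of independent samples
means, medians = [], []
for _ in range(reps):
    xs = [random.gauss(50, 10) for _ in range(n)]   # hypothetical normal population
    means.append(statistics.fmean(xs))
    medians.append(statistics.median(xs))

# Variance of each estimator across samples: smaller means more efficient.
var_of_means = statistics.pvariance(means)
var_of_medians = statistics.pvariance(medians)
print(round(var_of_means, 2), round(var_of_medians, 2))
```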

Slide 59: Properties of a Statistic
4) Resistance
- The resistance of an estimator refers to how little that estimate is affected by extreme values.
- As mentioned previously, both X̄ and s² are highly sensitive to extreme values.
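A sketch of (non-)resistance using the odd-N example from the median slide, with the top score replaced by a hypothetical extreme value of 120: the mean and variance jump, while the median does not move.

```python
import statistics

scores = [1, 3, 3, 4, 4, 5, 6, 7, 12]
# Same scores, but the largest one is replaced by a hypothetical extreme value.
with_outlier = [1, 3, 3, 4, 4, 5, 6, 7, 120]

for label, xs in (("original", scores), ("with outlier", with_outlier)):
    print(label,
          statistics.mean(xs),       # not resistant: moves with the extreme score
          statistics.median(xs),     # resistant: stays put
          statistics.variance(xs))   # not resistant: inflated by the squared deviation
```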

Slide 60: Properties of a Statistic
4) Resistance
- Despite this, they are still the most commonly used estimates of the corresponding population parameters, mostly because of their superiority over other measures in terms of sufficiency, unbiasedness, and efficiency.