BNAD 276: Statistical Inference in Management Winter, 2015

Presentation transcript:

BNAD 276: Statistical Inference in Management Winter, 2015 Welcome Syllabi Green sheet Seating Chart http://www.youtube.com/watch?v=Ahg6qcgoay4&watch_response

Daily group portfolios
Beginning of each lecture (first 5 minutes)
Meet in groups of 3 or 4
Quiz one another on class material
Discuss the questions and determine the correct answer for each question
Five copies (one for each group member, and typed)
3 multiple choice questions based on lecture
Include 4 options (a, b, c, and d)
Include a name and describe a person in a certain situation
Example: Margaret was interested in taking a Statistics course. It is likely she was interested in studying which of the following?
a. economic theories of communism
b. theological perspectives of life after death
c. musical compositions of the 12th century
d. statistical techniques and inference
They can be funny or serious, and must be clear and have only one correct answer.

Please start portfolios

Homework Assignment
Go to D2L and click on “Content”
Click on “Interactive Online Homework Assignments”
Complete the module: Seven Prototypical Designs

[Figure: scatterplot of Height of Daughters (inches) against Height of Mothers (inches)]
This shows the strong positive (r = +0.8) relationship between the heights of daughters (in inches) and the heights of their mothers (in inches).
Description includes: both variables; strength (weak, moderate, strong); direction (positive, negative); estimated value (actual number)
Scatterplot: variable name is listed clearly; both axes have real numbers listed; both axes and values are labeled
Hand in Correlation worksheet:
1. Describe one positive correlation. Draw a scatterplot (label axes)
2. Describe one negative correlation. Draw a scatterplot (label axes)
3. Describe one zero correlation. Draw a scatterplot (label axes)
4. Describe one perfect correlation (positive or negative). Draw a scatterplot (label axes)
5. Describe one curvilinear relationship. Draw a scatterplot (label axes)

Overview Frequency distributions The normal curve Challenge yourself as we work through characteristics of distributions to try to categorize each concept as a measure of 1) central tendency 2) dispersion or 3) shape Mean, Median, Mode, Trimmed Mean Standard deviation, Variance, Range Mean Absolute Deviation Skewed right, skewed left unimodal, bimodal, symmetric

Another example: How many kids in your family?
Number of kids in family: 1, 4, 3, 2, 1, 8, 4, 2, 2, 14

Measures of Central Tendency (Measures of location): The mean, median and mode
Measures of “location”: Where on the number line the scores tend to cluster
Mean: The balance point of a distribution. Found by adding up all observations and then dividing by the number of observations
Mean for a sample: Σx / n = mean = x̄ (x-bar)
Mean for a population: ΣX / N = mean = µ (mu)
Note: Σ = add up; x or X = scores; n or N = number of scores

Number of kids in family: 1, 4, 3, 2, 1, 8, 4, 2, 2, 14
Mean for a sample: Σx / n = 41 / 10 = 4.1
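
As a quick check of the arithmetic above, here is a minimal Python sketch of the sample mean. The function name `sample_mean` is only an illustrative choice, not something from the course materials.

```python
# Family sizes from the slide (number of kids in each of 10 families).
kids = [1, 4, 3, 2, 1, 8, 4, 2, 2, 14]

def sample_mean(scores):
    """Return Σx / n: add up all the scores, then divide by how many there are."""
    return sum(scores) / len(scores)

mean_kids = sample_mean(kids)
print(sum(kids), len(kids), mean_kids)  # 41 10 4.1
```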

How many kids are in your family? What is the most common family size?
Number of kids in family: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14

Median: The middle value when observations are ordered from least to most (or most to least)
Unordered: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14
Ordered: 1, 1, 2, 2, 2, 3, 4, 4, 8, 14
If there appear to be two medians, take the mean of the two: (2 + 3) / 2 = 2.5, so the median is 2.5
Median always has a percentile rank of 50% regardless of shape of distribution
Median is also called the 2nd Quartile

Lower half: 1, 1, 2, 2, 2
Upper half: 3, 4, 4, 8, 14
1st Quartile: Middle number of lower half of scores
2nd Quartile: Middle number of all scores (Median) = 2.5
3rd Quartile: Middle number of upper half of scores
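
The median and quartile definitions above can be sketched in Python as follows. This follows the "middle number of each half" description from the slide; statistical software often uses slightly different quartile conventions, so treat the helper names and the handling of odd-sized data (the middle score is left out of both halves) as assumptions.

```python
kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

def median(scores):
    """Middle value of the ordered scores; mean of the two middle values if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(scores):
    """Return (1st, 2nd, 3rd) quartiles using the median-of-halves method.

    Needs at least two scores. With an odd number of scores, the middle
    score is excluded from both halves (one common convention).
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(ordered), median(upper)

print(sorted(kids))     # [1, 1, 2, 2, 2, 3, 4, 4, 8, 14]
print(quartiles(kids))  # (2, 2.5, 4)
```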

Mode: The value of the most frequent observation
Number of kids in family: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14

Score   f
1       2
2       3
3       1
4       2
5       0
6       0
7       0
8       1
9       0
10      0
11      0
12      0
13      0
14      1

Please note: The mode is “2” because it is the most frequently occurring score. It occurs “3” times. “3” is not the mode; it is just the frequency for the value that is the mode.
Bimodal distribution: If there are two most frequent observations
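
The frequency table and the mode can be reproduced with a short Python sketch using `collections.Counter` from the standard library. Returning a list of modes is an illustrative choice here, so that a bimodal distribution would show both values.

```python
from collections import Counter

kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

# Frequency table: score -> f (how many times that score occurs).
freq = Counter(kids)

# The mode is the score with the highest frequency, not the frequency itself.
top_frequency = max(freq.values())
modes = sorted(score for score, f in freq.items() if f == top_frequency)

print(modes, top_frequency)  # [2] 3
```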

What about central tendency for qualitative data? Mode is good for nominal or ordinal data Median can be used with ordinal data Mean can be used with interval or ratio data

Overview Frequency distributions The normal curve Challenge yourself as we work through characteristics of distributions to try to categorize each concept as a measure of 1) central tendency 2) dispersion or 3) shape Mean, Median, Mode, Trimmed Mean Skewed right, skewed left unimodal, bimodal, symmetric

A little more about frequency distributions An example of a normal distribution

Measure of central tendency: describes how scores tend to cluster toward the center of the distribution
Normal distribution
In all distributions: mode = tallest point; median = middle score; mean = balance point
In a normal distribution: mode = mean = median

Measure of central tendency: describes how scores tend to cluster toward the center of the distribution
Positively skewed distribution
In all distributions: mode = tallest point; median = middle score; mean = balance point
In a positively skewed distribution: mode < median < mean
Note: mean is most affected by outliers or skewed distributions
With Bill Gates our Average Income would be $38 million a year
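
To see why the mean is the measure most affected by an outlier, here is a small Python sketch. The income figures are made up purely for illustration (they are not the class data behind the Bill Gates example); only the pattern matters.

```python
from statistics import mean, median, mode

# Hypothetical yearly incomes for a small group (illustrative values only).
incomes = [30_000, 30_000, 35_000, 40_000, 45_000, 50_000, 60_000]

# Add one extremely large income (a hypothetical outlier).
with_outlier = incomes + [10_000_000]

for label, data in (("without outlier", incomes), ("with outlier", with_outlier)):
    print(label, mode(data), median(data), round(mean(data)))

# The mode does not move, the median barely moves, and the mean jumps,
# giving the positively skewed pattern: mode < median < mean.
```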

Measure of central tendency: describes how scores tend to cluster toward the center of the distribution
Negatively skewed distribution
In all distributions: mode = tallest point; median = middle score; mean = balance point
In a negatively skewed distribution: mean < median < mode
Note: mean is most affected by outliers or skewed distributions

Mode: The value of the most frequent observation
Bimodal distribution: Distribution with two most frequent observations (2 peaks)
Example: Ian coaches two boys’ baseball teams. One team is made up of 10-year-olds and the other is made up of 16-year-olds. When he measured the height of all of his players he found a bimodal distribution.

Overview Frequency distributions The normal curve Mean, Median, Mode, Trimmed Mean Standard deviation, Variance, Range Mean Absolute Deviation Skewed right, skewed left unimodal, bimodal, symmetric

Variability
Some distributions are more variable than others
Let’s say this is our distribution of heights of men on the U of A baseball team. Mean is 6 feet tall.
[Figure: three distributions of height, each on an axis marked 5’, 5’6”, 6’, 6’6”, 7’]
What might this be?

Dispersion: Variability
Some distributions are more variable than others
[Figure: distributions A, B and C on an axis marked 5’, 5’6”, 6’, 6’6”, 7’]
The larger the variability the wider the curve tends to be
The smaller the variability the narrower the curve tends to be
Range: The difference between the largest and smallest observations
Range for distribution A? Range for distribution B? Range for distribution C?

Range: The difference between the largest and smallest scores
xmax - xmin = Range
Wildcats Basketball team:
Tallest player = 84” (same as 7’0”) (Kaleb Tarczewski and Dusan Ristic)
Shortest player = 70” (same as 5’10”) (Parker Jackson-Cartwright)
84” – 70” = 14”
Range is 14”
Fun fact: Mean is 78”

Range: The difference between the largest and smallest score
xmax - xmin = Range
Wildcats Baseball team:
Tallest player = 77” (same as 6’5”) (Austin Schnabel)
Shortest player = 69” (same as 5’9”) (Justin Behnke and Ernie DeLaTrinidad)
77” – 69” = 8”
Range is 8”
Fun fact: Mean is 72”
Please note: No reference is made to numbers between the min and max
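
A minimal Python sketch of the range, and of the point that it ignores everything between the min and max. Only the tallest (77”) and shortest (69”) baseball heights come from the slide; the other heights in both rosters are hypothetical values added for illustration.

```python
def value_range(scores):
    """Range = xmax - xmin (largest score minus smallest score)."""
    return max(scores) - min(scores)

# Heights in inches. The 69 and 77 are from the slide; the middle values
# in both lists are hypothetical.
roster_a = [69, 72, 72, 73, 77]
roster_b = [69, 69, 70, 77, 77]

print(value_range(roster_a), value_range(roster_b))  # 8 8
```

Both lists have the same range even though the scores in between are quite different.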

Variability
Standard deviation: The average amount by which observations deviate on either side of their mean
Generally, (on average) how far away is each score from the mean?
Mean is 6’

Deviation scores
Let’s build it up again… U of A Baseball team (mean is 6’0”; axis marked 5’8”, 5’10”, 6’0”, 6’2”, 6’4”)
Diallo is 6’0”: 6’0” – 6’0” = 0, so Diallo’s deviation score is 0
Preston is 6’2”: 6’2” – 6’0” = 2, so Preston’s deviation score is 2”
Mike is 5’8”: 5’8” – 6’0” = -4, so Mike’s deviation score is -4”
Hunter is 5’10”: 5’10” – 6’0” = -2, so Hunter’s deviation score is -2”
Shea is 6’4”: 6’4” – 6’0” = 4, so Shea’s deviation score is 4”
David is 6’0”: 6’0” – 6’0” = 0, so David’s deviation score is 0

Standard deviation: The average amount by which observations deviate on either side of their mean
Deviation scores: Diallo is 0”, Preston is 2”, Mike is -4”, Hunter is -2”, Shea is 4”, David is 0”

Deviation scores (x - µ): The amount by which observations deviate on either side of their mean
How far away is each score from the mean?
Diallo is 0”, Mike is -4”, Hunter is -2”, Shea is 4”, David is 0”, Preston is 2”

How do we find each deviation score? (x - µ)
Find distance of each person from the mean (subtract the mean from their score)
5’8” - 6’0” = - 4” (Mike)
5’9” - 6’0” = - 3”
5’10” - 6’0” = - 2” (Hunter)
5’11” - 6’0” = - 1”
6’0” - 6’0” = 0 (Diallo)
6’1” - 6’0” = + 1”
6’2” - 6’0” = + 2” (Preston)
6’3” - 6’0” = + 3”
6’4” - 6’0” = + 4” (Shea)
Remember: It’s relative to the mean. Based on difference from the mean.

Standard deviation: The average amount by which observations deviate on either side of their mean
How far away is each score from the mean?
Deviation scores (x - µ): Diallo is 0”, Preston is 2”, Mike is -4”, Hunter is -2”, Shea is 4”, David is 0”

How do we find the average height? Σx / N = average height
How do we find the average spread? Σ(x - µ) / N = average deviation
Add up the deviation scores: Σ(x - µ) = ?
Big problem: Σ(x - µ) = 0 and Σ(x - x̄) = 0 (the deviations above and below the mean cancel out)
Square the deviations: Σ(x - µ)² and Σ(x - x̄)²
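
The "big problem" can be checked numerically. This Python sketch uses the six players' heights from the slides, converted to inches, and treats those six as the whole group (an assumption made for the example; their mean happens to be exactly 6’0”).

```python
# Heights in inches for the six players on the slides (6'0" = 72 inches).
heights = {"Diallo": 72, "Preston": 74, "Mike": 68,
           "Hunter": 70, "Shea": 76, "David": 72}

mean_height = sum(heights.values()) / len(heights)

# Deviation score = score minus the mean (x - µ).
deviations = {name: h - mean_height for name, h in heights.items()}

sum_of_deviations = sum(deviations.values())               # always 0
sum_of_squares = sum(d ** 2 for d in deviations.values())  # not 0

print(mean_height, sum_of_deviations, sum_of_squares)  # 72.0 0.0 40.0
```

Squaring removes the signs, so the positive and negative deviations can no longer cancel.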

Standard deviation: The average amount by which observations deviate on either side of their mean
These would be helpful to know by heart: please memorize these formulas
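
The formula images did not survive in this transcript. Assuming the slide showed the usual definitional formulas, and using the symbols that appear elsewhere in these slides (σ, s, µ, x̄, N, n), they would read:

```latex
% Population standard deviation: N scores, population mean \mu
\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}

% Sample standard deviation: n scores, sample mean \bar{x},
% with n - 1 degrees of freedom in the denominator
s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}
```

In both, the numerator is the sum of squared deviation scores; without the square root, each expression is the variance.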

Standard deviation: The average amount by which observations deviate on either side of their mean
What do these two formulas have in common?

Standard deviation: The average amount by which observations deviate on either side of their mean
How do these formulas differ?
“n-1” is “Degrees of Freedom”

Standard deviation: The average amount by which observations deviate on either side of their mean
Generally, (on average) how far away is each score from the mean?
Deviation scores: Diallo is 0”, Mike is -4”, Hunter is -2”, Shea is 4”, David is 0”, Preston is 2”
Labels on the formulas: “Sum of Squares”; “n-1” is “Degrees of Freedom”
Remember, it’s relative to the mean: based on difference from the mean
Remember, we are thinking in terms of “deviations”
Please memorize these

Standard deviation (definitional formula) - Let’s do one
Step 1: Find the mean: ΣX = 45, ΣX / N = 45/9 = 5
Step 2: Subtract the mean from each score
Step 3: Square the deviations
Step 4: Find standard deviation

X     X - µ          (X - µ)²
1     1 - 5 = -4     16
2     2 - 5 = -3     9
3     3 - 5 = -2     4
4     4 - 5 = -1     1
5     5 - 5 = 0      0
6     6 - 5 = 1      1
7     7 - 5 = 2      4
8     8 - 5 = 3      9
9     9 - 5 = 4      16
Sums: ΣX = 45; Σ(X - µ) = 0; Σ(X - µ)² = 60

Each X - µ is a deviation score. The numerator Σ(X - µ)² is called “sum of squares”.
a) 60 / 9 = 6.6667 This is the Variance!
b) square root of 6.6667 = 2.5820 This is the standard deviation!
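
The four steps above can be sketched in Python as follows. The function names are illustrative; `sample_sd` is included only to show the "n-1" version alongside the population version used in this example.

```python
from math import sqrt

def sum_of_squares(scores):
    """Σ(X - mean)²: find the mean, subtract it from each score, square, add up."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def population_sd(scores):
    """Square root of (sum of squares / N)."""
    return sqrt(sum_of_squares(scores) / len(scores))

def sample_sd(scores):
    """Square root of (sum of squares / (n - 1)); n - 1 is the degrees of freedom."""
    return sqrt(sum_of_squares(scores) / (len(scores) - 1))

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
variance = sum_of_squares(scores) / len(scores)

print(sum_of_squares(scores))              # 60.0
print(round(variance, 4))                  # 6.6667
print(round(population_sd(scores), 4))     # 2.582
print(round(sample_sd(scores), 4))         # 2.7386
```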

Another example: How many kids in your family? 3 4 2 1 4 2 2 3 1 8 https://www.youtube.com/watch?v=YST-JQ1bREA

Standard deviation - Let’s do one (definitional formula)
How many kids?
Step 1: Find the mean: Σx = 30, 30/10 = 3
Step 2: Subtract the mean from each score (deviations)
Step 3: Square the deviations
Step 4: Add up the squared deviations
Step 5: Find standard deviation

X     X - µ          (X - µ)²
3     3 - 3 = 0      0
4     4 - 3 = 1      1
2     2 - 3 = -1     1
1     1 - 3 = -2     4
4     4 - 3 = 1      1
2     2 - 3 = -1     1
2     2 - 3 = -1     1
3     3 - 3 = 0      0
1     1 - 3 = -2     4
8     8 - 3 = 5      25
Sums: Σx = 30; Σ(x - µ) = 0; Σ(x - µ)² = 38

a) 38 / 10 = 3.8 This is the Variance!
b) square root of 3.8 = 1.95 This is the standard deviation!
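
One way to double-check a hand calculation like this is Python's standard `statistics` module: `pvariance` and `pstdev` divide by N (population), while `stdev` divides by n - 1 (sample). This is only a cross-check sketch, not part of the course's method.

```python
from statistics import pstdev, pvariance, stdev

kids = [3, 4, 2, 1, 4, 2, 2, 3, 1, 8]

population_variance = pvariance(kids)  # sum of squares / N
population_sd = pstdev(kids)           # square root of the population variance
sample_sd = stdev(kids)                # uses n - 1 in the denominator instead

print(round(population_variance, 2), round(population_sd, 2), round(sample_sd, 2))
# 3.8 1.95 2.05
```

The sample version comes out slightly larger because it divides the same sum of squares (38) by 9 rather than 10.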

These would be helpful to know by heart: please memorize areas
1 sd above and below mean: 68%
2 sd above and below mean: 95%
3 sd above and below mean: 99.7%
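
The three areas are rounded values for a normal curve. As a sketch of where they come from, the area within k standard deviations of the mean equals erf(k / √2), which Python's `math` module can evaluate directly:

```python
from math import erf, sqrt

def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```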

Raw scores, z scores & probabilities Please note spatially where 1 standard deviation falls on the curve 68% 95% 99.7%

Raw scores, z scores & probabilities 1 sd above and below mean 68% z = +1 z = -1 Mean = 50 S = 10 (Note S = standard deviation) If we go up one standard deviation z score = +1.0 and raw score = 60 If we go down one standard deviation z score = -1.0 and raw score = 40

Raw scores, z scores & probabilities 2 sd above and below mean 95% z = -2 z = +2 Mean = 50 S = 10 (Note S = standard deviation) If we go up two standard deviations z score = +2.0 and raw score = 70 If we go down two standard deviations z score = -2.0 and raw score = 30

Raw scores, z scores & probabilities 3 sd above and below mean 99.7% z = +3 z = -3 Mean = 50 S = 10 (Note S = standard deviation) If we go up three standard deviations z score = +3.0 and raw score = 80 If we go down three standard deviations z score = -3.0 and raw score = 20

z score: A score that indicates how many standard deviations an observation is above or below the mean of the distribution
z score = (raw score - mean) / standard deviation

[Figure: three normal curves, each on an axis marked 85, 90, 95, 100, 105, 110, 115]
68% (z = -1 to z = +1): If we go up one standard deviation, z score = +1.0 and raw score = 105. If we go down one standard deviation, z score = -1.0 and raw score = 95.
95% (z = -2 to z = +2): If we go up two standard deviations, z score = +2.0 and raw score = 110. If we go down two standard deviations, z score = -2.0 and raw score = 90.
99.7% (z = -3 to z = +3): If we go up three standard deviations, z score = +3.0 and raw score = 115. If we go down three standard deviations, z score = -3.0 and raw score = 85.
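
The z score formula and its reverse (going from a z score back to a raw score) can be sketched in Python. The mean of 100 and standard deviation of 5 are read off the axis values in the example above; the mean of 50 and standard deviation of 10 come from the earlier slides. The function names are illustrative.

```python
def z_score(raw, mean, sd):
    """How many standard deviations a raw score is above (+) or below (-) the mean."""
    return (raw - mean) / sd

def raw_score(z, mean, sd):
    """Reverse the formula: start at the mean and move z standard deviations."""
    return mean + z * sd

print(z_score(105, 100, 5))   # 1.0
print(z_score(85, 100, 5))    # -3.0
print(raw_score(-2, 100, 5))  # 90
print(raw_score(3, 50, 10))   # 80
```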

Writing Assignment – Pop Quiz
1. What is a “deviation score”?
2. Preston has a deviation score of 2: What does that tell us about Preston? Is he taller or shorter than the mean? And by how much? Are most people in the group taller or shorter than Preston?
3. Mike has a deviation score of -4: What does that tell us about Mike? Are most people in the group taller or shorter than Mike?
4. Diallo has a deviation score of 0: What does that tell us about Diallo? Are most people in the group taller or shorter than Diallo?
5. Please write the formula for the standard deviation of a population
6. Please draw 3 curves showing 1, 2 & 3 standard deviations from mean

Writing Assignment – Pop Quiz 7. What does this symbol refer to? What is it called? What does it mean? Is it referring to a sample or population? 8. What does this symbol refer to? What is it called? What does it mean? Is it referring to a sample or population? 9. What does this symbol refer to? What is it called? What does it mean? Is it referring to a sample or population? 10. What does this symbol refer to? What is it called? What does it mean? Is it referring to a sample or population? 11. What does this symbol refer to?

Writing Assignment – Pop Quiz 12. What does this refer to? What are they called? What do they refer to? How are they different 13. What does this refer to? What are they called? How are they different 14. What do these two refer to? What are they called? How are they different 15. What does this refer to? What is it called? Use it for sample data or population?

Writing Assignment – Pop Quiz 16. What does this refer to? What are they called? What do they refer to? How are they different 17. What does this refer to? What are they called? What do they refer to? How are they different

Writing Assignment – Pop Quiz
1. What is a “deviation score”?
Answer: How far away is each score from the mean
2. Preston has a deviation score of 2: What does that tell us about Preston? Is he taller or shorter than the mean? And by how much? Are most people in the group taller or shorter than Preston?
Answer: Preston is 2” taller than the mean (taller than most)
3. Mike has a deviation score of -4: What does that tell us about Mike? Are most people in the group taller or shorter than Mike?
Answer: Mike is 4” shorter than the mean (shorter than most)
4. Diallo has a deviation score of 0: What does that tell us about Diallo? Are most people in the group taller or shorter than Diallo?
Answer: Diallo is exactly same height as mean (half taller, half shorter)
5. Please write the formula for the standard deviation of a population
6. Please draw 3 curves showing 1, 2 & 3 standard deviations from mean

Writing Assignment – Pop Quiz
7. What does this symbol refer to? What is it called? What does it mean? Is it referring to a sample or population?
Answer: The standard deviation (population); sigma; population
8. What does this symbol refer to? What is it called? What does it mean? Is it referring to a sample or population?
Answer: The mean (population); mu; population
9. What does this symbol refer to? What is it called? What does it mean? Is it referring to a sample or population?
Answer: The mean (sample); x-bar; sample
10. What does this symbol refer to? What is it called? What does it mean? Is it referring to a sample or population?
Answer: The standard deviation (sample); s; sample
11. What does this symbol refer to?
Answer: Each individual score

Writing Assignment – Pop Quiz
12. What does this refer to? What are they called? What do they refer to? How are they different?
Answer: Variance; S squared (sample) and Sigma squared (population)
13. What does this refer to? What are they called? How are they different?
Answer: Deviation scores; one for a population, one for a sample
14. What do these two refer to? What are they called? How are they different?
Answer: Sum of squares; one for a sample, one for a population
15. What does this refer to? What is it called? Use it for sample data or population?
Answer: Degrees of freedom

Writing Assignment – Pop Quiz
16. What does this refer to? What are they called? What do they refer to? How are they different?
Answer: Standard Deviation; one for a sample, one for a population
17. What does this refer to? What are they called? What do they refer to? How are they different?
Answer: Variance; one for a sample, one for a population

Connecting intentions of studies with:
Experimental Methodologies
Appropriate statistical analyses
Appropriate graphs
Today I want to present some “typical designs”. We will spend the next couple weeks filling in the details. We’ll come back to these distinctions over and over again, and build on them for the rest of the semester. Let’s learn this overview well! Don’t worry about calculation details for now.

Homework Assignment
Think about this as we work through each type of study:
Create example of each type
Identify IV (one or two)
Identify DV (one or two)
Draw possible graph for each

Study Type 1: Confidence Intervals
Study Type 2: t-test
Study Type 3: One-way Analysis of Variance (ANOVA)
Study Type 4: Two-way Analysis of Variance (ANOVA)
Study Type 5: Correlation
Study Type 6: Simple and Multiple regression
Study Type 7: Chi Square
We’ll come back to these distinctions over and over again, and build on them for the rest of the class. Let’s learn this overview well! Don’t worry about calculation details for now.

Study Type 1: Confidence Intervals
Remember, this is just an introduction to the idea. Don’t worry about calculation details for now; we will get to those soon.
On average newborns weigh 7 pounds, and are 20 inches long. My sister just had a baby: guess how much it weighs?
Guess the mean. On average you would be right most often if you always guessed the mean. Makes sense, right?!?
Point estimate versus confidence interval: Guessing a single number versus a range of numbers
What if you really needed to be right?!!? You could guess a range with smallest and largest possible scores. (How wide a range to be completely sure?)
Confidence interval: Guessing a range (max and min) and assigning a level of confidence that the score falls in that range

Study Type 1: Confidence Intervals. Confidence interval: a range of values that, with a known degree of certainty, includes an unknown population characteristic, such as a population mean.
100% Confidence Interval: We can be 100% confident that our population mean falls between these two scores (guess absurdly large and small values).
99% Confidence Interval: We can be 99% confident that our population mean falls between these two scores.
95% Confidence Interval: We can be 95% confident that our population mean falls between these two scores.
Which has a wider interval relative to raw scores, 95% or 99%?
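To make the 95% versus 99% question concrete, here is a minimal sketch of how such intervals could be computed. The newborn weights are made up for illustration, and the function uses a normal (z) approximation, which is an assumption of this sketch; with a sample this small a t-based interval would normally be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    center = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # z multiplier for the requested confidence level (two-sided)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * standard_error, center + z * standard_error


# Hypothetical newborn weights in pounds (invented data, not from the slides)
weights = [6.2, 7.1, 7.8, 6.9, 7.4, 6.5, 7.0, 7.6, 6.8, 7.3]

low95, high95 = mean_confidence_interval(weights, 0.95)
low99, high99 = mean_confidence_interval(weights, 0.99)

print(f"95% interval: {low95:.2f} to {high95:.2f} pounds")
print(f"99% interval: {low99:.2f} to {high99:.2f} pounds")
```

Asking for more confidence forces a wider range, so the 99% interval comes out wider than the 95% interval.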

Study Type 1: Confidence Intervals. Confidence interval: a range of values that, with a known degree of certainty, includes an unknown population characteristic, such as a population mean. You can use the mean of a sample to guess the mean of the population, the mean of a smaller sample, or the most likely score for an individual.
In this sample of 10,000 newborns, the mean weight is 7 pounds. What do you think the minimum and maximum weights would be to capture 95% of all newborns?
In this sample of 1000 flights, the mean number of empty seats is 12. What do you think the minimum and maximum number of empty seats are likely to be in the flights today, with a 95% level of certainty?
This sample of 500 households produced a mean income of $35,000 a year. What do you think the minimum and maximum income levels are so that we are 95% confident that we captured Mabel’s?

Study Type 1: Confidence Intervals
Study Type 2: t-test
Comparing two means? Use a t-test. http://www.youtube.com/watch?v=n4WQhJHGQB4

Study Type 2: t-test analysis. Single independent variable (categorical) comparing two groups. Single dependent variable (numeric). Used to test the effect of the IV on the DV.
Andrea was interested in the effect of vacation time on the productivity of the workers in her department. She randomly assigned workers into two groups: one group was allowed to go on vacation while the other group had no vacation. After the vacation she measured productivity for the two groups.
Identify: Independent variable? Dependent variable? Between or within? Quasi or true? Causal relationship?
[Graph: productivity for the “Yes Vacation” and “No Vacation” groups]

Andrea was interested in the effect of vacation time on the productivity of the workers in her department. She randomly assigned workers into two groups: one group was allowed to go on vacation while the other group had no vacation. After the vacation she measured productivity for the two groups. This is an example of a true experiment. In a t-test, the independent variable is qualitative (with two groups). The dependent variable is always quantitative. If it is a “true” experiment (randomly assigned to groups), we can conclude that vacation had an effect: it increased productivity. If it is a “quasi” experiment (not randomly assigned to groups), we can conclude only that the data suggest that vacation may have had an effect; productivity increased for those who went on vacation, but we can’t rule out other explanations.

Study Type 2: t-test analysis. Single independent variable (categorical) comparing two groups. Single dependent variable (numerical/continuous). Comparing two means (2 bars on graph). Used to test the effect of the IV on the DV. Please note: a t-test allows us to compare two means. If the means are statistically different, we say that there is a “real” difference that is not just due to chance; that is, there is a statistically significant difference (p < 0.05). p < 0.05 is the most common threshold, but the “p value” cutoff can vary (p < 0.01, or p < 0.001).
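As a rough illustration of comparing two means, the sketch below computes an independent-samples t statistic for Andrea's vacation study. The productivity scores are invented, the pooled-variance form of the test is an assumption of this sketch, and the critical value 2.306 is the standard two-tailed table value for p < 0.05 with 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical productivity scores (invented data, not from the slides)
vacation = [82, 88, 91, 85, 89]
no_vacation = [75, 80, 78, 83, 79]

t_stat, df = independent_t(vacation, no_vacation)

# Table value: two-tailed critical t for alpha = 0.05 with df = 8
CRITICAL_T = 2.306
significant = abs(t_stat) > CRITICAL_T

print(f"t({df}) = {t_stat:.2f}, significant at p < 0.05: {significant}")
```

With these made-up scores the observed t is larger than the critical value, so the difference between the two means would be called statistically significant.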

Study Type 1: Confidence Intervals
Study Type 2: t-test
Study Type 3: One-way Analysis of Variance (ANOVA): comparing more than two means

Study Type 3: One-way ANOVA. Single independent variable comparing more than two groups. Single dependent variable (numerical/continuous). Used to test the effect of the IV on the DV.
Ian was interested in the effect of incentives for girl scouts on the number of cookies sold. He randomly assigned girl scouts into one of three groups. The three groups were given one of three incentives, and he looked to see who sold more cookies. The 3 incentives were 1) Trip to Hawaii, 2) New Bike, or 3) Nothing. This is an example of a true experiment. How could we make this a quasi-experiment?
Independent Variable: Type of incentive
Levels of Independent Variable: None, Bike, Trip to Hawaii
Dependent Variable: Number of cookies sold
Levels of Dependent Variable: 1, 2, 3 up to max sold
Between participant design
Causal relationship: Incentive had an effect; it increased sales

Study Type 3: One-way ANOVA. Single independent variable comparing more than two groups. Single dependent variable (numerical/continuous). Used to test the effect of the IV on the DV.
Ian was interested in the effect of incentives for girl scouts on the number of cookies sold. He randomly assigned girl scouts into one of three groups. The three groups were given one of three incentives, and he looked to see who sold more cookies. The 3 incentives were 1) Trip to Hawaii, 2) New Bike, or 3) Nothing. This is an example of a true experiment.
In an ANOVA, the independent variable is qualitative (with more than two groups). The dependent variable is always quantitative.
[Graphs: sales for the None, New Bike, and Trip Hawaii groups]
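The calculation details come later in the course, but as a preview, here is a minimal sketch of the F statistic a one-way ANOVA produces for Ian's three incentive groups. The cookie counts are invented for illustration, and 3.89 is the standard table value for F at p < 0.05 with 2 and 12 degrees of freedom.

```python
from statistics import mean


def one_way_f(groups):
    """F statistic for a one-way ANOVA, with between and within degrees of freedom."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual scores around their own group mean
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical boxes of cookies sold per scout (invented data)
no_incentive = [10, 12, 11, 9, 13]
new_bike = [14, 15, 13, 16, 12]
trip_hawaii = [18, 20, 19, 17, 21]

f_stat, df_between, df_within = one_way_f([no_incentive, new_bike, trip_hawaii])

# Table value: critical F for alpha = 0.05 with df = (2, 12)
CRITICAL_F = 3.89
print(f"F({df_between}, {df_within}) = {f_stat:.2f}, significant: {f_stat > CRITICAL_F}")
```

A large F means the three group means differ by more than would be expected from the scatter of scores inside each group.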

Study Type 1: Confidence Intervals
Study Type 2: t-test
Study Type 3: One-way Analysis of Variance (ANOVA)
Study Type 4: Two-way Analysis of Variance (ANOVA): comparing two independent variables, each of which has multiple levels

Study Type 4: Two-way ANOVA (“Two-way” = “Two IVs”). Ian was interested in the effect of incentives (and age) for girl scouts on the number of cookies sold. He randomly assigned girl scouts into one of three groups. The three groups were given one of three incentives, and he looked to see who sold more cookies. The 3 incentives were: 1) Trip to Hawaii, 2) New Bike, or 3) Nothing. He also measured the scouts’ ages.
Identify: Independent Variable #1? Independent Variable #2? Dependent Variable?

Study Type 4: Two-way ANOVA. Multiple independent variables (categorical), each variable comparing two or more groups. Single dependent variable (numerical/continuous). Used to test the effect of two IVs on the DV.
Independent Variable #1: Type of incentive
Levels of Independent Variable #1: None, Bike, Trip to Hawaii
Independent Variable #2: Age
Levels of Independent Variable #2: Elementary girls versus college
Dependent Variable: Number of cookies sold
Levels of Dependent Variable: 1, 2, 3 up to max sold
Between participant design
Results: Incentive had an effect; it increased sales. Data suggest age had an effect; older girls sold more.

Study Type 4: Two-way ANOVA. Two independent variables (categorical). Single dependent variable (numerical/continuous). Used to test the effect of two IVs on the DV. In a two-way ANOVA, both independent variables are qualitative (each with two or more groups). The dependent variable is always quantitative.
[Graphs: sales for the None, New Bike, and Trip Hawaii groups, plotted separately for Elementary and College]
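The graph for a two-way design plots one mean per combination of levels. As a sketch of where those numbers come from, the code below computes the six cell means and the overall (marginal) mean for each level of each IV. All sales figures are invented for illustration, and this stops short of the ANOVA significance test itself.

```python
from statistics import mean

# Hypothetical cookie sales, keyed by (age group, incentive); invented data
sales = {
    ("Elementary", "None"): [8, 10, 9],
    ("Elementary", "New Bike"): [12, 14, 13],
    ("Elementary", "Trip Hawaii"): [17, 18, 19],
    ("College", "None"): [11, 12, 13],
    ("College", "New Bike"): [15, 16, 17],
    ("College", "Trip Hawaii"): [21, 22, 23],
}

# One mean per cell: these are the six points (or bars) on the graph
cell_means = {cell: mean(scores) for cell, scores in sales.items()}


def marginal_mean(factor_index, level):
    """Mean of all scores at one level of one IV (0 = age, 1 = incentive)."""
    scores = [s for cell, values in sales.items() if cell[factor_index] == level for s in values]
    return mean(scores)


for (age, incentive), value in cell_means.items():
    print(f"{age:<10} {incentive:<12} mean sales = {value}")

print("College overall:", round(marginal_mean(0, "College"), 2))
print("Elementary overall:", round(marginal_mean(0, "Elementary"), 2))
```

Comparing the marginal means for age and for incentive corresponds to the two “main effects” a two-way ANOVA tests.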

Study Type 1: Confidence Intervals Study Type 2: t-test Study Type 3: One-way Analysis of Variance (ANOVA) Study Type 4: Two-way Analysis of Variance (ANOVA) Study Type 5: Correlation

Study Type 5: Correlation plots the relationship between two continuous / quantitative variables. Pretty much all correlations are “quasi-experimental”. Neutral relative to causality, but especially useful for predictions. Example: the relationship between the amount of money spent on advertising and the amount of money made in sales. In correlation, both variables are quantitative. Describe the strength and direction of the correlation; in this case, positive/strong. Graphing correlations uses scatterplots (not bar graphs).
[Scatterplot: dollars spent on advertising versus dollars in sales, showing a positive correlation]
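Strength and direction are usually summarized with Pearson's r, which runs from -1 to +1. Here is a minimal sketch using invented advertising and sales figures (in thousands of dollars); the data are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical figures in thousands of dollars (invented data)
advertising = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 22, 27]

r = pearson_r(advertising, sales)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.3f} ({direction})")
```

A value close to +1, as here, is what the slide calls a strong positive correlation.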

Study Type 1: Confidence Intervals Study Type 2: t-test Study Type 3: One-way Analysis of Variance (ANOVA) Study Type 4: Two-way Analysis of Variance (ANOVA) Study Type 5: Correlation Study Type 6: Simple and Multiple regression

Study Type 6: Regression: Using the correlation to predict the value of one variable based on its relationship with the other variable. The predicted variable goes on the “Y” axis and is called the dependent variable. The predictor variable goes on the “X” axis and is called the independent variable.
[Graph: expenses per year versus yearly income. If you spend this much, you probably make this much.]

Angelina Jolie buys Brad Pitt a $24 million heart-shaped island for his 50th birthday. Dustin spends $12 for his own birthday.
[Graph: expenses per year versus yearly income. If you spend this much, you probably make this much.]

Study Type 6: Regression: Using the correlation to predict the value of one variable based on its relationship with the other variable. The predicted variable goes on the “Y” axis and is called the dependent variable. The predictor variable goes on the “X” axis and is called the independent variable. Multiple regression will use multiple independent variables to predict the dependent variable.
[Graph: Dependent Variable (Predicted): yearly income, “you probably make this much”. Independent Variable 1 (Predictor): expenses per year, “if you spend this much”. Independent Variable 2 (Predictor): “if you save this much”.]
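For the simple (one-predictor) case, the prediction comes from a best-fitting straight line. The sketch below fits that line by least squares to invented expense and income figures (in thousands of dollars per year) and then predicts income for a new level of spending. The data and the `predict_income` helper are hypothetical, and multiple regression is not shown.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    mean_x, mean_y = mean(xs), mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical figures in thousands of dollars per year (invented data)
expenses = [20, 30, 40, 50, 60]  # predictor, X axis (independent variable)
income = [25, 38, 47, 62, 73]    # predicted, Y axis (dependent variable)

slope, intercept = fit_line(expenses, income)


def predict_income(spending):
    """If you spend this much, you probably make this much."""
    return intercept + slope * spending


print(f"income = {intercept:.2f} + {slope:.2f} * expenses")
print(f"Spending 45 predicts income of about {predict_income(45):.1f}")
```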

Study Type 1: Confidence Intervals Study Type 2: t-test Study Type 3: One-way Analysis of Variance (ANOVA) Study Type 4: Two-way Analysis of Variance (ANOVA) Study Type 5: Correlation Study Type 6: Simple and Multiple regression Study Type 7: Chi Square

Study Type 7: Chi-squared is used to evaluate whether the differences found in your sample match what you would expect to find. It is used with nominal or ordinal data when we simply count how many participants (or objects or events) fall into each category. We are comparing frequencies, not means.
What is your favorite type of restaurant? (Do university students show the same results as the general population?)
What is your political affiliation? (Do the proportions change when the country is at war or otherwise stressed?)
Do more children, teens or adults play video games?
What is the most popular ride at Disneyland? (Are all the rides at Disneyland equally popular?) Just count how many people ride each one: a. Dumbo b. Small World c. Space Mountain d. Splash Mountain. We could gather this data using clickers.
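For the Disneyland question, the comparison is between the counts actually observed and the counts expected if all four rides were equally popular. A minimal sketch follows; the rider counts are invented for illustration, and 7.815 is the standard table value for chi-squared at p < 0.05 with 3 degrees of freedom.

```python
# Hypothetical counts of people choosing each ride (invented data)
observed = {
    "Dumbo": 20,
    "Small World": 25,
    "Space Mountain": 35,
    "Splash Mountain": 40,
}

total = sum(observed.values())
# If all rides were equally popular, each would get the same share
expected = total / len(observed)

# Sum of (observed - expected)^2 / expected across the categories
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())
df = len(observed) - 1

# Table value: critical chi-squared for alpha = 0.05 with df = 3
CRITICAL_CHI_SQUARE = 7.815
significant = chi_square > CRITICAL_CHI_SQUARE

print(f"chi-squared({df}) = {chi_square:.2f}, significant at p < 0.05: {significant}")
```

Note that only frequencies go in; no means are computed anywhere.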

Connecting intentions of studies with experimental methodologies, appropriate statistical analyses, and appropriate graphs:
Study Type 1: Confidence Intervals
Study Type 2: t-test
Study Type 3: One-way ANOVA
Study Type 4: Two-way ANOVA
Study Type 5: Correlation
Study Type 6: Regression
Study Type 7: Chi-squared
Remember, when p < 0.05 we say that the results are statistically significant: there is a “real” difference (not just due to chance).

Let’s try one. What type of analysis is this? Marietta is a manager of a movie theater. She wanted to know whether there is a difference in concession sales for afternoon (matinee) movies vs. evening movies. She took a random sample of 25 purchases from the matinee movie (mean of $7.50) and 25 purchases from the evening show (mean of $10.50). She compared these two means.
This is an example of a _____. a. correlation b. t-test c. one-way ANOVA d. two-way ANOVA
Answer: t-test
This is an example of a: a. between participant design b. within participant design c. mixed participant design
Answer: Between

Let’s try one. What type of analysis is this? Marietta is a manager of a movie theater. She wanted to know whether there is a difference in concession sales for people of all ages. She simply measured their age and how much they spent on treats.
This is an example of a _____. a. correlation b. t-test c. one-way ANOVA d. two-way ANOVA
Answer: Correlation

Let’s try one. What type of analysis is this? Marietta is a manager of a movie theater. She wanted to know whether there is a difference in concession sales for afternoon (matinee) movies and evening movies. She took a random sample of 25 purchases from the matinee movie (mean of $7.50) and 25 purchases from the evening show (mean of $10.50). Which of the following would be the appropriate graph for these data?
[Answer options a to d: candidate graphs of concession purchase by movie time (matinee, evening)]
Two means: t-test

Let’s try one. What type of analysis is this? Gabriella is a manager of a movie theater. She wanted to know whether there is a difference in concession sales between teenage couples and middle-aged couples. She also wanted to know whether time of day makes a difference (matinee versus evening shows). She gathered the data for a sample of 25 purchases from each pairing.
a. correlation b. t-test c. one-way ANOVA d. two-way ANOVA
Answer: Two-way ANOVA
What are the two IVs? What are the levels of each?

Let’s try one. What type of analysis is this? Gabriella is a manager of a movie theater. She wanted to know whether there is a difference in concession sales between teenage couples and middle-aged couples. She also wanted to know whether time of day makes a difference (matinee versus evening shows). She gathered the means for a sample of 25 purchases from each pairing.
[Answer options a to d: candidate graphs of concession purchase by movie time (matinee, evening) and age group (older couples, teenagers)]
Four means

Let’s try one. What type of analysis is this? A pharmaceutical firm tested whether fish-oil capsules taken daily decrease cholesterol. They measured cholesterol levels for 30 male subjects, had them take the fish oil daily for 2 months, and then tested their cholesterol levels again. Then they compared the mean cholesterol before and after taking the capsules.
This is an example of a _____. a. correlation b. t-test c. one-way ANOVA d. two-way ANOVA
Answer: t-test
This is an example of a: a. between participant design b. within participant design c. mixed participant design
Answer: Within

Let’s try one. What type of analysis is this? Elaina was interested in the relationship between grade point average and starting salary. She recorded GPA and starting salary for 100 students and looked to see if there was a relationship.
This is an example of a _____. a. correlation b. t-test c. one-way ANOVA d. two-way ANOVA
Answer: Correlation
[Scatterplot: relationship between GPA and starting salary]

Let’s try one. What type of analysis is this? An automotive firm tested whether driving styles can affect gas efficiency in their cars. They observed 100 drivers and found there were four general driving styles. They recruited a sample of 100 drivers, all of whom drove with one of these 4 driving styles. Then they asked all 100 drivers to use the same model car for a month and recorded their gas mileage. Then they compared the mean mpg for each driving style.
This is an example of a _____. a. correlation b. t-test c. one-way ANOVA d. two-way ANOVA
Answer: One-way ANOVA
This is an example of a: a. between participant design b. within participant design c. mixed participant design
Answer: Between
This is an example of a: a. true experimental design b. quasi-experimental design c. mixed design
Answer: Quasi-experiment

Thank you! See you next time!!