Comparing Groups April 6-7, 2017 CS 160 – Section 10.

Outline
- Summary measures
- Normal distribution
- Comparison of means
- Chi-square and its uses
- Within-subjects and between-subjects study types

Summary Measures
Where is the center of these data?
- Mean
- Median
- Mode
How spread out are these data?
- Variance
- Standard deviation

Measures of central tendency: refresher
- Mean = average of all values
- Median = middle value when the values are ordered
- Mode = most frequently occurring value

Which one to use?
- Mean: easy to calculate, but distorted by outliers and skewed data
- Median: uses less information, but is not affected by outliers
- Mode: more useful for categorical or ordinal data
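To make the trade-off concrete, here is a minimal sketch using Python's standard `statistics` module. The task times and survey answers are made-up illustration data, not values from the slides.

```python
from statistics import mean, median, mode

# Hypothetical task-completion times in seconds; 120 is a deliberate outlier.
times = [12, 14, 14, 15, 16, 120]

print("mean:  ", round(mean(times), 2))  # pulled upward by the outlier
print("median:", median(times))          # stays with the bulk of the data
print("mode:  ", mode(times))            # most frequent value

# The mode also works for categorical data, where a mean is undefined.
answers = ["agree", "neutral", "agree", "disagree", "agree"]
print("mode of answers:", mode(answers))
```

With one extreme value in the list, the mean lands far above every typical observation while the median barely moves.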

Variance
Average 'spread' or 'deviation' of values around the mean of the observations
[Figure: number line from 2 to 38 with the mean marked]

Calculating Variance
- Measure the difference of each observation from the mean
- Square each of these differences
- Add them up
- Divide by (total number of observations - 1)

Standard Deviation (SD)
- Square root of the variance
- Why take the square root? Because the SD then has the same units as the original observations

Example of SD calculation
Take the diastolic BP of any 4 people: 80, 74, 95, 96
Mean = 86.25

Example of SD calculation
Subject   BP    Difference from mean   Square of difference
1         80     -6.25                  39.06
2         95      8.75                  76.56
3         96      9.75                  95.06
4         74    -12.25                 150.06
Total    345                           360.75
Mean = 345 / 4 = 86.25
Variance (s²) = 360.75 / (4 - 1) = 120.25
SD (s) = √120.25 ≈ 10.97
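The same calculation can be checked in a few lines of Python. This is a sketch that follows the four steps listed under "Calculating Variance" and then cross-checks the result against the standard library's `statistics.variance`, which also divides by n - 1.

```python
from math import sqrt
from statistics import mean, variance

bp = [80, 95, 96, 74]  # diastolic BP readings from the example

m = mean(bp)                                           # step 0: the mean
squared_diffs = [(x - m) ** 2 for x in bp]             # steps 1-2: difference, squared
sample_variance = sum(squared_diffs) / (len(bp) - 1)   # steps 3-4: add up, divide by n - 1
sample_sd = sqrt(sample_variance)

print("mean:", m)
print("sum of squared differences:", sum(squared_diffs))
print("variance:", sample_variance)
print("SD:", round(sample_sd, 2))

# Cross-check against the standard library's sample variance.
assert abs(sample_variance - variance(bp)) < 1e-9
```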

Using SD
- Determine the degree of dispersion of values around the mean in a given dataset
- Compare various datasets of a variable
  - Smaller SD → more homogeneous data
  - Larger SD → more variability in data
- A larger SD may be masking different populations
[Figure: two distributions, one with a smaller SD and one with a larger SD]

Frequency Polygons
Same as a histogram, but uses a line connecting the tops of all the bars instead of the bars themselves
Note: Hypothetical data, do not quote

Normal Distribution
- Symmetrical
- Bell shaped (unimodal)
- Mean = μ, SD = σ
[Figure: bell curve of frequency against x, centered at μ]

Normal Distribution Example
Typing speed of young adults: μ = 100 wpm, SD = 10 wpm
[Figure: bell curve of frequency against x, centered at μ = 100 wpm]

Using SD with Normal Distribution
- About 68% of values lie between μ-σ and μ+σ
- About 95% of values lie between μ-2σ and μ+2σ
- Over 99% of values lie between μ-3σ and μ+3σ
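These percentages can be reproduced from the normal distribution itself. The sketch below uses the identity that the share of a normal distribution within k SDs of the mean is erf(k / √2), and then applies it to the typing-speed example (μ = 100 wpm, SD = 10 wpm); the helper name `share_within` is made up for illustration.

```python
from math import erf, sqrt
from statistics import NormalDist

def share_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")

# Typing-speed example: share of young adults typing between 80 and 120 wpm,
# i.e. within 2 SDs of the mean.
typing = NormalDist(mu=100, sigma=10)
share_80_to_120 = typing.cdf(120) - typing.cdf(80)
print(f"80 to 120 wpm: {100 * share_80_to_120:.1f}%")
```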

Comparing means
μ0 = 60 wpm (hypothesized population mean), x̄ = 100 wpm (sample mean)

Comparing means
When population SD is known: z-table → p-value
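As a rough sketch of what the z-table lookup does, the code below computes a one-sample z statistic and its two-sided p-value with the standard library. The function name and all input numbers (sample mean 66 wpm, population SD 15 wpm, sample size 25) are assumptions chosen for illustration; the slides do not give an SD or sample size for this comparison.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-sided p-value) for a one-sample z-test."""
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Area in both tails of the standard normal curve beyond |z|.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical inputs, not taken from the slides.
z, p = one_sample_z_test(sample_mean=66, pop_mean=60, pop_sd=15, n=25)
print(f"z = {z:.2f}, p = {p:.4f}")
```

When the population SD is unknown, the sample SD is used instead and the p-value comes from the t distribution, which the standard library does not provide.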

Comparing means
When population SD is unknown: t-table → p-value

Interpreting p-value
- Mathematically, p-value = area in the tails of the probability distribution curve
- = probability of seeing a difference in means at least this large by chance alone, if there were no real difference
- If p-value < significance level, consider the difference significant
- Typically, the 0.05 or 0.01 level is used

Comparing means
μ0 = 60 wpm, x̄ = 100 wpm, α = 0.05 (central 95% of the curve)
p = 0.0023 < α, so the difference is significant

Comparing means
μ0 = 60 wpm, x̄ = 100 wpm, α = 0.05 (central 95% of the curve)
p = 0.07 > α, so the difference is not significant

Comparing two means
Compare μ1 and μ2
Both populations should have a normal distribution and equal variance

Chi-square test (For Count data)

Voice Assistance vs. Accidents
                     Uses voice assistance
Has had accidents    Yes      No       Total
Yes                  76       277      353
No                   153      640      793
Total                229      917      1146

Question
Is voice assistance associated with having had an accident?
OR
Are accidents more common in drivers using voice assistance?

Make Hypotheses…
- Null hypothesis: the proportion of accidents among drivers using voice assistance is the same as among those who do not use it
- Alternative hypothesis: the proportion of accidents differs between the two groups

How do we do the chi-square test?
1. Make a 2 x 2 table of observed frequencies
2. Calculate expected frequencies for each cell, assuming no difference between the two groups
3. Calculate the difference between observed and expected values using the chi-square formula
4. Convert chi-square to a p-value
5. Compare the p-value with the significance level chosen for the test

Example: Observed frequencies
                     Uses voice assistance
Has had accidents    Yes      No       Total
Yes                  76       277      353
No                   153      640      793
Total                229      917      1146

Calculating Expected Frequencies
From the observed table: 353 drivers with accidents, 793 without, 1146 drivers in all
Significance level α = 0.05
Overall proportion of accidents = 353 / 1146 = 30.8%
Proportion of no accidents = 793 / 1146 = 69.2%

Calculating Expected Frequencies
Voice assistance (VA) users: 229 drivers
Expected number of accidents among VA users = 30.8% * 229 = 70.5
Expected number of non-accidents among VA users = 69.2% * 229 = 158.5

Calculating Expected Frequencies
VA non-users: 917 drivers
Expected number of accidents among VA non-users = 30.8% * 917 = 282.5
Expected number of non-accidents among VA non-users = 69.2% * 917 = 634.5

Expected frequencies
                     Uses voice assistance
Has had accidents    Yes      No       Total
Yes                  70.5     282.5    353
No                   158.5    634.5    793
Total                229      917      1146
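The expected counts can be reproduced directly from the row and column totals: each expected cell is (row total * column total) / grand total, which is the same as applying the overall accident proportion to each group. A minimal sketch, with variable names chosen for illustration:

```python
# Observed counts from the example.
# Rows: has had accidents (yes, no). Columns: uses voice assistance (yes, no).
observed = [[76, 277],
            [153, 640]]

row_totals = [sum(row) for row in observed]        # accidents yes / no
col_totals = [sum(col) for col in zip(*observed)]  # VA users / non-users
grand_total = sum(row_totals)

# Expected count under "no difference between groups".
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(value, 1) for value in row])
```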

Just for comparison
Observed
                     Uses voice assistance
Has had accidents    Yes      No       Total
Yes                  76       277      353
No                   153      640      793
Total                229      917      1146
Expected
                     Uses voice assistance
Has had accidents    Yes      No       Total
Yes                  70.5     282.5    353
No                   158.5    634.5    793
Total                229      917      1146

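Calculating differences
χ² = Σ (|O - E| - 0.5)² / E, summed over the four cells (O = observed, E = expected; the 0.5 is Yates' continuity correction for 2 x 2 tables)
For our example, χ² = 0.630
Without the continuity correction, χ² = Σ (O - E)² / E ≈ 0.764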

Convert chi-square to p-value
- Use a chi-square distribution table to look up the p-value corresponding to χ² = 0.630 (1 degree of freedom for a 2 x 2 table)
- OR use Excel to calculate the exact p-value for χ² = 0.630
- Online calculator: http://www.socscistatistics.com/tests/chisquare/Default2.aspx

Compare p-value to significance level
- Our calculated p-value = 0.43
- Significance level (α) = 0.05
- So p-value > 0.05
- There is no significant difference in the proportion of accidents between VA users and non-users
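The whole procedure for the voice-assistance example can be sketched with the standard library alone. The function names are made up for illustration. Two points are worth flagging: the reported χ² = 0.630 is reproduced only when Yates' continuity correction is applied (the uncorrected sum is about 0.764), and the p-value shortcut `erfc(sqrt(chi2 / 2))` is valid only for 1 degree of freedom, which is the case for a 2 x 2 table.

```python
from math import erfc, sqrt

def chi_square_2x2(observed, yates=False):
    """Chi-square statistic for a table of counts, optionally Yates-corrected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            chi2 += diff ** 2 / expected
    return chi2

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))

observed = [[76, 277], [153, 640]]
alpha = 0.05

chi2_corrected = chi_square_2x2(observed, yates=True)
chi2_plain = chi_square_2x2(observed)
p = p_value_df1(chi2_corrected)

print(f"chi-square (Yates): {chi2_corrected:.3f}, p = {p:.2f}")
print(f"chi-square (uncorrected): {chi2_plain:.3f}")
print("significant" if p < alpha else "not significant")
```

Either version of the statistic leads to the same conclusion here, since both p-values are well above 0.05.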

Comparing multiple interfaces Within vs Between Subject studies

Within Subjects design
- Each participant is tested on all conditions/interface alternatives
- For example, 4 participants first try Interface A
- Then the same 4 participants try Interface B
- You measure performance in both phases

Between Subjects design
- Participants are randomly assigned to groups
- Each participant is tested on only one condition/interface
- For example, 2 participants try Interface A
- And the other 2 participants try Interface B
- You measure performance for both groups
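Random assignment to groups can be done with a shuffle followed by dealing participants out in turn. This is only a sketch: the function name, the participant labels, and the seed are illustrative assumptions, and the seed is there just to make the example repeatable.

```python
import random

def assign_between_subjects(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups

# Hypothetical participant labels; 4 participants split across 2 interfaces.
groups = assign_between_subjects(["P1", "P2", "P3", "P4"], ["A", "B"], seed=160)
print(groups)
```

Dealing out in turn keeps the group sizes as equal as possible, while the shuffle decides who ends up in which group.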

Learning effects
- Drawback of the within-subjects study design
- Participants learn or gain experience because of the order in which the interfaces are presented
- E.g. trying out Interface A might give them experience that improves their performance on the following Interface B

Counter-balancing (2 levels)
A solution for learning effects is counter-balancing: showing the interfaces in a different order to different participants
Counter-balancing for two levels (two interfaces):
- Group 1: A, then B
- Group 2: B, then A

Counter-balancing (3 levels)
- Group 1: A, then B, then C
- Groups 2 and 3: the same interfaces in different orders
Further reading: http://www.yorku.ca/mack/RN-Counterbalancing.html
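One common way to build the group orders is a simple rotation (a Latin square), in which every interface appears once in every position. The sketch below is an assumption about how the orders might be generated, not necessarily the arrangement shown on the slide, and the function name is made up. A plain rotation balances position only; it does not balance which interface immediately precedes which.

```python
def rotation_orders(conditions):
    """One presentation order per group; each condition appears once in every position."""
    n = len(conditions)
    return [[conditions[(start + k) % n] for k in range(n)] for start in range(n)]

for group, order in enumerate(rotation_orders(["A", "B", "C"]), start=1):
    print(f"Group {group}: {' '.join(order)}")
```

Called with two conditions, the same function gives the two-level case: A then B for one group, B then A for the other.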

Comparison of both types
Considerations          Between    Within
Sample size needed      Large      Small
Carryover effects       No         Yes
Impact on attitudes
Comparative judgment               Better
Study duration          Shorter    Longer
Adapted from: https://measuringu.com/between-within/