Analysis and Interpretation: Exposition of Data

Chapter 13: Analysis and Interpretation: Exposition of Data

Descriptive Statistics: Why Summarize Data?
A clear presentation of the data is necessary because it allows the reader to critically evaluate the data you are reporting.
Two common reasons we summarize data: to clarify, at a glance, what patterns were observed in a data set, and to be concise.
Before we summarize data, LOOK AT the data to identify possible omissions, errors, or other anomalies. Advantage: this makes sure all errors have been removed before further analysis is conducted.

Descriptive Statistics: Why Summarize Data?
Two common ways to analyze a data set: describe it, and make decisions about how to interpret it.
Descriptive statistics – Procedures used to summarize, organize, and make sense of a set of scores or observations, typically presented graphically, in tabular form (in tables), or as summary statistics (single values).
Descriptive statistics include the mean, median, mode, variance, and frequencies, and can be used only with quantitative, not qualitative, data.

Dealing with Outliers: To Drop or Not to Drop?
There is an argument to be made for retaining outliers: by virtue of simply existing, they are representative of the population from which they are drawn. However…

Frequencies and Distributions: Tables and Graphs
Frequency – A value that describes the number of times, or how often, a category, score, or range of scores occurs.
Tables and graphs of frequency data can make the presentation and interpretation of data clearer.

Frequencies and Distributions: Tables and Graphs
Frequency distribution table – A tabular summary display for a distribution of data, organized or summarized in terms of how often a category, score, or range of scores occurs.
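The slides do not include code; as an illustration only, here is a minimal Python sketch of a frequency distribution table, using invented quiz scores and only the standard library.

```python
# Illustrative only: build a simple frequency distribution table
# for a small set of invented scores.
from collections import Counter

scores = [3, 4, 4, 5, 5, 5, 6, 7, 7, 9]

freq = Counter(scores)            # how often each score occurs
total = sum(freq.values())

print("Score  f   Rel. f")
for score in sorted(freq):
    f = freq[score]
    print(f"{score:>5}  {f:<3} {f / total:.2f}")
```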

Frequencies and Distributions: Tables and Graphs
Frequency distribution graphs:
Histogram: Graphical display used to summarize the frequency of continuous data that are distributed in numeric intervals, using connected bars.
Discrete data: bar chart (bar graph) or pie chart.

Measures of Central Tendency
Central tendency – Statistical measures for locating a single score that tends to be near the center of a distribution and is more representative or descriptive of all scores in a distribution.
Although we lose some meaning anytime we reduce a set of data to a single score, statistical measures of central tendency ensure that the single score meaningfully represents a data set.
Three measures of central tendency: the mean, the median, and the mode.

Measures of Central Tendency
The mean (M) – The sum of all scores (Σx) divided by the number of scores summed (n) in a sample, or in a subset of scores selected from a larger population.
Used to describe data that are normally distributed and measured on an interval or ratio scale.
Normal distribution: A theoretical distribution with data that are symmetrically distributed around the mean, median, and mode.

Measures of Central Tendency
The median – The middle value in a distribution of data listed in numeric order.
Used to describe data that have a skewed distribution and are measured on an ordinal scale.
Skewed distribution: A distribution of scores that includes outliers, or scores that fall substantially above or below most other scores in a data set.
The mode – The value in a data set that occurs most often or most frequently.
Can be used to describe data in any distribution, so long as one or more scores occur most often; the mode is rarely used as the sole way to describe data.
Typically used to describe data in a modal distribution and measured on a nominal scale.
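As a hedged illustration (not part of the original slides), the three measures can be computed with Python's standard statistics module; the data are invented and include one outlier to show why the median is preferred for skewed data.

```python
# Illustrative data with an outlier (40); all values are invented.
import statistics

data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 40]

print("mean  :", statistics.mean(data))    # 8.0, pulled upward by the outlier
print("median:", statistics.median(data))  # 5.0, robust to the outlier
print("mode  :", statistics.mode(data))    # 5, the most frequent score
```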

Measures of Central Tendency

Measures of Variability
Measures of central tendency inform us only of scores that tend to be near the center of a distribution; they do not inform us of all other scores in a distribution.
The most common procedure for locating all other scores is to identify the mean and then compute the variability of scores from the mean.
Variability: A measure of the dispersion or spread of scores in a distribution; it ranges from 0 to +∞ and can never be negative.
Two key measures of variability are the variance and the standard deviation.

Measures of Variability
The variance
Sample variance (s²): A measure of variability for the average squared distance that scores in a sample deviate from the sample mean. A deviation is a measure of distance.
Sum of squares (SS): The sum of the squared deviations of scores from the mean; it is the value placed in the numerator of the sample variance formula: SS = Σ(x − M)².
To find the average squared distance of scores from the mean, divide by the number of scores. However, dividing by the number of scores (the sample size) will underestimate the variance of scores in the population. The solution is to divide by one less than the number of scores or deviations summed.
Degrees of freedom (df) for sample variance: one less than the sample size, df = n − 1.
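A minimal sketch of the computation just described, assuming a small set of invented scores: compute the mean, the sum of squares, and divide by df = n − 1.

```python
# Sample variance by hand: SS = Σ(x − M)², s² = SS / (n − 1). Data are invented.
data = [4, 6, 8, 10, 12]

n = len(data)
M = sum(data) / n                       # sample mean
SS = sum((x - M) ** 2 for x in data)    # sum of squared deviations
df = n - 1                              # degrees of freedom
s2 = SS / df                            # sample variance

print(f"M = {M}, SS = {SS}, df = {df}, s2 = {s2}")
```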

Measures of Variability
An advantage of the sample variance is that its interpretation is clear: the larger the sample variance, the farther that scores deviate from the mean on average.
One limitation is that the average distance of scores from the mean is squared. To find the distance, and not the squared distance, of scores from the mean, we need a new measure of variability called the standard deviation.

Measures of Variability
The standard deviation
Sample standard deviation (SD): A measure of variability for the average distance that scores in a sample deviate from the sample mean; it is computed by taking the square root of the sample variance.
The SD is most informative for the normal distribution: for a normal distribution, over 99% of all scores will fall within three SDs of the mean.
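Continuing the sketch above with the same invented data: the sample standard deviation is the square root of the sample variance, and statistics.stdev uses the same n − 1 denominator.

```python
import math
import statistics

data = [4, 6, 8, 10, 12]

s2 = statistics.variance(data)     # sample variance (divides by n − 1)
SD = math.sqrt(s2)                 # sample standard deviation

print(SD, statistics.stdev(data))  # both print the same value (≈ 3.16)
```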

Measures of Variability Empirical rule: A rule for normally distributed data that states that at least 99.7% of data fall within three SDs of the mean; at least 95% of data fall within two SDs of the mean; at least 68% of data fall within one SD of the mean

Graphing Means and Correlations
Graphing group means: we can graph a mean for one or more groups using a graph with lines or bars to represent the means.
A bar graph is used when the groups on the x-axis are represented on a nominal or ordinal scale.
A line graph is used when the groups on the x-axis are represented on an interval or ratio scale.
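As an illustration only (matplotlib assumed available; the group labels and values are invented), a bar graph of group means with SD error bars for groups on a nominal scale might look like this:

```python
import matplotlib.pyplot as plt

groups = ["Control", "Low dose", "High dose"]   # nominal groups -> bar graph
means = [4.2, 5.1, 6.8]                         # invented group means
sds = [0.9, 1.1, 1.3]                           # invented standard deviations

plt.bar(groups, means, yerr=sds, capsize=5)
plt.ylabel("Mean score")
plt.ylim(bottom=0)   # keep the y-axis starting at 0 (see the ethics slide below)
plt.show()
```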

Graphing Means and Correlations
Graphing correlations
Scatterplot: A graphical display of discrete data points (x, y) used to summarize the relationship between two variables.
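A matching sketch for a scatterplot (matplotlib assumed; the (x, y) points are invented):

```python
import matplotlib.pyplot as plt

hours_studied = [1, 2, 2, 3, 4, 5, 6, 7]        # invented x values
exam_score = [52, 55, 60, 58, 65, 70, 72, 78]   # invented y values

plt.scatter(hours_studied, exam_score)
plt.xlabel("Hours studied")
plt.ylabel("Exam score")
plt.show()
```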

Ethics in Focus: Deception Due to the Distortion of Data
Presenting data can be an ethical concern when the data are distorted in any way. When a graph is distorted, it can deceive the reader into thinking differences exist when in truth differences are negligible (Frankfort-Nachmias & Leon-Guerrero, 2006).
Three common distortions to look for in graphs are:
Displays with an unlabeled axis
Displays with one axis altered in relation to the other axis
Displays in which the vertical axis (y-axis) does not begin at 0

Ethics in Focus: Deception Due to the Distortion of Data
Distortion can also occur when presenting summary statistics. Two common distortions to look for with summary statistics: data are omitted, or differences are described in a way that gives the impression of larger differences than are really meaningful in the data.
Some data should naturally be reported together: means and SDs should be reported together; correlations and proportions should be reported with sample size; standard error should be reported anytime data are recorded in a sample.

Chapter 14: Analysis and Interpretation: Making Decisions about Data

Inferential Statistics: What Are We Making Inferences About?
Inferential statistics – Procedures that allow researchers to infer or generalize observations made with samples to the larger population from which they were selected.
Inferential statistics allow us to use data measured in a sample to draw conclusions about the larger population of interest, which would not otherwise be possible.

Inferential Statistics: What Are We Making Inferences About?
Null hypothesis significance testing (NHST)
Inferential statistics include a diverse set of tests of statistical significance, more formally known as NHST.
To use NHST, we begin by stating a null hypothesis. Null hypothesis, stated as the null: A statement about a population parameter, such as the population mean, that is assumed to be true but contradicts the research hypothesis.
After we state a null hypothesis, we set an alpha level and then determine the critical value of our test statistic, with which we decide to retain or reject the null hypothesis.

Inferential Statistics: What Are We Making Inferences About?
Level of significance, or significance level (alpha level): A criterion of judgment upon which a decision is made regarding the value stated in a null hypothesis. The level of significance for most studies in the behavioral sciences is .05, or 5%.
When the likelihood of obtaining a sample outcome is less than 5% if the null hypothesis were true, we reject the null hypothesis.
When the likelihood of obtaining a sample outcome is greater than 5% if the null hypothesis were true, we retain the null hypothesis.

Inferential Statistics: What Are We Making Inferences About?
To determine the likelihood or probability of obtaining a sample outcome, if the value stated in the null hypothesis is true, we compute a test statistic.
Test statistic: A mathematical formula that allows researchers to determine the likelihood of obtaining sample outcomes if the null hypothesis were true. The value of the test statistic can be used to make a decision regarding the null hypothesis and is used to find the p value.
Examples of test statistics include those already introduced, such as the correlation coefficient (Chapter 8), the t tests (Chapters 5, 10, and 11), and the analysis of variance tests (Chapters 10–12).

Inferential Statistics: What Are We Making Inferences About?
p value: The probability of obtaining a sample outcome if the value stated in the null hypothesis were true. The p value is compared to the level of significance to make a decision about a null hypothesis, and is interpreted as error.
When p > .05, we retain the null hypothesis and state that an effect or difference failed to reach significance.
When p < .05, we reject the null hypothesis and state that an effect or difference reached significance.
Significance, or statistical significance: Describes a decision made concerning a value stated in the null hypothesis.
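A hedged sketch of this decision rule with an independent-samples t test, assuming SciPy is available; both samples are invented.

```python
from scipy import stats

group_a = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 4.9, 5.4]   # invented sample 1
group_b = [6.3, 6.8, 7.1, 6.0, 6.9, 7.4, 6.5, 6.7]   # invented sample 2

t, p = stats.ttest_ind(group_a, group_b)   # test statistic and p value
alpha = 0.05                               # level of significance

print(f"t = {t:.2f}, p = {p:.4f}")
if p < alpha:
    print("Reject the null hypothesis: the difference reached significance.")
else:
    print("Retain the null hypothesis: the difference failed to reach significance.")
```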

Types of Error and Power
Anytime we select a sample from a population, there is some probability of sampling error, inasmuch as p is some value greater than 0.
There are four decision alternatives regarding the truth and falsity of the decision we make about a null hypothesis.

Parametric Testing: Applying the Decision Tree
Parametric tests – Significance tests that are used to test hypotheses about parameters in a population in which the data are normally distributed and measured on an interval or ratio scale of measurement.

Nonparametric Tests: Applying the Decision Tree
Nonparametric tests – Significance tests that are used to test hypotheses about data that can have any type of distribution and to analyze data on a nominal or ordinal scale of measurement.
Often called distribution-free tests because the distribution in the population can have any shape.
The reason that the variance and shape of a distribution in the population do not matter is that a test statistic for nonparametric tests does not measure variance to determine significance.

Nonparametric Tests: Applying the Decision Tree
Tests for ordinal data
Choosing an appropriate nonparametric test largely depends on how participants were observed (between- or within-subjects) and the number of groups in the design.
Nonparametric tests are required in two common situations:
1. The data are on an interval or ratio scale but are not normally distributed. In these situations, we convert the data to ranks (ordinal) and use the nonparametric alternative test to analyze the data.
2. The data are recorded as ranks, in which case the variability of ranks (ordinal data) is not meaningful, so a nonparametric test is required.
These tests can be readily computed in SPSS (see the sketch below for an alternative).
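The slides point to SPSS; as an illustrative alternative only, SciPy provides nonparametric tests for ranked data, such as a Mann-Whitney U test for two between-subjects groups (data invented).

```python
from scipy import stats

group_a = [3, 4, 2, 5, 4, 3, 4]   # invented ordinal-style ratings, group A
group_b = [5, 6, 5, 7, 6, 6, 5]   # invented ordinal-style ratings, group B

u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u}, p = {p:.4f}")
```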

Nonparametric Tests: Applying the Decision Tree

Nonparametric Tests: Applying the Decision Tree
Tests for nominal (categorical) data
Nonparametric tests can be used to analyze nominal data: we count the frequency of occurrence in two or more categories for one or two factors, and variance is meaningless.
Chi-square goodness-of-fit test: Statistical procedure used to determine whether observed frequencies at each level of one categorical variable are similar to or different from the frequencies expected.
Chi-square test for independence: Statistical procedure used to determine whether frequencies observed at the combination of levels of two categorical variables are similar to or different from the frequencies expected.
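A hedged SciPy sketch of both chi-square tests, with invented frequency counts:

```python
from scipy import stats

# Goodness of fit: observed vs. expected frequencies for one categorical variable.
observed = [18, 22, 30, 30]
expected = [25, 25, 25, 25]
chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"goodness of fit: chi2 = {chi2:.2f}, p = {p:.4f}")

# Test for independence: a 2 x 2 table of observed counts for two categorical variables.
table = [[30, 10],
         [20, 40]]
chi2, p, df, expected_counts = stats.chi2_contingency(table)
print(f"independence: chi2 = {chi2:.2f}, df = {df}, p = {p:.4f}")
```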

Nonparametric Tests: Applying the Decision Tree

Effect Size: How Big Is an Effect in the Population?
Effect – A mean difference or discrepancy between what was observed in a sample and what was expected to be observed in the population.
The decision using NHST only indicates whether an effect exists; it does not inform us of the size of that effect in the population.
Effect size – A statistical measure of the size or magnitude of an observed effect in a population, which allows researchers to describe how far scores shifted in a population, or the percent of variance in a DV that can be explained by the levels of a factor.
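The slides do not name a specific measure here; as one common illustration, Cohen's d expresses a mean difference in standard-deviation units (pooled SD). The samples below are invented.

```python
import statistics

group_a = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 4.9, 5.4]   # invented sample 1
group_b = [6.3, 6.8, 7.1, 6.0, 6.9, 7.4, 6.5, 6.7]   # invented sample 2

n1, n2 = len(group_a), len(group_b)
v1, v2 = statistics.variance(group_a), statistics.variance(group_b)
pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5

d = (statistics.mean(group_b) - statistics.mean(group_a)) / pooled_sd
print(f"Cohen's d = {d:.2f}")   # how far scores shifted, in SD units
```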

Effect Size: How Big Is an Effect in the Population?

Effect Size: How Big Is an Effect in the Population?
To interpret, we need to identify the factor with the smaller degrees of freedom, df_smaller. The df for a factor are the number of categories for that factor, minus one.

Estimation: What Are the Possible Values of a Parameter?
As an alternative to NHST, we can learn more about a parameter without ever stating a null hypothesis. This approach requires only that we set limits for the possible values of a population parameter within which it is likely to be contained.
Estimation: A statistical procedure in which a sample statistic is used to estimate the value of an unknown population parameter. Two types of estimation are point estimation and interval estimation.
Point estimation: A sample statistic (e.g., a sample mean) that is used to estimate a population parameter (e.g., a population mean).

Estimation: What Are the Possible Values of a Parameter?
Interval estimation, called the confidence interval (CI): The interval or range of possible values within which an unknown population parameter is likely to be contained.
Level of confidence: The probability or likelihood that an interval estimate will contain the value of an unknown population parameter (e.g., a population mean). Typical levels of confidence in behavioral research are 95% or 99%.
Confidence limits: The upper and lower boundaries of a CI given within a specified level of confidence.
Interval estimates are reported as a point estimate ± interval estimate. E.g., you may read that 53% ± 3% of Americans believe that evolution is true. The ±3%, the margin of error, is added to and subtracted from the point estimate to find the confidence limits of the interval estimate.
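A minimal sketch of an interval estimate for a population mean, point estimate ± margin of error, assuming SciPy for the critical t value; the sample is invented.

```python
import statistics
from scipy import stats

sample = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 4.9, 5.4]   # invented sample

n = len(sample)
M = statistics.mean(sample)                    # point estimate
SE = statistics.stdev(sample) / n ** 0.5       # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)          # critical t for 95% confidence

margin = t_crit * SE                           # margin of error
print(f"95% CI: {M:.2f} ± {margin:.2f} -> [{M - margin:.2f}, {M + margin:.2f}]")
```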

Confidence Intervals, Significance, and Effect Size
We can use the information conveyed by a CI to determine the significance of an outcome by comparing the CI with the decision for a significance test:
1. If the value stated in the null hypothesis falls inside the CI, the decision would have been to retain the null hypothesis (not significant).
2. If the value stated in the null hypothesis falls outside the CI, the decision would have been to reject the null hypothesis (significant).

“I’ve come to think that the most fundamental problem with p-values is that no one can really say what they are.” – FiveThirtyEight, http://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/
Problems with p-values:
1. Easy to misrepresent or commit fraud by “data snooping”
2. Easy to commit errors with multiple comparisons
3. A significant difference between two samples doesn’t necessarily reflect the same difference in the parent populations (confounded by n; “absence of evidence” vs. “evidence of absence”)
4. A significant difference may be useless without effect size
5. Some null hypotheses are meaningless/impractical (e.g., inter-rater reliability)