Lies, damned lies & statistics

Communication Research, week 10

Basics of descriptive statistics
Statisticians use mathematical methods to analyse, summarise and interpret the data that have been collected. Descriptive statistics describe the basic features of the data in a study and allow the researcher to get a feel for the data. The choice of statistical method depends on the kind of data to be analysed.
Communication Research Spring 2005

Descriptive vs inferential statistics
Descriptive statistics are methods used to obtain, from raw data, information that characterises or summarises the whole set of data. Inferential statistics allow us to generalise from the data collected (a sample) to the wider population from which they were drawn.

Different statistical measures
Raw data are unorganised, but they can be tabulated to make them easier to understand and interpret. They are usually presented as a frequency table or graph. A frequency table or chart lets the researcher see trends or groupings in the data and how the values are distributed.
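Tabulating raw data into a frequency table amounts to counting how often each value occurs. The sketch below is a minimal illustration in Python using the standard-library collections.Counter; the raw data (hours of television watched per day by ten respondents) are hypothetical and are not from the slides.

```python
from collections import Counter

def frequency_table(data):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(data)
    return sorted(counts.items())

# Hypothetical raw data: hours of television watched per day by ten respondents.
raw = [2, 3, 3, 1, 4, 2, 3, 5, 2, 3]

table = frequency_table(raw)
print("Value  Frequency")
for value, freq in table:
    print(f"{value:>5}  {freq:>9}")
```

The frequencies always sum to the number of observations, which is a quick check that nothing was lost in tabulation.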

Some basic concepts related to statistics
Data: the raw material of statistics; numbers that result from measurement or counting.
Statistics: the field of study concerned with the collection, organization, summarization and analysis of data, and with drawing inferences about a body of data when only part of the data is observed.
Sources of data: routinely kept records, surveys, experiments, external sources.

Some basic concepts related to statistics
Random variable: a variable whose values arise as a result of chance factors and cannot be exactly predicted in advance.
Population: the largest collection of entities in which we have an interest at a particular time.
Sample: a part of a population.

The simple random sample
Statistical inference: the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population.
Simple random sample: if a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the sample is called a simple random sample. For example, a population of 4 has C(4, 2) = 4! / (2! × 2!) = 6 possible samples of size 2.
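The "2 out of 4" example can be checked by listing every possible sample. This minimal Python sketch assumes a hypothetical population labelled A to D; it uses itertools.combinations and math.comb (Python 3.8+) to enumerate and count the samples, and random.sample, which draws without replacement so that every sample of size n is equally likely, to draw one simple random sample.

```python
import itertools
import math
import random

population = ["A", "B", "C", "D"]  # hypothetical population, N = 4
n = 2

# Every possible sample of size n (order ignored).
all_samples = list(itertools.combinations(population, n))
print(len(all_samples), math.comb(len(population), n))  # 6 6
print(all_samples)

# random.sample draws n distinct items without replacement, so each of the
# possible samples is equally likely to be chosen: a simple random sample.
rng = random.Random(2005)  # seeded only so the run is repeatable
srs = rng.sample(population, n)
print(srs)
```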

Characteristics of each distribution
- Location: where on the axis is the distribution positioned?
- Dispersion: how broad is the distribution?
- Shape: what is the form (appearance, pattern) of the distribution?
The type of data you have to analyse determines the statistical measure chosen. Statistics describing the location of a distribution are called measures of central tendency.

Measures of central tendency: the mean
The mean (the arithmetic average) is the sum of all observed data values divided by the sample size. The mean is the appropriate measure for data that are interval or ratio in nature (e.g. speed of response, age in years). Its main disadvantage is that it is strongly affected by extreme scores.

Calculating a mean score
Scores: 79 81 82 86 86 88 91 93 95 97
Total = 878
Divide by n = 10 scores
Mean = 878 / 10 = 87.8
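A short Python sketch of the calculation, using ten scores that total 878 (79, 81, 82, 86, 86, 88, 91, 93, 95, 97). To illustrate the point that the mean is strongly affected by extreme scores, it then swaps the top score for a hypothetical outlier of 197 (not from the slides) and compares the mean with the median from the standard-library statistics module.

```python
import statistics

scores = [79, 81, 82, 86, 86, 88, 91, 93, 95, 97]
total = sum(scores)
mean = total / len(scores)
print(total, len(scores), mean)  # 878 10 87.8

# One hypothetical extreme score pulls the mean up by 10 points,
# while the median (the average of the two middle scores) does not move.
with_outlier = scores[:-1] + [197]
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```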

Measures of central tendency: the median
The median is the score, or point in the distribution, above which half of the scores lie (and below which the other half lie); e.g. in the simple set of scores 1, 3, 5 the median is 3. The median is best suited to data that are ordinal or ranked (e.g. birth order, rank in class).
To compute the median:
1. Order the scores from lowest to highest.
2. Count the number of scores.
3. Select the middle score.
4. When the number of scores is even, take the mean of the two middle scores.
E.g. 31 33 35 38 40 41 42 43 44 46 47 48 49 50: N = 14 scores, so the median is (42 + 43) ÷ 2 = 42.5.
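The four steps translate directly into code. This is a minimal Python sketch; median_by_steps is a hypothetical helper name, and the result is cross-checked against statistics.median from the standard library.

```python
import statistics

def median_by_steps(scores):
    ordered = sorted(scores)          # 1. order from lowest to highest
    n = len(ordered)                  # 2. count the scores
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # 3. odd count: the middle score
    # 4. even count: the mean of the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [31, 33, 35, 38, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50]
print(median_by_steps([1, 3, 5]))   # 3
print(median_by_steps(scores))      # 42.5
print(statistics.median(scores))    # 42.5
```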

Two distributions of scores
Distribution 1: 16 19 22 25 28 30 35 (mean = 25, range = 35 - 16 = 19)
Distribution 2: 24 25 26 (mean = 25, range = 26 - 24 = 2)
Both distributions have the same mean, but Distribution 1 is far more spread out.
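A minimal Python check that the two distributions share a mean but differ in spread. The range here is highest minus lowest, the definition used under "Three measures of variability"; some textbooks instead use an "inclusive" range (highest - lowest + 1), which gives 20 and 3 for these data, so the sketch prints both. score_range is a hypothetical helper name.

```python
import statistics

dist1 = [16, 19, 22, 25, 28, 30, 35]
dist2 = [24, 25, 26]

def score_range(scores):
    # Range as highest score minus lowest score.
    return max(scores) - min(scores)

for name, d in (("Distribution 1", dist1), ("Distribution 2", dist2)):
    # columns: name, mean, range, inclusive range (range + 1)
    print(name, statistics.mean(d), score_range(d), score_range(d) + 1)
```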

Measures of central tendency: the mode
The mode is the most frequently observed value in the frequency distribution, i.e. the score that occurs most often. The mode is best used for nominal data and for data that are qualitative in nature, such as gender, eye colour, ethnicity, school or group membership.
In the following list of numbers:
58 27 24 41 27 26 41 53 24 29 41 53 47 28 56
the mode is 41, because it occurs 3 times (more often than any other value). A common mistake is to give the mode as how often the value occurs (3) rather than the value itself (41).
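The same list can be checked in Python. This sketch uses collections.Counter.most_common, which returns the value and its count as a pair, making the "value, not frequency" distinction explicit; statistics.multimode (Python 3.8+) returns every value tied for most frequent, which matters for bimodal data.

```python
import statistics
from collections import Counter

numbers = [58, 27, 24, 41, 27, 26, 41, 53, 24, 29, 41, 53, 47, 28, 56]

# most_common(1) gives [(value, count)]: the mode is the value, not the count.
value, frequency = Counter(numbers).most_common(1)[0]
print("mode:", value, "occurs", frequency, "times")
print(statistics.multimode(numbers))  # [41]
```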

Which measure of central tendency, and when?
Measure   Level of measurement                   Examples
Mode      Nominal or categorical (qualitative)   Gender, hair or eye colour, group membership, ethnicity, school, etc.
Median    Ordinal or ranked                      Rank in class, birth order
Mean      Interval and ratio                     Speed of response, age in years

Three measures of variability
- Range: the difference between the highest and lowest scores in a distribution.
- Variance: a measure of dispersion indicating the degree to which scores cluster around the mean.
- Standard deviation: the square root of the variance; an index of the amount of variation in a distribution of scores, expressed in the same units as the scores.

Standard deviation
The standard deviation (SD) is a measure of variability indicating the degree to which the observed values deviate from the mean. The SD can only be used for interval and ratio data. It is the most frequently used measure of dispersion or variability: the larger the SD, the more variable the set of scores.

Computing deviation scores
Raw score   Mean   Deviation   Squared deviation
 4          10     -6          36
 8          10     -2           4
 9          10     -1           1
10          10      0           0
10          10      0           0
10          10      0           0
12          10      2           4
13          10      3           9
14          10      4          16
Sum of raw scores = 90; mean = 90 / 9 = 10.00
Sum of squared deviations = 70; variance = 70 / 9 = 7.78
Standard deviation = square root of the variance = 2.79
(This divides by N. The sample variance, which divides by N - 1, is 70 / 8 = 8.75, giving a sample standard deviation of about 2.96.)
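A minimal Python sketch that rebuilds the deviation-score table and computes the variance and SD both ways: dividing by N (as in the table) and by N - 1 (the sample variance). The results match the standard-library statistics.pvariance and statistics.variance respectively.

```python
import math
import statistics

scores = [4, 8, 9, 10, 10, 10, 12, 13, 14]
mean = sum(scores) / len(scores)                 # 90 / 9 = 10.0

print("Raw  Mean  Dev  Dev^2")
sum_sq = 0.0
for x in scores:
    dev = x - mean
    sum_sq += dev ** 2
    print(f"{x:>3}  {mean:>4.0f}  {dev:>3.0f}  {dev ** 2:>5.0f}")

pop_var = sum_sq / len(scores)         # divide by N, as in the table
samp_var = sum_sq / (len(scores) - 1)  # divide by N - 1 (sample variance)
print(f"SS = {sum_sq:.0f}, variance (N) = {pop_var:.2f}, SD = {math.sqrt(pop_var):.2f}")
print(f"variance (N - 1) = {samp_var:.2f}, SD = {math.sqrt(samp_var):.2f}")
```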

Types of variables
A variable is:
- an element that is identified in the hypothesis or research question;
- a property or characteristic of people or things that varies in quality or magnitude.
A variable must have two or more levels and must be identified as independent or dependent.

Independent variables
Manipulation or variation of this variable is the cause of change in other variables. Technically, "independent variable" is the term reserved for experimental studies. It is also called the antecedent variable, experimental variable, treatment variable, causal variable or predictor variable.

Dependent variables
The variable of primary interest. The research question or hypothesis describes, explains or predicts changes in it. It is the variable that is influenced or changed by the independent variable. In non-experimental research it is also called the criterion variable or outcome variable.

Relationship between independent and dependent variables
You cannot specify independent variables without specifying dependent variables. The number of independent and dependent variables depends on the nature and complexity of the study. The number and type of variables dictate which statistical test will be used.

Issues of reliability and validity
Reliability = consistency in procedures and in the reactions of participants.
Validity = truth: does the instrument measure what it is intended to measure?
When reliability and validity are achieved, the data are free from systematic errors.

Threats to reliability and validity
- The measuring device cannot make fine distinctions.
- The measuring device cannot capture people or things that differ.
- Attempting to measure something irrelevant or unknown to the respondent.
- Can the measuring device really capture the phenomenon?

Other sources of variation
Variation must represent true differences. Other sources of variation include:
- factors not measured
- personal factors
- differences in situational factors
- differences in research administration
- number of items measured
- unclear measuring device
- mechanical or procedural issues
- statistical processing of data

Types of variables

Definitions
Variable: a characteristic that changes or varies over time and/or across the subjects under consideration.
- Changing over time: blood pressure, height, weight.
- Changing across a population: gender, race/ethnicity.

Definitions (cont.)
Quantitative variables (numeric): measure a numerical quantity or amount on each experimental unit.
Qualitative variables (categorical): record a non-numeric quality or characteristic of each experimental unit by classifying each subject into a category.

Categorical variables
- Nominal: unordered categories, e.g. race/ethnicity, gender.
- Ordinal: ordered categories, e.g. Likert scales (disagree, neutral, agree), income categories.

Univariate statistics (numerical variables)
Summary measures:
- measures of location
- measures of spread
Overall pattern (distribution):
- unimodal (one major peak) vs. bimodal (two peaks)
- symmetric vs. skewed
- outliers: individual values that fall outside the overall pattern

Skewness
The direction of skew in a distribution can be judged by comparing the relative positions of the mean, median and mode (a rule of thumb for unimodal distributions).
- Symmetrical distribution: mean = median = mode.
- Skewed right: the median lies between the mode and the mean, and the mode is less than the mean.
- Skewed left: the median lies between the mode and the mean, and the mode is greater than the mean.
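A small Python illustration of the ordering rule, using hypothetical data with a long right tail and its mirror image (neither data set is from the slides). For the right-skewed set the mode is 2, the median 3 and the mean 4; negating every value reverses the order.

```python
import statistics

def central_tendency(data):
    """Return (mode, median, mean) for a list of numbers."""
    return statistics.mode(data), statistics.median(data), statistics.mean(data)

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]   # hypothetical data, long right tail
left_skewed = [-x for x in right_skewed]          # mirror image, long left tail

mode_r, median_r, mean_r = central_tendency(right_skewed)
mode_l, median_l, mean_l = central_tendency(left_skewed)
print("right-skewed:", mode_r, median_r, mean_r)
print("left-skewed: ", mode_l, median_l, mean_l)
```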

[Figure: relative positions of the mean and median for (a) right-skewed, (b) symmetric and (c) left-skewed distributions]
Note: the mean is most representative when the data are roughly symmetric, as in a normal distribution. If the data are clearly skewed, it is better to report the median as the measure of location.

Summary statistics: measures of spread (scale)
- Variance: the average of the squared deviations of each sample value from the sample mean, except that the sum of the squared deviations is divided by N - 1 instead of the sample size N.
- Standard deviation: the square root of the sample variance.
- Range: the difference between the maximum and minimum values in the sample.
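The three definitions can be written as formulas. The notation below (x_i for the i-th value, x̄ for the sample mean, s for the sample standard deviation) is an assumed convention rather than one used on the slides; the second line works the formulas through for the deviation-score data (N = 9, sum of squared deviations 70, highest score 14, lowest 4).

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
s^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2, \qquad
s = \sqrt{s^2}, \qquad
\text{range} = x_{\max} - x_{\min}

s^2 = \frac{70}{9 - 1} = 8.75, \qquad
s = \sqrt{8.75} \approx 2.96, \qquad
\text{range} = 14 - 4 = 10
```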

[Figure: normal curves with the same mean but different standard deviations]

Graphical display of numerical variables (histogram)
Class interval     Frequency
20 to under 30      6
30 to under 40     18
40 to under 50     11
50 to under 60     11
60 to under 70      3
70 to under 80      1
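A histogram draws one bar per class interval with height equal to its frequency. As a rough stand-in for the missing chart, this Python sketch prints a text histogram from the frequency table (one asterisk per observation); text_histogram is a hypothetical helper, and each interval includes its lower bound but not its upper bound.

```python
# Class intervals [lower, upper) with their frequencies, from the histogram table.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def text_histogram(classes):
    """One row per class interval, one '*' per observation."""
    return [f"{lo}-under {hi} | {'*' * freq}" for lo, hi, freq in classes]

for row in text_histogram(classes):
    print(row)
print("n =", sum(freq for _, _, freq in classes))  # n = 50
```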

Graphical display of numerical variables (stem-and-leaf plot)
Raw data: 86 76 23 77 81 79 68 92 59 75 83 49 91 47 72 82 74 70 56 60 88 97 39 78 94 55 67 89
Stem | Leaf
   2 | 3
   3 | 9
   4 | 7 9
   5 | 5 6 9
   6 | 0 7 8
   7 | 0 2 4 5 6 7 8 9
   8 | 1 2 3 6 8 9
   9 | 1 2 4 7
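For two-digit scores, the stem is the tens digit and the leaf is the units digit. This minimal Python sketch builds the display from the 28 raw values listed on the slide; stem_and_leaf is a hypothetical helper, and it keeps empty stems in the output so gaps in the data stay visible.

```python
from collections import defaultdict

raw = [86, 76, 23, 77, 81, 79, 68, 92, 59, 75, 83, 49, 91, 47,
       72, 82, 74, 70, 56, 60, 88, 97, 39, 78, 94, 55, 67, 89]

def stem_and_leaf(data):
    """Return display rows 'stem | leaves' for two-digit values."""
    groups = defaultdict(list)
    for x in data:
        groups[x // 10].append(x % 10)   # tens digit = stem, units digit = leaf
    rows = []
    for stem in range(min(groups), max(groups) + 1):  # keep empty stems visible
        leaves = " ".join(str(leaf) for leaf in sorted(groups[stem]))
        rows.append(f"{stem} | {leaves}")
    return rows

for row in stem_and_leaf(raw):
    print(row)
```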

Graphical display of numerical variables (box plot)
[Figure: box plots of a negatively skewed (S < 0), a symmetric, not skewed (S = 0) and a positively skewed (S > 0) distribution]
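The box-plot figure labels each shape by the sign of S. The slides do not define S; assuming it denotes a skewness coefficient, this Python sketch uses one common definition, the moment coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third moments about the mean. Other definitions (for example the adjusted Fisher-Pearson coefficient) differ in size but have the same sign. The data sets are hypothetical.

```python
import statistics

def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    m = statistics.fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / len(data)
    m3 = sum((x - m) ** 3 for x in data) / len(data)
    return m3 / m2 ** 1.5

right = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]   # hypothetical data, long right tail
left = [-x for x in right]                # mirror image, long left tail
symmetric = [1, 2, 3, 4, 5]

for name, d in (("negative", left), ("symmetric", symmetric), ("positive", right)):
    print(name, round(skewness(d), 3))
```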

Univariate statistics (categorical variables)
Summary measures:
- count = frequency
- percent = frequency / total sample (× 100)
The distribution of a categorical variable lists the categories and gives either the count or the percent of individuals who fall in each category.

Displaying categorical variables
Rank   Cause of death            Frequency (%)
1      Heart disease             710,760 (43%)
2      Cancer                    553,091 (33%)
3      Stroke                    167,661 (10%)
4      CLRD*                     122,009 (7%)
5      Accidents                  97,900 (6%)
       Total, all five causes  1,651,421
* CLRD: chronic lower respiratory diseases. Percentages are of the total for the five causes.
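The percent column follows from the counts (percent = frequency / total × 100). This Python sketch recomputes it from the table's counts, with the total taken over the five causes listed; stroke works out to about 10.2% of the total, so it rounds to 10%.

```python
deaths = {
    "Heart disease": 710_760,
    "Cancer": 553_091,
    "Stroke": 167_661,
    "CLRD": 122_009,
    "Accidents": 97_900,
}

total = sum(deaths.values())
print(f"Total: {total:,}")
for rank, (cause, count) in enumerate(deaths.items(), start=1):
    percent = 100 * count / total          # percent = frequency / total x 100
    print(f"{rank}  {cause:<14}{count:>9,} ({percent:.0f}%)")
```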

Common applications: t-tests
The independent t-test is used to test for a difference between two independent groups (like males and females) on the mean of a continuous variable. Variants:
- One sample: compare a group to a known value, e.g. comparing the IQ of convicted felons to the known average of 100.
- Paired samples: compare one group at two points in time, e.g. comparing pretest and posttest scores.
- Independent samples: compare two groups to each other.
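A minimal Python sketch of the t statistic for each variant, using only the standard library and hypothetical data. The independent-samples version is Student's t with a pooled variance (it assumes equal group variances). Deciding significance also needs a p-value from the t distribution (n - 1 degrees of freedom for the one-sample and paired tests, n1 + n2 - 2 for the pooled test); SciPy's scipy.stats.ttest_1samp, ttest_rel and ttest_ind report both the statistic and the p-value.

```python
import math
import statistics

def one_sample_t(data, mu):
    """t = (mean - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(data)
    return (statistics.mean(data) - mu) / (statistics.stdev(data) / math.sqrt(n))

def paired_t(before, after):
    """Paired t: a one-sample t-test of the differences (after - before) against 0."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)

def independent_t(group1, group2):
    """Student's t for two independent groups, assuming equal variances (pooled)."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * statistics.variance(group1)
              + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se

# Hypothetical data, for illustration only.
iq = [102, 98, 110, 105, 95]
pretest, posttest = [10, 12, 9, 14], [12, 15, 10, 17]
males, females = [5, 7, 6, 8], [3, 4, 5, 4]

print(round(one_sample_t(iq, 100), 3))
print(round(paired_t(pretest, posttest), 3))
print(round(independent_t(males, females), 3))
```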

Common applications: Pearson's correlation
Pearson's correlation coefficient (r) measures the strength and direction of the linear relationship between two continuous variables. Its value falls between -1.00 (perfect negative correlation) and +1.00 (perfect positive correlation), with 0.00 meaning no linear correlation. Whether a correlation is statistically significant also depends on other factors, such as sample size. As a rule of thumb, correlations above 0.80 in absolute value are considered high.
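Pearson's r is the sum of the cross-products of deviations from the two means, divided by the square root of the product of the two sums of squared deviations. This minimal Python sketch implements that formula with hypothetical data, showing the perfect positive, perfect negative and intermediate cases; Python 3.10 and later also provide statistics.correlation, which computes the same coefficient.

```python
import math

def pearson_r(x, y):
    """r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)), deviations from the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0, perfect positive
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0, perfect negative
print(pearson_r(x, [2, 1, 4, 3, 5]))    # 0.8
```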

Common applications
[Figure: number of people (0 to 60) by days absent (0 to 18), males vs females; non-significant t-test]

Common applications
[Figure: number of people (0 to 60) by days absent (0 to 18), males vs females; significant t-test]