Statistical Analysis How do we make sense of the data we collect during a study or an experiment?

Presentation transcript:

Statistical Analysis How do we make sense of the data we collect during a study or an experiment?

Two Kinds of Statistical Analysis
- Descriptive Statistics: organize and summarize data
- Inferential Statistics: interpret data and draw conclusions; used to test the validity of a hypothesis

Descriptive Statistics
- Numbers that summarize a set of research data obtained from a sample
- Organized into a frequency distribution (an orderly arrangement of scores)
- Can be pictured as a histogram (bar graph)
- Can be pictured as a frequency polygon (a line graph that replaces the bars with single points and connects the points with a line)

Measures of Central Tendency
Describe the average or most typical scores for a set of research data.
- Mode – the most frequently occurring score (the least used measure of central tendency). Bimodal – two scores appear most frequently. Multimodal – three or more scores appear most frequently.
- Median – the middle score when the set of data is ordered by size
- Mean – the arithmetic average of the set of scores (the most commonly used measure)

Normal Distribution (also called normal curve or bell curve)
A “normal distribution” of data means that most of the examples in a set of data are close to the mean (average), while relatively few examples tend to one extreme or the other. Scores are often normally distributed. When this happens, the mode, median, and mean are all the same (in this case, 100).

Measures of Central Tendency in Dunder Mifflin Salaries
Watch out for extreme scores or outliers. Let’s look at the salaries of the employees of the Dunder Mifflin Paper Company in Scranton:
- $25,000 Pam
- $25,000 Kevin
- $25,000 Angela
- $100,000 Andy
- $100,000 Dwight
- $200,000 Jim
- $300,000 Michael
The median salary looks good at $100,000. The mean salary also looks good at about $110,000. But the mode salary is only $25,000. Maybe not the best place to work. Then again, living in Scranton is kind of cheap.
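The three measures for the Dunder Mifflin salaries can be checked with a short sketch. It uses Python's standard `statistics` module; the variable names are only illustrative.

```python
from statistics import mean, median, mode

# Salaries from the slide, in dollars
# (Pam, Kevin, Angela, Andy, Dwight, Jim, Michael).
salaries = [25_000, 25_000, 25_000, 100_000, 100_000, 200_000, 300_000]

salary_mean = mean(salaries)      # 775,000 divided by 7 employees
salary_median = median(salaries)  # the 4th of the 7 sorted values
salary_mode = mode(salaries)      # the most frequently occurring value

print(f"mean   = {salary_mean:,.0f}")
print(f"median = {salary_median:,}")
print(f"mode   = {salary_mode:,}")
```

The mean works out to roughly $110,700, which the slide rounds to about $110,000.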

Skewed Distributions
A distribution is skewed when a few extreme scores (called outliers) significantly affect the mean. Distributions where most of the scores are squeezed into one end are skewed. In very skewed distributions, the median is a better measure of central tendency than the mean.

Central Tendency: 1968 TOPPS Baseball Cards
Nolan Ryan $1500
Billy Williams $8
Luis Aparicio $5
Harmon Killebrew $5
Orlando Cepeda $3.50
Maury Wills $3.50
Jim Bunning $3
Tony Conigliaro $3
Tony Oliva $3
Lou Pinella $3
Mickey Lolich $
Elston Howard $2.25
Jim Bouton $2
Rocky Colavito $2
Boog Powell $2
Luis Tiant $2
Tim McCarver $1.75
Tug McGraw $1.75
Joe Torre $1.50
Rusty Staub $1.25
Curt Flood $1
- With Ryan: Median = $, Mean = $74.14
- W/O Ryan: Median = $, Mean = $2.85

Skews
A few of the scores stretch out away from the group like a tail. The skew is named for the direction of the tail.
- Tail going to the left – negatively skewed
- Tail going to the right – positively skewed

Look at the above figure and note that when a variable is normally distributed, the mean, median, and mode are the same number. When the data are negatively skewed, this happens: mean < median < mode. When the data are positively skewed, this happens: mean > median > mode. If you go to the end of the curve, to where it is pulled out the most, you will see that the order goes mean, median, and mode as you “walk up the curve” for negatively and positively skewed curves. You can use the following two rules to provide some information about skewness even when you cannot see a line graph of the data (i.e., all you need is the mean and the median):
1. Rule One. If the mean is less than the median, the data are skewed to the left.
2. Rule Two. If the mean is greater than the median, the data are skewed to the right.
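The two rules can be sketched as a small function. The function name `skew_direction` and the second data set are made up for illustration; the first data set is the Dunder Mifflin salaries from the earlier slide.

```python
from statistics import mean, median

def skew_direction(scores):
    """Apply Rule One and Rule Two: compare the mean with the median."""
    m, md = mean(scores), median(scores)
    if m < md:
        return "negatively skewed (tail to the left)"
    if m > md:
        return "positively skewed (tail to the right)"
    return "no skew indicated by these rules"

# Dunder Mifflin salaries: a few high earners pull the mean above the median.
salaries = [25_000, 25_000, 25_000, 100_000, 100_000, 200_000, 300_000]

# Hypothetical quiz scores: one very low score pulls the mean below the median.
quiz_scores = [1, 8, 9, 9, 10]

print(skew_direction(salaries))
print(skew_direction(quiz_scores))
```

The rules only look at the mean and the median, so they give a quick hint about the direction of skew rather than a measure of how skewed the data are.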

How to memorize positively or negatively skewed distributions

Measures of Variability
Variability describes the spread or dispersion of scores for a set of data.
- Range – the largest score minus the smallest score
- Variance and standard deviation – indicate the degree to which scores differ from each other and vary around the mean value for the set. The higher the variance or SD, the more spread out the distribution is.

More on Standard Deviation
The standard deviation is roughly the average distance of the scores from the mean, and it can often help you get the real story behind the data. It tells you how tightly the examples in a set of data are clustered around the mean. When the examples are tightly bunched together and the bell-shaped curve is steep, the standard deviation is small. When the examples are spread apart and the bell curve is relatively flat, the standard deviation is relatively large. In a normal distribution, the area within one standard deviation of the mean in either direction on the horizontal axis (the red area on the graph) accounts for around 68% of the people in the group. The area within two standard deviations (the red and green areas) accounts for roughly 95% of the people, and the area within three standard deviations (the red, green, and blue areas) accounts for about 99.7%. If the curve were flatter and more spread out, the standard deviation would have to be larger to take in that 68% or so of the people. That is why the standard deviation can tell you how spread out the examples in a set are from the mean.
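The 68%, 95%, and 99.7% figures can be reproduced with a minimal sketch, assuming the scores follow a normal distribution exactly. It uses the error function `math.erf` from Python's standard library; the helper name `share_within` is only illustrative.

```python
import math

def share_within(k):
    """Proportion of a normal distribution lying within k standard
    deviations of the mean, in either direction."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

Real data are rarely perfectly normal, so the percentages in an actual sample will only be close to these values.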

To Calculate Variance
To calculate the variance for the set of numbers 4, 5, 5, 6, 6, 6, 6, 7, 7, 8:
- Calculate the mean (average): 60 ÷ 10 = 6
- Subtract the mean from each score in the distribution above
- This shows you how far each score deviates from the mean. When you add all of these deviations together, they should always equal zero.
4 - 6 = -2
5 - 6 = -1
5 - 6 = -1
6 - 6 = 0
6 - 6 = 0
6 - 6 = 0
6 - 6 = 0
7 - 6 = 1
7 - 6 = 1
8 - 6 = 2

To Calculate Variance (cont.)
- However, we want to convert the deviation scores to a form that allows us to add them up and not get zero. Therefore, we square all of the deviation scores, which removes all of the negative values.
-2 x -2 = 4
-1 x -1 = 1
-1 x -1 = 1
0 x 0 = 0
0 x 0 = 0
0 x 0 = 0
0 x 0 = 0
1 x 1 = 1
1 x 1 = 1
2 x 2 = 4
- Now when we add them up, we get 12. The larger this number is, the greater the dispersion of the scores.
- Now divide the sum above by the number of scores in the group. This gives you the variance, the average squared distance of a score from the mean.
12 ÷ 10 = 1.2

To Calculate Standard Deviation
To calculate the standard deviation, all you do is take the square root of the variance you just calculated.
√1.2 ≈ 1.1
The smaller this number is, the more confident you can be in using the mean to represent the group. For more information on standard deviation, see explained-standard-deviation.html
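The variance and standard deviation steps from the last three slides can be followed in a short sketch. It divides by the number of scores, as the slides do; note that many textbooks and calculators divide by one less than the number of scores when the data are a sample, which gives a slightly larger value.

```python
import math

scores = [4, 5, 5, 6, 6, 6, 6, 7, 7, 8]

# Step 1: the mean is 60 / 10 = 6.
mean = sum(scores) / len(scores)

# Step 2: subtract the mean from each score; these deviations sum to zero.
deviations = [x - mean for x in scores]

# Step 3: square each deviation so the negative values disappear.
squared = [d ** 2 for d in deviations]
sum_of_squares = sum(squared)

# Step 4: divide by the number of scores to get the variance.
variance = sum_of_squares / len(scores)

# Step 5: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print("sum of deviations:", sum(deviations))
print("sum of squares:   ", sum_of_squares)
print("variance:         ", variance)
print("standard deviation:", round(std_dev, 1))
```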

Descriptive Statistics PsychSim for activity and worksheet m5/Descriptive%20Statistics/PsychSi m_Shell.html

Try it yourself
For this set of data, calculate:
- Median
- Mode
- Mean
- Range
- Variance
- Standard Deviation
1, 3, 5, 5, 6, 7, 7, 8, 9, 9

Answers: 1, 3, 5, 5, 6, 7, 7, 8, 9, 9
- Median = 6.5
- Mode = 5, 7, and 9 (multimodal)
- Mean = 6
- Range = 8
- Variance = 6
- Standard Deviation = 2.4
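These answers can be checked with Python's standard `statistics` module. This is only a sketch: `pvariance` and `pstdev` are the versions that divide by the number of scores, which matches the method used in these slides, and the `results` dictionary is just a convenient way to print everything.

```python
from statistics import mean, median, multimode, pvariance, pstdev

data = [1, 3, 5, 5, 6, 7, 7, 8, 9, 9]

results = {
    "median": median(data),                  # average of the two middle scores
    "mode": multimode(data),                 # every score tied for most frequent
    "mean": mean(data),
    "range": max(data) - min(data),          # largest score minus smallest score
    "variance": pvariance(data),             # sum of squared deviations / N
    "standard deviation": round(pstdev(data), 1),
}

for name, value in results.items():
    print(f"{name}: {value}")
```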

Correlations
A correlation is a statistical measure that allows us to understand the degree to which two variables are related. The strength and direction of correlations can be illustrated graphically in a scatterplot, in which paired X and Y scores for each subject are plotted as single points on a graph. The slope of the line that best fits the pattern of points shows the direction of the relationship between the two variables, and how closely the points cluster around that line shows its strength.

Scatterplots and Correlations
- Positive Correlation: as one number increases, the other increases. Ex: study time to GPA
- Negative Correlation: as one number increases, the other decreases. Ex: absences to GPA
- No Correlation: the variables are not related to one another in a significant way. Ex: height to GPA

What does this scatterplot tell us?

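Correlation Coefficients (r)
A statistical measure of the degree of relatedness between two things.
- Ranges from -1.00 to +1.00
- Zero is no relationship
- The sign gives the direction and the absolute value gives the strength, so a coefficient with a larger absolute value, even a negative one, is a stronger relationship than .34
Remember that correlation is not causation!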
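A correlation coefficient can be computed by hand with a short sketch. The formula below is the standard Pearson r; the function name and all of the student data are hypothetical, made up only to show a positive and a negative correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # How the paired deviations from each mean move together.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

# Hypothetical data for five students (not from the slides).
study_hours = [1, 2, 3, 4, 5]
absences = [10, 8, 7, 3, 1]
gpa = [2.0, 2.5, 2.7, 3.4, 3.6]

r_study = pearson_r(study_hours, gpa)   # expected to be positive
r_absent = pearson_r(absences, gpa)     # expected to be negative

print(f"study time vs. GPA: r = {r_study:.2f}")
print(f"absences vs. GPA:   r = {r_absent:.2f}")
```

Even a coefficient close to +1.00 or -1.00 in made-up data like this says nothing about whether one variable causes the other.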

Correlation is not causation!

Illusory Correlation
We believe there is a relationship between two things when none actually exists. See om/spurious- correlations for some funny examples, and scover to create some of your own.

“Feline High-Rise Syndrome” The New York Times reported evidence for cats’ exceptional ability to survive falls from high-rise buildings. They cited evidence that from June – November, 1984, 132 such victims were admitted to the Animal Medical Center, only 8 of which died from their injuries. What is the problem with this correlation? Would you take a dead cat to the Animal Medical Center to be admitted??

One more example…

Correlation PsychSim for activity and worksheet chool/prockwell/documents/AP%20Psy ch/PsychSim/PsychSim5.swf

Inferential Statistics Whereas descriptive statistics simply describe a data set, inferential statistics attempt to make inferences about a larger population based on a data set. For example, if you're interested in studying GBHS student behavior, you would have a hard time collecting data from each and every student. Instead, you collect data from a sample of students, representative of the entire GBHS student body, and then make inferences about the student body based on the data collected from your sample.

Inferential Statistics (cont.)
Any time you collect data, it will contain variability due to chance. For example, by chance alone, you might collect data from more GBHS freshmen than from GBHS sophomores. If you repeated your data collection several times, you would get somewhat different results each time due to this chance variability. If this chance variability always exists in data collection, how can a researcher be confident that the inferences he or she makes about the larger population (the entire GBHS student body) are accurate? We use inferential statistics! Instead of making absolute conclusions about the population, researchers make statements about the population using the laws of probability and statistical significance.
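Chance variability between samples can be seen in a small simulation. Everything here is hypothetical: the population is 2,000 made-up "hours of sleep" values generated by the computer, not real GBHS data, and the sample size of 50 is an arbitrary choice.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: hours of sleep for 2,000 students (made-up values).
population = [random.gauss(7, 1.5) for _ in range(2000)]

# Draw five separate random samples of 50 students and record each sample mean.
sample_means = [mean(random.sample(population, 50)) for _ in range(5)]

print(f"population mean: {mean(population):.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:   {m:.2f}")
```

Each sample mean lands near the population mean but not exactly on it, which is the chance variability the slide describes.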

Statistical Significance (reported as a p value)
When inferential statistics show that results like those observed would be unlikely to arise from chance alone, the results are said to be statistically significant. Psychologists conventionally call a result statistically significant when the probability of obtaining a result at least that extreme by chance alone is less than 5 in 100 (indicated by the notation p < .05). In other words, if only chance were at work, results like yours would turn up less than 5% of the time. Any conclusion drawn from inferential statistics is only a probabilistic statement that the results reflect a real difference in the world rather than a chance difference in the samples selected.
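The idea behind p < .05 can be illustrated with a coin-flipping sketch. The scenario is hypothetical and not from the slides: someone claims they can make a fair coin land heads, and we ask how often chance alone would produce a result at least as extreme as the one observed. The function name is only illustrative.

```python
from math import comb

def p_value_at_least(heads, flips):
    """Probability of getting `heads` or more heads in `flips` tosses
    of a fair coin, i.e. by chance alone."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

for observed in (7, 9):
    p = p_value_at_least(observed, 10)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{observed} heads out of 10: p = {p:.4f} ({verdict} at the .05 level)")
```

Seven heads out of ten happens fairly often by chance, so it is not significant; nine or more heads out of ten happens in only 11 of the 1,024 equally likely outcomes, so it is.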

To Summarize
Descriptive Statistics: organize and summarize data
- Central tendency: mean, median, mode
- Standard deviation: variation in data
- Range: distance from smallest to largest
Inferential Statistics: interpret data and draw conclusions
- Used to test the validity of a hypothesis

Critical Thinking with Statistics
The old saying goes, “There are three kinds of lies: lies, damned lies, and statistics.” The presentation of research findings in the form of numbers, graphs, etc., may look impressive, but remember that they can be distorted to make you believe something that is not necessarily true. The next slides show some common ways in which this is done.

Biased or insufficient samples
“Four out of five dentists surveyed recommended Brand X gum.”
- The number of dentists surveyed is not clear.
- How were the dentists chosen? Was it a random sample, or were 5 dentists chosen because they hold stock in Brand X gum?
Many mail-in surveys suffer from a selection bias – the people who send them in may differ in important ways from those who do not.

The Misleading Average
Example: The principal of a small private school met the criticism that his faculty had no teaching experience by issuing the statement that the average experience of each member of the faculty was 5 years. This statement was technically true: there were five teachers in the school including the principal, but the principal neglected to mention that he had twenty-five years of experience while the remaining four members had none.
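The principal's claim can be checked in a few lines. The numbers come straight from the example; comparing the mean with the median is an added illustration of why the "average" misleads here.

```python
from statistics import mean, median

# Years of teaching experience: the principal, then the other four teachers.
years = [25, 0, 0, 0, 0]

average_years = mean(years)    # 25 / 5, the figure the principal quoted
typical_years = median(years)  # the middle value, a better picture of the staff

print("mean:  ", average_years)
print("median:", typical_years)
```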

Exaggerated graphs Depending on the height and width of each of the axes, different pictures of the data will emerge. Also, the data will look different, depending on how the axes (particularly the y-axis) are labeled.