TOPIC 1 STATISTICAL ANALYSIS


MAKING A SCIENTIFIC INVESTIGATION
STEP 1: HAVE A RESEARCH QUESTION
STEP 2: HAVE A HYPOTHESIS
STEP 3: WRITE A METHOD TO TEST YOUR HYPOTHESIS (design a controlled experiment)
STEP 4: COLLECT DATA
STEP 5: ORGANIZE THE DATA
STEP 6: ILLUSTRATE THE DATA USING AN APPROPRIATE DIAGRAM
STEP 7: ANALYZE THE DATA USING THE CORRECT STATISTICAL METHODS, ENABLING A CONCLUSION TO BE DRAWN

STEP 4: DATA COLLECTION
The collection of all things being investigated is called the population. It is usually impossible for us to collect data from every member of the population. We must therefore choose a sample from the population.

We must try to make sure that the sample is representative of the population from which it is drawn, so that we can generalize any findings about the sample to the population. Random sampling ensures that every member of the population has an equal chance of being included in the sample.
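Random sampling can be sketched in a few lines of Python. This is only an illustration: the population of 200 numbered limpets, the sample size of 20 and the fixed seed are all assumptions made for the example, not values from the presentation.

```python
import random

# Hypothetical population: ID numbers for 200 limpets on a shore.
population = list(range(1, 201))

# A fixed seed is used only so that the sketch gives the same sample each run.
rng = random.Random(42)

# Draw 20 distinct members; every member is equally likely to be chosen.
sample = rng.sample(population, 20)

print(sorted(sample))
```

Because the draw is made without replacement, no individual can appear in the sample twice.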

I. QUALITATIVE DATA (descriptive)
II. QUANTITATIVE DATA (numerical)
- CONTINUOUS, e.g. length
- DISCRETE, e.g. number of eggs

STEP 5: ORGANIZING DATA
Ways to organize raw data:
- Constructing tables
- Ranking
- Tally chart
- Frequency distribution

Use the table below to answer the following questions:
- Is discrete or continuous data represented?
- What type of data organization is below?
- Is the data table complete?
- How will you process this data? (What does this data ‘say’ to you?)
Shell length / mm    Number of limpets
8-11                 2
12-15                5
16-19                8
20-23                10
24-27                9
28-31
32-35                1

QUADRAT SAMPLING: Marine Intertidal Zone

SPREADSHEET ACTIVITY 1: NORMAL DISTRIBUTION
1) Input the data from Limpet Shell Lengths in your spreadsheet
2) GRAPH: frequency distribution (normal distribution)
3) What does this graph tell you?
Shell length / mm    Number of limpets
8-11                 2
12-15                5
16-19                8
20-23                10
24-27                9
28-31
32-35                1

[Figure: Normal Distribution and Skewed Distribution]

Descriptive Statistics
Includes calculating the:
- Mean
- Median
- Mode
- Range
- Standard deviation (variability)
- P value (level of confidence from a T-Test)
- PEARSON correlation coefficient (correlation/cause)

Mean (average): the average of all data entries; a measure of central tendency for a normal distribution.
Median: the middle value when data entries are placed in rank order; a good measure of central tendency for skewed distributions.
Mode: the most frequently occurring value (the most common data value).
Range: the difference between the smallest and largest data values. This gives a simple measure of the spread of the data. (Note: the range is set by the extremes, so it is strongly affected by outliers, which are values very different from all the others.)

SPREADSHEET ACTIVITY 2
1) Input the following data in your spreadsheet:
Sample 1: 30 45 45 60 75 75 75 80 90 90 100
Sample 2: 60 60 70 70 80 80 90 90 100 100 120 120
2) Calculate the mean, median, mode & range:
a) manually (using a scientific calculator)
b) using your spreadsheet
Note: you need to know how to complete all statistical calculations using: 1) a formula 2) a spreadsheet 3) a calculator.
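As a cross-check on the spreadsheet and calculator results, the same four statistics can be computed with a short Python sketch. The use of Python and the helper name `summarize` are assumptions of this example; the presentation itself only asks for formula, spreadsheet and calculator.

```python
import statistics

sample_1 = [30, 45, 45, 60, 75, 75, 75, 80, 90, 90, 100]
sample_2 = [60, 60, 70, 70, 80, 80, 90, 90, 100, 100, 120, 120]


def summarize(values):
    """Return the mean, median, mode(s) and range for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every value that shares the highest frequency
        "modes": statistics.multimode(values),
        "range": max(values) - min(values),
    }


summary_1 = summarize(sample_1)
summary_2 = summarize(sample_2)
print(summary_1)
print(summary_2)
```

In Sample 2 every value occurs exactly twice, so there is no single mode; `multimode` reports all of the tied values rather than picking one.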

Do we stop data analysis at calculating the Mean, Median & Mode? No! The mean does not give us a complete picture of variation in our data, so we need to calculate the standard deviation. The STDEV is a more complete measure of variation: it considers every value in the set and measures the spread of the data around the mean.

SPREADSHEET ACTIVITY 3: Standard Deviation
1) Input the following data in your spreadsheet.
Mass (g) of mice bred in different environments
Sample A (isolated mice): 22, 22, 23, 24, 24, 24, 24, 25, 26, 26
Sample B (bred together): 16, 17, 20, 23, 24, 25, 27, 28, 29, 31
2) Calculate the means for samples A & B
3) Calculate standard deviation (STDEVP) for A & B: a) with formula b) with spreadsheet c) with calculator
4) Is variation high or low in Sample A? Sample B?
5) What does this variation tell us?
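The spreadsheet result can be checked with a minimal Python sketch. It assumes that `statistics.pstdev`, the population standard deviation, is the appropriate counterpart of the spreadsheet's STDEVP; the variable names are chosen for this example only.

```python
import statistics

sample_a = [22, 22, 23, 24, 24, 24, 24, 25, 26, 26]  # isolated mice, mass in g
sample_b = [16, 17, 20, 23, 24, 25, 27, 28, 29, 31]  # bred together, mass in g

for name, masses in (("A", sample_a), ("B", sample_b)):
    mean = statistics.mean(masses)
    # pstdev divides by n (population formula), like STDEVP in a spreadsheet
    sd = statistics.pstdev(masses)
    print(f"Sample {name}: mean = {mean} g, STDEVP = {sd:.2f} g")
```

Both samples have a mean of 24 g, but the standard deviation is about 1.34 g for Sample A and about 4.80 g for Sample B.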

Analyzing Values from Mice Samples
Looking at the calculated means alone for samples A and B, it appears that there is no difference between the two populations of mice (the mean does not show the variability of the data). However, when looking at STDEV, we can see:
- For sample A, STDEV is low
- For sample B, STDEV is high
Wide variation in this data set makes us question the experimental design. Is it possible that mice bred in environment ‘B’ were subject to other environmental factors? What is causing the wide variation in the data?

[Figure: dot plots of the two mice samples. Sample A: axis marked 22, 24, 26. Sample B: axis marked 16, 24, 31.]

Standard Deviation: A measure of how the individual observations of a data set are dispersed or spread out around the mean (average).
For normally distributed data:
- about 68% of all values lie within ±1 standard deviation of the mean
- about 95% of all values lie within ±2 standard deviations of the mean
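The 68% and 95% figures can be confirmed numerically. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# A standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(f"within 1 SD: {share_within(1):.1%}")
print(f"within 2 SD: {share_within(2):.1%}")
```

The exact shares are slightly above the rounded values usually quoted, which is why they are normally stated as approximately 68% and 95%.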

Reasons for Using Statistics
In a population, we usually find that not all the values are identical. Instead, there are differences between the values even inside a population. We call this VARIATION. The data we obtain from a study has variability. We often need to describe the variation within a population to help us decide whether a difference between sample means truly represents a difference between population means. How can we describe this variation? (via statistics)

Why Use Standard Deviation?
The value provides a description of the variation which considers every data item. Large differences in the sizes of the standard deviation between samples being compared can indicate:
1) that control variables are not constant
2) that there is a problem with the validity of the investigation.
The standard deviation can be used as a support in hypothesis testing.

We can graphically represent STDEV as ERROR BARS

Error Bars
In many charts and graphs, we show the mean values of our samples. It is useful to show a measure of the variation inside each of these samples. We do this by adding error bars to the chart or graph.

Error Bars
An error bar is a line that extends above and below a bar in a chart or a data point in a graph. It could represent the range for that sample, or the standard deviation. The length of the line represents the size of the range or the size of the standard deviation; it extends an equal distance above and below the value of the mean. Error bars are graphical representations of the variability of data.
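To make the idea concrete, the ends of a standard-deviation error bar can be computed as the mean minus and plus one standard deviation. This sketch reuses the mouse masses from Spreadsheet Activity 3 and assumes the population standard deviation (STDEVP); the function name `error_bar` is hypothetical.

```python
import statistics

samples = {
    "A": [22, 22, 23, 24, 24, 24, 24, 25, 26, 26],  # isolated mice, mass in g
    "B": [16, 17, 20, 23, 24, 25, 27, 28, 29, 31],  # bred together, mass in g
}


def error_bar(values):
    """Return (lower end, mean, upper end) for a mean +/- 1 SD error bar."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return mean - sd, mean, mean + sd


bars = {name: error_bar(values) for name, values in samples.items()}
for name, (low, mean, high) in bars.items():
    print(f"Sample {name}: mean {mean} g, error bar from {low:.2f} g to {high:.2f} g")
```

The bar for Sample B is much longer than the bar for Sample A even though both are centred on the same mean, which is exactly the difference in variability that the error bars are meant to show.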

Significance
Significance: a real (true) difference between two or more samples in the phenomenon that we are examining (we test whether the findings are more than just chance).
Note: statistical significance is our main tool in deciding whether the data support the hypothesis.

What information do the means of data give? What additional information do error bars give? How does this affect interpretation of the figures?
- Error bars help us determine whether or not the difference between two sets of data is significant (real).
- A large difference between the means of samples, and small standard deviations for these samples, indicates that it is likely that the difference between the means is statistically significant.
- A small difference between the means, and large standard deviations for these samples, indicates that it is likely that the difference between the means is not statistically significant.

Confidence Levels
It is seldom possible to say with complete certainty (100% confidence) that the difference between sample means is significant. Instead, we determine whether the difference between the sample means is probably significant. Most often, scientists/biologists want to be 95% confident that the difference between the samples is significant. This means that there is only a 5% chance that the samples differ purely due to chance and not because of a real difference between the populations. We could say: p = 0.05 (the probability (p) that chance alone produced the difference between our sample means is 5%).

Determining Confidence of Significance with T-Test
How do we determine if our findings are significant? We need to calculate our t value and find the p value.
Apply the t-test to calculate the t-value, which helps determine the p-value (significance at a certain level of confidence). Requirements:
- Data should be normally distributed
- Sample size should be at least 10

T-Test
The following information is needed for the T-Test calculation:
1) the size of the difference between the means of the samples
2) the number of items in each sample
3) the amount of variation about the mean of each sample (standard deviation)
The value for t can be calculated from the data using:
- a formula
- a scientific calculator
- a spreadsheet (Microsoft Excel)

SPREADSHEET ACTIVITY 4: T-TEST (P-value)
1) Input data from Clegg Text Chapter 21 Page 681
2) Calculate: mean and standard deviation
3) Calculate: P-value (from T-Test) a) spreadsheet b) calculator
4) What does this P-value tell you?

T-Test & P-Value using a Calculator
You need to use a table of t-values!
1) Calculate the T-Test value (t-value).
2) Identify the degrees of freedom (DF) for your experiment: DF = (number of items in sample 1 + number of items in sample 2) - 2. Example: (10 + 10) - 2 = 18.
3) Find row 18 in the DF column.
4) Find your t value in row 18 under the “t values” columns.
5) Once you have found your t value, look to the bottom row in that column for the p value.
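The calculation can be sketched in Python using the three ingredients of the t-test (difference between means, number of items, variation about each mean) and the degrees of freedom rule given above. Several things here are assumptions of the example rather than the presentation's method: the pooled-variance form of the two-sample t-test is used because it matches DF = (n1 + n2) - 2, sample variances with n - 1 in the denominator are used, the data are the mouse masses from Spreadsheet Activity 3, and `pooled_t` is a made-up name. The critical value 2.101 is the standard two-tailed table entry for p = 0.05 at 18 degrees of freedom.

```python
import math
import statistics

sample_a = [22, 22, 23, 24, 24, 24, 24, 25, 26, 26]  # isolated mice, mass in g
sample_b = [16, 17, 20, 23, 24, 25, 27, 28, 29, 31]  # bred together, mass in g


def pooled_t(x, y):
    """Return (t, degrees of freedom) for a two-sample pooled-variance t-test."""
    n1, n2 = len(x), len(y)
    df = (n1 + n2) - 2
    # Sample variances (n - 1 denominator) measure variation about each mean.
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(x) - statistics.mean(y)) / standard_error
    return t, df


t_value, df = pooled_t(sample_a, sample_b)

# Two-tailed critical value from a t-table: p = 0.05 at 18 degrees of freedom.
CRITICAL_T_DF18 = 2.101

print(f"t = {t_value:.3f}, DF = {df}")
if abs(t_value) > CRITICAL_T_DF18:
    print("difference between means is significant at p = 0.05")
else:
    print("difference between means is not significant at p = 0.05")
```

For these two samples the means are identical, so t is 0 and the difference is not significant; the samples differ in their spread, not in their means.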

Two-tailed test
A two-tailed test checks both whether the mean is significantly greater than x and whether the mean is significantly less than x. The mean is considered significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05. We would use a two-tailed test to see if two means are different from each other (i.e. from different populations) or come from the same population.

[Figure: probability distribution, labelled "most likely observation" and "observed or more extreme result arising by chance"]

Cause & Correlation
Correlation: a relationship or connection between two or more things. (Observations without an experiment can only show a correlation.)
Cause: a phenomenon that gives rise to a result. (Experimentation gives evidence for the cause of a result.)
Example: we might do an experiment to see if watering bean plants prevents wilting. Observing that wilting occurs when the soil is dry is a simple correlation, but the experiment gives us evidence that the lack of water is the cause of the wilting. Experiments provide a test which shows cause.

SPREADSHEET ACTIVITY 5
1) Input the following data
2) Calculate the PEARSON Correlation Coefficient (r value)
LIGHT INTENSITY (X UNITS) PLANT HEIGHT (CM)
6 5 7 10 9 15 20 11 25 12 30
3) Explain what this r-value tells you.
4) Explain that the existence of a correlation does not establish that there is a causal relationship between two variables.

Positive Correlation: When two variables change in the same direction, the correlation is positive: if one variable increases, the other also increases, and if one decreases, the other also decreases. For example, the length of an iron bar increases as the temperature increases.
Negative Correlation: When two variables change in opposite directions, the correlation is negative: if one variable increases, the other decreases, and vice versa. For example, the volume of a gas decreases as the pressure increases, and the demand for a particular commodity increases as its price decreases.
No Correlation or Zero Correlation: If there is no relationship between the two variables, so that when the value of one variable changes the other shows no corresponding change, there is no (zero) correlation.
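The three cases can be illustrated with a small Python sketch of the Pearson correlation coefficient. All the numbers below are hypothetical values invented for illustration; they are not the light intensity and plant height data from the activity, and `pearson_r` is simply a name chosen for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together around their means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical illustration data (not taken from the presentation).
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]      # tends to increase as x increases
falling = [5, 4, 5, 4, 2]     # tends to decrease as x increases
unrelated = [3, 1, 3, 1, 3]   # no consistent trend with x

r_positive = pearson_r(x, rising)
r_negative = pearson_r(x, falling)
r_zero = pearson_r(x, unrelated)

print(f"positive correlation: r = {r_positive:.2f}")
print(f"negative correlation: r = {r_negative:.2f}")
print(f"no correlation:       r = {r_zero:.2f}")
```

The r value always lies between -1 and +1. If one variable is completely constant it has no spread, so this sketch would divide by zero; r is undefined in that case rather than equal to zero.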