TOPIC 1 STATISTICAL ANALYSIS


1 TOPIC 1 STATISTICAL ANALYSIS

2 MAKING A SCIENTIFIC INVESTIGATION
STEP 1: HAVE A RESEARCH QUESTION
STEP 2: HAVE A HYPOTHESIS
STEP 3: WRITE A METHOD TO TEST YOUR HYPOTHESIS (design a controlled experiment)
STEP 4: COLLECT DATA
STEP 5: ORGANIZE THE DATA
STEP 6: ILLUSTRATE THE DATA USING AN APPROPRIATE DIAGRAM
STEP 7: ANALYZE THE DATA USING THE CORRECT STATISTICAL METHODS, ENABLING A CONCLUSION TO BE DRAWN

3 STEP 4: DATA COLLECTION The collection of all things being investigated is called the population. It is usually impossible for us to collect data from every member of the population. We must therefore choose a sample from the population.

4 We must try to make sure that the sample is
representative of the population from which it is drawn, so that we can generalize any findings about the sample to the population. Random sampling ensures that every member of the population has an equal chance of being included in the sample.
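A minimal sketch of random sampling, assuming a hypothetical population of 200 numbered limpets (the population, the sample size and the seed are illustrative, not from the slides):

```python
import random

# Hypothetical population: ID numbers for 200 limpets on a shore.
population = list(range(1, 201))

# A seeded generator keeps the example reproducible; the seed is arbitrary.
rng = random.Random(1)

# random.sample draws without replacement, so every member has the same
# chance of being chosen and no member is picked twice.
sample = rng.sample(population, k=20)

print(sorted(sample))
```

Removing the seed gives a different random sample on each run.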

5 I. QUALITATIVE DATA (descriptive)
II. QUANTITATIVE DATA (numerical)
- CONTINUOUS, e.g. length
- DISCRETE, e.g. number of eggs

6 STEP 5: ORGANIZING DATA Ways to Organize Raw Data: Constructing tables - Ranking - Tally chart - Frequency distribution

7 Use the table below to answer the following questions:
Is discrete or continuous data represented? What type of data organization is below? Is the data table complete? How will you process this data? (What does this data ‘say’ to you?)

Shell length / mm    Number of limpets
8-11                 2
12-15                5
16-19                8
20-23                10
24-27                9
28-31
32-35                1

8 QUADRAT SAMPLING
Marine Intertidal Zone

10 SPREADSHEET ACTIVITY 1: NORMAL DISTRIBUTION
1) Input the data from Limpet Shell Lengths in your spreadsheet
2) GRAPH: frequency distribution (normal distribution)
3) What does this graph tell you?

Shell length / mm    Number of limpets
8-11                 2
12-15                5
16-19                8
20-23                10
24-27                9
28-31
32-35                1
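If no spreadsheet is at hand, the frequency distribution can be sketched as a text chart. This is only an illustration: the count for the 28-31 class is missing from the table, so it is stored as `None` rather than guessed, and the names `frequency`, `bar` and `modal_class` are made up for this example.

```python
# Class counts as they appear in the limpet table; the 28-31 count is
# missing in the source, so it is recorded as None rather than guessed.
frequency = {
    "8-11": 2, "12-15": 5, "16-19": 8, "20-23": 10,
    "24-27": 9, "28-31": None, "32-35": 1,
}

def bar(count):
    """Return a text bar for one class, or a marker if the count is unknown."""
    return "(missing)" if count is None else "#" * count

chart = {label: bar(count) for label, count in frequency.items()}
for label, drawn in chart.items():
    print(f"{label:>6} | {drawn}")

# The modal class is the one with the highest known count.
known = {label: count for label, count in frequency.items() if count is not None}
modal_class = max(known, key=known.get)
print("Modal class (among known counts):", modal_class)
```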

11 Normal Distribution Skewed Distribution

12 Descriptive Statistics Includes:
Calculating the:
- Mean
- Median
- Mode
- Range
- Standard deviation (variability)
- P value (level of confidence from a T-Test)
- PEARSON correlation coefficient (correlation/cause)

13 Mean (average): the sum of all data entries divided by the number of entries; a good measure of central tendency for normally distributed data.
Median: the middle value when data entries are placed in rank order; a good measure of central tendency for skewed distributions.
Mode: the most frequently occurring value (the most common data value).
Range: the difference between the smallest and largest data values. This gives a simple measure of the spread of the data. (Note: the range is strongly affected by outliers, extreme values which are very different from all other values.)
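As a rough check on these definitions, the four values can be computed with Python's standard `statistics` module. The sketch borrows the Sample A mouse masses from Spreadsheet Activity 3; the variable names are illustrative.

```python
import statistics

# Sample A mouse masses (g), taken from Spreadsheet Activity 3.
masses = [22, 22, 23, 24, 24, 24, 24, 25, 26, 26]

mean = statistics.mean(masses)
median = statistics.median(masses)      # averages the two middle values when n is even
mode = statistics.mode(masses)          # most frequently occurring value
data_range = max(masses) - min(masses)  # largest value minus smallest value

print(mean, median, mode, data_range)
```

For this sample the mean, median and mode all equal 24 g and the range is 4 g.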

14 SPREADSHEET ACTIVITY 2
1) Input the following data in your spreadsheet Sample 1: Sample 2:
2) Calculate the mean, median, mode & range a) manually (using scientific calculator) b) using your spreadsheet
Note: you need to know how to complete all statistical calculations using: 1) formula 2) spreadsheet 3) calculator.

15 Do we stop data analysis at calculating the Mean, Median & Mode?
No! The mean does not give us a complete picture of the variation in our data. We need to calculate the standard deviation. The standard deviation (STDEV) is a more complete measure of variation: it considers every value in the set, and it measures the spread of the data around the mean.

16 SPREADSHEET ACTIVITY 3: Standard Deviation
1) Input the following data in your spreadsheet.
Mass (g) of mice bred in different environments
Sample A (isolated mice): 22, 22, 23, 24, 24, 24, 24, 25, 26, 26
Sample B (bred together): 16, 17, 20, 23, 24, 25, 27, 28, 29, 31
2) Calculate the means for samples A & B
3) Calculate standard deviation (STDEVP) for A & B a) with formula b) with spreadsheet c) with calculator
4) Is variation high or low in Sample A? Sample B?
5) What does this variation tell us?

17 Analyzing Values from Mice Samples
Looking at the calculated means alone for samples A and B, it appears that there is no difference between the two populations of mice (the mean does not show the variability of the data). However, when looking at STDEV, we can see: For sample A, STDEV is low. For sample B, STDEV is high. The wide variation in sample B makes us question the experimental design. Is it possible that mice bred in environment ‘B’ were subject to other environmental factors? What is causing the wide variation in the data?
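The comparison can be reproduced with a short sketch. It uses `statistics.pstdev`, the population standard deviation, on the assumption that this matches the spreadsheet's STDEVP; the variable names are illustrative.

```python
import statistics

# Mouse masses (g) from Spreadsheet Activity 3.
sample_a = [22, 22, 23, 24, 24, 24, 24, 25, 26, 26]  # isolated mice
sample_b = [16, 17, 20, 23, 24, 25, 27, 28, 29, 31]  # bred together

mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)

# pstdev divides by n (population standard deviation), like STDEVP.
sd_a = statistics.pstdev(sample_a)
sd_b = statistics.pstdev(sample_b)

print(f"Sample A: mean = {mean_a}, SD = {sd_a:.2f}")
print(f"Sample B: mean = {mean_b}, SD = {sd_b:.2f}")
```

Both means are 24 g, but the standard deviation is about 1.34 g for sample A and about 4.80 g for sample B.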

19 Standard Deviation
A measure of how the individual observations of a data set are dispersed or spread out around the mean (average).
For normally distributed data:
- 68% of all values lie within ±1 standard deviation of the mean
- 95% of all values lie within ±2 standard deviations of the mean
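The 68% and 95% figures can be checked against a standard normal distribution. This is a small sketch using `statistics.NormalDist` (Python 3.8 or later); the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of values lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1_sd = share_within(1)
within_2_sd = share_within(2)

print(f"within 1 SD: {within_1_sd:.1%}")
print(f"within 2 SD: {within_2_sd:.1%}")
```

The exact shares are about 68.3% and 95.4%, which the slide rounds to 68% and 95%.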

20 Reasons for Using Statistics
In a population, we usually find that not all the values are identical. Instead, there are differences between the values even inside a population. We call this VARIATION. The data we obtain from a study has variability. We often need to describe the variation within a population to help us decide whether a difference between sample means truly represents a difference between population means. How can we describe this variation? (via statistics)

21 Why Use Standard Deviation?
The value provides a description of the variation which considers every data item. Large differences in the sizes of the standard deviation between samples being compared can indicate: 1) that control variables are not constant 2) that there is a problem with validity of the investigation. The standard deviation can be used as a support in hypothesis testing.

22 We can graphically represent STDEV as ERROR BARS

23 Error Bars In many charts and graphs, we show the mean values of our samples. It is useful to show a measure of the variation inside each of these samples. We do this by adding error bars to the chart or graph.

24 Error Bars An error bar is a line that extends above and below a bar in a chart or a data point in a graph. It could represent the range for that sample, or the standard deviation. The length of the line represents the size of the range or the size of the standard deviation; when it shows the standard deviation, it extends an equal distance above and below the value of the mean. Error bars are graphical representations of the variability of data.
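A minimal sketch of where standard-deviation error bars would start and end, using the mouse masses from Spreadsheet Activity 3. It only computes the end points (mean minus SD, mean plus SD) and does no plotting; the dictionary names are illustrative.

```python
import statistics

# Mouse masses (g) from Spreadsheet Activity 3.
samples = {
    "A": [22, 22, 23, 24, 24, 24, 24, 25, 26, 26],
    "B": [16, 17, 20, 23, 24, 25, 27, 28, 29, 31],
}

error_bars = {}
for name, data in samples.items():
    centre = statistics.mean(data)
    spread = statistics.pstdev(data)  # population SD, as with STDEVP
    # The bar extends one standard deviation below and above the mean.
    error_bars[name] = (centre - spread, centre + spread)

for name, (low, high) in error_bars.items():
    print(f"Sample {name}: error bar from {low:.2f} g to {high:.2f} g")
```

Sample B's bar is several times longer than sample A's, even though both are centred on the same mean.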

25 Significance Significance: a real, true difference between two or more samples in the phenomenon that we are examining (we test to see whether the findings could be due to chance alone). Note: statistical significance is our main tool in deciding whether the data supports the hypothesis.

26 What information do the means of data give?
What additional information do error bars give? How does this affect interpretation of the figures? - Error bars help us judge whether or not the difference between two sets of data is significant (real). A large difference between the means of the samples, together with small standard deviations for these samples, indicates that the difference between the means is likely to be statistically significant. A small difference between the means, together with large standard deviations for these samples, indicates that the difference between the means is likely not to be statistically significant.

27 Confidence Levels It is seldom possible to say with complete certainty (100% confidence) that the difference between sample means is significant. Instead, we determine whether the difference between the sample means is probably significant. Most often, scientists/biologists want to be 95% confident that the difference between the samples is significant. This means that there is only a 5% chance that a difference this large would arise purely by chance and not because of a real difference between the populations. We could say: p = 0.05 (the probability (p) that chance alone produced the difference between our sample means is 5%).

28 Determining Confidence of Significance with T-Test
How do we determine if our findings are significant? We need to calculate our t-value and find the p-value. The t-test gives a t-value, which is then used to find the p-value (significance at a certain level of confidence). Conditions for applying the t-test:
- Data should be normally distributed
- Sample size should be at least 10

29 T-Test The following information is needed for the T-Test calculation:
1) size of the difference between the means of the samples
2) number of items in each sample
3) the amount of variation about the mean of each sample (standard deviation)
The value of t can be calculated from the data using:
- Formula
- Scientific calculator
- Spreadsheet (Microsoft Excel)
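The three ingredients above combine into a t-value as sketched below. The sketch assumes the equal-variance (pooled) two-sample t-test, which is consistent with degrees of freedom of (n1 + n2) - 2; the slides do not state which variant is intended. The function name `pooled_t` and the `group_x` / `group_y` values are hypothetical, made up purely for illustration.

```python
import math
import statistics

def pooled_t(sample_1, sample_2):
    """Return (t, degrees of freedom) for an equal-variance two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    # 1) size of the difference between the sample means
    mean_diff = statistics.mean(sample_1) - statistics.mean(sample_2)
    # 2) + 3) sample sizes and variation: statistics.variance divides by n - 1
    pooled_var = ((n1 - 1) * statistics.variance(sample_1)
                  + (n2 - 1) * statistics.variance(sample_2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return mean_diff / standard_error, n1 + n2 - 2

# Mouse masses (g) from Spreadsheet Activity 3: the means are equal, so t is 0.
sample_a = [22, 22, 23, 24, 24, 24, 24, 25, 26, 26]
sample_b = [16, 17, 20, 23, 24, 25, 27, 28, 29, 31]
t_mice, df_mice = pooled_t(sample_a, sample_b)

# Hypothetical data (not from the slides) with clearly different means.
group_x = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11]
group_y = [14, 15, 13, 16, 15, 14, 13, 15, 16, 14]
t_demo, df_demo = pooled_t(group_x, group_y)

print(f"mice: t = {t_mice:.2f}, DF = {df_mice}")
print(f"demo: t = {t_demo:.2f}, DF = {df_demo}")
```

A t-value near 0 means the sample means are practically the same; the further t is from 0, the less likely it is that chance alone produced the difference.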

30 SPREADSHEET ACTIVITY 4: T-TEST (P-value)
1) Input data from Clegg Text Chapter 21 Page 681
2) Calculate: mean and standard deviation
3) Calculate: P-value (from T-Test) a) spreadsheet b) calculator
4) What does this P-value tell you?

31 T-Test & P-Value using a Calculator
You need to use a table of t-values!
1) Calculate the T-Test value (t-value).
2) Identify the degrees of freedom (DF) for your experiment: DF = (size of sample 1 + size of sample 2) - 2. Example: (10 + 10) - 2 = 18.
3) Find row 18 in the DF column.
4) Find your t-value in row 18 under the “t values” columns.
5) Once you have found your t-value, look to the bottom row of that column for the p-value.
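The table lookup can be sketched as follows. The dictionary holds the two-tailed critical values for 18 degrees of freedom as printed in standard t tables; the layout of your own table may differ, and the function names and the two example t-values are hypothetical.

```python
# Two-tailed critical t values for 18 degrees of freedom, as printed in
# standard t tables (p level -> critical value).
CRITICAL_T_DF18 = {0.10: 1.734, 0.05: 2.101, 0.01: 2.878, 0.001: 3.922}

def degrees_of_freedom(n1, n2):
    """DF = (size of sample 1 + size of sample 2) - 2."""
    return (n1 + n2) - 2

def smallest_p_reached(t_value, critical_values):
    """Return the smallest tabulated p whose critical value |t| reaches, else None."""
    reached = [p for p, critical in critical_values.items()
               if abs(t_value) >= critical]
    return min(reached) if reached else None

df = degrees_of_freedom(10, 10)
p_for_large_t = smallest_p_reached(3.0, CRITICAL_T_DF18)  # hypothetical t-value
p_for_small_t = smallest_p_reached(1.2, CRITICAL_T_DF18)  # hypothetical t-value

print(df, p_for_large_t, p_for_small_t)
```

A t-value of 3.0 passes the 0.01 column but not the 0.001 column, so the difference is significant at p = 0.01. A t-value of 1.2 does not reach even the 0.10 column, so the difference is not significant.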

32 Two-tailed test A two-tailed test checks both whether the mean is significantly greater than x and whether the mean is significantly less than x. The mean is considered significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05. We would use a two-tailed test to see whether two means are different from each other (i.e. the samples come from different populations) or not (the samples come from the same population).

33 [Figure: probability distribution. Labels: "Most likely observation"; "observed or more extreme result arising by chance"]

34 Cause & Correlation Correlation: a relationship or connection between two or more things. (Observations without an experiment can only show a correlation.) Cause: a phenomenon that gives rise to a result. (Experimentation gives evidence for the cause of a result.) Example: we might do an experiment to see if watering bean plants prevents wilting. Observing that wilting occurs when the soil is dry is a simple correlation, but the experiment gives us evidence that the lack of water is the cause of the wilting. Experiments provide a test which gives evidence of cause.

35 SPREADSHEET ACTIVITY 5
1) Input the following data
2) Calculate the PEARSON Correlation Coefficient (r value)
3) Explain what this r-value tells you.
4) Explain that the existence of a correlation does not establish that there is a causal relationship between two variables.

LIGHT INTENSITY (X UNITS)    PLANT HEIGHT (CM)
5                            6
10                           7
15                           9
20
25                           11
30                           12
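The r-value can be computed directly from its definition, as in this sketch. It assumes the activity's values pair up as (light intensity, plant height) = (5, 6), (10, 7), (15, 9), (25, 11), (30, 12); the height at intensity 20 is missing from the slide, so that pair is left out rather than guessed. The function name `pearson_r` is illustrative.

```python
import math

# Assumed pairing of the activity data; the pair for intensity 20 is
# omitted because its plant height is missing in the source.
light_intensity = [5, 10, 15, 25, 30]
plant_height_cm = [6, 7, 9, 11, 12]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

r = pearson_r(light_intensity, plant_height_cm)
print(f"r = {r:.3f}")
```

An r-value close to +1 indicates a strong positive correlation. On its own it does not show that light intensity causes the change in height.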

36 Positive Correlation: The two variables change in the same direction: if one variable increases, the other also increases, and if one decreases, the other also decreases. For example, the length of an iron bar increases as the temperature increases.
Negative Correlation: The two variables change in opposite directions: if one variable increases, the other decreases, and vice versa. For example, the volume of a gas decreases as the pressure increases, and the demand for a commodity increases as its price decreases.
No Correlation or Zero Correlation: There is no relationship between the two variables: a change in the value of one variable is not accompanied by any consistent change in the other.

