
Statistics The POWER of Data

Statistics: Definition
Statistics is the mathematics of the collection, organization, and interpretation of numerical data.

Data
Data is information, in the form of facts or figures, used as a basis for making calculations or drawing conclusions.
Data can be quantitative or qualitative.
Data can be obtained by a variety of methodologies:
–Random Sample
–Quadrat Study
–Questionnaires
–Experiments

Types of Data: Qualitative
Information that relates to characteristics or descriptions (observable qualities)
Examples
–Species of plant
–Type of insect
–Shades of color
–Rank of flavor in taste testing
Qualitative data can be “scored” and evaluated numerically
IB Junior Class qualitative data: friendly demeanors, hard workers, environmentalists, positive school spirit

Types of Data: Quantitative
Quantitative – measured using a naturally occurring numerical scale
Measurements are often displayed graphically
Examples
–Chemical concentration
–Temperature
–Length
–Weight…etc.

Error Analysis
ALL measurements are subject to uncertainties, and this must always be stated.
–Measure someone’s height. Are they slouching? Wearing shoes? Standing on even ground? Is it morning or night?
–Measuring multiple times and averaging REDUCES the random error!!
The limits of the equipment used add some uncertainty to the data collected. All equipment has a certain magnitude of uncertainty. For example, is a mass-produced ruler a good measure of 1 cm? 1 mm? 0.1 mm?
For quantitative testing, you must indicate the level of uncertainty of the tool that you are using for measurement!!

Instrument Error
Most measurements in the lab are done with devices that have a marked scale. Here we are measuring the length of a pendulum.
Even using this idealized, zoomed-in picture, we cannot tell for sure exactly where the end of the mass falls between the scale marks. However, it is certainly closer to 128.9 cm than to the marks on either side.
Thus we can state with absolute confidence that the length L is 128.9 cm ± 0.1 cm

Error and Electronics
ALL devices have an error potential; sometimes it is written on the instrument itself.
The error of an electronic device is usually half of the last precision digit.

Error and Graphing
Error bars are a graphical representation of the variability of data.
Error bars may show confidence intervals, standard errors, standard deviations, range of data or other quantities.
More trials = less error

Significant Figures
Be sure that the number of significant digits in the data table/graph reflects the precision of the instrument used. For example, if the manufacturer states that the accuracy of a balance is to 0.1 g and your average mass is 2.06 g, be sure to round the average to 2.1 g.
Your data must be consistent with your measurement tool regarding significant figures.

Creating Error Bars on Graphs
When we need an accurate measurement, we usually repeat the measurement several times and calculate an average value.
Then we do the same for the next measurement.
Once the two averages are calculated for each set of data, the average values can be plotted together on a graph to visualize the relationship between the two!

Comparing Averages

Creating Error Bars
The simplest way to draw an error bar is to use the mean as the central point, and to use the distance of the measurement that is furthest from the average as the endpoints of the data bar.
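A minimal Python sketch of this simplest error bar, assuming the bar is drawn symmetrically around the mean. The function name `simple_error_bar` and the trial values are hypothetical, chosen only for illustration.

```python
def simple_error_bar(measurements):
    """Return (mean, half_width) for the simplest kind of error bar.

    half_width is the distance from the mean to the measurement that
    lies furthest from it, so the bar spans mean ± half_width.
    """
    mean = sum(measurements) / len(measurements)
    half_width = max(abs(x - mean) for x in measurements)
    return mean, half_width


# Hypothetical repeated measurements of the same quantity.
trials = [10.2, 9.8, 10.5, 10.1]
mean, half_width = simple_error_bar(trials)
print(f"mean = {mean:.2f}, error bar = ±{half_width:.2f}")
```

Because this bar is set by the single most extreme measurement, it tends to be wider than a bar based on standard deviation.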

Error Bars
Error bars that overlap can suggest that there is not a significant difference.

Other Data Calculations
mode: value that appears most frequently
median: when all data are listed from least to greatest, the value at which half of the observations are greater, and half are lesser
The most commonly used measure of central tendency is the mean, or arithmetic average (sum of data points divided by the number of points).
You should be able to find the mean, mode and quartiles of your data, and this should be shown on your graphs or in your data charts.
Example: 13, 18, 13, 14, 13, 16, 14, 21, 13
mean: 15
median: 14
mode: 13
range: 8
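The figures for this data set can be checked with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative.

```python
import statistics

# The nine values listed on the slide.
data = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mean = statistics.mean(data)        # sum of data points / number of points
median = statistics.median(data)    # middle value once the data are sorted
mode = statistics.mode(data)        # value that appears most frequently
data_range = max(data) - min(data)  # largest value minus smallest value

print(f"mean: {mean}  median: {median}  mode: {mode}  range: {data_range}")
```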

Standard Deviation
Standard deviation is used to summarize the spread of values around the mean.
For normally distributed data, about 68% of all values lie within ±1 standard deviation of the mean. This rises to about 95% for ±2 standard deviations.
A small standard deviation indicates that the data is clustered closely around the mean value. Conversely, a large standard deviation indicates a wider spread around the mean.
Standard deviation can also be used in drawing error bars.
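The 68% and 95% figures come from the normal distribution itself and can be reproduced with the error function in Python's `math` module. A short sketch follows; the helper name `fraction_within` is an assumption made for illustration.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within ±k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2):
    print(f"within ±{k} SD: {fraction_within(k):.1%}")
```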

Calculating Standard Deviation
You can use the old formula for calculating standard deviation or the much easier calculator button!
We will practice using the formula.

Calculating Standard Deviation
Standard deviation can be calculated:
–On your calculator
–In Microsoft Excel: type the following formula into the cell where you want the standard deviation result, using the “unbiased,” or “n-1” method: =STDEV(A1:A30) (substitute the cell name of the first value in your dataset for A1, and the cell name of the last value for A30)
–On your computer: u/~jdf37/mean.htm

Let’s measure some people!!

Is our data reliable enough to support a conclusion?

Imagine we chose two children at random from two classrooms… … and compared their heights …

… we find that one pupil is taller than the other
WHY?

REASON 1: There is a significant difference between the two groups, so pupils in 211 are taller than pupils in 210

REASON 2: By chance, we picked a short pupil from 210 and a tall one from 211
[Pictured: Heather (11th grade), Alex (11th grade)]

How do we decide which reason is most likely? MEASURE MORE STUDENTS!!!

If there is a significant difference between the two groups… … the average or mean height of the two groups should be very… … DIFFERENT

If there is no significant difference between the two groups… … the average or mean height of the two groups should be very… … SIMILAR

Remember: Living things normally show a lot of variation, so…

It is VERY unlikely that the mean height of our two samples will be exactly the same
211 Sample: Average height = 162 cm
210 Sample: Average height = 168 cm
Is the difference in average height of the samples large enough to be significant?

We can analyze the spread of the heights of the students in the samples by drawing histograms
[Histograms: Frequency vs. Height (cm) for the 211 sample and the 210 sample]
Here, the ranges of the two samples have a small overlap, so…
… the difference between the means of the two samples IS probably significant

[Histograms: Frequency vs. Height (cm) for the 211 sample and the 210 sample]
Here, the ranges of the two samples have a large overlap, so…
… the difference between the two samples may NOT be significant.
The difference in means is possibly due to random sampling error

To decide if there is a significant difference between two samples we must compare the mean height for each sample…
… and the spread of heights in each sample.
Statisticians calculate the standard deviation of a sample as a measure of the spread of a sample
You can calculate standard deviation using the formula

$$S_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$

Where:
$S_x$ is the standard deviation of the sample
$\sum$ stands for ‘sum of’
$x$ stands for the individual measurements in the sample
$n$ is the number of individuals in the sample
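As a sketch, the sum-of-squares form of the n − 1 standard deviation on this slide can be written in a few lines of Python and compared with the standard library's `statistics.stdev`, which also uses the n − 1 method. The function name `sample_sd` and the heights are hypothetical.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation via sqrt((Σx² - (Σx)²/n) / (n - 1))."""
    n = len(values)
    sum_x = sum(values)                    # Σx
    sum_x_sq = sum(x * x for x in values)  # Σx²
    return math.sqrt((sum_x_sq - sum_x ** 2 / n) / (n - 1))


# Hypothetical heights in cm, for illustration only.
heights = [150, 160, 165, 170, 175]
print(f"formula: {sample_sd(heights):.4f}")
print(f"library: {statistics.stdev(heights):.4f}")
```

The two printed values should agree, which is a quick check that the formula has been typed in correctly.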

Student’s t-test
The Student’s t-test compares the averages and standard deviations of two samples to see if there is a significant difference between them.
We start by calculating a number, t
t can be calculated using the equation:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}}$$

Where:
$\bar{x}_1$ is the mean of sample 1
$s_1$ is the standard deviation of sample 1
$n_1$ is the number of individuals in sample 1
$\bar{x}_2$ is the mean of sample 2
$s_2$ is the standard deviation of sample 2
$n_2$ is the number of individuals in sample 2
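A minimal Python sketch of this calculation, assuming each sample is given as a list of raw measurements. The function name `t_statistic` and the two groups are hypothetical, not data from the presentation.

```python
import math
import statistics


def t_statistic(sample1, sample2):
    """t = (mean1 - mean2) / sqrt(s1²/n1 + s2²/n2), using n - 1 standard deviations."""
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
    n1, n2 = len(sample1), len(sample2)
    return (mean1 - mean2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


# Hypothetical heights in cm for two small groups.
group_a = [160, 165, 170, 175, 180]
group_b = [150, 155, 160, 165, 170]
t = t_statistic(group_a, group_b)
print(f"t = {t:.2f}")
```

The sign of t only reflects which sample is listed first; its size is what gets compared with the critical value.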

Worked Example: Random samples were taken of pupils in 211 and 210
Their recorded heights are shown below…
[Table: Student and Height (cm), for students in 211 and students in 210]
Step 1: Work out the mean height for each sample
211: $\bar{x}_1 =$
210: $\bar{x}_2 =$
Step 2: Work out the difference in means
$\bar{x}_2 - \bar{x}_1 = 6.67$

Step 3: Work out the standard deviation for each sample
211: $s_1 =$
210: $s_2 = 11.74$
Step 4: Calculate $s^2/n$ for each sample
211: $\frac{(s_1)^2}{n_1} = (s_1)^2 \div 15 = 7.86$
210: $\frac{(s_2)^2}{n_2} = (11.74)^2 \div 15 = 9.19$

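Step 5: Calculate $\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}$
$\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}} = \sqrt{7.86 + 9.19} = 4.13$
Step 6: Calculate t (Step 2 divided by Step 5)
$t = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}} = \frac{6.67}{4.13} = 1.62$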

Step 7: Work out the number of degrees of freedom
d.f. = $n_1 + n_2 - 2 = 15 + 15 - 2 = 28$
Step 8: Find the critical value of t for the relevant number of degrees of freedom
Use the 95% (p = 0.05) confidence limit
Critical value = 2.05
Our calculated value of t is below the critical value for 28 d.f.; therefore, there is no significant difference between the heights of students in the samples from 211 and 210
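The arithmetic in Steps 5 to 8 can be re-run from the intermediate values quoted in the worked example. This is a sketch only: the raw heights are not available, so it starts from the s²/n values, the difference in means, and n = 15 per sample. The critical value of 2.048 is the standard two-tailed t-table entry for p = 0.05 and 28 degrees of freedom, supplied here as an assumption rather than read from the slides.

```python
import math

# Intermediate results quoted in the worked example.
var_over_n_211 = 7.86    # (s1)² / n1 for the 211 sample
var_over_n_210 = 9.19    # (s2)² / n2 for the 210 sample
mean_difference = 6.67   # difference between the two sample means
n1 = n2 = 15             # pupils per sample

# Assumption: two-tailed critical t for p = 0.05 at 28 d.f. (standard table value).
critical_t = 2.048

standard_error = math.sqrt(var_over_n_211 + var_over_n_210)  # Step 5
t = mean_difference / standard_error                          # Step 6
degrees_of_freedom = n1 + n2 - 2                              # Step 7
significant = t > critical_t                                  # Step 8

print(f"Step 5: {standard_error:.2f}")
print(f"Step 6: t = {t:.2f}")
print(f"Step 7: d.f. = {degrees_of_freedom}")
print(f"Step 8: significant difference? {significant}")
```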

Ethics in Statistics
Statistics are funny things; they can be misused and used to prove almost anything.
–50% of the population of the US has below average intelligence!
–4 of 5 doctors recommend….

Ways of Misusing Data
Recreate the experiment until the numbers say what you want. (interview MANY groups of 5 dentists)
NEVER IGNORE DATA THAT DOESN’T SEEM TO MATCH YOUR BELIEFS!!!!!

Misusing Data: Dos and Don’ts
The amount of data matters! More data is more reliable
Avoid bias! Don’t have a preconceived idea of what will happen (this is hard!)
Accidents occur! At the 95% confidence limit, about 5% of tests will show a “significant” result by accident even when there is no real difference (false alarm probability)
Don’t discard some data; all data is important even if unexpected.
Don’t overgeneralize (All apples are red)
Don’t manipulate
Don’t make up “better” data

Misusing Data
Correlations don’t equal Causation!!
–Experiments don’t PROVE; they merely suggest correlations
*People with more moles live longer.

Data Misuse Examples
Bell Curve
–There are substantial individual and group differences in intelligence; these differences profoundly influence the social structure and organization of work in modern industrial societies, and they defy easy remediation.

FREAKONOMICS
Schoolteachers and Sumo wrestlers have a lot in common.
The Ku Klux Klan and Real Estate Agents work under the same principles.
Abortion lessens crime!
Do names matter?
–Low education parents: Ricky, Terry, Larry, Jazmine, Misty, Mercedes
–High education parents: Marie-Claire, Glynnis, Aviva, Finnegan, MacGregor, Harper
–???: Loser, Winner, Temptress, Sir John, Precious