Statistics Review for AP Biology From BSCS: Interaction of experiments and ideas, 2nd Edition. Prentice Hall, 1970 and Statistics for the Utterly Confused by Lloyd Jaisingh, McGraw-Hill, 2000
What is statistics?
A branch of mathematics that provides techniques for analyzing whether or not your data are significant (meaningful).
Statistical applications are based on probability statements.
Nothing is “proved” with statistics.
Statistics report the probability that similar results would occur if you repeated the experiment.
Statistics deals with numbers Need to know nature of numbers collected Continuous variables: type of numbers associated with measuring or weighing; any value in a continuous interval of measurement. Examples: Weight of students, height of plants, time to flowering Discrete variables: type of numbers that are counted or categorical Numbers of boys, girls, insects, plants
Can you figure out… Which type of numbers? (discrete or continuous)
1. Numbers of persons preferring Brand X in 5 different towns
2. The weights of high school seniors
3. The lengths of oak leaves
4. The number of seeds germinating
5. 35 tall and 12 dwarf pea plants
Answers: all are discrete except the 2nd and 3rd examples, which are continuous.
Populations and Samples
Population: includes all members of a group. Examples: all 9th grade students in America; the number of 9th grade students at W-H.
Sample: a selection of the population, used to make inferences about large populations. Example: 2nd period AP Biology.
Why the need for statistics? Statistics are used to describe samples as estimators of the corresponding population. Many times, finding complete information about a population is costly and time consuming. We can use samples to represent a population.
Sample Populations avoiding Bias Individuals in a sample population Must be a fair representation of the entire pop. Therefore sample members must be randomly selected (to avoid bias) Example: if you were looking at strength in students - picking students from the football team would NOT be random
Is there bias?
1. A cage has 1000 rats; you pick the first 20 you can catch for your experiment.
2. A public opinion poll is conducted using the telephone directory.
3. You are conducting a study of a new diabetes drug; you advertise for participants in the newspaper and on TV.
All are biased. Rats: you grab the slower rats. Telephone: you call only people with a phone (wealth?) and people who are listed (responsible?). Newspaper/TV: you reach only people with a newspaper (wealth/educated?) and a TV (wealth?).
Statistical Computations (the Math) If you are using a sample population
Arithmetic Mean (average): the sum of all the scores divided by the total number of scores, x̄ = Σx / n.
The mean estimates the center of the data. It is the median, not the mean, that has ½ the members of the pop on either side of it; the two coincide only when the distribution is symmetric.
http://en.wikipedia.org/wiki/Table_of_mathematical_symbols
Looking at profile of data: Distribution
What is the frequency of distribution, where are the data points?
Distribution Chart of Heights of 100 Control Plants
Class (height of plants, cm)   Number of plants in each class
0.0-0.9                        3
1.0-1.9                        10
2.0-2.9                        21
3.0-3.9                        30
4.0-4.9                        20
5.0-5.9                        14
6.0-6.9                        2
Histogram-Frequency Distribution Charts This is called a “normal” curve or a bell curve. It is an “idealized,” theoretical curve: the shape the histogram would approach with an infinite number of measurements, estimated here from a sample.
One of the first steps in data analysis is to create graphical displays of the data. Visual displays can make it easy to see patterns and can clarify how two variables affect each other. AP Biology Quantitative Skills Manual
Line Graphs Used when data on both scales of the graph (the x and y axes) are continuous. The dots indicate measurements that were actually made.
Basic Traits of A Good Graph 1. A Good Title A good title is one that tells exactly what information the author is trying to present with the graph. Relation Between Study Time and Score on a Biology Exam in 2011 -or- Study Time vs. Score on a Biology Exam in 2011 AP® Biology Investigative Labs: An Inquiry-Based Approach
Basic Traits of A Good Graph Axes should be consistently numbered. Axes should contain labels, including units. AP® Biology Investigative Labs: An Inquiry-Based Approach
Basic Traits of A Good Graph A frame should be put around the outside of the graph. AP® Biology Investigative Labs: An Inquiry-Based Approach
Basic Traits of A Good Graph Small marks, called index marks, can be drawn in. AP® Biology Investigative Labs: An Inquiry-Based Approach
Basic Traits of A Good Graph The independent variable is always shown on the x axis. The dependent variable is always shown on the y axis. Dependent Variable AP® Biology Investigative Labs: An Inquiry-Based Approach Independent Variable
Basic Traits of A Good Graph The line should not be extended to the origin if the data do not start there. AP® Biology Investigative Labs: An Inquiry-Based Approach
Bar Graphs Used to visually compare two samples of categorical or count data. Are also used to visually compare the calculated means with error bars of normal data. AP Biology Quantitative Skills Manual
Sample standard error bars (also known as the standard error of the sample mean) are the notations at the top of each shaded bar that show the sample standard error (SE). AP Biology Quantitative Skills Manual
Mode and Median
Mode: the most frequently seen value (if no numbers repeat then there is no mode; a data set can also have more than one mode).
Median: the middle number when the data are sorted. If you have an odd number of data then the median is the value in the middle of the set. If you have an even number of data then the median is the average of the two middle values in the set.
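The definitions of mean, median, and mode can be checked with a short sketch. The plant heights below are hypothetical illustration data, not values from the slides, and `modes` is a helper name introduced here only for the example.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return the most frequent value(s); an empty list means no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so there is no mode
    return [v for v, c in counts.items() if c == top]

# Hypothetical plant heights in cm (illustrative only)
heights = [3, 5, 5, 6, 8, 9]

print(mean(heights))    # sum of the scores divided by the number of scores
print(median(heights))  # even count: average of the two middle values
print(modes(heights))   # most frequently seen value(s)
```

With an even number of values the median falls between the two middle scores, and a data set with no repeats returns an empty list rather than 0.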
Q1 Calculate…
Q1 Answer
Fast Plants Data Analysis Calculate Mean
Fast Plants Data Analysis
Standard Deviation An important statistic that is also used to measure variation in samples. s is the symbol for the sample standard deviation. Calculated by taking the square root of the variance (Bozeman). Say a sample of pea plants has the following: mean = 8 cm; variance = 2.5; s = √2.5 ≈ 1.6 cm. Thus a typical measurement varies about ±1.6 cm from the mean.
What does “S” mean? We can predict the probability of finding a pea plant at a predicted height… the probability of finding a pea plant above 12.8 cm or below 3.2 cm is less than 1% S is a valuable tool because it reveals predicted limits of finding a particular value
The Normal Curve and Standard Deviation A normal curve: Each vertical line is a unit of standard deviation 68% of values fall within +1 or -1 SD of the mean 95% of values fall within +2 & -2 SD units Nearly all members (>99%) fall within 3 std dev units http://classes.kumc.edu/sah/resources/sensory_processing/images/bell_curve.gif
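As a rough check of the 68% / 95% / >99% figures, the sketch below uses Python's standard-library `NormalDist` with the pea plant values from the slides (mean 8 cm, s = 1.6 cm). It assumes the heights are exactly normally distributed, which real data only approximate; `fraction_within` is a helper name made up for this example.

```python
from statistics import NormalDist

# Pea plant example: mean = 8 cm, s = 1.6 cm (assumed perfectly normal)
peas = NormalDist(mu=8, sigma=1.6)

def fraction_within(dist, k):
    """Fraction of a normal distribution lying within k std devs of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    print(k, round(fraction_within(peas, k), 4))

# Chance of a plant above 12.8 cm or below 3.2 cm (more than 3 s from the mean)
outside = 1 - fraction_within(peas, 3)
print(round(outside, 4))
```

The last value comes out well under 1%, which matches the statement that plants above 12.8 cm or below 3.2 cm are very unlikely.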
Pea Plant Normal Distribution Curve with Std Dev
Standard Error of the Sample Means (AKA Standard Error) The mean, the variance, and the std dev help estimate characteristics of the population from a single sample. If many samples were taken, the means of those samples would themselves form a normal distribution centered close to the true population mean. The larger the samples, the closer the sample means would be to the actual value. Taking many samples is usually impractical, so a simple method is used to estimate the spread of the sample means from a single sample.
A Simple Method for estimating standard error
Standard error is the calculated standard deviation divided by the square root of the sample size (the number of individuals in the sample): SE = s / √n
Standard error of the mean is used to judge the reliability of the data.
Example: if there are 10 corn plants with a standard deviation of 0.2, SE = 0.2 / √10 = 0.2 / 3.16 ≈ 0.063
0.063 represents one std dev of the distribution of sample means for samples of 10 plants.
If there were 100 plants the standard error would drop to 0.2 / √100 = 0.02
Why? Because when we take larger samples, our sample means get closer to the true mean value of the population. Thus, the distribution of the sample means would be less spread out and would have a lower standard deviation.
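The corn plant arithmetic can be verified with a minimal sketch. It assumes the usual formula SE = s / √n, where n is the number of plants in the sample; the function name is introduced here for illustration.

```python
import math

def standard_error(s, n):
    """Standard error of the mean: sample std dev divided by sqrt(sample size)."""
    return s / math.sqrt(n)

se_10 = standard_error(0.2, 10)    # 10 corn plants, s = 0.2
se_100 = standard_error(0.2, 100)  # 100 corn plants, same s

print(round(se_10, 3))   # about 0.063
print(round(se_100, 3))  # 0.02
```

Multiplying the sample size by 10 shrinks the standard error by a factor of √10, not by 10.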
Fast Plants Graph
Probability Tests What to do when you are comparing two samples to each other and you want to know if there is a significant difference between them (for example, the control and the experimental setup)? How do you know there is a difference? How large is a “difference”? How do you know the “difference” was caused by a treatment and not due to “normal” sampling variation or sampling bias?
Laws of Probability
The results of one trial of a chance event do not affect the results of later trials of the same event. p = 0.5 (a fair coin always has a 50:50 chance of coming up heads).
The chance that two or more independent events will occur together is the product of their chances of occurring separately (one outcome has nothing to do with the other).
Example: What’s the likelihood of a 3 coming up on a die? A die has six sides: p = 1/6. Roll two dice and get a 3 on both: p = 1/6 × 1/6 = 1/36, which means there’s a 35/36 chance of rolling something else.
Note: the probabilities of all possible outcomes must sum to 1.0.
Laws of Probability (continued) The probability that either of two or more mutually exclusive events will occur is the sum of their probabilities (only one can happen at a time). Example: What is the probability of rolling a total of either 2 or 12? Probability of rolling a 2 means a 1 on each of the dice; therefore p = 1/6*1/6 = 1/36 Probability of rolling a 12 means a 6 and a 6 on each of the dice; therefore p = 1/36 So the likelihood of rolling either is 1/36+1/36 = 2/36 or 1/18
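Both rules can be checked with exact fractions. The sketch below assumes two fair six-sided dice and also counts the outcomes directly as a cross-check; the variable names are made up for this example.

```python
from fractions import Fraction
from itertools import product

p_three = Fraction(1, 6)              # a 3 on one fair die

# Product rule: independent events occurring together
p_two_threes = p_three * p_three      # 3 on both dice
p_something_else = 1 - p_two_threes   # all probabilities sum to 1

# Sum rule: mutually exclusive events (total of 2 OR total of 12)
p_total_2 = Fraction(1, 6) * Fraction(1, 6)    # 1 and 1
p_total_12 = Fraction(1, 6) * Fraction(1, 6)   # 6 and 6
p_2_or_12 = p_total_2 + p_total_12

# Cross-check by listing all 36 equally likely rolls of two dice
rolls = list(product(range(1, 7), repeat=2))
hits = [r for r in rolls if sum(r) in (2, 12)]
p_counted = Fraction(len(hits), len(rolls))

print(p_two_threes, p_something_else, p_2_or_12, p_counted)
```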
The Use of the Null Hypothesis Is the difference between two samples due to chance or is it a real, statistically significant difference? The null hypothesis assumes that there will be no “difference” or no “change” or no “effect” of the experimental treatment. If treatment A is no better than treatment B then the null hypothesis is not rejected. If there is a significant difference between A and B then the null hypothesis is rejected...
Chi square Used with discrete values: phenotypes, choice chambers, etc. Not used with continuous variables (like height… use t-test for samples less than 30 and z-test for samples greater than 30).
χ² = Σ (O − E)² / E
O = observed values, E = expected values
http://www.jspearson.com/Science/chiSquare.html
http://course1.winona.edu/sberg/Equation/chi-squ2.gif
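Interpreting a chi square
Calculate degrees of freedom: # of events, trials, or phenotypes − 1. Example: 2 phenotypes − 1 = 1.
Generally use the column labeled 0.05. This gives the critical value: if the null hypothesis is true, chance alone would produce a chi square value this large only 5% of the time.
If your calculated value is smaller than the critical value, the difference between what you expected and what you observed is within accepted random chance, and you fail to reject your null hypothesis.
If your calculated value is larger than the critical value, you reject your null hypothesis: there is a significant difference between observed and expected values.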
How to use a chi square chart http://faculty.southwest.tn.edu/jiwilliams/probab2.gif
Q1: Chi Square A heterozygous red-eyed female was crossed with a red-eyed male. The results are shown below. Red eyes are sex-linked and dominant to white. Determine the chi square value. Round to the nearest hundredth.
Phenotype    # flies observed
Red eyes     134
White eyes   66
Chi Square Strategy Given: observed. You have to figure out expected; usually you do a Punnett square to figure this out. Then plug the observed and expected values into the formula and add up the terms.
Chi-Square Expected
Observed: 134 red eyes, 66 white eyes
Cross: XR Xr (female) × XR Y (male)
         XR         Xr
XR       XR XR      XR Xr
Y        XR Y       Xr Y (white)
3:1 ratio of red to white
134 + 66 = 200 flies, so expected = 150 red, 50 white
χ² = (134 − 150)²/150 + (66 − 50)²/50 = 1.70666 + 5.12 = 6.83
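The fly cross can be worked through in a few lines. This sketch assumes the 3:1 expected ratio from the Punnett square and uses 3.84, the standard table value for p = 0.05 with 1 degree of freedom; the function and constant names are illustrative.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [134, 66]                          # red, white
total = sum(observed)                         # 200 flies
expected = [total * 3 / 4, total * 1 / 4]     # 3:1 ratio -> 150 red, 50 white

chi2 = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1        # 2 phenotypes - 1 = 1

CRITICAL_P05_DF1 = 3.84   # chi square table, p = 0.05 column, 1 degree of freedom
reject_null = chi2 > CRITICAL_P05_DF1

print(round(chi2, 2), degrees_of_freedom, reject_null)
```

Because the calculated value is larger than the critical value, the null hypothesis (that the flies fit a 3:1 ratio) would be rejected at the 0.05 level.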
chi square problems 2013 AP Exam
AP Biology Math Review 2015 1) Take out an APPROVED calculator and formula sheet. 2) You will solve each problem and grid in the answer.
Tips
Grid LEFT to right
Use the formula sheet
Don’t round until the end
Look at HOW the answer should be given (“round to nearest…”)
In .123: the 1 is in the tenths place, the 2 is in the hundredths place, the 3 is in the thousandths place
Tips
Q2 Calculate…
Q2 Calculate…Answer
Rate
Q2: Rate Hydrogen peroxide is broken down to water and oxygen by the enzyme catalase. The following data were taken over 5 minutes. What is the rate of enzymatic reaction in mL/min from 2 to 4 minutes? Round to the nearest hundredth.
Time (min)   Amount of O2 produced (mL)
1            2.3
2            3.6
3            4.2
4            5.5
5            5.9
Q2 Answer: Rate = rise/run = (5.5 − 3.6)/(4 − 2) = 1.9/2 = 0.95 mL/min
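The rise-over-run calculation can be sketched as a small function over the catalase data. The dictionary layout and function name are choices made for this example only.

```python
# Time (min) -> amount of O2 produced (mL), from the data table
o2_ml = {1: 2.3, 2: 3.6, 3: 4.2, 4: 5.5, 5: 5.9}

def rate(data, t_start, t_end):
    """Average rate between two time points: change in amount / change in time."""
    rise = data[t_end] - data[t_start]
    run = t_end - t_start
    return rise / run

rate_2_to_4 = rate(o2_ml, 2, 4)
print(round(rate_2_to_4, 2))  # mL/min, rounded only at the end
```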
Q3 Calculate Rate…
Q3 Answer
Q4 Hardy-Weinberg
Q2 Answer
Q2: Surface Area and Volume What is the SA/V for this cell? Round your answer to the nearest hundredth.
Q2
SA = 4πr² = 4(3.14)(5²) = 314
Volume of a sphere = (4/3)πr³ = (4/3)(3.14)(5³) = 523.33
SA/V = 314/523.33 = 0.60
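A short sketch of the same calculation, assuming a spherical cell with radius 5 (the radius used in the worked answer; the units are not stated on the slide). The function name is illustrative.

```python
import math

def sphere_sa_to_volume(r):
    """Surface-area-to-volume ratio of a sphere of radius r."""
    surface_area = 4 * math.pi * r ** 2
    volume = (4 / 3) * math.pi * r ** 3
    return surface_area / volume

ratio = sphere_sa_to_volume(5)
print(round(ratio, 2))
```

For a sphere the ratio simplifies to 3/r, so it falls as the cell gets larger.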
Q3: Water Potential and Solute Potential
Solute potential = –iCRT
i = the number of particles the molecule will make in water; for NaCl this would be 2; for sucrose or glucose, this number is 1
C = molar concentration (from your experimental data)
R = pressure constant = 0.0831 liter·bar/(mole·K)
T = temperature in kelvins = 273 + °C of solution
Sample Problem: The molar concentration of a sugar solution in an open beaker has been determined to be 0.3 M. Calculate the solute potential at 27 degrees Celsius. Round your answer to the nearest tenth.
Q3 Solute potential = –iCRT
i = 1
C = 0.3
R = pressure constant = 0.0831
T = 27 + 273 = 300 K
Solute potential = –(1)(0.3)(0.0831)(300) = –7.479 ≈ –7.5 bars
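The –iCRT formula translates directly into code. This is a minimal sketch using the constants given in the problem; the function name and argument order are choices made for this example.

```python
R = 0.0831  # pressure constant, liter*bar / (mole*K)

def solute_potential(i, molarity, temp_celsius):
    """Solute potential in bars: -i * C * R * T, with T in kelvins."""
    temp_kelvin = temp_celsius + 273
    return -i * molarity * R * temp_kelvin

psi_s = solute_potential(i=1, molarity=0.3, temp_celsius=27)
print(round(psi_s, 1))  # bars
```

For a salt such as NaCl, passing `i=2` would double the magnitude of the result at the same concentration.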
Q4: Hardy Weinberg A census of birds nesting on a Galapagos Island revealed that 24 of them show a rare recessive condition that affected beak formation. The other 63 birds in this population show no beak defect. If this population is in HW equilibrium, what is the frequency of the dominant allele? Give your answer to the nearest hundredth
Hardy Weinberg Strategy Figure out what you are given Allele (p or q) or Genotypes (p2, 2pq, q2) Figure out what you are solving for Manipulate formulas to go from given to solving for Always dealing with decimals
Q4: Looking for p, the dominant allele
Homozygous recessive = q² = 24/87 = 0.2759
q = √0.2759 = 0.5252
p + q = 1
p = 1 − 0.5252 = 0.47
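The Hardy-Weinberg steps can be sketched as follows, assuming (as the question states) that the population is in equilibrium so the recessive phenotype frequency equals q². Variable names are illustrative.

```python
import math

recessive_birds = 24
normal_birds = 63
total_birds = recessive_birds + normal_birds   # 87

q_squared = recessive_birds / total_birds      # homozygous recessive frequency
q = math.sqrt(q_squared)                       # recessive allele frequency
p = 1 - q                                      # dominant allele frequency

# Genotype frequencies should add back up to 1
genotype_total = p ** 2 + 2 * p * q + q ** 2

print(round(q_squared, 4), round(q, 4), round(p, 2))
```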
Q5: Rate Hydrogen peroxide is broken down to water and oxygen by the enzyme catalase. The following data were taken over 5 minutes. What is the rate of enzymatic reaction in mL/min from 2 to 4 minutes? Round to the nearest hundredth.
Time (min)   Amount of O2 produced (mL)
1            2.3
2            3.6
3            4.2
4            5.5
5            5.9
Q5 Answer: Rate = rise/run = (5.5 − 3.6)/(4 − 2) = 1.9/2 = 0.95 mL/min
Q6: Laws of Probability Calculate the probability of tossing three coins simultaneously and obtaining three heads. Express in fraction form.
Q6 Probability of a heads is ½. Probability of a heads AND a heads AND a heads: ½ × ½ × ½ = 1/8
Q7: Population Growth N: total number in pop; r: rate of growth. There are 2,000 mice living in a field. If 1,000 mice are born each month and 200 mice die each month, what is the per capita growth rate of mice over a month? Round to the nearest tenth.
N = 2000
Population growth per month = births − deaths = 1000 − 200 = 800
Per capita growth rate r = 800/2000 = 0.4
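A minimal sketch of the mouse calculation, assuming the per capita growth rate is (births − deaths) / N over the month. The function name is introduced only for this example.

```python
def per_capita_growth_rate(births, deaths, population):
    """Net change in individuals per individual over the time period."""
    return (births - deaths) / population

r = per_capita_growth_rate(births=1000, deaths=200, population=2000)
print(r)
```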
Q8 The net annual primary productivity of a particular wetland ecosystem is found to be 8,000 kcal/m2 per year. If respiration by the aquatic producers is 12,000 kcal/m2 per year, what is the gross annual primary productivity for this ecosystem, in kcal/m2 per year? Round to the nearest whole number.
Q8 NPP=GPP-R 8,000 = GPP – 12,000 8,000+ 12,000= GPP 20,000=GPP
Q9: Q10 Data taken to determine the effect of temperature on the rate of respiration in a goldfish are given in the table below. Calculate Q10 for these data. Round to the nearest whole number.
Temperature (°C)   Respiration rate (per minute)
16                 16
21                 22
Q9
Q10 = (22/16)^(10/(21 − 16))
Q10 = (1.375)² = 1.89
Q10 ≈ 2
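The Q10 formula, (k2/k1) raised to the power 10/(T2 − T1), can be sketched as below using the rates and temperatures from the worked answer. The function and argument names are made up for this example.

```python
def q10(rate_low, rate_high, temp_low, temp_high):
    """Q10 temperature coefficient: (k2 / k1) ** (10 / (T2 - T1))."""
    return (rate_high / rate_low) ** (10 / (temp_high - temp_low))

goldfish_q10 = q10(rate_low=16, rate_high=22, temp_low=16, temp_high=21)
print(round(goldfish_q10, 2), round(goldfish_q10))
```

A Q10 near 2 means the respiration rate roughly doubles for every 10 °C rise in temperature.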
Q10:Standard Deviation Grasshoppers in Madagascar show variation in their back-leg length. Given the following data, determine the standard deviation for this data. Round the answer to the nearest hundredth. Length(cm): 2.0, 2.2, 2.2, 2.1, 2.0, 2.4 and 2.5
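Average = (2.0 + 2.2 + 2.2 + 2.1 + 2.0 + 2.4 + 2.5)/7 = 15.4/7 = 2.2
Deviations = −0.2, 0, 0, −0.1, −0.2, 0.2, 0.3
Deviations squared = 0.04, 0, 0, 0.01, 0.04, 0.04, 0.09
Sum of the deviations squared = 0.22
Variance = 0.22/(7 − 1) = 0.0367
Standard deviation = √0.0367 ≈ 0.19 cm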
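The grasshopper problem can be followed step by step in code. This sketch assumes the sample formula (dividing by n − 1) and cross-checks the result against the standard library's `stdev`.

```python
from statistics import mean, stdev

lengths = [2.0, 2.2, 2.2, 2.1, 2.0, 2.4, 2.5]   # back-leg lengths in cm

average = mean(lengths)
sum_sq_dev = sum((x - average) ** 2 for x in lengths)   # sum of squared deviations
variance = sum_sq_dev / (len(lengths) - 1)              # divide by n - 1 for a sample
s = variance ** 0.5                                     # standard deviation

print(round(average, 1), round(sum_sq_dev, 2), round(s, 2))
print(round(stdev(lengths), 2))   # library result for comparison
```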
Q11: Dilution Joe has a 2 g/L solution. He dilutes it and creates 3 L of a 1 g/L solution. How much of the original solution did he dilute? Round to the nearest tenth.
We are looking for V1: C1V1 = C2V2, so (2 g/L)V1 = (1 g/L)(3 L) and V1 = 1.5 L
Q12: log What is the hydrogen ion concentration of a solution of pH 8? Round to the nearest whole number
Q12 [H+] if pH = 8.0
[H+] = 10^(−pH)
[H+] = 10^(−8.0) = 1 ÷ 10⁸ = 0.00000001 M
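The pH relationship runs in both directions, which the sketch below illustrates. It assumes concentrations in moles per liter; the function name is illustrative.

```python
import math

def hydrogen_ion_concentration(ph):
    """[H+] in mol/L from pH, using [H+] = 10 ** (-pH)."""
    return 10 ** (-ph)

h = hydrogen_ion_concentration(8.0)
ph_back = -math.log10(h)   # pH = -log10([H+]) recovers the original value

print(h, ph_back)
```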
Q13:Gibbs Free Energy PICK THE BEST CHOICE: A chemical reaction is most likely to occur spontaneously if the a) Free energy is negative b) Entropy change is negative c) Activation energy is positive d) Heat of reaction is positive
Q13 Answer: A
Variance (s2) Mathematically expressing the degree of variation of scores (data) from the mean A large variance means that the individual scores (data) of the sample deviate a lot from the mean. A small variance indicates the scores (data) deviate little from the mean
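Calculating the variance for a whole population
σ² = Σ (X − µ)² / N
Σ = sum of; X = score, value; µ = mean; N = total number of scores or values
OR use the VAR.P function in Excel (VARP in older versions); Excel's VAR function divides by n − 1 and so gives the sample variance instead.
http://www.mnstate.edu/wasson/ed602calcvardevs.htm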
Calculating the variance for a Biased SAMPLE population
s² = Σ (xi − x̄)² / (n − 1)
Σ = sum of; xi = score, value; n − 1 = total number of scores or values minus 1; x̄ (often read as “x bar”) is the mean (average value of xi).
Note that dividing by n − 1 instead of N makes the sample variance larger…why?
http://www.mnstate.edu/wasson/ed602calcvardevs.htm
Heights in Centimeters of Five Randomly Selected Pea Plants Grown at 8-10 °C
Plant   Height (cm)   Deviation from mean   Square of deviation from mean
        (xi)          (xi − x̄)              (xi − x̄)²
A       10             2                    4
B       7             −1                    1
C       6             −2                    4
D       8              0                    0
E       9              1                    1
        Σ xi = 40     Σ (xi − x̄) = 0        Σ (xi − x̄)² = 10
xi = score or value; x̄ (x bar) = mean; Σ = sum of
Finish Calculating the Variance
Σ xi = 40; Σ (xi − x̄) = 0; Σ (xi − x̄)² = 10
There were five plants; n = 5; therefore n − 1 = 4
So s² = 10/4 = 2.5
Variance helps to characterize the data concerning a sample by indicating the degree to which individual members within the sample vary from the mean
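The pea plant variance can be confirmed with Python's standard library, which also makes the sample versus population distinction visible. This is only a sketch: `variance` divides by n − 1 and `pvariance` divides by N.

```python
from statistics import pvariance, stdev, variance

heights = [10, 7, 6, 8, 9]   # pea plants A-E, in cm

sample_variance = variance(heights)        # sum of squares / (n - 1) = 10 / 4
population_variance = pvariance(heights)   # sum of squares / N = 10 / 5
s = stdev(heights)                         # square root of the sample variance

print(sample_variance, population_variance, round(s, 1))
```

The sample variance is the larger of the two because the same sum of squares is divided by a smaller number; its square root is the s ≈ 1.6 cm used in the standard deviation example.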
Q2 Answer