AP Biology Resources: Statistical Analysis and Graphing
Introduction to Data Analysis
Data is shorthand for information! Data can be precise, accurate, or both.
Precision describes the reproducibility of a result. For example, if you measure a quantity several times and the values agree closely with one another, your measurement is precise.
Accuracy describes how close a measured value is to the true or known value. The closer a measured value is to the true value, the more accurate it is.
Data Measurement & Sampling
Guides the final experimental analysis.
Influenced by: sampling, use of controls, experimental error, measuring precision.
Instruments and methods used to collect data must be validated for accuracy.
Sampling is the main technique employed for data selection.
Statistics
“The mathematical study of the likelihood and probability of events occurring based on known information and inferred by taking a limited number of samples.”
Descriptive statistics describe the population or sample from which the data were derived.
Examples: range, min/max, average(s), median, mode, variance, standard deviation, histograms and normal distributions.
Averages: Measures of Central Tendency
Commonly called “averages,” measures of central tendency are important in statistics because they summarize an entire data set with a single number. There are many types of averages, but the best known are the mean, median, and mode of a set.
For the following set
{73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100}
the three named averages are:
Mean = 68.6
Median = 68
Mode = 74
Mean: The Center of Mass
The arithmetic mean is computed by dividing the sum of a set of terms by the number of terms. It is sometimes called “the average,” but “the mean” is more precise.
For the following set
{73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100}
1. Sum all terms = 1372.
2. Divide by the number of terms (20 in this case).
Mean = 1372 ÷ 20 = 68.6
Excel Function: AVERAGE()
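The two steps on this slide can be sketched in a few lines of Python. This is only an illustration; the variable names are not from the original slides.

```python
# Test scores from the slide.
scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

total = sum(scores)      # step 1: sum all terms
count = len(scores)      # the number of terms
mean = total / count     # step 2: divide the sum by the number of terms

print(total, count, mean)
```

The result plays the same role as Excel's AVERAGE() applied to the same twenty values.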
Median: The Middle Number(s)
The median is the "middle" value when the list of numbers is placed in order. For an even number of terms, the median is usually the mean of the two middle terms.
For the following set
{73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100}
Reordered in sequence…
{45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100}
Median = (67 + 69) ÷ 2 = 68
In a set with an even number of terms, it is occasionally appropriate to simply choose one of the middle values; whether and how to choose depends on context.
Excel Function: MEDIAN()
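A minimal Python sketch of the same procedure: sort, then take the middle value or average the two middle values. The standard-library `statistics` functions `median_low` and `median_high` are shown as one possible way to "choose one of the middle values"; the slides do not prescribe any particular tool.

```python
import statistics

scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

ordered = sorted(scores)
n = len(ordered)

if n % 2 == 0:
    # Even count: average the two middle terms.
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
else:
    # Odd count: take the single middle term.
    median = ordered[n // 2]

print(median)                            # manual calculation
print(statistics.median(scores))         # library equivalent
print(statistics.median_low(scores),     # lower of the two middle values
      statistics.median_high(scores))    # higher of the two middle values
```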
Mode: The Most Frequent
The mode is the value that occurs most frequently in a data set. Sets can be unimodal, bimodal, trimodal, or multimodal. A dot plot is useful for quickly identifying the mode(s) of a set.
For the following set
{73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100}
Mode = 74
Excel Function: MODE() – returns the smallest mode if there are multiples
Excel Function (2010): MODE.MULT() – this must be entered as an array function to work properly
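As a rough sketch, the mode(s) can be found by counting occurrences and keeping every value that reaches the highest count. The helper name `find_modes` and the small bimodal list are hypothetical, added only for illustration.

```python
from collections import Counter
import statistics

scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

def find_modes(values):
    """Return every value tied for the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

modes = find_modes(scores)
print(modes)                           # unimodal set from the slide
print(statistics.multimode(scores))    # standard-library equivalent (Python 3.8+)

# Hypothetical bimodal set, not from the slides.
print(find_modes([1, 1, 2, 2, 3]))
```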
Comparing Mean, Median, and Mode
The median may or may not be close to the mean. The data may or may not be symmetrical around the mean value. The mode, although it is the most frequent value, may not be close to either the mean or the median.
Mean = 68.6, Median = 68, Mode = 74
The Mean and Balance
We can think of the mean as the place where a set of identical weights, placed at different locations on a number line, would balance.
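One way to check the balance idea numerically: the total distance of the values below the mean should equal the total distance of the values above it. The sketch below uses the test-score set from the earlier slides; the names `left_pull` and `right_pull` are illustrative.

```python
scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

mean = sum(scores) / len(scores)

# Total distance to the mean on each side of the balance point.
left_pull = sum(mean - x for x in scores if x < mean)
right_pull = sum(x - mean for x in scores if x > mean)

print(round(left_pull, 6), round(right_pull, 6))
```

Both totals come out equal (up to floating-point rounding), which is what "balancing" means here.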
However … Values in a Sample May Not Be Equally Important
In a set of test scores, the score on the final exam may count more (carry more weight) than the scores on quizzes and chapter tests.
In calculating the quality of water in an area, nine different parameters may be considered, but two of the parameters may be more critical for human safety.
Weighted Mean: Regain Your Balance
A weighted mean uses the “heaviness” or “importance” of each element to find a new balance point…
Weighted Mean = Σ(weight × value) ÷ Σ(weights)
Weighted Mean: Mean, Corrected for Weight
The weighted mean is similar to the mean of a frequency distribution, where “weight” and “frequency” are interchangeable.
For the following set
{73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100}
there must be a corresponding set of weights:
{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4}
Follow these steps to find the weighted mean.
1. Find the product of each value with its weight.
2. Sum the products.
3. Divide by the sum of weights.
Ex. 73(1) + 66(1) + 69(1)…
Weighted Mean ≅ 72.7
Excel Function: SUMPRODUCT() ÷ SUM()
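The three steps can be sketched as follows, mirroring the SUMPRODUCT() ÷ SUM() approach. Variable names are illustrative only.

```python
scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]
weights = [1] * 19 + [4]     # every score counts once; the last (100) counts four times

# Step 1: product of each value with its weight.
products = [x * w for x, w in zip(scores, weights)]

# Step 2: sum the products.  Step 3: divide by the sum of the weights.
weighted_mean = sum(products) / sum(weights)

print(sum(products), sum(weights), round(weighted_mean, 1))
```

Here the products sum to 1672 and the weights sum to 23, so the heavily weighted score of 100 pulls the balance point up from 68.6 to about 72.7.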
Water Quality Index (WQI)
WQI is an example of a weighted average (i.e., weighted mean). Since the weights sum to one, the final division by the sum of the weights can be skipped, which makes it somewhat easier to calculate.
Worksheet columns: Test Results (Column A), Q-Value (Column B), Weighting Factor (Column C), TOTAL (Column D)
Worksheet rows (parameter, units):
1. DO (% sat)
2. Fecal Coliform (colonies/100 ml)
3. pH (units)
4. BOD (mg/L)
5. Temperature (Δ°C)
6. Total Phosphate (mg/L)
7. Nitrates (mg/L)
8. Turbidity (NTU or Ft)
9. Total Solids (mg/L)
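A hedged sketch of how such a worksheet might be totaled. Two assumptions are made here because the slide's table values did not survive: the weighting factors are those commonly published for the NSF Water Quality Index, and the Q-values are invented purely for illustration. Neither comes from the original slides.

```python
# (parameter, Q-value, weighting factor)
# ASSUMPTION: weights as commonly published for the NSF WQI; not taken from the slide.
# HYPOTHETICAL: Q-values are made up for this example only.
parameters = [
    ("DO",              80, 0.17),
    ("Fecal Coliform",  60, 0.16),
    ("pH",              90, 0.11),
    ("BOD",             70, 0.11),
    ("Temperature",     85, 0.10),
    ("Total Phosphate", 75, 0.10),
    ("Nitrates",        65, 0.10),
    ("Turbidity",       70, 0.08),
    ("Total Solids",    80, 0.07),
]

weight_sum = sum(w for _, _, w in parameters)

# Column D is Q-value x weighting factor; the WQI is the sum of column D.
# No division is needed because the weights sum to one.
wqi = sum(q * w for _, q, w in parameters)

print(round(weight_sum, 2), round(wqi, 1))
```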
Special Note
The term profile graph is occasionally used to refer to a line graph that represents the change in a variable like WQI over a geographical area (e.g., by milepost).
Deviant Data: Measures of Dispersion
Measures of dispersion or “spread” describe how much the data vary, either overall or relative to the mean, median, or mode.
Measures include: range, variance, standard deviation, and more!
Range: A Measure of Dispersion
The range is the difference between the minimum and maximum values in a data set. A large range usually (but not always) indicates wide dispersion of the values.
{73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100}
Range = 100 – 45 = 55
Excel Functions: MAX(), MIN()
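A short Python check of the range, computed directly from the maximum and minimum of the set (the counterpart of MAX() minus MIN() in Excel). The variable name `value_range` is illustrative.

```python
scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

# Range = maximum value minus minimum value.
value_range = max(scores) - min(scores)

print(max(scores), min(scores), value_range)
```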
Variance (s²): Another Measure of Dispersion
The variance of a set describes how far the numbers lie from the mean (or expected value).
To calculate the variance:
1. Determine the mean of the set.
2. Find each value’s deviation from the mean.
3. Square each of the deviations.
4. Sum the squared deviations and divide by N-1.
N is the number of values in the set. N-1 is the correction factor for the variance of a sample. Population variance requires division by N.
Data set: {30, 23, 22}
Value | Deviation | Squared deviation
30 | 5 | 25
23 | -2 | 4
22 | -3 | 9
Totals: 75 | 0 | 38
Mean: 75 ÷ 3 = 25
Variance: 38 ÷ 2 = 19
Excel Functions: VARP() (population), VAR() (sample)
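The four steps, applied to the small data set {30, 23, 22}, might look like this in Python. The sketch also shows the population version (division by N) for comparison; variable names are illustrative.

```python
data = [30, 23, 22]
n = len(data)

mean = sum(data) / n                         # step 1: mean of the set
deviations = [x - mean for x in data]        # step 2: deviation of each value
squared = [d ** 2 for d in deviations]       # step 3: square each deviation

sample_variance = sum(squared) / (n - 1)     # step 4: divide by N-1 (sample)
population_variance = sum(squared) / n       # divide by N (population)

print(mean, deviations, squared)
print(sample_variance, round(population_variance, 2))
```

Dividing by N-1 rather than N gives the larger value, which is the sample correction the slide describes.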
Variance (s²): Another Measure of Dispersion
The formula for the variance of a sample is
s² = Σ(x – x̄)² ÷ (N – 1)
where x̄ is the mean of the set. You may recall that the mean of the set below is 68.6, as calculated previously.
{73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100}
Variance s² = [(73 – 68.6)² + (66 – 68.6)² + (69 – 68.6)² + (67 – 68.6)² + …] ÷ (20 – 1) = 3620.8 ÷ 19 ≈ 190.6
Standard Deviation (s): You Guessed It … One More Measure of Dispersion
Standard deviation is the square root of the variance. It can be thought of as a typical distance of the values from the mean of a data set.
{73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100}
Standard Deviation = √190.6 ≈ 13.8
Excel Functions: STDEVP() (population), STDEV() (sample)
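For the full test-score set, Python's standard-library `statistics` module offers rough counterparts to the Excel functions named on these slides: `variance`/`stdev` use N-1 like VAR()/STDEV(), and `pvariance`/`pstdev` use N like VARP()/STDEVP(). The sketch below is just one way to reproduce the numbers.

```python
import statistics

scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

sample_var = statistics.variance(scores)   # divides by N-1
sample_sd = statistics.stdev(scores)       # square root of the sample variance
pop_var = statistics.pvariance(scores)     # divides by N
pop_sd = statistics.pstdev(scores)         # square root of the population variance

print(round(sample_var, 2), round(sample_sd, 2))
print(round(pop_var, 2), round(pop_sd, 2))
```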
Standard Deviation: You Guessed It … One More Measure of Dispersion
Find the standard deviation of this sample set: {1, 2, 3, 4, 5}
Step 1: Calculate the mean (x̄) of the set: 15 ÷ 5 = 3.
Step 2: Find each value’s deviation from the mean: -2, -1, 0, 1, 2.
Step 3: Square each of the deviations: 4, 1, 0, 1, 4.
Step 4: Sum the squared deviations and divide by N-1: 10 ÷ 4 = 2.5.
Step 5: Take the square root: √2.5 ≈ 1.58.
s = √[ Σ(x – x̄)² ÷ (N – 1) ]
Where:
Σ is the sum
x is an element of the sample
x̄ is the mean of the set
N is the sample size (number of values)
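The five steps can be wrapped in a small function. The name `sample_sd` is hypothetical; the standard-library `statistics.stdev` is used only as a cross-check.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation following the five steps on the slide."""
    n = len(values)
    mean = sum(values) / n                       # step 1: mean
    deviations = [x - mean for x in values]      # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]       # step 3: squared deviations
    variance = sum(squared) / (n - 1)            # step 4: sum and divide by N-1
    return math.sqrt(variance)                   # step 5: square root

values = [1, 2, 3, 4, 5]
sd = sample_sd(values)

print(round(sd, 4))
print(round(statistics.stdev(values), 4))        # library cross-check
```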
Standard Deviation Tells a Different Story than the Mean
[Figure: two sets of numbers, each with Mean = 15.5]
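The numbers from this slide's figure are not available, so the sketch below uses two hypothetical sets (invented for illustration) that share a mean of 15.5 but differ widely in spread.

```python
import statistics

# HYPOTHETICAL data sets, not from the slide: both have a mean of 15.5.
tight = [15, 15, 16, 16]
spread = [1, 10, 21, 30]

mean_tight = statistics.mean(tight)
mean_spread = statistics.mean(spread)
sd_tight = statistics.stdev(tight)
sd_spread = statistics.stdev(spread)

print(mean_tight, round(sd_tight, 2))
print(mean_spread, round(sd_spread, 2))
```

The means are identical, yet the standard deviations differ by more than a factor of twenty, which is the "different story" the slide title refers to.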
Frequency Table
A frequency table is a table that lists items in a set and records the number of occurrences. Choose categories and group the data appropriately.
{73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100}
Category Labels | Frequency
…
>90 | 1
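A sketch of building a frequency table in Python. Only the ">90" category is legible in the slide's table, so the ten-point categories below are an assumption chosen to be consistent with it; the helper `bin_label` is hypothetical.

```python
from collections import Counter

scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

def bin_label(score):
    """Assign a score to an assumed ten-point category such as '61-70'."""
    if score > 90:
        return ">90"
    lower = ((score - 1) // 10) * 10 + 1
    return f"{lower}-{lower + 9}"

freq = Counter(bin_label(s) for s in scores)

labels = ["41-50", "51-60", "61-70", "71-80", "81-90", ">90"]
print("Category Labels | Frequency")
for label in labels:
    print(f"{label} | {freq[label]}")
```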
Histogram
A histogram is a graphical representation, similar to a bar chart in structure, that organizes a group of data points into user-specified ranges. The histogram condenses a data series into an easily interpreted visual by taking many data points and grouping them into logical ranges or bins.
Histogram
A histogram is simply a bar* chart of a frequency table.
*In Excel, this chart type is called a “column” chart.
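To illustrate the idea without a charting tool, the sketch below counts the test scores in bins and prints one text bar per bin. The bin edges are an assumption (ten-point ranges), not taken from the slide.

```python
scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

# ASSUMED bin edges: each bin covers lower < score <= upper.
edges = [40, 50, 60, 70, 80, 90, 100]

rows = []
for lower, upper in zip(edges, edges[1:]):
    count = sum(1 for s in scores if lower < s <= upper)
    rows.append((f"{lower + 1}-{upper}", count))

# One bar per bin; bar length equals the frequency.
for label, count in rows:
    print(f"{label:>6} | {'#' * count}")
```

Each row of the frequency table becomes one bar, which is all a histogram adds to the table.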
Histogram Analysis
Histogram Data Set Analysis – Test Scores
{73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100}
Mean = 68.6, Median = 68, Mode = 74
[Figure: histogram of the test scores marking the mean (68.6), median (68), mode (74), and the -1 SD and +1 SD positions]
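As a rough companion to the -1 SD and +1 SD markers, this sketch computes where those positions fall for the test scores and counts how many scores lie between them. It assumes the sample standard deviation (N-1) is the one intended.

```python
import statistics

scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)     # ASSUMPTION: sample standard deviation

lower = mean - sd                 # position of -1 SD
upper = mean + sd                 # position of +1 SD
within = [s for s in scores if lower <= s <= upper]

print(round(lower, 1), round(upper, 1))
print(len(within), len(within) / len(scores))
```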
Distributions
Descriptive statistics can be easier to interpret when graphically illustrated. Charting each data element can lead to very busy and confusing charts that do not help interpret the data. Dot plots and histograms provide a graphical illustration of how the data is distributed throughout its range.
Normal Distributions
The normal distribution is considered the most prominent probability distribution in statistics.
Properties: bell-curve shape; symmetrical; mean = median = mode.
Neo/SCI®
PO Box 3000, Nashua, NH, USA
Credits
Content: Kenneth G. Rainis, Eddie Marroon
Powered by: Pixel KNOWLEDGE
© Delta Education, LLC. A member of the School Specialty family. All rights reserved.