2
Statistics: Data Analysis and Presentation Fr Clinic II
3
Overview
• Tables and Graphs
• Populations and Samples
• Mean, Median, and Standard Deviation
• Standard Error & 95% Confidence Interval (CI)
• Error Bars
• Comparing Means of Two Data Sets
• Linear Regression (LR)
4
Warning
• Statistics is a huge field; I’ve simplified considerably here. For example:
  – Mean, Median, and Standard Deviation
    • There are alternative formulas
  – Standard Error and the 95% Confidence Interval
    • There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means rather than a single mean…)
  – Error Bars
    • Don’t go beyond the interpretations I give here!
  – Comparing Means of Two Data Sets
    • We cover only the t test for two means when the variances are unknown but equal; there are other tests
  – Linear Regression
    • We look only at simple LR and calculate only the intercept, slope, and R². There is much more to LR!
5
Tables
Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters
• Consistent Format, Title, Units, Big Fonts
• Differentiate Headings, Number Columns
6
Figures
Figure 1: Turbidity of Pond Water, Treated and Untreated
• Consistent Format, Title, Units
• Good Axis Titles, Big Fonts
7
Populations and Samples
• Population
  – All of the possible outcomes of an experiment or observation
    • US population
    • Particular type of steel beam
• Sample
  – A finite number of outcomes measured or observations made
    • 1000 US citizens
    • 5 beams
• We use samples to estimate population properties
  – Mean, Variability (e.g., standard deviation), Distribution
    • Height of 1000 US citizens used to estimate mean height of US population
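The idea of estimating a population mean from a sample can be sketched in a few lines of Python. The population below is entirely made up (100,000 simulated heights with an assumed mean of 170 cm and standard deviation of 10 cm); it is an illustration only, not data from this presentation.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 simulated heights in cm (made-up parameters).
population = [random.gauss(170.0, 10.0) for _ in range(100_000)]

# A sample is a finite subset of the population, here 1000 "citizens".
sample = random.sample(population, 1000)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean = {population_mean:.2f} cm")
print(f"sample mean     = {sample_mean:.2f} cm")
```

In practice the population mean is unknown; the sample mean is the estimate we actually have.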
8
Mean and Median
• Turbidity of Treated Water (NTU): 1, 3, 3, 6, 8, 10
• Mean = sum of values divided by number of samples = (1+3+3+6+8+10)/6 = 5.2 NTU
• Median = the middle number
    Rank:   1  2  3  4  5  6
    Number: 1  3  3  6  8  10
  – For an even number of sample points, average the middle two: (3+6)/2 = 4.5 NTU
• Excel: Mean – AVERAGE; Median – MEDIAN
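For readers working outside Excel, the same two numbers can be reproduced with Python's standard `statistics` module. This is a minimal sketch using the six turbidity values from the slide; the variable names are my own.

```python
import statistics

# Turbidity of treated water (NTU), taken from the slide.
turbidity_ntu = [1, 3, 3, 6, 8, 10]

mean_ntu = statistics.mean(turbidity_ntu)      # sum of values / number of samples
median_ntu = statistics.median(turbidity_ntu)  # averages the middle two for even n

print(f"mean   = {mean_ntu:.1f} NTU")
print(f"median = {median_ntu:.1f} NTU")
```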
9
Variance
• Measure of variability
  – Sum of the squares of the deviations about the mean, divided by the degrees of freedom (n − 1):
    $$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$
  – n = number of data points
• Excel: variance – VAR
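As a worked example, the sample variance of the turbidity data from the Mean and Median slide can be computed by hand and checked against Python's `statistics.variance`, which, like Excel's VAR, divides by n − 1. The variable names here are illustrative.

```python
import statistics

turbidity_ntu = [1.0, 3.0, 3.0, 6.0, 8.0, 10.0]  # NTU, from the earlier slide

n = len(turbidity_ntu)
mean_ntu = sum(turbidity_ntu) / n

# Sum of squared deviations about the mean, divided by degrees of freedom (n - 1).
sum_sq_dev = sum((x - mean_ntu) ** 2 for x in turbidity_ntu)
variance_manual = sum_sq_dev / (n - 1)

# The library function uses the same n - 1 definition.
variance_lib = statistics.variance(turbidity_ntu)

print(f"variance = {variance_manual:.2f} NTU^2")
```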
10
Standard Deviation, s
• Square root of the variance
• For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean
• Area under curve is probability of getting value within specified range
[Figure: bell curve, x-axis "Standard Deviations from Mean", with the central 95% of the area between −1.96 and 1.96]
• Excel: standard deviation – STDEV
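Both statements on this slide can be checked numerically. The sketch below computes the sample standard deviation of the earlier turbidity data and then uses `statistics.NormalDist` (Python 3.8+) to confirm that about 95% of a normal distribution lies within 1.96 standard deviations of the mean.

```python
import statistics

turbidity_ntu = [1.0, 3.0, 3.0, 6.0, 8.0, 10.0]  # NTU, from the earlier slide

# Sample standard deviation: square root of the (n - 1) variance, like Excel's STDEV.
s = statistics.stdev(turbidity_ntu)

# Area under the standard normal curve between -1.96 and +1.96.
standard_normal = statistics.NormalDist(mu=0.0, sigma=1.0)
coverage = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(f"s = {s:.2f} NTU")
print(f"fraction within 1.96 standard deviations = {coverage:.4f}")
```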
11
Standard Error of Mean
• Standard error of the mean
  – Of a sample of size n
  – Taken from a population with standard deviation s
    $$ SE = \frac{s}{\sqrt{n}} $$
  – The estimate of the mean depends on the sample selected
  – As n increases, the variance of the mean estimate goes down, i.e., the estimate of the population mean improves
  – As n increases, the distribution of the mean estimate approaches normal, regardless of the population distribution
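A short sketch of the standard error calculation, assuming the usual formula SE = s/√n and reusing the turbidity data from the Mean and Median slide. The second part holds s fixed and varies n (hypothetical sample sizes) to show how the standard error shrinks as the sample grows.

```python
import math
import statistics

turbidity_ntu = [1.0, 3.0, 3.0, 6.0, 8.0, 10.0]  # NTU, from the earlier slide

n = len(turbidity_ntu)
s = statistics.stdev(turbidity_ntu)
standard_error = s / math.sqrt(n)
print(f"SE for n = {n}: {standard_error:.2f} NTU")

# Hypothetical larger samples with the same s: quadrupling n halves the SE.
se_by_n = {size: s / math.sqrt(size) for size in (6, 24, 96)}
for size, se in se_by_n.items():
    print(f"n = {size:3d} -> SE = {se:.3f} NTU")
```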
12
95% Confidence Interval (CI) for Mean
• Interval within which we are 95% confident the true mean lies
    $$ \bar{x} \pm t_{95\%,\,n-1} \, \frac{s}{\sqrt{n}} $$
• t_{95%,n−1} is the t statistic for the 95% CI if sample size = n
  – If n ≥ 30, let t_{95%,n−1} = 1.96 (Normal Distribution)
  – Otherwise, use Excel formula: TINV(0.05,n-1)
• n = number of data points
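As a worked example, here is the 95% CI for the six turbidity values from the Mean and Median slide. Python's standard library has no t distribution, so the critical value 2.571 is hard-coded; it is the two-tailed value for 5 degrees of freedom from a standard t table, which is what TINV(0.05,5) returns. Treat this as a sketch under that assumption.

```python
import math
import statistics

turbidity_ntu = [1.0, 3.0, 3.0, 6.0, 8.0, 10.0]  # NTU, from the earlier slide

n = len(turbidity_ntu)
mean_ntu = statistics.mean(turbidity_ntu)
standard_error = statistics.stdev(turbidity_ntu) / math.sqrt(n)

# Assumption: two-tailed 95% t value for n - 1 = 5 degrees of freedom,
# looked up from a t table (equivalent to Excel's TINV(0.05, 5)).
t_crit = 2.571

half_width = t_crit * standard_error
ci_lower = mean_ntu - half_width
ci_upper = mean_ntu + half_width

print(f"95% CI: {mean_ntu:.2f} ± {half_width:.2f} NTU "
      f"({ci_lower:.2f} to {ci_upper:.2f})")
```

With only six points the interval is wide, which is the practical cost of a small sample.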
13
Error Bars
• Show data variability on plot of mean values
• Types of error bars include:
  – ± Standard Deviation, ± Standard Error, ± 95% CI
  – Maximum and minimum value
14
Using Error Bars to compare data
• Standard Deviation
  – Demonstrates data variability, but no comparison possible
• Standard Error
  – If bars overlap, any difference in means is not statistically significant
  – If bars do not overlap, indicates nothing!
• 95% Confidence Interval
  – If bars overlap, indicates nothing!
  – If bars do not overlap, difference is statistically significant
• We’ll use 95% CI
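The 95% CI rule above can be written as a small check. The function names and the two filters' numbers below are hypothetical, chosen only to illustrate the rule; the interpretation strings repeat the slide's wording and go no further.

```python
def bars_overlap(mean_a, half_width_a, mean_b, half_width_b):
    """Return True if the bars mean_a ± half_width_a and mean_b ± half_width_b overlap."""
    low_a, high_a = mean_a - half_width_a, mean_a + half_width_a
    low_b, high_b = mean_b - half_width_b, mean_b + half_width_b
    return low_a <= high_b and low_b <= high_a


def interpret_95ci_bars(mean_a, half_width_a, mean_b, half_width_b):
    """Apply the slide's rule for 95% CI error bars."""
    if bars_overlap(mean_a, half_width_a, mean_b, half_width_b):
        return "bars overlap: indicates nothing"
    return "bars do not overlap: difference is statistically significant"


# Hypothetical 95% CIs for two filters (mean turbidity ± half width, NTU).
print(interpret_95ci_bars(2.0, 0.23, 3.0, 0.28))
print(interpret_95ci_bars(2.0, 0.60, 3.0, 0.70))
```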
15
Example 1 Create Bar Chart of Name vs Mean. Right click on data. Select “Format Data Series”.
16
Example 2
17
What can we do?
• Plot mean water quality data for various filters with error bars
• Plot mean water quality over time with error bars
18
Comparing Filter Performance
• Use the t test to determine whether the means of two populations are different
  – Based on two data sets
    • E.g., turbidity produced by two different filters
19
Comparing Two Data Sets using the t test
• Example: You pump 20 gallons of water through each of Filter 1 and Filter 2. After every gallon, you measure the turbidity.
  – Filter 1: Mean = 2 NTU, s = 0.5 NTU, n = 20
  – Filter 2: Mean = 3 NTU, s = 0.6 NTU, n = 20
• You ask the question: Do the filters make water with a different mean turbidity?
20
Do the Filters make different water?
• Use TTEST (Excel)
• TTEST returns a fractional probability (the p-value): loosely, the probability of being wrong if you answer yes
  – We want this probability to be small: 0.01 to 0.10 (1 to 10%). Use 0.01
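Excel's TTEST needs the raw measurements, but the Filter 1 and Filter 2 example gives only summary statistics. The sketch below computes the equal-variance (pooled) two-sample t statistic from those summaries and approximates the two-tailed p-value by numerically integrating the t density with Simpson's rule. The function names are my own, and the numerical integration is a stand-in for a statistics library, not what Excel does internally.

```python
import math


def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic assuming unknown but equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p-value: 1 - 2 * P(0 < T < |t|), via Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


# Summary statistics from the Filter 1 / Filter 2 example.
t_stat, df = pooled_t_statistic(2.0, 0.5, 20, 3.0, 0.6, 20)
p_value = two_tailed_p(t_stat, df)

print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.2e}")
print("different mean turbidity" if p_value < 0.01 else "no significant difference")
```

For these numbers the p-value falls far below the 0.01 threshold, so the answer to the slide's question is yes.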
21
“t test” Questions
• Do two filters make different water?
  – Take multiple measurements of a particular water quality parameter for 2 filters
• Do two filters treat different amounts of water between cleanings?
  – Measure the amount of water filtered between cleanings for two filters
• Does the amount of water a filter treats between cleanings differ after a certain amount of water is treated?
  – For a single filter, measure the amount of water treated between cleanings before and after a certain total amount of water is treated
22
Linear Regression
• Fit the best straight line to a data set
• Right-click on a data point and use the “trendline” option. Use the “options” tab to get the equation and R².
23
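R² – Coefficient of Multiple Determination
    $$ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} $$
• ŷ_i = predicted y values, from the regression equation
• y_i = observed y values
• ȳ = mean of the observed y values
• R² = fraction of variance explained by the regression (variance = standard deviation squared)
  – = 1 if the data lie along a straight line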
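What the Excel trendline reports (slope, intercept, and R²) can be sketched with the standard least-squares formulas for simple linear regression. The function name `fit_line` and the data points are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2


# Hypothetical data, for illustration only.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r2 = fit_line(x_values, y_values)
print(f"y = {intercept:.3f} + {slope:.3f} x, R^2 = {r2:.4f}")
```

Data lying exactly on a straight line give R² = 1, matching the last bullet on the slide.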