Descriptive Statistics II: By the end of this class you should be able to: describe the meaning of and calculate the mean and standard deviation of a sample.


1 Descriptive Statistics II
By the end of this class you should be able to:
- describe the meaning of and calculate the mean and standard deviation of a sample
- estimate normal proportions based on mean and standard deviation
- plot histograms with alternative scaling
Palm: Sections 7.1, 7.2
Please download cordbreak1.mat & FWtemperature.txt

2 Exercise
Download FWTemperature.txt
Read into MATLAB
Prepare a single figure with two plots
- a histogram of March highs (row 2)
- a histogram of April highs (row 4)
Label these plots fully
Print out your commands and the resulting figure

3 Review: Quantifying Variation
Mean: central tendency
>> mean(x)
Standard deviation: spread
>> std(x)
Steps in the standard deviation calculation:
- difference: deviation of each point about the mean
- squared: all values positive
- summation: yields one number
- divide by n-1: normalize the sum based on degrees of freedom
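The steps above can be traced by hand in MATLAB. This is a minimal sketch using a small made-up vector (not data from the class); the final square root is not listed on the slide but is part of the standard definition of the sample standard deviation, so the result can be checked against std.

```matlab
% Hypothetical sample values, chosen only for illustration
x = [2 4 4 4 5 5 7 9];
n = length(x);

d  = x - mean(x);   % difference: deviation of each point about the mean
d2 = d.^2;          % squared: all values become positive
ss = sum(d2);       % summation: yields one number
v  = ss/(n - 1);    % divide by n-1: the sample variance
s  = sqrt(v);       % square root returns to the units of the data

% The hand calculation should agree with the built-in function
fprintf('manual s = %.4f, std(x) = %.4f\n', s, std(x));
```

Dividing by n instead of n-1 in the fourth step would give a slightly smaller value, which is the population form of the formula.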

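4 Formula, MATLAB, and Excel
Mean
- formula: x̄ = (sum of x_i) / n
- MATLAB: >> mean( variable )
- Excel: = average( range )
Sample Standard Deviation
- formula: s = sqrt( sum of (x_i - x̄)^2 / (n - 1) )
- MATLAB: >> std( variable )
- Excel: = stdev( range )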

5 Calculate Mean & Standard Deviation for the Cord Sample
>> mean(data2)
ans = 266.9500
>> std(data2)
ans = 52.8212
The Normal (Gaussian) Distribution: the bell curve (see next slide)
- A probability density function
- The area under any segment of the curve = the probability a point will fall in that region
- Standard Normal is centered about 0 (i.e., mean = 0) and marked off in number of standard deviations from the mean
- Standard deviation is the distance from the mean to the inflection point of the curve
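The statement that area under a segment of the curve equals probability can be checked numerically. The sketch below is an illustration only: it evaluates the standard normal density on a fine grid (the grid limits and spacing are arbitrary choices) and integrates it with trapz.

```matlab
% Standard normal density: mean 0, standard deviation 1
z = linspace(-6, 6, 12001);       % grid in units of standard deviations
f = exp(-z.^2/2) / sqrt(2*pi);    % probability density at each z

total_area = trapz(z, f);         % area under the whole curve, close to 1

inside = (z >= -1 & z <= 1);      % segment within one standard deviation
p_1sd  = trapz(z(inside), f(inside));   % area of that segment, close to 0.68

fprintf('total area = %.4f, area within +/-1 sd = %.4f\n', total_area, p_1sd);
```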

6 The Normal (Gaussian) Distribution
Figure labels: (population) standard deviation, mean, mode

7 Note on Sample and Population Statistics
Sample: the estimate from a sample of the whole population
Population: the true value from the entire population
Standard deviation: s (sample), σ (population)
Mean: x̄ or m (sample), μ (population)
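In MATLAB the distinction shows up as the normalization used by std. A short sketch with made-up values: std(x) divides by n-1 (the sample estimate), while std(x, 1) divides by n, which is appropriate only when x really is the entire population.

```matlab
x = [2 4 4 4 5 5 7 9];    % hypothetical data for illustration

s_sample = std(x);        % divides by n-1: sample standard deviation, s
s_pop    = std(x, 1);     % divides by n: population standard deviation, sigma

fprintf('s = %.4f, sigma = %.4f\n', s_sample, s_pop);
```

The two values converge as the number of points grows, since n-1 and n differ less and less.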


9 Calculating Proportions from Cord Data
± one standard deviation
>> m = mean(cord); s = std(cord)
>> UL = m + s
>> LL = m - s
>> n1 = sum(cord >= LL & cord <= UL)
>> n1/length(cord)*100
± two standard deviations
>> UL = m + 2*s
>> LL = m - 2*s
>> n1 = sum(cord >= LL & cord <= UL)
>> n1/length(cord)*100
± three standard deviations
>> m = mean(cord); s = std(cord)
>> UL = m + 3*s
>> LL = m - 3*s
>> n1 = sum(cord >= LL & cord <= UL)
>> n1/length(cord)*100
Results
Range    #     %
± 1s     38    63.3
± 2s     58    96.7
± 3s     60    100
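The three repeated calculations can be folded into one loop. This is a sketch under stated assumptions: the vector data below is made up for illustration and is not the cord sample, so its percentages will differ from the results table.

```matlab
% Hypothetical measurements (NOT the cord data from the slides)
data = [212 245 251 260 263 268 270 274 281 290 305 341];
m = mean(data);
s = std(data);

pct = zeros(1, 3);
for k = 1:3
    LL = m - k*s;                          % lower limit
    UL = m + k*s;                          % upper limit
    n_in = sum(data >= LL & data <= UL);   % number of points inside the limits
    pct(k) = n_in/length(data)*100;        % percentage of the sample
    fprintf('+/- %ds: %d points, %.1f %%\n', k, n_in, pct(k));
end
```

Replacing data with cord (after loading cordbreak1.mat) should reproduce the counts in the results table.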

10 Expected Proportions for known σ
Percentage of observations in the given range:
- mean ± 1σ: 68 %
- mean ± 2σ: 95.5 %
- mean ± 3σ: 99.7 %

11 Expected Proportions for known σ
Figure labels: 68 % within one standard deviation of the mean, 16 % in the tail beyond

12 Proportions and the Normal Distribution
Conditions:
- Data follows a normal distribution (most things do, but not all)
- Samples do not affect each other (independent)
- The standard deviation is known (or determined from more than 15 to 20 samples)
Result:
- mean ± one standard deviation contains 68 % of the data
- mean ± two standard deviations contains 95.5 % of the data
- mean ± three standard deviations contains 99.7 % of the data
Distribution is symmetric, so you can predict several portions, e.g.
- the range from the mean to the mean plus one sd contains 34 % of the data
- the points greater than one sd above the mean contain 16 % ((100 - 68)/2 = 16)
Compare to results from our data sample
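The expected proportions can be computed rather than memorized. The sketch below uses the built-in error function erf: for a normal distribution, the fraction within k standard deviations of the mean is erf(k/sqrt(2)). The variable names are illustrative.

```matlab
k = 1:3;                              % number of standard deviations

p_within     = erf(k/sqrt(2))*100;    % percent within mean +/- k sd
p_mean_to_k  = p_within/2;            % percent between the mean and mean + k sd
p_upper_tail = (100 - p_within)/2;    % percent above mean + k sd, by symmetry

for i = 1:3
    fprintf('+/- %d sd: %.2f %% within, %.2f %% in each tail\n', ...
            k(i), p_within(i), p_upper_tail(i));
end
```

For k = 1 this gives about 68.27 % within and 15.87 % in each tail, which the slide rounds to 68 % and 16 %.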

13 Proportions Problem
Data analysis of the breaking strength of a certain fabric shows that it is normally distributed with a mean of 200 lb and a variance (σ²) of 9.
1. Estimate the percentage of fabric samples that will have a breaking strength between 197 lb and 203 lb.
2. Estimate the percentage of fabric samples that will have a breaking strength no less than 194 lb.

14 Proportions Problem Solution
mean = 200, variance = 9
standard deviation = square root(variance) = 3
1. Estimate the percentage of fabric samples that will have a breaking strength between 197 lb and 203 lb.
- Notice this range is plus or minus one standard deviation.
- Therefore, from the previous discussion, 68 % of the data is expected to be in this range.
2. Estimate the percentage of fabric samples that will have a breaking strength no less than 194 lb.
- We are looking for samples with a strength of at least 194 lb.
- Notice 194 is two standard deviations less than the mean.
- Within ± 2s, about 95 % of the data should be included.
- This means there is about 5 % in the two tails outside this range.
- We are only eliminating the lower tail, so we divide by 2, resulting in 2.5 % less than 194 and therefore 97.5 % greater than 194.
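The rule-of-thumb answers can be compared with values computed from the normal cumulative distribution. This is a sketch, not part of the original solution: Phi is a helper defined here from the built-in erf, with the mean and standard deviation of the fabric problem.

```matlab
mu    = 200;          % mean breaking strength, lb
sigma = sqrt(9);      % standard deviation = square root of the variance

% Normal cumulative distribution: fraction of samples below x
% (helper defined here for illustration)
Phi = @(x) 0.5*(1 + erf((x - mu)/(sigma*sqrt(2))));

p_between = (Phi(203) - Phi(197))*100;   % percent between 197 lb and 203 lb
p_above   = (1 - Phi(194))*100;          % percent no less than 194 lb

fprintf('between 197 and 203 lb: %.2f %%\n', p_between);
fprintf('no less than 194 lb:    %.2f %%\n', p_above);
```

The computed values, about 68.27 % and 97.72 %, are close to the estimates of 68 % and 97.5 % obtained from the rounded proportions.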


16 Scaled Histogram (demonstrate)
One more type of histogram, to match this case
- fraction of total area in a given bin (messier)
- necessary when comparing histograms with different bin widths (or comparing to a normal curve)
- area under the curve is scaled to equal one
- you must set the bin width
- must divide by the total number of samples times the bin width
>> x=145:20:370
>> z=hist(cord,x)
>> zs=z/sum(z)/20
>> bar(x,zs)
plus titles etc.
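A quick check confirms that this scaling makes the total bar area equal to one. The sketch below uses a made-up data vector and bin centers (not the cord sample) and leaves out the plotting step.

```matlab
% Hypothetical measurements (NOT the cord data from the slides)
data = [212 245 251 260 263 268 270 274 281 290 305 341];

w = 20;                   % bin width
x = 210:w:350;            % bin centers spaced one bin width apart
z = hist(data, x);        % absolute frequency: count in each bin

zr = z/sum(z);            % relative frequency: fraction of total count
zs = z/(sum(z)*w);        % scaled frequency: fraction of total area

area = sum(zs*w);         % bar height times bar width, summed over all bins
fprintf('sum of relative frequencies = %.4f, total area = %.4f\n', sum(zr), area);
```

Calling bar(x, zs) afterwards draws the scaled histogram, which can then be compared directly with a normal density curve.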

17 Types of Histograms (z = count in each bin, w = bin width)
Absolute Frequency
- frequency: absolute count in each bin
- formula: z
- use: for a quick picture
Relative Frequency
- frequency: fraction of total count in each bin
- formula: z/sum(z)
- use: compare samples when total counts differ
Scaled Frequency
- frequency: fraction of total area in each bin
- formula: z/(sum(z)*w)
- use: compare samples when bin sizes differ

18 Scaled Histogram and a Normal Curve
Equation for the normal distribution is in the text, and a function to calculate it is available online (normal1.m)
Code below can be used to add a normal distribution curve to a plot; it builds on the previous scaled histogram
>> % determine the mean and standard deviation
>> mu = mean(cord); sigma = std(cord);
>> % create an x vector
>> x1 = linspace(mu - 3*sigma, mu + 3*sigma, 100);
>> % calculate the y-coordinate of the normal distribution
>> A = 1/(sigma*(2*pi)^0.5);
>> y = A*exp(-(x1 - mu).^2 / (2*sigma^2));
>> % Hold the graph and add the normal curve
>> hold on, plot(x1, y, 'g', 'LineWidth', 3)


20 Review: Types of Histograms
Absolute Frequency
- frequency: absolute count in each bin
- formula: z
- use: for a quick picture
- MATLAB:
>> hist(x, n)
Relative Frequency
- frequency: fraction of total count in each bin
- formula: z/sum(z)
- use: compare samples when total counts differ
- MATLAB:
>> [z, c] = hist(x)
>> zr = z/sum(z)
>> bar(c, zr)
Scaled Frequency
- frequency: fraction of total area in each bin
- formula: z/(sum(z)*w)
- use: compare samples when bin sizes differ
- MATLAB:
>> % b = vector of bin centers, w = bin width
>> [z, c] = hist(x, b)
>> zs = z/(sum(z)*w)
>> bar(c, zs)

21 Additional Example (not covered in class)
Looking at two sets of data
- Look at a histogram of the second set of data, 'cord2'
- How would you compare it to cord, the first set of data?
- What problems do you run into?

22 How to Compare Two Data Sets
Could use the figure command to plot both histograms, or
could use subplots to plot both histograms
>> subplot(1,2,1)
>> hist(cord)
>> ylabel('Absolute Frequency'), xlabel('Breaking Strength (N)')
>> title('First Cord Sample')
>> subplot(1,2,2)
>> hist(cord2)
>> ylabel('Absolute Frequency'), xlabel('Breaking Strength (N)')
>> title('Second Cord Sample')
>> ylim([0 12])

23 Resulting Histograms
Issues:
- different x value bins
- bar heights differ because of different sample sizes
- separate graphs can be hard to compare

24 Comparing Two Data Sets with Common Bins
1. Using the same bins
The histogram command can save the bin locations and can reuse saved bin locations:
>> [z1,x] = hist(cord);
>> z2 = hist(cord2,x);
>> bar(x,z1)
>> bar(x,z2)
2. Dealing with sample size
The samples can be compared better if the bins contain the relative frequency (samples in bin/total samples) rather than absolute frequencies.
>> [z1,x] = hist(cord);
>> z2 = hist(cord2,x);
>> zr1 = z1/sum(z1);
>> zr2 = z2/sum(z2);
>> bar(x,zr1)
>> bar(x,zr2)
3. In these cases the histogram command does not produce a graph, and the bar command is used to create the graph.
4. As before, we can create the plots on two figures, in different subplots, or on one graph.
Plotting on one graph: the bar command can plot both samples on the same graph
>> zr = [zr1',zr2'];
>> bar(x,zr)
>> ylabel('Relative Frequency')
>> xlabel('Breaking Strength (N)')
>> legend('First Cord Sample', 'Second Cord Sample')


