Descriptive Statistics II


By the end of this class you should be able to:
- describe the meaning of and calculate the mean and standard deviation of a sample
- estimate normal proportions based on the mean and standard deviation
- plot histograms with alternative scaling

Reading: Palm, Sections 7.1 and 7.2
Please download cordbreak1.mat and FWtemperature.txt.

Exercise
- Download FWTemperature.txt
- Read it into MATLAB
- Prepare a single figure with two plots:
  - a histogram of March highs (row 2)
  - a histogram of April highs (row 4)
- Label these plots fully
- Print out your commands and the resulting figure

Review: Quantifying Variation

Mean (central tendency)
>> mean(x)

Standard deviation (spread)
>> std(x)

How the standard deviation is built up:
- Difference: the deviation of each point about the mean
- Squared: makes all values positive
- Summation: yields one number
- Divide by n-1: normalizes the sum based on the degrees of freedom
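For reference, the steps listed above correspond to the usual sample formulas. The notation is an assumption made here: x_i for the individual values, n for the sample size, x̄ for the sample mean, and s for the sample standard deviation. The final square root returns the result to the units of the data.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```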

Statistic                    MATLAB                 Excel
Mean                         >> mean(variable)      =AVERAGE(range)
Sample standard deviation    >> std(variable)       =STDEV(range)

Calculate Mean & Standard Deviation for the Cord Sample
>> mean(data2)
ans =
>> std(data2)
ans =

The Normal (Gaussian) Distribution: the bell curve (see next slide)
- A probability density function
- The area under any segment of the curve equals the probability that a point will fall in that region
- The standard normal is centered about 0 (i.e., mean = 0) and marked off in numbers of standard deviations from the mean
- The standard deviation is the distance from the mean to the inflection point of the curve

[Figure: the normal (Gaussian) distribution, with the mean μ, the mode, and the (population) standard deviation σ marked.]

Note on Sample and Population Statistics

Sample: the estimate from a sample of the whole population
Population: the true value from the entire population

Statistic             Sample     Population
Standard deviation    s          σ
Mean                  x̄ or m     μ

Calculating Proportions from Cord Data

± one standard deviation
>> m = mean(cord); s = std(cord)
>> UL = m + s
>> LL = m - s
>> n1 = sum(cord >= LL & cord <= UL)
>> n1/length(cord)*100

± two standard deviations
>> UL = m + 2*s
>> LL = m - 2*s
>> n1 = sum(cord >= LL & cord <= UL)
>> n1/length(cord)*100

± three standard deviations
>> m = mean(cord); s = std(cord)
>> UL = m + 3*s
>> LL = m - 3*s
>> n1 = sum(cord >= LL & cord <= UL)
>> n1/length(cord)*100

Results
Range    #     %
±1s
±2s
±3s      60    100
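The three repeated blocks above can be folded into a single loop. The following is a minimal sketch, not the slide's own code: the vector `sample` is hypothetical stand-in data (the real `cord` values are in cordbreak1.mat), and the names `pct` and `inRange` are introduced here for illustration.

```matlab
% Hypothetical stand-in data; replace with the real vector, e.g. cord.
sample = [152 168 171 189 190 195 203 207 210 214 ...
          221 225 228 233 240 246 251 262 275 301];

m = mean(sample);      % sample mean
s = std(sample);       % sample standard deviation (divides by n-1)

pct = zeros(1, 3);     % percentage within +/- k standard deviations
for k = 1:3
    LL = m - k*s;                              % lower limit
    UL = m + k*s;                              % upper limit
    inRange = sample >= LL & sample <= UL;     % logical mask of points in range
    pct(k) = sum(inRange) / numel(sample) * 100;
end

disp(pct)              % percentages for 1, 2 and 3 standard deviations
```

Using `numel` instead of `length` is only a style choice here; for a vector the two give the same count.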

[Figure: expected proportions for known σ. Percentage of observations within ±1σ, ±2σ, and ±3σ of the mean μ: 68 %, 95.5 %, and 99.7 %.]

[Figure: expected proportions for known σ. 68 % of observations lie within ±1σ of the mean; 16 % lie in the tail beyond 1σ.]

Proportions and the Normal Distribution

Conditions:
- The data follow a normal distribution (most things do, but not all)
- Samples do not affect each other (they are independent)
- The standard deviation is known (or determined from more than 15-20 samples)

Result:
- mean ± one standard deviation contains 68 % of the data
- mean ± two standard deviations contains 95.5 % of the data
- mean ± three standard deviations contains 99.7 % of the data

The distribution is symmetric, so you can predict several portions, e.g.:
- the range from the mean to the mean plus one sd contains 34 % of the data
- the points more than one sd above the mean make up 16 % of the data ((100 - 68)/2 = 16)
- ...

Compare to the results from our data sample.
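The 68 / 95.5 / 99.7 figures can be checked numerically. For a normal distribution the probability of falling within ±k standard deviations of the mean is erf(k/√2), and `erf` is part of base MATLAB. This is a sketch for verification only; the variable names `within` and `upperTail` are introduced here.

```matlab
% Expected percentage of a normal population within +/- k standard
% deviations of the mean: P(|Z| <= k) = erf(k / sqrt(2)).
k = 1:3;
within = erf(k / sqrt(2)) * 100;       % percent inside the band
upperTail = (100 - within) / 2;        % percent above mean + k*sd (by symmetry)

% Each column of the matrix supplies one line of output.
fprintf('+/-%d sd: %.2f %% inside, %.2f %% in each tail\n', ...
        [k; within; upperTail]);
```

The one-sd tail comes out slightly under 16 % because the exact band is 68.27 %, not 68 %.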

Proportions Problem

Data analysis of the breaking strength of a certain fabric shows that it is normally distributed with a mean of 200 lb and a variance (σ²) of 9 lb².
1. Estimate the percentage of fabric samples that will have a breaking strength between 197 lb and 203 lb.
2. Estimate the percentage of fabric samples that will have a breaking strength no less than 194 lb.

Proportions Problem Solution

mean = 200 lb, variance = 9 lb²
standard deviation = square root(variance) = 3 lb

1. Estimate the percentage of fabric samples that will have a breaking strength between 197 lb and 203 lb.
Notice that this range is the mean plus or minus one standard deviation. Therefore, from the previous discussion, 68 % of the data is expected to be in this range.

2. Estimate the percentage of fabric samples that will have a breaking strength no less than 194 lb.
We are looking for samples with a strength of at least 194 lb. Notice that 194 is two standard deviations below the mean. About 95 % of the data should fall within ±2s, which leaves 5 % in the two tails outside this range. We are only eliminating the lower tail, so we divide by 2: 2.5 % is less than 194 lb and therefore 97.5 % is at least 194 lb.
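The rule-of-thumb answers can be compared with the exact normal probabilities. This sketch builds the normal cumulative distribution function from `erf` so that it runs in base MATLAB; the helper name `normcdfBase` is hypothetical and chosen to avoid relying on any toolbox function.

```matlab
mu = 200;                 % mean breaking strength, lb
sigma = sqrt(9);          % standard deviation from the variance, lb

% Normal cumulative distribution function built from erf (base MATLAB).
normcdfBase = @(x) 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))));

pBetween = (normcdfBase(203) - normcdfBase(197)) * 100;   % 197 lb to 203 lb
pAtLeast = (1 - normcdfBase(194)) * 100;                  % no less than 194 lb

fprintf('Between 197 and 203 lb: %.1f %%\n', pBetween);
fprintf('At least 194 lb: %.1f %%\n', pAtLeast);
```

The second result differs slightly from the 97.5 % estimate because the ±2 standard deviation band holds about 95.45 % of the data rather than exactly 95 %.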

Scaled Histogram (demonstrate)

One more type of histogram, to match this case: the fraction of the total area in a given bin.
- Messier, but necessary when comparing histograms with different bin widths (or when comparing to a normal curve)
- The area under the curve is scaled to equal one
- You must set the bin width
- You must divide by the total number of samples times the bin width

>> x = 145:20:370
>> z = hist(cord,x)
>> zs = z/sum(z)/20
>> bar(x,zs)
plus titles etc.
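A quick way to confirm that the scaling is right is to check that the bar areas add up to one. The sketch below assumes a hypothetical data vector `sample` in place of `cord` and stores the bin width in a variable `w` instead of hard-coding 20; plotting is left out so that only the arithmetic is shown.

```matlab
% Hypothetical stand-in data; replace with the real vector, e.g. cord.
sample = [152 168 171 189 190 195 203 207 210 214 ...
          221 225 228 233 240 246 251 262 275 301];

w = 20;                         % bin width
centers = 145:w:370;            % equally spaced bin centers
z = hist(sample, centers);      % absolute count in each bin
zs = z / (sum(z) * w);          % scaled frequency (height of each bar)

area = sum(zs * w);             % total area of all bars
fprintf('Total area under the scaled histogram: %.4f\n', area);
```

Because each bar has height `zs` and width `w`, the total area is sum(z)/sum(z), so it equals one regardless of the data.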

Histogram types

Absolute frequency
  Frequency: absolute count in each bin
  Formula: z
  Use: for a quick picture

Relative frequency
  Frequency: fraction of the total count in each bin
  Formula: z/sum(z)
  Use: compare samples when total counts differ

Scaled frequency
  Frequency: fraction of the total area in each bin
  Formula: z/(sum(z)*w), where w is the bin width
  Use: compare samples when bin sizes differ

Scaled Histogram and a Normal Curve

The equation for the normal distribution is in the text, and a function to calculate it is available online (normal1.m).
The code below can be used to add a normal curve to a plot; it builds on the previous scaled histogram.

>> % determine the mean and standard deviation
>> mu = mean(cord); sigma = std(cord);
>> % create an x vector
>> x1 = linspace(mu - 3*sigma, mu + 3*sigma, 100);
>> % calculate the y-coordinate of the normal distribution
>> A = 1/(sigma*(2*pi)^0.5);
>> y = A*exp(-(x1 - mu).^2 / (2*sigma^2));
>> % hold the graph and add the normal curve
>> hold on, plot(x1,y,'g', 'LineWidth', 3)
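Written out, the quantity the code computes for `y` is the normal probability density, with μ standing for `mu` and σ for `sigma`. The prefactor is the variable `A` in the code.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```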

Review: Types of Histograms

Absolute frequency
  Frequency: absolute count in each bin
  Formula: z
  Use: for a quick picture
  MATLAB:
  >> hist(x, n)

Relative frequency
  Frequency: fraction of the total count in each bin
  Formula: z/sum(z)
  Use: compare samples when total counts differ
  MATLAB:
  >> [z,c] = hist(x)
  >> zr = z/sum(z)
  >> bar(c, zr)

Scaled frequency
  Frequency: fraction of the total area in each bin
  Formula: z/(sum(z)*w), where w is the bin width
  Use: compare samples when bin sizes differ
  MATLAB:
  >> b = bin centers (equally spaced, width w)
  >> z = hist(x,b)
  >> zs = z/(sum(z)*w)
  >> bar(b, zs)

Additional Example (not covered in class)

Looking at two sets of data:
- Look at a histogram of the second set of data, 'cord2'
- How would you compare it to cord, the first set of data?
- What problems do you run into?

How to Compare Two Data Sets

Could use the figure command to plot both histograms, or could use subplots to plot both histograms:

>> subplot(1,2,1)
>> hist(cord)
>> ylabel 'Absolute Frequency', xlabel 'Breaking Strength(N)'
>> title 'First Cord Sample'
>> subplot(1,2,2)
>> hist(cord2)
>> ylabel 'Absolute Frequency', xlabel 'Breaking Strength(N)'
>> title 'Second Cord Sample'
>> ylim([0 12])

Resulting histograms

Issues:
- The x-value bins are different
- The bar heights differ because of the different sample sizes
- (Separate graphs can be hard to compare)

1. Using the same bins
The histogram command can save the bin locations and can use saved bin locations:
>> [z1,x] = hist(cord);
>> z2 = hist(cord2,x);
>> bar(x,z1)
>> bar(x,z2)

2. Dealing with sample size
The samples are easier to compare if the bins contain the relative frequency (samples in bin / total samples) rather than the absolute frequency.
>> [z1,x] = hist(cord);
>> z2 = hist(cord2,x);
>> zr1 = z1/sum(z1);
>> zr2 = z2/sum(z2);
>> bar(x,zr1)
>> bar(x,zr2)

3. In these cases the histogram command does not produce a graph, and the bar command is used to create the graph.

4. As before, we can create the plots on two figures, in different subplots, or on one graph.

Plotting on one graph
The bar command can plot both samples on the same graph:
>> zr = [zr1',zr2'];
>> bar(x,zr)
>> ylabel 'Relative Frequency'
>> xlabel 'Breaking Strength(N)'
>> legend('First Cord Sample', 'Second Cord Sample')
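Putting the steps above together, a minimal end-to-end sketch might look like the following. The two vectors `sample1` and `sample2` are hypothetical stand-ins for `cord` and `cord2`, with different sizes on purpose, so the values plotted are illustrative only.

```matlab
% Hypothetical stand-in data; replace with the real vectors cord and cord2.
sample1 = [152 168 171 189 190 195 203 207 210 214 ...
           221 225 228 233 240 246 251 262 275 301];
sample2 = [180 195 201 209 215 222 230 238 244 259 270 288];

[z1, x] = hist(sample1);        % counts and bin centers from the first sample
z2 = hist(sample2, x);          % second sample counted in the same bins

zr1 = z1 / sum(z1);             % relative frequencies, each set sums to 1
zr2 = z2 / sum(z2);

zr = [zr1', zr2'];              % one column per sample gives grouped bars
bar(x, zr)
ylabel('Relative Frequency')
xlabel('Breaking Strength(N)')
legend('First Cord Sample', 'Second Cord Sample')
```

When `hist` is given bin centers, values outside the outermost centers are counted in the end bins, so every point in the second sample still contributes to its relative frequencies.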