Stat 501 Spring 2004 Go through intro doc Homework 1:


1 Stat 501 Spring 2004
Go through intro doc
Homework 1: Send me an email with:
Name (full and what I should call you)
Major / year
At least 1 thing you want to learn in this course
Any stats background?
Anything else you want to tell me

2 Data Example
A small “Gestational Age / Birthweight” dataset.
24 babies: 12 boys and 12 girls
Assume this is a representative sample for the population of interest
Data:
Gestational age (weeks)
Birthweight (grams)
Gender (1=male, 2=female)

3

4 Two types of data:
Qualitative: “qualities / not able to be ordered” ex: gender
Quantitative: “#s”
Discrete: weeks of gestational age. Possible values correspond to integers (or a subset of the integers)
Continuous: birth weight. Possible values correspond to real numbers (between any 2 numbers, a third is possible)

5 Histograms: A summary of the distribution of quantitative data
[Figure: Histogram of birth weight. Horizontal axis: birth weight (g), bins centered at 2500, 2700, 2900, 3100, 3300, 3500. Vertical axis: probability, marked at 2/24, 4/24, 6/24, 8/24.]

6 Histograms: A summary of the distribution of quantitative data
Divide the range of the data into bins of equal width. Each bin gets a bar with a height proportional to the number of data points in the bin.
Example: the height of the bar above the number 2900 is 8/24 = 33.3%
8 = # of babies with weight between 2800g and 3000g
24 = total # of observations (“n”)
Note that the number of bins is subjective. See page 26 in the book.
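The binning rule above can be sketched in a few lines of Python. This is only an illustration: the weights below are made up (they are NOT the course dataset), and the function name and the choice of 200g bins starting at 2400g are assumptions chosen to match the bin centers on the histogram slide.

```python
def histogram_proportions(values, start, width, n_bins):
    """Return (bin_center, proportion) pairs for equal-width bins [lo, hi).

    Values outside the binned range are not counted in any bar.
    """
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)  # which bin v falls into
        if 0 <= idx < n_bins:
            counts[idx] += 1
    n = len(values)
    # Bar height = count in bin / total number of observations
    return [(start + width * (i + 0.5), counts[i] / n) for i in range(n_bins)]


# Hypothetical birth weights in grams (illustration only)
weights = [2450, 2650, 2750, 2820, 2850, 2900,
           2950, 2990, 3050, 3150, 3300, 3480]

bins = histogram_proportions(weights, start=2400, width=200, n_bins=6)
for center, proportion in bins:
    print(f"{center:.0f} g: {proportion:.3f}")
```

With these made-up weights, 5 of the 12 values fall between 2800g and 3000g, so the bar above 2900 has height 5/12. Changing `width` or `n_bins` changes the picture, which is why the number of bins is subjective.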

7 More about histograms
Histograms show the “shape” or “distribution” of quantitative data:
Skewed to the left = long left “tail”. Example: gestational age at birth for all babies (some are premature, but almost none are more than 42 weeks)
Skewed to the right = long right “tail”
Symmetric
Unimodal: one peak; bimodal: two peaks

8 Histograms also have a probability interpretation
Choose one point from the dataset. The probability that it falls in any particular bin is proportional to the corresponding bar’s height. Note that probabilities are in the interval from 0 to 1.

9 Histograms also have a probability interpretation
Important Concept: Histograms are based on samples from a true population. They estimate the probabilities described above. As the sample size (n) increases, the estimates are better guesses of the true population behavior.
Histograms are estimates of a function:
Input: bin location
Output: probability
We call this function the “distribution”
What’s an estimate of the probability that a new baby weighs 3kg or less?
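One simple way to estimate a probability like this from a sample is to use the fraction of observed values at or below the cutoff. The sketch below assumes that approach; the weights are hypothetical (NOT the course dataset) and the function name is made up for illustration.

```python
def prob_at_most(values, threshold):
    """Estimate P(X <= threshold) by the fraction of sample values <= threshold."""
    return sum(1 for v in values if v <= threshold) / len(values)


# Hypothetical birth weights in grams (illustration only)
weights = [2450, 2650, 2750, 2820, 2850, 2900,
           2950, 2990, 3050, 3150, 3300, 3480]

estimate = prob_at_most(weights, 3000)  # 3 kg = 3000 g
print(f"Estimated P(weight <= 3000 g) = {estimate:.3f}")
```

Here 8 of the 12 made-up weights are at most 3000g, so the estimate is 8/12. With a larger sample the estimate would be a better guess of the true population probability.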

10 Numerical Summaries for Quantitative Data
Let x1, …, xn be the dataset.
Measures of the “center” of histograms:
Sample mean: x̄ (“x bar”) = (x1 + … + xn)/n
μ (“mu”) = true mean of the full population. This is unknown. x̄ estimates μ.
Median: the value where 50% of the data are smaller and 50% are larger. The median is also an estimate of an unknown true quantity. (PIR example)

11 Median versus mean
They tend to be similar if the data are fairly symmetric.
The median is less sensitive to “extreme and anomalous” observations (“outliers”) than the mean.
Example: 400 graduates: 399 of them make $40,000 a year; 1 is a starting pitcher and makes $10 million.
Mean: $64,900
Median: $40,000
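The salary example can be checked directly with Python's standard `statistics` module. The variable names are just for illustration; the numbers are the ones from the slide.

```python
from statistics import mean, median

# 399 graduates at $40,000 and one starting pitcher at $10 million
salaries = [40_000] * 399 + [10_000_000]

salary_mean = mean(salaries)      # pulled upward by the single outlier
salary_median = median(salaries)  # unaffected by the outlier

print("Mean:  ", salary_mean)
print("Median:", salary_median)
```

The sum is 399 × 40,000 + 10,000,000 = 25,960,000, and dividing by 400 gives 64,900. The two middle values are both 40,000, so the median stays at 40,000.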

12 Numerical Summaries: Measure of “spread of histogram”
Measure 1: Range = largest x - smallest x
Measure 2: Sample variance: s² = [ (x1 - x̄)² + … + (xn - x̄)² ] / (n - 1), the “average squared variation around the mean”
Sample standard deviation: s = sqrt(s²)
s² estimates a true variance: σ²
s estimates a true standard deviation: σ
What does standard deviation mean?
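As a minimal sketch, the spread measures above can be computed by hand and compared against the standard library. The data values are made up for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import stdev, variance


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Made-up data (illustration only); the sample mean is 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)  # Measure 1
s2 = sample_variance(data)          # Measure 2
s = sqrt(s2)                        # sample standard deviation

print("Range:", data_range)
print("Sample variance:", s2)
print("Sample standard deviation:", s)
print("Library values:", variance(data), stdev(data))
```

For this data the squared deviations add up to 32, so s² = 32/7. The library functions `statistics.variance` and `statistics.stdev` also divide by n - 1, so they should agree with the hand computation.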

13 Meaning of standard deviation:
When the distribution has roughly a “bell curve” shape, then
about 68% of the data are within +/- 1 standard deviation of the mean
about 95% of the data are within +/- 2 standard deviations of the mean
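The 68% and 95% figures come from the normal (“bell curve”) distribution. A short check using `statistics.NormalDist` from the Python standard library, assuming an exactly normal distribution (real data will only match approximately); the function name is made up for illustration.

```python
from statistics import NormalDist


def prob_within(k):
    """P(mean - k*sd <= X <= mean + k*sd) for an exactly normal X."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


within_1_sd = prob_within(1)
within_2_sd = prob_within(2)

print(f"Within 1 sd: {within_1_sd:.4f}")
print(f"Within 2 sd: {within_2_sd:.4f}")
```

The exact values are about 0.683 and 0.954, which the slide rounds to 68% and 95%.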

14 Why we’ll care:
[Figure: birthweight (g) for female babies and male babies, with the values 3024 and 2911 marked.]
Example of the kind of question we’ll want to answer: Is the true mean birth weight for male and female babies different? (The answer depends on the variability of birthweight.)

