Stat 501 Spring 2004 Go through intro doc Homework 1:


Stat 501 Spring 2004. Go through intro doc. Homework 1: Send me an email with: name (full, and what I should call you); major / year; at least 1 thing you want to learn in this course; any stats background; anything else you want to tell me.

Data Example: A small “Gestational Age / Birthweight” dataset. 24 babies: 12 boys and 12 girls. Assume this is a representative sample for the population of interest. Data: gestational age (weeks), birthweight (grams), gender (1 = male, 2 = female).

Two types of data. Qualitative: “qualities / not able to be ordered”, ex: gender. Quantitative: “#s”. Quantitative data are either discrete or continuous. Discrete: possible values correspond to integers (or a subset of the integers), ex: weeks of gestational age. Continuous: possible values correspond to real numbers (between any 2 numbers, a third is possible), ex: birth weight.

Histograms: A summary of the distribution of quantitative data. [Figure: Histogram of birth weight. Horizontal axis: birth weight (g), labeled from 2500 to 3500 in steps of 200. Vertical axis: probability, labeled from 2/24 to 8/24.]

Histograms: A summary of the distribution of quantitative data. Divide the range of the data into bins of equal width. Each bin gets a bar with a height proportional to the number of data points in the bin. Example: the height of the bar above the number 2900 is 0.333 = 33.3% = 8/24, where 8 = # of babies with weight between 2800g and 3000g and 24 = total # of observations (“n”). Note that the number of bins is subjective. See page 26 in the book.
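The binning rule on this slide can be sketched in Python. This is only an illustration: the weights below are invented (they are not the course dataset), and the function name `relative_frequencies`, the starting edge of 2400 g, and the bin width of 200 g are all assumptions made for the example.

```python
def relative_frequencies(data, start, width, n_bins):
    """Return the fraction of observations falling in each equal-width bin.

    Bin i covers the interval [start + i*width, start + (i+1)*width).
    Observations outside the covered range are not counted in any bin.
    """
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    n = len(data)
    return [c / n for c in counts]


# Hypothetical birth weights in grams (made up, NOT the course data).
weights = [2450, 2550, 2650, 2700, 2750, 2850,
           2900, 2950, 3050, 3100, 3150, 3250]

# Five bins of width 200 g, starting at 2400 g (an arbitrary choice).
heights = relative_frequencies(weights, start=2400, width=200, n_bins=5)
for i, h in enumerate(heights):
    left = 2400 + 200 * i
    print(f"{left}-{left + 200} g: bar height {h:.3f}")
```

Changing `width` or `n_bins` changes the picture, which is the sense in which the number of bins is subjective.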

More about histograms. Histograms show the “shape” or “distribution” of quantitative data. Skewed to the left = long left “tail”. Example: gestational age at birth for all babies (some are premature, but almost none are more than 42 weeks). Skewed to the right = long right “tail”. Symmetric. Unimodal: one peak; bimodal: two peaks.

Histograms also have a probability interpretation. Choose one point from the dataset. The probability that it falls in any particular bin is proportional to the corresponding bar’s height. Note that probabilities are in the interval from 0 to 1.

Histograms also have a probability interpretation. Important concept: Histograms are based on samples from a true population. They estimate the probabilities described above. As the sample size (n) increases, the estimates are better guesses of the true population behavior. Histograms are estimates of a function (input: bin location, output: probability). We call this function the “distribution”. What’s an estimate of the probability that a new baby weighs 3kg or less?
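One simple way to answer the question on this slide is to use the fraction of the sample at or below 3000 g as the estimate. The sketch below assumes that approach; the weights are invented for illustration (not the course data) and the function name `empirical_probability_at_most` is hypothetical.

```python
def empirical_probability_at_most(data, threshold):
    """Fraction of observations less than or equal to the threshold."""
    return sum(1 for x in data if x <= threshold) / len(data)


# Hypothetical birth weights in grams (made up, NOT the course data).
weights = [2450, 2550, 2650, 2700, 2750, 2850,
           2900, 2950, 3050, 3100, 3150, 3250]

# Estimate P(weight <= 3 kg) by the sample proportion.
p_hat = empirical_probability_at_most(weights, 3000)
print(f"Estimated P(weight <= 3000 g) = {p_hat:.3f}")
```

With a larger sample the same calculation would be expected to give a better guess of the true population probability, in line with the slide's point about sample size.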

Numerical Summaries for Quantitative Data. Let $x_1, \dots, x_n$ be the dataset. Measures of the “center” of histograms: Sample mean: $\bar{x}$ (“x bar”) $= (x_1 + \dots + x_n)/n$. $\mu$ (“mu”) = true mean of the full population. This is unknown; $\bar{x}$ estimates $\mu$. Median: value where 50% are smaller and 50% are larger. The median is also an estimate of an unknown true quantity. (PIR example)

Median versus mean. They tend to be similar if the data are fairly symmetric. The median is less sensitive to “extreme and anomalous” observations (“outliers”) than the mean. Example: 400 graduates: 399 of them make $40,000 a year; 1 is a starting pitcher and makes $10 million. Mean: $64,900. Median: $40,000.
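The graduate salary example can be checked directly. The sketch below uses Python's standard `statistics` module; the variable names are chosen for this illustration only.

```python
import statistics

# 399 graduates earning $40,000 and one starting pitcher earning $10 million.
salaries = [40_000] * 399 + [10_000_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

print(f"Mean:   ${mean_salary:,.0f}")
print(f"Median: ${median_salary:,.0f}")

# Dropping the single outlier moves the mean a lot but leaves the median alone.
without_pitcher = salaries[:-1]
print(f"Mean without pitcher:   ${statistics.mean(without_pitcher):,.0f}")
print(f"Median without pitcher: ${statistics.median(without_pitcher):,.0f}")
```

The arithmetic behind the mean is (399 × 40,000 + 10,000,000) / 400 = 25,960,000 / 400 = 64,900, which agrees with the figure on the slide.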

Numerical Summaries: Measures of the “spread” of a histogram. Measure 1: Range = largest x − smallest x. Measure 2: Sample variance: $s^2 = [(x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2] / (n - 1)$, the “average squared variation around the mean”. Sample standard deviation: $s = \sqrt{s^2}$. $s^2$ estimates a true variance, $\sigma^2$; $s$ estimates a true standard deviation, $\sigma$. What does standard deviation mean?
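The spread formulas on this slide translate into a few lines of Python. The four data values below are invented to keep the arithmetic easy to follow, and the function name `sample_variance` is an assumption of this sketch; the result is compared against `statistics.variance`, which also divides by n − 1.

```python
import math
import statistics


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


# Small hypothetical dataset (made up for illustration).
data = [2, 4, 4, 6]

data_range = max(data) - min(data)   # Measure 1: largest x minus smallest x
s2 = sample_variance(data)           # Measure 2: sample variance
s = math.sqrt(s2)                    # sample standard deviation

print(f"range = {data_range}")
print(f"s^2   = {s2:.4f}")
print(f"s     = {s:.4f}")
print(f"statistics.variance agrees: {math.isclose(s2, statistics.variance(data))}")
```

Here the mean is 4, the squared deviations are 4, 0, 0, 4, and dividing their sum by n − 1 = 3 gives a sample variance of 8/3.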

Meaning of standard deviation: When the distribution roughly has a “bell curve” shape, about 68% of the data are within +/- 1 standard deviation of the mean, and about 95% of the data are within +/- 2 standard deviations of the mean.
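The 68% / 95% rule of thumb can be illustrated by simulation. The sketch below draws values from a normal (“bell curve”) distribution and counts how many land within 1 and 2 sample standard deviations of the sample mean. The center of 3000 and spread of 200 are arbitrary values assumed for this example, not estimates from the course data, and `fraction_within` is a hypothetical helper.

```python
import random
import statistics


def fraction_within(data, center, sd, k):
    """Fraction of observations within k standard deviations of the center."""
    return sum(1 for x in data if abs(x - center) <= k * sd) / len(data)


random.seed(0)  # fixed seed so the run is reproducible

# Simulated bell-shaped data; mean 3000 and sd 200 are assumed values.
draws = [random.gauss(3000, 200) for _ in range(20_000)]

xbar = statistics.mean(draws)
s = statistics.stdev(draws)

within_1 = fraction_within(draws, xbar, s, 1)
within_2 = fraction_within(draws, xbar, s, 2)

print(f"within 1 sd: {within_1:.3f}")
print(f"within 2 sd: {within_2:.3f}")
```

The printed fractions should land close to 0.68 and 0.95. For data that are strongly skewed or bimodal, the same calculation can give noticeably different fractions, which is why the slide restricts the rule to roughly bell-shaped distributions.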

Why we’ll care: [Figure: Birthweight (g) for female babies and male babies, with the values 3024 and 2911 marked.] Example of the kind of question we’ll want to answer: Is the true mean birth weight for male and female babies different? (The answer depends on the variability of birthweight.)