Review: design of experiments, histograms, average and standard deviation, normal approximation, measurement error, and probability.

Design of experiments Method: Investigators compare the responses of a treatment group with those of a control group. Treatment group: the subjects who are given the treatment. Control group: the subjects who are not treated (they are often given a placebo). Double-blind experiment: neither the subjects nor the people who evaluate the responses know who is in the treatment group and who is in the control group. (For example, doctors evaluate the patients' responses, and investigators compare the responses.) This guards against bias, either in the responses or in the evaluations.

Design of experiments Controlled experiment: the investigators assign the subjects to the treatment and control groups. If the experiment is randomized, the subjects are assigned at random. Observational study: the subjects assign themselves to the different groups, and the investigators just watch what happens. Observational studies have a major weakness: confounding. Randomized controlled experiments minimize this problem, because chance alone decides who gets the treatment.
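
Random assignment can be done by shuffling the list of subjects and splitting it in half. The sketch below uses hypothetical subject labels and a hypothetical helper `randomize`. It relies only on Python's standard `random` module and is not a prescribed procedure from the course.

```python
import random

def randomize(subjects, seed=None):
    """Hypothetical helper: split subjects at random into treatment and control."""
    rng = random.Random(seed)      # seeded generator so a run can be reproduced
    shuffled = list(subjects)
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subjects labelled subject1 ... subject10
treatment, control = randomize([f"subject{i}" for i in range(1, 11)], seed=1)
print("Treatment:", treatment)
print("Control:  ", control)
```

Because the split is made by chance, factors such as age or health tend to balance out between the two groups. This is why randomization weakens confounding.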

Design of experiments Confounding factor: the treatment group differs from the control group with respect to factors other than the treatment, so the effect of these factors is confounded with the effect of the treatment. These factors are called confounders. A confounder has to be associated with both the disease and the exposure. Example: in an observational study of smoking and related diseases, the disease might be lung cancer or heart attack, and the exposure is smoking. A gene would be a confounder if it were related to both lung cancer and smoking.

Simpson’s paradox Relationships between percentages in subgroups can be reversed when the subgroups are combined. Example: sex bias in graduate admissions.
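
Simpson's paradox is easier to believe once you see it in numbers. This sketch uses made-up counts for two hypothetical departments, not the real admissions data. Women have the higher admission rate in each department but the lower rate overall, because most women applied to the department that is harder to get into.

```python
# Hypothetical admissions counts (made up for illustration):
# department -> sex -> (number applied, number admitted)
admissions = {
    "Dept X": {"men": (100, 80), "women": (20, 18)},
    "Dept Y": {"men": (20, 4), "women": (100, 25)},
}

def rate(applied, admitted):
    """Admission rate as a percentage."""
    return 100 * admitted / applied

# Rates within each subgroup (department)
for dept, groups in admissions.items():
    for sex, (applied, admitted) in groups.items():
        print(f"{dept} {sex}: {rate(applied, admitted):.1f}% admitted")

# Rates after the subgroups are combined
overall = {}
for sex in ("men", "women"):
    applied = sum(admissions[d][sex][0] for d in admissions)
    admitted = sum(admissions[d][sex][1] for d in admissions)
    overall[sex] = rate(applied, admitted)
    print(f"Overall {sex}: {overall[sex]:.1f}% admitted")
```

Within each department women do better: 90% vs 80%, and 25% vs 20%. Combined, men are admitted at 70.0% and women at about 35.8%, so the relationship reverses.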

Cross-sectional vs longitudinal In a cross-sectional study, different subjects are compared to each other at one point in time (e.g. HANES is a cross-sectional study). In a longitudinal study, subjects are followed over time and compared with themselves at different points in time. Example: in HANES2, the average height of men appears to decrease after age 20, dropping about two inches over 50 years; the same is true for women. Can we conclude that a typical person gets shorter at this rate? Not really, because HANES is a cross-sectional study. The people in the younger age group are completely different people from those in the older age group, and the younger group was born about 50 years later than the older one. The difference in average height may largely reflect that people born more recently grew taller, not that people shrink.

Histogram What is a histogram? A histogram is a graph that summarizes data (it is only a summary). A histogram consists of a set of blocks, and the area of each block represents the percentage of cases in the corresponding class interval. The total area is 100%. The height of a block represents the crowding in that class interval: it equals the area (the percentage) divided by the length of the interval.
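
The height rule can be checked directly. This is a minimal sketch that takes a hypothetical distribution table (class intervals with percentages) and computes each block's height in percent per unit. It then confirms that the areas add back up to 100%.

```python
# Hypothetical distribution table: (left endpoint, right endpoint, percent of cases)
table = [
    (0, 10, 20.0),
    (10, 15, 30.0),
    (15, 20, 25.0),
    (20, 40, 25.0),
]

def block_heights(table):
    """Height = area (percent) / length of the class interval, in percent per unit."""
    return [pct / (right - left) for left, right, pct in table]

heights = block_heights(table)
for (left, right, pct), h in zip(table, heights):
    print(f"{left}-{right}: {pct}% of cases, height {h:.2f}% per unit")

# Area = height x width; the areas add back up to 100%
total_area = sum(h * (right - left) for (left, right, _), h in zip(table, heights))
print(f"Total area: {total_area:.1f}%")
```

The 10-15 interval holds more cases than the 0-10 interval (30% vs 20%) and is also narrower, so its block is three times as tall (6% vs 2% per unit).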

Histogram To draw a histogram: A distribution table may help: count the frequency in each class interval, then calculate the percentage. Draw a horizontal axis with the given scale. (In most cases, also draw a vertical axis with a density scale, in percent per unit.) Compute the height for each class interval. Draw the blocks. Quiz 1 is a typical example.
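
The table-building steps can be sketched in code. The helper below is hypothetical. It tallies raw data into class intervals, converts counts to percentages, and divides by the interval length to get heights. It assumes each interval includes its left endpoint but not its right; check which endpoint convention your problem uses.

```python
def distribution_table(data, edges):
    """Hypothetical helper: tally data into class intervals [edges[i], edges[i+1])
    and return (left, right, count, percent, height) rows. Assumes every value
    lies in [edges[0], edges[-1]); left endpoint included, right excluded."""
    n = len(data)
    rows = []
    for left, right in zip(edges, edges[1:]):
        count = sum(1 for x in data if left <= x < right)
        percent = 100 * count / n
        height = percent / (right - left)   # percent per unit
        rows.append((left, right, count, percent, height))
    return rows

# Hypothetical data and class intervals
data = [2, 3, 5, 7, 8, 11, 12, 12, 14, 19]
edges = [0, 5, 10, 20]

for left, right, count, percent, height in distribution_table(data, edges):
    print(f"[{left}, {right}): count {count}, {percent:.0f}%, height {height:.1f}% per unit")
```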

Ave and SD A list of numbers (usually a data set) can be summarized by its average and standard deviation (SD). The average locates the “center”, and the SD measures the “spread”. Average = sum of entries / number of entries. The SD measures how far the entries typically are from the average: SD = r.m.s. (root-mean-square) of the deviations from the average.
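
The r.m.s. definition translates step by step into code. This is a small sketch with a hypothetical list. Note that the SD here divides by the number of entries n, following the definition above. Some software divides by n - 1 instead.

```python
import math

def average(xs):
    """Sum of entries / number of entries."""
    return sum(xs) / len(xs)

def sd(xs):
    """SD = r.m.s. of the deviations from the average (divides by n, not n - 1)."""
    avg = average(xs)
    deviations = [x - avg for x in xs]
    return math.sqrt(sum(d * d for d in deviations) / len(xs))  # root-mean-square

data = [1, 3, 4, 5, 7]          # hypothetical list
print(average(data), sd(data))  # → 4.0 2.0
```

Here the deviations are -3, -1, 0, 1, 3. Their squares average to 20 / 5 = 4, and the square root of 4 is 2.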

Convert to standard units A value is converted to standard units by seeing how many SDs it is above or below the average. Values above the average are given a plus sign; values below the average get a minus sign. The horizontal axis of the graph of the normal curve is in standard units. Many histograms for data are similar in shape to the normal curve, provided they are drawn to the same scale: making the horizontal scales match up involves standard units.
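
Converting to and from standard units is one line each way. This sketch uses two hypothetical helper names. The numbers come from the calculus test example in these slides (average 70, SD 10).

```python
def to_standard_units(x, avg, sd):
    """How many SDs x is above (+) or below (-) the average."""
    return (x - avg) / sd

def from_standard_units(z, avg, sd):
    """Convert a value in standard units back to the original scale."""
    return avg + z * sd

# Calculus test example: average 70, SD 10
print(to_standard_units(86, 70, 10))    # → 1.6
print(to_standard_units(54, 70, 10))    # → -1.6
print(from_standard_units(1.3, 70, 10)) # 1.3 SDs above average, about 83
```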

Example A histogram for the calculus test scores: the average is 70, the SD is 10, and there are 200 students. We convert the horizontal axis into standard units. Then we match the vertical scale while keeping the areas fixed (equivalently, multiply each height in percent per point by 10, the number of points in one standard unit). Then we sketch the normal curve: a bell-shaped curve whose height at the center is about 40% per standard unit.
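
Both numbers in this example can be checked. The standard normal curve has height 100/√(2π) ≈ 39.9% per standard unit at its center. One standard unit spans 10 points here, so a height in percent per point becomes 10 times larger in percent per standard unit, while each block's area stays the same. The 3% per point block height below is hypothetical.

```python
import math

def normal_curve_height(z):
    """Height of the standard normal curve, in percent per standard unit."""
    return 100 / math.sqrt(2 * math.pi) * math.exp(-z * z / 2)

def per_point_to_per_standard_unit(height_per_point, sd):
    """One standard unit spans `sd` points, so keeping each block's area fixed
    means the height in % per standard unit is sd times the % per point."""
    return height_per_point * sd

print(f"{normal_curve_height(0):.1f}% per standard unit at the center")
# Hypothetical block height of 3% per point, with SD = 10 points
print(per_point_to_per_standard_unit(3, 10), "% per standard unit")
```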

Normal approximation Example: Find the number of scores within 1.6 SDs of the average in the previous example (equivalently, the number of scores between 54 and 86). Solution: From the normal table, the area under the normal curve between -1.6 and 1.6 is 89.04%, or about 90%. So the number of scores should be about 200 × 90% = 180.
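
Instead of a printed normal table, you can compute the area with Python's standard `statistics.NormalDist`, whose `cdf` gives the area to the left of a point. This sketch reproduces the calculus test example. It keeps the unrounded area, so the count comes out as 178 rather than the slide's rounded 180.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the normal curve in standard units: mean 0, SD 1

def area_between(z1, z2):
    """Area (as a fraction) under the standard normal curve between z1 and z2."""
    return std_normal.cdf(z2) - std_normal.cdf(z1)

avg, sd, n = 70, 10, 200
z_low, z_high = (54 - avg) / sd, (86 - avg) / sd    # -1.6 and 1.6
area = area_between(z_low, z_high)
print(f"Area: {100 * area:.2f}%")                   # → Area: 89.04%
print(f"Estimated count: {n * area:.0f} students")  # → Estimated count: 178 students
```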

Percentile A percentile is a value of the quantitative variable that has a given percentage of the data at or below it. For example, suppose the 10th percentile in the previous example is 60. This means that about 10% of the students scored 60 or less. Exercise: What is the 25th percentile of the list 1, 2, 3, 4? (See next slide.) The percentile rank is the corresponding percentage, e.g. 10%. All histograms, whether or not they follow the normal curve, can be summarized using percentiles.

The 25th percentile of the list 1, 2, 3, 4 Correction: what I showed you in class had a mistake; I apologize for that. Solution: The 25th percentile is a value, call it q, such that about 25% of the entries are below or equal to q. The number of entries that make up 25% is 4 × 25% = 1, so exactly one entry is below or equal to q. Taking q to be an entry of the list, this gives q = 1, so the 25th percentile of the list is 1. Similarly, the 75th percentile of the list 8, 4, 2, 9 is 8: 4 × 75% = 3, and after ordering (2, 4, 8, 9) the 3rd entry is 8. In general, for a discrete data set such as a list, if this number of entries is not a whole number, the percentile is not defined. For example, the 20th percentile of the list 1, 2, 3, 4 is not defined, since 4 × 20% = 0.8.
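
The convention on this slide can be written as a short function. `percentile` is a hypothetical helper that implements only this slide's rule: sort, compute k = n × p%, and return the k-th entry if k is a whole number, otherwise report the percentile as undefined. Library functions such as NumPy's `percentile` interpolate instead and generally give different answers. The helper assumes p is a whole-number percentage.

```python
def percentile(data, p):
    """Hypothetical helper following this slide's convention. p is a whole-number
    percentage. Sort the list, compute k = n * p / 100; if k is a whole number
    >= 1, the p-th percentile is the k-th smallest entry, otherwise undefined."""
    values = sorted(data)
    n = len(values)
    if not 0 < p <= 100 or (n * p) % 100 != 0:
        return None  # undefined: n * p% is not a whole number of entries
    k = n * p // 100
    return values[k - 1]  # k-th smallest entry (lists are 0-indexed)

print(percentile([1, 2, 3, 4], 25))  # → 1
print(percentile([8, 4, 2, 9], 75))  # → 8
print(percentile([1, 2, 3, 4], 20))  # → None
```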

Percentile approximation Example: In the previous example, a student claims his score is higher than those of 90.32% of his classmates. Use the normal approximation to estimate his score. (Roughly, this asks for the 90th percentile of the scores.) Solution: Let z be the point, in standard units, such that the area to the left of z is 90.32%. Then the area to the left of –z is 100% – 90.32% = 9.68%. So the area between –z and z is 90.32% – 9.68% = 80.64%. From the normal table, z = 1.3. So the student's score is about 70 + 1.3 × 10 = 83.
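
The table lookup can be replaced by the inverse of the normal curve's left-tail area. This sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library. It models the scores as normal with average 70 and SD 10, as in the example.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=10)  # normal approximation to the test scores

# Score with 90.32% of the area to its left
score = scores.inv_cdf(0.9032)
z = (score - 70) / 10                 # the same point in standard units
print(f"z = {z:.2f}, score = {score:.1f}")  # → z = 1.30, score = 83.0
```

`inv_cdf` works with the left-tail area directly, so it skips the slide's step of converting to the area between –z and z.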

Median and interquartile range The median is another way to locate the center of a histogram: half the area lies to its left and half to its right (it is the 50th percentile). The interquartile range = 75th percentile – 25th percentile. When the distribution has a long tail, we use the median as the center of the histogram and the interquartile range as the measure of spread.
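
A long tail shows why the median and interquartile range are preferred. This sketch uses a hypothetical list with one very large value and Python's standard `statistics` module. `statistics.quantiles` uses an interpolation method (default `'exclusive'`), so its quartiles need not match the percentile convention used earlier in these slides.

```python
import statistics

# Hypothetical data with a long right tail (one very large value)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 50]

mean = statistics.mean(data)
median = statistics.median(data)
q1, _, q3 = statistics.quantiles(data, n=4)  # 25th, 50th, 75th percentiles
print(f"average {mean}, median {median}")    # the tail pulls the average up
print(f"SD {statistics.pstdev(data):.1f}, IQR {q3 - q1}")
```

The single value 50 drags the average well above the median and inflates the SD. The median and interquartile range barely notice it.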

Change of scale Adding the same number to every entry on a list adds that constant to the average; the SD does not change. Multiplying every entry on a list by the same positive number multiplies the average and the SD by that constant. These changes of scale do not change the standard units.
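
The change-of-scale rules are easy to verify numerically. This sketch applies two changes to a hypothetical list with average 4 and SD 2: it adds 10 to every entry, and it multiplies every entry by 3. It then compares the averages, SDs and standard units.

```python
import math

def average(xs):
    return sum(xs) / len(xs)

def sd(xs):
    """r.m.s. deviation from the average (divides by n)."""
    avg = average(xs)
    return math.sqrt(sum((x - avg) ** 2 for x in xs) / len(xs))

def standard_units(xs):
    avg, s = average(xs), sd(xs)
    return [(x - avg) / s for x in xs]

data = [1, 3, 4, 5, 7]            # hypothetical list: average 4, SD 2
shifted = [x + 10 for x in data]  # average 14, SD still 2
scaled = [3 * x for x in data]    # average 12, SD 6

for name, xs in [("original", data), ("plus 10", shifted), ("times 3", scaled)]:
    print(name, average(xs), sd(xs), standard_units(xs))
```

All three lists have the same standard units, [-1.5, -0.5, 0.0, 0.5, 1.5].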

Measurement error Chance errors change from measurement to measurement, sometimes up and sometimes down. Bias affects all measurements the same way, pushing them in the same direction. If there is no bias in a measurement procedure, then the long-run average of repeated measurements should give the exact value of the thing being measured: the chance errors should cancel out. If there is bias, then the long-run average will itself be either too high or too low. Bias cannot be detected just by looking at the measurements themselves.
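
A simulation shows chance error cancelling while bias does not. This is a sketch under an assumed model in which each measurement = exact value + bias + chance error, with normally distributed chance errors averaging 0. The exact value, the bias of 0.5, and the error SD of 1 are all hypothetical.

```python
import random

def measure(exact, bias, error_sd, rng):
    """One measurement under the assumed model: exact value + bias + chance error."""
    return exact + bias + rng.gauss(0, error_sd)

rng = random.Random(0)  # fixed seed so the run is reproducible
exact, n = 100.0, 100_000

unbiased = [measure(exact, 0.0, 1.0, rng) for _ in range(n)]
biased = [measure(exact, 0.5, 1.0, rng) for _ in range(n)]

# Chance errors roughly cancel in the long-run average; the bias does not.
print(f"unbiased average: {sum(unbiased) / n:.3f}")  # close to 100
print(f"biased average:   {sum(biased) / n:.3f}")    # close to 100.5, not 100
```

Nothing in the `biased` list by itself reveals the extra 0.5. You would need to know the exact value, which is the point made on the slide.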

Size of the chance error The likely size of the chance error in a single measurement can be estimated by the SD of repeated measurements. Example: Homework Set 3, problem 5.
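
The SD of repeated measurements estimates how far off a single measurement is likely to be. This sketch uses five hypothetical weighings of the same object; they are not the homework data. It uses `statistics.pstdev`, which divides by n like the SD defined earlier in these slides.

```python
import statistics

# Hypothetical repeated weighings of the same object, in grams
weighings = [1000.2, 999.8, 1000.1, 999.9, 1000.0]

avg = statistics.mean(weighings)
error_size = statistics.pstdev(weighings)  # SD of the repeated measurements
print(f"average {avg:.2f} g; a single weighing is likely off by about {error_size:.2f} g")
```

The deviations are 0.2, -0.2, 0.1, -0.1 and 0. Their r.m.s. is about 0.14 g, so a single weighing is likely to be off by roughly that much.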

Probability The probability of something gives the percentage of times the thing is expected to happen when the basic process is repeated over and over again. Probabilities are between 0% and 100%: impossibility is represented by 0%, certainty by 100%. The probability of something equals 100% minus the probability of the opposite thing. For example, we draw a ticket at random from a box with tickets 1, 2, 3, 4, 5. Then the probability of drawing a number 4 or more is 2/5, so the probability of drawing a number 3 or less is 1 – 2/5 = 3/5.
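
The box example can be worked both exactly and by the frequency interpretation. This sketch counts equally likely tickets with Python's `fractions.Fraction` and applies the opposite-event rule. It then repeats the draw many times with a seeded `random.Random` to show the percentage settling near 40%.

```python
from fractions import Fraction
import random

box = [1, 2, 3, 4, 5]

# Exact probabilities: every ticket has the same chance, 1/5
p_4_or_more = Fraction(sum(1 for t in box if t >= 4), len(box))
p_3_or_less = 1 - p_4_or_more            # the opposite event
print(p_4_or_more, p_3_or_less)          # → 2/5 3/5

# Frequency view: repeat the draw many times and look at the percentage
rng = random.Random(1)
draws = [rng.choice(box) for _ in range(100_000)]
freq = sum(1 for t in draws if t >= 4) / len(draws)
print(f"{100 * freq:.1f}% of draws were 4 or more")  # close to 40%
```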

Formulas for probability The multiplication rule: P(A, B) = P(A|B) × P(B), where P(A, B) is the probability that A and B both happen. The conditional probability: P(A|B) = P(A, B) / P(B). Two events are independent if the chances for the second one stay the same no matter how the first one turns out: P(A|B) = P(A). Consequence of independence: P(A, B) = P(A) × P(B). For example, we draw twice at random with replacement from a box with tickets 1, 2, 3, 4, 5. Then the probability that the first draw is 4 or more and the second draw is 3 or less is 2/5 × 3/5 = 6/25 = 0.24.
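
The example can be checked by listing every outcome. This sketch enumerates all 25 equally likely ordered pairs from two draws with replacement, using `itertools.product`. It confirms that the multiplication rule and the conditional-probability formula agree with direct counting. `prob` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

box = [1, 2, 3, 4, 5]

# All equally likely ordered outcomes of two draws WITH replacement
outcomes = list(product(box, repeat=2))  # 5 x 5 = 25 pairs

def prob(event):
    """Hypothetical helper: fraction of outcomes where event(first, second) holds."""
    return Fraction(sum(1 for a, b in outcomes if event(a, b)), len(outcomes))

p_A = prob(lambda a, b: a >= 4)                   # first draw 4 or more
p_B = prob(lambda a, b: b <= 3)                   # second draw 3 or less
p_A_and_B = prob(lambda a, b: a >= 4 and b <= 3)
p_B_given_A = p_A_and_B / p_A                     # conditional probability

print(p_A_and_B, p_A * p_B, p_B_given_A)          # → 6/25 6/25 3/5
```

Because P(B | A) = 3/5 equals P(B), the two draws are independent. Multiplying the separate probabilities gives the same 6/25 as counting the pairs.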

Drawing tickets from a box When we draw tickets at random, every ticket in the box has the same chance to be picked. Draws made at random with replacement are independent. Draws without replacement are dependent (apart from trivial cases, such as a box in which every ticket shows the same number).
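
Dependence without replacement shows up clearly in an enumeration. This sketch reuses the box 1, 2, 3, 4, 5 and lists all 20 ordered pairs of distinct tickets with `itertools.permutations`. It compares the chance that the second draw is 3 or less with and without knowing the first draw. `prob` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import permutations

box = [1, 2, 3, 4, 5]

# Two draws WITHOUT replacement: ordered pairs of distinct tickets
outcomes = list(permutations(box, 2))  # 5 x 4 = 20 equally likely outcomes

def prob(event):
    """Hypothetical helper: fraction of outcomes where event(first, second) holds."""
    return Fraction(sum(1 for a, b in outcomes if event(a, b)), len(outcomes))

p_second_low = prob(lambda a, b: b <= 3)          # second draw 3 or less
p_first_high = prob(lambda a, b: a >= 4)          # first draw 4 or more
p_both = prob(lambda a, b: a >= 4 and b <= 3)
print(p_second_low, p_both / p_first_high)        # → 3/5 3/4
```

On its own, the second draw is 3 or less with probability 3/5. Given that a high ticket was already removed, the probability rises to 3/4, so the draws are dependent. Correspondingly, P(both) = 3/10 rather than 2/5 × 3/5 = 6/25.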

Good Luck!