Location: Chemistry 001 Time: Friday, Nov 20, 7pm to 9pm
o Make copies of:
  o Areas under the Normal Curve: Appendix B.1, page 784
  o Student’s t Distribution: Appendix B.2, page 785
  o Binomial Probability Distribution: Appendix B.9, pages 794, 798
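Constructing a Frequency Distribution
Decide on the number of classes to group the data into. Rule: choose the smallest k such that $2^k > n$, where k = number of classes and n = number of observations. Ex.: with 80 observations we should use 7 classes ($2^6 = 64 < 80$, $2^7 = 128 > 80$).
Determine the class interval or width. It should be the same for all classes. Rule: $i \ge (H - L)/k$, where i is the class interval, H is the highest observed value, and L is the lowest observed value.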
Count the number of items in each class. The number of items in each class is called the class frequency.
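The steps for building a frequency distribution can be sketched in Python. This is a minimal illustration only: the function names and the six data values are made up for the example, and the width formula (high - low) / k is an assumption of this sketch.

```python
def number_of_classes(n):
    """Smallest k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def class_frequencies(data, start, width, k):
    """Count observations in k classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * k
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < k:
            counts[i] += 1
    return counts


k = number_of_classes(80)          # 7, because 2**6 = 64 <= 80 < 2**7 = 128

# Hypothetical data grouped into 3 classes covering 0 to 30.
width = (30 - 0) / 3               # assumed rule: (high - low) / k
freq = class_frequencies([1, 2, 3, 11, 12, 25], start=0, width=width, k=3)
```

Here `freq` holds the class frequencies, one count per class.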
Graphic Presentation
o Histogram
o Frequency Polygon
o Cumulative Frequency Polygon
Histogram
Frequency Polygon
A frequency polygon consists of line segments connecting the points formed by the class midpoint and the class frequency.
Cumulative Frequency Distribution
A cumulative frequency distribution is used to determine how many, or what proportion, of the data values are below or above a certain value.
Cumulative Frequency Polygon (figure; vertical axis marked at 25%, 50%, 75%, and 100%)
Chapter 3
Measures of Location
o Arithmetic mean
o Weighted mean
o Median
o Mode
o If Median < Mean, the distribution is positively skewed.
o If Median > Mean, the distribution is negatively skewed.
Measures of Dispersion
Dispersion refers to the spread or variability in the data.
o Range
o Mean deviation
o Variance (population variance, sample variance)
o Standard deviation
Mean of Grouped Data (from a frequency distribution)
$\bar{X} = \frac{\sum f M}{n}$
where f is the frequency in each class, M is the midpoint of each class, and n is the total number of frequencies.
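A short sketch of the grouped mean, using f, M, and n as defined above. The three classes below are hypothetical numbers chosen for illustration, not data from the course.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum of f * M over all classes, divided by n."""
    n = sum(frequencies)
    return sum(f * m for f, m in zip(frequencies, midpoints)) / n


# Hypothetical classes with midpoints 10, 20, 30 and frequencies 2, 5, 3.
mean = grouped_mean([10, 20, 30], [2, 5, 3])   # (20 + 100 + 90) / 10 = 21.0
```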
Chapter 4
Percentiles
Location of a percentile: $L_P = (n + 1)\frac{P}{100}$, where n = number of observations and P = desired percentile.
o $Q_1 = L_{25}$, $Q_2$ = Median = $L_{50}$, $Q_3 = L_{75}$.
o 70th percentile = $L_{70}$.
Example: 43, 61, 75, 91, 101, 104
$L_{25} = (6 + 1)(25/100) = 1.75$, the 1.75th position, i.e. 0.75 of the distance between the 1st and 2nd observations: 43 + (61 - 43)(0.75) = 56.5.
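The percentile location rule and the interpolation step in the example can be sketched as follows. The function name and the handling of locations that fall outside the data are choices made for this sketch.

```python
def percentile(sorted_data, p):
    """Value at location (n + 1) * p / 100, interpolating between neighbours."""
    n = len(sorted_data)
    loc = (n + 1) * p / 100
    lower = int(loc)               # position of the observation just below loc
    frac = loc - lower             # fraction of the way to the next observation
    if lower < 1:                  # assumption: clamp to the smallest value
        return sorted_data[0]
    if lower >= n:                 # assumption: clamp to the largest value
        return sorted_data[-1]
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + frac * (above - below)


data = [43, 61, 75, 91, 101, 104]
q1 = percentile(data, 25)          # 43 + 0.75 * (61 - 43) = 56.5
q2 = percentile(data, 50)
q3 = percentile(data, 75)
```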
Box Plot
L = 13, H = 30, $Q_1 = 15$, $Q_2 = 18$, $Q_3 = 22$.
A value is an outlier if value > $Q_3 + 1.5(Q_3 - Q_1)$ or value < $Q_1 - 1.5(Q_3 - Q_1)$, where $Q_3 - Q_1$ is the interquartile range.
(Figure: box plot marking L, Q1, the median, Q3, and H.)
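The outlier rule can be checked with a few lines of Python, using the usual fences of 1.5 interquartile ranges below Q1 and above Q3. The function names are illustrative.

```python
def outlier_fences(q1, q3):
    """Lower and upper fences: 1.5 interquartile ranges beyond Q1 and Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    low, high = outlier_fences(q1, q3)
    return value < low or value > high


low, high = outlier_fences(15, 22)     # IQR = 7, so the fences are 4.5 and 32.5
```

With L = 13 and H = 30 both inside the fences, this box plot has no outliers.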
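Ex: Using the twelve stock prices, we find the mean (84.42), the standard deviation (7.18), and the median.
Coefficient of variation: $CV = \frac{s}{\bar{X}} \times 100\% = \frac{7.18}{84.42} \times 100\% = 8.5\%$
Coefficient of skewness: $sk = \frac{3(\bar{X} - \text{Median})}{s}$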
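A small sketch of both coefficients. The coefficient of variation uses the mean and standard deviation quoted above. The skewness call uses made-up numbers, because the median of the stock prices is not given here.

```python
def coefficient_of_variation(mean, sd):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100


def pearson_skewness(mean, median, sd):
    """Pearson's coefficient of skewness: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


cv = coefficient_of_variation(84.42, 7.18)          # about 8.5 percent

# Hypothetical values, only to show the sign convention (mean > median).
sk = pearson_skewness(mean=10, median=9, sd=2)      # 1.5, positively skewed
```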
Chapter 5
Classical Probability
Based on the assumption that the outcomes of an experiment are equally likely.
Probability of an event = Number of favorable outcomes / Total number of possible outcomes.
Example: We roll a die. What is the probability of the event “an even number appears face up”?
Possible outcomes: 1, 2, 3, 4, 5, 6 (6 outcomes). Favorable outcomes: 2, 4, 6 (3 outcomes).
Probability of an even number = 3/6 = 0.5
Empirical Probability
Based on relative frequency.
Probability of an event = Number of times the event occurred in the past / Total number of observations.
Example: What is the probability of a future space shuttle mission being successful, given that 2 of the last 113 missions ended in disaster?
Probability of a successful mission = Number of successful flights / Total number of flights.
P(A) = 111/113 = 0.98
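Both definitions reduce to a ratio of counts, as this sketch shows for the die and space shuttle examples. The variable names are illustrative.

```python
from fractions import Fraction

# Classical probability: equally likely outcomes of one die roll.
outcomes = [1, 2, 3, 4, 5, 6]
favorable = [x for x in outcomes if x % 2 == 0]
p_even = Fraction(len(favorable), len(outcomes))        # 3/6 = 1/2

# Empirical probability: relative frequency of past successes.
missions, disasters = 113, 2
p_success = Fraction(missions - disasters, missions)    # 111/113
p_success_rounded = round(float(p_success), 2)          # 0.98
```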
Conditional Probability
A conditional probability is the probability of a particular event occurring, given that another event has occurred. The probability of event A given that event B has occurred is written P(A|B).
Two events A and B are independent if the occurrence of one has no effect on the probability of the occurrence of the other: P(A|B) = P(A) or, equivalently, P(B|A) = P(B).
Rules for Computing Probabilities - Addition Rule
Special Rule of Addition - If two events A and B are mutually exclusive, the probability of one or the other event’s occurring equals the sum of their probabilities: P(A or B) = P(A) + P(B)
General Rule of Addition - If A and B are two events that are not mutually exclusive, then P(A or B) is given by: P(A or B) = P(A) + P(B) - P(A and B)
P(A and B) is the joint probability of A and B. (Figure: Venn diagram of A and B with the overlap “A and B”.)
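Both addition rules can be verified by counting outcomes of a single die roll. The events A, B, and C are hypothetical examples chosen for this sketch.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}        # even number
B = {4, 5, 6}        # greater than 3, overlaps with A
C = {1, 3}           # odd and below 4, mutually exclusive with A


def prob(event):
    """Classical probability of an event within the sample space."""
    return Fraction(len(event), len(sample_space))


# General rule: subtract the joint probability so the overlap is counted once.
general = prob(A) + prob(B) - prob(A & B)

# Special rule: no overlap, so the probabilities simply add.
special = prob(A) + prob(C)
```

Counting the union directly, `prob(A | B)` and `prob(A | C)`, gives the same two results.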
Contingency Table: Example: The Dean of the School of Business at Owens University collected the following information about undergraduate students in her college
Chapter 6
Constructing a PDF and a CDF for a Discrete Random Variable
Example: Toss a coin three times and let X be the number of heads. What are the PDF and CDF of X?

Outcome   Prob.   X
H H H     1/8     3
H H T     1/8     2
H T H     1/8     2
H T T     1/8     1
T H H     1/8     2
T H T     1/8     1
T T H     1/8     1
T T T     1/8     0

x   P(X = x)   F(x) = P(X ≤ x)
0   1/8        1/8
1   3/8        1/8 + 3/8 = 1/2
2   3/8        1/2 + 3/8 = 7/8
3   1/8        1
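The same two tables can be produced by enumerating the eight outcomes. This is a sketch with illustrative variable names.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# All 8 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each number of heads.
heads = Counter(o.count("H") for o in outcomes)
xs = sorted(heads)

pdf = {x: Fraction(heads[x], len(outcomes)) for x in xs}
cdf = dict(zip(xs, accumulate(pdf[x] for x in xs)))   # running sum of the pdf
```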
Expected Value (Mean)
Mathematically: the expected value (or mean) of a RV X is $\mu = E(X) = \sum_x x\,p(x)$. Sometimes written $\mu_X$.
Additivity: E(X + Y) = E(X) + E(Y)
Variance and Standard Deviation
A measure of the variability of a RV is its variance.
To compute the variance of a discrete RV X:
o Compute µ.
o For each possible x, compute $(x - \mu)^2 p(x)$.
o Add up these values. It helps to construct a table.
In a formula: $\sigma^2 = Var(X) = \sum_x (x - \mu)^2 p(x)$, OR $\sigma^2 = \sum_x x^2 p(x) - \mu^2$
Standard Deviation (SD): $\sigma = \sqrt{Var(X)}$
Variance and Standard Deviation
Consider the pdf of the random variable:

x      0     1     2     3
p(x)   1/8   3/8   3/8   1/8

µ = 3/2
$Var(X) = (0 - 3/2)^2(1/8) + (1 - 3/2)^2(3/8) + (2 - 3/2)^2(3/8) + (3 - 3/2)^2(1/8) = 3/4$
What is the CDF at 2? F(2) = P(X=0) + P(X=1) + P(X=2) = 7/8
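The mean, variance, and CDF value for this pdf can be reproduced with exact fractions. The shortcut form (the expected value of X squared, minus the squared mean) is included as a cross-check; variable names are illustrative.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = sum(x * p for x, p in pmf.items())                       # 3/2
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())     # 3/4
shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2  # same value
sd = float(variance) ** 0.5

f_at_2 = sum(p for x, p in pmf.items() if x <= 2)               # 7/8
```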
The Binomial Distribution
Let X be the number of “successes” in n independent “trials,” each with success probability p. Such an X is a Binomial R.V. with parameters n and p:
$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
where n is the number of trials, x is the number of observed successes (x = 0, …, n), and p is the probability of success on each trial. In the book, the probability p is denoted by π.
What are the mean and variance of a Binomial Random Variable? $\mu = np$ and $\sigma^2 = np(1 - p)$.
The Binomial Distribution
An important part of understanding probability/statistics is recognizing a “binomial situation.” Binomial examples:
o Number of defective products in a sample of items: n = number of items, p = probability of a product being defective.
o Number of students in this class who are in their senior year: n = number of students in this class, p = probability of a student being a senior.
o Number of no-shows for a flight: n = number of passengers, p = probability of a no-show.
o Number of times next week I’ll get stuck in traffic on my way to school: n = number of work days per week, p = probability I get stuck in traffic.
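A minimal sketch of the binomial probability function, using the standard formula with `math.comb`. It is applied to the three-coin-toss setting (n = 3, p = 0.5) so the results can be compared with the number-of-heads example; the function name is illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 3, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]   # 1/8, 3/8, 3/8, 1/8

# Mean and variance computed from the pmf; they match n*p and n*p*(1 - p).
mean = sum(x * q for x, q in enumerate(probs))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
```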
Chapter 7
Continuous Probability Distributions
For a discrete RV X, the pdf is given by p(x) = P(X = x) for all possible values of x.
For a continuous RV X, P(X = x) = 0 for all values of x. Example: If X is the amount of time you wait in line at Starbucks, then P(X = 30.567… seconds) = 0.
The pdf of a continuous RV is represented by a function p(x), defined for all values of x, where the total area under p(x) is 1.
The Uniform and Normal distributions are commonly used continuous distributions.
Uniform Distribution
o The simplest distribution for a continuous random variable.
o Rectangular in shape, with constant (uniform) height.
o Defined by minimum and maximum values a and b.
o Areas within the distribution represent probabilities.
Example: Time to fly on MEA from Beirut to Paris ranges from 4 hrs to 5 hrs. The random variable is flight time; it is continuous.
(Figure: a continuous uniform distribution, a rectangle of height 1/(b-a) over the interval from a to b on the x axis.)
Mean: $\mu = \frac{a + b}{2}$
SD: $\sigma = \sqrt{\frac{(b - a)^2}{12}}$
Height: $P(x) = \frac{1}{b - a}$ if a ≤ x ≤ b, 0 elsewhere.
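A sketch of the uniform distribution for the flight-time example (a = 4, b = 5), using the standard mean, standard deviation, and height formulas. The interval 4.25 to 4.5 hours is a hypothetical question added for illustration.

```python
from math import sqrt


def uniform_stats(a, b):
    """Mean, standard deviation, and height of a uniform distribution on [a, b]."""
    mean = (a + b) / 2
    sd = sqrt((b - a) ** 2 / 12)
    height = 1 / (b - a)
    return mean, sd, height


def uniform_prob(a, b, lo, hi):
    """P(lo <= X <= hi): area of the rectangle slice that lies inside [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)


mean, sd, height = uniform_stats(4, 5)      # 4.5 hours, about 0.29 hours, 1.0
p = uniform_prob(4, 5, 4.25, 4.5)           # hypothetical interval: 0.25
```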
The Standard Normal Distribution
The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. It is also called the z distribution.
A z-value is the signed distance between a selected value, designated X, and the mean µ, divided by the standard deviation σ. The formula is: $z = \frac{X - \mu}{\sigma}$
The Normal Distribution
We want to know the area under the curve between the mean, 283, and 285.4 grams, or P(283 < X < 285.4).
We convert the x values into z values:
z value for 283: z = (x-μ)/σ = (283 - 283)/1.6 = 0
z value for 285.4: z = (285.4 - 283)/1.6 = 1.5
P(283 < weight < 285.4) = P(0 < z < 1.5) = the area under the curve between 0.00 and 1.5 = 0.4332
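The table lookup can be cross-checked with Python's standard library. This sketch uses `statistics.NormalDist` with the mean of 283 and the standard deviation of 1.6 from the example.

```python
from statistics import NormalDist

mu, sigma = 283, 1.6

# Convert the x value to a z value, then take the area between 0 and z.
z = (285.4 - mu) / sigma
area_z = NormalDist().cdf(z) - NormalDist().cdf(0)

# Same area computed directly on the original scale.
weights = NormalDist(mu, sigma)
area_x = weights.cdf(285.4) - weights.cdf(283)
```

Both routes give the same area, which rounds to the four-decimal value read from a normal table.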
The Normal Distribution
What is the value of X for which 5% of the values will be larger than X? Equivalently, what is the value of X below which 95% will fall?
We obtain z from Appendix B.1: z = 1.65. We convert to the x value: x = σz + μ.
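Going from a probability back to an x value can be sketched with `NormalDist.inv_cdf`. The mean of 283 and standard deviation of 1.6 are an assumption carried over from the weight example; this slide does not state which distribution it refers to.

```python
from statistics import NormalDist

# z value with 95% of the area below it (the table value is rounded to 1.65).
z_exact = NormalDist().inv_cdf(0.95)

mu, sigma = 283, 1.6                 # assumed parameters, see the note above

x_cutoff = mu + z_exact * sigma      # x = sigma * z + mu with the exact z
x_table = mu + 1.65 * sigma          # same conversion with the table value
```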
Chapters 8 and 9
Sampling Error
Samples are used to estimate population characteristics. It is unlikely that a sample mean (standard deviation) equals the population mean (standard deviation). What error is made in estimating the population mean based on the sample?
Definition: The difference between a sample statistic and its corresponding population parameter.
Example: Output of each employee: 97, 103, 96, 99, 105 units (population mean = 100). Select samples of two and find their means.
Sample 1: {97, 105} with mean = 101. Sampling error = 101 - 100 = 1
Sample 2: {103, 96} with mean = 99.5. Sampling error = 99.5 - 100 = -0.5
Sampling errors are random and occur by chance. To make accurate predictions based on sample results, we need to first develop sampling distributions of the sample means.
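The sampling errors for every possible sample of two employees can be listed with a short script. This is a sketch with illustrative variable names.

```python
from itertools import combinations
from statistics import mean

population = [97, 103, 96, 99, 105]
mu = mean(population)                       # population mean

# Sampling error = sample mean minus population mean, for each sample of two.
errors = {s: mean(s) - mu for s in combinations(population, 2)}

error_sample_1 = errors[(97, 105)]
error_sample_2 = errors[(103, 96)]
```

Across all ten possible samples the errors sum to zero, so the sample means average out to the population mean.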
Sampling: Distribution of the Sample Mean (Sigma Known)
o If a population follows the normal distribution, the sampling distribution of the sample mean will also follow the normal distribution.
o To determine the probability a sample mean falls within a particular region, use: $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
Note that $\sigma / \sqrt{n}$ is called the Standard Error of the Mean.
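A sketch of the standard error of the mean and the z value of a sample mean when sigma is known. The numbers (µ = 100, σ = 10, n = 25, sample mean 104) are hypothetical and chosen only to keep the arithmetic simple.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / sqrt(n)


def z_for_sample_mean(xbar, mu, sigma, n):
    """z value of a sample mean when the population sigma is known."""
    return (xbar - mu) / standard_error(sigma, n)


se = standard_error(10, 25)                   # 10 / 5 = 2.0
z = z_for_sample_mean(104, 100, 10, 25)       # (104 - 100) / 2 = 2.0
p_above = 1 - NormalDist().cdf(z)             # P(sample mean > 104)
```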
Sampling: Distribution of the Sample Mean (Sigma Unknown)
o If the population does not follow the normal distribution, but the sample is of at least 30 observations, the sample means will follow the normal distribution.
o To determine the probability a sample mean falls within a particular region, use:
Point Estimate
Definition: The statistic computed from sample information and used to estimate the population parameter.
Examples:
o The sample mean, $\bar{X}$, is a point estimate of the population mean, µ.
o The sample standard deviation, s, is a point estimate of the population standard deviation, σ.
o The sample proportion, p, is a point estimate of the population proportion, π.
Confidence Interval
Confidence interval equations
When to Use the t Distribution
Is the population normal?
o No: Is n 30 or more?
  o No: Use a nonparametric test, or assume normal and go through the flow chart again.
  o Yes: Use the z distribution.
o Yes: Is the population SD known?
  o No: Use t if n is less than or equal to 30; use z if n is more than 30.
  o Yes: Use the z distribution.
(Figure: flow chart; its outcomes are labelled with the confidence interval equations Eq1, Eq2, and Eq3.)
Sample Size for Estimating Population Mean
$n = \left(\frac{z s}{E}\right)^2$
where n is the sample size; z is the standard normal value corresponding to the desired level of confidence; s is an estimate of the population SD; E is the maximum allowable error (half the length of the CI). If the result is not a whole number, round up.
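A sketch of the sample size calculation with the round-up step, using the standard formula n = (z s / E) squared with z, s, and E as defined above. The inputs (z = 1.96 for 95% confidence, s = 20, E = 5) are hypothetical.

```python
from math import ceil


def sample_size_for_mean(z, s, E):
    """Sample size (z * s / E) squared, rounded up to a whole number."""
    return ceil((z * s / E) ** 2)


n = sample_size_for_mean(z=1.96, s=20, E=5)    # 61.47 rounds up to 62
```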
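Confidence Interval for a Population Proportion
Standard error of the sample proportion: $\sigma_p = \sqrt{\frac{p(1 - p)}{n}}$
Confidence interval for a population proportion: $p \pm z \sqrt{\frac{p(1 - p)}{n}}$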
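Sample Size for the Population Proportion
Three items need to be specified:
1. The desired level of confidence.
2. The margin of error in the population proportion.
3. An estimate of the population proportion.
$n = p(1 - p)\left(\frac{z}{E}\right)^2$, where z is the standard normal value for the desired level of confidence and E is the margin of error.
If an estimate of π is not available, use p = 0.5 to approximately estimate the sample size.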
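The three inputs map directly onto a small function. This sketch uses the standard formula n = p(1 - p)(z / E) squared and defaults to p = 0.5 when no estimate is available; the 95% confidence level and 5% margin of error are hypothetical inputs.

```python
from math import ceil


def sample_size_for_proportion(z, E, p=0.5):
    """Sample size p * (1 - p) * (z / E) squared, rounded up."""
    return ceil(p * (1 - p) * (z / E) ** 2)


n = sample_size_for_proportion(z=1.96, E=0.05)                 # 385
n_with_estimate = sample_size_for_proportion(1.96, 0.05, p=0.3)
```

Using p = 0.5 gives the largest sample size for a given z and E, which is why it serves as the fallback.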
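Finite-Population Correction Factor
If the population size N is not very large, then we use a population correction factor when computing the CI.
If n/N > 0.05, multiply the standard error by $\sqrt{\frac{N - n}{N - 1}}$: $\bar{X} \pm z \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$, OR $\bar{X} \pm t \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$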
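A sketch of the correction, assuming the standard factor, the square root of (N - n)/(N - 1), applied to the standard error when n/N exceeds 0.05. The function names and the sample numbers are hypothetical.

```python
from math import sqrt


def fpc(n, N, threshold=0.05):
    """Finite-population correction factor; 1.0 when n/N is at most the threshold."""
    if n / N > threshold:
        return sqrt((N - n) / (N - 1))
    return 1.0


def standard_error_of_mean(s, n, N):
    """Standard error of the mean, corrected when the sample is a large share of N."""
    return s / sqrt(n) * fpc(n, N)


factor = fpc(50, 500)                          # n/N = 0.10, correction applies
se = standard_error_of_mean(10, 25, 100)       # smaller than the uncorrected 2.0
```

The factor is always below 1 when it applies, so the corrected confidence interval is narrower.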
Material NOT Included in the Midterm
o Chapter 2:
  o Chebyshev’s Theorem
  o Geometric Mean
o Chapter 4:
  o Software Coefficient of Skewness
  o Stem-and-Leaf Displays
o Chapter 5:
  o Permutation Equation
o Chapter 6:
  o Hypergeometric Probability Distribution
  o Poisson Probability Distribution
  o Covariance
o Chapter 7:
  o The Normal Approximation to the Binomial