Psyc 235: Introduction to Statistics DON’T FORGET TO SIGN IN FOR CREDIT!
Independent vs. Dependent Events Independent Events: unrelated events that intersect at chance levels given relative probabilities of each event Dependent Events: events that are related in some way So... how to tell if two events are independent or dependent? Look at the INTERSECTION: $P(A \cap B)$ if $P(A \cap B) = P(A) \cdot P(B)$ --> independent; if $P(A \cap B) \neq P(A) \cdot P(B)$ --> dependent
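The intersection check can be tried out on a small example. This is a minimal sketch, assuming two fair dice as the random experiment; the dice and the three events are illustrative choices, not part of the lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair dice (assumed example).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a true/false test on an outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    p_both = prob(lambda o: a(o) and b(o))
    return p_both == prob(a) * prob(b)

first_is_six = lambda o: o[0] == 6
second_is_even = lambda o: o[1] % 2 == 0
sum_is_twelve = lambda o: o[0] + o[1] == 12

print(independent(first_is_six, second_is_even))  # True
print(independent(first_is_six, sum_is_twelve))   # False
```

Exact fractions are used so the equality test is not thrown off by floating-point rounding.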
Random Variables Random Variable: variable that takes on a particular numerical value based on outcome of a random experiment Random Experiment (aka Random Phenomenon): trial that will result in one of several possible outcomes can’t predict outcome of any specific trial can predict pattern in the LONG RUN
Random Variables Example: Random Experiment: flip a coin 3 times Random Variable: # of heads
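The coin example can be listed out in full. The sketch below assumes a fair coin; it enumerates every outcome of the random experiment and applies the random variable (# of heads) to each one.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of flipping a fair coin 3 times.
outcomes = list(product("HT", repeat=3))

def num_heads(outcome):
    """The random variable X: maps an outcome to its number of heads."""
    return outcome.count("H")

# Count how many outcomes give each value of X, then convert to probabilities.
counts = Counter(num_heads(o) for o in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```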
Random Variables Discrete vs Continuous: countable # of possible outcomes (can be listed one by one) vs uncountably many (any value in an interval) Scales of Measurement Categorical/Nominal Ordinal Interval Ratio
Data World vs. Theory World Theory World: Idealization of reality (idealization of what you might expect from a simple experiment) Theoretical probability distribution POPULATION parameter: a number that describes the population. fixed but usually unknown Data World: data that results from an actual simple experiment Frequency distribution SAMPLE statistic: a number that describes the sample (ex: mean, standard deviation, sum,...)
So far... Graphing & summarizing sample distributions (DESCRIPTIVE) Counting Rules Probability Random Variables one more key concept is needed to start doing INFERENTIAL statistics: SAMPLING DISTRIBUTION
Binomial Situation Bernoulli Trial: a random experiment having exactly two possible outcomes, generically called “Success” and “Failure” probability of “Success” = p probability of “Failure” = q = (1-p) Examples: Coin toss (Heads / Tails): “Success”=Heads p=.5 Robot Factory (Good Robot / Bad Robot): “Success”=Good Robot p=.75
Binomial Situation Binomial Situation: n: # of Bernoulli trials trials are independent p (probability of “success”) remains constant across trials Binomial Random Variable: X = # of the n trials that are “successes”
Binomial Situation: collect data! Population : Outcomes of all possible coin tosses (for a fair coin) Success=Heads p=.5 Let’s do 10 tosses n=10 (sample size) Bernoulli Trial: one coin toss Binomial Random Variable: X=# of the 10 tosses that come up heads (aka Sample Statistic) Sample: X =....
Binomial Distribution p=.5, n=10 This is the SAMPLING DISTRIBUTION of X!
Sampling Distribution Sampling Distribution: Distribution of values that your sample statistic would take on, if you kept taking samples of the same size, from the same population, FOREVER (infinitely many times). Note: this is a THEORETICAL PROBABILITY DISTRIBUTION
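We can't take samples FOREVER, but a computer can take a lot of them. This is a rough sketch, assuming the fair-coin setup (n=10, p=.5); the function name, the number of samples, and the seed are arbitrary choices for illustration. The relative frequencies it produces are Data World numbers that only approximate the theoretical sampling distribution.

```python
import random
from collections import Counter

def simulate_sampling_distribution(n=10, p=0.5, num_samples=20000, seed=0):
    """Take many samples of n tosses and tabulate the statistic X = # of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(num_samples):
        heads = sum(1 for _ in range(n) if rng.random() < p)
        counts[heads] += 1
    # Relative frequency of each possible value of X, from 0 to n.
    return {x: counts[x] / num_samples for x in range(n + 1)}

dist = simulate_sampling_distribution()
for x, freq in dist.items():
    print(f"X = {x:2d}: {freq:.3f}")
```

Increasing `num_samples` should bring the frequencies closer to the theoretical probabilities.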
Binomial Situation: collect data! Population : Outcomes of all possible coin tosses (for a fair coin) Success=Heads p=.5 Let’s do 10 tosses n=10 (sample size) Bernoulli Trial: one coin toss Binomial Random Variable: X=# of the 10 tosses that come up heads (aka Sample Statistic) Sample: X = 3 Sampling Distribution
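Binomial Formula: $P(X = k) = \binom{n}{k} p^{k} q^{n-k}$ where X is the Binomial Random Variable, k is the specific # of successes you could get, $\binom{n}{k}$ is the combination called the Binomial Coefficient, p is the probability of success, q is the probability of failure, and n-k is the specific # of failures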
Binomial Formula Population : Outcomes of all possible coin tosses (for a fair coin) p=.5 n=10 Our sample: X=3 $p(X=3) = \binom{10}{3}(.5)^{3}(.5)^{7} \approx .117$ Remember this idea.... Hmm... what if we had gotten X=0?... pretty unlikely outcome... would we still believe it's a fair coin?
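The binomial formula, P(X = k) = (n choose k) * p^k * q^(n-k), is short enough to compute directly. A minimal sketch using Python's built-in `math.comb`; the function name `binomial_pmf` is just a label chosen here.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): coefficient * p^successes * q^failures."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Fair coin, 10 tosses: probability of exactly 3 heads, then of 0 heads.
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
print(binomial_pmf(0, 10, 0.5))  # 0.0009765625
```

So X=0 would happen in roughly 1 out of every 1000 samples of 10 tosses from a fair coin.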
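More on the Binomial Distribution X ~ B(n,p) these are the parameters for the sampling distribution of X Ex: # heads in 5 tosses of a coin: X~B(5,1/2) Expectation: $E(X) = np$ Variance: $Var(X) = npq$ Std. Dev.: $SD(X) = \sqrt{npq}$ Ex: # heads in 5 tosses of a coin: $E(X) = 2.5$, $Var(X) = 1.25$, $SD(X) \approx 1.12$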
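The standard summary formulas for a binomial random variable (expectation np, variance npq, standard deviation sqrt(npq)) can be checked on the 5-toss example. A small sketch; `binomial_summary` is a name made up for this illustration.

```python
from math import sqrt

def binomial_summary(n, p):
    """Return (expectation, variance, standard deviation) for X ~ B(n, p)."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, sqrt(variance)

# Number of heads in 5 tosses of a fair coin: X ~ B(5, 1/2)
mean, variance, sd = binomial_summary(5, 0.5)
print(mean, variance, round(sd, 3))  # 2.5 1.25 1.118
```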
Let’s see some more Binomial Distributions What happens if we try doing a different # of trials (n) ? That is, try a different sample size...
Whoa. Anyone else notice those DISCRETE distributions starting to look smoother as sample size (n) increased? Let’s look at a few more binomial distributions, this time with a different probability of success...
Binomial Robot Factory 2 possible outcomes: Good Robot 90% Bad Robot 10% You’d like to know about how many BAD robots you’re likely to get before placing an order... so here “success” = Bad Robot: p =.10 n = 5, 10, 20, 50, 100
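For the robot factory, the whole distribution can be computed for each order size. A sketch assuming the values on the slide (p = .10 for a bad robot); the helper name and the choice of what to print are illustrative.

```python
from math import comb

def binomial_distribution(n, p):
    """List of P(X = k) for k = 0..n, where X ~ B(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

p_bad = 0.10  # "success" here means getting a BAD robot
for n in (5, 10, 20, 50, 100):
    dist = binomial_distribution(n, p_bad)
    # The single most probable number of bad robots in an order of size n.
    most_likely = max(range(n + 1), key=lambda k: dist[k])
    print(f"n={n}: expected bad = {n * p_bad:.1f}, "
          f"most likely = {most_likely}, P(no bad) = {dist[0]:.3f}")
```

With small n the distribution is piled up against 0 and lopsided; as n grows, the peak moves away from 0 and the shape looks more symmetric.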
Normal Approximation of the Binomial If n is large, then X ~ B(n,p) {Binomial Distribution} can be approximated by a NORMAL DISTRIBUTION with parameters: mean $\mu = np$ and standard deviation $\sigma = \sqrt{npq}$
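To see how good the approximation is, compare an exact binomial probability with the matching normal one. This sketch uses `statistics.NormalDist` (Python 3.8+) with mean np and standard deviation sqrt(npq). The +0.5 continuity correction is an added assumption that is common practice but not stated in the lecture, and the cutoffs are arbitrary.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) with a normal curve, mean np and sd sqrt(npq).

    The +0.5 continuity correction is an assumption added for this sketch.
    """
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

# Arbitrary cutoffs a little above the mean, for a small and a large n.
for n, k in [(10, 6), (100, 55)]:
    exact = binomial_cdf(k, n, 0.5)
    approx = normal_approx_cdf(k, n, 0.5)
    print(f"n={n}, P(X <= {k}): exact={exact:.4f}, normal approx={approx:.4f}")
```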
Normal Distributions (aka “Bell Curve”) Probability Distributions of a Continuous Random Variable (smooth curve!) Class of distributions, all with the same overall shape Any specific Normal Distribution is characterized by two parameters: mean: $\mu$ standard deviation: $\sigma$
different means different standard deviations
Standardizing “Standardizing” a distribution of values results in re-labeling & stretching/squishing the x-axis useful: gets rid of units, puts all distributions on same scale for comparison HOWTO: simply convert every value to a: Z SCORE: $z = \frac{x - \mu}{\sigma}$
Standardizing Z score: $z = \frac{x - \mu}{\sigma}$ Conceptual meaning: how many standard deviations from the mean a given score is (in a given distribution) Any distribution can be standardized Especially useful for Normal Distributions...
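Standardizing means subtracting the mean from each value and dividing by the standard deviation. A minimal sketch on made-up data; the scores are hypothetical, and the standard deviation is the population version (`statistics.pstdev`), which is an assumption about which formula applies.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def standardize(values):
    """Convert every value in a list to its z score."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    return [z_score(v, mu, sigma) for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, sd 2
print(standardize(scores))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After standardizing, the values have mean 0 and standard deviation 1, whatever units they started in.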
Standard Normal Distribution has mean: $\mu = 0$ has standard deviation: $\sigma = 1$ ANY Normal Distribution can be converted to the Standard Normal Distribution...
Standard Normal Distribution
Normal Distributions & Probability Probability = area under the curve intervals cumulative probability [draw on board] For the Standard Normal Distribution: These areas have already been calculated for us (by someone else)
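Those pre-calculated areas are also available in software. This sketch uses `statistics.NormalDist` (Python 3.8+) in place of a printed table; `area_between` is a helper name made up here.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # Standard Normal: mean 0, standard deviation 1

def area_between(a, b):
    """P(a < Z < b): area under the standard normal curve over an interval."""
    return Z.cdf(b) - Z.cdf(a)

# Cumulative probability: area to the left of z = 1.96
print(round(Z.cdf(1.96), 3))          # 0.975
# Interval: area within 1 standard deviation of the mean
print(round(area_between(-1, 1), 3))  # 0.683
```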
Standard Normal Distribution So, if this were a Sampling Distribution,...
Next Time More different types of distributions: Binomial, Normal, t, Chi-square, F And then... how will we use these to do inference? Remember: biggest new idea today was: SAMPLING DISTRIBUTION