26134 Business Statistics Autumn 2017 Tutorial 5: Continuous Probability Distributions bstats@uts.edu.au B MathFin (Hons) M Stat (UNSW) PhD (UTS) mahritaharahap.wordpress.com/teaching-areas business.uts.edu.au
Last week revision
What is a random variable? A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon. There are two types of random variables, discrete and continuous.
Binomial Distribution
$f(x) = P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x}$
where: f(x) = probability of x successes in n independent trials; p = probability of success; q = 1 − p = probability of failure; x = number of successful trials; n − x = number of failed trials
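As a rough check on the binomial formula above, here is a minimal Python sketch. The function name `binomial_pmf` and the example values (10 trials, success probability 0.3) are illustrative assumptions, not part of the tutorial.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p), i.e. nCx * p^x * q^(n-x)."""
    q = 1 - p  # probability of failure
    return comb(n, x) * p**x * q ** (n - x)


# Hypothetical example: exactly 3 successes in 10 independent trials, p = 0.3.
prob_three = binomial_pmf(3, n=10, p=0.3)
print(round(prob_three, 4))
```

Summing `binomial_pmf(x, n, p)` over x = 0, ..., n should give 1, which is a quick sanity check on any hand calculation.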
Poisson Distribution
$f(x) = P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
where: f(x) = probability of x occurrences in a specified interval; λ = mean number of events in a specified interval
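The Poisson formula can be evaluated the same way. This is a sketch only; `poisson_pmf` and the rate λ = 2 are assumed for illustration.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam): lam^x * e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)


# Hypothetical example: on average 2 events per interval.
p_none = poisson_pmf(0, lam=2.0)  # probability of no events in the interval
p_two = poisson_pmf(2, lam=2.0)   # probability of exactly two events
print(round(p_none, 4), round(p_two, 4))
```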
Hypergeometric Distribution
$f(x) = P(X = x) = \frac{{}^{r}C_{x}\; {}^{N-r}C_{n-x}}{{}^{N}C_{n}}$
where: f(x) = probability of x successes in n dependent trials; n = number of trials; N = number of elements in the population; r = number of elements in the population labelled success; x = number of elements in the sample labelled success
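A short sketch of the hypergeometric formula, using the same symbols (N, r, n, x). The function name and the example population (20 items, 5 labelled success, sample of 4) are made up for illustration.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, r: int, n: int) -> float:
    """P(X = x): rCx * (N-r)C(n-x) / NCn, sampling without replacement."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)


# Hypothetical example: population of 20 with 5 successes, sample of 4,
# probability that exactly 1 sampled element is a success.
p_one = hypergeom_pmf(1, N=20, r=5, n=4)
print(round(p_one, 4))
```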
Cumulative Probability
Discrete Distributions: P(X ≤ x) ≠ P(X < x) because P(X = x) ≠ 0 when x is a possible value of X, e.g. P(X ≤ 2) ≠ P(X < 2)
Continuous Distributions: P(X ≤ x) = P(X < x) because P(X = x) = 0, e.g. P(X ≤ 2) = P(X < 2)
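The difference between the two cases can be seen numerically. The sketch below assumes a Binomial(5, 0.5) variable for the discrete case and a standard normal for the continuous case; both choices are illustrative only.

```python
from math import comb
from statistics import NormalDist


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p), summing the individual probabilities."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


# Discrete: P(X <= 2) and P(X < 2) = P(X <= 1) differ by exactly P(X = 2).
p_le_2 = binom_cdf(2, n=5, p=0.5)
p_lt_2 = binom_cdf(1, n=5, p=0.5)
print(p_le_2, p_lt_2, p_le_2 - p_lt_2)

# Continuous: a single point carries no probability, so one cdf value
# serves as both P(Z <= 2) and P(Z < 2).
p_z = NormalDist().cdf(2)
print(round(p_z, 4))
```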
Continuous Distributions
Activity 1: Uniform Distribution
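Activity 2: Normal Distribution
X = rate of return on a proposed investment, $X \sim N(\mu = 0.30,\ \sigma = 0.10)$
$P(X \le \$0.23) = P\left(Z \le \frac{x - \mu}{\sigma}\right) = P\left(Z \le \frac{0.23 - 0.30}{0.10}\right) = P(Z \le -0.70)$, where $Z \sim N(\mu = 0,\ \sigma = 1)$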
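The table lookup for this activity can be cross-checked in software. A minimal sketch using Python's `statistics.NormalDist`, with the variable names chosen here for illustration:

```python
from statistics import NormalDist

mu, sigma = 0.30, 0.10  # parameters of X from the activity
x = 0.23

# Standardise, then use the standard normal distribution Z ~ N(0, 1).
z = (x - mu) / sigma
p_via_z = NormalDist(0, 1).cdf(z)

# The same probability computed directly from X ~ N(mu, sigma).
p_direct = NormalDist(mu, sigma).cdf(x)

print(round(z, 2), round(p_via_z, 4), round(p_direct, 4))
```

Both routes should agree with the table entry for z = -0.70, which is roughly 0.242.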
To find probabilities under the normal distribution, the random variable X must be converted to a random variable Z that follows the standard normal distribution, denoted Z~N(μ=0,σ=1). We standardise the distribution in this way so that we can find the probabilities using the tables.
Why do we standardise the normal distribution to find probabilities? When we are given a problem about a random variable X~N(μ,σ) that follows a normal distribution, finding a cumulative probability P(X<x) directly would require evaluating a complex definite integral. To make this computation easier, the standard practice in statistics is to convert the normal random variable X~N(μ,σ) into a standard normal random variable Z~N(μ=0,σ=1) by computing the associated z-score, $Z = \frac{X - \mu}{\sigma}$, and then looking up the probability in the tables. @ Mahrita.Harahap@uts.edu.au
Calculating probabilities using the normal distribution by applying the complement rule and/or symmetry rule and/or interval rule
Complement Rule: P(Z > z) = 1 − P(Z < z)
Symmetry Rule: P(Z < −z) = P(Z > z)
Interval Rule: P(−z < Z < z) = P(Z < z) − P(Z < −z)
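The three rules can be verified numerically. The sketch below uses z = 1.96 purely as an illustrative value.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mean 0 and standard deviation 1
z = 1.96          # hypothetical z-score chosen for illustration

upper_tail = 1 - Z.cdf(z)         # complement rule: P(Z > z) = 1 - P(Z < z)
lower_tail = Z.cdf(-z)            # symmetry rule: P(Z < -z) equals P(Z > z)
interval = Z.cdf(z) - Z.cdf(-z)   # interval rule: P(-z < Z < z)

print(round(upper_tail, 4), round(lower_tail, 4), round(interval, 4))
```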
Activity 3: Poisson and Exponential Distributions
X = time between visits (in minutes)
λ = 1/μ = mean number of events per unit of time
= 2 clients to talk to on average per 30 minutes
= 1 client to talk to on average per 15 minutes
= 1/15 clients on average per minute ≈ 0.06667 clients per minute
Exponential: $P(X < x) = 1 - e^{-\lambda x}$, so $P(X < 10) = 1 - e^{-0.06667 \times 10}$
For the Poisson distribution we would instead use the mean count over the interval of interest: λ = 2 clients to talk to on average per 30 minutes = 2/3 of a client on average per 10 minutes
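The exponential calculation and its link to the Poisson rate can be sketched as follows. The function names are illustrative; the rates are the ones stated in the activity.

```python
from math import exp, factorial

lam_per_min = 2 / 30  # 2 clients per 30 minutes = 1/15 clients per minute


def exponential_cdf(x: float, lam: float) -> float:
    """P(X < x) = 1 - e^(-lam * x) for waiting time X ~ Exp(lam)."""
    return 1 - exp(-lam * x)


def poisson_pmf(k: int, lam: float) -> float:
    """P(N = k) for a Poisson count with mean lam."""
    return lam**k * exp(-lam) / factorial(k)


# Exponential view: probability the next visit occurs within 10 minutes.
p_within_10 = exponential_cdf(10, lam_per_min)

# Poisson view: mean count over 10 minutes is 2/3 of a client, and
# "a visit within 10 minutes" is the complement of "no visits in 10 minutes".
lam_10 = lam_per_min * 10
p_at_least_one = 1 - poisson_pmf(0, lam_10)

print(round(p_within_10, 4), round(p_at_least_one, 4))
```

The two views give the same number (about 0.487), which is why the rate must be rescaled to the interval being asked about.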
Revision
In statistics we usually want to analyse a population parameter, but collecting data for the whole population is usually impractical or expensive, and the data are often unavailable. That is why we collect samples from the population (sampling) and draw conclusions about the population parameters using the statistics of the sample (inference) at a stated level of confidence, which equals one minus the level of significance. Week 1
Types of data – Graphical displays
Categorical (divides the cases into groups/categories): Bar Chart, Pie Chart
  Nominal (no order) e.g. Nationality, Gender, Month
  Ordinal (order) e.g. Satisfaction Level or level of education
Quantitative (measures a numerical quantity for each case): Boxplot, Histogram
  Discrete (takes whole number values) e.g. number of birds in a tree, shoe size
  Continuous e.g. height, temp
Week 1
Presenting Data Graphically: Univariate
Categorical Data: Bar Charts to depict frequencies; Pie Charts to depict proportions
Quantitative Data: Histogram to look at the distribution; Boxplot for graphical summary statistics
Displaying categorical data
Bar Graph: Good for comparing the number of individuals in different groups. Good for depicting frequencies.
Pie Graph: Good for looking at parts as a whole. Good for depicting proportions.
Week 1
Displaying quantitative data
Histogram: Good for getting an overall ‘picture’ of the data.
Boxplot: Good for finding unusual observations and looking at the symmetry of the data.
The graphs on the next slide correspond to the recorded change in pulse rate of 92 students after half of the students ran on the spot for a minute and half of the students rested for a minute. Can you identify these two groups in the boxplot and histogram?
Week 1
Presenting Data Graphically: Bivariate
Categorical x Categorical: Multiple Bar Charts to compare frequencies; Crosstabs to depict frequencies, row and column proportions
Categorical x Quantitative: Multiple Comparison Boxplots
Quantitative x Quantitative: Scatterplots to show the relationship between two numerical variables
Types of data – Measures of central tendency
Categorical
  Nominal (no order) e.g. Nationality, Gender: Mode
  Ordinal (order) e.g. S,M,L or level of education: Median or Mode
Quantitative
  Discrete (takes whole number values) e.g. number of birds in a tree: Mean, Median or Mode
  Continuous e.g. height: Mean, Median or Mode
Week 1
Types of data – Measures of spread
Categorical
  Nominal (no order) e.g. Nationality, Gender: None
  Ordinal (order) e.g. S,M,L or level of education: IQR
Quantitative
  Discrete (takes whole number values) e.g. number of birds in a tree: Range, IQR, SD, CV
  Continuous e.g. height: Range, IQR, SD, CV
Week 1
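For quantitative data, the measures of central tendency and spread listed above can be computed with Python's `statistics` module. This is a sketch on made-up data (bird counts in eight trees). Quartile conventions differ between packages, so the IQR here may not match Excel or a textbook exactly.

```python
from statistics import mean, median, mode, quantiles, stdev

# Hypothetical sample: number of birds counted in each of eight trees.
birds = [1, 2, 2, 3, 4, 5, 7, 8]

centre = {"mean": mean(birds), "median": median(birds), "mode": mode(birds)}

# quantiles(..., n=4) returns the three quartile cut points (Q1, Q2, Q3).
q1, _, q3 = quantiles(birds, n=4)
spread = {
    "range": max(birds) - min(birds),
    "IQR": q3 - q1,
    "SD": stdev(birds),  # sample standard deviation (n - 1 divisor)
}

print(centre)
print(spread)
```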
Measures of dispersion
The coefficient of variation is used when you are comparing the variability between groups with different means. It is defined as the ratio of the standard deviation to the mean, expressed as a percentage. Dividing the standard deviation by the mean standardises the measure of variability so it is suitable for comparison.
$CV = \frac{s}{\bar{x}} \times 100$
Week 1
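A minimal sketch of the coefficient of variation, comparing two made-up groups whose means differ by a factor of ten. The function name and the data are assumptions for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(data: list) -> float:
    """CV = (sample standard deviation / sample mean) * 100, as a percentage."""
    return stdev(data) / mean(data) * 100


# Hypothetical groups: the second is the first multiplied by 10.
group_small = [2, 4, 6]
group_large = [20, 40, 60]

cv_small = coefficient_of_variation(group_small)
cv_large = coefficient_of_variation(group_large)
print(cv_small, cv_large)
```

The standard deviations differ (2 versus 20), yet the CV is the same for both groups, which is the sense in which dividing by the mean makes the variability comparable.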
@ Dr. Sonika Singh, BSTATS, UTS
Summary: Probability
Marginal Probability; Union Probability; Joint Probability; Conditional Probability; Independent Probability
Conditional Probability
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Independent Events
$P(A \cap B) = P(A) \times P(B)$
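The conditional probability and independence formulas above can be applied to a table of counts. The 2x2 counts below are invented for illustration, and the dictionary layout is just one convenient way to hold them.

```python
from math import isclose

# Hypothetical joint counts for events A and B over 100 cases.
counts = {
    ("A", "B"): 20,
    ("A", "not B"): 30,
    ("not A", "B"): 20,
    ("not A", "not B"): 30,
}
total = sum(counts.values())

p_a = (counts[("A", "B")] + counts[("A", "not B")]) / total  # marginal P(A)
p_b = (counts[("A", "B")] + counts[("not A", "B")]) / total  # marginal P(B)
p_a_and_b = counts[("A", "B")] / total                       # joint P(A and B)

p_a_given_b = p_a_and_b / p_b                # conditional P(A | B)
independent = isclose(p_a_and_b, p_a * p_b)  # independence check

print(p_a, p_b, p_a_and_b, p_a_given_b, independent)
```

In this made-up table P(A | B) equals P(A), so knowing B tells us nothing about A, and the product rule for independent events holds.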
Contingency Analysis – Test for Independence (Chi-Square test)
Critical Value = CHIINV(0.05, df)
Conclusion:
If χ² > critical value we reject H0. We conclude that at the 5% level of significance we have enough evidence to show that the two variables are not independent.
If χ² < critical value we do not reject H0. We conclude that at the 5% level of significance we do not have enough evidence to show that the two variables are not independent.
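The decision rule above can be sketched in Python for a 2x2 table. The observed counts are invented, and the critical value shortcut is an assumption that only works for df = 1, where a chi-square variable is the square of a standard normal. For larger tables a chi-square table or Excel's CHIINV would be needed.

```python
from statistics import NormalDist


def chi_square_statistic(table: list) -> float:
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 contingency table of observed counts.
observed = [[30, 20], [20, 30]]
chi2 = chi_square_statistic(observed)

alpha = 0.05
# df = (rows - 1) * (columns - 1) = 1, so the critical value is the squared
# two-sided normal quantile. This shortcut is valid for df = 1 only.
critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2

reject_h0 = chi2 > critical
print(round(chi2, 3), round(critical, 3), reject_h0)
```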
Discrete Distributions Binomial Distributions Week 1
Continuous Distributions Uniform X~Uniform(a,b) Normal X~Normal(μ,σ) Exponential X~Exp(λ) Week 1
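Of the three continuous distributions listed, the uniform has the simplest cumulative probability: for X~Uniform(a,b), P(X ≤ x) = (x − a)/(b − a) when a ≤ x ≤ b. A minimal sketch, with the function name and the interval (0, 10) chosen only for illustration:

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x) for X ~ Uniform(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Hypothetical example: X ~ Uniform(0, 10), probability that X lies between 2 and 5.
p_between = uniform_cdf(5, 0, 10) - uniform_cdf(2, 0, 10)
print(p_between)
```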
SEE YOU ALL NEXT WEEK!