Statistics -S1
Chapter 1 Mathematical models in probability and statistics
Mathematical models A simplification of a real-world situation. Adv.: quick and easy to produce, can simplify a more complex situation, enable predictions to be made and can help to provide control. Disadv.: give only a partial description of the real situation and work only for a certain range of values
Chapter 2 Representation and summary of data - location
Quantitative variable A variable associated with a numerical value (e.g. height)
Qualitative variable A variable that does not have a numerical value (e.g. hair colour)
Continuous variable A variable that can take any value in a given range
Discrete variable A variable that can take only specific values within a given range (e.g. whole numbers)
Mode/modal class The value or class that appears most often. E.g. 1, 2, 2, 2, 3, 3, 4, 5: mode = 2
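Mean $\bar{x} = \frac{\sum x}{n}$ Where… $n$ = no. of observations, $\sum x$ = sum of observations, $\bar{x}$ = mean of the sample
Mean of a combined set of data $\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$ Where… $\bar{x}$ = mean of the combined data, $n_1, n_2$ = sizes of the individual samples, $\bar{x}_1, \bar{x}_2$ = means of the individual samples
Frequency distribution table - mean $\bar{x} = \frac{\sum fx}{\sum f}$ Where… $\sum fx$ = sum of each frequency multiplied by its value or class midpoint, $\sum f$ = sum of the frequencies, $\bar{x}$ = mean of the sample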
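The three mean formulas above can be checked with a short Python sketch. The function names and the combined-mean figures are illustrative assumptions, not part of the notes; the raw data is the list used in the mode example.

```python
def mean(values):
    """Mean of raw observations: sum of x divided by n."""
    return sum(values) / len(values)


def combined_mean(n1, mean1, n2, mean2):
    """Mean of two samples pooled together, weighted by sample size."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def frequency_mean(values, freqs):
    """Mean from a frequency table: sum of fx divided by sum of f."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / sum(freqs)


raw = [1, 2, 2, 2, 3, 3, 4, 5]
raw_mean = mean(raw)                                            # 22 / 8 = 2.75
table_mean = frequency_mean([1, 2, 3, 4, 5], [1, 3, 2, 1, 1])   # same data as a table
joint_mean = combined_mean(3, 2.0, 5, 4.0)                      # (6 + 20) / 8 = 3.25
print(raw_mean, table_mean, joint_mean)
```

The raw list and the frequency table describe the same data, so both routes give the same mean.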
Median The middle value of ordered data. To find the position where the median lies, calculate $\frac{n}{2}$, where $n$ = number of observations. If the position isn't an integer, round up
Interpolation
Length of pine cone (mm)    No. of pine cones, f    Cumulative frequency
30-31                       2                       2
32-33                       25                      27
34-36                       30                      57
37-39                       13                      70
Median position = 70/2 = 35th value, which lies in the 34-36 class (class boundaries 33.5 and 36.5, cumulative frequencies 27 and 57)
$\frac{Q_2 - 33.5}{36.5 - 33.5} = \frac{35 - 27}{57 - 27}$
Make $Q_2$ the subject: $Q_2 = 34.3$
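The interpolation step can be sketched in Python as follows. The function name and argument names are illustrative assumptions; the numbers are those of the pine cone example.

```python
def interpolate(lower_boundary, upper_boundary, cf_below, cf_upto, position):
    """Linear interpolation inside one class of a grouped frequency table.

    cf_below is the cumulative frequency at the lower class boundary,
    cf_upto the cumulative frequency at the upper class boundary.
    """
    fraction = (position - cf_below) / (cf_upto - cf_below)
    return lower_boundary + fraction * (upper_boundary - lower_boundary)


# Median of the pine cone data: the 35th value lies in the 34-36 class.
q2 = interpolate(33.5, 36.5, 27, 57, 70 / 2)
print(round(q2, 1))  # 34.3
```

The same function works for quartiles and percentiles by changing the position and the class.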
Coding Used to make large values easier to work with. General form: $y = \frac{x - a}{b}$. No effect on the product moment correlation coefficient. The coded regression line may not be the same as the actual line
Chapter 3 Representation and summary of data - dispersion
Range Highest value - lowest value
Lower quartile, Q1 Position = $\frac{n}{4}$, where $n$ = sample size. If $\frac{n}{4}$ isn't an integer, round up to find the corresponding position
Upper quartile, Q3 Position = $\frac{3n}{4}$, where $n$ = sample size. If $\frac{3n}{4}$ isn't an integer, round up to find the corresponding position
Interquartile range Q3 - Q1 Where… Q1 = lower quartile Q3 = upper quartile
Percentiles Split the data into 100 parts. Position of the $x$th percentile = $\frac{xn}{100}$, where $n$ = sample size
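Variance Represents the spread of a set of data. $\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$ or, for a frequency table, $\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$ Remember: “Mean of the squares minus square of the mean”
Standard deviation, σ $\sigma = \sqrt{\text{variance}}$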
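A minimal Python sketch of "mean of the squares minus square of the mean". The data set is made up for illustration and the function name is an assumption.

```python
import math


def variance(values):
    """Variance as mean of the squares minus square of the mean."""
    n = len(values)
    mean_of_squares = sum(x * x for x in values) / n
    square_of_mean = (sum(values) / n) ** 2
    return mean_of_squares - square_of_mean


data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data: sum = 40, sum of squares = 232
var = variance(data)              # 232/8 - (40/8)^2 = 29 - 25 = 4.0
sd = math.sqrt(var)               # 2.0
print(var, sd)
```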
Chapter 4 Representation of data
Stem and leaf diagrams
Back-to-back stem and leaf diagrams Used to compare two sets of data
Outliers Extreme values within the data. Plot outliers on box plots with a cross (x). A value is an outlier if it is greater than Q3 + (1.5 x interquartile range) or less than Q1 - (1.5 x interquartile range)
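The outlier rule can be sketched in Python like this. The quartiles and data values are made up for illustration and the function names are assumptions.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Values lying outside the two limits."""
    lower, upper = outlier_limits(q1, q3)
    return [x for x in values if x < lower or x > upper]


limits = outlier_limits(20, 30)                        # IQR = 10, so limits are 5 and 45
outliers = find_outliers([3, 18, 25, 31, 50], 20, 30)
print(limits, outliers)  # (5.0, 45.0) [3, 50]
```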
Box plots Show the lowest value, lower quartile, median, upper quartile and highest value
Skewness $\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$ Positive number: positive skew. Negative number: negative skew. Close to 0: symmetrical
Positive skew Mode < median < mean Q2-Q1 < Q3-Q2
Negative skew Mode > median > mean Q2-Q1 > Q3-Q2
Symmetrical Mode = median = mean Q2-Q1 = Q3-Q2
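Both skewness checks (the coefficient and the quartile comparison) can be sketched in Python. The input figures are made up for illustration and the function names are assumptions.

```python
def skew_coefficient(mean, median, sd):
    """3(mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


def skew_from_quartiles(q1, q2, q3):
    """Compare Q2 - Q1 with Q3 - Q2 to describe the skew."""
    lower_gap = q2 - q1
    upper_gap = q3 - q2
    if lower_gap < upper_gap:
        return "positive skew"
    if lower_gap > upper_gap:
        return "negative skew"
    return "symmetrical"


coefficient = skew_coefficient(5.0, 4.5, 2.0)   # 3 * 0.5 / 2 = 0.75, positive skew
shape = skew_from_quartiles(10, 14, 22)         # gaps 4 and 8
print(coefficient, shape)  # 0.75 positive skew
```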
Histograms Show the distribution of continuous data. No gaps between bars. Area of bar ∝ frequency
Frequency density Frequency density = $\frac{\text{frequency}}{\text{class width}}$
Area = k x frequency
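A short Python sketch of frequency density, using the pine cone classes from the interpolation example. The class boundaries (29.5, 31.5, ...) are an assumption that lengths were recorded to the nearest mm; only 33.5 and 36.5 appear in the notes.

```python
# (lower boundary, upper boundary, frequency) for each class
classes = [(29.5, 31.5, 2), (31.5, 33.5, 25), (33.5, 36.5, 30), (36.5, 39.5, 13)]


def frequency_density(lower, upper, freq):
    """Bar height = frequency / class width."""
    return freq / (upper - lower)


densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]
# Bar area = height x width recovers the frequency (k = 1 here).
areas = [d * (hi - lo) for d, (lo, hi, _) in zip(densities, classes)]
print(densities)
```

The widest classes do not automatically get the tallest bars: the 34-36 class has the highest frequency but a lower density than 32-33.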
Chapter 5 Probability
Venn diagrams Whole rectangle represents sample space. Total probability = 1 Closed curves represent the outcomes for each event
P(A)
P(A’)
P(B)
P(B’)
$P(A \cap B)$
$P(A \cup B)$
$P(A' \cap B') = P((A \cup B)')$
$P(A' \cup B') = P((A \cap B)')$
$P(A' \cap B)$
$P(A \cap B')$
P(event A or event B or both) $P(A \cup B)$
Complementary probability $P(A') = 1 - P(A)$
Addition rule $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Conditional probability P(A given B) = $P(A|B) = \frac{P(A \cap B)}{P(B)}$
Multiplication rule $P(A \cap B) = P(A|B) \times P(B)$ and $P(A \cap B) = P(B|A) \times P(A)$
Independent $P(A \cap B) = P(A) \times P(B)$, $P(A|B) = P(A)$, $P(B|A) = P(B)$
Mutually exclusive $P(A \cap B) = 0$
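The probability rules above can be checked on a small sample space. This Python sketch assumes a fair six-sided die with A = "even" and B = "greater than 3"; the events and names are illustrative, not from the notes.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # fair die (assumed example)
A = {2, 4, 6}                       # event A: even number
B = {4, 5, 6}                       # event B: greater than 3


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


p_a_and_b = prob(A & B)                      # intersection {4, 6} -> 1/3
p_a_or_b = prob(A | B)                       # union {2, 4, 5, 6} -> 2/3
p_a_given_b = p_a_and_b / prob(B)            # conditional probability -> 2/3
independent = p_a_and_b == prob(A) * prob(B)  # 1/3 vs 1/4 -> False
mutually_exclusive = p_a_and_b == 0           # False, the events overlap

# Addition rule and complement of the union
assert p_a_or_b == prob(A) + prob(B) - p_a_and_b
assert prob((sample_space - A) & (sample_space - B)) == 1 - p_a_or_b
print(p_a_and_b, p_a_or_b, p_a_given_b, independent, mutually_exclusive)
```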
Chapter 6 Correlation
Positive correlation Most points lie in the 1st and 3rd quadrants. Product moment correlation coefficient is close to 1
Negative correlation Most points lie in the 2nd and 4th quadrants. Product moment correlation coefficient is close to -1
No correlation Points lie in all four quadrants. Product moment correlation coefficient is close to 0
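Product moment correlation coefficient, r A measure of linear relationship. $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$ Where… $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$, $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$, $S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$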
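A minimal Python sketch of the product moment correlation coefficient built from the summary statistics. The data set and function names are illustrative assumptions. The last lines also check that coding x (here with a = 1, b = 2) leaves r unchanged.

```python
import math


def s(us, vs):
    """S_uv = sum(uv) - sum(u) * sum(v) / n; gives S_xx when us is vs."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


def pmcc(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy)."""
    return s(xs, ys) / math.sqrt(s(xs, xs) * s(ys, ys))


xs = [1, 2, 3, 4, 5]   # illustrative data
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)       # S_xy = 6, S_xx = 10, S_yy = 6, so r = 6 / sqrt(60)

coded_xs = [(x - 1) / 2 for x in xs]   # coding of the form (x - a) / b
r_coded = pmcc(coded_xs, ys)
print(round(r, 4), round(r_coded, 4))
```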
Chapter 7 Regression
Independent (explanatory) variable The variable that is set independently of the other variable. Plotted on the x-axis
Dependent (response) variable The variable whose values are determined by the values of the independent variable Plotted on the y-axis
Equation of regression line $y = a + bx$ $b$ = gradient: for every increase of 1 unit in $x$, $y$ changes by $b$. $a$ = y-intercept: when $x$ is zero, $y$ is equal to $a$. Where… $b = \frac{S_{xy}}{S_{xx}}$, $a = \bar{y} - b\bar{x}$
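The regression coefficients can be computed with a short Python sketch. The data set is made up for illustration and the function names are assumptions.

```python
def s(us, vs):
    """S_uv = sum(uv) - sum(u) * sum(v) / n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


def regression_line(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    b = s(xs, ys) / s(xs, xs)
    a = sum(ys) / len(ys) - b * sum(xs) / len(xs)
    return a, b


xs = [1, 2, 3, 4, 5]   # illustrative data
ys = [2, 4, 5, 4, 5]
a, b = regression_line(xs, ys)   # b = 6/10 = 0.6, a = 4 - 0.6 * 3 = 2.2
estimate = a + b * 3.5           # interpolation: 3.5 is inside the range of x
print(round(a, 2), round(b, 2), round(estimate, 2))
```

Using the line at x = 3.5 is interpolation; using it at, say, x = 20 would be extrapolation and unreliable.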
When to use the regression line When the points form/almost form a straight line
Coding in regression lines To turn the coded regression line into the actual regression line, substitute the coding formulae for the coded variables and rearrange
Interpolation When a value of the dependent variable is estimated within the range of the data
Extrapolation When a value is estimated outside of the range of the data Unreliable
Chapter 8 Discrete random variables
Variable Represented by X, Y, A, B etc. Can take any of a specified set of values
Random variable A variable whose value is the outcome of an experiment, e.g. rolling a die. Discrete: outcomes lie only on a discrete scale. Continuous: the outcome can be any value on a continuous scale
Sample space The list of all possible outcomes of an experiment E.g. Spinning a four-sided and a three sided spinner at the same time:
Probability distribution A table showing the probability of each outcome in an experiment
x         1     2     3     4     5     6
P(X=x)    1/6   1/6   1/6   1/6   1/6   1/6
Remember: All of the probabilities add up to one for discrete random variables
Cumulative distribution function, F(x) Shows the running totals of the probabilities
x         1     2     3     4     5     6
P(X=x)    1/6   1/6   1/6   1/6   1/6   1/6
F(x)      1/6   2/6   3/6   4/6   5/6   6/6
Expected value, E(X) The total of the x values multiplied by the corresponding probabilities, $\sum x P(X=x)$ E.g. for a fair die, where P(X=x) = 1/6 for x = 1, 2, ..., 6: (1 x 1/6) + (2 x 1/6) + (3 x 1/6) + (4 x 1/6) + (5 x 1/6) + (6 x 1/6) = 3.5
$E(X^2)$ Square the x values, multiply by their corresponding probabilities, then total. E.g. for the same fair die: $(1^2 \times 1/6) + (2^2 \times 1/6) + (3^2 \times 1/6) + (4^2 \times 1/6) + (5^2 \times 1/6) + (6^2 \times 1/6) = 91/6$
Variance of a random variable $Var(X) = E(X^2) - (E(X))^2$
E(aX+b) E(aX+b) = aE(X) + b
Var(aX+b) $Var(aX+b) = a^2 Var(X)$
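The die example can be reproduced exactly with Python's `fractions` module. This is a sketch under the assumption of a fair die; the helper names and the choice a = 2, b = 3 are illustrative.

```python
from fractions import Fraction

# Probability distribution of a fair die: P(X = x) = 1/6 for x = 1..6
dist = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(dist, g=lambda x: x):
    """E(g(X)) = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in dist.items())


e_x = expectation(dist)                     # 7/2
e_x2 = expectation(dist, lambda x: x * x)   # 91/6
var_x = e_x2 - e_x ** 2                     # 91/6 - 49/4 = 35/12

# Cumulative distribution function: running totals of the probabilities
cdf, running = {}, Fraction(0)
for x in sorted(dist):
    running += dist[x]
    cdf[x] = running

# Check E(aX + b) = aE(X) + b and Var(aX + b) = a^2 Var(X) for a = 2, b = 3
a, b = 2, 3
coded = {a * x + b: p for x, p in dist.items()}
e_coded = expectation(coded)
var_coded = expectation(coded, lambda y: y * y) - e_coded ** 2
print(e_x, e_x2, var_x, e_coded, var_coded)
```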
Mean using coded data E.g. $Y = \frac{X - 150}{50}$, mean of coded data = 5.1 Step 1: rearrange, making X the subject: $X = 50Y + 150$ Step 2: take expectations and solve: $E(X) = E(50Y + 150) = 50E(Y) + 150 = 255 + 150 = 405$
Standard deviation of coded data E.g. $Y = \frac{X - 150}{50}$, standard deviation of coded data σ = 2.5 Step 1: $Var(X) = Var(50Y + 150) = 50^2 Var(Y) = 50^2 \times 2.5^2 = 15625$ Step 2: Standard deviation = $\sqrt{15625} = 125$
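The two decoding examples can be sketched in Python. The function name is an assumption; the coding constants 150 and 50 and the coded mean and standard deviation are those of the examples.

```python
import math


def decode(coded_mean, coded_sd, a, b):
    """Undo the coding Y = (X - a) / b, i.e. X = bY + a.

    The mean uses E(bY + a) = bE(Y) + a.
    The variance uses Var(bY + a) = b^2 Var(Y); adding a has no effect.
    """
    mean_x = b * coded_mean + a
    var_x = b ** 2 * coded_sd ** 2
    return mean_x, math.sqrt(var_x)


mean_x, sd_x = decode(coded_mean=5.1, coded_sd=2.5, a=150, b=50)
print(round(mean_x, 6), sd_x)  # mean 405, standard deviation 125
```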
Discrete uniform distribution Probabilities are all the same (e.g. rolling a die). For X taking the values 1, 2, ..., n: $E(X) = \frac{n + 1}{2}$, $Var(X) = \frac{(n + 1)(n - 1)}{12}$
Chapter 9 The Normal Distribution
Standard normal variable, Z $Z \sim N(0, 1^2)$ Where… ~ means “is distributed”, N = normal, the mean, μ, is 0 and the standard deviation, σ, is 1 (so the variance, σ², is also 1)
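Normal distribution curve [Diagram: f(x) against x, with μ and α marked on the x-axis and the area P(α<x) labelled] Area under curve represents probability. Total = 1
Standardised curve [Diagram: f(z) against z, with μ = 0 and α marked on the z-axis and the area P(α<z) labelled] Area under curve represents probability. Total = 1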
P(Z < α) Step 1 - Draw curve Step 2 - Find the probability in the table Step 3 - Look at the corresponding z value to find the value of α
P(Z > α) Step 1 - Draw curve Step 2 - Find P(Z < α) Step 3 - Subtract the answer from 1
Random variable, X $X \sim N(\mu, \sigma^2)$
Finding z from Random variable, X If you are given a value of a random variable, X, (e.g. 180kg) rather than z, find the z-value using $Z = \frac{X - \mu}{\sigma}$
Random variable example Find P(X < 53) given the random variable $X \sim N(50, 4^2)$… Step 1 - Sub in values: $P\left(Z < \frac{53 - 50}{4}\right) = P(Z < 0.75)$ Step 2 - Find probability using table: P(Z < 0.75) = 0.7734
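The table lookup in this example can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8+). This is a sketch; the variable names are illustrative, and the computed value replaces the printed table rather than reproducing it.

```python
from statistics import NormalDist

mu, sigma = 50, 4

# Step 1: standardise the value 53
z = (53 - mu) / sigma                 # 0.75

# Step 2: P(Z < 0.75) from the standard normal distribution
p_less = NormalDist().cdf(z)
p_greater = 1 - p_less                # P(Z > 0.75), by subtracting from 1

# Same probability without standardising first
p_direct = NormalDist(mu, sigma).cdf(53)

print(round(p_less, 4))  # 0.7734
```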
Simultaneous equations to find σ and μ E.g. P(X > 35) = 0.05, P(X < 15) = 0.1469 Step 1 - Draw curve Step 2 - Look at table to find the z value for each Step 3 - Sub into $Z = \frac{X - \mu}{\sigma}$ for each value Step 4 - Use the substitution method to obtain μ and σ
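A Python sketch of these four steps, using `statistics.NormalDist.inv_cdf` in place of the table (an assumption; a table lookup gives slightly rounded z values). It yields roughly σ ≈ 7.4 and μ ≈ 22.8 for this example.

```python
from statistics import NormalDist

standard = NormalDist()   # Z ~ N(0, 1)

# Step 2: z values for each probability
z_upper = standard.inv_cdf(1 - 0.05)   # P(X > 35) = 0.05 means P(Z < z) = 0.95
z_lower = standard.inv_cdf(0.1469)     # P(X < 15) = 0.1469, a negative z value

# Step 3: two equations of the form x = mu + z * sigma
#   35 = mu + z_upper * sigma
#   15 = mu + z_lower * sigma
# Step 4: subtract to eliminate mu, then substitute back
sigma = (35 - 15) / (z_upper - z_lower)
mu = 35 - z_upper * sigma

print(round(mu, 2), round(sigma, 2))
```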
Probability between two values E.g. P(168 < X < 174), σ = 3.5, μ = 165 Step 1 - Draw curve Step 2 - Sub values into $Z = \frac{X - \mu}{\sigma}$ and solve. This obtains P(0.86 < Z < 2.57) for this example. P(Z < 2.57) - P(Z < 0.86) = 0.9949 - 0.8051 = 0.1898
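A Python sketch of the same calculation with `statistics.NormalDist`. It is computed two ways: with z rounded to two decimal places, as a table lookup would require, and without rounding. The small difference between the two answers comes only from that rounding. Variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 165, 3.5
standard = NormalDist()

# Table-style: standardise, round z to 2 d.p., then subtract the two areas
z_low = round((168 - mu) / sigma, 2)
z_high = round((174 - mu) / sigma, 2)
p_table_style = standard.cdf(z_high) - standard.cdf(z_low)

# Direct: no rounding of z
x_dist = NormalDist(mu, sigma)
p_exact = x_dist.cdf(174) - x_dist.cdf(168)

print(z_low, z_high, round(p_table_style, 4))
```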