Statistics -S1.

Chapter 1 Mathematical models in probability and statistics

Mathematical model: a simplification of a real-world situation
Advantages: quick and easy to produce; can simplify a more complex situation; enable predictions to be made; can help to provide control
Disadvantages: only give a partial description of the real situation; only work for a certain range of values

Chapter 2 Representation and summary of data: location

Quantitative variable: a variable associated with a numerical value (e.g. height)

Qualitative variable: a variable that does not have a numerical value (e.g. hair colour)

Continuous variable: a variable that can take any value in a given range

Discrete variable: a variable that can only take specific values (e.g. integers) within a given range

Mode/modal class: the value or class that occurs most often
E.g. 1, 2, 2, 2, 3, 3, 4, 5: mode = 2

Mean: $\bar{x} = \frac{\sum x}{n}$, where n = number of observations, $\sum x$ = sum of the observations and $\bar{x}$ = mean of the sample
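A minimal Python sketch of the mode and mean definitions, using the example data from the mode entry. `statistics.mode` is from the Python standard library; `sample_mean` is a hypothetical helper written out to mirror the formula.

```python
from statistics import mode

def sample_mean(values):
    """Mean: sum of the observations divided by the number of observations, n."""
    return sum(values) / len(values)

data = [1, 2, 2, 2, 3, 3, 4, 5]   # example data from the mode entry
print(mode(data))                 # most frequent value: 2
print(sample_mean(data))          # 22 / 8 = 2.75
```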

Mean of a combined set of data
$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$, where $\bar{x}$ = combined mean, $n_1, n_2$ = sizes of the two samples and $\bar{x}_1, \bar{x}_2$ = means of the individual samples
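The combined-mean formula can be checked against simply pooling the raw data. This is a sketch with hypothetical samples `a` and `b`; `combined_mean` is an illustrative name, not a library function.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Mean of two samples combined, from their sizes and individual means."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Hypothetical samples, used only to check the formula against pooling the raw data
a = [1, 2, 3]
b = [4, 5, 6, 7, 8]
m = combined_mean(len(a), sum(a) / len(a), len(b), sum(b) / len(b))
print(m)                          # 4.5
print(sum(a + b) / len(a + b))    # 4.5, the mean of the pooled data
```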

Frequency distribution table: mean
$\bar{x} = \frac{\sum fx}{\sum f}$, where $\sum fx$ = sum of each frequency multiplied by its value (or class midpoint), $\sum f$ = sum of the frequencies and $\bar{x}$ = mean of the sample
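A short sketch of the frequency-table mean, assuming a hypothetical table of values `x` with frequencies `f`; for grouped data the class midpoints would be passed as the values.

```python
def frequency_mean(values, freqs):
    """Mean from a frequency table: sum(f * x) / sum(f).

    For grouped data, pass the class midpoints as `values`.
    """
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    return sum_fx / sum(freqs)

# Hypothetical frequency table
x = [1, 2, 3, 4]
f = [3, 5, 8, 4]
print(frequency_mean(x, f))   # 53 / 20 = 2.65
```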

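Median: the middle value of ordered data
To find the position of the median, calculate $\frac{n}{2}$, where n = number of observations. If $\frac{n}{2}$ is not an integer, round up; if it is an integer, the median is halfway between the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th values.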

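Interpolation (estimating the median of grouped data)
Length of pine cone (mm) | No. of pine cones, f | Cumulative frequency
30-31 | 2 | 2
32-33 | 25 | 27
34-36 | 30 | 57
37-39 | 13 | 70
Median position = 70/2 = 35th value, which lies in the 34-36 class (class boundaries 33.5 and 36.5, cumulative frequencies 27 and 57)
$\frac{Q_2 - 33.5}{36.5 - 33.5} = \frac{35 - 27}{57 - 27}$
Make $Q_2$ the subject: $Q_2 = 33.5 + 3 \times \frac{8}{30} = 34.3$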
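The interpolation step can be written as a small function. This sketch uses the pine cone figures from the example; `interpolate_median` is a hypothetical name, and it assumes the grouped-data convention that the position n/2 is used as it stands (no rounding).

```python
def interpolate_median(lower_bound, upper_bound, cf_before, cf_after, position):
    """Linear interpolation for the median within the class that contains it.

    lower_bound, upper_bound: class boundaries of the median class
    cf_before, cf_after: cumulative frequencies at those boundaries
    position: n / 2, used as it stands for grouped data
    """
    fraction = (position - cf_before) / (cf_after - cf_before)
    return lower_bound + fraction * (upper_bound - lower_bound)

q2 = interpolate_median(33.5, 36.5, 27, 57, 70 / 2)
print(round(q2, 1))   # 34.3
```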

Coding: used to make large values easier to work with
General form: $y = \frac{x - a}{b}$
Coding has no effect on the product moment correlation coefficient. A regression line calculated from coded data may not be the same as the regression line for the original data.
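A sketch of undoing the coding $y = \frac{x - a}{b}$ for the mean and standard deviation, using hypothetical data and constants with b > 0. It uses `statistics.pstdev` (divisor n), which matches the variance formula used in these notes.

```python
from statistics import mean, pstdev

# Hypothetical raw data and coding constants a and b (b > 0)
a, b = 1000, 10
x = [1010, 1030, 1050, 1070]
y = [(xi - a) / b for xi in x]       # coded values: 1.0, 3.0, 5.0, 7.0

# Decoding: mean of x = a + b * (mean of y), sd of x = b * (sd of y)
mean_x = a + b * mean(y)
sd_x = b * pstdev(y)                 # pstdev uses divisor n
print(mean_x, mean(x))               # 1040.0 and 1040
print(sd_x, pstdev(x))               # both about 22.36
```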

Chapter 3 Representation and summary of data: dispersion

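Range: highest value − lowest value

Lower quartile, Q1: position $\frac{n}{4}$, where n = sample size. If $\frac{n}{4}$ is not an integer, round up to find the corresponding position; if it is an integer, Q1 is halfway between the $\frac{n}{4}$th and $\left(\frac{n}{4}+1\right)$th values.

Upper quartile, Q3: position $\frac{3n}{4}$, where n = sample size. If $\frac{3n}{4}$ is not an integer, round up to find the corresponding position; if it is an integer, Q3 is halfway between the $\frac{3n}{4}$th and $\left(\frac{3n}{4}+1\right)$th values.

Interquartile range: Q3 − Q1, where Q1 = lower quartile and Q3 = upper quartile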

Percentiles: split the data into 100 parts. The xth percentile is at position $\frac{xn}{100}$, where n = sample size.
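The position rules for the median and quartiles can share one function. This is a sketch on a hypothetical data set; `value_at_position` is an illustrative name, and applying the same rounding rule to percentiles is an assumption of this sketch. `Fraction` keeps the position exact so the "is it an integer?" test is reliable.

```python
import math
from fractions import Fraction

def value_at_position(values, p):
    """Value at position n * p in the ordered data, for 0 < p < 1.

    If n * p is not a whole number, round the position up;
    if it is a whole number k, take the midpoint of the kth and (k + 1)th values.
    """
    data = sorted(values)
    pos = len(data) * Fraction(p)
    if pos.denominator != 1:
        return data[math.ceil(pos) - 1]      # positions are 1-based, list indices 0-based
    k = pos.numerator
    return (data[k - 1] + data[k]) / 2

# Hypothetical data set, n = 10
data = [13, 3, 20, 7, 9, 5, 22, 11, 8, 15]
q1 = value_at_position(data, Fraction(1, 4))     # position 2.5 -> 3rd value: 7
q2 = value_at_position(data, Fraction(1, 2))     # position 5 -> (9 + 11) / 2 = 10.0
q3 = value_at_position(data, Fraction(3, 4))     # position 7.5 -> 8th value: 15
p90 = value_at_position(data, Fraction(90, 100)) # position 9 -> (20 + 22) / 2 = 21.0
print(q1, q2, q3, q3 - q1, p90)                  # IQR = 15 - 7 = 8
```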

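Variance: represents the spread of a set of data
$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$ or, for a frequency table, $\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$
Remember: "mean of the squares minus square of the mean"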

Standard deviation, σ: $\sigma = \sqrt{\text{variance}}$
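A sketch of "mean of the squares minus square of the mean" for raw data and for a frequency table, on hypothetical data; the function names are illustrative.

```python
import math

def variance(values):
    """Mean of the squares minus the square of the mean (divisor n)."""
    n = len(values)
    return sum(x * x for x in values) / n - (sum(values) / n) ** 2

def variance_from_table(xs, fs):
    """Frequency-table version: sum(f x^2)/sum(f) - (sum(f x)/sum(f))^2."""
    total_f = sum(fs)
    mean_of_squares = sum(f * x * x for x, f in zip(xs, fs)) / total_f
    mean = sum(f * x for x, f in zip(xs, fs)) / total_f
    return mean_of_squares - mean ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]                               # hypothetical data
print(variance(data), math.sqrt(variance(data)))              # 4.0 2.0
print(variance_from_table([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))  # same data as a table: 4.0
```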

Chapter 4 Representation of data

Stem and leaf diagrams

Back-to-back stem and leaf diagrams
Used to compare two sets of data

Outliers: extreme values within the data
Plot outliers on box plots with a cross (×). A value is an outlier if it is greater than Q3 + 1.5 × (interquartile range) or less than Q1 − 1.5 × (interquartile range).
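A sketch of the 1.5 × IQR outlier rule with hypothetical data whose quartiles are Q1 = 7 and Q3 = 15. The multiplier `k` is a parameter because an exam question may specify a different value; the function names are illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3, k=1.5):
    lower, upper = outlier_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical data with Q1 = 7 and Q3 = 15 (so IQR = 8)
data = [3, 5, 7, 8, 9, 11, 13, 15, 20, 40]
print(outlier_fences(7, 15))        # (-5.0, 27.0)
print(find_outliers(data, 7, 15))   # [40]
```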

Box plots: show the lowest value, lower quartile, median, upper quartile and highest value of the data

Skewness: $\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
Positive value → positive skew; negative value → negative skew; close to 0 → symmetrical
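A sketch of the skewness measure with hypothetical summary statistics. The notes say only "close to 0" for symmetry, so the `tolerance` cut-off below is an arbitrary, hypothetical choice.

```python
def skewness(mean, median, sd):
    """Skewness measure 3(mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

def describe(value, tolerance=0.1):
    # 'tolerance' is a hypothetical cut-off for "close to 0"; the notes give no exact value
    if abs(value) <= tolerance:
        return "approximately symmetrical"
    return "positive skew" if value > 0 else "negative skew"

s1 = skewness(12, 10, 4)   # 3 * 2 / 4 = 1.5
s2 = skewness(8, 10, 4)    # -1.5
print(s1, describe(s1))    # 1.5 positive skew
print(s2, describe(s2))    # -1.5 negative skew
```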

Positive skew: mode < median < mean; Q2 − Q1 < Q3 − Q2

Negative skew: mode > median > mean; Q2 − Q1 > Q3 − Q2

Symmetrical: mode = median = mean; Q2 − Q1 = Q3 − Q2

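Histograms: show the distribution of continuous data
There are no gaps between the bars, and the area of each bar is proportional to (∝) the frequency.

Frequency density: $\text{frequency density} = \frac{\text{frequency}}{\text{class width}}$

Area = k × frequency, where k is a constant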
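A sketch of frequency density using the pine cone frequencies from Chapter 2, assuming the usual class boundaries for whole-millimetre classes (30-31 becomes 29.5 to 31.5, and so on). With k = 1, each bar's area gives back its frequency.

```python
def frequency_densities(classes, freqs):
    """Frequency density = frequency / class width, using class boundaries."""
    return [f / (upper - lower) for (lower, upper), f in zip(classes, freqs)]

# Pine cone data, written as class boundaries (assumed: 30-31 -> 29.5 to 31.5, etc.)
classes = [(29.5, 31.5), (31.5, 33.5), (33.5, 36.5), (36.5, 39.5)]
freqs = [2, 25, 30, 13]
fd = frequency_densities(classes, freqs)
print(fd)   # 1.0, 12.5, 10.0 and about 4.33

# With k = 1, bar area (width * frequency density) gives back each frequency
areas = [(upper - lower) * d for (lower, upper), d in zip(classes, fd)]
```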

Chapter 5 Probability

Venn diagrams: the whole rectangle represents the sample space (total probability = 1); closed curves represent the outcomes for each event

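P(A′): everything outside A

P(B′): everything outside B

P(A ∩ B): the overlap of A and B

P(A ∪ B): everything inside A or B (or both)

P(A′ ∩ B′) = P((A ∪ B)′): everything outside both A and B

P(A′ ∪ B′) = P((A ∩ B)′): everything except the overlap of A and B

P(A′ ∩ B): B only

P(A ∩ B′): A only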

P(event A or event B or both) = P(A ∪ B)

Complementary probability
P(A’) = 1 – P(A)

Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Conditional probability
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ (the probability of A given B)

Multiplication rule: P(A ∩ B) = P(A | B) × P(B) = P(B | A) × P(A)

Independent events: P(A ∩ B) = P(A) × P(B), so P(A | B) = P(A) and P(B | A) = P(B)

Mutually exclusive events: P(A ∩ B) = 0, so P(A ∪ B) = P(A) + P(B)
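The probability rules can be checked exactly on a small sample space. This sketch assumes a fair six-sided die with hypothetical events A (even score), B (score greater than 3) and C (score of 1); Python's set operators `|`, `&` and `-` play the roles of ∪, ∩ and "but not".

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # fair six-sided die (hypothetical example)
A = {2, 4, 6}                 # A: score is even
B = {4, 5, 6}                 # B: score is greater than 3
C = {1}                       # C: score is 1

def P(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

print(P(A | B) == P(A) + P(B) - P(A & B))   # addition rule holds: True
print(P(A & B) / P(B))                      # P(A given B) = (1/3) / (1/2) = 2/3
print(P(A & B) == P(A) * P(B))              # 1/3 vs 1/4, so not independent: False
print(P(A & C))                             # A and C are mutually exclusive: 0
print(P(outcomes - A) == 1 - P(A))          # complementary probability: True
```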

Chapter 6 Correlation

Positive correlation: most points lie in the 1st and 3rd quadrants (with axes drawn through $(\bar{x}, \bar{y})$); the product moment correlation coefficient is positive (close to 1 for strong correlation)

Negative correlation: most points lie in the 2nd and 4th quadrants; the product moment correlation coefficient is negative (close to −1 for strong correlation)

No correlation: points lie in all four quadrants; the product moment correlation coefficient is close to 0

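Product moment correlation coefficient, r
A measure of the strength of the linear relationship between two variables ($-1 \le r \le 1$):
$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$, where $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$, $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$ and $S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$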
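A sketch computing r from the summary formulas on hypothetical paired data, then recomputing it on coded data to illustrate that coding (with positive divisors) leaves r unchanged. `pmcc` is an illustrative name.

```python
import math

def pmcc(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy), using the summary formulas."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]          # hypothetical paired data
y = [2, 4, 5, 4, 5]
r = pmcc(x, y)               # Sxy = 6, Sxx = 10, Syy = 6, so r = 6 / sqrt(60)
print(round(r, 4))           # 0.7746

# Coding (with positive divisors) leaves r unchanged
u = [(xi - 3) / 2 for xi in x]
v = [(yi - 4) / 3 for yi in y]
print(round(pmcc(u, v), 4))  # 0.7746
```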

Chapter 7 Regression

Independent (explanatory) variable
The variable that is set independently of the other variable; plotted on the x-axis

Dependent (response) variable
The variable whose values are determined by the values of the independent variable; plotted on the y-axis

Equation of regression line
y = a + bx, where:
b is the gradient: for every increase of 1 in x, y increases by b
a is the y-intercept: when x is zero, y is equal to a
$b = \frac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$
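A sketch of fitting y = a + bx with $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$ on hypothetical data, followed by an interpolated estimate inside the data range. `regression_line` is an illustrative name.

```python
def regression_line(xs, ys):
    """Regression line y = a + bx, with b = Sxy / Sxx and a = y_bar - b * x_bar."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = sxy / sxx
    a = sum(ys) / n - b * sum(xs) / n
    return a, b

x = [1, 2, 3, 4, 5]          # hypothetical paired data
y = [2, 4, 5, 4, 5]
a, b = regression_line(x, y)
print(round(a, 4), round(b, 4))   # 2.2 0.6, i.e. y = 2.2 + 0.6x
print(round(a + b * 3.5, 4))      # estimate at x = 3.5 (within the data range): 4.3
```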

When to use the regression line
When the points lie on, or close to, a straight line

Coding in regression lines
To turn the coded regression line into the regression line for the original data, substitute the coding expressions for the coded variables and rearrange into the form y = a + bx
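A sketch of the substitution step, assuming codings of the form u = (x − p)/q and v = (y − r)/s with hypothetical data and constants. Fitting v = c + du and substituting gives the same line as fitting the raw data directly; `decode_line` is an illustrative name.

```python
def regression_line(xs, ys):
    """y = a + bx with b = Sxy / Sxx and a = y_bar - b * x_bar."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = sxy / sxx
    return sum(ys) / n - b * sum(xs) / n, b

def decode_line(c, d, p, q, r, s):
    """Turn the coded line v = c + d*u into y = a + b*x, where u = (x - p)/q and v = (y - r)/s.

    Substituting: (y - r)/s = c + d(x - p)/q, so y = (r + s*c - s*d*p/q) + (s*d/q) x.
    """
    return r + s * c - s * d * p / q, s * d / q

x = [10, 20, 30, 40, 50]            # hypothetical raw data
y = [102, 104, 105, 104, 105]
u = [(xi - 0) / 10 for xi in x]     # coding u = x / 10
v = [(yi - 100) / 1 for yi in y]    # coding v = y - 100
c, d = regression_line(u, v)        # coded line: v = 2.2 + 0.6u
a, b = decode_line(c, d, p=0, q=10, r=100, s=1)
print(round(a, 4), round(b, 4))     # 102.2 0.06
print([round(t, 4) for t in regression_line(x, y)])   # same line fitted to the raw data
```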

Interpolation: estimating a value of the dependent variable within the range of the data

Extrapolation: estimating a value outside the range of the data; this is unreliable

Chapter 8 Discrete random variables

Variable: represented by a capital letter (X, Y, A, B, etc.)
Can take any value from a specified set of values

Random variable: a variable whose value is the outcome of an experiment, e.g. rolling a die
Discrete: can only take values on a discrete scale
Continuous: the outcome can be any value on a continuous scale

Sample space: the list of all possible outcomes of an experiment
E.g. spinning a four-sided and a three-sided spinner at the same time gives 4 × 3 = 12 possible outcomes.

Probability distribution
A table showing the probability of each outcome of an experiment, e.g. the score X on a fair die:
x | 1 | 2 | 3 | 4 | 5 | 6
P(X = x) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6
Remember: for a discrete random variable, all of the probabilities add up to 1.

Cumulative distribution function, F(x) = P(X ≤ x)
Shows the running totals of the probabilities:
x | 1 | 2 | 3 | 4 | 5 | 6
P(X = x) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6
F(x) | 1/6 | 2/6 | 3/6 | 4/6 | 5/6 | 6/6

Expected value, E(X): the total of the x values multiplied by their corresponding probabilities, $E(X) = \sum x P(X = x)$
E.g. for the fair die: (1 × 1/6) + (2 × 1/6) + (3 × 1/6) + (4 × 1/6) + (5 × 1/6) + (6 × 1/6) = 3.5

E(X²): square the x values, multiply by their corresponding probabilities, then total, $E(X^2) = \sum x^2 P(X = x)$
E.g. (1² × 1/6) + (2² × 1/6) + (3² × 1/6) + (4² × 1/6) + (5² × 1/6) + (6² × 1/6) = 91/6

Variance of a random variable
Var(X) = E(X²) − (E(X))²

E(aX + b): E(aX + b) = aE(X) + b

Var(aX + b): Var(aX + b) = a²Var(X)
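A sketch that reproduces the fair-die results exactly with `fractions.Fraction`, then checks E(aX + b) and Var(aX + b) by transforming the distribution directly. The choice a = 2, b = 3 and the helper name `expectation` are hypothetical.

```python
from fractions import Fraction

# Fair die: each score 1 to 6 has probability 1/6
dist = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(dist, g=lambda x: x):
    """E(g(X)) = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in dist.items())

e_x = expectation(dist)                      # 7/2
e_x2 = expectation(dist, lambda x: x ** 2)   # 91/6
var_x = e_x2 - e_x ** 2                      # 91/6 - 49/4 = 35/12
print(e_x, e_x2, var_x)

# Check E(aX + b) = aE(X) + b and Var(aX + b) = a^2 Var(X), with a = 2, b = 3
a, b = 2, 3
e_y = expectation(dist, lambda x: a * x + b)
var_y = expectation(dist, lambda x: (a * x + b) ** 2) - e_y ** 2
print(e_y == a * e_x + b, var_y == a ** 2 * var_x)   # True True
```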

Mean using coded data
E.g. $Y = \frac{X - 150}{50}$, mean of coded data E(Y) = 5.1
Step 1: rearrange to make X the subject: X = 50Y + 150
Step 2: E(X) = E(50Y + 150) = 50E(Y) + 150 = 50 × 5.1 + 150 = 405

Standard deviation of coded data
E.g. $Y = \frac{X - 150}{50}$, standard deviation of coded data = 2.5
Step 1: Var(X) = Var(50Y + 150) = 50²Var(Y) = 50² × 2.5² = 15625
Step 2: standard deviation = √15625 = 125
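The two coded-data examples reduce to a few lines of arithmetic. This sketch uses the coding Y = (X − 150)/50 with E(Y) = 5.1 and a coded standard deviation of 2.5, as in the examples; the variable names are illustrative.

```python
import math

# Coding from the examples: Y = (X - 150) / 50, so X = 50Y + 150
scale, shift = 50, 150

mean_y = 5.1
mean_x = scale * mean_y + shift          # E(X) = 50E(Y) + 150
print(round(mean_x, 6))                  # 405.0

sd_y = 2.5
var_x = scale ** 2 * sd_y ** 2           # Var(X) = 50^2 Var(Y); the + 150 has no effect
print(var_x, math.sqrt(var_x))           # 15625.0 125.0
```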

Discrete uniform distribution
All outcomes have the same probability (e.g. rolling a fair die). For X taking the values 1, 2, …, n:
$E(X) = \frac{n + 1}{2}$, $\text{Var}(X) = \frac{(n + 1)(n - 1)}{12}$
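A sketch that checks both discrete uniform formulas against a direct calculation from the distribution, using exact fractions for n = 1 to 20. `uniform_mean_var` is an illustrative name.

```python
from fractions import Fraction

def uniform_mean_var(n):
    """Mean and variance of X uniform on 1, 2, ..., n, computed directly from the distribution."""
    p = Fraction(1, n)
    mean = sum(x * p for x in range(1, n + 1))
    var = sum(x * x * p for x in range(1, n + 1)) - mean ** 2
    return mean, var

for n in range(1, 21):
    mean, var = uniform_mean_var(n)
    assert mean == Fraction(n + 1, 2)
    assert var == Fraction((n + 1) * (n - 1), 12)

print(uniform_mean_var(6))   # fair die: (Fraction(7, 2), Fraction(35, 12))
```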

Chapter 9 The Normal Distribution

Standard normal variable, Z
Z ~ N(0, 1²): "~" means "is distributed", N means normal, the mean μ is 0 and the variance σ² is 1 (so the standard deviation σ is 1)

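Normal distribution curve
A bell-shaped curve of f(x) against x, symmetrical about the mean μ. The area under the curve represents probability (total area = 1); e.g. the area to the right of α is P(X > α).

Standardised curve
The curve of f(z) against z, symmetrical about μ = 0. The area under the curve represents probability (total area = 1); e.g. the area to the right of α is P(Z > α).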

P(Z < α)
Step 1: draw the curve
Step 2: find the probability in the table
Step 3: look at the corresponding z value to find the value of α

P(Z > α)
Step 1: draw the curve
Step 2: find P(Z < α)
Step 3: subtract the answer from 1, since P(Z > α) = 1 − P(Z < α)

Random variable, X: X ~ N(μ, σ²)

Finding z from a random variable, X
If you are given a value of a random variable X (e.g. 180 kg) rather than z, find the z-value using $Z = \frac{X - \mu}{\sigma}$

Random variable example
Find P(X < 53) given X ~ N(50, 4²):
Step 1: substitute the values: $P\left(Z < \frac{53 - 50}{4}\right) = P(Z < 0.75)$
Step 2: find the probability using the table: P(Z < 0.75) = 0.7734
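Python's standard library `statistics.NormalDist` can stand in for the printed tables. This sketch reworks the P(X < 53) example; note that `NormalDist` takes the standard deviation (4), not the variance (4²). The last line shows the reverse lookup (finding α from a probability) with `inv_cdf`.

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=4)     # X ~ N(50, 4^2); NormalDist takes the standard deviation
z = (53 - X.mean) / X.stdev        # standardise: Z = (X - mu) / sigma
print(z)                           # 0.75

Z = NormalDist()                   # standard normal, Z ~ N(0, 1^2)
print(round(Z.cdf(z), 4))          # P(Z < 0.75), about 0.7734
print(round(X.cdf(53), 4))         # same probability without standardising
print(round(1 - Z.cdf(z), 4))      # P(Z > 0.75) = 1 - P(Z < 0.75), about 0.2266
print(round(Z.inv_cdf(0.95), 4))   # reverse lookup: alpha with P(Z < alpha) = 0.95, about 1.6449
```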

Simultaneous equations to find σ and μ
E.g. P(X > 35) = 0.05, P(X < 15) = …
Step 1: draw the curve
Step 2: look at the table to find the z value for each probability
Step 3: substitute into $Z = \frac{X - \mu}{\sigma}$ for each value
Step 4: use the substitution method to solve the two equations for μ and σ
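A sketch of the simultaneous-equations method. The second probability is not given in the example, so P(X < 15) = 0.10 below is a hypothetical value chosen only for illustration. The two equations are solved by subtracting one from the other, which gives the same result as substitution.

```python
from statistics import NormalDist

Z = NormalDist()
# P(X > 35) = 0.05 is from the example; P(X < 15) = 0.10 is a hypothetical value
# chosen only for this sketch, because the second probability is not given.
z1 = Z.inv_cdf(1 - 0.05)   # (35 - mu) / sigma = z1, about 1.6449
z2 = Z.inv_cdf(0.10)       # (15 - mu) / sigma = z2, about -1.2816

# Subtracting the equations 35 - mu = z1 * sigma and 15 - mu = z2 * sigma:
sigma = (35 - 15) / (z1 - z2)
mu = 35 - z1 * sigma
print(round(mu, 2), round(sigma, 2))   # about 23.76 6.83
```

Substituting the solved values back into `NormalDist(mu, sigma)` and checking both probabilities is a quick way to confirm the working.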

Probability between two values
E.g. P(168 < X < 174), where σ = 3.5 and μ = 165
Step 1: draw the curve
Step 2: substitute the values into $Z = \frac{X - \mu}{\sigma}$ and solve; for this example this gives P(0.86 < Z < 2.57)
Step 3: P(Z < 2.57) − P(Z < 0.86) = 0.9949 − 0.8051 = 0.1898
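A sketch of the same calculation with `statistics.NormalDist`. Using unrounded z values gives a slightly different answer from the table method, which rounds z to 2 d.p. first; both are shown.

```python
from statistics import NormalDist

X = NormalDist(mu=165, sigma=3.5)
z_low = (168 - 165) / 3.5      # about 0.857
z_high = (174 - 165) / 3.5     # about 2.571
print(round(z_low, 2), round(z_high, 2))   # 0.86 2.57

# P(168 < X < 174) = P(X < 174) - P(X < 168)
p = X.cdf(174) - X.cdf(168)
print(round(p, 4))             # about 0.1906 with unrounded z values

# Using z values rounded to 2 d.p., as when reading from tables
Z = NormalDist()
p_table = Z.cdf(2.57) - Z.cdf(0.86)
print(round(p_table, 4))       # about 0.1898
```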
