1 UE/BiTS Hamburg Summer term 2018
Stochastics Prof. Dr. Stefan Kooths UE/BiTS Hamburg Summer term 2018

2 Contact data Prof. Dr. Stefan Kooths, Head of Forecasting Center, Kiel Institute for the World Economy, Office Berlin, In den Ministergärten, Berlin, 030/

3 The Kiel Institute for the World Economy
Forecasting Center

4 Be smarter than your phone …

5 Stochastic = randomly determined
Stochastics Stochastic = randomly determined Tossing a coin Rolling a die Stochastics (“science of making guesses”) We know more than nothing (possible alternatives) … … but not which alternative comes out for sure Dealing systematically with risk and uncertainty (making the best use of our limited knowledge)

6 Statistics and stochastics
Statistics: Description of data (e.g. mean value) Stochastics: Reasoning about data (e.g. expected value)

7 Outline Probabilities I Probabilities II Permutations and combinations Discrete random variables and distributions Specific discrete distributions Continuous random variables and their distribution I Continuous random variables and their distribution II Estimation and confidence intervals Statistical hypothesis test

8 Outline Probabilities I Probabilities II Permutations and combinations Discrete random variables and distributions Specific discrete distributions Continuous random variables and their distribution I Continuous random variables and their distribution II Estimation and confidence intervals Statistical hypothesis test

9 Basic concepts Random experiment Elementary events Sample space (Ω)
A random experiment can be infinitely repeated, has a well-defined set of two or more possible outcomes, has a concrete outcome that is unknown beforehand, and refers to any process of observation or measurement. Elementary events are individual, mutually exclusive outcomes of a random experiment. The sample space (Ω) is the set of all possible distinct outcomes of a random experiment. An event is a subset of Ω that is composed of one or more elementary events.

10 Examples of random experiments
Tossing a coin Possible outcomes: head and tail; Ω = {head, tail} Rolling a die Possible outcomes: numbers 1 to 6; Ω = {1, 2, 3, 4, 5, 6} Rolling two dice Possible outcomes: …; Ω = …

11 Sets are used to formally handle random experiments
Mathematical sets Set Collection of different objects forming a whole; typically denoted by capital letters (A, B, C …); symbol for the empty set: ∅ Element Individual object of a set Sets are used to formally handle random experiments

12 Set operations

13 Exercise “set operations”
Definitions Ω = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} Set A: all even numbers in Ω Set B: all odd numbers in Ω Set C: all multiples of three in Ω Tasks Write A, B, and C in set notation. Write A ∪ B, A ∩ B, A ∪ C, A ∩ C, and A\C.
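These set operations map directly onto Python's built-in set type; a quick sketch for checking your answers:

```python
omega = set(range(1, 11))                # Ω = {1, ..., 10}
A = {x for x in omega if x % 2 == 0}     # even numbers
B = {x for x in omega if x % 2 == 1}     # odd numbers
C = {x for x in omega if x % 3 == 0}     # multiples of three

print(A | B)    # union A ∪ B -> the whole of Ω
print(A & B)    # intersection A ∩ B -> set() (disjoint!)
print(A | C)    # A ∪ C
print(A & C)    # A ∩ C -> {6}
print(A - C)    # difference A\C
```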

14 Example: Single toss of a balanced coin
Probability trees Probability tree Represents all possible outcomes of a random experiment From a starting point, draw as many branches as there are elementary events New level for each consecutive event Probabilities are noted next to the branches Example: Single toss of a balanced coin

15 Exercise “probability trees”
You have a box containing eight intact lightbulbs and two defective lightbulbs. You randomly choose a lightbulb from the box. What is the probability that the chosen lightbulb is intact? Defective? Draw a probability tree for this experiment. Draw a probability tree for two tosses of a balanced coin.

16 Subjective definition
Probability Classical definition Deduced from combinatorial reasoning if all elementary events are equally likely (theoretical sample) Empirical definition Derived from experience and observation Subjective definition Reflecting a personal degree of belief (“derived” from experience and intuition)

17 Subjective definition
Examples Classical definition Roll a fair die. Let A = {1,2}. Favorable cases: 1 and 2. Possible cases (Ω): 1, 2, 3, 4, 5, 6. P(A) = 2/6 = 1/3 Empirical definition In the city of Alphaville, 5344 people took the driving test last year; 4530 passed the test. For an individual taking the test, the probability of passing (if no other information is known!) can be estimated as P(passing the driving test) = 4530/5344 ≈ 0.85. Any ideas for improving the estimation? Subjective definition A soccer trainer estimates that the probability of winning the Champions League is 40 percent.

18 Disjoint and non-disjoint sets
Disjoint sets Have no element in common: A and B are disjoint if their intersection is the empty set: A ∩ B = ∅ Mutually exclusive sets (A, B, C, …): all pairs are disjoint Non-disjoint sets Have at least one element in common: A and B are non-disjoint if their intersection is not the empty set: A ∩ B ≠ ∅

19 Axioms of Probability Non-negativity Unitarity Additivity
The probability of an event is a non-negative number: 0 ≤ P(A) Unitarity The probability of the certain event is always equal to one: P(Ω) = 1 Additivity The probability of the union of mutually exclusive events is the sum of the individual probabilities: P(A ∪ B ∪ C ∪ D ∪ …) = P(A) + P(B) + P(C) + P(D) + … Andrey N. Kolmogorov (1903–1987)

20 Basic rules following from Kolmogorov’s axioms
0 ≤ P(A) ≤ 1 for any subset A of Ω P(∅) = 0 P(A) ≤ P(B) if A is a subset of B, and A and B are subsets of Ω P(A) = 1 − P(A′) with A′ = Ω\A

21 Exercises “Rules of Probability” “20-sided die”

22 Exercises (cont.) “Computer sales”

23 Outline Probabilities I Probabilities II Permutations and combinations Discrete random variables and distributions Specific discrete distributions Continuous random variables and their distribution I Continuous random variables and their distribution II Estimation and confidence intervals Statistical hypothesis test

24 Adding probabilities Object of interest: Probability of observing event A OR event B: P(A ∪ B)

25 P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
Adding probabilities Event A, event B: any two events in the sample space Ω (A ⊆ Ω, B ⊆ Ω) General addition rule: P(A ∪ B) = P(A) + P(B) – P(A ∩ B) Example In a city, there are two daily newspapers, The Sun and The Post. 22 percent of all households read The Sun, 35 percent read The Post, and 6 percent read both. You choose a household at random: What is the probability that they read a newspaper? Illustrate your reasoning with a Venn diagram. Avoid double-counting!
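Worked out with the addition rule: P(reads a newspaper) = P(Sun ∪ Post) = 0.22 + 0.35 − 0.06 = 0.51.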

26 Multiplying probabilities
Object of interest: Probability of observing event A AND event B: P(A ∩ B) (joint probability)

27 Example: Tossing a coin twice
Independent events Definition Two events are called independent if the occurrence of either one does not affect the probability of the other one Example: Tossing a coin twice Two-stage random experiment (first toss, second toss) A = Head in the first toss B = Head in the second toss Probability of head in the first toss and head in the second toss?

28 Dependent events Definition
Two events are called dependent if the occurrence of one event affects the probability of the other event Example: Drawing two cards from a deck of 32 cards (without putting back the first card) Two-stage random experiment (first draw, second draw) A = King in the first draw B = King in the second draw Probability of drawing two kings?
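Worked out with the multiplication rule (introduced on slide 31): P(two kings) = P(A)·P(B|A) = 4/32 · 3/31 = 3/248 ≈ 0.012.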

29 Conditional probability
Probability of event B given that event A has occurred: P(B|A) = P(A ∩ B)/P(A), with P(A) > 0. P(B|A) > P(A ∩ B) if A ≠ Ω: additional information! Joint probability: no information on the occurrence of A; P(A ∩ B) is related to Ω. Conditional probability: information on the occurrence of A; P(A ∩ B) is related to A.

30 Conditional probability: Example “taxi”
A market research agency studied 50 taxi companies in Alphaville to find out about customer satisfaction. In their survey they differentiated between old (more than 10 years in business) and new (less than 10 years in business) taxi companies. What is the probability that a random passenger catches a taxi from a new company and that the service is good? A passenger enters a taxi from a new company. What is the probability that the service is good? A passenger complains about bad service. What is the probability that he caught a taxi from a new company?

                Good service   Bad service
Old companies        16             4
New companies        10            20
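A sketch of the three computations, reading probabilities as relative frequencies from the table (each of the 50 companies treated as equally likely to be caught):

```python
old_good, old_bad = 16, 4
new_good, new_bad = 10, 20
n = old_good + old_bad + new_good + new_bad          # 50 companies

p_new_and_good = new_good / n                        # joint: 10/50 = 0.20
p_good_given_new = new_good / (new_good + new_bad)   # conditional: 10/30 ≈ 0.33
p_new_given_bad = new_bad / (old_bad + new_bad)      # "inverse": 20/24 ≈ 0.83
print(p_new_and_good, p_good_given_new, p_new_given_bad)
```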

31 Multiplication rule A and B are any two events in the sample space Ω: P(A ∩ B) = P(A)·P(B|A) = P(B)·P(A|B)

32 Independent and dependent events revisited
Marginal (= unconditional) probability P(A) = probability of A; P(B) = probability of B Conditional probability P(A|B) = probability of A given B; P(B|A) = probability of B given A Independent events: conditional probability = unconditional probability: P(A|B) = P(A) ⟺ P(B|A) = P(B), hence P(A ∩ B) = P(A)·P(B) Dependent events: conditional probability ≠ unconditional probability: P(A ∩ B) = P(A)·P(B|A) = P(B)·P(A|B)

33 Marginal probabilities
“Taxi” example (cont.)

                Good service   Bad service   Σ
Old companies        16             4        20
New companies        10            20        30
Σ                    26            24        50

Marginal probabilities What is the probability that a random passenger catches a taxi from an old (new) company? What is the probability that a random passenger enjoys good (bad) service on the next taxi ride? Stochastic dependency Are service quality and company age stochastically dependent or independent?

34 Theorem of total probability
A1, A2, …, Ak: partition of the sample space Ω (in the slide’s diagram: k = 9). The Ai are mutually exclusive and their union equals Ω. Theorem of total probability: P(B) = P(B|A1)·P(A1) + P(B|A2)·P(A2) + … + P(B|Ak)·P(Ak), where each summand corresponds to P(Ai ∩ B).

35 Theorem of total probability: Example “construction”
The completion of a construction job may be delayed because of a strike. The probabilities are: 60 percent that there will be a strike; 85 percent that the construction job will be completed if there is no strike; 35 percent that the construction job will be completed if there is a strike. Events Event A1: There will be a strike. Event A2: There will be no strike. Event B: Construction job will be completed. Forecast: What is the probability that the construction job will be completed?
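Worked out with the theorem of total probability: P(B) = P(B|A1)·P(A1) + P(B|A2)·P(A2) = 0.35·0.6 + 0.85·0.4 = 0.21 + 0.34 = 0.55.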

36 Theorem of total probability: Example “bad parts”
Bayes’ Theorem

37 Combines the multiplication rule and the theorem of total probability
Bayes’ Theorem Combines the multiplication rule and the theorem of total probability. Interprets statistical dependency as cause and effect and allows us to identify probable causes for observed effects: P(Aj|B) = P(Aj)·P(B|Aj)/P(B) = P(Aj)·P(B|Aj) / Σi=1..k P(Ai)·P(B|Ai) It is known that B has occurred. What is the probability that B was caused by Aj? Thomas Bayes (1701–1761)

38 Interpreting Bayes’ Theorem using probability trees
[Probability tree: from the root, branch j leads to Aj with probability P(Aj) and on to B with probability P(B|Aj), giving the path probability P(Aj)·P(B|Aj).] The probability that event B was reached via the jth branch of the probability tree is the ratio of the probability associated with the jth branch to the sum of the probabilities associated with all k branches of the tree.

39 Bayes’ Theorem: Example “HIV test”
Germany 2013 80,000 infected persons out of 82,000,000 inhabitants HIV test 99.7 percent of all infected persons are correctly diagnosed (test result: pos.); 98.5 percent of all non-infected persons are correctly diagnosed (test result: neg.) Analysis A test result indicates an HIV infection (= event B). What is the probability that the tested person is actually infected (= event A1)? What would be the consequences of a compulsory, country-wide HIV test? Source: ZEIT ONLINE (Math up your life!): Ein einziger Aids-Test reicht nie zur Gewissheit (“A single AIDS test is never enough for certainty”), 1 December 2014.

40 Example “HIV test” (cont.)
Sample space (Ω) = German population; A1 (HIV+), A2 (HIV−), B (pos. HIV test)
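A sketch of the computation with the slide's figures; the result illustrates why a single positive test is far from certainty:

```python
p_inf = 80_000 / 82_000_000        # prior P(A1): infected
p_pos_inf = 0.997                  # sensitivity P(B|A1)
p_pos_noninf = 1 - 0.985           # false-positive rate P(B|A2)

p_pos = p_inf * p_pos_inf + (1 - p_inf) * p_pos_noninf   # total probability P(B)
p_inf_given_pos = p_inf * p_pos_inf / p_pos              # Bayes: P(A1|B)
print(round(p_inf_given_pos, 3))   # ≈ 0.061 — only about 6 % of positives are infected
```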

41 Outline Probabilities I Probabilities II Permutations and combinations Discrete random variables and distributions Specific discrete distributions Continuous random variables and their distribution I Continuous random variables and their distribution II Estimation and confidence intervals Statistical hypothesis test

42 Classical probability: Counting vs. combinatorics
P(A) = (number of favorable cases for event A)/(number of all possible cases) = |A|/|Ω| Example Rolling two fair dice. What is the probability of getting 7 as the sum of both dice (= event A)? A = {(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)} ⟹ |A| = 6. Ω = {(1,1),(1,2), …, (6,5),(6,6)} ⟹ |Ω| = 36. P(A) = 6/36 = 1/6 Two options Counting (and counting and counting …) Thinking = combinatorial analysis
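The counting option is easy to automate; a brute-force sketch of the two-dice example:

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))    # all 36 ordered pairs
A = [w for w in omega if sum(w) == 7]           # favorable cases
print(len(A), len(omega), len(A) / len(omega))  # 6 36 0.1666...
```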

43 Permutation = Arrangement of n distinct objects
Permutations Permutation = arrangement of n distinct objects Example “Colored cards” Distinct objects = three colored cards (n = 3). How many different arrangements exist for a sequence of these cards? Ω = {(red,blue,green),(blue,red,green),(green,blue,red), …} First position: 3 possibilities; second position: 2 (remaining) possibilities; third position: 1 (resulting) possibility. |Ω| = 3·2·1 = 3! = 6 Read: “3 factorial”

44 Example “Colored cards” (cont.)
(red, blue, green), (red, green, blue), (blue, red, green), (blue, green, red), (green, red, blue), (green, blue, red); |Ω| = 6

45 Exercise “Soccer team line-up”
How many ways are there to line up the players of a soccer team (6 in the back row, 5 in the front row)?

46 Example colored cards (cont.)
Choosing only two out of the three cards N = 3 (number of objects to choose from) n = 2 (number of objects in each permutation) Drawing cards from the stack with and without replacement With replacement = a color can occur more than once Without replacement = a color cannot occur more than once How many possibilities exist to choose two cards when … … cards are not replaced and the order matters? … cards are replaced and the order matters? … cards are not replaced and the order does not matter? … cards are replaced and the order does not matter?

47 Example colored cards (cont.)
Results (N = 3 cards, n = 2 drawn):

                        Without replacement   With replacement
Order matters           a) 3!/(3−2)! = 6      b) 3² = 9
Order does not matter   c) C(3,2) = 3         d) C(3+2−1, 2) = 6

48 The workhorse of combinatorics: The urn model
Mental model to systemize the number of possible permutations if n out of N numbered balls are drawn. Sometimes the order of the balls matters, sometimes it does not. Sometimes the balls are drawn and put back into the urn (= with replacement), sometimes they are drawn without replacement.

                        Without replacement   With replacement
Order matters           N!/(N−n)!             N^n
Order does not matter   C(N, n)               C(N+n−1, n)

49 The Binomial Coefficient
Number of ways to choose an (unordered) subset of n elements from a fixed set of N elements: C(N, n) = N!/(n!·(N−n)!) Read: “N choose n”

50 Example “German national lottery”
Six balls are randomly drawn without replacement from a drum containing 49 numbered balls (numbered 1, 2, …, 49). The order is irrelevant. How many possible outcomes are there? What is the probability of getting all 6 numbers right? What is the probability of getting 5 out of the drawn 6 right?
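A sketch of the lottery computations, using the binomial coefficient from the previous slide:

```python
from math import comb

n_outcomes = comb(49, 6)                         # 13,983,816 possible draws
p_six = 1 / n_outcomes                           # all 6 numbers right
p_five = comb(6, 5) * comb(43, 1) / n_outcomes   # 5 right, 1 from the other 43
print(n_outcomes, p_six, p_five)                 # p_five ≈ 1.8e-05
```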

51 Outline Probabilities I Probabilities II Permutations and combinations Discrete random variables and distributions Specific discrete distributions Continuous random variables and their distribution I Continuous random variables and their distribution II Estimation and confidence intervals Statistical hypothesis test

52 Values of a random variable (realizations, observations)
Random variables Random variable Rule that assigns numbers to the outcomes of a random experiment; denoted by a capital letter (e.g. X); domain: sample space (Ω); codomain: ℝ; in short, X: Ω → ℝ Values of a random variable (realizations, observations) Usually real-valued; denoted by the corresponding lowercase letters (e.g. x1, x2, …)

53 Random variables: Examples
Example 1 Random experiment: Rolling two dice; Ω = {(1,1), (1,2), …, (6,6)} Random variable: X = sum of the numbers shown on both dice; X ∈ {2,3,4,…,12} Example 2 Random experiment: A coin is tossed twice Random variable: Y = number of heads; Y ∈ {0,1,2}

54 Random variables: Examples (cont.)
Example 3 Random experiment: Waiting at a bus stop (bus frequency: 20 min) Random variable: X = waiting time until the next bus arrives, in minutes; X ∈ [0;20] Example 4 Random experiment: A married couple is surveyed Random variable: Y = joint income of both partners in euro; Y ∈ [0;∞[ Example 5 Random experiment: A lightbulb is chosen from the production process Random variable: Z = durability of the chosen lightbulb; Z ∈ [0;∞[

55 Discrete and continuous random variables
Discrete random variable Domain has a countable (finite or countably infinite) number of realizations; examples 1 and 2 Continuous random variable Domain has an uncountably infinite number of realizations; examples 3 to 5 More on continuous random variables in chapters 6 and 7

56 Probability functions
Probability function f(x) Assigns a probability to every value x of a discrete random variable X: f(xi) = P(X = xi) = pi Can be a table, a function rule, or a mathematical formula Example: number of heads in two coin tosses Table: x = 0, 1, 2 with f(x) = 0.25, 0.5, 0.25 Function rule: f(x) = 0.25 for x = 0; 0.5 for x = 1; 0.25 for x = 2; 0 elsewhere Mathematical formula: f(x) = (2 − |x − 1|)/4

57 Requirements for probability functions
(1) For each value x there is exactly one function value f(x) (2) For all values x the function values are between 0 and 1: 0 ≤ f(x) ≤ 1 (3) The sum of the function values over all n realizations is always 1: Σi=1..n f(xi) = 1 Exercise Find the probability function for example 1 and check whether the above requirements are met. Can the function given by f(x) = x for x ∈ {1,2,3,4,5} serve as the probability function of a discrete random variable?

58 Distribution functions
Distribution function F(x) Cumulated probability function Gives the probability that the random variable is less than or equal to some real number x: F(x) = P(X ≤ x) = Σ(xi ≤ x) f(xi), with F(−∞) = 0 and F(+∞) = 1 Example: number of heads in two coin tosses f(x) = 0.25 for x = 0; 0.5 for x = 1; 0.25 for x = 2; 0 elsewhere F(x) = 0 for x < 0; 0.25 for 0 ≤ x < 1; 0.75 for 1 ≤ x < 2; 1 for x ≥ 2 F(0) = f(0); F(1) = f(0) + f(1); F(2) = f(0) + f(1) + f(2)

59 Example 1 (sum of numbers shown by two dice): f(x) and F(x)

60 Describing probability distributions
Characteristic parameters of a random variable Expected value: E(X) = μ Variance: V(X) = σ²

61 Expected value E(X) Properties
Corresponds to the arithmetic mean in descriptive statistics Represents the average value we would get if we repeated the random experiment many times Weighs all realizations by their probabilities: E(X) = μ = Σi xi·f(xi) Properties E(c) = c with c = const. E(X + c) = E(X) + c E(cX) = c·E(X)

62 Variance and standard deviation
V(X) Describes the scatter of realizations around the expected value: the average squared deviation from the expected value if the random experiment were repeated many times Weighs all squared deviations by their probabilities: V(X) = σ² = E((X − μ)²) = Σi=1..n (xi − μ)²·f(xi) = Σi=1..n xi²·f(xi) − μ² = E(X²) − μ² Standard deviation: σ = √V(X) Properties V(c) = 0 with c = const. V(X + c) = V(X) V(cX) = c²·V(X)

63 Exercise: Average deviation from expected value
Why does the simple (= non-squared) average deviation from the expected value not tell us anything about the scatter of realizations around their mean? Σi=1..n (xi − μ)·f(xi) = …

64 Example 1: Mean and variance
kooths-stochastics-Chapter4-Example1.xlsx
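The spreadsheet itself is not reproduced here, but the same computation takes only a few lines; a sketch for X = sum of two dice:

```python
from fractions import Fraction
from itertools import product

# Build the probability function f(x) of X = sum of two dice.
f = {}
for a, b in product(range(1, 7), repeat=2):
    f[a + b] = f.get(a + b, Fraction(0)) + Fraction(1, 36)

mu = sum(x * p for x, p in f.items())               # E(X) = 7
var = sum((x - mu) ** 2 * p for x, p in f.items())  # V(X) = 35/6
print(mu, var, float(var))                          # 7 35/6 5.833...
```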

65 Outline Probabilities I Probabilities II Permutations and combinations Discrete random variables and distributions Specific discrete distributions Continuous random variables and their distribution I Continuous random variables and their distribution II Estimation and confidence intervals Statistical hypothesis test

66 Usefulness of specific discrete distributions
Family of similar random experiments (principle) Application to concrete problems by adapting parameters Distributions Binomial Hypergeometric Poisson

67 Example: Repeated flipping of a coin (fair or biased)
Bernoulli experiment Random experiment with two disjoint elementary events (success, failure) Any two experiments are statistically independent (drawing with replacement) The probability of success p is constant (consequently, the probability of failure q = 1 − p is constant too) Example: Repeated flipping of a coin (fair or biased) Two disjoint outcomes: head or tail The “coin has no memory”, so repeated runs are independent The probability of head is constant, and so is the probability of tail: p = q = 0.5 (fair coin); p ≠ 0.5, q = 1 − p ≠ 0.5 (biased coin) Jakob Bernoulli (1655–1705)

68 Experimental learning
Rolling a fair die 8 times (or 8 fair dice simultaneously) X = number of dice showing a 6

69 Experimental learning
Round | Number of dice showing “6” | Relative frequency — rows for rounds 1 to 10, to be filled in during the experiment

70 Binomial distribution
Repeated Bernoulli experiments p = const. = probability of success; n = number of experiments Random variable X = number of successful outcomes; P(X=x): probability of x successful outcomes Solution Multiplication rule for independent events plus combinatorics (urn model: irrelevant order, without replacement): P(X=x|n,p) = C(n,x)·p^x·q^(n−x) = C(n,x)·p^x·(1−p)^(n−x), where C(n,x) is the number of possible outcomes with X = x and p^x·q^(n−x) is the probability of one such outcome

71 Bernoulli experiment: Urn model interpretation
Urn with pN black and (1-p)N white balls Probability of drawing x black balls order does not matter with replacement Note: Illustration for random experiment, NOT for combinations of possible outcomes!

72 Binomial distribution: Table
Probabilities of all x for typical values of n and p. Example “customers”: The probability that a randomly selected customer makes a purchase is 20 percent. What is the probability that 4 out of 6 customers make a purchase? What is the probability that no more than 3 customers decide to buy? What is the probability that at least one customer makes a purchase?
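Instead of reading a printed table, the “customers” probabilities can be computed directly; a minimal sketch:

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) for a binomial random variable
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 6, 0.2
print(binom_pmf(4, n, p))                         # P(X = 4) ≈ 0.0154
print(sum(binom_pmf(x, n, p) for x in range(4)))  # P(X ≤ 3) ≈ 0.9830
print(1 - binom_pmf(0, n, p))                     # P(X ≥ 1) ≈ 0.7379
```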

73 Binomial distribution: Expected value and variance
Expected value: E(X) = np Variance: V(X) = npq = np(1−p) Exercises Calculate and interpret the expected value and the variance for the customers example on the previous slide. Calculate and interpret the expected value and the variance for p = 1 and for p = 0.

74 Hypergeometric distribution
Urn with M black and N−M white balls; n balls are drawn Success = drawing a black ball; X = number of successes Probability of drawing X = x black balls: order does not matter, without replacement ⟹ different from the binomial distribution: p is no longer constant, as the draws are statistically dependent P(X=x|N,M,n) = C(M,x)·C(N−M,n−x)/C(N,n)

75 Hypergeometric distribution: Examples
Example “employees” Out of 6 employees, three have been with the company for more than 5 years. If we now randomly select 4 employees, what is the probability that exactly two of them have been with the company for more than 5 years? What is N, what is M, what is n? Example “lottery 6 out of 49” What is the probability of guessing four numbers right in a lottery where 6 out of 49 numbered balls are drawn?
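Both examples, checked with a small helper for the hypergeometric probability function:

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    # P(X = x): n balls drawn without replacement from N balls, M of them "successes"
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

print(hypergeom_pmf(2, N=6, M=3, n=4))   # employees: 0.6
print(hypergeom_pmf(4, N=49, M=6, n=6))  # lottery, exactly 4 right ≈ 0.00097
```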

76 Hypergeometric distribution: Expected value and variance
Expected value: E(X) = n·M/N Variance: V(X) = n·(M/N)·(1 − M/N)·(N−n)/(N−1)

77 Poisson distribution: P(X=x|λ) = λ^x·e^(−λ)/x!, E(X) = V(X) = λ
Poisson distribution Probability of a number of events in a fixed interval of time/space/distance for which an average number of λ events can be expected Two events do not occur at exactly the same time The probability of an event in a very short interval is very small (distribution of rare events) The probability of a success is proportional to the size of the interval For mutually exclusive intervals, the numbers of successes are independent P(X=x|λ) = λ^x·e^(−λ)/x!; E(X) = V(X) = λ Siméon Denis Poisson (1781–1840)

78 Poisson distribution: Examples
Events: Phone calls arriving at a telephone hotline. λ = average number of calls arriving per hour. Events: Defects in the insulation of an undersea cable. λ = average number of defects per kilometer of cable. Events: Houses sold by a real estate company. λ = average number of houses sold per day. Events: Electrons emitted by a piece of radioactive material. λ = average number of electrons emitted per minute Events: Defects in sheet metal. λ = Average number of defects per square meter of metal.

79 Poisson distribution for approximations
Binomial distribution n → ∞ (high number of repetitions, n > 10), p → 0 (small probability of each event, p < 0.05); use λ = np Hypergeometric distribution n is large (high number of repetitions, n > 10), M/N is small (M/N < 0.05), and N is large relative to n (n/N < 0.05); use λ = n·M/N

80 Poisson distribution: Exercise “subway crime”
It is known that in the subway system of Alphaville, there are on average four pickpocketing incidents per hour. What is the probability … a) … that there are no pickpocketing incidents in a given hour? b) … that there are seven pickpocketing incidents in a given hour? c) … that there are 50 incidents in a given day?
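A sketch for the three questions; for c) the daily rate is λ = 24·4 = 96:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # P(X = x) for a Poisson random variable with mean lam
    return lam**x * exp(-lam) / factorial(x)

print(poisson_pmf(0, 4))        # a) no incidents in an hour ≈ 0.0183
print(poisson_pmf(7, 4))        # b) seven incidents in an hour ≈ 0.0595
print(poisson_pmf(50, 24 * 4))  # c) 50 incidents in a day ≈ 9e-08
```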

81 Outline Probabilities I Probabilities II Permutations and combinations Discrete random variables and distributions Specific discrete distributions Continuous random variables and their distribution I Continuous random variables and their distribution II Estimation and confidence intervals Statistical hypothesis test

82 Discrete and continuous random variables (recap)
Discrete random variable Domain has a countable (finite or countably infinite) number of realizations; examples 1 and 2 (slide 53) Each value of X can be assigned a probability P(X=x) = f(x): probability function Continuous random variable Domain has an uncountably infinite number of realizations; examples 3 to 5 (slide 54) Each individual value of X has a probability of zero ⟹ focus on intervals: density function

83 Density function Definition
Areas under the curve of this function give the probabilities associated with the corresponding intervals along the x-axis Example “bus stop” Busses depart every ten minutes from a bus stop. If you arrive at the bus stop at a random time, the waiting time X until the next bus departs is a continuous random variable which can take any value between zero minutes and ten minutes: f(x) = 1/10 for 0 ≤ x ≤ 10, 0 otherwise Sketch f(x) and determine (visually) the probability that the waiting time is between five and seven minutes

84 Density function f(x) and distribution function F(x)
P(a ≤ X ≤ b) = ∫[a;b] f(x) dx, with f(x) ≥ 0 and P(−∞ < X < +∞) = ∫(−∞;+∞) f(x) dx = 1 Distribution function F(x) = P(X ≤ x) = ∫(−∞;x] f(t) dt

85 Expected value and variance
Expected value: E(X) = μ = ∫(−∞;+∞) x·f(x) dx Variance: V(X) = σ² = E((X − μ)²) = ∫(−∞;+∞) (x − μ)²·f(x) dx = ∫(−∞;+∞) x²·f(x) dx − μ² Example “bus stop” (cont.) Calculate the expected value and the variance for the waiting time at the bus stop.

86 Uniform distributions
Definition Density function is constant over some interval [a;b]: f(x) = 1/(b−a) for a ≤ x ≤ b, 0 otherwise Expected value: E(X) = μ = ∫ x·f(x) dx = (a+b)/2 Variance: V(X) = σ² = E((X − μ)²) = ∫ (x − μ)²·f(x) dx = (b−a)²/12

87 Normal distribution (= Gauss distribution)
Cornerstone of modern statistical theory Many random processes follow a normal distribution Other distributions can be approximated by the normal distribution (under certain conditions) Sample means are approximately normally distributed for sufficiently large sample sizes (n > 30) Discovered in the 18th century (typical pattern for measurement errors) Abraham de Moivre (1667–1754); Carl Friedrich Gauß (1777–1855)

88 Normal distribution: The (magic) formula
A random variable X has a normal distribution (= is normally distributed) if its density function is given by N(x|μ; σ²) = f(x) = 1/(σ·√(2π)) · e^(−(1/2)·((x−μ)/σ)²) Expected value: μ Variance: σ² Bell-shaped graph

89 How μ and σ affect the shape of the Gaussian density function
Example: μ = 4 and σ = 1.5

90 How μ and σ affect the shape of the Gaussian density function
μ shifts the graph horizontally

91 How μ and σ affect the shape of the Gaussian density function
σ stretches or compresses the graph around its center

92 Z-transformation (standardization): The standard normal distribution
Z-transformation: Z = (X − μ)/σ Expected value: E(Z) = 0; variance: V(Z) = 1 Simplifies tabulation of values of the normal distribution Step 1: standardize the normally distributed X Step 2: look up probabilities of the standardized normal distribution

93 Standard normal distribution: Table

94 Exercises X is an N(10; 25)-distributed variable. Use the table of the standard normal distribution and determine: P(0 ≤ X ≤ 11), P(8 ≤ X ≤ 12), P(X ≥ 15). A radio station has conducted a survey among 760 listeners. The result is that a certain rock program is listened to for 10 minutes on average. The standard deviation is 2 minutes. Why is the information on the number of surveyed listeners important? What percentage of all the listeners is tuned in for a duration between 9 and 11 minutes?
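A sketch for checking the N(10; 25) results, with the standard normal CDF Φ computed from the error function:

```python
from math import erf, sqrt

def phi(z):
    # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 10, 5   # N(10; 25): the variance is 25, so σ = 5
print(phi((11 - mu) / sigma) - phi((0 - mu) / sigma))  # P(0 ≤ X ≤ 11) ≈ 0.557
print(phi((12 - mu) / sigma) - phi((8 - mu) / sigma))  # P(8 ≤ X ≤ 12) ≈ 0.311
print(1 - phi((15 - mu) / sigma))                      # P(X ≥ 15) ≈ 0.159
```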

95 Exercise “electronic parts”
The life span of an electronic part is normally distributed and the expected use time is 2,000 hours while the standard deviation is 200 hours. Illustrate the z-transformation that turns this distribution into a standard normal distribution with the help of a sketched diagram. Determine the probability that one randomly chosen part has a use time between 2,000 and 2,400 hours. Do this both graphically with your sketched diagram and with the help of the table of the standard normal distribution. What is the probability that the life span is less than 1,800 hours? What is the probability that the life span is greater than 2,200 hours? What is the probability that the life span is greater than hours and less than hours? Which use time will not be exceeded with a probability of 90 percent? What is the maximum use time of those 20 percent of parts with the lowest quality (i.e. use time).

96 Outline Probabilities I Probabilities II Permutations and combinations Discrete random variables and distributions Specific discrete distributions Continuous random variables and their distribution I Continuous random variables and their distribution II Estimation and confidence intervals Statistical hypothesis test

97 Exponential distribution
Random variable T Time that elapses between two single events that follow a Poisson distribution P(T ≤ t) = 1 − e^(−λt) = 1 − 1/e^(λt) E(T) = 1/λ; V(T) = 1/λ²

98 On average an office receives 5 calls per hour.
Examples On average an office receives 5 calls per hour. What is the probability that there will be one incoming call in the next 30 minutes? A speed control identifies 4 speeding incidents per hour on average. What is the probability that they find one speeder in the next 20 minutes?
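A quick check of both examples with the exponential distribution function (time measured in hours):

```python
from math import exp

def exp_cdf(t, lam):
    # P(T ≤ t): waiting time until the next Poisson event
    return 1 - exp(-lam * t)

print(exp_cdf(0.5, 5))    # office: a call within 30 minutes ≈ 0.918
print(exp_cdf(1 / 3, 4))  # speed control: a speeder within 20 minutes ≈ 0.736
```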

99 Approximations of distributions
Approximations are generally used if the actual distribution is difficult to compute. The quality of an approximation always depends on the parameters of the actual distribution. An approximation is never perfect (trade-off between accuracy and calculation effort), but for most purposes and under most circumstances it is good enough and saves a lot of work. Review: Poisson distribution (slide 79) as an approximation of the binomial and hypergeometric distributions.

100 Approximations using the normal distribution
Binomial distribution (for large sample sizes) n ≥ 30, np ≥ 5, nq = n(1−p) ≥ 5 ⟹ normal distribution with μ = np, σ = √(npq) Poisson distribution (for large λ) λ ≥ 10 ⟹ normal distribution with μ = λ, σ = √λ

101 Approximations for the Binomial distribution: Example 1
Random experiment: a coin is tossed 50 times. What is the probability of obtaining 20 tails? Exact solution P(X=20|n=50, p=0.5) = C(50,20)·0.5^20·0.5^30 ≈ 0.0419 Approximation n = 50 ≥ 30, np = 50·0.5 = 25 ≥ 5, nq = 50·0.5 = 25 ≥ 5 ⟹ P(X=20|n=50, p=0.5) ≈ P(19.5 ≤ X ≤ 20.5 | μ = 25, σ = √12.5 ≈ 3.54); z1 = (19.5 − 25)/3.54 ≈ −1.56, z2 = (20.5 − 25)/3.54 ≈ −1.27; P(z1 ≤ Z ≤ z2) ≈ 0.1020 − 0.0594 = 0.0426

102 Approximations for the Binomial distribution: Example 2
Random experiment: a coin is tossed 50 times. What is the probability of obtaining anything between 20 and 35 tails? Exact solution … (very tiring) Approximation n = 50 ≥ 30, np = 25 ≥ 5, nq = 25 ≥ 5 ⟹ P(20 ≤ X ≤ 35|n=50, p=0.5) ≈ P(19.5 ≤ X ≤ 35.5 | μ = 25, σ ≈ 3.54); z1 = (19.5 − 25)/3.54 ≈ −1.56, z2 = (35.5 − 25)/3.54 ≈ 2.97; P(z1 ≤ Z ≤ z2) ≈ 0.9985 − 0.0594 = 0.9391

103 Approximations for the Binomial distribution: Example 3
The call of a sales agent ends with a sale in 20 percent of the cases. If he makes 30 phone calls, what is the probability that he makes at least 10 sales?
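A sketch comparing the exact binomial tail with the normal approximation (with continuity correction); n = 30, np = 6 and nq = 24 satisfy the rules of thumb above, but the fit is still rough:

```python
from math import comb, erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 30, 0.2
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 6 and ≈ 2.19

exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(10, n + 1))
approx = 1 - phi((9.5 - mu) / sigma)       # continuity correction: X ≥ 10 → X ≥ 9.5
print(exact, approx)                       # ≈ 0.0611 vs ≈ 0.0551
```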

104 Poisson distribution for increasing values of λ

105 Outline Probabilities I Probabilities II Permutations and combinations Discrete random variables and distributions Specific discrete distributions Continuous random variables and their distribution I Continuous random variables and their distribution II Estimation and confidence intervals Statistical hypothesis test

106 Statistical inference
Sample —(statistical inference)→ population: collecting a sample and analyzing the sample data to infer properties (e.g. mean value) of a population, which is larger than the observed sample data set Point estimate Interval estimate Hypothesis testing (next chapter) Less precise than a full census, but cheaper/faster/feasible

107 Samples and populations
Statistical inference runs from the sample (size = n) to the population (size = N) Samples have statistics (Latin letters): sample mean x̄, sample variance s², sample standard deviation s — these are random variables! Populations have parameters (Greek letters): mean μ, variance σ², standard deviation σ — typically unknown but of interest

108 Sample statistics and point estimation
Sample mean: x̄ = (1/n)·Σi=1..n xi Sample variance: s² = (1/(n−1))·Σi=1..n (xi − x̄)² Sample standard deviation: s = √(s²) When we use the value of a sample statistic to estimate a population parameter, we call this point estimation and we refer to the statistic as a point estimator of the parameter. These statistics are random variables (the sample variance is meaningful for n > 1 only).

109 Desired characteristics of point estimators
Unbiasedness: Expected value of point estimator = population parameter (no systematic tendency to underestimate or overestimate the truth) Small standard error* (= high efficiency): High concentration of point estimator around population parameter *standard error = standard deviation of the point estimator

110 Example There are five houses in Betastreet of Alphaville (= population). The number of children per house is as follows: … Population mean: μ = 1. For budgetary reasons, a survey company can only sample two houses in the street (n = 2). If the randomly chosen houses are #3 and #4, then the sample mean is x̄ = 0.5

111 Unbiasedness of a point estimator: Example (cont.)
Expected value of point estimator = population parameter: E(x̄) = 1 = μ ⟹ x̄ is an unbiased estimator of μ

112 Law of large numbers and standard error
The larger the sample size, the higher the precision of the point estimators Law of large numbers: the sample mean approaches the population mean as the sample size grows Standard error of sample means (= standard deviation of the distribution of the sample means): σx̄ = σ/√n

113 Interval estimation: Confidence intervals for the mean
A point estimator says nothing about its precision Interval estimation: finding the range (based on a sample) that contains the true parameter of the population with a given probability (= confidence level) Requires information on the distribution of the point estimator (here: sample mean)

114 The statisticians’ workhorses
Normal distribution for the sample mean Sample size is large (n ≥ 30): central limit theorem (see next slide) Sample size is small (n < 30), the population is normally distributed, and σ is known Student’s t-distribution for the sample mean Sample size is small (n < 30), the population is normally distributed, and σ is unknown

115 Central limit theorem Draw a random sample of size n from any population with mean μ and standard deviation σ. When n > 30, the random variable x̄ (= sample mean) follows a normal distribution with μx̄ = μ and σx̄ = σ/√n, regardless of the population distribution from which the data are sampled.
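A small simulation sketch of the theorem, using die rolls (a decidedly non-normal population) and only the Python standard library:

```python
import random
from statistics import mean, stdev

random.seed(1)
n = 36
# 10,000 sample means, each from n rolls of a fair die (μ = 3.5, σ ≈ 1.708)
means = [mean(random.randint(1, 6) for _ in range(n)) for _ in range(10_000)]

# Theory predicts mean ≈ 3.5 and standard error σ/√n ≈ 0.285
print(mean(means), stdev(means))
```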

116 Normal distribution and standard normal distribution
[Figure: normal density with x-axis marks at μ ± 1σ, μ ± 2σ, μ ± 3σ and the corresponding z-axis (−3 … +3) of the standard normal distribution]

117 Solving Z-transformation for population mean
Z-transformation: Z = (x̄ − μ)/σx̄ ⟹ μ = x̄ − σx̄·z Confidence level ⟹ z-value (from table): 90 percent ⟹ z1 = ±1.64; 95 percent ⟹ z2 = ±1.96; 99 percent ⟹ z3 = ±2.58 Confidence intervals P(x̄ − σx̄·z1 ≤ μ ≤ x̄ + σx̄·z1) = 0.90 P(x̄ − σx̄·z2 ≤ μ ≤ x̄ + σx̄·z2) = 0.95 P(x̄ − σx̄·z3 ≤ μ ≤ x̄ + σx̄·z3) = 0.99

118 Confidence intervals: Example “exam results”
An exam was taken by 194 students (= population) Arithmetic mean of results: μ = 64.12 Standard deviation of results: σ = 27.73 10 samples of size n = 30 Confidence intervals for confidence levels of 90 %, 95 %, 99 %
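A sketch of one such interval; the sample mean x̄ = 60.5 is an assumed value for illustration (the slide's actual samples are not reproduced here):

```python
from math import sqrt

sigma, n = 27.73, 30
x_bar = 60.5                  # hypothetical sample mean
se = sigma / sqrt(n)          # standard error ≈ 5.06
z = 1.96                      # 95 % confidence level
print(x_bar - z * se, x_bar + z * se)  # ≈ (50.6, 70.4): encloses μ = 64.12
```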

119 Confidence intervals: Example “exam results” (cont.)
With increasing confidence level the intervals become wider The intervals differ from sample to sample The true value is not always enclosed by the interval

120 „Vertrauen ist gut, Konfidenz ist besser“ (“Trust is good, confidence is better”)
Source: Oestreich and Romberg (2010)

121 Survey design: Precision and necessary sample size
Solving the Z-transformation for n (if σ is known): Z = (x̄ − μ)/σx̄ = (x̄ − μ)/(σ/√n) = A/(σ/√n) ⟹ n = (z·σ/A)², where A is the precision (= acceptable deviation) and z reflects the degree of certainty

122 Excursus: Student’s t-distribution (n < 30, σ unknown)
[Figure: density of the standard normal distribution vs. Student’s t-distribution for small degrees of freedom v = n − 1; the t-distribution has heavier tails and approaches the standard normal distribution as v grows]

123 Outline Probabilities I Probabilities II Permutations and combinations Discrete random variables and distributions Specific discrete distributions Continuous random variables and their distribution I Continuous random variables and their distribution II Estimation and confidence intervals Statistical hypothesis test

124 Statistical inference
Sample —(statistical inference)→ population: collecting a sample and analyzing the sample data to infer properties (e.g. mean value) of a population, which is larger than the observed sample data set Point estimate Interval estimate Hypothesis testing Less precise than a full census, but cheaper/faster/feasible Hypothesis testing involves calculations similar to those for confidence intervals

125 General approach to hypothesis testing
Hypothesis for the value of a population parameter Example: mean filling level of beer bottles is 0.5 litres (μ0) Take a random sample and calculate the sample parameter (x̄) Example: 100 bottles are checked for filling level, x̄ = 0.498 Check whether the sample parameter is in line with μ0: using probability distributions, one evaluates deviations between the hypothetical value and the sample value. How likely is the observed sample value? Goal: rejecting the so-called null hypothesis (H0) in favor of the alternative hypothesis (H1) Court analogy: not guilty (H0) vs. guilty (H1)

126 Error types and level of significance
                     Reality: H0 is true              Reality: H1 is true
Reject H0            Type I error, P(Type I) = α      Correct decision
Do not reject H0     Correct decision                 Type II error, P(Type II) = β

Statistical evidence is based on probability distributions, hence the judgement is never 100 percent certain! α = level of significance (= error probability)

127 Test design: Two-sided and one-sided
Two-sided tests H0: μ = μ0; H1: μ ≠ μ0. The error probability is α/2 for sample values that are too high and α/2 for sample values that are too low One-sided tests (upper bound) H0: μ ≥ μ0; H1: μ < μ0. The error probability is α for sample values that are too low One-sided tests (lower bound) H0: μ ≤ μ0; H1: μ > μ0. The error probability is α for sample values that are too high

128 Rationale of test statistics
The test is based on a sample statistic that estimates the population parameter that appears in the hypotheses. Usually this is the same estimate that we would use in a confidence interval for the parameter. If H0 is true, then the estimate should take a value near the parameter value specified by H0. Values of the estimate far from the parameter value specified by H0 give evidence against H0. The alternative hypothesis determines which directions count against H0. To assess how far the estimate is from the parameter, we must standardize the estimate. In many common situations the test statistic (standardized estimate) has the form: test statistic = (estimate − hypothesized value) / (standard deviation of the estimate)
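As a sketch of this standardization, the beer-bottle example from slide 125 can be run as a one-sample z-test; the population standard deviation σ = 0.010 litres is an assumed value, since the slides do not give one:

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu0, x_bar, n = 0.500, 0.498, 100
sigma = 0.010                           # hypothetical population std. deviation
z = (x_bar - mu0) / (sigma / sqrt(n))   # standardized test statistic = -2.0
p_value = phi(z)                        # one-sided (too-low) test ≈ 0.023
print(z, p_value)                       # p < 0.05: reject H0 at the 5 % level
```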

129 One-sided tests: Example

130 Critical value and rejection region
Test design (H0, H1) Level of significance (α = probability of Type I errors) Assumptions on the probability distribution in the population Critical value for rejecting H0 (separates the rejection region from the rest of the data space) Value of the test statistic (x̄L) Value of the standardized test statistic (z)

131 Critical value: Example

132 Critical value: Example (cont.)

133 Identical decision for rejection of H0
Critical value approach Given level of significance ⟹ critical value if H0 holds; compare the value of the test statistic with the critical value p-value approach Probability of observing the test statistic if H0 holds (= p-value); compare the p-value with the level of significance Both approaches lead to the identical decision on rejecting H0

134 Critical values and p-value approach: Example

135 Trade-off between Type I and Type II errors
Calculation of Type II errors requires a crisp value for the alternative hypothesis Example H0: μ = 170; H1: μ = 180; α = 0.05; σ = 65; n = 400
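A minimal sketch of the β computation for this example, assuming a one-sided test that rejects H0 for large sample means at α = 0.05:

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu0, mu1, sigma, n = 170, 180, 65, 400
se = sigma / sqrt(n)             # 3.25
x_crit = mu0 + 1.645 * se        # critical value for α = 0.05 ≈ 175.35
beta = phi((x_crit - mu1) / se)  # P(do not reject H0 | μ = 180)
print(x_crit, beta)              # β ≈ 0.076
```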

136 Summary: Steps for hypothesis testing
Formulate hypotheses (H0 and H1, one-sided or two-sided) Sample data from the population of interest Compute the sample statistics you need from the data Compute the test statistic from the sample statistics Convert the test statistic into a probability (p-value) Formulate a conclusion regarding H0

