Published by Ralph Newman, modified over 6 years ago
Quantitative Methods in Network Research, Prof. Ran Giladi, Department of Communication Systems Engineering
Basic Statistics Review
Goal of this chapter: to go through the basics of statistics again, mainly:
- Confidence level
- Hypothesis testing
- Significance level
- Student's t-test
- Goodness of fit: the χ² test
Fundamentals – or go back to 1st year
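Given n random variables $X_1, \dots, X_n$:
- The mean (expected value) of $X_i$ is $\mu_i$ or $E[X_i]$.
- The variance of $X_i$ is $\sigma_i^2$ or $\mathrm{Var}(X_i) = E[(X_i - \mu_i)^2]$; the standard deviation is $\sigma_i = \sqrt{\sigma_i^2}$.

Suppose $X_1, \dots, X_n$ are IID random observations with mean $\mu$ and variance $\sigma^2$.
- The sample mean $\bar{X}(n) = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an unbiased estimator of $\mu$.
- The sample variance $S^2(n) = \frac{1}{n-1}\sum_{i=1}^{n} [X_i - \bar{X}(n)]^2$ is an unbiased estimator of $\sigma^2$.

[Figure: mean, median and mode of a distribution]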
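A minimal sketch of the two estimators, assuming the observations are already in a Python list. The helper names `sample_mean` and `sample_variance` are illustrative; the standard `statistics` module is used only to cross-check (its `variance` also divides by n - 1) and for the median and mode.

```python
import statistics

def sample_mean(xs):
    """X-bar(n) = (1/n) * sum of the X_i."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """S^2(n) = sum of (X_i - X-bar(n))^2 / (n - 1), the unbiased estimator of sigma^2."""
    n = len(xs)
    if n < 2:
        raise ValueError("S^2(n) needs at least two observations")
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
print(sample_mean(data), sample_variance(data), statistics.variance(data))
print(statistics.median(data), statistics.mode(data))   # median 4.5, mode 4
```

Dividing by n - 1 instead of n is what makes $S^2(n)$ unbiased for IID data.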
The normal density curve
The height of a normal density curve at any point $x$ is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
The standard normal distribution is the normal distribution with mean 0 and standard deviation 1, denoted N(0,1). The standardized value of $x$ is defined as
$$z = \frac{x - \mu}{\sigma},$$
which is also called a z-score: it tells how far $x$ is from the mean, in units of standard deviations.
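The density formula and the z-score translate directly into code. In this sketch `normal_pdf` and `z_score` are illustrative names; the result is cross-checked against the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the N(mu, sigma^2) density curve at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def z_score(x, mu, sigma):
    """Number of standard deviations between x and the mean."""
    return (x - mu) / sigma

# Cross-check against the standard library's normal distribution.
print(normal_pdf(1.5, mu=1.0, sigma=2.0), NormalDist(1.0, 2.0).pdf(1.5))
print(z_score(130, mu=100, sigma=15))   # 2.0: two standard deviations above the mean
```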
The 68%, 95%, 99.7% rule
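The rule says that about 68%, 95% and 99.7% of a normal population lie within 1, 2 and 3 standard deviations of the mean. A short check with the standard library's `NormalDist`: after standardizing, the probability of landing within k standard deviations is Φ(k) − Φ(−k), whatever μ and σ are. `within_k_sigma` is an illustrative name.

```python
from statistics import NormalDist

def within_k_sigma(k):
    """P(|X - mu| < k * sigma) for normal X, i.e. Phi(k) - Phi(-k)."""
    std = NormalDist()   # N(0, 1)
    return std.cdf(k) - std.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within_k_sigma(k), 4))   # about 0.6827, 0.9545, 0.9973
```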
Point / interval estimation
A point / interval estimator draws inference about a population by estimating the value of an unknown parameter using a single value (point) or an interval.
[Figure: statistical inference. The population distribution has an unknown parameter (?); the sample distribution yields a point estimator and an interval estimator.]
Back to basics (or to 1st year)
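$\bar{X}(n)$ is itself a random variable: every sequence (replication) $i$ of observations generates its own $\bar{X}_i(n)$. So $E[\bar{X}(n)] = \mu$ and $\mathrm{Var}[\bar{X}(n)] = \sigma^2/n$. But we don't know $\sigma^2$, so we estimate it with $S^2(n)$: $\widehat{\mathrm{Var}}[\bar{X}(n)] = S^2(n)/n$.
However, in simulations the observations are correlated, so all of the above no longer holds. If the correlation $\rho_j$ between observations $j$ apart is positive, $S^2(n)$ has a negative bias:
$$E[S^2(n)] = \sigma^2\left[1 - \frac{2\sum_{j=1}^{n-1}(1 - j/n)\,\rho_j}{n-1}\right]$$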
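A simulation sketch of this bias. It assumes a stationary AR(1) model, which is not from the slides and is chosen only because its correlations are known exactly: $\rho_j = \phi^j$ and $\sigma^2 = 1/(1-\phi^2)$ for unit-variance noise. It averages $S^2(n)$ over many independent replications and compares $E[S^2(n)]/\sigma^2$ with the factor $1 - 2\sum_{j=1}^{n-1}(1 - j/n)\rho_j/(n-1)$. All function names are hypothetical.

```python
import math
import random
import statistics

def ar1_sequence(n, phi, rng):
    """Stationary AR(1) path: X_t = phi * X_{t-1} + e_t with e_t ~ N(0, 1).
    Its lag-j correlation is rho_j = phi**j and its variance is 1 / (1 - phi**2)."""
    x = rng.gauss(0.0, math.sqrt(1.0 / (1.0 - phi * phi)))   # start in steady state
    path = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def bias_factor(n, phi):
    """E[S^2(n)] / sigma^2 = 1 - 2 * sum_{j=1}^{n-1} (1 - j/n) * rho_j / (n - 1)."""
    s = sum((1 - j / n) * phi ** j for j in range(1, n))
    return 1 - 2 * s / (n - 1)

def simulated_factor(n, phi, reps=4000, seed=1):
    """Average S^2(n) over independent replications, divided by the true sigma^2."""
    rng = random.Random(seed)
    sigma2 = 1.0 / (1.0 - phi * phi)
    s2_values = [statistics.variance(ar1_sequence(n, phi, rng)) for _ in range(reps)]
    return statistics.fmean(s2_values) / sigma2

print(bias_factor(10, 0.9), simulated_factor(10, 0.9))   # both well below 1
```

With $\phi = 0.9$ and $n = 10$, $S^2(n)$ underestimates $\sigma^2$ by roughly a factor of three; with $\phi = 0$ (IID data) the factor is exactly 1.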
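Confidence level & interval for μ
To estimate $\mu$, a sample of size $n$ is drawn from the population and its mean $\bar{X}(n)$ is calculated. Under certain conditions $\bar{X}(n)$ is normally (or approximately normally) distributed; in particular, for large $n$ the central limit theorem gives
$$Z_n = \frac{\bar{X}(n) - \mu}{\sqrt{\sigma^2/n}} \approx N(0,1).$$
This means
$$P\left(-z_{1-\alpha/2} \le Z_n \le z_{1-\alpha/2}\right) \approx 1 - \alpha,$$
where $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of N(0,1) (1.960 for α = 0.05). Or, eventually, the approximate $100(1-\alpha)\%$ confidence interval for $\mu$ is:
$$\bar{X}(n) \pm z_{1-\alpha/2}\sqrt{\sigma^2/n}$$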
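A sketch of this interval for the case where σ is known, plus a small coverage check. The coverage simulation assumes normally distributed observations (an assumption, so here the interval is exact rather than approximate), and the names `z_interval` and `coverage` are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean

def z_interval(xbar, sigma, n, alpha=0.05):
    """X-bar(n) +/- z_{1-alpha/2} * sqrt(sigma^2 / n), for known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(sigma ** 2 / n)
    return xbar - half, xbar + half

def coverage(mu, sigma, n, alpha=0.05, trials=5000, seed=7):
    """Fraction of simulated intervals that contain the true mu (should be near 1 - alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        lo, hi = z_interval(xbar, sigma, n, alpha)
        hits += lo <= mu <= hi
    return hits / trials

print(z_interval(10.0, 2.0, 16))              # roughly (9.02, 10.98)
print(coverage(mu=10.0, sigma=2.0, n=16))     # close to 0.95
```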
W.S. “Student” Gosset (1876-1937)
1905
Back to confidence level & interval
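What if σ is unknown? We use $S^2(n)$ instead. If $n$ is large enough, we use
$$\bar{X}(n) \pm z_{1-\alpha/2}\sqrt{S^2(n)/n}.$$
For skewed (nonsymmetric) distributions, a larger $n$ is required. If the $X_i$ are normally distributed,
$$t_n = \frac{\bar{X}(n) - \mu}{\sqrt{S^2(n)/n}}$$
has a t-distribution with $n-1$ df (degrees of freedom). In this case, for any $n > 1$, the confidence level ($C = 1 - \alpha$) and interval can be determined exactly:
$$\bar{X}(n) \pm t_{n-1,\,1-\alpha/2}\sqrt{S^2(n)/n}$$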
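A sketch of the t-based interval, with the critical value read from a printed t-table rather than computed (the standard library has no t-distribution). The sample is hypothetical; comparing it with the 1.960 normal value shows why the large-n formula is too optimistic for small n. `t_interval` is an illustrative name.

```python
import math
import statistics

def t_interval(data, t_crit):
    """X-bar(n) +/- t_crit * sqrt(S^2(n) / n), where t_crit = t_{n-1, 1-alpha/2} comes from a t-table."""
    n = len(data)
    xbar = statistics.fmean(data)
    half = t_crit * math.sqrt(statistics.variance(data) / n)
    return xbar - half, xbar + half

data = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical normal sample, n = 5
print(t_interval(data, 2.776))      # exact 95% interval: t_{4, 0.975} = 2.776
print(t_interval(data, 1.960))      # large-n (z) version: too narrow for n = 5
```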
The Student t-distribution
Student's t applications
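Test whether a sample $X_i$ from a normally distributed population, $X = N(\mu, \sigma^2)$, is consistent with a given mean $\mu$: we have 10 measurements: 10.1, 10.1, 10.5, 10.7, 10.9, 11.1, 11.2, 11.2, 11.4 and …. The sample average is $\bar{X}(10)$. Could the population mean be 10.5? With the sample variance $S^2(10)$, checking for a mean of 10.5 gives $t = \frac{\bar{X}(10) - 10.5}{\sqrt{S^2(10)/10}} \approx 2.3$. From the tables of the Student t-CDF for 9 df, we find that Pr(t < 2.3; 9) = 97.65%. Since only 2.35% of the distribution supports this mean, it is unlikely.

Test the difference between two sample means: a set of 6 sticks (11.23, 11.37, 11.04, 11.14, …) is compared to a set of 7 sticks (11.38, 11.03, 11.26, 11.45, 11.45, …). Assuming the two sets are $N(\mu, \sigma^2)$ with the same variance, are they from the same origin? $\bar{X}_1 = 11.255$, $\bar{X}_2 = 11.306$; $S_1^2 = 0.0257$, $S_2^2 = 0.0241$; the pooled variance is $S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{5(0.0257) + 6(0.0241)}{11} \approx 0.0248$, so $t = \frac{\bar{X}_2 - \bar{X}_1}{\sqrt{S^2(1/n_1 + 1/n_2)}} \approx 0.582$ with $n_1 + n_2 - 2 = 11$ df. Since we need the two-tailed test: Pr(|t| > 0.582) = 2[1 − F(0.582; 11)] ≈ 2[1 − 0.71] ≈ 0.57, where F is the t-CDF. So it is likely that they are from the same origin.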
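The standard library has no t-distribution, so this sketch (our choice, not the slides' table lookup) evaluates the t-CDF by integrating the density with Simpson's rule. It reproduces the one-tailed area beyond t = 2.3 with 9 df and the two-sample test from the slide's summary statistics. Because only nine of the ten measurements are listed, `one_sample_t` is shown on hypothetical data. All function names are illustrative.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T < x): 0.5 plus a Simpson's-rule integral of the density from 0 to |x|."""
    a = abs(x)
    h = a / steps   # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(a, df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t(data, mu0):
    """t = (X-bar(n) - mu0) / sqrt(S^2(n) / n), with n - 1 df."""
    n = len(data)
    return (statistics.fmean(data) - mu0) / math.sqrt(statistics.variance(data) / n), n - 1

def pooled_t(m1, v1, n1, m2, v2, n2):
    """Two-sample t with a pooled variance (equal-variance assumption), n1 + n2 - 2 df."""
    s2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(s2 * (1 / n1 + 1 / n2)), n1 + n2 - 2

print(1 - t_cdf(2.3, 9))                      # one-tailed area, about 0.0235
print(one_sample_t([1, 2, 3, 4, 5], 2.0))     # hypothetical data: t = sqrt(2), 4 df
t, df = pooled_t(11.255, 0.0257, 6, 11.306, 0.0241, 7)
print(t, df, 2 * (1 - t_cdf(abs(t), df)))     # t about 0.58, 11 df, two-tailed p about 0.57
```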
More applications: calculating μ with a given confidence level
10 observations come from a normal distribution with unknown mean, $X = N(\mu, \sigma^2)$: 1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50 and …. The sample average is $\bar{X}(10) = 1.34$ and the sample variance is $S^2(10)$. What is the 90% confidence interval for the mean? It is $\bar{X}(10) \pm t_{9,\,0.95}\sqrt{S^2(10)/10}$, with $t_{9,\,0.95} = 1.833$ from the table (row n-1 = 9, column .9):

n-1    C = .99    C = .95    C = .90
1      63.657     12.706     6.314
2      9.925      4.303      2.920
3      5.841      3.182      2.353
4      4.604      2.776      2.132
5      4.032      2.571      2.015
6      3.707      2.447      1.943
7      3.499      2.365      1.895
8      3.355      2.306      1.860
9      3.250      2.262      1.833
10     3.169      2.228      1.812
∞      2.576      1.960      1.645
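A sketch of the final step, computing the interval from summary statistics. It uses $\bar{X}(10) = 1.34$, n = 10 and the table value 1.833; the variance `s2_example = 0.25` is a hypothetical stand-in for the slide's $S^2(10)$, so the printed bounds are illustrative only.

```python
import math

def t_interval_from_summary(xbar, s2, n, t_crit):
    """X-bar(n) +/- t_crit * sqrt(S^2(n) / n)."""
    half = t_crit * math.sqrt(s2 / n)
    return xbar - half, xbar + half

T_9_095 = 1.833      # row n-1 = 9, column .9 of the t-table on this slide
s2_example = 0.25    # hypothetical S^2(10), for illustration only
lo, hi = t_interval_from_summary(1.34, s2_example, 10, T_9_095)
print(f"90% CI: [{lo:.3f}, {hi:.3f}]")   # 90% CI: [1.050, 1.630]
```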
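How to use Student t-tables
[Figure: reading a t-table with upper-tail area columns t.100, t.05, t.025, t.01, t.005 and rows for n-1 degrees of freedom; example with α = .05 and critical value $t_{A,\,n-1}$]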
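A sketch of the table-reading rule. A column labelled with upper-tail area A holds $t_{A,\,n-1}$; a two-sided interval at confidence C uses A = (1 − C)/2. The values below are copied from the t-table given with the 90% confidence-interval example, so only the A = .05, .025 and .005 columns are available. `T_TABLE` and `t_upper` are hypothetical names.

```python
import math

# Columns keyed by two-sided confidence C (in percent):
# C = 99 -> t_.005, C = 95 -> t_.025, C = 90 -> t_.05.
T_TABLE = {
    1: {99: 63.657, 95: 12.706, 90: 6.314},
    2: {99: 9.925, 95: 4.303, 90: 2.920},
    3: {99: 5.841, 95: 3.182, 90: 2.353},
    4: {99: 4.604, 95: 2.776, 90: 2.132},
    5: {99: 4.032, 95: 2.571, 90: 2.015},
    6: {99: 3.707, 95: 2.447, 90: 1.943},
    7: {99: 3.499, 95: 2.365, 90: 1.895},
    8: {99: 3.355, 95: 2.306, 90: 1.860},
    9: {99: 3.250, 95: 2.262, 90: 1.833},
    10: {99: 3.169, 95: 2.228, 90: 1.812},
    math.inf: {99: 2.576, 95: 1.960, 90: 1.645},
}

def t_upper(area, df):
    """t_{A, df}: the value with upper-tail area A (0.05, 0.025 or 0.005 in this table)."""
    level = round(100 * (1 - 2 * area))   # e.g. A = 0.025 -> C = 95
    if df not in T_TABLE:
        raise ValueError(f"df={df} is not in this abbreviated table")
    return T_TABLE[df][level]

print(t_upper(0.05, 9))    # one-sided test at alpha = .05 with 9 df: 1.833
print(t_upper(0.025, 9))   # two-sided 95% interval with 9 df: 2.262
```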
Concepts of Hypothesis Testing
Two hypotheses:
- $H_0$, the null hypothesis: the statement of "no effect" or "no difference" (the default).
- $H_a$, the alternative hypothesis: the statement we hope or suspect is true (when $H_0$ is rejected). Usually one decides on $H_a$ first (formulates it).

The significance level $\alpha$ measures how far the observed behavior must deviate from the expected behavior to justify rejecting $H_0$. The significance level of a test is usually 1% or 5%. It is also the risk of a Type I error: $\alpha = \Pr[\text{Error I}] = \Pr[\text{reject } H_0 \mid H_0 \text{ true}]$.

Truth \ Decision    Reject H0    Don't reject H0
H0                  Error I      √
not H0              √            Error II
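A simulation sketch of $\alpha = \Pr[\text{Error I}]$: every sample is drawn with $H_0$ true (standard normal data, an assumption), a two-sided z-test with known σ is applied at level α, and the rejection frequency should come out close to α. The z-test and the helper names are illustrative choices.

```python
import math
import random
from statistics import NormalDist, fmean

def z_test_rejects(sample, mu0, sigma, alpha):
    """Two-sided z-test (sigma known): reject H0: mu = mu0 if |Z| > z_{1-alpha/2}."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

def type_i_error_rate(alpha=0.05, n=10, trials=20000, seed=3):
    """Draw every sample with H0 true; the fraction of rejections estimates Pr[Error I]."""
    rng = random.Random(seed)
    rejections = sum(
        z_test_rejects([rng.gauss(0.0, 1.0) for _ in range(n)], 0.0, 1.0, alpha)
        for _ in range(trials)
    )
    return rejections / trials

print(type_i_error_rate(0.05), type_i_error_rate(0.01))   # near 0.05 and 0.01
```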
Goodness of fit tests
The oldest GOF hypothesis test is the χ² test, introduced by Pearson in 1900. The χ² test essentially compares a histogram of the data to a fitted probability density function (pdf). There are $n$ observations $X_i$, and we want to test $H_0$: the $X_i$'s are IID random variables with pdf $\hat{f}$.
Histogramming:
- Divide the entire range into $k$ adjacent intervals $[a_0, a_1), [a_1, a_2), \dots, [a_{k-1}, a_k)$.
- Count the $X_i$'s in each interval: $N_j$ = number of $X_i$'s in the $j$-th interval.
- Calculate the expected proportion of the $X_i$'s in each interval from the fitted pdf, $p_j = \int_{a_{j-1}}^{a_j} \hat{f}(x)\,dx$, so the expected count is $n p_j$.
Now build the test statistic:
$$\chi^2 = \sum_{j=1}^{k} \frac{(N_j - n p_j)^2}{n p_j}$$
If $H_0$ is true, $\chi^2$ should be low. How low? Tables…
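The histogramming steps and the statistic can be sketched as follows. `histogram_counts` and `chi_square_statistic` are hypothetical helper names, and the usage assumes a made-up sample tested against a uniform pdf on [0, 1], so each of the four equal intervals has $p_j = 0.25$.

```python
import bisect

def histogram_counts(xs, edges):
    """N_j = number of observations in [a_{j-1}, a_j); the last interval also includes a_k."""
    counts = [0] * (len(edges) - 1)
    for x in xs:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError(f"{x} is outside the histogram range")
        j = min(bisect.bisect_right(edges, x) - 1, len(counts) - 1)
        counts[j] += 1
    return counts

def chi_square_statistic(counts, probs):
    """Pearson's statistic: sum over j of (N_j - n p_j)^2 / (n p_j)."""
    n = sum(counts)
    return sum((nj - n * pj) ** 2 / (n * pj) for nj, pj in zip(counts, probs))

edges = [0.0, 0.25, 0.5, 0.75, 1.0]                # k = 4 equal intervals
xs = [0.1, 0.3, 0.6, 0.9, 0.95, 0.2, 0.55, 0.8]   # hypothetical observations
counts = histogram_counts(xs, edges)
print(counts, chi_square_statistic(counts, [0.25] * 4))   # [2, 1, 2, 3] 1.0
```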
χ² distribution
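χ² applications: Prussian cavalrymen killed by horse kicks
L. von Bortkiewicz, “Das Gesetz der kleinen Zahlen”, Teubner, Leipzig, 1898. Records of 20 years from 10 army corps (200 samples, one per corps per year).
Data:

Number of deaths, k                 0      1      2      3      4      Total
Number of observations of k, f_k    109    65     22     3      1      200

Total number of deaths: $0 \cdot 109 + 1 \cdot 65 + 2 \cdot 22 + 3 \cdot 3 + 4 \cdot 1 = 122$. So the average number of deaths per year per corps is $122/200 = 0.61$.
$H_0$: the number of deaths is Poisson-distributed, i.e., $p_k = e^{-\lambda}\lambda^k/k!$. This is a composite hypothesis, since $p_k$ depends on the unspecified $\lambda$. So estimate $\hat{\lambda} = 0.61$ and calculate the expected counts $N p_k$ with $N = 200$. Finally, calculate $\chi^2 = 0.32$ (cells $k = 0, 1, 2, \ge 3$) and compare it to $\chi^2$ with $4 - 1 - 1 = 2$ df (one parameter was estimated): $\chi^2_{2,\,0.95} = 5.99$, so 0.32 clearly can't reject $H_0$.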
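A sketch of the whole Poisson fit using Bortkiewicz's counts (109, 65, 22, 3 and 1 corps-years with 0 to 4 deaths). The cells k = 3 and k = 4 are merged into a single "≥ 3" cell, which matches the 2 degrees of freedom used above; the justification that the expected counts there would otherwise be very small is the usual practice, stated here as an assumption.

```python
import math

# Bortkiewicz data: number of corps-years (f_k) with k deaths.
f = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}
N = sum(f.values())                            # 200 corps-years
lam = sum(k * fk for k, fk in f.items()) / N   # 122 / 200 = 0.61

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Cells k = 0, 1, 2 and ">= 3" (merged so the expected counts are not too small).
p = [poisson_pmf(k, lam) for k in range(3)]
p.append(1 - sum(p))
observed = [f[0], f[1], f[2], f[3] + f[4]]
expected = [N * pk for pk in p]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1                     # one parameter (lambda) was estimated
print(f"lambda = {lam:.2f}, chi2 = {chi2:.2f}, df = {df}")   # lambda = 0.61, chi2 = 0.32, df = 2
```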
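Another one: results of an experiment, observations, …
The calculated χ² is 20.6. How many degrees of freedom? $n - 1 = 5$, where $n$ is the number of categories. At $\alpha = .05$, the critical χ² from the tables is 11.07 (or use CHIINV(.05,5) in Excel). Conclusion: since 20.6 > 11.07, $H_0$ is rejected at the 5% significance level.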
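A sketch of what CHIINV(.05, 5) computes: the x whose right-tail χ² probability equals α. The tail probability uses the series form of the regularized incomplete gamma function and the inversion uses bisection; both are our choices, not the slides' method, and `chi2_sf` and `chi2_critical` are hypothetical names.

```python
import math

def chi2_sf(x, df):
    """P(chi^2_df > x) = 1 - P(df/2, x/2), with the regularized lower incomplete gamma
    P(a, y) = e^-y * y^a * sum_{n>=0} y^n / Gamma(a + n + 1) (series form)."""
    if x <= 0:
        return 1.0
    a, y = df / 2, x / 2
    term = math.exp(a * math.log(y) - y - math.lgamma(a + 1))
    if term == 0.0:   # underflow: x is far out in one tail
        return 0.0 if y > a else 1.0
    total, n = term, 0
    while term > 1e-17 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return 1.0 - total

def chi2_critical(alpha, df):
    """Like Excel's CHIINV(alpha, df): the x with P(chi^2_df > x) = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen the bracket until it contains the root
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

crit = chi2_critical(0.05, 5)
print(round(crit, 2), 20.6 > crit)   # 11.07 True: reject H0 at alpha = .05
```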
P-value (the observed significance level)
Get the p-value of the χ² test value: df = 5, χ² = 20.6, CHIDIST(20.6,5) ≈ .00096, so p ≈ .00096. This is an alternative way of presenting the results: $H_0$ is rejected at any significance level α larger than p.
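Excel's CHIDIST(x, df) is the right-tail probability of the χ² distribution. For integer df it has a closed form, which this sketch uses: for even df, $e^{-x/2}\sum_{i=0}^{df/2-1}(x/2)^i/i!$; for odd df, $2[1-\Phi(\sqrt{x})]$ plus $2\varphi(\sqrt{x})\sum_{r=1}^{(df-1)/2} x^{(2r-1)/2}/(1 \cdot 3 \cdots (2r-1))$. `chi2_pvalue` is a hypothetical name, and it is evaluated at the statistic 20.6 from the previous example.

```python
import math
from statistics import NormalDist

def chi2_pvalue(x, df):
    """P(chi^2_df > x) for integer df >= 1, via the closed forms for even and odd df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: e^{-x/2} * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # odd df: 2(1 - Phi(sqrt x)) + 2 phi(sqrt x) * sum_r x^{(2r-1)/2} / (1 * 3 * ... * (2r-1))
    z = math.sqrt(x)
    std = NormalDist()
    term, total = z, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return 2 * (1 - std.cdf(z)) + 2 * std.pdf(z) * total

print(f"CHIDIST(20.6, 5) ~ {chi2_pvalue(20.6, 5):.5f}")   # CHIDIST(20.6, 5) ~ 0.00096
```

As a sanity check, at the critical value 11.07 the same function returns about 0.05, the significance level used in the previous example.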