Presentation is loading. Please wait.

Presentation is loading. Please wait.

Biostatistics, statistical software III. Population, statistical sample. Probability, probability variables. Important distributions. Properties of the.

Similar presentations


Presentation on theme: "Biostatistics, statistical software III. Population, statistical sample. Probability, probability variables. Important distributions. Properties of the."— Presentation transcript:

1 Biostatistics, statistical software III. Population, statistical sample. Probability, probability variables. Important distributions. Properties of the normal distribution, the central limit theorem, the standard error. Krisztina Boda PhD Department of Medical Informatics, University of Szeged

2 Krisztina Boda INTERREG 2 Population, sample Population: the entire group of individuals that we want information about. Sample: a part of the population that we actually examine in order to get information A simple random sample of size n consists of n individuals chosen from the population in such a way that every set of n individuals has an equal chance to be in the sample actually selected.

3 Krisztina Boda INTERREG 3 Examples Sample data set  Questionnaire filled in by a group of pharmacy students  Blood pressure of 20 healthy women …… Population  Pharmacy students  Students  Blood pressure of women (whoever) ……

4 Krisztina Boda INTERREG 4 SamplePopulation Bar chart of relative frequencies of a categorical variable Distribution of that variable in the population (approximates)

5 Krisztina Boda INTERREG 5 SamplePopulation Histogram of relative frequencies of a continuous variable Distribution of that variable in the population (approximates)

6 Krisztina Boda 6 SamplePopulation Mean (x) Standard deviation (SD) Median Mean  (unknown) Standard deviation  (unknown) Median (unknown) (approximates)

7 Krisztina Boda INTERREG 7 The concept of probability Lets repeat an experiment n times under the same conditions. In a large number of n experiments the event A is observed to occur k times (0  k  n). k : frequency of the occurrence of the event A. k/n : relative frequency of the occurrence of the event A. 0  k/n  1 If n is large, k/n will approximate a given number. This number is called the probability of the occurrence of the event A and it is denoted by P(A). 0  P(A)  1

8 Krisztina Boda INTERREG 8 Example: the tossing of a (fair) coin k: number of „head”s n=10100 100010000100000 k=742 510 500549998 k/n=0.70.42 0.510.50050.49998 P(„head”)=0.5

9 Krisztina Boda INTERREG 9 Probability facts Any probability is a number between 0 and 1. All possible outcomes together must have probability 1. The probability of the complementary event of A is 1-P(A).

10 Krisztina Boda INTERREG 10 Rules of probability calculus Assumption: all elementary events are equally probable Examples: Rolling a dice. What is the probability that the dice shows 5?  If we let X represent the value of the outcome, then P(X=5)=1/6. What is the probability that the dice shows an odd number?  P(odd)=1/2. Here F=3, T=6, so F/T=3/6=1/2.

11 Krisztina Boda INTERREG 11 The density curve The density curve is an idealized description of the overall pattern of a distribution that smoothes out the irregularities in the actual data. The density curve  is on the above the horizontal axis, and  has area exactly 1 underneath it. The area under the curve and above any range of values is the proportion of all observations that fall in that range.

12 Krisztina Boda INTERREG 12 Special distribution of a continuous variable: The normal distribution The normal curve was developed mathematically in 1733 by DeMoivre as an approximation to the binomial distribution. His paper was not discovered until 1924 by Karl Pearson. Laplace used the normal curve in 1783 to describe the distribution of errors. Subsequently, Gauss used the normal curve to analyze astronomical data in 1809. The normal curve is often called the Gaussian distribution. The term bell-shaped curve is often used in everyday usage.

13 Krisztina Boda INTERREG 13 Special distribution of a continuous variable: The normal distribution N( ,  ) The density curves are symmetric, single-peaked and bell-shaped The 68-95-99.7 rule. In the normal distribution with mean  and standard deviation  :  68% of the observations fall within  of the mean   95% of the observations fall within 2  of the mean   99.7% of the observations fall within 3  of the mean 

14 Krisztina Boda INTERREG 14 Normal distributions N( ,  ) N(0,1) N(1,1) N(0,2) ,  : parameters (a parameter is a number that describes the distribution)

15 Krisztina Boda INTERREG 15 „Imagination” of the distribution given the sample mean and sample SD – supposing normal distribution In the papers generally sample means and SD-s are published. We use these value to know the original distribution For example, age in years: 55.2  15.7 70.9 39.5 68.26% of the data are supposed to be in this interval 95.44% of the data are supposed to be in the interval (23.8, 86.6)

16 Krisztina Boda INTERREG 16 Stent length per lesion (mm): 18.8  10.5 Using these values as parameters, the „normal” distribution would be like this: The original distribution of stent length was probably skewed, or if the distribution was normal, there were negative values in the data set.

17 Krisztina Boda INTERREG 17 Standard normal probabilities N(0,1) z(z): proportion of area to the left of z -40.00003 -30.0013 -2.580.0049 -2.330.0099 -20.0228 -1.960.0250 -1.650.0495 0.1587 00.5 10.8413 1.650.9505 1.960.975 20.9772 2.330.9901 2.580.9951 30.9987 40.99997 The area under the standard normal curve is tabulated

18 Krisztina Boda INTERREG 18 The use of the normal curve table z(z): proportion of area to the left of z -40.00003 -30.0013 -2.580.0049 -2.330.0099 -20.0228 -1.960.0250 -1.650.0495 0.1587 00.5 10.8413 1.650.9505 1.960.975 20.9772 2.330.9901 2.580.9951 30.9987 40.99997 -1.961.96 0.025 0.95 Find the area under the standard normal curve to the left of z=-1.96, that is, find the probablity that z<-1.96 P(z<-0.96)=0.025

19 Krisztina Boda INTERREG 19 Standardization If a random variable X has normal distribution N(,) distribution, than the variable has is standard normal distribution N(0,1).

20 Krisztina Boda INTERREG 20 The use of the normal curve table z(z): proportion of area to the left of z -40.00003 -30.0013 -2.580.0049 -2.330.0099 -20.0228 -1.960.0250 -1.650.0495 0.1587 00.5 10.8413 1.650.9505 1.960.975 20.9772 2.330.9901 2.580.9951 30.9987 40.99997 Example. It was found that the weights of a certain population of laboratory rats are normally distributed with  = 14 ounces and σ=2 ounces. In such a population, what percentage of rats would we expect to weigh below 10 ounces? Solution. 1. Standardize X 2. Find the area to the left of -2

21 Krisztina Boda INTERREG 21 Distribution of sample means h ttp://www.ruf.rice.edu/~lane/stat_sim/index.html

22 Krisztina Boda INTERREG 22 Distribution of sample means The original population is not normally distributed

23 Krisztina Boda INTERREG 23 The central limit theorem Let’s draw samples from a population with mean µ and standard deviation σ. Examine the distribuiton of the sample means. If the sample size n is large (say, at least 30), then the population of all possible sample means approximately has a normal distribution with mean µ and standard deviation σ no matter what probability describes the population sampled.

24 Krisztina Boda INTERREG 24 The standard error of mean (SE or SEM) is called the standard error of mean Meaning: the dispersion of the sample means around the (unknown) population mean. Estimation of SE from the sample:

25 Krisztina Boda INTERREG 25 Review questions and exercises Problems to be solved by hand- calculations ..\Handouts\Problems hand III.doc..\Handouts\Problems hand III.doc Solutions ..\Handouts\Problems hand III solutions.doc..\Handouts\Problems hand III solutions.doc Problems to be solved using computer - none

26 Krisztina Boda INTERREG 26 Useful WEB pages  http://www-stat.stanford.edu/~naras/jsm http://www-stat.stanford.edu/~naras/jsm  http://www.ruf.rice.edu/~lane/rvls.html http://www.ruf.rice.edu/~lane/rvls.html  http://my.execpc.com/~helberg/statistics.html http://my.execpc.com/~helberg/statistics.html  http://www.math.csusb.edu/faculty/stanton/m26 2/index.html http://www.math.csusb.edu/faculty/stanton/m26 2/index.html


Download ppt "Biostatistics, statistical software III. Population, statistical sample. Probability, probability variables. Important distributions. Properties of the."

Similar presentations


Ads by Google