There will be no lecture on Friday, 9 October. The exercise session will take place this week.


1 There will be no lecture on Friday, 9 October. The exercise session will take place this week.

2 Last lecture summary: mode; distribution; five-number summary, percentiles, mean; box plot, modified box plot; robust statistics – mean, median, trimmed mean; outlier

3 SDA girls – histogram of heights 2014 n = 48 or N = 48 bin size = 3.8

4 SDA girls – all previous years + current year n = 69 bin size = 3.8

5 MEASURES OF VARIABILITY

6 Setting the mood www.udacity.com – Introduction to statistics

7 QUESTION Mean1 Mean2 Mode1 Mode2 Median1 Median2 www.udacity.com – Statistics n = 1000

8 Range (Czech: variační rozpětí) = max − min www.udacity.com – Statistics n = 1000

9 Range The range changes when we add new data to the dataset: Always / Sometimes / Never? www.udacity.com – Statistics n = 1000

10 Adding Mark Zuckerberg www.udacity.com – Statistics n = 1000
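
Whether the range changes depends on where the new value falls. The following is a minimal Python sketch; the income figures are invented for illustration and are not the Udacity data set.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical annual incomes (illustrative only, not the Udacity data).
incomes = [38_000, 45_000, 52_000, 61_000, 70_000]
base = value_range(incomes)

# A new value inside [min, max] leaves the range unchanged.
inside = value_range(incomes + [50_000])

# A new extreme value (a hypothetical billionaire) stretches the range.
extreme = value_range(incomes + [1_000_000_000])

print(base, inside, extreme)
```

A single extreme observation determines the range on its own, which is why the range is not a robust measure of variability.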

11 Cut off data: IQR, interquartile range (Czech: mezikvartilové rozpětí) www.udacity.com – Statistics n = 1000

12 Interquartile range, IQR Let's take this quiz; answer yes or no. 1. About 50% of the data fall within the IQR. 2. The IQR is affected by every value in the data set. 3. The IQR is not affected by outliers. 4. The mean is always between Q1 and Q3. Data: 0 1 1 1 2 2 2 2 2 3 3 3 90 Q1 = 1, Q2, Q3 = 3, mean = 8.62, n = 13 www.udacity.com – Statistics
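
The quiz data can be checked numerically. This sketch uses `statistics.quantiles` from the Python standard library; the choice of `method="inclusive"` is an assumption, since the slide does not say which quartile convention it uses (for this data set the common conventions agree).

```python
from statistics import mean, quantiles

data = [0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 90]

# Quartile convention is an assumption; the slide does not specify one.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
avg = mean(data)

# Replace the extreme value 90 with 4 and recompute the IQR.
tamed = [4 if x == 90 else x for x in data]
t1, _, t3 = quantiles(tamed, n=4, method="inclusive")

print(f"Q1={q1}, Q3={q3}, IQR={iqr}, mean={avg:.2f}")
print("mean between Q1 and Q3:", q1 <= avg <= q3)
print("IQR unchanged without the outlier:", (t3 - t1) == iqr)
```

The mean (about 8.62) lies outside the interval from Q1 to Q3, and swapping the extreme value for an ordinary one leaves the IQR untouched.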

13 Define the outlier Sample (n = 10): $38,946 $43,420 $49,160 $50,430 $50,557 $52,580 $53,595 $54,160 $60,181 $10,000,000 What values are outliers for this data set? 1. $60,000 2. $80,000 3. $100,000 4. $200,000 www.udacity.com – Statistics
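
One common way to answer this question is the 1.5 × IQR rule. The transcript does not state which rule the slide uses, so both the rule and the median-of-halves quartile convention below are assumptions; `quartiles` and `is_outlier` are hypothetical helper names.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (assumed convention)."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return median(lower), median(upper)

sample = [38_946, 43_420, 49_160, 50_430, 50_557,
          52_580, 53_595, 54_160, 60_181, 10_000_000]

q1, q3 = quartiles(sample)
iqr = q3 - q1
# Assumed rule: anything beyond 1.5 * IQR from the quartiles is an outlier.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

def is_outlier(x):
    return x < low_fence or x > high_fence

candidates = [60_000, 80_000, 100_000, 200_000]
flagged = [c for c in candidates if is_outlier(c)]
print(low_fence, high_fence, flagged)
```

Under these assumptions the fences fall at $41,660 and $61,660, so $60,000 is not flagged while the other three candidates are.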

14 Problem with IQR normal bimodal uniform www.udacity.com – Statistics

15 Options for measuring variability Find the average distance between all pairs of data values. Find the average distance between each data value and either the max or the min. Find the average distance between each data value and the mean. www.udacity.com – Statistics

16 Average distance from mean Sample 10 5 3 2 19 1 7 11 1 1

17 Average distance from mean Sample value (deviation from the mean, mean = 6): 10 (4), 5 (−1), 3 (−3), 2 (−4), 19 (13), 1 (−5), 7 (1), 11 (5), 1 (−5), 1 (−5) Find the average distance between each data value and the mean.

18 Preventing cancellation How can we prevent the negative and positive deviations from cancelling each other out? 1. Ignore (i.e. delete) the negative sign. 2. Multiply each deviation by two. 3. Square each deviation. 4. Take the absolute value of each deviation.

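19 Average absolute deviation Sample value (deviation, absolute deviation): 10 (4, 4), 5 (−1, 1), 3 (−3, 3), 2 (−4, 4), 19 (13, 13), 1 (−5, 5), 7 (1, 1), 11 (5, 5), 1 (−5, 5), 1 (−5, 5) avg. absolute deviation = 4.6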
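
The cancellation problem and its fix can be reproduced in a few lines. This is a small Python sketch using the sample from the slides; the variable names are my own.

```python
sample = [10, 5, 3, 2, 19, 1, 7, 11, 1, 1]
n = len(sample)
mean = sum(sample) / n

# Signed deviations from the mean: positives and negatives cancel.
deviations = [x - mean for x in sample]
avg_deviation = sum(deviations) / n

# Taking absolute values prevents the cancellation.
avg_abs_deviation = sum(abs(d) for d in deviations) / n

print(mean, avg_deviation, avg_abs_deviation)
```

The signed deviations always average to zero, which is why a plain "average distance from the mean" is useless without absolute values or squares.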

20 Average absolute deviation

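21 Squared deviations Sample value (deviation, squared deviation): 10 (4, 16), 5 (−1, 1), 3 (−3, 9), 2 (−4, 16), 19 (13, 169), 1 (−5, 25), 7 (1, 1), 11 (5, 25), 1 (−5, 25), 1 (−5, 25) avg. squared deviation = 31.2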

22 Variance The average squared deviation has a special name – variance (Czech: rozptyl). www.udacity.com – Statistics

23 Standard deviation
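
The variance and standard deviation of the running example can be computed with the Python standard library. The slides define variance as the average squared deviation, which divides by n, so the sketch uses `pvariance` and `pstdev`; the n − 1 version (`statistics.variance`) is shown only for comparison and is not discussed in the transcript.

```python
from math import sqrt
from statistics import pstdev, pvariance, variance

sample = [10, 5, 3, 2, 19, 1, 7, 11, 1, 1]

# Average squared deviation (divide by n), as defined on the slides.
var_n = pvariance(sample)
# Standard deviation = square root of the variance, back in the data's units.
sd_n = pstdev(sample)

# For comparison only: the version that divides by n - 1.
var_n_minus_1 = variance(sample)

print(var_n, round(sd_n, 3), round(var_n_minus_1, 3))
```

Taking the square root returns the measure to the original units of the data, which makes the standard deviation easier to interpret than the variance.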

24 What is so great about the standard deviation? Why don't we just find the average absolute deviation? More on absolute vs. standard deviation: http://www.leeds.ac.uk/educol/documents/00003759.htm 1. SD is used because of tradition. 2. It is easier to work with squares than with absolute values. 3. SD has a very nice interpretation in the Gaussian distribution.

25 Standard deviation – empirical rule
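
The body of this slide did not survive in the transcript. Assuming it refers to the usual 68–95–99.7 rule for a normal (Gaussian) distribution, the percentages can be derived from the error function; `fraction_within` is a hypothetical helper name.

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
```

These fractions hold for normally distributed data; for skewed or bimodal distributions the percentages can differ substantially.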

26

27

28 Empirical rule – well behaved distribution

29 Empirical rule – not-so-well behaved distribution

30 Statistical inference The goal of statistics: make rational conclusions or decisions based on the incomplete information we have in our data. This process is known as statistical inference. In inferential statistics we want to answer 1. Is some relationship in data due to chance? Or is it a real difference? 2. If the effect is real, can it be generalized to a larger group?

31 Statistical jargon Population – the group we are interested in making conclusions about. Census – a collection of data on the entire population. Sample – if we can't conduct a census, we collect data from a sample of the population. Goal: make conclusions about that population.

32 Statistical jargon population (census) vs. sample parameter (population) vs. statistic (sample)

33

34 Statistical inference A statistic is a value calculated from our observed data (sample). A parameter is a value that describes the population. We want to be able to generalize what we observe in our data to our population. In order to do this, the sample needs to be representative. How to select a representative sample? Use randomization.

35 Random sampling Simple random sampling (SRS) – each possible sample from the population is equally likely to be selected. Stratified sampling – a simple random sample is taken from each subgroup of the population (subgroups: gender, age groups, …). Cluster sampling – divide the population into non-overlapping groups (clusters); the sample is a randomly chosen cluster. Example: the population is all students in an area; randomly select schools and create a sample from the students of the selected schools.
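
The three sampling designs can be sketched with Python's `random` module. The population below (12 students with a school and a gender attribute) is entirely hypothetical and serves only to make the mechanics visible.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 12 students in schools A, B, C.
population = [
    {"id": i, "school": "ABC"[i % 3], "gender": "FM"[i % 2]}
    for i in range(12)
]

# Simple random sampling: every subset of size 4 is equally likely.
srs = rng.sample(population, 4)

# Stratified sampling: a simple random sample of 2 from each gender stratum.
stratified = []
for g in ("F", "M"):
    stratum = [p for p in population if p["gender"] == g]
    stratified.extend(rng.sample(stratum, 2))

# Cluster sampling: pick one school at random and take all of its students.
chosen_school = rng.choice(["A", "B", "C"])
cluster = [p for p in population if p["school"] == chosen_school]

print([p["id"] for p in srs])
print([p["id"] for p in stratified])
print(chosen_school, [p["id"] for p in cluster])
```

Stratified sampling guarantees that each subgroup is represented, whereas cluster sampling trades some representativeness for convenience because only the chosen cluster is visited.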

36 Simple random sampling Sampling with replacement (WR; Czech: výběr s navrácením): generates independent samples. Two sample values are independent if what we get on the first one doesn't affect what we get on the second. Sampling without replacement (WOR; Czech: výběr bez navrácení): deliberately avoids choosing any member of the population more than once. This type of sampling is not independent; however, it is more common. The error is small as long as 1. the sample is large and 2. the sample size is no more than 10% of the population size.
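
In Python's standard library the two schemes correspond to `random.choices` (with replacement) and `random.sample` (without replacement). The population of 20 numbered members is a made-up example.

```python
import random

rng = random.Random(42)  # fixed seed for reproducibility
population = list(range(1, 21))  # hypothetical population of 20 members

# With replacement: each draw is independent, so members may repeat.
with_replacement = rng.choices(population, k=10)

# Without replacement: no member can be chosen more than once.
without_replacement = rng.sample(population, k=10)

print(sorted(with_replacement))
print(sorted(without_replacement))
```

Note that this sample takes 10 of 20 members, far more than the 10% guideline, so here the dependence between draws without replacement is not negligible.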

37 Bias If a sample is not representative, it can introduce bias into our results (Czech: zkreslení, odchylka). A sample is biased if it differs from the population in a systematic way. Example: the Literary Digest poll for the 1936 U.S. presidential election surveyed 10 million people (subscribers); 2.3 million responded, predicting by 3:2 that the Republican candidate would win, but the Democratic candidate won. What went wrong? Only wealthy people were surveyed (selection bias), and the survey relied on voluntary response (nonresponse bias): angry people or people who want a change.

