

1 MEASURES OF CENTRALITY

2 Last lecture summary Which graphs did we meet? scatter plot (bodový graf) bar chart (sloupcový graf) histogram pie chart (koláčový graf) How do they work, what are their advantages and/or disadvantages?

3 SDA women – histogram of heights 2014 n = 48 or N = 48 bin size = 3.8

4 Distributions
negatively skewed (skewed to the left), e.g., life expectancy
symmetric, e.g., body height
positively skewed (skewed to the right), e.g., income
http://turnthewheel.org/free-textbooks/street-smart-stats/

5 STATISTICS IS BEAUTIFUL new stuff

6 Life expectancy data Watch the TED talk by Hans Rosling, Gapminder Foundation: http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html

7 STATISTICS IS DEEP

8 Simpson’s paradox UC Berkeley. Though the data are fake, the paradox is the same. www.udacity.com – Introduction to statistics

9 Male
         Applied  Admitted  Rate [%]
MAJOR A      900       450
MAJOR B      100        10
www.udacity.com – Introduction to statistics

10 Male
         Applied  Admitted  Rate [%]
MAJOR A      900       450        50
MAJOR B      100        10
www.udacity.com – Introduction to statistics

11 Female
         Applied  Admitted  Rate [%]
MAJOR A      100        80
MAJOR B      900       180
www.udacity.com – Introduction to statistics

12 Female
         Applied  Admitted  Rate [%]
MAJOR A      100        80
MAJOR B      900       180        20
www.udacity.com – Introduction to statistics

13 Gender bias What do you think, is there a gender bias? Who do you think is favored? Male or female?
Male
         Applied  Admitted  Rate [%]
MAJOR A      900       450        50
MAJOR B      100        10
Female
         Applied  Admitted  Rate [%]
MAJOR A      100        80
MAJOR B      900       180        20
www.udacity.com – Introduction to statistics

14 Gender bias
Male
         Applied  Admitted  Rate [%]
MAJOR A      900       450        50
MAJOR B      100        10        10
Both        1000       460        46
Female
         Applied  Admitted  Rate [%]
MAJOR A      100        80        80
MAJOR B      900       180        20
Both        1000       260        26
www.udacity.com – Introduction to statistics

15 Gender bias
Rate [%]   male  female
MAJOR A      50      80
MAJOR B      10      20
Both         46      26
www.udacity.com – Introduction to statistics
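The rates in these tables can be recomputed directly from the applied and admitted counts. The sketch below does that; the names `male`, `female`, `rate` and `overall` are chosen here for illustration and do not come from the slides.

```python
# (applied, admitted) per major, copied from the slides
male = {"A": (900, 450), "B": (100, 10)}
female = {"A": (100, 80), "B": (900, 180)}


def rate(applied, admitted):
    """Admission rate in percent."""
    return 100 * admitted / applied


def overall(table):
    """Admission rate in percent over both majors pooled together."""
    applied = sum(a for a, _ in table.values())
    admitted = sum(b for _, b in table.values())
    return rate(applied, admitted)


for major in ("A", "B"):
    print(major, rate(*male[major]), rate(*female[major]))
print("Both", overall(male), overall(female))
```

Within each major the female rate is higher, yet the pooled male rate is higher. The reversal comes from the group sizes: most men applied to major A, which admits more applicants, while most women applied to major B.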

16 Statistics is ambiguous This example illustrates how ambiguous statistics can be. How you choose to present your data can strongly affect what people believe to be the case. “I never believe in statistics I didn’t doctor myself.” Who said that? Winston Churchill www.udacity.com – Introduction to statistics

17 What is statistics?
Statistics – the science of collecting, organizing, summarizing, analyzing and interpreting data
Goal – use imperfect information (our data) to infer facts, make predictions, and make decisions
Descriptive statistics – describing and summarizing data with numbers or pictures
Inferential statistics – making conclusions or decisions based on data

18 Variables
variable – a value or characteristic that can vary from individual to individual, example: favorite color, age
How are variables classified?
quantitative variable – numerical values, often with units of measurement, answering the question how much/how many, example: age, annual income, number of children
continuous (spojitá proměnná), example: height, weight
discrete (diskrétní proměnná), example: number of children
continuous variables can be discretized

19 Variables
categorical (qualitative) variables – categories that have no particular order, example: favorite color, gender, nationality
ordinal – not numerical, but the values have a natural order, example: temperature low/medium/high

20 Variables
variable (proměnná)
  quantitative (kvantitativní)
    continuous (spojitá)
    discrete (diskrétní)
  categorical (kategorická)
    ordinal (ordinální)

21 Choosing a profession
Chemistry        Geography
50 000 – 60 000  40 000 – 55 000
www.udacity.com – Statistics

22 Choosing a profession We made an interval estimate. But ideally we want one number that describes the entire dataset. This allows us to quickly summarize all our data. www.udacity.com – Statistics

23 Choosing a profession (Chemistry, Geography)
1. The value at which frequency is highest.
2. The value where frequency is lowest.
3. Value in the middle.
4. Biggest value of x-axis.
5. Mean
www.udacity.com – Statistics

24 Three big M’s (Chemistry, Geography)
The value at which frequency is highest is called the mode, i.e. the most common value is the mode.
The value in the middle of the distribution is called the median.
The mean is the mean (average is a synonym).
www.udacity.com – Statistics

25 Quick quiz What is the mode in our data? 2 5 6 5 2 6 9 8 5 2 3 5 www.udacity.com – Statistics
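One way to check the quiz answer is to count how often each value occurs. This is a minimal sketch using the data from the slide; the variable names are illustrative, and it returns every value tied for the highest count, so it also handles multimodal data.

```python
from collections import Counter

# Data from the quick quiz slide
data = [2, 5, 6, 5, 2, 6, 9, 8, 5, 2, 3, 5]

counts = Counter(data)               # value -> number of occurrences
highest = max(counts.values())       # the largest frequency
modes = sorted(v for v, c in counts.items() if c == highest)

print(counts[5], counts[2])  # 4 3
print(modes)                 # [5]
```

The same approach works for categorical data, because it only counts occurrences and never does arithmetic on the values.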

26 Mode in negatively skewed distribution www.udacity.com – Statistics

27 Mode in uniform distribution www.udacity.com – Statistics

28 Multimodal distribution www.udacity.com – Statistics

29 Mode in categorical data www.udacity.com – Statistics

30 More on mode True or False?
1. The mode can be used to describe any type of data we have, whether it’s numerical or categorical.
2. All scores in the dataset affect the mode.
3. If we take a lot of samples from the same population, the mode will be the same in each sample.
4. There is an equation for the mode.
Re 3: http://onlinestatbook.com/stat_sim/sampling_dist/
http://www.shodor.org/interactivate/activities/Histogram/ – the mode changes as you change the bin size.
Because 3. is not true, we can’t use the mode to learn something about our population. The mode depends on how you present the data.
www.udacity.com – Statistics

31 Life expectancy data www.coursera.org – Statistics: Making Sense of Data

32 Minimum Sierra Leone minimum = 47.8 www.coursera.org – Statistics: Making Sense of Data

33 Maximum Japan maximum = 83.4 www.coursera.org – Statistics: Making Sense of Data

34 Life expectancy data all countries www.coursera.org – Statistics: Making Sense of Data

35 Life expectancy data Countries ranked 1 to 197: Egypt is at rank 99 with 73.2; half of the countries are larger, half smaller. www.coursera.org – Statistics: Making Sense of Data

36 Life expectancy data Minimum = 47.8 Maximum = 83.4 Median = 73.2 www.coursera.org – Statistics: Making Sense of Data

37 Q1 Countries ranked 1 to 197: Sao Tomé & Príncipe is at rank 50 (¼ of the way), 1st quartile = 64.7 www.coursera.org – Statistics: Making Sense of Data

38 Q1 1st quartile = 64.7: ¼ smaller, ¾ larger www.coursera.org – Statistics: Making Sense of Data

39 Q3 Countries ranked 1 to 197: Netherlands Antilles is at rank 148 (¾ of the way), 3rd quartile = 76.7 www.coursera.org – Statistics: Making Sense of Data

40 Q3 3rd quartile = 76.7: ¾ smaller, ¼ larger www.coursera.org – Statistics: Making Sense of Data

41 Life expectancy data Minimum = 47.8 Maximum = 83.4 Median = 73.2 1st quartile = 64.7 3rd quartile = 76.7 www.coursera.org – Statistics: Making Sense of Data

42 Box Plot www.coursera.org – Statistics: Making Sense of Data

43 Box plot (figure labels: minimum, 1st quartile, median, 3rd quartile, maximum)

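44 Modified box plot IQR = interquartile range (3rd quartile minus 1st quartile). Whiskers reach at most 1.5 x IQR beyond the quartiles; points farther out are drawn as outliers.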
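As a sketch of the 1.5 x IQR rule, the snippet below applies it to the life expectancy summary quoted on the earlier slides (1st quartile 64.7, 3rd quartile 76.7, minimum 47.8, maximum 83.4). The names `lower_fence` and `upper_fence` are a common convention assumed here, not terms used in the slides.

```python
# Summary values quoted on the life expectancy slides
q1, q3 = 64.7, 76.7
minimum, maximum = 47.8, 83.4

iqr = q3 - q1                     # interquartile range, about 12.0
lower_fence = q1 - 1.5 * iqr      # about 46.7
upper_fence = q3 + 1.5 * iqr      # about 94.7

# A value is flagged as an outlier only if it lies outside the fences.
has_low_outliers = minimum < lower_fence
has_high_outliers = maximum > upper_fence

print(round(iqr, 1), round(lower_fence, 1), round(upper_fence, 1))
print(has_low_outliers, has_high_outliers)
```

Because even the minimum and the maximum lie inside the fences, a modified box plot of these data would show no outlier points under this rule.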

45 Quartiles, median – how to do it? 79, 68, 88, 69, 90, 74, 87, 93, 76 Find min, max, median, Q1, Q3 in these data. Then, draw the box plot. www.coursera.org – Statistics: Making Sense of Data

46

47 Another example
Data: 78, 93, 68, 84, 90, 74
 Min.  1st Qu.  Median  3rd Qu.   Max.
68.00    75.00   81.00    88.50  93.00
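The summary for this example can be reproduced with Python's standard library. Quartiles have several competing definitions; the sketch assumes the "inclusive" interpolation method of `statistics.quantiles` (Python 3.8 or later), which matches the values printed for this example. The function name `five_number_summary` is chosen here for illustration.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive interpolation."""
    # Assumption: "inclusive" is one of several quartile conventions;
    # other textbooks or tools may give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


example = [78, 93, 68, 84, 90, 74]
print(five_number_summary(example))   # (68, 75.0, 81.0, 88.5, 93)

exercise = [79, 68, 88, 69, 90, 74, 87, 93, 76]
print(five_number_summary(exercise))  # (68, 74.0, 79.0, 88.0, 93)
```

The second call applies the same convention to the nine values from the earlier exercise. A hand calculation that takes the median of each half of the data can give different quartiles for that exercise, so treat the output as one valid answer rather than the only one.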

48 Percentiles age [years] http://www.rustovyhormon.cz/on-line-rustove-grafy

49 3rd M – Mean

50 Robust statistic Salaries of 25 players of the American soccer club New York Red Bulls in 2012: 33 750 44 000 45 566 65 000 95 000 103 500 112 495 138 188 141 666 181 500 185 000 190 000 194 375 195 000 205 000 292 500 301 999 4 600 000 5 600 000 median = 112 495 mean = 518 311 The mean is not a robust statistic. The median is a robust statistic.
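The following sketch illustrates robustness by computing the mean and the median with and without the two largest salaries. Only 19 of the 25 salaries appear in this transcript, so the numbers it prints differ from the median and mean quoted on the slide; the point is how each statistic reacts, not the exact values.

```python
import statistics

# The 19 salaries listed in the transcript (the slide mentions 25 players)
salaries = [
    33750, 44000, 45566, 65000, 95000, 103500, 112495, 138188, 141666,
    181500, 185000, 190000, 194375, 195000, 205000, 292500, 301999,
    4600000, 5600000,
]

mean_all = statistics.mean(salaries)
median_all = statistics.median(salaries)

# Drop the two multi-million salaries and recompute
without_top_two = sorted(salaries)[:-2]
mean_rest = statistics.mean(without_top_two)
median_rest = statistics.median(without_top_two)

print(f"all listed:      mean = {mean_all:,.0f}  median = {median_all:,}")
print(f"without top two: mean = {mean_rest:,.0f}  median = {median_rest:,}")
```

Removing two values out of 19 cuts the mean to less than a quarter of its former value, while the median moves only from 181 500 to 141 666.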

51 Trimmed mean 10% trimmed mean … eliminate the upper and lower 10% of the data. The trimmed mean is more robust. 33 750 44 000 45 566 65 000 95 000 103 500 112 495 138 188 141 666 181 500 185 000 190 000 194 375 195 000 205 000 292 500 301 999 4 600 000 5 600 000 median = 112 495 mean = 518 311 10% trimmed mean = 128 109
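A trimmed mean can be sketched in a few lines. The function below is a hypothetical helper, not something from the slides: it assumes that the number of values cut from each end is the trimming proportion times n, rounded down. Only 19 of the 25 salaries appear in this transcript, so the result does not reproduce the 128 109 quoted above.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end.

    Assumption: the count dropped per end is int(n * proportion),
    i.e. rounded down; other conventions exist.
    """
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# The 19 salaries listed in the transcript (the slide mentions 25 players)
salaries = [
    33750, 44000, 45566, 65000, 95000, 103500, 112495, 138188, 141666,
    181500, 185000, 190000, 194375, 195000, 205000, 292500, 301999,
    4600000, 5600000,
]

plain_mean = trimmed_mean(salaries, 0.0)   # nothing trimmed
trimmed_10 = trimmed_mean(salaries, 0.10)  # one value cut from each end
print(round(plain_mean), round(trimmed_10))
```

With 19 values a 10% trim removes just one value per end, so the 4 600 000 salary stays in and still pulls the result up. With the full 25 values two would be removed per end, which is presumably why the slide's trimmed mean lands so close to the median.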

