Published by Denis Dean. Modified over 9 years ago.
1
MEASURES OF CENTRALITY
2
Last lecture summary
Which graphs did we meet?
- scatter plot
- bar chart
- histogram
- pie chart
How do they work, and what are their advantages and disadvantages?
3
SDA women – histogram of heights, 2014: n = 48 (or N = 48), bin size = 3.8
4
Distributions
- negatively skewed (skewed to the left), e.g., life expectancy
- roughly symmetric, e.g., body height
- positively skewed (skewed to the right), e.g., income
http://turnthewheel.org/free-textbooks/street-smart-stats/
5
STATISTICS IS BEAUTIFUL (new material)
6
Life expectancy data: watch the TED talk by Hans Rosling, Gapminder Foundation: http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html
7
STATISTICS IS DEEP
8
Simpson’s paradox: UC Berkeley admissions. Although the data below are made up, the paradox is the same.
www.udacity.com – Introduction to statistics
9
Male
           Applied   Admitted   Rate [%]
Major A    900       450
Major B    100       10
www.udacity.com – Introduction to statistics
10
Male
           Applied   Admitted   Rate [%]
Major A    900       450        50
Major B    100       10
www.udacity.com – Introduction to statistics
11
Female
           Applied   Admitted   Rate [%]
Major A    100       80
Major B    900       180
www.udacity.com – Introduction to statistics
12
Female
           Applied   Admitted   Rate [%]
Major A    100       80
Major B    900       180        20
www.udacity.com – Introduction to statistics
13
Gender bias
What do you think: is there a gender bias? Who is favored, males or females?
Male
           Applied   Admitted   Rate [%]
Major A    900       450        50
Major B    100       10
Female
           Applied   Admitted   Rate [%]
Major A    100       80
Major B    900       180        20
www.udacity.com – Introduction to statistics
14
Gender bias
Male
           Applied   Admitted   Rate [%]
Major A    900       450        50
Major B    100       10
Both       1000      460        46
Female
           Applied   Admitted   Rate [%]
Major A    100       80
Major B    900       180        20
Both       1000      260        26
www.udacity.com – Introduction to statistics
15
Gender bias
           Male: Rate [%]   Female: Rate [%]
Major A    50               80
Major B    10               20
Both       46               26
www.udacity.com – Introduction to statistics
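The paradox can be checked by recomputing every rate from the applied/admitted counts. This is a minimal sketch; the dictionary layout and the `admission_rates` helper are illustrative choices, not part of the original course material.

```python
# Applicants and admissions per major, copied from the (made-up) tables above.
ADMISSIONS = {
    "male": {"A": (900, 450), "B": (100, 10)},
    "female": {"A": (100, 80), "B": (900, 180)},
}


def admission_rates(majors: dict[str, tuple[int, int]]) -> tuple[dict[str, float], float]:
    """Return per-major admission rates and the pooled ("Both") rate, in percent."""
    per_major = {m: 100 * admitted / applied for m, (applied, admitted) in majors.items()}
    total_applied = sum(applied for applied, _ in majors.values())
    total_admitted = sum(admitted for _, admitted in majors.values())
    return per_major, 100 * total_admitted / total_applied


for group, majors in ADMISSIONS.items():
    per_major, pooled = admission_rates(majors)
    print(f"{group:>6}: per major {per_major}, both majors {pooled:.0f} %")
```

Women have the higher rate in each major (80 vs 50, 20 vs 10) but the lower pooled rate (26 vs 46), because 900 of the 1000 women applied to major B, where admission is much harder.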
16
Statistics is ambiguous
This example illustrates how ambiguous statistics can be: within each major, women are admitted at a higher rate, yet pooled over both majors their rate is lower. How you choose to summarize and graph your data can strongly influence what people believe to be the case.
“I never believe in statistics I didn’t doctor myself.”
Who said that? The quote is commonly attributed to Winston Churchill.
www.udacity.com – Introduction to statistics
17
What is statistics?
Statistics – the science of collecting, organizing, summarizing, analyzing and interpreting data
Goal – use imperfect information (our data) to infer facts, make predictions, and make decisions
Descriptive statistics – describing and summarizing data with numbers or pictures
Inferential statistics – drawing conclusions or making decisions based on data
18
Variables
variable – a value or characteristic that can vary from individual to individual, e.g., favorite color, age
How are variables classified?
quantitative variable – numerical values, often with units of measurement; they arise from a “how much?” or “how many?” question, e.g., age, annual income, number of children
- continuous, e.g., height, weight
- discrete, e.g., number of children
- continuous variables can be discretized
19
Variables
categorical (qualitative) variables – categories that have no particular order, e.g., favorite color, gender, nationality
ordinal variables – not numerical, but their values have a natural order, e.g., temperature: low/medium/high
20
Variables
variable
- quantitative: continuous, discrete
- categorical: ordinal
21
Choosing a profession
Chemistry: 50 000 – 60 000
Geography: 40 000 – 55 000
www.udacity.com – Statistics
22
Choosing a profession We made an interval estimate. But ideally we want one number that describes the entire dataset. This allows us to quickly summarize all our data. www.udacity.com – Statistics
23
Choosing a profession
1. The value at which frequency is highest.
2. The value at which frequency is lowest.
3. The value in the middle.
4. The largest value on the x-axis.
5. The mean.
(Histograms: Chemistry, Geography)
www.udacity.com – Statistics
24
Three big M’s
The value at which the frequency is highest, i.e., the most common value, is called the mode.
The value in the middle of the distribution is called the median.
The mean (a synonym is average) is the sum of all values divided by the number of values.
(Histograms: Chemistry, Geography)
www.udacity.com – Statistics
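The three M’s map directly onto functions in Python’s standard `statistics` module. The data set below is hypothetical, chosen only so that the mode, median and mean all differ.

```python
from statistics import mean, median, mode

# Hypothetical data set, chosen so that the three M's all differ.
data = [1, 1, 3, 4, 11]

print("mode:  ", mode(data))    # most common value -> 1
print("median:", median(data))  # middle of the sorted values -> 3
print("mean:  ", mean(data))    # (1 + 1 + 3 + 4 + 11) / 5 -> 4
```

The single large value 11 pulls the mean above the median; the median only looks at the middle of the sorted data.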
25
Quick quiz
What is the mode of these data? 2, 5, 6, 5, 2, 6, 9, 8, 5, 2, 3, 5
www.udacity.com – Statistics
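One way to answer the quiz is to build a frequency table and pick the value(s) with the highest count. A minimal sketch using `collections.Counter`:

```python
from collections import Counter

quiz = [2, 5, 6, 5, 2, 6, 9, 8, 5, 2, 3, 5]

# Frequency table: value -> how many times it occurs.
freq = Counter(quiz)
top = max(freq.values())
modes = [value for value, count in freq.items() if count == top]

print(sorted(freq.items()))  # [(2, 3), (3, 1), (5, 4), (6, 2), (8, 1), (9, 1)]
print("mode(s):", modes)      # mode(s): [5]
```

Returning a list rather than a single value keeps ties visible.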
26
Mode in a negatively skewed distribution www.udacity.com – Statistics
27
Mode in a uniform distribution www.udacity.com – Statistics
28
Multimodal distribution www.udacity.com – Statistics
29
Mode in categorical data www.udacity.com – Statistics
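The multimodal, uniform and categorical cases can be illustrated with `statistics.multimode` (Python 3.8+), which returns every most-frequent value in the order it first appears. The three small data sets are hypothetical.

```python
from statistics import multimode

# Hypothetical examples for the three cases on the slides above.
bimodal = [1, 2, 2, 3, 4, 4, 5]            # two peaks
uniform = [1, 2, 3, 4, 5]                  # every value equally frequent
colors = ["red", "blue", "blue", "green"]  # categorical data

print(multimode(bimodal))  # [2, 4]
print(multimode(uniform))  # [1, 2, 3, 4, 5]
print(multimode(colors))   # ['blue']
```

In a uniform distribution every value is a mode, so the mode says little about such data; for categorical data it is one of the few “centers” that make sense.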
30
More on the mode
True or false?
1. The mode can be used to describe any type of data, whether numerical or categorical.
2. All scores in the dataset affect the mode.
3. If we take many samples from the same population, the mode will be the same in each sample.
4. There is an equation for the mode.
Ad 3.: http://onlinestatbook.com/stat_sim/sampling_dist/
http://www.shodor.org/interactivate/activities/Histogram/ – the mode changes as you change the bin size.
Because statement 3 is not true, we cannot rely on the mode to learn something about our population. The mode depends on how you present the data.
www.udacity.com – Statistics
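The bin-size effect can be reproduced without the applet: group the same values into bins of different widths and look at which bin is the tallest. This is a sketch; the `modal_bin` helper and the values are hypothetical, and bins are taken as half-open intervals starting at 0.

```python
import math
from collections import Counter


def modal_bin(data: list[float], width: float, start: float = 0.0) -> tuple[float, float]:
    """Return the histogram bin [lo, lo + width) that holds the most values.

    Bins are [start + k*width, start + (k+1)*width). On a tie, the bin whose
    first value appears earliest in `data` is returned.
    """
    counts = Counter(math.floor((x - start) / width) for x in data)
    k, _ = max(counts.items(), key=lambda item: item[1])
    lo = start + k * width
    return lo, lo + width


values = [2, 2, 2, 5, 6, 7, 8, 9]   # hypothetical values
print(modal_bin(values, width=1))   # (2.0, 3.0)
print(modal_bin(values, width=5))   # (5.0, 10.0)
```

With narrow bins the three repeated 2s form the peak; with wide bins the five values from 5 to 9 fall into one bin and the “mode” jumps to the other end of the data.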
31
Life expectancy data www.coursera.org – Statistics: Making Sense of Data
32
Minimum: Sierra Leone, minimum = 47.8 www.coursera.org – Statistics: Making Sense of Data
33
Maximum: Japan, maximum = 84.3 www.coursera.org – Statistics: Making Sense of Data
34
Life expectancy data all countries www.coursera.org – Statistics: Making Sense of Data
35
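Life expectancy data
Countries ranked from 1 to 197 by life expectancy: the 99th country, Egypt, is in the middle, so median = 73.2. Half of the countries have a larger value, half a smaller one.
www.coursera.org – Statistics: Making Sense of Data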
36
Life expectancy data Minimum = 47.8 Maximum = 83.4 Median = 73.2 www.coursera.org – Statistics: Making Sense of Data
37
Q1
Of the 197 countries, the 50th (¼ of the way along) is Sao Tomé & Príncipe: 1st quartile = 64.7
www.coursera.org – Statistics: Making Sense of Data
38
Q1
¼ of the values are smaller, ¾ are larger: 1st quartile = 64.7
www.coursera.org – Statistics: Making Sense of Data
39
Q3
Of the 197 countries, the 148th (¾ of the way along) is Netherlands Antilles: 3rd quartile = 76.7
www.coursera.org – Statistics: Making Sense of Data
40
Q3
¾ of the values are smaller, ¼ are larger: 3rd quartile = 76.7
www.coursera.org – Statistics: Making Sense of Data
41
Life expectancy data
- Minimum = 47.8
- 1st quartile = 64.7
- Median = 73.2
- 3rd quartile = 76.7
- Maximum = 83.4
www.coursera.org – Statistics: Making Sense of Data
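The positions used on the slides (50th, 99th and 148th of 197 sorted countries) are consistent with the nearest-rank rule, position = ⌈p · n⌉. The sketch below assumes that rule; the course may use a different convention, and other conventions can shift a quartile by one position or interpolate between two values.

```python
import math


def nearest_rank(p: float, n: int) -> int:
    """1-based position of the p-quantile in n sorted values (nearest-rank rule)."""
    return math.ceil(p * n)


n = 197  # number of countries in the life expectancy data
for name, p in [("Q1", 0.25), ("median", 0.50), ("Q3", 0.75)]:
    print(name, nearest_rank(p, n))
```

For the median of an odd number of values this agrees with the familiar position (n + 1) / 2 = 99.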
42
Box Plot www.coursera.org – Statistics: Making Sense of Data
43
Box plot
The box runs from the 1st quartile to the 3rd quartile, with a line at the median; the whiskers extend to the minimum and the maximum.
44
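Modified box plot
IQR (interquartile range) = Q3 − Q1
Values more than 1.5 × IQR below Q1 or above Q3 are outliers. They are drawn as individual points, and the whiskers end at the most extreme values that are not outliers.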
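The 1.5 × IQR rule is simple arithmetic on the quartiles. A minimal sketch, applied to the life expectancy quartiles from the slides (the function names are illustrative):

```python
def outlier_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper fences of a modified box plot: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x: float, q1: float, q3: float, k: float = 1.5) -> bool:
    """True if x lies outside the fences."""
    lower, upper = outlier_fences(q1, q3, k)
    return x < lower or x > upper


# Quartiles of the life expectancy data (Q1 = 64.7, Q3 = 76.7).
lower, upper = outlier_fences(64.7, 76.7)
print(f"fences: {lower:.1f} to {upper:.1f}")  # fences: 46.7 to 94.7
print(is_outlier(47.8, 64.7, 76.7))           # minimum (Sierra Leone): False
```

Here IQR = 12.0, so both the minimum and the maximum lie inside the fences and the modified box plot shows no outliers for these data.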
45
Quartiles, median – how to do it? 79, 68, 88, 69, 90, 74, 87, 93, 76 Find min, max, median, Q1, Q3 in these data. Then, draw the box plot. www.coursera.org – Statistics: Making Sense of Data
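A common hand method takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd. This is one textbook convention among several, assumed here rather than taken from the course; software packages often interpolate and may give slightly different quartiles.

```python
from statistics import median


def five_number_summary(values: list[float]) -> tuple[float, float, float, float, float]:
    """Min, Q1, median, Q3, max; Q1/Q3 are medians of the lower/upper half,
    excluding the overall median when the number of values is odd."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return s[0], median(lower), median(s), median(upper), s[-1]


print(five_number_summary([79, 68, 88, 69, 90, 74, 87, 93, 76]))
```

The sorted data are 68, 69, 74, 76, 79, 87, 88, 90, 93, so the halves are 68 to 76 and 87 to 93.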
47
Another example
Data: 78, 93, 68, 84, 90, 74
Min.    1st Qu.   Median   3rd Qu.   Max.
68.00   75.00     81.00    88.50     93.00
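The Min./1st Qu./Median/3rd Qu./Max. layout looks like the output of R’s `summary()`. R’s default quantile rule (type 7) interpolates linearly between order statistics, which Python’s `statistics.quantiles(..., method="inclusive")` also does, so the sketch below reproduces the numbers shown. Python’s default, `method="exclusive"`, uses a different rule and gives different quartiles for these data.

```python
from statistics import quantiles

data = [78, 93, 68, 84, 90, 74]

# method="inclusive": linear interpolation between order statistics,
# matching R's default quantile type 7.
q1, med, q3 = quantiles(data, n=4, method="inclusive")
print(min(data), q1, med, q3, max(data))  # 68 75.0 81.0 88.5 93
```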
48
Percentiles
Growth chart (x-axis: age [years])
http://www.rustovyhormon.cz/on-line-rustove-grafy
49
3rd M – Mean
50
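Robust statistic
Salaries of 25 New York Red Bulls players (soccer, Major League Soccer) in 2012:
33 750, 44 000, 45 566, 65 000, 95 000, 103 500, 112 495, 138 188, 141 666, 181 500, 185 000, 190 000, 194 375, 195 000, 205 000, 292 500, 301 999, 4 600 000, 5 600 000
median = 112 495
mean = 518 311
The mean is not a robust statistic: a few extreme values (here the two multi-million salaries) pull it far above the typical salary. The median is a robust statistic: it is hardly affected by extreme values.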
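Robustness can be checked by removing the two largest salaries and seeing how much each statistic moves. Note an assumption: the list on the slide contains only 19 values, while the slide’s median (112 495) and mean (518 311) refer to all 25 players, so the numbers computed from the listed values differ from those on the slide.

```python
from statistics import mean, median

# The 19 salaries (USD) listed on the slide.
salaries = [33_750, 44_000, 45_566, 65_000, 95_000, 103_500, 112_495,
            138_188, 141_666, 181_500, 185_000, 190_000, 194_375, 195_000,
            205_000, 292_500, 301_999, 4_600_000, 5_600_000]

# Same data without the two largest salaries.
without_top_two = sorted(salaries)[:-2]

for label, data in [("all listed", salaries), ("top two removed", without_top_two)]:
    print(f"{label:>15}: median = {median(data):>9,.0f}   mean = {mean(data):>9,.0f}")
```

Dropping two values moves the median from 181 500 to 141 666, but the mean collapses from about 669 713 to about 148 502.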
51
Trimmed mean
The 10% trimmed mean eliminates the lowest 10% and the highest 10% of the data and averages the rest. The trimmed mean is more robust than the mean.
33 750, 44 000, 45 566, 65 000, 95 000, 103 500, 112 495, 138 188, 141 666, 181 500, 185 000, 190 000, 194 375, 195 000, 205 000, 292 500, 301 999, 4 600 000, 5 600 000
median = 112 495
mean = 518 311
10% trimmed mean = 128 109
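A trimmed mean only needs sorting and slicing. This sketch cuts ⌊proportion · n⌋ values from each end, the same rounding used by R’s `mean(x, trim = 0.1)` and SciPy’s `scipy.stats.trim_mean`; with 25 players, 10% is 2.5 values, so 2 are cut from each end. The data set is hypothetical.

```python
import math
from statistics import mean


def trimmed_mean(values: list[float], proportion: float) -> float:
    """Mean after dropping floor(proportion * n) values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    s = sorted(values)
    k = math.floor(proportion * len(s))  # number of values cut from EACH end
    return mean(s[k : len(s) - k])


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical, one extreme value
print(mean(data))               # 14.5
print(trimmed_mean(data, 0.1))  # 5.5 (drops 1 and 100)
```

Slicing with `len(s) - k` instead of `-k` keeps the case `k = 0` correct, where `s[0:-0]` would be empty.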