Published by Cecily Quinn. Modified over 8 years ago.
1
STATISTICAL DATA ANALYSIS
Daniel Svozil, Laboratory of Informatics and Chemistry, FCHT
Vojtěch Spiwok, ÚBM, FPBT
2
Information
Lectures: fundamentals, review
Exercises: R
http://ich.vscht.cz/~svozil/teaching.html
Further reading:
D. J. Rumsey, Statistics for Dummies, 2011
D. J. Rumsey, Intermediate Statistics for Dummies, 2007
Course credit, exam: to be announced
3
Valuing houses
SIZE [ft²]   COST [$]
1 400        112 000
2 400        192 000
1 800        144 000
1 900        152 000
1 300        104 000
1 100        88 000
How much money should you expect to pay for a 1 300 ft² house? $104 000
Same question, now for 1 800 ft²? $144 000
4
Valuing houses
SIZE [ft²]   COST [$]
1 400        112 000
2 400        192 000
1 800        144 000
1 900        152 000
1 300        104 000
1 100        88 000
How much money should you expect to pay for a 2 100 ft² house? $168 000
2 100 is exactly halfway between 1 800 and 2 400, so the price is halfway between $144 000 and $192 000.
Same question, now for 1 500 ft²? $120 000
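The answers on the two "Valuing houses" slides can be checked by computing the price per square foot of every house and, since it turns out to be the same everywhere, multiplying it by the new size. The following is a minimal Python sketch of that reasoning; the names `houses`, `rates` and `predict_cost` are illustrative and do not come from the lecture.

```python
# Size [ft2] -> cost [$], copied from the "Valuing houses" table.
houses = {1400: 112_000, 2400: 192_000, 1800: 144_000,
          1900: 152_000, 1300: 104_000, 1100: 88_000}

# Set of distinct prices per square foot found in the table.
rates = {cost / size for size, cost in houses.items()}

def predict_cost(size_ft2):
    """Predict the cost, assuming one fixed price per square foot (hypothetical helper)."""
    if len(rates) != 1:
        raise ValueError("the table has no single price per square foot")
    (rate,) = rates
    return rate * size_ft2

for size in (1300, 1800, 2100, 1500):
    print(size, predict_cost(size))
```

Every house in the table costs 80 $ per square foot, which is why both a size that is in the table (1 300) and one that is not (2 100) can be priced the same way.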
5
What does a statistician do?
Look at data
Program computers
Run statistics software
Drink beer
6
Linear relationship
SIZE [ft²]   COST [$]
1 400        98 000
2 400        168 000
1 800        126 000
1 900        133 000
1 400        91 000
1 100        77 000
Is there a fixed amount per square foot? No
What if I change the second 1 400 (the house costing 91 000) to 1 300? What is the answer now? Yes
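To see why the answer flips from "No" to "Yes", it helps to list the price per square foot row by row. A small sketch, assuming a hypothetical helper `deviating_rows` that reports every row whose rate differs from the most common one:

```python
from collections import Counter

def deviating_rows(sizes, costs):
    """Return (row index, rate) for rows that differ from the most common $/ft2 rate."""
    rates = [cost / size for size, cost in zip(sizes, costs)]
    common_rate, _ = Counter(rates).most_common(1)[0]
    return [(i, r) for i, r in enumerate(rates) if r != common_rate]

costs = [98_000, 168_000, 126_000, 133_000, 91_000, 77_000]

# Table as shown on the slide: two houses of 1 400 ft2.
sizes_original = [1400, 2400, 1800, 1900, 1400, 1100]
print(deviating_rows(sizes_original, costs))

# Same table after changing the second 1 400 to 1 300.
sizes_changed = [1400, 2400, 1800, 1900, 1300, 1100]
print(deviating_rows(sizes_changed, costs))
```

In the original table the fifth house works out to 65 $ per square foot while all others are at 70; after the change every row is at 70, so a fixed amount per square foot exists.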
7
Scatter plots
Please take a pen and paper and draw a scatter plot of these data.
SIZE [ft²]   COST [$]
1 400        98 000
2 400        168 000
1 800        126 000
1 900        133 000
1 300        91 000
1 100        77 000
[Scatter plot axes: SIZE (horizontal), PRICE (vertical)]
8
Scatter plots
SIZE [ft²]   COST [$]
1 700        53 000
2 100        65 000
1 900        59 000
1 300        41 000
1 600        50 000
2 200        68 000
Do we believe there is a fixed price per square foot? No
9
Scatter plots
SIZE [ft²]   COST [$]   $/ft²
1 700        53 000     31.1765
2 100        65 000     30.9524
1 900        59 000     31.0526
1 300        41 000     31.5385
1 600        50 000     31.2500
2 200        68 000     30.9091
What do you think, is the data linear? Let's make a scatter plot.
Surprisingly, the data is linear, even though there is no fixed price per square foot!
PRICE = ???? × SIZE + ????
PRICE = 30 × SIZE + 2 000
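The slide reads the line off the scatter plot. One way to arrive at the same coefficients numerically is an ordinary least-squares fit; this is a technique swapped in here for illustration, not necessarily how the lecture obtained the numbers. The function name `fit_line` is hypothetical.

```python
sizes = [1700, 2100, 1900, 1300, 1600, 2200]
costs = [53_000, 65_000, 59_000, 41_000, 50_000, 68_000]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(sizes, costs)
print(f"PRICE = {slope:.0f} x SIZE + {intercept:.0f}")
```

The nonzero intercept of 2 000 is what makes the $/ft² column vary: dividing 30 × SIZE + 2 000 by SIZE gives 30 + 2 000 / SIZE, which shrinks as the house gets larger.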
10
Scatter plots
SIZE [ft²]   COST [$]
1 700        53 000
2 100        44 000
1 900        59 000
1 300        82 000
1 600        50 000
2 200        68 000
Draw a scatter plot and tell me whether these data are linear (i.e., do they lie on a line?).
Outliers
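A rough way to find the outliers without drawing the plot is to measure how far each point lies from a reference line. The sketch below assumes that the line from the previous slide, PRICE = 30 × SIZE + 2 000, still describes the bulk of the data, and it uses an arbitrary cutoff of 1 000 $; both are assumptions made for illustration, as are the names `residual` and `THRESHOLD`.

```python
data = [(1700, 53_000), (2100, 44_000), (1900, 59_000),
        (1300, 82_000), (1600, 50_000), (2200, 68_000)]

def residual(size, cost):
    """Vertical distance of a point from the assumed line PRICE = 30 * SIZE + 2000."""
    return cost - (30 * size + 2_000)

THRESHOLD = 1_000  # assumed cutoff in dollars, not taken from the lecture

outliers = [(size, cost) for size, cost in data
            if abs(residual(size, cost)) > THRESHOLD]
print(outliers)
```

Under these assumptions four points sit exactly on the line, while the 2 100 ft² and 1 300 ft² houses are off by 21 000 $ and 41 000 $ respectively.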
11
Bar chart
SIZE [ft²]   COST [$]
1 300        88 000
1 400        72 000
1 600        94 000
1 900        86 000
2 100        112 000
2 300        98 000
Warm-up: are these data linear? No
How much should you pay for a 2 200 ft² house? Simply interpolate: $105 000
Do you trust this number?
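The interpolated value comes from the two neighbouring houses, 2 100 ft² and 2 300 ft². A minimal sketch of that linear interpolation (the function name `interpolate` is illustrative):

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Neighbours of 2 200 ft2 in the table: (2100, 112000) and (2300, 98000).
estimate = interpolate(2200, 2100, 112_000, 2300, 98_000)
print(estimate)
```

The estimate rests on only two points whose prices differ by 14 000 $, and the larger of the two houses is the cheaper one, which is one reason to be cautious about it.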
12
Bar chart
Take your data and pool them together.
13
Bar chart
SIZE [ft²]   COST [$]
1 300        88 000
1 400        72 000
1 600        94 000
1 900        86 000
2 100        112 000
2 300        98 000
14
Bar chart
Much finer representation of the data.
A bar chart allows you to understand global trends.
Statisticians use cumulative tools (such as bar graphs) to gain an understanding of the underlying data.
Let me ask you: are bar charts cool?
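The transcript does not say how the houses were grouped for the bar chart. As a sketch of the idea, the code below pools the six houses into bins of 500 ft² and uses the mean cost of each bin as the bar height; the bin width, the use of the mean, and the names `BIN_WIDTH` and `bar_heights` are all assumptions.

```python
from collections import defaultdict

data = [(1300, 88_000), (1400, 72_000), (1600, 94_000),
        (1900, 86_000), (2100, 112_000), (2300, 98_000)]

BIN_WIDTH = 500  # assumed bin width in ft2, not given in the slides

# Collect the costs of all houses that fall into the same size bin.
bins = defaultdict(list)
for size, cost in data:
    lower_edge = (size // BIN_WIDTH) * BIN_WIDTH
    bins[lower_edge].append(cost)

# One bar per bin: the mean cost of the pooled houses.
bar_heights = {lower: sum(costs) / len(costs)
               for lower, costs in sorted(bins.items())}

for lower, mean_cost in bar_heights.items():
    print(f"{lower}-{lower + BIN_WIDTH - 1}: {mean_cost:.0f}")
```

With this grouping the bars rise from 80 000 to 90 000 to 105 000 $, so the overall upward trend becomes visible even though the individual points jump up and down.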
15
Histograms
A special case of the bar chart. The main difference: a bar chart looks at 2D data, a histogram at 1D data.
132 784; 137 192; 122 177; 147 121; 143 000; 126 010; 129 200; 124 312; 128 132
16
Age distribution
Draw a histogram on paper with bins of 10 years (i.e. 0-10, 11-20, …).
21, 17, 9, 27, 12, 39, 4, 32, 14, 38, 9, 21, 14, 3, 8, 31, 29, 15, 33, 29
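The pen-and-paper exercise can be cross-checked by counting how many ages fall into each bin. The sketch below follows the bin edges given on the slide (0-10, 11-20, …) and prints a simple text histogram; the helper name `bin_label` is illustrative.

```python
from collections import Counter

ages = [21, 17, 9, 27, 12, 39, 4, 32, 14, 38,
        9, 21, 14, 3, 8, 31, 29, 15, 33, 29]

def bin_label(age):
    """Return the slide's bin for an age: 0-10, 11-20, 21-30, ..."""
    index = max(age - 1, 0) // 10          # ages 0..10 -> 0, 11..20 -> 1, ...
    lower = 0 if index == 0 else index * 10 + 1
    upper = (index + 1) * 10
    return f"{lower}-{upper}"

counts = Counter(bin_label(age) for age in ages)

# Print the bins in ascending order, one '#' per person.
for label in sorted(counts, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>5} | {'#' * counts[label]}")
```

For these twenty ages each of the four bins contains exactly five people, so the histogram is flat.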
17
Age pyramid
Age pyramid ("tree of life"): a graphical representation of the age structure of a population.
Source: http://cs.wikipedia.org/wiki/V%C4%9Bkov%C3%A1_pyramida
18
Pie charts
Pie chart (Czech: koláčový graf)
Elections:
Party A: 50%, Party B: 50%
Party A: 724 000 votes, Party B: 181 000 votes
Party A: 175 000, Party B: 50 000, Party C: 25 000, Party D: 50 000
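Each slice of a pie chart is a party's share of the total, so the vote counts on the slide have to be converted to percentages before drawing. A minimal sketch of that conversion (the function name `shares` and the dictionary names are illustrative):

```python
def shares(votes):
    """Convert absolute vote counts to percentage shares of the total."""
    total = sum(votes.values())
    return {party: 100 * count / total for party, count in votes.items()}

two_party = {"A": 724_000, "B": 181_000}
four_party = {"A": 175_000, "B": 50_000, "C": 25_000, "D": 50_000}

for name, votes in (("two parties", two_party), ("four parties", four_party)):
    result = shares(votes)
    print(name, {party: round(pct, 1) for party, pct in result.items()})
```

The two-party example comes out as 80% to 20%. In the four-party example, Party A holds about 58.3% and the other three share the remaining 41.7%.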
19
Male
          Applied   Admitted   Rate [%]
MAJOR A   900       450
MAJOR B   100       10
20
Male
          Applied   Admitted   Rate [%]
MAJOR A   900       450        50
MAJOR B   100       10
22
Female
          Applied   Admitted   Rate [%]
MAJOR A   100       80
MAJOR B   900       180
24
Female
          Applied   Admitted   Rate [%]
MAJOR A   100       80
MAJOR B   900       180        20
25
Gender bias
What do you think, is there a gender bias? Who do you think is favored, male or female?
          Applied   Admitted   Rate [%]
MAJOR A   900       450        50
MAJOR B   100       10
          Applied   Admitted   Rate [%]
MAJOR A   100       80
MAJOR B   900       180        20
26
Gender bias
Look at the data independent of major.
          Applied   Admitted   Rate [%]
MAJOR A   900       450        50
MAJOR B   100       10
          Applied   Admitted   Rate [%]
MAJOR A   100       80
MAJOR B   900       180        20
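Looking at the data "independent of major" means adding up applicants and admissions across both majors before computing the rate. The sketch below does this for both groups, taking the first table as the male and the second as the female applicants, as on the preceding slides. The names `applicants`, `rate` and `summary` are illustrative.

```python
# (applied, admitted) per group and major, copied from the slides.
applicants = {
    "male":   {"A": (900, 450), "B": (100, 10)},
    "female": {"A": (100, 80),  "B": (900, 180)},
}

def rate(applied, admitted):
    """Admission rate in percent."""
    return 100 * admitted / applied

def summary(group):
    """Return (rate per major, overall rate) for one group."""
    majors = applicants[group]
    per_major = {major: rate(*counts) for major, counts in majors.items()}
    total_applied = sum(applied for applied, _ in majors.values())
    total_admitted = sum(admitted for _, admitted in majors.values())
    return per_major, rate(total_applied, total_admitted)

for group in applicants:
    per_major, overall = summary(group)
    print(group, per_major, "overall:", overall)
```

Within each major the female rate is higher (80% vs. 50% in Major A, 20% vs. 10% in Major B), yet pooled over both majors the male rate is 46% and the female rate 26%. The reversal arises because most women applied to the major with the lower admission rate; this pattern is commonly known as Simpson's paradox.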
27
Statistics is ambiguous
This example illustrates how ambiguous statistics can be.
How you choose to graph your data may strongly influence what people believe to be the case.
"I never believe in statistics I didn't doctor myself." Who said that? Commonly attributed to Winston Churchill.