STATISTICAL DATA ANALYSIS (STATISTICKÁ ANALÝZA DAT)
Daniel Svozil, Laboratory of Informatics and Chemistry (Laboratoř informatiky a chemie), FCHT
Vojtěch Spiwok, ÚBM, FPBT


1 STATISTICAL DATA ANALYSIS
Daniel Svozil, Laboratory of Informatics and Chemistry, FCHT
Vojtěch Spiwok, ÚBM, FPBT

2 Information
Lectures: basics, review. Exercises: R.
http://ich.vscht.cz/~svozil/teaching.html
Further reading:
D. J. Rumsey, Statistics for Dummies, 2011
D. J. Rumsey, Intermediate Statistics for Dummies, 2007
Course credit and exam: to be announced.

3 Valuing houses

SIZE [ft²]   COST [$]
1 400        112 000
2 400        192 000
1 800        144 000
1 900        152 000
1 300        104 000
1 100         88 000

How much money should you expect to pay for a 1 300 ft² house? 104 000 $
The same question for a 1 800 ft² house? 144 000 $

4 Valuing houses

SIZE [ft²]   COST [$]
1 400        112 000
2 400        192 000
1 800        144 000
1 900        152 000
1 300        104 000
1 100         88 000

How much money should you expect to pay for a 2 100 ft² house? 168 000 $
2 100 lies exactly halfway between 1 800 and 2 400, so the price lies halfway between 144 000 $ and 192 000 $.
The same question for a 1 500 ft² house? 120 000 $
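The reasoning on these two slides can be checked with a few lines of Python. This is a minimal sketch and not part of the original lecture (whose exercises use R); the names `houses`, `price_per_sqft` and `predict_cost` are made up for illustration.

```python
# (size [ft^2], cost [$]) pairs copied from the slide.
houses = [(1400, 112000), (2400, 192000), (1800, 144000),
          (1900, 152000), (1300, 104000), (1100, 88000)]

# Price per square foot of every house, collected in a set.
price_per_sqft = {cost / size for size, cost in houses}


def predict_cost(size, rate):
    """Cost of a house of `size` ft^2 at a fixed `rate` in $ per ft^2."""
    return size * rate


# For this table the set holds a single value, so cost is proportional to size.
rate = next(iter(price_per_sqft))
print(rate)                      # price per square foot
print(predict_cost(2100, rate))  # the 2 100 ft^2 house
print(predict_cost(1500, rate))  # the 1 500 ft^2 house
```

Because every house in the table costs the same amount per square foot, a single multiplication is enough to answer both questions on the slide.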

5 What does a statistician do?
Look at data
Program computers
Run statistics software
Drink beer

6 Linear relationship

SIZE [ft²]   COST [$]
1 400         98 000
2 400        168 000
1 800        126 000
1 900        133 000
1 400         91 000
1 100         77 000

Is there a fixed amount per square foot? No.
What if I change the size of the 91 000 $ house from 1 400 to 1 300? What is the answer now? Yes.

7 Scatter plots
Please take a pen and paper and draw a scatter plot of these data.

SIZE [ft²]   COST [$]
1 400         98 000
2 400        168 000
1 800        126 000
1 900        133 000
1 300         91 000
1 100         77 000

Plot axes: SIZE, PRICE

8 Scatter plots

SIZE [ft²]   COST [$]
1 700        53 000
2 100        65 000
1 900        59 000
1 300        41 000
1 600        50 000
2 200        68 000

Do we believe there is a fixed price per square foot? No.

9 Scatter plots

SIZE [ft²]   COST [$]   $/ft²
1 700        53 000     31.1765
2 100        65 000     30.9524
1 900        59 000     31.0526
1 300        41 000     31.5385
1 600        50 000     31.2500
2 200        68 000     30.9091

What do you think, are the data linear? Let's make a scatter plot.
Surprisingly, the data are linear even though there is no fixed price per square foot!
PRICE = ???? × SIZE + ????
PRICE = 30 × SIZE + 2 000
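The line on this slide can be recovered from the table with an ordinary least-squares fit. The sketch below is an illustration only: the lecture does not say how the coefficients were obtained, and the function name `fit_line` is an assumption.

```python
# Data copied from the slide.
sizes = [1700, 2100, 1900, 1300, 1600, 2200]        # ft^2
costs = [53000, 65000, 59000, 41000, 50000, 68000]  # $


def fit_line(xs, ys):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, costs)
print(f"PRICE = {slope:g} x SIZE + {intercept:g}")
```

The non-zero intercept is the reason the $/ft² column is not constant: the fixed 2 000 $ is spread over a different number of square feet for each house.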

10 Scatter plots

SIZE [ft²]   COST [$]
1 700        53 000
2 100        44 000
1 900        59 000
1 300        82 000
1 600        50 000
2 200        68 000

Draw a scatter plot and tell me whether these data are linear (i.e., do they lie on a line?).
Outliers.

11 Bar chart

SIZE [ft²]   COST [$]
1 300         88 000
1 400         72 000
1 600         94 000
1 900         86 000
2 100        112 000
2 300         98 000

Warm-up: are these data linear? No.
How much should you pay for a 2 200 ft² house? Simply interpolate: 105 000 $.
Do you trust this number?
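The 105 000 $ figure comes from linear interpolation between the two neighbouring houses (2 100 ft² and 2 300 ft²). A minimal sketch of that calculation, with `interpolate` as a hypothetical helper name:

```python
def interpolate(x, x0, y0, x1, y1):
    """Linearly interpolate y at x between the points (x0, y0) and (x1, y1)."""
    t = (x - x0) / (x1 - x0)  # fraction of the way from x0 to x1
    return y0 + t * (y1 - y0)


# Neighbours of the 2 200 ft^2 house in the slide's table.
estimate = interpolate(2200, 2100, 112000, 2300, 98000)
print(estimate)
```

The estimate depends only on two noisy neighbours, which is one reason to be sceptical of it.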

12 Bar chart
Take your data and group them together.

13 Bar chart

SIZE [ft²]   COST [$]
1 300         88 000
1 400         72 000
1 600         94 000
1 900         86 000
2 100        112 000
2 300         98 000

14 Bar chart
Much finer representation of the data.
A bar chart allows you to understand global trends.
A statistician uses cumulative tools (such as a bar chart) to gain an understanding of the underlying data.
Let me ask you: are bar charts cool?
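One way to read "grouping the data together" is to pool the houses into size ranges and show the average cost of each range as one bar. The sketch below assumes a bin width of 500 ft², which the slides do not state; `BIN_WIDTH` and `bar_heights` are illustrative names.

```python
from collections import defaultdict

# (size [ft^2], cost [$]) pairs copied from the slide.
houses = [(1300, 88000), (1400, 72000), (1600, 94000),
          (1900, 86000), (2100, 112000), (2300, 98000)]

BIN_WIDTH = 500  # assumption: the slides do not give the bin width


def bar_heights(data, width):
    """Average cost per size bin, keyed by the lower edge of the bin."""
    groups = defaultdict(list)
    for size, cost in data:
        groups[size // width * width].append(cost)
    return {start: sum(c) / len(c) for start, c in sorted(groups.items())}


bars = bar_heights(houses, BIN_WIDTH)
for start, height in bars.items():
    print(f"{start}-{start + BIN_WIDTH - 1} ft^2: {height:.0f} $")
```

With this choice of bins the averages rise steadily with size, a global trend that the individual, noisy points hide.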

15 Histograms
A histogram is a special case of a bar chart. The main difference: a bar chart looks at 2D data, a histogram at 1D data.

132 784
137 192
122 177
147 121
143 000
126 010
129 200
124 312
128 132

16 Age distribution
Draw a histogram on paper with bins of 10 years (i.e. 0-10, 11-20, …).

21 17 9 27 12 39 4 32 14 38 9 21 14 3 8 31 29 15 33 29
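The paper exercise can be cross-checked by counting the ages per bin. This is a sketch using only the Python standard library; the helper `bin_label` is a made-up name, and it follows the slide's bin edges (0-10, 11-20, 21-30, …).

```python
from collections import Counter

# Ages copied from the slide.
ages = [21, 17, 9, 27, 12, 39, 4, 32, 14, 38,
        9, 21, 14, 3, 8, 31, 29, 15, 33, 29]


def bin_label(age):
    """Label of the slide's bin (0-10, 11-20, 21-30, ...) containing `age`."""
    index = max(0, (age - 1) // 10)
    low = 0 if index == 0 else 10 * index + 1
    return f"{low}-{10 * index + 10}"


histogram = Counter(bin_label(age) for age in ages)

# Text histogram: one '#' per person, bins in ascending order.
for label in sorted(histogram, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>6} | {'#' * histogram[label]}")
```

Each printed row corresponds to one bar of the hand-drawn histogram.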

17 Population pyramid
A population pyramid (věková pyramida, also called the "tree of life") is a graphical representation of the age structure of a population.
Source: http://cs.wikipedia.org/wiki/V%C4%9Bkov%C3%A1_pyramida

18 Pie charts (Czech: koláčový graf)
Elections:
Party A – 50%, Party B – 50%
Party A – 724 000 votes, Party B – 181 000 votes
Party A – 175 000, Party B – 50 000, Party C – 25 000, Party D – 50 000
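A pie chart shows each party's share of the total, so raw vote counts first have to be converted to percentages (and, for drawing, to angles). A small sketch of that conversion, assuming the vote counts from the slide; `shares` and `angles` are illustrative names.

```python
def shares(votes):
    """Percentage share of the total for each party."""
    total = sum(votes.values())
    return {party: 100 * n / total for party, n in votes.items()}


def angles(votes):
    """Central angle in degrees of each party's pie slice."""
    return {party: 360 * pct / 100 for party, pct in shares(votes).items()}


# Vote counts copied from the slide.
two_party = {"A": 724000, "B": 181000}
four_party = {"A": 175000, "B": 50000, "C": 25000, "D": 50000}

for election in (two_party, four_party):
    for party, pct in shares(election).items():
        print(f"Party {party}: {pct:.1f} %")
    print()
```

The percentages of one election always sum to 100, which is what makes a full circle a natural way to display them.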

19 Male

          Applied   Admitted   Rate [%]
MAJOR A   900       450
MAJOR B   100       10

20 Male

          Applied   Admitted   Rate [%]
MAJOR A   900       450        50
MAJOR B   100       10

21 Male

          Applied   Admitted   Rate [%]
MAJOR A   900       450        50
MAJOR B   100       10

22 Female

          Applied   Admitted   Rate [%]
MAJOR A   100       80
MAJOR B   900       180

23 Female

          Applied   Admitted   Rate [%]
MAJOR A   100       80
MAJOR B   900       180

24 Female

          Applied   Admitted   Rate [%]
MAJOR A   100       80
MAJOR B   900       180        20

25 Gender bias
What do you think, is there a gender bias? Who do you think is favored, males or females?

Male
          Applied   Admitted   Rate [%]
MAJOR A   900       450        50
MAJOR B   100       10

Female
          Applied   Admitted   Rate [%]
MAJOR A   100       80
MAJOR B   900       180        20

26 Gender bias
Look at the data independently of the major.

Male
          Applied   Admitted   Rate [%]
MAJOR A   900       450        50
MAJOR B   100       10

Female
          Applied   Admitted   Rate [%]
MAJOR A   100       80
MAJOR B   900       180        20
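Computing the admission rates both per major and pooled over majors makes the apparent contradiction concrete. The sketch below uses only the counts from the slides; the function names are assumptions made for illustration.

```python
# (applied, admitted) per major, copied from the slides.
male = {"A": (900, 450), "B": (100, 10)}
female = {"A": (100, 80), "B": (900, 180)}


def rate_percent(applied, admitted):
    """Admission rate in percent."""
    return 100 * admitted / applied


def rates_by_major(group):
    """Admission rate of one gender for each major separately."""
    return {major: rate_percent(*counts) for major, counts in group.items()}


def overall_rate(group):
    """Admission rate of one gender pooled over all majors."""
    applied = sum(a for a, _ in group.values())
    admitted = sum(adm for _, adm in group.values())
    return rate_percent(applied, admitted)


print("male by major:  ", rates_by_major(male))
print("female by major:", rates_by_major(female))
print("male overall:   ", overall_rate(male))
print("female overall: ", overall_rate(female))
```

Within each major the female rate is higher, yet the pooled male rate is higher, because most men applied to the major that admits more easily. This reversal is commonly known as Simpson's paradox, a name the slides themselves do not use.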

27 Statistics is ambiguous
This example illustrates how ambiguous statistics can be. The way you choose to graph your data may strongly influence what people believe to be the case.
"I never believe in statistics I didn't doctor myself."
Who said that? The quote is commonly attributed to Winston Churchill.

