Published by Lenard Clark. Modified over 9 years ago.
1
R INSTALLATION
R is an open-source software package for statistical data analysis.
2
R Installation
R download (this is the main program!): http://www.r-project.org/
Mirror in Germany (Stefan Drees, Bonn): http://cran.r-mirror.de/
RStudio (auxiliary program for editing R files): http://www.rstudio.com/ide/download/
Install the programs (R should be there already).
3
Working with the R command prompt (without RStudio)
Practical session
4
Working with RStudio
Start RStudio
"File -> New -> R Script" lets you edit an R command or an R script (= small program = several consecutive commands)
5
Working with RStudio
[Screenshot of the RStudio window with labeled areas:]
New R files
The command prompt
Select: Files, Plots, Packages (for advanced analyses), Help
6
Working with RStudio
Commands and programs can be stored in R files
Execute one command line: Ctrl+Enter or the "Run" button
Execute several lines: mark the lines and use Ctrl+Enter or the "Run" button
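As a minimal sketch of such a script (the variable names and the numbers are illustrative assumptions, not part of the slides), a few consecutive commands might look like this:

```r
# Illustrative R script: several consecutive commands.
# The values are made up for demonstration.
x <- c(2, 4, 6, 8)      # store four numbers in a vector
n <- length(x)          # number of values
total <- sum(x)         # sum of the values
average <- total / n    # arithmetic mean computed by hand
print(average)
```

In RStudio, each line can be sent to the command prompt with Ctrl+Enter, or all lines can be marked and run together.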
7
WHAT IS STATISTICS?
8
What is statistics?
Statistics is a means to connect empirical knowledge and theory. It consists of:
Data representation (empirics)
Methods for description, analysis, and interpretation of data, in order to allow predictions, conclusions and decisions (statistical theory)
9
What is statistics?
I. Descriptive statistics
II. Probability theory
III. Test theory
10
DESCRIPTIVE STATISTICS
11
Descriptive Statistics
12
Descriptive statistics
Observational objects: patients, blood samples, DNA samples, houses, atoms
Attributes / traits: blood pressure, weight, age, blood group, number of siblings, marital status, rent
Values:
  qualitative: blood group, marital status
  quantitative, discrete: number of siblings
  quantitative, continuous: blood pressure, weight, age, rent
13
Descriptive statistics
Scaling:
Nominal scale: attribute values that are not directly comparable (sex, subject of studies, country of origin) (qualitative)
Ordinal scale: attribute values that have a "natural" order (grades, font sizes: tiny-small-medium-large-huge)
Interval scale: difference between attribute values is interpretable (temperature in °C) (quantitative)
To be distinguished:
Discrete attributes: attribute values can be counted
Continuous attributes: all real numbers, or at least all numbers from an interval, are possible
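One way the three scales could be represented in R is sketched below. The example values and variable names are assumptions chosen for illustration; only the categories themselves come from the slide.

```r
# Nominal scale: categories without order (values are illustrative).
subject <- factor(c("biology", "physics", "biology", "medicine"))

# Ordinal scale: categories with a natural order.
font_size <- factor(c("small", "huge", "medium", "tiny"),
                    levels = c("tiny", "small", "medium", "large", "huge"),
                    ordered = TRUE)

# Interval scale: differences are interpretable.
temperature_c <- c(18.5, 21.0, 19.5)

# Order comparisons are only defined for the ordered factor.
is_smaller <- font_size[1] < font_size[2]            # "small" < "huge"
temp_diff  <- temperature_c[2] - temperature_c[1]    # difference in °C
print(is_smaller)
print(temp_diff)
```

Comparing two entries of the unordered factor with `<` would give NA with a warning, which mirrors the statement that nominal values are not directly comparable.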
14
Descriptive statistics
Frequencies:
Absolute frequency $n_i$: number of observations with attribute value $i$ (counts)
Relative frequency $h_i$: portion of elements with attribute value $i$
To be computed as absolute frequency divided by total number of objects $N$: $h_i = n_i / N$
Relative frequencies lie between 0 and 1
Relative frequencies have to add up to 1 (can be used to check the computation)
15
Descriptive statistics
AB0 blood group (tally sheet), absolute frequency $n_i$ and relative frequency $h_i$ for Bonn and Köln

i    value   n_i Bonn   n_i Köln   h_i Bonn   h_i Köln
1    0       17         78         0.34       0.39
2    A1      19         76         0.38       0.38
3    A2      6          20         0.12       0.10
4    B       5          18         0.10       0.09
5    A1B     2          6          0.04       0.03
6    A2B     1          2          0.02       0.01
7    other   0          0          0.00       0.00
Sum          N = 50     N = 200    1.00       1.00
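The relative frequencies can be recomputed from the counts as a check. The sketch below assumes the tally sheet reads, for the values 0, A1, A2, B, A1B, A2B, other, as 17, 19, 6, 5, 2, 1, 0 for Bonn (N = 50) and 78, 76, 20, 18, 6, 2, 0 for Köln (N = 200); the variable names are illustrative.

```r
# Absolute frequencies n_i read from the blood group table (assumed reading).
blood_groups <- c("0", "A1", "A2", "B", "A1B", "A2B", "other")
n_bonn  <- c(17, 19, 6, 5, 2, 1, 0)
n_koeln <- c(78, 76, 20, 18, 6, 2, 0)
names(n_bonn)  <- blood_groups
names(n_koeln) <- blood_groups

# Relative frequency: h_i = n_i / N
h_bonn  <- n_bonn / sum(n_bonn)
h_koeln <- n_koeln / sum(n_koeln)

print(round(h_bonn, 2))
print(round(h_koeln, 2))

# Check: relative frequencies must add up to 1.
print(c(sum(h_bonn), sum(h_koeln)))
```

The base R function `prop.table()` performs the same division by the total.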
16
Descriptive statistics
17
Height classification: complete and disjoint (each value belongs to only one class)
Class limits: (160; 170] contains all values that are > 160 but ≤ 170.
[Figure: height axis in cm from 150 to 200, divided into half-open classes of the form ( ]]
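Such half-open classes can be built in R with `cut()`. The following is a sketch with made-up height values; `right = TRUE` closes each class on the right, which matches the (160; 170] convention.

```r
# Illustrative heights in cm (made-up values).
height <- c(160, 160.5, 170, 170.1, 185, 200)

# Class limits; right = TRUE gives classes of the form (lower; upper].
limits <- seq(150, 200, by = 10)
height_class <- cut(height, breaks = limits, right = TRUE)

print(data.frame(height, height_class))
```

Here 160 falls into the class up to 160, while 160.5 and 170 both fall into the class above 160 up to 170, so every value belongs to exactly one class.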
18
Descriptive statistics
Height [cm] (tally sheet): frequencies $n_i$ (absolute), $h_i$ (relative) and cumulative frequencies $N_i$ (absolute), $H_i$ (relative)

Class number i   Class limits (a_{i-1}; a_i]   n_i       h_i    N_i   H_i
1                ≤ 150                         0         0.00   0     0.00
2                (150; 160]                    5         0.05   5     0.05
3                (160; 170]                    30        0.30   35    0.35
4                (170; 180]                    35        0.35   70    0.70
5                (180; 190]                    25        0.25   95    0.95
6                (190; 200]                    5         0.05   100   1.00
7                > 200                         0         0.00   100   1.00
Sum                                            N = 100   1.00
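The cumulative columns follow from the absolute frequencies by running sums. A short sketch, assuming the class counts for classes 1 to 7 read as 0, 5, 30, 35, 25, 5, 0:

```r
# Absolute class frequencies n_i for classes 1..7 (assumed reading of the table).
n_i <- c(0, 5, 30, 35, 25, 5, 0)
N <- sum(n_i)

h_i <- n_i / N        # relative frequencies
N_i <- cumsum(n_i)    # absolute cumulative frequencies
H_i <- N_i / N        # relative cumulative frequencies

print(data.frame(i = 1:7, n_i, h_i, N_i, H_i))
```

The last cumulative value must equal N (absolute) or 1 (relative), which gives another check on the computation.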
19
DESCRIPTIVE STATISTICS Graphical representation
20
Descriptive statistics
Pie chart (R function: pie())
Shows absolute frequencies
Example: blood groups
21
Descriptive statistics
Bar chart (R function: barplot())
Shows relative frequencies
Example: blood groups
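Both charts can be sketched with the Bonn blood group counts from the tally sheet slide. The counts are an assumed reading of that table, and the titles and axis labels are illustrative choices.

```r
# Absolute frequencies for Bonn (N = 50), as read from the blood group table.
# The class "other" has n = 0 and is omitted here.
n_bonn <- c("0" = 17, "A1" = 19, "A2" = 6, "B" = 5, "A1B" = 2, "A2B" = 1)

# Pie chart of the absolute frequencies.
pie(n_bonn, main = "AB0 blood groups, Bonn")

# Bar chart of the relative frequencies h_i = n_i / N.
h_bonn <- n_bonn / sum(n_bonn)
bar_midpoints <- barplot(h_bonn, ylim = c(0, 1),
                         xlab = "blood group", ylab = "relative frequency",
                         main = "AB0 blood groups, Bonn")
```

`barplot()` returns the horizontal midpoints of the bars, which is useful when labels are to be added above them.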
22
Descriptive statistics
Representation of cumulative frequencies with the empirical distribution function F
Discrete trait: number of children (tally sheet)

i     Number of children   n_i      h_i    N_i   H_i
1     0                    5        0.10   5     0.10
2     1                    20       0.40   25    0.50
3     2                    15       0.30   40    0.80
4     3                    5        0.10   45    0.90
5     4                    3        0.06   48    0.96
6     >4                   2        0.04   50    1.00
Sum                        N = 50   1.00

$n_i$, $h_i$: absolute and relative frequencies; $N_i$, $H_i$: absolute and relative cumulative frequencies
23
Descriptive statistics
[Figure: two panels over the number of children (0, 1, 2, 3, 4, >4), vertical axes from 0.0 to 1.0: a bar chart of the relative frequencies $h_i$, and the empirical distribution function F with the cumulative relative frequencies $H_i$]
F: Empirical distribution function
Since the attribute is quantitative discrete, we obtain a step function
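The step function can be reproduced with `ecdf()`. The sketch below rebuilds a raw sample from the counts for the number of children (5, 20, 15, 5, 3, 2 for 0, 1, 2, 3, 4, >4); coding the open category ">4" as 5 is an assumption made only for plotting.

```r
# Absolute frequencies for 0, 1, 2, 3, 4 and ">4" children (from the table).
n_i <- c(5, 20, 15, 5, 3, 2)

# Rebuild the raw sample; ">4" is coded as 5, which is an assumption
# made only so that the category can be placed on a numeric axis.
children <- rep(0:5, times = n_i)

# Empirical distribution function: F(x) = share of observations <= x.
F_emp <- ecdf(children)
print(F_emp(0:5))

# Plotting an ecdf object draws the step function.
plot(F_emp, main = "Empirical distribution function",
     xlab = "number of children", ylab = "F")
```

The values of `F_emp` at 0 to 5 are the cumulative relative frequencies, so the jumps of the step function have the heights of the relative frequencies.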
24
Descriptive statistics
Histograms (R function: hist())
Construction:
Data is subdivided into classes
Surface area of columns is proportional to the respective frequencies
Columns are neighbouring since classes are neighbouring
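A histogram along these lines can be sketched with `hist()`. Since the raw heights are not given, the example places each observation at its class midpoint, using the class counts 5, 30, 35, 25, 5 for the classes from 150 to 200 cm from the height table; this reconstruction is an assumption for illustration only.

```r
# Class counts for the five classes between 150 and 200 cm (from the height table).
n_i <- c(5, 30, 35, 25, 5)
midpoints <- c(155, 165, 175, 185, 195)

# Assumption: raw heights are not available, so each observation is
# placed at its class midpoint just to have data to pass to hist().
height <- rep(midpoints, times = n_i)

# breaks sets the class limits; right = TRUE closes classes on the right.
h <- hist(height, breaks = seq(150, 200, by = 10), right = TRUE,
          freq = FALSE, xlab = "height [cm]", main = "Histogram")

# With freq = FALSE, column area (density * class width) equals h_i.
print(h$density * diff(h$breaks))
```

With equal class widths, column heights and column areas are proportional to each other; with unequal widths only the areas represent the frequencies.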
25
Descriptive statistics
Example: Height [cm]
[Figure: histogram of height [cm] from 150 to 200, vertical axis $h_i$ from 0 to 1]
26
Descriptive statistics
[Figure: two panels over height [cm] from 150 to 200: empirical density function f, and empirical distribution function F (for continuous trait)]
27
Descriptive statistics
[Figure: empirical density function f and empirical distribution function F over height [cm] from 150 to 200, with the relative frequencies $h_i$ marked]
28
Descriptive Statistics Note: Slides 23 and 26 both show empirical distribution functions. In the first case, we obtain a step function since the trait under investigation is discrete.
29
DESCRIPTIVE STATISTICS Measures of central tendency, dispersion and spread
30
Descriptive statistics
Measures of central tendency: a number to characterize the "center" of the data
Most important:
Mean
Median
31
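Descriptive statistics
Sample (n = 8): $x_1 = 5$, $x_2 = 9$, $x_3 = 3$, $x_4 = 8$, $x_5 = 19$, $x_6 = 4$, $x_7 = 6$, $x_8 = 7$
Ranks: $x_{(1)} = 3$, $x_{(2)} = 4$, $x_{(3)} = 5$, $x_{(4)} = 6$, $x_{(5)} = 7$, $x_{(6)} = 8$, $x_{(7)} = 9$, $x_{(8)} = 19$
Sample (n = 7): $x_1 = 5$, $x_2 = 9$, $x_3 = 3$, $x_4 = 8$, $x_5 = 19$, $x_6 = 4$, $x_7 = 6$
Ranks: $x_{(1)} = 3$, $x_{(2)} = 4$, $x_{(3)} = 5$, $x_{(4)} = 6$, $x_{(5)} = 8$, $x_{(6)} = 9$, $x_{(7)} = 19$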
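For the two samples on this slide (n = 7 and n = 8), mean and median can be computed as sketched below. The variable names are illustrative; the median rule stated in the comments is the standard definition, which the extracted slide text does not spell out.

```r
x_odd  <- c(5, 9, 3, 8, 19, 4, 6)       # sample with n = 7
x_even <- c(5, 9, 3, 8, 19, 4, 6, 7)    # sample with n = 8

print(sort(x_odd))     # ranked sample for n = 7
print(sort(x_even))    # ranked sample for n = 8

# Median: middle ranked value for odd n,
# average of the two middle ranked values for even n.
median_odd  <- median(x_odd)
median_even <- median(x_even)

mean_odd  <- mean(x_odd)
mean_even <- mean(x_even)

print(c(median_odd, median_even, mean_odd, mean_even))
```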
32
Descriptive statistics
33
i    x_i             x_(i)
1    2000            1500
2    5000 / 15000    2000
3    4000            2500
4    1500            4000
5    2500            5000 / 15000
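A short sketch of what an outlier does to mean and median, assuming the table reads as five values (2000, 5000, 4000, 1500, 2500) in which the second value is alternatively replaced by 15000:

```r
# Assumed reading of the table: five values, where the second one
# is either 5000 or, as an outlier, 15000.
x_clean   <- c(2000, 5000, 4000, 1500, 2500)
x_outlier <- c(2000, 15000, 4000, 1500, 2500)

# The mean follows the outlier, the median does not.
print(c(mean = mean(x_clean),   median = median(x_clean)))
print(c(mean = mean(x_outlier), median = median(x_outlier)))
```

Under this reading the median is the same in both cases because only the largest ranked value changes.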
34
Descriptive statistics
[Figure: sample along the x axis with one outlier]
How to treat outliers?
1) Discard: No!
2) Check value and correct: Yes!
35
Descriptive statistics
The mean (or median) is not sufficient to describe a sample
[Figure: sample A and sample B, each plotted along the x axis]
Measure the amount of variation of the data!
36
Descriptive statistics
Measures of dispersion and spread: numbers to characterize the amount of variation around the center (= mean)
Most important:
Minimum, maximum, range (dispersion)
Empirical variance (spread)
Empirical standard deviation (spread)
37
Descriptive statistics
Sample (n = 7): $x_1 = 5$, $x_2 = 9$, $x_3 = 3$, $x_4 = 8$, $x_5 = 19$, $x_6 = 4$, $x_7 = 6$
Ranks: $x_{(1)} = 3$, $x_{(2)} = 4$, $x_{(3)} = 5$, $x_{(4)} = 6$, $x_{(5)} = 8$, $x_{(6)} = 9$, $x_{(7)} = 19$
Minimum: min = $x_{(1)}$
Maximum: max = $x_{(n)}$
Range: R = $x_{(n)} - x_{(1)}$
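For the sample with n = 7, these three measures can be computed as in the following sketch (variable names are illustrative):

```r
x <- c(5, 9, 3, 8, 19, 4, 6)   # sample from the slide, n = 7

x_min <- min(x)             # smallest ranked value
x_max <- max(x)             # largest ranked value
R_range <- x_max - x_min    # range R = maximum - minimum

print(c(min = x_min, max = x_max, range = R_range))
```

Note that R's own `range()` function returns the pair (minimum, maximum) rather than their difference; `diff(range(x))` gives R.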
38
Descriptive statistics
39
Example: [table of $x_1$, $x_2$, $x_3$, $x_4$, $n$ and presumably the mean; numbers as extracted: 100 4 270 2 75]
$x_4 = 53$ is not free, but given by the other values when the mean is known.
$s^2$ has $(n-1)$ degrees of freedom (f)
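The role of the (n - 1) degrees of freedom can be sketched in R. The formula on the previous slide did not survive extraction, so the code assumes the usual definition of the empirical variance, $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, and uses the n = 7 sample from an earlier slide rather than the garbled example values.

```r
x <- c(5, 9, 3, 8, 19, 4, 6)   # sample with n = 7 from an earlier slide
n <- length(x)
x_bar <- mean(x)

# Empirical variance with (n - 1) degrees of freedom (assumed definition).
s2_manual <- sum((x - x_bar)^2) / (n - 1)
s_manual  <- sqrt(s2_manual)    # empirical standard deviation

# The built-in functions use the same (n - 1) denominator.
print(c(s2_manual, var(x)))
print(c(s_manual, sd(x)))

# Degrees of freedom: once the mean is known, the last value
# is fixed by the other n - 1 values.
x_last <- n * x_bar - sum(x[-n])
print(x_last)
```

The last line recovers the final sample value from the mean and the remaining values, which is the sense in which one value "is not free".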
40
Data
If you have data you want to analyse, please bring it along!