R INSTALLATION R is an open source software package for statistical data analysis
R Installation
R download: Germany, Stefan Drees Bonn (this is the main program!)
RStudio (auxiliary program for editing R files)
Install the programs (R should be there already).
Working with the R command prompt (without RStudio) Practical session
Working with RStudio Start RStudio „File -> New -> R Script“ Lets you edit an R command or an R script (= small program = several consecutive commands)
Working with RStudio [Screenshot of the RStudio window with labelled panes: new R files; the command prompt; a pane to select Files, Plots, Packages (for advanced analyses), and Help]
Working with RStudio Commands and programs can be stored in R files Execute one command line: Ctrl+Enter or Button „Run“ Execute several lines: Mark lines and use „Ctrl+Enter“ or „Run“ button
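As a rough illustration of what such an R file might contain, here is a minimal sketch of a script with several consecutive commands. The numbers and the variable names `x` and `total` are made up for this example. Each line can be sent to the command prompt with Ctrl+Enter.

```r
# A small R script: several consecutive commands, executed one after another.
x <- c(5, 9, 3, 8)   # store four illustrative numbers in a vector named x
total <- sum(x)      # add them up and keep the result
print(total)         # show the result at the command prompt
```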
WHAT IS STATISTICS?
What is statistics? Statistics is a means to connect empirical knowledge and theory and is constituted as follows: Data representation (Empirics) Methods for description, analysis, and interpretation of data, in order to allow predictions, conclusions and decisions (Statistical Theory)
What is statistics? I. Descriptive Statistics II. Probability theory III. Test theory
DESCRIPTIVE STATISTICS
Descriptive Statistics
Descriptive statistics
Observational objects: patients, blood samples, DNA samples, houses, atoms
Attributes / traits: blood pressure, weight, age, blood group, number of siblings, marital status, rent
Values:
  qualitative: blood group, marital status
  quantitative, discrete: number of siblings
  quantitative, continuous: blood pressure, weight, age, rent
Descriptive statistics
Scaling:
Nominal scale: attribute values that are not directly comparable (sex, subject of studies, country of origin) (qualitative)
Ordinal scale: attribute values that have a „natural“ order (grades, font sizes: tiny-small-medium-large-huge)
Interval scale: difference between attribute values is interpretable (temperature in °C) (quantitative)
To be distinguished:
Discrete attributes: attribute values can be counted
Continuous attributes: all real numbers, or at least all numbers from an interval, are possible
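One possible way to represent these scales in R is sketched below. The data values and the variable names (`blood_group`, `font_size`, `temperature_c`) are assumptions made for illustration; `factor()` and its `ordered` argument are standard base R.

```r
# Nominal attribute: categories without an order (illustrative values).
blood_group <- factor(c("A", "0", "B", "A", "AB"))

# Ordinal attribute: categories with a natural order, given via 'levels'.
font_size <- factor(c("small", "huge", "tiny", "medium"),
                    levels = c("tiny", "small", "medium", "large", "huge"),
                    ordered = TRUE)

# Interval-scaled attribute: differences between values are interpretable.
temperature_c <- c(18.5, 21.0, 19.5)   # degrees Celsius

is.ordered(blood_group)        # FALSE: nominal values cannot be ranked
font_size[1] < font_size[2]    # TRUE: "small" comes before "huge"
diff(temperature_c)            # differences between neighbouring values
```

Comparing two values of the nominal factor with `<` would only produce a warning and `NA`, which mirrors the statement that nominal values are not directly comparable.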
Descriptive statistics
Frequencies:
Absolute frequency n_i: number of observations with attribute value i (counts)
Relative frequency h_i: proportion of elements with attribute value i
To be computed as absolute frequency divided by total number of objects N: h_i = n_i / N
Relative frequencies lie between 0 and 1
Relative frequencies have to add up to 1 (<- can be used to check computation)
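The frequency definitions can be tried out in R as follows. This is a minimal sketch with made-up blood group observations; the variable names `n_i`, `h_i` and `N` simply follow the notation of the slide.

```r
# Illustrative (made-up) observations of a qualitative attribute.
blood_group <- c("A", "0", "A", "B", "0", "A", "AB", "0", "A", "0")

n_i <- table(blood_group)    # absolute frequencies (counts per value)
N   <- length(blood_group)   # total number of objects
h_i <- n_i / N               # relative frequencies h_i = n_i / N

n_i
h_i
sum(h_i)                     # check: relative frequencies add up to 1
```

The base R function `prop.table(n_i)` gives the same relative frequencies in one step.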
Descriptive statistics
AB0 blood group (tally sheet, to be filled in)
value    tally sheet    absolute frequency n_i    relative frequency h_i
                        Bonn      Köln            Bonn      Köln
0
A1
A2
B
A1B
A2B
other
total                   N =                       1.00
Descriptive statistics
height classification:
complete (every value belongs to some class)
disjoint (each value belongs to only one class)
class limits: (160; 170] contains all values that are > 160 but ≤ 170
[Figure: height [cm] axis divided into neighbouring half-open classes ( ]]
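A classification with half-open classes of the form (a; b] can be sketched in R with `cut()`. The heights below are made up, and the class limits from 150 to 200 cm in steps of 10 are an assumption for this example. With `right = TRUE` each class contains its upper limit but not its lower limit.

```r
# Illustrative (made-up) heights in cm, including values on class limits.
height <- c(160, 160.5, 170, 170.1, 185, 152)

# Class limits 150, 160, ..., 200; right = TRUE gives classes (a; b].
breaks <- seq(150, 200, by = 10)
height_class <- cut(height, breaks = breaks, right = TRUE)

# 160 falls into (150,160], 170 into (160,170], 170.1 into (170,180].
data.frame(height, height_class)
```

Values at or below the smallest limit or above the largest limit would become `NA` here; open-ended classes can be obtained by adding `-Inf` and `Inf` to `breaks`.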
Descriptive statistics
height [cm] (tally sheet, to be filled in)
Class       Class limits     Tally    Frequency                   Cumulative frequency
number i    (a_i-1; a_i]     sheet    absolute n_i  relative h_i  absolute N_i  relative H_i
            ≤ 150
            (150; 160]
            (160; 170]
            (170; 180]
            (180; 190]
            (190; 200]
            > 200
total                                 N =           1,00
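The columns of such a tally sheet could be computed in R roughly as follows. The heights are made up and only four classes are used to keep the sketch short; `cumsum()` turns frequencies into cumulative frequencies.

```r
# Illustrative (made-up) heights in cm.
height <- c(152, 158, 163, 167, 168, 171, 174, 176, 179, 183)

classes <- cut(height, breaks = seq(150, 190, by = 10))  # classes (a; b]

n_i <- table(classes)      # absolute frequencies
h_i <- n_i / sum(n_i)      # relative frequencies
N_i <- cumsum(n_i)         # cumulative absolute frequencies
H_i <- cumsum(h_i)         # cumulative relative frequencies

# Put all columns of the tally sheet side by side.
data.frame(class = names(n_i),
           n_i = as.vector(n_i), h_i = as.vector(h_i),
           N_i = as.vector(N_i), H_i = as.vector(H_i))
```

The last entry of `N_i` equals N and the last entry of `H_i` equals 1, which can again serve as a check.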
DESCRIPTIVE STATISTICS Graphical representation
Descriptive statistics Pie chart (R function: pie()) Shows absolute frequencies Example: blood groups
Descriptive statistics Bar chart (R function: barplot()) Shows relative frequencies Example: blood groups
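Both charts can be drawn from a frequency table. The following is a minimal sketch with made-up blood group data; the titles and axis labels are arbitrary choices for this example.

```r
# Illustrative (made-up) blood group observations.
blood_group <- c("A", "0", "A", "B", "0", "A", "AB", "0", "A", "0")

n_i <- table(blood_group)    # absolute frequencies
h_i <- prop.table(n_i)       # relative frequencies

# Pie chart of the absolute frequencies.
pie(n_i, main = "Blood groups")

# Bar chart of the relative frequencies.
barplot(h_i, ylab = "relative frequency h_i", main = "Blood groups")
```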
Descriptive statistics
Representation of cumulative frequencies with empirical distribution function F
Discrete trait: number of children
[Tally sheet with columns: number of children; tally; frequencies (absolute n_i, relative h_i); cumulative frequencies (absolute N_i, relative H_i); total N =]
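Descriptive statistics
[Figure, two panels over number of children (up to > 4): bar chart of the relative frequencies h_i; empirical distribution function F of the cumulative relative frequencies H_i]
F: empirical distribution function
Since the attribute is quantitative discrete, we obtain a step function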
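The step function can be reproduced with the base R function `ecdf()`. This is only a sketch: the numbers of children are made up and the name `F_emp` is chosen for this example.

```r
# Illustrative (made-up) numbers of children in ten families.
children <- c(0, 1, 1, 2, 2, 2, 3, 0, 1, 4)

F_emp <- ecdf(children)   # empirical distribution function

F_emp(2)      # proportion of families with at most 2 children
F_emp(1.5)    # same value as F_emp(1): F is constant between the steps

# Plot F: for a discrete attribute this is a step function.
plot(F_emp, xlab = "number of children", ylab = "F", main = "")
```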
Descriptive statistics Histograms (R function: hist()) Construction: Data is subdivided into classes Area of each column is proportional to the frequency of its class Columns are adjacent since the classes are adjacent
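A histogram along these lines might be produced as sketched below. The heights and class limits are made up. With `freq = FALSE` the column heights are densities, so that the column areas equal the relative frequencies.

```r
# Illustrative (made-up) heights in cm.
height <- c(152, 158, 163, 167, 168, 171, 174, 176, 179, 183)

# Classes (150;160], (160;170], ...; freq = FALSE plots densities.
h <- hist(height, breaks = seq(150, 190, by = 10), freq = FALSE,
          xlab = "height [cm]", main = "Histogram of height")

h$counts                           # absolute frequencies per class
h$density * diff(h$breaks)         # column areas = relative frequencies
sum(h$density * diff(h$breaks))    # the areas add up to 1
```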
Descriptive statistics Example: Height [cm] [Figure: histogram; x-axis height [cm], y-axis h_i]
Descriptive statistics [Figure, two panels over height [cm]: empirical density function f; empirical distribution function F (for continuous trait)]
Descriptive statistics [Figure, two panels over height [cm], with the relative frequencies h_i marked: empirical density function f; empirical distribution function F]
Descriptive Statistics Note: Slides 23 and 26 both show empirical distribution functions. In the first case, we obtain a step function since the trait under investigation is discrete.
DESCRIPTIVE STATISTICS Measures of central tendency, dispersion and spread
Descriptive statistics Measures of central tendency: A number to characterize the „center“ of the data Most important: Mean Median
Descriptive statistics
Sample (n = 8): x_1 = 5, x_2 = 9, x_3 = 3, x_4 = 8, x_5 = 19, x_6 = 4, x_7 = 6, x_8 = 7
Ranks: x_(1) = 3, x_(2) = 4, x_(3) = 5, x_(4) = 6, x_(5) = 7, x_(6) = 8, x_(7) = 9, x_(8) = 19
Sample (n = 7): x_1 = 5, x_2 = 9, x_3 = 3, x_4 = 8, x_5 = 19, x_6 = 4, x_7 = 6
Ranks: x_(1) = 3, x_(2) = 4, x_(3) = 5, x_(4) = 6, x_(5) = 8, x_(6) = 9, x_(7) = 19
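For the two samples above, mean and median can be computed in R as follows. The variable names `x8` and `x7` are chosen for this sketch. For an even sample size, R's `median()` averages the two middle values of the ordered sample.

```r
x8 <- c(5, 9, 3, 8, 19, 4, 6, 7)   # sample with n = 8
x7 <- c(5, 9, 3, 8, 19, 4, 6)      # sample with n = 7

sort(x8)       # ordered sample: 3 4 5 6 7 8 9 19
median(x8)     # 6.5, the average of x_(4) = 6 and x_(5) = 7
mean(x8)       # 7.625

sort(x7)       # ordered sample: 3 4 5 6 8 9 19
median(x7)     # 6, the middle value x_(4)
mean(x7)       # 54 / 7
```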
Descriptive statistics
Descriptive statistics
[Figure: sample with an outlier]
How to treat outliers?
1) Discard: No!
2) Check value and correct: Yes!
Descriptive statistics
Measure the amount of variation of the data!
[Figure: sample A and sample B, each shown with its mean]
The mean (or median) is not sufficient to describe a sample
Descriptive statistics Measures of dispersion and spread: Numbers to characterize the amount of variation around the center (= mean) Most important: Minimum, maximum, range (dispersion) Empirical variance (spread) Empirical standard deviation (spread)
Descriptive statistics
Sample (n = 7): x_1 = 5, x_2 = 9, x_3 = 3, x_4 = 8, x_5 = 19, x_6 = 4, x_7 = 6
Ranks: x_(1) = 3, x_(2) = 4, x_(3) = 5, x_(4) = 6, x_(5) = 8, x_(6) = 9, x_(7) = 19
minimum: min = x_(1)
maximum: max = x_(n)
range: R = x_(n) - x_(1)
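A short sketch of these measures in R, using the sample from the slide. Note that R's `range()` returns the pair (minimum, maximum); the range R in the sense used here is their difference.

```r
x <- c(5, 9, 3, 8, 19, 4, 6)   # sample with n = 7

sort(x)          # ordered sample: 3 4 5 6 8 9 19
min(x)           # minimum x_(1) = 3
max(x)           # maximum x_(n) = 19
range(x)         # returns both values: 3 19
diff(range(x))   # range R = x_(n) - x_(1) = 16
```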
Descriptive statistics
[Table: n, mean, x_1, x_2, x_3, x_4; values not preserved]
Example: x_4 = 53 is not free, but is determined by the other values when the mean is known.
s^2 has (n - 1) degrees of freedom (f)
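The idea of (n - 1) degrees of freedom can be illustrated in R. The sketch below assumes the usual definition of the empirical variance, s^2 = sum((x_i - mean)^2) / (n - 1), which is also what R's `var()` computes. It uses the sample with n = 7 from the earlier slides, since the values of this example are not preserved.

```r
x <- c(5, 9, 3, 8, 19, 4, 6)   # sample with n = 7
n <- length(x)
x_bar <- mean(x)

# Empirical variance by hand (assumed definition, divisor n - 1).
s2_manual <- sum((x - x_bar)^2) / (n - 1)
s2 <- var(x)     # built-in function, also uses the divisor n - 1
s  <- sd(x)      # empirical standard deviation = sqrt(variance)

# Degrees of freedom: once the mean and n - 1 values are known,
# the remaining value is no longer free.
last_value <- n * x_bar - sum(x[-n])
last_value       # reproduces x[n]
```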
Data If you have data you want to analyse, please bring it along!