R INSTALLATION R is an open source software package for statistical data analysis.

R Installation
R download: Germany, Stefan Drees Bonn: This is the main program!
RStudio (auxiliary program for editing R files):
Install the programs (R should be there already).

Working with the R command prompt (without RStudio) Practical session
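A first session at the R prompt might look like the following minimal sketch. The variable names and values are made up for illustration and are not taken from the practical session.

```r
# Use R as a calculator: type an expression and press Enter
1 + 2 * 3

# Store a value under a name with the assignment operator <-
x <- 5
y <- x * 2
print(y)

# Combine several values into a vector with c()
ages <- c(23, 31, 27, 45)   # hypothetical values
length(ages)                # number of elements
mean(ages)                  # arithmetic mean

# Open the help page of a function
# ?mean
```

Each line can be typed directly at the `>` prompt; text after `#` is a comment and is ignored by R.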

Working with RStudio
Start RStudio
"File -> New -> R Script" lets you edit an R command or R script (= small programme = several consecutive commands)

Working with RStudio
Areas of the RStudio window: new R files, the command prompt, select files, plots, packages (for advanced analyses), help

Working with RStudio
Commands and programs can be stored in R files
Execute one command line: Ctrl+Enter or the "Run" button
Execute several lines: mark the lines and use Ctrl+Enter or the "Run" button

WHAT IS STATISTICS?

What is statistics?
Statistics is a means to connect empirical knowledge and theory. It consists of:
Data representation (empirics)
Methods for the description, analysis, and interpretation of data, in order to allow predictions, conclusions and decisions (statistical theory)

What is statistics?
I. Descriptive statistics
II. Probability theory
III. Test theory

DESCRIPTIVE STATISTICS

Descriptive statistics
Observational objects: patients, blood samples, DNA samples, houses, atoms
Attributes / traits: blood pressure, weight, age, blood group, number of siblings, marital status, rent
Values:
  qualitative: blood group, marital status
  quantitative, discrete: number of siblings
  quantitative, continuous: blood pressure, weight, age, rent

Descriptive statistics
Scaling:
Nominal scale: attribute values that are not directly comparable (sex, subject of studies, country of origin) (qualitative)
Ordinal scale: attribute values that have a "natural" order (grades, font sizes: tiny-small-medium-large-huge)
Interval scale: the difference between attribute values is interpretable (temperature in °C) (quantitative)
To be distinguished:
Discrete attributes: attribute values can be counted
Continuous attributes: all real numbers, or at least all numbers from an interval, are possible
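One possible way to represent these scales in R is sketched below. The mapping of scales to R data types is an assumption made for illustration (the slides do not prescribe it), and all values are hypothetical.

```r
# Nominal scale: categories without an order -> unordered factor
blood_group <- factor(c("A", "0", "B", "A", "AB"))
levels(blood_group)

# Ordinal scale: categories with a natural order -> ordered factor
font_size <- factor(
  c("small", "huge", "tiny", "medium"),
  levels  = c("tiny", "small", "medium", "large", "huge"),
  ordered = TRUE
)
font_size[1] < font_size[2]   # comparisons are defined for ordered factors

# Discrete quantitative attribute: counts stored as integers
siblings <- c(0L, 2L, 1L, 3L)

# Continuous quantitative attribute on an interval scale: real numbers
temperature_c <- c(36.6, 37.2, 38.1)
diff(temperature_c)           # differences between values are interpretable
```

Using `ordered = TRUE` only makes sense when the order of the levels is meaningful; for nominal attributes a plain `factor()` avoids suggesting an order that does not exist.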

Descriptive statistics
Frequencies:
Absolute frequency n_i: number of observations with attribute value i (counts)
Relative frequency h_i: proportion of elements with attribute value i
To be computed as the absolute frequency divided by the total number of objects N: h_i = n_i / N
Relative frequencies lie between 0 and 1
Relative frequencies have to add up to 1 (this can be used to check the computation)
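The frequency definitions can be sketched in R with `table()`. The observations below are hypothetical and are not the blood group data from the slides.

```r
# Hypothetical blood group observations
blood_group <- c("A", "0", "A", "B", "0", "A", "AB", "0", "A", "B")

n_i <- table(blood_group)   # absolute frequencies n_i (counts)
N   <- sum(n_i)             # total number of objects N
h_i <- n_i / N              # relative frequencies h_i = n_i / N

print(n_i)
print(h_i)

# Checks from the slide: 0 <= h_i <= 1, and the h_i add up to 1
all(h_i >= 0 & h_i <= 1)
isTRUE(all.equal(sum(h_i), 1))
```

`prop.table(n_i)` would give the same relative frequencies in one step; the explicit division is used here to mirror the formula.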

Descriptive statistics
[Table: AB0 blood group in Köln and Bonn. Rows: 0, A1, A2, B, A1B, A2B, other. Columns: value, tally sheet, absolute frequency n_i, relative frequency h_i. The relative frequencies add up to 1.00.]

Descriptive statistics
Height classification:
complete
disjoint (each value belongs to only one class)
Class limits: (160; 170] contains all values that are > 160 but ≤ 170
[Figure: height [cm] axis divided into adjacent classes of the form ( ]]
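A classification into half-open classes of this kind can be sketched with `cut()`. The heights and the class limits below are hypothetical; `-Inf` and `Inf` are added as outer limits so that the classification is complete.

```r
# Hypothetical heights in cm
height <- c(148, 155, 160, 160.5, 170, 171, 185, 199, 203)

# Class limits; -Inf and Inf make sure every value falls into a class
limits <- c(-Inf, 150, 160, 170, 180, 190, 200, Inf)

# right = TRUE gives classes that are open on the left and closed on the right
height_class <- cut(height, breaks = limits, right = TRUE)

data.frame(height, height_class)
```

With `right = TRUE` the value 170 lands in the class printed as `(160,170]`, and 160 lands in `(150,160]`, so each value belongs to exactly one class.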

Descriptive statistics
[Table: height [cm], N = total number of objects. Columns: class number i, class limits (a_{i-1}; a_i], tally sheet, absolute frequency n_i, relative frequency h_i, cumulative absolute frequency N_i, cumulative relative frequency H_i. Classes: ≤ 150, (150; 160], (160; 170], (170; 180], (180; 190], (190; 200], > 200.]
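A frequency table with the same columns can be sketched in R as follows. The heights are hypothetical, and the column names simply follow the symbols used on the slide.

```r
# Hypothetical heights in cm
height <- c(148, 155, 162, 165, 168, 171, 174, 176, 179, 183, 188, 194, 203)
limits <- c(-Inf, 150, 160, 170, 180, 190, 200, Inf)

n_i <- table(cut(height, breaks = limits, right = TRUE))  # absolute frequencies
h_i <- n_i / sum(n_i)                                      # relative frequencies

freq_table <- data.frame(
  class = names(n_i),
  n_i   = as.vector(n_i),
  h_i   = round(as.vector(h_i), 3),
  N_i   = cumsum(as.vector(n_i)),          # cumulative absolute frequencies
  H_i   = round(cumsum(as.vector(h_i)), 3) # cumulative relative frequencies
)
print(freq_table)
```

The last row of `N_i` equals the total number of objects and the last row of `H_i` equals 1, which gives a quick check of the table.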

DESCRIPTIVE STATISTICS Graphical representation

Descriptive statistics
Pie chart (R function: pie())
Shows absolute frequencies
Example: blood groups

Descriptive statistics
Bar chart (R function: barplot())
Shows relative frequencies
Example: blood groups
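Both chart types can be tried with a few lines of R. The counts below are hypothetical and only stand in for the blood group example; the titles and axis labels are illustrative choices.

```r
# Hypothetical counts per blood group
n_i <- c("0" = 38, "A" = 43, "B" = 13, "AB" = 6)
h_i <- n_i / sum(n_i)   # relative frequencies

# Pie chart of the absolute frequencies
pie(n_i, main = "Blood groups")

# Bar chart of the relative frequencies
barplot(h_i,
        main = "Blood groups",
        ylab = "relative frequency h_i",
        ylim = c(0, 1))
```

In RStudio the charts appear in the plots area; when the script is run non-interactively, R writes them to a file instead.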

Descriptive statistics
Representation of cumulative frequencies with the empirical distribution function F
Discrete trait: number of children
[Table: number of children, N = total number of objects. Columns: number of children, tally sheet, frequencies (absolute n_i, relative h_i), cumulative frequencies (absolute N_i, relative H_i).]

Descriptive statistics
[Figure: bar chart of the relative frequencies h_i and empirical distribution function F, each plotted against the number of children (up to > 4)]
Since the attribute is quantitative and discrete, we obtain a step function
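The step function can be reproduced with R's `ecdf()`. This is a minimal sketch with hypothetical data; the name `F_emp` is chosen here for illustration (plain `F` is avoided because R also uses it as shorthand for `FALSE`).

```r
# Hypothetical number of children per family
children <- c(0, 1, 1, 2, 2, 2, 3, 0, 1, 4, 2, 5)

h_i <- prop.table(table(children))   # relative frequencies h_i
H_i <- cumsum(h_i)                   # cumulative relative frequencies H_i

F_emp <- ecdf(children)              # empirical distribution function
F_emp(2)                             # share of families with at most 2 children

barplot(h_i, xlab = "number of children", ylab = "h_i")
plot(F_emp, main = "Empirical distribution function",
     xlab = "number of children", ylab = "F")
```

At each observed value the function jumps by the corresponding h_i, so its value there equals the cumulative relative frequency H_i.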

Descriptive statistics
Histograms (R function: hist())
Construction:
Data is subdivided into classes
The area of each column is proportional to the respective frequency
Columns are adjacent since the classes are adjacent
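A histogram along these lines can be sketched with `hist()`. The heights and class limits are hypothetical; `freq = FALSE` is used so that column areas, not heights, represent the relative frequencies.

```r
# Hypothetical heights in cm
height <- c(148, 155, 162, 165, 168, 171, 174, 176, 179, 183, 188, 194, 203)

# Class limits: adjacent classes of width 10 cm
limits <- seq(140, 210, by = 10)

h <- hist(height,
          breaks = limits,
          right  = TRUE,       # classes are closed on the right
          freq   = FALSE,      # plot densities so that area = relative frequency
          xlab   = "height [cm]",
          main   = "Histogram")

h$counts                          # absolute frequency per class
sum(h$density * diff(h$breaks))   # total area of all columns
```

The object returned by `hist()` holds the class limits, counts and densities, so the table behind the plot can be inspected without redrawing it.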

Descriptive statistics
Example: height [cm]
[Figure: histogram of height [cm]; vertical axis: h_i]

Descriptive statistics
[Figure: empirical density function f and empirical distribution function F (for a continuous trait), each plotted against height [cm]]

Descriptive statistics
[Figure: empirical density function f and empirical distribution function F plotted against height [cm], with the relative frequencies h_i marked]

Descriptive Statistics Note: Slides 23 and 26 both show empirical distribution functions. In the first case, we obtain a step function since the trait under investigation is discrete.

DESCRIPTIVE STATISTICS Measures of central tendency, dispersion and spread

Descriptive statistics
Measures of central tendency: a number to characterize the "center" of the data
Most important:
Mean
Median

Descriptive statistics
Sample (n = 8): x_1 = 5, x_2 = 9, x_3 = 3, x_4 = 8, x_5 = 19, x_6 = 4, x_7 = 6, x_8 = 7
Ranks: x_(1) = 3, x_(2) = 4, x_(3) = 5, x_(4) = 6, x_(5) = 7, x_(6) = 8, x_(7) = 9, x_(8) = 19
Sample (n = 7): x_1 = 5, x_2 = 9, x_3 = 3, x_4 = 8, x_5 = 19, x_6 = 4, x_7 = 6
Ranks: x_(1) = 3, x_(2) = 4, x_(3) = 5, x_(4) = 6, x_(5) = 8, x_(6) = 9, x_(7) = 19
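Assuming the two samples are meant to illustrate the median for odd and even n, the computation can be sketched in R with the sample values from the slide. The variable names are chosen here for illustration.

```r
x7 <- c(5, 9, 3, 8, 19, 4, 6)      # sample with n = 7
x8 <- c(5, 9, 3, 8, 19, 4, 6, 7)   # sample with n = 8

# Sorting the sample gives x_(1), ..., x_(n)
sort(x7)
sort(x8)

# Odd n: the median is the middle value of the sorted sample
median(x7)
sort(x7)[(length(x7) + 1) / 2]

# Even n: the median is the mean of the two middle values
median(x8)
mean(sort(x8)[c(length(x8) / 2, length(x8) / 2 + 1)])

# Arithmetic mean for comparison
mean(x7)
mean(x8)
```

In both samples the mean is larger than the median because the single large value 19 pulls the mean upwards.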

Descriptive statistics
[Figure: sample on the x axis with one outlier]
How to treat outliers?
1) Discard: No!
2) Check value and correct: Yes!
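To see why a suspicious value deserves a check, one can compare how strongly the mean and the median react to it. The sketch below reuses the sample from the earlier slide and treats the value 19 as the suspicious one; that choice is an assumption made for illustration, and dropping the value here serves only the comparison, not as a recommended treatment.

```r
x <- c(5, 9, 3, 8, 19, 4, 6)
x_without <- x[x != 19]   # the same sample without the value 19

with_value    <- c(mean = mean(x),         median = median(x))
without_value <- c(mean = mean(x_without), median = median(x_without))

print(with_value)
print(without_value)

# Shift caused by the single value
with_value - without_value
```

The mean shifts much more than the median, which is why an erroneous extreme value should be checked and corrected rather than silently kept or discarded.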

Descriptive statistics
[Figure: sample A and sample B on the x axis]
The mean (or median) is not sufficient to describe a sample
Measure the amount of variation of the data!

Descriptive statistics
Measures of dispersion and spread: numbers to characterize the amount of variation around the center (= mean)
Most important:
Minimum, maximum, range (dispersion)
Empirical variance (spread)
Empirical standard deviation (spread)

Descriptive statistics
Sample (n = 7): x_1 = 5, x_2 = 9, x_3 = 3, x_4 = 8, x_5 = 19, x_6 = 4, x_7 = 6
Ranks: x_(1) = 3, x_(2) = 4, x_(3) = 5, x_(4) = 6, x_(5) = 8, x_(6) = 9, x_(7) = 19
Minimum: min = x_(1)
Maximum: max = x_(n)
Range: R = x_(n) - x_(1)
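For the sample on this slide, minimum, maximum and range can be computed as in the following sketch. The variable names are chosen here for illustration.

```r
x <- c(5, 9, 3, 8, 19, 4, 6)
n <- length(x)

x_sorted <- sort(x)        # x_(1), ..., x_(n)
x_min <- x_sorted[1]       # minimum: x_(1)
x_max <- x_sorted[n]       # maximum: x_(n)
R     <- x_max - x_min     # range: x_(n) - x_(1)

c(min = x_min, max = x_max, range = R)

# The same values with built-in functions
min(x)
max(x)
diff(range(x))   # range() returns c(min, max); diff() gives their difference
```

Note that R's `range()` returns the pair (minimum, maximum) rather than the single number R, hence the extra `diff()`.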

Descriptive statistics
Example: x_4 = 53 is not free, but is determined by the other values when the mean is known.
s^2 has (n - 1) degrees of freedom (f)
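The degrees-of-freedom argument can be checked numerically. The sketch below assumes the usual definition of the empirical variance with denominator (n - 1), which is also what R's `var()` computes. Because the values of the slide's own example are not preserved, it uses the seven-value sample from the earlier slides instead.

```r
x <- c(5, 9, 3, 8, 19, 4, 6)
n <- length(x)
x_bar <- mean(x)

# Empirical variance with (n - 1) in the denominator, and standard deviation
s2 <- sum((x - x_bar)^2) / (n - 1)
s  <- sqrt(s2)

all.equal(s2, var(x))   # var() uses the same denominator
all.equal(s,  sd(x))

# Degrees of freedom: once the mean is known, the last value is not free,
# because the n values must add up to n * mean
x_last <- n * x_bar - sum(x[-n])
all.equal(x_last, x[n])

# Equivalent statement: the deviations from the mean sum to zero
sum(x - x_bar)
```

Only (n - 1) of the deviations can vary freely, which is the reason for dividing by (n - 1) rather than n.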

Data If you have data you want to analyse, please bring it along!