Sihua Peng, PhD Shanghai Ocean University




Presentation transcript:

Modern Biostatistics
2. Data sets
Sihua Peng, PhD
Shanghai Ocean University
2018.10

Contents
Introduction to R
Data sets
Introductory Statistical Principles
Sampling and experimental design with R
Graphical data presentation
Simple hypothesis testing
Introduction to Linear models
Correlation and simple linear regression
Single factor classification (ANOVA)
Nested ANOVA
Factorial ANOVA
Simple Frequency Analysis

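R functions
Each function performs a specific task and is called by its name followed by parentheses, for example:
mean(): average value
sum(): summation
plot(): plotting
sort(): sorting
log(), log2(), log10(): natural, base-2 and base-10 logarithms
exp(), sin(), cos(), sd(): exponential, sine, cosine and standard deviation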
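As a small illustration of calling functions by name with their arguments in parentheses, the sketch below applies a few of the functions listed above. The vector x is hypothetical and was chosen only so the results are easy to check by hand.

```r
# x is a made-up vector used only for illustration
x <- c(4, 1, 16, 8, 2)

avg    <- mean(x)   # arithmetic mean: 31 / 5 = 6.2
total  <- sum(x)    # 31
sorted <- sort(x)   # ascending order: 1 2 4 8 16
lg2    <- log2(x)   # base-2 logarithm: 2 0 4 3 1
spread <- sd(x)     # sample standard deviation (divides by n - 1)

print(avg)
print(total)
print(sorted)
print(lg2)
```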

Data frames: An example

First, generate the three variables (excluding the site labels, as they are not variables) separately:
> HABITAT <- factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna", "Mixed", "Mixed", "Mixed", "Mixed"))
> GST <- c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6)
> EYR <- c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3)

Next, use the names of the vectors as arguments in the data.frame() function to amalgamate the three separate variables into a single data frame (data set), which we will call MACNALLY.
> MACNALLY <- data.frame(HABITAT, GST, EYR)

Notice that each vector (variable) becomes a column in the data frame and that each row represents a single sampling unit. By default, the rows are named with the numbers 1 to n, where n is the number of rows in the data frame. These can be changed to the names of the sampling units by assigning a character vector of alternative names with the row.names() function.

> row.names(MACNALLY) <- c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne", "Lysterfield", "Red Hill", "Devilbend", "Olinda")
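The steps of this example can be collected into one script. This is a minimal sketch that reuses the values from the slides; the calls to str() and dim() are additions for inspecting the result and are not part of the original example.

```r
# Build the three variables, then combine them into one data frame
HABITAT <- factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                    "Mixed", "Mixed", "Mixed", "Mixed"))
GST <- c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6)
EYR <- c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3)
MACNALLY <- data.frame(HABITAT, GST, EYR)

# Replace the default row names (1 to 8) with the site names
row.names(MACNALLY) <- c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                         "Lysterfield", "Red Hill", "Devilbend", "Olinda")

str(MACNALLY)         # one factor column and two numeric columns
print(dim(MACNALLY))  # 8 rows (sampling units), 3 columns (variables)
print(MACNALLY)
```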

Access the data in a data frame
MACNALLY$HABITAT       access column 1
MACNALLY$GST           access column 2
MACNALLY$EYR           access column 3
MACNALLY[1,]           first row
MACNALLY[,3]           third column
MACNALLY[3,2]          element in the third row and second column
i=1:4; MACNALLY[i,]    rows 1 to 4
MACNALLY[,2:3]         columns 2 to 3
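The access forms above can be tried on the example data. The sketch below rebuilds MACNALLY so that it runs on its own. Indexing by row and column name and the drop = FALSE argument are additions not shown on the slide; they are standard data frame indexing in R.

```r
# Rebuild the example data frame so the snippet is self-contained
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3),
  row.names = c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                "Lysterfield", "Red Hill", "Devilbend", "Olinda")
)

gst_col   <- MACNALLY$GST                 # column as a numeric vector
one_cell  <- MACNALLY[3, 2]               # third row, second column: 8.4
by_name   <- MACNALLY["Warneet", "GST"]   # same cell, addressed by name
first_row <- MACNALLY[1, ]                # one-row data frame
eyr_vec   <- MACNALLY[, 3]                # single column comes back as a vector
eyr_df    <- MACNALLY[, 3, drop = FALSE]  # keep it as a one-column data frame
i <- 1:4
first_four <- MACNALLY[i, ]               # rows 1 to 4
two_cols   <- MACNALLY[, 2:3]             # columns 2 to 3

print(one_cell)
print(first_four)
```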

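Importing (reading) data
> # comma-separated file, with the row names in the first column
> MACNALLY <- read.table('macnally.csv', header=TRUE, row.names=1, sep=',')
> # tab-separated file, with the row names in the first column
> MACNALLY <- read.table('macnally.txt', header=TRUE, row.names=1, sep='\t')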

Reviewing a data frame - fix()
A data frame can also be viewed as a simple spreadsheet in a separate window by passing the name of the data frame to the fix() function. The fix() function also enables simple editing of the data frame.
> fix(MACNALLY)

Saving and loading of R objects
Any object in R (including data frames) can also be saved into a native R workspace image file (*.RData), either individually or as a collection of objects, using the save() function. For example:
> save(MACNALLY, file='macnally.RData')
The saved object(s) can be loaded during subsequent sessions by providing the name of the saved workspace image file as an argument to the load() function. For example:
> load("macnally.RData")
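A save and load round trip can be sketched as follows. The object name birds and the use of a temporary file are assumptions made so that the example leaves no files behind; in practice a path such as 'macnally.RData' would be used as on the slide.

```r
# birds is a hypothetical small data frame used only for this sketch
birds <- data.frame(GST = c(3.4, 3.4, 8.4), EYR = c(0, 9.2, 3.8))

path <- tempfile(fileext = ".RData")  # temporary file instead of a fixed name
save(birds, file = path)              # write the object to disk

rm(birds)                             # remove it from the current session
restored <- load(path)                # load() restores objects under their saved names

print(restored)                       # names of the restored objects
print(birds)                          # the data frame is available again
```

Note that load() restores objects under the names they were saved with, so it can overwrite an existing object of the same name.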

Exporting (writing) data
The write.table() function is used to write data frames to text files.
> write.table(MACNALLY, "macnally.csv", quote = FALSE, row.names = TRUE, sep = ",")
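Writing and reading are complementary, so a quick check is to export a data frame and import it again. The sketch below does this with the example data; the temporary file path is an assumption made to keep the example self-contained.

```r
# Rebuild the example data frame so the snippet is self-contained
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3),
  row.names = c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                "Lysterfield", "Red Hill", "Devilbend", "Olinda")
)

path <- tempfile(fileext = ".csv")  # stand-in for "macnally.csv"

# Export: comma-separated, unquoted, with the row names as the first field
write.table(MACNALLY, path, quote = FALSE, row.names = TRUE, sep = ",")

# Import: the first column holds the row names
back <- read.table(path, header = TRUE, row.names = 1, sep = ",")

print(back)
print(all.equal(back$GST, MACNALLY$GST))
```

With quote = FALSE, fields that themselves contain the separator would break the file, so this setting is only safe when no value contains a comma.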

Dummy data sets - generating random data
Normal
> # generate 5 random numbers from a normal
> # distribution with a mean of 10 and a standard
> # deviation of 1
> rnorm(5,mean=10,sd=1)
[1] 11.564555 9.732885 8.357070 8.690451 12.272846
Log-Normal
> # generate 5 random numbers from a log-normal
> # distribution whose logarithm has a mean of 2 and a
> # standard deviation of 1
> rlnorm(5,meanlog=2,sdlog=1)
[1] 8.157636 30.914781 20.175299 5.071559 16.364014

Poisson
> # generate 5 random numbers from a Poisson
> # distribution with a lambda parameter of 4
> rpois(5,lambda=4)
[1] 4 4 2 6 1
Binomial
> # generate 5 random numbers from a binomial
> # distribution based on 10 Bernoulli trials and
> # a prob. of 0.5
> rbinom(5,size=10,prob=.5)
[1] 4 4 1 4 6
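Because these functions draw random numbers, each call gives different values. A common way to make dummy data reproducible is to fix the seed with set.seed() first. The sketch below assumes an arbitrary seed of 42 and a sample size of 1000, neither of which comes from the slides.

```r
# Fixing the seed makes the random draws repeatable
set.seed(42)
a <- rnorm(5, mean = 10, sd = 1)
set.seed(42)
b <- rnorm(5, mean = 10, sd = 1)
print(identical(a, b))   # the same seed gives the same draws

# Larger samples should roughly reflect the chosen parameters
set.seed(42)
counts    <- rpois(1000, lambda = 4)             # non-negative integer counts
successes <- rbinom(1000, size = 10, prob = 0.5) # values between 0 and 10
positive  <- rlnorm(1000, meanlog = 2, sdlog = 1) # strictly positive values

print(mean(counts))      # should be near lambda = 4
print(mean(successes))   # should be near size * prob = 5
print(mean(log(positive)))  # should be near meanlog = 2
```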

Manipulating data sets
Subsets of data frames - data frame indexing
> # extract all the bird densities from sites that have GST values greater than 3
> subset(MACNALLY, GST>3)
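The subset() call can be compared with the equivalent logical row index. The sketch below rebuilds the example data so that it runs on its own. The second call, which combines two conditions and uses the select argument, is an addition for illustration.

```r
# Rebuild the example data frame so the snippet is self-contained
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3),
  row.names = c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                "Lysterfield", "Red Hill", "Devilbend", "Olinda")
)

# Two ways to keep the sites with GST greater than 3
by_subset <- subset(MACNALLY, GST > 3)
by_index  <- MACNALLY[MACNALLY$GST > 3, ]

# Combine conditions and keep only some columns
mixed_only <- subset(MACNALLY, GST > 3 & HABITAT == "Mixed",
                     select = c(GST, EYR))

print(by_subset)    # Cranbourne (GST = 3) is excluded
print(mixed_only)
```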

The %in% matching operator
Subset the MACNALLY dataset to those rows whose HABITAT is 'Montane Forest' or 'Foothills Woodland':
> MACNALLY[MACNALLY$HABITAT %in% c("Montane Forest", "Foothills Woodland"),]
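The same pattern can be tried on the eight-site example built earlier. That small data frame contains only the habitats "Mixed" and "Gipps.Manna", so the sketch below matches against "Gipps.Manna" instead; this substitution, and the droplevels() step, are assumptions added for illustration.

```r
# Rebuild the example data frame so the snippet is self-contained
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3),
  row.names = c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                "Lysterfield", "Red Hill", "Devilbend", "Olinda")
)

wanted <- c("Gipps.Manna", "Montane Forest")  # the second value is absent here

# %in% returns one TRUE/FALSE per row; values with no match are simply FALSE
keep <- MACNALLY$HABITAT %in% wanted
selected <- MACNALLY[keep, ]

# The factor still remembers the unused level until it is dropped
levels_before <- levels(selected$HABITAT)
selected$HABITAT <- droplevels(selected$HABITAT)
levels_after <- levels(selected$HABITAT)

print(keep)
print(selected)
print(levels_before)
print(levels_after)
```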

Sorting datasets
> MACNALLY[order(MACNALLY$HABITAT, MACNALLY$GST), ]
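order() returns the row positions that would sort the data, and those positions are then used as a row index. The sketch below applies the call above to the example data and adds a descending sort, which is not on the slide.

```r
# Rebuild the example data frame so the snippet is self-contained
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3),
  row.names = c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                "Lysterfield", "Red Hill", "Devilbend", "Olinda")
)

# Sort by HABITAT first, then by GST within each habitat
idx <- order(MACNALLY$HABITAT, MACNALLY$GST)
by_habitat <- MACNALLY[idx, ]

# Sort by GST from largest to smallest
by_gst_desc <- MACNALLY[order(MACNALLY$GST, decreasing = TRUE), ]

print(idx)          # row positions in sorted order
print(by_habitat)
print(by_gst_desc)
```

The factor HABITAT is ordered by its levels, which are alphabetical by default, so the "Gipps.Manna" sites come before the "Mixed" sites.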