Sihua Peng, PhD Shanghai Ocean University

Presentation transcript:

Modern Biostatistics
Sihua Peng, PhD
Shanghai Ocean University
2016.10

Four VIPs in statistics
Gosset, Pearson, Fisher, Neyman

William Sealy Gosset
William Sealy Gosset (1876–1937) was an English statistician. He published under the pen name Student and developed Student's t-distribution.

Karl Pearson
Karl Pearson (1857–1936) was an English mathematician and biostatistician. He has been credited with establishing the discipline of mathematical statistics. In 1911 he founded the world's first university statistics department at University College London. Many familiar statistical concepts, such as the standard deviation, principal component analysis, and the chi-square test, were proposed by him.

Ronald Fisher
Sir Ronald Aylmer Fisher (1890–1962) was an English statistician and biologist. Many familiar statistical terms bear his name, such as the F-distribution, Fisher's linear discriminant, Fisher's exact test, Fisher's permutation test, and the von Mises–Fisher distribution. The F-distribution arises frequently as the null distribution of a test statistic, most notably in the analysis of variance.

Jerzy Neyman
Jerzy Neyman (1894–1981) was a Polish mathematician and statistician who spent most of his professional career at the University of California, Berkeley. Neyman was the first to introduce the modern concept of a confidence interval into statistical hypothesis testing.

References

Dr. Murray Logan
He is the author of our textbook and an associate lecturer within the School of Biological Sciences, Monash University, Australia.
http://users.monash.edu.au/~murray/index.html
The data sets in this book: http://users.monash.edu.au/~murray/BDAR/index.html

Contents
- Introduction to R
- Data sets
- Introductory Statistical Principles
- Sampling and experimental design with R
- Graphical data presentation
- Simple hypothesis testing
- Introduction to Linear models
- Correlation and simple linear regression
- Single factor classification (ANOVA)
- Nested ANOVA
- Factorial ANOVA
- Simple Frequency Analysis

1. Introduction to R
R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland, New Zealand, during the 1990s.

VIPs of R
Ross Ihaka: https://www.stat.auckland.ac.nz/~ihaka/
Robert Gentleman: https://en.wikipedia.org/wiki/Robert_Gentleman_(statistician)

What R does and does not
R does:
- data handling and storage: numeric, textual
- matrix algebra
- hash tables and regular expressions
- high-level data analytic and statistical functions
- classes ("OO")
- graphics
- programming language: loops, branching, subroutines
R does not:
- is not a database, but connects to DBMSs
- language interpreter can be very slow, but allows calling your own C/C++ code
- no spreadsheet view of data, but connects to Excel/MS Office
- no professional / commercial support

Download R https://www.r-project.org/

Download R

Install R

The R environment
After installation, you can run R.

The R environment
Object: R is an object-oriented language and everything in R is an object. For example, a single number is an object, a variable is an object, output is an object, a data set is an object that is itself a collection of objects, etc.
Vector: A collection of one or more objects of the same type (e.g. all numbers or all characters).
Function: A set of instructions carried out on one or more objects. Functions are typically used to perform specific and common tasks that would otherwise require many instructions.

The R environment
Parameter: The kind of information that can be passed to a function.
Argument: The specific information passed to a function to determine how the function should perform its task.
Operator: A symbol that has a pre-defined meaning. Familiar operators include + - * and /, which respectively perform addition, subtraction, multiplication and division.
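The terms above can be seen together in a short sketch. The function name `standard_error`, its parameter names and the data values are made up for illustration; only base R functions are used.

```r
# A hypothetical function: 'x' and 'na.rm' are its parameters.
standard_error <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]   # drop missing values if requested
  sd(x) / sqrt(length(x))        # '/' is an operator; sd(), sqrt(), length() are functions
}

weights <- c(4.1, 5.3, 3.8, NA, 4.6)   # a numeric vector (made-up values)

# The values supplied in the call are the arguments.
se <- standard_error(weights, na.rm = TRUE)
se

class(weights)          # a vector is an object with a class
class(standard_error)   # a function is an object too
```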

Expressions, Assignment and Arithmetic
> 2 + 3                  # an expression
[1] 5                    # the evaluated output
> VAR1 <- 2 + 3          # assign the expression to the object VAR1
> VAR2 <- 9              # assign the value 9 to the object VAR2
> VAR2 - 1               # print the contents of VAR2 minus 1
[1] 8
> ANS1 <- VAR1 * VAR2    # evaluated expression assigned to ANS1
> ANS1                   # print the contents of ANS1
[1] 45                   # the evaluated output

Expressions, Assignment and Arithmetic
Objects can be concatenated (joined together) to create objects with multiple entries using the c() (concatenation) function.
> c(1, 2, 6)             # concatenate 1, 2 and 6
[1] 1 2 6                # printed output
> c(VAR1, ANS1)          # concatenate the contents of VAR1 and ANS1
[1] 5 45                 # printed output
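A brief sketch of what can be done with a concatenated object, assuming the same values as on the slide (the name `BOTH` is introduced here only for illustration): arithmetic operators act on every entry, and functions take the whole vector as one argument.

```r
VAR1 <- 2 + 3
VAR2 <- 9
ANS1 <- VAR1 * VAR2

BOTH <- c(VAR1, ANS1)   # a vector with two entries
BOTH * 2                # the operator is applied to every entry
length(BOTH)            # number of entries in the vector
sum(BOTH)               # a function applied to the whole vector
```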

R workspaces
Workspaces: Throughout an R session, all objects that have been added are stored within the R global environment, called the workspace.
> ls()                   # list current objects in the R environment
[1] "ANS1" "VAR1" "VAR2"
> rm(VAR1, VAR2)         # remove the VAR1 and VAR2 objects
> rm(list = ls())        # remove all user-defined objects

R workspaces getwd() To displays the current working folder save.image()  to save the workspace and thus all those objects (vectors, functions, etc) load()  to load the a previously saved workspace and thus all those objects. q()  to quite R. getwd() To displays the current working folder setwd() To set the working folder help() >help(mean) >?mean

Vectors - variables
The basic data storage unit in R is called a vector. A vector is a collection of one or more entries of the same class (type).
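The "same class" rule can be checked directly. In this sketch the object names and values are made up; note how mixing types in c() forces all entries into one common class.

```r
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9)   # numeric vector (made-up values)
SITE <- c("A", "B", "C", "D", "E")               # character vector

class(TEMPERATURE)    # "numeric"
class(SITE)           # "character"
length(TEMPERATURE)   # number of entries

# Mixing types: every entry is converted to a single common class.
MIXED <- c(1, "a", TRUE)
class(MIXED)          # "character"

TEMPERATURE[2]                  # the second entry
TEMPERATURE[TEMPERATURE > 36]   # entries that satisfy a condition
```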

Factors
To properly accommodate factorial (categorical) variables, R has an additional class of vector called a factor, which stores the vector along with a list of the levels of the factorial variable. The factor() function converts a vector into a factor vector.
> SHADE <- c("no", "no", "no", "no", "no", "full", "full", "full", "full", "full")
> SHADE
 [1] "no"   "no"   "no"   "no"   "no"   "full" "full" "full" "full" "full"
> SHADE <- factor(SHADE)
> SHADE
 [1] no   no   no   no   no   full full full full full
Levels: full no
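A short sketch of how the levels of a factor can be inspected and reordered, using the SHADE example from the slide. The name `SHADE2` is introduced here for illustration only.

```r
SHADE <- factor(c("no", "no", "no", "no", "no",
                  "full", "full", "full", "full", "full"))

levels(SHADE)       # levels are sorted alphabetically by default: "full" "no"
table(SHADE)        # number of entries per level
as.integer(SHADE)   # internal integer codes: full = 1, no = 2

# Set the level order explicitly, e.g. to make "no" the first level.
SHADE2 <- factor(SHADE, levels = c("no", "full"))
levels(SHADE2)      # "no" "full"
```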

Matrices
A vector has only a single dimension: it has length. However, vectors can be combined into a matrix (a 2-dimensional array).
> X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
> Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
> XY1 <- cbind(X, Y)     # X and Y become the columns
> XY2 <- rbind(X, Y)     # X and Y become the rows

To access the data in matrices
XY1[1, ]     the first row
XY1[, 2]     the second column
XY1[2, 2]    the value in the second row and second column
XY1[1:3, ]   rows 1 to 3
XY1[, 1:2]   columns 1 to 2
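The two binding functions and the indexing rules can be tried together. This sketch reuses the X and Y vectors from the Matrices slide and uses only base R functions.

```r
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)

XY1 <- cbind(X, Y)   # 5 rows, 2 columns
XY2 <- rbind(X, Y)   # 2 rows, 5 columns

dim(XY1)             # rows and columns of XY1
dim(XY2)             # rows and columns of XY2

XY1[1, ]             # first row
XY1[2, 2]            # second row, second column
XY1[, "Y"]           # a column can also be selected by name
t(XY1)               # the transpose holds the same values as XY2
```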

Data frames
Data frames are generated by combining multiple vectors together such that each vector becomes a separate column in the data frame. In this way, a data frame is similar to a matrix in which each column can represent a different vector type. We will discuss data frames in detail in the next chapter.
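A minimal sketch of a data frame whose columns have different types. The column names and values are made up for illustration, and stringsAsFactors = FALSE is set so that the character column stays a character vector in every R version.

```r
PLANT  <- c("p1", "p2", "p3", "p4")                # character vector (made-up labels)
SHADE  <- factor(c("no", "no", "full", "full"))    # factor
HEIGHT <- c(16.92, 24.03, 7.61, 15.49)             # numeric vector (made-up values)

# Each vector becomes one column of the data frame.
DATA <- data.frame(PLANT, SHADE, HEIGHT, stringsAsFactors = FALSE)

str(DATA)                        # compact summary of columns and their types
sapply(DATA, class)              # class of each column
DATA$HEIGHT                      # access one column by name
DATA[DATA$SHADE == "full", ]     # rows that satisfy a condition
```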

Working with scripts
A collection of one or more commands is called a script. In R, a script is a plain text file with a separate command on each line, and it can be created and read in any text editor. A script is read into R by providing the full filename of the script file as an argument to the source() function.
> source("filename.R")

A typical script may look like the following:
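A minimal sketch of such a script, reusing the X and Y vectors from the Matrices slide. The file name in the first comment and the choice of commands are illustrative assumptions, not the slide's original example.

```r
# example_script.R (hypothetical file name)
# Each command sits on its own line; lines starting with '#' are comments.

# Enter the data
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)

# Combine the two vectors into a matrix with columns X and Y
XY1 <- cbind(X, Y)

# Calculate and print the mean of each column
MEANS <- colMeans(XY1)
print(MEANS)
```

Saved under that name in the working folder, the script would be run with source("example_script.R").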

References
Logan, Murray. Biostatistical Design and Analysis Using R: A Practical Guide. Wiley-Blackwell.
MacFarland, Thomas W. Introduction to Data Analysis and Graphical Presentation in Biostatistics with R. Springer.