Using R with Python.

Python commands:
import rpy2.robjects as robjects
r = robjects.r
pi = r.pi
x = r.c(1,2,3,4,5,6)
y = r.seq(1,10)
m = r.matrix(y, nrow=5)
n = r.matrix(y, ncol=5)
f = r("read.table('RandomDistribution.tsv',sep='\t')")
f_matrix = r.matrix(f, ncol=7)
mean_first_col = r.mean(f_matrix[0])

Equivalent R commands:
> pi
[1] 3.141593
> x = c(1,2,3,4,5,6)
> y = seq(1,10)
> m = matrix(y, nrow=5)
> n = matrix(y, ncol=5)
> f = read.table('RandomDistribution.tsv',sep='\t')
> f_matrix = matrix(f, ncol=7)
> mean_first_col = mean(f[,1])
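The two matrix calls above differ only in which dimension is fixed, and R fills a matrix column by column by default. The following is a minimal pure-Python sketch of that fill order, not rpy2 code; the helper name r_style_matrix is hypothetical, and unlike R it raises an error instead of recycling values when the length does not fit.

```python
def r_style_matrix(values, nrow=None, ncol=None):
    """Arrange values column by column, as R's matrix() does by default.

    Hypothetical helper for illustration only; it is not part of rpy2.
    Returns a list of rows.
    """
    values = list(values)
    if nrow is None and ncol is None:
        raise ValueError("give nrow or ncol")
    if nrow is None:
        nrow = len(values) // ncol
    if ncol is None:
        ncol = len(values) // nrow
    if nrow * ncol != len(values):
        raise ValueError("length of values does not match nrow * ncol")
    # Element (row, col) comes from position col * nrow + row (column-major).
    return [[values[col * nrow + row] for col in range(ncol)]
            for row in range(nrow)]


y = list(range(1, 11))          # same values as seq(1,10)
m = r_style_matrix(y, nrow=5)   # 5 rows, 2 columns
n = r_style_matrix(y, ncol=5)   # 2 rows, 5 columns

for row in m:
    print(row)
for row in n:
    print(row)
```

With nrow=5 the first column holds 1 to 5 and the second 6 to 10; with ncol=5 each column holds two consecutive values.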

R command to get the pi constant:
> pi
[1] 3.141593

In Python, you can get pi by typing:
>>> import rpy2.robjects as robjects
>>> r = robjects.r
>>> r.pi
<FloatVector - Python:0x10c096950 / R:0x7fd1da546e18>
[3.141593]

Notice that if you print the object, the result looks a bit different:
>>> print(r.pi)
[1] 3.141593

Since r.pi is a vector of length 1, you have to use indexing to get its numerical value. Python indexing starts at 0:
>>> r.pi[0]
3.141592653589793

Accessing an R object from Python
>>> import rpy2.robjects as robjects
>>> r = robjects.r

Accessing an R object as an attribute of the r object, using the dot syntax:
>>> r.pi

Accessing an R object using the [] operator on r, as you would with a dictionary:
>>> pi = r['pi']

Calling r as you would call a function, passing the name of the R object as a string:
>>> pi = r('pi')

Creating vectors
>>> print(r.c(1, 2, 3, 4, 5, 6))
[1] 1 2 3 4 5 6
>>> print(r['c'](1,2,3,4,5,6))
[1] 1 2 3 4 5 6
>>> c = r['c']
>>> print(c(1,2,3,4,5,6))
[1] 1 2 3 4 5 6
>>> print(r('c(1,2,3,4,5,6)'))
[1] 1 2 3 4 5 6

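How to deal with function arguments that contain a dot
R argument names such as na.rm contain a dot, which is not allowed in a Python keyword argument. In Python, replace the dot with an underscore (na.rm becomes na_rm) and pass R logical values as Python booleans.

R session:
> f = read.table('RandomDistribution.tsv', sep='\t')
> m = mean(f[,4], trim = 0, na.rm = FALSE)

Would become in Python (column 4 in R is index 3 in Python, because Python indexing starts at 0):
>>> f = r("read.table('RandomDistribution.tsv',sep='\t')")
>>> r.mean(f[3], trim = 0, na_rm = False)
<FloatVector - Python:0x106c82cb0 / R:0x7fb41f887c08>
[38.252747]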
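The renaming rule can be pictured with a small stand-in. This is a simplified model and not rpy2's actual code: the helper names translate_kwargs and show_kwargs are hypothetical, and the real library is understood to translate only names that appear in the R function's signature. The second half shows plain Python dictionary unpacking, which lets a dotted name be passed unchanged.

```python
def translate_kwargs(**kwargs):
    """Simplified model: turn Python-style names into R-style names.

    Hypothetical helper for illustration; every underscore becomes a dot.
    """
    return {name.replace("_", "."): value for name, value in kwargs.items()}


def show_kwargs(**kwargs):
    """Return the keyword arguments exactly as received."""
    return kwargs


# Underscore form, as written in the Python call to r.mean.
translated = translate_kwargs(trim=0, na_rm=False)
print(translated)

# Dotted form: 'na.rm' cannot be typed as a keyword, but a dict can carry it.
dotted = show_kwargs(trim=0, **{"na.rm": False})
print(dotted)
```

Both routes end with the same argument names, trim and na.rm, that the R function expects.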

Running a Chi2 test
R session:
> h=read.table("Chi-square_input.txt",header=TRUE,sep="\t")
> names(h)
[1] "SAMPLE" "GENE1" "GENE2"
> chisq.test(table(h$GENE1,h$GENE2))

  Pearson's Chi-squared test with Yates' continuity correction

data: table(h$GENE1, h$GENE2)
X-squared = 5.8599, df = 1, p-value = 0.01549

Warning message:
In chisq.test(table(h$GENE1, h$GENE2)) :
  Chi-squared approximation may be incorrect

Running a Chi2 test
Corresponding Python session:
>>> import rpy2.robjects as ro
>>> r = ro.r
>>> T = r("read.table('Chi-square_input.txt', header=TRUE, sep='\t')")
>>> print(r.names(T))
[1] "SAMPLE" "GENE1" "GENE2"
>>> cont_table = r.table(T[1],T[2])
>>> chitest = r['chisq.test']
>>> print(chitest(cont_table))

  Pearson's Chi-squared test with Yates' continuity correction
...
X-squared = 5.8599, df = 1, p-value = 0.01549

The function name chisq.test contains a dot, so it is fetched with r['chisq.test'] rather than with the attribute syntax.
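To see what the test computes, here is a pure-Python sketch of Pearson's chi-squared test with Yates' continuity correction for a 2x2 table (df = 1, as in the output above). It is a cross-check written without R, not the rpy2 call itself. The function name chisq_2x2_yates and the counts are made up for illustration, since the contents of Chi-square_input.txt are not shown; the sketch assumes Python 3 and that every row and column total is positive.

```python
import math


def chisq_2x2_yates(table):
    """Chi-squared statistic and p-value for a 2x2 table with Yates' correction.

    Hypothetical helper for illustration; table is ((a, b), (c, d)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            # The correction is capped so it never exceeds the difference.
            correction = min(0.5, diff)
            statistic += (diff - correction) ** 2 / expected
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Made-up counts, not the data behind the session above.
stat, p = chisq_2x2_yates(((10, 20), (30, 40)))
print(stat, p)
```

For tables larger than 2x2 the degrees of freedom change and the erfc shortcut no longer applies, so this sketch should not be generalized as is.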

Calculating mean, standard deviation, z-score and p-value of a set of numbers
R session:
> f = read.table("RandomDistribution.tsv", sep= "\t")
> m = mean(f[,3], trim = 0, na.rm = FALSE)
> sdev = sd(f[,3], na.rm = FALSE)
> value = 0.01844
> z = (m - value)/sdev
> p = pnorm(-abs(z))
> p
[1] 0.3841792

Calculating mean, standard deviation, z-score and p-value of a set of numbers
Corresponding Python session (column 3 in R is index 2 in Python):
>>> import rpy2.robjects as ro
>>> r = ro.r
>>> f = r("read.table('RandomDistribution.tsv',sep='\t')")
>>> m = r.mean(f[2], trim = 0, na_rm = False)
>>> sdev = r.sd(f[2], na_rm = False)
>>> value = 0.01844
>>> z = (m[0] - value)/sdev[0]
>>> x = r.abs(z)
>>> p = r.pnorm(-x[0])
>>> p[0]
0.3841792080560442

Calculating mean, standard deviation, z-score and p-value of a set of numbers
Using a program:
import rpy2.robjects as ro
r = ro.r
f = r("read.table('RandomDistribution.tsv',sep='\t')")
# R results are vectors of length 1, so [0] extracts the number
m = r.mean(f[2], trim = 0, na_rm = False)
sdev = r.sd(f[2], na_rm = False)
value = 0.01844
z = (m[0] - value)/sdev[0]
x = r.abs(z)
p = r.pnorm(-x[0])
print(m[0], sdev[0], z, p[0])
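The same quantities can be cross-checked without R. The sketch below uses only the Python 3 standard library and relies on the identity pnorm(-|z|) = 0.5 * erfc(|z| / sqrt(2)). The function name lower_tail_p and the sample values are made up for illustration; they are not the contents of RandomDistribution.tsv.

```python
import math
import statistics


def lower_tail_p(values, reference):
    """Return mean, sample standard deviation, z-score and lower-tail p-value.

    Hypothetical helper for illustration; mirrors the R steps
    mean(), sd(), (m - value)/sdev and pnorm(-abs(z)).
    """
    m = statistics.mean(values)
    sdev = statistics.stdev(values)   # n - 1 denominator, like R's sd()
    z = (m - reference) / sdev
    # Standard normal CDF evaluated at -|z|.
    p = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    return m, sdev, z, p


# Made-up sample; the reference equals the mean, so z is 0.
m, sdev, z, p = lower_tail_p([1.0, 2.0, 3.0, 4.0, 5.0], 3.0)
print(m, sdev, z, p)
```

Because the p-value is taken at -|z|, it is a one-sided tail probability and can never exceed 0.5.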

Plot random numbers to a file
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
r = ro.r
grdevices = importr('grDevices')
# open a PNG graphics device, draw the plot, then close the device
grdevices.png(file="RandomPlot.png", width=512, height=512)
r.plot(r.rnorm(100), ylab = "random")
grdevices.dev_off()

Plot data from a file
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
r = ro.r
f = r("read.table('RandomDistribution.tsv',sep='\t')")
grdevices = importr('grDevices')
# scatter plot of the second column against the third
grdevices.png(file="Plot.png", width=512, height=512)
r.plot(f[1],f[2], xlab = "x", ylab = "y")
grdevices.dev_off()
# histogram of the fifth column
grdevices.png(file="Histogram.png", width=512, height=512)
r.hist(f[4], xlab='x', main = 'Distribution of values')
# close the device so that Histogram.png is written to disk
grdevices.dev_off()