High Performance Computing with R

Slides:



Advertisements
Similar presentations
C++ Development on Linux Agenda Introduction Editors Debuggers GUI IDEs Make Automake Exploring further.
Advertisements

Computer Science Basics CS 216 Fall Operating Systems interface to the hardware for the user and programs The two operating systems that you are.
Programming Creating programs that run on your PC
The Basic Tools Presented by: Robert E., & Jonathan Chase.
Discussion Section: HW1 and Programming Tips GS540.
HPCC Mid-Morning Break Interactive High Performance Computing Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery.
Writing Faster Code in R R meetup Arelia T. Werner May 20 th 2015 Tectoria, Victoria, BC.
Dr. Chris Musselle – Consultant R Meets Julia Dr Chris Musselle.
WORK ON CLUSTER HYBRILIT E. Aleksandrov 1, D. Belyakov 1, M. Matveev 1, M. Vala 1,2 1 Joint Institute for nuclear research, LIT, Russia 2 Institute for.
Stern Center for Research Computing
Cloud Computing Other High-level parallel processing languages Keke Chen.
PHP With Oracle 11g XE By Shyam Gurram Eastern Illinois University.
Stephen Lawe Colin Smith April 4, 2013 Open Source Programming in Transportation Prepared for: 2013 TRB Applications Conference.
Introduction to Interactive Media Interactive Media Tools: Software.
MaterialsHub - A hub for computational materials science and tools.  MaterialsHub aims to provide an online platform for computational materials science.
1.8History of Java Java –Based on C and C++ –Originally developed in early 1991 for intelligent consumer electronic devices Market did not develop, project.
Debugging and Profiling GMAO Models with Allinea’s DDT/MAP Georgios Britzolakis April 30, 2015.
Parallel Computing with Matlab CBI Lab Parallel Computing Toolbox TM An Introduction Oct. 27, 2011 By: CBI Development Team.
Implementation - Part 2 CPS 181s March 18, Pieces of the Site-building Puzzle Page 180, figure 4.1.
1 3. Computing System Fundamentals 3.1 Language Translators.
Ch 1. A Python Q&A Session Spring Why do people use Python? Software quality Developer productivity Program portability Support libraries Component.
An Introduction to HDInsight June 27 th,
Introduction to CS520/CS596_026 Lecture Two Gordon Tian Fall 2015.
Computing System Fundamentals 3.1 Language Translators.
Computer Architecture
CREATED BY – UPENDRA SHARMA
Big data Usman Roshan CS 675. Big data Typically refers to datasets with very large number of instances (rows) as opposed to attributes (columns). Data.
CSci6702 Parallel Computing Andrew Rau-Chaplin
Beginning JavaScript 4 th Edition. Chapter 1 Introduction to JavaScript and the Web.
Derek Weitzel Grid Computing. Background B.S. Computer Engineering from University of Nebraska – Lincoln (UNL) 3 years administering supercomputers at.
CIS 595 MATLAB First Impressions. MATLAB This introduction will give Some basic ideas Main advantages and drawbacks compared to other languages.
1 Seattle University Master’s of Science in Business Analytics Key skills, learning outcomes, and a sample of jobs to apply for, or aim to qualify for,
The Art of R Programming Chapter 15 – Writing Fast R Code Chapter 16 – Interfacing R to Other languages Chapter 17 – Parallel R.
Scaling up R computation with high performance computing resources.
Programming 2 Intro to Java Machine code Assembly languages Fortran Basic Pascal Scheme CC++ Java LISP Smalltalk Smalltalk-80.
Basics Components of Web Design & Development Basics, Components, Design and Development.
1 Sections Java Virtual Machine and Byte Code Fundamentals of Java: AP Computer Science Essentials, 4th Edition Lambert / Osborne.
Multiple Sequence Alignment with PASTA Michael Nute Austin, TX June 17, 2016.
Constructing a system with multiple computers or processors 1 ITCS 4/5145 Parallel Programming, UNC-Charlotte, B. Wilkinson. Jan 13, 2016.
Chapter 10 1 Figure 10-1: Database-enabled intranet-internet environment.
Programming C++ in Linux by various IDEs and editors by: Danial Khashabi Master: Dr.B.Taheri November 2008.
How to Get Started With Python
Programming vs. Packaged
Basic Concepts: computer, program, programming …
Why don’t programmers have to program in machine code?
Software Engineering for Data Scientists
Big Data A Quick Review on Analytical Tools
Our Graphics Environment
Fusion Tech Talk 2# (powered by Simplify Digital)
Writing Better R Code WIM Oct 30, 2015 Hui Zhang Research Analytics
Getting Started with R.
Spark Presentation.
R For The SQL Developer Kevin Feasel Manager, Predictive Analytics
DATA MINING Python.
MaterialsHub - A hub for computational materials science and tools.
NGS computation services: APIs and Parallel Jobs
R Programming.
Programming vs. Packaged
Dane Stubben QuintilesIMS Database Manager
Software Programming J. Holvikivi 2014.
Learn about MATLAB Engineers – not sales!
Introduction to Apache
Segurança e auditoria de sistemas
Spark and Scala.
Unit 6 part 3 Test Javascript Test.
Introduction to CUDA.
H2O is used by more than 14,000 companies
Server & Tools Business
DATA MINING Python.
Question 1 How are you going to provide language and/or library (or other?) support in Fortran, C/C++, or another language for massively parallel programming.
Presentation transcript:

High Performance Computing with R For many years, R had a reputation for being slow, unable to process large datasets. This was true until c2012 when its compiler caw switched from `parse tree’ to `byte code’. Its speed is now similar to Python, Java, Matlab & IDL. All of these are slower than Fortran, C and C++. # Benchmarking R: Ten million element vector on MacBook laptop w <- rnorm(10000000) # 0.9 sec wsort <- sort(w) # 1.4 sec wfft <- fft(w) # 1.8 sec foo <- 0 loop <- function(n) { for(i in 1:n) { if(tan(i) > 0.5) foo = foo + atan(i)^{2/3 * w[i]} } } loop(10000000) # 6.6 sec Loop with nonlinear computation quartz() ; plot(w) # 190 sec write(format(w, digits=2)) # 34 sec save(w, file='w.out') # 0.1 sec

R programming tools library(help=‘utils’) & ‘base’ & ‘tools’ Program flow (if, for, else, while, repeat, break, next, stop) Host computer (system, list.files, source, readline, Rscript, pipe) Editors, IDEs and GUIs (Rstudio, Jupyter, edit, emacs, vi) Debugging (debug, browser, try, traceback, Rprof, testthat) LaTeX (xtable, Sweave, knitr) Language interfaces (C, C++, Fortran, Python, Java, Julia, Matlab, SQL, HTML, Oracle, Tcl/Tk, BUGS, JAGS, Stan) Use rpy2 to access R from Python: conda install rpy2 ## on terminal, installs R and rpy2 pip install rpy2 ## in (i)Python, requires R previously installed import rpy2 import rpy2.robjects as robjects R = robjects.r ranGauss = R.rnorm(100) print(ranGauss)

Strategies for speeding up R code Precompile user-created functions User vector/matrix operations where possible Avoid for(i in 1:N) loops where possible Use external C, Fortran or C++ routines for computationally intensive steps Profile your code: Rprof, CRAN rbenchmark, microbenchmark

CRAN packages for parallel processing R is not intrinsically designed for parallel processing but, due to utilities for interaction with the host computer, dozens of CRAN packages are now available to facilitate parallel & distributed computing parallel and foreach functions distributes for loop to resident cores multicore, batch & condor serve multicore computers mclapply applies any function to each element of a vector in parallel h2o facilitates machine learning (e.g. RFs, ANNs) in a parallel environment CRAN HadoopStream & hive serve MapReduce in Hadoop environment CRAN cloudRmpi serves MPI and Gputools, magma & OpenCl serve GPU clusters RevoScaleR integrates R with Microsoft SQL Server datatable, ff & bigmemory treat large out-of-memory datasets

While originally designed for an individual exploring small datasets, R can be pipelined and can treat megadatsets also R scripts are very compact Two Penn State Ph.D. theses completed in 102 lines of code