1 Writing Better R Code Hui Zhang Research Analytics.

Writing Better R Code WIM Oct 30, 2015 Hui Zhang Research Analytics
Presentation transcript:

1 Writing Better R Code Hui Zhang Research Analytics

2 Outline
Approaches for improving the performance of R code
– Some previous knowledge of R is recommended
– Some familiarity with C/C++ is also recommended
Topics
– Loops
– Ply Functions
– Vectorization
– Loops, Plys, and Vectorization
– Interfacing to Compiled Code

3 Loops

4 Loops
– for
– while
– No goto’s or do while’s
– They are really slow
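A minimal sketch of the loop constructs named on this slide. The variable names are illustrative only. Since R has no do-while, the last part shows one common stand-in (repeat with break), which is an addition here and not something the slide itself mentions.

```r
# for: iterate over a known sequence
total <- 0
for (i in 1:5) {
  total <- total + i
}

# while: iterate until a condition fails
k <- 0
while (k < 3) {
  k <- k + 1
}

# No do-while in R: repeat + break runs the body at least once
tries <- 0
repeat {
  tries <- tries + 1
  if (tries >= 2) break
}
```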

5 Loops
Best Practices
– Mostly try to avoid
– Evaluate practicality of rewrite (plys, vectorization, compiled code)
– Always preallocate (when you can):
  » Vectors: numeric(n), integer(n), character(n)
  » Lists: vector(mode="list", length=n)
  » Data frames: data.frame(col1=numeric(n), …)
– If you can’t, try something other than an array/list.
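The preallocation advice can be sketched as follows. The function names grow and prealloc are hypothetical, chosen for this illustration; both compute the same result, but the first copies the vector on every iteration.

```r
# Growing a vector: every c() call allocates a new, longer vector
# and copies the old contents into it.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i * 2)
  out
}

# Preallocating: the vector is created once and filled in place.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i * 2
  out
}

# Lists are preallocated with vector(mode = "list", length = n)
res <- vector(mode = "list", length = 3)
for (i in seq_along(res)) res[[i]] <- letters[i]
```

Wrapping each call in system.time() on a large n is one way to see the difference on a given machine.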

6 Loops

7

8 Ply Functions

9 Ply Functions
R has functions that apply other functions to data
In a nutshell: loop sugar
Typical *ply’s
– apply(): apply function over matrix “margin(s)”
– lapply(): apply function over list/vector
– mapply(): apply function over multiple lists/vectors
– sapply(): same as lapply(), but (possibly) nicer output
– Plus some other mostly irrelevant ones
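A short sketch of the four functions listed above, using small made-up inputs so the shape of each result is easy to inspect.

```r
m <- matrix(1:6, nrow = 2)            # 2 x 3 matrix, filled column-wise

row_sums <- apply(m, 1, sum)          # margin 1 = rows
col_sums <- apply(m, 2, sum)          # margin 2 = columns

lst <- list(a = 1:3, b = 1:5)
lens_list <- lapply(lst, length)      # always returns a list
lens_vec  <- sapply(lst, length)      # simplified to a named vector

# mapply walks several vectors in parallel: 1+4, 2+5, 3+6
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
```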

10 Ply Functions

11 Ply Functions

12 Ply Functions
Transforming Loops into Plys

13 Ply Functions
– Most plys are just shorthand for, or a higher-level expression of, loops
– Generally not much faster (if at all), especially with the compiler
– Thinking in terms of lapply() can be useful, however …
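One way to picture the loop-to-ply transformation: the loop body becomes a function, and the ply function handles the iteration and the output container. The input values here are arbitrary, and vapply() is shown as an optional variant that the slides do not mention.

```r
x <- c(1.5, 2.5, 4)

# Loop version: preallocate, then fill element by element
out_loop <- numeric(length(x))
for (i in seq_along(x)) {
  out_loop[i] <- sqrt(x[i]) + 1
}

# Ply version: the loop body becomes an anonymous function
out_ply <- sapply(x, function(v) sqrt(v) + 1)

# vapply() does the same but checks each result against a declared type
out_vapply <- vapply(x, function(v) sqrt(v) + 1, numeric(1))
```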

14 Vectorization

15 Vectorization
– x + y
– x[, 1] <- 0
– rnorm(1000)

16 Vectorization
– Same in R as in other high-level languages (Matlab, Python, …)
– Idea: use pre-existing compiled kernels to avoid interpreter overhead
– Much faster than loops and plys
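A minimal sketch contrasting an interpreted loop with its vectorized equivalent. The seed and sizes are arbitrary choices for the illustration.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
x <- rnorm(1000)            # rnorm() itself is vectorized: one call, 1000 draws
y <- rnorm(1000)

# Interpreted loop: one R-level iteration per element
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- x[i] + y[i]
}

# Vectorized: a single call into compiled code handles every element
z_vec <- x + y

# Vectorized assignment: a whole matrix column at once
m <- matrix(1, nrow = 3, ncol = 2)
m[, 1] <- 0
```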

17 Vectorization

18 Vectorization
Best Practices
– Vectorize if at all possible
– Note that this potentially consumes a lot of memory
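The memory caveat can be illustrated with a made-up example: summing all pairwise products of two vectors. The fully vectorized form materializes an n x n intermediate matrix, while a partially vectorized loop only ever holds one length-n temporary. The sizes and names are assumptions for this sketch.

```r
set.seed(42)                # arbitrary seed
n <- 1000
x <- runif(n)
y <- runif(n)

# Fully vectorized: outer() builds a 1000 x 1000 matrix of doubles
# (about 8 MB) before sum() reduces it to one number.
total_vec <- sum(outer(x, y))

# Partially vectorized: loop over x, vectorize over y.
# Only a length-n temporary exists at any time.
total_loop <- 0
for (xi in x) {
  total_loop <- total_loop + sum(xi * y)
}
```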

19 Loops, Plys, and Vectorization

20 Putting It All Together
– Loops are slow
– apply() is just a for loop
– lapply(), sapply(), mapply() are not for loops
– Ply functions are not vectorized
– Vectorization is fastest, but often needs a lot of memory

21 Putting It All Together
Example: let us compute the square of the number , using
– for loop without preallocation
– for loop with preallocation
– sapply()
– vectorization
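A sketch of the four approaches listed above. The input range did not survive in the transcript, so the integers 1 to n with n = 1e4 are an assumption made here, and the function names are hypothetical. Timings depend on the machine and R version, so none are quoted.

```r
n <- 1e4                    # assumed input size; not taken from the slides
x <- seq_len(n)

# 1. for loop without preallocation: the result grows on every iteration
sq_loop_grow <- function(x) {
  out <- c()
  for (v in x) out <- c(out, v^2)
  out
}

# 2. for loop with preallocation
sq_loop_prealloc <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

# 3. sapply()
sq_sapply <- function(x) sapply(x, function(v) v^2)

# 4. vectorization
sq_vec <- function(x) x^2

# Elapsed seconds for each approach on this machine
timings <- c(
  grow     = system.time(sq_loop_grow(x))[["elapsed"]],
  prealloc = system.time(sq_loop_prealloc(x))[["elapsed"]],
  sapply   = system.time(sq_sapply(x))[["elapsed"]],
  vec      = system.time(sq_vec(x))[["elapsed"]]
)
```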

22 Putting It All Together

23 RCPP

24 RCPP
– R is mostly a C program
– R extensions are mostly R programs

25 RCPP
Rcpp is:
– R interface to compiled code
– Package ecosystem (Rcpp, RcppArmadillo, RcppEigen, …)
– Utilities to make writing C++ more convenient for R users
– A tool which requires C++ knowledge to effectively utilize
– GPL licensed (like R)

26 RCPP
Rcpp is not:
– Magic
– Automatic R-to-C++ converter
– A way around having to learn C++
– A tool to make existing R functionality faster (unless you rewrite it)
– As easy to use as R

27 RCPP
Rcpp’s advantages
– Compiled code is fast
– Easy to install
– Easy to use (comparatively)
– Better documented than alternatives
– Large, friendly, helpful community
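A minimal sketch of calling compiled code from R through Rcpp::cppFunction(), which compiles a C++ function from a string and exposes it as an R function. It assumes the Rcpp package and a working C++ toolchain are installed; the guard skips the compiled part otherwise. The function names sum_sq_r and sum_sq_cpp are hypothetical and not taken from the slides.

```r
# Pure R reference implementation
sum_sq_r <- function(x) sum(x^2)

# Compiled version, built only if Rcpp is available
if (requireNamespace("Rcpp", quietly = TRUE)) {
  Rcpp::cppFunction('
    double sum_sq_cpp(NumericVector x) {
      double total = 0;
      for (int i = 0; i < x.size(); ++i) {
        total += x[i] * x[i];   // explicit loop, cheap in compiled code
      }
      return total;
    }
  ')
  x <- c(1, 2, 3)
  stopifnot(isTRUE(all.equal(sum_sq_cpp(x), sum_sq_r(x))))
}
```

Note that the C++ body is an explicit element-by-element loop, the style that is slow in interpreted R but natural in compiled code.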

28 RCPP

29 RCPP

30 RCPP

31 RCPP

32 RCPP

33 RCPP

34 Summary
– Bad R often looks like good C/C++
– Vectorize your code as much as you can
– Interfacing with compiled code helps