Parallel R Andrew Jaffe Computing Club 4/5/2015. Overview Introduction multicore Array jobs The rest.

Presentation transcript:

Parallel R Andrew Jaffe Computing Club 4/5/2015

Overview Introduction multicore Array jobs The rest

Introduction Based roughly on: McCallum and Weston. Parallel R (O'Reilly book), so consult it for the more complicated methods.

Introduction [Flowchart] Calculate Stat → Permute Outcome/Bootstrap (B times) → Find Null Stats → Calculate statistical significance

Introduction [Flowchart, parallel version] 1 core: Calculate Stat → P cores: Permute Outcome/Bootstrap (B times) and Find Null Stats on each core → 1 core: Combine Null Stats → Calculate statistical significance
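
The workflow in the flowchart can be sketched in R roughly as follows. This is a minimal illustration under stated assumptions, not the speaker's code: the toy data, the `calc_stat` helper and the choice of 2 cores are all hypothetical, and it uses `mclapply()` from the base `parallel` package (which ships with R) rather than the `multicore` package named in the slides.

```r
library(parallel)  # ships with base R; provides mclapply()

set.seed(1)
# Hypothetical toy data: an outcome measured in two groups of 20
group <- rep(c(0, 1), each = 20)
y <- rnorm(40) + 0.8 * group

# Step 1 (one core): calculate the observed statistic
calc_stat <- function(y, group) mean(y[group == 1]) - mean(y[group == 0])
obs_stat <- calc_stat(y, group)

# Step 2 (P cores): permute the outcome B times and compute null statistics.
# Each permutation is independent, so the work can be split across cores.
B <- 200
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L  # forking is unavailable on Windows
null_stats <- unlist(mclapply(seq_len(B), function(b) {
  calc_stat(sample(y), group)
}, mc.cores = n_cores))

# Step 3 (one core): combine the null statistics into an empirical p-value
p_value <- mean(abs(null_stats) >= abs(obs_stat))
```

Only the permutation step is parallelised; the observed statistic and the final p-value are cheap and stay on one core.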

Introduction Basically two ways of doing parallel jobs:
– Submit multiple jobs prepared to run in parallel across one or more nodes; each uses one core
– Use multiple cores on a given node; note that you're limited by the number of cores on that node

Introduction The computing cluster is a shared resource – be careful when running jobs on multiple cores on one node (and slightly less so for parallel jobs across nodes)

Overview Introduction multicore Array jobs The rest

The multicore R Package library(multicore) This is definitely the easiest/most straightforward way to run things in parallel. The easiest function to use is mclapply(), which works exactly the same as lapply(). Only works on Linux/Mac (!)

The multicore R Package McCallum and Weston. Parallel R. 2012

apply() / list apply If you haven't used any of the apply functions before, definitely check them out (apply, lapply, sapply, tapply).
apply(data, margin [row=1,col=2], function) – Applies function along rows or columns of a matrix or data.frame
x = matrix(rnorm(100), ncol = 10)
apply(x, 1, function(x) mean(x)) – Each row is x, assessed in the function

apply() Some functions don't need to be written like that: mean, length, class, sum, max, min, …
apply(x, 1, mean)
apply(x, 1, max)

lapply() Instead of applying a function to every row or column, applies a function to every element of a list, and returns a list. list: collection of elements of different classes and different dimensions – You can have lists of different sized data.frames and matrices – Basically a 3D R object (1D = vector, 2D = matrix)

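lists
> y = list(c(1:5), c(6:21), c(3,7))
> y
[[1]]
[1] 1 2 3 4 5
[[2]]
 [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
[[3]]
[1] 3 7
> y[[1]] # select 1 element, now a vector
[1] 1 2 3 4 5
> y[1:2] # select multiple elements
[[1]]
[1] 1 2 3 4 5
[[2]]
 [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21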
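
As a small illustration of the lapply() behaviour described above, the sketch below applies functions to each element of the same list `y`. The variable names `lens` and `means` are just for this example.

```r
# The list from the slide: three vectors of different lengths
y <- list(c(1:5), c(6:21), c(3, 7))

# lapply() applies the function to every element and returns a list
lens <- lapply(y, length)

# sapply() does the same but simplifies the result to a vector when it can
means <- sapply(y, mean)
```

Here `lens` is a list with one length per element, while `means` comes back as a plain numeric vector.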

mclapply() Does lapply(), but splits the work over multiple cores on your node. All you need to control/input is how many cores it should use; the function does all the splitting and reassembling. mclapply(theList, function, mc.cores)
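
A minimal sketch of the call pattern above, assuming the base `parallel` package (which provides `mclapply()` in current R) instead of `multicore`. The contents of `theList` and the choice of 2 cores are made up for illustration.

```r
library(parallel)  # ships with base R; provides mclapply()

# Hypothetical input: a list of four integer vectors
theList <- list(a = 1:10, b = 11:20, c = 21:30, d = 31:40)

# Forking is not available on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial_res   <- lapply(theList, sum)                        # one core
parallel_res <- mclapply(theList, sum, mc.cores = n_cores)  # split over cores

# Both calls return one result per list element, in the same order
same <- identical(unlist(serial_res, use.names = FALSE),
                  unlist(parallel_res, use.names = FALSE))
```

For a task this small the parallel version will not be faster, since forking has overhead; the point is only that the call is a drop-in replacement for lapply().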

mclapply()

Enigma/Cluster You need to explicitly request multiple cores on jobs; do not use multicore functions if you have not. This is a node log-on request for 4 cores, with a total memory max of 32 GB:
qrsh -pe local 4 -l mf=32G,h_vmem=3G
h_vmem is the upper memory limit at which the job dies; it is given as memory_limit/no_cores^2, so 3G here corresponds to 48G.

Enigma/Cluster Works the same as submitting jobs with qsub. I just have aliases set up in my ~/.bashrc file:
alias qsmult='qrsh -pe local 4 -l mf=32G,h_vmem=2G'
alias qssmult='qsub -V -pe local 3 -l mf=32G,h_vmem=2G -cwd -b y R CMD BATCH --no-save'
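
One way to avoid using more cores than were requested is to read the slot count from the environment inside the R script. The sketch below assumes the scheduler exports `NSLOTS` for parallel-environment jobs, as Grid Engine normally does; whether this particular cluster sets it is an assumption, so the code falls back to 1 core when the variable is missing. It also uses the base `parallel` package rather than `multicore`.

```r
library(parallel)  # ships with base R; provides mclapply()

# Assumption: the scheduler sets NSLOTS to the number of cores granted by -pe.
# If it is unset or unparsable, use a single core.
slots <- suppressWarnings(as.integer(Sys.getenv("NSLOTS", unset = "1")))
if (is.na(slots) || slots < 1L) slots <- 1L
if (.Platform$OS.type == "windows") slots <- 1L  # no forking on Windows

# Never ask mclapply() for more cores than the job was given
res <- mclapply(1:8, function(i) i^2, mc.cores = slots)
```

With this pattern the same script runs safely both in a single-core session and in a session started with a multi-core request.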

Overview Introduction multicore Array jobs The rest

Array jobs An SGE Array Job is a script that is to be run multiple times. Note that this means EXACTLY the same script is going to be run multiple times; the only difference between each run is a single environment variable, $SGE_TASK_ID, so your script MUST be reasonably intelligent.

Array jobs
qsub -t 1-10 -V -l mf=20G,h_vmem=32G -cwd -b y R CMD BATCH --no-save sim1_GO_spikein_v2.R
The sim1_GO_spikein_v2.R script is submitted 10 times. An incremented environment variable is assigned to each, here from 1 to 10 (-t 1-10). Within each script, I initiate a variable:
runId = Sys.getenv("SGE_TASK_ID")
which assigns the -t value to runId.
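
A slightly more defensive version of the runId line might look like the sketch below. The fallback to task 1 (so the script can also be tested outside an array job) and the seed offset of 1000 are illustrative choices, not part of the original script.

```r
# Read the task ID set by the scheduler; Sys.getenv() returns a string
runId <- Sys.getenv("SGE_TASK_ID", unset = "1")

# Assumption: outside an array job the variable may be empty or "undefined",
# in which case we pretend to be task 1 so the script still runs
if (runId %in% c("", "undefined")) runId <- "1"
runId <- as.integer(runId)

# Give each task its own seed so the runs do not repeat the same random draws
set.seed(1000 + runId)
```

Converting to an integer makes it easy to use runId for seeds or for indexing into a table of parameter settings.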

Array jobs So, I have 10 jobs running, each with a different value of runId. At the end of the script, I can use paste() and save the data from each job as separate files:
save(whatever, file = paste("results",runId,".rda",sep=""))
Then you have to manually (and carefully) append/collect all of the data back together.
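
The collection step can be sketched as below. To keep the example self-contained it first writes three small result files to a temporary directory, standing in for the output of three array tasks; the data frame contents, the `out_dir` location and the use of `rbind` to combine are all assumptions made for illustration.

```r
# Stand-in for the array jobs: write results1.rda, results2.rda, results3.rda
out_dir <- tempfile("array_demo")
dir.create(out_dir)
for (runId in 1:3) {
  whatever <- data.frame(runId = runId, stat = runId * 10)
  save(whatever, file = file.path(out_dir, paste("results", runId, ".rda", sep = "")))
}

# Collect: find every results file and load each one into its own environment,
# so that objects with the same name do not overwrite each other
files <- list.files(out_dir, pattern = "^results[0-9]+\\.rda$", full.names = TRUE)
pieces <- lapply(files, function(f) {
  env <- new.env()
  load(f, envir = env)
  env$whatever
})

# Combine the pieces and sort by task ID (file names sort alphabetically,
# so results10.rda would otherwise come before results2.rda)
combined <- do.call(rbind, pieces)
combined <- combined[order(combined$runId), ]
```

Checking that the number of files found matches the number of tasks submitted is a cheap way to catch jobs that died before saving.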

Array jobs Note that unlike jobs on multiple cores on one node, these jobs are assigned nodes like any other job you create using qsub. You are therefore not limited by the number of cores on a node, but rather by the number of slots you can use (I think it's around 10). Also note that it's hard to get more than 4 cores on a node (or even more than 3). Lastly, your 1 array job gets one job ID (see qstat), so you can easily delete it using qdel.

Overview Introduction multicore Array jobs The rest

The rest… These are from the Parallel R book, and I haven't directly used them: McCallum and Weston. Parallel R. 2012

The rest… Also in multicore package: McCallum and Weston. Parallel R. 2012

The rest…

And on Amazon… McCallum and Weston. Parallel R. 2012

Questions?