Material for today’s workshop is at:

1 Material for today’s workshop is at:

2 2 days 1.5 hours

3 Module #: Title of Module

4 Launching jobs at Compute Canada
David Bujold, Epigenomic Data Analysis

5 What is HPC? High Performance Computing
“Traditional” in-house servers can quickly get overloaded. Compute Canada provides HPC resources to Canadian academic labs, and there are also many options in the private sector, such as AWS. HPC uses clusters of computers; each individual computer in a cluster is called a node.

6 What is Compute Canada?
A CFI-funded national platform that integrates HPC resources at partner consortia across the country to create a dynamic computational resource. Partner consortia: ACEnet, Calcul Québec, SciNet, HPCVL, SHARCNET, WestGrid.

7 Concepts connected to CC accounts
Compute Canada is a shared resource for Canadian academia. An account gives you access to free compute resources, and you get a yearly allocation of compute time (in core-years) and storage space. Once logged in, you can launch compute jobs; a job is one execution of a piece of software. Compute jobs draw on the yearly allocation.
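To make the core-year unit concrete, here is a minimal Python sketch that converts an allocation into core-hours and estimates how much of it a job consumes. It assumes a 365-day year and that usage is counted as cores × wall-clock hours; the allocation size and job parameters are hypothetical, and actual accounting rules may differ.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def core_years_to_core_hours(core_years: float) -> float:
    """Convert an allocation expressed in core-years into core-hours."""
    return core_years * HOURS_PER_YEAR


def job_core_years(cores: int, hours: float) -> float:
    """Core-years consumed by a job using `cores` cores for `hours` wall-clock hours."""
    return cores * hours / HOURS_PER_YEAR


if __name__ == "__main__":
    allocation = 10.0  # hypothetical yearly allocation, in core-years
    print(core_years_to_core_hours(allocation))  # 87600.0
    used = job_core_years(cores=8, hours=24)     # hypothetical job
    print(f"{used:.4f} core-years, {used / allocation:.2%} of the allocation")
    # 0.0219 core-years, 0.22% of the allocation
```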

8 How to get an account
Apply for an account on the Compute Canada website. Then apply for an account with one of the consortia: log into the CCDB portal, follow the link "Apply for a Consortium Account", and choose to open an account at, for example, Calcul Québec. Finally, log into the Calcul Québec portal and request access to the desired HPC under the "My Profile" tab.

9 Concepts connected to CC accounts
When you log into an HPC, you are on a login node. Login nodes are the HPC entry point, from which users submit commands to the scheduler. The scheduler is a queuing system in which computation jobs wait for available compute nodes; compute nodes are the nodes on which jobs are executed. Resources on login nodes are limited, so jobs should always be launched through the scheduler. HPC sysadmins don’t like jobs launched on the login nodes!

10 Scheduler
At Compute Canada HPCs, you launch jobs by submitting commands to the scheduler. When you launch a job, you can specify a number of cores (CPUs) and a walltime, the maximum amount of time the job may run before it gets killed. It’s important to set these numbers properly: jobs requesting less walltime tend to get scheduled sooner, but they are killed if they run over.
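A minimal Python sketch of the arithmetic behind these two settings, assuming a walltime written as HH:MM:SS (schedulers also accept other formats, which are not handled here). It computes the core-hours a request reserves and encodes the rule that a job running past its walltime is killed; the function names are hypothetical.

```python
from datetime import timedelta


def parse_walltime(walltime: str) -> timedelta:
    """Parse a walltime written as HH:MM:SS (hours may exceed 24)."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def reserved_core_hours(cores: int, walltime: str) -> float:
    """Upper bound on the core-hours a job can use: cores x requested walltime."""
    return cores * parse_walltime(walltime).total_seconds() / 3600


def finishes_in_time(actual_runtime: timedelta, walltime: str) -> bool:
    """False if the job would exceed its walltime and be killed by the scheduler."""
    return actual_runtime <= parse_walltime(walltime)


if __name__ == "__main__":
    print(reserved_core_hours(4, "12:00:00"))                  # 48.0
    print(finishes_in_time(timedelta(hours=13), "12:00:00"))   # False
```

Requesting a walltime with a reasonable margin over the expected runtime balances the two risks the slide mentions: a longer wait in the queue versus a killed job.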

11 Concepts connected to CC accounts
The time you wait in the queue depends on many factors, including how busy the HPC is, job length, the number of cores (CPUs) needed and your remaining allocation. You control some of these, such as job length and the number of cores, when submitting jobs to the scheduler. In this workshop, we will set the scheduler aside: software will be executed directly on an interactive node.

12 Software through GenAP
Bioinformatics software pre-installed on Compute Canada

13 Modules
Software is made available in the form of loadable modules. To make the CVMFS modules available: module use. You can list all available software with module avail. To load a module: module load. Modules also need to be loaded inside the jobs you launch on the scheduler.
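When jobs do go through the scheduler, the module loads belong in the job script itself. The sketch below assumes a Slurm scheduler (the scheduler on Compute Canada national clusters such as Cedar and Graham; check your cluster's documentation) and only builds the text of a batch script. The module names and the account string in the usage example are hypothetical placeholders.

```python
def build_job_script(command: str, modules: list[str], cores: int,
                     walltime: str, account: str) -> str:
    """Return the text of a Slurm batch script that loads `modules`, then runs `command`."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --account={account}",
        f"#SBATCH --cpus-per-task={cores}",
        f"#SBATCH --time={walltime}",
    ]
    # Loading modules inside the script keeps the job self-contained instead of
    # relying on whatever environment the submitting shell happened to have.
    lines += [f"module load {name}" for name in modules]
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_job_script(
        command="echo 'replace with the real analysis command'",
        modules=["hypothetical/tool/1.0"],  # placeholder module name
        cores=4,
        walltime="12:00:00",
        account="def-someuser",             # placeholder account
    )
    print(script)
```

The returned text could be saved as, for example, job.sh and submitted with sbatch job.sh.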

14 Ready to see how it works?
Let’s look at the ChIP-seq lab!

15 Module 3 Introduction to WGBS and analysis
Guillaume Bourque, Epigenomic Data Analysis

16 Bisulfite treatment (Xi and Li, BMC Bioinformatics, 2009)
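Bisulfite treatment converts unmethylated cytosines to uracil, which is read as thymine after PCR, while methylated cytosines are protected and still read as C. The following Python sketch simulates this on the top strand only, given a hypothetical set of 0-based methylated positions; it ignores incomplete conversion and the reverse strand.

```python
def bisulfite_convert(sequence: str, methylated: set[int]) -> str:
    """Simulate bisulfite treatment plus PCR on the top strand.

    Unmethylated cytosines become T; cytosines whose 0-based position is in
    `methylated` are protected and remain C. Other bases are unchanged.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence.upper())
    )


if __name__ == "__main__":
    # Only the C at position 1 (in a CpG) is methylated.
    print(bisulfite_convert("ACGTCCGA", methylated={1}))  # ACGTTTGA
```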

17 Workflow for analyzing BS-data
Processing of bisulfite-sequencing data: quality control and pre-processing; bisulfite sequence alignment; quantification of absolute DNA methylation. Data visualization and statistical analysis: visual inspection of selected regions in a genome browser; visualization of the global distribution of methylation values; clustering of samples based on similarity. Downstream analysis: identification of Differentially Methylated Regions (DMRs); global analysis of DMRs.
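The quantification step usually reduces to one number per cytosine: the fraction of reads covering that position that report it as methylated. A minimal sketch, assuming per-site counts of methylated and unmethylated reads are already available (for example from a methylation caller); the coverage cutoff is a hypothetical parameter.

```python
from typing import Optional


def methylation_level(methylated: int, unmethylated: int,
                      min_coverage: int = 1) -> Optional[float]:
    """Fraction of reads reporting the cytosine as methylated (0.0 to 1.0).

    Returns None when coverage (methylated + unmethylated) is below
    `min_coverage`, since low-depth estimates are unreliable.
    """
    coverage = methylated + unmethylated
    if coverage < max(min_coverage, 1):
        return None
    return methylated / coverage


if __name__ == "__main__":
    print(methylation_level(9, 1))                   # 0.9
    print(methylation_level(3, 1, min_coverage=10))  # None
```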

18 Quality metrics
Read quality; presence of adapter sequences; duplicate rates; conversion rate.
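As an illustration of the duplicate-rate metric, this sketch treats reads sharing chromosome, 5' start and strand as PCR duplicates, a simplified single-end criterion. Dedicated tools (for example Bismark's deduplicate_bismark) apply their own rules; the tuple layout used here is an assumption.

```python
from collections import Counter


def duplicate_rate(alignments: list[tuple[str, int, str]]) -> float:
    """Fraction of aligned reads that duplicate another read's position.

    Each alignment is (chromosome, 5' start, strand). For every position seen
    n times, one read is kept and the other n - 1 count as duplicates.
    """
    if not alignments:
        return 0.0
    counts = Counter(alignments)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(alignments)


if __name__ == "__main__":
    reads = [("chr1", 100, "+"), ("chr1", 100, "+"),
             ("chr1", 100, "-"), ("chr2", 5, "+")]
    print(duplicate_rate(reads))  # 0.25
```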

19 ENCODE WGBS Standards
Experiments should have two or more biological replicates; they may have two technical replicates per biological replicate. The C-to-T conversion rate should be ≥98%. The CpG quantification should have a Pearson correlation of ≥0.8 between replicates for sites with ≥10X coverage. Sequencing may be paired- or single-ended, as long as the sequencing type is specified and paired sequences are indicated. The experiment must pass routine metadata audits in order to be released.
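The two numeric ENCODE thresholds can be checked with a few lines of Python. This sketch assumes conversion is measured as the fraction of cytosines read as T where they are expected to be unmethylated (non-CpG cytosines or an unmethylated spike-in), and that each replicate is a hypothetical dict mapping a CpG site to (methylated reads, total reads). Only sites with at least 10 reads in both replicates enter the Pearson correlation.

```python
import math


def conversion_rate(converted: int, unconverted: int) -> float:
    """converted / (converted + unconverted) at cytosines expected to be unmethylated."""
    return converted / (converted + unconverted)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(rep1: dict, rep2: dict, min_coverage: int = 10) -> float:
    """Pearson r of per-CpG methylation levels, using only sites with
    >= min_coverage reads in both replicates."""
    shared = [site for site in rep1.keys() & rep2.keys()
              if rep1[site][1] >= min_coverage and rep2[site][1] >= min_coverage]
    xs = [rep1[site][0] / rep1[site][1] for site in shared]
    ys = [rep2[site][0] / rep2[site][1] for site in shared]
    return pearson(xs, ys)


def passes_encode_thresholds(conv_rate: float, r: float) -> bool:
    """Conversion rate >= 98% and replicate Pearson r >= 0.8."""
    return conv_rate >= 0.98 and r >= 0.8
```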

20 Bisulfite sequence alignment
Bock, Nat Rev Genet, 2012
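The core difficulty of aligning bisulfite reads is that a converted read no longer matches the reference wherever an unmethylated C became a T. A common workaround, used by aligners such as Bismark, is to convert both reads and reference in silico before aligning. The sketch below shows only the C-to-T, top-strand case with exact matching; real aligners also handle G-to-A (reverse-strand) conversions, mismatches and indels.

```python
def c_to_t(seq: str) -> str:
    """In-silico C-to-T conversion, removing the methylation-dependent difference."""
    return seq.upper().replace("C", "T")


def three_letter_align(read: str, reference: str) -> list[int]:
    """0-based positions where the C-to-T converted read exactly matches the
    C-to-T converted reference (top strand only, no mismatches)."""
    read_c, ref_c = c_to_t(read), c_to_t(reference)
    k = len(read_c)
    return [i for i in range(len(ref_c) - k + 1) if ref_c[i:i + k] == read_c]


if __name__ == "__main__":
    reference = "GGACGTCCGATT"
    read = "ACGTTTGA"  # bisulfite read: two unmethylated Cs became T
    print(read in reference)                     # False: naive matching fails
    print(three_letter_align(read, reference))   # [2]
```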

21 Bismark

22 Visualizing BS-seq data in IGV

23 GenPipes – Methyl-seq pipeline

24 Ready to see how it works?
Let’s look at the WGBS lab!