
Material for today’s workshop is at: http://www.c3g.ca/computational-epigenetics-workshop/

2 days 1.5 hours


Launching jobs at Compute Canada
David Bujold, Epigenomic Data Analysis

What is HPC?
High Performance Computing (HPC): “traditional” in-house servers can quickly get overloaded. Compute Canada provides HPC resources to Canadian academic labs, and there are also many options in the private sector, such as AWS. HPC uses clusters of computers; each individual computer in the cluster is called a node.

What is Compute Canada?
A CFI-funded national platform that integrates HPC resources at partner consortia across the country to create a dynamic computational resource. Partner consortia: ACEnet, Calcul Québec, SciNet, HPCVL, SHARCNET and WestGrid.

Concepts connected to CC accounts
Compute Canada is a shared resource for Canadian academia. An account gives you access to free compute resources: you get a yearly allocation of compute time (in core-years) and storage space. Once logged in, you can launch compute jobs. A job is a software execution, and compute jobs use up the yearly allocation.
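To make the core-year unit concrete, here is a small arithmetic sketch. It assumes that a job's usage is simply cores × wall-clock hours and that one core-year is 365 × 24 = 8760 core-hours; the scheduler's real accounting may differ, and the helper names are hypothetical.

```python
# Hypothetical helpers: the slides only say allocations are given in core-years.
HOURS_PER_YEAR = 365 * 24  # 8760 core-hours per core-year (leap years ignored)


def core_hours(cores: int, hours: float) -> float:
    """Core-hours consumed by a job using `cores` cores for `hours` wall-clock hours."""
    return cores * hours


def fraction_of_allocation(cores: int, hours: float, allocation_core_years: float) -> float:
    """Fraction of a yearly allocation (in core-years) consumed by one job."""
    return core_hours(cores, hours) / (allocation_core_years * HOURS_PER_YEAR)


# A job on 8 cores for 12 hours uses 96 core-hours,
# which is about 1.1% of a 1 core-year allocation.
print(core_hours(8, 12))                             # 96
print(round(fraction_of_allocation(8, 12, 1.0), 4))  # 0.011
```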

How to get an account
1. Apply for an account at the Compute Canada website: https://www.computecanada.ca/research-portal/apply-for-an-account/
2. Apply for an account in one of the consortia: log into the CCDB portal and follow the link "Apply for a Consortium Account". Choose to open an account at, for example, Calcul Québec.
3. Log into the Calcul Québec portal and request access to the desired HPC under the "My Profile" tab: https://portail.calculquebec.ca/accounts/login/

Concepts connected to CC accounts
When you log into an HPC system, you are on a login node. Login nodes are the HPC entry point, from which users submit commands to the scheduler. The scheduler is a queuing system in which computation jobs wait for available compute nodes; compute nodes are the nodes on which the jobs are executed. Resources on login nodes are limited, so jobs should always be launched through the scheduler. HPC sysadmins don’t like jobs launched on the login nodes!

Scheduler
On Compute Canada HPC systems, you launch jobs by submitting commands to the scheduler. When you launch a job, you can specify a number of cores (CPUs) and a walltime, the maximum amount of time the job can run before it gets killed. It’s important to set those numbers properly: jobs that request less walltime tend to start sooner, but they get killed if they run over time.
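As a sketch of what specifying cores and walltime looks like in practice, the function below writes the header of a job script. It assumes a Slurm scheduler (the slides do not name one); `--account`, `--cpus-per-task` and `--time` are standard `sbatch` options, while the function name and the account `def-example` are hypothetical.

```python
import re


def slurm_header(cores: int, walltime: str, account: str) -> str:
    """Build the #SBATCH lines of a job script (assumes a Slurm scheduler)."""
    # Slurm accepts several walltime formats; this sketch only allows HH:MM:SS
    # (hours may exceed 24) to keep the validation simple.
    if not re.fullmatch(r"\d+:[0-5]\d:[0-5]\d", walltime):
        raise ValueError(f"walltime must be HH:MM:SS, got {walltime!r}")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --account={account}",        # allocation to charge (hypothetical name)
        f"#SBATCH --cpus-per-task={cores}",     # cores for one multithreaded task
        f"#SBATCH --time={walltime}",           # job is killed after this walltime
    ]
    return "\n".join(lines) + "\n"


print(slurm_header(4, "02:30:00", "def-example"))
```

Requesting a realistic walltime, rather than the maximum, is what lets the scheduler fit the job in sooner.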

Concepts connected to CC accounts
The time you will wait in the queue depends on many factors, including how busy the HPC is, the job length, the number of cores (CPUs) needed and your remaining allocation. You can control some of them, such as job length and the number of cores, when submitting jobs to the scheduler. In this workshop, we will set the scheduler aside: software will be executed directly on an interactive node.

Software through GenAP
Bioinformatics software is pre-installed on Compute Canada systems: https://www.genap.ca/
Modules: http://www.computationalgenomics.ca/cvmfs-modules/
Genomes: http://www.computationalgenomics.ca/cvmfs-genomes/

Modules
Software is made available in the form of loadable modules.
To add the CVMFS modules to the list of available modules: module use
To list all available software: module avail
To load a module: module load
You need to load modules when you launch jobs on the scheduler.
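The slides note that modules must be loaded when launching jobs on the scheduler. A minimal sketch, assuming the `module use` / `module load` lines are simply written into the job script ahead of the command; the module name, modulefile path and command below are placeholders, not real GenAP modules.

```python
from typing import List, Optional


def job_body(modules: List[str], command: str, module_path: Optional[str] = None) -> str:
    """Return job-script lines that load modules, then run the command."""
    lines = []
    if module_path is not None:
        # Site-specific directory of modulefiles (placeholder in the example below).
        lines.append(f"module use {module_path}")
    # Load every module the command needs inside the job itself.
    lines.extend(f"module load {name}" for name in modules)
    lines.append(command)
    return "\n".join(lines) + "\n"


print(job_body(["hypothetical/tool/1.0"], "tool --help", "/path/to/modulefiles"))
```

Writing the module lines into the script keeps the job from depending on whatever happened to be loaded in the login session.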

Ready to see how it works? Let’s look at the ChIP-seq lab!

Module 3: Introduction to WGBS and analysis
Guillaume Bourque, Epigenomic Data Analysis

Bisulfite treatment
Xi and Li, BMC Bioinformatics, 2009
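The chemistry on this slide can be mimicked in silico: sodium bisulfite deaminates unmethylated cytosines to uracil, which is read as thymine after PCR, while 5-methylcytosine is protected and still reads as C. The sketch below applies this rule to one strand; it assumes complete conversion and ignores 5-hydroxymethylcytosine (also protected), the complementary strand and PCR itself. The function name is hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions: set) -> str:
    """Simulate bisulfite conversion of one DNA strand.

    Unmethylated C becomes T (U read as T after PCR); a C whose index is in
    `methylated_positions` is treated as 5mC and stays C. Assumes full conversion.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


# Reference ACGTCCGA with only the C at index 1 (a CpG) methylated.
print(bisulfite_convert("ACGTCCGA", {1}))  # ACGTTTGA
```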

Workflow for analyzing BS-seq data
1. Processing of bisulfite-sequencing data: quality control and pre-processing; bisulfite sequence alignment; quantification of absolute DNA methylation.
2. Data visualization and statistical analysis: visual inspection of selected regions in a genome browser; visualization of the global distribution of methylation values; clustering of samples based on similarity.
3. Downstream analysis: identification of Differentially Methylated Regions (DMRs); global analysis of DMRs.
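For the quantification step, the methylation level of a CpG is commonly estimated as the fraction of reads showing a methylated C at that site, M / (M + U). The sketch below assumes a tab-separated, six-column per-CpG layout (chromosome, start, end, methylation percentage, methylated count, unmethylated count), similar to Bismark's coverage output; check the documentation of your own tool before relying on that layout. Function names are hypothetical.

```python
from typing import Tuple


def parse_cov_line(line: str) -> Tuple[str, int, int, int]:
    """Parse chrom, start, end, % methylation, methylated count, unmethylated count."""
    chrom, start, _end, _pct, meth, unmeth = line.rstrip("\n").split("\t")
    return chrom, int(start), int(meth), int(unmeth)


def methylation_level(meth: int, unmeth: int) -> float:
    """Fraction of reads supporting methylation at one CpG: M / (M + U)."""
    total = meth + unmeth
    if total == 0:
        raise ValueError("site has no coverage")
    return meth / total


chrom, pos, m, u = parse_cov_line("chr1\t10469\t10469\t75.0\t3\t1\n")
print(chrom, pos, methylation_level(m, u))  # chr1 10469 0.75
```

Keeping the raw counts, not just the percentage, matters later: coverage filters and statistical tests both need them.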

Quality metrics
Read quality, presence of adapter sequences, duplicate rates and conversion rate.
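Conversion rate is usually estimated from cytosines that are expected to be unmethylated, such as an unmethylated spike-in (e.g. lambda phage DNA) or, in many mammalian tissues, cytosines outside CpG context: every C still read as C there counts as a conversion failure. A minimal sketch, assuming you already have the C and T base counts observed at such positions; the function name is hypothetical.

```python
def conversion_rate(c_count: int, t_count: int) -> float:
    """Estimate bisulfite conversion rate at positions expected to be unmethylated.

    T = converted base, C = unconverted base; rate = T / (C + T).
    """
    total = c_count + t_count
    if total == 0:
        raise ValueError("no informative positions")
    return t_count / total


rate = conversion_rate(c_count=100, t_count=9900)
print(rate)  # 0.99
```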

ENCODE WGBS Standards
Experiments should have two or more biological replicates; they may have two technical replicates per biological replicate.
The C to T conversion rate should be ≥98%.
The CpG quantification should have a Pearson correlation of ≥0.8 for sites with ≥10X coverage.
Sequencing may be paired- or single-ended, as long as the sequencing type is specified and paired sequences are indicated.
The experiment must pass routine metadata audits in order to be released.
https://www.encodeproject.org/wgbs/
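The correlation criterion can be checked with a few lines of code. This sketch assumes the correlation is computed between two replicates on per-CpG methylation levels, keeping only sites covered at least 10 times in both; the input layout (site mapped to methylated and unmethylated counts) is hypothetical, and ENCODE's own pipeline may compute it differently.

```python
import math
from typing import Dict, Tuple

Site = Tuple[str, int]                # (chromosome, position), hypothetical key
Counts = Dict[Site, Tuple[int, int]]  # site -> (methylated, unmethylated)


def replicate_correlation(rep1: Counts, rep2: Counts, min_cov: int = 10) -> float:
    """Pearson correlation of methylation levels at sites with >= min_cov in both replicates."""
    xs, ys = [], []
    for site in rep1.keys() & rep2.keys():
        m1, u1 = rep1[site]
        m2, u2 = rep2[site]
        if m1 + u1 >= min_cov and m2 + u2 >= min_cov:
            xs.append(m1 / (m1 + u1))
            ys.append(m2 / (m2 + u2))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two sites passing the coverage filter")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("methylation levels are constant; correlation undefined")
    return cov / (sx * sy)


rep1 = {("chr1", 100): (9, 1), ("chr1", 200): (1, 9), ("chr1", 300): (5, 5), ("chr1", 400): (1, 1)}
rep2 = {("chr1", 100): (18, 2), ("chr1", 200): (2, 8), ("chr1", 300): (6, 4), ("chr1", 400): (2, 0)}
r = replicate_correlation(rep1, rep2)  # chr1:400 is dropped (coverage 2)
print(r >= 0.8)  # True
```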

Bisulphite sequence alignment
Bock, Nat Rev Genet, 2012

Bismark
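Bismark's central idea is to convert reads and genome in silico (C to T, and G to A for the opposite strand), align in this reduced alphabet, and only then call methylation by comparing the original read to the original genome. The toy version below handles only the C to T converted top strand and replaces the real aligner (Bismark relies on an external aligner such as Bowtie 2) with an exact substring search; it illustrates the principle and is not Bismark's implementation. Function names are hypothetical.

```python
from typing import List, Optional, Tuple


def c_to_t(seq: str) -> str:
    """In silico C->T conversion (top strand only in this sketch)."""
    return seq.replace("C", "T")


def align_and_call(read: str, genome: str) -> Optional[Tuple[int, List[Tuple[int, str]]]]:
    """Align a bisulfite read in converted space, then call methylation per reference C."""
    # Exact search stands in for a real aligner; both sequences are converted first,
    # so a read T can match a reference C.
    pos = c_to_t(genome).find(c_to_t(read))
    if pos == -1:
        return None
    calls = []
    for offset, ref_base in enumerate(genome[pos:pos + len(read)]):
        if ref_base == "C":
            # Compare ORIGINAL read and genome: C kept = methylated, C->T = unmethylated.
            state = "methylated" if read[offset] == "C" else "unmethylated"
            calls.append((pos + offset, state))
    return pos, calls


genome = "TTACGTCCGATT"
read = "ACGTTTGA"  # first C protected, the other two converted
print(align_and_call(read, genome))
# (2, [(3, 'methylated'), (6, 'unmethylated'), (7, 'unmethylated')])
```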

Visualizing BS-seq data in IGV https://www.broadinstitute.org/igv

GenPipes – Methyl-seq pipeline http://www.computationalgenomics.ca/genpipes/

Ready to see how it works? Let’s look at the WGBS lab!