Canadian Bioinformatics Workshops


Canadian Bioinformatics Workshops www.bioinformatics.ca

You are free to: Copy, share, adapt, or re-mix; Photograph, film, or broadcast; Blog, live-blog, or post video of; This presentation. Provided that: You attribute the work to its author and respect the rights and licenses associated with its components. Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero. Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at: http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites

Module 1 Intro to Cloud Computing and Virtual Machines BF Francis Ouellette Bioinformatics on Big Data: Computing on the Human Genome September 29 – September 30, 2016 http://durtridingurl.blogspot.ca/2011/04/cloud-kingdom.html

Disclaimer I do not (and will not) profit in any way, shape or form, from any of the brands, products or companies I may mention.

E-mail francis@oicr.on.ca @bffo #CBWBD16

http://bioinformatics.ca/

https://bioinformatics.ca/workshops/2016/bioinformatics-cancer-genomics-2016

Workshops planned for 2017 (http://bioinformatics.ca/workshops): Bioinformatics for Cancer Genomics; High-throughput Biology: From Sequence to Networks (CSHL); Introduction to R; Exploratory Analysis of Biological Data using R; Informatics for RNA-sequence Analysis; Informatics on High Throughput Sequencing Data; Pathway and Network Analysis of -omics Data; Informatics and Statistics for Metabolomics; Analysis of Metagenomic Data; Bioinformatics for Big Data; Epigenomic Data Analysis (other workshop? Stay tuned)

http://bioinformatics.ca/workshops/2015

New for CBW: all on GitHub! http://bioinformatics-ca.github.io/

E-mail: course_info@bioinformatics.ca Web: http://bioinformatics.ca Workshop announcement mailing list: http://bioinformatics.ca/mailman/listinfo/announce

Soap-Box! Open Source, Open Access, Open Data, Opencourseware. Open Access, Open Data and Open Source are essential for good Science. Openness is a responsibility, an obligation, and something that comes with the privilege of doing publicly funded work.

http://goo.gl/Jc5TK

http://ncbi.nlm.nih.gov/

from the National Centre for Biotechnology Information

from the National Centre for Biotechnology Information PANIC!

Learning Objectives of Module 1. Participants will be introduced to and understand: the scope of the “Bioinformatics on Big Data: Computing on the Human Genome” workshop; why we need to be computing in the cloud; what we should be concerned about when doing so; what it means to be working in the cloud; what it means to be using a virtual machine.

“Big Data” is a relative term! This is what a 5 MB hard drive looked like in 1956. This is what a 5 TB drive (1 million times more) looks like in 2016. http://goo.gl/f1PkV

https://goo.gl/3RhFkH

https://goo.gl/r5TfVA Wikipedia cheat sheet https://goo.gl/3RhFkH

What is driving this data growth? Technology! 2001 (Whitehead) 2016 (Illumina) http://goo.gl/8lEMA https://goo.gl/ATvTT

HiSeq X Sequencing Systems: 18,000 whole human genomes per year. 1,800 years to sequence everybody in Canada. 1.5 months to sequence all genomes from the PCAWG project. https://goo.gl/6Xg9bN
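The arithmetic behind these throughput figures can be sketched as follows. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and the population and genome counts are the values implied by the slide's own numbers (1,800 years and 1.5 months at 18,000 genomes per year), not independent statistics.

```python
GENOMES_PER_YEAR = 18_000  # throughput quoted on the slide


def years_to_sequence(n_genomes: float, genomes_per_year: float = GENOMES_PER_YEAR) -> float:
    """Years needed to sequence n_genomes at a fixed yearly throughput."""
    return n_genomes / genomes_per_year


def genomes_in_months(months: float, genomes_per_year: float = GENOMES_PER_YEAR) -> float:
    """Number of genomes sequenced in the given number of months."""
    return genomes_per_year * months / 12


# Population implied by "1,800 years" (an inference from the slide, not a census value).
implied_population = 1_800 * GENOMES_PER_YEAR
print(implied_population)                     # 32400000
print(years_to_sequence(implied_population))  # 1800.0

# Genome count implied by "1.5 months" (again an inference, not the official PCAWG count).
print(genomes_in_months(1.5))                 # 2250.0
```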

Cloud computing … and a new software paradigm. Data sets are at the petabyte and soon exabyte scale. Data (and the security rules that come with it) will be somewhere else (not in your own data centre), and you will move your software to it. The software development paradigm will change: no more reading files into RAM, processing, and then writing output. You need to think about processing streaming data coming from a sequencing machine somewhere on the net.
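The streaming idea can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes simple four-line FASTQ records (no wrapped sequence lines), and the function names are hypothetical. The point is that records are consumed one at a time from any line-oriented stream, so memory use stays constant regardless of input size.

```python
import io
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record, one record at a time."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def count_bases(lines: Iterable[str]) -> int:
    """Count bases without ever holding the whole input in memory."""
    total = 0
    for seq in fastq_sequences(lines):
        total += len(seq)
    return total


# Small in-memory stand-in for a stream of reads.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
print(count_bases(demo))  # 10
```

In practice the same function could be handed `sys.stdin` or any other file-like object that yields lines, which is what makes the approach independent of where the data lives.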

[Figure: Disk Capacity vs Sequencing Capacity, 1990-2009. Axes: Disk Storage (Mbytes/$) and DNA Sequencing (bp/$), log scale, by year. Series: hard disk storage (MB/$), doubling time 14 months; pre-nextgen sequencing (bp/$), doubling time 19 months; nextgen sequencing (bp/$), doubling time 4 months.]

About DNA and computers. We now have the ~$1000 genome, but now need to think more about the cost of the analysis. The time for sequencing cost to halve is in the “many months” range. The doubling time of storage and network bandwidth is in the “very small number of years” range. The doubling time of CPU speed is 18 months. The cost of sequencing a base pair will equal the cost of storing a base pair in the next “very small number” of years.
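How quickly a faster-doubling quantity overtakes a slower one can be worked out directly from the two doubling times. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the 1024-fold starting gap and the 4- and 14-month doubling times are example inputs, not measured values.

```python
import math


def growth_factor(months: float, doubling_time: float) -> float:
    """Fold increase after `months` for a quantity that doubles every `doubling_time` months."""
    return 2.0 ** (months / doubling_time)


def months_to_catch_up(ratio: float, fast_doubling: float, slow_doubling: float) -> float:
    """Months until the faster-doubling quantity closes a `ratio`-fold gap."""
    if fast_doubling >= slow_doubling:
        raise ValueError("fast_doubling must be shorter than slow_doubling")
    # Net doublings gained per month by the faster quantity.
    rate_gap = 1.0 / fast_doubling - 1.0 / slow_doubling
    return math.log2(ratio) / rate_gap


print(growth_factor(18, 18))                       # 2.0
print(round(months_to_catch_up(1024, 4, 14), 1))   # 56.0
```

With these example inputs, a 1024-fold head start disappears in under five years, which is the sense in which the gap closes within a “very small number” of years.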

What is the general biomedical scientist to do? Lots of data. Inadequate IT infrastructure in most labs. Where do they go? Write more grants? Get more hardware? Look to the sky?

Genomic companies already there! Typical sequencing company pipeline: ACGTACGTAAGTTCGGATGGCGTAGTCCCTTTTTGGGGTGTAGTGAGGCGCTGATTCGGAGAG All of the hard work done here!

Most people already there! Google Docs, Dropbox, Netflix, Twitter, Oxford Nanopore, Illumina

https://goo.gl/El3r96

Amazon Web Services (AWS). Infinite (scalable) storage: S3 (Simple Storage Service). Compute per hour: EC2 (Elastic Compute Cloud). Ready when you are. High Performance Computing: multiple football fields of HPC throughout the world. HPC is expanded one container at a time: http://goo.gl/7PVAl

Some of the challenges with cloud computing: Not cheap! (https://aws.amazon.com/ec2/pricing/). Getting files to (free) and from (not free) there. Not the best solution for everybody. Standardization. PHI: personal health information and security concerns. It is a US company, so one needs to deal with the “Patriot Act”.

Academic clouds: Compute Canada. Vision: To make Canada a world leader in the use of advanced computing for research, discovery and innovation. Mission: To enable excellence in research and innovation for the benefit of Canada by effectively, efficiently and sustainably deploying a state-of-the-art advanced research computing network supported by world-class expertise. To use this network to support a growing base of excellent researchers, and to serve them as a national voice for advanced research computing. https://www.computecanada.ca/

http://www.cancercollaboratory.org/

Compute Canada infrastructure Usually only available to people from Canada Usable by all in this workshop Cancer Genome Collaboratory is developing an alternative sustainability model: data there, but you pay for compute cycles.

How to interact with the cloud? Think of it as a High Performance Computing system that somebody else is taking care of. The AWS-touted concept of “elasticity” is also very useful: you use what you need, and then turn it off when you are done.
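The benefit of elasticity can be made concrete with a small cost comparison. This is a sketch only: the hourly rate is a made-up placeholder rather than a real AWS price, and the function names are hypothetical.

```python
def elastic_cost(hours_used: float, hourly_rate: float) -> float:
    """Cost when the machine runs only while work is being done."""
    return hours_used * hourly_rate


def always_on_cost(days: int, hourly_rate: float) -> float:
    """Cost when the machine is left running around the clock."""
    return days * 24 * hourly_rate


HOURLY_RATE = 0.5  # hypothetical price in dollars per hour, for illustration only

print(elastic_cost(40, HOURLY_RATE))    # 20.0
print(always_on_cost(30, HOURLY_RATE))  # 360.0
```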

[Figure: Traditional Computer vs Virtual Machine. Traditional computer: Application(s) on an Operating System on Hardware. Virtual machine: several App/OS pairs on a Virtual Machine Monitor on Hardware.]

Virtual Machine vs Docker

https://www.docker.com/ Docker containers wrap a piece of software in a complete filesystem that contains everything needed to run. This guarantees that the software will always run the same, regardless of its environment.

Human Data. Personal health information, and anything that can identify you, is private. That also includes genomic sequences that can identify you. Society has provided a way for scientists in the research community to use this data, but scientists have to agree to some important rules.

In this workshop: You will learn about the ethics and rules allowing one to use human data. You will learn about VMs. You will learn about Docker. You will learn about the Cancer Genome Collaboratory. You will learn about PCAWG.