Canadian Bioinformatics Workshops www.bioinformatics.ca.


Canadian Bioinformatics Workshops

In collaboration with Cold Spring Harbor Laboratory & New York Genome Center


Module 1 bioinformatics.ca

You are free to: copy, share, adapt, or re-mix; photograph, film, or broadcast; blog, live-blog, or post video of this presentation. Provided that: you attribute the work to its author and respect the rights and licenses associated with its components. Slide concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only: ccZero. Social media icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at;

#CBW15

Disclaimer: I do not (and will not) profit in any way, shape or form, from any of the brands, products or companies I may mention.

Module 1.1 Overview of Workshop

Outline: Bioinformatics; History of bioinformatics.ca; Cloud computing; Getting on Amazon Web Services

What biologists do: make observations; make hypotheses; test them; challenge them; conclude things; write papers

RNA-Seq; Protein MS

Interaction and Pathway Space

Central Dogma: DNA, RNA, protein

Central Dogma: DNA, RNA, protein. Then you write a paper about it.

Some of the things we do when we try to understand the cell: we do experiments; some of these are bioinformatics experiments; we all want these to be reproducible; we want people to find our data; we want people to find our methods; and we want them to be able to rerun our experiments, validate our work, and move the science forward.

Bioinformatics experiments: BLAST search; sequence alignment
Reagents: sequence databases
Method (query type - database type: program; P = protein, N = nucleotide):
P - P: BLASTP
N - P: BLASTX
P - N: TBLASTN
N - N: BLASTN
N (P) - N (P): TBLASTX
Interpretation: similarity; hypothesis testing
Know your reagents. Know your methods. Do your controls.
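
The method table on this slide is essentially a lookup from the query and database sequence types to a BLAST program. A minimal sketch of that lookup follows. The function name `choose_blast_program` and the type labels `"P"`, `"N"`, and `"N(P)"` (a nucleotide sequence translated to protein before comparison) are illustrative assumptions, not part of the workshop material.

```python
# (query type, database type) -> BLAST program, following the slide's table.
# "P" = protein, "N" = nucleotide, "N(P)" = nucleotide translated to protein.
# These labels are hypothetical shorthand chosen for this sketch.
BLAST_PROGRAMS = {
    ("P", "P"): "blastp",
    ("N", "P"): "blastx",
    ("P", "N"): "tblastn",
    ("N", "N"): "blastn",
    ("N(P)", "N(P)"): "tblastx",
}


def choose_blast_program(query_type, database_type):
    """Return the BLAST program name for a query/database type pair."""
    key = (query_type.upper(), database_type.upper())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no BLAST program listed for {key}")
    return BLAST_PROGRAMS[key]


if __name__ == "__main__":
    # A nucleotide query against a protein database.
    print(choose_blast_program("N", "P"))  # prints blastx
```

Keeping the table as data rather than as a chain of if-statements makes the "know your methods" point visible: every supported combination is listed in one place, and anything else fails loudly.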

Think – Pair – Share! What is Bioinformatics?

Bioinformatics is about integrating biological themes together with the help of computer tools and biological databases, and gaining new knowledge about the system under study.

1998

1999–2007: Bioinformatics; Developing the Tools; Genomics; Proteomics

2008–present

Analysis of Metagenomic Data - 3
Bioinformatics of Cancer Genomics - 5
Exploratory Analysis of Biological Data using R - 2
High-throughput Biology: From Sequence to Networks - 7
Informatics and Statistics for Metabolomics - 2
Informatics for RNA-seq Analysis - 2
Informatics on High-Throughput Sequencing Data - 2
Introduction to R - 1
Microarray Expression Analysis - 2
Pathway and Network Analysis of -omic Data - 3


Web: Workshop announcement mailing list:

Soap-Box time! Open Access, Open Data and Open Source are essential for Science. Openness is a responsibility, an obligation, and something that comes with the privilege of doing publicly funded work. Open Access; Open Source; Open Data; OpenCourseWare.

If databases get it wrong, the onus is on the user to let the databases know that it is wrong!


Q: Why do we have Bioinformatics? A: Open Data from Genomic and Proteomics Technologies

Module 1.2 Overview of Cloud Computing

Cloud computing … and a new software paradigm: Data sets are reaching the petabyte scale. Data (and the security rules that come with it) will be somewhere, and you will move your software to it. The software development paradigm will change: no more reading files into RAM, processing, and then writing output. You need to think about processing streaming data coming from a sequencing machine somewhere on the net.
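
To make the streaming idea concrete, here is a minimal sketch that computes a running GC fraction one record at a time instead of loading a whole file into RAM. It assumes the data arrives as four-line FASTQ records; the slide does not specify a format, and the names `fastq_records` and `running_gc` are hypothetical. An in-memory `io.StringIO` stands in for the network stream.

```python
import io


def fastq_records(lines):
    """Yield (name, sequence, quality) one record at a time.

    Assumes well-formed four-line FASTQ records; only the current
    record is ever held in memory.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between records
        seq = next(it, "")
        next(it, "")  # the '+' separator line
        qual = next(it, "")
        yield header.strip()[1:], seq.strip(), qual.strip()


def running_gc(lines):
    """Return the GC fraction over all bases seen in the stream."""
    total = gc = 0
    for _name, seq, _qual in fastq_records(lines):
        total += len(seq)
        gc += sum(seq.upper().count(base) for base in "GC")
    return gc / total if total else 0.0


if __name__ == "__main__":
    # Any iterable of lines works here: an open file, a pipe, a socket wrapper.
    stream = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
    print(running_gc(stream))  # prints 0.75
```

Because `running_gc` only iterates, the same function works unchanged on `sys.stdin` or on a file object, which is the shift in thinking the slide describes.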

[Figure: Disk Capacity vs Sequencing Capacity. Axes: Disk Storage (Mbytes/$) and DNA Sequencing (bp/$). Series: Hard disk storage (MB/$), doubling time = 14 mo; Pre-nextgen sequencing (bp/$), doubling time = 19 mo; Nextgen sequencing (bp/$), doubling time = 4 mo.]

About DNA and computers: We now have the ~$1000 genome, but we need to think more about the cost of the analysis. The time for sequencing cost to halve is in the “many months” range. The doubling time of storage and network bandwidth is in the “very small number of years” range. The doubling time of CPU speed is 18 months. The cost of sequencing a base pair will equal the cost of storing a base pair in the next “very small number” of years.
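
The gap between these doubling times is easier to appreciate with a little arithmetic: a quantity with doubling time d grows by a factor of 2^(t/d) after t months. The sketch below applies that formula to the doubling times quoted on the slides. The 60-month horizon and the function name `growth_factor` are assumptions made only for illustration.

```python
def growth_factor(elapsed_months, doubling_months):
    """Return how many times larger a quantity is after elapsed_months."""
    if doubling_months <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (elapsed_months / doubling_months)


# Doubling times in months, as read from the slides.
DOUBLING_MONTHS = {
    "hard disk storage (MB/$)": 14,
    "pre-nextgen sequencing (bp/$)": 19,
    "nextgen sequencing (bp/$)": 4,
    "CPU speed": 18,
}

# Assumption: a 5-year horizon, picked only to illustrate the comparison.
HORIZON_MONTHS = 60

if __name__ == "__main__":
    for name, months in DOUBLING_MONTHS.items():
        factor = growth_factor(HORIZON_MONTHS, months)
        print(f"{name}: about {factor:,.0f}x in {HORIZON_MONTHS} months")
```

Under these figures, sequencing output per dollar outgrows storage per dollar by several orders of magnitude over the same period, which is why the analysis and storage costs start to dominate.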

What is the general biomedical scientist to do? Too much data and not enough computer infrastructure in most labs. Where do they go? Write more grants? Get more hardware? Look to the sky?

Genomic companies already there! Typical sequencing company pipeline: ACGTACGTAA GTTCGGATGG CGTAGTCCCT TTTTGGGGTG TAGTGAGGC GCTGATTCGG AGAG. All of the hard work done here!

Most people already there! Google Docs; Dropbox; Netflix; Twitter

Amazon Web Services (AWS): Scalable, effectively unlimited storage: S3 (Simple Storage Service). Compute per hour: EC2 (Elastic Compute Cloud). Ready when you are. High Performance Computing: multiple football fields of HPC throughout the world. HPC is expanded one container at a time.

Some of the challenges with cloud computing: not cheap; getting files to and from there; not the best solution for everybody; standardization; PHI (personal health information) and security concerns; in the USA, the Patriot Act.

Some of the advantages with cloud computing: At the CBW, we received a grant from Amazon, so we are supported by an ‘AWS in Education’ grant award. There are better ways of transferring large files, and AWS now makes it free to upload files. A number of datasets exist on AWS (e.g. genome data). Many useful bioinformatics AMIs (Amazon Machine Images) exist on AWS, e.g. cloudbiolinux and CloudMan (Galaxy). Many flavors of cloud are available, not just AWS.

In this workshop: Some tools (data) are on your computer, on the web, or on the cloud. You will become efficient at traversing these various spaces, finding the resources you need, and using what is best for you. There are different ways of using the cloud: 1. Command line (like your own very powerful Unix box). 2. With a web browser (e.g. Galaxy): not in this workshop.

“Big Data” is a relative term! This is what a 5MB hard drive looked like in 1956! What will it be in 2056?

MinION from Oxford Nanopore

Things we have set up: Loaded data files onto AWS. We brought up an Ubuntu (Linux) instance and loaded a whole bunch of software for NGS analysis. We then cloned this and made separate instances for everybody in the class. We’ve simplified the security: you basically all have the same login and file access, and opened ports. In your own world you would be more secure.

For this workshop: all on Wiki! Login: FirstnameLastname. Password: guest.


CBWNY.pem. On Mac: Control+

CBWNY.pem

ls -l (long listing)
drwx francis staff May 21:25 ../
1 francis staff May 21:31 CBWNY.pem
rwx: owner; rwx: group; rwx: world
r read (4); w write (2); x execute (1)
Whichever way you add these 3 numbers, you know which integers were used (6 is always 4+2, 5 is 4+1, 4 is by itself, 0 is none of them, etc.). So, when you have chmod 600, it is “rw” for the file owner only.
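
The permission arithmetic above can be checked mechanically. The following is a minimal sketch that expands a three-digit mode such as 600 into its owner/group/world "rwx" string. The helper names `digit_to_rwx` and `mode_to_rwx` are hypothetical; `stat.filemode` from the Python standard library is used only as a cross-check.

```python
import stat


def digit_to_rwx(digit):
    """Expand one octal digit: r = 4, w = 2, x = 1."""
    return (
        ("r" if digit & 4 else "-")
        + ("w" if digit & 2 else "-")
        + ("x" if digit & 1 else "-")
    )


def mode_to_rwx(mode):
    """Expand a three-digit mode string (owner, group, world)."""
    if len(mode) != 3 or any(ch not in "01234567" for ch in mode):
        raise ValueError(f"expected three octal digits, got {mode!r}")
    return "".join(digit_to_rwx(int(ch)) for ch in mode)


if __name__ == "__main__":
    print(mode_to_rwx("600"))  # prints rw-------
    # Cross-check against the standard library for a regular file;
    # filemode adds a leading file-type character ('-' for a regular file).
    print(stat.filemode(stat.S_IFREG | 0o600))  # prints -rw-------
```

The output for 600 shows read and write for the owner and nothing for group or world, which is the state ssh generally expects for a private key file such as CBWNY.pem.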

Logging in to AWS

Windows


So, at this point: Your laptop is ready for the workshop. If it is not, you know where to get the information you need. You know how to use the wiki for this workshop. You know where all of the lectures are. You have read all of the pre-lecture material. If not, you know where the papers are, and you are a speed reader. You know how to log in to AWS.

We are on a Coffee Break & Networking Session