Canadian Bioinformatics Workshops bioinformatics.ca.


1 Canadian Bioinformatics Workshops bioinformatics.ca

2 Module #: Title of Module

3 Module 1 bioinformatics.ca You are free to: copy, share, adapt, or re-mix; photograph, film, or broadcast; blog, live-blog, or post video of this presentation. Provided that: you attribute the work to its author and respect the rights and licenses associated with its components. Slide concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only: ccZero. Social media icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at: http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites

4 Module 1 bioinformatics.ca E-mail: course_info@bioinformatics.ca Web: http://bioinformatics.ca Workshop announcement mailing list: http://bioinformatics.ca/mailman/listinfo/announce

5 Module 1 Cloud Computing with AWS Zhibin Lu & BF Francis Ouellette High-throughput Sequencing June 10-11, 2015 http://durtridingurl.blogspot.ca/2011/04/cloud-kingdom.html

6 Module 1 bioinformatics.ca Learning Objectives: Introduction to cloud computing; use of the wiki in this workshop; how to log into the cloud; the Amazon AWS management console.

7 Module 1 bioinformatics.ca Cloud computing … and a new software paradigm. Data sets are reaching the petabyte scale. Data (and the security rules that come with it) will live somewhere, and you will move your software to it. The software development paradigm will change: no more reading files into RAM, processing them, and then writing output. You need to think about processing streaming data coming from a sequencing machine somewhere on the net.
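
The shift from "read the whole file into RAM" to "process a stream" can be sketched in Python. This is only an illustration, not part of the workshop material: the function names, the demo reads, and the assumption of simple four-line FASTQ records are all hypothetical. The point is that each record is handled as it arrives and only running totals are kept in memory.

```python
import io

def stream_fastq(handle):
    """Yield (name, sequence, quality) one record at a time from a FASTQ stream.

    Assumes simple four-line records; real data may need a proper parser.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n").lstrip("@"), sequence, quality

def running_gc_fraction(records):
    """Compute the GC fraction while holding only two counters in memory."""
    gc = total = 0
    for _name, seq, _qual in records:
        gc += sum(base in "GCgc" for base in seq)
        total += len(seq)
    return gc / total if total else 0.0

# Made-up demo data standing in for a stream from a sequencer.
demo = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nGGCC\n+\nIIII\n")
print(running_gc_fraction(stream_fastq(demo)))
```

In practice the same generator could be fed `sys.stdin` or a network socket wrapped as a text stream instead of the in-memory `io.StringIO` used here.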

8 [Figure: Disk Capacity vs Sequencing Capacity, 1990-2009. Axes: Disk Storage (Mbytes/$) and DNA Sequencing (bp/$), both on log scales, plotted by year. Series: hard disk storage (MB/$), doubling time = 14 mo; pre-nextgen sequencing (bp/$), doubling time = 19 mo; nextgen sequencing (bp/$), doubling time = 4 mo.]

9 Module 1 bioinformatics.ca About DNA and computers: We now have the ~$1000 genome, but we need to think more about the cost of the analysis. The doubling time for the reduction in sequencing cost is in the “many months” range. The doubling time of storage and network bandwidth is in the “very small number of years” range. The doubling time of CPU speed is 18 months. The cost of sequencing a base pair will equal the cost of storing a base pair within the next “very small number” of years.
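
To get a feel for what different doubling times mean, here is a small worked sketch in Python. It uses the doubling times quoted on the disk-versus-sequencing slide (14 months for disk storage per dollar, 4 months for nextgen sequencing per dollar); the five-year horizon and the function name are illustrative assumptions.

```python
def growth_factor(months_elapsed, doubling_time_months):
    """Return how many times a quantity multiplies over the elapsed time."""
    return 2 ** (months_elapsed / doubling_time_months)

# Doubling times (in months) as quoted on the disk-vs-sequencing slide.
DISK_DOUBLING = 14
NEXTGEN_DOUBLING = 4

years = 5  # illustrative horizon, an assumption
months = 12 * years
disk = growth_factor(months, DISK_DOUBLING)
seq = growth_factor(months, NEXTGEN_DOUBLING)
print(f"Over {years} years: storage per dollar x{disk:.1f}, "
      f"sequencing per dollar x{seq:.0f}")
```

With these inputs, sequencing per dollar doubles 15 times in five years (2^15 = 32,768-fold), while storage per dollar doubles only a little over four times. That widening gap is why the cost of storage and analysis, rather than sequencing, becomes the concern.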

10 Module 1 bioinformatics.ca What is the general biomedical scientist to do? Lots of data. Poor IT infrastructure in many labs. Where do they go? Write more grants? Get bigger hardware? Look to the sky?

11 Module 1 bioinformatics.ca Genomic companies already there! Typical sequencing company pipeline: ACGTACGT AAGTTCGG ATGGCGTA GTCCCTTT TTGGGGTG TAGTGAGG CGCTGATT CGGAGAG All of the hard work done here!

12 Module 1 bioinformatics.ca Most people already there! Google docs Dropbox Netflix Twitter

13 Module 1 bioinformatics.ca Amazon Web Services (AWS): Infinite (scalable) storage: S3 (Simple Storage Service). Compute per hour: EC2 (Elastic Compute Cloud). Ready when you are. High Performance Computing: multiple football fields of HPC throughout the world. HPC is expanded one container at a time: http://goo.gl/7PVAl

14 Module 1 bioinformatics.ca Some of the challenges with cloud computing: Not cheap! Getting files to and from there. Not the best solution for everybody. Standardization. PHI: personal health information and security concerns. In the USA: the Patriot Act.

15 Module 1 bioinformatics.ca Some of the advantages with cloud computing: At the CBW, we received a grant from Amazon, so we are supported by an ‘AWS in Education’ grant award. There are better ways of transferring large files, and AWS now makes it free to upload files. A number of datasets exist on AWS (e.g. 1000 Genomes data). Many useful bioinformatics AMIs (Amazon Machine Images) exist on AWS, e.g. cloudbiolinux & CloudMan (Galaxy); CBW AMI. Many flavors of cloud are available, not just AWS.

16 Module 1 bioinformatics.ca In this workshop: Some tools (and data) are on your computer, some are on the web, and some are on the cloud. You will become efficient at traversing these various spaces, finding the resources you need, and using what is best for you. There are different ways of using the cloud: 1. Command line (like your own very powerful Unix box). 2. With a web browser (e.g. Galaxy).

17 Module 1 bioinformatics.ca This is what a 5MB hard drive looked like in 1956! What will it be in 2056? “Big Data” is a relative term! http://goo.gl/f1PkV

18 Module 1 bioinformatics.ca Things we have set up: Loaded data files to an AWS. We brought up an Ubuntu (Linux) instance and loaded a whole bunch of software for NGS analysis. We then cloned this and made separate instances for everybody in the class. We’ve simplified the security: you basically all have the same login, file access, and opened ports. In your own world you would be more secure.

19 Module 1 bioinformatics.ca SSH (Secure Shell): An encrypted network protocol. Used to connect to a remote machine/server. Server fingerprint. Public key authentication: public key, private key.

20 Module 1 bioinformatics.ca For this workshop: all on Wiki! http://bioinformatics.ca/workshop_wiki/ Login: FirstnameLastname Password: guest

21 Module 1 bioinformatics.ca

22 Module 1 bioinformatics.ca Logging into cloud

23 Module 1 bioinformatics.ca Mac Windows

24 Module 1 bioinformatics.ca

25 Module 1 bioinformatics.ca Mac/Linux

26 Module 1 bioinformatics.ca Windows

27 Module 1 bioinformatics.ca Mac/Linux On Mac: Control+

28 Module 1 bioinformatics.ca Windows

29 Module 1 bioinformatics.ca Mac/Linux

30 Module 1 bioinformatics.ca ls -l (long listing) drwx------+ 67 francis staff 2278 22 May 21:25 ../ -rw-r--r--@ 1 francis staff 1696 22 May 21:31 CBWNY.pem Permission triplets: rwx (owner), rwx (group), rwx (world), where r = read (4), w = write (2), x = execute (1). Whichever way you add these 3 numbers, you know which integers were used (6 is always 4+2, 5 is 4+1, 4 is by itself, 0 is none of them, etc.). So, when you have chmod 600, it is “rw” for the file owner only.
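
The digit arithmetic above can be checked with a short Python sketch. The helper name is hypothetical and the code only decodes the three octal digits; it does not touch any file. It shows why 600 leaves the key readable and writable by the owner alone, which matters because OpenSSH will typically refuse a private key that other users can read.

```python
def describe_mode(octal_digits):
    """Turn a three-digit mode such as '600' into an rwxrwxrwx-style string.

    Each digit is the sum of read (4), write (2) and execute (1),
    given in the order owner, group, world.
    """
    flags = ""
    for digit in octal_digits:
        value = int(digit, 8)  # rejects anything that is not 0-7
        flags += "r" if value & 4 else "-"
        flags += "w" if value & 2 else "-"
        flags += "x" if value & 1 else "-"
    return flags

for mode in ("600", "644", "755"):
    print(mode, describe_mode(mode))
```

Here 644 corresponds to the `rw-r--r--` shown for CBWNY.pem in the listing (after the leading file-type character), and 600 is what remains once group and world access are removed.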

31 Module 1 bioinformatics.ca Mac/Linux Windows ssh -i CBWNY.pem

32 Module 1 bioinformatics.ca Mac/Linux Windows ssh -i CBWNY.pem ubuntu

33 Module 1 bioinformatics.ca Mac/Linux Windows ssh -i CBWNY.pem ubuntu@cbw#.dyndns.info

34 Module 1 bioinformatics.ca Mac/Linux Windows From now on, just double-click CBW to login.

35 Module 1 bioinformatics.ca http://cbw#.dyndns.info/

36 Module 1 bioinformatics.ca Amazon AWS Management Console https://aws.amazon.com

37 Module 1 bioinformatics.ca So, at this point: Your laptop is ready for the workshop. You know how to load files into IGV and view them. You know where to get the information you need. You know how to use the wiki for this workshop. You know where all of the lectures are. You have read all of the pre-lecture material (if not, you know where the papers are, and you are a speed reader). You know how to log in to AWS.

38 Module 1 bioinformatics.ca We are on a Coffee Break & Networking Session Wish you were here? Register for the Canadian Bioinformatics Workshops: http://bioinformatics.ca

