Powering up your graduate experience A survey of computational tools and approaches for biologists https://onish.web.unc.edu/firstyeargrads/ Erin Osborne.

1 Powering up your graduate experience: A survey of computational tools and approaches for biologists
https://onish.web.unc.edu/firstyeargrads/
Erin Osborne Nishimura

2 Summer Workshop Series

3 Powering up your graduate experience
What types of computational tools and approaches are available?
How can I use some of the resources available at UNC?
– Hands-on introduction to Killdevil
Benefits of honing your computational prowess
How can I learn the next steps?

4 WHAT TYPES OF COMPUTATIONAL TOOLS AND APPROACHES ARE AVAILABLE?

5 What are we talking about when we talk about computational biology?

6 Computational thinking is generalizable outside of the field of genomics
Deconstructing
– Breaking big projects into little steps
Organizing
– Stereotypic record keeping
Abstracting
– Reproducible systems that can be repurposed for multiple tasks
Computing
– Creating specialized strategies tailor-made for each project
– Automating tasks

7 What hardware do we use to get this done?
Brain
Physical notes & notebooks
Computers
– Local computers
– Virtual computers
– Clusters
– Cloud
– Software

8 Computer Resources: Get a physical computer
Computers – Local computers – Virtual computers – Clusters – Cloud – Software
The UNC Student Store sells computers: http://store.unc.edu/
Buy a computer at a discount. Get lifetime support from UNC.

9 Computer Resources: Use a virtual computer
Computers – Local computers – Virtual computers – Clusters – Cloud – Software
Research Computing hosts the Virtual Computing Lab: https://vcl.unc.edu/index.php?mode=selectauth
You can virtually use Microsoft and Linux computers and install tailored software for individual or group use.

10 Computer Resources: Use a high-throughput Linux cluster
Computers – Local computers – Virtual computers – Clusters – Cloud – Software
UNC ITS (Information Technology Services) manages two main computational clusters: Killdevil (new) and Kure (old)
http://its.unc.edu/service/compute-servers-clusters/

11 Computer Resources: Get on the cloud
Computers – Local computers – Virtual computers – Clusters – Cloud – Software
Google: https://cloud.google.com/genomics/what-is-google-genomics
Amazon: http://aws.amazon.com/health/life-sciences/

12 Software is available to buy, borrow or obtain for free
Bioinformatics has site licenses available for checkout. Share spendy software.
– http://bioinformatics.unc.edu/software/
ITS has a lot of software for free or for purchase at a discounted price (EndNote, discounted; MATLAB, free; SecureShell, free).
– http://software.sites.unc.edu/software/
ITS Virtual Lab: virtually use spendy software for free (latest Adobe, Mathematica, SPSS, etc.)
– https://virtuallab.unc.edu/Citrix/ITSLabsSFWeb/
Kure and Killdevil come with loadable modules

13 A HANDS-ON INTRO TO KILLDEVIL
How can I use some of these resources?

14 Computer Resources
Computers – Local computers – Clusters – Cloud – Software
UNC ITS (Information Technology Services) manages two main computational clusters: Killdevil (new) and Kure (old)
http://its.unc.edu/service/compute-servers-clusters/

15 What is Killdevil?

16 No seriously, what is Killdevil?
A high-performance computer cluster
Linux operating system
1 login node
774 compute nodes
– 48–96 GB memory per node
– 12–16 CPU cores per node
2 large-memory nodes (1 TB)
12 graphics processor (GPU) nodes
File systems for storage

17 No seriously, what is Killdevil?

18 Getting onto Killdevil
Mac OS & Linux machines:
– Connect to Killdevil through “Terminal”
– Open “Terminal” (in Applications -> Utilities)
– Type this: ssh <username>@killdevil.unc.edu
– Add password when prompted
PC:
– Open SSH Secure Shell Client
– Click on “Quick Connect”
– Hostname = killdevil.unc.edu
– Username = <username>
– Port Number = 22
– Add password when prompted
$ ssh erinosb@killdevil.unc.edu

19 Getting onto Killdevil -- demo

20 Keeping a computational notebook

21 Navigating UNIX: getting oriented
Commands
Manuals
Your first two commands:
– whoami
– date
Getting help with manuals:
– man <command>
– “spacebar” to scroll
– Type “q” to exit
$ whoami
erinosb
$ date
Thu Apr 9 13:24:09 EDT 2015
$ man whoami
q
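One possible use for these first two commands is stamping an entry into the computational notebook from slide 20. The sketch below is only an illustration, not part of the workshop: the file name notebook.txt and the entry text are made up, and everything is written to a temporary directory so it can be tried anywhere.

```shell
# Work in a scratch directory so nothing in your real home is touched.
workdir=$(mktemp -d)
notebook="$workdir/notebook.txt"   # hypothetical notebook file name

# whoami prints your username; date prints the current date and time.
user=$(whoami)
today=$(date +%Y-%m-%d)            # +FORMAT asks date for year-month-day

# Append (>>) a dated entry so earlier entries are never overwritten.
echo "$today $user: logged in and tried whoami, date and man" >> "$notebook"

cat "$notebook"
```

Appending with >> rather than > is the design choice that matters here: a notebook should only ever grow.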

22 Navigating UNIX – paths and directories
pwd – Print Working Directory
cd – Change Directory
– cd <directory>
ls – List Contents
$ pwd
$ cd /nas02/home/
$ ls
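The three commands can be tried safely away from the cluster. This is a minimal sketch under assumed names: the directory data and the file notes.txt are invented for practice and live in a temporary directory.

```shell
# Build a small practice tree in a scratch directory (names are made up).
practice=$(mktemp -d)
mkdir "$practice/data"
touch "$practice/data/notes.txt"

cd "$practice"        # change into the practice directory
pwd                   # print where we are now

ls                    # list contents: shows the data directory
cd data               # move one level down
listing=$(ls)         # capture the listing so it can be inspected
echo "$listing"       # prints notes.txt
```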

23 The file structure
Directories and sub-directories are “folders”
Some important directories on Killdevil (http://help.unc.edu/help/getting-started-on-killdevil/#P63_6342):
– ms/
– netscr/
– ~
Making a new directory: mkdir <directory>
Removing a directory: rm -ri <directory>
$ mkdir 1_courses
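A sketch of making and then interactively removing a directory follows. The sub-directory scratch and the file tmp.txt are hypothetical. At a terminal you would answer each rm prompt yourself; here the answers are piped in only so the example can run unattended.

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir 1_courses                   # make a new, empty directory
mkdir 1_courses/scratch           # hypothetical sub-directory to delete again
touch 1_courses/scratch/tmp.txt

# -r removes a directory and its contents; -i asks before each step.
# printf feeds three "y" answers: enter the directory, remove the file,
# remove the directory itself.
printf 'y\ny\ny\n' | rm -ri 1_courses/scratch

ls 1_courses                      # scratch is gone; 1_courses remains
```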

24 A few key tips and tricks
Naming conventions
Auto complete with TAB
What if I get stuck?
– CTRL+C
Get me out of here:
– Q
– CTRL+C
– CTRL+D
– quit
– logout
– logoff
– logout()
– bye
– quit()
– q()
– exit
What if I need help?
– man <command>
– <command> -h
– <command> --help
– GOOGLE it!
– Use language name in search

25 Exercise 1: move up and down paths
A) Type the following command:
$ cd
Where are you right now? Write down this exact location in your notebook.
B) Enter the following command:
$ cd /
Now where are you?
C) List the contents of this directory. Do you see the directory nas02? Change into that directory.
D) Use cd and ls to navigate down the file structure back to your original location in step A.
E) Type this command:
$ cd ..
Now where are you? What did cd .. do?
F) Use cd .. to go back up to /
G) Now type this command. Where are you now?
$ cd
H) Now type:
$ cd -
Where are you now?
I) Navigate back down to your home directory through each directory in your path. This time, try pressing TAB after typing the first three letters of each directory name to initiate autocomplete. What happens?
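If the cd shortcuts in this exercise are hard to picture, the sketch below rehearses them in a throwaway tree. The directories a and b are invented for the purpose; on Killdevil the names you see will differ.

```shell
base=$(mktemp -d)
mkdir -p "$base/a/b"      # made-up directories: a, and b inside a

cd "$base/a/b"
cd ..                     # note the space: ".." means the parent directory
after_up=$(basename "$PWD")
echo "$after_up"          # prints a

cd "$base"                # jump somewhere else...
cd - > /dev/null          # ...and "cd -" returns to the previous directory
after_dash=$(basename "$PWD")
echo "$after_dash"        # prints a

cd                        # cd with no argument goes to your home directory
```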

26 Making and Removing files
Making a file: touch <filename>
Removing a file: rm -i <filename>
-i is an option
$ command [-OPTIONS]
$ touch testfile1.txt
$ rm -i testfile1.txt
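To see what the -i option actually changes, the following sketch answers the rm prompt first with n and then with y. The answers are piped in so the example runs without a keyboard, which is an assumption of the sketch; in the workshop you would type them.

```shell
workdir=$(mktemp -d)
cd "$workdir"

touch testfile1.txt            # create an empty file
ls -l testfile1.txt            # -l is an option: long listing with size and date

# -i makes rm ask for confirmation. Answering n keeps the file.
echo n | rm -i testfile1.txt
kept="no"
if [ -f testfile1.txt ]; then kept="yes"; fi
echo "still there after answering n: $kept"

# Answering y removes it.
echo y | rm -i testfile1.txt
```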

27 Exercise 2: Creating a directory tree
1) Make a directory structure in your home directory that you can use. If you already have a home directory structure you like, you can skip this and just create the course directory (150413_FirstYearGradCourse) and the subdirectories.
$ cd    # This will move you into your home directory.
Try this command:
$ tree
What do you see?
2) Use mkdir to create directories and subdirectories within your home directory so that tree will generate a “map” of your files that looks like this:
.
|-- 1_courses
|   |-- 150413_FirstYearGradCourse
|       |-- exercise03
|       |-- exercise05
|-- 2_projects
3) Put a file in the ~/1_courses/150413_FirstYearGradCourse/exercise03/ directory labeled 0_exercise03_README.txt
4) Put a file in the ~/1_courses/150413_FirstYearGradCourse/exercise05/ directory called 0_exercise05_README.txt
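One way the tree could be built is sketched here. It leans on mkdir -p, an option the slides do not mention, and it works inside a temporary directory standing in for your home directory. It also uses find rather than tree to display the result, since tree is not installed everywhere.

```shell
# Practice in a scratch directory standing in for your home directory.
fakehome=$(mktemp -d)
cd "$fakehome"

course=1_courses/150413_FirstYearGradCourse

# -p creates missing parent directories, so one command builds a whole branch.
mkdir -p "$course/exercise03" "$course/exercise05" 2_projects

# Add the two README files.
touch "$course/exercise03/0_exercise03_README.txt"
touch "$course/exercise05/0_exercise05_README.txt"

# find prints every directory and file below the current one.
find . | sort
```

Typing the directories one mkdir at a time, as the exercise suggests, gives the same result and is better practice for cd and ls.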

28 Getting files onto and off of Killdevil
SFTP clients
– Cyberduck, FileZilla
– SSH/SFTP
Set it up, then drag and drop
scp
scp <file> <username>@killdevil.unc.edu:/path/
$ pwd
mylaptop/erin/
$ scp TFs.tar.gz erinosb@killdevil.unc.edu:/nas02/home/e/r/erinosb/1_courses/150413_FirstYearGradCourse/exercise03/

29 Decompressing directories with tar and gzip
tar
To “extract” a directory from a compressed archive: tar -zxvf <archive.tar.gz>
To “create” a compressed archive from a directory: tar -zcvf <archive.tar.gz> <directory>
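A round trip makes the two tar commands easier to remember. The sketch below assumes a made-up directory demo_dir holding one small file; it archives the directory, deletes the original, and extracts it again.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up directory and file, just to have something to archive.
mkdir demo_dir
echo "gene,family" > demo_dir/example.csv

# c = create, z = compress with gzip, v = list files as they are handled,
# f = the archive file name comes next.
tar -zcvf demo_dir.tar.gz demo_dir

rm -r demo_dir                 # remove the original so extraction is visible

# x = extract; the other letters mean the same as above.
tar -zxvf demo_dir.tar.gz

cat demo_dir/example.csv       # prints gene,family
```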

30 Exercise 3: Transcription Factors
A) Download a zipped directory of transcription factors called “TFs.tar.gz” from https://onish.web.unc.edu/firstyeargrads/ and save it somewhere on your local computer.
B) Upload the file to the ~/1_courses/150413_FirstYearGradCourse/exercise03 directory on Killdevil. Mac users – Use scp or an SFTP client. PC users – Use Secure Shell or an SFTP client.
C) Did you put your zipped directory in the wrong place? See if you can figure out how to use the mv (move) command to put it in the right place.
D) Expand your zipped TF directory using tar (see slide 29).
E) See what just happened using ls. Now navigate into your directory using cd and see what is in there. What’s inside?
F) more, less and head allow you to look inside files. Use the man pages of these commands to figure out what they do and how they work. Peek into one or two of the enclosed csv files. What do they look like?
G) Now try this command:
$ wc Athaliana_TFs.csv
What do you see? What does wc do?
H) Chain commands together. There are many ways to string multiple commands together and execute them on a single line. What do the following commands do?
$ wc Athaliana_TFs.csv; wc Celegans_TFs.csv; wc Dmelanogaster_TFs.csv
$ head Athaliana_TFs.csv | wc
I) Count the contents of all the .csv files using:
$ wc *.csv
J) Start your first program. Make an empty file called “wordCounter.sh” using touch.
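The chaining ideas in steps G through I can be previewed on toy data. The two files below are invented stand-ins, not the real transcription factor tables, so the counts will not match what you see on Killdevil.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two tiny made-up files standing in for the real TF tables.
printf 'tf1,zinc finger\ntf2,homeobox\ntf3,bHLH\n' > species_a.csv
printf 'tf4,homeobox\ntf5,bZIP\n' > species_b.csv

wc species_a.csv                     # lines, words and characters in one file

# A semicolon runs commands one after another, not at the same time.
wc -l species_a.csv; wc -l species_b.csv

# A pipe sends the output of the first command into the second:
# head -n 2 keeps the first two lines, wc -l counts them.
piped=$(head -n 2 species_a.csv | wc -l)
echo "$piped"

# The * wildcard matches every file ending in .csv; wc adds a total line.
wc -l *.csv
```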

31 Our first program
bash is the default shell on most Linux systems. Writing things in bash is called shell scripting.
All bash scripts
– Use the file extension .sh (by convention)
– Start with a shebang: #!/bin/bash
– Have pseudocode
– Are tested every 2–3 lines
Are executed with:
$ bash <script.sh>
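Put together, those conventions might look like the sketch below. The script name hello.sh and its greeting are hypothetical; the example writes the script to a temporary directory and then runs it with bash.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write a tiny script to disk. The quoted 'EOF' keeps the text literal.
cat > hello.sh <<'EOF'
#!/bin/bash
# Pseudocode (a plain-language plan, written as comments):
#   1. store a greeting in a variable
#   2. print the greeting
greeting="Hello from my first script"
echo "$greeting"
EOF

# Run it the way the slide describes: bash followed by the script name.
output=$(bash hello.sh)
echo "$output"
```

In practice you would type the script in an editor rather than with cat; the here-document is used only to keep the example in one piece.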

32 Exercise 4: Writing your first script
We don’t necessarily want to count the words in every .csv file we have. Let’s say we want just our three favorite species. Let’s make a wordCounter.sh program to do just this.
Use your SFTP client to copy wordCounter.sh onto your local computer or open and edit it interactively.
Type in the shebang on the top line.
Write an entry in your computational notebook that you started this code. Include today’s date and what you want this code to do.
Type in some pseudocode indicating what you want to do.
Type in a quick shell script to count the words in three of your specific files.
Test your code. Does it work?
Now try adding a line into your code (anywhere) that says the following:
echo -e "Wow, I love learning shell scripting."
What happened? Use echo to personalize your output message.
Now open that readme file you made back in Exercise #2. Enter in what your script is and what it does.
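For comparison once you have tried the exercise yourself, here is one possible shape for wordCounter.sh. It is a sketch, not the workshop's answer key: the placeholder file contents are invented so the example runs anywhere, and the first echo message is made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Placeholder files so the sketch runs anywhere; the real files come from
# TFs.tar.gz and their contents will differ.
for name in Athaliana_TFs.csv Celegans_TFs.csv Dmelanogaster_TFs.csv; do
    printf 'placeholder,row one\nplaceholder,row two\n' > "$name"
done

cat > wordCounter.sh <<'EOF'
#!/bin/bash
# wordCounter.sh: count words in three favorite species files.
# Pseudocode:
#   1. say what the script is about to do
#   2. run wc on each of the three files
echo -e "Counting words for three species.\nWow, I love learning shell scripting."
wc Athaliana_TFs.csv Celegans_TFs.csv Dmelanogaster_TFs.csv
EOF

result=$(bash wordCounter.sh)
echo "$result"
```

The -e option tells echo to interpret escapes such as \n, which is why the message prints on two lines.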

33 A few follow-up comments on Killdevil
If your command takes more than 5 seconds to execute, cancel it!!!!
– Learn about LSF (bsub) for computationally intensive jobs.
– http://help.unc.edu/help/getting-started-on-killdevil/
If your project is big, get space!
– The ~ directory holds only 12 GB
– Light users use /netscr for temporary jobs and /ms for storage.
– Files in /netscr will be deleted after 21 days!!!
– For anything bigger, get your own dedicated space.
– http://help.unc.edu/help/getting-started-on-killdevil/
Off campus? VPN into UNC first before logging on.
Modules are available on Killdevil and Kure.

34 Load Sharing Facility (LSF)
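As a rough idea of what an LSF submission looks like, the sketch below assembles a bsub command without submitting anything. The queue name day, the job name wordcount and the script name are placeholders chosen for illustration; check the Killdevil help page for the queues that actually exist.

```shell
# Hypothetical job: the script name and queue are placeholders, not
# values taken from the workshop.
job_script="wordCounter.sh"
queue="day"

# -q picks a queue, -J names the job, and -o sends the job's output to a
# file; %J in the file name is replaced by the job ID.
submit_cmd="bsub -q $queue -J wordcount -o wordcount.%J.out bash $job_script"

# Only report the command; nothing is submitted by this sketch.
if command -v bsub > /dev/null 2>&1; then
    echo "bsub found; to submit, run: $submit_cmd"
else
    echo "bsub not found here; on the cluster you would run: $submit_cmd"
fi
```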

35 THE BENEFITS OF HONING YOUR COMPUTATIONAL PROWESS

36 Examples of common computational projects
Genomics
Microscopy image processing
Structural biology processing
Melding and merging datasets
Re-naming files in batch
Grabbing specific information out of files
Tailor-made specialized scripts
Automation

37 Why are computational approaches important to first year graduate students?
Publishing
Fast & efficient
Reproducible
Collaborative
Employable

38 PROPELLING YOURSELF THROUGH THE NEXT STEPS
How do I learn?

39 Common reasons why interested students do not continue to use computational tools
“I took a few Linux workshops, but I couldn’t get anything to work on my computer.”
“I see the utility of using computational tools but it is impossible to keep track of what I’m doing and I feel disorganized.”
“Everything takes so long to learn. It seems really inefficient.”
“I took a few Linux workshops but then I didn’t use it and I forgot everything I learned.”

40 There are many programming languages to learn
http://carlcheo.com/startcoding

41 Oft-encountered languages and environments
Linux – An operating system, an environment, a lifestyle
Shell scripting – Great for pipelines
Python – A general-purpose, high-level programming language. Highly readable and writable.
Perl – A general-purpose, high-level programming language. Great with text files.
JavaScript – A general-purpose, high-level programming language. Specialized for web applications and apps; commonly used in plugins (ImageJ).
R – A high-level programming language and software environment specialized for statistics and large data management.
MATLAB – A computing environment and programming language. Costs money.
MySQL – Database organization
Others?
Group labels on the slide: LINUX, PROGRAMMING LANGUAGES, MATH

42 UNC Workshops & Courses
IT Workshops
– Linux, Killdevil & Kure, Python, R, SciPy, Tarheel Linux
– http://reg.abcsignup.com/view/view_month.aspx?as=52&wp=887&aid=UNC-ITS
Basic Bioinformatics Tools Workshops (Hemant Kelkar)
– Linux, Killdevil, LSF, BLAST, Genomics, RNA-seq, PyMOL, Ensembl
– http://guides.lib.unc.edu/c.php?g=8359&p=43018
– YOUTUBE! RNA-seq Workshop
Summer Workshop Bioinformatics and Computational Biology Series
– http://www.bcb.unc.edu/training.htm#coursework
Learn R videos at the Odum Institute
– http://www.odum.unc.edu/odum/contentSubpage.jsp?nodeid=670

43 MOOCs
Coursera
– Data Science (9 x 4 week modules, starts May 4) https://www.coursera.org/specialization/jhudatascience/1?utm_medium=courseDescripTop
– Python (10 weeks, starts June 1) https://www.coursera.org/course/pythonlearn
– Statistics
edX
Codecademy
Udacity
Khan Academy
Software Carpentry

44 Other resources
Google
O’Reilly e-book library
– http://eresources.lib.unc.edu/ebook/index.php?letter=ALL&expand=plus
Lynda
– http://software.sites.unc.edu/lynda/

45 Travel and Learn
Cold Spring Harbor Labs training courses
– http://meetings.cshl.edu/courses.html
Michigan State University (MSU) Summer Workshop
– http://ged.msu.edu/angus/tutorials-2013/
– http://bioinformatics.msu.edu/ngs-summer-course-2015
Software Carpentry
– http://software-carpentry.org/workshops/index.html

46 A public service announcement on backups
“If your data does not exist in triplicate, spanning at least two tectonic plates, it does not exist” -- Greg Wilson

47 What do you want to get out of your graduate experience?

