Powering up your graduate experience A survey of computational tools and approaches for biologists https://onish.web.unc.edu/firstyeargrads/ Erin Osborne.

Presentation transcript:

Powering up your graduate experience A survey of computational tools and approaches for biologists Erin Osborne Nishimura 1

Summer Workshop Series 2

Powering up your graduate experience What types of computational tools and approaches are available? How can I use some of the resources available at UNC? –Hands-on introduction to Killdevil Benefits of honing your computational prowess How can I learn the next steps? 3

WHAT TYPES OF COMPUTATIONAL TOOLS AND APPROACHES ARE AVAILABLE? 4

What are we talking about when we talk about computational biology? 5

Computational thinking is generalizable outside of the field of genomics Deconstructing –Breaking big projects into little steps Organizing –Stereotypic record keeping Abstracting –Reproducible systems that can be repurposed for multiple tasks Computing –Creating specialized strategies tailor made for each project –Automating tasks 6

What hardware do we use to get this done? Brain Physical notes & notebooks Computers –Local computers –Virtual computers –Clusters –Cloud –Software 7

Computer Resources Get a physical computer Computers –Local computers –Virtual computers –Clusters –Cloud –Software UNC Student Store Sells Computers Buy a computer at a discount. Get lifetime support from UNC. 8

Computer Resources Use a virtual computer Computers –Local computers –Virtual computers –Clusters –Cloud –Software Research Computing hosts the Virtual Computing Lab: You can virtually use Microsoft and Linux computers and install tailored software for individual or group use. 9

Computer Resources Use a high-throughput Linux cluster Computers –Local computers –Virtual computers –Clusters –Cloud –Software UNC ITS (Information Technology Services) manages two main computational clusters: Killdevil (new) and Kure (old) 10

Computer Resources Get on the cloud Computers –Local computers –Virtual computers –Clusters –Cloud –Software Google Amazon 11

Software is available to buy, borrow or obtain for free Bioinformatics has site licenses available for checkout. Share spendy software. ITS has a lot of software for free or for purchase at a discounted price. (EndNote, discounted; MATLAB, free; SecureShell, free) ITS Virtual Lab Virtually use spendy software for free (latest Adobe, Mathematica, SPSS, etc.) Kure and Killdevil come with loadable modules 12

A HANDS-ON INTRO TO KILLDEVIL How can I use some of these resources? 13

Computer Resources Computers –Local computers –Clusters –Cloud –Software UNC ITS (Information Technology Services) Manages two main computational clusters: Killdevil (new) and Kure (old) 14

What is Killdevil? 15

No seriously, what is Killdevil?
A high performance computer cluster
– Linux operating system
– 1 login node
– 774 compute nodes
  – 48 – 96 GB memory per node
  – 12 – 16 CPU cores per node
– 2 large memory nodes (1 TB)
– 12 graphics processor (GPU) nodes
– File systems for storage
16

No seriously, what is Killdevil? 17

Getting onto Killdevil
Mac OS & Linux machines:
– Link to Killdevil through “Terminal”
– Open “Terminal” (in Applications -> Utilities)
– Type this, then add your password when prompted:
  $ ssh
PC:
– Open SSH Secure Shell Client
– Click on “Quick Connect”
– Hostname = killdevil.unc.edu
– Username =
– Port Number = 22
– Add password when prompted
18

Getting onto Killdevil -- demo 19

Keeping a computational notebook 20

Navigating UNIX: getting oriented
Commands
Manuals
Your first two commands:
– whoami
– date
Getting help with manuals:
– man
– “spacebar” to scroll
– Type “q” to exit
  $ whoami
  erinosb
  $ date
  Thu Apr 9 13:24:09 EDT 2015
  $ man whoami
  q
21

Navigating UNIX – paths and directories
– pwd – Print Working Directory
– cd – Change Directory
– ls – List Contents
  $ pwd
  $ cd /nas02/home/
  $ ls
22

The file structure
Directories and sub-directories are “folders”
Some important directories on Killdevil
– ms/
– netscr/
– ~
Making a new directory: mkdir
Removing a directory: rm -ri
  $ mkdir 1_courses
23
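A minimal sketch of making and then removing a directory. It works inside a throwaway directory from mktemp (an assumption for safety, not part of the slide), and it uses rm -r without -i only because the sketch runs without anyone there to answer prompts. At the terminal, rm -ri asks before each deletion.

```shell
# Work in a scratch directory so nothing in your real home is touched.
scratch=$(mktemp -d)
cd "$scratch"

# Make a new directory and confirm it is there.
mkdir 1_courses
ls

# Remove it again. Interactively you would type: rm -ri 1_courses
rm -r 1_courses

# -A also lists hidden entries; an empty listing means the directory is gone.
ls -A
```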

A few key tips and tricks Naming conventions Auto complete with TAB What if I get stuck? –CTRL+C Get me out of here –Q –CTRL+C –CTRL+D –quit –logout –logoff –logout() –bye –quit() –q() –exit What if I need help? – man – -h – --help – GOOGLE it! – Use language name in search 24

Exercise 1: move up and down paths
A) Type the following command:
   $ cd
   Where are you right now? Write down this exact location in your notebook.
B) Enter the following command:
   $ cd /
   Now where are you?
C) List the contents of this directory. Do you see the directory nas02? Change into that directory.
D) Use cd and ls to navigate down the file structure back to your original location in step A.
E) Type this command:
   $ cd ..
   Now where are you? What did cd .. do?
F) Use cd .. to go back up to /
G) Now type this command. Where are you now?
   $ cd
H) Now type:
   $ cd -
   Where are you now?
I) Navigate back down to your home directory through each directory in your path. This time, try pressing TAB after typing the first three letters of each directory name to initiate autocomplete. What happens?
25
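If you want to check your answers, here is a small sketch of what cd .. and cd - do. The nas02/home directories are created inside a scratch directory purely for illustration; they only mimic the names on Killdevil and are not the real file system.

```shell
# Build a tiny practice tree in a scratch directory (hypothetical layout).
scratch=$(mktemp -d)
mkdir -p "$scratch/nas02/home"

cd "$scratch/nas02/home"
start=$(pwd)              # remember where we started

cd ..                     # move up one level, to the parent directory
up=$(pwd)

cd - > /dev/null          # jump back to the previous directory (cd - prints the path)
back=$(pwd)

echo "started in:  $start"
echo "after cd ..: $up"
echo "after cd -:  $back"
```

Typing cd with no argument takes you to your home directory, which is why steps A and G land in the same place.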

Making and Removing files
Making a file: touch
Removing a file: rm -i
-i is an option
  $ command [-OPTIONS]
  $ touch testfile1.txt
  $ rm -i testfile1.txt
26
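A short sketch of the same two commands, run in a scratch directory. Because rm -i stops to ask for confirmation, the sketch pipes a "y" into it so it can run unattended; at the terminal you would type the answer yourself.

```shell
# Work in a scratch directory so nothing important can be deleted.
scratch=$(mktemp -d)
cd "$scratch"

# touch creates an empty file (or updates the timestamp of an existing one).
touch testfile1.txt
ls -l testfile1.txt

# rm -i asks before deleting; here the answer "y" is supplied on standard input.
echo y | rm -i testfile1.txt

# The listing should now be empty.
ls -A
```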

Exercise 2: Creating a directory tree
1) Make a directory structure in your home directory that you can use. If you already have a home directory structure you like, you can skip this and just create the course directory (150413_FirstYearGradCourse) and the subdirectories.
   $ cd    # This will move you into your home directory.
   Try this command:
   $ tree
   What do you see?
2) Use mkdir to create directories and subdirectories within your home directory so that tree will generate a “map” of your files that looks like this:
   .
   |-- 1_courses
   |   `-- 150413_FirstYearGradCourse
   |       |-- exercise03
   |       `-- exercise05
   `-- 2_projects
3) Put a file in the ~/1_courses/exercise3/ directory labeled 0_exercise03_README.txt
4) Put a file in the ~/1_courses/exercise5/ directory called 0_exercise05_README.txt
27
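One possible way to build this layout, sketched under a few assumptions: a scratch directory stands in for your home directory, the exercise folders are placed inside the course directory named in step 1, and find is used to print the result in case tree is not installed. The -p option of mkdir creates any missing parent directories in one go.

```shell
# A scratch directory stands in for ~ in this sketch.
scratch=$(mktemp -d)
cd "$scratch"

course=1_courses/150413_FirstYearGradCourse   # course directory named in step 1

# -p creates parent directories as needed.
mkdir -p "$course/exercise03" "$course/exercise05"
mkdir -p 2_projects

# Empty README files, one per exercise directory.
touch "$course/exercise03/0_exercise03_README.txt"
touch "$course/exercise05/0_exercise05_README.txt"

# Print everything that was created, sorted, as a stand-in for tree.
find . | sort
```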

Getting files onto and off of Kure
sftp clients
– Cyberduck, Mozilla
– SSH/SFTP
Set it up, then drag and drop
scp
  $ pwd
  mylaptop/erin/
  $ scp TFs.tar.gz earGrads/exercise03/
28

Decompressing directories with tar and gzip
tar
To “extract” a directory from a compressed archive: tar -zxvf
To “create” a compressed archive of a directory: tar -zcvf
29
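A round trip with both commands, as a sketch. The TFs directory and example.csv file are made up here so the example runs anywhere. In the option string, z means gzip, c means create, x means extract, v means verbose, and f means the next argument is the archive file name.

```shell
# Scratch directory with a small made-up directory to archive.
scratch=$(mktemp -d)
cd "$scratch"
mkdir TFs
echo "gene,family" > TFs/example.csv   # placeholder content

# Create a gzip-compressed archive of the directory.
tar -zcvf TFs.tar.gz TFs

# Remove the original, then restore it from the archive.
rm -r TFs
tar -zxvf TFs.tar.gz

ls TFs
```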

Exercise 3: Transcription Factors
A) Download the zipped directory of transcription factors called “TFs.tar.gz” and save it somewhere on your local computer.
B) Upload the file to Killdevil, into your ~/1_courses/150413_FirstYearGrads/exercise03 directory. Mac users: use scp or an SFTP client. PC users: use Secure Shell or an SFTP client.
C) Did you put your zipped directory in the wrong place? See if you can figure out how to use the mv (move) command to put it in the right place.
D) Expand your zipped TF directory using tar (see slide 29).
E) See what just happened using ls. Now navigate into your directory using cd and see what is in there. What’s inside?
F) more, less and head allow you to look inside files. Use the man pages of these commands to figure out what they do and how they work. Peek into one or two of the enclosed csv files. What do they look like?
G) Now try this command:
   $ wc Athaliana_TFs.csv
   What do you see? What does wc do?
H) Chain commands together. There are many ways to string multiple commands together and run them from a single line. What do the following commands do?
   $ wc Athaliana_TFs.csv; wc Celegans_TFs.csv; wc Dmelanogaster_TFs.csv
   $ head Athaliana_TFs.csv | wc
I) Count all the .csv files using:
   $ wc *.csv
J) Start your first program. Make an empty file called “wordCounter.sh” using touch.
30
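To experiment with steps G through I without the real data, here is a sketch that uses two tiny stand-in files. The file names match the exercise but the contents are invented. A semicolon runs one command after the other, while a pipe (|) feeds the output of the first command into the second.

```shell
# Scratch directory with small invented csv files.
scratch=$(mktemp -d)
cd "$scratch"
printf 'id,name\n1,a\n2,b\n' > Athaliana_TFs.csv   # 3 lines
printf 'id,name\n1,c\n' > Celegans_TFs.csv         # 2 lines

# wc reports lines, words and bytes for a file.
wc Athaliana_TFs.csv

# Semicolon: run the first command, then the second.
wc Athaliana_TFs.csv; wc Celegans_TFs.csv

# Pipe: count the lines of whatever head prints (here, the first 2 lines).
piped=$(head -n 2 Athaliana_TFs.csv | wc -l | tr -d ' ')
echo "lines passed through the pipe: $piped"

# Wildcard: one row per matching file, plus a total.
wc *.csv
```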

Our first program
bash is the default shell on many Linux systems. Writing things in bash is called shell scripting.
All bash scripts
– Contain the file extension .sh (by convention)
– Start with a shebang: #!/bin/bash
– Have pseudocode
– Are tested every 2 – 3 lines
– Are executed with:
  $ bash
31
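A bare-bones sketch of those conventions in one place. The script name hello.sh and its contents are hypothetical. The sketch writes the script into a scratch directory with a here-document and then runs it with bash, so you can see the shebang line, the pseudocode comments and the execution step together.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Write a tiny script to disk. Everything between <<'EOF' and EOF goes into the file.
cat > hello.sh <<'EOF'
#!/bin/bash
# Pseudocode:
#   1. Say which script is running
#   2. Print today's date
echo "Running hello.sh"
date
EOF

# Execute the script with bash and capture what it prints.
output=$(bash hello.sh)
echo "$output"
```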

Exercise 4: Writing your first script
We don’t necessarily want to count the words in every .csv file we have. Let’s say we want just our three favorite species. Let’s make a wordCounter.sh program to do just this.
– Use your SFTP client to copy wordCounter.sh onto your local computer or open and edit it interactively.
– Type in the shebang on the top line.
– Write an entry in your computational notebook that you started this code. Include today’s date and what you want this code to do.
– Type in some pseudocode indicating what you want to do.
– Type in a quick shell script to count the words in three of your specific files.
– Test your code. Does it work?
– Now try adding a line into your code (anywhere) that says the following:
  echo -e "Wow, I love learning shell scripting."
  What happened?
– Use echo to personalize your output message.
– Now open that readme file you made back in Exercise #2. Enter in what your script is and what it does.
32
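One way wordCounter.sh might end up looking, as a sketch and not the official solution. So that it runs anywhere, it first creates three small stand-in csv files in a scratch directory; on Killdevil you would delete that setup section and run the script from the directory holding the real files.

```shell
#!/bin/bash
# wordCounter.sh: count lines, words and bytes in three chosen species files.
# Pseudocode:
#   1. (sketch only) create small stand-in files
#   2. loop over the three file names
#   3. run wc on each one
#   4. print a closing message

# Step 1: invented stand-in data; remove this section when using the real files.
scratch=$(mktemp -d)
cd "$scratch"
printf 'id,name\n1,a\n2,b\n' > Athaliana_TFs.csv
printf 'id,name\n1,c\n' > Celegans_TFs.csv
printf 'id,name\n' > Dmelanogaster_TFs.csv

# Steps 2 and 3: count each of our three favorite species.
for f in Athaliana_TFs.csv Celegans_TFs.csv Dmelanogaster_TFs.csv; do
    wc "$f"
done

# Step 4: -e tells echo to interpret escapes such as \n in the message.
echo -e "Wow, I love learning shell scripting."
```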

A few follow up comments on Killdevil
– If your command takes more than 5 seconds to execute, cancel it!!!!
  – Learn about LSF (bsub) for computationally intensive jobs.
– If your project is big, get space!
  – ~ directory is only 12 GB large
  – Light users use /netscr for temporary jobs and /ms for storage.
  – /netscr will be deleted after 21 days!!!
  – For anything bigger, get your own dedicated space.
– Off campus? VPN into UNC first before logging on.
– Modules are available on Killdevil and Kure.
33

Load Sharing Facility (LSF) 34

THE BENEFITS OF HONING YOUR COMPUTATIONAL PROWESS 35

Examples of common computational projects Genomics Microscopy image processing Structural biology processing Melding and merging datasets Re-naming files in batch Grabbing specific information out of files Tailor-made specialized scripts Automation 36

Why are computational approaches important to first year graduate students? Publishing Fast & efficient Reproducible Collaborative Employable 37

PROPELLING YOURSELF THROUGH THE NEXT STEPS How do I learn? 38

Common reasons why interested students do not continue to use computational tools “I took a few linux workshops, but I couldn’t get anything to work on my computer.” “I see the utility of using computational tools but it is impossible to keep track of what I’m doing and I feel disorganized.” “Everything takes so long to learn. It seems really inefficient.” “I took a few linux workshops but then I didn’t use it and I forgot everything I learned.” 39

There are many programming languages to learn 40

Oft encountered languages and environments
– Linux – An operating system, an environment, a lifestyle
– Shell scripting – great for pipelines
– Python – A general purpose, high-level programming language. Highly readable and writable.
– Perl – A general purpose, high-level programming language. Great with text files.
– JavaScript – A general purpose, high-level programming language. Specialized for web applications, apps, commonly used in plugins (ImageJ).
– R – A high-level programming language and software environment specialized for statistics and large data management.
– MATLAB – A computing environment and programming language. Costs money.
– MySQL – Database organization
– Others?
LINUX | PROGRAMMING LANGUAGES | MATH
41

UNC Workshops & Courses
IT Workshops
– Linux, Killdevil & Kure, Python, R, SciPy, Tarheel Linux
– http://reg.abcsignup.com/view/view_month.aspx?as=52&wp=887&aid=UNC-ITS
Basic Bioinformatics Tools Workshops (Hemant Kelkar)
– Linux, Killdevil, LSF, BLAST, Genomics, RNA-seq, PyMOL, Ensembl
– YOUTUBE!
RNA-seq Workshop
Summer Workshop Bioinformatics and Computational Biology Series
Learn R videos at the Odum Institute
42

MOOCs
Coursera
– Data Science (9 x 4 week modules, starts May 4)
– Python (10 weeks, starts June 1)
– Statistics
EdX
Codecademy
Udacity
Khan Academy
Software Carpentry
43

Other resources
Google
O’Reilly e-book library – http://eresources.lib.unc.edu/ebook/index.php?letter=ALL&expand=plus
Lynda
44

Travel and Learn
Cold Spring Harbor Labs Training courses
MSU (Michigan State University) Summer Workshop – http://bioinformatics.msu.edu/ngs-summer-course-2015
Software Carpentry – http://software-carpentry.org/workshops/index.html
45

A public service announcement on backups “If your data does not exist in triplicate, spanning at least two tectonic plates, it does not exist.” -- Greg Wilson 46

What do you want to get out of your graduate experience? 47