Short Read Sequencing Analysis Workshop

Slides:



Advertisements
Similar presentations
Week 6: Chapter 6 Agenda Automation of SQL Server tasks using: SQL Server Agent Scheduling Scripting Technologies.
Advertisements

How Clients and Servers Work Together. Objectives Web Server Protocols Examine how server and client software work Use FTP to transfer files Initiate.
Introducing the Command Line CMSC 121 Introduction to UNIX Much of the material in these slides was taken from Dan Hood’s CMSC 121 Lecture Notes.
The Internet Useful Definitions and Concepts About the Internet.
Introduction to Web Interface Technology (CSE2030)
How Clients and Servers Work Together. Objectives Learn about the interaction of clients and servers Explore the features and functions of Web servers.
MCTS Guide to Microsoft Windows Server 2008 Network Infrastructure Configuration Chapter 8 Introduction to Printers in a Windows Server 2008 Network.
1 Chapter Overview Introduction to Windows XP Professional Printing Setting Up Network Printers Connecting to Network Printers Configuring Network Printers.
1 Chapter Overview Transferring and Transforming Data Introducing Microsoft Data Transformation Services (DTS) Transferring and Transforming Data with.
Chapter 6: Hostile Code Guide to Computer Network Security.
Slide 1 of 9 Presenting 24x7 Scheduler The art of computer automation Press PageDown key or click to advance.
Linux & Shell Scripting Small Group Lecture 4 How to Learn to Code Workshop group/ Erin.
Module 2: Using Transact-SQL Querying Tools. Overview SQL Query Analyzer Using the Object Browser Tool in SQL Query Analyzer Using Templates in SQL Query.
INTRODUCTION TO WEB DATABASE PROGRAMMING
Configuring a Web Server. Overview Overview of IIS Preparing for an IIS Installation Installing IIS Configuring a Web Site Administering IIS Troubleshooting.
 Accessing the NCCS Systems  Setting your Initial System Environment  Moving Data onto the NCCS Systems  Storing Data on the NCCS Systems  Running.
Lecture 7 Interaction. Topics Implementing data flows An internet solution Transactions in MySQL 4-tier systems – business rule/presentation separation.
WORK ON CLUSTER HYBRILIT E. Aleksandrov 1, D. Belyakov 1, M. Matveev 1, M. Vala 1,2 1 Joint Institute for nuclear research, LIT, Russia 2 Institute for.
Bigben Pittsburgh Supercomputing Center J. Ray Scott
Module 10: Monitoring ISA Server Overview Monitoring Overview Configuring Alerts Configuring Session Monitoring Configuring Logging Configuring.
FTP Server and FTP Commands By Nanda Ganesan, Ph.D. © Nanda Ganesan, All Rights Reserved.
Introduction to Using SLURM on Discover Chongxun (Doris) Pan September 24, 2013.
Using the BYU Supercomputers. Resources Basic Usage After your account is activated: – ssh You will be logged in to an interactive.
Linux & Shell Scripting Small Group Lecture 3 How to Learn to Code Workshop group/ Erin.
Application Layer Khondaker Abdullah-Al-Mamun Lecturer, CSE Instructor, CNAP AUST.
11/25/2015Slide 1 Scripts are short programs that repeat sequences of SPSS commands. SPSS includes a computer language called Sax Basic for the creation.
FTP Short for File Transfer Protocol, the protocol for exchanging files over the Internet.protocolfilesInternet works in the same way as HTTP for transferring.
Module: Software Engineering of Web Applications Chapter 2: Technologies 1.
Internet Applications (Cont’d) Basic Internet Applications – World Wide Web (WWW) Browser Architecture Static Documents Dynamic Documents Active Documents.
FTP COMMANDS OBJECTIVES. General overview. Introduction to FTP server. Types of FTP users. FTP commands examples. FTP commands in action (example of use).
Advanced topics Cluster Training Center for Simulation and Modeling September 4, 2015.
1 Chapter 1 INTRODUCTION TO WEB. 2 Objectives In this chapter, you will: Become familiar with the architecture of the World Wide Web Learn about communication.
Advanced Computing Facility Introduction
Interacting with the cluster ssh, sftp, & slurm batch scripts
Outline Introduction/Questions
Hands on training session for core skills
GRID COMPUTING.
Welcome to Indiana University Clusters
Distributed Control and Measurement via the Internet
Open OnDemand: Open Source General Purpose HPC Portal
Assumptions What are the prerequisites? … The hands on portion of the workshop will be on the command-line. If you are not familiar with the command.
CS 330 Class 7 Comments on Exam Programming plan for today:
Justin Randall SQLintersection Session: Friday, 10:00am-11:15pm Automating SQL Server Administration Using SQLCMD Justin Randall.
Welcome to Indiana University Clusters
CISC103 Web Development Basics: Web site:
Short Read Sequencing Analysis Workshop
Lawson System Foundation 9.0
ASU Saguaro 09/16/2016 Jung Hyun Kim.
Joker: Getting the most out of the slurm scheduler
Hodor HPC Cluster LON MNG HPN Head Node Comp Node Comp Node Comp Node
Architecture & System Overview
CommLab PC Cluster (Ubuntu OS version)
Introduction to Programming the WWW I
Bomgar Remote support software
File Transfer Olivia Irving and Cameron Foss
Internet Protocol Mr. Paulk.
Tutorial (4): HTTP Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
CISC103 Web Development Basics: Web site:
College of Engineering
CCR Advanced Seminar: Running CPLEX Computations on the ISE Cluster
Objective Understand web-based digital media production methods, software, and hardware. Course Weight : 10%
High Performance Computing in Bioinformatics
William Stallings Data and Computer Communications
Windows Server Administration Fundamentals
Quick Tutorial on MPICH for NIC-Cluster
Short Read Sequencing Analysis Workshop

Maxwell Compute Cluster
Presentation transcript:

Short Read Sequencing Analysis Workshop Day 3 – Using Compute Resources at BioFrontiers Institute In-class Slides – Jonathan DeMasi

Outline For Today Questions about videos???? Brief review of compute cluster Brief review of job scripts and SBATCH headers Modules (Lmod) Transferring or Downloading public data Practical example using rsync Practical example with curl SRAtools Practice!

Learning Objectives Understand why and when we use compute clusters Know how to rsync data from Tortuga to your computer (and the other way around!) Download things from the Internet using curl Submit a job to the scheduler and interpret the output Diagnose job failures

Compute Cluster Resource Allocation Head Node (limited resources) Compute Cluster (Shell and Terminal) Users Compute Nodes Compute resources are group and designated to specific compute nodes

Compute Cluster Resource Allocation Head Node (limited resources) Head node: very limited resources *Use with caution! Users (Shell and Terminal) Compute nodes: extensive compute resources execute all computational tasks in job scripts Jobs scripts are a list of one or more commands or programs to be executed Requires a system to manage compute node resources Compute Nodes

Compute Cluster Resource Management Head Node (limited resources) Slurm Users Manages resources to execute jobs Executs jobs as resources become available (Shell and Terminal) Compute Nodes 6

Why Use Compute Clusters Compute clusters are designed for compute-intensive processes combining the power of multiple servers, clustered systems can tackle large and complex workloads Because workload can be shared we can consolidate and optimize resource utilization Higher availability The ability to provide end users with access to a service for a high percentage of time while reducing unscheduled outages Higher reliability Reduced frequency of system failure Improved scalability Easily add additional nodes/resources to the cluster with minimal disruption to remaining nodes

Running Jobs On The Compute Cluster Head Node (limited resources) Batch script Batch scripts are text files written in bash (or other shell scripting languages) SBATCH Headers specify settings for running our job script Commands list of command(s) to execute Compute Nodes

SBATCH headers #SBATCH -p <queue>  specify job queue to run the job #SBATCH --time=00:00:00  hrs:minutes:seconds #SBATCH --mem=<memory> #SBATCH –-ntasks=  specify how many cores/processors are needed for the job #SBATCH –-job-name=<jobname>  give the job a name #SBATCH –-mail-type=ALL #SBATCH –-mail-user=<you@email.com> #SBATCH –-error=<path/file.err>  specify name and location of error file #SBATCH –-output=<path/file.log>  specify name and location of log file

Understanding Modules Environment variables modify the way we interact with the cluster or help us to “find” things Common environment variables are PATH, PYTHON_PATH, LD_LIBRARY_PATH Modules allow you to easily and dynamically add to and change environment variables All modules are unloaded after you terminate your current session

Module Commands $ module avail List all available modules $ module load <module> Load a specific module $ module list List the modules currently loaded $ module unload <module> Unload a specific module $ module purge Remove all loaded modules

What Causes A Job To Fail Numerous sources of error that can prevent successful job completion Incorrect command-line setup; misuse of parameters Incorrect file formats Incorrect path designations Issues with version compatibility Incorrectly setup job script/SBATCH headers

How To Know When A Job Fails Expected output files are not present or are empty Job finishes much quicker than expected Error messages in your error file

How To Troubleshoot Failed Jobs SBATCH headers #SBATCH --error=<path/file.err> ➢ specify name and location of error file #SBATCH --output=<path/file.log> ➢ specify name and location of log file Error file Captures all STDERR output Log file Will capture STDERR if no location is specified Captures standard out (that is not explicitly captured in your individual commands)

BREAK

Extensive Publically Available Data and Tools Publically available datasets NCBI Gene Expression Omnibus (GEO) NCBI FTP site and SRA database Genbank UCSC Table Browser Ensembl Model organism specific site like SGD Programs/packages SourceForge GitHub

How To Download Data to Server From Internet curl -O – used to download files from the net Usage: $ curl -O <url> ➢ download file at <url> to current directory ftp = internet file transfer program Connect to a remote host (i.e. server) and copy files via file transfer protocol (FTP) Usage: $ ftp <options> <host> Example: $ ftp ftp.ncbi.nih.gov sftp = secure version of ftp command

Let’s practice! In your home directory (/mnt/scratch_nobackup/<IdentiKey>), open a new file with vim Using the day 3 cheat sheet, let’s write a basic sbatch script (just the headers-don’t forget the #!) Assume the tasks you’re running are all only single-core, single thread tasks. (--ntasks=1). Let’s ask for 10gb of RAM. We shouldn’t need this much though. Specify a 10 minute walltime. We want the job to send us mail for any event

Now for the commands... We need to curl -O a file (http://biof- test.colorado.edu/SRR043348_1.fastq.gz) We’re going to run fastqc against this file (even if Mary wouldn’t do this, Jon is going to… :-)) Basic syntax fastqc -t 1 -o /path/to/output/dir <file> The output directory must exist before running fastqc – let’s add a line to create it before the fastqc line Don’t forget to make sure fastqc is in your $PATH environment variable

How do we submit a job? sbatch </path/to/script.sh> To verify that it’s running? qstat -u <your IdentiKey> squeue

The last part... rsync the fastqc output back to your laptop and open the html file.

The End Questions?? Don’t forget the homework. Watch videos for Day 4 Help sessions: 1-3PM in A108 IT Questions? Email bit-help@colorado.edu

Acknowledgements Workshop Coordinators: Mary Allen, Meghan Sloan, Amber Scott Funding: BioFrontiers Institute and Colorado Office of Economic Development and International Trade Additional Acknowledgments Compute Resources: BioFrontiers IT Staff Robin Dowell and Dowell Lab ©2016