Workshop on Microbiome and Health

Hands on: Metagenomics data types, statistics and quality control
Esteban Pérez Wohlfeil & Oswaldo Trelles
{estebanpw, ortrelles}@uma.es
Computer Architecture Department, University of Malaga, Spain
Faculte des Sciences; Universite Sidi Mohamed Ben Abdellah
2017

Global Agenda
Contents and time distribution
AGENDA (1h 00m)
- Getting to know our Virtual Machine
- Interacting and exploring the metagenomic samples
- Quality control step using QTrim
- Running a sequence comparison with a reference database using BLAST

Getting to know the Virtual Machine
The provided VM runs Ubuntu 16.04 and has several software packages already installed to facilitate the hands-on. The following software is already included:
- BLAST suite: BLASTn, BLASTx, BLASTp, …
- MEGAN
- QTrim
- Trimmomatic
- METAGECKO
- EMBOSS toolkit
- RStudio with R 3.3.3
- Several scripts: FastQ to Fasta converter, spreadsheets, plotting tools, etc.

Getting to know the Virtual Machine
Log into the “metagenomics-pipeline” user with the password: student

Getting to know the Virtual Machine
Examples that we will use during the hands-on are located in /home/student/Documents/Example

Getting to know the Virtual Machine
These examples include:
- Folder 454: lean_TS1.fastq, obese_TS19.fastq
- Folder calc: spreadsheet to calculate differential abundance
- Folder database: a reference database containing several genomes commonly found in the human gastrointestinal system
- Folder results: empty folder to store processed files

Exploratory analysis

Exploratory Analysis
A metagenome can be seen as a long signal (often incomplete and noisy) that requires processing before anything significant can be detected. Although not mandatory, it is strongly recommended to always take a look at our samples before starting a processing pipeline.
[Diagram labels: FastQ, Fasta]

Exploratory Analysis
In your Virtual Machine, start by opening a terminal: click on the black command prompt icon in the left tab.

Exploratory Analysis
We can execute commands in the terminal just as if we were double-clicking on programs. Let's first open one of our metagenomes, lean_TS1.fastq, using the less command. Does it look OK? To navigate through the metagenome, use the arrow keys. To exit, just press q.
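The exact command is not part of this transcript. As a minimal sketch of what such an inspection could look like, the lines below assume that the DATA variable used later in the hands-on points at /home/student/Documents/Example (an assumption), and they add a hypothetical helper, first_record, that prints a single record without entering the pager:

```shell
# Assumption: DATA points at the example folder; it is only set here if
# the environment does not already define it.
DATA="${DATA:-/home/student/Documents/Example}"
sample="$DATA/454/lean_TS1.fastq"

# Hypothetical helper: print the first FASTQ record (header, sequence,
# '+' separator, quality string) without opening the pager.
first_record() {
    head -n 4 "$1"
}

if [ -f "$sample" ]; then
    first_record "$sample"
    # Interactive alternative: less "$sample"  (arrow keys to scroll, q to quit)
else
    echo "sample not found: $sample" >&2
fi
```

A healthy record has a header starting with @, a sequence line, a + line, and a quality string as long as the sequence.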

Exploratory Analysis (Skip this if you already know it)
The terminal is a powerful tool to manage files. There are a few commands that always come in handy:

Command                                  Description
cp <file to copy> <destination>          Copies a file to another place
mv <file to move> <location to move>     Moves a file to another location
rm <file to delete>                      Deletes a file
less <file to read>                      Reads a text file in the terminal
ls                                       Displays the contents of the folder
pwd                                      Shows the current working directory
cd <folder to enter>                     Enters a folder. Use cd ../ to go back one level
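A short practice session with these commands might look like the sketch below. It works in a throwaway directory created with mktemp so that none of the example data is touched; the file and folder names (notes.txt, copy.txt, archive) are made up for illustration.

```shell
# Work in a throwaway directory so nothing in the example data is touched.
scratch="$(mktemp -d)"
cd "$scratch"
pwd                               # show the current working directory

printf 'hello\n' > notes.txt      # create a small text file to play with
cp notes.txt copy.txt             # copy a file
mkdir archive                     # make a folder to move things into
mv copy.txt archive/              # move the copy into that folder
ls                                # shows archive and notes.txt
cd archive                        # enter the folder
ls                                # shows copy.txt
cd ../                            # go back one level
rm notes.txt                      # delete a file (there is no recycle bin)
```

Note that rm deletes immediately, so it is worth running ls first to confirm what is about to be removed.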

Exploratory Analysis
Now we will check the distribution of read lengths to see that there are no outliers. First convert the sample from FastQ to Fasta, then run the script exploratory.sh with the new Fasta file as its argument. This will generate a .png image with a histogram of the read-length distribution.
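The converter script shipped with the VM is not named in this transcript. As a stand-in, the conversion itself can be sketched with awk, assuming each FastQ record occupies exactly four lines (header, sequence, +, qualities). The function name fastq_to_fasta and the demo reads are hypothetical.

```shell
# Convert 4-line FASTQ records to FASTA: keep the header (swapping the
# leading '@' for '>') and the sequence line; drop the '+' separator and
# the quality string.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}

# Tiny made-up input to show the effect.
demo="$(mktemp)"
printf '@read1\nACGT\n+\nIIII\n@read2\nACGTAC\n+\nIIIIII\n' > "$demo"
fastq_to_fasta "$demo"
# >read1
# ACGT
# >read2
# ACGTAC
```

On the workshop data the same function could be redirected into a new file, for example fastq_to_fasta input.fastq > output.fasta, with the real paths substituted.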

Exploratory Analysis
Are there any outliers? Does it make sense taking into account the kind of sequencer it comes from? Will it be different after the quality control step?
[Figure label: Before QC]
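Outliers can also be spotted without the plot. The sketch below is not the exploratory.sh script; it is a hypothetical text-mode alternative that tallies reads into length bins, assuming a Fasta file with one sequence per line. The function name length_histogram and the default bin width of 50 are illustrative choices.

```shell
# Print "bin_start<TAB>read_count" for each length bin, sorted by bin.
# Usage: length_histogram file.fasta [bin_width]
length_histogram() {
    awk -v width="${2:-50}" '
        !/^>/ && NF { bin = int(length($0) / width) * width; count[bin]++ }
        END         { for (b in count) printf "%d\t%d\n", b, count[b] }
    ' "$1" | sort -n
}

# Made-up reads of length 4, 3 and 12, binned in steps of 5.
demo="$(mktemp)"
printf '>r1\nACGT\n>r2\nACG\n>r3\nACGTACGTACGT\n' > "$demo"
length_histogram "$demo" 5
# 0	2
# 10	1
```

Bins that sit far from the bulk of the distribution and hold only a handful of reads are the candidates to look at more closely.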

Exploratory Analysis
We can also check the average length, the number of reads and the maximum length by opening the file that was generated automatically.
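The name and layout of that generated file are not given here, so the following is only a sketch of how the same three figures could be recomputed directly from a Fasta file, again assuming one sequence per line. The function name fasta_stats and its output format are hypothetical.

```shell
# Report the number of reads, the mean read length and the maximum read
# length of a single-line-per-sequence FASTA file.
fasta_stats() {
    awk '!/^>/ && NF { n++; total += length($0)
                       if (length($0) > max) max = length($0) }
         END { if (n) { printf "reads=%d mean_length=%.1f max_length=%d\n", n, total / n, max }
               else   { print "reads=0" } }' "$1"
}

# Made-up reads of length 4 and 6.
demo="$(mktemp)"
printf '>r1\nACGT\n>r2\nACGTAC\n' > "$demo"
fasta_stats "$demo"
# reads=2 mean_length=5.0 max_length=6
```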

Quality control

Quality Control Step
Now we will perform the Quality Control step to trim and filter impurities in the samples. These range from adapters to errors introduced during the sequencing process. Let's filter and trim both samples: lean_TS1.fastq and obese_TS19.fastq. To do so, execute the following commands in the terminal:

    python ~/Qtrim/QTrim_v1_1/QTrim_v1_1.py -m 26 -fastq $DATA/454/lean_TS1.fastq -o $DATA/454/lean_TS1.trimmed.fastq

And also:

    python ~/Qtrim/QTrim_v1_1/QTrim_v1_1.py -m 26 -fastq $DATA/454/obese_TS19.fastq -o $DATA/454/obese_TS19.trimmed.fastq
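Since the two commands differ only in the sample name, they can be wrapped in a loop. The sketch below reuses the script path and the -m, -fastq and -o options exactly as they appear in the commands above; the default value for DATA, the QTRIM variable and the trimmed_name helper are assumptions added for illustration.

```shell
# Assumptions: DATA defaults to the example folder, QTRIM to the script
# path used in the commands above. Both can be overridden from outside.
DATA="${DATA:-/home/student/Documents/Example}"
QTRIM="${QTRIM:-$HOME/Qtrim/QTrim_v1_1/QTrim_v1_1.py}"

# Hypothetical helper: derive the output name from the input name,
# e.g. lean_TS1.fastq -> lean_TS1.trimmed.fastq
trimmed_name() {
    printf '%s\n' "${1%.fastq}.trimmed.fastq"
}

for sample in lean_TS1 obese_TS19; do
    input="$DATA/454/$sample.fastq"
    output="$(trimmed_name "$input")"
    if [ -f "$input" ] && [ -f "$QTRIM" ]; then
        python "$QTRIM" -m 26 -fastq "$input" -o "$output"
    else
        echo "skipping $sample: input or QTrim script not found" >&2
    fi
done
```

Keeping the -m value in one place makes it easier to rerun both samples with the same setting when experimenting with the filtering.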

Quality Control Step
Remember to convert both trimmed files to Fasta format so we can run them in our pipeline. This will generate the Fasta files ready to be processed.

Quality Control Step
Now run the exploratory analysis for the new trimmed lean_TS1.trimmed.fasta file and compare the previous and new plots. Are there any outliers? Does it make sense taking into account the kind of sequencer it comes from? Is it any different after the quality control step?
[Figure labels: Before QC, After QC]
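Besides comparing the plots, a quick numeric check is how many reads survived the filter. The sketch below counts Fasta headers in an untrimmed and a trimmed file; the function names count_reads and qc_retention and the demo data are hypothetical, and it does not depend on any QTrim output format beyond plain Fasta.

```shell
# Count the reads in a FASTA file by counting header lines.
count_reads() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Report how many reads remain after QC.
# Usage: qc_retention before.fasta after.fasta
qc_retention() {
    before="$(count_reads "$1")"
    after="$(count_reads "$2")"
    awk -v b="$before" -v a="$after" 'BEGIN {
        if (b > 0) { printf "kept %d of %d reads (%.1f%%)\n", a, b, 100 * a / b }
        else       { print "no reads in the untrimmed file" }
    }'
}

# Made-up example: 4 reads before QC, 3 after.
raw="$(mktemp)"; trimmed="$(mktemp)"
printf '>r1\nACGT\n>r2\nAC\n>r3\nACGTAC\n>r4\nACG\n' > "$raw"
printf '>r1\nACGT\n>r3\nACGTAC\n>r4\nACG\n' > "$trimmed"
qc_retention "$raw" "$trimmed"
# kept 3 of 4 reads (75.0%)
```

A very low retention rate suggests the filter is too strict for the data; a rate near 100% with an unchanged histogram suggests it is doing very little.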

Quality Control Step
Notes
- Quality control should be parametrized according to the library preparation used in sequencing and the sequencing instrument
- Strong biological knowledge is needed
- Still, a filtering process will usually improve quality
- The -m parameter can be adjusted for more/less filtering