Workshop on Microbiome and Health

Presentation transcript:

Workshop on Microbiome and Health
Hands on: Metagenomics data types, statistics and quality control
Esteban Pérez Wohlfeil & Oswaldo Trelles, {estebanpw, ortrelles}@uma.es
Computer Architecture Department, University of Malaga, Spain
Faculte des Sciences; Universite Sidi Mohamed Ben Abdellah, 2017

Global Agenda
Contents and time distribution (1h 00m):
- Getting to know our Virtual Machine
- Interacting and exploring the metagenomic samples
- Quality control step using QTrim
- Running a sequence comparison with a reference database using BLAST

Getting to know the Virtual Machine
The provided VM runs Ubuntu 16.04 and comes with several software packages already installed to facilitate the hands-on:
- BLAST suite: BLASTn, BLASTx, BLASTp, …
- MEGAN
- QTrim
- Trimmomatic
- METAGECKO
- EMBOSS toolkit
- RStudio with R 3.3.3
- Several scripts: FastQ to Fasta converter, spreadsheets, plotting tools, etc.

Getting to know the Virtual Machine
Log into the “metagenomics-pipeline” user with the password: student

Getting to know the Virtual Machine
The examples that we will use during the hands-on are located in /home/student/Documents/Example

Getting to know the Virtual Machine
These examples include:
- Folder 454: lean_TS1.fastq, obese_TS19.fastq
- Folder calc: spreadsheet to calculate differential abundance
- Folder database: a reference database containing several genomes commonly found in the human gastrointestinal system
- Folder results: empty folder to store processed files

Exploratory analysis

Exploratory Analysis
A metagenome can be seen as a long signal (often incomplete and noisy) that requires processing in order to detect anything significant. Although it is not mandatory, we strongly recommend always taking a look at the samples before starting a processing pipeline.

Exploratory Analysis
In your Virtual Machine, start by opening a terminal: click on the black command prompt icon in the left tab.

Exploratory Analysis
We can execute commands in the terminal just as if we were double-clicking on programs. Let's first open our metagenomes with the less command, starting with the lean_TS1.fastq metagenome. Does it look OK? To navigate through the metagenome, use the arrow keys. To exit, just press q.
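One way to make "does it look OK?" concrete is to check that each record has the expected shape. The sketch below is only an illustration and assumes the common four-lines-per-record FASTQ layout (header starting with @, sequence, a + separator line, quality string of the same length as the sequence). The function name read_fastq and the toy reads are made up for this example.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream.

    Stops at end of file or at the first blank line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality

# Toy data standing in for a real sample (invented for illustration).
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGC\n+\nII#\n")
records = list(read_fastq(example))
for name, sequence, quality in records:
    print(name, len(sequence), sequence)
```

With a real sample, the same function can be applied to a file opened with open(); a ValueError then points at the first record that does not follow the assumed layout.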

Exploratory Analysis (skip this if you already know it)
The terminal is a powerful tool to manage files. There are a few commands that always come in handy:
Command                                 Description
cp <file to copy> <destination>         Copies a file to another place
mv <file to move> <location to move>    Moves a file to another location
rm <file to delete>                     Deletes a file
less <file to read>                     Reads a text file in the terminal
ls                                      Displays the contents of the folder
pwd                                     Shows the current working directory
cd <folder to enter>                    Enters a folder. Use cd ../ to go back one level

Exploratory Analysis
Now we will check the distribution of read lengths to see whether there are any outliers. First convert the sample from FastQ to Fasta, and then run the script exploratory.sh with the new Fasta file as its argument. This will generate a .png image with a histogram of the read-length distribution.
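To show what the FastQ to Fasta conversion amounts to, here is a minimal sketch. It is not the converter script shipped with the VM; it assumes four-line FASTQ records and simply keeps the header and the sequence while dropping the separator and quality lines. The name fastq_to_fasta and the toy reads are hypothetical.

```python
import io

def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write each FASTQ record as a two-line FASTA record; return the record count."""
    count = 0
    while True:
        header = fastq_handle.readline().strip()
        if not header:
            break  # end of file (or a blank line)
        sequence = fastq_handle.readline().strip()
        fastq_handle.readline()  # '+' separator line, not needed in FASTA
        fastq_handle.readline()  # quality string, not needed in FASTA
        # Replace the leading '@' of the FASTQ header with the FASTA '>'.
        fasta_handle.write(">" + header[1:] + "\n" + sequence + "\n")
        count += 1
    return count

# Toy input invented for illustration.
source = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGC\n+\nII#\n")
target = io.StringIO()
converted = fastq_to_fasta(source, target)
print(converted, "records converted")
print(target.getvalue(), end="")
```

For real files, the two StringIO objects would be replaced by files opened for reading and writing respectively.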

Exploratory Analysis
- Are there any outliers?
- Does it make sense, taking into account the kind of sequencer it comes from?
- Will it be different after the quality control step?
[Figure: read-length histogram before QC]
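The idea behind the histogram can be sketched in a few lines: read lengths are grouped into bins, and an isolated bin far away from the rest suggests an outlier. This is an illustration only, not the content of exploratory.sh; the bin width of 50 and the toy lengths are arbitrary assumptions.

```python
from collections import Counter

def length_histogram(lengths, bin_width=50):
    """Count reads per length bin; each key is the lower edge of its bin."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))

def print_histogram(histogram, bin_width=50):
    """Print one text bar per bin, one '#' per read."""
    for start, count in histogram.items():
        print(f"{start:>5}-{start + bin_width - 1:<5} {'#' * count}")

# Toy read lengths invented for illustration; 900 plays the role of an outlier.
toy_lengths = [98, 102, 110, 251, 260, 255, 249, 900]
histogram = length_histogram(toy_lengths)
print_histogram(histogram)
```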

Exploratory Analysis
We can also check the average length, the number of reads and the maximum length by opening the file that was generated automatically.
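The same three figures (number of reads, average length, maximum length) can be computed directly from a Fasta file. The sketch below is an independent illustration, not the script that produces the automatically generated file; the function names and the toy sequences are hypothetical.

```python
import io

def fasta_lengths(handle):
    """Return the length of every sequence in a FASTA stream (multi-line records allowed)."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

def summarize(lengths):
    """Number of reads, mean length and maximum length (zeros for an empty input)."""
    if not lengths:
        return {"reads": 0, "mean_length": 0.0, "max_length": 0}
    return {
        "reads": len(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
    }

# Toy FASTA invented for illustration; the first record spans two lines.
toy = io.StringIO(">r1\nACGT\nAC\n>r2\nGG\n>r3\nACGTACGT\n")
stats = summarize(fasta_lengths(toy))
print(stats)
```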

Quality Control Step

Quality Control Step
Now we will perform the quality control step to trim and filter impurities in the samples. These range from adapters to errors introduced during the sequencing process. Let's filter and trim both samples, lean_TS1.fastq and obese_TS19.fastq. To do so, execute the following commands in the terminal:
    python ~/Qtrim/QTrim_v1_1/QTrim_v1_1.py -m 26 -fastq $DATA/454/lean_TS1.fastq -o $DATA/454/lean_TS1.trimmed.fastq
And also:
    python ~/Qtrim/QTrim_v1_1/QTrim_v1_1.py -m 26 -fastq $DATA/454/obese_TS19.fastq -o $DATA/454/obese_TS19.trimmed.fastq
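Since the two commands differ only in the sample name, they can be generated in a loop. The sketch below only builds and prints the argument lists using the script path and flags shown on the slide; it does not run QTrim. The helper name qtrim_command and the fallback directory "." (used when the DATA environment variable is not set) are assumptions made for this example.

```python
import os

# Script path exactly as given on the slide; '~' is expanded to the home directory.
QTRIM_SCRIPT = "~/Qtrim/QTrim_v1_1/QTrim_v1_1.py"

def qtrim_command(data_dir, sample, m_value=26):
    """Build the argument list for one sample, mirroring the slide's command."""
    fastq = f"{data_dir}/454/{sample}.fastq"
    trimmed = f"{data_dir}/454/{sample}.trimmed.fastq"
    return [
        "python", os.path.expanduser(QTRIM_SCRIPT),
        "-m", str(m_value),
        "-fastq", fastq,
        "-o", trimmed,
    ]

# DATA is the environment variable used on the slide; "." is a hypothetical fallback.
data_dir = os.environ.get("DATA", ".")
commands = [qtrim_command(data_dir, sample) for sample in ("lean_TS1", "obese_TS19")]
for command in commands:
    print(" ".join(command))
```

To actually launch each command from Python, one option is subprocess.run(command, check=True) from the standard library, which raises an error if the script exits with a non-zero status.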

Quality Control Step
Remember to convert both trimmed files to Fasta format so we can run them in our pipeline. This will generate the Fasta files ready to be processed.

Quality Control Step
Now run the exploratory analysis on the new trimmed lean_TS1.trimmed.fasta file and compare the previous and new plots.
- Are there any outliers?
- Does it make sense, taking into account the kind of sequencer it comes from?
- Is it any different after the quality control step?
[Figures: read-length histograms before QC and after QC]

Quality Control Step
Notes
- Quality control should be properly parameterized depending on the library preparation used in sequencing and on the sequencing instrument
- Strong biological knowledge is needed
- Still, a filtering process will usually improve quality
- The -m parameter can be adjusted for more or less filtering
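To illustrate in general terms how a threshold controls the amount of filtering, the sketch below keeps only reads whose mean quality reaches a chosen value. This is a generic example and is not QTrim's algorithm, nor a statement of what its -m option measures. It assumes Phred+33 quality encoding; the parameter name min_mean_quality and the toy reads are hypothetical.

```python
def mean_quality(quality, offset=33):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(char) - offset for char in quality) / len(quality)

def keep_reads(records, min_mean_quality):
    """Return the names of reads whose mean quality is at least the threshold."""
    return [
        name
        for name, sequence, quality in records
        if mean_quality(quality) >= min_mean_quality
    ]

# Toy reads invented for illustration: 'I' encodes Phred 40, '#' encodes Phred 2.
toy_reads = [
    ("good", "ACGT", "IIII"),
    ("mixed", "ACGT", "II##"),
    ("poor", "ACGT", "####"),
]
for threshold in (2, 20, 26):
    print(threshold, keep_reads(toy_reads, threshold))
```

Raising the threshold discards more reads and lowering it keeps more, which is the trade-off to weigh against the sequencing technology and library preparation.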