RNA-seq analysis case study Anne de Jong 2015


RNA-seq analysis case study for Prokaryotes. Anne de Jong, 2015

Measuring gene expression
What can we do with RNA-seq analysis?
- Transcription start sites (TSS)
- Transcription termination (TT)
- Operon structures (Transcription Active Regions, TARs)
- tRNAs
- rRNAs
- Discovery of ncRNAs
- Gene expression
Here we focus on the last item: gene expression.

Measuring gene expression: what to do
- Grow cells and freeze them (liquid nitrogen) at point X
- Isolate total RNA
- Optional: rRNA depletion
- Library prep (cDNA)
- Sequencing (Illumina, Ion Proton)
- Filter, trim, and map the sequence reads to a reference genome
- Gene expression calling
All of the steps above can be standardized; just follow the protocols.

Gene expression values
Starting point: an Excel file with gene expression values (RPKM/FPKM/TPM/counts).
- Rows are the features (genes)
- Columns are the experiments (samples)
Tutorial Step 1:
- Go to http://genome2d.molgenrug.nl
- In the menu RNA-seq analysis, download the “example data set”
- Open the file RPKM.txt in Excel
- What do the numbers represent?
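As a hint for the question "What do the numbers represent?", the sketch below shows how RPKM and TPM are commonly derived from raw read counts. It is a minimal illustration, not the Genome2D pipeline itself: the function names, counts, and gene lengths are made up for the example.

```python
def rpkm(counts, lengths_bp):
    """Reads Per Kilobase of gene per Million mapped reads, for one sample."""
    total_reads = sum(counts)
    return [
        c / (length / 1_000) / (total_reads / 1_000_000)
        for c, length in zip(counts, lengths_bp)
    ]


def tpm(counts, lengths_bp):
    """Transcripts Per Million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1_000_000 for r in rates]


# Hypothetical sample with three genes (counts and lengths are invented)
counts = [500, 1500, 8000]
lengths_bp = [1000, 3000, 2000]

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

The first two genes have the same read density (0.5 reads per base), so they receive the same RPKM even though their raw counts differ. TPM values always sum to one million within a sample, which makes samples easier to compare.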

The factors
The factors describe the experiment:
- What are the replicates?
- What is the biological meaning?
- Multiple factors are possible (Factor-1, Factor-2)
Tutorial Step 2:
- In this example we only use Factor-1.
- Open Factors.txt in Excel
- What do these factors mean?

Contrasts
The factors describe the data; the next step is to ask questions:
- Which genes are differentially expressed between WT and one or more mutants?
- Is there a global effect?
- Which mutants are highly correlated?
To answer these questions, the contrasts need to be defined:
- A_F71Y-WT
- B_R61K-WT
- C_R61H-WT
- null-WT
In this example all samples are compared to the WT.
[Figure: Factors file]
Tutorial Step 3:
- Open the file Contrasts.txt in Excel
- Make a Contrasts file for the case where you use Factor-2 (type) instead of Factor-1 [see previous slide]
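To make the link between factors and contrasts concrete, here is a small sketch of how a contrast such as null-WT could be turned into a log2 fold change for one gene. It makes three assumptions: a contrast name reads as "target minus reference", replicates are averaged per factor level, and a pseudocount of 1 avoids division by zero. The sample names and expression values are invented, and the actual pipeline may compute this differently.

```python
import math
from statistics import mean

# Hypothetical Factor-1 assignment: sample name -> factor level
factors = {"WT_1": "WT", "WT_2": "WT", "null_1": "null", "null_2": "null"}

# Hypothetical expression values of one gene in each sample
gene_values = {"WT_1": 6.0, "WT_2": 8.0, "null_1": 30.0, "null_2": 32.0}


def log2_fold_change(gene_values, factors, contrast, pseudocount=1.0):
    """log2(target / reference) for a contrast written as 'target-reference'."""
    # Split on the last hyphen so names like 'A_F71Y-WT' parse correctly
    target, reference = contrast.rsplit("-", 1)

    def group_mean(level):
        return mean([v for s, v in gene_values.items() if factors[s] == level])

    return math.log2(
        (group_mean(target) + pseudocount) / (group_mean(reference) + pseudocount)
    )


lfc = log2_fold_change(gene_values, factors, "null-WT")
```

A positive value means the gene is higher in the first-named condition (here the knock-out) than in WT. Swapping the two names in the contrast flips the sign.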

Classes
Adding literature data to the analyses
One way is to define groups of genes/proteins that have a biological relation:
- Metabolic pathway; KEGG
- Related protein domains; e.g. ABC transporters
- Regulons
- Related processes; e.g. sporulation
- Any defined group of genes is possible
These groups of genes are called Classes.
[Figure: Class file]
Tutorial Step 4:
- Open the file Classes.txt in Excel
- Define your own class for at least 20 genes, e.g. the best hits found by Brinsbane
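The idea of a Class can be illustrated with a short sketch that averages a per-gene value (here a log2 fold change) over the members of each class. The class names, locus tags, and values are hypothetical, and the layout of Classes.txt is not assumed here; the data is written directly as Python dictionaries.

```python
from statistics import mean

# Hypothetical classes: class name -> member locus tags
classes = {
    "ABC_transporters": ["g1", "g2"],
    "sporulation": ["g3", "g4", "g5"],
}

# Hypothetical log2 fold changes per locus tag for one contrast
log2fc = {"g1": 2.0, "g2": 1.0, "g3": -1.0, "g4": -2.0, "g5": -3.0, "g6": 0.1}


def class_means(classes, values):
    """Mean value per class, skipping members that have no measurement."""
    result = {}
    for name, members in classes.items():
        present = [values[g] for g in members if g in values]
        if present:
            result[name] = mean(present)
    return result


means = class_means(classes, log2fc)
```

A class whose mean lies far from zero suggests that its members respond together in that contrast, which is the kind of pattern the class analysis is meant to expose.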

Overview
Now we have 4 files:
- Gene expression file
- Contrasts file
- Factors file
- Class file
Tutorial Step 5:
- Open the file Classes.txt in Excel
- Define two or more classes for at least 10 genes in total

Flow chart of the analysis
- User input: RPKMs, Factors, Contrasts, Class, Project name
- RNA-seq Analysis Pipeline (Genome2D webserver or R-script)
- Global Analysis: Normalization, Library Sizes, PCA/MDS
- Differential Expression: Volcano Plots, MA Plots, Heatmaps
- Experiment Analysis: Correlation Matrix, Heatmap of Experiments, K-means Clustering
- Class Analysis: Correlation Matrices, Mean Signal Plots, Heatmaps of Top Hits Signals, Class Groups
- Results: Tables (tab delimited, HTML formatted)
- Downstream Analysis: Functional Analysis on the Genome2D webserver, TIGR Multi Experiment Viewer, etc.

Performing an RNA-seq analysis
- The pipeline is available as an R-script or as a webserver
- The R-script allows modification of settings and parameters
- The webserver is parameter free: parameters are predefined, or are calculated or estimated on the fly
Tutorial Step 6:
- Open the webserver http://genome2d.molgenrug.nl
- Go to RNA-seq analysis and download the example data set
- Subsequently, upload these four files for analysis
- Give the project a logical (short) name
- Press start run and wait 1-2 min for the results

Mining the results
The results are divided into 5 sections, followed by a functional analysis:
- Global analysis
- Contrasts analysis
- Experiment analysis
- Class analysis
- Data tables
- Functional analysis
Tutorial Step 7: Global analysis
- For this RNA-seq experiment we asked for at least 4M (million) reads per experiment. Did all samples pass this criterion?
- Which sample duplicates showed the lowest dispersion?
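The 4M reads criterion can also be checked by hand when raw counts are available. The sketch below sums the counts per sample and lists the samples that fall short. The sample names and numbers are invented, and the check only makes sense on raw counts, not on RPKM or TPM values.

```python
MIN_READS = 4_000_000  # threshold from the tutorial: at least 4M reads

# Hypothetical raw read counts per sample (one number per gene)
counts = {
    "WT_1": [2_500_000, 1_800_000, 900_000],
    "WT_2": [1_200_000, 1_100_000, 700_000],
    "null_1": [3_000_000, 2_000_000, 1_500_000],
}

# Library size = total number of reads assigned to genes in a sample
library_sizes = {sample: sum(values) for sample, values in counts.items()}

# Samples that do not reach the required sequencing depth
too_small = sorted(s for s, n in library_sizes.items() if n < MIN_READS)
```

In this invented example only WT_2 (3.0M reads) falls below the threshold.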

Mining the results
Tutorial Step 8: Contrasts analysis
- Which CodY mutant showed the lowest number of significantly changed genes?
- What is the highest fold change of a gene when the wild type was compared to the knock-out?
- Volcano plots are used to visualize fold changes and their cognate p-values. Open a volcano plot and write a good legend for this figure.
- On the left side of Heatmaps of TopHits, you see a dendrogram. What is the meaning of the length of the lines in a dendrogram?
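When writing the legend for a volcano plot it helps to know how each point is placed. A minimal sketch: the x-axis is the log2 fold change and the y-axis is -log10 of the p-value, so significant genes with large changes end up in the upper left and upper right corners. The cutoffs (|log2FC| >= 1, p <= 0.05) and the example values are assumptions for illustration; the webserver's own thresholds may differ.

```python
import math


def volcano_point(log2fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Return (x, y, label) for one gene in a volcano plot.

    p_value must be greater than 0, otherwise log10 is undefined.
    """
    y = -math.log10(p_value)
    if p_value <= p_cutoff and log2fc >= fc_cutoff:
        label = "up"
    elif p_value <= p_cutoff and log2fc <= -fc_cutoff:
        label = "down"
    else:
        label = "not significant"
    return log2fc, y, label


# Hypothetical results: locus tag -> (log2 fold change, p-value)
results = {
    "g1": (2.5, 0.001),
    "g2": (-1.8, 0.01),
    "g3": (0.2, 0.5),
    "g4": (3.0, 0.2),
}

points = {gene: volcano_point(fc, p) for gene, (fc, p) in results.items()}
```

Gene g4 shows why both axes matter: its fold change is large, but its p-value is too high for it to count as a hit.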

Mining the results
Tutorial Step 9: Experiment analysis
- The correlation matrix of experiments is a visualization method that shows the overall Pearson correlation between experiments. Write a legend for this figure and include a description of what the shades of blue represent.
- K-means clustering groups genes that correlate well over multiple experiments. The threshold for separating groups is always arbitrary; which k-means groups could optionally be merged into one group?
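The values behind a correlation matrix of experiments can be reproduced with a few lines. This sketch computes the Pearson correlation between every pair of samples from their per-gene values; the sample names and numbers are invented, and the plotting itself is left out.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression profiles: sample -> values for the same four genes
samples = {
    "WT_1": [1.0, 2.0, 3.0, 4.0],
    "WT_2": [2.0, 4.0, 6.0, 8.0],
    "null_1": [4.0, 3.0, 2.0, 1.0],
}

# Symmetric matrix stored as a dict keyed by (sample_a, sample_b)
corr = {
    (a, b): pearson(samples[a], samples[b])
    for a in samples
    for b in samples
}
```

Replicates are expected to correlate close to 1, as WT_1 and WT_2 do here, while a sample with an opposite profile approaches -1.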

Mining the results
Tutorial Step 10: Class analysis
- ‘Correlation matrix of Classes’ gives a quick view of the behavior of class members (genes) over the various experiments. What do the colors in these matrices mean?

Mining the results
Tutorial Step 11: Data tables
- The data produced and used by the pipeline to draw graphs can be used for further analysis, for example in the popular freeware programs TMEV and Cytoscape.
- The file ‘Edge list for a gene network of Contrasts’ is compatible with Cytoscape but will not be discussed further.
- Save the file ‘TIGR MEV TopHits log2FC’ for TMEV
- Download MeV: http://www.tm4.org/mev.html

MeV: Multi Experiment Viewer
Tutorial Step 11: Using MeV
- Start MeV and load the file ‘TIGR MEV TopHits log2FC’ as dual channel data (because this is ratio data)
- Deselect “Load Annotation”
- Press load; the data is now imported and ready to analyze using MeV
- Optional: try a k-means clustering; here you have to estimate the number of clusters yourself

Functional Analysis
Tutorial Step 12: Perform a functional analysis on the TopHits of one or multiple Contrasts
- Change the ‘Current active genome’ to your genome of interest
- Upload a list of locus tags to analyze
- Examine the results and briefly describe your findings and conclusions