Eran Yanowski, Eran Hornstein’s: Monitor drug impact on the transcriptome of mouse beta cells (primary and cell-line) using Transeq/RNA-Seq 17.08.15 Report.


Eran Yanowski, Eran Hornstein’s: Monitor drug impact on the transcriptome of mouse beta cells (primary and cell-line) using Transeq/RNA-Seq Report I

Overview of basic parameters of your NGS run (per sample): samples origin: Mouse Beta cells, line SR 60

FASTQC report for sample 1000_0A (randomly chosen)

FASTQC report for sample 1000_0A: most reads start with 5’ GGG, which is typical of the Transeq procedure

FASTQC report for sample : overrepresentation of Ins2 reads

Pre-pipeline data processing
- Need to remove 5’ GGG sequences
- Need to remove reads containing either poly-A or poly-T
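The two pre-processing needs listed above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's actual code: the function name `preprocess_read`, the 15-base homopolymer threshold, and the choice to trim only an exact leading `GGG` are all assumptions, since the slides do not state these details.

```python
def preprocess_read(seq, qual, min_homopolymer=15, lead="GGG"):
    """Return the trimmed (seq, qual) pair, or None if the read should be dropped.

    min_homopolymer is a hypothetical threshold: the slides do not say how long
    an A or T run must be for a read to count as poly-A / poly-T.
    """
    # Drop reads that contain a long run of A or of T.
    if "A" * min_homopolymer in seq or "T" * min_homopolymer in seq:
        return None
    # Trim the 5' GGG and the matching quality characters.
    if seq.startswith(lead):
        seq, qual = seq[len(lead):], qual[len(lead):]
    return seq, qual


if __name__ == "__main__":
    print(preprocess_read("GGGACGTACGT", "IIIIIIIIIII"))
    print(preprocess_read("ACG" + "A" * 20, "I" * 23))
```

Applied to every record of a fastq file, a filter of this kind should produce the change seen in the later FASTQC report: a ‘normal’ base composition at the reads’ 5’ end and a lower overall %A.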

Reads summary following removal of poly-A & poly-T reads (Mouse beta cells, line)

FASTQC report following removal of poly-A & poly-T reads & 5’ GGG trimming (sample 1000_0A):
- ‘normal’ frequency of GGG at the reads’ 5’ end
- reduced frequency of %A

Data processing (pipeline) workflow (done using Mouse_mm9_v1 base repository)
1. If a sample has more than one fastq file (per sequencing read), a fastq-file merging step is performed
2. Transeq reads pre-processing (5’ GGG trimming; poly-A & poly-T removal)
3. Mapping of processed reads (using TopHat)
4. TES reads coverage profile (Transeq protocol QC step)
5. Reads count (per 3’ UTR) (using HTSeq-count)
6. Data normalization and differential gene expression (using DESeq2)
7. QC: Principal Component Analysis (PCA) & hierarchical clustering

Reads mapping summary_Exp4_1_ Mouse Beta cells, line

Reads mapping summary_Exp4_1_ Mouse Beta cells, line: 23-24% of the total read count mapped to the Ins2/Ins1 genes
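A percentage like the one above can be derived from a per-gene count table. The sketch below assumes a plain dictionary of gene name to read count. The function name and the example counts are hypothetical and are not the experiment's data. It also assumes that "total reads count" means the sum over all counted genes, which the slides do not confirm.

```python
def percent_mapped_to(counts, genes=("Ins1", "Ins2")):
    """Percentage of all counted reads that fall on the given genes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * sum(counts.get(g, 0) for g in genes) / total


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = {"Ins1": 40, "Ins2": 200, "OtherGenes": 760}
    print(percent_mapped_to(example))
```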

Reads summary following removal of polyA and polyT reads (Exp4_2_Mouse Primary Beta cells):

Reads mapping summary_Exp4_2_ Mouse Primary Beta cells:

Reads mapping summary_Exp4_2_ Mouse Primary Beta cells: % of total reads count were mapped to Ins2/Ins1 genes

Hierarchical clustering: Mouse beta cells, line (panels: Drug con=1000; Drug con=100)

Hierarchical clustering: Mouse primary beta cells (panels: Drug con=1000; Drug con=100). Separated by processing day: A/B/C?

Differential gene expression is assessed with DESeq2
- DESeq2 output is summarized in a single sheet per experiment
- Genes differentially expressed during each time series were called by two independent means:
  I. Pairwise comparison vs. time zero
  II. A tool named maSigPro

RNA-Seq drug dose response (1000 and 100) / time series (0, 1, 6 and 12 hrs): gene filtering criteria
The data filters used during the last analysis performed ( ):
I. maSigPro:
   - NormCounts of genes meeting MaxRawCount>50 served as input
   - maSigPro output was further filtered against potential outlier genes (flagged by maSigPro)
   - Genes showing FC greater than 1.5 (in at least one of the paired comparisons)
   - By default maSigPro requires (BH) adjusted p-value <0.05
II. Pairwise comparison criteria: MaxRawCount>50, adjusted p-value <0.05 and FC greater than 1.5 (in at least one of the paired comparisons)
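The pairwise-comparison criteria above can be written as a small predicate. This is a sketch under stated assumptions, not the code used for the report. The function name and input layout are hypothetical. It assumes the adjusted p-value and the fold change must pass in the same comparison, and that a 1.5-fold decrease counts like a 1.5-fold increase. The slides specify neither.

```python
def passes_pairwise_filter(max_raw_count, comparisons,
                           min_count=50, max_padj=0.05, min_fc=1.5):
    """Apply the MaxRawCount / adjusted p-value / FC filter to one gene.

    comparisons: list of (fold_change, padj) tuples, one per time point
    compared against time zero; fold_change is a linear ratio (not log2).
    """
    if max_raw_count <= min_count:
        return False
    for fold_change, padj in comparisons:
        # Skip comparisons with no usable statistics.
        if padj is None or fold_change is None or fold_change <= 0:
            continue
        # Assumption: treat down-regulation symmetrically to up-regulation.
        magnitude = max(fold_change, 1.0 / fold_change)
        if padj < max_padj and magnitude > min_fc:
            return True
    return False


if __name__ == "__main__":
    # One hypothetical gene: 1 h, 6 h and 12 h, each vs. time zero.
    print(passes_pairwise_filter(120, [(1.2, 0.50), (2.0, 0.01), (1.1, 0.30)]))
```

If the intended rule allows the significance and the fold change to come from different comparisons, the loop would need to track the two conditions separately.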

Exp 4.2: Mouse primary beta cells: genes meeting criteria
- Paired comparison yields a higher 100/1000 gene overlap than the one obtained with maSigPro
- Panels: maSigPro filtered output; pairwise-comparison filtered output

Exp 4.2: Mouse primary beta cells
- Most of the maSigPro shared genes are included in the group of paired shared genes
- To determine which output is preferred, data validation using an orthogonal method is essential
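The overlap statements above reduce to set operations on gene lists. The sketch below uses hypothetical helper names and placeholder gene identifiers. It shows how the 100/1000 shared genes, and the share of one method's genes that another method also reports, could be computed.

```python
def shared_genes(genes_100, genes_1000):
    """Genes called at both drug concentrations."""
    return set(genes_100) & set(genes_1000)


def fraction_contained(subset, superset):
    """Fraction of `subset` that also appears in `superset`."""
    subset = set(subset)
    if not subset:
        return 0.0
    return len(subset & set(superset)) / len(subset)


if __name__ == "__main__":
    # Placeholder identifiers, not genes from the experiment.
    paired_shared = shared_genes({"geneA", "geneB", "geneC"},
                                 {"geneB", "geneC", "geneD"})
    masigpro_shared = shared_genes({"geneB", "geneX"}, {"geneB", "geneX"})
    print(sorted(paired_shared))
    print(fraction_contained(masigpro_shared, paired_shared))
```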

Exp 4.2: Mouse primary beta cells: partitioning clustering of genes responsive in both drug concentrations
- Panels: paired comparison, con=1000, shared_genes partitioning clustering; paired comparison, con=100, shared_genes partitioning clustering

Exp4.1: Mouse beta cells, cell line
- Generally this experiment yielded less significant results
- When applying the same filters used for the primary beta-cell datasets, very few genes pass; the possibility of using the p-value (instead of the adjusted p-value) should be tested by the investigator
- The level of 100/1000 intersection (shared genes) is lower here than the one observed in the primary cells experiment
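To make the difference between raw and adjusted p-values concrete, here is a minimal pure-Python sketch of the Benjamini-Hochberg (BH) adjustment. It illustrates the general procedure only. DESeq2 additionally applies independent filtering before adjusting, which is not reproduced here, and the function name is an assumption.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
    print(bh_adjust(raw))
```

An adjusted value is never smaller than its raw p-value, so thresholding raw p-values at 0.05 passes at least as many genes, at the cost of more expected false positives.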

Venn diagram of unfiltered maSigPro outputs of both the primary (100, 1000) and the cell-line TranSeq datasets
- Low overall intersection between the primary and the cell-line ‘significant’ genes
- Relatively low intersection between the two drug concentrations tested on the beta cell line