S1 Supporting information: Bioinformatic workflow and quality of the metrics (number of slides: 10)

RNA-seq: BIOINFORMATIC PIPELINE (Galaxy server: http://galaxy.univ-perp.fr)

cDNA library construction: polyA, single strand, 50 nucleotides.
Sequencing: HiSeq2000 (Illumina), SBS technique.

QUALITY CHECK
- FastQ Groomer (v1.0.4*): fastqsanger format check
- FASTX-Toolkit (v1.0.0*): (1) quality statistics, (2) quality score boxplot, (3) nucleotide distribution chart

MAPPED READS
- TopHat (v2.0.9) with the aligner Bowtie (v2.1.0.0): mapping on the S. mansoni genome version 5.2 (BAM file)
- BAM-to-SAM (v0.1.18): conversion of the BAM file to SAM
- Filter SAM (v1.0.0*): deletion of unmapped reads

SAMPLING
- Pick Random (v1.0.0*): standardization of the number of reads to analyse

DE NOVO TRANSCRIPTOME ASSEMBLY
- Cufflinks (v2.1.1): exon-intron structures
- Cuffmerge (v1.0.0): merger of the 21 exon-intron structures into a new S. mansoni transcriptome, used as reference

IDENTIFICATION OF SEX-BIASED GENES
- HTSeq (v0.6.1p1): quantification of read abundance
- DESeq (v1.12.1): assessment of differences in gene expression, giving the sex-biased genes per developmental stage

FUNCTIONAL ANALYSES
- Cluster 3.0 / JavaTreeView (v1.1.6r4): construction of functional clusters
- Cluster analysis: expression variation of candidate genes through the S. mansoni lifecycle
- Blast2GO (v2.6.4): male- and female-specific biological pathways through the S. mansoni lifecycle
- Blastx (v2.2.30) / AmiGO (v1.8) / GeneDB: de novo annotation of the 100 best sex-biased genes per stage

STRUCTURAL ANALYSES
- IGV (v2.3.16), Blat (v34), Cuffcompare (v2.2.1): exon/intron structure of genome v5.2 vs the de novo transcriptome

* Galaxy tool version
S3 – 2/10
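The Pick Random step standardizes the number of reads analysed per library, but the slide does not say how the reads are drawn. One common way to take a fixed-size, uniform random subset from a stream of reads of unknown length is reservoir sampling (Algorithm R). The sketch below assumes that reads are passed as plain items and that a fixed seed is wanted for reproducibility. `subsample_reads` is a hypothetical helper and is not the Galaxy tool's code.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def subsample_reads(reads: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Draw k items uniformly at random from a stream (reservoir sampling, Algorithm R).

    If the stream holds fewer than k items, all of them are returned.
    Hypothetical helper, not the Pick Random implementation.
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < k:
            reservoir.append(read)  # fill the reservoir with the first k reads
        else:
            j = rng.randint(0, i)  # inclusive bounds: read i is kept with probability k/(i+1)
            if j < k:
                reservoir[j] = read  # replace a random earlier read
    return reservoir


# Example: bring two libraries of different depth down to the same read count.
lib_a = [f"a{i}" for i in range(1000)]
lib_b = [f"b{i}" for i in range(250)]
print(len(subsample_reads(lib_a, 200)), len(subsample_reads(lib_b, 200)))  # 200 200
```

Because each read is considered once, memory use stays proportional to k rather than to the library size.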

RNA-seq: QUALITY OF THE METRICS
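The pipeline's quality check applies FASTX-Toolkit quality statistics to reads in fastqsanger format. In that format each quality character encodes a Phred score as its ASCII code minus 33. As a rough illustration of what a per-position quality summary contains, the sketch below parses simple four-line FASTQ records and returns the mean Phred score at each read position. This is not the FASTX-Toolkit implementation, and `mean_quality_per_position` is a hypothetical name.

```python
from typing import List


def mean_quality_per_position(fastq_text: str) -> List[float]:
    """Mean Phred score at each position across reads, for Sanger-encoded FASTQ (offset 33).

    Assumes the simple four-line record layout (header, sequence, '+', qualities).
    """
    lines = [ln for ln in fastq_text.splitlines() if ln]
    totals: List[int] = []
    counts: List[int] = []
    for qual in lines[3::4]:  # every fourth line is a quality string
        for pos, ch in enumerate(qual):
            if pos == len(totals):  # grow the arrays when a longer read appears
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - 33  # Phred score = ASCII code - 33
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


example = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n##II\n"
print(mean_quality_per_position(example))  # [21.0, 21.0, 40.0, 40.0]
```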

RNA-seq: REPLICATE CLUSTERING
[Figure: clustering of female (♀1, ♀2) and male (♂1, ♂2) duplicates for cercariae, schistosomula s#1, schistosomula s#2, schistosomula s#3 and adult worms; colour scale from 0% to 100% identity. Analysis: DESeq package (v1.12.1).]
S3 – 4/10
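The replicate comparison was run with the DESeq package (v1.12.1). DESeq makes libraries comparable by dividing counts by per-sample size factors estimated with the median-of-ratios method. For each sample, the size factor is the median, over genes, of the ratio between the sample's count and that gene's geometric mean across samples. Genes with a zero count in any sample are left out. The Python sketch below reimplements that formula for illustration only. It is not the DESeq code, and `size_factors` is a hypothetical name.

```python
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: Sequence[Sequence[int]]) -> List[float]:
    """Median-of-ratios size factors; counts[g][s] is the read count of gene g in sample s.

    Raises statistics.StatisticsError if every gene has a zero in some sample.
    """
    n_samples = len(counts[0])
    # Genes with a zero count in any sample have a zero geometric mean and are skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - lg for row, lg in zip(usable, log_geo)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Sample 2 was sequenced twice as deeply as sample 1; the third gene has a zero and is ignored.
counts = [[10, 20], [100, 200], [0, 7], [5, 10]]
print([round(f, 4) for f in size_factors(counts)])  # [0.7071, 1.4142]
```

Taking the median rather than the mean keeps a handful of strongly sex-biased genes from distorting the normalization.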

RNA-seq: HEATMAPS (100 best P-values per stage)
[Figure: heatmaps for cercariae, schistosomula s#1, schistosomula s#2, schistosomula s#3 and adult worms, each showing male (♂1, ♂2) and female (♀1, ♀2) duplicates. Analysis: DESeq package (v1.12.1).]
S3 – 5/10
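Each heatmap shows the 100 genes with the best (smallest) DESeq p-values for one stage, and selecting them is a simple sort. The slide does not say how expression values were scaled for colouring. The per-gene z-score below is therefore an assumption, although it is a common choice for such heatmaps. `top_genes` and `row_zscores` are hypothetical helpers, and the gene IDs in the example are made up.

```python
import heapq
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def top_genes(pvalues: Dict[str, float], n: int = 100) -> List[str]:
    """Gene IDs with the n smallest p-values, best first (ties broken by gene ID)."""
    best = heapq.nsmallest(n, pvalues.items(), key=lambda kv: (kv[1], kv[0]))
    return [gene for gene, _ in best]


def row_zscores(values: Sequence[float]) -> List[float]:
    """Centre and scale one gene's expression across samples (assumed heatmap scaling)."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


pvals = {"gene_a": 0.2, "gene_b": 1e-8, "gene_c": 3e-5, "gene_d": 0.04}
print(top_genes(pvals, n=2))  # ['gene_b', 'gene_c']
print(row_zscores([2.0, 4.0, 6.0, 8.0]))  # mean 0, population SD 1
```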

Quality analysis of the de novo transcriptome
Type of match between the Cufflinks transcripts (XLOC) and the reference transcripts (Smp_ID v5.2), as classified by Cuffcompare (Cufflinks v2.2.1). Each match category is followed by its proportion of all XLOC, then by its match types with their number of XLOC and proportion of XLOC.

Smp overlap (27.86%)
  Complete match of intron chain: 6,642 (19.11%)
  Contained: 113 (0.33%)
  Potentially novel isoform (fragment): at least one splice junction is shared with a reference transcript: 2,417 (6.95%)
  Single exon transfrag overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment: 272 (0.78%)
  Generic exonic overlap with a reference transcript: 239 (0.69%)
Intronic transcript (19.03%)
  A transfrag falling entirely within a reference intron: 6,613 (19.03%)
Others (7.72%)
  Possible polymerase run-on fragment (within 2 kb of a reference transcript): 1,466 (4.22%)
  Exonic overlap with reference on the opposite strand: 1,193 (3.43%)
  An intron of the transfrag overlaps a reference intron on the opposite strand (likely due to read mapping errors): 24 (0.07%)
Intergenic (45.39%)
  Unknown, intergenic transcript: 15,776 (45.39%)
Remaining Cuffcompare match types
  Repeat, currently determined by looking at the soft-masked reference sequence and applied to transcripts where at least 50% of the bases are lower case: 0.00%
  Multiple classifications (.tracking file only): -

Source of the S. mansoni genome reference: ftp://ftp.sanger.ac.uk/pub/pathogens/Schistosoma/mansoni/Latest_assembly_annotation_others/add_utrs.gff
S3 – 6/10
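The category proportions can be recomputed from Cuffcompare's one-letter class codes. The codes and their meanings below follow the Cufflinks manual. Grouping them into the four categories follows this slide, and the counts in the example are the XLOC numbers from the table. `summarize` is a hypothetical helper. In practice the codes would be tallied from Cuffcompare output, for example the `class_code` attribute of its combined GTF file.

```python
from typing import Dict

# Cuffcompare class codes (Cufflinks manual), grouped into the slide's match categories.
CATEGORY = {
    "=": "Smp overlap",          # complete match of intron chain
    "c": "Smp overlap",          # contained
    "j": "Smp overlap",          # potentially novel isoform (shares a splice junction)
    "e": "Smp overlap",          # single-exon transfrag, possible pre-mRNA fragment
    "o": "Smp overlap",          # generic exonic overlap
    "i": "Intronic transcript",  # falls entirely within a reference intron
    "p": "Others",               # possible polymerase run-on fragment
    "x": "Others",               # exonic overlap on the opposite strand
    "s": "Others",               # intron overlaps a reference intron on the opposite strand
    "u": "Intergenic",           # unknown, intergenic transcript
}


def summarize(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of XLOC per category; codes not mapped above (e.g. 'r', '.') go to 'Unassigned'."""
    total = sum(counts.values())
    shares: Dict[str, float] = {}
    for code, n in counts.items():
        cat = CATEGORY.get(code, "Unassigned")
        shares[cat] = shares.get(cat, 0.0) + 100.0 * n / total
    return shares


slide_counts = {"=": 6642, "c": 113, "j": 2417, "e": 272, "o": 239,
                "i": 6613, "p": 1466, "x": 1193, "s": 24, "u": 15776}
print(sum(slide_counts.values()))  # 34755
print({k: round(v, 2) for k, v in summarize(slide_counts).items()})
# {'Smp overlap': 27.86, 'Intronic transcript': 19.03, 'Others': 7.72, 'Intergenic': 45.39}
```

Recomputing the four category shares from the per-type counts reproduces the percentages on the slide. This confirms that the categories are sums of the listed match types.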

ChIP-seq: QUALITY OF THE METRICS
Samples: male adults, female adults, male cercariae, female cercariae. Libraries: Unbound_1, Unbound_2, H3K27Me3_1, H3K27Me3_2. Values are listed for the 16 columns in the original column order. "Aligned (=1)" counts reads aligned exactly once, and "Aligned (>1)" counts reads aligned more than once.

Raw data: 33,119,430; 29,687,218; 39,508,805; 28,879,342; 39,423,831; 25,378,692; 36,815,594; 45,320,761; 44,606,481; 44,157,403; 34,670,844; 46,214,556; 38,753,631; 40,949,706; 51,819,705; 44,155,986
Groomed, QC passed (three values reported): 28,879,342; 45,320,761; 46,214,556
Aligned (=1): 17,685,004; 15,049,346; 22,241,047; 15,355,606; 18,556,375; 12,576,032; 17,723,235; 23,034,830; 24,152,929; 22,002,385; 20,714,025; 26,201,405; 16,732,388; 19,657,069; 25,418,099; 23,382,122
Aligned (=1) %: 53.40%; 50.69%; 56.29%; 53.17%; 47.07%; 49.55%; 48.14%; 50.83%; 54.15%; 49.83%; 59.74%; 56.70%; 43.18%; 48.00%; 49.05%; 52.95%
Aligned (>1): 14,018,376; 13,624,187; 15,493,271; 12,343,496; 18,924,945; 11,676,103; 16,812,583; 19,600,397; 18,935,354; 20,516,084; 12,542,556; 18,287,941; 20,258,268; 19,593,421; 24,058,455; 18,592,334
Aligned (>1) %: 42.33%; 45.89%; 39.21%; 42.74%; 48.00%; 46.01%; 45.67%; 43.25%; 42.45%; 46.46%; 36.18%; 39.57%; 52.27%; 47.85%; 46.43%; 42.11%
Total mapping %: 95.72%; 96.59%; 95.51%; 95.91%; 95.07%; 95.56%; 93.81%; 94.07%; 96.60%; 96.29%; 95.92%; 96.27%; 95.45%; 95.85%; 95.48%; 95.06%
Used for peak calling (three values reported): 15,000,000; 15,000,000; 12,576,032
Number of peaks: N.A.; 8,363; 6,947; 14,302; 5,697; 7,116; 11,382; 5,044; 4,719
S3 – 7/10
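Each mapping percentage in this table is a read count divided by the raw read count. The total mapping rate is the sum of the fractions aligned exactly once (=1) and more than once (>1). The check below recomputes the first column of the table. `alignment_rates` is a hypothetical helper.

```python
from typing import Dict


def alignment_rates(raw: int, aligned_once: int, aligned_multi: int) -> Dict[str, float]:
    """Percentages of raw reads aligned exactly once, more than once, and in total."""
    once = 100.0 * aligned_once / raw
    multi = 100.0 * aligned_multi / raw
    return {"aligned_once_pct": round(once, 2),
            "aligned_multi_pct": round(multi, 2),
            "total_pct": round(once + multi, 2)}


# First column of the ChIP-seq table.
print(alignment_rates(33_119_430, 17_685_004, 14_018_376))
# {'aligned_once_pct': 53.4, 'aligned_multi_pct': 42.33, 'total_pct': 95.72}
```

The total is computed from the unrounded fractions. For this reason it can differ by 0.01 from the sum of the two rounded percentages shown in the table (53.40% + 42.33% = 95.73%).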

ChIP-seq: REPLICATE CONSISTENCY OF EPICHIP ANALYSIS
[Figure: H3K27Me3 enrichment around the TSS as a function of position on the gene (bases), one profile per replicate. First panel: male cercariae (replicates 1 and 2) and female cercariae (replicates 1, 2 and 3). Second panel: male adults (replicates 1 and 2) and female adults (replicates 1, 2 and 3).]
S3 – 8/10. TSS = Transcription Start Site
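These profiles show replicate consistency visually, and the slide reports no correlation value. One simple way to quantify it would be the Pearson correlation between two replicates' enrichment values sampled at the same positions relative to the TSS. The sketch below is illustrative only and uses made-up values, not data from the study. `pearson` is a hypothetical helper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (e.g. enrichment per TSS-relative bin)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 bins)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Illustrative values only (not data from the study).
rep1 = [0.62, 0.70, 0.85, 0.80, 0.74]
rep2 = [0.60, 0.72, 0.88, 0.79, 0.71]
print(round(pearson(rep1, rep2), 3))
```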

ChIP-seq: male and female H3K27Me3 enrichment, depending on the developmental stage
[Figure: A, males; B, females. Each panel compares cercariae and adults. x axis: position on the gene (bases), from -1000 to +5000 relative to the TSS (0); y axis from 0.5 to 1.]
S3 – 9/10. TSS = Transcription Start Site
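Profiles over a window such as -1000 to +5000 bases around the TSS are typically built by placing each read in a bin according to its distance from the TSS, measured in the direction of transcription, and then averaging across genes. The slide does not describe EpiChIP's exact procedure. The sketch below is therefore a generic, hypothetical version meant only to show the strand-aware coordinate arithmetic. It assumes 500-base bins, one position per read, and raw counts averaged over genes. `tss_profile` and its parameters are illustrative names.

```python
from typing import Dict, List, Sequence, Tuple

Gene = Tuple[str, int, str]  # (chromosome, TSS coordinate, strand '+' or '-')


def tss_profile(reads: Dict[str, Sequence[int]], genes: Sequence[Gene],
                upstream: int = 1000, downstream: int = 5000,
                bin_size: int = 500) -> List[float]:
    """Mean read count per bin across genes, for positions from -upstream to +downstream of the TSS."""
    n_bins = (upstream + downstream) // bin_size
    totals = [0] * n_bins
    for chrom, tss, strand in genes:
        for pos in reads.get(chrom, ()):
            # Signed distance to the TSS, positive in the direction of transcription.
            rel = pos - tss if strand == "+" else tss - pos
            if -upstream <= rel < downstream:
                totals[(rel + upstream) // bin_size] += 1
    return [t / len(genes) for t in totals]


reads = {"chr1": [950, 1100, 1600, 3000], "chr2": [5200, 4900]}
genes = [("chr1", 1000, "+"), ("chr2", 5000, "-")]
print(tss_profile(reads, genes))
# [0.0, 1.0, 1.0, 0.5, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```

For the minus-strand gene, the read at 5200 lies upstream of the TSS at 5000. It therefore falls in the same bin (-500 to 0) as the read at 950 on the plus-strand gene.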

Statistical test of EpiChIP profile differences

Comparison between sexes (extreme difference; Z value; p):
  Cercariae: 76.50%; Z = 41.91; p < 0.001
  Adults: 10.70%; Z = 5.862; p < 0.001
Comparison between stages (extreme difference; Z value; p):
  Males: 56.30%; Z = 30.811; p < 0.001
  Females: 25.60%; Z = 13.997; p < 0.001

All pairwise comparisons were significant (two-sample Kolmogorov-Smirnov tests; p < 0.001). The extreme differences given by these tests show that:
(i) the difference between the adult male and adult female distributions is small (maximum difference 10.7%) compared with the cercarial stage (maximum difference 76.5%);
(ii) the change in chromatin structure from cercariae to adults is about twice as large in males as in females (maximum difference 56.3% for males vs 25.6% for females).
S3 – 10/10
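The Kolmogorov-Smirnov extreme difference D is the largest vertical gap between the two empirical cumulative distribution functions. The reported Z values appear consistent with the common definition Z = D·sqrt(nm/(n+m)) for sample sizes n and m, since Z/D is about 54.7 in all four rows. The p-value then follows from the asymptotic Kolmogorov distribution. The sketch below illustrates these formulas in plain Python and is not the software used for the analysis. `ks_two_sample` and `kolmogorov_sf` are hypothetical names.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (D, Z): the maximum ECDF difference and Z = D * sqrt(n*m / (n+m))."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in set(xs) | set(ys):  # the ECDF gap can only change at observed values
        gap = abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        d = max(d, gap)
    return d, d * math.sqrt(n * m / (n + m))


def kolmogorov_sf(z: float) -> float:
    """Asymptotic p-value P(K > z) of the Kolmogorov distribution."""
    if z <= 0:
        return 1.0
    if z < 1.18:
        # Small z: CDF series sqrt(2*pi)/z * sum exp(-(2k-1)^2 * pi^2 / (8 z^2)).
        t = math.pi ** 2 / (8 * z * z)
        cdf = math.sqrt(2 * math.pi) / z * sum(math.exp(-(2 * k - 1) ** 2 * t) for k in range(1, 10))
        return 1.0 - cdf
    # Large z: 2 * sum (-1)^(k-1) exp(-2 k^2 z^2), which converges quickly here.
    return 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * z * z) for k in range(1, 10))


d, z = ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6])
print(d, round(z, 4), round(kolmogorov_sf(z), 3))
print(kolmogorov_sf(5.862) < 0.001)  # True: Z reported for adult males vs adult females
```

The asymptotic p-value is only reliable for large samples, which is the case for genome-wide enrichment profiles but not for the four-point toy example above.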