Supplementary Figure S1. Exons shorter than a read length have few or no reads aligned. The gene at LOC_Os02g08040 contains exons shorter than 50 nt. Because these exons are shorter than a single read, full-length reads from spliced transcripts cannot align to the genome at the location of the exons. Junction alignments from TopHat, however, allow the exons to be identified. The exons inside the red boxes are shorter than 50 nt and cannot be detected by Tiling Assembly from read alignments alone. The shade of a junction in the figure indicates the number of junction alignments at that position, with black bars indicating many and light grey bars indicating fewer.
Figure label: LOC_Os02g08480
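The idea that two nearby junctions bracket a short exon can be sketched as follows. This is a minimal illustration, not the Tiling Assembly implementation: the function name, the (intron_start, intron_end) tuple layout, the 1-based inclusive coordinates, and the default read length of 50 are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical layout: (intron_start, intron_end), 1-based inclusive.
Junction = Tuple[int, int]


def short_exons_from_junctions(
    junctions: List[Junction], read_length: int = 50
) -> List[Tuple[int, int]]:
    """Return candidate exons shorter than read_length that lie between
    the end of one intron and the start of a downstream intron."""
    ordered = sorted(junctions)
    exons = set()
    for i, (_, upstream_end) in enumerate(ordered):
        for downstream_start, _ in ordered[i + 1:]:
            exon_start = upstream_end + 1
            exon_end = downstream_start - 1
            length = exon_end - exon_start + 1
            # Keep only positive-length gaps that a single read cannot fit inside.
            if 0 < length < read_length:
                exons.add((exon_start, exon_end))
    return sorted(exons)


if __name__ == "__main__":
    toy = [(100, 200), (231, 400), (600, 700)]
    print(short_exons_from_junctions(toy))
```

In the toy input, the 30 nt gap between the first two introns is reported as a candidate exon, while the 199 nt gap between the second and third is not, because a 50 nt read could align inside it directly.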

Supplementary Figure S2. Initial steps of Tiling Assembly show genes with intron retention or noise as single-exon genes. Small numbers of reads aligning across a junction lead to identification of multiple exons as a single exon. The gene at LOC_Os02g08440 was initially identified as a single-exon gene because noise reads aligned to the introns (red boxes). Where a junction spans a region of low read coverage, Tiling Assembly identifies that region as an intron.
Figure labels: LOC_Os02g WRKY71; tracks MSU, TA

Supplementary Figure S3. Junction boundaries were used to identify exon boundaries and eliminate noise reads. Occasionally, noise reads align across a junction or reads overlap the junction. The boundaries specified by TopHat junction alignments were used to fine-tune exon boundaries to within one nucleotide. The portion of the upper figure surrounded by the red box is magnified in the lower figure to better show the exon boundaries.
Figure labels: Junction; Junction Boundary; LOC_Os05g WRKY70
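A rough sketch of how coverage-derived exon edges might be snapped to junction boundaries is given below. It is an assumption-laden illustration rather than the authors' code: the function name, the search window of 10 nt, and the 1-based inclusive (intron_start, intron_end) junction layout are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def snap_exon_to_junctions(
    exon: Interval, junctions: List[Interval], window: int = 10
) -> Interval:
    """Move each exon edge to the nearest junction boundary within `window` nt.

    An exon start is snapped to (intron_end + 1) and an exon end to
    (intron_start - 1). Edges with no nearby junction are left unchanged.
    """
    start, end = exon
    # Candidate exon starts: the first base after each intron.
    start_candidates = [
        j_end + 1 for _, j_end in junctions if abs(j_end + 1 - start) <= window
    ]
    # Candidate exon ends: the last base before each intron.
    end_candidates = [
        j_start - 1 for j_start, _ in junctions if abs(j_start - 1 - end) <= window
    ]
    if start_candidates:
        start = min(start_candidates, key=lambda p: abs(p - start))
    if end_candidates:
        end = min(end_candidates, key=lambda p: abs(p - end))
    return (start, end)


if __name__ == "__main__":
    # Noise reads extend the coverage-based exon a few bases into both introns.
    print(snap_exon_to_junctions((198, 305), [(100, 200), (301, 400)]))
```

Here the exon inferred from coverage, 198 to 305, is trimmed to 201 to 300, the interval bounded exactly by the two flanking introns.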

Supplementary Figure S4. Similar sequences can lead to invalid junction mapping. When two regions are highly similar to each other, junction alignments may erroneously connect two genes, as is seen with LOC_Os01g01800 and LOC_Os01g. To prevent two genes from being erroneously merged on the basis of these junction alignments, Tiling Assembly allows the user to specify a maximum length for a junction that skips exons.
Figure labels: False Junction; Valid Junction; Regions with high similarity; LOC_Os01g01800; LOC_Os01g01830
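One way such a length limit could work is sketched below. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, coordinates are 1-based inclusive, and a junction is taken to "skip" an exon when at least one known exon lies entirely inside the intron it defines.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def filter_exon_skipping_junctions(
    junctions: List[Interval], exons: List[Interval], max_skip_length: int
) -> List[Interval]:
    """Drop junctions that skip an exon and are longer than max_skip_length."""
    kept = []
    for j_start, j_end in junctions:
        # The junction skips an exon if that exon sits strictly inside it.
        skips_exon = any(
            j_start < e_start and e_end < j_end for e_start, e_end in exons
        )
        length = j_end - j_start + 1
        if skips_exon and length > max_skip_length:
            continue  # likely a false junction between similar regions
        kept.append((j_start, j_end))
    return kept


if __name__ == "__main__":
    exons = [(1, 100), (201, 300), (5001, 5100)]
    junctions = [(101, 200), (101, 5000), (301, 5000)]
    print(filter_exon_skipping_junctions(junctions, exons, max_skip_length=1000))
```

In the toy data, only the junction from 101 to 5000 is removed: it is long and jumps over the exon at 201 to 300. The equally long junction from 301 to 5000 is kept because it skips no exon.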

Supplementary Figure S5. OLego identified more junctions than TopHat. Of the 158,314 junctions identified by OLego, 124,594 junctions (78.7%) matched identically to a junction identified by TopHat. Of the remaining 33,720 junctions identified by OLego, 71.3% were determined from a single read.
Venn diagram: OLego junctions, 158,314 total; TopHat junctions, 138,986 total; OLego only, 33,720; shared, 124,594; TopHat only, 14,392
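The counts in this caption can be cross-checked with simple set arithmetic. The sketch below assumes, for illustration only, that a junction is keyed by (chromosome, intron_start, intron_end) and that "matched identically" means equality of that key; the function name is hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical junction key: (chromosome, intron_start, intron_end).
Junction = Tuple[str, int, int]


def overlap_summary(a: Set[Junction], b: Set[Junction]) -> Dict[str, float]:
    """Count junctions shared between two callers and unique to each."""
    shared = a & b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "pct_a_shared": round(100 * len(shared) / len(a), 1) if a else 0.0,
    }


# Consistency check on the totals reported in the caption.
olego_total, tophat_total, shared_count = 158_314, 138_986, 124_594
olego_only = olego_total - shared_count    # junctions found only by OLego
tophat_only = tophat_total - shared_count  # junctions found only by TopHat
pct_olego_shared = round(100 * shared_count / olego_total, 1)

if __name__ == "__main__":
    print(olego_only, tophat_only, pct_olego_shared)
```

The derived values agree with the figure: 33,720 OLego-only junctions, 14,392 TopHat-only junctions, and 78.7% of OLego junctions shared with TopHat.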

Supplementary Figure S6. Low read alignment leads to assembly errors by both TA and Cufflinks. Genes with few reads aligning produced assembly errors in both TA and Cufflinks, compared to the MSU annotation. These errors included (A) gaps and (B) missing junctions.
Figure labels: panels A and B; tracks MSU, Cufflinks, TA; gaps marked "gap"

Supplementary Figure S7. Cufflinks ignores junctions. In many cases, such as the one presented, Cufflinks was found to ignore junction alignments when identifying genes. In this case, Cufflinks identified three genes, although all three were joined by junction alignments. The shade of a junction in the figure indicates the number of junction alignments at that position, with black bars indicating many and light grey bars indicating fewer.
Figure labels: Ignored Junctions; Cufflinks; Test Gene

Supplementary Figure S8. Cufflinks identifies intronic noise reads as genes. Small numbers of reads, likely present due to noise or intron retention, align to intronic areas of a gene. Cufflinks often identifies these regions as genes separate from the gene containing the exons flanking the intron, as in the example shown. The shade of a junction in the figure indicates the number of junction alignments at that position, with black bars indicating many and light grey bars indicating fewer.
Figure label: Cufflinks Genes

Supplementary Figure S9. Tiling Assembly can detect exons with an expression as low as 50 RPKE. To determine the point at which Tiling Assembly fails to correctly identify exons, the reads aligning to LOC_Os01g01010 were iteratively reduced and Tiling Assembly was rerun on the gene at each step. All exons of the gene were correctly identified at expression levels of 50 RPKE, as can be seen with exons e3 and e4 in the red boxes. Below 50 RPKE, exons began to be misidentified. The user can specify the minimum expression level required for exon identification by Tiling Assembly.
Figure labels: MSU exons; TA exons; LOC_Os01g01010

Supplementary Figure S10. Introns retained at less than 50% were recognized as introns by Tiling Assembly. To identify the most common isoform of a gene where intron retention is a possibility, a 50% read-depth threshold was used. TopHat junction alignments were recognized as introns if the read depth across the junction was less than 50% of the read depth of the exons on either side of the junction. This threshold is user-adjustable.
Figure labels: Intron was recognized; Intron was not recognized; LOC_Os06g09560; tracks TA, MSU
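The threshold rule can be written as a one-line comparison. The sketch below is illustrative only: the function and parameter names are hypothetical, depths are assumed to be mean read depths over each region, and "the exons on either side" is interpreted here as requiring the condition to hold against both flanking exons. The source does not say whether both flanks, one flank, or their average is used.

```python
def is_intron(
    junction_region_depth: float,
    left_exon_depth: float,
    right_exon_depth: float,
    threshold: float = 0.5,
) -> bool:
    """Call the region spanned by a junction an intron when its read depth
    is below `threshold` times the depth of BOTH flanking exons (assumption)."""
    return (
        junction_region_depth < threshold * left_exon_depth
        and junction_region_depth < threshold * right_exon_depth
    )


if __name__ == "__main__":
    # Depth 4 is below half of both flanks (10 and 20): treated as an intron.
    print(is_intron(4, 10, 20))
    # Depth 6 is not below half of the left flank (10): treated as retained.
    print(is_intron(6, 10, 20))
```

Lowering `threshold` makes the call stricter, so more regions with partial coverage are kept as exonic (retained) sequence; raising it splits more regions out as introns.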

Supplementary Figure S11. Differences between Tiling Assembly and FL-cDNAs may be attributed to alternative splicing. A large number of FL-cDNAs agreed with Tiling Assembly-identified genes. However, there were some areas where the exon number differed between a Tiling Assembly gene and its corresponding FL-cDNA. The red arrows in the images indicate where (A) Tiling Assembly has an extra exon, (B) Tiling Assembly is missing an exon, (C) Tiling Assembly has an extra intron, and (D) Tiling Assembly is missing an intron.
Figure labels: panels A to D; tracks Kikuchi, TA