June 11, 2013 Intro to Bioinformatics – Assembling a Transcriptome Tom Doak Carrie Ganote National Center for Genome Analysis Support.

Presentation transcript:

June 11, 2013 Intro to Bioinformatics – Assembling a Transcriptome Tom Doak Carrie Ganote National Center for Genome Analysis Support

Summary
- What does the raw data look like?
- What is “sequence quality”?
- What needs to be done before assembly?
- What is assembly anyway?

Hot out of the oven!
The most common format for raw sequence “reads” is called FASTQ. It has 4 lines per sequence: an identifier line beginning with “@”, the nucleotide sequence, a separator line beginning with “+”, and a quality string with one character per base. There are many ways of getting to this point: chemistry, technique, machinery, and approach can all differ, but every platform must call bases and assign qualities.
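To make the four-line layout concrete, here is a minimal sketch of a FASTQ reader. The record type `FastqRecord`, the function `read_fastq`, and the two example reads are made up for illustration; the sketch assumes the simple, unwrapped form of FASTQ in which every record occupies exactly four lines.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # identifier line without the leading "@"
    sequence: str  # called bases
    quality: str   # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per group of four lines (assumes unwrapped FASTQ)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Two invented reads, used only to exercise the parser.
EXAMPLE = "@read1\nGATTACA\n+\nIIIIHH#\n@read2\nACGT\n+\n!!II\n"

if __name__ == "__main__":
    for record in read_fastq(io.StringIO(EXAMPLE)):
        print(record.name, record.sequence, record.quality)
```

For real data, an established parser is usually a safer choice than a hand-written one, since it will also cope with compressed files and format variants.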

What is Sequence Quality?
The quality “score” is assigned by the sequencing machine as it reads each base. It is a rough estimate of how ambiguous the signal is, that is, how “sure” the machine is that it has labeled the base correctly.
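Quality scores are conventionally reported on the Phred scale, where a score Q corresponds to an estimated error probability of 10^(-Q/10). The sketch below decodes a quality string into scores and probabilities. It assumes the Phred+33 encoding (the character's ASCII code minus 33); some older Illumina files use an offset of 64 instead, so the `offset` parameter is an assumption to check against your own data. The function names are illustrative.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to integer Phred scores.

    Assumes Phred+33 by default; pass offset=64 for older Illumina files.
    """
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Estimated probability that a base with Phred score q is wrong."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    for ch, q in zip("I5#", phred_scores("I5#")):
        print(ch, q, error_probability(q))
```

On this scale a score of 20 means roughly a 1 in 100 chance that the base call is wrong, and a score of 40 means roughly 1 in 10,000.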

What needs to be done before assembly?
Quality control:
- Assess the state of the reads using FastQC (demo).
- Trim and shape the reads based on your assessment using Trimmomatic.
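As an illustration of what quality trimming does, the sketch below scans a read from the 5' end with a fixed-size window and cuts the read where the average quality in the window first drops below a threshold. It is loosely modelled on the sliding-window idea and is not Trimmomatic's actual code; the function name, the default window of 4, the threshold of 15, and the Phred+33 offset are all assumptions chosen for the example.

```python
from typing import Tuple


def sliding_window_trim(
    sequence: str,
    quality: str,
    window: int = 4,
    min_mean_q: float = 15.0,
    offset: int = 33,
) -> Tuple[str, str]:
    """Cut the read at the start of the first low-quality window.

    Reads shorter than the window are returned unchanged.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    scores = [ord(ch) - offset for ch in quality]
    keep = len(scores)
    for start in range(0, max(len(scores) - window + 1, 0)):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_mean_q:
            keep = start  # drop this window and everything after it
            break
    return sequence[:keep], quality[:keep]


if __name__ == "__main__":
    # Invented read: six high-quality bases followed by four poor ones.
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```

After trimming, reads that end up very short are usually discarded as well, since they carry little information for the assembler.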

What is assembly anyway?
An assembler attempts to reconstruct long strings of nucleotides from the millions of short pieces it is given (ideally, one string per mRNA transcript). There are many approaches to this puzzle problem.
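To give a feel for the puzzle, here is a toy greedy assembler: it repeatedly finds the pair of sequences with the longest suffix-to-prefix overlap and merges them until no overlaps remain. This is only a teaching sketch under strong assumptions (error-free reads, a single strand, no repeats). It is one of the simplest possible approaches and is not how production assemblers work. The function names and example reads are made up.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Merge the best-overlapping pair until no pair overlaps enough."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                size = overlap(a, b, min_overlap)
                if size > best_len:
                    best_len, best_i, best_j = size, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three invented reads that tile one short "transcript".
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

With millions of reads, sequencing errors, and repeated regions, comparing every pair like this quickly becomes both too slow and too error-prone, which is why real assemblers use more elaborate graph-based strategies.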

What is assembly anyway?
We will explore the Trinity de novo assembler. De novo means “from scratch”, that is, without a reference.

What is assembly anyway?
Other approaches rely on a reference: they align the raw reads to a reference genome or transcriptome (e.g., with TopHat or Bowtie) rather than assembling from scratch.

It finished! We’re done, right?
An assembler solves a computational problem: putting together a puzzle from tiny pieces. The output of the assembler is a guess, and we don’t know how accurate it is. We could look at:
- Basic stats of the assembly (“contigs”):
  - Number of contigs vs. expected number
  - N50: a length-weighted median; contigs of this length or longer hold at least half of the assembled bases
  - Average length
  - Max length
- Check contigs against known genes with BLAST
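The basic statistics listed above are simple to compute from the contig lengths. The sketch below is one straightforward way to do it; the function names and the example lengths are invented for illustration.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the bases."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def assembly_stats(contigs: List[str]) -> Dict[str, float]:
    """Contig count, total bases, mean length, max length, and N50."""
    lengths = [len(c) for c in contigs]
    if not lengths:
        raise ValueError("no contigs")
    return {
        "count": len(lengths),
        "total_bases": sum(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Invented contig lengths: total 290, so half is 145; 80 + 70 = 150.
    print(n50([80, 70, 50, 40, 30, 20]))
```

Because N50 is weighted by length, a handful of long contigs can give a high N50 even when most contigs are short, so it is best read alongside the contig count and the BLAST check rather than on its own.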

What could go wrong?
If the assembly came out poorly, handling the data differently could solve the problem:
- Use a more or less stringent quality cutoff
- Clean the data of “primers”
- Assemble with a different program or different parameters
- Normalize the data: this removes redundant reads from the set, making the dataset much smaller and the job easier for the assembler
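One way to picture normalization is the sketch below, written in the spirit of k-mer based “digital normalization”: a read is kept only if the k-mers it contains have not already been seen often in the reads kept so far. This is a simplified illustration and not the implementation used by any particular tool; real normalizers also deal with reverse complements, paired reads, and sequencing errors. The function names, `k=5`, and `max_coverage=2` are assumptions chosen for a tiny example.

```python
from collections import Counter
from statistics import median
from typing import List


def kmers(sequence: str, k: int) -> List[str]:
    """All overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def normalize_reads(reads: List[str], k: int = 5, max_coverage: int = 2) -> List[str]:
    """Keep a read only while its median k-mer count is below max_coverage."""
    counts: Counter = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k
        if median(counts[km] for km in read_kmers) < max_coverage:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to the counts
    return kept


if __name__ == "__main__":
    # Invented data: one highly redundant read plus one rare read.
    reads = ["ACGTACGGT"] * 5 + ["TTGCATGCA"]
    print(normalize_reads(reads))
```

The redundant read is kept only until its k-mers reach the cutoff, while the rare read survives, which is the effect normalization aims for with highly expressed versus lowly expressed transcripts.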

What could go wrong?
Sometimes the problem lies with the biological samples. More sequence will usually help.
- Genetic hiccups for the assembler: repeats, related genes
- Sample prep was incorrect or poorly suited
- Low “coverage”: like layers of paint on a stubborn surface, too few can leave holes, or “gaps”
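The paint analogy can be made concrete by counting how many reads cover each position of a transcript. The sketch below assumes we already know where each read sits, given as hypothetical `(start, read_length)` pairs with 0-based starts; it reports per-position depth, the average coverage, and any uncovered stretches. All names and numbers are invented for illustration.

```python
from typing import List, Tuple


def depth_per_position(length: int, placements: List[Tuple[int, int]]) -> List[int]:
    """Number of reads covering each position of a sequence of given length."""
    depth = [0] * length
    for start, read_len in placements:
        for pos in range(max(start, 0), min(start + read_len, length)):
            depth[pos] += 1
    return depth


def mean_coverage(depth: List[int]) -> float:
    """Average number of reads per position."""
    return sum(depth) / len(depth)


def find_gaps(depth: List[int], min_depth: int = 1) -> List[Tuple[int, int]]:
    """Half-open (start, end) intervals where depth falls below min_depth."""
    gaps = []
    gap_start = None
    for pos, d in enumerate(depth):
        if d < min_depth:
            if gap_start is None:
                gap_start = pos
        elif gap_start is not None:
            gaps.append((gap_start, pos))
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, len(depth)))
    return gaps


if __name__ == "__main__":
    # Three invented reads on a 20-base transcript.
    depth = depth_per_position(20, [(0, 8), (4, 8), (14, 6)])
    print(depth, mean_coverage(depth), find_gaps(depth))
```

Note that the average can look acceptable while individual positions still have no reads at all, which is why gaps matter more to the assembler than the mean.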

Fin
Thanks for watching! Questions and comments: