June 11, 2013 Intro to Bioinformatics – Assembling a Transcriptome Tom Doak Carrie Ganote National Center for Genome Analysis Support.


1 June 11, 2013 Intro to Bioinformatics – Assembling a Transcriptome Tom Doak Carrie Ganote National Center for Genome Analysis Support

2 National Center for Genome Analysis Support: http://ncgas.org Summary: What does the raw data look like? What is “sequence quality”? What needs to be done before assembly? What is assembly anyway?

3 Hot out of the oven! The most common sequence format for raw “reads” is called FASTQ. It has 4 lines per sequence: an identifier line starting with “@”, the called bases, a separator line starting with “+”, and a quality string with one character per base. There are many ways of getting to this point. Chemistry, technique, machinery, and approach can differ, but all must call bases and qualities.
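
To make the 4-line layout concrete, here is a minimal Python sketch of a FASTQ reader. It assumes every record occupies exactly four lines (no wrapped sequences); the name `read_fastq` and the example read are made up for illustration and are not part of the original slides.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # identifier line, without the leading "@"
    sequence: str  # called bases
    quality: str   # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].strip(), sequence, quality)


# A made-up single-read example, not real sequencing data.
example = "@read1\nGATTACA\n+\nIIIIHH#\n"
records = list(read_fastq(io.StringIO(example)))
print(records[0].name, records[0].sequence, records[0].quality)
```

For real projects an established parser (for example Biopython's `SeqIO`) is usually a safer choice than a hand-rolled reader.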

4 What is Sequence Quality? The quality “score” is assigned by the sequencing machine as it reads each base. It is a rough estimate of how ambiguous the signal is, that is, how “sure” the machine is that it is labeling the base correctly.
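
The slide does not name the scale, but quality characters in FASTQ files are commonly Phred scores, where a score Q corresponds to an estimated error probability of 10^(-Q/10). The sketch below assumes the widely used Phred+33 ASCII encoding (older Illumina files used an offset of 64); the function names are illustrative.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Estimated probability that a base with Phred score q was miscalled."""
    return 10 ** (-q / 10)


scores = phred_scores("II5#")
for q in scores:
    print(q, error_probability(q))
```

Under this encoding, `I` decodes to 40 (about 1 error in 10,000 calls) and `#` decodes to 2 (a very unreliable call).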

5 What needs to be done before assembly? Quality control: assess the state of the reads using FastQC (demo), then trim and shape the reads based on your assessment using Trimmomatic.
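
As a rough illustration of what quality trimming does, the sketch below scans a read from its start and cuts it at the first window whose mean quality falls below a threshold. This is a simplified stand-in written for this transcript, not Trimmomatic's actual implementation; the window size, threshold, and Phred+33 offset are assumed defaults.

```python
from typing import Tuple


def sliding_window_trim(
    sequence: str,
    quality: str,
    window: int = 4,
    min_mean_q: float = 15,
    offset: int = 33,
) -> Tuple[str, str]:
    """Cut the read at the first window whose mean quality is too low."""
    scores = [ord(ch) - offset for ch in quality]
    # Reads shorter than the window are returned unchanged in this sketch.
    for start in range(max(len(scores) - window + 1, 0)):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_mean_q:
            return sequence[:start], quality[:start]
    return sequence, quality


# Made-up read: six high-quality bases followed by four poor ones.
trimmed = sliding_window_trim("ACGTACGTAC", "IIIIII####")
print(trimmed)
```

Note that the cut lands at the start of the failing window, so one high-quality base can be lost along with the poor tail; real tools differ in how they handle that boundary.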

6 What is assembly anyway? An assembler attempts to reconstruct long strings of nucleotides from the millions of short pieces it is given (ideally, one string per mRNA transcript). There are many approaches to this puzzle problem.
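
One of the simplest approaches to the puzzle can be sketched as a greedy merge: repeatedly join the two pieces with the longest suffix-to-prefix overlap. The toy below is only meant to show the idea on error-free reads; it is not how Trinity or any production assembler works, and the reads are invented.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge the best-overlapping pair until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up overlapping reads from one short "transcript".
assembled = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(assembled)
```

Comparing every pair of reads scales quadratically, which is one reason assemblers for millions of reads rely on more compact graph representations instead.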

7 What is assembly anyway? We will explore the Trinity de novo assembler. De novo means “from scratch”: without a reference.

8 What is assembly anyway? Reference-based approaches instead align raw reads to a reference genome or transcriptome, using an aligner (e.g., TopHat or Bowtie).

9 It finished! We’re done, right? An assembler solves the computational problem of putting together a puzzle from tiny pieces. The output of the assembler is a guess, and we don’t know how accurate it is. We could look at: basic stats of the assembled “contigs” (number of contigs vs. expected number; N50, a length-weighted median contig length; average length; max length); and a check of the contigs against known genes with BLAST.
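
The basic stats listed above are simple to compute from a list of contig lengths. The sketch below uses the usual definition of N50: the length of the contig at which the running total, taken from longest to shortest, first reaches half of all assembled bases. The contig lengths are made up.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Contig length at which the cumulative sum reaches half the total."""
    if not lengths:
        raise ValueError("no contigs")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety


def assembly_stats(lengths: List[int]) -> Dict[str, float]:
    """Number of contigs, N50, mean length, and max length."""
    return {
        "contigs": len(lengths),
        "n50": n50(lengths),
        "mean": sum(lengths) / len(lengths),
        "max": max(lengths),
    }


stats = assembly_stats([500, 400, 300, 200, 100])
print(stats)
```

Here the total is 1500 bases, and the two longest contigs (500 + 400) already cover half of it, so N50 is 400 while the plain average is 300.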

10 What could go wrong? If the assembly came out poorly, handling the data differently could solve the problem: use a more or less stringent quality cutoff; clean the data of “primers”; assemble with a different program or parameters; normalize the data, which removes redundant reads from the set, making the dataset much smaller and the job easier on the assembler.
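
To show what “removing redundant reads” can mean in practice, here is a toy normalization pass: a read is kept only if the k-mers it contains have not already been seen often. This is a simplified sketch in the spirit of k-mer based normalization, not the method of any particular tool; `k` and `max_coverage` are arbitrary illustrative values.

```python
from collections import Counter
from statistics import median
from typing import List


def normalize(reads: List[str], k: int = 4, max_coverage: int = 2) -> List[str]:
    """Keep a read only while the median count of its k-mers is low."""
    counts: Counter = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read shorter than k: skipped in this sketch
        if median(counts[kmer] for kmer in kmers) < max_coverage:
            kept.append(read)
            counts.update(kmers)  # only kept reads contribute to counts
    return kept


# Five copies of one made-up read plus one distinct read.
reads = ["ACGTACGT"] * 5 + ["TTGCAAGC"]
kept = normalize(reads)
print(len(reads), "->", len(kept))
```

The heavily repeated read is cut down to two copies while the rare read survives, which is the point of normalization: shrink abundant transcripts without losing scarce ones.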

11 What could go wrong? Sometimes the problem could be with the biological samples. More sequence will usually help. Possible causes: genetic hiccups for the assembler (repeats, related genes); sample prep that was incorrect or poorly suited; low “coverage”. Coverage is like layers of paint on a stubborn surface: too few can leave holes, or “gaps”.
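
The paint analogy can be made concrete by counting how many reads cover each position of a transcript and reporting stretches with no coverage. The sketch assumes the read positions are already known (for instance from an alignment); the transcript length and placements are invented.

```python
from typing import List, Tuple


def depth_per_position(length: int, placements: List[Tuple[int, int]]) -> List[int]:
    """Count reads covering each position; placements are (start, read_length)."""
    depth = [0] * length
    for start, read_len in placements:
        for pos in range(start, min(start + read_len, length)):
            depth[pos] += 1
    return depth


def find_gaps(depth: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals with zero coverage."""
    gaps, start = [], None
    for pos, d in enumerate(depth):
        if d == 0 and start is None:
            start = pos
        elif d > 0 and start is not None:
            gaps.append((start, pos))
            start = None
    if start is not None:
        gaps.append((start, len(depth)))
    return gaps


# A made-up 10-base transcript covered by three short reads.
depth = depth_per_position(10, [(0, 4), (2, 4), (7, 3)])
print(depth, find_gaps(depth), sum(depth) / len(depth))
```

Even though the mean coverage here is above 1, position 6 is never covered, so an assembler would be left with a gap there.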

12 Fin. Thanks for watching! Questions and comments: email help@ncgas.org

