Titus Brown Qingpeng Zhang John Blischak Welcome!.

Slides:



Advertisements
Similar presentations
Microarray statistical validation and functional annotation
Advertisements

Capturing the chicken transcriptome with PacBio long read RNA-seq data OR Chicken in awesome sauce: a recipe for new transcript identification Gladstone.
Marius Nicolae Computer Science and Engineering Department
RNA-Seq based discovery and reconstruction of unannotated transcripts
Walk-thru of CAGE exercise Also at /tag_analysis/ /tag_analysis/
Transcriptome Sequencing with Reference
Peter Tsai Bioinformatics Institute, University of Auckland
Next-generation sequencing
Transcriptome Assembly and Quantification from Ion Torrent RNA-Seq Data Alex Zelikovsky Department of Computer Science Georgia State University Joint work.
Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities By Kevin Chen, Lior Pachter PLoS Computational Biology, 2005 David Kelley.
Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520
Transcriptomics Jim Noonan GENE 760.
Physical Mapping I CIS 667 February 26, Physical Mapping A physical map of a piece of DNA tells us the location of certain markers  A marker is.
CS-3013 & CS-502, Summer 2006 Memory Management1 CS-3013 & CS-502 Summer 2006.
Why microarrays in a bioinformatics class? Design of chips Quantitation of signals Integration of the data Extraction of groups of genes with linked expression.
Genome Annotation BCB 660 October 20, From Carson Holt.
mRNA-Seq: methods and applications
Software for Robust Transcript Discovery and Quantification from RNA-Seq Ion Mandoiu, Alex Zelikovsky, Serghei Mangul.
Li and Dewey BMC Bioinformatics 2011, 12:323
Todd J. Treangen, Steven L. Salzberg
RNAseq analyses -- methods
Variables: – T(p) - set of candidate transcripts on which pe read p can be mapped within 1 std. dev. – y(t) -1 if a candidate transcript t is selected,
Developing a Novel Assay for High Resolution Melting analysis Scanning “YFG” Jason McKinney Scientist.
Next Generation Sequencing and its data analysis challenges Background Alignment and Assembly Applications Genome Epigenome Transcriptome.
Adrian Caciula Department of Computer Science Georgia State University Joint work with Serghei Mangul (UCLA) Ion Mandoiu (UCONN) Alex Zelikovsky (GSU)
The iPlant Collaborative
C. Titus Brown Asst Prof, CSE and Micro Michigan State University
RNA-Seq Assembly 转录组拼接 唐海宝 基因组与生物技术研究中心 2013 年 11 月 23 日.
Tag profiling is dead... October 2009 Claudia Voelckel Patrick Biggs...long live mRNA-Seq!
E XOME SEQUENCING AND COMPLEX DISEASE : practical aspects of rare variant association studies Alice Bouchoms Amaury Vanvinckenroye Maxime Legrand 1.
1 CS161 Introduction to Computer Science Topic #9.
RNA-Seq Primer Understanding the RNA-Seq evidence tracks on the GEP UCSC Genome Browser Wilson Leung08/2014.
UK NGS Sequencing Update July 2009 Dr Gerard Bishop - Division of Biology Dr Sarah Butcher – Centre for Bioinformatics.
Introduction to RNAseq
De novo assembly validation
No reference available
Manuel Holtgrewe Algorithmic Bioinformatics, Department of Mathematics and Computer Science PMSB Project: RNA-Seq Read Simulation.
Genome Annotation Assessment in Drosophila melanogaster by Reese, M. G., et al. Summary by: Joe Reardon Swathi Appachi Max Masnick Summary of.
….. The cloud The cluster…... What is “the cloud”? 1.Many computers “in the sky” 2.A service “in the sky” 3.Sometimes #1 and #2.
Gene Technologies and Human ApplicationsSection 3 Section 3: Gene Technologies in Detail Preview Bellringer Key Ideas Basic Tools for Genetic Manipulation.
An Integer Programming Approach to Novel Transcript Reconstruction from Paired-End RNA-Seq Reads Serghei Mangul Department of Computer Science Georgia.
Biotechnology and Bioinformatics: Bioinformatics Essential Idea: Bioinformatics is the use of computers to analyze sequence data in biological research.
CyVerse Workshop Transcriptome Assembly. Overview of work RNA-Seq without a reference genome Generate Sequence QC and Processing Transcriptome Assembly.
Canadian Bioinformatics Workshops
Discussion on Genomic/Metagenomic Data for ANGUS Course Adina Howe.
Canadian Bioinformatics Workshops
Interpreting exomes and genomes: a beginner’s guide
Lesson: Sequence processing
RNA Quantitation from RNAseq Data
SNP Detection Congtam Pham 2/24/04 Dr. Marth’s Class.
Quality Control & Preprocessing of Metagenomic Data
Gene expression from RNA-Seq
RNA-Seq analysis in R (Bioconductor)
Metafast High-throughput tool for metagenome comparison
Transcriptomics II De novo assembly
S1 Supporting information Bioinformatic workflow and quality of the metrics Number of slides: 10.
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
 The human genome contains approximately genes.  At any given moment, each of our cells has some combination of these genes turned on & others.
Kallisto: near-optimal RNA seq quantification tool
Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome.
Transcriptome Assembly
Do You Want to Build a Transcriptome?
Algorithm Discovery and Design
Maximize read usage through mapping strategies
Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs BMI/CS Spring 2019 Colin Dewey
Assessing changes in data – Part 2, Differential Expression with DESeq2
Transcript length distribution resulting from different assemblies of the embryo samples across the three technologies (HiSeq, MiSeq, and PacBio). Transcript.
Sequence Analysis - RNA-Seq 2
Schematic representation of a transcriptomic evaluation approach.
Presentation transcript:

Titus Brown Qingpeng Zhang John Blischak Welcome!

Goals! Drive-by introduction to: Cloud computing Basic Illumina sequence quality evaluation & control De novo mRNAseq assembly A (our) “protocol” for mRNAseq analysis => diff expr Variant calling protocol, too This will let you explore other online resources to your heart’s content, we hope! Other protocols & tutorials khmer-protocols ged.msu.edu/angus/tutorials-2013

Our goals: Answer your questions! Help you figure out what questions to ask! Point to further materials!

Structure of day Start by logging into cloud machines, grabbing data, running analyses. Coffee break at 10:30 After lunch, check out variant calling. Coffee break at 2:30 Some open time, if possible Starting your own cloud machine (costs $$, but: freedom!).

Strategy Run stuff! Talk while it’s running. Ask questions whenever!

Technology! Stickies Minute cards …Dropbox?

Etherpad?

Why… the cloud? Rental computers for small and BIG problems. Completely reproducible; independent of institution; so I can write tutorials! Once you get something working in the cloud, your local sysadmins can often help you get it running at your institution. If not, well, you can always pay $$. (How much? est $150 compute/$1000 mRNAseq sample)

The challenges of non-model transcriptomics Missing or low quality genome reference. Evolutionarily distant. Most extant computational tools focus on model organisms – Assume low polymorphism (internal variation) Assume reference genome Assume somewhat reliable functional annotation More significant compute infrastructure …and cannot easily or directly be used on critters of interest.

The problem of lamprey… Diverged at base of vertebrates; evolutionarily distant from model organisms. Large, complicated genome (~2 GB) Relatively little existing sequence. We sequenced the liver genome…

Assembly It was the best of times, it was the wor, it was the worst of times, it was the isdom, it was the age of foolishness mes, it was the age of wisdom, it was th It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness …but for lots and lots of fragments!

Shared low-level transcripts may not reach the threshold for assembly.

Two problems: We want to assemble a lot of stuff together. We need to construct transcript families (to collapse isoforms) without having a reference genome.

Diginorm

Solution: Digital normalization (a computational version of library normalization) Suppose you have a dilution factor of A (10) to B(1). To get 10x of B you need to get 100x of A! Overkill!! This 100x will consume disk space and, because of errors, memory. We can discard it for you…

Digital normalization

Digital normalization approach A digital analog to cDNA library normalization, diginorm: Is single pass: looks at each read only once; Does not “collect” the majority of errors; Keeps all low-coverage reads; Smooths out coverage of regions. => Enables analyses that are otherwise completely impossible.

Partitioning transcripts into families based on overlap

Isoform analysis – some easy…

Isoform analysis – some hard Counting methods mostly rely on presence of unique sequence to which to map.

Exons are easy to locate, given genomic sequence

Genome-reference-free assembly leads to many isoforms. Massive redundancy!

Gene models can be “collapsed” given genomic sequence... But don’t always have.

Solution: Partitioning transcripts into “transcript families” Pell et al., 2012, PNAS