Reference based assembly

Presentation transcript:

Reference based assembly Macrogen Inc 김세환

Reconstructing transcripts from RNA-Seq: de novo assembly tends to fully assemble only highly expressed transcripts.

Scripture vs. Cufflinks
SIMILARITY: Both programs build directed graphs and traverse them to identify distinct transcripts, using paired-end information to link sparsely covered transcripts and filter out unlikely isoforms.
DIFFERENCE:
- Cufflinks uses a rigorous mathematical model to identify the complete set of alternatively regulated transcripts at each locus.
- Scripture employs a statistical segmentation model to distinguish expressed loci and filter out experimental noise.

NATURE BIOTECHNOLOGY MAY 2010: Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs (Scripture)

1. Use uniquely mapped reads, together with reads aligned across splice junctions (aligned spliced reads).
2. Build a connectivity graph from the aligned spliced reads. The canonical intron donor and acceptor sites (GT-AG) are located and used to choose the direction of each connection.
3. Find significant paths with a statistical segmentation approach.
4. Build the transcript graph from the paths found in step 3.
5. Use the connection information from paired-end reads and their distance constraints to join exons that were left apart and to remove implausible isoforms.

1. Map Reads to the Genome
Reads are mapped with TopHat, a splice-aware aligner, since ~30% of 76-base reads are expected on average to span an exon-exon junction. These 'spliced' reads provide direct information on splice sites through their motifs (GT/AG, GC/AG, or AT/AC).

2. Construct Connectivity Graph
Use only 'spliced' reads to construct the connectivity graph. Splicing motifs (GT/AG, GC/AG, or AT/AC) provide direct information on the orientation of each connection. Node = base, edge = connection between bases.
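The construction above can be sketched as follows. This is a minimal illustration, not Scripture's implementation: the function names, the representation of a spliced read as a list of half-open `(start, end)` aligned blocks, and the toy genome string are all assumptions made for the example. The minus-strand motif table is simply the reverse complement of the three motifs named on the slide.

```python
from collections import defaultdict

# Donor/acceptor dinucleotides as read on the forward genomic strand.
PLUS_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
# Reverse complements: the same motifs for a transcript on the minus strand.
MINUS_MOTIFS = {("CT", "AC"), ("CT", "GC"), ("GT", "AT")}


def junction_strand(genome, intron_start, intron_end):
    """Return '+', '-' or None for the intron genome[intron_start:intron_end]."""
    left = genome[intron_start:intron_start + 2]
    right = genome[intron_end - 2:intron_end]
    if (left, right) in PLUS_MOTIFS:
        return "+"
    if (left, right) in MINUS_MOTIFS:
        return "-"
    return None


def build_connectivity_graph(genome, spliced_reads):
    """Nodes are base positions; edges connect consecutive bases of a read.

    Each read is a list of half-open (start, end) blocks. Edges are directed
    5'->3' according to the strand implied by the splice motif; junctions
    without a recognised motif are skipped.
    """
    graph = defaultdict(set)
    for blocks in spliced_reads:
        for (s1, e1), (s2, e2) in zip(blocks, blocks[1:]):
            strand = junction_strand(genome, e1, s2)
            if strand is None:
                continue
            path = list(range(s1, e1)) + list(range(s2, e2))
            if strand == "-":
                path.reverse()
            for a, b in zip(path, path[1:]):
                graph[a].add(b)
    return graph


genome = "AAAA" + "GT" + "CCCC" + "AG" + "TTTT"
graph = build_connectivity_graph(genome, [[(2, 4), (12, 14)]])
print(dict(graph))
```

In this toy genome the intron spans positions 4 to 11, so the junction edge connects base 3 to base 12.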

3. Identify Significantly Enriched Paths
Use a statistical segmentation strategy: the segmentation approach identifies regions of mapped read enrichment compared to the genomic background.
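To make "enrichment compared to the genomic background" concrete, here is a deliberately simplified sketch. It is not Scripture's scan statistic: it applies an independent Poisson upper-tail test to each fixed-size window and does not correct for testing many overlapping windows. The function names, the window size, and the `alpha` threshold are assumptions for illustration.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed by direct summation."""
    term = math.exp(-lam)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def enriched_windows(read_starts, window, alpha=0.01):
    """Return half-open windows whose read count exceeds the background.

    read_starts[i] is the number of reads starting at base i. The background
    rate is the genome-wide average number of reads per window.
    """
    n = len(read_starts)
    lam = sum(read_starts) / n * window
    hits = []
    for start in range(n - window + 1):
        count = sum(read_starts[start:start + window])
        if poisson_sf(count, lam) < alpha:
            hits.append((start, start + window))
    return hits


read_starts = [0] * 50 + [5] * 10 + [0] * 40
print(enriched_windows(read_starts, window=10))
```

A real scan statistic additionally accounts for the fact that the most extreme window is being selected from the whole genome, which this per-window test ignores.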

4. Construct Transcript Graphs
Each node in a transcript graph is an exon and each edge is a splice junction. A path through the graph represents one isoform of the gene.
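The idea that each path is one isoform can be shown with a small sketch. The exon labels, the adjacency-dictionary representation, and the function name are hypothetical; the code simply enumerates every source-to-sink path in a small acyclic exon graph.

```python
def enumerate_isoforms(edges, sources, sinks):
    """List every path from a source exon to a sink exon.

    edges maps an exon to the exons reachable through one splice junction.
    The graph is assumed to be acyclic.
    """
    paths = []

    def walk(node, path):
        if node in sinks:
            paths.append(path)
        for nxt in edges.get(node, ()):
            walk(nxt, path + [nxt])

    for source in sources:
        walk(source, [source])
    return paths


# Exon E2 is a cassette exon: it can be included or skipped.
edges = {"E1": ["E2", "E3"], "E2": ["E3"], "E3": []}
for isoform in enumerate_isoforms(edges, sources=["E1"], sinks={"E3"}):
    print(" -> ".join(isoform))
```

The number of paths can grow quickly with the number of alternative exons, which is one reason the later filtering and weighting steps matter.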

5. Weighting of Isoforms
Insert size distribution: estimate the insert size distribution from the paired ends of genes that have a single isoform.
For the paired ends falling on each path, compute I = (inferred insert size - average insert size).
Obtain the probability of I as an area under the insert size distribution.
For each path, average the probabilities of its paired ends:
Normalized weighted score of Isoform 1 = (Σ probability of insert size of paired read) / (# of paired reads)
Filter out: isoforms with a normalized weighted score < 0.1.
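The scoring rule can be sketched as below. The slide does not say which area under the distribution is used, so this example assumes a normal insert size distribution and takes the two-sided tail area beyond the observed deviation; both choices, and all names, are assumptions for illustration only.

```python
import math


def insert_size_probability(inferred, mean, sd):
    """Two-sided tail area of a normal(mean, sd) beyond the observed deviation.

    Assumption: insert sizes are normally distributed and the 'area' on the
    slide is the probability of a deviation at least this large.
    """
    deviation = abs(inferred - mean)
    return math.erfc(deviation / (sd * math.sqrt(2)))


def normalized_weighted_score(inferred_sizes, mean, sd):
    """Average insert-size probability over the paired reads on one path."""
    if not inferred_sizes:
        return 0.0
    probs = [insert_size_probability(s, mean, sd) for s in inferred_sizes]
    return sum(probs) / len(probs)


def keep_isoform(inferred_sizes, mean, sd, threshold=0.1):
    """Apply the slide's filter: drop isoforms scoring below 0.1."""
    return normalized_weighted_score(inferred_sizes, mean, sd) >= threshold


print(keep_isoform([295, 310, 300], mean=300, sd=20))
print(keep_isoform([500, 520], mean=300, sd=20))
```

A path that forces paired ends to have implausibly long or short inserts scores near zero and is discarded.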

NATURE BIOTECHNOLOGY MAY 2010: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation (Cufflinks)

Cufflinks
Cufflinks seeks an assembly that parsimoniously explains the fragments from the RNA-Seq experiment: every fragment in the experiment should have come from a Cufflinks transcript, and Cufflinks should produce as few transcripts as possible with that property.

Transcript Assembly Isoform 1 Isoform 2

Transcript Assembly

Transcript Assembly - compatibility & incompatibility -
Fragment relationships shown: Compatibility, Incompatibility, Nested, Uncertain: x4.
The goal is an assembly that parsimoniously explains the fragments from the RNA-Seq experiment, that is, to assemble the minimum number of transcripts that can explain every aligned fragment.
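A simplified notion of compatibility can be sketched as follows. It assumes each fragment is a sorted list of half-open `(start, end)` aligned blocks and calls two fragments compatible when, over the region where their spans overlap, they cover exactly the same bases (so neither places an intron where the other has an exon). This ignores the unknown gap between paired-end mates, which is what gives rise to the "uncertain" case, and the names are hypothetical.

```python
def covered(blocks, lo, hi):
    """Set of bases covered by the blocks, restricted to [lo, hi)."""
    return {p for s, e in blocks for p in range(max(s, lo), min(e, hi))}


def compatible(frag_a, frag_b):
    """True if the two fragments imply the same introns where they overlap.

    Fragments whose spans do not overlap are treated as compatible, since
    nothing prevents them from coming from the same transcript.
    """
    lo = max(frag_a[0][0], frag_b[0][0])
    hi = min(frag_a[-1][1], frag_b[-1][1])
    if lo >= hi:
        return True
    return covered(frag_a, lo, hi) == covered(frag_b, lo, hi)


spliced = [(0, 10), (20, 30)]       # skips bases 10..19
same_intron = [(5, 10), (20, 25)]   # agrees with the intron
retained = [(5, 15)]                # reads into the intron

print(compatible(spliced, same_intron))
print(compatible(spliced, retained))
```

Enumerating positions is only practical for toy coordinates; an interval-based comparison would be used on real data.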

Transcript Assembly Nested incompatible

Transcript Assembly Nested incompatible chain

Transcript Assembly Bipartite graph Directed Acyclic Graph

Transcript Assembly: Hasse diagram & reachability graph
Theorem (Dilworth's theorem): Let P be a finite partially ordered set. The maximum number of elements in any antichain of P equals the minimum number of chains in any partition of P into chains.
Theorem (König's theorem): In a bipartite graph, the number of edges in a maximum matching equals the number of vertices in a minimum vertex cover.
Theorem: Dilworth's theorem is equivalent to König's theorem.
Draw the Hasse diagram and the DAG (directed acyclic graph), then draw the reachability graph, which is the transitive closure of the DAG: it contains not only the directly connected edges but also every implied edge, such as a->c whenever a->b and b->c. The Dilworth problem is converted into a König problem to save time, because maximum matching can be solved by a polynomial-time algorithm.

the percent-spliced-in x

Transcript Assembly
Finally, finding the minimum number of chains in the directed acyclic graph is reduced to finding a maximum matching in a bipartite graph. This can be solved with the LEMON and Boost graph libraries.
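The reduction can be illustrated with a small pure-Python sketch. It does not use LEMON or Boost; instead it assumes a simple augmenting-path matching (Kuhn's algorithm) as a stand-in. Each node is split into a left and a right copy, an edge of the reachability graph u->v becomes a bipartite edge (left u, right v), and the minimum number of chains equals the number of nodes minus the size of a maximum matching. Node numbering and function names are hypothetical.

```python
def transitive_closure(n, edges):
    """Reachability sets of a DAG on nodes 0..n-1 (Warshall's algorithm)."""
    reach = [set() for _ in range(n)]
    for a, b in edges:
        reach[a].add(b)
    for k in range(n):
        for i in range(n):
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach


def maximum_matching(n, reach):
    """Size of a maximum bipartite matching via augmenting paths."""
    match_right = [-1] * n  # match_right[v] = left node matched to right v

    def try_augment(u, seen):
        for v in reach[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return sum(1 for u in range(n) if try_augment(u, set()))


def minimum_chain_count(n, edges):
    """Minimum number of chains covering all nodes of the partial order."""
    reach = transitive_closure(n, edges)
    return n - maximum_matching(n, reach)


# Fragments 0 and 1 both precede 2, which precedes 3 and 4.
edges = [(0, 2), (1, 2), (2, 3), (2, 4)]
print(minimum_chain_count(5, edges))
```

Matching on the transitive closure rather than on the original edges is what allows a chain to skip over a node already used by another chain, for example 1 -> 4 above.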

Conditions for filtering transcript x
- x aligns to the genome entirely within an intronic region of the alignment for a transcript y, and the abundance of x is less than 15% of y's abundance.
- x is supported by only a single fragment alignment to the genome.
- More than 75% of the fragment alignments supporting x are mappable to multiple genomic loci.
- x is an isoform of an alternatively spliced gene, and has an estimated abundance less than 5% of the major isoform of the gene.
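The four conditions translate directly into a predicate. The record layout below is an assumption made for the example (Cufflinks does not expose such a class); only the thresholds come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical summary of an assembled transcript x."""
    abundance: float
    supporting_fragments: int
    multi_mapped_fraction: float
    # Abundance of the major isoform, if x belongs to an alternatively
    # spliced gene; None otherwise.
    major_isoform_abundance: Optional[float] = None
    # Abundance of transcript y, if x lies entirely within an intron of y;
    # None otherwise.
    intronic_host_abundance: Optional[float] = None


def should_filter(x):
    """Return True if any of the four filtering conditions holds."""
    if (x.intronic_host_abundance is not None
            and x.abundance < 0.15 * x.intronic_host_abundance):
        return True
    if x.supporting_fragments == 1:
        return True
    if x.multi_mapped_fraction > 0.75:
        return True
    if (x.major_isoform_abundance is not None
            and x.abundance < 0.05 * x.major_isoform_abundance):
        return True
    return False


print(should_filter(Candidate(abundance=1.0, supporting_fragments=20,
                              multi_mapped_fraction=0.0,
                              major_isoform_abundance=100.0)))
```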

Keyword for Fresher
1. Reference-based assembly == mapping-first approach
2. The difference between likelihood and probability:

Keyword for Intermediate 1. Graph theory - reading recommendation : introduction to graph theory

Keyword for Expert 1. Scan statistics
