Class 02: Whole genome sequencing. The seminal papers www.cs.arizona.edu/people/gene/#papers ``Is Whole Genome Sequencing Feasible?'' ``Whole-Genome DNA.

Class 02: Whole genome sequencing

The seminal papers
- ``Is Whole Genome Sequencing Feasible?''
- ``Whole-Genome DNA Sequencing''
- ``A Whole-Genome Assembly of Drosophila''

Shotgun sequencing
- Multiply the target sequence (make many copies)
- Break the copies into random fragments
- Sort by size; discard the big and small pieces
- Insert each fragment into a bacterial virus (the ‘vector’)
- Infect bacteria and let them reproduce, ‘cloning’ the insert
- ‘Read’ the insert

Definitions
- G – length of target sequence
- L – avg length of a read
- R – number of sequencing reads
- N – base pairs sequenced = RL
- I – avg length of a clone insert
- c – N/G = avg sequence coverage
- m – RI/2G = avg clone or map coverage
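
The definitions above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `coverage_stats` and the sample numbers are hypothetical, and the comment on the factor of 2 is an interpretation (two reads per clone), not something the slide states.

```python
def coverage_stats(G, L, R, I):
    """Return (N, c, m) from the slide's definitions.

    G: length of target sequence, L: average read length,
    R: number of reads, I: average clone insert length.
    """
    N = R * L              # total base pairs sequenced
    c = N / G              # average sequence coverage
    # Average clone (map) coverage. The 2 presumably reflects two reads
    # per clone, so R/2 clones of length I (an interpretation).
    m = (R * I) / (2 * G)
    return N, c, m


# Hypothetical numbers, chosen only for illustration.
N, c, m = coverage_stats(G=1_000_000, L=500, R=20_000, I=5_000)
print(N, c, m)
```

With these made-up inputs, the sequence coverage is 10 and the clone coverage is 50: clone coverage exceeds sequence coverage whenever inserts are longer than two read lengths.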

Problems
- Incomplete coverage
- Sequencing errors (average rate < 0.01)
- Unknown orientation of each read
- Repeated sequences

Repeat problem
- Repeats vary in length, number, and fidelity
- Length: from a few bp to thousands of bp
- Number: highly variable, even from individual to individual
- Fidelity: copies sometimes differ by only 1-2%, or less (multiple copies, pseudogenes)
- Long, infrequent, high-fidelity repeats are the biggest problem

Overlap phase
- Compare every read (in both orientations) to every other read
- Accept an overlap when its weighted agreement is within a fixed error bound epsilon
- An exact solution is tractable
- The result is an overlap graph, with each read a node and each overlap an edge
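
A minimal sketch of the overlap phase follows, under simplifying assumptions that are not from the slides: overlaps are suffix-prefix only, disagreement is an unweighted mismatch count (no insertions or deletions), and only the second read of each pair is flipped. The names `best_overlap`, `overlap_graph`, `min_len`, and `epsilon` are hypothetical.

```python
from itertools import permutations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the read as it would appear on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def best_overlap(a, b, min_len=3, epsilon=0.1):
    """Longest suffix of a that matches a prefix of b with a mismatch
    fraction of at most epsilon. Returns (length, mismatches) or None."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-k:], b[:k]))
        if mismatches <= epsilon * k:
            return k, mismatches
    return None


def overlap_graph(reads, min_len=3, epsilon=0.1):
    """All-pairs comparison: nodes are read indices, and each edge
    (i, j, flipped) records an accepted overlap from read i into read j,
    where flipped marks that read j was reverse-complemented."""
    edges = {}
    for i, j in permutations(range(len(reads)), 2):
        for flipped in (False, True):
            b = reverse_complement(reads[j]) if flipped else reads[j]
            hit = best_overlap(reads[i], b, min_len, epsilon)
            if hit is not None:
                edges[(i, j, flipped)] = hit
    return edges


reads = ["ACGTTGCA", "GACCTGCA"]  # toy reads; the second is on the other strand
print(overlap_graph(reads, min_len=3, epsilon=0.0))
```

The all-pairs loop is quadratic in the number of reads, which is why the slide's point that an exact solution is tractable matters; production assemblers typically use indexing to avoid comparing every pair directly.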

Layout phase
- Determine the overlapping pairs that position each fragment
- In graph-theoretic terms, find a spanning forest of the overlap graph
- Finding an optimal spanning forest is NP-hard
- A variation on the greedy algorithm is commonly used
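
One possible greedy variation can be sketched as follows. This is an illustration under stated assumptions, not the method of any particular assembler: overlaps are taken in decreasing order of length, each read may gain at most one successor and one predecessor, and a union-find structure rejects edges that would close a cycle. Read orientation and contained reads are ignored. The name `greedy_layout` and the input format are hypothetical.

```python
def greedy_layout(n_reads, overlaps):
    """overlaps maps (i, j) -> overlap length, meaning a suffix of read i
    overlaps a prefix of read j. Returns the chosen edges, which form a
    forest of simple paths (one path per contig)."""
    parent = list(range(n_reads))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    has_successor = [False] * n_reads
    has_predecessor = [False] * n_reads
    chosen = []
    # Greedy step: consider the longest overlaps first.
    for (i, j), _length in sorted(overlaps.items(), key=lambda kv: -kv[1]):
        if has_successor[i] or has_predecessor[j]:
            continue  # one end of this overlap is already used
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            continue  # accepting this edge would close a cycle
        parent[root_i] = root_j
        has_successor[i] = True
        has_predecessor[j] = True
        chosen.append((i, j))
    return chosen


# Toy overlap lengths, chosen only for illustration.
overlaps = {(0, 1): 5, (1, 2): 4, (0, 2): 3, (2, 0): 2}
print(greedy_layout(3, overlaps))
```

Greedy choices are fast but can be misled by repeats, since a long overlap inside a repeat may join reads that come from different parts of the genome.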

Consensus phase
- Problem: find the consensus of a multiple alignment of the reads
- Initially, use the overlaps in the spanning forest
- Apply one of several algorithms to refine this alignment
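
As a rough illustration of the consensus idea, the sketch below takes a column-wise majority vote over reads that have already been placed at fixed offsets. It assumes a gap-free layout, which is a simplification: the refinement algorithms the slide mentions work on a real multiple alignment with insertions and deletions. The name `consensus` and the `(offset, sequence)` input format are hypothetical.

```python
from collections import Counter


def consensus(placed_reads):
    """placed_reads: list of (offset, sequence) pairs in layout coordinates.
    Returns the majority base for each column, or 'N' where no read covers it."""
    length = max(offset + len(seq) for offset, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_reads:
        for k, base in enumerate(seq):
            columns[offset + k][base] += 1
    return "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )


# Toy layout: the second read carries one sequencing error at column 4.
placed = [(0, "ACGTAC"), (2, "GTTCGG"), (4, "ACGGA")]
print(consensus(placed))
```

The erroneous base is outvoted by the two other reads covering that column, which is how redundant coverage compensates for sequencing errors.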

Mates & contigs

‘Double-barreled’ shotgun
- Choose inserts at least two read lengths long
- Sequence both ends (we know their relative orientation and distance)
- The paired end reads are used to order and orient contigs
- Use a supplementary process to fill in the gaps between contigs
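
Because the two end reads of an insert are a known distance apart, a pair whose reads land in different contigs gives an estimate of the gap between those contigs. The sketch below assumes the simplest case, which is not spelled out in the slides: contig A precedes contig B, both are in forward orientation, and the insert length is known exactly (in practice it is only approximate, so several pairs would be combined). The name `estimate_gap` and its parameters are hypothetical.

```python
def estimate_gap(insert_len, len_a, start_in_a, end_in_b):
    """Estimate the gap between contig A and contig B from one mate pair.

    The forward read starts at position start_in_a (0-based) in contig A.
    Its mate, read from the opposite strand, ends at position end_in_b
    (0-based, exclusive) in contig B.
    """
    covered_in_a = len_a - start_in_a  # part of the insert inside contig A
    covered_in_b = end_in_b            # part of the insert inside contig B
    return insert_len - covered_in_a - covered_in_b


# Hypothetical values: a 5,000 bp insert spanning two contigs.
print(estimate_gap(insert_len=5000, len_a=10_000, start_in_a=8000, end_in_b=1500))
```

A negative result would suggest that the two contigs actually overlap or that the pair is misplaced, for example by a repeat.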

Clone by clone (HGP)

Whole genome assembly
- Mates can resolve short repeats
- Problem when you ‘exit’ a repeat: you don’t know which continuation is right
- Resolve this using a mate pair that has one read in the unique flanking sequence

Whole genome (illustration)