BME 130 – Genomes Lecture 5 Genome assembly I The good old days.


Administrivia
Homework 1 – on the website today, due Friday; homework policy
Student-led paper discussion; choose groups and pick paper
Guest lecture Friday – Bob Kuhn will demo the UCSC genome browser

Genomics in the news: Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses
Citation: Gilbert C, Feschotte C (2010) Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses. PLoS Biol 8(9): e1000495. doi:10.1371/journal.pbio.1000495

Figure 4.10 Genomes 3 (© Garland Science 2007)

Figure 4.10 part 1 of 2 Genomes 3 (© Garland Science 2007)

Figure 4.10 part 2 of 2 Genomes 3 (© Garland Science 2007)

Sequence assembly
de novo: overlap, layout, consensus
reference-guided: reads placed against a reference sequence
(Diagram: reads s1 to s6 arranged de novo and along a reference sequence)

de novo sequence assembly: overlap
Most CPU- and memory-demanding stage
Phrap: “banded” alignment of reads around k-mer matches; tolerates alignment mismatches at low-quality bases
Phusion: groups reads sharing >= 11 k-mers of 17 bases
Celera: k-mer seed-and-extend alignment of reads
Arachne: 24-mer seed-and-extend alignment of reads
newbler: flowgram similarities (?)
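The k-mer idea behind the overlap stage can be sketched in a few lines of Python. This is a toy illustration, not the code of Phusion or any other assembler listed: it indexes every k-mer, counts how many distinct k-mers each pair of reads shares, and keeps the pairs at or above a threshold. The function names, the dictionary-of-reads input, and the decision to ignore reverse complements and quality scores are all assumptions made for brevity. The defaults mirror the "11 k-mers of 17 bases" figure quoted on the slide.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_overlaps(reads, k=17, min_shared=11):
    """Map (name_a, name_b) -> number of shared distinct k-mers,
    keeping only pairs that share at least min_shared k-mers.

    reads is a hypothetical {name: sequence} dictionary.
    """
    index = defaultdict(set)  # k-mer -> names of the reads containing it
    for name, seq in reads.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)

    shared = defaultdict(int)
    for names in index.values():
        # Every pair of reads containing this k-mer shares one more k-mer.
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Toy reads, with small k so the example fits on one line each.
toy_reads = {
    "s1": "ACGTTGCATGCA",
    "s2": "TGCATGCAGGTC",
    "s3": "CCCCCCCCCCCC",
}
print(candidate_overlaps(toy_reads, k=4, min_shared=3))
```

Pairs found this way are only candidates; a real assembler would go on to align each candidate pair (the "seed and extend" step) before accepting the overlap.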

de novo sequence assembly
Generate alignments
Find connected components
(Diagram: reads s1 to s6 linked by their pairwise alignments, then grouped into connected components)
Wide range of strategies for the layout stage, many using mate-pair information
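Finding connected components from a list of pairwise alignments can be sketched with a union-find structure. The overlap list below is hypothetical (the pairs on the original diagram are not recoverable from the transcript), and the function name is an assumption. Real layout stages do far more than this, for example ordering reads within a component and using mate pairs.

```python
def connected_components(reads, overlaps):
    """Group reads into connected components of the overlap graph.

    reads    -- iterable of read names
    overlaps -- iterable of (name_a, name_b) pairs that align to each other
    """
    parent = {r: r for r in reads}

    def find(x):
        # Walk to the root, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = {}
    for r in reads:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical overlaps, for illustration only.
reads = ["s1", "s2", "s3", "s4", "s5", "s6"]
overlaps = [("s1", "s2"), ("s2", "s3"), ("s5", "s6")]
print(connected_components(reads, overlaps))
```

Each component is a set of reads that may belong to the same contig; a read with no alignments, such as s4 here, ends up in a component of its own.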

de novo sequence assembly: consensus
PHRAP: consensus base is the base with the highest quality score; quality score for the position is based on all reads' quality scores
PCAP/CAP3: sum up quality scores for each base and take the base with the highest sum; quality score for the position: highest sum – all other sums
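The summed-quality rule attributed to PCAP/CAP3 above can be written out as a short Python sketch. It follows the slide's wording literally (highest sum minus all other sums) and is not taken from the PCAP or CAP3 source. The `(base, quality)` input format and the function names are assumptions, and gaps in the alignment are not handled.

```python
from collections import defaultdict


def consensus_column(calls):
    """Call one alignment column.

    calls -- list of (base, quality) pairs, one per read covering the column
    Returns (consensus_base, column_score).
    """
    sums = defaultdict(int)
    for base, qual in calls:
        sums[base] += qual  # total quality supporting each base

    best = max(sums, key=sums.get)
    # Score as worded on the slide: highest sum minus all other sums.
    score = sums[best] - sum(q for b, q in sums.items() if b != best)
    return best, score


def consensus(columns):
    """Concatenate the consensus base of every column."""
    return "".join(consensus_column(col)[0] for col in columns)


# Two reads say A (qualities 30 and 20), one read says C (quality 40).
print(consensus_column([("A", 30), ("A", 20), ("C", 40)]))
```

In this example A wins with a summed quality of 50 against 40 for C, giving a column score of 10. Note that a single high-quality base does not automatically outvote two moderate ones, which is the practical difference from a rule that simply takes the base with the highest individual quality.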

Reference-guided sequence assembly
Advantages: (much) faster; (much) less memory
Disadvantages: indels/rearrangements; lack of a closely related reference; bias towards reference similarity
Pop M et al., “Comparative Genome Assembly” Brief Bioinform. 2004 Sep;5(3):237-48.
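A minimal sketch of reference-guided placement, written in Python under strong simplifying assumptions: each read is seeded by an exact match of its first k bases, placement is gapless, and the consensus is a simple majority vote per position. None of this is taken from Pop et al. or from a specific tool; the function names and parameters are hypothetical. The sketch also makes the listed disadvantages concrete: a read with an indel cannot be placed without gaps, and uncovered positions silently inherit the reference base.

```python
from collections import Counter, defaultdict


def index_reference(ref, k):
    """Map each k-mer of the reference to its start positions (ascending)."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def place_read(read, ref, index, k, max_mismatches=1):
    """Return the leftmost gapless placement of read, or None.

    Only the first k bases of the read are used as the seed.
    """
    for pos in index.get(read[:k], []):
        window = ref[pos:pos + len(read)]
        if len(window) == len(read):
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                return pos
    return None


def guided_consensus(reads, ref, k=4):
    """Majority-vote consensus of the placed reads along the reference."""
    index = index_reference(ref, k)
    piles = [Counter() for _ in ref]
    for read in reads:
        pos = place_read(read, ref, index, k)
        if pos is not None:
            for offset, base in enumerate(read):
                piles[pos + offset][base] += 1
    # Uncovered positions fall back to the reference base (reference bias).
    return "".join(p.most_common(1)[0][0] if p else r
                   for p, r in zip(piles, ref))


reference = "ACGTTGCAAGGCTA"
print(guided_consensus(["GTTGCTAG"], reference))
```

The speed and memory advantage comes from indexing the reference once and looking each read up independently, instead of comparing reads against each other.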

Why is this called a sequence gap and not a physical gap? Figure 4.11a Genomes 3 (© Garland Science 2007)

Closing a physical gap means finding a physical clone to sequence that will span the gap

Genomic DNA is template for this PCR Figure 4.11b Genomes 3 (© Garland Science 2007)

Chromosome walking (is slow) Figure 4.12 Genomes 3 (© Garland Science 2007)

PCR from clone library Insert 1 connects to who? Figure 4.13 Genomes 3 (© Garland Science 2007)

Figure 4.14 Genomes 3 (© Garland Science 2007)

Figure 4.15 Genomes 3 (© Garland Science 2007)

Figure 4.15a Genomes 3 (© Garland Science 2007)

Figure 4.15b Genomes 3 (© Garland Science 2007)

Figure 4.15c Genomes 3 (© Garland Science 2007)

Figure 4.15d Genomes 3 (© Garland Science 2007)

Assembly can be validated by mate-pair information Figure 4.16 Genomes 3 (© Garland Science 2007)
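One way to picture mate-pair validation is as a per-pair consistency check, sketched here in Python. The assumptions are loud ones: the library is taken to produce inward-facing pairs (forward read upstream of the reverse read), the expected insert size and tolerance are supplied by the user, and the `Placement` record and `check_pair` function are hypothetical names, not part of any assembler.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    contig: str
    start: int   # leftmost coordinate of the read on the contig
    end: int     # rightmost coordinate of the read (exclusive)
    strand: str  # "+" or "-"


def check_pair(mate_a, mate_b, expected_insert, tolerance):
    """Classify one mate pair against the assembly."""
    if mate_a.contig != mate_b.contig:
        # Not an error in itself: such pairs can link contigs into scaffolds.
        return "spans contigs"
    first, second = sorted((mate_a, mate_b), key=lambda p: p.start)
    if (first.strand, second.strand) != ("+", "-"):
        return "wrong orientation"
    span = second.end - first.start  # implied insert length in the assembly
    if abs(span - expected_insert) > tolerance:
        return "wrong distance"
    return "ok"


good = check_pair(Placement("c1", 100, 150, "+"),
                  Placement("c1", 2050, 2100, "-"),
                  expected_insert=2000, tolerance=200)
print(good)
```

A region where many pairs report "wrong distance" or "wrong orientation" is a candidate misassembly, for example a collapsed repeat or an inversion, whereas isolated failures are more likely chimeric clones or mislabeled reads.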

Figure 4.16a Genomes 3 (© Garland Science 2007)

Figure 4.16b Genomes 3 (© Garland Science 2007)

Figure 4.17a Genomes 3 (© Garland Science 2007)

Figure 4.17b Genomes 3 (© Garland Science 2007)

Figure 4.18 Genomes 3 (© Garland Science 2007)