Last lecture summary. recombinant DNA technology DNA polymerase (copy DNA), restriction endonucleases (cut DNA), ligases (join DNA) DNA cloning – vector.


Last lecture summary

recombinant DNA technology
  DNA polymerase (copy DNA), restriction endonucleases (cut DNA), ligases (join DNA)
  DNA cloning – vector (plasmid, BAC)
  PCR
genome mapping
  genetic map: relative locations of genes are established by following inheritance patterns
  cytogenetic map: visual appearance of a chromosome when stained and examined under a microscope
  physical map: the order and spacing of the genes, measured in base pairs
  sequence map

genetic markers
  polymorphic (alternative alleles)
  restriction fragment length polymorphisms (RFLPs): some restriction sites exist as two alleles
  simple sequence length polymorphisms (SSLPs): repeat sequences
    minisatellites (repeat unit up to 25 bp)
    microsatellites (repeat unit of 2-4 bp)
  single nucleotide polymorphisms (SNPs, pronounced "snips"): positions in a genome where some individuals have one nucleotide and others have a different nucleotide
[Figures: RFLP, SSLP]

New stuff

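DNA sequencing
Sanger method (chain-termination method), published in 1977; Nobel Prize in Chemistry 1980.
The key principle: use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators. A ddNTP lacks the 3'-OH group, so once it is incorporated into the growing strand no further nucleotide can be added.
[Figure: dNTP and ddNTP structures]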
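The chain-termination principle can be sketched in a few lines of Python. This is an idealised toy model, not a simulation of real chemistry: it assumes that every position of the newly synthesised strand is terminated at least once in the matching ddNTP reaction, and the function names are made up for illustration.

```python
def chain_termination_fragments(new_strand):
    """For each ddNTP reaction (A, C, G, T), list the lengths of the
    fragments that end with that terminator."""
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Recover the sequence by reading fragments from shortest to longest."""
    base_by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_by_length[n] for n in sorted(base_by_length))


lanes = chain_termination_fragments("GATTACA")
print(lanes)
print(read_gel(lanes))
```

Each fragment length appears in exactly one lane, so sorting by length gives the base order directly, which is what reading a sequencing gel from bottom to top does.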

source: wikipedia

Shotgun sequencing
Current technology can only reliably sequence a short stretch: a 'read' is typically ~1000 bp. Genomes, however, are much larger.
The sequence of a long DNA molecule therefore has to be constructed from a series of shorter sequences.
This is done by breaking the molecule into fragments (cleaving by restriction endonuclease), determining their sequences, and using a computer to search for overlaps and build up the master sequence.
Shotgun sequencing is the standard approach for sequencing small prokaryotic genomes.
It is much more difficult with larger eukaryotic genomes, because repeats can lead to assembly errors.
The human genome is repeat-rich: >50% repeats ( kbp duplicated regions with >98% identity).
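A minimal sketch of the fragmentation step, assuming fixed-length, error-free reads sampled at uniformly random positions (real reads vary in length and contain errors). The toy sequence, the read length and the function name are illustrative assumptions.

```python
import random


def shotgun_reads(genome, read_length, n_reads, seed=0):
    """Sample n_reads substrings of fixed length at random start positions."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    last_start = len(genome) - read_length
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_length] for s in starts]


genome = "ACGTGACTGAGGACCGTGCGACTGAGACTGACTGGGT"  # arbitrary toy sequence
reads = shotgun_reads(genome, read_length=8, n_reads=20)
print(len(reads), "reads of length", len(reads[0]))
```

Because the start positions are random, some positions are covered by many reads and others by few or none, which is why more data than one genome length has to be collected.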

[Figure: shotgun sequencing workflow: target, copies, shotgun, sequence each short piece, sequence assembly, consensus, finalizing (directed read). Source: slides by Martin Farach-Colton]

source: Brown T. A., Genomes. 2nd ed.

Human Genome Project (HGP)
Goal: determine the sequence of the haploid human genome.
Government funded (DOE).
Began in 1990, working draft published in 2001, declared complete in 2003, last chromosome finished in 2006.
Cost: $3 billion.
Whose genome was sequenced? The "reference genome" is a composite from several people who donated blood samples.

Celera - competition begins
In 1998, a similar, privately funded effort was launched by the American researcher Craig Venter and his company Celera Genomics.
The $300,000,000 Celera effort was intended to proceed at a faster pace and at a fraction of the cost.
Celera wanted to patent the genes it identified.
Celera promised to release data annually (whereas the HGP released data daily).
However, unlike the HGP, Celera would not permit free redistribution or scientific use of the data.
For this reason the HGP was compelled to release the first draft of the human genome before Celera.

How did it end?
March 2000: President Clinton announced that the genome sequence could not be patented and should be made freely available to all researchers.
The statement sent Celera's stock plummeting and dragged down the biotechnology-heavy Nasdaq. The biotechnology sector lost about $50 billion in two days.
Celera and the HGP jointly announced the draft sequence. The drafts covered about 83% of the genome.
Improved drafts were announced in 2003 and 2005, filling in the sequence to the current ≈92%.

Human genome
3 billion bp, ~ – genes
Only 1.1 – 1.4 % of the genome sequence codes for proteins.
State of completion: best estimate is that 92.3% is complete.
Problematic unfinished regions: centromeres and telomeres (both contain highly repetitive sequences), and some unclosed gaps.
It is likely that the centromeres and telomeres will remain unsequenced until new technology is developed.
The genome is stored in databases:
  Primary database: GenBank
  Additional data and annotation, tools for visualizing and searching: UCSC, Ensembl

Hierarchical genome shotgun – HGS
Also called: hierarchical shotgun sequencing, clone-by-clone sequencing, map-based shotgun sequencing, clone contig sequencing.
Adopted by the HGP.
Strategy: "map first, sequence second".
Create a physical map:
  Divide chromosomes into smaller fragments.
  Order (map) them to correspond to their respective locations on the chromosomes.
Determine the base sequence of each of the mapped fragments.

Hierarchical genome shotgun – HGS
1. Map the genome. Sequence-tagged sites (STS) were used as genetic markers (landmarks); an STS is a 200 to 500 bp DNA sequence that has a single occurrence in the genome.
2. Copy the target DNA.
3. Make a BAC library: cleave all target DNA copies randomly (partial cleavage by restriction endonuclease) and insert these sub-clones into BACs.
4. Physically map all BACs.
5. Find a subset of BACs that covers the target DNA: the minimal tiling path.
6. Shotgun sequence only the BACs on the minimal tiling path: divide the BACs into fragments (ultrasound or pressure), do plasmid cloning, reconstruct the BAC sequence.
7. Fill in gaps between BACs.
8. Merge into a consensus sequence.
The sequenced sub-clones are linked up to produce the DNA contig, which is the decoded version of the original source DNA. As this method progresses, larger and larger contigs are produced, until a single ordered contig of the genome is achieved.
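Step 5 (choosing a minimal tiling path) can be illustrated as an interval-cover problem. The sketch below assumes each BAC has already been placed on the physical map as a (name, start, end) tuple and uses the classic greedy strategy for interval cover; it is not claimed to be the procedure the HGP actually used, and the clone names and coordinates are invented.

```python
def minimal_tiling_path(clones, target_start, target_end):
    """Greedy interval cover: repeatedly pick, among clones that start at or
    before the position covered so far, the one reaching furthest right."""
    path = []
    covered = target_start
    ordered = sorted(clones, key=lambda clone: clone[1])  # sort by start
    i = 0
    while covered < target_end:
        best = None
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best[2]
    return path


# Hypothetical BACs as (name, start, end) on the physical map.
clones = [
    ("BAC1", 0, 150), ("BAC2", 50, 120), ("BAC3", 100, 300),
    ("BAC4", 140, 260), ("BAC5", 250, 400), ("BAC6", 290, 420),
]
print([name for name, _, _ in minimal_tiling_path(clones, 0, 400)])
```

With these coordinates three of the six clones are enough to span the target, so only those three would be shotgun sequenced.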

Minimal tiling path A collection of overlapping bacterial artificial chromosome (BAC) clones. The clones outlined in red, which provide a minimal tiling path across the corresponding genomic region, are selected for sequencing.

Coverage
As shown, individual nucleotides are represented by more than one read.
Coverage is the average number of reads representing a given nucleotide in the reconstructed sequence.
Example: for a source strand of length G = 100 kbp we sequence R = 1,500 reads of average length L = 500 bp. Thus, we collect N = RL = 750 kbp of data, so on average every bp in the source has been sequenced N/G = 7.5 times. The coverage is 7.5X.
Coverage in the HGS approach adopted by the HGP was 8X.
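The coverage arithmetic from this slide, written out as a small Python sketch. The number of reads, R = 1,500, follows from N = RL = 750 kbp with L = 500 bp; the function names are illustrative only.

```python
import math


def coverage(genome_length_bp, n_reads, mean_read_length_bp):
    """Average number of reads covering each base: N / G with N = R * L."""
    return n_reads * mean_read_length_bp / genome_length_bp


def reads_needed(genome_length_bp, mean_read_length_bp, target_coverage):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_length_bp / mean_read_length_bp)


print(coverage(100_000, 1_500, 500))   # the slide's example, 7.5X
print(reads_needed(100_000, 500, 8))   # reads needed for 8X on the same strand
```

Coverage is only an average: with randomly placed reads some bases are still covered less often than the average, or not at all.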

Whole genome shotgun – WGS
Adopted by Celera.
In effect, the shotgun approach applied directly to a large genome. Never done before on such a large scale.
The expensive and time-consuming mapping step is not performed.
Each piece of DNA is cut into smaller fragments. Each fragment is sequenced first, and then overlapping sequences are joined together to create the contig.
To achieve sufficient accuracy, higher coverage (20X) had to be used.
The development of new algorithms was crucial for the assembly.

Genome assembly
Aligning and merging short fragments of DNA sequence in order to reconstruct the original (longer) sequence.
reads – typically bp
reads are merged into contigs; contigs are arranged/merged into scaffolds
scaffold – a series of contigs that are in the right order but are not necessarily connected in one continuous stretch of sequence
source: Xiong, Essential Bioinformatics
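Merging reads into contigs can be sketched with a naive greedy overlap merge. This is a teaching sketch under strong assumptions (error-free reads, all from the same strand, no repeats) and is not how production assemblers work; the reads, the `min_len` threshold and the function names are illustrative.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b
    (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.
    Reads fully contained inside another read are not handled specially."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


print(greedy_assemble(["ACGTGACT", "GACTGAGG", "GAGGACC"]))
```

If the reads do not all overlap, the function returns several contigs; ordering those relative to each other is the scaffolding step.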

Genome assembly
Can be very computationally intensive when performed at the whole-genome level.
Major challenges:
  sequence errors – can be corrected by drawing a consensus sequence from an alignment of multiple overlapping sequences
  contamination by bacterial vectors – can be removed using filtering programs prior to assembly
  repeats – RepeatMasker can be used to detect and mask repeats
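Correcting sequence errors by consensus can be illustrated with a per-column majority vote. The sketch assumes the reads have already been aligned and padded with '-' where a read does not cover a position; real assemblers also weight the vote by base quality, which is omitted here. The reads are made up.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Majority base in every alignment column; 'N' where no read covers it."""
    length = max(len(read) for read in aligned_reads)
    result = []
    for col in range(length):
        column = [read[col] for read in aligned_reads
                  if col < len(read) and read[col] != gap]
        result.append(Counter(column).most_common(1)[0][0] if column else "N")
    return "".join(result)


aligned = [
    "ACGTGA---",
    "-CGTTACTG",  # contains one sequencing error (T instead of G)
    "--GTGACTG",
]
print(consensus(aligned))
```

The erroneous base is outvoted by the two reads that agree, which is why coverage well above 1X is needed for an accurate sequence.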

Forward-reverse constraint
A common constraint used to avoid errors due to repeats.
Sequence is generated from both ends of a clone → the distance between the two opposing fragments of a clone is fixed to a certain range (defined by the clone length).
When the constraint is applied, a fragment cannot be moved to a location outside that range, even if it has a perfect match with a repetitive element there, so it cannot cause misassembly.
[Figure: assembly without and with the constraint; without the constraint the red fragment is misassembled. Source: Xiong, Essential Bioinformatics]
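A minimal sketch of how the forward-reverse constraint filters candidate placements. It assumes positions are given as outer coordinates of the two end reads on the draft assembly and that the allowed clone (insert) length range is known; all numbers and names are illustrative.

```python
def satisfies_constraint(forward_pos, reverse_pos, min_insert, max_insert):
    """True if the two end reads of one clone lie an allowed distance apart."""
    return min_insert <= reverse_pos - forward_pos <= max_insert


def allowed_placements(forward_pos, candidates, min_insert, max_insert):
    """Keep only candidate positions of the reverse read that respect the
    clone length range, discarding matches to distant repeat copies."""
    return [pos for pos in candidates
            if satisfies_constraint(forward_pos, pos, min_insert, max_insert)]


# The reverse read matches two copies of a repeat equally well (3,000 and
# 48,000), but only one is compatible with a clone of 1,500 to 2,500 bp.
print(allowed_placements(1_000, [3_000, 48_000], 1_500, 2_500))
```

The sequence match alone cannot tell the two repeat copies apart; the known clone length is what resolves the ambiguity.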

Sequence assemblers
base calling – converting raw/processed data from a sequencing instrument into sequences and scores
individual bases have scores that reflect the likelihood that the base is correct/incorrect
in capillary sequencing, the sequence is identified from the chromatogram
source: Lee SH, Vigliotti VS, Pappu S., J Clin Pathol. 2010, 63(3) PMID:

PHRED
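The per-base scores produced by base callers are conventionally reported on the Phred scale, Q = -10 · log10(P), where P is the estimated probability that the base call is wrong. The slide itself does not state the formula, so the sketch below is added only as the standard definition; the function names are illustrative.

```python
import math


def phred_quality(error_probability):
    """Phred quality score Q = -10 * log10(P_error)."""
    return -10 * math.log10(error_probability)


def error_probability(quality):
    """Inverse of the Phred scale: P_error = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


for q in (10, 20, 30):
    print(f"Q{q}: about 1 error in {round(1 / error_probability(q))} bases")
```

Every 10 points on the scale correspond to a tenfold drop in the error probability.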

PHRAP
Sequence assembly.
Takes PHRED base-call files with quality scores as input.
Aligns individual fragments in a pairwise fashion.
The base quality information is taken into account during the pairwise alignment.

Personal human genomes
Personal genomes were not sequenced in the Human Genome Project, to protect the identity of the volunteers who provided DNA samples.
The following personal genomes were available as of July 2011:
  Japanese male (2010, PMID: )
  Korean male (2009, PMID: )
  Chinese male (2008, PMID: )
  Nigerian male (2008, PMID: )
  J. D. Watson (2008, PMID: )
  J. C. Venter (2007, PMID: )
The HGP sequence is haploid; the sequence maps for Venter and Watson, for example, are diploid, representing both sets of chromosomes.