DNA Sequencing. CS273a Lecture 3, Autumn 08, Batzoglou DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT.

Slides:



Advertisements
Similar presentations
Sequencing a genome. Definition Determining the identity and order of nucleotides in the genetic material – usually DNA, sometimes RNA, of an organism.
Advertisements

Connection Between Alignment and HMMs. A state model for alignment -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII.
CS273a Lecture 4, Autumn 08, Batzoglou Some Terminology insert a fragment that was incorporated in a circular genome, and can be copied (cloned) vector.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. CHAPTER 18 LECTURE SLIDES.
DNA Sequencing Lecture 9, Tuesday April 29, 2003.
Physical Mapping I CIS 667 February 26, Physical Mapping A physical map of a piece of DNA tells us the location of certain markers  A marker is.
DNA Sequencing.
DNA Sequencing. CS273a Lecture 3, Spring 07, Batzoglou DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT.
DNA Sequencing. Next few topics DNA Sequencing  Sequencing strategies Hierarchical Online (Walking) Whole Genome Shotgun  Sequencing Assembly Gene Recognition.
DNA Sequencing and Assembly
DNA Sequencing.
DNA Sequencing. CS262 Lecture 9, Win07, Batzoglou DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT.
CS273a Lecture 4, Autumn 08, Batzoglou Hierarchical Sequencing.
DNA Sequencing and Assembly. DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA.
DNA Sequencing.
DNA Sequencing. DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT.
Conditional Random Fields 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2.
DNA Sequencing. CS262 Lecture 9, Win06, Batzoglou DNA Sequencing – gel electrophoresis 1.Start at primer(restriction site) 2.Grow DNA chain 3.Include.
DNA Sequencing. DNA sequencing … ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC.
DNA Sequencing. CS262 Lecture 9, Win06, Batzoglou DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT.
CS273a Lecture 1, Autumn 10, Batzoglou DNA Sequencing.
CS273a Lecture 2, Autumn 10, Batzoglou DNA Sequencing (cont.)
CS273a Lecture 4, Autumn 08, Batzoglou DNA Sequencing.
Human population migrations Out of Africa, Replacement –Single mother of all humans (Eve) ~150,000yr –Single father of all humans (Adam) ~70,000yr –Humans.
Genomes summary 1.>930 bacterial genomes sequenced. 2.Circular. Genes densely packed Mbases, ,000 genes 4.Genomes of >200 eukaryotes (45.
CS262 Lecture 9, Win07, Batzoglou Conditional Random Fields A brief description.
DNA Sequencing. Next few topics DNA Sequencing  Sequencing strategies Hierarchical Online (Walking) Whole Genome Shotgun  Sequencing Assembly Gene Recognition.
Genome sequencing and assembling
Genome sequencing. Vocabulary Bac: Bacterial Artificial Chromosome: cloning vector for yeast Pac, cosmid, fosmid, plasmid: cloning vectors for E. coli.
Genome Analysis Determine locus & sequence of all the organism’s genes More than 100 genomes have been analysed including humans in the Human Genome Project.
Basic Molecular Biology Many slides by Omkar Deshpande.
DNA Technology and Genomics
Presentation on genome sequencing. Genome: the complete set of gene of an organism Genome annotation: the process by which the genes, control sequences.
The Clone Age Human Genome Project Recombinant DNA Gel Electrophoresis DNA fingerprints
AP Biology Ch. 20 Biotechnology.
Graphs and DNA sequencing CS 466 Saurabh Sinha. Three problems in graph theory.
Chapter 16 Gene Technology. Focus of Chapter u An introduction to the methods and developments in: u Recombinant DNA u Genetic Engineering u Biotechnology.
CS273a Lecture 4, Autumn 08, Batzoglou CS273a 2011 DNA Sequencing.
Genome sequencing Haixu Tang School of Informatics.
20.1 Structural Genomics Determines the DNA Sequences of Entire Genomes The ultimate goal of genomic research: determining the ordered nucleotide sequences.
Genome sequencing Haixu Tang School of Informatics.
Biological Motivation for Fragment Assembly Rhys Price Jones Anne R. Haake.
A Sequenciação em Análises Clínicas Polymerase Chain Reaction.
DNA Technology Chapter 11. Genetic Technology- Terms to Know Genetic engineering- Genetic engineering- Recombinant DNA- DNA made from 2 or more organisms.
Alu Polymorphysims as Identity Markers Using PCR and Gel Electrophoresis.
Linkage and Mapping. Figure 4-8 For linked genes, recombinant frequencies are less than 50 percent.
CS273a Lecture 4, Autumn 08, Batzoglou CS273a 2013 DNA Structure.
Human Genome.
GENE SEQUENCING. INTRODUCTION CELL The cells contain the nucleus. The chromosomes are present within the nucleus.
DNA Sequencing.
Johnson - The Living World: 3rd Ed. - All Rights Reserved - McGraw Hill Companies Genomics Chapter 10 Copyright © McGraw-Hill Companies Permission required.
Genomics Chapter 18.
Genomics Part 1. Human Genome Project  G oal is to identify the DNA sequence of every gene in humans Genome  all the DNA in one cell of an organism.
Whole Genome Sequencing (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 13, 2005 ChengXiang Zhai Department of Computer Science University of.
Chapter 5 Sequence Assembly: Assembling the Human Genome.
VL Algorithmische BioInformatik (19710) WS2015/2016 Woche 7 - Mittwoch Tim Conrad AG Medical Bioinformatics Institut für Mathematik & Informatik, Freie.
Genome Analysis. This involves finding out the: order of the bases in the DNA location of genes parts of the DNA that controls the activity of the genes.
Title: Studying whole genomes Homework: learning package 14 for Thursday 21 June 2016.
Objectives: Outline the steps involved in sequencing the genome of an organism. Outline how gene sequencing allows for genome wide comparisons between.
CSCI2950-C Lecture 2 DNA Sequencing and Fragment Assembly
DNA Sequencing.
DNA Sequencing Project
CSCI2950-C Genomes, Networks, and Cancer
Genomes and Their Evolution
Stuff to Do.
The Human Genome Project
Genome Projects Maps Human Genome Mapping Human Genome Sequencing
A Sequenciação em Análises Clínicas
CSCI 1810 Computational Molecular Biology 2018
Introduction to Sequencing
Presentation transcript:

DNA Sequencing

CS273a Lecture 3, Autumn 08, Batzoglou DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT…

CS273a Lecture 3, Autumn 08, Batzoglou Which representative of the species? Which human? Answer one: Answer two: it doesn’t matter Polymorphism rate: number of letter changes between two different members of a species Humans: ~1/1,000 Other organisms have much higher polymorphism rates  Population size!

CS273a Lecture 3, Autumn 08, Batzoglou Why humans are so similar A small population that interbred reduced the genetic variation Out of Africa ~ 40,000 years ago Out of Africa Heterozygosity: H H = 4Nu/(1 + 4Nu) u ~ 10 -8, N ~ 10 4  H ~ 4  N

CS273a Lecture 3, Autumn 08, Batzoglou Human population migrations Out of Africa, Replacement  “Grandma” of all humans (Eve) ~150,000yr Ancestor of all mtDNA  “Grandpa” of all humans (Adam) ~100,000yr Ancestor of all Y-chromosomes Multiregional Evolution  Fossil records show a continuous change of morphological features  Proponents of the theory doubt mtDNA and other genetic evidence New fossil records bury “multirigionalists”  Nice article in Economist on that

CS273a Lecture 3, Autumn 08, Batzoglou DNA Sequencing – Overview Gel electrophoresis  Predominant, old technology by F. Sanger Whole genome strategies  Physical mapping  Walking  Shotgun sequencing Computational fragment assembly The future—new sequencing technologies  Pyrosequencing, single molecule methods, …  Assembly techniques Future variants of sequencing  Resequencing of humans Resequencing of humans  Microbial and environmental sequencing Microbial and environmental sequencing  Cancer genome sequencing Cancer genome sequencing

CS273a Lecture 3, Autumn 08, Batzoglou DNA Sequencing Goal: Find the complete sequence of A, C, G, T’s in DNA Challenge: There is no machine that takes long DNA as an input, and gives the complete sequence as output Can only sequence ~900 letters at a time

CS273a Lecture 3, Autumn 08, Batzoglou DNA Sequencing – vectors + = DNA Shake DNA fragments Vector Circular genome (bacterium, plasmid) Known location (restriction site)

CS273a Lecture 3, Autumn 08, Batzoglou Different types of vectors VECTORSize of insert Plasmid 2,000-10,000 Can control the size Cosmid40,000 BAC (Bacterial Artificial Chromosome) 70, ,000 YAC (Yeast Artificial Chromosome) > 300,000 Not used much recently

CS273a Lecture 3, Autumn 08, Batzoglou DNA Sequencing – gel electrophoresis 1.Start at primer(restriction site) 2.Grow DNA chain 3.Include dideoxynucleoside (modified a, c, g, t) 4.Stops reaction at all possible points 5.Separate products with length, using gel electrophoresis

CS273a Lecture 3, Autumn 08, Batzoglou Method to sequence longer regions cut many times at random (Shotgun) genomic segment Get one or two reads from each segment ~900 bp

CS273a Lecture 3, Autumn 08, Batzoglou Reconstructing the Sequence (Fragment Assembly) Cover region with high redundancy Overlap & extend reads to reconstruct the original genomic region reads

CS273a Lecture 3, Autumn 08, Batzoglou Definition of Coverage Length of genomic segment:G Number of reads: N Length of each read:L Definition: Coverage C = N L / G How much coverage is enough? Lander-Waterman model:Prob[ not covered bp ] = e -C Assuming uniform distribution of reads, C=10 results in 1 gapped region /1,000,000 nucleotides C

CS273a Lecture 3, Autumn 08, Batzoglou Repeats Bacterial genomes:5% Mammals:50% Repeat types: Low-Complexity DNA (e.g. ATATATATACATA…) Microsatellite repeats (a 1 …a k ) N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG) Transposons  SINE (Short Interspersed Nuclear Elements) e.g., ALU: ~300-long, 10 6 copies  LINE (Long Interspersed Nuclear Elements) ~4000-long, 200,000 copies  LTR retroposons (Long Terminal Repeats (~700 bp) at each end) cousins of HIV Gene Families genes duplicate & then diverge (paralogs) Recent duplications ~100,000-long, very similar copies

CS273a Lecture 3, Autumn 08, Batzoglou Sequencing and Fragment Assembly AGTAGCACAGA CTACGACGAGA CGATCGTGCGA GCGACGGCGTA GTGTGCTGTAC TGTCGTGTGTG TGTACTCTCCT 3x10 9 nucleotides 50% of human DNA is composed of repeats Error! Glued together two distant regions

CS273a Lecture 3, Autumn 08, Batzoglou What can we do about repeats? Two main approaches: Cluster the reads Link the reads

CS273a Lecture 3, Autumn 08, Batzoglou What can we do about repeats? Two main approaches: Cluster the reads Link the reads

CS273a Lecture 3, Autumn 08, Batzoglou What can we do about repeats? Two main approaches: Cluster the reads Link the reads

CS273a Lecture 3, Autumn 08, Batzoglou Sequencing and Fragment Assembly AGTAGCACAGA CTACGACGAGA CGATCGTGCGA GCGACGGCGTA GTGTGCTGTAC TGTCGTGTGTG TGTACTCTCCT 3x10 9 nucleotides C R D ARB, CRD or ARD, CRB ? ARB

CS273a Lecture 3, Autumn 08, Batzoglou Sequencing and Fragment Assembly AGTAGCACAGA CTACGACGAGA CGATCGTGCGA GCGACGGCGTA GTGTGCTGTAC TGTCGTGTGTG TGTACTCTCCT 3x10 9 nucleotides

CS273a Lecture 3, Autumn 08, Batzoglou Strategies for whole-genome sequencing 1.Hierarchical – Clone-by-clone i.Break genome into many long pieces ii.Map each long piece onto the genome iii.Sequence each piece with shotgun Example: Yeast, Worm, Human, Rat 2.Online version of (1) – Walking i.Break genome into many long pieces ii.Start sequencing each piece with shotgun iii.Construct map as you go Example: Rice genome 3.Whole genome shotgun One large shotgun pass on the whole genome Example: Drosophila, Human (Celera), Neurospora, Mouse, Rat, Dog