CS273a Lecture 2, Autumn 10, Batzoglou DNA Sequencing (cont.)

Sequencing Applications
- million species
- Each individual has different DNA
- Within an individual, some cells have different DNA (e.g. cancer)

Sequencing Applications
- What genes are on/off, when, and in which cells?
- Where do molecules bind to DNA?

Method to sequence longer regions
- Cut the genomic segment many times at random (Shotgun)
- Get one or two reads from each segment
[Figure: genomic segment cut into pieces; label: ~900 bp]

Reconstructing the Sequence (Fragment Assembly)
- Cover region with high redundancy
- Overlap & extend reads to reconstruct the original genomic region
[Figure: overlapping reads]
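
The "overlap & extend" idea can be illustrated with a small greedy sketch. This is not the assembler used in the lecture: it assumes error-free reads from a single strand and a repeat-free region, and the names `overlap_length`, `greedy_assemble`, and `min_overlap` are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0  # no overlap of at least min_overlap bases


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the two reads with the largest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap_length(a, b, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to extend
        # Extend the left read with the non-overlapping tail of the right read.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["AGTAGCAC", "GCACAGAC", "AGACTACG"]
print(greedy_assemble(reads))
```

With repeats in the region, a greedy merge like this can join reads from distant copies, which is the failure mode the later slides address.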

Definition of Coverage
- Length of genomic segment: G
- Number of reads: N
- Length of each read: L
- Definition: Coverage C = N L / G
How much coverage is enough?
- Lander-Waterman model: Prob[not covered bp] = e^{-C}
- Assuming uniform distribution of reads, C = 10 results in 1 gapped region per 1,000,000 nucleotides
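
As a worked example of the two formulas above, the following sketch computes C = N L / G and the Lander-Waterman probability e^{-C} that a given base is not covered. The function names and the example numbers are illustrative assumptions, not values from the lecture.

```python
import math


def coverage(num_reads, read_length, genome_length):
    """Coverage C = N * L / G."""
    return num_reads * read_length / genome_length


def prob_base_uncovered(c):
    """Lander-Waterman estimate: Prob[not covered bp] = e^{-C}."""
    return math.exp(-c)


def reads_needed(target_coverage, read_length, genome_length):
    """Smallest N such that N * L / G >= target coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


# Hypothetical project: 1 Mbp segment, 500 bp reads.
c = coverage(num_reads=10_000, read_length=500, genome_length=1_000_000)
print(c, prob_base_uncovered(c))
print(reads_needed(10, 500, 1_000_000))
```

At C = 10 the per-base probability of being uncovered is about 4.5e-5, so doubling coverage from 5 to 10 reduces it by a factor of e^5, roughly 150.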

Repeats
- Bacterial genomes: 5%
- Mammals: 50%
Repeat types:
- Low-Complexity DNA (e.g. ATATATATACATA…)
- Microsatellite repeats (a_1…a_k)^N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG)
- Transposons
  - SINE (Short Interspersed Nuclear Elements), e.g. ALU: ~300-long, 10^6 copies
  - LINE (Long Interspersed Nuclear Elements): ~4000-long, 200,000 copies
  - LTR retroposons (Long Terminal Repeats (~700 bp) at each end): cousins of HIV
- Gene Families: genes duplicate & then diverge (paralogs)
- Recent duplications: ~100,000-long, very similar copies

Sequencing and Fragment Assembly
AGTAGCACAGA CTACGACGAGA CGATCGTGCGA GCGACGGCGTA GTGTGCTGTAC TGTCGTGTGTG TGTACTCTCCT
- 3x10^9 nucleotides
- 50% of human DNA is composed of repeats
- Error! Glued together two distant regions

What can we do about repeats?
Two main approaches:
- Cluster the reads
- Link the reads

Sequencing and Fragment Assembly
AGTAGCACAGA CTACGACGAGA CGATCGTGCGA GCGACGGCGTA GTGTGCTGTAC TGTCGTGTGTG TGTACTCTCCT
- 3x10^9 nucleotides
- ARB, CRD or ARD, CRB?
[Figure labels: A R B; C R D]

Strategies for whole-genome sequencing
1. Hierarchical – Clone-by-clone
   i. Break genome into many long pieces
   ii. Map each long piece onto the genome
   iii. Sequence each piece with shotgun
   Example: Yeast, Worm, Human, Rat
2. Online version of (1) – Walking
   i. Break genome into many long pieces
   ii. Start sequencing each piece with shotgun
   iii. Construct map as you go
   Example: Rice genome
3. Whole genome shotgun
   One large shotgun pass on the whole genome
   Example: Drosophila, Human (Celera), Neurospora, Mouse, Rat, Dog

Hierarchical Sequencing

Hierarchical Sequencing Strategy
1. Obtain a large collection of BAC clones
2. Map them onto the genome (Physical Mapping)
3. Select a minimum tiling path
4. Sequence each clone in the path with shotgun
5. Assemble
6. Put everything together
[Figure: a BAC clone; map; genome]
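
Step 3, selecting a minimum tiling path, can be sketched as a greedy interval cover once the physical map has placed each clone. This is an illustration only: it assumes every clone already has known start and end coordinates on the map, and the clone names, coordinates, and function name are hypothetical.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Pick a smallest set of clones covering [region_start, region_end].

    clones: list of (name, start, end) tuples in map coordinates.
    Greedy rule: among clones starting at or before the covered frontier,
    take the one that reaches furthest to the right.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"no clone extends coverage past {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


clones = [("A", 0, 150), ("B", 100, 220), ("C", 120, 300),
          ("D", 280, 400), ("E", 290, 350)]
print([name for name, _, _ in minimum_tiling_path(clones, 0, 400)])
```

In this made-up map, clones B and E are redundant, so only A, C, and D would be sent for shotgun sequencing.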

Methods of physical mapping
Goal:
- Make a map of the locations of each clone relative to one another
- Use the map to select a minimal set of clones to sequence
Methods:
- Hybridization
- Digestion

1. Hybridization
Short words, the probes, attach to complementary words
1. Construct many probes
2. Treat each BAC with all probes
3. Record which ones attach to it
4. Same words attaching to BACs X, Y → overlap
[Figure: probes p_1 … p_n]
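
Steps 3 and 4 amount to comparing, for each pair of BACs, the sets of probes that attached. A minimal sketch follows, assuming the hybridization results are already recorded as one probe set per BAC; the names `infer_overlaps`, `hits`, and `min_shared` are hypothetical, and real data would also need to handle false positives and repeated words.

```python
from itertools import combinations


def infer_overlaps(hits, min_shared=1):
    """Return {(bac_x, bac_y): shared probes} for pairs sharing enough probes.

    hits: dict mapping a BAC name to the set of probes that attached to it.
    """
    overlaps = {}
    for bac_x, bac_y in combinations(sorted(hits), 2):
        shared = hits[bac_x] & hits[bac_y]
        if len(shared) >= min_shared:
            overlaps[(bac_x, bac_y)] = shared
    return overlaps


hits = {
    "X": {"p1", "p4"},
    "Y": {"p4", "p7"},
    "Z": {"p9"},
}
print(infer_overlaps(hits))
```

Raising `min_shared` trades sensitivity for fewer spurious overlaps when a probe matches a repeated word.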

2. Digestion
Restriction enzymes cut DNA where specific words appear
1. Cut each clone separately with an enzyme
2. Run fragments on a gel and measure length
3. Clones C_a, C_b have fragments of length {l_i, l_j, l_k} → overlap
Double digestion: Cut with enzyme A, enzyme B, then enzymes A + B
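
Step 3 compares the fragment-length fingerprints of two clones. The sketch below counts fragment lengths that match within a measurement tolerance. The tolerance, the threshold for calling an overlap, the example lengths, and the function names are all assumptions for illustration, since gel measurements are imprecise and the slide does not give values.

```python
def shared_fragments(lengths_a, lengths_b, tolerance=0):
    """Count fragment lengths present in both clones, matched one-to-one."""
    a, b = sorted(lengths_a), sorted(lengths_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1  # treat as the same fragment, consume both
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def likely_overlap(lengths_a, lengths_b, min_shared=2, tolerance=0):
    """Call an overlap when enough fragment lengths agree."""
    return shared_fragments(lengths_a, lengths_b, tolerance) >= min_shared


clone_a = [1200, 3400, 5100, 800]   # hypothetical gel readings in bp
clone_b = [3410, 5090, 2200, 640]
print(shared_fragments(clone_a, clone_b, tolerance=20))
```

Here the 3400/3410 and 5100/5090 fragments match within 20 bp, so the two clones would be flagged as overlapping under the default threshold.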

Some Terminology
- insert: a fragment that was incorporated in a circular genome, and can be copied (cloned)
- vector: the circular genome (host) that incorporated the fragment
- BAC: Bacterial Artificial Chromosome, a type of insert–vector combination, typically of length kb
- read: a long word that comes out of a sequencing machine
- coverage: the average number of reads (or inserts) that cover a position in the target DNA piece
- shotgun sequencing: the process of obtaining many reads from random locations in DNA, to detect overlaps and assemble

Whole Genome Shotgun Sequencing
- Cut the genome many times at random
- Clone the pieces: plasmids (2 – 10 Kbp), cosmids (40 Kbp)
- Forward-reverse paired reads at a known distance
[Figure label: ~800 bp]
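
To make "forward-reverse paired reads" concrete, here is a small simulation sketch: it cuts one insert at a random position and reads inward from both ends, so the two reads come from opposite strands a known distance apart. The genome string, lengths, and function names are hypothetical, and the sketch ignores sequencing errors and variation in insert size.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sample_mate_pair(genome, insert_length, read_length, rng):
    """Cut one insert at random and read inward from both of its ends."""
    start = rng.randrange(len(genome) - insert_length + 1)
    insert = genome[start:start + insert_length]
    forward = insert[:read_length]                        # read from the left end
    reverse = reverse_complement(insert[-read_length:])   # read from the right end
    return start, forward, reverse


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(200))
start, forward, reverse = sample_mate_pair(genome, insert_length=60, read_length=15, rng=rng)
print(start, forward, reverse)
```

Because both reads come from the same insert, an assembler knows they lie about one insert length apart and on opposite strands, which is the information used to link reads across repeats.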