

Sequencing a genome

Approximate Molecular Dynamics: New Algorithms with Applications in Protein Folding Author: Qun (Marc) Ma Predicting the 3D native structures of proteins from the known amino acid sequence, i.e., protein folding, has become pressing in structural genomics and computational biology. Though it is plausible to use molecular dynamics (MD) simulations to study the folding of proteins, the currently available methodologies are incapable of addressing the timescale problems. In this talk, I will describe the recent advances in the development of two new multiscale integrators that allow very large time steps (and thus “approximate” molecular dynamics).

Definition: Determining the identity and order of nucleotides in the genetic material (usually DNA, sometimes RNA) of an organism

Basic problem
Genomes are large (typically millions or billions of base pairs)
Current technology can only reliably ‘read’ a short stretch, typically hundreds of base pairs

Elements of a solution
Automation: over the past decade, the amount of hand labor in the ‘reads’ has been steadily and dramatically reduced
Assembly of the reads into sequences is an algorithmic and computational problem

A human drama
There are competing methods of assembly
The competing sequencing teams, public and private, used competing assembly methods

Assembly: Putting sequenced fragments of DNA into their correct chromosomal positions

BAC: Bacterial artificial chromosome: bacterial DNA spliced with a medium-sized fragment of a genome (100 to 300 kb) to be amplified in bacteria and sequenced

Contig: Contiguous sequence of DNA created by assembling overlapping sequenced fragments of a chromosome (whether natural or artificial, as in BACs)

Cosmid: DNA from a bacterial virus spliced with a small fragment of a genome (45 kb or less) to be amplified and sequenced

Directed sequencing: Successively sequencing DNA from adjacent stretches of chromosome

Draft sequence: Sequence with lower accuracy than a finished sequence; some segments are missing or in the wrong order or orientation

EST: Expressed sequence tag: a unique stretch of DNA within a coding region of a gene; useful for identifying full-length genes and as a landmark for mapping

Exon: Region of a gene’s DNA that encodes a portion of its protein; exons are interspersed with noncoding introns

Genome: The entire chromosomal genetic material of an organism

Intron: Region of a gene’s DNA that is not translated into a protein

Kilobase (kb): Unit of DNA equal to 1000 bases

Locus: Chromosomal location of a gene or other piece of DNA

Megabase (Mb): Unit of DNA equal to 1 million bases

PCR: Polymerase chain reaction: a technique for amplifying a piece of DNA quickly and cheaply

Physical map: A map of the locations of identifiable markers spaced along the chromosomes; a physical map may also be a set of overlapping clones

Plasmid: Loop of bacterial DNA that replicates independently of the chromosomes; artificial plasmids can be inserted into bacteria to amplify DNA for sequencing

Regulatory region: A segment of DNA that controls whether a gene will be expressed and to what degree

Repetitive DNA: Sequences of varying lengths that occur in multiple copies in the genome; they make up much of the genome

Restriction enzyme: An enzyme that cuts DNA at specific sequences of base pairs

RFLP: Restriction fragment length polymorphism: genetic variation in the length of DNA fragments produced by restriction enzymes; useful as markers on maps

Scaffold: A series of contigs that are in the right order but are not necessarily connected in one continuous stretch of sequence

Shotgun sequencing: Breaking DNA into many small pieces, sequencing the pieces, and assembling the fragments

STS: Sequence tagged site: a unique stretch of DNA whose location is known; serves as a landmark for mapping and assembly

YAC: Yeast artificial chromosome: yeast DNA spliced with a large fragment of a genome (up to 1 Mb) to be amplified in yeast cells and sequenced

Readings
Myers, “Whole Genome DNA Sequencing,”
Venter, et al., “The Sequence of the Human Genome,” Science, 16 Feb 2001, Vol. 291, No. 5507, 1304 (parts 1 & 2)
Waterston, Lander, Sulston, “On the sequencing of the human genome,” PNAS, March 19, 2002, Vol. 99, No. 6,
Myers, et al., “On the sequencing and assembly of the human genome,”

Hierarchical sequencing
Create a high-level physical map, using ESTs and STSs
Shred genome into overlapping clones
Multiply clones in BACs
‘Shotgun’ each clone
Read each ‘shotgunned’ fragment
Assemble the fragments
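A minimal sketch of what the physical map buys in the hierarchical approach, assuming each clone has already been shotgunned and assembled on its own. The function names, the clone length, and the step size are illustrative choices, not part of the slides; the point is only that known map offsets let finished clones be laid end to end without a genome-wide overlap search.

```python
def tile_clones(genome, clone_len, step):
    """Cut overlapping clones and record each clone's offset on the map."""
    clones = []
    for start in range(0, len(genome), step):
        clones.append((start, genome[start:start + clone_len]))
        if start + clone_len >= len(genome):
            break  # the last clone already reaches the end of the target
    return clones


def stitch(clones):
    """Rebuild the target from finished clones using their map offsets."""
    result = ""
    for offset, seq in sorted(clones):
        if offset > len(result):
            raise ValueError("physical map leaves a gap before offset %d" % offset)
        # Skip the part of this clone that overlaps what is already placed.
        result += seq[len(result) - offset:]
    return result


# Toy target; real clones are ~100 kb or more, not 12 bases.
genome = "ACGTGACTGAGGACCGTGCGACTGAGACTG"
clones = tile_clones(genome, clone_len=12, step=8)
rebuilt = stitch(clones)
print(rebuilt == genome)
```

Because each clone is assembled separately, repeats only cause trouble when both copies fall inside the same clone.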

Physical map

Whole genome sequencing (WGS)
Make multiple copies of the target
Randomly ‘shotgun’ each target, discarding very big and very small pieces
Read each fragment
Reassemble the ‘reads’
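The first three WGS steps can be simulated in a few lines. This is a toy sketch under stated assumptions: breakpoints are uniformly random, a "read" is simply the first few bases of a kept fragment, and every name and number below is made up for illustration.

```python
import random


def shear(target, n_breaks, rng):
    """Break one copy of the target at n_breaks random positions."""
    cuts = sorted(rng.sample(range(1, len(target)), n_breaks))
    bounds = [0] + cuts + [len(target)]
    return [target[a:b] for a, b in zip(bounds, bounds[1:])]


def shotgun(target, copies, n_breaks, min_len, max_len, rng):
    """Shear several copies and discard very small and very big pieces."""
    kept = []
    for _ in range(copies):
        for fragment in shear(target, n_breaks, rng):
            if min_len <= len(fragment) <= max_len:
                kept.append(fragment)
    return kept


rng = random.Random(0)  # fixed seed so the run is repeatable
target = "ACGTGACTGAGGACCGTGCGACTGAGACTGACTGGGT"
fragments = shotgun(target, copies=6, n_breaks=4, min_len=5, max_len=20, rng=rng)
read_len = 8
reads = [fragment[:read_len] for fragment in fragments]  # read one end of each fragment
print(len(fragments), "fragments kept")
```

Reassembly, the last step, is the hard part and is the subject of the following slides.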

Hierarchical v. whole-genome

The fragment assembly problem
Aim: infer the target from the reads
Difficulties:
– Incomplete coverage. Leaves contigs separated by gaps of unknown size.
– Sequencing errors. Rate increases with length of read. Less than some ε.
– Unknown orientation. Don’t know whether to use read or its Watson-Crick complement.
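The unknown-orientation difficulty can be made concrete with a small sketch. It assumes error-free reads and exact suffix/prefix overlaps, which real assemblers do not; `overlap` and `best_orientation` are hypothetical helper names. Each read is tried both as given and as its reverse complement, and the orientation with the longer overlap is kept.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def best_orientation(contig, read, min_len=3):
    """Try both strands of the read; return (overlap_length, oriented_read)."""
    candidates = [read, reverse_complement(read)]
    scored = [(overlap(contig, r, min_len), r) for r in candidates]
    return max(scored, key=lambda pair: pair[0])


contig = "ACGTGACTGA"
read = "GGTCCTCAGT"  # came off the opposite strand
k, oriented = best_orientation(contig, read)
extended = contig + oriented[k:]
print(k, oriented, extended)
```

Here the read overlaps the contig only after it is flipped, so an assembler that ignored orientation would leave a spurious gap.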

Scaling and computational complexity
Increasing size of target G:
– 1990: 40 kb (one cosmid)
– 1995: 1.8 Mb (H. influenzae)
– 2001: 3,200 Mb (H. sapiens)
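To get a feel for the scaling, the arithmetic below counts reads for each target size. The read length of 500 bases and the 10-fold coverage are assumptions chosen for illustration, and the all-pairs figure is only the cost of a naive overlap search, not what any real assembler does.

```python
import math


def reads_needed(genome_len, read_len, coverage):
    """Number of reads so that total read length = coverage x genome length."""
    return math.ceil(coverage * genome_len / read_len)


def naive_pairs(n_reads):
    """Read pairs a brute-force all-against-all overlap search would compare."""
    return n_reads * (n_reads - 1) // 2


targets = {
    "cosmid (1990)": 40_000,
    "H. influenzae (1995)": 1_800_000,
    "H. sapiens (2001)": 3_200_000_000,
}
for name, size in targets.items():
    n = reads_needed(size, read_len=500, coverage=10)
    print(f"{name}: {n:,} reads, {naive_pairs(n):,} naive pairwise comparisons")
```

The read count grows linearly with G, but the naive comparison count grows quadratically, which is why assembly becomes an algorithmic problem at genome scale.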

The repeat problem
Repeats:
– Bigger G means more repeats
– Complex organisms have more repetitive elements
– Small repeats may appear multiple times in a read
– Long repeats may be bigger than reads (no unique region)
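A toy illustration of why long repeats are fatal, assuming ideal conditions: every possible read of a fixed length is observed exactly once per occurrence, with no errors. The sequences and the helper name `read_multiset` are made up. Two different targets that share a 6-base repeat produce exactly the same reads when the read length is 7, and only become distinguishable once a read can span the whole repeat plus a flanking base on each side.

```python
from collections import Counter


def read_multiset(target, read_len):
    """All substrings of length read_len, counted with multiplicity."""
    return Counter(target[i:i + read_len] for i in range(len(target) - read_len + 1))


repeat = "ACGTAC"
# Same pieces, but the two segments between the repeats are swapped.
target_1 = "TT" + repeat + "CC" + repeat + "GG" + repeat + "AA"
target_2 = "TT" + repeat + "GG" + repeat + "CC" + repeat + "AA"

same_with_short_reads = read_multiset(target_1, 7) == read_multiset(target_2, 7)
same_with_long_reads = read_multiset(target_1, 8) == read_multiset(target_2, 8)
print(target_1 != target_2, same_with_short_reads, same_with_long_reads)
```

With the shorter reads no algorithm can tell the two targets apart from the reads alone, which is the motivation for extra linking information such as maps or paired reads.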

Gaps
Read length $L_R$ hasn’t changed much
$\rho = L_R / G$ gets steadily smaller
Gaps $\sim R e^{-\rho R}$, with $R$ the number of reads (Waterman & Lander)
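The gap estimate on this slide is easy to tabulate. The sketch below evaluates the expected number of gaps, R·exp(−R·L_R/G), for a hypothetical 1 Mb target and 500-base reads; both numbers are assumptions, and the formula ignores the minimum overlap needed to detect that two reads join.

```python
import math


def expected_gaps(n_reads, read_len, genome_len):
    """Lander-Waterman style estimate: R * exp(-coverage), coverage = R * L_R / G."""
    coverage = n_reads * read_len / genome_len
    return n_reads * math.exp(-coverage)


genome_len = 1_000_000  # hypothetical target size
read_len = 500          # hypothetical read length
for coverage in (1, 2, 5, 8, 10):
    n_reads = coverage * genome_len // read_len
    gaps = expected_gaps(n_reads, read_len, genome_len)
    print(f"coverage {coverage:>2}x: {n_reads:>6} reads, about {gaps:.1f} gaps expected")
```

The expected gap count falls off exponentially with coverage, so under these assumptions a single-digit gap count needs roughly 8x coverage or more.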

How deep must coverage be?

Double-barreled shotgun sequencing
Choose longer fragments (say, $2 \times L_R$)
Read both ends
Such fragments probably span gaps
This gives an approximate size of the gap
This links contigs into scaffolds
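How paired end reads give an approximate gap size can be sketched with simple arithmetic. The assumptions here are loud ones: the fragment length is known only approximately, the first read lies in contig A pointing toward the gap, its mate lies in contig B pointing back, and all coordinates and the function name are invented for illustration.

```python
from statistics import mean


def gap_estimate(insert_len, len_a, start_in_a, end_in_b):
    """Gap between contig A and contig B implied by one pair of end reads.

    start_in_a: where the fragment begins, measured from the start of A.
    end_in_b:   where the fragment ends, measured from the start of B.
    A negative result would mean the two contigs actually overlap.
    """
    covered_in_a = len_a - start_in_a  # part of the fragment lying in contig A
    return insert_len - covered_in_a - end_in_b


insert_len = 2000  # nominal fragment length (approximate in practice)
len_a = 1000       # length of contig A
# (start_in_a, end_in_b) for three fragments that bridge the same gap
mate_pairs = [(700, 400), (850, 540), (900, 610)]

estimates = [gap_estimate(insert_len, len_a, s, e) for s, e in mate_pairs]
gap = mean(estimates)
print(estimates, gap)
```

Averaging over several bridging pairs smooths out variation in the true fragment lengths, and any pair that bridges two contigs also fixes their order and relative orientation within a scaffold.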

Genomic results

HGSC v Celera results

To do or not to do?
“The idea is gathering momentum. I shiver at the thought.” – David Baltimore, 1986
“If there is anything worth doing twice, it’s the human genome.” – David Haussler, 2000

Public or private?
“This information is so important that it cannot be proprietary.” – C Thomas Caskey, 1987
“If a company behaves in what scientists believe is a socially responsible manner, they can’t make a profit.” – Robert Cook-Deegan, 1987

HW for Feb 19
Comment on these assertions words:
– WLS – “Our analysis indicates that the Celera paper provides neither a meaningful test of the WGS approach nor an independent sequence of the human genome.”
– Venter – “This conclusion is based on incorrect assumptions and flawed reasoning.”
Lesk, Exercise 2.15, problem 2.3