The Human Genome Project And how we got there… Sequencing technologies Sequencing strategies So what? What’s next
But before that… How do you find out the sequence of DNA? Sanger’s dideoxy sequencing method
Frederick Sanger Won the Nobel Prize in Chemistry twice 1958 – for sequencing insulin 1980 – for inventing a method for sequencing DNA (shared with Walter Gilbert and Paul Berg) All the high-throughput sequencing methods in use today are based on the Sanger dideoxy method http://www.nlm.nih.gov/visibleproofs/media/detailed/vi_a_208b.jpg
Sanger’s recipe: Ingredients The DNA of interest - template An oligonucleotide primer to get the ball rolling A DNA polymerase dNTPs (deoxyribonucleotide triphosphates) – dATP, dCTP, dGTP, dTTP The special ingredient: ddNTPs
Revision from ATBMS: Nucleoside triphosphates/deoxyribonucleotide triphosphates Concepts of Genetics 7th Ed, Klug and Cummings
Revision from ATBMS: Nucleoside triphosphates/deoxyribonucleotide triphosphates Phosphodiester bonds are formed between the 3’ carbon of one nucleotide and the 5’ carbon of the next nucleotide Concepts of Genetics 7th Ed, Klug and Cummings
Linkage of two nucleotides Revision from ATBMS: Nucleoside triphosphates/deoxyribonucleotide triphosphates Concepts of Genetics 7th Ed, Klug and Cummings
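What’s special about ddNTP? A ddNTP lacks the 3’ hydroxyl group, so no phosphodiester bond can form with the next nucleotide and the chain stops growing This method is also known as the chain termination method Concepts of Genetics 7th Ed, Klug and Cummings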
What’s special about ddNTP? Fluorescent dye coupled to N-base Each ddNTP – ddATP, ddCTP, ddGTP, ddTTP – is coupled to a different type of fluorescent dye – each ddNTP will absorb a characteristic laser wavelength and emit a characteristic colour
Sanger recipe: Method Divide DNA into 4 tubes with dNTPs and a different ddNTP in each tube and incubate Polymerase catalyses addition of dNTPs ddNTPs will terminate reactions Form oligonucleotides of varying lengths terminated by fluorescent ddNTPs
Denature DNA to produce single stranded oligonucleotides Load single stranded oligonucleotides and separate by electrophoresis – usually by capillary electrophoresis ‘Read’ DNA sequence What would the gel look like?
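The method above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: the template sequence is hypothetical, the primer is left out, every possible termination product is assumed to be present, and electrophoresis is modelled as a perfect sort by fragment length.

```python
# Toy model of dideoxy (chain termination) sequencing.
# Assumptions: hypothetical template, no primer, every termination product
# is present, and fragments are resolved perfectly by size.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesized_strand(template_3to5):
    """Strand the polymerase builds (5'->3') while copying a template read 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)

def termination_products(new_strand):
    """Every fragment that ends where a ddNTP was incorporated instead of a dNTP."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]

def read_sequence(fragments):
    """Order fragments by length (as electrophoresis does) and read each terminal, dye-labelled base."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))

template = "TACGGTCA"  # hypothetical template, written 3'->5'
new_strand = synthesized_strand(template)
fragments = termination_products(new_strand)
print(read_sequence(fragments))
```

Reading the last base of each fragment from shortest to longest recovers the newly synthesized strand, which is the complement of the template.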
Advances in technology… The use of fluorescently labelled ddNTPs (previously radioactive isotopes were used) Each ddNTP could be labelled with a different fluorochrome Sequencing could be done in a single tube Capillaries replaced large sheet gels Fluorescence could be read by a laser, leading to: Automation The human genome was sequenced using Sanger’s dideoxy method
Capillary electrophoresis (from wikipedia) Capillary tube filled with a gel or polymer sieving matrix and buffer Electrical voltage applied across the capillary Oligonucleotides migrate through the capillary at a rate that depends on size (shorter fragments move faster)
Typical Electropherogram The first 10-20 bp are usually not reliable, and reads are limited to about 600-800 bp because peaks get broader and smaller
The Human Genome
What’s in a genome? Genes that code for proteins – 2-3% of the genome – contain an open reading frame (ORF) beginning with a start codon and ending with a stop codon Many genes have multiple copies or have several closely related ‘family’ members Regions coding for structural RNA (not proteins) – eg ribosomal RNA, tRNA Regulatory regions – binding regions for regulatory proteins, transcription factors
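The idea of an ORF can be made concrete with a short sketch. This is a minimal illustration, not a gene finder: it assumes ATG as the only start codon, scans only the given strand in its three reading frames, and ignores introns. The function name and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ORF found in the three forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the current start codon, if one is open
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # close this ORF and look for the next start codon
    return orfs

print(find_orfs("CCATGGCTTAAGG"))
```

A real annotation pipeline would also scan the reverse complement and apply a much larger minimum length.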
Moderately Repetitive DNA Functional Gene families eg globin, actin Gene family arrays eg histone genes, rRNA genes (250 copies), tRNA genes Without known function Short interspersed elements (SINEs) eg Alu 200-300 bp long, 100,000s of copies, 13% of genome Long interspersed elements (LINEs) 1-5 kb long, 10-10000 copies per genome, 21% of genome Pseudogenes
Highly repetitive DNA About 15% of genome Minisatellites (variable number tandem repeats, VNTRs) Repeats of 14-500 bp segments scattered throughout genome, number of repeats varies on different chromosomes Microsatellites (short tandem repeat polymorphisms, STRPs) Units of 2-5 bp repeated many times, 10-30 copies Hundreds of kb long Eg heterochromatin Telomeres 6 bp repeat 250 – 1000 repeats at the end of each chromosome
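Tandem repeats such as microsatellites can be located with a simple pattern search. The sketch below is illustrative only: the function name, thresholds, and example sequences are assumptions, and it reports only perfect repeats on one strand. The second example uses TTAGGG, the human telomere repeat unit.

```python
import re

def find_tandem_repeats(dna, min_unit=2, max_unit=5, min_copies=3):
    """Return (start, repeat_unit, copies) for each run of a short unit repeated in tandem."""
    # ([ACGT]{m,n}?) captures the shortest candidate unit;
    # \1{k,} then demands further back-to-back copies of that same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(dna.upper())
    ]

# A hypothetical CA microsatellite embedded in flanking sequence
print(find_tandem_repeats("GGCACACACACATT"))
# A 6 bp telomere-style repeat, searched with the unit length fixed at 6
print(find_tandem_repeats("TTAGGG" * 4, min_unit=6, max_unit=6))
```

Because the number of copies varies between individuals, the copy count returned for a given locus is the quantity of interest for these markers.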
The race to sequence the human genome 3 billion bases in the human genome In 22 pairs of autosomes + 2 sex chromosomes Only about 30,000 genes (the estimate at the time; later estimates are closer to 20,000 protein-coding genes)
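To get a feel for the scale, the number of Sanger reads needed can be estimated from the genome size and the read length. This is back-of-the-envelope arithmetic only: the 700 bp read length is taken from the middle of the 600-800 bp range on the electropherogram slide, and the 8-fold coverage is an arbitrary illustrative value, not a figure from either project.

```python
import math

def reads_needed(genome_size, read_length, coverage):
    """Number of reads so that total bases read = coverage x genome size."""
    return math.ceil(coverage * genome_size / read_length)

GENOME_SIZE = 3_000_000_000  # about 3 billion bases
READ_LENGTH = 700            # assumed average usable Sanger read, in bp

print(reads_needed(GENOME_SIZE, READ_LENGTH, coverage=1))
print(reads_needed(GENOME_SIZE, READ_LENGTH, coverage=8))  # hypothetical 8-fold coverage
```

Even reading every base just once takes more than four million reads, which is why automation and a sequencing strategy mattered so much.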
2 competing approaches Hierarchical method Adopted by the publicly funded Human Genome Project Sequence of 12 individuals Whole genome shotgun (WGS) method Adopted by Celera, a for-profit company Sequence of 1 individual
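The core idea behind shotgun sequencing, rebuilding a long sequence from overlapping reads, can be sketched with a toy greedy assembler. This is not the assembler used by Celera or the Human Genome Project: it assumes error-free reads from one strand and no repeated sequence, and the function names and reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap; return the resulting contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # drop reads that lie entirely inside another read
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: the remaining reads are separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Three overlapping reads from a hypothetical 16-base sequence, in shuffled order
print(greedy_assemble(["GCTTCGA", "ATGGCCTA", "CTAAGCT"]))
```

With repetitive DNA the same overlap can occur in many places, so a greedy merge like this can join the wrong reads. That is one reason the choice between the hierarchical and whole genome shotgun strategies was debated.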
Craig Venter Founder of Celera Applied whole genome shotgun sequencing method to human genome Made the first synthetic chromosome
Assignment for next week You will be working in 4 groups of 4-5. Explain to the class (10-15 min): Groups 1 and 2 – explain the following terms related to genome sequencing: Group 1: mapping, STSs and ESTs, coverage, contigs, golden tiling path; Group 2: library, BACs, finishing, annotation Group 3 – explain the hierarchical approach Group 4 – explain the whole genome shotgun approach