1
Human Genome Sequence and Variability
Gabor T. Marth, D.Sc.
Department of Biology, Boston College
marth@bc.edu
Medical Genomics Course – Debrecen, Hungary, May 2006
2
Lecture overview
1. Genome sequencing strategies, sequencing informatics
2. Genome annotation, functional and structural features in the human genome
3. Genome variability, DNA nucleotide, structural, and epigenetic variations
3
1. The Human genome sequence
4
The nuclear genome (chromosomes)
5
The genome sequence
The primary template on which to outline functional features of our genetic code (genes, regulatory elements, secondary structure, tertiary structure, etc.)
6
Completed genomes
Genome sizes shown: ~1 Mb, ~100 Mb, >100 Mb, ~3,000 Mb
7
Main genome sequencing strategies
Clone-based shotgun sequencing: Human Genome Project
Whole-genome shotgun sequencing: Celera Genomics, Inc.
8
Hierarchical genome sequencing
BAC library construction
clone mapping
shotgun subclone library construction
sequencing
sequence reconstruction (sequence assembly)
Lander et al. Nature 2001
9
Clone mapping – “sequence ready” map
10
Hierarchical genome sequencing
BAC library construction
clone mapping
shotgun subclone library construction
sequencing/read processing
sequence reconstruction (sequence assembly)
Lander et al. Nature 2001
11
Shotgun subclone library construction
Figure labels: BAC primary clone, cloning vector, sequencing vector, subclone insert
12
Hierarchical genome sequencing
BAC library construction
clone mapping
shotgun subclone library construction
sequencing/read processing
sequence reconstruction (sequence assembly)
Lander et al. Nature 2001
13
Sequencing
14
Robotic automation Lander et al. Nature 2001
15
Base calling
PHRED
base = A, Q = 40
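The quality value on this slide can be read through the standard Phred definition, Q = -10 · log10(P), where P is the estimated probability that the base call is wrong. The sketch below is only an illustration of that formula; the function names are made up for this example and are not part of PHRED itself.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score Q into an error probability P."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P (0 < P <= 1) into a Phred score Q."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # The slide's example call: base = A, Q = 40.
    p = phred_to_error_prob(40)
    print(f"Q = 40 corresponds to an error probability of about {p:.4f}")
```

Under this definition, Q = 40 means roughly a 1 in 10,000 chance that the called base is incorrect.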
16
Vector clipping
17
Hierarchical genome sequencing
BAC library construction
clone mapping
shotgun subclone library construction
sequencing/read processing
sequence reconstruction (sequence assembly)
Lander et al. Nature 2001
18
Sequence assembly PHRAP
19
Repetitive DNA may confuse assembly
20
Sequence completion (finishing)
CONSED, AUTOFINISH
Figure labels: gap; region of low sequence coverage and/or quality
21
2. Human genome annotation
22
Genome annotation – Goals
protein coding genes
RNA genes
repetitive elements
GC content
23
The starting material AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT AGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGT GCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGT AGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTGCTTGAG TCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTG GGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCT CGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTAT ATCTCTTTCTCTGTCGTGCTGCTTGAGATCGTTCGTTTTTTTATGCT GATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCT AGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGA AGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCT
24
Coding genes – ab initio predictions
ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA
Figure labels: Open Reading Frame = ORF, Stop codon, Start codon, PolyA signal
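The start codon, stop codon, and ORF labels on this slide can be illustrated with a very small ORF scan. This is a minimal sketch, assuming the standard ATG start codon and the TAA/TAG/TGA stop codons, and looking at the forward strand only. The name `find_orfs` and the `min_codons` parameter are hypothetical, and real gene finders such as Genscan use probabilistic models rather than a plain scan like this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Example sequence copied from the slide.
SLIDE_SEQ = "ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA"


def find_orfs(seq, min_codons=2):
    """Return (start, end) pairs of forward-strand ORFs, end exclusive.

    An ORF here runs from the first ATG seen in a reading frame up to
    and including the next in-frame stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at this start codon
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and keep scanning
    return sorted(orfs)


if __name__ == "__main__":
    for start, end in find_orfs(SLIDE_SEQ):
        print(start, end, SLIDE_SEQ[start:end])
    # prints: 0 27 ATGGCACCACCGATGTCTACGTGGTAG
```

On the slide's sequence the scan finds a single ORF of nine codons, from the ATG at the first base to the TAG stop codon, followed by the A-rich tail.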
25
Ab initio predictions Gene structure
26
Ab initio predictions
…AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG…
Figure labels: splice donor site, splice acceptor site
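Most introns follow the canonical GT-AG rule: they begin with GT at the splice donor site and end with AG at the splice acceptor site. The sketch below simply lists candidate sites in the slide's example fragment. The helper names and the `min_length` parameter are hypothetical, and this dinucleotide scan is far cruder than the signal models used by actual ab initio predictors.

```python
# Example fragment copied from the slide (the ellipses are omitted).
SLIDE_SEQ = "AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG"


def dinucleotide_positions(seq, dinucleotide):
    """Return the 0-based start positions of a dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def candidate_introns(seq, min_length=4):
    """Pair each GT (donor) with every downstream AG (acceptor).

    Returns (start, end) pairs, end exclusive, so that seq[start:end]
    begins with GT and ends with AG.
    """
    donors = dinucleotide_positions(seq, "GT")
    acceptors = dinucleotide_positions(seq, "AG")
    return [(d, a + 2) for d in donors for a in acceptors
            if a + 2 - d >= min_length]


if __name__ == "__main__":
    for start, end in candidate_introns(SLIDE_SEQ):
        print(start, end, SLIDE_SEQ[start:end])
    # prints: 12 28 GTACCTTCCAACGAAG
```

In this fragment there is one GT and one downstream AG, so the scan proposes a single candidate intron. On real genomic sequence such a scan would return a very large number of false candidates, which is why gene finders combine splice signals with coding potential.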
27
Ab initio predictions
Genscan, Grail, Genie, GeneFinder, Glimmer, etc.
EST_genome, Sim4, Spidey, EXALIN
28
Homology based predictions
ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA
ACGGAAGTCT: known coding sequence from another organism
GGACTATAAA: expressed sequence
genes predicted by homology
Genomescan, Twinscan, etc.
29
Consolidation – gene prediction systems Otto Ensembl FgenesH Genscan Grail Genewise Sim4 dbEst
30
ncRNA genes
Prediction based on structure (e.g. tRNAs)
For other novel ncRNAs, only homology-based predictions have been successful
31
Repeat annotations
Repeat annotations are based on sequence similarity to known repetitive elements in a repeat sequence library
32
The landscape of the human genome
33
Gene annotations – # of coding genes Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001
34
Gene annotations – gene length Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001
35
Gene annotations – gene function Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001
36
GC content and coding potential Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001
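GC content is the fraction of bases in a sequence that are G or C, and it is usually reported over windows along a chromosome. The following is a minimal sketch of that calculation; the function names, window size, and step are assumptions chosen for illustration and do not reproduce the analysis in Lander et al.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """Return (start, gc_fraction) for each full window along seq."""
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    example = "GGCCAATT"  # toy sequence, not taken from the slides
    print(gc_content(example))         # 0.5
    print(windowed_gc(example, 4, 4))  # [(0, 1.0), (4, 0.0)]
```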
37
ncRNAs Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001
38
Segmental duplications Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001
39
Repeat elements Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001
40
Genes and repeats
41
Physical vs. genetic map (Mb/cM)
Genetic distances: 0.4 cM, 1.3 cM, 0.7 cM
Physical distances: 0.4 Mb, 0.7 Mb, 0.3 Mb
42
3. Human genome variability
43
DNA sequence variations
The reference human genome sequence is about 99.9% identical to the genome of any individual human
Sequence variations make our genetic makeup unique
The most abundant human variations are single-nucleotide polymorphisms (SNPs); about 10 million SNPs are currently known
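As a rough back-of-the-envelope check, the 99.9% figure can be combined with the genome size of about 3,000 Mb quoted earlier in the lecture. The sketch below assumes both numbers are approximate and that differences are counted per base; the variable names are illustrative only.

```python
genome_size_bp = 3_000_000_000  # ~3,000 Mb, as quoted for the human genome
identity = 0.999                # fraction shared with the reference sequence

# Expected number of positions at which an individual differs from the reference.
differing_bp = round(genome_size_bp * (1 - identity))

if __name__ == "__main__":
    print(f"about {differing_bp:,} differing bases")
    # prints: about 3,000,000 differing bases
```

So a 0.1% difference still corresponds to a few million variant positions per individual genome.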
44
DNA sequence variations
Insertion-deletion (INDEL) polymorphisms
45
Structural variations Speicher & Carter, NRG 2005
46
Structural variations Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767
47
Detection of structural variants Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767
48
Epigenetic changes: chromatin structure Sproul, NRG 2005
49
Epigenetic changes: DNA methylation Laird, NRC 2003