1
Genome Annotation and the landscape of the Human Genome
Gabor T. Marth
Department of Biology, Boston College
marth@bc.edu
BI420 – Introduction to Bioinformatics
2
Genome annotation – Goals: protein coding genes, RNA genes, repetitive elements, GC content
3
The starting material AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT AGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGT GCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGT AGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTGCTTGAG TCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTG GGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCT CGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTAT ATCTCTTTCTCTGTCGTGCTGCTTGAGATCGTTCGTTTTTTTATGCT GATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCT AGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGA AGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCT
4
Coding genes – ab initio predictions
ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA
Figure labels: Start codon, Open Reading Frame = ORF, Stop codon, PolyA signal
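The start codon / ORF / stop codon idea on this slide can be sketched in a few lines of Python. This is only a minimal illustration, assuming the standard start codon ATG and stop codons TAA, TAG and TGA, and scanning the forward strand only. The function name `find_orfs` and the `min_codons` parameter are made up for this example and are not taken from any of the gene finders named in the lecture.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs.

    start is the index of the A in ATG; end is the index just past
    the stop codon, so seq[start:end] is the whole ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs

slide_seq = "ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA"
for start, end in find_orfs(slide_seq):
    print(start, end, slide_seq[start:end])
```

Real ab initio gene finders score candidate ORFs with statistical models of coding sequence and also have to handle introns, so this scan is only the first intuition.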
5
Ab initio predictions – gene structure
6
Ab initio predictions
…AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG…
Figure labels: splice donor site, splice acceptor site
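As a rough illustration of how splice sites can be looked for, the sketch below lists every position where the canonical intron boundaries occur, assuming the common GT...AG rule (introns usually begin with GT at the donor site and end with AG at the acceptor site). The function name `candidate_splice_sites` is hypothetical. Most such dinucleotides are not real splice sites, which is why actual gene finders score the surrounding sequence context as well.

```python
def candidate_splice_sites(seq):
    """List candidate donor (GT) and acceptor (AG) positions.

    Returns two lists of 0-based indices pointing at the first base
    of each dinucleotide.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors

# Fragment shown on the slide (without the leading/trailing ellipses).
donors, acceptors = candidate_splice_sites("AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG")
print("candidate donors:", donors)
print("candidate acceptors:", acceptors)
```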
7
Ab initio predictions
Genscan, Grail, Genie, GeneFinder, Glimmer, etc…
EST_genome, Sim4, Spidey
8
Homology-based predictions
ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA
ACGGAAGTCT: known coding sequence from another organism
GGACTATAAA: expressed sequence
genes predicted by homology: Genomescan, Twinscan, etc…
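The idea of locating a gene by similarity to a known sequence can be sketched as an ungapped search for the best-matching position. This is a deliberately simplified stand-in: it counts mismatches only, with no gaps and no scoring matrix, whereas real homology-based tools use proper alignment methods. The function name `best_ungapped_match` is hypothetical.

```python
def best_ungapped_match(genome, query):
    """Return (position, mismatches) of the best ungapped placement.

    Slides the query along the genome and counts mismatching bases at
    each offset; ties are resolved in favour of the leftmost offset.
    Returns (-1, len(query) + 1) if the query is longer than the genome.
    """
    genome, query = genome.upper(), query.upper()
    best_pos, best_mismatches = -1, len(query) + 1
    for i in range(len(genome) - len(query) + 1):
        window = genome[i:i + len(query)]
        mismatches = sum(a != b for a, b in zip(window, query))
        if mismatches < best_mismatches:
            best_pos, best_mismatches = i, mismatches
    return best_pos, best_mismatches

genomic = "ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA"
# The expressed sequence from the slide occurs exactly in the genomic sequence.
print(best_ungapped_match(genomic, "GGACTATAAA"))
# The sequence from another organism is similar but not identical.
print(best_ungapped_match(genomic, "ACGGAAGTCT"))
```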
9
Consolidation – gene prediction systems
Figure labels: Otto, Ensembl, FgenesH, Genscan, Grail, Genewise, Sim4, dbEst
10
ncRNA genes
Prediction based on structure (e.g. tRNAs).
For other novel ncRNAs, only homology-based predictions have been successful.
11
Repeat annotations
Repeat annotations are based on sequence similarity to known repetitive elements in a repeat sequence library.
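A toy version of library-based repeat annotation might look like the following. It assumes exact string matches against a small dictionary of repeat elements, which is a simplification: real annotation relies on sequence similarity rather than identity. The function name `annotate_repeats` and the one-entry library are hypothetical and exist only for this example.

```python
def annotate_repeats(seq, library):
    """Return sorted (start, end, name) hits of library elements in seq.

    library maps a repeat name to its sequence. Only exact matches are
    reported; overlapping occurrences are all listed.
    """
    seq = seq.upper()
    hits = []
    for name, element in library.items():
        element = element.upper()
        start = seq.find(element)
        while start != -1:
            hits.append((start, start + len(element), name))
            start = seq.find(element, start + 1)
    return sorted(hits)

toy_library = {"toy_repeat": "TTAGGG"}  # hypothetical library entry
for hit in annotate_repeats("ACGTTTAGGGTTAGGGACGT", toy_library):
    print(hit)
```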
12
The landscape of the human genome
13
Gene annotations – # of coding genes
14
Gene annotations – gene length
15
Gene annotations – gene function
16
GC content and coding potential
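For reference, GC content is simply the fraction of bases that are G or C. A minimal sketch of how it could be computed along a sequence in fixed windows is given below; the function names and the window and step sizes are arbitrary choices for illustration.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def windowed_gc(seq, window, step):
    """Return (window_start, gc_fraction) for each full window along seq."""
    return [
        (i, gc_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]

for start, gc in windowed_gc("GGCCAATTGCGCATAT", window=4, step=4):
    print(start, gc)
```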
17
ncRNAs
18
Segmental duplication
19
Repeat elements
20
Genes and repeats
21
Physical vs. genetic map (MB/cM)
22
Synteny (human-mouse)
23
Gene duplication – paralogs
24
Gene classes across organisms
25
Gene conservation across organisms
26
Human SNPs – polymorphism rate: in different regions of given lengths, and at the scale of the chromosomes
27
Human SNPs – polymorphism rate
G+C nucleotide content, CpG di-nucleotide content, recombination rate, functional constraints
Region            Polymorphism rate
3’ UTR            5.00 x 10^-4
5’ UTR            4.95 x 10^-4
Exon, overall     4.20 x 10^-4
Exon, coding      3.77 x 10^-4
synonymous        366 / 653
non-synonymous    287 / 653
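The numbers on this slide can be turned into proportions with a few lines of arithmetic. The sketch below uses only the values shown on the slide; the variable and function names are made up for the example.

```python
def fraction(count, total):
    """Return count / total as a float."""
    return count / total

# Counts as given on the slide.
synonymous, non_synonymous, total = 366, 287, 653

frac_syn = fraction(synonymous, total)
frac_nonsyn = fraction(non_synonymous, total)

# Polymorphism rates as given on the slide.
rate_utr3 = 5.00e-4
rate_coding_exon = 3.77e-4
utr3_vs_coding = rate_utr3 / rate_coding_exon

print(f"synonymous fraction:     {frac_syn:.3f}")
print(f"non-synonymous fraction: {frac_nonsyn:.3f}")
print(f"3' UTR rate / coding exon rate: {utr3_vs_coding:.2f}")
```

So roughly 56% of the 653 SNPs counted are synonymous and 44% are non-synonymous, and the 3’ UTR rate is about 1.3 times the coding exon rate.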
28
Human SNPs – polymorphism rate