1
Reminder: Class on Friday, Discussion of Li et al. Proposal/Projects CAMERA feedback?
2
Eukaryotes
  Large
  Have organelles
  Diploid (mostly)
  Linear chromosomes
  Lower % coding
  Genes have introns
3
Genomes: How Big?
  Organism          Genome Size   # of Genes
  H. influenzae     1.8 Mb        1,700
  E. coli           4.7 Mb        4,400
  Yeast             12 Mb         6,300
  Diatom (Thaps)    34 Mb         11,000
  Fruit Fly         180 Mb        13,600
  Fugu              400 Mb        30,000
  Human             3000 Mb       30,000
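The table's point is easier to see as gene density. A minimal sketch, using only the sizes and gene counts listed on this slide (the function names are illustrative, not from the lecture):

```python
# Genome size (Mb) and approximate gene count, copied from the slide's table.
GENOMES = {
    "H. influenzae": (1.8, 1700),
    "E. coli": (4.7, 4400),
    "Yeast": (12, 6300),
    "Diatom (Thaps)": (34, 11000),
    "Fruit Fly": (180, 13600),
    "Fugu": (400, 30000),
    "Human": (3000, 30000),
}


def genes_per_mb(size_mb, n_genes):
    """Gene density: number of genes per megabase of genome."""
    return n_genes / size_mb


def density_table(genomes=GENOMES):
    """Map each organism to its gene density (genes per Mb)."""
    return {name: genes_per_mb(size, genes) for name, (size, genes) in genomes.items()}


if __name__ == "__main__":
    for name, density in density_table().items():
        print(f"{name:<16}{density:8.1f} genes/Mb")
```

With these numbers the prokaryotes sit near 900 to 950 genes per Mb, while human comes out at 10 genes per Mb.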
4
http://www.genomesize.com/ Gregory, 2004 Paleobiology 30:179-202 1pg ~= 1 billion base pairs (1000 Mbp).
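The rule of thumb on this slide can be written as a small conversion helper. This is a sketch that assumes the slide's rounded factor of 1000 Mbp per pg; the factor is left as a parameter so a more precise value can be substituted, and the function names are illustrative.

```python
# Slide's rule of thumb: 1 pg of DNA is roughly 1000 Mbp (an approximation).
MBP_PER_PG = 1000.0


def c_value_to_mbp(picograms, mbp_per_pg=MBP_PER_PG):
    """Convert a DNA mass in picograms to an approximate size in Mbp."""
    return picograms * mbp_per_pg


def mbp_to_c_value(mbp, mbp_per_pg=MBP_PER_PG):
    """Convert a genome size in Mbp to an approximate DNA mass in picograms."""
    return mbp / mbp_per_pg
```

For example, `mbp_to_c_value(3000)` gives about 3 pg for a 3000 Mb genome.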
5
Eukaryotic genomes are big
What does this mean for sequencing?
  Strategies are similar
    Low coverage of large insert library (BACs, fosmids)
    Higher coverage of small insert library
  Finishing is harder
    Often additional mapping tools (RE maps, optical maps) are employed to map scaffolds to chromosomes
  Genomes released in “versions” (Thaps 3.0)
  Publications often based on draft versions
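To see how genome size drives sequencing effort, fold coverage can be estimated as (number of reads × read length) / genome size. The sketch below adds the Lander-Waterman (Poisson) estimate of the uncovered fraction, which is not on the slide; the 700 bp read length in the usage note is an assumed, Sanger-era value, and the function names are illustrative.

```python
import math


def fold_coverage(n_reads, read_len_bp, genome_size_bp):
    """Average number of times each base is sequenced."""
    return n_reads * read_len_bp / genome_size_bp


def reads_needed(target_coverage, read_len_bp, genome_size_bp):
    """Smallest number of reads that reaches the target fold coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_len_bp)


def fraction_uncovered(coverage):
    """Poisson (Lander-Waterman) estimate of the fraction of bases with no read."""
    return math.exp(-coverage)
```

Under the assumed 700 bp reads, `reads_needed(8, 700, 34_000_000)` is about 390,000 reads for the 34 Mb diatom genome, and the same 8x target for a 3000 Mb genome needs roughly 88 times as many.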
6
Where are draft versions in GenBank?
  Model organisms have their own web sites
    YeastDB
    WormDB
    FlyBase
7
Eukaryotic genomes are diploid
What does this mean for sequencing?
  Finishing is harder
    Will never get a 100% consensus
    Instead identify “high quality discrepancies”
  What is the sequence in the released genome?
  How to find where the SNPs are?
  T. pseudonana: 0.75% of the nuclear genome is polymorphic
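One way to picture a “high quality discrepancy” is a position in the assembly where two different bases are each supported by several high-quality reads, as expected at a heterozygous site. The sketch below is only an illustration: the Phred threshold of 30, the minimum of two supporting reads, and the function name are assumptions, not the criteria used by any particular genome project.

```python
from collections import Counter


def high_quality_discrepancy(column, min_qual=30, min_support=2):
    """Return the competing bases at one aligned position, or [] if none.

    column: list of (base, phred_quality) pairs, one per read covering the site.
    min_qual and min_support are hypothetical thresholds for illustration.
    """
    # Count only base calls that meet the quality threshold.
    counts = Counter(base for base, qual in column if qual >= min_qual)
    # Keep bases backed by enough high-quality reads.
    supported = sorted(base for base, n in counts.items() if n >= min_support)
    # A discrepancy needs at least two well-supported alternatives.
    return supported if len(supported) >= 2 else []
```

A column such as `[("A", 40), ("A", 35), ("G", 38), ("G", 33), ("T", 10)]` would be reported as `["A", "G"]`, while a single low-quality mismatch is ignored as probable sequencing error.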
8
Eukaryotic genomes are arranged in linear chromosomes
  Finishing is harder
    Need to use additional maps to decide if contigs should be joined or belong on their own chromosomes
  Additional mechanisms of gene duplication available/common
9
Eukaryotic genomes have low % coding
  Finishing is harder
    Much of the non-coding DNA is made up of “selfish DNA”
    Repeats make assembly problematic
  Thaps: 2% of genome is retrotransposons
  Mammalian cells: less than 1% of genomic DNA is coding
10
Eukaryotic gene structure
11
Gene finding in eukaryotic genomes
  Relies on both signal sequences and coding statistics
  Signals: promoters, start and stop codons, splice sites, poly-A sites
    These are all relatively weak signals
    Need to combine with codon statistics
  Organism-specific training set is crucial
    Generated from cDNA library sequenced in conjunction with genome project
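The idea of combining signals with codon statistics can be sketched in a few lines. This toy example is not how any real gene finder is implemented: it ignores introns, splice sites and promoters entirely, checks only start and stop codons, and scores coding potential as a log-odds of codon usage (trained on cDNA sequences) against a uniform background. All names and the pseudocount are assumptions for illustration.

```python
import math
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}
ALL_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def codons(seq):
    """Split a sequence into in-frame codons, dropping any trailing 1-2 bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def train_codon_model(cdna_seqs, pseudocount=1.0):
    """Estimate codon frequencies from an organism-specific training set."""
    counts = Counter()
    for seq in cdna_seqs:
        counts.update(codons(seq.upper()))
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def coding_score(seq, model):
    """Sum of per-codon log-odds: trained model versus uniform 1/64 background."""
    return sum(math.log(model[c] * 64) for c in codons(seq.upper()) if c in model)


def has_orf_signals(seq):
    """Weak signal check: starts with ATG, ends with a stop, no internal stop."""
    cs = codons(seq.upper())
    if len(cs) < 2:
        return False
    internal_stop = any(c in STOP_CODONS for c in cs[1:-1])
    return cs[0] == "ATG" and cs[-1] in STOP_CODONS and not internal_stop
```

A candidate would be kept only if `has_orf_signals` is true and `coding_score` is high; because the model is learned from one organism's cDNAs, the same sequence can score very differently under a model trained on another organism.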
12
Implications for environmental genomics
  Need even more sequencing to get adequate coverage
  For any given piece of DNA, likely to have fewer genes than if it were prokaryotic in origin
  Current state of gene finding and available genomes for comparison mean gene finders likely have very poor performance on DNA of unknown origin
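The "fewer genes per piece of DNA" point follows directly from the genome sizes and gene counts given earlier in the deck. A rough sketch, assuming genes are spread evenly along the genome (a simplification) and a 40 kb cloned fragment (an assumed, fosmid-sized insert):

```python
def expected_genes(fragment_kb, genome_mb, n_genes):
    """Expected gene count on a fragment, assuming uniform gene density."""
    return fragment_kb / 1000.0 * n_genes / genome_mb


if __name__ == "__main__":
    # Genome sizes and gene counts are taken from the "How Big?" slide.
    for name, size_mb, genes in [
        ("E. coli", 4.7, 4400),
        ("Diatom (Thaps)", 34, 11000),
        ("Human", 3000, 30000),
    ]:
        print(f"{name:<16}{expected_genes(40, size_mb, genes):6.1f} genes per 40 kb")
```

Under these assumptions a 40 kb fragment carries dozens of genes if it comes from E. coli but, on average, less than one gene if it comes from a human-sized genome.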