Published by Julia Peters. Modified over 9 years ago.
1
Genomics of Microbial Eukaryotes Igor Grigoriev, Fungal Genomics Program Head US DOE Joint Genome Institute, Walnut Creek, CA
2
2 Large and Complex Eukaryotes
3
3 Outline Eukaryotic Genome Annotation Fungal Genomics Program MycoCosm
4
4 Started with Human Genome Project
5
5 genome.jgi.doe.gov IMG MycoCosm 150+ annotated eukaryotic genomes
6
6 Annotation Pipeline
Input: genomic assembly and ESTs
Annotation pipeline: gene predictions, protein annotations, reference data mapping, repeat masking, manual curation (optional)
Validations
Analysis: gene families, gene expression, phylogenomics, proteomics, protein targeting, etc.
7
7 Eukaryotic Gene Prediction
Gene model: promoter, 5'UTR, ATG, exons and introns (GT...AG), TGA, 3'UTR, polyA
Ab initio methods use knowledge of known genes' structures to predict start, stop, and splice sites in CDS only (Fgenesh, GeneMark). Train on known genes, then predict the model.
Protein-based methods build CDS exons around known protein alignments (Fgenesh+, GeneWise). Evidence: GenBank protein.
Transcript-based methods map or assemble transcripts on the genome, including UTRs (EST_map, Combest). Evidence: EST contig.
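The signals named on this slide (ATG start, stop codon, GT...AG introns) can be checked mechanically on a candidate gene model. The following is a minimal sketch, not any predictor's actual code: the function names, the 0-based end-exclusive coordinates, and the forward-strand-only restriction are all assumptions made for illustration.

```python
# Illustrative only: helper names and coordinate convention are assumptions.
# Coordinates are 0-based, end-exclusive, forward strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def introns_between(exons):
    """Return the (start, end) gaps between consecutive sorted CDS exons."""
    exons = sorted(exons)
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]

def check_gene_model(genome, exons):
    """List problems with a CDS model: start, stop, frame, and intron boundaries."""
    exons = sorted(exons)
    cds = "".join(genome[s:e] for s, e in exons)
    problems = []
    if not cds.startswith("ATG"):
        problems.append("no ATG start codon")
    if cds[-3:] not in STOP_CODONS:
        problems.append("no stop codon at end of CDS")
    if len(cds) % 3 != 0:
        problems.append("CDS length is not a multiple of 3")
    for start, end in introns_between(exons):
        intron = genome[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            problems.append(f"non-canonical intron at {start}-{end}")
    return problems

# Toy sequence: exon "ATGGCC", intron "GTAAAG", exon "TTTTGA", with flanking bases.
genome = "CC" + "ATGGCC" + "GTAAAG" + "TTTTGA" + "GG"
good_exons = [(2, 8), (14, 20)]
bad_exons = [(2, 9), (14, 20)]  # first exon runs one base into the intron

print(check_gene_model(genome, good_exons))
print(check_gene_model(genome, bad_exons))
```

The sketch does not look for in-frame stop codons or non-canonical splice sites, both of which a real pipeline would have to handle.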
8
8 More Gene Prediction
ESTEXT: use ESTs/cDNAs to extend, correct, or predict gene models (predicted model + ESTs -> extended model with 5'UTR and 3'UTR around ATG...TGA).
FGENESH2: detect orthologs with poor alignments and refine with synteny-based methods (Genome A vs. Genome B).
Representative set: a non-redundant gene set is built from "the best" models at each locus (FGENESH, GENEWISE, external models) according to homology and ESTs, followed by manual curation.
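Choosing "the best" model per locus can be pictured as a simple ranking step. The sketch below assumes each candidate model already carries a locus label plus homology and EST support scores between 0 and 1. The field names, the additive score, and the weights are hypothetical and are not taken from the JGI pipeline.

```python
from collections import defaultdict

def pick_representatives(models, w_homology=1.0, w_est=1.0):
    """Keep one model per locus: the one with the highest combined support score.

    The score (a weighted sum of homology and EST support) is an assumption
    made for illustration only.
    """
    by_locus = defaultdict(list)
    for model in models:
        by_locus[model["locus"]].append(model)

    def score(model):
        return w_homology * model["homology"] + w_est * model["est"]

    return {locus: max(candidates, key=score) for locus, candidates in by_locus.items()}

# Hypothetical candidate models from different predictors at two loci.
models = [
    {"locus": "L1", "source": "FGENESH", "homology": 0.60, "est": 0.90},
    {"locus": "L1", "source": "GENEWISE", "homology": 0.85, "est": 0.40},
    {"locus": "L2", "source": "FGENESH", "homology": 0.10, "est": 0.20},
    {"locus": "L2", "source": "ESTEXT", "homology": 0.10, "est": 0.95},
]

representatives = pick_representatives(models)
for locus, model in sorted(representatives.items()):
    print(locus, model["source"])
```

Changing the weights shifts the balance between homology and transcript evidence. The manual curation step mentioned on the slide happens after a selection like this one and is not modeled here.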
9
9 Combine Gene Predictors for Better Quality (Heterobasidion annosum v1.0)

                                                    Eugene    Genemark  Fgenesh   JGI Pipe
Number of gene models                               11,547    9,609     8,409     12,270
Models with partial EST support                     5,544     3,829     4,567     5,248
  with full-length EST support                      2,538     1,182     2,896     3,073
  EST coverage per gene                             77.7%     68.2%     80.8%     79.1%
  supported splice sites                            41,581    40,808    45,498    47,671
Models with homology support                        6,758     6,043     5,750     7,214
  with strong homology support (80+% id, 80+% cov.) 112       109       174       187
  model coverage                                    64%       60%       68%       69%
Models with homology and EST support                2,894     2,172     2,720     2,953
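Because the four predictors produce different numbers of models, it helps to read the support rows as fractions as well as counts. The sketch below uses the counts from this slide. The dictionary layout and the function name are illustrative choices.

```python
# Counts copied from the Heterobasidion annosum v1.0 comparison on this slide.
table = {
    "Eugene":   {"models": 11547, "partial_est": 5544, "full_est": 2538, "homology": 6758, "both": 2894},
    "Genemark": {"models": 9609,  "partial_est": 3829, "full_est": 1182, "homology": 6043, "both": 2172},
    "Fgenesh":  {"models": 8409,  "partial_est": 4567, "full_est": 2896, "homology": 5750, "both": 2720},
    "JGI Pipe": {"models": 12270, "partial_est": 5248, "full_est": 3073, "homology": 7214, "both": 2953},
}

def support_fractions(row):
    """Express each support count as a fraction of that predictor's gene models."""
    return {key: value / row["models"] for key, value in row.items() if key != "models"}

for name, row in table.items():
    fractions = support_fractions(row)
    print(f"{name:9s} models={row['models']:6d} "
          f"homology={fractions['homology']:.1%} both={fractions['both']:.1%}")

most_supported_by_count = max(table, key=lambda name: table[name]["both"])
most_supported_by_fraction = max(table, key=lambda name: support_fractions(table[name])["both"])
```

Counts and fractions can rank the predictors differently, so it is worth looking at both before judging a gene set.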
10
10 Re-annotation Using Comparative Genomics (Asaf Salamov)

                                   MAKER     JGI pipeline  Re-annot
# of predicted gene models         9,940     12,290        12,802
with Swissprot hits                6,521     7,356         7,900
with non-repeat PFAM domains       5,365     6,010         6,353
with EST support                   9,252     10,796        11,105
with >90% EST support              7,729     9,178         9,444
# of unique PFAM domains           2,207     2,245         2,322
# of EST-supported splice sites    99,627    102,200       104,246

EST coverage per gene: 93.0%, 93.3% (only two values appear in the transcript)
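The slide does not define "EST coverage per gene". One plausible reading, used here purely as an assumption, is the fraction of a gene model's exon bases overlapped by at least one EST alignment block. A minimal sketch under that assumption, with 0-based end-exclusive intervals and hypothetical function names:

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def est_coverage(exons, est_blocks):
    """Fraction of exon bases covered by at least one EST alignment block.

    This definition is an assumption; the slide does not say how the
    metric was computed.
    """
    exons = merge_intervals(exons)
    ests = merge_intervals(est_blocks)
    total = sum(end - start for start, end in exons)
    covered = 0
    for exon_start, exon_end in exons:
        for est_start, est_end in ests:
            covered += max(0, min(exon_end, est_end) - max(exon_start, est_start))
    return covered / total if total else 0.0

# Toy gene with two 100 bp exons; ESTs cover half of the first and all of the second.
exons = [(100, 200), (300, 400)]
est_blocks = [(150, 200), (300, 380), (370, 400)]
coverage = est_coverage(exons, est_blocks)
print(f"{coverage:.0%}")
```

Merging the EST blocks first keeps overlapping alignments from being counted twice.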
11
11 Protein Annotation
Predicted protein:
Signal peptide (signalP)
Domain (InterPro, tmhmm)
Possible paralogs (Blastp+MCL)
Possible orthologs (in nr, SwissProt, KEGG, KOG)
Higher-order assignments: Gene Ontology terms; EC numbers --> KEGG pathways; gene families, with and without other species
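The paralog step groups proteins by their pairwise Blastp hits. The sketch below is a deliberately simplified stand-in: it uses single-linkage grouping (connected components via union-find) rather than MCL, which additionally splits loosely connected groups. The function name, the tuple layout of the hits, and the E-value cutoff are assumptions.

```python
def cluster_by_hits(proteins, hits, max_evalue=1e-5):
    """Group proteins connected by significant pairwise hits (single linkage).

    Not MCL: any chain of significant hits joins proteins into one family.
    Every protein named in `hits` must also appear in `proteins`.
    """
    parent = {protein: protein for protein in proteins}

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue <= max_evalue:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    families = {}
    for protein in proteins:
        families.setdefault(find(protein), set()).add(protein)
    # Largest families first, ties broken alphabetically.
    return sorted((sorted(members) for members in families.values()),
                  key=lambda members: (-len(members), members))

# Hypothetical protein IDs and Blastp-style hits (query, subject, E-value).
proteins = ["p1", "p2", "p3", "p4", "p5"]
hits = [("p1", "p2", 1e-30), ("p2", "p3", 1e-12), ("p4", "p5", 0.5)]
families = cluster_by_hits(proteins, hits)
print(families)
```

Here p1 and p3 end up in the same family without a direct hit between them, which is exactly the chaining behavior that MCL is designed to temper.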
12
12 Validation with Transcriptomics
Transformation of EST sequencing: Sanger, 454, Illumina (figure values: 5531 34)
Old Sanger days: ESTs, EST profile
Processing RNA-Seq with CombEST: models
13
13 Validation with Proteomics Wright et al, BMC Genomics (2009)
14
14 Gene Cluster Analysis Comparative analysis
15
15 Genome Portal Framework
16
16 Many Genes of Eco-responsive Daphnia pulex
First crustacean and aquatic animal sequenced; new model organism
30,940 predicted D. pulex genes in ~200 Mb genome
85% supported by 1+ lines of evidence
Colbourne et al, Science, 2011
17
17 Half of Daphnia Genes: No Homologs, Expressed Under Environmental Stress
With Evgeny Zdobnov's group (Univ. Genève)
* Of 716 highly conserved single-copy orthologs, Daphnia is missing only two
Colbourne et al, 2011
18
18 Outline Eukaryotic Genome Annotation Fungal Genomics Program MycoCosm
19
19 Fungal Genomics for Energy & Environment
Grow: plant symbionts and pathogens
Degrade: lignocellulose degradation
Ferment: sugar fermentation
Bio-refinery
GOAL: Scale up sequencing and analysis of fungal diversity for DOE science and applications
20
20 GOLD (October 2011) 758 fungal projects
21
21 Genomic Encyclopedia of Fungi
Chapter 1, Plant health: symbiosis, plant pathogenicity, biocontrol
Chapter 2, Biorefinery: lignocellulose degradation, sugar fermentation, industrial organisms
Chapter 3, Diversity: phylogenetics, ecology
22
22 Genome-Centric View and Comparative View
http://jgi.doe.gov/fungi
100+ fungal genomes; 5000+ visitors/month
23
23 Comparative Genome Analysis
24
24 Strategy: 1000 Fungal Genomes Goal: Sequencing 1000 fungal genomes from across the Fungal Tree of Life will provide references for research on plant-microbe interactions and environmental metagenomics.
25
25 Strategy: Fungal Systems
From simple systems to complex environments:
Model fungi: S. commune, T. terrestris
Lichen: alga + fungus
ECM: plant + fungus
Forest soil metagenomes
26
26 Model Mushroom Development (Ohm et al, 2010)
SEQUENCE: WT S. commune
FUNCTION: gene knock-outs
MODEL: modeling regulatory cascades
27
27 Summary
Eukaryotic annotation recipe: combine gene predictors, experimental data, and community expertise.
Fungal genomics: we aim to scale up sequencing and comparative analysis of fungi relevant for energy and environment (jgi.doe.gov/fungi).
28
28 Enjoy Algae as well! http://genome.jgi.doe.gov/Algae
29
29 Acknowledgements JGI Staff Our Users
30
30 Outline Eukaryotic Genome Annotation Fungal Genomics Program MycoCosm