The Integrated Microbial Genomes (IMG) systems. Nikos Kyrpides, Genome Biology Program (GBP), DOE Joint Genome Institute
IMG Systems Data Types: Genes, Genomes, Functions, Metadata, Clusters, SNPs, Proteomics, Regulons, Transcriptomes
Gene/Genome context analysis tools. Gene Context Tools: Gene Fusion, Gene neighborhood, Gene co-occurrence. Genome Synteny Tools: VISTA (predetermined genome set), DotPlot (two user-specified genomes), ACT (multiple user-specified genomes).
Gene Fusions
Conserved chromosomal cassettes. Genes are replaced by protein families (COGs, Pfams, IMG ortholog families); one gene may belong to multiple families. [Figure: cassette diagram with labels A to H and I to XI] A conserved chromosomal cassette consists of (1) the cassettes that share at least two protein families and (2) the protein families those cassettes have in common. The definition does not take into account the order of the protein families on the cassette. Mavromatis et al. (2009) PLoS ONE
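The definition above can be illustrated with a minimal sketch. It is not the IMG implementation: the function name `conserved_cassettes`, the cassette names, and the family identifiers are hypothetical, and the sketch only compares cassettes pairwise. Each cassette is stored as a set of protein families, so gene order is ignored, as the definition requires.

```python
from itertools import combinations


def conserved_cassettes(cassettes, min_shared=2):
    """Return cassette pairs that share at least `min_shared` protein families.

    `cassettes` maps a cassette name to the set of protein families found
    on it (each gene already replaced by its families). Sets are used
    because the order of families on the cassette is not considered.
    """
    hits = []
    for name_a, name_b in combinations(sorted(cassettes), 2):
        shared = cassettes[name_a] & cassettes[name_b]
        if len(shared) >= min_shared:
            hits.append((name_a, name_b, shared))
    return hits


# Hypothetical input data, for illustration only.
example = {
    "genome1_cassette": {"COG0001", "COG0002", "pfam00010"},
    "genome2_cassette": {"pfam00010", "COG0002", "COG0999"},
    "genome3_cassette": {"COG0001", "COG0777"},
}
result = conserved_cassettes(example)
```

Here only the first two cassettes qualify: they have two families in common, while the third shares a single family with the first and none with the second.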
Missing function: context-based analysis. Missing function from the fatty acid biosynthesis pathway: no known gene for this function has a homolog in Streptococci.
Genome Synteny tools
IMG Function Curation (a) public and automatic: 1. Protein Product 2. Protein Family
IMG Function Curation (b) manual: 3. IMG Term 4. MyIMG
IMG Function Curation, automatic and manual: 1. Protein Product 2. Protein Family 3. IMG Term 4. MyIMG
Who is there?
Finding organisms
What is the role of the organism in the community?
What is the metabolic potential of the community? Function Abundance
Relative abundance of functions. Factors that can distort it: cloning bias, PCR bias, assembly coverage, misassemblies, erroneous gene prediction.
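A minimal sketch of how a relative function abundance could be computed, and how assembly coverage changes the picture. This is an illustration under stated assumptions, not the IMG method: the function name `relative_abundance`, the gene and function identifiers, and the idea of weighting each gene by the read depth of its contig are all assumptions made for the example.

```python
from collections import Counter


def relative_abundance(gene_functions, gene_depth=None):
    """Return the fraction of (optionally depth-weighted) genes per function.

    `gene_functions` maps a gene id to a function id (for example a COG).
    `gene_depth` optionally maps a gene id to the read depth of its contig;
    genes without an entry get weight 1.0. Without `gene_depth`, every gene
    counts once, which undercounts functions on deeply covered contigs.
    """
    totals = Counter()
    for gene, function in gene_functions.items():
        weight = 1.0 if gene_depth is None else gene_depth.get(gene, 1.0)
        totals[function] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {function: value / grand_total for function, value in totals.items()}


# Hypothetical data, for illustration only.
functions = {"g1": "COG_A", "g2": "COG_A", "g3": "COG_B", "g4": "COG_C"}
depth = {"g1": 1.0, "g2": 1.0, "g3": 6.0, "g4": 2.0}

unweighted = relative_abundance(functions)
weighted = relative_abundance(functions, depth)
```

With raw gene counts, COG_A looks like the most abundant function; once the hypothetical contig depths are taken into account, COG_B dominates. The other factors on the slide (cloning bias, PCR bias, misassemblies, erroneous gene prediction) are not addressed by this weighting.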
IMG curation
Curation check
Gene annotation curation. Allows overview comparisons between cluster (family) and gene annotations to identify over- and underclassified genes.
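The comparison described above could be sketched as follows. The slide does not define the terms, so this sketch makes a loud assumption: a gene is treated as underclassified when its product name is generic while its family annotation is specific, and as overclassified in the opposite case. The list of generic names and the function names `is_generic` and `classify` are hypothetical and not part of IMG.

```python
# Assumed list of uninformative product names; adjust for real data.
GENERIC_NAMES = {
    "hypothetical protein",
    "conserved hypothetical protein",
    "unknown function",
}


def is_generic(annotation):
    """True if the annotation carries no specific functional information."""
    return annotation.strip().lower() in GENERIC_NAMES


def classify(gene_product, family_annotation):
    """Compare a gene's product name with the annotation of its family.

    Only the generic versus specific status is compared; two different
    specific names are reported as "consistent" by this simple sketch.
    """
    gene_generic = is_generic(gene_product)
    family_generic = is_generic(family_annotation)
    if gene_generic and not family_generic:
        return "underclassified"
    if family_generic and not gene_generic:
        return "overclassified"
    return "consistent"


# Hypothetical examples, for illustration only.
status_a = classify("hypothetical protein", "acyl carrier protein")
status_b = classify("acyl carrier protein", "Hypothetical protein")
status_c = classify("acyl carrier protein", "acyl carrier protein")
```

A real curation check would also need to compare specific names against each other, which this sketch deliberately leaves out.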
Gene page Main gene detail page