First release of HOGENOM, a database of homologous genes from complete genomes. Simon Penel, Laurent Duret, Pascal Calvat, Jean-François Dufayard, Guy Perrière, Manolo Gouy. Equipe Bioinformatique et Génomique Evolutive, Laboratoire de Biométrie et Biologie Evolutive, Université Claude Bernard - Lyon 1. POSTER JO 60
Homologous Genes Databases. Research fields: proteome/genome comparative analysis; phylogenetic studies; orthology/paralogy relationship assignments. Development of generic and specialised databases: HOVERGEN (families of homologous vertebrate genes); HOBACGEN (families of homologous bacterial genes); NureBase, RTKdb, Hoppsigen, Mitalib, Polymorphix, etc.
The HoGenom database: Homologous Genes Families from fully Sequenced Organisms. European project TEMBLOR. Contents: nucleic and protein sequences; sequence annotations; taxonomic data; protein multiple alignments; phylogenetic trees.
The HoGenom database: Building of Database. Data selection: protein sequences from the European Bioinformatics Institute (SwissProt, TrEMBL, TrEMBL-new) and proteome sets (Human, Mouse, Rat, etc.). Diagram labels: "1 sequence, many species" and "1 sequence, 1 species".
The HoGenom database: Building of Database. Similarity search: filtering of low-complexity regions (SEG); local pairwise alignments with BLASTP (BLOSUM62, E ≤ [threshold missing]); parallelised calculations at IN2P3.
The HoGenom database: Building of Database. Clustering into families. Criteria for a pair of sequences: HSP ≥ 80 % of the length, similarity ≥ 50 %. Step 1: clustering of complete sequences into families. Step 2: including partial sequences in the families defined previously. Diagram: the pairs A/B and A/C are merged into the cluster A, B, C, which forms a protein family.
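The clustering step can be illustrated with a minimal Python sketch. It assumes that pairs meeting both thresholds are linked transitively (single linkage), as the A/B and A/C diagram suggests. The function names, the tuple layout and the exact definition of "coverage" are assumptions made for illustration, not the pipeline actually used to build the database. Step 2 (partial sequences) is not covered.

```python
from collections import defaultdict

def passes_criteria(hsp_coverage, similarity,
                    min_coverage=0.80, min_similarity=0.50):
    """Return True when a pair meets both slide thresholds (assumed inclusive)."""
    return hsp_coverage >= min_coverage and similarity >= min_similarity

def cluster_families(sequences, pairs):
    """Group sequences into families by single linkage.

    pairs: iterable of (seq_a, seq_b, hsp_coverage, similarity), with
    coverage and similarity given as fractions (hypothetical layout).
    """
    parent = {s: s for s in sequences}

    def find(x):
        # Walk up to the representative, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, coverage, similarity in pairs:
        if passes_criteria(coverage, similarity):
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    families = defaultdict(set)
    for s in sequences:
        families[find(s)].add(s)
    return sorted(sorted(members) for members in families.values())

# Toy data: A/B and A/C pass both thresholds, C/D fails on coverage.
example_pairs = [
    ("A", "B", 0.90, 0.60),
    ("A", "C", 0.85, 0.55),
    ("C", "D", 0.50, 0.90),
]
example_families = cluster_families(["A", "B", "C", "D"], example_pairs)
print(example_families)
```

With this toy input, A, B and C end up in one family and D stays alone, because B and C are both linked to A even though they are never compared directly.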
The HoGenom database: Building of Database. Alignments and trees, for each protein family (sequences A to G). Multiple alignment: CLUSTAL W, default parameters. Phylogenetic tree: BIONJ (neighbor joining) on observed divergence; for partial sequences, distance matrix with missing values. Rooting: mid-point.
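The idea of a distance matrix with missing values can be sketched as follows. Observed divergence is taken here as the fraction of differing residues over the aligned positions where neither sequence has a gap. A pair of partial sequences that share no aligned position gets no distance (None). The function names, the gap symbol and the toy alignment are assumptions for illustration. The tree-building step itself is not shown.

```python
def observed_divergence(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over positions present in both sequences.

    Returns None when the two aligned sequences share no ungapped position,
    which is how a missing value arises for non-overlapping partial sequences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue  # skip positions missing in either sequence
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        return None
    return differing / compared

def distance_matrix(alignment):
    """Build a dict keyed by (name_a, name_b) from a name -> aligned sequence dict."""
    names = list(alignment)
    return {
        (a, b): observed_divergence(alignment[a], alignment[b])
        for a in names
        for b in names
    }

# Toy alignment: A and B overlap each other, C is a partial sequence
# covering a region where A and B have no residues.
toy_alignment = {
    "A": "MKVL--",
    "B": "MKIL--",
    "C": "----QR",
}
toy_matrix = distance_matrix(toy_alignment)
print(toy_matrix[("A", "B")], toy_matrix[("A", "C")])
```

In the toy alignment, A and B differ at one of four shared positions, while A and C share none, so that entry of the matrix stays empty.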
The HoGenom database: Contents. 117 organisms, including Arabidopsis thaliana (plant), Caenorhabditis elegans (nematode), Drosophila melanogaster (fly), Encephalitozoon cuniculi (microsporidia), Guillardia theta (alga), Homo sapiens (human), Mus musculus (mouse), Rattus norvegicus (rat), Saccharomyces cerevisiae (yeast), Schizosaccharomyces pombe (fungus). Counts of proteins, CDS and families: [values missing]. Chart shares: 31 %, 9 %, 60 % [labels missing].
Querying the databases. WWW Query: query on sequences and families according to multiple criteria. Cross Taxa: query on families according to complex taxonomic criteria.
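A taxonomic criterion of the kind Cross Taxa handles can be pictured as "families present in these taxa and absent from those". The sketch below is only an illustration of that idea on a hypothetical in-memory mapping; the family identifiers, the function name and the data layout are invented here and do not describe the actual query interface.

```python
def cross_taxa_query(families, require=(), exclude=()):
    """Return the ids of families containing every required taxon and no excluded one.

    families: dict mapping a family id to the set of species it contains
    (hypothetical layout chosen for this sketch).
    """
    required = set(require)
    excluded = set(exclude)
    return sorted(
        family_id
        for family_id, taxa in families.items()
        if required <= taxa and not (excluded & taxa)
    )

# Hypothetical families built from species named on the Contents slide.
toy_families = {
    "FAM1": {"Homo sapiens", "Mus musculus", "Rattus norvegicus"},
    "FAM2": {"Homo sapiens", "Mus musculus", "Saccharomyces cerevisiae"},
    "FAM3": {"Mus musculus", "Rattus norvegicus"},
}
hits = cross_taxa_query(
    toy_families,
    require={"Homo sapiens", "Mus musculus"},
    exclude={"Saccharomyces cerevisiae"},
)
print(hits)
```

Only FAM1 matches here: FAM2 contains the excluded yeast and FAM3 lacks the human sequence.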
POSTER JO-60: to be continued…