Extending MAO: towards an Ontology of Genetic and Evolutionary Events
Laboratory of Integrative BioInformatics and Genomics (LBGI), Department of Biology and Structural Genomics, UMR 7104, IGBMC Strasbourg
Julie Thompson
Collège de France
NESCent Evolutionary Informatics Working Group Meeting, November 12-14, 2007

Outline

Introduction :
- MAO : multiple alignment ontology
- MACSIMS : multiple alignment based information management system

Current projects :
- AlexSys : integrated workbench for multiple alignment
- MyoNet : interactome, transcriptome analyses (French Myopathy Association)
- EvolHHuPro : Evolutionary Histories of the Human Proteome

MAO: multiple alignment ontology

Standardised vocabulary for multiple alignments : DNA, RNA, protein sequences and 3D structures

MAO consortium:
- MSA algorithms (J Thompson, O Poch, IGBMC) (Kazutaka KATOH, Kyoto)
- Protein 3D analysis (Patrice KOEHL, Davis)
- Protein 3D structure (Dino MORAS, IGBMC)
- RNA analysis (Steve HOLBROOK, Berkeley)
- 3D RNA structure (Eric WESTHOF, IBMC)

Thompson et al, Nucl Acids Res

MAO: multiple alignment ontology

[Figure: ontology graph. Terms shown: multiple_sequence_alignment, sub_alignment, alignment_sequence, alignment_column, residue, nucleotide, amino_acid, atom, 3d_atomic_coordinates, sequence_feature, sequence_feature_type, residue_function, interaction, mutation, structural_location, exposed, helix, hinge_region, domain, signal, column_conservation, level, type, basic, aliphatic. Relations shown: is_a, part_of, attribute_of.]

Thompson et al, Nucl Acids Res

MAO: multiple alignment ontology

Open Biomedical Ontologies : well-structured vocabularies for shared use across different biological domains

Acceptance criteria. The ontology :
- must be open, freely available to all
- should be in a shared format for compatibility
- should be orthogonal to the other OBO ontologies

The ontology is then considered to be authoritative by the OBO consortium

MAO: links to other ontologies

OBO ontologies :
- MAO: multiple alignment
- SO: gene structure
- MI: interactions
- GO: gene ontology
- taxon: TAXID
- DOID: human disease
- PW: pathway
- PSI-MOD: modifications
- ProPreO: proteomics
- BTO: tissue
- IPR: Interpro

OBO ontologies are all available in the same format and can be used in combination
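
Because OBO ontologies share one flat-file format, a single reader can in principle load terms from any of them. The sketch below is a minimal illustration, not a full OBO parser: it reads only [Term] stanzas and their tag-value lines, and follows is_a links. The term identifiers (EX:0000001, EX:0000002) are hypothetical placeholders, not real MAO identifiers.

```python
from collections import defaultdict

# Hypothetical OBO-style text; the EX: identifiers are placeholders.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000001
name: residue

[Term]
id: EX:0000002
name: amino_acid
is_a: EX:0000001 ! residue

[Typedef]
id: part_of
name: part_of
"""


def parse_obo_terms(text):
    """Collect [Term] stanzas into a dict: term id -> {tag: [values]}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank line or comment line
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are kept.
            current = defaultdict(list) if line == "[Term]" else None
            continue
        if current is None:
            continue  # header line or a stanza type we ignore
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop trailing comment
        current[tag.strip()].append(value)
        if tag.strip() == "id":
            terms[value] = current
    return terms


def ancestors(terms, term_id):
    """Follow is_a links upwards and return every ancestor id once."""
    seen = []
    stack = list(terms[term_id].get("is_a", []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return seen


terms = parse_obo_terms(SAMPLE)
print(sorted(terms))
print(ancestors(terms, "EX:0000002"))
```

The same two functions can be applied to text from several ontology files and the resulting dictionaries merged, which is one simple way to use ontologies in combination.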

MACSIMS : Information Management System

Data collection :
- creation of a relational database (BIRD, H. Nguyen)

Information management :
- data validation
- reliable propagation

Efficient exploitation :
- of the multiple alignment for phylogenetic inference
- automatic, high-throughput processing (XML format)
- visualisation (JalView, G. Barton, Scotland)

MACSIMS : Information Management System

MS2PH : Prediction of structural/functional effects of mutations

Sulfatase protein family : GALNS
Mutations in GALNS gene are implicated in Morquio A syndrome :
- mutation C79Y -> severe phenotype
- others -> milder phenotypes
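
The label C79Y follows the usual substitution notation: wild-type residue, 1-based position, mutant residue. As a small illustration, the sketch below parses such a label and checks it against a sequence. The function names and the toy sequence are hypothetical; this is not the MS2PH code and the toy sequence is not the GALNS sequence.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
_PATTERN = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class Substitution(NamedTuple):
    wild_type: str
    position: int  # 1-based position in the protein sequence
    mutant: str


def parse_substitution(label):
    """Parse a label such as 'C79Y' into its three components."""
    match = _PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def matches_wild_type(sub, sequence):
    """True if the sequence carries the wild-type residue at that position."""
    in_range = 0 < sub.position <= len(sequence)
    return in_range and sequence[sub.position - 1] == sub.wild_type


sub = parse_substitution("C79Y")
toy_sequence = "A" * 78 + "C" + "A"  # toy data with C at position 79
print(sub, matches_wild_type(sub, toy_sequence))
```

Checking the wild-type residue first is a cheap guard against numbering mismatches between a mutation report and the sequence used in the alignment.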

High throughput applications

Genome annotation :
- Functional genomics annotation of prokaryotic genome Mycobacterium smegmatis (JM. Reyrat, Hopital Necker, Paris)
- functional annotation of cDNA from eukaryote Alvinella pompejana (F. Zal, Station Biologique, Roscoff)

Transcriptomic data analyses : PRIMA, EVI GENORET IP FP6

Prediction of structural/functional effects of mutations :
- human genetic diseases : MS2PH (Structural Mutation to Human Pathology Phenotypes) (G. Deléage, IBCP, Lyon)

Structural/functional characterisation :
- 1500 structural genomics targets : SPINE IP FP5

[Figure: nuclear receptor coactivator 2, annotated with PAS, PAC and HLH domains, LXXLL, Poly-Gln, AT, Receptor-interacting domain, CREBBP interaction, Acetylation (by CREBBP), S-nitrosylation]

Current Projects
ALEXSYS : Alignment Expert SYStem (thesis, R. Aniba; co-directed by A. Marchler-Bauer, NCBI, Washington)

Platform for integration and exploitation of pertinent information for the study of complex biological systems

Motivation, Objectives :
- test, evaluate and optimize all the stages of the construction, analysis and exploitation of a multiple sequence alignment
- develop a modular platform, incorporating different, complementary algorithms and mined knowledge (sequence, structure, function, taxonomy…)
- understand relationships between sequence characteristics and algorithmic strengths and weaknesses
- automatic selection of suitable algorithms depending on sequences
- design optimal scenarios/workflows for different biological applications

Current Projects
MyoNet : interactome, transcriptome analyses (French Myopathy Association)

Interactome data (Isabelle Richard, Genethon, Paris)
Transcriptome data (Frédéric Relaix, Hopital Pitié-Salpétrière, Paris; Miguel Andrade, Ottawa)
- construction of transcriptional networks involved in muscle development

extension of MAO, MACSIMS :
- interaction residues represented by non-contiguous sequence features
- interactions between proteins require definition of links between MSA, notion of ‘collection of MSA’
- gene expression data
- phylogenetic profile approach
- functional analyses from MACSIMS

Current Projects
EvolHHuPro: Reconstruction of evolutionary histories for human proteome (P. Pontarotti, Marseille)

Reconstruction of the evolutionary histories of the human proteome :
- Genetic events (duplication, loss, recombination,...)
- Analysis of protein coding regions (extensions, insertions, deletions,...)

- MSA construction (PipeAlign)
- MACSIMS analysis
- Tree construction (Figenix)
- Localisation of genetic events
- Construction of evolutionary histories
- Genome mapping (Cassiope)
- Evolutionary mechanisms

[Figure: trees over human, mouse, frog and fish, annotated with duplication, loss, recombination and mutation events, and with an active site]
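
To make the idea of localising genetic events on a tree concrete, the sketch below labels the internal nodes of a toy gene tree with the species-overlap rule: a node whose two subtrees share at least one species is called a duplication, otherwise a speciation. This is a common heuristic chosen here for illustration; the slides do not say which method Figenix or the EvolHHuPro pipeline uses. The gene names and the tree are hypothetical.

```python
# Toy gene tree as nested 2-tuples; leaves are "species_gene" strings.
# Hypothetical data: fish has one copy, human and mouse have two (A and B).
TREE = ("fish_A", (("human_A", "mouse_A"), ("human_B", "mouse_B")))


def species_of(leaf):
    """Species name is the part of the leaf label before the underscore."""
    return leaf.split("_")[0]


def label_events(node):
    """Return (species below node, events) with events in post-order.

    Each event is (kind, sorted species list), where kind is 'duplication'
    if the two child subtrees share a species and 'speciation' otherwise.
    """
    if isinstance(node, str):
        return {species_of(node)}, []
    left, right = node
    left_species, left_events = label_events(left)
    right_species, right_events = label_events(right)
    kind = "duplication" if left_species & right_species else "speciation"
    species = left_species | right_species
    return species, left_events + right_events + [(kind, sorted(species))]


_, events = label_events(TREE)
for kind, species in events:
    print(kind, species)
```

In this toy tree the only duplication is placed on the node that groups the human and mouse A and B copies, that is, after the split from fish. Gene losses would need a comparison with a species tree, which this sketch does not attempt.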

Current Projects
EvolHHuPro: Reconstruction of evolutionary histories for human proteome (P. Pontarotti, Marseille)

Objective: better understanding of mechanisms involved in vertebrate evolution

Genome scale analysis:
- Classify evolutionary histories to define a set of ‘typical’ histories
- compare stable and unstable families
- identify proteins that have never experienced specific events (duplications, fusions,…)
- …

Functional analyses of clusters, based on MACSIMS
- enrichment of a particular class of proteins
- correlations between the genetic events and structural/functional context
- …

Acknowledgements

LBGI (Laboratory of Integrative Bioinformatics and Genomics): Laurent-Philippe ALBOU, Radouene ANIBA, Yannick-Noel ANNO, Guillaume BERTHOMMIER, Yann BRELIVET, Annaick CARLES, Anne FRIEDRICH, Nicolas GAGNIERE, David KIEFFER, Odile LECOMPTE, Luc MOULINIER, Jean MULLER, Ngoc-Hoan NGUYEN, Emmanuel PERRODOU, Laetitia POIDEVIN, Francisco PROSDOCIMI, Wolfgang RAFFELSBERGER, Ravikiran REDDY, Raymond RIPP, Nicolas WICKER, Olivier POCH

BIPS (Strasbourg BioInformatics Platform): Frédéric PLEWNIAK, Laurent BIANCHETTI, Sophie CANDEL, Véronique GEOFFROY

Collaborators : Miguel ANDRADE (Ottawa), Toby GIBSON (Heidelberg), Des HIGGINS (Dublin), Kazutaka KATOH (Kyoto), Patrice KOEHL (UC Davis), Aron MARCHLER-BAUER (Washington), Pierre PONTAROTTI (Marseille), Frédéric RELAIX (Paris), Eric WESTHOF (Strasbourg)

IGBMC
Research Center (CNRS/Inserm/Université Louis Pasteur)
European Biomedical research center

534 personnel :
- 90 researchers
- 72 postdocs
- 145 PhD
- 227 engineers/technicians

7 departments
4 high-throughput technological platforms, RIO
m2 laboratory area

- Eukaryote genome study
- Genetic expression control
- Genes and proteins functional analysis
- Human pathologies studies (cancer, monogenic disease, metabolic disease,...)

LBGI : Laboratory of Integrative Bioinformatics and Genomics

IGBMC
Department of Biology and Structural Genomics (D. Moras)
Laboratory of Integrative Bioinformatics and Genomics (O. Poch)

[Figure: organisation chart. Labels shown: Proteins, Complexes, Tissues, Cells; Informational Families; Transcription; Cancer (Prostate, breast); Stem cells; Databases; Algorithms; Comparative Genomics; Phylogenetic Inference; Structure; Functional genomics. RIO platforms: Bioinformatics; Services, education; Resources updates; Development and distribution. Platform: F. Plewniak, L. Bianchetti, S. Candel, V. Geoffroy. Other members shown: O. Lecompte, YN. Anno, N. Gagnière, L. Moulinier, LP. Albou, Y. Brelivet, A. Friedrich, W. Raffelsberger, A. Carles, L. Poidevin, R. Reddy, R. Ripp, Y. Benabbou, G. Berthommier, H. Nguyen, N. Wicker, D. Kieffer, JD. Thompson, R. Aniba, E. Perrodou, F. Prosdocimi.]

Integrated protein family analysis

- Automatic construction and analysis of a high quality multiple alignment
- Reliable environment for integration of information related to protein families

[Figure: workflow combining BlastP, Ballast, SRS, BIRD, DbClustal, RASCAL, LEON, NorMD, Secator/DPC, MAO, MACSIMS]

Plewniak et al, Nucl Acids Res. 2003

MACSIMS

Bardet Biedl Syndrome, BBS10 : Visualisation with JalView (G Barton)
- conservation profile
- schematic overview window
- detailed alignment window
- sub-groups

[Figure labels: BBS10, chaperonin, BBS6, R49W]

XML format based on MAO:
sub_alignment:=BBS10
sequence:=bbs10_human
motif:=(47-63) ATP binding
residue:=49
mutation:=R->W
column:=85
conservation:=100%
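
As an illustration of how the BBS10 annotation above could be carried in XML, the sketch below builds a small document with Python's standard xml.etree.ElementTree and reads the mutation back out. The element names follow the MAO terms on the slide, but the attribute names and the nesting (for example, placing column under residue) are assumptions made for this example; this is not the actual MACSIMS XML schema.

```python
import xml.etree.ElementTree as ET


def build_bbs10_annotation():
    """Build a hypothetical XML tree for the BBS10 example values."""
    root = ET.Element("sub_alignment", name="BBS10")
    seq = ET.SubElement(root, "sequence", name="bbs10_human")
    # Motif spanning residues 47-63, described as ATP binding.
    ET.SubElement(seq, "motif", start="47", end="63", description="ATP binding")
    # Residue 49 carries the R->W mutation; nesting is an assumption.
    residue = ET.SubElement(seq, "residue", position="49")
    ET.SubElement(residue, "mutation", wild_type="R", mutant="W")
    ET.SubElement(residue, "column", index="85", conservation="100%")
    return root


def mutation_label(root):
    """Rebuild a label such as 'R49W' from the residue and mutation elements."""
    residue = root.find("./sequence/residue")
    mutation = residue.find("mutation")
    return f"{mutation.get('wild_type')}{residue.get('position')}{mutation.get('mutant')}"


xml_text = ET.tostring(build_bbs10_annotation(), encoding="unicode")
reparsed = ET.fromstring(xml_text)  # round trip through text
print(xml_text)
print(mutation_label(reparsed))
```

Keeping the residue position, the alignment column and the conservation value in one record is what lets a viewer or a high-throughput script move between sequence coordinates and alignment coordinates.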