Extending MAO : towards an Ontology of Genetic and Evolutionary Events Laboratory of Integrative BioInformatics and Genomics (LBGI), Department of Biology.

Presentation transcript:

Extending MAO: towards an Ontology of Genetic and Evolutionary Events
Julie Thompson
Laboratory of Integrative BioInformatics and Genomics (LBGI), Department of Biology and Structural Genomics, UMR 7104, IGBMC Strasbourg
Collège de France
NESCent Evolutionary Informatics Working Group Meeting, November 12-14, 2007

Outline
Introduction:
- MAO: multiple alignment ontology
- MACSIMS: multiple alignment based information management system
Current projects:
- AlexSys: integrated workbench for multiple alignment
- MyoNet: interactome, transcriptome analyses (French Myopathy Association)
- EvolHHuPro: Evolution Histories of the Human Proteome

MAO: multiple alignment ontology
Standardised vocabulary for multiple alignments: DNA, RNA, protein sequences and 3D structures
MAO consortium:
- MSA algorithms (J Thompson, O Poch, IGBMC) (Kazutaka KATOH, Kyoto)
- Protein 3D analysis (Patrice KOEHL, Davis)
- Protein 3D structure (Dino MORAS, IGBMC)
- RNA analysis (Steve HOLBROOK, Berkeley)
- 3D RNA structure (Eric WESTHOF, IBMC)
Thompson et al, Nucl Acids Res

MAO: multiple alignment ontology
[Figure: ontology graph. Terms shown: multiple_sequence_alignment, sub_alignment, alignment_sequence, alignment_column, residue, nucleotide, amino_acid, atom, 3d_atomic_coordinates, residue_function, interaction, mutation, structural_location, exposed, helix, hinge_region, domain, signal, sequence_feature, sequence_feature_type, column_conservation, level, type, basic, aliphatic. Relationship types: is_a, part_of, attribute_of.]
Thompson et al, Nucl Acids Res
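As a rough illustration of how terms and the three relationship types named on this slide could be held in memory, here is a minimal Python sketch. The `Ontology` class and every edge added below (for example, amino_acid is_a residue) are assumptions chosen for illustration: the slide text gives the terms and the relationship types, but not which terms each relation connects.

```python
from collections import defaultdict

# Relationship types named on the slide.
RELATIONS = {"is_a", "part_of", "attribute_of"}


class Ontology:
    """Tiny directed graph of ontology terms (hypothetical helper, not MAO code)."""

    def __init__(self):
        # term -> list of (relation, parent term)
        self.edges = defaultdict(list)

    def add(self, child, relation, parent):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.edges[child].append((relation, parent))

    def ancestors(self, term, relations=RELATIONS):
        """Return every term reachable from `term` by following the given relations."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for relation, parent in self.edges.get(current, []):
                if relation in relations and parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical edges, for illustration only.
mao = Ontology()
mao.add("amino_acid", "is_a", "residue")
mao.add("nucleotide", "is_a", "residue")
mao.add("residue", "part_of", "alignment_column")
mao.add("alignment_column", "part_of", "multiple_sequence_alignment")
mao.add("column_conservation", "attribute_of", "alignment_column")
```

Restricting `ancestors` to a subset of relations (for instance only `is_a`) separates the type hierarchy from the containment hierarchy.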

MAO: multiple alignment ontology
Open Biomedical Ontologies: well-structured vocabularies for shared use across different biological domains
Acceptance criteria. The ontology:
- must be open, freely available to all
- should be in a shared format for compatibility
- should be orthogonal to the other OBO ontologies
The ontology is then considered to be authoritative by the OBO consortium

MAO: links to other ontologies
OBO ontology:
- MAO: multiple alignment
- SO: gene structure
- MI: interactions
- GO: gene ontology
- taxon: TAXID
- DOID: human disease
- PW: pathway
- PSI-MOD: modifications
- ProPreO: proteomics
- BTO: tissue
- IPR: Interpro
OBO ontologies are all available in the same format and can be used in combination

MACSIMS: Information Management System
Data collection:
- creation of a relational database (BIRD, H. Nguyen)
Information management:
- data validation
- reliable propagation
Efficient exploitation:
- of the multiple alignment for phylogenetic inference
- automatic, high-throughput processing (XML format)
- visualisation (JalView, G. Barton, Scotland)

MACSIMS: Information Management System
MS2PH: Prediction of structural/functional effects of mutations
Sulfatase protein family: GALNS
Mutations in GALNS gene are implicated in Morquio A syndrome:
- mutation C79Y -> severe phenotype
- others -> milder phenotypes
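The mutation label C79Y on this slide follows the usual one-letter substitution notation: cysteine (C) at position 79 replaced by tyrosine (Y). A minimal Python sketch of how such labels might be parsed and checked against a sequence is given below. The names `Substitution`, `parse_substitution` and `matches_wild_type` are hypothetical helpers, not part of MS2PH or MACSIMS, and the sequence used in the usage note is a toy string, not the real GALNS sequence.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # 1-based position in the protein sequence
    mutant: str     # one-letter code of the replacement residue


# One upper-case letter, digits, one upper-case letter, e.g. "C79Y".
_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'C79Y' into its three components."""
    match = _PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    wild_type, position, mutant = match.groups()
    return Substitution(wild_type, int(position), mutant)


def matches_wild_type(sub: Substitution, sequence: str) -> bool:
    """True if `sequence` carries the expected wild-type residue at the position."""
    in_range = 0 < sub.position <= len(sequence)
    return in_range and sequence[sub.position - 1] == sub.wild_type
```

For example, `parse_substitution("C79Y")` yields wild type C, position 79 and mutant Y, and `matches_wild_type` can then confirm that a given sequence really has a cysteine at position 79 before any annotation is propagated.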

High throughput applications
- Genome annotation
  - annotation of prokaryotic genome Mycobacterium smegmatis (JM. Reyrat, Hopital Necker, Paris)
  - functional annotation of cDNA from eukaryote Alvinella pompejana (F. Zal, Station Biologique, Roscoff)
- Functional genomics
  - Transcriptomic data analyses: PRIMA, EVI GENORET IP FP6
- Prediction of structural/functional effects of mutations
  - human genetic diseases: MS2PH (Structural Mutation to Human Pathology Phenotypes) (G. Deléage, IBCP, Lyon)
- Structural/functional characterisation
  - 1500 structural genomics targets: SPINE IP FP5
[Figure: domain annotation of nuclear receptor coactivator 2. Labels: PAC, PAS, HLH, CREBBP interaction, AT, Poly-Gln, LXXLL, Acetylation (by CREBBP), S-nitrosylation, Receptor-interacting domain.]

Current Projects
ALEXSYS: Alignment Expert SYStem (thesis, R. Aniba; co-directed by A. Marchler-Bauer, NCBI, Washington)
Platform for integration and exploitation of pertinent information for the study of complex biological systems
Motivation and objectives:
- test, evaluate and optimize all the stages of the construction, analysis and exploitation of a multiple sequence alignment
- develop a modular platform, incorporating different, complementary algorithms and mined knowledge (sequence, structure, function, taxonomy…)
- understand relationships between sequence characteristics and algorithmic strengths and weaknesses
- automatic selection of suitable algorithms depending on sequences
- design optimal scenarios/workflows for different biological applications

Current Projects
MyoNet: interactome, transcriptome analyses (French Myopathy Association)
- Interactome data (Isabelle Richard, Genethon, Paris)
  - extension of MAO, MACSIMS
  - interaction residues represented by non-contiguous sequence features
  - interactions between proteins require definition of links between MSA, notion of ‘collection of MSA’
- Transcriptome data (Frédéric Relaix, Hopital Pitié-Salpétrière, Paris; Miguel Andrade, Ottawa)
  - construction of transcriptional networks involved in muscle development
  - gene expression data
  - phylogenetic profile approach
  - functional analyses from MACSIMS

Current Projects
EvolHHuPro: Reconstruction of evolutionary histories for human proteome (P. Pontarotti, Marseille)
Reconstruction of the evolutionary histories of the human proteome:
- Genetic events (duplication, loss, recombination,...)
- Analysis of protein coding regions (extensions, insertions, deletions,...)
Steps shown:
- MSA construction (PipeAlign)
- MACSIMS analysis
- Tree construction (Figenix)
- Localisation genetic events
- Construction of evolutionary histories
- Genome mapping (Cassiope)
- Evolutionary mechanisms
[Figure: trees over human, mouse, frog and fish annotated with duplication, loss, recombination and mutation events; human/mouse comparison; alignment of human, frog, fish and mouse sequences marking the active site.]

Current Projects
EvolHHuPro: Reconstruction of evolutionary histories for human proteome (P. Pontarotti, Marseille)
Objective: better understanding of mechanisms involved in vertebrate evolution
Genome scale analysis:
- Classify evolutionary histories to define a set of ‘typical’ histories
- compare stable and unstable families
- identify proteins that have never experienced specific events (duplications, fusions,…)
- …
Functional analyses of clusters, based on MACSIMS:
- enrichment of a particular class of proteins
- correlations between the genetic events and structural/functional context
- …

Acknowledgements
Collaborators:
- Miguel ANDRADE (Ottawa)
- Toby GIBSON (Heidelberg)
- Des HIGGINS (Dublin)
- Kazutaka KATOH (Kyoto)
- Patrice KOEHL (UC Davis)
- Aron MARCHLER-BAUER (Washington)
- Pierre PONTAROTTI (Marseille)
- Frédéric RELAIX (Paris)
- Eric WESTHOF (Strasbourg)
LBGI (Laboratory of Integrative Bioinformatics and Genomics): Laurent-Philippe ALBOU, Radouene ANIBA, Yannick-Noel ANNO, Guillaume BERTHOMMIER, Yann BRELIVET, Annaick CARLES, Anne FRIEDRICH, Nicolas GAGNIERE, David KIEFFER, Odile LECOMPTE, Luc MOULINIER, Jean MULLER, Ngoc-Hoan NGUYEN, Emmanuel PERRODOU, Laetitia POIDEVIN, Francisco PROSDOCIMI, Wolfgang RAFFELSBERGER, Ravikiran REDDY, Raymond RIPP, Nicolas WICKER, Olivier POCH
BIPS (Strasbourg BioInformatics Platform): Frédéric PLEWNIAK, Laurent BIANCHETTI, Sophie CANDEL, Véronique GEOFFROY

IGBMC
Research Center (CNRS/Inserm/Université Louis Pasteur)
European Biomedical research center
534 personnel:
- 90 researchers
- 72 postdocs
- 145 PhD
- 227 engineers/technicians
m2 laboratory area
7 departments
4 high-throughput technological platforms, RIO
- Eukaryote genome study
- Genetic expression control
- Genes and proteins functional analysis
- Human pathologies studies (cancer, monogenic disease, metabolic disease,...)

LBGI: Laboratory of Integrative Bioinformatics and Genomics
IGBMC Department of Biology and Structural Genomics (D. Moras)
Laboratory of Integrative Bioinformatics and Genomics (O. Poch)
[Figure: organisation chart.
Themes shown: Proteins, Complexes, Tissues, Cells; Informational Families; Transcription; Prostate, breast; Stem cells; Cancer; Databases; Algorithms; Comparative Genomics; Structure; Functional genomics; Phylogenetic Inference.
Members shown: O. Lecompte, YN. Anno, N. Gagnière, L. Moulinier, LP. Albou, Y. Brelivet, A. Friedrich, W. Raffelsberger, A. Carles, L. Poidevin, R. Reddy, R. Ripp, Y. Benabbou, G. Berthommier, H. Nguyen, N. Wicker, D. Kieffer, JD. Thompson, R. Aniba, E. Perrodou, F. Prosdocimi.
Platform: F. Plewniak, L. Bianchetti, S. Candel, V. Geoffroy.
RIO platforms, Bioinformatics: services, education; resources updates; development and distribution.]

Integrated protein family analysis
- Automatic construction and analysis of a high quality multiple alignment
- Reliable environment for integration of information related to protein families
Tools shown: BlastP, Ballast; SRS, BIRD; DbClustal; RASCAL; LEON; NorMD; Secator/DPC; MAO, MACSIMS
Plewniak et al, Nucl Acids Res. 2003

MACSIMS Visualisation with JalView (G Barton)
Bardet Biedl Syndrome, BBS10
[Figure: JalView display with conservation profile, schematic overview window, detailed alignment window and sub-groups. Labels: BBS10, BBS6, chaperonin, R49W.]
XML format based on MAO:
sub_alignment:=BBS10
sequence:=bbs10_human
motif:=(47-63) ATP binding
residue:= 49
mutation:=R->W
column:=85
conservation:=100%
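The slide lists the BBS10 annotation as `key:=value` fields (sub_alignment, sequence, motif, residue, mutation, column, conservation). The actual MACSIMS XML schema is not shown in the transcript, so the following Python sketch only parses that flattened `key:=value` notation into a dictionary. The function name `parse_fields` is hypothetical, and treating every word immediately followed by `:=` as a field name is an assumption about the notation.

```python
import re

# A field name is a run of word characters immediately followed by ":=".
_FIELD = re.compile(r"(\w+):=")


def parse_fields(text: str) -> dict:
    """Split 'key:=value key:=value ...' text into a {key: value} dictionary.

    Values may contain spaces; each value runs until the next 'key:='.
    """
    # re.split with a capturing group returns
    # [text before first key, key1, value1, key2, value2, ...].
    parts = _FIELD.split(text)
    keys, values = parts[1::2], parts[2::2]
    return {key: value.strip() for key, value in zip(keys, values)}


# Fields exactly as they appear in the slide transcript.
record = parse_fields(
    "sub_alignment:=BBS10 sequence:=bbs10_human motif:=(47-63) ATP binding "
    "residue:= 49 mutation:=R->W column:=85 conservation:=100%"
)
```

Here `record["motif"]` holds the whole string `(47-63) ATP binding`, so multi-word values survive, and `record["residue"]` and `record["mutation"]` together describe the R49W substitution highlighted on the slide.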