How Philosophy of Science Can Help Biomedical Research
Barry Smith
How to Do Biology across the Genome?
Sequence of X chromosome in baker’s yeast:
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDR KRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTL SLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYM FLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRA CALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCAC TARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTR RIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDP NQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS RFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCS FSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEI YMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPV RNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQS QFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMF NLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVV WIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGG LCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIE RMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTAST NVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATT TESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTS ATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTN SNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSEN MNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEAL AVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTR GKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKG GVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSM LIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDG RFDILLCRDSSREVGE 4
Stelzl et al., Cell, 2005
network of gene interactions in E. coli
what cellular component?
what molecular function?
what biological process?
The Idea of Common Controlled Vocabularies
MouseEcotope, GlyProt, DiabetInGene, GluChem
Example terms from the Gene Ontology:
– sphingolipid transporter activity
– Holliday junction helicase complex
– male courtship behavior, orientation prior to leg tapping and wing vibration
Benefits of GO
1. based in biological science
2. links data to biological reality
3. links people to software
4. links data together
   – across species (human, mouse, yeast, fly...)
   – across granularities (molecule, cell, organ, organism, population)
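The fourth benefit, linking data together across species, can be illustrated with a minimal sketch. It assumes annotations are held as (species, gene, term) triples. The gene names are invented placeholders, and the term labels are the examples quoted on the vocabulary slides. Real GO annotations use term identifiers and curated gene products, which this sketch leaves out.

```python
from collections import defaultdict

# Hypothetical annotations: (species, gene, GO term label).
# Gene names are invented for illustration only.
ANNOTATIONS = [
    ("human", "geneA", "sphingolipid transporter activity"),
    ("mouse", "geneB", "sphingolipid transporter activity"),
    ("yeast", "geneC", "Holliday junction helicase complex"),
    ("fly", "geneD", "male courtship behavior"),
    ("mouse", "geneE", "Holliday junction helicase complex"),
]


def genes_by_term(annotations):
    """Index genes first by shared term, then by species."""
    index = defaultdict(dict)
    for species, gene, term in annotations:
        index[term].setdefault(species, []).append(gene)
    return dict(index)


def species_sharing(index, term):
    """Return the species that have at least one gene annotated with `term`."""
    return sorted(index.get(term, {}))


index = genes_by_term(ANNOTATIONS)
print(species_sharing(index, "sphingolipid transporter activity"))
# ['human', 'mouse']
```

Because every database uses the same term for the same kind of entity, a single lookup gathers data that would otherwise sit in separate species-specific resources.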
The goal: all biological (biomedical) research data should cumulate to form a single, algorithmically processible whole
Ontologies already being applied to achieve this goal
Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers.
GO tells you what is standard functional information for these genes.
By tracking deviations from this standard, 189 genes could be identified as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention.
Science Oct 13;314(5797):
Towards Empirical Philosophy
– processualist vs. 3-dimensionalist
– reductionist vs. non-reductionist
– realist vs. nominalist
If ontologies based on different philosophical principles are tested for their utility in support of scientific research, which types of ontologies will prove most useful?
Some sample ontologies
– Cell Ontology (CL)
– Foundational Model of Anatomy (FMA)
– Environment Ontology (EnvO)
– Gene Ontology (GO)
– Infectious Disease Ontology
– Phenotypic Quality Ontology (PaTO)
– Protein Ontology (PRO)
– RNA Ontology (RnaO)
– Sequence Ontology (SO)
The problem
High-throughput experimental data are meaningless unless the researcher is given detailed information about how they were obtained
To make experimental data computationally accessible we need ontologies to describe the data
(1) from the point of view of their relation to reality
(2) from the point of view of their relation to experiments
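The two kinds of description can be pictured as two separate annotation slots on each datum. The following is only a sketch: the class name, the field names `about` and `obtained_by`, and the example labels are all hypothetical and are not taken from any of the ontologies discussed here.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedDatum:
    """A measured value with two hypothetical annotation slots."""
    value: float
    # (1) terms relating the datum to reality (what it is about)
    about: list = field(default_factory=list)
    # (2) terms relating the datum to the experiment (how it was obtained)
    obtained_by: list = field(default_factory=list)

    def is_interpretable(self):
        """Treat a datum as interpretable only if both slots are filled."""
        return bool(self.about) and bool(self.obtained_by)


bare = AnnotatedDatum(value=2.7)
described = AnnotatedDatum(
    value=2.7,
    about=["sphingolipid transporter activity"],
    obtained_by=["microarray assay", "hypothetical protocol P-1"],
)
print(bare.is_interpretable(), described.is_interpretable())
# False True
```

The bare number carries no information a program could act on, while the described datum can be grouped both by what it measures and by how it was measured.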
Three solutions
– The MGED Ontology
– OBI: The Ontology for Biomedical Investigations
– EXPO: The Experiment Ontology
MGED (Microarray Gene Expression Data) Ontology
Individual =def. name of the individual organism from which the biomaterial was derived
Experiment =def. The complete set of bioassays and their descriptions performed as an experiment for a common purpose.... An experiment will be often equivalent to a publication.
Chromosome =def. An abstraction used for annotation
Chromosome =def. A biological sequence that can be placed on an array
OBI
The Ontology for Biomedical Investigations
with thanks to Trish Whetzel and Richard Scheuermann
Purpose of OBI
To provide a resource for the unambiguous description of the components of biomedical investigations, such as the design, protocols and instrumentation, material, data, and types of analysis and statistical tools applied to the data
NOT designed to model biology
Hypothesis
That it is possible to create ontology resources of genuine utility by drawing on logical and philosophical principles, e.g. pertaining to consistency of definitions and avoidance of use-mention confusions.
OBI Collaborating Communities
– Crop sciences Generation Challenge Programme (GCP)
– Environmental genomics MGED RSBI Group
– Genomic Standards Consortium (GSC)
– HUPO Proteomics Standards Initiative (PSI), psidev.sourceforge.net
– Immunology Database and Analysis Portal
– Immune Epitope Database and Analysis Resource (IEDB)
– International Society for Analytical Cytology
– Metabolomics Standards Initiative (MSI)
– Neurogenetics, Biomedical Informatics Research Network (BIRN)
– Nutrigenomics MGED RSBI Group
– Polymorphism
– Toxicogenomics MGED RSBI Group
– Transcriptomics MGED Ontology Group
OBI – Tools and Documentation
– Open source, standards compliant and version managed
– Web Ontology Language (OWL), edited using the Protégé editor
– OBI.owl files are available from the OBI SVN Repository
The Problem of Clinical Investigations
Regulatory bodies such as the FDA need to assess the evidentiary value of enormous volumes of data collected, e.g. in trials on specific drug formulations.
For this, they need to impose standardization of the terminologies used to express these data, e.g. as developed by the Clinical Data Interchange Standards Consortium (CDISC).
Clinical Investigations terminologies
“Study Design”
Descriptive research
– Case study: description of one or more patients
– Developmental research: description of pattern of change over time
– Qualitative research: gathering data through interview or observation
Exploratory research
– Secondary analysis: exploring new relationships in old data
– Historical research: reconstructing the past through an assessment of archives or other records
Experimental research
– Randomized clinical trial
– Meta-analysis: statistically combining findings from several different studies to obtain a summary analysis
“Population”
Recruited population
– Randomized population
– Eligible population
– Screened population
– Premature termination population
Excluded population
– Excluded post-randomization population
– Not-eligible-population
Analyzed population
– Study arm population
– Crossover population
– Subgroup population
– Intent-to-treat population (based on randomization)
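A terminology of this kind becomes computable once its nesting is written down explicitly. The sketch below reads each indented term as a subtype of the heading above it. That reading is an assumption, since the slide only shows the grouping and does not name the relation, and the helper names are hypothetical.

```python
# Child term -> parent term, read off the "Population" grouping.
# Treating the nesting as a subtype relation is an assumption.
IS_A = {
    "Randomized population": "Recruited population",
    "Eligible population": "Recruited population",
    "Screened population": "Recruited population",
    "Premature termination population": "Recruited population",
    "Excluded post-randomization population": "Excluded population",
    "Not-eligible-population": "Excluded population",
    "Study arm population": "Analyzed population",
    "Crossover population": "Analyzed population",
    "Subgroup population": "Analyzed population",
    "Intent-to-treat population": "Analyzed population",
}


def children(parent, is_a=IS_A):
    """Return the terms listed directly under `parent`."""
    return sorted(child for child, p in is_a.items() if p == parent)


def top_level(is_a=IS_A):
    """Return the terms that have children but no parent of their own."""
    return sorted(set(is_a.values()) - set(is_a))


print(top_level())
# ['Analyzed population', 'Excluded population', 'Recruited population']
print(children("Excluded population"))
# ['Excluded post-randomization population', 'Not-eligible-population']
```

Writing the grouping down this way also exposes the questions a flat term list hides, such as whether "Excluded post-randomization population" should also fall under "Randomized population".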
Overview of OCI
Meta-analysis (CDISC)
Quality assurance (CDISC)
Quality control (CDISC)
Baseline assessment (CDISC)
Validation (CDISC)
Coding (MUSC)
Permuted block randomization (MUSC)
Secondary-study-protocol (RCT)
Intervention-step (RCT)
Blinding-method (RCT)
Study design
Development plan (CDISC)
Standard operating procedures (CDISC)
Statistical analysis plan (CDISC)
Negative findings (MUSC)
Positive findings (MUSC)
Primary-outcome (RCT)
Secondary-outcome (RCT)
EXPO
The Ontology of Experiments
L. Soldatova, R. King
Department of Computer Science, The University of Wales, Aberystwyth
EXPO: Experiment Ontology
experimental actions part_of experimental design
subject of experiment part_of experimental design
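Assertions of this form can be stored as (part, whole) pairs and queried. The sketch below is illustrative only: the third pair is a hypothetical addition that does not appear on the slide, included to show what follows if part_of is treated as transitive, and the function names are invented.

```python
# (part, whole) pairs. The first two are the assertions quoted above.
PART_OF = [
    ("experimental actions", "experimental design"),
    ("subject of experiment", "experimental design"),
    # Hypothetical extra assertion, NOT from the slide, to show transitivity.
    ("experimental design", "experiment"),
]


def direct_parts(whole, part_of=PART_OF):
    """Return the entities asserted to be part_of `whole`."""
    return sorted(part for part, w in part_of if w == whole)


def all_parts(whole, part_of=PART_OF):
    """Return direct and indirect parts, treating part_of as transitive."""
    found = set()
    frontier = [whole]
    while frontier:
        current = frontier.pop()
        for part in direct_parts(current, part_of):
            if part not in found:
                found.add(part)
                frontier.append(part)
    return sorted(found)


print(direct_parts("experimental design"))
# ['experimental actions', 'subject of experiment']
print(all_parts("experiment"))
# ['experimental actions', 'experimental design', 'subject of experiment']
```

Running such queries is one way to test a definition: if the inferred result reads oddly, for example a subject being part of a design, the assertion itself may need review.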
Role of Philosophy of Science
EXPO: Experiment Ontology
Towards Empirical Philosophy of Science
– rational statistical models of induction
– case-based / domain-based reasoning
– falsificationism
– Humeanism vs. laws
– logical, relative frequency, Bayesian, objective (chance) and epistemic theories of probability
These generate different ontologies of scientific evidence. Which one is correct?
Environment Ontology + Phenotypic Quality Ontology + Ontology for Personalized and Community Medicine
‘Racial’ Phenotypes: Social, Phylogenetic, Essentialistic...
Ontology for Personalized and Community Medicine
to support studies of differential effects on health
1. of environmental qualities of different neighborhoods and
2. of different community behavior phenotypes