Information Management for the Life Sciences M. Scott Marshall Marco Roos Adaptive Information Disclosure University of Amsterdam.

1 Information Management for the Life Sciences M. Scott Marshall Marco Roos Adaptive Information Disclosure University of Amsterdam

2 The Vision: Scientist as knowledge worker For Knowledge Workers: – Knowledge is the data (i.e. rules, relations, properties, hypotheses, etc.) For Today's Biologist: – Numbers, sequences, organisms(!), and images are the data Manipulate knowledge instead of data – Find support for relations between concepts instead of discovering table and column names and numbers. In the virtual laboratory, everything is a resource that can be described and manipulated with semantics

3 Vision: Concept-based interfaces The scientist should be able to work in terms of commonly used concepts. The scientist should be able to work in terms of personal concepts and hypotheses. - Not be forced to map concepts to the terms that have been chosen for a given application by the application builder.

4 Interface Sketch: Finding a basis for relation Epigenetic Mechanisms Transcription Chromatin Transcription Factors “There is a relation” Common Domain Instances Classes Hypothesis Histone Modification Transcription Factor Binding Sites position

5 KSinBIT’06 Biological cartoon as interface Source: Marco Roos

6 Biology in a nutshell: Bigger isn’t better Central Dogma – DNA -> mRNA (transcription) -> Protein (translation) Molecular pathways allow biologists to ‘connect’ one process to another. Huntington’s mutation was mapped in 1993, yet there is still no understanding of the mechanism that causes the neurodegeneration. Semantic models are necessary to create a ‘systems view’ of biology.

7 Show Bigger isn’t Better Scaling up should be done in small increments but once you’ve reached a certain threshold..

8 What is metadata (in this course)? Metadata: data about data Metadata can be syntactic such as a data type, e.g. Integer. Metadata can be semantic such as chromosome number. Note: not always ontology, but metadata can be stored in OWL

9 Common approaches to metadata Code it into the GUI or application (in datastructures, object types, etc.) Create special tables or fields for it in a relational database Map it into substrings of filenames Mix it in with data in proprietary file formats Let the user figure it out Conclusion: There is a need for semantic disclosure.

10 The Semantic Gap User Resources Middleware Application

11 The Model in the middle User Resources Middleware Application My Model Model

12 What is knowledge (in this course) “data”, “information”, “facts”, “knowledge” Knowledge is a statement that can be tested for truth. (by a machine) Otherwise, computing can’t add much
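
One way to read "a statement that can be tested for truth (by a machine)" is as a subject-predicate-object triple checked against a set of asserted facts. The following is a minimal Python sketch under that assumption; the names `FACTS` and `holds` and the predicate labels are illustrative and do not come from the slides.

```python
# Toy fact base: each statement is a (subject, predicate, object) triple.
# The predicate names are hypothetical labels chosen for this sketch.
FACTS = {
    ("DRD1", "is_a", "gene"),
    ("DRD1", "involved_in", "adenylate cyclase activation"),
}


def holds(subject, predicate, obj, facts=FACTS):
    """Return True if the statement is asserted in the fact base.

    False only means "not asserted here", not "known to be false".
    """
    return (subject, predicate, obj) in facts


print(holds("DRD1", "is_a", "gene"))     # True
print(holds("DRD1", "is_a", "protein"))  # False
```

The point of the sketch is only that a statement in this form gives the machine something to check, which free text does not.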

13 Resources are shared on the grid Shared: – CPU time – network bandwidth – memory – storage space But also: – Data – Knowledge: ontologies, rules, vocabularies – Services

14 Abundance of resources in Grid: A Challenge Knowledge Sharing – How will we find the relevant resources (data, services)? – How can we automatically integrate them into an application? – How will we leverage existing knowledge in my analysis? – How will we integrate our results as usable data for a new (computational) experiment? – And link to the evidence (data) for the new knowledge?

15 Knowledge Capture How will we acquire the knowledge? – Literature – Other forms of discourse – Data analysis How will we represent and store it? – In Semantic Web formats such as RDF, OWL, RIF

16 Knowledge capture from a computational experiment Database Computational experiment in workflow environment Database...

17 What will we do with knowledge? How will we use it? – Query it – Reason across it – Integrate it with other data Link it up

18 Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.
Tim Berners-Lee 2007 http://www.w3.org/DesignIssues/LinkedData.html
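
Principles 2 and 3 amount to an ordinary HTTP lookup that asks for RDF. As a rough sketch using only the Python standard library, the request below is built but not sent; the helper name `rdf_request` is hypothetical, and whether a given server honours the `Accept` header is an assumption about that server.

```python
from urllib.request import Request


def rdf_request(uri):
    """Build (but do not send) an HTTP GET request that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


# The URI is the DBpedia example that appears later in this deck.
req = rdf_request("http://dbpedia.org/resource/Berlin")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup.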

19 Background of the HCLS IG Originally chartered in 2005 – Chairs: Eric Neumann and Tonya Hongsermeier Re-chartered in 2008 – Chairs: Scott Marshall and Susie Stephens – Team contact: Eric Prud’hommeaux Broad industry participation – Over 100 members – Mailing list of over 600 Background Information – http://www.w3.org/2001/sw/hcls/ – http://esw.w3.org/topic/HCLSIG

20 Mission of HCLS IG The mission of HCLS is to develop, advocate for, and support the use of Semantic Web technologies for – Biological science – Translational medicine – Health care These domains stand to gain tremendous benefit by adoption of Semantic Web technologies, as they depend on the interoperability of information from many domains and processes for efficient decision support

21 Translating across domains Translational medicine – use cases that cross domains Link across domains and research: – What are the links? gene – transcription factor – protein pathway – molecular interaction – chemical compound drug – drug side effect – chemical compound
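
The question "What are the links?" can be treated as path finding over a graph of typed links. The sketch below assumes the slide lists three separate chains (the line breaks are lost in this transcript, so that split is a guess), and the function names are illustrative.

```python
from collections import deque

# Assumed reading of the slide: three chains of linked concept types.
CHAINS = [
    ["gene", "transcription factor", "protein"],
    ["pathway", "molecular interaction", "chemical compound"],
    ["drug", "drug side effect", "chemical compound"],
]


def build_graph(chains):
    """Turn each adjacent pair in a chain into an undirected link."""
    graph = {}
    for chain in chains:
        for a, b in zip(chain, chain[1:]):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns a shortest path or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


GRAPH = build_graph(CHAINS)
print(find_path(GRAPH, "pathway", "drug"))
print(find_path(GRAPH, "gene", "drug"))  # None: no link joins these chains
```

Under this reading, "chemical compound" is the only shared node, so the gene chain stays disconnected until some data set supplies a link such as protein to pathway.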

22 Group Activities Document use cases to aid individuals in understanding the business and technical benefits of using Semantic Web technologies Document guidelines to accelerate the adoption of the technology Implement a selection of the use cases as proof-of-concept demonstrations Develop high-level vocabularies Disseminate information about the group’s work at government, industry, and academic events

23 Current Task Forces BioRDF – integrated neuroscience knowledge base – Kei Cheung (Yale University) Clinical Observations Interoperability – patient recruitment in trials – Vipul Kashyap (Cigna Healthcare) Linking Open Drug Data – aggregation of Web-based drug data – Chris Bizer (Free University Berlin) Pharma Ontology – high level patient-centric ontology – Christi Denney (Eli Lilly) Scientific Discourse – building communities through networking – Tim Clark (Harvard University) Terminology – Semantic Web representation of existing resources – John Madden (Duke University)

24 BioRDF Task Force Task Lead: Kei Cheung Participants: M. Scott Marshall, Eric Prud’hommeaux, Susie Stephens, Andrew Su, Steven Larson, Huajun Chen, TN Bhat, Matthias Samwald, Erick Antezana, Rob Frost, Ward Blonde, Holger Stenzhorn, Don Doherty

25 BioRDF: Answering Questions Goals: Get answers to questions posed to a body of collective knowledge in an effective way Knowledge used: Publicly available databases, and text mining Strategy: Integrate knowledge using careful modeling, exploiting Semantic Web standards and technologies

26 BioRDF: Looking for Targets for Alzheimer’s Signal transduction pathways are considered to be rich in “druggable” targets CA1 Pyramidal Neurons are known to be particularly damaged in Alzheimer’s disease Casting a wide net, can we find candidate genes known to be involved in signal transduction and active in Pyramidal Neurons? Source: Alan Ruttenberg
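
The "wide net" question is a join: find every gene that satisfies two patterns at once. The Python sketch below illustrates that shape with placeholder data; `geneA`, `geneB`, `geneC`, and the predicate names are invented for illustration and are not results or vocabulary from the BioRDF knowledge base.

```python
# Placeholder triples; none of these are real annotations.
TRIPLES = [
    ("geneA", "involved_in", "signal transduction"),
    ("geneA", "expressed_in", "CA1 pyramidal neuron"),
    ("geneB", "involved_in", "signal transduction"),
    ("geneC", "expressed_in", "CA1 pyramidal neuron"),
]


def subjects(triples, predicate, obj):
    """All subjects s for which (s, predicate, obj) is asserted."""
    return {s for s, p, o in triples if p == predicate and o == obj}


# Intersecting the two sets plays the role of a shared query variable.
candidates = (
    subjects(TRIPLES, "involved_in", "signal transduction")
    & subjects(TRIPLES, "expressed_in", "CA1 pyramidal neuron")
)
print(sorted(candidates))  # ['geneA']
```

In SPARQL the same join is written as two triple patterns that share one variable.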

27 BioRDF: Integrating Heterogeneous Data Sources shown: NeuronDB, BAMS, Literature, Homologene, SWAN, Entrez Gene, Gene Ontology, Mammalian Phenotype, PDSPki, BrainPharm, AlzGene, Antibodies, PubChem, MeSH, Reactome, Allen Brain Atlas Source: Susie Stephens

28 BioRDF: SPARQL Query Source: Alan Ruttenberg

29 BioRDF: Results: Genes, Processes
DRD1, 1812: adenylate cyclase activation
ADRB2, 154: adenylate cyclase activation
ADRB2, 154: arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
DRD1IP, 50632: dopamine receptor signaling pathway
DRD1, 1812: dopamine receptor, adenylate cyclase activating pathway
DRD2, 1813: dopamine receptor, adenylate cyclase inhibiting pathway
GRM7, 2917: G-protein coupled receptor protein signaling pathway
GNG3, 2785: G-protein coupled receptor protein signaling pathway
GNG12, 55970: G-protein coupled receptor protein signaling pathway
DRD2, 1813: G-protein coupled receptor protein signaling pathway
ADRB2, 154: G-protein coupled receptor protein signaling pathway
CALM3, 808: G-protein coupled receptor protein signaling pathway
HTR2A, 3356: G-protein coupled receptor protein signaling pathway
DRD1, 1812: G-protein signaling, coupled to cyclic nucleotide second messenger
SSTR5, 6755: G-protein signaling, coupled to cyclic nucleotide second messenger
MTNR1A, 4543: G-protein signaling, coupled to cyclic nucleotide second messenger
CNR2, 1269: G-protein signaling, coupled to cyclic nucleotide second messenger
HTR6, 3362: G-protein signaling, coupled to cyclic nucleotide second messenger
GRIK2, 2898: glutamate signaling pathway
GRIN1, 2902: glutamate signaling pathway
GRIN2A, 2903: glutamate signaling pathway
GRIN2B, 2904: glutamate signaling pathway
ADAM10, 102: integrin-mediated signaling pathway
GRM7, 2917: negative regulation of adenylate cyclase activity
LRP1, 4035: negative regulation of Wnt receptor signaling pathway
ADAM10, 102: Notch receptor processing
ASCL1, 429: Notch signaling pathway
HTR2A, 3356: serotonin receptor signaling pathway
ADRB2, 154: transmembrane receptor protein tyrosine kinase activation (dimerization)
PTPRG, 5793: transmembrane receptor protein tyrosine kinase signaling pathway
EPHA4, 2043: transmembrane receptor protein tyrosine kinase signaling pathway
NRTN, 4902: transmembrane receptor protein tyrosine kinase signaling pathway
CTNND1, 1500: Wnt receptor signaling pathway
Many of the genes are related to AD through gamma secretase (presenilin) activity Source: Alan Ruttenberg

30 Linking Open Drug Data HCLSIG task started October 1st, 2008 Primary Objectives Survey publicly available data sets about drugs Explore interesting questions from pharma, physicians and patients that could be answered with Linked Data Publish and interlink these data sets on the Web Participants: Bosse Andersson, Chris Bizer, Kei Cheung, Don Doherty, Oktie Hassanzadeh, Anja Jentzsch, Scott Marshall, Eric Prud’hommeaux, Matthias Samwald, Susie Stephens, Jun Zhao

31 The Classic Web Single information space Built on URIs – globally unique IDs – retrieval mechanism Built on Hyperlinks – are the glue that holds everything together [Diagram: HTML documents A, B, and C joined by hyperlinks, accessed by Web Browsers and Search Engines] Source: Chris Bizer

32 Linked Data Use Semantic Web technologies to publish structured data on the Web and set links between data from one data source and data from other data sources [Diagram: Things in data sources A, B, C, D, and E joined by typed links, accessed by Search Engines, Linked Data Mashups, and Linked Data Browsers] Source: Chris Bizer

33 Data Objects Identified with HTTP URIs
pd:cygri rdf:type foaf:Person
pd:cygri foaf:name "Richard Cyganiak"
pd:cygri foaf:based_near dbpedia:Berlin
pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri
dbpedia:Berlin = http://dbpedia.org/resource/Berlin
Forms an RDF link between two data sources Source: Chris Bizer
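
The prefixed names on this slide are shorthand for full HTTP URIs. The Python sketch below expands them and prints the statements in an N-Triples-like form. The `foaf` and `rdf` namespace URIs are the standard ones; the `pd` and `dbpedia` prefixes are inferred from the two expansions given on the slide; the helper names `expand` and `term` are illustrative, and the literal test is a deliberately crude heuristic.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    # Inferred from the expansions shown on the slide:
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
}

TRIPLES = [
    ("pd:cygri", "rdf:type", "foaf:Person"),
    ("pd:cygri", "foaf:name", "Richard Cyganiak"),
    ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
]


def expand(name):
    """Expand prefix:local into a full URI using PREFIXES."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


def term(value):
    """Render a known prefixed name as <URI>, anything else as a literal."""
    if ":" in value and value.split(":", 1)[0] in PREFIXES:
        return "<" + expand(value) + ">"
    return '"' + value + '"'


lines = [" ".join(term(part) for part in triple) + " ." for triple in TRIPLES]
for line in lines:
    print(line)
```

The third line printed is the RDF link itself: its subject lives in one data source and its object in another.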

34 Dereferencing URIs over the Web
pd:cygri rdf:type foaf:Person
pd:cygri foaf:name "Richard Cyganiak"
pd:cygri foaf:based_near dbpedia:Berlin
dbpedia:Berlin dp:population 3.405.259
dbpedia:Berlin skos:subject dp:Cities_in_Germany
Source: Chris Bizer

35 Dereferencing URIs over the Web
pd:cygri rdf:type foaf:Person
pd:cygri foaf:name "Richard Cyganiak"
pd:cygri foaf:based_near dbpedia:Berlin
dbpedia:Berlin dp:population 3.405.259
dbpedia:Berlin skos:subject dp:Cities_in_Germany
dbpedia:Hamburg skos:subject dp:Cities_in_Germany
dbpedia:Muenchen skos:subject dp:Cities_in_Germany
Source: Chris Bizer
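
The two slides above show a graph growing as each newly discovered URI is looked up in turn. The Python sketch below simulates that with an in-memory dictionary standing in for the Web, so nothing is fetched; `WEB`, `dereference`, and `crawl` are hypothetical names, and the population value is written without the slide's thousands separators.

```python
# Stand-in for the Web: what each name "returns" when looked up.
WEB = {
    "pd:cygri": [
        ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
    ],
    "dbpedia:Berlin": [
        ("dbpedia:Berlin", "dp:population", "3405259"),
        ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
    ],
    "dp:Cities_in_Germany": [
        ("dbpedia:Hamburg", "skos:subject", "dp:Cities_in_Germany"),
    ],
}


def dereference(name):
    """Simulated lookup: return the triples published at this name."""
    return WEB.get(name, [])


def crawl(start, max_hops):
    """Collect triples by following subjects and objects for max_hops rounds."""
    known, visited, frontier = set(), set(), [start]
    for _ in range(max_hops):
        next_frontier = []
        for name in frontier:
            if name in visited:
                continue
            visited.add(name)
            for s, p, o in dereference(name):
                known.add((s, p, o))
                next_frontier.extend([s, o])
        frontier = next_frontier
    return known


for hops in (1, 2, 3):
    print(hops, len(crawl("pd:cygri", hops)))
```

Each extra hop corresponds to one more round of lookups, which is how a client moves from one data source into the next.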

36 LODD Data Sets Source: Anja Jentzsch

37 LODD in Marbles Source: Anja Jentzsch

38 The Linked Data Cloud Source: Chris Bizer

39 Accomplishments Technical – HCLS KB hosted at 2 institutes – Linked Open Data contributions – Demonstrator of querying across heterogeneous EHR systems – Integration of SWAN and SIOC ontologies for Scientific Discourse Outreach – Conference Presentations and Workshops: Bio-IT World, WWW, ISMB, AMIA, C-SHALS, etc. – Publications: Proceedings of LOD Workshop at WWW 2009: Enabling Tailored Therapeutics with Linked Data; Proceedings of the ICBO: Pharma Ontology: Creating a Patient-Centric Ontology for Translational Medicine; AMIA Spring Symposium: Clinical Observations Interoperability: A Semantic Web Approach; BMC Bioinformatics: A Journey to Semantic Web Query Federation in Life Sciences; Briefings in Bioinformatics: Life sciences on the Semantic Web: The Neurocommons and Beyond

40 New Technologies SPARQL-DL Semantic Wiki (integration with KB’s) Cloud Computing (e.g. Amazon) Query rewriting: SPARQL -> SQL – Legacy integration – Improve interfaces FeDeRate: Federated query

41 We’ve come a long way Triplestores have gone from millions to billions Linked Open Data cloud http://lod.openlinksw.com/ On demand Knowledge Bases: Amazon’s EC2 Terminologies: SNOMED-CT, MeSH, UMLS,.. Neurocommons, Flyweb, Biogateway, Bio2RDF, Linked Life Data,..

42 Penetrance of ontology in biology OBO Foundry - http://www.obofoundry.org BioPortal - http://bioportal.bioontology.org National Centers for Biomedical Computing http://www.ncbcs.org/ Shared Names Concept Web Alliance Semantic Web Interest Group PRISM Forum Work packages in ELIXIR

43 Recipe for a Semantic Web Follow Linked Open Data principles Attempt to use Shared Names (same URI’s) Query rewriting to map from: – SPARQL -> (query language) – SPARQL (term1) -> SPARQL (term2) Add federated query support to SPARQL engine implementations
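
The "SPARQL (term1) -> SPARQL (term2)" step can be pictured as swapping one URI for its shared name before the query is sent. The Python sketch below does this with a regular expression, not a SPARQL parser, so it is only a rough illustration; the `example.org` URIs, `TERM_MAP`, and `rewrite_terms` are all hypothetical.

```python
import re

# Hypothetical mapping from a local term to a shared name.
TERM_MAP = {
    "http://example.org/labA/gene": "http://example.org/shared/Gene",
}


def rewrite_terms(query, term_map):
    """Replace each <URI> in the query that has an entry in term_map."""

    def swap(match):
        uri = match.group(1)
        return "<" + term_map.get(uri, uri) + ">"

    # Matches <...> with no whitespace inside, i.e. URI references.
    return re.sub(r"<([^<>\s]+)>", swap, query)


QUERY = "SELECT ?g WHERE { ?g a <http://example.org/labA/gene> . }"
print(rewrite_terms(QUERY, TERM_MAP))
```

URIs without a mapping pass through unchanged, so the rewrite is safe to apply to queries that already use shared names.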

44 The End “Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more a science than a heap of stones is a house.” – Henri Poincaré, Science and Hypothesis, 1905

