Presentation is loading. Please wait.

Presentation is loading. Please wait.

And now, for something completely different…. Web 2.0 + Web 3.0 = Web 5.0? The HSFBCY + CIHR + Microsoft Research SADI and CardioSHARE Projects Mark Wilkinson.

Similar presentations


Presentation on theme: "And now, for something completely different…. Web 2.0 + Web 3.0 = Web 5.0? The HSFBCY + CIHR + Microsoft Research SADI and CardioSHARE Projects Mark Wilkinson."— Presentation transcript:

1 And now, for something completely different…

2 Web 2.0 + Web 3.0 = Web 5.0? The HSFBCY + CIHR + Microsoft Research SADI and CardioSHARE Projects Mark Wilkinson & Bruce McManus Heart + Lung Institute iCAPTURE Centre, St. Paul’s Hospital, UBC

3 Who am i?

4 Mark Wilkinson

5

6 Carole Goble

7 “Shopping for …data should be as easy as shopping for shoes!!” Carole Goble

8 Carole’s Shoes

9 My shoes

10 It is hard to fill Carole’s shoes!

11 Who am i?

12 Geneticist & Molecular Biologist

13 Informatician

14 Software

15 Middleware

16 Robert Stevens

17

18 Shouldn’t be seen!

19 Then why am I presenting to you?

20 enables useful and exciting behaviours

21 Cool!

22 For cardiovascular researchers

23 Please indulge me as I expose my underwear

24 …BRIEFly ;-) (sorry, I couldn’t resist)

25 The Problem

26

27 The Holy Grail: (circa 2002) Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.

28 The Problem

29

30 The Solution??

31 Not really…

32 Why not? Heart

33 Occam’s Razor “Pluralitas non est ponenda sine neccesitate.” “Plurality should not be posited without necessity."

34 “Biology is hard!”

35 So… what can we do?

36 What we need

37 Our “information systems” are DATA-centric

38 Our “information systems” are NOT KNOWLEDGE-centric

39 (Source: Clarke and Rollo, Education and Training, 2001)

40 The REAL Solution Knowledge Machine-readable Knowledge

41 How do we make knowledge machine-readable?

42 Ontology

43 Ontology (Gr: “things which exist” +-logy) An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.

44 Problem…

45 Ontology Spectrum Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (Properties) Informal is-a Formal instance Value Restrs. General Logical constraints Originally from AAAI 1999- Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; – updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html WHY? Because I say so! Because it fulfils XXX

46 My Definition of Ontology (for this talk) Ontologies explicitly define the things that exist in “the world” based on what properties each kind of thing must have

47 Ontology Spectrum Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (Properties) Informal is-a Formal instance Value Restrs. General Logical constraints

48 My goal with this talk

49 Clay Shirky “Ontology is Over-rated” http://www.shirky.com/writings/ontology_overrated.html

50 COST Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (Properties) Informal is-a Formal instance Value Restrs. General Logical constraints

51 COMPREHENSIBILITY Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (Properties) Informal is-a Formal instance Value Restrs. General Logical constraints

52 Likelihood of being “right” Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (Properties) Informal is-a Formal instance Value Restrs. General Logical constraints

53 Here’s my argument…

54 Semantic Web? An information system where machines can receive information from one source, re-interpret it, and correctly use it for a purpose that the source had not anticipated.

55 Semantic Web? If we cannot achieve those two things, then IMO we don’t have a “semantic web”, we only have a distributed (??), linked database… and that isn’t particularly exciting…

56 Where is the semantic web? Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (Properties) Informal is-a Formal instance Value Restrs. General Logical constraints REASON: “Because I say so” is not open to re-interpretation

57 The games we have been playing for ~2 years

58 Find. Integrate. Analyse. Founding partner SAD I

59 Data + Knowledge for Cardiologists. Founding partner CardioSHARE

60 Brief History of CardioSHARE

61 Automating the interaction between biologists and their preferred data and analytical resources (Running since 2002)

62 Moby defines an ontology of datatypes, and a WS registry that is aware of that ontology

63 I’m studying a mutation in my favorite organism I want to know what mutations in the equivalent gene look like in another organism

64 Oh no…

65 The Moby Semantic Web Service Approach

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81 WORKFLOW

82 Analytical Workflow construction Using Moby No explicit coordination between providers Automated discovery of appropriate tools Automated execution of selected tools The machine “understands” the data-type you have in-hand, and assists you in choosing the next step in your analysis.

83 This got me thinking…

84 Datatypes… Biology?!?!?

85 CardioSHARE The Cardiovascular Semantic Health And Research Environment

86 Theatre Analogy

87 The Components of a Play The Script The Stage The Casting Director & directors The Actors

88 The Components of CardioSHARE Cardiovascular Ontologies (The Scripts) SHARE (The Stage & Set Decorator) SADI (The Casting Director) Data (The Actors)

89 DATA the “actors” in CardioSHARE

90 ACTORS: No intelligence No Use or Function (…without a script) more successful when clean!!

91 No Intelligence? No Use? Data exhibits “Late binding”

92

93 Late binding: “purpose and meaning” of the data is not determined until the moment it is required

94 benefit/Consequence of Late binding: data is amenable to constant re-interpretation

95 How do we achieve this? Because we DO NOT CLASSIFY our data… we simply hang properties on it

96 Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (Properties) Informal is-a Formal instance Value Restrs. General Logical constraints These are classification systems These are axioms that enable interpretation

97 The Components of CardioSHARE Cardiovascular Ontologies (The Scripts) SHARE (The Stage & Set Decorator) SADI (The Casting Director) Data (The Actors)

98 SADI “Semantic Automated Discovery and Integration”

99 SADI = …but smarter…

100 SADI Instead of specifying the analysis or database you specify the Biological/clinical RELATIONSHIP of interest

101 “Take this gene sequence, run the BLAST algorithm to search for genes with similar sequence, and give me the result”

102 SADI “get me the homologues for this gene” SADI knows that the BLAST algorithm is used to search for homologous gene sequences SADI finds a BLAST server on the Web and executes your search for you

103 SADI By analogy: SADI, the casting director, knows which actors are best able to fit the part, automatically finds them, and casts them in the play

104 SADI Web Services WS Discovery is aided by reasoning WS interfaces are defined in OWL as two classes: Input and Output Input and Output classes include the property restriction axioms that define the input to/output from the service “mini ontology”

105

106 SADI Web Services The WS consumes input data (native RDF) and adds a new property (predicate+object) to it.

107

108

109 The Components of CardioSHARE Cardiovascular Ontologies (The Scripts) SHARE (The Stage & Set Decorator) SADI (The Casting Director) Data (The Actors)

110 SHARE SHARE is a completely generic “stage” upon which a “play” is going to be performed It’s the place where the actors are assembled to do their thing for an audience

111 SHARE Simply a place where you can ask ANY question (i.e. the Play) and see the result

112 SHARE At the moment it isn’t particularly impressive… By analogy, we have a stage, but we haven’t hired any set decorators yet! You’ll see what I mean in a minute…

113 DEMO

114 Recap what we just saw A SPARQL database query was entered into the SHARE environment The query was passed to SADI and was interpreted based on the properties being asked-about SADI searched-for, found, and accessed the databases and/or analytical tools required to generate those properties “The play was performed”

115 Recap what we just saw We asked, and answered a complex “database query” WITHOUT A DATABASE!!

116 The Holy Grail: Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.

117 The Components of CardioSHARE Cardiovascular Ontologies (The Scripts) SHARE (The Stage & Set Decorator) SADI (The Casting Director) Data (The Actors)

118 Cardiovascular Ontologies: Allow us to refer to complex clinical/molecular phenomenon inside of our query (i.e. Biological Knowledge!) Allows experts to define these concepts, and then have their expert-knowledge re-used by non-experts in their own queries.

119 HUH?

120 WORKFLOW QUERY: Retrieve images of mutations from genes in organism XXX that share homology to this gene in organism YYY Concept: “Homologous Mutant Image”

121 Phrased in terms of properties: Find image P where { Gene Q hasImage image P Gene Q hasSequence Sequence Q Gene R hasSequence Sequence R Sequence Q similarTo Sequence R Gene R = “my gene of interest” }

122 …but these are simply axioms… HomologousMutantImage is { Gene Q hasImage image P Gene Q hasSequence Sequence Q Gene R hasSequence Sequence R Sequence Q similarTo Sequence R Gene R = “my gene of interest” }

123 QUERY: Retrieve homologous mutant images for gene XXX

124 CardioSHARE We construct small, independent OWL classes representing these kinds of concepts These classes simplify the construction of complex queries by “encapsulating” data discovery, retrieval, and analysis pipelines into simple, easy-to-understand words and phrases.

125 CardioSHARE These Classes are shared on the Web such that third-parties, potentially with different expertise, can utilize the expertise of the person who designed the Class. Easily share your expertise with others! Easily utilize the expertise of others!

126 CardioSHARE We are not building massive ontologies! Publish small, independent Class definitions Cheap Scalable Flexible Don’t try to describe all of biology!

127 Importantly! CardioSHARE/SADI releases Semantic Systems from the constraints of pure logical reasoning If the defining property restriction of your class maps to e.g. a statistical or other analytical service, then classification into that class can be achieved through non-logical means

128 Biology is hard! (Too hard to define purely logically)

129 DEMO #2

130 The Holy Grail: Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.

131

132 CardioSHARE architecture: Increasingly complex ontological layers organize data into richer concepts, even hypotheses Blood Pressure Hypertension Ischemia Hypothesis Database 1Database 2 Analysis AlgorithmXX SADI Web “agents”

133 Recap SADI interprets queries (SPARQL + OWL Class Definitions) Determine which properties are available, and which need to be discovered/generated Discovery of services via on-the-fly “classification” of local data with small OWL Classes representing service interfaces

134 Recap CardioSHARE encapsulates individual data retrieval and analysis workflows into OWL Classes An ontology of 1 class Low-cost, high accuracy

135 Semantic Web An information system where machines can receive information from one source, re-interpret it, and correctly use it for a purpose that the source had not anticipated.

136 What we achieve Re-interpretation : The SADI data-store simply collects properties, and matches them up with OWL Classes in a SPARWL query and/or from individual service provider’s WS interface

137 What we achieve Novel re-use: Because we don’t pre-classify, there is no way for the provider to dictate how their data should be used. They simply add their properties into the “cloud” and those properties are used in whatever way is appropriate for me.

138 What we achieve Data remains distributed – no warehouse! Data is not “exposed” as a SPARQL endpoint  greater provider-control over computational resources Yet data appears to be a SPARQL endpoint… no modification of SPARQL or reasoner required.

139 Recap CardioSHARE allows researchers to: - Ask questions in natural, intuitive ways - Execute complex analyses without a bioinformatician - Access output from databases and analytical tools in exactly the same way - Share hypotheses and models in an EXPLICIT manner - Evaluate other’s hypotheses over your own data

140 Join us! SHARE and SADI are Open-Source projects undertaken with the collaboration of, and for the benefit of, the biological and bioinformatics research community. YOUR participation is welcome as we move from prototype to “the real world”!

141 Fin O | B | F


Download ppt "And now, for something completely different…. Web 2.0 + Web 3.0 = Web 5.0? The HSFBCY + CIHR + Microsoft Research SADI and CardioSHARE Projects Mark Wilkinson."

Similar presentations


Ads by Google