Lifecycle Support for Networked Ontologies, and related research at KMi. Mathieu d'Aquin and Marta Sabou, with Enrico Motta, Martin Dzbor, Lucia Specia, Sofia Angeletou, Laurian Gridinoc and Claudio Baldassarre
IST NeOn-project.org Slide 2: The Semantic Web
"A large-scale, heterogeneous collection of formal, machine-processable, ontology-based statements (semantic metadata) about web resources and other entities in the world, expressed in an XML-based syntax." Lee, J., Goodwin, R. (2004). The Semantic Webscape: a View of the Semantic Web. IBM Research Report.
Slide 3 (figure): Ontologies, metadata and the UoD (universe of discourse). Example metadata snippets: a blog feed ("Elementaries - The Watson Blog", published with Pebble), a resource "Zen wisteria" associated with Mathieu d'Aquin, and an rdfs:comment reading "The Knowledge Media Institute of the Open University, Milton Keynes UK". Ontologies and vocabularies shown: DOAP, FOAF, DC, RSS, TAP, WordNet, NCI, Galen, Music.
Slide 4: SW = a conceptual layer over the Web
Slide 5: SW is heterogeneous!
Slide 6: The NeOn Project
NeOn does not depend entirely on the SW: it is really about developing large-scale semantic applications. However, the SW, as a large-scale, heterogeneous semantic layer over the Web, provides a natural focus for characterizing the NeOn project. In other words, the issues that characterize NeOn…
– heterogeneity,
– large-scale semantics,
– metadata and ontology dynamics,
– distributed development, etc.
…fit the emerging Semantic Web scenario perfectly.
Slide 7: Economic vision underpinning NeOn
The vision is a knowledge-based economy supported by the availability of large-scale semantic information.
– The key is the ability to build open, ontology-based applications that can scale up to large quantities of data and evolve as heterogeneous data are dynamically generated on the (Semantic) Web.
Ontologies become central:
– the Semantic Web is built around ontologies;
– ontologies are key enablers for handling interoperability.
Slide 8: Current technological limitations
There is no adequate infrastructure for the whole development lifecycle of the envisaged applications. Specifically, current infrastructures are not effective:
– they do not scale up;
– they offer poor support for rapid development of large applications by reuse. Reuse is typically so expensive that people prefer to rebuild from scratch; the problem concerns both the lack of methodologies and the lack of tools and techniques;
– they offer poor support for managing the evolution of an application;
– they offer poor support for collaborative development;
– current user interfaces are limited, e.g., in their support for navigating several large ontologies at the same time.
Software crisis all over again?
Slide 9: Ambition
Overall goals:
– a major integrative effort aiming at a radical leap forward, by developing the infrastructure needed to make large-scale semantic application development feasible and cost-effective;
– lowering the entry barrier for organizations needing semantic solutions;
– targeting robustness, scalability, multi-ontology scenarios, multi-user development, multilingual solutions, …
Emphasis:
– on concrete engineering solutions;
– on concrete support for lifecycle activities;
– on measurable improvements.
Ambition at the technology level (4 years):
– NeOn as the standard reference infrastructure for large-scale Semantic Web application development.
Slide 10: Key planned outputs
System-level contributions (methodology, architecture, toolkit):
– an open, service-centred reference architecture for managing the complete lifecycle of networked ontologies and metadata;
– the NeOn toolkit for system development with networked ontologies (NOs);
– the NeOn methodology for system development with NOs.
Contributions to foundational research:
– methods and tools for managing dynamic, evolving, possibly inconsistent and contextually grounded networked ontologies;
– methods and tools for supporting large-scale collaborative development.
Also:
– sector level: three innovative testbeds in two sectors;
– community level: creation of an active community of users and developers.
Slide 11: Testbeds
– Managing fishery knowledge to support automatic alert mechanisms (United Nations Food and Agriculture Organization)
– E-invoice management in the pharmaceutical sector (AECE/PharmaInnova)
– Integration and management of information about pharmaceutical products (Atos Origin)
Slide 12: Partners
– KMi, the Open University
– University of Sheffield
– Universität Koblenz-Landau
– Software AG
– Universität Karlsruhe
– Ontoprise GmbH
– Institute 'Jozef Stefan'
– Institut National de Recherche en Informatique et en Automatique
– Asociación Española de Comercio Electrónico - PharmaInnova Cluster
– Universidad Politécnica de Madrid
– Atos Origin SAE
– Intelligent Software Components SA
– Consiglio Nazionale delle Ricerche
– Food and Agriculture Organization of the United Nations
Slide 13: NeOn at KMi: supporting and developing next generation Semantic Web applications
Slide 14: Example: Magpie
Slide 15: Example: PowerAqua
Slide 16: Next Generation Semantic Web Applications
Slide 17: Next Generation Semantic Web Applications
An NG SW application is able to exploit the SW at large:
– dynamically retrieving the relevant semantic resources;
– combining several heterogeneous ontologies;
– …
We need tools to access the knowledge available on the SW efficiently: a gateway…
Slide 18: Swoogle… an existing Semantic Web gateway, but…
Slide 19: Limitations of Swoogle
– No quality control mechanisms: many ontologies are duplicated, and no quality information is provided.
– Limited query/search mechanisms: only keyword search is available, whereas we need more powerful query methods (e.g., the ability to pose formal queries).
– A limited range of ontology ranking mechanisms: Swoogle only uses a 'popularity-based' one.
– No support for relations between ontologies: duplication, incompatibility (contradiction), modularization, versioning, etc.
Slide 21: Watson: (truly) a gateway to the SW
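Slide 22: Watson architecture
Diagram of three phases:
– Collecting: crawling the WWW; discovered URLs are recorded and retrieved documents populate the repository;
– Analyzing: parsing (Jena), validation/analysis and indexing, producing the extracted metadata and the indexes;
– Querying: keyword search, SPARQL queries and ontology exploration, which query the indexes and metadata and request documents from the repository.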
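The three phases (collecting, analyzing, querying) can be illustrated with a minimal, self-contained sketch. It is not Watson's implementation: the toy in-memory "Web", the simplified one-triple-per-line document format and all function names are assumptions made for illustration. Watson itself parses documents with Jena and also offers SPARQL queries and ontology exploration, which are omitted here.

```python
from collections import defaultdict

# Hypothetical toy "Web": URL -> document in a simplified
# one-triple-per-line format ("subject predicate object ."). A real
# crawler would fetch documents over HTTP and parse RDF/OWL with Jena.
TOY_WEB = {
    "http://example.org/a.rdf": "ex:Researcher rdfs:subClassOf ex:AcademicStaff .",
    "http://example.org/b.rdf": "ex:Ham rdfs:subClassOf ex:Meat .\nex:Meat rdf:type owl:Class .",
    "http://example.org/broken.rdf": "this is not a triple",
}

def parse(document):
    """Parsing/validation: return a list of (s, p, o) triples, or None if invalid."""
    triples = []
    for line in document.strip().splitlines():
        parts = line.split()
        if len(parts) != 4 or parts[3] != ".":
            return None  # reject the whole document
        triples.append(tuple(parts[:3]))
    return triples

def local_name(term):
    """Drop a prefix such as 'ex:' to obtain an indexable keyword."""
    return term.split(":", 1)[-1].lower()

def build_index(web):
    """Collect and analyze: parse, validate, extract entities, index them."""
    index = defaultdict(set)  # keyword -> URLs of documents mentioning it
    metadata = {}             # URL -> number of triples (toy metadata)
    for url, document in web.items():
        triples = parse(document)
        if triples is None:
            continue
        metadata[url] = len(triples)
        for s, _, o in triples:
            for entity in (s, o):
                index[local_name(entity)].add(url)
    return index, metadata

def keyword_search(index, keyword):
    """Querying: sorted URLs of documents containing the keyword."""
    return sorted(index.get(keyword.lower(), set()))

index, metadata = build_index(TOY_WEB)
print(keyword_search(index, "Meat"))  # ['http://example.org/b.rdf']
```

Keeping validation before indexing means an invalid document never reaches the query side, which mirrors the left-to-right flow of the diagram.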
Slide 23: The current content of Watson
The current demo version of Watson has collected more than 7500 (syntactically unique) semantic documents.
– It could collect more, but is limited by our current test server.
– 2983 RDF or RDF(S), 1997 OWL, 1391 DAML, 302 RSS, 83 FOAF, 133 mixed (e.g., OWL+DAML (5) or OWL+FOAF+RSS (1)).
– Many ontologies are in OWL Full (3x the number in OWL Lite)…
– …but most ontologies use only a very restricted subset of the expressivity of OWL and DAML: e.g., only 147 go beyond ALC, and role transitivity is used in only 11 ontologies.
– 1304 (semantic) duplicates detected (to be refined).
– About 300,000 entities extracted.
– typeOf and subClassOf are the most popular relations.
– Language information is rarely used, but English is clearly the most used language, followed, in this order, by de, fr, fi, pt, es, tr, nl.
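Detecting duplicates among collected documents can be sketched by fingerprinting each document's set of triples. This is a simplified stand-in, not Watson's actual duplicate detection: it catches documents that state the same triples in a different order or with repeated statements, but it does not handle blank-node renaming or logically equivalent axioms. The document URLs and triples are hypothetical.

```python
import hashlib

def canonical_fingerprint(triples):
    """Hash the sorted, de-duplicated triples of a document.

    Two documents stating the same triples in a different order
    (or with repeated statements) get the same fingerprint.
    """
    canonical = "\n".join(" ".join(t) for t in sorted(set(triples)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def find_duplicates(documents):
    """Group URLs by fingerprint and return the groups with more than one member."""
    groups = {}
    for url, triples in documents.items():
        groups.setdefault(canonical_fingerprint(triples), []).append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

# Hypothetical documents: the first two contain the same triples.
docs = {
    "http://example.org/v1.owl": [("ex:A", "rdfs:subClassOf", "ex:B"), ("ex:B", "rdf:type", "owl:Class")],
    "http://example.org/mirror.owl": [("ex:B", "rdf:type", "owl:Class"), ("ex:A", "rdfs:subClassOf", "ex:B")],
    "http://example.org/other.owl": [("ex:C", "rdf:type", "owl:Class")],
}
print(find_duplicates(docs))  # [['http://example.org/mirror.owl', 'http://example.org/v1.owl']]
```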
Slide 24: Example: selection of complementary ontologies
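Selecting complementary ontologies means that no single ontology covers all the query terms, so several are combined. One simple way to illustrate this is a greedy set cover: repeatedly pick the ontology that covers the most still-uncovered terms. Greedy set cover is a swapped-in technique chosen for illustration, not necessarily how Watson selects ontologies, and the ontology names and term sets are hypothetical.

```python
def select_complementary(ontologies, terms):
    """Greedily pick ontologies until the query terms are covered.

    ontologies: dict mapping an ontology name to the set of terms it covers.
    Returns (selected ontology names, terms that no ontology covers).
    """
    remaining = set(terms)
    selected = []
    while remaining:
        # The ontology covering the most uncovered terms; iterating over
        # sorted names makes ties resolve to the alphabetically first one.
        best = max(sorted(ontologies), key=lambda name: len(ontologies[name] & remaining))
        gain = ontologies[best] & remaining
        if not gain:
            break  # the remaining terms are not covered by any ontology
        selected.append(best)
        remaining -= gain
    return selected, remaining

# Hypothetical ontologies, each covering only part of the query.
ONTOLOGIES = {"O1": {"t1", "t2", "tn"}, "O2": {"t3", "t4", "t5"}, "O3": {"t1", "t2", "t3"}}
QUERY_TERMS = ["t1", "t2", "t3", "t4", "t5", "tn"]
print(select_complementary(ONTOLOGIES, QUERY_TERMS))  # (['O1', 'O2'], set())
```

Greedy selection is not guaranteed to find the smallest combination, but it is cheap and tends to avoid picking redundant ontologies such as O3 here.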
Slide 25: Formal queries and relation discovery…
Slide 26: Going further: knowledge selection
Diagram contrasting two situations for a set of query terms t1 … tn:
– the ideal world (Web): ontology selection returns one ontology covering all of t1 … tn;
– the real world (Web): the terms are spread across several ontologies (e.g., {t1, t2, tn}, {t3, t4, t5} and {t1, t2, t3}), so knowledge selection applies ontology modularization to each selected ontology and then ontology merging to combine the resulting modules into one covering t1 … tn.
Slide 27: Modularization: Example 2 …
Slide 28: Modularization: Example 2, resulting module (containing Cancer, Lung and AdenoCarcinoma)
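A module like the one above keeps only the concepts relevant to a query. A minimal sketch of one possible strategy, assuming a simple child-to-parents taxonomy: keep the relevant concepts and preserve a subclass link between two of them whenever one is a (possibly indirect) superclass of the other, dropping intermediate concepts. This is an illustration only, not the modularization technique described in the Read more section, and the medical taxonomy fragment is hypothetical.

```python
def ancestors(taxonomy, concept):
    """All transitive superclasses of a concept in a child -> parents map."""
    seen, stack = set(), list(taxonomy.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(taxonomy.get(parent, ()))
    return seen

def extract_module(taxonomy, relevant):
    """Keep only the relevant concepts and the subsumptions between them.

    Intermediate concepts are dropped, but A -> B is kept whenever B is
    a (possibly indirect) superclass of A in the full taxonomy.
    """
    relevant = set(relevant)
    return {c: sorted(ancestors(taxonomy, c) & relevant) for c in sorted(relevant)}

# Hypothetical fragment of a medical ontology (child -> direct parents).
TAXONOMY = {
    "AdenoCarcinoma": ["Carcinoma"],
    "Carcinoma": ["Cancer"],
    "Cancer": ["Disease"],
    "Lung": ["Organ"],
    "Heart": ["Organ"],
    "Organ": ["AnatomicalStructure"],
}

module = extract_module(TAXONOMY, ["Cancer", "Lung", "AdenoCarcinoma"])
print(module)  # {'AdenoCarcinoma': ['Cancer'], 'Cancer': [], 'Lung': []}
```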
Slide 29: Implementation
Slide 30: Implementation: integration with ontology selection
Slide 31: Ontology Matching
Existing approaches rely on:
– label similarity methods, e.g., Full_Professor = FullProfessor;
– structure similarity methods, using taxonomic and property-related information.
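The label similarity example (Full_Professor = FullProfessor) can be sketched by normalizing labels before comparing them: split CamelCase, treat underscores and hyphens as spaces, and lowercase. The comparison uses Python's standard `difflib.SequenceMatcher`; the function names are hypothetical and real matchers typically combine several such measures.

```python
import re
from difflib import SequenceMatcher

def normalize_label(label):
    """Split CamelCase and separators, then lowercase.

    'FullProfessor' and 'Full_Professor' both become 'full professor'.
    """
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)  # FullProfessor -> Full Professor
    tokens = re.split(r"[\s_\-]+", spaced)
    return " ".join(t.lower() for t in tokens if t)

def label_similarity(a, b):
    """Similarity in [0, 1] between the normalized labels (1.0 = identical)."""
    return SequenceMatcher(None, normalize_label(a), normalize_label(b)).ratio()

print(label_similarity("Full_Professor", "FullProfessor"))  # 1.0
```

Label-based methods alone cannot relate concepts with unrelated names, such as Researcher and AcademicStaff, which motivates the use of background knowledge.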
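Slide 32: New paradigm: use of background knowledge
Diagram: instead of comparing two concepts A and B directly, both are anchored in an external source of background knowledge, from which a relation R between A and B is derived.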
Slide 33: Where does the background knowledge come from?
– Aleksovski et al. (EKAW 2006): a richly axiomatized domain ontology. This assumes that a suitable domain ontology is available.
– van Hage et al. (ISWC 2005): Google and an online dictionary in the food domain, with IR methods used to derive the relation between A and B. The use of IR techniques on a Web corpus introduces noise.
Slide 34: Our approach: using the SW as background knowledge
– Rely on online ontologies (the Semantic Web) to derive mappings: ontologies are dynamically discovered and combined to derive the relation between A and B.
– Exploit the Semantic Web: a next generation Semantic Web application.
– Does not rely on any pre-selected knowledge sources.
Slide 35: Examples
– Both concepts are found in one ontology: Researcher and AcademicStaff (from ISWC and SWRC) are related through ka2.rdf, found on the Semantic Web.
– Concepts are related across several ontologies: Ham and SeaFood (from Agrovoc and NALT) are related through the intermediate concept Meat, by combining pizza-to-go (Ham, Meat) and wine.owl (Meat, SeaFood).
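Both cases can be illustrated with a small sketch: first look for a relation between A and B inside a single online ontology; otherwise find a pivot concept that is a superclass of A in one ontology and related to B in another, and compose the two relations (subClassOf followed by subClassOf gives subClassOf; subClassOf followed by disjointWith gives disjointWith). The axioms below are hypothetical stand-ins that mirror the slide's examples, not the actual content of ka2.rdf, pizza-to-go or wine.owl, and a fixed dictionary replaces the dynamic discovery of ontologies on the Semantic Web.

```python
# Hypothetical online ontologies: sets of (subject, relation, object) axioms.
# Only "subClassOf" and "disjointWith" are modelled.
ONLINE_ONTOLOGIES = {
    "ka2.rdf": {("Researcher", "subClassOf", "AcademicStaff")},
    "pizza-to-go": {("Ham", "subClassOf", "Meat")},
    "wine.owl": {("Meat", "disjointWith", "SeaFood")},
}

def superclasses(axioms, concept):
    """Transitive superclasses of a concept within a single ontology."""
    found, frontier = set(), {concept}
    while frontier:
        nxt = {o for (s, r, o) in axioms if r == "subClassOf" and s in frontier}
        frontier = nxt - found
        found |= nxt
    return found

def relation_in(axioms, a, b):
    """Relation between a and b derivable from one ontology, or None."""
    ups = superclasses(axioms, a)
    if b in ups:
        return "subClassOf"
    # a (or one of its superclasses) disjoint with b  =>  a disjoint with b
    if any((x, "disjointWith", b) in axioms or (b, "disjointWith", x) in axioms
           for x in ups | {a}):
        return "disjointWith"
    return None

def derive_relation(ontologies, a, b):
    """Try each single ontology, then compose two ontologies via a pivot concept."""
    for name, axioms in sorted(ontologies.items()):
        rel = relation_in(axioms, a, b)
        if rel:
            return rel, [name]
    for name1, ax1 in sorted(ontologies.items()):
        for pivot in sorted(superclasses(ax1, a)):  # a subClassOf pivot in name1
            for name2, ax2 in sorted(ontologies.items()):
                if name2 != name1:
                    rel = relation_in(ax2, pivot, b)  # pivot rel b in name2
                    if rel:
                        return rel, [name1, name2]  # a rel b by composition
    return None, []

print(derive_relation(ONLINE_ONTOLOGIES, "Ham", "SeaFood"))  # ('disjointWith', ['pizza-to-go', 'wine.owl'])
```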
Slide 36: Large-scale evaluation
– Matching AGROVOC (16k terms) and NALT (41k terms).
– 1600 mappings (derived from 180 different ontologies), evaluated by two teams.
– Average precision: 70% (comparable to, or better than, standard techniques).
Slide 37: Back to the Web: folksonomies
Tags are popular, easy-to-use annotations, but they are not structured… and they have no computable semantics…
Slide 38: Finding tagged images (example images tagged with Flower, Rose, Lilac, Tulip, Flowers and CutFlower)
Slide 39: Finding tagged images: searching for FLOWER over the same images
Slide 40: What if… folksonomies were semantically richer? (Diagram relating the tags Rose, Tulip and Lilac to Flower.)
Slide 41: Finding tagged images: FLOWER (II), the same search when Rose, Tulip and Lilac are related to Flower
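The difference between the two searches can be sketched as query expansion: a plain search matches only images tagged exactly "Flower", while a semantically enriched search also matches images whose tags have Flower as a broader tag. The image-to-tag assignment and the broader-tag relations below are hypothetical; mapping the plural "Flowers" to "Flower" is an assumption standing in for simple variant normalization.

```python
# Hypothetical tagged images (image id -> set of tags).
IMAGES = {
    "img1": {"Flower"},
    "img2": {"Rose"},
    "img3": {"Lilac"},
    "img4": {"Flower", "Tulip"},
    "img5": {"Flowers"},
    "img6": {"CutFlower", "Tulip"},
}

# Hypothetical semantic enrichment: tag -> broader tag.
BROADER = {
    "Rose": "Flower", "Tulip": "Flower", "Lilac": "Flower", "CutFlower": "Flower",
    "Flowers": "Flower",  # plural variant mapped to its singular form
}

def plain_search(images, tag):
    """Images carrying exactly the given tag."""
    return sorted(i for i, tags in images.items() if tag in tags)

def narrower_tags(broader, tag):
    """The tag plus every tag whose chain of broader tags leads to it."""
    result, changed = {tag}, True
    while changed:
        changed = False
        for child, parent in broader.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result

def semantic_search(images, broader, tag):
    """Images carrying the tag or any narrower tag."""
    wanted = narrower_tags(broader, tag)
    return sorted(i for i, tags in images.items() if tags & wanted)

print(plain_search(IMAGES, "Flower"))              # ['img1', 'img4']
print(semantic_search(IMAGES, BROADER, "Flower"))  # ['img1', 'img2', 'img3', 'img4', 'img5', 'img6']
```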
Slide 42: Learning relations between tags
– Tags are grouped into clusters using NLP/clustering techniques, e.g., {camera, digital slr, photograph} and {damage, flooding, hurricane, katrina, Louisiana}.
– For each cluster, online ontologies are found and combined (with modularization and matching) to derive relations between the tags, e.g., linking Digital SLR, camera and photograph through relations such as takenWith.
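The clustering step can be illustrated with a simple co-occurrence approach: count how often two tags appear on the same resource, keep pairs seen at least a minimum number of times, and take the connected components of the resulting graph as clusters. This is a stand-in chosen for illustration, not the NLP/clustering method used in the work above, and the tagged photos are hypothetical (tags lowercased).

```python
from collections import Counter
from itertools import combinations

def cooccurrence_clusters(tag_sets, min_count=2):
    """Cluster tags that co-occur on at least min_count resources.

    Builds a co-occurrence graph, keeps edges seen >= min_count times,
    and returns its connected components (isolated tags are dropped).
    """
    counts = Counter()
    for tags in tag_sets:
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    neighbours = {}
    for (a, b), n in counts.items():
        if n >= min_count:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one connected component
            tag = stack.pop()
            if tag not in component:
                component.add(tag)
                stack.extend(neighbours[tag] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

# Hypothetical tagged photos.
TAGGED_PHOTOS = [
    {"camera", "digital slr", "photograph"},
    {"camera", "digital slr"},
    {"camera", "photograph"},
    {"hurricane", "katrina", "flooding"},
    {"hurricane", "katrina", "damage", "louisiana"},
    {"flooding", "damage", "louisiana", "katrina"},
]
print(cooccurrence_clusters(TAGGED_PHOTOS))
# [['camera', 'digital slr', 'photograph'], ['damage', 'flooding', 'hurricane', 'katrina', 'louisiana']]
```

Clustering only groups related tags; the relations between them (such as takenWith) still have to come from the online ontologies.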
Slide 43: Examples
Slide 44: Examples
Slide 45: Examples
Slide 46: Read more…
NeOn (NeOn-project.org)
Next Generation Semantic Web Applications
– E. Motta and M. Sabou. Next Generation Semantic Web Applications. AWC.
– E. Motta and M. Sabou. Language Technologies and the Evolution of the Semantic Web. LREC.
– E. Motta. Knowledge Publishing and Access on the Semantic Web: A Socio-Technological Analysis. IEEE Intelligent Systems, Vol. 21, No. 3, pp. 88-90.
Watson
– M. d'Aquin, M. Sabou, M. Dzbor, C. Baldassarre, L. Gridinoc, S. Angeletou, and E. Motta. WATSON: A Gateway for the Semantic Web. Accepted for the poster session of ESWC.
Ontology Modularization
– M. d'Aquin, M. Sabou, and E. Motta. Modularization: a Key for the Dynamic Selection of Relevant Knowledge Components. ISWC 2006 Workshop on Modular Ontologies (WoMO 2006).
Ontology Matching
– M. Sabou, M. d'Aquin, and E. Motta. Using the Semantic Web as Background Knowledge in Ontology Mapping. ISWC 2006 Workshop on Ontology Mapping (OM 2006).
Linking folksonomies to ontologies
– L. Specia and E. Motta. Integrating Folksonomies with the Semantic Web. Accepted for ESWC 2007.
Slide 47: Thank you!