Exploiting large scale web semantics to build end user applications Enrico Motta Professor of Knowledge Technologies Knowledge Media Institute The Open University

Presentation transcript:

Exploiting large scale web semantics to build end user applications Enrico Motta Professor of Knowledge Technologies Knowledge Media Institute The Open University

Aims of the Talk
What is the Semantic Web?
– Perspectives:
  – The SW as a ‘web of data’
  – The SW as a new context in which to build semantic applications, and an unprecedented opportunity to address some classic AI problems
– Typical misconceptions: what the SW is not!
Semantic Web for users
– Applications that do something interesting and useful for users by exploiting available web semantics

The Semantic Web as a ‘Web of Data’
Making data available to SW-aware software

[Figure: a graph of SW data about Enrico Motta; labels include Knowledge Technologies, Semantic Web, Ontologies, Problem Solving Methods, Knowledge Modelling, Knowledge Management, London Luton Airport (LTN, Luton, United Kingdom) and AquaLog]
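To make the ‘web of data’ idea concrete, the sketch below stores a few statements as subject-predicate-object triples and answers a single triple pattern, which is roughly what one pattern in a SPARQL query does. It is a minimal sketch under stated assumptions: the `ex:` names and the predicates `ex:researchInterest`, `ex:iataCode` and `ex:locatedIn` are hypothetical, loosely based on the figure's labels. Real SW data would use full URIs and an RDF store.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples; "ex:" names are illustrative placeholders, not real URIs.
GRAPH: List[Triple] = [
    ("ex:EnricoMotta", "ex:researchInterest", "ex:SemanticWeb"),
    ("ex:EnricoMotta", "ex:researchInterest", "ex:Ontologies"),
    ("ex:EnricoMotta", "ex:researchInterest", "ex:KnowledgeModelling"),
    ("ex:LondonLutonAirport", "ex:iataCode", "LTN"),
    ("ex:LondonLutonAirport", "ex:locatedIn", "ex:Luton"),
]


def match(s: Optional[str], p: Optional[str], o: Optional[str],
          graph: List[Triple] = GRAPH) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a variable."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # Which research interests are recorded for ex:EnricoMotta?
    for _, _, obj in match("ex:EnricoMotta", "ex:researchInterest", None):
        print(obj)
```

Because the data is explicit and typed by predicate, any SW-aware program can ask the same question without knowing how the page that published it was laid out.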

The web of SW documents

Current status of the semantic web
… million semantic web documents
– Expressed in RDF, OWL, DAML+OIL
7K-10K ontologies
– These cover a variety of domains: music, multimedia, computing, management, bio-medical sciences, upper-level concepts, etc.
Hence:
– To a significant extent, the semantic web is already in place
– However, domain coverage is very uneven
– It is still primarily a research enterprise, but interest is rapidly increasing in both governmental and business organizations (the “early adopters” phase)
The above figures refer to resources that are publicly accessible on the web.

[Figure: the AKT Reference Ontology over RDF data from several sources: CS Dept Data, Bibliographic Data and Geography]

“Corporate Semantic Webs”
– A ‘corporate ontology’ is used to provide a homogeneous view over heterogeneous data sources
– They often tackle Enterprise Information Integration scenarios
– Hailed by Gartner as one of the key emerging strategic technology trends
  – E.g., Garlik, a multi-million startup recently set up in the UK to support personal information management, uses an ontology to integrate data mined from the web on a large scale
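Here is a minimal sketch of the ‘corporate ontology’ idea. It assumes two hypothetical source systems, an HR table and a CRM table, that name the same information differently. Hand-written mappings rewrite each record into the properties of a shared ontology, so consumers see one homogeneous view. The mappings are an assumption, as are the made-up `corp:` prefix and all field names; real deployments may derive mappings quite differently.

```python
from typing import Dict, List

Record = Dict[str, str]

# Two hypothetical source systems that describe people with different field names.
HR_RECORDS: List[Record] = [{"emp_name": "A. Smith", "dept": "Sales"}]
CRM_RECORDS: List[Record] = [{"contact": "B. Jones", "unit": "Marketing", "phone": "555-0100"}]

# Mapping knowledge (hypothetical): source field -> property of the shared ontology.
# Fields without a mapping (e.g. "phone") are dropped from the integrated view.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "hr": {"emp_name": "corp:name", "dept": "corp:memberOf"},
    "crm": {"contact": "corp:name", "unit": "corp:memberOf"},
}


def to_ontology(source: str, records: List[Record]) -> List[Record]:
    """Rewrite each record from one source into the shared ontology's vocabulary."""
    mapping = MAPPINGS[source]
    return [{mapping[k]: v for k, v in rec.items() if k in mapping} for rec in records]


def integrated_view() -> List[Record]:
    """A single homogeneous view over both heterogeneous sources."""
    return to_ontology("hr", HR_RECORDS) + to_ontology("crm", CRM_RECORDS)
```

Adding a third source then only needs a new entry in `MAPPINGS`. Code written against the shared vocabulary stays unchanged.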

AquaLog

Applications that exploit large scale semantic content

The web of data

Gateways to the SW
[Figure: diagram with the labels ‘Application’ and ‘Semantic Web’]

– Sophisticated quality control mechanism
  – Detects duplications
  – Fixes obvious syntax problems, e.g., duplicated ontology IDs, namespaces, etc.
– Structures ontologies in a network
  – Using relations such as: extends, inconsistentWith, duplicates
– Provides interfaces for both human users and software programs
– Provides an efficient API
– Supports formal queries (SPARQL)
– Variety of ontology ranking mechanisms
– Modularization/combination support
– Plug-ins for Protégé and the NeOn Toolkit
– Very cool logo!
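The duplicate detection and the ‘duplicates’ and ‘extends’ links in the features above could be approximated as follows. This is a simplified, hypothetical heuristic and not the gateway's actual method:

- each ontology is treated as a set of triples after trimming whitespace;
- two documents with identical sets are marked as duplicates;
- a strict superset is taken to ‘extend’ the smaller document.

Detecting inconsistentWith would require a reasoner, so it is not covered.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Edge = Tuple[str, str, str]  # (document, relation, document)


def normalise(triples: Iterable[Triple]) -> FrozenSet[Triple]:
    """Trim whitespace in each term so trivially different serialisations compare equal."""
    return frozenset((s.strip(), p.strip(), o.strip()) for s, p, o in triples)


def relate(docs: Dict[str, List[Triple]]) -> List[Edge]:
    """Link documents with 'duplicates' (same triples) or 'extends' (strict superset)."""
    sets = {name: normalise(ts) for name, ts in docs.items()}
    names = sorted(sets)
    edges: List[Edge] = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if sets[a] == sets[b]:
                edges.append((a, "duplicates", b))
            elif sets[a] > sets[b]:
                edges.append((a, "extends", b))
            elif sets[b] > sets[a]:
                edges.append((b, "extends", a))
    return edges
```

The pairwise comparison is quadratic in the number of documents. At web scale, one would more likely first bucket documents by a hash of their normalised triple set and compare only within buckets.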

Case Study 1: Automatic Alignment of Thesauri in the Agricultural/Fishery Domain

Method
SCARLET: matching by harvesting the SW
– Given Concept_A (e.g., Supermarket) and Concept_B (e.g., Building), Scarlet accesses the Semantic Web and deduces a semantic relation between them
– It automatically selects and combines multiple online ontologies to derive the relation

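Two strategies: deriving relations from (A) one ontology and (B) across ontologies on the Semantic Web
(A) Supermarket to Building: derived within a single ontology through the chain Supermarket, Shop, PublicBuilding, Building
(B) Cholesterol to OrganicChemical: derived by combining an ontology that relates Cholesterol to Steroid with another that relates Steroid to Lipid and OrganicChemical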
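The two strategies can be sketched with ontologies reduced to subclass links:

- Strategy (A) follows a subsumption chain within one ontology.
- Strategy (B) looks for an anchor concept that is a superclass of Concept_A in one ontology and a subclass of Concept_B in another.

This is a simplification, not SCARLET's implementation. It assumes concepts are matched across ontologies by identical names, and it only derives subsumption. The toy ontologies reproduce the two examples on the slide.

```python
from typing import Dict, List, Optional, Set

# An ontology reduced to direct subclass links: concept -> its direct superclasses.
Ontology = Dict[str, Set[str]]


def superclasses(concept: str, onto: Ontology) -> Set[str]:
    """All superclasses of `concept` in one ontology (transitive closure)."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        for parent in onto.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def derive_within(a: str, b: str, ontos: List[Ontology]) -> bool:
    """Strategy (A): a single ontology entails a subClassOf b."""
    return any(b in superclasses(a, o) for o in ontos)


def derive_across(a: str, b: str, ontos: List[Ontology]) -> Optional[str]:
    """Strategy (B): a subClassOf c in one ontology, c subClassOf b in another.

    Returns the anchor concept c, or None. Concepts are matched by identical names.
    """
    for i, first in enumerate(ontos):
        for c in superclasses(a, first):
            for j, second in enumerate(ontos):
                if i != j and b in superclasses(c, second):
                    return c
    return None


# Toy ontologies reproducing the two examples on the slide.
SHOPS: Ontology = {"Supermarket": {"Shop"}, "Shop": {"PublicBuilding"}, "PublicBuilding": {"Building"}}
CHEM_1: Ontology = {"Cholesterol": {"Steroid"}}
CHEM_2: Ontology = {"Steroid": {"Lipid"}, "Lipid": {"OrganicChemical"}}
ONTOLOGIES = [SHOPS, CHEM_1, CHEM_2]
```

With this data, `derive_within("Supermarket", "Building", ONTOLOGIES)` holds. No single ontology relates Cholesterol to OrganicChemical, but `derive_across` finds the link through the anchor Steroid.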

Experiment
Matching:
– AGROVOC: the UN Food and Agriculture Organisation (FAO) thesaurus (descriptor terms and non-descriptor terms)
– NALT: the US National Agricultural Library Thesaurus (descriptor terms and non-descriptor terms)

226 Used Ontologies
[Figure: list of the ontologies used, including htechsight/Technologies.daml]

Evaluation 1 – Precision
– Manual assessment of 1000 mappings (15%)
– Evaluators:
  – Researchers in the area of the Semantic Web
  – 6 people, split into two groups
– Results:
  – Comparable to the best results for background-knowledge-based matchers
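Precision here is the share of assessed mappings that the evaluators judged correct. Below is a minimal helper. The judgements are made up purely for illustration, since the slide gives neither the individual judgements nor the exact figure.

```python
from typing import Sequence


def precision(judgements: Sequence[bool]) -> float:
    """Share of assessed mappings that evaluators judged correct."""
    if not judgements:
        raise ValueError("no mappings were assessed")
    return sum(judgements) / len(judgements)


if __name__ == "__main__":
    # Made-up judgements for four sampled mappings (True = judged correct).
    sample = [True, True, False, True]
    print(precision(sample))  # 0.75
```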

Evaluation 2 – Error Analysis

Case Study 2: Folksonomy Tagspace Enrichment

Features of Web 2.0 sites
– Tagging, as opposed to rigid classification
– Dynamic vocabulary
  – Does not require much annotation effort and evolves easily
– Shared vocabularies emerge over time
  – Certain tags become particularly popular

Limitations of tagging
– Different granularity of tagging
  – rome vs colosseum vs roman monument
  – flower vs tulip
  – etc.
– Multilinguality
– Spelling errors, different terminology, plural vs singular, etc.
This has a number of negative implications for the effective use of tagged resources
– e.g., search exhibits very poor recall

Giving meaning to tags

What does it mean to add semantics to tags?
1. Mapping a tag to a SW element (e.g., "japan")
2. Linking two "SW tags" using semantic relations (e.g., {japan, asia})

Applications of the approach
– To improve recall in keyword search
– To support annotation by dynamically suggesting relevant tags or visualizing the structure of relevant tags
– To enable formal queries over a space of tags
  – Hence, going beyond keyword search
– To support new forms of intelligent navigation
  – i.e., using the ‘semantic layer’ to support navigation
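One way a semantic layer can improve recall is query expansion: a search for a tag also retrieves resources tagged with semantically narrower tags. The sketch assumes a hypothetical `NARROWER` relation and a hypothetical set of tagged resources. The approach on the slide is not specified at this level of detail.

```python
from typing import Dict, List, Set

# Hypothetical semantic layer: tag -> narrower tags (e.g. japan is part of asia).
NARROWER: Dict[str, Set[str]] = {
    "asia": {"japan", "china"},
    "japan": {"tokyo"},
}

# Hypothetical tagged resources: resource id -> its tags.
RESOURCES: Dict[str, Set[str]] = {
    "photo1": {"japan"},
    "photo2": {"tokyo"},
    "photo3": {"asia"},
    "photo4": {"rome"},
}


def expand(tag: str) -> Set[str]:
    """The tag plus every tag reachable through NARROWER links."""
    result = {tag}
    stack = [tag]
    while stack:
        for narrower in NARROWER.get(stack.pop(), set()):
            if narrower not in result:
                result.add(narrower)
                stack.append(narrower)
    return result


def search(tag: str, use_semantics: bool = True) -> List[str]:
    """Keyword search over tags, optionally expanded through the semantic layer."""
    wanted = expand(tag) if use_semantics else {tag}
    return sorted(rid for rid, tags in RESOURCES.items() if tags & wanted)
```

A plain keyword search for "asia" finds only `photo3`. The expanded search also returns the resources tagged "japan" and "tokyo".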

Concept and relation identification
[Figure: processing pipeline; resources shown include the folksonomy, Google, Wikipedia and a SW search engine]
1. Pre-processing of the raw tags (clean tags, group similar tags, filter infrequent tags), producing concise tags
2. Analyze the co-occurrence of tags, producing a co-occurrence matrix
3. Clustering: cluster the tags into Cluster 1, Cluster 2, …, Cluster n
4. For each pair of “related” tags, find mappings and a relation; repeat while tags remain, then end
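The first stages of the pipeline above (pre-processing, co-occurrence analysis, clustering) can be sketched as follows. The sketch makes two assumptions:

- ‘cleaning’ is just lowercasing and trimming;
- clustering takes the connected components of the graph of tags that co-occur at least `threshold` times.

The second point is a simple stand-in, not necessarily the clustering algorithm the original work uses. The later step, mapping tag pairs to SW elements via the SW search engine, Wikipedia and Google, is not shown.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def clean(tag: str) -> str:
    """Assumed minimal cleaning: trim and lowercase."""
    return tag.strip().lower()


def preprocess(posts: List[List[str]], min_freq: int) -> List[Set[str]]:
    """Clean the tags of each post and drop tags used fewer than min_freq times."""
    cleaned = [{clean(t) for t in post} for post in posts]
    freq = Counter(t for post in cleaned for t in post)
    return [{t for t in post if freq[t] >= min_freq} for post in cleaned]


def cooccurrence(posts: List[Set[str]]) -> Counter:
    """Count how often each unordered pair of tags appears on the same post."""
    counts: Counter = Counter()
    for post in posts:
        for pair in combinations(sorted(post), 2):
            counts[pair] += 1
    return counts


def cluster(posts: List[Set[str]], threshold: int) -> List[Set[str]]:
    """Connected components of the graph of tags co-occurring >= threshold times."""
    neighbours: Dict[str, Set[str]] = {t: set() for post in posts for t in post}
    for (a, b), n in cooccurrence(posts).items():
        if n >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    clusters: List[Set[str]] = []
    seen: Set[str] = set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            tag = stack.pop()
            if tag not in component:
                component.add(tag)
                stack.extend(neighbours[tag] - component)
        seen |= component
        clusters.append(component)
    return clusters
```

Tags without any strong co-occurrence end up as singleton clusters. A real system would probably need a better similarity measure than raw counts.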

Examples
Cluster_1: {admin, application, archive, collection, component, control, developer, dom, example, form, innovation, interface, layout, planning, program, repository, resource, sourcecode}
[Figure: concepts and relations found for Cluster_1, including participant, participatesIn, event, in-event, innovation, developer, activity, creator, planning, example, application, user, admin, resource, typeRange, component, interface, archive, Information Object and has-mention-of]

Examples
Cluster_2: {college, commerce, corporate, course, education, high, instructing, learn, learning, lms, school, student}
[Figure: concepts and relations found for Cluster_2, each annotated with one or more of the numbers 1 to 4, including education, training, qualification, corporate, institution, university, college, postSecondary, School, school, student, studiesAt, course, offersCourse, takesCourse, activities, learning and teaching]

Faceted Ontology
– Ontology creation and maintenance is automated
– Ontology evolution is driven by task features and by user changes
– Large-scale integration of ontology elements from massively distributed online ontologies
– Very different from traditional top-down-designed ontologies

Case Study 3: Reviewing and Rating on the Web

Revyu.com

Trust Factors
– Expertise: the source has relevant expertise in the domain in which the recommendation is sought; this may be formally validated through qualifications or acquired over time.
– Experience: the source has experience of solving similar scenarios in this domain, but without extensive expertise.
– Impartiality: the source does not have vested interests in a particular resolution to the scenario.
– Affinity: the source has characteristics in common with the recommendation seeker, such as shared tastes, standards, values, viewpoints, interests, or expectations.
– Track record: the source has previously provided successful recommendations to the recommendation seeker.

[Figure: factors emphasised along a scale of solutions from subjective to objective: affinity toward the subjective end, expertise and experience toward the objective end]

Applying the framework to revyu.com
– Affinity
  – Operationalised as the degree of overlap in items reviewed and in ratings given
– Experience
  – Proxy metric: usage of particular tags (as proxies for topics)
  – Experience scores are based on tagging data
  – Also integrates data from del.icio.us for users who have chosen to publish their del.icio.us account in their FOAF profile
– Expertise
  – Proxy metric: credibility
  – Captures the social aspect of expertise: endorsement
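The slide describes affinity and experience only at the level of proxies. The formulas below are one possible operationalisation, assumed here rather than taken from revyu.com:

- Affinity multiplies the Jaccard overlap of the items two users reviewed by their mean rating agreement on the shared items. Ratings are assumed to be on a 1 to 5 scale.
- Experience is the share of a user's tag usages that fall on a given topic tag.

```python
from typing import Dict

Ratings = Dict[str, int]  # item -> rating, assumed to be on a 1..5 scale


def affinity(a: Ratings, b: Ratings, max_rating: int = 5) -> float:
    """Assumed formula: Jaccard overlap of reviewed items x mean rating agreement."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    overlap = len(shared) / len(a.keys() | b.keys())
    # Agreement is 1 for identical ratings and 0 for ratings at opposite ends of the scale.
    agreement = sum(1 - abs(a[i] - b[i]) / (max_rating - 1) for i in shared) / len(shared)
    return overlap * agreement


def experience(tag_counts: Dict[str, int], topic_tag: str) -> float:
    """Assumed proxy: share of a user's tag usages that are the topic tag."""
    total = sum(tag_counts.values())
    return tag_counts.get(topic_tag, 0) / total if total else 0.0
```

Multiplying the two affinity terms means two users score highly only if they both review the same things and rate them similarly.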

Using trust factors for ranking reviews
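Once per-reviewer trust scores exist, reviews can be ranked by them. The sketch assumes a weighted linear combination of factor scores in [0, 1]. The weights, names and data are hypothetical. One could, for example, weight affinity more heavily for subjective items and expertise or experience for objective ones.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Review:
    reviewer: str
    item: str


def rank_reviews(reviews: List[Review],
                 trust: Dict[str, Dict[str, float]],
                 weights: Dict[str, float]) -> List[Review]:
    """Order reviews by a weighted sum of their reviewers' trust-factor scores.

    trust maps reviewer -> {factor: score in [0, 1]}; unknown reviewers score 0.
    """
    def score(review: Review) -> float:
        factors = trust.get(review.reviewer, {})
        return sum(w * factors.get(f, 0.0) for f, w in weights.items())

    return sorted(reviews, key=score, reverse=True)
```

Because the weights are a parameter, the same trust data can produce different orderings for different kinds of items or different seekers.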

PowerAqua and PowerMagpie

How does the Semantic Web relate to Artificial Intelligence research?

AI as Heuristic Search

The knowledge-based paradigm in AI
“Today there has been a shift in paradigm. The fundamental problem of understanding intelligence is not the identification of a few powerful techniques, but rather the question of how to represent large amounts of knowledge in a fashion that permits their effective use”
Goldstein and Papert, 1977

Knowledge Representation Hypothesis in AI
“Any mechanically embodied intelligent process will be comprised of structural ingredients that we as external observers naturally take to represent a propositional account of the knowledge that the overall process exhibits, and independent of such external semantic attribution, play a formal but causal and essential role in engendering the behaviour that manifests that knowledge”
Brian Smith, 1982

Knowledge-Based Systems
[Figure: a large body of knowledge underpinning intelligent behaviour]

The Knowledge Acquisition Bottleneck
[Figure: knowledge must pass through the knowledge acquisition (KA) bottleneck to form the large body of knowledge behind intelligent behaviour]

The Cyc project

Structured libraries of reusable components
[Figure: architecture of reusable components, with the labels Generic Task, Problem Solving Method, Library of PSMs (Classification, Parametric Design, Scheduling, etc.), Domain Model, Ontology, Mapping Ontology, Mapping Knowledge, Application-specific Problem-Solving Knowledge and Application Configuration]
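The idea of reusable components can be illustrated with a generic, domain-independent PSM that is configured for an application through mapping knowledge. Everything here is a hypothetical toy:

- the PSM is a naive feature-matching classifier;
- the wine domain model is made up;
- `map_domain` plays the role of the mapping knowledge that translates domain terms into the PSM's vocabulary;
- `configure` stands for application configuration.

```python
from typing import Callable, Dict, List, Set

Solutions = Dict[str, Set[str]]  # solution class -> features it requires


def classify(observations: Set[str], solutions: Solutions) -> List[str]:
    """Generic PSM (naive classification): solutions whose required features
    are all observed, most specific (most features) first."""
    matches = [name for name, required in solutions.items() if required <= observations]
    return sorted(matches, key=lambda name: (-len(solutions[name]), name))


# Hypothetical domain model, written in the domain's own vocabulary.
WINE_DOMAIN: Dict[str, Dict[str, str]] = {
    "Chardonnay": {"colour": "white", "grape": "chardonnay"},
    "WhiteWine": {"colour": "white"},
    "RedWine": {"colour": "red"},
}


def map_domain(domain: Dict[str, Dict[str, str]]) -> Solutions:
    """Mapping knowledge: turn domain properties into the PSM's feature vocabulary."""
    return {name: {f"{k}={v}" for k, v in props.items()} for name, props in domain.items()}


def configure(psm: Callable[[Set[str], Solutions], List[str]],
              domain: Dict[str, Dict[str, str]]) -> Callable[[Set[str]], List[str]]:
    """Application configuration: bind a reusable PSM to a mapped domain model."""
    solutions = map_domain(domain)
    return lambda observations: psm(observations, solutions)
```

The same `classify` function could be bound to a different domain model simply by supplying different mapping knowledge. That reuse is what such libraries aim to cheapen, even though building domain models remains costly.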

The next knowledge medium
“An information network with semi-automated services for the generation, distribution, and consumption of knowledge”
However, our approach based on structured libraries of problem-solving components only addressed the economic cost of KBS development…

SW as Enabler of Intelligent Behaviour
[Figure: diagram labelled ‘Intelligent Behaviour’]
The SW is both a platform for knowledge publishing and a large-scale source of knowledge

KBS vs SW Systems
                     Classic KBS      SW Systems
Provenance           Centralized      Distributed
Size                 Small/Medium     Extra Huge
Repr. Schema         Homogeneous      Heterogeneous
Quality              High             Very Variable
Degree of trust      High             Very Variable

Key Paradigm Shift
Intelligence
– Classic KBS: a function of sophisticated, logical, task-centric problem solving
– SW systems: a side effect of being able to integrate different types of reasoning to handle size and heterogeneity in quality and representation

Conclusions

Typical misconceptions…
“The SW is a long-term vision…”
– Ehm… actually… it already exists…
“The SW will never work because nobody is going to annotate their web pages”
– The SW is not about annotating web pages; the SW is a web of data, most of which is generated from DBs, by web mining software, or by applications that use SW technology
“The idea of a universal ontology has failed before and will fail again. Hence the SW is doomed”
– The SW is not about a single universal ontology. There are already around 10K ontologies, and the number is growing…
– SW applications may use 1, 2, 3, or even hundreds of ontologies

Large Scale Distributed Semantics
– Widespread production of formalised knowledge models (ontologies and metadata) by a variety of different groups and individuals
  – E.g., legal, bio-medical, governmental, environmental, music, art, multimedia, computing, etc.
  – “Knowledge modelling to become a new form of literacy?” Stutt and Motta, 1997
– This large-scale heterogeneous resource will enable a new generation of semantic-aware technologies
– These developments may provide a new context in which to address the economic barriers to KBS development
– The SW already exists to some extent; however, there is still a way to go before it reaches the required degree of maturity

Large Scale Distributed Semantics
Much like AI, the semantic web will only succeed if it becomes ubiquitous and hidden
“There's this stupid myth out there that A.I. has failed, but A.I. is everywhere around you every second of the day. People just don't notice it. You've got A.I. systems in cars, tuning the parameters of the fuel injection systems. When you land in an airplane, your gate gets chosen by an A.I. scheduling system. Every time you use a piece of Microsoft software, you've got an A.I. system trying to figure out what you're doing, like writing a letter, and it does a pretty damned good job. Every time you see a movie with computer-generated characters, they're all little A.I. characters behaving as a group. Every time you play a video game, you're playing against an A.I. system.”
Rodney Brooks