Prof. Dimitar Trajanov 08 Jun 2011, CERTH, Greece Semantic Sky: Cloud services integration using semantic web technologies
Agenda
o Introduction
o Semantic web technologies basics
o Semantic Sky architecture
o Examples
o Conclusion
Introduction
Cloud computing
o Cloud computing refers to the on-demand provision of computational resources (data, software) via a computer network
o Cloud Computing Stack
  - SaaS - Software as a Service
  - PaaS - Platform as a Service
  - IaaS - Infrastructure as a Service
Cloud computing types
The information we work with in our everyday lives
o Obtained from different sources:
  - Web (Facebook, Twitter, …)
  - Intranet (e-mail, enterprise applications, …)
  - Local data (local documents, …)
o The number of information sources is increasing rapidly
  - Increased number of publicly available services
  - Increasing number of cloud services with specialized functionalities
  - Increased number of enterprise applications
o Depending on the information type, we usually take some action, e.g. we share it or add it to a ToDo list
The problem
o Interchange of data among information sources
o Need for complex and composite actions
o Actions require a certain amount of time (get/copy the data, change the context, transfer the data, execute an action in the destination service)
o Services and data are located in different places and on different infrastructures
Motivation
o To develop a software platform that gives users a unified, simple, composite approach to the different services they use, and a simple flow of information from one infrastructure to another.
o To arrive at such a design, a large number of sub-problems have to be solved:
  - Mechanisms for detecting the entities found within the texts and information that we get from different services.
  - Offering actions (services) that can be performed on the entities, based on the context in which the user is working.
  - Integration with the user's local working environment.
Solution: Semantic Sky
o The system is called “SemanticSky” because it is an environment where many cloud services exist and interact with each other
o It is based on semantic web technologies
o Reuse of known ontologies (FOAF, AIISO, University Ontology, GeoNames, …)
Related work
o There are projects that focus on the connectivity of different cloud infrastructures (mOSAIC, SITIO, …)
o Microsoft Outlook plug-in
  - Xobni offers fast search and people-based navigation of archives.
  - Mashin organizes information extracted from history contextually.
o Google mail plug-in
Related work
o Babylon-Enterprise is a web-configured client-server system based on a Windows program (Babylon-Enterprise Client) installed on the end-user’s workstation and an enterprise application server (Babylon-Enterprise Server).
  - Gives the ability to access all enterprise information and data from every working environment.
o Greplin is a personal search engine that allows you to search all your online data in one place.
Semantic Web Technologies
A Layered Approach
o The development of the Semantic Web proceeds in steps
  - Each step builds a layer on top of another
Principles:
o Downward compatibility
o Upward partial understanding
(Chapter 1, A Semantic Web Primer)
Current Semantic Web Stack
Semantic Web Open Standards
o RDF – Store data as “triples”
o OWL – Define systems of concepts called “ontologies”
o SPARQL – Query data in RDF
o SWRL – Define rules
o GRDDL – Transform data to RDF
RDF “Triples”
o the subject, which is an RDF URI reference or a blank node
o the predicate, which is an RDF URI reference
o the object, which is an RDF URI reference, a literal or a blank node
[Figure: Subject --Predicate--> Object]
RDBMS vs Triplestore

Person Table
ID   f_name  l_name
001  jim     wissner
002  nova    spivack
003  chris   jones
004  lew     tucker

Colleagues Table
SRC-ID  TGT-ID

Triplestore (S, P, O)
Subject  Predicate     Object
001      isA           Person
001      firstName     Jim
001      lastName      Wissner
001      hasColleague
002      isA           Person
002      firstName     Nova
002      lastName      Spivack
002      hasColleague
003      isA           Person
003      firstName     Chris
003      lastName      Jones
003      hasColleague
004      isA           Person
004      firstName     Lew
004      lastName      Tucker
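The correspondence between the two representations can be sketched in a few lines of Python. This is only an illustration, assuming the predicate names shown on the slide (isA, firstName, lastName, hasColleague); the sample rows and the colleague pair are hypothetical, since the slide's Colleagues values were not preserved.

```python
# Hypothetical rows mirroring the Person table on the slide.
PERSON_ROWS = [
    {"ID": "001", "f_name": "Jim", "l_name": "Wissner"},
    {"ID": "002", "f_name": "Nova", "l_name": "Spivack"},
]

# Column-to-predicate mapping (predicate names taken from the slide).
COLUMN_PREDICATES = {"f_name": "firstName", "l_name": "lastName"}


def rows_to_triples(rows, key="ID", type_name="Person"):
    """Turn each relational row into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = row[key]
        triples.append((subject, "isA", type_name))
        for column, predicate in COLUMN_PREDICATES.items():
            triples.append((subject, predicate, row[column]))
    return triples


def links_to_triples(pairs, predicate="hasColleague"):
    """A two-column link table becomes one triple per row."""
    return [(src, predicate, tgt) for src, tgt in pairs]


person_triples = rows_to_triples(PERSON_ROWS)
# The pair below is an assumed example, not data from the slide.
colleague_triples = links_to_triples([("001", "002")])
```

Each row yields one typing triple plus one triple per column, so no schema change is needed when a new property appears: it is just another predicate.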
Merging Databases in RDF is Easy
[Figure: S-P-O triple tables being merged]
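Why merging is easy can be shown with a minimal sketch: if a graph is modelled as a set of triples, merging is a set union. The identifiers below (ex:jim, ex:nova) are hypothetical, and the sketch assumes both sources already use the same identifier for the same resource.

```python
def merge_graphs(*graphs):
    """Merge triple sets by union; identical statements collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


# Two hypothetical sources that share one statement about ex:jim.
graph_a = {
    ("ex:jim", "firstName", "Jim"),
    ("ex:jim", "hasColleague", "ex:nova"),
}
graph_b = {
    ("ex:nova", "firstName", "Nova"),
    ("ex:jim", "firstName", "Jim"),
}

merged = merge_graphs(graph_a, graph_b)
```

The shared statement appears only once in the result, and no table schemas have to be reconciled.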
Ontologies
o The term ontology originates from philosophy
  - The study of the nature of existence
o It has a different meaning in computer science
  - An ontology is an explicit and formal specification of a conceptualization
o Ontologies provide a shared understanding of a domain (semantic interoperability)
  - overcome differences in terminology
  - mappings between ontologies
o There are many available ontologies for different domains
Typical Components of Ontologies
o Terms denote important concepts (classes of objects) of the domain
  - e.g. professors, staff, students, courses, departments
o Relationships between these terms: typically class hierarchies
  - a class C is a subclass of another class C' if every object in C is also included in C'
  - e.g. all professors are staff members
Further Components of Ontologies o Properties: e.g. X teaches Y o Value restrictions e.g. only faculty members can teach courses o Disjointness statements e.g. faculty and general staff are disjoint o Logical relationships between objects e.g. every department must include at least 10 faculty
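Two of the components above, class hierarchies and disjointness statements, can be sketched with plain dictionaries. The class names follow the slide examples, but the exact hierarchy (Professor under Faculty under Staff) is an assumption made for illustration, and the sketch supports only single inheritance.

```python
# Assumed hierarchy for illustration: child class -> parent class.
SUBCLASS_OF = {
    "Professor": "Faculty",
    "Faculty": "Staff",
    "GeneralStaff": "Staff",
}

# Disjointness statement from the slide: faculty and general staff.
DISJOINT = {frozenset({"Faculty", "GeneralStaff"})}


def superclasses(cls):
    """Return cls followed by every class reachable via subclass-of."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_subclass(cls, ancestor):
    """True if every object of cls is also an object of ancestor."""
    return ancestor in superclasses(cls)


def violates_disjointness(classes_of_individual):
    """True if the individual's classes, with inherited ones, hit a disjoint pair."""
    expanded = set()
    for cls in classes_of_individual:
        expanded.update(superclasses(cls))
    return any(pair <= expanded for pair in DISJOINT)
```

A real system would delegate these checks to an OWL reasoner; the sketch only shows what the statements mean.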
Semantic Sky Architecture
System Overview
[Figure: components shown: Desktop client, Browser plug-in, Cloud plug-in; Semantic Sky (Resource Retrieval, Action Invocation); Semantic Sky SPARQL Endpoint; Ontology; Cloud Service 1, Cloud Service 2, Cloud Service 3]
System Architecture
Knowledge base
o RDF data store
o Apache Lucene is used as the indexing engine
o Each triple (statement), rdfs:Class and rdf:Property is indexed as a separate entity (Lucene Document)
o Extensible
Knowledge base extension
o OWL/RDF file upload
  - Uses the Jena API to extract semantic resources
  - Calls the indexer to index the resources
o OWL/RDF URI
  - Paste the link to the OWL or RDF document
  - Jena extracts the resources and passes them to the Lucene indexer
o SPARQL endpoints
  - By providing a URL to the endpoint
  - Connects to the endpoint address and fetches all data
  - Lucene indexes the data fetched from the endpoint
Web service repository o Used for faster service discovery o Semantically annotated web services Service input types Service output types o Any ontology can be used for annotation of the web services o Extensible
Extending the WS Repository o Using existing web services Annotating using the SAWSDL standard Annotation tool developed Import the annotated WSDL file into the repository o Creating new web services Develop the web service Repeat the steps for existing web services o Using REST web services Tool for semantic mapping of REST services (in progress)
Extensibility in action
o We have a system that provides Task Management and exposes web services for it.
o We want to add new Task Management functionality.
o What do we do to enable this?
  - Import the ontology for this domain, if it is not already present
  - Annotate the services (preferably with our tool)
  - Define actions
  - The functionality is now live and can be used
Data Linking and Inference Engine
o System entry point
  - Accepts text
  - Returns semantic resources correlated with the text
o For each token (word) in the text, we extract all resources related to it
o Extraction is done using Apache Lucene search
o All Lucene entities retrieved from the search are converted to semantic resources
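The token-based lookup can be sketched as follows. The real system uses Apache Lucene; here a small in-memory inverted index stands in for it, and the resource URIs and labels are hypothetical sample data.

```python
import re
from collections import defaultdict

# Hypothetical indexed resources: URI -> label.
RESOURCES = {
    "#res1": "Dimitar Trajanov",
    "#res2": "Network Programming",
    "#res3": "Programming Languages",
}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(resources):
    """Inverted index: token -> set of resource URIs whose label contains it."""
    index = defaultdict(set)
    for uri, label in resources.items():
        for token in tokenize(label):
            index[token].add(uri)
    return index


def link_text(text, index):
    """Return the sorted URIs of all resources related to any token in the text."""
    found = set()
    for token in tokenize(text):
        found |= index.get(token, set())
    return sorted(found)


index = build_index(RESOURCES)
linked = link_text("Meeting with Trajanov about programming", index)
```

A production index would also handle stemming, stop words and ranking, which Lucene provides and this sketch omits.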
Data Linking and Inference Engine
[Flowchart]
o Find the resources for the text from the index (Ontology index)
o For each resource, get its properties (Ontology, SPARQL endpoints)
o Group resources by type
o Result: Type : [ {p1:v1,p2:v2,..,uri:#res1}, {p1:v1',p3:v3',..,uri:#res2} ]
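The grouping step, which produces the Type : [ {...}, {...} ] structure, might look like the sketch below. It assumes each resource arrives as a dictionary of properties with a "type" key; that key name and the sample values are assumptions for illustration.

```python
from collections import defaultdict


def group_by_type(resources):
    """Group resource property dicts by their 'type' key.

    Produces {Type: [{p1: v1, ..., 'uri': '#res1'}, ...]}.
    """
    grouped = defaultdict(list)
    for resource in resources:
        props = {k: v for k, v in resource.items() if k != "type"}
        grouped[resource["type"]].append(props)
    return dict(grouped)


# Hypothetical resources with their retrieved properties.
sample = [
    {"type": "Person", "name": "Dimitar", "uri": "#res1"},
    {"type": "Course", "title": "Network Programming", "uri": "#res2"},
    {"type": "Person", "name": "Ana", "uri": "#res3"},
]

grouped = group_by_type(sample)
```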
Action Search
[Flowchart] Input: Type : [ {p1:v1,p2:v2,..,uri:#res1}, {p1:v1',p3:v3',..,uri:#res2} ]
o Align resource types as inputs
o Are there entries in the repository? (yes / no)
o Find all operations from the Repository for these inputs (Semantic WS Repository, semantically annotated web services)
o Find all compositions with these inputs and store them in the repository
o Assemble action XML result (uid....)
Operations Retrieval
o Searches for operations (web service methods) in the repository
o Service compositions are made when possible
o Uses the types of the extracted resources to find the operations
o User_Defined_Input
  - rdfs:Class used to denote that this input will be rendered as a text input on the client side
  - Implicit input type: it is placed in the inputs list even when no resource of this type is extracted
  - The user must provide the value for this type
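The matching rule, including the implicit User_Defined_Input type, can be sketched as below. The repository contents and operation names are hypothetical, and service composition is deliberately left out.

```python
USER_DEFINED = "User_Defined_Input"

# Hypothetical repository entries: operation name -> annotated input types.
REPOSITORY = {
    "sendMessage": ["Person", USER_DEFINED],
    "addTask": ["Person", "Task"],
    "showOnMap": ["Place"],
}


def find_operations(extracted_types, repository=REPOSITORY):
    """Return operations whose inputs are all covered by the extracted types.

    User_Defined_Input is always treated as available, because the user
    supplies its value on the client side.
    """
    available = set(extracted_types) | {USER_DEFINED}
    return sorted(
        name
        for name, inputs in repository.items()
        if set(inputs) <= available
    )


operations = find_operations({"Person"})
```

With only a Person extracted, sendMessage qualifies because its second input is user-defined, while addTask does not because no Task was found.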
Action Form User_Defined_Input
UI Generator
[Flowchart] Input: uid...., Type : [ {p1:v1,p2:v2,..,uri:#res1}, {p1:v1',p3:v3',..,uri:#res2} ]
o Find transformer for the resource type (Transformers)
o Transform the resource
o Transform the actions
Action Invocation o The generated form contains all parameters for action invocation o Single service for action invocation It assembles the parameters and invokes the actual services The result is returned back to the user
Implementation details Data integration - Enterprise data - Opening the data
Example: University data
o Most of today's information systems (IS) store their data in relational databases
o This data is published in a structured way, in RDF format, on the Semantic Web
o What do we publish?
  - basic information about the Faculties and more detailed information about our CSE Faculty (Institutes, Modules, Programs, Courses, Subjects, Employees)
o A few universities, most of them in the UK, have already started open data projects, which are still in development
Semantic data publishing
o There are many tools for publishing the content of relational databases on the Semantic Web, such as D2R Server, Oracle Spatial 11g, Asio Semantic Bridge, SquirrelRDF and many others
o We use the D2R Server
o D2R Server enables RDF and HTML browsers to navigate the content of the database, and allows applications to query the database using the SPARQL query language
The steps
Open Linked data
o Our goal is five star data – data linked to other people’s data to provide context
o We connect to well known ontologies, which already have definitions for our types of data
o D2RQ Mapping Language is a declarative mapping language for describing the relation between an ontology and a relational data model
Ontologies
o For describing our data we need a few well known ontologies
o The Web Ontology Language (OWL)
o FOAF - ontology describing persons, their activities and their relations to other people and objects; it is used for the employees
o The Academic Institution Internal Structure Ontology (AIISO) - provides classes and properties to describe the internal organizational structure of an academic institution
o University Ontology – same purpose as AIISO, but contains some additional features needed for describing our data
o GeoNames Ontology - makes it possible to add geospatial semantic information to the World Wide Web
SPARQL Endpoint
o Changes to the mapping.n3 file for connecting with the ontologies have to be made manually
o After the .n3 file is edited, it can be run with D2R Server, and queries can be written in the SPARQL Endpoint using the prefixes from the ontologies
o The SPARQL Endpoint shows the data as triples: subject, predicate and object
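A query that uses an ontology prefix could be assembled as in the sketch below. The FOAF namespace is the standard one, but the assumption that the mapping exposes employee names through foaf:name is hypothetical; the actual mapping.n3 is not shown in the slides.

```python
# Standard FOAF namespace; further prefixes would be added the same way.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def build_query(person_name, limit=10):
    """Build a SPARQL SELECT listing all triples about a named person.

    Assumes (hypothetically) that names are mapped to foaf:name.
    """
    prefix_lines = "\n".join(
        f"PREFIX {prefix}: <{uri}>" for prefix, uri in PREFIXES.items()
    )
    # Escape backslashes and double quotes for the SPARQL string literal.
    escaped = person_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"{prefix_lines}\n"
        "SELECT ?person ?p ?o\n"
        "WHERE {\n"
        f'  ?person foaf:name "{escaped}" .\n'
        "  ?person ?p ?o .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


query = build_query("Dimitar Trajanov")
```

The result rows are exactly the subject, predicate and object columns that the endpoint displays.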
University Open Data Example
o This query shows basic information about Professor Trajanov and the courses he teaches
o tech2.feit.ukim.edu.mk%2Fopen-data%2Fresource%2Fdbo.EMPLOYEES%2F64
University Open Data Example
o This query shows basic information about the subject Network Programming and the courses of that subject
o tech2.feit.ukim.edu.mk%2Fopen-data%2Fresource%2Fdbo.SUBJECTS%2F1
D2R Server Mapping tool
o Manually editing the .n3 file is time consuming, so we created an application called the D2R Server Mapping Tool to connect to the ontologies
o The user first enters the database to be published, then the application generates the .n3 file using the D2R Server
o The Mapping tool then converts the .n3 file to an .rdf file, a format which can easily be shown as a visual XML-like tree
o The user can choose a class or property from the tree and add or remove a reference to an ontology
o Ontologies can also be added to and removed from the application
D2R Server Mapping tool
Implementation details Data access Cloud plug-in Desktop application
Google Gadgets Embed application's UI into Gmail, Calendar, Spreadsheets and Sites, using the OpenSocial standard
What are Gmail Gadgets? o Custom HTML & JavaScript components o Run within an iframe o Extend Gmail with additional functionality o Implement the Google gadgets API o Two types of Gmail Gadgets Sidebar Gadgets Contextual Gadgets
Gmail Sidebar Gadget
Gmail Contextual Gadget o Displayed at the bottom of individual messages o Triggered by contextual clues subject sender body o Example: the YouTube contextual gadget Triggered whenever a YouTube link appears in the body.
Gmail Contextual Gadget Implementation o Extractor Detects contextual clues Determines which types of content will trigger the gadget Passes the triggering content to the gadget o Gadget Specification Takes action based on the content passed in from an extractor Client-side logic and UI
Semantic Sky Contextual Gadget
o Uses a body extractor to extract data
  - Triggered on every message *
  - Extracts the body text
o Sends the extracted text to the Semantic Sky server via the services provided by the core module
o Receives and parses the JSON response
o Generates contextual action forms based on the response received
o Renders the UI HTML in the gadget iframe
* Except on e-mails containing non-ASCII letters. This is a known Google Gadget API bug
Generating Action Forms
o Generated dynamically by the client script, depending on the response received from the server
o A list of available actions for the identified entities, along with the types of their input parameters, is received in the response JSON object.
o The client script generates an input field for each input parameter. The input fields are either select fields or plain text input fields, depending on the type of the input parameter the field is generated for.
o Select fields are pre-populated with the entities identified and returned from the server that match the input type of the input parameter
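The field-selection rule can be sketched as follows. The actual client is a JavaScript gadget; this Python version only illustrates the logic, and the shape of the action and entity dictionaries is an assumption, since the real JSON response format is not shown.

```python
USER_DEFINED = "User_Defined_Input"


def build_form(action, entities_by_type):
    """Describe one form field per input parameter of an action.

    Parameters whose type matches identified entities become select fields
    pre-populated with those entities; all others become plain text fields.
    """
    fields = []
    for param in action["inputs"]:
        options = entities_by_type.get(param["type"], [])
        if param["type"] != USER_DEFINED and options:
            fields.append(
                {"name": param["name"], "kind": "select", "options": list(options)}
            )
        else:
            fields.append({"name": param["name"], "kind": "text"})
    return {"action": action["name"], "fields": fields}


# Hypothetical action description and identified entities.
action = {
    "name": "addTask",
    "inputs": [
        {"name": "assignee", "type": "Person"},
        {"name": "title", "type": USER_DEFINED},
    ],
}
entities = {"Person": ["Dimitar Trajanov"]}

form = build_form(action, entities)
```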
Action Form User_Defined_Input
Action Form
Action Form Examples
Invoking Actions o Input values are extracted from each input field in the action form o Values are packed into a JSON object o Request is sent to the service responsible for receiving action invocation requests on the server o The server parses the received object, and invokes the requested action
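The round trip described above can be sketched end to end. The payload keys ("action", "inputs"), the action registry and the addTask handler are all hypothetical; the sketch shows only the pack, parse and dispatch steps.

```python
import json


def pack_invocation(action_name, field_values):
    """Pack the form values into the JSON payload sent to the server."""
    return json.dumps({"action": action_name, "inputs": field_values}, sort_keys=True)


# Hypothetical server-side registry of invocable actions.
ACTIONS = {
    "addTask": lambda person, title: f"Task '{title}' assigned to {person}",
}


def handle_invocation(payload):
    """Parse the received JSON object and invoke the requested action."""
    request = json.loads(payload)
    handler = ACTIONS.get(request["action"])
    if handler is None:
        raise KeyError(f"unknown action: {request['action']}")
    return handler(**request["inputs"])


payload = pack_invocation("addTask", {"person": "Dimitar", "title": "Review slides"})
result = handle_invocation(payload)
```

Routing every request through one handler matches the single invocation service described earlier, with the registry standing in for the actual web service calls.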
Desktop context extractor
o Internet-based infrastructure for collaboration with a desktop application
o Desktop-side interconnection with the services
o Communication with the public services is established through public APIs
o Semantically annotated web services connection with the Semantic Sky cloud
o The OpenCalais web service is used for automatic semantic annotation
o Selected text search
Technology used and OS interaction
o The C# .NET (framework v4.0) platform is used for developing the application
o Win32 API for interaction with the OS
o A system hook intercepts Windows messages and detects mouse and keyboard activity while the application runs in the background.
o Hotkey activation of the application
  - Initiates copying of the selected text to the Clipboard
  - Gets the whole text from the currently active window for the semantic search
  - Activates the application
o Clipboard data retrieval (text or image)
  - When the application is activated, data from the Clipboard is retrieved automatically.
  - Processing of the data begins
Architecture
o The application is based on 4 main objects
  - Sources
  - Objects
  - Types
  - Actions
o Other parts of the system
  - External ontology
  - Web services
Sources
o Information from the external services based on the selected text search
o Classes which make the connection to the external web services on the cloud
  - Facebook API
  - Gmail API
  - Wikipedia public SOAP web service
  - Open Calais public SOAP web service
  - Semantic Sky RESTful web services
Objects and types
o The information retrieved from the services is parsed into objects
o Every object has a type
o Object information depends on the type
o Object type is determined by semantic search on an external ontology
Actions
o Based on the type, different actions can be performed on the objects
o Simple actions
  - Write e-mail
  - Write on Facebook wall, etc.
o Semantic Sky web services
  - Gets all actions for the annotated objects
Use case of the application
o Select text in any window
o A hotkey press brings the selected text into the application for processing
o Found objects are shown by type and by source
o Actions for the found objects are shown on the right side of the application
o Execute actions
Use of the Gmail and Facebook API
o The semantic desktop application uses API libraries to connect to Gmail and Facebook
o Login is needed for both services to retrieve contacts
o Simple actions with contacts
  - Write on the wall
  - Write e-mail
Use of the Wikipedia and Open Calais
o Connection to Wikipedia information is established through a SOAP web service
  - Retrieves terms related to the input search string
o The Open Calais service uses the CalaisDotNet library API
  - Uses semantic search to recognize objects, their type and relevance
Use of the Semantic Sky source
o The desktop application is connected to the Semantic Sky source through 3 web services
  - textAnnotations – gets information about the semantic resources found in the text input
  - actionsForText – gets all actions for the semantic resources
  - invokeAction – invokes the selected action for a specific semantic resource
Conclusion
Conclusion
o Semantic Sky is a framework which enables connectivity and integration, not only of different cloud services, but also of local data stored on the user's machine.
o Automation of the use of different services
o Intelligent engine that proposes actions that could (or should) be executed by the user
o Google contextual gadget developed
o Desktop application developed (includes additional cloud integration)
o We join the Open Data trend by publishing some of the faculty data
Future work
o Extend the core system with some public cloud services
o Develop a browser plug-in
o Add personalization
o Add system learning by example
o Create semantic copy/paste for entity transfer between applications (copy a person from Facebook and paste it into your CRM)
THANK YOU !!!