Trustworthy Semantic Webs Dr. Bhavani Thuraisingham The University of Texas at Dallas December 2007.


1 Trustworthy Semantic Webs Dr. Bhavani Thuraisingham The University of Texas at Dallas December 2007

2 Outline l Vision l XML l RDF l Ontology/OWL l Rules l Applications l Ontology Engineering l Web Services l Reference: A Semantic Web Primer, Antoniou and van Harmelen

3 Today’s Web l High recall, low precision: Searches return too many web pages, many of them not relevant l Sometimes low recall l Results sensitive to vocabulary: Different words, even if they mean the same thing, do not result in the same web pages l Results are single web pages, not linked web pages

4 From Today’s Web to the Semantic Web l Machine understandable web pages l Activities on the web such as searching with little or no human intervention l Technologies for knowledge management, e-commerce, interoperability l Solutions to the problems faced by today’s web - Retrieving appropriate web pages, sensitive to vocabulary etc. - Semantic web applications including

5 Knowledge Management l Corporate needs - Searching, extracting and maintaining information, uncovering hidden dependencies, viewing information l Semantic web for knowledge management - Organizing knowledge, automated tools for maintaining knowledge, question answering, querying multiple documents, controlling access to documents

6 Business to Consumer E-Commerce l Users shopping on the web; wrapper technology is used to extract information about user preferences etc. and display the products to the user l Use of semantic web: Develop software agents that can interpret privacy requirements, pricing and product information and display timely and correct information to the user; also provides information about the reputation of shops l Future: negotiation among agents on behalf of the user

7 Business to Business E-Commerce l Organizations work together and carry out transactions such as collaborating on a product, supply chains etc. Today’s web lacks standards for data exchange l Use of semantic web: XML is a big improvement, but organizations need to agree on vocabulary. The future will be the use of ontologies to agree on meanings and interpretations

8 Personal Agents l John is the president of a company. He needs to have surgery for a serious but not critical illness. With the current web he has to check each web page for relevant information and make decisions depending on the information provided l With the semantic web, the agent will retrieve all the relevant information, synthesize the information, ask John if needed, and then present the various options to John and also make recommendations

9 Semantic Web Technologies l Explicit metadata - XML, RDF, etc. l Ontologies l Logic l Agents

10 Explicit metadata l Metadata is data about data l Need metadata to be explicitly specified so that different groups and organizations will know what is on the web l Using metadata, one can then carry out various activities such as searching, integration and executing actions l Metadata specification languages include XML and RDF

11 Ontologies l An explicit and formal specification of a conceptualization that describes a domain of discourse l Consists of concepts and relationships between them l Web searches can exploit ontologies to facilitate the search process l Ontology languages include RDF Schema and OWL, which build on XML syntax

12 Logic l Logic can be used to specify facts as well as rules l New facts are derived from existing facts based on the inference rules l Description Logic is the type of logic that has been developed for semantic web applications

13 Agents l Agents are essentially processes that have evolved from object-oriented programming; an agent is an active object l Agents will use metadata to find resources on the web; ontologies will be used to interpret statements; logic will be used for drawing conclusions l Agents will not completely replace humans, but will make the tasks of humans much easier.

14 Semantic Web vs Artificial Intelligence l Goal of Artificial Intelligence is to build an intelligent agent exhibiting human-level intelligence l Goal of the semantic web is to assist the humans in their day to day online activities

15 Layered Approach: Tim Berners Lee’s Vision www.w3c.org

16 What is XML all about? l XML is needed due to the limitations of HTML and complexities of SGML l It is an extensible markup language specified by the W3C (World Wide Web Consortium) l Designed to make the interchange of structured documents over the Internet easier l Key to XML used to be Document Type Definitions (DTDs) - Defines the role of each element of text in a formal model l XML schemas have now become critical to specify the structure - XML schemas are also XML documents

17 XML Elements XML Statement: John Smith is a Professor in Texas. This can be expressed as follows: <Professor> <Name> John Smith </Name> <State> Texas </State> </Professor>
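As a minimal sketch of the statement above, the same element structure can be built with Python's standard xml.etree.ElementTree module. The element names Professor, Name and State follow the DTD slide later in the deck; the exact markup used on the original slide was lost in extraction, so treat the names as an assumption.

```python
import xml.etree.ElementTree as ET

# Element names are taken from the DTD slide (Professor with Name and State);
# the markup on the original slide is not recoverable, so this is illustrative.
professor = ET.Element("Professor")
ET.SubElement(professor, "Name").text = "John Smith"
ET.SubElement(professor, "State").text = "Texas"

# Serialize to a string (no XML declaration when encoding="unicode").
xml_text = ET.tostring(professor, encoding="unicode")
print(xml_text)
```

Adding an access element, as the next slide does, would be one more SubElement call on the same parent.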

18 XML Elements Now suppose this data can be read by anyone. Then we can augment the XML statement with an additional element called access as follows: <Professor> <Name> John Smith </Name> <State> Texas </State> <Access> All, Read </Access> </Professor>

19 XML Attributes Suppose we want to specify access based on attribute values. One way to specify such access is given below. <Professor Name = “John Smith”, Access = All, Read Salary = “60K”, Access = Administrator, Read, Write Department = “Security”, Access = All, Read> </Professor> Here we assume that everyone can read the name John Smith and the department Security, but only the administrator can read and write the salary attribute.

20 XML DTD DTDs essentially specify the structure of XML documents. Consider the following DTD for Professor with elements Name and State. This will be specified as:

21 XML Schema While DTDs were the early attempts to specify structure for XML documents, XML schemas are far more elegant to specify structures. Unlike DTDs XML schemas essentially use the XML syntax for specification. Consider the following example:

22 XML Namespaces Namespaces are used for DISAMBIGUATION <CountryX: Academic-Institution Xmlns: CountryX = “http://www.CountryX.edu/Instution DTD” Xmlns: USA = “http://www.USA.edu/Instution DTD” Xmlns: UK = “http://www.UK.edu/Instution DTD” <USA: Title = College USA: Name = “University of Texas at Dallas” USA: State = “Texas” <UK: Title = University UK: Name = “Cambridge University” UK: State = Cambs

23 XML Databases l Data is presented as XML documents l Query language: XML-QL l Query optimization l Managing transactions on XML documents l Metadata management: XML schemas/DTDs l Access methods and index strategies l XML security and integrity management

24 Credentials in XML Alice Brown University of X CS Security John James University of X CS Senior

25 Policies in XML <policy-spec cred-expr = “//Professor[department = ‘CS’]” target = “annual_report.xml” path = “//Patent[@Dept = ‘CS’]//Node()” priv = “VIEW”/> <policy-spec cred-expr = “//Professor[department = ‘CS’]” target = “annual_report.xml” path = “//Patent[@Dept = ‘EE’] /Short-descr/Node() and //Patent [@Dept = ‘EE’]/authors” priv = “VIEW”/> <policy-spec cred-expr = - - - - Explanation: CS professors are entitled to access all the patents of their department. They are entitled to see only the short descriptions and authors of patents of the EE department

26 Access Control Strategy l Subjects request access to XML documents under two modes: Browsing and authoring - With browsing access the subject can read/navigate documents - Authoring access is needed to modify, delete, append documents l Access control module checks the policy base and applies policy specs l Views of the document are created based on credentials and policy specs l In case of conflict, least access privilege rule is enforced l Works for Push/Pull modes
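The view-creation step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the policy base is reduced to a hypothetical dictionary from element name to the roles allowed to read it, credentials are reduced to a list of role names, and an element with no matching policy is hidden (one reading of the least access privilege rule). It is not the X-Access implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, mirroring the Professor example from the attribute slide.
DOC = """<Professor>
  <Name>John Smith</Name>
  <Salary>60K</Salary>
  <Department>Security</Department>
</Professor>"""

# Hypothetical policy base: element tag -> roles allowed to read (browse) it.
READ_POLICY = {
    "Name": {"All"},
    "Salary": {"Administrator"},
    "Department": {"All"},
}


def build_view(xml_text, roles):
    """Return a copy of the document containing only the readable children."""
    root = ET.fromstring(xml_text)
    effective_roles = set(roles) | {"All"}  # every subject holds the "All" role
    for child in list(root):
        allowed = READ_POLICY.get(child.tag, set())  # no policy -> no access
        if not (allowed & effective_roles):
            root.remove(child)
    return root


student_view = build_view(DOC, ["Student"])
admin_view = build_view(DOC, ["Administrator"])
```

Here student_view keeps Name and Department but drops Salary, while admin_view keeps all three elements.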

27 System Architecture for Access Control [Diagram labels: User; Pull/Query; Push/result; XML Documents; X-Access; X-Admin; Admin Tools; Policy base; Credential base]

28 Third-Party Architecture [Diagram labels: Credential base; Policy base; XML Source; User/Subject; Owner; Publisher; Query; Reply document; SE-XML; credentials] l The Owner is the producer of information. It specifies access control policies l The Publisher is responsible for managing (a portion of) the Owner information and answering subject queries l Goal: Allow an untrusted Publisher by checking the Authenticity and Completeness of its replies

29 Inference/Privacy Control [Diagram labels: Policies; Ontologies; Rules; Semantic web engine; XML, RDF, OWL Documents; Web Pages, Databases; Inference Engine/Rules Processor; Interface to the Semantic Web] Technology By UTD

30 Example Policies l Temporal Access Control - After 1/1/05, only doctors have access to medical records l Role-based Access Control - Manager has access to salary information - Project leader has access to project budgets, but he does not have access to salary information - What happens if the manager is also the project leader? l Positive and Negative Authorizations - John has write access to EMP - John does not have read access to DEPT - John does not have write access to Salary attribute in EMP - How are conflicts resolved?
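The slide leaves the conflict question open. One common answer, shown here purely as an assumption and not as the policy the slides prescribe, is "denials take precedence, and anything not explicitly authorized is denied". The Authorization tuple, the dotted EMP.Salary naming and the is_allowed helper are all hypothetical.

```python
from typing import NamedTuple


class Authorization(NamedTuple):
    subject: str
    privilege: str
    obj: str        # "EMP" for a relation, "EMP.Salary" for an attribute of it
    positive: bool  # True = grant, False = explicit denial


# The three authorizations listed on the slide.
AUTHS = [
    Authorization("John", "write", "EMP", True),
    Authorization("John", "read", "DEPT", False),
    Authorization("John", "write", "EMP.Salary", False),
]


def is_allowed(subject, privilege, obj, auths=AUTHS):
    """Denials take precedence; with no applicable grant the default is deny."""
    # An authorization on "EMP" also applies to its attributes ("EMP.Salary").
    relevant = [
        a for a in auths
        if a.subject == subject
        and a.privilege == privilege
        and (obj == a.obj or obj.startswith(a.obj + "."))
    ]
    if any(not a.positive for a in relevant):
        return False
    return any(a.positive for a in relevant)
```

Under this strategy John can write EMP as a whole but not EMP.Salary, because the more specific denial overrides the broader grant. Other strategies (most specific wins, most recent wins) would give different answers.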

31 Privacy Policies l Privacy constraints processing - Simple Constraint: an attribute of a document is private - Content-based constraint: If a document contains information about X, then it is private - Association-based Constraint: Two or more documents taken together are private; individually each document is public - Release constraint: After X is released Y becomes private l Augment a database system with a privacy controller for constraint processing
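An association-based constraint can be illustrated with a small privacy controller that remembers what it has already released. This is a sketch under assumptions: the PrivacyController class, its request method and the document names are hypothetical, and a real controller would track releases per subject and handle the other constraint types as well.

```python
class PrivacyController:
    """Blocks a release if it would complete a private combination."""

    def __init__(self, private_combinations):
        # Each combination is a set of documents that is private when taken together.
        self.private_combinations = [frozenset(c) for c in private_combinations]
        self.released = set()

    def request(self, doc):
        """Release doc unless doing so would expose a private combination."""
        candidate = self.released | {doc}
        if any(combo <= candidate for combo in self.private_combinations):
            return False
        self.released.add(doc)
        return True


# Hypothetical constraint: names and diagnoses are each public, but private together.
controller = PrivacyController([{"patient_names", "diagnoses"}])
first = controller.request("patient_names")
second = controller.request("diagnoses")
third = controller.request("billing_codes")
```

The first request succeeds, the second is refused because it would complete the private pair, and the unrelated third request succeeds.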

32 Why RDF? l XML cannot be used to specify semantics l Example: - Professor is a subclass of Academic Staff - Professor inherits all properties of Academic Staff l RDF was specified so that the inadequacies of XML could be handled l RDF uses XML Syntax l Additional constructs are needed for RDF

33 RDF l Resource Description Framework is the essence of the semantic web l Adds semantics with the use of ontologies, XML syntax l RDF Concepts - Basic Model l Resources, Properties and Statements - Container Model l Bag, Sequence and Alternative

34 RDF Basics l Resource: Everything is a resource - Person, Vehicle, etc. l Property: properties describe relationships between resources - E.g., Invented l Statement: (Object, Property, Value) Triple - Berners Lee invented the Semantic Web

35 RDF Container Model l Bag: Unordered container, may contain multiple occurrences - rdf:Bag l Seq: Ordered container, may contain multiple occurrences - rdf:Seq l Alt: a set of alternatives - rdf:Alt

36 RDF Specification <rdf:RDF xmlns:rdf = “http://www.w3.org/1999/02/22-rdf-syntax-ns#” xmlns:xsd = “http:// - - - xmlns:uni = “http:// - - - - <rdf:Description rdf:about = “949352” Professor <rdf:Description rdf:about = “ZZZ” semantic web

37 RDF Specification l RDF specifications have been given for Attributes, Types, Nesting, Containers, etc. l How can security policies be included in the specification? l Example: consider the statement “Berners-Lee is the Author of the book Semantic Web” l Do we allow access to the connection between author and book? Do we allow access to the connection but not to the author name and book name?

38 RDF Policy Specification <rdf:RDF xmlns:rdf = “http://www.w3.org/1999/02/22-rdf-syntax-ns#” xmlns:xsd = “http:// - - - xmlns:uni = “http:// - - - - <rdf:Description rdf:about = “949352” Professor Level = L1 <rdf:Description rdf:about = “ZZZ” semantic web Level = L2

39 RDF Schema l Need RDF Schema to specify statements such as professor is a subclass of academic staff <rdfs: Class rdf: ID = “professor” The class of Professors All professors are Academic Staff Members.

40 RDF Schema: Security Policies l How can security policies be specified? <rdfs: Class rdf: ID = “professor” The class of Professors All professors are Academic Staff Members. Level = L

41 RDF Axiomatic Semantics l First order logic to specify formulas and inferencing - Built in functions (First) and predicates (Type) - Modus Ponens - From A and If A then B, deduce B l Example: All containers are Resources - Type(?c, Container) → Type(?c, Resource) - If we have Type(A, Container) then we can infer Type(A, Resource)

42 RDF Inferencing l While first order logic provides a proof system, it can be computationally infeasible l As a result, Horn clause logic was developed for logic programming; this is still computationally expensive l RDF uses If-then rules l IF E contains the triples (?u, rdfs:subClassOf, ?v) and (?v, rdfs:subClassOf, ?w) THEN E also contains the triple (?u, rdfs:subClassOf, ?w). That is, if u is a subclass of v, and v is a subclass of w, then u is a subclass of w
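The subclass rule above can be applied mechanically until no new triples appear. The sketch below stores triples as plain Python tuples and repeats the rule to a fixed point; the uni: class names are hypothetical, and a production system would use an RDF library and reasoner rather than this naive loop.

```python
SUBCLASS = "rdfs:subClassOf"


def subclass_closure(triples):
    """Apply the transitivity rule for rdfs:subClassOf until nothing changes."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current subclass pairs so the set can grow during the pass.
        pairs = [(s, o) for (s, p, o) in closed if p == SUBCLASS]
        for u, v in pairs:
            for v2, w in pairs:
                if v == v2 and (u, SUBCLASS, w) not in closed:
                    closed.add((u, SUBCLASS, w))
                    changed = True
    return closed


# Hypothetical class hierarchy for illustration.
E = {
    ("uni:professor", SUBCLASS, "uni:academicStaff"),
    ("uni:academicStaff", SUBCLASS, "uni:staff"),
}
closure = subclass_closure(E)
```

After the call, closure contains the two original triples plus the inferred triple stating that uni:professor is a subclass of uni:staff.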

43 RDF Query l One can query RDF using XML, but this will be very difficult as RDF is much richer than XML l Is there an analogy between say XQuery and a query language for RDF? l RQL – an SQL-like language has been developed for RDF l Select from “RDF document” where some “condition”
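The select-from-where shape of such a query can be imitated over a list of triples in plain Python. This is an analogy only: it is not RQL syntax, the select function is hypothetical, and the sample triples are made up from examples used elsewhere in these slides.

```python
# Sample triples (subject, predicate, object); values are illustrative only.
TRIPLES = [
    ("Berners-Lee", "invented", "Semantic Web"),
    ("Berners-Lee", "type", "Person"),
    ("uni:949352", "type", "Professor"),
]


def select(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# "Select from TRIPLES where predicate = type"
typed = select(TRIPLES, predicate="type")
# "Select from TRIPLES where subject = Berners-Lee and predicate = invented"
inventions = select(TRIPLES, subject="Berners-Lee", predicate="invented")
```

A real RDF query language additionally understands the schema, for example returning instances of subclasses when a class is queried, which plain pattern matching does not do.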

44 Ontology l Common definitions for any entity, person or thing l Several ontologies have been defined and are available for use l Defining a common ontology for an entity is a challenge l Mappings have to be developed for multiple ontologies l Specific languages have been developed for ontologies

45 Why is RDF not sufficient? l RDF was developed as XML is not sufficient to specify semantics - E.g., class/subclass relationship l RDF has issues also - Cannot express several other properties such as Union, Intersection, relationships, etc. l Need a richer language l Ontology languages were developed by the semantic web community for this purpose l Essentially RDF is not sufficient to specify ontologies

46 OWL: Background l It’s a language for ontologies and relies on RDF l DARPA (Defense Advanced Research Projects Agency) developed the early language DAML (DARPA Agent Markup Language) l Europeans developed OIL (Ontology Inference Layer) l DAML+OIL combines both and was the starting point for OWL l OWL was developed by W3C

47 OWL Features l Subclass relationship l Class membership l Equivalence of classes l Classification l Consistency (e.g., x is an instance of A, A is a subclass of B, x is not an instance of B) l Three types of OWL: OWL-Full, OWL-DL, OWL-Lite l Automated tools for managing ontologies - Ontology engineering
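The consistency example above (x is an instance of A, A is a subclass of B, yet x is asserted not to be an instance of B) can be detected with a simple check. The sketch below is an assumption-laden illustration using plain sets of pairs; the function names are hypothetical and a real OWL reasoner handles far more than this one pattern.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls via (sub, sup) pairs, transitively."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def find_inconsistencies(instance_of, subclass_of, not_instance_of):
    """Return (individual, class) pairs that are both implied and denied."""
    conflicts = set()
    for x, cls in instance_of:
        implied = {cls} | superclasses(cls, subclass_of)
        for b in implied:
            if (x, b) in not_instance_of:
                conflicts.add((x, b))
    return sorted(conflicts)


# The example from the slide: x in A, A subclass of B, x not in B.
conflicts = find_inconsistencies({("x", "A")}, {("A", "B")}, {("x", "B")})
```

For the slide's example the check reports the pair (x, B), since membership in B is implied by the subclass relationship but explicitly denied.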

48 OWL Specification (e.g., Classes) Faculty and Academic Staff Member are the same Associate Professor is not a professor Associate professor is not an Assistant professor

49 OWL Specification (e.g., Property) Courses are taught by Academic staff members

50 OWL Specification (e.g., Property Restriction) All first year courses are taught only by professors

51 Policies in OWL: Example Level = L1 Level = L2

52 Logic and Inference l First order predicate logic l High level language to express knowledge l Well understood semantics l Logical consequence - inference l Proof systems exist l Sound and complete l OWL is based on a subset of logic: description logic

53 Why Rules? l RDF is built on XML and OWL is built on RDF l We can express subclass relationships in RDF; additional relationships can be expressed in OWL l However reasoning power is still limited in OWL l Therefore the need for rules and subsequently a markup language for rules so that machines can understand

54 Rule Markup l The various components of logic are expressed in the Rule Markup Language, RuleML l Both monotonic and nonmonotonic rules can be represented l Example representation of Fact P(a) - a is a parent p a

55 Types of Application l Horizontal Information Products at Elsevier: Integration l Data integration at Audi: Integration l Skill finding at Swiss Life: Search l Think Tank Portal at EnterSearch: Knowledge management l E-Learning: Knowledge management l Multimedia Collection at Scotland Yard: Searching l Online Procurement at Daimler Chrysler: E-Business l Device Interoperability at Nokia: Interoperability

56 Horizontal Information Products at Elsevier l Elsevier is a publishing company based in Amsterdam - E.g., publisher of the journal Computer Standards and Interfaces, which has papers on all kinds of computer related standards l Currently the journals and books are grouped by topics such as operating systems, databases, etc. (or at a higher level, Biology, Chemistry, etc.) l Where do we then put the journal Computer Standards and Interfaces? l Need horizontal groupings also

57 Horizontal Information Products at Elsevier l Semantic web technologies are being used by Elsevier - RDF for document representation - RDF for ontologies - Query language based on RDF to query the documents and the ontologies - E.g. Life Science Thesaurus EMTREE - Other publishing companies are following in Elsevier’s direction

58 Data Integration at Audi l Integrate the data in multiple data sources to provide better customer relationship management and other services to improve profits l The databases are disparate and heterogeneous l Many current operations are carried out manually l Expensive and missed opportunities

59 Data Integration at Audi l Ontologies are being specified to address semantic heterogeneity l E.g., SLR is a type of camera; one application calls it SLR, another application calls it Olympus-OM-10 l When the latter application encounters the term SLR, it will query the ontology and determine that SLR is a camera l Details are given in Chapter 6
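The ontology lookup in the camera example can be sketched as a walk up an is-a hierarchy. The IS_A dictionary and the ancestors helper are hypothetical, and placing Olympus-OM-10 under SLR is an assumption made for illustration; the slides only state that SLR is a type of camera.

```python
# Hypothetical is-a hierarchy: term -> more general term.
IS_A = {
    "Olympus-OM-10": "SLR",  # assumed placement, not stated on the slide
    "SLR": "camera",
}


def ancestors(term):
    """Return the chain of more general terms, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(term, concept):
    """True if term is the concept itself or one of its specializations."""
    return term == concept or concept in ancestors(term)
```

An application that only knows the term Olympus-OM-10 can call is_a("SLR", "camera") when it meets the unfamiliar term SLR and learn that it denotes a camera.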

60 Skill Finding at Swiss Life l Swiss Life is an insurance company that developed a system to find all the skills in the company - E.g., John’s skills are on data management, ontology management l Challenging problem as people have multiple skills for different applications l Need the following capabilities - Cross listing of skills - Querying skills - - - - -

61 Skill Finding at Swiss Life l Ontologies are being developed to specify the skills and query languages to query the ontologies l E.g. - - -

62 Think Tank Portal at EnterSearch l EnterSearch is a consortium of corporations in Europe that provide IT for the energy companies l Similar to MCC in Austin TX l EnterSearch Portal currently describes the various research projects, papers etc. l XML representation is used for describing the web content l Need to represent semantics so that the corporations can get answers to useful questions of the form - “where do I put my computing resources to solve a problem?”

63 Think Tank Portal at EnterSearch l Semantic web technologies are being utilized; in particular ontologies are developed for the following - Hardware - Software - Communications - E-Commerce - Agents - Market/Auction - Resource Allocation - - - - -

64 E-Learning l With the Internet and the web, we now have on-line universities, course offerings, tutoring etc. l Students should have the choice of selecting various courses in the order they want, provided they take the prerequisites l Semantic web technologies enable flexible access as well as integration of various data sources and processes to enable learning l Ontologies are being developed for learning applications - E.g., Contents of the courses - Description of the courses etc.

65 Multimedia Collection Indexing at Scotland Yard l Scotland Yard uses a database to keep track of the antiques that are stolen l While sophisticated indexing techniques have been developed, there is a problem with semantics l E.g., Red cushioned chair could also be described as Queen Anne chair l Ontologies for describing semantics l Need more details of the project

66 On-line Procurement at Daimler Chrysler l Daimler Chrysler interacts with numerous suppliers to develop a product l Standards developed by RosettaNet for E-Business are being used for interoperability - XML syntax only; no semantics of the product descriptions are available l Ontologies for describing the various product descriptions including the semantics are the long term goal for seamless integration of the supply chain operation l Need more details of the project

67 Device Interoperability at Nokia l Nokia’s objective is to integrate multiple devices (cell phone, PDA, cars, laptop etc.) to provide a pervasive computing environment l Objective is to locate the various services and understand the different devices and their functions - Need to describe the various services - Current technology provides syntactic descriptions l Semantic web technologies, through ontologies, enable understanding the devices and reasoning about their functions l Need more details of the project

68 Common Threads and Challenges l Common Threads - Building Ontologies for Semantics - XML for Syntax l Challenges - Scalability, Resolvability - Security policy specification, Securing the documents and ontologies - Developing applications for secure semantic web technologies - Automated tools for ontology management - ONTOLOGY ENGINEERING

69 What is Ontology Engineering? l Tools and Techniques to - Create Ontologies - Specify Ontologies - Maintain Ontologies - Query Ontologies - Evolve Ontologies - Reuse Ontologies - Incorporate features such as security, data quality, integrity

70 Manual Construction of Ontologies l Determine Scope l Consider Reuse l Enumerate Terms l Define Taxonomy l Define Properties l Define facets l Define Instances l Check for Anomalies

71 Reusing Existing Ontologies l The goal is not to reinvent the wheel l Several ontologies have been developed for different domains l Codified Bodies of Expert Knowledge l Integrated Vocabularies l Upper Level Ontologies l Topic Hierarchies l Linguistic Resources l Ontology Libraries

72 Semi-Automatic Methods for Ontology Generation l Much of the research is focusing on using tools to develop ontologies from multiple heterogeneous data sources l Essentially extracting concepts and expanding on concepts from the data sources l Uses a combination of data integration, metadata extraction, and machine learning techniques l E.g. Clustering of concepts, Classification of concepts etc. l Text Book describes Semantic Web Knowledge Management Architecture

73 Web Services l Web services can be utilized by any of the other applications discussed in this unit l Web services are invoked to carry out functions on the web, including finding locations, searching for documents etc. l Simple services and compound services l Three components to the service - Service profile: Description of the service, i.e., what it does - Service model: how it does it - Service grounding: protocol for invoking the service l E.g., - - - - - -

74 Web service architecture [Diagram labels: Service requestor; Service providers; UDDI; Publish; Query; Answer; Request the service]

75 Secure Web Service Architecture Confidentiality, Authenticity, Integrity [Diagram labels: Service requestor; Service provider; UDDI; Query; BusinessEntity; BusinessService; BindingTemplate; BusinessService; tModel; PublisherAssertion]

76 Directions l Need tools for developing semantic web technologies - XML documents, RDF documents, Ontologies, etc. l How to integrate the multiple ontologies and tools? l Role of Agents: agents are processes that reason with semantic web technologies l Semantic web services, data mining, knowledge management integrated

