1
XML on Semantic Web
2
Outline
The Semantic Web
Ontology
XML
Probabilistic DTD
References
3
The Semantic Web (1/4)
The first-generation Web
The second-generation Web: the current Web
The third-generation Web: the Semantic Web
The conceptual structuring of the Web in an explicit, machine-readable way
Requirements: universal expressive power, support for syntactic interoperability, support for semantic interoperability
4
The Semantic Web (2/4)
Syntactic interoperability concerns parsing the data; semantic interoperability means defining mappings between unknown terms and known terms in the data
Semantic interoperability requires standards for both the syntactic form of documents and their semantic content
A further representation and inference layer is needed on top of the currently available layers of the WWW: ontology
5
The Semantic Web (3/4)
6
The Semantic Web (4/4)
7
Ontology (1/5)
An explicit, machine-readable specification of a shared conceptualization
Crucial role:
– Represents a shared conceptualization of a particular domain
– Reusable
– Helps find pages that contain syntactically different but semantically similar words
Constructs: concepts (usually organized in taxonomies), relations, functions, axioms, instances
8
Ontology (2/5)
9
Ontology (3/5)
Concepts:
– Can be anything about which something is said
– Also known as classes (XOL, RDF(S), OIL, DAML+OIL), objects (OML), categories (SHOE)
Taxonomies:
– Organize ontological knowledge through generalization and specialization relationships, which allow single and multiple inheritance
10
Ontology (4/5)
Relations and functions:
– An interaction between concepts of the domain and attributes
– Called relations in SHOE and OML, roles in OIL
– Functions are a special kind of relation
Axioms:
– Used for constraining information, verifying correctness, and deducing new information
– Also known as assertions (OML), rules, logic
11
Ontology (5/5)
Instances:
– Represent elements in the domain attached to a specific concept
Measurement of expressiveness:
– XOL, RDF(S), SHOE, OML, OIL, DAML+OIL
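The ontology constructs listed on these slides (concepts, taxonomies, relations, axioms, instances) can be illustrated with a small data structure. This is only a sketch: the class names, the example concepts (`Person`, `Employee`, `Student`, `TeachingAssistant`, `Course`) and the `teaches` axiom are hypothetical and are not taken from any of the ontology languages named above.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept (class); `parents` encodes the taxonomy."""
    name: str
    parents: list = field(default_factory=list)  # more than one parent = multiple inheritance

    def ancestors(self) -> set:
        """Names of all concepts this concept specializes, directly or indirectly."""
        names = set()
        for parent in self.parents:
            names.add(parent.name)
            names |= parent.ancestors()
        return names


@dataclass
class Instance:
    """An element of the domain attached to a specific concept."""
    name: str
    concept: Concept

    def is_a(self, concept_name: str) -> bool:
        return concept_name == self.concept.name or concept_name in self.concept.ancestors()


# Taxonomy (hypothetical): TeachingAssistant inherits from both Employee and Student.
person = Concept("Person")
employee = Concept("Employee", [person])
student = Concept("Student", [person])
teaching_assistant = Concept("TeachingAssistant", [employee, student])

ann = Instance("ann", teaching_assistant)
logic = Instance("logic101", Concept("Course"))
instances = {i.name: i for i in (ann, logic)}

# Relations as (subject, relation, object) triples between instances.
relations = {("ann", "teaches", "logic101")}


def teaches_axiom(relations, instances) -> bool:
    """Hypothetical axiom: whoever teaches something must be an Employee."""
    return all(instances[s].is_a("Employee") for s, r, _ in relations if r == "teaches")
```

Here the axiom plays the verifying role mentioned on the slide: it checks the stated relations against the taxonomy rather than adding new facts.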
12
XML (1/7)
As a serialization syntax for other markup languages, e.g. SMIL, XOL, SHOE
As semantic markup of Web pages
As a uniform data-exchange format
13
XML (2/7)
Universal expressive power: anything can be encoded in XML if a grammar can be defined for it
Syntactic interoperability: an XML parser can parse any XML data and is usually a reusable component
Semantic interoperability: XML offers no way of recognizing a semantic unit from a particular domain of interest (not yet widely recognized)
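The gap between syntactic and semantic interoperability can be seen with any generic parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the two documents and their tag names are made up for illustration. The same parser handles both documents, yet nothing in the parse result says that `author` and `creator` denote the same thing.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents describing the same fact with different vocabularies.
doc_a = "<book><author>Ada</author></book>"
doc_b = "<publication><creator>Ada</creator></publication>"

# Syntactic interoperability: one reusable parser handles both documents.
root_a = ET.fromstring(doc_a)
root_b = ET.fromstring(doc_b)

tags_a = [el.tag for el in root_a.iter()]
tags_b = [el.tag for el in root_b.iter()]

# No semantic interoperability: the tag sets are disjoint, so a program that
# only sees the parse trees has no basis for relating the two documents.
shared_tags = set(tags_a) & set(tags_b)
```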
14
XML (3/7)
15
XML (4/7)
Data exchange:
– Build a model of the domain of interest
– From the domain model, a DTD or an XML Schema is constructed
Advantage: reusability of the parsing software components
There are multiple ways to encode a given domain model in a DTD, so the direct connection from the DTD to the domain model is lost and cannot easily be reconstructed
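A small illustration of the "multiple encodings" point: the same domain fact (a course with a title and a lecturer, a made-up example) can be encoded with attributes or with child elements. Each encoding needs its own reader, and a reader written for one encoding finds nothing in the other. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The same domain model instance under two possible DTD designs.
as_attributes = '<course title="Logic" lecturer="Smith"/>'
as_elements = "<course><title>Logic</title><lecturer>Smith</lecturer></course>"


def read_attribute_encoding(xml_text):
    """Reader for the design that stores properties as attributes."""
    root = ET.fromstring(xml_text)
    return {"title": root.get("title"), "lecturer": root.get("lecturer")}


def read_element_encoding(xml_text):
    """Reader for the design that stores properties as child elements."""
    root = ET.fromstring(xml_text)
    return {"title": root.findtext("title"), "lecturer": root.findtext("lecturer")}


model_1 = read_attribute_encoding(as_attributes)
model_2 = read_element_encoding(as_elements)
mismatch = read_element_encoding(as_attributes)  # wrong reader for this encoding
```

Both readers recover the same model only because the encoding decision was hard-coded into each of them; the documents themselves do not record it.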
16
XML (5/7)
17
XML (6/7)
A direct mapping based on the different DTDs is not possible
So the mappings have to be defined between the different domain models first, and then between the different DTDs:
– Reengineering the original domain model from the DTD or XML Schema
– Establishing mappings between the entities in the domain models
– Defining translation procedures for XML documents
Using a more suitable formalism than pure XML can save much of this additional effort
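The three steps above can be sketched for two hypothetical DTDs, A (`<book>` with `author` and `title` child elements) and B (`<publication>` with `creator` and `name` attributes). All names and the entity mapping are assumptions made for the example; in practice each step is manual engineering work, which is the effort the slide refers to.

```python
import xml.etree.ElementTree as ET


def reengineer_model_a(xml_text):
    """Step 1: recover domain-model entities from a document that follows DTD A."""
    root = ET.fromstring(xml_text)
    return {"author": root.findtext("author"), "title": root.findtext("title")}


# Step 2: mapping between entities of domain model A and domain model B.
ENTITY_MAP_A_TO_B = {"author": "creator", "title": "name"}


def translate_a_to_b(xml_text):
    """Step 3: translation procedure producing a document that follows DTD B."""
    model_a = reengineer_model_a(xml_text)
    attributes = {ENTITY_MAP_A_TO_B[key]: value for key, value in model_a.items()}
    element_b = ET.Element("publication", attributes)
    return ET.tostring(element_b, encoding="unicode")


doc_a = "<book><author>Ada</author><title>Notes</title></book>"
doc_b = translate_a_to_b(doc_a)
parsed_b = ET.fromstring(doc_b)
```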
18
XML (7/7)
19
Probabilistic DTD (1/11)
A DTD that describes the most likely orderings of XML tags and contains statistical properties for each tag
Derived using association rule discovery algorithms and sequence mining techniques
20
Probabilistic DTD (2/11)
Objectives: tagging all text documents and deriving an appropriate preliminary flat XML DTD
– A knowledge discovery in textual databases (KDT) process builds clusters of semantically similar text units, so that new documents can then be converted into XML documents
21
Probabilistic DTD (3/11)
UML schema: initially conceived by experts, it serves as a reference for the DTD, but there is no guarantee that the final DTD will be contained in or contain this schema
KDT process:
– Tagging initial text documents
– Domain knowledge, such as a thesaurus and the preliminary UML schema, is an input to the process
– Pre-processing
– Iterative clustering
– Post-processing
– Establishing a probabilistic DTD
22
Probabilistic DTD (4/11)
23
Probabilistic DTD (5/11)
Pre-processing:
– Setting the level of granularity
– NLP processing such as tokenization, normalization, and word stemming
– Building text unit descriptors, which form a reduced feature space (currently chosen by the engineer)
– Mapping all text units into Boolean vectors of this feature space
– Extracting named entities
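The mapping of text units into Boolean vectors can be sketched as follows. This is a toy illustration, not the pipeline from the cited work: the suffix-stripping `crude_stem` is a stand-in for a real stemmer, the descriptor list is a made-up example of what an engineer might choose, and named entity extraction is left out.

```python
import re

# Hypothetical text unit descriptors (stems) chosen by the engineer.
DESCRIPTORS = ["company", "manager", "capital"]


def tokenize(text):
    """Tokenization plus normalization to lower case."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Very rough stemming stand-in: strip one common suffix from longer words."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def to_boolean_vector(text_unit):
    """Map a text unit to a Boolean vector over the descriptor feature space."""
    stems = {crude_stem(token) for token in tokenize(text_unit)}
    return [descriptor in stems for descriptor in DESCRIPTORS]


vector_1 = to_boolean_vector("The company appoints two managers.")
vector_2 = to_boolean_vector("Share capital is increased.")
```

With sentences as the level of granularity, each sentence becomes one vector, and only the descriptor dimensions survive, which is what makes the feature space "reduced".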
24
Probabilistic DTD (6/11)
Clustering:
– Performed in multiple iterations; each iteration outputs a set of clusters
– All text unit vectors are clustered
– Clusters are partitioned into “acceptable” and “unacceptable” according to quality criteria
– Members of “unacceptable” clusters are the input data for the next iteration
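The iteration scheme can be sketched independently of the clustering algorithm actually used. In the sketch below, both the algorithm (a simple leader clustering on Hamming distance, with the threshold relaxed in each iteration) and the quality criterion (a minimum cluster size) are assumptions chosen to keep the example short; the slides do not specify either.

```python
def hamming(a, b):
    """Number of positions in which two Boolean vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def leader_clustering(vectors, max_distance):
    """Assign each vector to the first cluster whose leader is close enough."""
    clusters = []  # each cluster is a list of vectors; its first member is the leader
    for vector in vectors:
        for cluster in clusters:
            if hamming(vector, cluster[0]) <= max_distance:
                cluster.append(vector)
                break
        else:
            clusters.append([vector])
    return clusters


def iterative_clustering(vectors, min_size=2, max_iterations=3):
    """Keep "acceptable" clusters; re-cluster members of "unacceptable" ones."""
    accepted, remaining = [], list(vectors)
    for iteration in range(max_iterations):
        if not remaining:
            break
        clusters = leader_clustering(remaining, max_distance=iteration)
        remaining = []
        for cluster in clusters:
            if len(cluster) >= min_size:  # assumed quality criterion
                accepted.append(cluster)
            else:
                remaining.extend(cluster)
    return accepted, remaining


vectors = [(1, 1, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1), (1, 0, 0)]
accepted, leftover = iterative_clustering(vectors)
```

In this run the first iteration accepts the two identical vectors, the second accepts the pair that differs in one position, and one vector never reaches an acceptable cluster.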
25
Probabilistic DTD (7/11)
Post-processing:
– “Acceptable” clusters are semi-automatically assigned a label
– Ultimately, cluster labels are determined by the engineer
– All default cluster labels are derived from text unit descriptors
– An XML DTD is derived automatically from the XML tags
26
Probabilistic DTD (8/11)
27
Probabilistic DTD (9/11)
Establishing a probabilistic DTD:
– Deriving the most likely ordering of the tags
– Computing the statistical properties of each tag inside the document type definition
Deriving the ordering of the tags:
– Backward construction of DTD sequences: builds “maximal” sequences
– Forward sequence construction
28
Probabilistic DTD (10/11)
Backward construction of DTD sequences:
– Starts with an arbitrary tag Y and then identifies the tag most likely to appear before it
– If no such tag exists, the procedure shifts to the next sequence. If there is one, the next iteration starts. If there are k such tags, the incomplete sequence is duplicated, yielding k incomplete sequences
– Each tag X_i leads to Y with a confidence C_i
– If one C_i is larger than the others, then X_i is the predecessor of Y in the sequence
– If C_0, the confidence that Y has no predecessor, is the largest, then Y is the first element
– Confidence is the tag’s TagSupport multiplied by the accuracy
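The backward construction can be sketched on a few tagged documents, each represented as an ordered list of tags. This is a simplification under stated assumptions: confidence is estimated here as the relative frequency with which a tag immediately precedes the current one, which stands in for the TagSupport-times-accuracy measure on the slide, and ties are not duplicated into k sequences. The tag names and function names are hypothetical.

```python
from collections import Counter


def predecessor_confidences(documents, tag):
    """Relative frequency of each immediate predecessor of `tag`.

    The key None collects the cases where `tag` has no predecessor (C_0).
    Assumption: this frequency replaces TagSupport multiplied by accuracy.
    """
    counts = Counter()
    total = 0
    for doc in documents:
        for position, current in enumerate(doc):
            if current == tag:
                total += 1
                counts[doc[position - 1] if position > 0 else None] += 1
    return {pred: n / total for pred, n in counts.items()} if total else {}


def build_backward(documents, start_tag):
    """Prepend the most likely predecessor until 'no predecessor' wins."""
    sequence = [start_tag]
    while True:
        confidences = predecessor_confidences(documents, sequence[0])
        if not confidences:
            break
        best = max(confidences, key=confidences.get)
        if best is None or best in sequence:  # first element reached, or a cycle
            break
        sequence.insert(0, best)
    return sequence


documents = [
    ["Company", "Capital", "Manager"],
    ["Company", "Manager"],
    ["Company", "Capital", "Manager"],
]
sequence = build_backward(documents, "Manager")
```

Starting from `Manager`, the most frequent predecessor is `Capital`, whose only predecessor is `Company`, which always appears first, so the construction stops there.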
29
Probabilistic DTD (11/11)
30
References
The Semantic Web - on the respective Roles of XML and RDF – Stefan Decker, Frank van Harmelen, Jeen Broekstra, Michael Erdmann, Dieter Fensel, Ian Horrocks, Michel Klein, Sergey Melnik
Intelligent Information Agent with Ontology on the Semantic Web – Weihua Li
Ontology Languages for the Semantic Web – Asuncion Gomez-Perez, Oscar Corcho
Extraction of Semantic XML DTDs from Texts Using Data Mining Techniques – Karsten Winkler, Myra Spiliopoulou