XML on Semantic Web. Outline The Semantic Web Ontology XML Probabilistic DTD References.

XML on Semantic Web

Outline The Semantic Web Ontology XML Probabilistic DTD References

The Semantic Web (1/4)
The first generation Web
The second generation Web: the current Web
The third generation Web: the Semantic Web
The conceptual structuring of the Web in an explicit, machine-readable way
Requirements: universal expressive power, support for syntactic interoperability, support for semantic interoperability

The Semantic Web (2/4)
Syntactic interoperability concerns parsing the data; semantic interoperability means defining mappings between unknown terms and known terms in the data
Semantic interoperability requires standards for both the syntactic form of documents and their semantic content
A further representation and inference layer is needed on top of the currently available layers of the WWW: the ontology

The Semantic Web (3/4)

The Semantic Web (4/4)

Ontology (1/5)
An explicit, machine-readable specification of a shared conceptualization
Crucial role:
– Represents a shared conceptualization of a particular domain
– Is reusable
– Helps find pages that contain syntactically different but semantically similar words
Constructs: concepts (usually organized in taxonomies), relations, functions, axioms, instances

Ontology (2/5)

Ontology (3/5)
Concepts:
– Can be anything about which something is said
– Also known as classes (XOL, RDF(S), OIL, DAML+OIL), objects (OML), categories (SHOE)
Taxonomies:
– Used to organize ontological knowledge through generalization and specialization relationships, to which simple and multiple inheritance can be applied
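A taxonomy with multiple inheritance can be illustrated with a small sketch. The concept names below (`Person`, `WorkingStudent`, and so on) are hypothetical and do not come from the slides; the dictionary layout is one possible representation, not the format of any particular ontology language.

```python
# Hypothetical taxonomy: each concept maps to its direct superconcepts.
TAXONOMY = {
    "Thing": [],
    "Person": ["Thing"],
    "Employee": ["Person"],
    "Student": ["Person"],
    "WorkingStudent": ["Employee", "Student"],  # multiple inheritance
}


def ancestors(concept, taxonomy=TAXONOMY):
    """Return every concept that generalizes `concept`, following all parent links."""
    seen = set()
    stack = list(taxonomy[concept])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(taxonomy[parent])
    return seen


def is_a(concept, general, taxonomy=TAXONOMY):
    """True if `concept` equals `general` or specializes it."""
    return concept == general or general in ancestors(concept, taxonomy)
```

Here `is_a("WorkingStudent", "Person")` holds through either parent, which is the kind of inheritance a taxonomy is meant to support.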

Ontology (4/5)
Relations and functions:
– An interaction between concepts of the domain and attributes
– Called relations in SHOE and OML, roles in OIL
– Functions are a special kind of relation
Axioms:
– Used for constraining information, verifying correctness, and deducing new information
– Also known as assertions (OML), rules, logic
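The two roles of axioms named on this slide, verifying correctness and deducing new information, might be sketched as follows. The relation `worksFor`, the instance names, and the choice of a domain/range check and a transitivity rule are all assumptions made for illustration.

```python
# Hypothetical relation declarations: name -> (domain concept, range concept).
RELATIONS = {"worksFor": ("Person", "Organization")}

# Hypothetical instances and the concept each one belongs to.
TYPES = {"alice": "Person", "acme": "Organization", "paris": "City"}


def violates_axiom(subject, relation, obj):
    """Domain/range axiom: True if the fact applies the relation to the wrong concepts."""
    domain, range_ = RELATIONS[relation]
    return TYPES[subject] != domain or TYPES[obj] != range_


def transitive_closure(pairs):
    """Transitivity axiom: deduce (a, c) whenever (a, b) and (b, c) hold."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure
```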

Ontology (5/5)
Instances:
– Represent elements in the domain attached to a specific concept
Measurement of the expressiveness:
– XOL, RDF(S), SHOE, OML, OIL, DAML+OIL

XML (1/7)
As a serialization syntax for other markup languages, e.g. SMIL, XOL, SHOE
As semantic markup of Web pages
As a uniform data-exchange format

XML (2/7)
Universal expressive power: anything can be encoded in XML if a grammar can be defined for it
Syntactic interoperability: an XML parser can parse any XML data and is usually a reusable component
Semantic interoperability: there is no way of recognizing a semantic unit from a particular domain of interest (not yet widely recognized)
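The gap between syntactic and semantic interoperability can be seen in a short sketch using Python's standard `xml.etree.ElementTree` parser. The two documents are invented for illustration: the same reusable parser reads both, yet nothing in the parse result says that `person` and `individual` denote the same concept.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that describe the same person with different vocabularies.
DOC_A = "<person><name>Ada Lovelace</name></person>"
DOC_B = "<individual fullName='Ada Lovelace'/>"


def tag_names(xml_text):
    """Parse any well-formed XML and return the set of element names it uses."""
    return {element.tag for element in ET.fromstring(xml_text).iter()}


tags_a = tag_names(DOC_A)  # syntactic interoperability: parsing succeeds
tags_b = tag_names(DOC_B)  # ... for both documents
shared = tags_a & tags_b   # but the vocabularies have no element name in common
```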

XML (3/7)

XML (4/7)
Data exchange:
– Build a model of the domain of interest
– From the domain model, a DTD or an XML Schema is constructed
Advantage: reusability of the parsing software components
There are multiple ways to encode a given domain model into a DTD, so the direct connection from the DTD to the domain model is lost and cannot easily be reconstructed
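As an illustration of how one domain model admits several encodings, the sketch below stores the same hypothetical book record once with child elements and once with attributes. The element names, attribute names, and reader functions are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Encoding 1: properties of the domain concept become child elements.
DOC_1 = "<book><title>Dune</title><author>Frank Herbert</author></book>"

# Encoding 2: the same properties become attributes.
DOC_2 = '<book title="Dune" author="Frank Herbert"/>'


def read_encoding_1(xml_text):
    """Recover the domain record from the element-based encoding."""
    book = ET.fromstring(xml_text)
    return {"title": book.findtext("title"), "author": book.findtext("author")}


def read_encoding_2(xml_text):
    """Recover the domain record from the attribute-based encoding."""
    book = ET.fromstring(xml_text)
    return {"title": book.get("title"), "author": book.get("author")}
```

Each reader recovers the same record from its own encoding, but applying `read_encoding_1` to `DOC_2` finds nothing: the knowledge of which encoding was chosen lives outside the document.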

XML (5/7)

XML (6/7)
A direct mapping based on the different DTDs is not possible
So we have to define the mappings between the different domain models, then between the different DTDs:
– Reengineering of the original domain model from the DTD or XML Schema
– Establishing mappings between the entities in the domain model
– Defining translation procedures for XML documents
Using a more suitable formalism than pure XML can save much of the additional effort
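The three steps listed on this slide could look roughly like the following sketch. The source and target vocabularies (`employee`, `staff`) and the entity names in `ENTITY_MAP` are hypothetical; in practice every step would be written by hand for each pair of DTDs, which is the extra effort the slide refers to.

```python
import xml.etree.ElementTree as ET

# Step 2 input: hypothetical mapping between entities of the two domain models.
ENTITY_MAP = {"Employee.name": "Staff.fullName", "Employee.dept": "Staff.unit"}


def to_source_model(xml_text):
    """Step 1: reengineer a domain-model record from a source-DTD document."""
    root = ET.fromstring(xml_text)
    return {
        "Employee.name": root.findtext("name"),
        "Employee.dept": root.findtext("dept"),
    }


def map_entities(record):
    """Step 2: rename each entity according to the domain-model mapping."""
    return {ENTITY_MAP[key]: value for key, value in record.items()}


def to_target_xml(record):
    """Step 3: serialize the record under the (attribute-based) target DTD."""
    staff = ET.Element("staff")
    staff.set("fullName", record["Staff.fullName"])
    staff.set("unit", record["Staff.unit"])
    return ET.tostring(staff, encoding="unicode")


def translate(xml_text):
    """Run the full translation procedure for one document."""
    return to_target_xml(map_entities(to_source_model(xml_text)))
```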

XML (7/7)

Probabilistic DTD (1/11)
A DTD that describes the most likely orderings of XML tags and contains statistical properties for each tag
Uses association rule discovery algorithms and sequence mining techniques

Probabilistic DTD (2/11)
Objectives: tagging all text documents and deriving an appropriate preliminary flat XML DTD
– A knowledge discovery in textual databases (KDT) process builds clusters of semantically similar text units, so that new documents can then be converted into XML documents

Probabilistic DTD (3/11)
UML schema: initially conceived by experts, it serves as a reference for the DTD, but there is no guarantee that the final DTD will be contained in or contain this schema
KDT process:
– Tagging initial text documents
– Domain knowledge, such as a thesaurus and the preliminary UML schema, is input to the process
– Pre-processing
– Iterative clustering
– Post-processing
– Establishing a probabilistic DTD

Probabilistic DTD (4/11)

Probabilistic DTD (5/11)
Pre-processing:
– Setting the level of granularity
– NLP processing such as tokenization, normalization, and word stemming
– Building text unit descriptors, a reduced feature space (currently chosen by the engineer)
– Mapping all text units into Boolean vectors of this feature space
– Extracting named entities
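A minimal sketch of the pre-processing steps, assuming a toy suffix stripper in place of a real stemmer and a hand-picked descriptor list. `DESCRIPTORS`, `stem`, and the sample sentence are illustrative only; named entity extraction is not covered.

```python
import re

# Hypothetical text unit descriptors (stemmed); on the slide these are chosen by the engineer.
DESCRIPTORS = ["found", "manag", "capital"]


def tokenize(text):
    """Lower-case the text (normalization) and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Toy suffix stripper standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def to_boolean_vector(text, descriptors=DESCRIPTORS):
    """Map one text unit to a Boolean vector over the descriptor feature space."""
    stems = {stem(token) for token in tokenize(text)}
    return [descriptor in stems for descriptor in descriptors]


vector = to_boolean_vector("The company was founded in 1990 and is managed by two partners.")
```

With the descriptors above, the sample sentence activates the first two features and leaves `capital` unset.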

Probabilistic DTD (6/11)
Clustering:
– Performed in multiple iterations; each iteration outputs a set of clusters
– All text unit vectors are clustered
– Clusters are partitioned into "acceptable" and "unacceptable" according to quality criteria
– Members of "unacceptable" clusters are the input data to the next iteration
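The iteration scheme can be sketched as below. The slides do not name the clustering algorithm or the quality criteria, so everything specific here is an assumption: a greedy grouping by Jaccard similarity, a similarity threshold that is relaxed each round, and a minimum cluster size as the acceptance test.

```python
def jaccard(u, v):
    """Similarity of two Boolean vectors: shared true features over all true features."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


def greedy_clusters(vectors, threshold):
    """Assign each vector to the first cluster whose seed vector is similar enough."""
    clusters = []
    for vec in vectors:
        for cluster in clusters:
            if jaccard(cluster[0], vec) >= threshold:
                cluster.append(vec)
                break
        else:
            clusters.append([vec])
    return clusters


def iterative_clustering(vectors, thresholds=(0.9, 0.6, 0.3), min_size=2):
    """Cluster repeatedly; members of unacceptable clusters feed the next iteration."""
    accepted, remaining = [], list(vectors)
    for threshold in thresholds:
        if not remaining:
            break
        next_round = []
        for cluster in greedy_clusters(remaining, threshold):
            if len(cluster) >= min_size:      # assumed quality criterion
                accepted.append(cluster)
            else:
                next_round.extend(cluster)    # re-cluster in the next iteration
        remaining = next_round
    return accepted, remaining
```

Vectors that never land in an acceptable cluster are returned separately, so the caller can decide what to do with them.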

Probabilistic DTD (7/11)
Post-processing:
– "Acceptable" clusters are semi-automatically assigned a label
– Ultimately, cluster labels are determined by the engineer
– All default cluster labels are derived from text unit descriptors
– An XML DTD is derived automatically from the XML tags
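Deriving a flat DTD from a set of cluster labels might look like the sketch below. The root element name, the mixed-content model, and the way a default label is joined from descriptors are assumptions; the slides do not show the generated DTD.

```python
def default_label(descriptors):
    """Assumed default: join the cluster's text unit descriptors into one tag name."""
    return "_".join(descriptors)


def flat_dtd(root, tags):
    """Build a flat DTD: the root may mix text with any of the tags, in any order."""
    unique = sorted(set(tags))
    content = " | ".join(["#PCDATA"] + unique)
    lines = [f"<!ELEMENT {root} ({content})*>"]
    # Each tag only wraps text, so the DTD has no nesting below the root.
    lines += [f"<!ELEMENT {tag} (#PCDATA)>" for tag in unique]
    return "\n".join(lines)


dtd_text = flat_dtd("doc", ["Founding", "Capital", "Founding"])
```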

Probabilistic DTD (9/11)
Establishing a probabilistic DTD:
– Deriving the most likely ordering of the tags
– Computing the statistical properties of each tag inside the document type definition
Deriving the ordering of the tags:
– Backward construction of DTD sequences: builds "maximal" sequences
– Forward sequence construction

Probabilistic DTD (10/11)
Backward construction of DTD sequences
– Starts with an arbitrary tag Y and then identifies the tag most likely to appear before it
– If no such tag exists, it shifts to the next sequence. If there is one, the next iteration starts. If there are k such tags, it duplicates the incomplete sequence k times.
– Each tag X_i leads to Y with a confidence C_i
– If one C_i is larger than the others, then X_i is the predecessor of Y in the sequence
– If C_0, the confidence that Y has no predecessor, is the largest, then Y is the first element
– Confidence is the tag's TagSupport multiplied by the accuracy
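A simplified sketch of the backward construction follows. It assumes the confidences are already available as dictionaries, skips the duplication of sequences when several predecessors tie (it takes the first maximum instead), and refuses to revisit a tag so that the loop terminates. The tag names and numbers in the usage example are invented.

```python
def confidence(tag_support, accuracy):
    """Confidence as stated on the slide: TagSupport multiplied by accuracy."""
    return tag_support * accuracy


def build_backward(start, predecessors, start_confidence):
    """Extend a sequence backward from `start` until a tag is most likely first.

    predecessors: tag -> {candidate predecessor: confidence C_i}
    start_confidence: tag -> confidence C_0 that the tag has no predecessor
    """
    sequence = [start]
    current = start
    while True:
        candidates = {
            tag: conf
            for tag, conf in predecessors.get(current, {}).items()
            if tag not in sequence  # assumption: never revisit a tag
        }
        if not candidates:
            break  # no predecessor exists for the current tag
        best = max(candidates, key=candidates.get)
        if start_confidence.get(current, 0.0) >= candidates[best]:
            break  # C_0 is largest: the current tag is the first element
        sequence.insert(0, best)
        current = best
    return sequence


# Hypothetical confidences for three tags.
PREDECESSORS = {
    "Capital": {"Founding": confidence(0.8, 0.5), "Name": confidence(0.9, 0.2)},
    "Founding": {"Name": confidence(0.9, 0.7)},
    "Name": {},
}
START_CONFIDENCE = {"Capital": 0.1, "Founding": 0.2, "Name": 0.9}

ordering = build_backward("Capital", PREDECESSORS, START_CONFIDENCE)
```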

References
The Semantic Web - on the respective Roles of XML and RDF – Stefan Decker, Frank van Harmelen, Jeen Broekstra, Michael Erdmann, Dieter Fensel, Ian Horrocks, Michel Klein, Sergey Melnik
Intelligent Information Agent with Ontology on the Semantic Web – Weihua Li
Ontology Languages for the Semantic Web – Asuncion Gomez-Perez, Oscar Corcho
Extraction of Semantic XML DTDs from Texts Using Data Mining Techniques – Karsten Winkler, Myra Spiliopoulou
