eXVisXML, an Emblematic Tool for Document Analysis. Daniela da Cruz, Pedro Rangel Henriques, Departamento de Informática, Universidade do Minho.


1 eXVisXML, AN EMBLEMATIC TOOL FOR DOCUMENT ANALYSIS
Daniela da Cruz, Pedro Rangel Henriques, Departamento de Informática, Universidade do Minho
6 March 2009, Universidade de Aveiro

2 CONTEXT
Comprehension; Visualization / Animation; Slicing; Metrics

3 MOTIVATION
eXVisXML: Comprehension; Visualization / Animation; Slicing; Metrics

8 XML DOCUMENT VISUALIZATION
The role of visualization technology (in PC and SE) is recognized as very fruitful. SV features allow us to capture a great amount of information more quickly. Graphical representations have a positive impact on the learning process.

9 XML DOCUMENT VISUALIZATION
Retrieving information efficiently from plain documents IS NOT AN EASY TASK.
Machine manipulation: XSL and other production systems can easily extract information and transform it.
Human manipulation: it is not as easy as would be desirable, because the annotation is complex or the document is too big.

10 XML DOCUMENT VISUALIZATION
Many tools have appeared to aid in the visualization of XML documents: XML Schema Designer (Microsoft), XPath Analyzer (Altova), …
Although these tools offer syntax highlighting and easy manipulation (collapse/expand), their view is hierarchical and textual.

11 TRADITIONAL XML DOCUMENT VISUALIZATION

12 OUR PROPOSAL FOR XML DOCUMENT VISUALIZATION
In this context, we want a visualization that eases the comprehension process. However, graphical or iconic representations must be chosen with care, because their suitability depends on the problem domain. Inspired by Alma, the eXVisXML interface for the visual inspection of XML documents is divided into three main parts:

13 OUR PROPOSAL FOR XML DOCUMENT VISUALIZATION
One window that displays the source document; one window that exhibits the textual hierarchy; one window that shows the tree associated with the source document (graphical).

14 OUR PROPOSAL FOR XML DOCUMENT VISUALIZATION

15 XML DOCUMENT SLICING
The slicing concept was introduced by Weiser in 1979. It is applied to a program with respect to a slicing criterion (a pair composed of a line number and a set of variables). The objective is to find the statements that may affect those variables. This technique can also be applied to XML documents. How?

16 XML DOCUMENT SLICING
XML document + slicing criterion (an XPath expression can be regarded as a simplified slicing criterion).
A document slice is a new XML document composed of those elements that are strictly necessary to maintain the tree structure.
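As a rough illustration of the idea on this slide, the following Python sketch slices a document with the standard library's ElementTree. It is an assumption, not the eXVisXML implementation: the function name `slice_document`, the toy document, and the choice to keep each matched element's subtree plus its chain of ancestors are all hypothetical.

```python
import xml.etree.ElementTree as ET

def slice_document(root, criterion):
    """Return a new tree with the elements matched by `criterion`
    (an ElementTree-style XPath), their subtrees, and their ancestors.
    Assumption: this is one plausible reading of "strictly necessary
    to maintain the tree structure". Returns None if nothing matches."""
    # Map every element to its parent so ancestors can be recovered.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    keep = set()
    for node in root.findall(criterion):
        keep.update(node.iter())          # the matched subtree
        while node is not None:           # the path up to the root
            keep.add(node)
            node = parent_of.get(node)

    def rebuild(elem):
        # Copy the element, then recurse only into the children we keep.
        clone = ET.Element(elem.tag, dict(elem.attrib))
        clone.text, clone.tail = elem.text, elem.tail
        for child in elem:
            if child in keep:
                clone.append(rebuild(child))
        return clone

    return rebuild(root) if root in keep else None

# Hypothetical toy document, not the screenplay used in the talk.
play = ET.fromstring(
    "<PLAY><ACT><SCENE>"
    "<SPEECH><SPEAKER>SAMPSON</SPEAKER><LINE>a</LINE></SPEECH>"
    "<SPEECH><SPEAKER>GREGORY</SPEAKER><LINE>b</LINE></SPEECH>"
    "</SCENE></ACT></PLAY>"
)
sliced = slice_document(play, ".//SPEECH[SPEAKER='GREGORY']")
result = ET.tostring(sliced, encoding="unicode")
print(result)
```

Only the GREGORY speech survives, wrapped in the SCENE, ACT and PLAY elements needed to keep the result a well-formed tree.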

17 XML DOCUMENT SLICING
Josep Silva proves, in Slicing XML documents, that slicing techniques applied to XML and DTD documents produce valid XML and DTD slices with respect to the slicing criterion.

18 XML DOCUMENT SLICING
Given the whole XML document of the Romeo and Juliet screenplay and the slicing criterion Greg, the result is:

19 XML DOCUMENT SLICING

20 XML DOCUMENT METRICS
Effective management of any process requires quantification, measurement, and modeling. Software metrics provide a quantitative basis for the development and validation of models of the software development process. Metrics can be used to improve software productivity and quality.

21 XML DOCUMENT METRICS
In the field of XML, quality assessment is also relevant, because the approach followed by engineers or end users to design the annotation schema, or even to mark up existing texts, is often improvised and naive. Concepts like well-formedness or validity are not sufficient to appraise XML documents. So, a set of metrics was defined to form the basis of the quality measurement of an XML document.

22 XML DOCUMENT METRICS
Size; Structure Complexity; Structure Depth; Fan-in / Fan-out; Instability; Tree Impurity; Attributes per Element; Non-used Components; Text Length

23 XML DOCUMENT METRICS
Successor Graph (SG). Given a DTD, each new component (element/attribute) that appears in an element's definition is an immediate successor of the element under definition. An arrow (oriented edge) is then introduced from the element to the component.

24 SUCCESSOR GRAPH (ROMEO AND JULIET SCREENPLAY)

25 XML DOCUMENT METRICS
Size. Given a DTD, its size (i.e. the value of this metric) is the total number of nodes in the SG (the number of DTD components).
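The successor graph and the Size metric can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy DTD, the function names `successor_graph` and `size`, the `@` prefix for attribute nodes, and the regular-expression parsing (which reads only the first attribute of each ATTLIST declaration and ignores entities) are all hypothetical simplifications, not the tool's actual method.

```python
import re

def successor_graph(dtd_text):
    """Map each declared element to the set of its immediate successors.
    Assumption: a simplified regex reading of <!ELEMENT> and <!ATTLIST>."""
    graph = {}
    # Every name in an element's content model is a successor of that element.
    for name, model in re.findall(r"<!ELEMENT\s+(\S+)\s+([^>]*)>", dtd_text):
        model = model.replace("#PCDATA", "")
        names = set(re.findall(r"[A-Za-z_][\w.-]*", model)) - {"EMPTY", "ANY"}
        graph.setdefault(name, set()).update(names)
    # The first attribute of each ATTLIST becomes a successor named "@attr".
    for element, attribute in re.findall(r"<!ATTLIST\s+(\S+)\s+(\S+)", dtd_text):
        graph.setdefault(element, set()).add("@" + attribute)
    return graph

def size(graph):
    """Size metric: total number of nodes (DTD components) in the graph."""
    nodes = set(graph)
    for successors in graph.values():
        nodes |= successors
    return len(nodes)

# Hypothetical toy DTD, much smaller than the screenplay DTD of the talk.
toy_dtd = """
<!ELEMENT play (title, act+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT act (title, scene+)>
<!ELEMENT scene (#PCDATA)>
<!ATTLIST act number CDATA #REQUIRED>
"""
sg = successor_graph(toy_dtd)
print(sg)
print(size(sg))
```

For this toy DTD the graph has five nodes: play, title, act, scene and the attribute node @number.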

26 XML DOCUMENT METRICS
Structure Complexity. It is defined in terms of e, the number of edges in the SG; n, the number of nodes in the SG; and n_idref, the number of IDREF attributes.

27 XML DOCUMENT METRICS
Structure Depth. According to Meike Klettke, in Metrics for XML document collections, an SG with a depth much higher than 7 is complex and reveals a bad DTD design.
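A minimal sketch of a depth computation follows, assuming that depth means the number of edges on the longest cycle-free path from the root of the successor graph. The slide does not state the exact definition, so that reading, the function name `structure_depth`, and the toy graph are assumptions.

```python
def structure_depth(graph, root):
    """Length in edges of the longest cycle-free path starting at `root`.
    Assumption: the slide's exact definition of depth is not given."""
    def longest(node, on_path):
        best = 0
        for successor in graph.get(node, ()):
            # Skip edges that would revisit a node (recursive definitions).
            if successor not in on_path:
                best = max(best, 1 + longest(successor, on_path | {successor}))
        return best
    return longest(root, {root})

# Hypothetical successor graph; "section" is recursive on purpose.
toy_graph = {
    "play": {"title", "act"},
    "act": {"title", "scene", "@number"},
    "scene": {"section"},
    "section": {"section", "line"},
}
depth = structure_depth(toy_graph, "play")
print(depth)
```

The longest path here is play, act, scene, section, line, so the depth is 4, well below the threshold of 7 mentioned on the slide.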

28 XML DOCUMENT METRICS
Fan-in / Fan-out. For the graph as a whole, the average and maximum values of these parameters help to spot unusual nodes, which can then be inspected to detect the anomaly and fix the problem. Elements with a high fan-in/fan-out value are more complex than elements with a lower value.

29 XML DOCUMENT METRICS
Instability. A node with low instability depends little on other nodes, while many nodes depend on it.
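The following sketch computes fan-in and fan-out per node and then an instability value. The transcript does not preserve the instability formula, so the ratio fan-out / (fan-in + fan-out), a definition commonly used for software packages, is an assumption here; it is at least consistent with the description above, since a node with many incoming and few outgoing edges scores low. Function names and the toy graph are hypothetical.

```python
def fan_in_out(graph):
    """Return (fan_in, fan_out) dictionaries for every node of the graph."""
    nodes = set(graph) | {s for successors in graph.values() for s in successors}
    fan_out = {node: len(graph.get(node, ())) for node in nodes}
    fan_in = {node: 0 for node in nodes}
    for successors in graph.values():
        for successor in successors:
            fan_in[successor] += 1
    return fan_in, fan_out

def instability(fan_in, fan_out, node):
    """ASSUMED formula: fan_out / (fan_in + fan_out); 0.0 for isolated nodes."""
    total = fan_in[node] + fan_out[node]
    return fan_out[node] / total if total else 0.0

# Hypothetical successor graph.
toy_graph = {
    "play": {"title", "act"},
    "title": set(),
    "act": {"title", "scene", "@number"},
    "scene": set(),
}
fan_in, fan_out = fan_in_out(toy_graph)
for node in sorted(fan_in):
    print(node, fan_in[node], fan_out[node], instability(fan_in, fan_out, node))
```

In this toy graph, title is used by two elements and uses none, so its instability is 0; play is used by none and uses two, so its instability is 1.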

30 XML DOCUMENT METRICS
Tree Impurity. A tree impurity of 0% means that the graph is a tree, and a tree impurity of 100% means that it is a fully connected graph.
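One formula with exactly these two endpoints is 2(e - n + 1) / ((n - 1)(n - 2)), where n is the number of nodes and e the number of edges of the graph taken as undirected. Whether the slide used this formula is an assumption; the sketch below only shows that it yields 0% for a tree and 100% for a complete graph. The function name `tree_impurity` is hypothetical.

```python
def tree_impurity(n_nodes, n_edges):
    """ASSUMED formula: 100 * 2(e - n + 1) / ((n - 1)(n - 2)), in percent.
    Graphs with fewer than three nodes are treated as pure trees."""
    if n_nodes < 3:
        return 0.0
    extra_edges = n_edges - n_nodes + 1   # edges beyond a spanning tree
    return 100.0 * 2 * extra_edges / ((n_nodes - 1) * (n_nodes - 2))

print(tree_impurity(5, 4))    # a tree on 5 nodes has 4 edges
print(tree_impurity(5, 10))   # a complete graph on 5 nodes has 10 edges
```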

31 XML DOCUMENT METRICS
Attributes per Element. The AttrsEle(DTD) metric gives the average number of attributes defined per element in the DTD. The AttrsEle(XML) metric, applied directly to the XML document, gives the average number of attributes actually used per element present in the XML document.
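A minimal sketch of the document-side metric, AttrsEle(XML), using Python's ElementTree. It assumes that the average is taken over element instances (every occurrence counts), which the slide does not spell out; the function name and the toy document are hypothetical.

```python
import xml.etree.ElementTree as ET

def attrs_per_element(root):
    """Average number of attributes per element instance in the document.
    Assumption: every element occurrence counts once in the denominator."""
    elements = list(root.iter())
    return sum(len(element.attrib) for element in elements) / len(elements)

# Hypothetical toy document: 4 elements carrying 3 attributes in total.
doc = ET.fromstring(
    "<play><act number='1'><scene/></act><act number='2' title='x'/></play>"
)
average = attrs_per_element(doc)
print(average)
```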

32 XML DOCUMENT METRICS
Non-used Components. If Attr(DTD) represents the set of attributes defined in the DTD, and Attr(XML) represents the set of actual attributes (the attributes used in the XML document instance), then NonAttr(XML) is the set of non-used attributes, i.e. the attributes in Attr(DTD) that do not occur in Attr(XML).
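The set of non-used attributes is a plain set difference, as the sketch below shows. The representation of attributes as (element, attribute) pairs, the function name, and the sample data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def non_used_attributes(declared, root):
    """NonAttr(XML) = Attr(DTD) minus Attr(XML).
    `declared` is a set of (element, attribute) pairs taken from the DTD."""
    used = {(element.tag, name) for element in root.iter() for name in element.attrib}
    return declared - used

# Hypothetical declarations and document instance.
declared = {("act", "number"), ("act", "title"), ("scene", "id")}
doc = ET.fromstring("<play><act number='1'><scene/></act></play>")
unused = non_used_attributes(declared, doc)
print(sorted(unused))
```

The same difference applies to elements: replace the pairs with the set of declared element names and the set of tags that occur in the document.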

33 XML DOCUMENT METRICS
Text Length. It is defined in terms of length(PCDATA), the total length of the document's text (the sum of the lengths of all text fragments, i.e. text associated with element tags, or untagged text), and nPCDATA, the number of text fragments (the number of PCDATA leaves that appear in the XML document tree).
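Read as an average, the metric would be length(PCDATA) / nPCDATA. The transcript does not preserve the formula, so that ratio is an assumption, as are the function name and the decision to ignore whitespace-only fragments and to strip surrounding whitespace before measuring.

```python
import xml.etree.ElementTree as ET

def text_length(root):
    """ASSUMED formula: total text length divided by number of text fragments.
    Fragments are the non-blank `text` and `tail` strings of every element."""
    fragments = []
    for element in root.iter():
        for piece in (element.text, element.tail):
            if piece and piece.strip():
                fragments.append(piece.strip())
    if not fragments:
        return 0.0
    return sum(len(fragment) for fragment in fragments) / len(fragments)

# Hypothetical toy document with three fragments: "abcd", "ab" and "xyz".
doc = ET.fromstring("<s><a>abcd</a><b>ab</b>xyz</s>")
average_length = text_length(doc)
print(average_length)
```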

34 METRIC RESULTS (ROMEO AND JULIET SCREENPLAY)
Metric                        Value
Size                          27
Structure Complexity          13
Structure Depth               7
Fan-in (node scene)           3
Fan-out (node scene)          6
Instability                   3.3%
Tree Impurity                 58.9%
Attributes per Elem (DTD)     0.08
Attributes per Elem (XML)     0.027
Non-used Components (Elem)    1 (stagedir element)
Text Length (Elem)            37.46
Text Length (Attr)            1

35 CONCLUSION

