Structural indexes of XML Databases (Master Informatique, Dr. Vu Le Anh)


Presentation transcript:

1 Master Informatique. Structural indexes of XML Databases. Dr. Vu Le Anh, lavu@ntt.edu.vn

2 Outline
1. Motivation
2. Regular query processing over XML datasets
3. Indexes over XML datasets
4. Structural indexes
5. Structural indexes for distributed XML datasets
6. Summary

3 NCBI GEO dataset
GEO is a public functional genomics data repository supporting MIAME-compliant data submissions. It held about 600 gigabytes as of February 2009. The data are stored as XML datasets. [Figure: a gene map written as an XML file, and its XML graph.]

4 Virtual observatory
A collection of interoperating data archives and software tools that use the internet to form a scientific research environment in which astronomical research programs can be conducted. IVOA (International Virtual Observatory Alliance): building an international community. Very large XML datasets are used for storing and exchanging data.

5 Problem
Efficient query processing over big (distributed) XML databases. Two "interesting" ideas:
1. Storing the XML database in a relational database: rewriting XML queries into SQL and Datalog, then rewriting and combining the results.
2. Indexing the XML database and using the indexes for query processing.

6 Data Graph: the data model for XML
A data graph is a directed, rooted, labelled graph. It consists of:
- a set of nodes
- a set of label values
- a set of edges, made up of a set of basic edges and a set of reference edges
- the root
- a labelling function

7 Publication XML document
[XML listing: the markup was lost in extraction. Surviving text values: John, ABC, Dr. Ben, Tom, …, Dr. Kiss, DEF, Dr. Baker, XYZ]

8 XML data graph

9 Regular queries
Query languages for XML: XQuery, XPath, UnQL, Lorel, XQL, XML-QL, etc. They are built around regular expressions, with 3 basic operations:
- Concatenation: . or /
- Union: |
- Iteration: *
Shorthand: _ stands for any single label value; // stands for (_)*, any sequence of label values.
Example: //(Student | Professor)//Paper/Title
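
As an illustration of how such a query reduces to an ordinary regular expression, the sketch below encodes the label sequence of a path as a string such as /db/Student/Paper/Title and matches it with Python's re module. The string encoding and the helper names (ANY, ANY_SEQ, step, matches) are assumptions made for this example, not part of the presentation.

```python
import re

ANY = r"/[^/]+"            # "_" : exactly one label value
ANY_SEQ = rf"(?:{ANY})*"   # "//": (_)*, any sequence of label values


def step(*labels):
    """One path step; several labels form a union (a|b|...)."""
    return "/(?:" + "|".join(re.escape(label) for label in labels) + ")"


# //(Student | Professor)//Paper/Title
QUERY = re.compile(
    ANY_SEQ + step("Student", "Professor") + ANY_SEQ + step("Paper") + step("Title")
)


def matches(labels):
    """True if the label sequence of a path matches the query."""
    return QUERY.fullmatch("/" + "/".join(labels)) is not None


print(matches(["db", "Student", "Paper", "Title"]))
print(matches(["db", "Paper", "Title"]))
```

This only checks one label sequence at a time; evaluating the query over a whole data graph needs the automaton-based processing described on the following slides.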

10 Regular queries
A pair of nodes (u, v) matches the regular query R if there is a path from u to v whose label sequence matches R. The result of R is {(u, v) | u ∈ I, v ∈ O, (u, v) matches R}, where I is the input set and O is the output set. General case: I = {root} and O = V. Every regular expression R can be represented by a nondeterministic finite automaton (NFA) that accepts the language L(R). The query graph is the graph representing this automaton.

11 Query processing based on the automaton
The query graph of //B/D. Input: I = {0}; output: O = {0, 1, …, 15}. [Figure: a data graph with nodes 0-15 labelled A-F, annotated with the automaton states q0, q1, q2 reached at each node.] The result = {(0,3), (0,11), (0,13)}

12 Transform to edge-labelled graph
The query graph is an edge-labelled graph, so the node-labelled data graph is transformed into an edge-labelled graph as well. [Figure: a node-labelled graph and the corresponding edge-labelled graph.]

13 State-Data (SD) graph
SD graph = query graph joined with the data graph. The SD graph may not be connected. SD-nodes: pairs (data node, state node). SD-edges are labelled and are constructed by matching the labels of data edges with the labels of query-graph edges.

14 Joining R := a/(b|c)*/a and the data graph
[Figure: the query graph with states s0, s1, s2 and edges labelled a, b, c, a; a data graph with nodes 1-5 and edges labelled a, b, c; and the resulting SD-graph over nodes such as (1,s0), (2,s1), (3,s1), (4,s2), (5,s2).] Result: (1,4), (1,5)
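
The join of a query graph and a data graph can be sketched as a breadth-first search over SD-nodes (data node, state). The query graph below follows the slide (states s0, s1, s2 for R := a/(b|c)*/a). The data graph is hypothetical: the edges of the slide's graph cannot be recovered from the transcript, so DATA_EDGES was chosen only so that the result agrees with the slide.

```python
from collections import deque

# Query graph (NFA) for R := a/(b|c)*/a: (state, label) -> next states.
QUERY_EDGES = {
    ("s0", "a"): {"s1"},
    ("s1", "b"): {"s1"},
    ("s1", "c"): {"s1"},
    ("s1", "a"): {"s2"},
}
START, FINAL = "s0", {"s2"}

# Hypothetical edge-labelled data graph (an assumption, not the slide's graph).
DATA_EDGES = [(1, "a", 2), (2, "b", 3), (3, "c", 2), (2, "a", 4), (3, "a", 5)]


def evaluate(source):
    """Return every v such that (source, v) matches R."""
    out = {}
    for u, label, v in DATA_EDGES:
        out.setdefault(u, []).append((label, v))

    start = (source, START)
    seen = {start}              # SD-nodes reached so far
    queue = deque([start])
    while queue:
        node, state = queue.popleft()
        for label, nxt in out.get(node, []):
            # An SD-edge exists when a data edge and a query edge share a label.
            for nstate in QUERY_EDGES.get((state, label), ()):
                sd_node = (nxt, nstate)
                if sd_node not in seen:
                    seen.add(sd_node)
                    queue.append(sd_node)
    return sorted(v for v, s in seen if s in FINAL)


print(evaluate(1))
```

Only the SD-nodes reachable from (source, s0) are ever built, which is why the full SD graph does not need to be connected or materialised.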

15 SD-graph representation in a relational database [KissVu05]
Main results:
- The data graph and the query graph can be represented by tables.
- SD graph (table) = join of the data table and the query table.
- The result is computed from the SD table.
- Regular query processing → Datalog + SQL.
- An index is built to support the SQL computation.
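
A minimal sketch of the relational idea, using Python's built-in sqlite3 module. The table names (data_edge, query_edge), the column names and the sample rows are assumptions for illustration and are not the schema of [KissVu05]. The SD table is the join of the two tables on the label, and a recursive query plays the role of the Datalog reachability rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_edge(src INTEGER, label TEXT, dst INTEGER);
CREATE TABLE query_edge(s TEXT, label TEXT, t TEXT);
""")
# Hypothetical data graph and the query graph of R := a/(b|c)*/a.
conn.executemany("INSERT INTO data_edge VALUES (?, ?, ?)",
                 [(1, "a", 2), (2, "b", 3), (3, "c", 2), (2, "a", 4), (3, "a", 5)])
conn.executemany("INSERT INTO query_edge VALUES (?, ?, ?)",
                 [("s0", "a", "s1"), ("s1", "b", "s1"),
                  ("s1", "c", "s1"), ("s1", "a", "s2")])

SD_REACHABLE = """
WITH RECURSIVE
  sd_edge(src, s, dst, t) AS (          -- SD table: join on equal labels
    SELECT d.src, q.s, d.dst, q.t
    FROM data_edge d JOIN query_edge q ON d.label = q.label
  ),
  reach(node, state) AS (               -- SD-nodes reachable from the start
    SELECT ?, ?
    UNION                               -- UNION removes duplicates, so cycles terminate
    SELECT e.dst, e.t
    FROM reach r JOIN sd_edge e ON e.src = r.node AND e.s = r.state
  )
SELECT node FROM reach WHERE state = ? ORDER BY node
"""


def evaluate(source, start="s0", final="s2"):
    """Data nodes reachable from (source, start) in the given final state."""
    rows = conn.execute(SD_REACHABLE, (source, start, final)).fetchall()
    return [node for (node,) in rows]


print(evaluate(1))
```

An index on data_edge(label) or data_edge(src) would be the natural place to support the join, in the spirit of the last bullet above.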

16 Step 1: Transform the data graph to an edge-labelled graph

17 Step 2: Query graph representation

18 Step 3: Using Datalog and SQL for the computation

19 Step 4: Computation in relational databases. Results: {4,5,6}

20 Classes of XML indexes
1. Indexing the basic values
- The basic values are indexed (e.g. data(//emp/salary))
- Using a B+-tree
2. Indexing the text values
- Keywords should be indexed
3. Indexes for XML trees
- Quickly checking and computing the label sequence of the path between a pair of nodes
- Also applicable to near-tree XML datasets
4. Structural indexes
- Simulating the data graph by a smaller one to reduce the cost of computation

21 XML-tree pre/post computing [Dietz82]
A preorder/postorder walk of the tree computes (pre(x), post(x)) for every node x. Example labels: (1,7) (2,4) (3,1) (4,2) (5,3) (6,6) (7,5). x is a descendant of y ⇔ pre(y) < pre(x) and post(x) < post(y).
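
The numbering and the descendant test can be sketched as follows. The tree shape in TREE is reconstructed from the (pre, post) pairs on the slide, and the node names r, a, …, f are made up for the example.

```python
def pre_post(children, root):
    """Assign preorder and postorder ranks (both starting at 1) in one walk."""
    pre, post = {}, {}
    counters = [0, 0]  # [next preorder rank, next postorder rank]

    def visit(x):
        counters[0] += 1
        pre[x] = counters[0]
        for child in children.get(x, []):
            visit(child)
        counters[1] += 1
        post[x] = counters[1]

    visit(root)
    return pre, post


def is_descendant(x, y, pre, post):
    """x is a descendant of y: y is entered before x and left after x."""
    return pre[y] < pre[x] and post[x] < post[y]


# Tree consistent with the slide's labels (node names are hypothetical).
TREE = {"r": ["a", "e"], "a": ["b", "c", "d"], "e": ["f"]}
pre, post = pre_post(TREE, "r")
print({x: (pre[x], post[x]) for x in pre})
print(is_descendant("c", "a", pre, post), is_descendant("f", "a", pre, post))
```

Once the two ranks are stored with each node, the ancestor/descendant check is two integer comparisons and needs no traversal.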

22 Tree-structure improvement [Li&Moon VLDB 2001]
Every node x gets (order(x), size(x)). Example labels: (1,100) (10,30) (11,5) (17,5) (25,5) (41,10) (45,5). x is a descendant of y ⇔ order(y) < order(x) and order(x) <= order(y) + size(y).
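
A small sketch of the (order, size) test, using the pairs from the slide. The parent/child reading of those pairs and the extra node NEW are assumptions made for the example; NEW only illustrates that unused room inside an interval can take a later insertion without renumbering the other nodes.

```python
def is_descendant(x, y):
    """x and y are (order, size) pairs; x must fall inside y's interval."""
    order_x, _ = x
    order_y, size_y = y
    return order_y < order_x <= order_y + size_y


ROOT = (1, 100)
A = (10, 30)       # interval (10, 40]
A_CHILD = (17, 5)
E = (41, 10)       # interval (41, 51]
E_CHILD = (45, 5)

print(is_descendant(A_CHILD, A))   # inside (10, 40]
print(is_descendant(E_CHILD, A))   # 45 is outside (10, 40]

# Hypothetical later insertion under A, placed in the unused range 31..40.
NEW = (31, 4)
print(is_descendant(NEW, A), is_descendant(NEW, (25, 5)))
```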

23 Regular query processing over XML trees and near-trees
Very efficient: based on tree-structured indexes. [KissVu06]: applying this to near-tree XML datasets. Link graph: connects the link nodes. Tree-structured indexes are used for the basic (tree) structure.

24 Family of structural indexes

25 1-index [Milo & Suciu, LNCS 1997]
Idea: group all "equivalent" data nodes into one index node. Computing the index nodes: bisimulation is used instead of the equivalence ≡. The index graph is smaller than the data graph. It works for every regular query. Bisimulation can be computed in PTIME.

26 Bisimulation
A relation ≈ is a bisimulation if, whenever x1 ≈ x2:
- x1 and x2 have the same label;
- if (y1, x1) is an edge, then there exists an edge (y2, x2) such that y1 ≈ y2.
[Figure: nodes y1 and y2 labelled a, with edges to nodes x1 and x2 labelled b.]
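
One simple way to compute such a partition is to start from the labels and repeatedly split blocks by the blocks of their parents until nothing changes. The sketch below does this naively and is not the efficient partition-refinement algorithm of [Paige &Tarjan 87]. The function name one_index and the small data graph are assumptions for illustration.

```python
def one_index(labels, edges):
    """Return (extents, index_edges) of the coarsest backward bisimulation."""
    parents = {n: set() for n in labels}
    for u, v in edges:
        parents[v].add(u)

    block = dict(labels)  # start: nodes with the same label share a block
    while True:
        # Signature: own block plus the set of blocks of all parents.
        sig = {n: (block[n], frozenset(block[p] for p in parents[n]))
               for n in labels}
        ids = {}
        new_block = {n: ids.setdefault(sig[n], len(ids)) for n in labels}
        # Refinement only splits blocks, so an equal count means a stable partition.
        if len(set(new_block.values())) == len(set(block.values())):
            break
        block = new_block

    extent = {}
    for n in sorted(labels):
        extent.setdefault(block[n], []).append(n)
    index_edges = {(tuple(extent[block[u]]), tuple(extent[block[v]]))
                   for u, v in edges}
    return sorted(extent.values()), index_edges


# Hypothetical data graph: titles under a section vs. a title under an algorithm.
LABELS = {1: "paper", 2: "section", 3: "section", 4: "title",
          5: "title", 6: "algorithm", 7: "title"}
EDGES = [(1, 2), (1, 3), (2, 4), (3, 5), (3, 6), (6, 7)]

extents, index_edges = one_index(LABELS, EDGES)
print(extents)
```

The two sections collapse into one index node, while title 7 is separated from titles 4 and 5 because its parent is an algorithm rather than a section.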

27 Example 1-index
[Figure: a data graph with nodes 1-18 (labels paper, section, title, algorithm, proof, uses, about, exp) and its 1-index.]
1-index nodes: {1} paper, {2,4,8,13} section, {3,5,9,14} title, {6,10} algorithm, {7} proof, {11} proof, {12} uses, {15,16}, {17,18}
Example query: /paper/section/algorithm

28 Using the 1-index?
Good: it works for all regular queries. Bad: it is not small enough! Idea: design the index graph only for the most frequently used queries, so that the index graph becomes very small. A new equivalence relation between nodes has to be defined. If a query is not supported, it is re-checked on the data graph.

29 Structural indexes and a given set of queries
Important query classes and their indexes:
- //a0/a1/…/ai (i <= k), paths not longer than k: A(k)-index
- Dynamic indexes: APEX, D(k)-index
- //S0/S1/…/Sk, SAPE queries: DL-1-index, DL-A*(k)-index
- Forward-backward queries: F&B-index

30 A(k)-index [Kaushik et al. 02]
Supports queries of the form //a0/a1/…/ai (i <= k). It is based on k-bisimulation ≈k:
- u ≈0 v if u and v have the same label;
- u ≈k v if u ≈k-1 v and
  - for every edge (u', u) there exists an edge (v', v) with u' ≈k-1 v', and
  - for every edge (v', v) there exists an edge (u', u) with u' ≈k-1 v'.

31 A(k)-index
[Figure: a data graph and its A(0)-, A(1)- and A(2)-indexes.]
Data graph: 1 imdb, 2 movie, 3 director, 4 name, 5 tv, 6 director, 7 name, 8 director, 9 name
A(0)-index: {1} imdb, {2} movie, {5} tv, {3,6,8} director, {4,7,9} name
A(1)-index: {1} imdb, {2} movie, {3} director, {5} tv, {6,8} director, {4,7,9} name
A(2)-index (1-index): {1} imdb, {2} movie, {3} director, {4} name, {5} tv, {6,8} director, {7,9} name
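
The three partitions can be reproduced with a short sketch of k-bisimulation: k rounds of splitting, each round comparing the previous-round blocks of a node's parents. The node labels come from the slide. The edge list is an assumption inferred from the partitions (director 3 under movie, directors 6 and 8 under tv, each name under one director), since the drawing itself is not in the transcript.

```python
def ak_index(labels, edges, k):
    """Partition the nodes into k-bisimilarity classes (looking at parents)."""
    parents = {n: [] for n in labels}
    for u, v in edges:
        parents[v].append(u)

    block = dict(labels)  # k = 0: same label, same block
    for _ in range(k):
        # Same block as before AND the same set of parent blocks as before.
        block = {n: (block[n], frozenset(block[p] for p in parents[n]))
                 for n in labels}

    groups = {}
    for n in sorted(labels):
        groups.setdefault(block[n], []).append(n)
    return sorted(groups.values())


LABELS = {1: "imdb", 2: "movie", 3: "director", 4: "name", 5: "tv",
          6: "director", 7: "name", 8: "director", 9: "name"}
# Assumed edges, inferred from the index partitions on the slide.
EDGES = [(1, 2), (1, 5), (2, 3), (3, 4), (5, 6), (5, 8), (6, 7), (8, 9)]

for k in range(3):
    print(k, ak_index(LABELS, EDGES, k))
```

Each extra round looks one step further up the incoming paths, which is why A(k) answers path queries of length up to k exactly and grows towards the 1-index as k increases.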

32 Split operation
[Figure: a data graph with nodes R, A, B, C1-C6 and its indexes. A(2) (= 1-index): {C2,C3}, {C1}, {C4}, {C5,C6}. A(1): {C2,C3}, {C1}, {C4,C5,C6}. A(0): {C1,C2,C3}, {C4,C5,C6}.]

33 Refinement (step 1)
[Figure: the same data graph and indexes as on slide 32.]

34 Refinement (step 2)
[Figure: the same data graph and indexes as on slide 32.]

35 DL-1-index [KissVu06]
Supports queries of the form //S0/S1/…/Sk (SAPE = Simple Alternation Path Expression). A dynamic index (dynamic labelling).

36 A SAPE query: //(d|e)/f
[Figure: a data graph with nodes 0-13 labelled a-g.] R := S0/S1, with S0 = {d, e} and S1 = {f}. The pairs (4,9), (5,10), (6,11) and (7,12) match R. The result: T_G(R) = {9, 10, 11, 12}

37 Example: a DL-1-index supporting the queries //(K|L) and //(B|C)/E
[Figure, panels (a)-(d): a data graph with nodes 0-12 (labels A-F, K-N), for which the data graph and the 1-index are the same, followed by three index graphs.]
DL-1-index at the beginning: {0} A; {1,2,3,4} K,L,M,N; {5,6,7,8} B,C,D; {9,10,11,12} E,F
Index supporting R1 = //(K|L): {0} A; {1,2} K,L; {3,4} M,N; {5,6} B,C; {7,8} C,D; {9,10} E; {11,12} E,F
Index supporting R2 = //(B|C)/E: {0} A; {1,2} K,L; {3,4} M,N; {5,6} B,C; {9,10} E; and the single nodes 7, 8, 11, 12

38 DL-A*(k)-index [KissVu06]
1. The A(i)-index is a special case of the DL-A*(k)-index.
2. The DL-A*(k)-index supports a given set of SAPE queries of length at most k.

39 A DL-A*(1)-index supporting the queries //(K|L) and //(B|C)/E
[Figure: a data graph with nodes 0-12 (labels A-F, K-N); the initial index; the index after the //(K|L) refinement; the index after the //(B|C)/E refinement.]

40 Experiments
1. DL-1-index vs. 1-index
2. DL-A*(k)-index vs. A(k)-index
Two datasets:
- XMark: 100 MB, 1,681,342 nodes
- TreeBank: 82 MB, 2,437,667 nodes


42 Distributed XML tree
XML tree = fragments (subtrees). Each server stores some fragments. There are linking edges between fragments. Questions: How to find an efficient protocol for regular query processing? Waiting time vs. computing time? Can structural indexes be applied?

43 Processing //a/b//a on an XML tree using 2 servers

44 Flow model (SPIDER algorithm)
Processing begins from the root. When it crosses a link from (F, q) to (F', q'):
1. Processing on F stops.
2. Processing on F' starts with state q'.
3. When processing over F' finishes, the result is sent to F.
4. F continues.
Drawback: waiting time!

45 Two-phase parallel model
Servers: each server computes every possible state on its own site and sends its link edges to the coordinator. The coordinator examines the link edges and requests the results from the servers. The servers send the results to the coordinator. Drawback: the computing time!

46 One-phase parallel model [KissVu07]
The coordinator builds the structural Tree-index for the whole system in order to determine the connected (F, q) states. Processing runs on the index first, to compute the connected states. Good: efficient processing. Bad: the index may be big.

47 Structural Tree-index
[Figure: an XML tree with nodes 0-15 (labels A-F) split into fragments F0-F5, and its Fa-index over those fragments, annotated with the automaton states q0, q1.] (F2,q1) and (F2,q2) are not connected. Connected states: (F0,q0), (F1,q0), …

48 Experiments
19 local Linux servers. Waiting time: 1IP : 2P : SP = 1 : 1.94 : 37.52. Computing time: 1IP : 2P : SP = 1 : 1.77 : 2.75.

49 Native XML database systems
http://www.rpbourret.com/xml/XMLDatabaseProds.htm#native
Product | Developer | License | Database type
Qizx/db | XMLMind | Commercial | Proprietary
Sedna XML DBMS | ISP RAS MODIS | Free | Proprietary
Sekaiju / Yggdrasill | Media Fusion | Commercial | Proprietary
SQL/XML-IMDB | QuiLogic | Commercial | Proprietary (native XML and relational)
Sonic XML Server | Sonic Software | Commercial | Object-oriented (ObjectStore)
Tamino | Software AG | Commercial | Proprietary. Relational through ODBC
TeraText DBS | TeraText Solutions | Commercial | Proprietary
TEXTML Server | IXIASOFT, Inc. | Commercial | Proprietary
TigerLogic XDMS | Raining Data | Commercial | Pick
Timber | University of Michigan | Open Source (non-commercial only) | Shore, Berkeley DB
TOTAL XML | Cincom | Commercial | Object-relational
Virtuoso | OpenLink Software | Commercial | Proprietary. Relational through ODBC
XDBM | Matthew Parry, Paul Sokolovsky | Open Source | Proprietary
XDB | ZVON.org | Open Source | Relational (PostgreSQL)
XediX TeraSolution | AM2 Systems | Commercial | Proprietary
X-Hive/DB | X-Hive Corporation | Commercial | Proprietary. Relational through JDBC
Xindice | Apache Software Foundation | Open Source | Proprietary
xml.gax.com | GAX Technologies | Commercial | Proprietary
Xpriori XMS | Xpriori | Commercial | Proprietary
XQuantum XML Database Server | Cognetic Systems | Commercial | Proprietary
XStreamDB Native XML Database | Bluestream Db. Soft. Corp. | Commercial | Proprietary
Xyleme Zone Server | Xyleme SA | Commercial | Proprietary

50 Summary
1. Big XML is used in many applications.
2. Our problem: efficient processing of regular queries over XML databases.
3. Two ideas:
   1. Using relational databases
   2. Building special indexes for XML databases

51 Summary
4. Tree indexes can be applied to XML trees and XML near-trees (using the link graph).
5. Structural indexes simulate the data graph by smaller ones, the index graphs. Their construction is based on equivalence relations.
6. Structural indexes are designed to support only a given set of queries.
7. They can be applied in distributed XML database query processing (cloud, social networks).

52 References
[Chung et al., SIGMOD 2002] Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim. APEX: an adaptive path index for XML data. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, June 3-6, 2002, Madison, Wisconsin. doi:10.1145/564691.564706
[Dietz82] Dietz, P. F. 1982. Maintaining order in a linked list. In Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing (San Francisco, California, United States, May 5-7, 1982). STOC '82. ACM, New York, NY, 122-127. DOI: http://doi.acm.org/10.1145/800070.802184
[Goldman & Widom VLDB 97] Goldman, R. and Widom, J. 1997. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. In Proceedings of the 23rd International Conference on Very Large Data Bases (August 25-29, 1997). M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, Eds. Morgan Kaufmann Publishers, San Francisco, CA, 436-445.
[Kaushik et al. 02] Raghav Kaushik, Pradeep Shenoy, Philip Bohannon, Ehud Gudes. Exploiting Local Similarity for Indexing Paths in Graph-Structured Data. 18th International Conference on Data Engineering (ICDE'02), 2002, p. 129.
[KissVu05] Attila Kiss, Vu Le Anh. A solution for regular queries on XML Data. PUMA, Volume 15 (2005), Issue No. 2, pp. 179-202.
[KissVu06] Attila Kiss, Vu Le Anh. Efficient Processing SAPE Queries Using the Dynamic Labelling Structural Indexes. ADBIS 2006: 232-247.
[KissVu07] Attila Kiss, Vu Le Anh. Efficient Processing Regular Queries In Shared-Nothing Parallel Database Systems Using Tree- And Structural Indexes. ADBIS Research Communications 2007.
[Li&Moon VLDB 2001] Li, Q., Moon, B. 2001. Indexing and querying XML data for regular path expressions. In: Proceedings of VLDB 2001, pp. 367-370.
[Milo & Suciu, LNCS 1997] Milo, T., Suciu, D. (1999). Index structures for path expressions. 7th International Conference on Database Theory (ICDT), pp. 277-95.
[Paige &Tarjan 87] Paige, R. and Tarjan, R. E. 1987. Three partition refinement algorithms. SIAM J. Comput. 16, 6 (Dec. 1987), 973-989. DOI: http://dx.doi.org/10.1137/0216062

53 Thank you!

