DEXA EGOV 2005 Conference Personalized Access to Multi-version Norm Texts in an eGovernment Scenario Fabio Grandi, Maria Rita Scalas Alma Mater Studiorum.

Presentation transcript:

DEXA EGOV 2005 Conference
Personalized Access to Multi-version Norm Texts in an eGovernment Scenario
Fabio Grandi, Maria Rita Scalas (Alma Mater Studiorum - Università degli Studi di Bologna)
Federica Mandreoli, Riccardo Martoglia, Enrico Ronchetti, Paolo Tiberio (Università degli Studi di Modena e Reggio Emilia)

Overview
Our research activities concern the implementation of Web information systems for eGovernment applications.
Development of eGovernment initiatives: more and more on-line resources and services are being made available by Public Administrations (PAs).
We make use of temporal database and semantic Web techniques to provide personalized access to such resources and services.
In particular, we consider multi-version norm texts (stored in XML format) available in Web repositories.
DEXA EGOV05 - Grandi Mandreoli Martoglia Ronchetti Scalas Tiberio - Personalized Access to Multi-version Norm Texts in an eGovernment Scenario

Personalized access to multi-version norms

Importance of versioning
Temporal concerns are ubiquitous in the law domain:
  A norm text changes over time due to subsequent modifications, but keeps its identity.
  The ability to model temporal dimensions is essential for the management of evolving norms.
  It is crucial to be able to reconstruct the consolidated version of a norm.
  Past versions also remain important.
[Figure: timeline from the original norm text through successive new versions]

Importance of versioning
Applicability (semantic) versioning also plays an important role:
  Some norms, or some of their parts, have or acquire a limited applicability.
  Personalized version of the norm: a version containing only the articles which are applicable to a citizen's personal case.
[Figure: a norm with Art. 1 (unemployed), Art. 2 (self-employed) and Art. 3 (retired), filtered for a self-employed citizen]

Objectives
Development of an effective and efficient Web information system where:
  norms are represented as XML documents
  the dynamics of norms over time are captured
  the limited applicability of norms is captured
  selective access and reconstruction of versions are supported by a query engine
Aimed at:
  enabling citizens to access personalized versions of multi-version resources
  improving and optimizing the involvement of citizens in the eGovernance process

Approach
Definition of a temporal XML model including:
  a temporal multi-version XML schema
  temporal manipulation operations
  applicability extensions (semantic versioning)
Design, implementation and evaluation of system prototypes supporting the model:
  First system: based on the "stratum" approach, on top of a commercial DBMS
  Ongoing research: second system, based on the "native" approach, which includes semantic annotations in multi-versioning

The temporal XML data model
Based on XML Schema.
Follows the hierarchical organization of norm texts: contents-section-article-paragraph.
At each level of the hierarchy, the history of changes is represented by the versions produced:
  The temporal pertinence is represented by timestamps, i.e. temporal elements encoded as multiple 3-dim intervals (TA).
  A reference to the modifying (active) norm is added (an_ref).
Supports ancestor-descendant inheritance:
  Timestamps of a node are inherited by its descendants.
  Along the hierarchy, redefinitions can only involve a restriction of the temporal pertinence.
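The inheritance rule above can be illustrated with a minimal sketch. It is not the authors' implementation: it models a single time dimension instead of the three carried by a TA element, assumes half-open intervals, and the names `Interval`, `restrict` and `effective` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """One time dimension only; half-open [start, end). end=None means still open."""
    start: date
    end: Optional[date] = None


def restrict(parent: Interval, child: Interval) -> Optional[Interval]:
    """Clip the child's interval to the parent's; return None if they are disjoint."""
    start = max(parent.start, child.start)
    ends = [e for e in (parent.end, child.end) if e is not None]
    end = min(ends) if ends else None
    if end is not None and start >= end:
        return None
    return Interval(start, end)


def effective(parent: Interval, own: Optional[Interval]) -> Optional[Interval]:
    """A node without its own timestamp inherits the parent's;
    a redefinition can only narrow the inherited pertinence."""
    return parent if own is None else restrict(parent, own)


# Hypothetical article valid from 2001-01-01 up to 2001-06-10.
article = Interval(date(2001, 1, 1), date(2001, 6, 10))
inherited = effective(article, None)
narrowed = effective(article, Interval(date(2001, 3, 1)))
print(inherited)
print(narrowed)
```

Clipping the child to the parent is one way to guarantee that a redefinition never widens the pertinence; a real system might instead reject such a redefinition as invalid.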

The temporal XML schema
4 Temporal Dimensions:
  Publication time: time of publication in the Official Journal
  Validity time: time the norm is in force
  Efficacy time: time the norm can be applied
  Transaction time: time the norm is stored in the information system
[Diagram: schema tree Law (Num – R, Type – R) with Title and Contents, then Section (Num – R, Heading), Article (Num – R, Heading) and Paragraph (Num – R). The versioned levels carry Ver elements (Num – R, An_ref – O) with TA timestamps whose attributes are Vt_Start – R, Vt_End – O, Tt_Start – R, Tt_End – O, Et_Start – R, Et_End – O. A Publication – R attribute accompanies the top-level timestamps.]

An example document
<norm num="2624/1999" type="Law">
  <title>Cereals Importation</title>
  <contents publication="2001-01-01" vt_start="2001-01-01" tt_start="2001-01-10" et_start="2001-01-01">
    …
    <article num="1">
      <ver num="1">
        <ta vt_start="2001-01-01" tt_start="2001-01-10" tt_end="2001-06-01" et_start="2001-01-01" />
        <ta vt_start="2001-01-01" et_start="2001-01-01" et_end="2001-06-10" … />
        <ta vt_start="2001-01-01" vt_end="2001-06-10" et_start="2001-06-10" … />
        <paragraph num="1"><ver num="1"> …Art. 1 before modification… </ver></paragraph>
        …
      </ver>
      <ver num="2" an_ref="LD135/2000">
        <ta vt_start="2001-06-10" tt_start="2001-06-01" et_start="2001-06-10" />
        <paragraph num="1"><ver num="1"> …Art. 1 after modification… </ver></paragraph>
        <paragraph num="2"><ver num="1"> …Art. 1 after modification… </ver></paragraph>
      </ver>
    </article>

The "Stratum" approach
Based on two components:
  XML document management facilities offered by Oracle 9i
    document-size granularity
    structural and textual constraints
  software stratum built on top
    temporal aspects
    reconstruction
Extensive experimental results on the system behavior show:
  good performance
  ability to manage large collections of XML multi-version documents

Query and modification operators
Full search and reconstruction functionalities:
  FOR $a IN path
  WHERE constraints on $a
  RETURN const-tree(document($a), temporal specs)
  constraints can contain keyword-based text selections
  const-tree: operator for the reconstruction of a temporally consistent norm version (consolidated act; involves temporal selections)
  temporal specs may involve a temporal predicate for each of the supported dimensions
Two basic operators for the management of norm modifications:
  to change the textual contents of a norm portion: deletion, insertion, replacement of (a part of) the norm
  to modify the temporal pertinence of a given version: time extension or suspension of (a part of) the norm
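As a rough illustration of what a reconstruction operator like const-tree does, the sketch below prunes every version whose timestamps do not cover a given day. It is an assumption-laden simplification rather than the system's operator: only validity time is checked, intervals are treated as half-open, a `ver` without `ta` children is taken to inherit from its ancestors, and the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical, well-formed miniature of a multi-version norm.
DOC = """
<norm num="2624/1999" type="Law">
  <article num="1">
    <ver num="1">
      <ta vt_start="2001-01-01" vt_end="2001-06-10"/>
      <paragraph num="1"><ver num="1">Art. 1 before modification</ver></paragraph>
    </ver>
    <ver num="2" an_ref="LD135/2000">
      <ta vt_start="2001-06-10"/>
      <paragraph num="1"><ver num="1">Art. 1 after modification</ver></paragraph>
    </ver>
  </article>
</norm>
"""


def covers(ta: ET.Element, day: date) -> bool:
    """True if the validity interval [vt_start, vt_end) of a ta element contains day."""
    start = date.fromisoformat(ta.get("vt_start"))
    end = ta.get("vt_end")
    return start <= day and (end is None or day < date.fromisoformat(end))


def const_tree(elem: ET.Element, day: date) -> ET.Element:
    """Copy elem, dropping ta elements and every ver that is not valid on day."""
    out = ET.Element(elem.tag, elem.attrib)
    out.text = elem.text
    for child in elem:
        if child.tag == "ta":
            continue  # timestamps drive the selection and are not copied
        if child.tag == "ver":
            stamps = child.findall("ta")
            if stamps and not any(covers(t, day) for t in stamps):
                continue  # this version is not valid on the requested day
        out.append(const_tree(child, day))
    return out


def version_texts(root: ET.Element, day: date) -> list:
    """Collect the non-empty text of the versions that survive reconstruction."""
    tree = const_tree(root, day)
    return [v.text.strip() for v in tree.iter("ver") if v.text and v.text.strip()]


root = ET.fromstring(DOC.strip())
print(version_texts(root, date(2002, 1, 1)))
print(version_texts(root, date(2001, 3, 1)))
```

A full operator would apply one predicate per supported time dimension, as the slide notes, rather than validity time alone.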

Example of reconstruction (current version)
<norm num="2624/1999" type="Law">
  <title>Cereals Importation</title>
  <contents publication="2001-01-01" vt_start="2001-01-01" tt_start="2001-01-10" et_start="2001-01-01">
    …
    <article num="1">
      <ver num="1">
        <ta vt_start="2001-01-01" tt_start="2001-01-10" tt_end="2001-06-01" et_start="2001-01-01" />
        <ta … />
        <ta … />
        <paragraph num="1"><ver num="1"> …Art. 1 before modification… </ver></paragraph>
        …
      </ver>
      <ver num="2" an_ref="LD135/2000">
        <ta vt_start="2001-06-10" tt_start="2001-06-01" et_start="2001-06-10" />
        <paragraph num="1"><ver num="1"> …Art. 1 after modification… </ver></paragraph>
        <paragraph num="2"><ver num="1"> …Art. 1 after modification… </ver></paragraph>
      </ver>
    </article>
(ver num="1" of article 1: NOT INCLUDED in the reconstructed current version)

The "Native" approach
Based on a Temporal XML Query Processor, which:
  provides all the temporal, structural, textual and applicability query facilities in a single component
  exploits ad-hoc data structures and algorithms
    finer granularity ("tuple")
    embedded "light" DBMS libraries
    structural join algorithms
  allows users to store XML norm texts and to reconstruct, on the fly, those satisfying the four types of constraints

Semantic versioning
Extension of the multi-version model based on temporal dimensions to include a semantic versioning dimension.
Aimed at providing personalized access to norms with respect to applicability.
Civic ontology: a classification of citizens based on the distinctions introduced by successive norms (founding acts) that imply some limitations in their applicability.

Semantic versioning
At this stage of the project, we manage "tree-like" ontologies:
  class taxonomies induced by the IS-A relationship
  we exploit the pre-order and post-order properties of trees
New versioning dimension:
  Applicability of different parts of a norm text to the relevant classes of the civic ontology
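The pre-order/post-order property mentioned above is the standard one for trees: a class is a descendant of another exactly when its pre-order rank is higher and its post-order rank is lower. The sketch below shows this under stated assumptions; the ontology, its class names and the helpers `number_tree` and `is_a` are hypothetical and not taken from the project.

```python
def number_tree(children: dict, root: str) -> dict:
    """Assign (pre, post) ranks to every class with one depth-first traversal."""
    codes = {}
    counters = {"pre": 0, "post": 0}

    def visit(node: str) -> None:
        pre = counters["pre"]
        counters["pre"] += 1
        for child in children.get(node, []):
            visit(child)
        codes[node] = (pre, counters["post"])
        counters["post"] += 1

    visit(root)
    return codes


def is_a(codes: dict, cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or lies below it in the IS-A tree."""
    pre, post = codes[cls]
    anc_pre, anc_post = codes[ancestor]
    return anc_pre <= pre and post <= anc_post


# Hypothetical tree-like civic ontology (parent -> children).
ontology = {
    "Citizen": ["Employed", "Unemployed", "Retired"],
    "Employed": ["Employee", "Self-employed"],
}
codes = number_tree(ontology, "Citizen")
print(is_a(codes, "Self-employed", "Employed"))
print(is_a(codes, "Retired", "Employed"))
```

With such codes stored next to each class, an applicability check needs only two integer comparisons and no tree traversal at query time.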

Semantic versioning
Applicability is inherited by descendant nodes unless locally redefined.
By means of redefinitions we can also introduce, for each part of a document, complex applicability properties:
  Extensions with respect to ancestors
  Restrictions with respect to ancestors
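One possible reading of "inherited unless locally redefined" is a lookup that walks up the document tree until it finds a node carrying its own applicability annotation. The sketch below assumes that reading; the node identifiers, the set-of-class-numbers representation and the function `effective_applicability` are all hypothetical.

```python
from typing import Dict, FrozenSet, Optional


def effective_applicability(
    node: str,
    parent_of: Dict[str, str],
    local: Dict[str, FrozenSet[int]],
) -> Optional[FrozenSet[int]]:
    """Return the nearest applicability annotation on the path to the root.
    None means that no annotation was found, i.e. no stated limitation."""
    current: Optional[str] = node
    while current is not None:
        if current in local:
            return local[current]
        current = parent_of.get(current)
    return None


# Hypothetical document tree: the norm applies to class 3;
# paragraph 2 of article 1 extends this to classes 3 and 8.
parent_of = {"art1": "norm", "art1/par1": "art1", "art1/par2": "art1"}
local = {"norm": frozenset({3}), "art1/par2": frozenset({3, 8})}

print(effective_applicability("art1/par1", parent_of, local))
print(effective_applicability("art1/par2", parent_of, local))
```

Because a redefinition simply replaces the inherited value, the same lookup handles both extensions and restrictions with respect to ancestors.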

Example of full search
John Smith is a self-employed citizen.
He is interested in the text of all the norms ... (structural constraint)
... which contain paragraphs dealing with health care, ... (textual constraint)
... which were valid and in effect between 2002 and 2004, ... (temporal constraint)
... and which are applicable to his case (civic class 7). (applicability constraint)
4 orthogonal constraints

Example of full search
FOR $a IN norm
WHERE textConstr ($a//paragraph//text(), 'health AND care')
  AND tempConstr ('vTime OVERLAPS PERIOD('2002-01-01','2004-12-31')')
  AND tempConstr ('eTime OVERLAPS PERIOD('2002-01-01','2004-12-31')')
  AND applConstr ('class 7')
RETURN $a
4 orthogonal constraints: structural constraint, textual constraint, temporal constraint, applicability constraint

Finer storage granularity
Each document is split into ad-hoc structures (tuples), providing a finer access granularity to optimize time and space requirements.
Tuple ( id, <structural attributes>, <temporal attributes>, <text>, <applicability attributes> )
Each constraint is verified at query time on the corresponding attributes.
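A tuple of this kind, and the per-attribute checks done at query time, might look like the sketch below. It mirrors the health-care query from the full-search example, but the field names, the closed-period `overlaps` test and the plain set membership used for applicability are simplifying assumptions, not the prototype's actual layout.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional, Sequence, Tuple

Period = Tuple[date, Optional[date]]  # closed period; end=None means open-ended


@dataclass(frozen=True)
class NormTuple:
    id: int
    path: str                # structural attribute (hypothetical encoding)
    vt: Period               # validity time
    et: Period               # efficacy time
    text: str
    classes: FrozenSet[int]  # applicability attribute: civic class numbers


def overlaps(p: Period, q: Period) -> bool:
    """True if two closed periods share at least one day."""
    (s1, e1), (s2, e2) = p, q
    return (e2 is None or s1 <= e2) and (e1 is None or s2 <= e1)


def matches(t: NormTuple, window: Period, keywords: Sequence[str], civic_class: int) -> bool:
    """Check the four constraints, each on its own group of attributes."""
    return (
        t.path.endswith("paragraph")                              # structural
        and all(k in t.text.lower() for k in keywords)            # textual
        and overlaps(t.vt, window) and overlaps(t.et, window)     # temporal
        and civic_class in t.classes                              # applicability
    )


since_2001: Period = (date(2001, 1, 1), None)
nineties: Period = (date(1990, 1, 1), date(1999, 12, 31))
tuples = [
    NormTuple(1, "norm/article/paragraph", since_2001, since_2001,
              "Health care for the self-employed", frozenset({7})),
    NormTuple(2, "norm/article/paragraph", nineties, nineties,
              "Health care contributions", frozenset({7})),
    NormTuple(3, "norm/article/paragraph", since_2001, since_2001,
              "Health care for retired citizens", frozenset({3})),
]
window: Period = (date(2002, 1, 1), date(2004, 12, 31))
hits = [t.id for t in tuples if matches(t, window, ("health", "care"), 7)]
print(hits)
```

Only tuple 1 passes all four checks: tuple 2 fails the temporal test and tuple 3 the applicability test.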

Finer storage granularity
[Figure: an example tuple with columns ID, structural attributes (level, startPos, …), temporal attributes (pt, vtStart, vtEnd, etStart, etEnd, ttStart, ttEnd), text ("Health care …") and applicability attributes (AA); sample values shown include 4, 3, 15/12/1979, UC, 20/12/1979, F and 01/01/1980]

Example of full search
[Figure: the path norm//paragraph//text() and the applicability constraint 'class 7' evaluated against the normative DB and the civic ontology. The norm tree shows Article 1 and Article 2 with their versions (Ver), timestamps (TA), applicability annotations (AA=3, AA=4, AA=3,8) and paragraphs whose text contains "Health care…".]

"Native" approach benefits
The native approach is able to access and retrieve only the strictly necessary data:
  selection relies on ad-hoc, temporally enhanced structures
  it uses a finer granularity of managed data than standard XML engines
Only the parts which satisfy the temporal and applicability constraints are used for the reconstruction of the retrieved documents.
There is no need to retrieve whole XML documents and build space-consuming structures such as DOM trees to manipulate them, as required in the stratum approach.
Enhanced query processing efficiency
Reduced memory requirements

Evaluation benchmark
Three XML document sets of increasing size:
  5000 documents (120MB)
  10000 documents (240MB)
  20000 documents (480MB)
Variable document size: min = 2KB, avg = 24KB, max = 125KB
Five different query types:
  Queries on keywords (structural + textual constraints)
    Q1: keywords in contents
    Q2: keywords in type and contents
  Temporal queries (structural + temporal constraints)
    Q3: conditions on publication, validity and transaction time
  Mixed queries (structural + textual + temporal constraints)
    Q4, Q5: with keywords and temporal conditions
Five variants with personalized access support:
  Qx-A: with additional applicability constraints

Performance evaluation
The selectivity of the query predicates strongly influences the performance of the stratum approach:
  Q2, Q3: large numbers of documents containing some (typically small) relevant portions have to be retrieved
The native approach proves faster and more reliable in all cases:
  Performance is more uniform
  Retrieval of useless document parts is avoided

Performance evaluation
Very high efficiency in solving personalization queries:
  The system manages applicability-based personalized access by means of simple comparisons involving pre/post encodings
  Personalized queries are only 0.5-1% slower than their original versions
  3-4% storage space overhead required

Performance evaluation
Scalability tests:
  The answer time grows sublinearly with the number of documents
  Good scalability of the system in every type of query context
[Figure: answer time plotted against collection size; axis labels 1046, 1366 and 1741 msec, and 5000, 10000 and 20000 docs]
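The sublinear claim can be checked with simple arithmetic on the values shown on this slide. The pairing of times with collection sizes is an assumption read off the chart labels (smallest time with smallest collection); the transcript does not state it explicitly.

```python
# Assumed pairing of chart labels: number of documents -> answer time in msec.
timings = {5000: 1046, 10000: 1366, 20000: 1741}

sizes = sorted(timings)
ratios = []
for small, large in zip(sizes, sizes[1:]):
    size_ratio = large / small
    time_ratio = timings[large] / timings[small]
    ratios.append((size_ratio, round(time_ratio, 2)))
    print(f"{small} -> {large} docs: size x{size_ratio:.0f}, time x{time_ratio:.2f}")

# Growth is sublinear if every time ratio is smaller than the size ratio.
sublinear = all(time_ratio < size_ratio for size_ratio, time_ratio in ratios)
print(sublinear)
```

Under this pairing, each doubling of the collection raises the answer time by roughly 30%, far less than the factor of two that linear growth would give.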

Conclusions
We presented our research work concerning the design and implementation of efficient Web-based information systems for eGovernment applications.
We developed a first platform ("stratum" approach) for temporal management of multi-version norm texts on top of a commercial DBMS.
We migrated this system to a more efficient platform ("native" approach), for which a specialized Temporal XML Query Processor has been designed and implemented.
The new prototype provides advanced functionalities:
  personalized access to documents on the basis of the digital identity of citizens, relying on semantic versioning
Our approach proved very efficient in a large set of experimental situations and showed excellent scale-up figures with varying load configurations.

Future Work
Extensions of the current framework:
  more advanced application requirements may include a more sophisticated ontology definition, possibly versioned
Development of a complete technological infrastructure usable in a large Web-based eGovernment scenario:
  identification, classification and reconstruction services
Assessment of our developed systems in a concrete working environment:
  with real users
  with a large repository of real norms
Extension to a more general application domain (Web personalization via ontology-based user profiling)