Towards Automatic Document Migration: Semantic Preservation of Embedded Queries
Thomas Triebsees
University of the German Federal Armed Forces Munich, Department of Computer Science
Winnipeg, 31st August 2007
Agenda
I. Research Context and Motivation
II. Our Approach
  1. Property Specification and Tracing
  2. Automated Query Evaluation and Construction
III. Results
IV. Conclusions
I. Research Context and Motivation
Research Context
Task: semantic preservation
- a high degree of process reliability (trustworthiness) is necessary
- the volume of documents requires automation
- document representations (formats) change
- still, most quality assurance (QA) is done by hand
Example Property – Link Consistency
[Figure: the website "Calculation" (start.html, style.css, source/calc05/calc.pdf) is harvested from the WWW and stored as "Calculation documents".]
Aim: improve portability
Example Property – Link Consistency (cont.)
[Figure: the harvested website "Calculation" (start.html, style.css, source/calc05/calc.pdf) is stored as "Calculation documents" in a new layout (html/index.html, resources/calc05/calc.pdf, style.css).]
Semantic Queries
Queries are embedded in documents. How can their semantic preservation be formalized?
- evaluation
- construction?
Examples:
- URLs query the server/directory structure
- style sheets (CSS) query XML/HTML documents
- XPath expressions query XML documents
- …
[Figure: the migrated "Calculation documents" tree (html/index.html, resources/calc05/calc.pdf, style.css)]
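To make "a URL queries the directory structure" concrete, here is a minimal sketch (an illustration, not the implementation behind this talk): the link found in a document is resolved against that document's location with Python's `urllib.parse.urljoin`, and the result is looked up among the stored objects. The host `BASE`, the object set `OBJECTS`, and the function `evaluate_link` are hypothetical; the paths only mirror the migrated layout in the figure.

```python
from typing import Optional
from urllib.parse import urljoin, urlparse

# Hypothetical server and stored objects (paths of the migrated layout).
BASE = "http://example.org"
OBJECTS = {
    "/Calculation/html/index.html",
    "/Calculation/resources/calc05/calc.pdf",
    "/Calculation/style.css",
}


def evaluate_link(document_path: str, href: str) -> Optional[str]:
    """Evaluate the URL query `href` embedded in the document at `document_path`.

    Returns the path of the object the query denotes, or None if it denotes
    nothing in this structure (a dangling or external link).
    """
    # Resolve the (possibly relative) reference against the document's URL.
    resolved = urlparse(urljoin(BASE + document_path, href))
    if resolved.netloc != urlparse(BASE).netloc:
        return None  # the query leaves the queried server
    return resolved.path if resolved.path in OBJECTS else None
```

For example, `evaluate_link("/Calculation/html/index.html", "../style.css")` yields `/Calculation/style.css`, while `"calc.pdf"` from the same document denotes no object and yields `None`.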
II. Our Approach – Semantic Evaluation and Construction of Embedded Queries
Our Approach
[Figure: a migration process transforms source documents into target documents; the framework receives notifications from the process, performs tracing, property matching on source and target documents, and automated verification; its inputs are property specifications and preservation requirements.]
(1) What are the relevant properties? What are the different representation forms?
(2) What is to be preserved?
(3) Implement the transformation: notify the system of transformation steps.
(4) Trace relevant object histories. Verify preservation requirements w.r.t. source and target objects.
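Steps (3) and (4) can be pictured with a small sketch, assuming the framework only needs to know, for each object, which original source object it was derived from and through which steps. The class `Tracer` and its methods `notify` and `current` are hypothetical names chosen for illustration, not the framework's actual interface.

```python
from typing import Dict, List, Tuple


class Tracer:
    """Hypothetical framework component that traces object histories."""

    def __init__(self) -> None:
        # origin[obj]: the original source object `obj` was (transitively) derived from
        self.origin: Dict[str, str] = {}
        # history[src]: ordered (step name, resulting object) pairs for one source object
        self.history: Dict[str, List[Tuple[str, str]]] = {}

    def notify(self, step: str, before: str, after: str) -> None:
        """Called by the migration process after each transformation step."""
        src = self.origin.get(before, before)  # unseen objects are their own origin
        self.origin[after] = src
        self.history.setdefault(src, []).append((step, after))

    def current(self, src: str) -> str:
        """Latest representation of source object `src` (itself if untouched)."""
        steps = self.history.get(src)
        return steps[-1][1] if steps else src
```

With such a history, verification only has to compare each source object with `current(source)`, which is what property matching on source and target objects requires.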
(1) Property Specification
- Concept + interface: LinksTo, with the role names link_source, link_anchor, link_target (define role names for the property)
- Contexts: LinkAbs and LinkRel (assign the roles in the different implementations)
[Figure: the original website "Calculation" (start.html, style.css, source/calc05/calc.pdf) and the stored "Calculation documents" (html/index.html, resources/calc05/calc.pdf, style.css), each implementing LinksTo in its own context.]
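One way to read "one concept, several contexts" is as one predicate over the three roles with one implementation per context. The sketch below assumes that objects are identified by their URLs and that the anchor is the link's href string; the functions `links_to_abs` and `links_to_rel` and the table `CONTEXTS` are hypothetical.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical: link_source and link_target are object URLs,
# link_anchor is the href string stored in the source document.


def links_to_abs(link_source: str, link_anchor: str, link_target: str) -> bool:
    """LinksTo in context LinkAbs: the anchor is an absolute URL naming the target."""
    parsed = urlparse(link_anchor)
    return bool(parsed.scheme and parsed.netloc) and link_anchor == link_target


def links_to_rel(link_source: str, link_anchor: str, link_target: str) -> bool:
    """LinksTo in context LinkRel: the anchor is a path-relative reference
    that, resolved against the source, yields the target."""
    parsed = urlparse(link_anchor)
    if parsed.scheme or parsed.netloc or link_anchor.startswith("/"):
        return False  # not a path-relative reference
    return urljoin(link_source, link_anchor) == link_target


# The same concept (LinksTo) implemented in two contexts.
CONTEXTS = {"LinkAbs": links_to_abs, "LinkRel": links_to_rel}
```

Treating absolute-path references such as `/Calculation/style.css` as belonging to neither context is a design choice of this sketch.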
(2) Expressing Preservation Requirements
Requirement: When transforming a website, translate all absolute links to relative links while preserving link consistency.
Expressed formally:
$$\mathit{pres}_K\big(\{s \mapsto \mathit{link\_source},\ a \mapsto \mathit{link\_anchor},\ t \mapsto \mathit{link\_target}\},\ \mathit{LinksTo}(s, a, t),\ \{\mathit{LinkAbs}, \mathit{LinkRel}\},\ \{\mathit{LinkRel}\}\big)$$
Expressed semi-formally using concepts and contexts: When transforming a link source, a link anchor, and a link target into a new representation, preserve the concept LinksTo for these objects in the context LinkRel.
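A minimal sketch of what checking this requirement could look like, under loud assumptions: the migration moves every object to a new path (the hypothetical mapping `MOVED`, loosely modeled on the figures), each link is a triple (link_source, link_anchor, link_target) of paths, `relativize` rewrites the anchor as a relative reference, and `check_pres` tests the LinkRel reading of LinksTo on every transformed triple. This only checks the target side and is not the framework's verification procedure.

```python
import posixpath
from typing import Dict, List, Tuple
from urllib.parse import urljoin, urlparse

# Hypothetical object histories: original path -> path after migration.
MOVED: Dict[str, str] = {
    "/Calculation/start.html": "/Calculation/html/index.html",
    "/Calculation/source/calc05/calc.pdf": "/Calculation/resources/calc05/calc.pdf",
    "/Calculation/style.css": "/Calculation/style.css",
}

Link = Tuple[str, str, str]  # (link_source, link_anchor, link_target)


def relativize(source: str, target: str) -> str:
    """Relative reference from the document at `source` to `target`."""
    return posixpath.relpath(target, posixpath.dirname(source))


def transform(link: Link) -> Link:
    """Migrate one link: move both objects, recompute the anchor as relative."""
    s, _old_anchor, t = link
    s2, t2 = MOVED[s], MOVED[t]
    return (s2, relativize(s2, t2), t2)


def holds_link_rel(link: Link) -> bool:
    """LinksTo in context LinkRel: a relative anchor that resolves to the target."""
    s, a, t = link
    if urlparse(a).scheme or a.startswith("/"):
        return False  # absolute links violate the requirement
    # Any host works here; only the resolved path is compared.
    return urlparse(urljoin("http://host" + s, a)).path == t


def check_pres(links: List[Link]) -> bool:
    """Every migrated link must satisfy LinksTo in the context LinkRel."""
    return all(holds_link_rel(transform(link)) for link in links)
```

For instance, the absolute link from `/Calculation/start.html` to the PDF becomes the anchor `../resources/calc05/calc.pdf` in `/Calculation/html/index.html`.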
(3) Tracing Semantic Properties: Preservation
[Figure: the concept LinksTo with roles link_source, link_anchor, link_target is traced from the original website "Calculation" (context LinkAbs) to the stored "Calculation documents" (context LinkRel), subject to the requirement below.]
$$\mathit{pres}_K\big(\{s \mapsto \mathit{link\_source},\ a \mapsto \mathit{link\_anchor},\ t \mapsto \mathit{link\_target}\},\ \mathit{LinksTo}(s, a, t),\ \{\mathit{LinkAbs}, \mathit{LinkRel}\},\ \{\mathit{LinkRel}\}\big)$$
Preservation of Embedded Queries
Targets: semantic preservation of link consistency
- links can be evaluated semantically: only valid URLs are accepted as links
- links can be constructed automatically: only valid URLs are constructed
- the constructions allow for formal proofs w.r.t. the preservation requirement
Tools: automata theory (finite state automata, FSA), graph theory
Steps:
(1) Formalize the queried structure for link evaluation and construction
(2) Formalize syntactically valid URLs
(3) Combine both
The approach can be generalized to other applications that integrate embedded queries.
Specification of Queried Structure
(1) Formalize the queried structure
- vertices (objects) yield the query semantics
- labels carry URL substrings
- generate a finite state automaton
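A minimal sketch of step (1), assuming the queried structure is a directory tree given as a list of file paths: every directory or file becomes a vertex (state), each parent-to-child edge is labeled with the child's name (a URL substring), and `.`/`..` transitions are added for directories. Running the automaton on the segments of a relative reference returns the vertex reached, which is the query's result. The functions `structure_automaton` and `run` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

State = str  # a vertex: path of a directory or file object ("" is the root)
Delta = Dict[Tuple[State, str], State]


def structure_automaton(paths: Iterable[str]) -> Delta:
    """Transition function of a deterministic FSA over path segments."""
    delta: Delta = {}
    dirs = {""}  # the root directory
    for p in paths:
        parts = p.strip("/").split("/")
        for i, name in enumerate(parts):
            parent = "/".join(parts[:i])
            child = "/".join(parts[: i + 1])
            delta[(parent, name)] = child  # edge labeled with a URL substring
            if i < len(parts) - 1:
                dirs.add(child)  # every non-final component is a directory
    for d in dirs:
        delta[(d, ".")] = d
        if d:  # the root has no parent
            delta[(d, "..")] = d.rsplit("/", 1)[0] if "/" in d else ""
    return delta


def run(delta: Delta, start: State, segments: Iterable[str]) -> Optional[State]:
    """Run the automaton; None means the query leaves the structure."""
    state = start
    for seg in segments:
        nxt = delta.get((state, seg))
        if nxt is None:
            return None
        state = nxt
    return state
```

Here every vertex counts as accepting, since reaching any object is a successful evaluation; a stricter variant could accept only files.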
Specification of Queried Structure
Specification of Syntactically Valid URLs
(2) Formalize syntactically valid URLs
- reduce the URI-reference grammar
- construct the query automaton
[Figure: grammar for URI-references]
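As an illustration of step (2), the sketch below hand-builds a character-level DFA for one possible reduction of the RFC 3986 URI-reference grammar: only `path-noscheme` relative references (no scheme, authority, query, or fragment) with non-empty segments. The choice of reduction and the names `step` and `accepts` are assumptions of this sketch; the reduced grammar used in the talk may differ.

```python
import string
from typing import Optional

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
HEX = set(string.hexdigits)
FIRST = {"start", "seg1", "pct1a", "pct1b"}  # states inside the first segment
ACCEPT = {"seg1", "seg"}


def step(state: str, ch: str) -> Optional[str]:
    """Transition function; ':' is forbidden in the first segment (segment-nz-nc)."""
    first = state in FIRST
    if state in ("pct1a", "pcta"):  # after '%': need a first hex digit
        if ch in HEX:
            return "pct1b" if first else "pctb"
        return None
    if state in ("pct1b", "pctb"):  # need a second hex digit
        if ch in HEX:
            return "seg1" if first else "seg"
        return None
    if ch in UNRESERVED or ch in SUB_DELIMS or ch == "@" or (ch == ":" and not first):
        return "seg1" if first else "seg"
    if ch == "%":
        return "pct1a" if first else "pcta"
    if ch == "/" and state in ACCEPT:  # '/' only after a non-empty segment
        return "slash"
    return None


def accepts(ref: str) -> bool:
    """True iff `ref` is a syntactically valid reference in the reduced grammar."""
    state: Optional[str] = "start"
    for ch in ref:
        state = step(state, ch)
        if state is None:
            return False
    return state in ACCEPT
```

For example, `../resources/calc05/calc.pdf` is accepted, whereas `a//b`, `/Calculation/style.css`, and strings containing spaces are rejected.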
Specification of Syntactically Valid URLs: Construction of the Query Automaton
(3) Combine both: the full link automaton
- basically: let both automata run in parallel
- match the non-terminal transitions of the URL automaton with the appropriate transitions of the structure automaton
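The parallel run can be sketched as a product automaton. Assumptions of this sketch: the URL automaton is a simplified token-level machine whose only non-terminal transition is `SEG` (one path segment, checked by an RFC 3986 `segment-nz` regular expression), and `STRUCT_DELTA` is a hypothetical hand-written fragment of a structure automaton. A `/` moves only the URL automaton; a concrete segment fires the `SEG` transition and, simultaneously, the structure transition carrying that label.

```python
import re
from typing import Dict, List, Optional, Tuple

# Simplified URL automaton over tokens; SEG is a non-terminal for one segment.
SEG = "SEG"
URL_DELTA: Dict[Tuple[int, str], int] = {(0, SEG): 1, (1, "/"): 2, (2, SEG): 1}
URL_ACCEPT = {1}

# RFC 3986 segment-nz (one or more pchar): may a label instantiate SEG?
SEGMENT = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+")

# Hypothetical fragment of a structure automaton: (vertex, label) -> vertex.
STRUCT_DELTA: Dict[Tuple[str, str], str] = {
    ("Calculation/html", ".."): "Calculation",
    ("Calculation", "html"): "Calculation/html",
    ("Calculation/html", "index.html"): "Calculation/html/index.html",
    ("Calculation", "resources"): "Calculation/resources",
    ("Calculation/resources", "calc05"): "Calculation/resources/calc05",
    ("Calculation/resources/calc05", "calc.pdf"): "Calculation/resources/calc05/calc.pdf",
}


def tokenize(ref: str) -> List[str]:
    """Split a reference into segment strings and '/' separators."""
    tokens: List[str] = []
    for i, part in enumerate(ref.split("/")):
        if i:
            tokens.append("/")
        tokens.append(part)
    return tokens


def run_link_automaton(start: str, ref: str) -> Optional[str]:
    """Run both automata in parallel; product state = (URL state, vertex)."""
    q, v = 0, start
    for tok in tokenize(ref):
        if tok == "/":  # terminal: only the URL automaton moves
            nq = URL_DELTA.get((q, "/"))
            if nq is None:
                return None
            q = nq
        else:  # non-terminal SEG matched with a structure transition
            nq = URL_DELTA.get((q, SEG)) if SEGMENT.fullmatch(tok) else None
            nv = STRUCT_DELTA.get((v, tok))
            if nq is None or nv is None:
                return None
            q, v = nq, nv
    return v if q in URL_ACCEPT else None
```

A reference is thus accepted only if it is syntactically valid and denotes an existing object, which is the combination the slide describes.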
Integration and Benefit
[Figure: evaluation and construction of links are integrated into the LinksTo property (roles link_source, link_anchor, link_target; contexts LinkAbs and LinkRel) used when storing the website "Calculation" as "Calculation documents".]
Benefit: working and provably correct
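For construction, one graph-theoretic option (an assumption here, not necessarily the author's algorithm) is a breadth-first search in the structure graph from the source document's directory to the target; the edge labels along the path form the relative reference. Because every label is a structure edge, the constructed link evaluates back to the target by construction. The sketch assumes file and directory names are already valid URL segments (otherwise they would need percent-encoding, e.g. with `urllib.parse.quote`); `build_edges` and `construct_link` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edges = Dict[str, List[Tuple[str, str]]]  # vertex -> [(label, next vertex)]


def build_edges(paths: List[str]) -> Edges:
    """Structure graph of a directory tree: child edges labeled with names,
    plus '..' edges from each non-root directory to its parent."""
    edges: Edges = {"": []}  # "" is the root directory
    for p in paths:
        parts = p.strip("/").split("/")
        for i in range(len(parts)):
            parent, child = "/".join(parts[:i]), "/".join(parts[: i + 1])
            if child not in edges:
                edges[child] = []
                edges[parent].append((parts[i], child))
                if i < len(parts) - 1:  # directories get a parent edge
                    edges[child].append(("..", parent))
    return edges


def construct_link(source_doc: str, target: str, edges: Edges) -> Optional[str]:
    """Shortest relative reference (fewest segments) from `source_doc` to `target`."""
    start = source_doc.rsplit("/", 1)[0] if "/" in source_doc else ""
    queue = deque([start])
    came_from: Dict[str, Tuple[str, str]] = {}  # vertex -> (previous vertex, label)
    seen = {start}
    while queue:
        v = queue.popleft()
        if v == target:
            labels: List[str] = []
            while v != start:  # walk back to the start, collecting labels
                v, label = came_from[v]
                labels.append(label)
            return "/".join(reversed(labels)) or "."
        for label, w in edges.get(v, []):
            if w not in seen:
                seen.add(w)
                came_from[w] = (v, label)
                queue.append(w)
    return None  # target unreachable: no link can be constructed
```

In the migrated layout, a link from `Calculation/html/index.html` to the PDF is constructed as `../resources/calc05/calc.pdf`.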
III. Results
IV. Conclusions and Outlook
I. Automated evaluation and construction of embedded queries
II. Based on formal, automata-theoretic constructions, which yield provable correctness
III. Integration into a framework for semantic preservation
IV. Future work:
- computing structures on demand
- regular expressions as queries
- including extensions such as CSS or XPath predicates
Questions?
Thomas Triebsees
Universität der Bundeswehr München, Department of Computer Science