/ department of mathematics and computer science TU/e eindhoven university of technology ISAApril 17, Web Information Systems Engineering Flavius Frasincar
Contents
What is a Web Information System (WIS)?
WIS Features
Problem: Data Management in WIS
Solution: Model-Driven Methodology (with Tasks Separation)
Methodologies for WIS:
–Strudel Methodology
–Hera Methodology
Summary

World Wide Web
1990: Tim Berners-Lee invents the World Wide Web
The Web's success is based on:
–hypermedia (link) nature: links allow natural and flexible access to information, matching the associative nature of the human mind
–global availability
–interoperability
–simplicity
–free of charge, etc.

Web Information Systems (WISs)
1998: Tomas Isakowitz et al. coined the term Web Information Systems for: “information systems that are based on Web technology”
WISs differ from traditional information systems in that they “have the potential of reaching a wider audience” through different platforms
The need to integrate data is even greater, because the data sources are distributed over the Web and may be heterogeneous

Three Generations of WISs
First generation: based on hand-crafted HTML
–Difficult to maintain (update)
Second generation: generate HTML on demand by automatically filling templates
–Data is machine readable/transformable
–Difficult to make the data machine understandable
Third generation: Semantic Web Information Systems (SWISs) are WISs based on Semantic Web technology (RDF, OWL, etc.)
–Data is machine understandable

Present the Deep Web
Deep Web vs. Surface Web:
500 times larger
1000 times better quality

WIS Features
Data-intensive: integrate data from multiple heterogeneous sources
Pervasive: support different platforms, e.g. network (T1, 128K, 56K), display (PC, Palm, WAP Phone)
User adaptable: consider the user’s preferences and the user’s state of mind while interacting with the system
Flexible: support semistructured data
Automatic: need little or no human intervention
User interactive: e.g. online shops (Amazon)

Problem: Data Management
WISs are hard to specify and implement
Methodologies exist for manual WIS design, but few of them target automation
Difficult tasks to perform:
–Multiplatform support
–Automatic updates
–Automatic site reconstruction (WIS Adaptation)
–Optimize WIS performance (WIS Optimization)
–Enforce WIS integrity constraints (WIS Analysis)
–Achieve flexibility, extensibility, etc.

Semistructured Data
Semistructured data is characterized by:
–Irregular structure: missing or additional attributes, multiple attributes
–Few type constraints: attributes with different types in different objects, heterogeneous collections
–Rapidly evolving schema or missing schema
It is typically modeled by a DLG (Directed Labeled Graph)
Examples: HTML, XML, RDF, BibTeX, etc.

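The characteristics above can be illustrated with a small directed labeled graph. This is a minimal sketch, not taken from the slides: the class `LabeledGraph`, the node names, and all attribute values are placeholders chosen for illustration.

```python
from collections import defaultdict


class LabeledGraph:
    """Directed labeled graph: every edge is a (source, label, target) triple."""

    def __init__(self):
        # source node -> list of (label, target) pairs
        self.edges = defaultdict(list)

    def add(self, source, label, target):
        self.edges[source].append((label, target))

    def targets(self, source, label):
        """All targets reachable from `source` over edges labeled `label`."""
        return [t for (l, t) in self.edges[source] if l == label]


g = LabeledGraph()
g.add("root", "pub", "pub1")
g.add("root", "pub", "pub2")
g.add("pub1", "title", "Paper A")
g.add("pub1", "author", "Author X")
g.add("pub1", "author", "Author Y")  # multiple attributes with the same label
g.add("pub1", "year", 2000)          # year stored as an integer here ...
g.add("pub2", "title", "Paper B")
g.add("pub2", "year", "1998")        # ... and as a string here (few type constraints)
# pub2 has no "author" edge at all (missing attribute)

print(g.targets("pub1", "author"))
print(g.targets("pub2", "author"))
```

No schema is declared anywhere: the structure is carried entirely by the edge labels, which is why missing, repeated, and differently typed attributes can coexist.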
Solution: Tasks Separation
Isolate and automate common tasks for WIS design:
–Choose and access the data (data integration and retrieval) to be presented
–Design the navigational structure for this data
–Design the visual aspects of the presentation
Use a model-driven approach for task specification (the fairy says it brings “wisdom” [theory], “richness” [money], and “beauty” [judge it yourself] – Stefano Ceri)

WIS Presentation Generation Strategies
Static (eager approach): presentations are materialized completely; each page is precomputed
Dynamic or on-demand (lazy approach): after each link “click”, the next page to be presented is computed

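The difference between the two strategies can be sketched in a few lines. This is an illustrative sketch only: `render`, `generate_static`, `DynamicSite`, and the page data are hypothetical names, not part of any methodology discussed here.

```python
def render(page_id, data):
    """Turn one data item into a page (stand-in for real page generation)."""
    return f"<h1>{page_id}</h1><p>{data[page_id]}</p>"


def generate_static(data):
    """Eager approach: materialize every page up front."""
    return {page_id: render(page_id, data) for page_id in data}


class DynamicSite:
    """Lazy approach: compute a page only when its link is clicked."""

    def __init__(self, data):
        self.data = data
        self.render_count = 0  # how many pages were actually computed

    def click(self, page_id):
        self.render_count += 1
        return render(page_id, self.data)


pages = {"home": "Welcome", "about": "Info", "contact": "Address"}

static_site = generate_static(pages)   # all three pages computed now
dynamic_site = DynamicSite(pages)
first_page = dynamic_site.click("home")  # only one page computed so far
```

Both variants produce the same page for the same request; they differ in when the work is done and in how fresh the underlying data is at that moment.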
Methodologies
Dexter-based: HDM (Hypermedia Design Method)
ER-based: RMM (Relationship Management Methodology)
OMT-based: OOHDM
UML-based: OO-H (Conallen), UWE (UML-based Web Engineering), W2000 (HDM extension)
RDF-based: XWMF (eXtensible Web Modeling Framework), Hera
Other: Strudel, Araneus, WebML (Web Modeling Language), Autoweb, Trellis, XAHM (XML-based Adaptive Hypermedia Model), WSDM, W3DT, etc.

Strudel Methodology
AT&T

Strudel Architecture

Input Data
[Figure: input data graph with two publications]
Publication: “Declarative spec…”; Mary Fernandez, Dan Suciu; 2000; VLDB; “Strudel is a …”; Languages, Methods, …
Publication: “Catching the …”; Mary Fernandez, Daniela Florescu; 1998; SIGMOD; “The Strudel …”; WIS, …

Semistructured Data Model
Directed Labeled Graph (DLG)

STRUQL (Site TRansformation Und Query Language)
where Root -> "publications" -> r, r -> "pub" -> x, x -> l -> v
{ where l = "year"
  link YearPage(v) -> "year" -> v,
       YearPage(v) -> "paperPage" -> x,
       RootPage() -> "yearPage" -> YearPage(v)
  collect RootPage{RootPage()}, YearPage{YearPage(v)}
}
…

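The query above can be read as a graph-to-graph transformation: match paths in the data graph, then emit links of a new site graph. The following is a rough sketch of that reading in Python, not Strudel's implementation; the helper names `out` and `build_site` and the sample edges are assumptions made for illustration. Page constructors such as `YearPage(v)` are modeled as tuples, so the same year always yields the same page node.

```python
# Data graph as (source, label, target) edges; values are placeholders.
data = [
    ("Root", "publications", "r"),
    ("r", "pub", "pub1"),
    ("r", "pub", "pub2"),
    ("r", "pub", "pub3"),
    ("pub1", "year", 2000),
    ("pub1", "title", "Paper A"),
    ("pub2", "year", 1998),
    ("pub3", "year", 2000),
]


def out(edges, source, label=None):
    """Outgoing (label, target) pairs of `source`, optionally filtered by label."""
    return [(l, t) for (s, l, t) in edges
            if s == source and (label is None or l == label)]


def build_site(edges):
    """Mimic the where/link clauses: one YearPage per distinct year value."""
    site = set()
    root_page = ("RootPage",)
    for _, r in out(edges, "Root", "publications"):   # Root -> "publications" -> r
        for _, x in out(edges, r, "pub"):              # r -> "pub" -> x
            for l, v in out(edges, x):                  # x -> l -> v
                if l == "year":                         # where l = "year"
                    year_page = ("YearPage", v)
                    site.add((year_page, "year", v))
                    site.add((year_page, "paperPage", x))
                    site.add((root_page, "yearPage", year_page))
    return site


site = build_site(data)
for edge in sorted(site, key=repr):
    print(edge)
```

Because the site graph is a set of edges, two papers from the same year share one `YearPage` node and only add one extra `paperPage` link.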
Site Graph

STRUDEL Template Language
RootPage collection: <sfor p in yearPage order=ascend key=year>
YearPage collection:
PaperPage collection:

STRUDEL +/-
+ :
Tasks separation (content and presentation)
Declarative specifications (enables presentation content adaptation)
Verification of integrity constraints (e.g. “All paper pages are reachable from RootPage”)
- :
Intermixes schema and content definition in the data graph
Does not separate navigation from visual details of the presentation
Does not use standard technologies

Hera Methodology
TU/e

Hera Architecture

Hera Presentation Methodology
[Figure: design phases and the models they produce]
Phases: Conceptual Design, Application Design, Presentation Design, Adaptation Design
Models: Conceptual Model, Application Model, Presentation Model
Transformation

Conceptual Model (CM)
Provides a uniform semantic view over the different data sources that are integrated within a given Web application
Consists of hierarchies of concepts relevant within the given domain
Concept relationships are:
–Attribute relationships: refer to literal values that characterize a concept
–Reference relationships: refer to other concepts

Example: CM

Example: CM in RDF/XML
<rdf:Property rdf:ID="creates" sys:cardinality="multiple" sys:inverse="created_by">

Application Model (AM)
Captures the logical (navigational) aspects of the presentation
Based on the concept of a slice, which contains attributes and possibly other slices
–A slice is a meaningful presentation unit
–A slice is associated with a concept from the CM
Slice relationships are:
–Aggregation relationships: embed a set of slices (abstraction for index, tour, indexed guided tour, etc.)
–Reference relationships: link abstraction with an anchor specified

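The slice structure described above can be sketched as a small recursive data type. This is an assumption-laden illustration, not Hera's actual representation (which is RDF): the class `Slice`, the helper `flatten_attributes`, the attribute names `title` and `name`, and the anchor `exemplified_by` are hypothetical; only the slice identifiers and the `picture` attribute appear in the RDF/XML example on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    name: str
    owner: str                                      # concept from the CM
    attributes: list = field(default_factory=list)  # attributes shown in this slice
    children: list = field(default_factory=list)    # aggregation: embedded slices
    links: list = field(default_factory=list)       # reference: (anchor, target slice name)


def flatten_attributes(s):
    """Attributes shown when the slice is rendered, including embedded slices."""
    result = list(s.attributes)
    for child in s.children:
        result.extend(flatten_attributes(child))
    return result


painting_picture = Slice("S.painting.picture", "Painting", attributes=["picture"])
painting_main = Slice(
    "Slice.painting.main", "Painting",
    attributes=["title"],           # hypothetical attribute
    children=[painting_picture],    # aggregation relationship
)
technique_main = Slice(
    "Slice.technique.main", "Technique",
    attributes=["name"],            # hypothetical attribute
    links=[("exemplified_by", "Slice.painting.main")],  # reference relationship
)

print(flatten_attributes(painting_main))
```

Aggregation nests slices inside one presentation unit, while a reference only records an anchor and the name of the slice it leads to, so following it means navigating to another unit.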
Example: AM

Example: AM in RDF/XML
<rdfs:Class rdf:ID="Slice.technique.main" slice:owner="CM#Technique" slice:main="Yes">
<rdfs:Class rdf:ID="S.painting.picture" slice:owner="CM#Painting" slice:attr-ref="CM#picture">
<rdfs:Class rdf:ID="Slice.painting.main" slice:owner="CM#Painting">

Adaptation
Captures two kinds of adaptation:
–Adaptability takes into account the device capabilities and user preferences (UAProf = User Agent Profile)
–Adaptivity means that the presentation changes itself according to the “state of the user’s mind” while being browsed (UM = User Model)
Adaptation is based on conditioning the appearance of slices using UAProf and/or UM
Adaptivity uses AHAM (Adaptive Hypermedia Application Model) update rules for updating the UM

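Conditioning slice appearance on a device profile and a user model can be sketched as follows. This is a simplified illustration, not the UAProf vocabulary or the AHAM rule language: the dictionary key `image_capable`, the visit-counting update rule, and the specific conditions are assumptions made for the example.

```python
def select_slices(conditions, uaprof, um):
    """Return the names of slices whose appearance condition holds."""
    return [name for name, cond in conditions.items() if cond(uaprof, um)]


def on_visit(um, slice_name):
    """Update rule (adaptivity): record that the user has seen a slice."""
    updated = dict(um)
    updated[slice_name] = updated.get(slice_name, 0) + 1
    return updated


# Each condition takes the device profile and the user model.
conditions = {
    # adaptability: show the picture only on devices that can display images
    "S.painting.picture": lambda ua, um: ua["image_capable"],
    "Slice.painting.main": lambda ua, um: True,
    # adaptivity: show the technique slice only after a painting was visited
    "Slice.technique.main": lambda ua, um: um.get("Slice.painting.main", 0) > 0,
}

wap_phone = {"image_capable": False}
pc = {"image_capable": True}

um = {}
before = select_slices(conditions, wap_phone, um)
um = on_visit(um, "Slice.painting.main")
after = select_slices(conditions, pc, um)
```

The device profile stays fixed for a session, whereas the user model changes with every visit, so the same condition set can yield a different presentation as browsing proceeds.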
Adapted Application Model

Presentation Model
Defines the physical appearance of the presentation
Based on the concept of a region, which contains attributes and possibly other regions:
–Each region has an associated rectangular area
–Slices are translated to regions; one slice can be mapped to several regions
Slice relationships are materialized with:
–Navigational relationships
–Spatial relationships
–Temporal relationships

Presentation Model

Presentation in Browsers
HTML: HyperText Markup Language
SMIL: Synchronized Multimedia Integration Language
WML: Wireless Markup Language

Implementation
Models are represented in RDF and serialized in RDF/XML
User Agent Profile (UAProf): a Composite Capability/Preference Profiles (CC/PP) vocabulary to model device capabilities and user preferences
XSLT processor for transforming between different model instances (stylesheet-based transformation):
–Xalan (XSLT 1.0)
–Saxon (XSLT 2.0): multiple output files support

Data Transformations
Step 0: Preparation
–Substep 0.1: Application Model Unfolding creates the skeleton of an AM instance
–Substep 0.2: Application Model Adaptation adds slice visibility conditions to the previous skeleton
–Substep 0.3: Main Transformation Specification Generation builds the specification for the next step
Step 1: Main Transformation populates the AM with the input CM instance
Step 2: Presentation Generation produces code for different browsers (HTML, WML, SMIL)

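The order of the steps above can be sketched as a chain of functions. This is a deliberately simplified stand-in: Hera performs these transformations with XSLT stylesheets over RDF/XML, whereas the sketch uses plain Python dictionaries. The function names, the dictionary layout, and the sample values are all hypothetical, and Substep 0.3 (generating the specification for the main transformation) is omitted because the sketch hard-codes that logic in `populate`.

```python
def unfold(am):
    """Substep 0.1: build the skeleton of an AM instance, one empty node per slice."""
    return [{"slice": s["name"], "attrs": s["attrs"], "values": {}} for s in am]


def adapt(skeleton, visibility):
    """Substep 0.2: attach a visibility condition (already evaluated here) to each node."""
    return [dict(node, visible=visibility.get(node["slice"], True)) for node in skeleton]


def populate(skeleton, cm_instance):
    """Step 1: fill the visible nodes with data from the CM instance."""
    return [
        dict(node, values={a: cm_instance[a] for a in node["attrs"]})
        for node in skeleton
        if node["visible"]
    ]


def present(am_instance, fmt):
    """Step 2: produce code for one target format."""
    body = "".join(
        f"<p>{value}</p>" for node in am_instance for value in node["values"].values()
    )
    if fmt == "html":
        return body
    if fmt == "wml":
        return f"<wml><card>{body}</card></wml>"
    raise ValueError(f"unsupported format: {fmt}")


am = [
    {"name": "Slice.painting.main", "attrs": ["title"]},
    {"name": "S.painting.picture", "attrs": ["picture"]},
]
cm_instance = {"title": "Example title", "picture": "example.jpg"}

skeleton = adapt(unfold(am), {"S.painting.picture": False})  # e.g. device without images
instance = populate(skeleton, cm_instance)
html = present(instance, "html")
wml = present(instance, "wml")
```

Keeping the preparation substeps separate from the main transformation means the data-independent work is done once per model, while only Steps 1 and 2 depend on the actual input data and target browser.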
Data Transformations

Hera +/-
+ :
Tasks separation (content, navigation, and presentation)
Model-based specifications (enables presentation content adaptation)
Uses standard technology: RDF, RDF/XML, XSLT
- (Future Work):
Specifications are semi-formal (difficult to check integrity constraints)
Does not (yet) support user interaction

Summary
What is a Web Information System (WIS)
Features of WIS: data intensive, pervasive, etc.
Design methodologies for WIS:
–Strudel (from industry)
–Hera (from university)
Model-based approach for WIS design
WIS design tasks separation:
–Data Selection
–Navigation
–Presentation