Web Site Management Based on Declarative Specifications Alon Levy University of Washington Joint work with: Strudel: Dana Florescu (INRIA), Mary Fernandez,


1 Web Site Management Based on Declarative Specifications Alon Levy University of Washington Joint work with: Strudel: Dana Florescu (INRIA), Mary Fernandez, Dan Suciu (AT&T), Khaled Yagoub (INRIA) Tiramisu: Corin Anderson and Dan Weld (UW)

2 Problem: Building Web sites Building Web sites involves three tasks: – Selecting and managing the site’s content – Organizing the site’s structure (pages and links) – Designing the graphical presentation of pages. In current tools, these tasks are (mostly) interdependent. Strudel’s key ideas: –Separate the three tasks. –Manage content and structure declaratively.

3 Content Management and Graphical Presentation Content may be derived from multiple sources: – Databases: relational, object-oriented – Semi-structured sources (XML, Word, Excel, BibTeX). Classical data integration problem! (see TSIMMIS, Garlic, Information Manifold, Tukwila) Graphical presentation: –Need to integrate with tools that create animations, images, Java applets. –Create sets of similar HTML pages using templates.

4 Web-Site Structure The structure includes: Set of pages and contents of each page, and Links between the pages.

5 Current practice Current tools separate only content management from presentation: content is managed by a database, with queries embedded in HTML templates; simple tools view and modify structure at the extensional level; WYSIWYG tools manage presentation. But they still cannot explicitly manage the site's global structure or flexibly choose the content-management system. As a result it’s hard to: modify the structure of a web site, build multiple versions for different classes of users, enforce integrity constraints.

6 Talk Outline Problem definition Strudel architecture Advantages of declarative specifications: –Specifying and verifying integrity constraints. –Automatic generation of run-time plans for managing data-intensive web sites. Tiramisu: –Separating the design tool from the implementation. –Using a collection of tools to build a site.

7 Strudel Evolution Strudel (Nov. 96)[AT&T] Strudel-R (INRIA) Tiramisu (Sept. 98) (U. Washington) Strudel AT&T Release http://www.research.att.com/sw/tools/strudel

8 Strudel Architecture and System

9 Strudel Features: –Integrates content from multiple sources. –High-level declarative language for managing site’s structure (StruQL). Advantages: –Derives multiple sites from the same data. –Supports easy restructuring and modification. –Provides platform for: Enforcing integrity constraints Designing policies for efficient run-time management of sites.

10 Strudel Architecture

11 Data Model Strudel is based on a semi-structured data model: labeled directed graphs. Nodes in the graph represent objects, labels on arcs represent attribute names, and nodes are grouped into named collections. Why semi-structured data? Raw data is often semi-structured (and I don’t mean that it’s embedded in HTML); it is convenient for data integration (à la TSIMMIS); web sites are ultimately graphs.

12 The StruQL Query Language A StruQL query is a function from a set of input graphs to an output graph. A StruQL expression contains two parts: A query component, and A restructuring component. Formally: INPUT graph names WHERE conjunction of regular path expression atoms CREATE name the nodes in the output graph using Skolem functions LINK specify the links in the resulting graph. StruQL evolved into XML-QL, (see WWW8 Conference)

13 Example Raw Data Article 1: –Date: 8/1/97 –Title: “Clinton announces new …” –Priority: Headline –Category: USA News –Images: im1.gif, im.gif –Text: “President Clinton announced…” –Related article: article2 Article 2: –Date: 8/2/97 –Title: “FDA approves new cure for…” –Priority: Top Story –Category: Health –Video: vid1.avi –Text: “The Federal Drug Administration…”
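As a rough illustration of the labeled-graph data model, the two articles above could be encoded as follows. This is a minimal sketch, not Strudel's actual storage format: the `Graph` class, its method names, and the node identifiers `article1` and `article2` are all assumptions made for the example, and the elided titles are left elided.

```python
from collections import defaultdict


class Graph:
    """A labeled directed graph with named collections (hypothetical helper)."""

    def __init__(self):
        # node -> list of (label, target) arcs
        self.arcs = defaultdict(list)
        # collection name -> set of member nodes
        self.collections = defaultdict(set)

    def add_arc(self, source, label, target):
        self.arcs[source].append((label, target))

    def add_to_collection(self, name, node):
        self.collections[name].add(node)

    def targets(self, source, label):
        """Return the targets of all arcs leaving source with the given label."""
        return [t for (l, t) in self.arcs.get(source, []) if l == label]


g = Graph()
for node in ("article1", "article2"):
    g.add_to_collection("Articles", node)

# Article 1: two images and a related article, but no video.
g.add_arc("article1", "Date", "8/1/97")
g.add_arc("article1", "Title", "Clinton announces new ...")
g.add_arc("article1", "Images", "im1.gif")
g.add_arc("article1", "Images", "im.gif")
g.add_arc("article1", "RelatedArticle", "article2")

# Article 2: a video, but no images and no related article.
g.add_arc("article2", "Date", "8/2/97")
g.add_arc("article2", "Title", "FDA approves new cure for ...")
g.add_arc("article2", "Video", "vid1.avi")
```

The two objects carry different attribute sets, and one attribute is multi-valued. No schema has to be declared for either, which is the point of using a semi-structured model here.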

14 CNN Web-site Query (part 1)
// Input graph of articles
INPUT CNN-ARTICLES
// Create a web page for each article (note the arc variable l)
WHERE Articles(a), a -> l -> t,
      l in { "Title", "Abstract", "Date", "Text", "Image", "Topimage", "RelatedSite" },
      a -> "Category" -> c
CREATE ArticlePage(a)
LINK ArticlePage(a) -> l -> t
{ WHERE a -> "RelatedArticle" -> r
  LINK ArticlePage(a) -> "RelatedArticle" -> ArticlePage(r) }

15 CNN Site Schema RootPage() ArticlePage(a) CategoryPage(c) CategoryEntry(c) RootPageEntry(a) Data(t): a -> l -> t, l in { "Title", "Abstract", … } a -> priority -> "headline" a -> category -> c Data(t): a -> l -> t, l in { "title", "top-image" }

16 CNN Web-site Query (part 2)
CREATE RootPage
{ WHERE a -> "Priority" -> "headline",
        l in { "Title", "Date", "Topimage" }
  CREATE RootEntry(a)
  LINK RootPage -> "HeadlineStory" -> RootEntry(a),
       // Link each headline story to its title, date, top image and full article
       RootEntry(a) -> "FullStory" -> ArticlePage(a),
       RootEntry(a) -> l -> t }
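To make the CREATE and LINK semantics of the two query parts concrete, here is a hand-written Python sketch of what the query computes on a tiny input. It is an illustration under assumptions, not how Strudel evaluates StruQL: the function names `skolem` and `build_site`, the input dictionary, and its attribute values are hypothetical. A Skolem function is modeled as a tuple, so the same name applied to the same arguments always denotes the same output node.

```python
# Hypothetical input graph: node -> list of (label, target) arcs.
articles = {
    "article1": [
        ("Title", "Title of article 1"),
        ("Date", "8/1/97"),
        ("Priority", "headline"),
        ("Category", "USA News"),
        ("RelatedArticle", "article2"),
    ],
    "article2": [
        ("Title", "Title of article 2"),
        ("Date", "8/2/97"),
        ("Priority", "top story"),
        ("Category", "Health"),
    ],
}

ARTICLE_LABELS = {"Title", "Abstract", "Date", "Text", "Image", "Topimage", "RelatedSite"}
ROOT_LABELS = {"Title", "Date", "Topimage"}


def skolem(name, *args):
    """Model a Skolem function: equal name and arguments give the same node."""
    return (name,) + args


def build_site(articles):
    """Return the output graph as a set of (source, label, target) links."""
    links = set()
    root = skolem("RootPage")
    for a, arcs in articles.items():
        # Part 1 requires a -> "Category" -> c in its WHERE clause.
        if not any(label == "Category" for label, _ in arcs):
            continue
        page = skolem("ArticlePage", a)
        for label, target in arcs:
            if label in ARTICLE_LABELS:
                links.add((page, label, target))
            if label == "RelatedArticle":
                links.add((page, "RelatedArticle", skolem("ArticlePage", target)))
        # Part 2 (nested block): only headline articles get a root entry.
        if ("Priority", "headline") in arcs:
            entry = skolem("RootEntry", a)
            links.add((root, "HeadlineStory", entry))
            links.add((entry, "FullStory", page))
            for label, target in arcs:
                if label in ROOT_LABELS:
                    links.add((entry, label, target))
    return links


site = build_site(articles)
```

Because `ArticlePage(r)` is a Skolem term, the related-article link from `article1` points at the same node that the loop builds for `article2`, with no explicit lookup table of created pages.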

17 HTML Templates

18 CNN Sports Query INPUT CNN WHERE TopCategory(c), c -> "CategoryName" -> cn, cn="Sports", c -> "SubTopic" -> top, Articles(a), a -> l -> t, l in { "Title", "Abstract", "Date", "Text", "Image", "Topimage", "RelatedSite"}, a -> "Category" -> c, c=top CREATE ArticlePage(a) LINK ArticlePage(a) -> l -> t

19 StruQL Details Regular path expressions are constructed by a grammar: R <- “a” |  | R1.R2 | R1|R2 | R1* | L | _ Atoms in the WHERE clause are of the form X -> R -> Y or C(X). The LINK clause includes atoms of the form: LINK f(X) -> “new link” -> g(X) or LINK f(X) -> L -> g(X). Queries can be nested, inheriting the WHERE clauses of their outer blocks. Note separation between querying part and restructuring part!
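The following sketch shows one way an atom X -> R -> Y could be evaluated over a labeled graph: given the nodes bound to X, it returns every node Y reachable along a path matching R. The tuple encoding of path expressions and the function name `reach` are assumptions made for this example. It covers constant labels, the wildcard `_`, concatenation, alternation and Kleene star, and it omits arc variables (L).

```python
def reach(graph, start_nodes, expr):
    """Return the set of nodes reachable from start_nodes via paths matching expr.

    graph: dict mapping node -> list of (label, target) arcs.
    expr:  ("label", name) | ("any",) | ("seq", r1, r2) | ("alt", r1, r2) | ("star", r)
    """
    kind = expr[0]
    if kind == "label":  # a single arc with a fixed label
        return {t for n in start_nodes for (l, t) in graph.get(n, []) if l == expr[1]}
    if kind == "any":  # the wildcard "_": a single arc with any label
        return {t for n in start_nodes for (_, t) in graph.get(n, [])}
    if kind == "seq":  # R1.R2
        return reach(graph, reach(graph, start_nodes, expr[1]), expr[2])
    if kind == "alt":  # R1|R2
        return reach(graph, start_nodes, expr[1]) | reach(graph, start_nodes, expr[2])
    if kind == "star":  # R1*: zero or more repetitions, computed as a fixpoint
        seen = set(start_nodes)
        frontier = set(start_nodes)
        while frontier:
            frontier = reach(graph, frontier, expr[1]) - seen
            seen |= frontier
        return seen
    raise ValueError("unknown path expression: %r" % (expr,))


# Hypothetical site graph containing a cycle between two related articles.
graph = {
    "root": [("Category", "sports")],
    "sports": [("Article", "a1")],
    "a1": [("RelatedArticle", "a2")],
    "a2": [("RelatedArticle", "a1")],
}

everything = reach(graph, {"root"}, ("star", ("any",)))
sports_articles = reach(graph, {"root"}, ("seq", ("label", "Category"), ("label", "Article")))
```

The star case tracks visited nodes, so evaluation terminates on cyclic graphs such as the `a1`/`a2` loop above.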

20 More on StruQL Bare bones language for semi-structured data: includes the essential features. More expressive than Lorel or UnQL (e.g., can reverse graphs) Conceptually and in practice: separation between query component and restructuring component is important. Containment is decidable for StruQL-WHERE (Florescu, Levy & Suciu, PODS-98)

21 Advantages of Declarative Specifications

22 Enforcing Integrity Constraints We often want to verify some constraints on site structure: –all articles from the last two days are reachable from the root –all paths to confidential data must go through an authentication node –Good site design principles are summarized as integrity constraints [Lohse, CACM, 98]. When site specs are long, constraints are hard to enforce. We want to verify constraints intensionally, i.e., on the site specification rather than on one generated site.

23 Intensional IC Verification Formally, we want to check whether S(D) |= IC for any instance D of the underlying data, where: S is the site specification (e.g., a StruQL query); IC is a formula describing the constraint, e.g.: ∀a, Article(a) & date(a) > today-2 => Root -> * -> ArticlePage(a). Results: –Sound and complete algorithms for verification of a class of integrity constraints (path constraints). –Algorithms will also propose corrections when ICs are violated.
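For contrast with the verification described above, the sketch below checks the reachability constraint on one already generated site graph. This is only the extensional check for a single instance D. It is not the sound and complete algorithm the slide refers to, which reasons over the specification for every possible D. The function names, the link representation and the sample data are assumptions.

```python
from collections import deque


def reachable(links, root):
    """Return all nodes reachable from root, given (source, label, target) links."""
    successors = {}
    for source, _, target in links:
        successors.setdefault(source, []).append(target)
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for target in successors.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def violations(links, root, recent_articles):
    """Return the recent articles whose ArticlePage is not reachable from root."""
    seen = reachable(links, root)
    return sorted(a for a in recent_articles if ("ArticlePage", a) not in seen)


# Hypothetical generated site: only article1 is linked from the root page.
root = ("RootPage",)
links = {
    (root, "HeadlineStory", ("RootEntry", "article1")),
    (("RootEntry", "article1"), "FullStory", ("ArticlePage", "article1")),
    (("ArticlePage", "article2"), "Title", "Title of article 2"),
}

missing = violations(links, root, ["article1", "article2"])
```

A check like this has to be rerun whenever the data changes, which is the motivation for verifying the constraint once against the specification instead.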

24 Run-time Management of Sites When do we compute web pages? –Static approach: completely precompute site Doesn’t work for large sites, forms, hard to update. –Dynamic approach: compute pages on request Users may wait, a lot of repeated computation, structure of the site is not exploited. Current tools use one of the extremes, or specify policy per collection of pages. –The specification is implicit in code. Our goal: use site specification to automatically find optimal strategy.

25 Possible Run-time Optimizations View materialization Function caching: –when web sites represent hierarchically structured data, successive queries in the site differ only in their projected attributes. Simplification under preconditions: –previous queries on the path may have already verified some conditions for current query. Lookahead computation: –often it is possible with little cost to compute the data necessary for subsequent pages.
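As a rough sketch of the middle ground between the static and dynamic extremes, the class below computes pages on request, keeps the results (a simple on-demand form of materialization), and can optionally compute the pages one link away (lookahead). The class name `PageServer`, its parameters and the toy site are assumptions. This is not the plan generator of Strudel-R, and it ignores cache invalidation when the underlying data changes.

```python
class PageServer:
    """Serve pages on demand, caching results and optionally looking ahead."""

    def __init__(self, compute_page, links_of):
        self.compute_page = compute_page  # page id -> page content (expensive)
        self.links_of = links_of          # page id -> ids of linked pages
        self.cache = {}
        self.computations = 0             # how many pages were actually computed

    def _materialize(self, page_id):
        if page_id not in self.cache:
            self.computations += 1
            self.cache[page_id] = self.compute_page(page_id)
        return self.cache[page_id]

    def request(self, page_id, lookahead=False):
        page = self._materialize(page_id)
        if lookahead:
            # Precompute the pages the user is likely to click next.
            for next_id in self.links_of(page_id):
                self._materialize(next_id)
        return page


# Hypothetical site: a root page linking to two article pages.
site_links = {"root": ["a1", "a2"], "a1": [], "a2": []}
server = PageServer(
    compute_page=lambda page_id: "<html>%s</html>" % page_id,
    links_of=lambda page_id: site_links.get(page_id, []),
)

server.request("root", lookahead=True)   # computes root, a1 and a2
after_first = server.computations
server.request("a1")                     # served from the cache
after_second = server.computations
```

Whether lookahead pays off depends on browsing patterns and on the cost of each query, which is why the slides treat the choice of strategy as an optimization problem rather than a fixed policy.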

26 Problem Statement Given: –site specification –knowledge about browsing patterns –cost function Produce: –Operational plan: operational schema + a set of queries to compute on a given page request. Results: (in Strudel-R): framework + –Performance study of the optimizations. –Algorithm for generating operational plans. –Identification of many open problems.

27 Strudel Experience --> Tiramisu

28 Experiences with Strudel (except for the lousy GUI) Integrating data from multiple sources when building a Web site is a prime concern. Sources are semi-structured! Declarative specification of site structure is very important because: site creation is a highly iterative process site owners often need redesign after experience from deployment we often generate multiple versions of sites from the same data. Design of web-sites is done in a top-down fashion. Strudel can’t be the all encompassing web-site management tool.

29 Tiramisu: the Second Generation Strudel and its siblings (Araneus, YAT, WebOQL, WIRM) force the design and implementation of the site to be done in the same tool. Furthermore, there will always be tools that are specialized for specific tasks. Tiramisu: –Separate design phase from implementation. –Allow the implementation to be done by a set of cooperating tools.

30 Tiramisu Architecture E/R style diagram of site (site schema) Implementation manager Tool (Strudel) Tool (ASP) Tool (FrontPage) data source data source data source mediator wrapper web site

31 Screenshot of a TERD

32 Conclusions Web-site management is an important area for Database research. First-generation systems (Strudel, Araneus, YAT, WebOQL) offer important advantages: –Easy modification, creation of multiple versions –enforcing constraints, run-time management Second generation: (Tiramisu) –Emphasize design phase of site –Implement with a collection of cooperating tools.

