Web Engineering & Web Information Systems Technology Geert-Jan Houben
Content Web information systems evolution WIS engineering XML & XQuery RDF & OWL Web services WS protocols
Web Information Systems
Web Information System Information System based on Web technology (Web-based, Web-aware, Web-enabled etc.) Information system Exchanges information with Object System (= business process) Stores and manages information: data-intensive Requires careful engineering of information exchange
Web Information System Web technology can be used as front-end, e.g. application is available on the Web (or Intranet) via a browser Enables easy use and maintenance of (personalized) end-user access Web metaphor is appealing for end-users Requires different techniques for engineering the system’s interfaces
Web Information System Web technology can also be used in back-end of information system Organize (connect) the data inside the system using Web technology Use World Wide Web as provider of data (or Intranet) Typically highly volatile information (distributed and heterogeneous) Requires different techniques for engineering the implementation
Examples Real-estate sales Employee databases Museum databases Digital libraries Mail order catalogs Reservation systems Auctions, virtual marketplaces EPG (Electronic TV Program Guide) Ref: Special section on Web Information Systems in Communications of the ACM, July 1998, Vol. 41, No. 7
Evolution in WIS Technology
Evolution in hypermedia First: standalone special-purpose systems Now: Web-based From authoring to designing to generating From static to dynamic (generated from database query result) From single site to portals (integrated access service) From read-only to interactive and often collaborative (read-write)
Evolution in Web Languages HTML written by author Easy, uniform interface Large effort for maintenance Not suited for changing information Automatically generating information First, using templates (and databases) Later, using XML and XSLT transformations Automatic processing of information Explicit metadata (RDF) Agreement on meaning (ontologies) Semantic Web: from human-readable via machine-readable to machine-processable
SemWeb “Layer Cake”
Other Views WebML: “A Web-enabled software system whose main purpose is to publish and maintain large amounts of data” exploratory, browsing-oriented, personalized interfaces (highly volatile) data stored by means of DBMS technology OOHDM: “WWW brought new generation of IS” hypermedia navigation through heterogeneous information space operations querying or affecting that information constant change, new navigation and services “Web-based applications, first good hypermedia applications” RMM: “History of graphics designers + programmers” Nielsen: “On the Web, the only constant is change. A site that works perfectly as long as it stays the same will quickly die.” “Healthy navigation structure key to success”
WIS Engineering
Device Dependency HTML SMIL WML
WIS Engineering Methodology Design of WIS requires careful engineering of information exchange between IS and OS Implies engineering of front-end (interface) and back-end (storage & retrieval) Professional applications: “from art to engineering” well-founded (software) engineering methodologies model-driven
Hera: motivation Methodologies exist for manual hypermedia presentation design; Hera targets automated presentation Automated presentation is important for database-backed content (the ‘deep web’) as opposed to manually crafted content (the ‘surface web’): most WIS are data driven Ref: wwwis.win.tue.nl/~hera
Hera Methodology Model-driven methodology, defines design phases: Conceptual Design that results in Conceptual Model (CM, describes data content used for generation of hypermedia presentations) construction Application Design that results in Application Model (AM, describes the navigation structure and functionality) construction Presentation Design that results in Presentation Model (PM, describes spatial layout and rendering of hypermedia presentations) construction
Hera Models Models fully specify the application; hence, there is no need for additional programming Models are used by a generic Hera engine for generation of WIS application pages (by on-demand instantiations of model subsets)
Hera Architecture Defines how the models are used for automatic generation of hypermedia presentation
Conceptual Model Provides a uniform semantic view over different data sources that are integrated within a given Web application. Consists of hierarchies of concepts relevant within the given domain, their properties, and relations.
Conceptual Model Defines the data content in terms of RDFS (concepts, attributes, properties)
Application Model Navigation structure of a hypermedia application on top of CM Hypermedia dynamics (navigation structure updates and application functionality) of a hypermedia application
Slices
AM Example
Data Manipulations
  Defined as SeRQL queries
  Used for processing forms (handle user input)
  Q1 creates instances of SelectedPainting according to the SelectForm form content:
    CONSTRUCT {P} <rdf:type> {acm:SelectedPainting}
    FROM {P} <rdf:type> {cm:Painting};
             <cm:aname> {Paname}
    WHERE Paname IN SELECT Faname
                    FROM {SF} <form:aname> {Faname},
                         {SF} <rdf:ID> {FormName}
                    WHERE FormName = "SelectForm"
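As a rough illustration of what Q1 computes, the sketch below mimics the query in Python over a plain list of (subject, predicate, object) tuples. It is not how Sesame evaluates SeRQL; the resource identifiers (`p1`, `p2`, `f1`) and the painting names are hypothetical sample data.

```python
# Each RDF statement is modelled as a (subject, predicate, object) tuple.
# All identifiers and values below are made-up sample data.
TRIPLES = [
    ("p1", "rdf:type", "cm:Painting"),
    ("p1", "cm:aname", "The Night Watch"),
    ("p2", "rdf:type", "cm:Painting"),
    ("p2", "cm:aname", "The Milkmaid"),
    ("f1", "rdf:ID", "SelectForm"),
    ("f1", "form:aname", "The Milkmaid"),
]


def select_paintings(triples, form_name="SelectForm"):
    """Return new rdf:type statements for paintings whose name was submitted in the form."""
    # Inner SELECT: names entered in the form called form_name.
    forms = {s for s, p, o in triples if p == "rdf:ID" and o == form_name}
    submitted = {o for s, p, o in triples if p == "form:aname" and s in forms}
    # FROM clause: resources typed cm:Painting that carry a cm:aname.
    paintings = {s for s, p, o in triples if p == "rdf:type" and o == "cm:Painting"}
    selected = {s for s, p, o in triples
                if p == "cm:aname" and s in paintings and o in submitted}
    # CONSTRUCT clause: one new statement per selected painting.
    return sorted((s, "rdf:type", "acm:SelectedPainting") for s in selected)


new_statements = select_paintings(TRIPLES)
```

With this sample data only `p2` matches the submitted name, so only `p2` is typed as a selected painting.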
Hera Implementation HPG 2.0 (Hera Presentation Generator, dynamic version) implemented in Java as a servlet Uses RDF API HP Jena for RDF data transformations based on RDFS models (CM, AM) Can use XForms processor Uses Sesame as main content repository and application context repository; uses SeRQL/RQL as query languages Set of graphical tools for designers for CM and AM based on Visio
XML & XQuery
HTML = HyperText Markup Language
  The <b> X23 </b> new camera replaces the <b> X22 </b>. It comes equipped with a flash (worth by itself <i>53.99 $</i>) and provides great quality for only <i>359.99 $</i>.
  HTML: text + presentation. Where is the data?
  Getting from this HTML to the information system's table is hard:
    Ref   Name    Price
    X23   Camera  359.99
    R2D2  Robot   19350.00
    Z25   PC      1299.99
XML for Semistructured Data
  XML: data + structure, more flexible
  <product-table>
    <product reference="X23">
      <designation> camera </designation>
      <price unit="Dollars"> 359.99 </price>
      <description> … </description>
    </product>
    <product reference="R2D2">
      <designation> Robot </designation>
      <price unit="Dollars"> 19350 </price>
      ...
  </product-table>
  Getting from this XML to the information system's table is easy:
    Ref   Name    Price
    X23   Camera  359.99
    R2D2  Robot   19350.00
    Z25   PC      1299.99
    ...
Complex data Structure is irregular (missing/extra data) Schema does not exist or is unknown Schema is rapidly evolving Relational and ODB models are too rigid Standard is a document/hypertext language HTML Solution: semistructured data model XML data model consists of a type definition language, a query/update language and more
XML XML: eXtensible Mark-up Language W3C and most industrial companies [B2B] Main idea: separate content and presentation Use tags to represent structure and semantics HTML: a fixed set of tags complicates the identification of information elements XML allows data structures to be defined: Tags with freely chosen names No predefined tags Enables definition, transmission, validation and interpretation of data between applications (and organizations) Freely chosen attributes Ref: w3c.org
<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>
XML Documents elements and attributes elements are ordered attribute values are strings well-formed documents (e.g. proper nesting) namespaces: vocabularies for tags valid documents: DTD, Schema
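To make the notion of a well-formed document concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. It checks well-formedness only (proper nesting, quoted attributes), not validity against a DTD or Schema; the helper name `is_well_formed` and the two sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML; no DTD or Schema validation is done."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested element with one attribute.
GOOD = "<item partNum='926-AA'><productName>Baby Monitor</productName></item>"
# Same content, but the closing tags are crossed: not well-formed.
BAD = "<item partNum='926-AA'><productName>Baby Monitor</item></productName>"

item = ET.fromstring(GOOD)
part_num = item.get("partNum")             # attribute values are strings
child_tags = [child.tag for child in item]  # child elements keep document order
```

A document can be well-formed without being valid: validity additionally requires conformance to a DTD or Schema, which this parser does not check.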
DTD: a grammar
  Catalog      → Product*
  Product      → Name Price? Cat (Part Quantity)*
  Part         → BasicPart + ComposedPart
  BasicPart    → Name
  ComposedPart → Name (Part Quantity)*
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items" type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>
  ...
</xsd:schema>
  ...
  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice" type="xsd:decimal"/>
            <xsd:element ref="comment" minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
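The two simple-type restrictions in this schema (the SKU pattern and the quantity bound) can be mirrored by hand, as in the sketch below. This is not a schema validator, only an illustration of what the facets mean; the function name `valid_item` is an assumption.

```python
import re

# XSD patterns are implicitly anchored, hence fullmatch() rather than search().
SKU = re.compile(r"\d{3}-[A-Z]{2}")


def valid_item(part_num: str, quantity: int) -> bool:
    """Check the SKU pattern and the positiveInteger/maxExclusive=100 restriction."""
    sku_ok = SKU.fullmatch(part_num) is not None
    # positiveInteger means >= 1; maxExclusive value="100" means < 100.
    quantity_ok = 1 <= quantity < 100
    return sku_ok and quantity_ok
```

For instance, `valid_item("872-AA", 1)` holds for the lawnmower item of the purchase order, while a quantity of 100 or a part number such as `"87-AA"` is rejected.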
Typing XML Not really the true spirit of the Web, but essential for data management: query optimization, user interfaces, applications Differences with standard database typing Collections are sequences instead of sets Types may be very large (e.g., from integration) Data is more irregular so types should be more permissive New issue: sometimes you have the data and must extract its type (an approximate type)
<skills>
  <people>
    <person>
      <name>Bob</name>
      <know-how>XML</know-how>
    </person>
    <person>
      <name>Peter</name>
      <know-how>RDF(S)</know-how>
    </person>
  </people>
  <seminars>
    <seminar>
      <topic>XML</topic>
      <participant>
        <name>Karin</name>
        <name>Alice</name>
      </participant>
    </seminar>
  </seminars>
</skills>
//person/name[../know-how="XML"] | //seminar[topic="XML"]/participant/name
XPath
  Path expressions as in OO databases: /Students/Student/Status
  Semistructured data, missing parts: /Students//Status
  Conditions: /Students/Student[Status="U4"]
  Indexing, wildcards
  Selection, string manipulation, aggregation, attribute existence, union
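The union query over the skills document can be tried out with Python's standard `xml.etree.ElementTree`, as sketched below. ElementTree implements only a subset of XPath: it has no union operator and no `..` step inside predicates, so the first branch is rewritten as `person[know-how='XML']/name` and the two result lists are simply concatenated. The document is a copy of the skills example with every `person` element properly closed.

```python
import xml.etree.ElementTree as ET

SKILLS = """
<skills>
  <people>
    <person><name>Bob</name><know-how>XML</know-how></person>
    <person><name>Peter</name><know-how>RDF(S)</know-how></person>
  </people>
  <seminars>
    <seminar>
      <topic>XML</topic>
      <participant><name>Karin</name><name>Alice</name></participant>
    </seminar>
  </seminars>
</skills>
"""

root = ET.fromstring(SKILLS)

# Names of persons whose know-how is XML.
experts = [n.text for n in root.findall(".//person[know-how='XML']/name")]
# Names of participants of seminars on XML.
attendees = [n.text for n in root.findall(".//seminar[topic='XML']/participant/name")]
# Stand-in for the XPath union of both node sets.
xml_people = experts + attendees
```

A full XPath engine would evaluate the original expression directly; the rewrite here only works around the limits of the standard-library subset.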
XSLT XSL: Extensible Stylesheet Language (XSLT: XSL Transformations) declarative language for transforming XML documents using an XSLT processor
XQuery http://www.w3.org/XML/Query “the” standard for XML querying Goal: “data model for XML documents, a set of query operators on that data model, and a query language based on these query operators” General query language (next to XPath + XSLT) Based on XPath
XQuery Path Expressions
  In the second chapter of the document named "zoo.xml", find the figure(s) with caption "Tree Frogs".
    document("zoo.xml")/chapter[2]//figure[caption="Tree Frogs"]
  Find captions of figures that are referenced by <figref> elements in the chapter of "zoo.xml" with title "Frogs".
    document("zoo.xml")/chapter[title="Frogs"]//figref/@refid->fig/caption
XQuery Element Constructor Generate an <emp> element that has an “empid” attribute. The value of the attribute and the content of the subelements are specified by variables that are bound in other parts of the query. <emp empid={$id}> {$name} {$job} </emp>
XQuery FLWR Expression
  List each publisher and the average price of its books.
    FOR $p IN distinct(document("bib.xml")//publisher)
    LET $a := avg(document("bib.xml")//book[publisher=$p]/price)
    RETURN
      <publisher>
        <name>{$p/text()}</name>
        <avgprice>{$a}</avgprice>
      </publisher>
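For readers without an XQuery processor at hand, the same grouping logic can be sketched in Python with the standard library. This is an analogue of the FLWR expression, not XQuery itself; the contents of `BIB` are invented sample data standing in for "bib.xml", and the `results` wrapper element is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-in for bib.xml.
BIB = """<bib>
  <book><publisher>Addison-Wesley</publisher><price>60.00</price></book>
  <book><publisher>Addison-Wesley</publisher><price>40.00</price></book>
  <book><publisher>Morgan Kaufmann</publisher><price>39.50</price></book>
</bib>"""


def average_price_by_publisher(xml_text: str) -> ET.Element:
    """Build one <publisher> element per distinct publisher with its average book price."""
    root = ET.fromstring(xml_text)
    prices = defaultdict(list)
    # FOR/LET: collect the prices of all books, grouped by publisher name.
    for book in root.iter("book"):
        prices[book.findtext("publisher")].append(float(book.findtext("price")))
    # RETURN: construct the result elements.
    results = ET.Element("results")
    for name, values in prices.items():
        publisher = ET.SubElement(results, "publisher")
        ET.SubElement(publisher, "name").text = name
        ET.SubElement(publisher, "avgprice").text = f"{sum(values) / len(values):.2f}"
    return results


report = average_price_by_publisher(BIB)
```

The declarative query states what to return per publisher; the Python version has to spell out the grouping step that `distinct` and the predicate `[publisher=$p]` express implicitly.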
RDF & OWL
Web Data Integration WIS repository (back-end) typically assembled from different heterogeneous sources, e.g. databases, files, WWW To manage (coordinate) data from different sources, metadata helps to structure the data
Metadata Describing the data and its availability Sometimes provided by sources Needed by IS Engineering metadata: Meaning Validity Quality Specifying “logistics” of data
Resource Description Framework W3C standard for metadata description Describes the “meaning” of data like Web sites, parts of HTML pages, etc. Makes data “machine-understandable” – allows automated data processing Framework that allows you to make simple assertions about anything: distributed and extensible (as is the Web) “meaning” expressed via “subclass of” Ref: www.w3.org/RDF, www.w3.org/TR/rdf-primer
Basic RDF Model Recognizes 3 object types: Resources – always named by URI, e.g. web site, part of web page, others Properties – an attribute of a Resource, its characteristics Statements – Resource + Property + Property Value
Basic RDF Model Example RDF representation of the sentence: “Ora Lassila is the creator of the resource www.w3.org/Home/Lassila.” Statement: Subject (Resource) www.w3.org/Home/Lassila Predicate (Property) Creator Object (Literal) “Ora Lassila”
Basic RDF Model Example Diagram of the statement: www.w3.org/Home/Lassila --Creator--> "Ora Lassila"
RDF and XML
  RDF can be implemented using XML
  The complete XML for the previous example is:
    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:s="http://description.org/schema/">
      <rdf:Description about="www.w3.org/Home/Lassila">
        <s:Creator>Ora Lassila</s:Creator>
      </rdf:Description>
    </rdf:RDF>
Structured Value Example
  "The employee with ID 85740, Ora Lassila, with Email lassila@w3.org, is the creator of the resource www.w3.org/Home/Lassila"
  In XML it is:
    <rdf:RDF>
      <rdf:Description about="www.w3.org/Home/Lassila">
        <s:Creator>
          <rdf:Description about="www.w3.org/staffid/85740">
            <v:Name>Ora Lassila</v:Name>
            <v:Email>lassila@w3.org</v:Email>
          </rdf:Description>
        </s:Creator>
      </rdf:Description>
    </rdf:RDF>
  Diagram of the statements:
    www.w3.org/Home/Lassila --Creator--> www.w3.org/staffid/85740
    www.w3.org/staffid/85740 --Name--> "Ora Lassila"
    www.w3.org/staffid/85740 --Email--> "lassila@w3.org"
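The statement model can be made tangible by writing the structured-value example as plain (subject, predicate, object) triples and following them by hand. This is a minimal sketch, not an RDF library; the prefixed names `s:Creator`, `v:Name` and `v:Email` are used as opaque strings, and the helper names are assumptions.

```python
# Statements as (subject, predicate, object) triples, taken from the slide.
STATEMENTS = [
    ("www.w3.org/Home/Lassila", "s:Creator", "www.w3.org/staffid/85740"),
    ("www.w3.org/staffid/85740", "v:Name", "Ora Lassila"),
    ("www.w3.org/staffid/85740", "v:Email", "lassila@w3.org"),
]


def values(statements, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


def creator_emails(statements, resource):
    """Follow Creator, then Email: the object of one statement is the subject of the next."""
    emails = []
    for creator in values(statements, resource, "s:Creator"):
        emails.extend(values(statements, creator, "v:Email"))
    return emails


emails = creator_emails(STATEMENTS, "www.w3.org/Home/Lassila")
```

Because the creator is itself a resource rather than a literal, further statements can be attached to it, which is exactly what the structured value adds over the basic example.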
RDF - more It is possible to make statements about statements It is possible to refer to a collection of resources (containers) of 3 types: Bag – a property has multiple values, order has no significance Sequence – a property has multiple values, order is significant Alternative – list of literals/resources representing alternatives for a single property
RDF Query Language
  Querying RDF metadata:
    SQL/XQL style approach, viewing RDF metadata as a relational or XML database [RDF Query Specification (IBM)]
    Viewing Web descriptions by RDF metadata as a knowledge base, applying knowledge representation and reasoning techniques [W3C related]
  RQL, SeRQL (with Sesame)
Meaning: Ontologies Ontology = a vocabulary with associated meaning (“shared understanding”) Possibility to define synonyms, specializations and other relationships Use of same ontology = contract on meaning of words (tags, attributes) Often, industry or domain dependent
OWL Web Ontology Language used to explicitly represent meaning of terms in vocabularies and relationships between terms: ontology ontology engineering beyond XML and RDF(S) revision of DAML+OIL
Stack XML: surface syntax for structured documents (no semantic constraints on meaning) XML Schema: restricting structure of XML documents RDF: datamodel for objects (resources) and relationships, provides simple semantics for this datamodel RDF Schema: vocabulary for describing properties and classes of RDF resources, with semantics for generalization-hierarchies OWL: adds vocabulary for describing properties and classes, e.g. relations between classes (disjoint), cardinality (exactly one), equality, richer typing of properties, characteristics of properties (symmetry), enumerated classes
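The "semantics for generalization-hierarchies" that RDF Schema adds can be illustrated with a small hand-written inference: an instance of a class is also an instance of every superclass. The sketch below is an assumption-laden toy, not an RDFS reasoner; the class names (`Painting`, `Artwork`, `Artifact`) and the resource `nightwatch` are hypothetical.

```python
def superclasses(cls, subclass_of):
    """Return every direct or indirect superclass of cls (transitive closure)."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in found:
                found.add(parent)
                todo.append(parent)
    return found


def is_instance_of(resource, cls, types, subclass_of):
    """True if resource is typed with cls or with any subclass of cls."""
    direct = types.get(resource, set())
    return any(c == cls or cls in superclasses(c, subclass_of) for c in direct)


# Hypothetical class hierarchy and typing statements.
SUBCLASS_OF = {"Painting": {"Artwork"}, "Artwork": {"Artifact"}}
TYPES = {"nightwatch": {"Painting"}}
```

OWL goes beyond this by adding, for example, disjointness and cardinality constraints, which a simple closure like this cannot express.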
OWL Sublanguages OWL Lite: classification hierarchy and simple constraints OWL DL: maximum expressiveness while retaining computational completeness and decidability (description logics) OWL Full: maximum expressiveness and syntactic freedom of RDF with no computational guarantees
Web Services
Web Services Distributed computing model based on asynchronous messaging (XML) Support dynamic application integration over the Web XML message for exchanging data and accessing services On-the-fly software creation through the use of loosely coupled, reusable software components Software can be delivered and paid for per use, as opposed to packaged products
Principles
  XML message exchange: XML, Namespaces (URI)
  Message transport: HTTP
  Message nature: request, result of a request, errors
  [Diagram: an application client exchanging XML messages with Web services B, C and D]
Design Principles
  Wrapping services (applications)
  Web-based protocols
    Web services are based on HTTP
    Protocols can traverse firewalls and can work in a heterogeneous environment
  Interoperability
    SOAP defines a common standard that allows different systems to interoperate
  XML-based (XML Schema)
    Machine-readable documents
  [Diagram: client, XML over HTTP-SOAP, Web service]
Design Principles
  Modularity
    Service components are useful in themselves, reusable, composable
  Availability
    Services are available to systems that wish to use them
    Services must be exposed outside of the particular system they are available in (wrapping)
  Machine-readable description
    Used to identify the interface, the location and access information
  Repository
    Published, searchable repositories of service descriptions
  [Diagram: application client, XML over HTTP-SOAP, Web service application]
Related Technologies Comparison with Data Wrapping Same idea (providing a transparent interface to legacy systems) Data (data models and query languages) vs. services (procedures, functions) Comparison with other RPC Existing RPC (DCOM, RMI or CORBA) Same idea (interface, dynamic discovery, protocols) Interoperability problems DCOM: Microsoft RMI: Java Technical problems CORBA: very complex
Main Components/Actors Directory Client Service Provider Description Application Web Discovery Publication Interaction
Three Main Components/Protocols
  Publication and Discovery: UDDI; Service Description: WSDL; Messaging: SOAP; Transport: HTTP, SMTP, FTP
  UDDI (Universal Description, Discovery and Integration)
    Directory for recording and searching the descriptions of Web services
    Provides a mechanism for clients to find Web services
  WSDL (Web Services Description Language)
    XML description of a Web service
    Defines services as collections of network endpoints, or ports
    A port is defined by associating a network address with a binding (servers)
    A collection of ports defines a service
  SOAP (Simple Object Access Protocol)
    A message layout specification that defines a uniform way of passing XML-encoded data
    Usually carried over HTTP
Components and protocols (diagram): the service provider publishes its description to the directory and the client discovers it there (UDDI); the description itself is written in WSDL; client and service provider interact through SOAP messages, transported over HTTP, SMTP or FTP
Basic Usage Scenario
  Web service provider: publish Web service, run server
  Client: (manual) Web service lookup, write client application, run client
  Steps:
    1 Register WSDL file in the directory (UDDI), manually
    2 HTTP GET (client requests the service description)
    3 WSDL file (returned to the client)
    4 SOAP request
    5 SOAP response
Web Service Implementation
  Application server (Web service-enabled)
    Provides implementation of services and exposes it through WSDL/SOAP (wrapping)
    Implementation in Java, as EJB, as .NET (C#), etc.
  SOAP server: implements the SOAP protocol
  HTTP server: standard Web server
  SOAP client: implements the SOAP protocol on the client side
  [Diagram: the application requestor (SOAP client) sends a SOAP message over HTTP transport to the Web service provider (HTTP server, SOAP server, application)]
Web Service Classification
  Communication and transport services
    W3C Recommendations: SOAP, XML, Namespaces
  Technical services
    Services for message publication and exchange
    Examples: WSDL, UDDI
  Business services
    Specific for an activity area
    Defined and used by a group of companies working in the same activity area
    Examples: ebXML, RosettaNet, BizTalk
  [Diagram: layers from TCP/IP and HTTP, through SOAP and the technical services (WSDL, UDDI, BPML), up to business services and enterprise Web services]
Web Service Examples Google http://www.google.com/apis/ Free but limited access
Web Service Examples Amazon http://associates.amazon.com/exec/panama/associates/join/developer/resources.html Free but limited access Search and management of Amazon products Access products Add products Customize presentations
Web Service Examples SellerEngine (http://www.sellerengine.com/) Uses Amazon Web Services Brings Amazon data to a desktop in real time (create new listings and upload them to Amazon in seconds using an easy-to-use interface)
WS Protocols
Protocol: SOAP Simple Object Access Protocol Communication protocol via Internet between applications: data exchange and data structures Format for sending messages Platform and language independent Based on XML Messages with two types of elements Pre-defined tags Application-specific tags
Protocol: SOAP
  SOAP: Communication and Transport Services
  SOAP document
    Envelope: message type and destination
    Data type: representation of data types
    Conventions for the RPC and for sending the result or an error
    Rules for the SOAP transport on HTTP
  Transport stack: SOAP over HTTP over TCP/IP
  HTTP request layout (the message content depends on the Web service):
    POST: …
    Host: …
    Content-type: …
    Content-length: …
    <Envelope> <Body> </Body> </Envelope>
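The request layout on this slide can be reproduced with Python's standard library, as in the sketch below. It only builds the bytes of a SOAP 1.1 request and does not send anything; the host `example.org`, the path `/soap`, the operation `getPrice` and its namespace `urn:example:books` are hypothetical, and the function name `build_soap_request` is an assumption.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_request(host, path, operation, op_ns, params):
    """Return the raw bytes of an HTTP POST carrying a SOAP envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    # Pre-defined tags: Envelope and Body.
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # Application-specific tags: the operation and its parameters.
    call = ET.SubElement(body, f"{{{op_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    payload = ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
    headers = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        'SOAPAction: ""\r\n'  # expected by the SOAP 1.1 HTTP binding; left empty here
        "\r\n"
    )
    return headers.encode("ascii") + payload


request = build_soap_request(
    "example.org", "/soap", "getPrice", "urn:example:books", {"isbn": "0123456789"}
)
```

Printing `request.decode("utf-8")` shows the header lines followed by the envelope, matching the POST / Host / Content-type / Content-length / Envelope layout above.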
Protocol: WSDL Web Services Description Language XML-based language for describing Web services and how to access them Specification of the location of the service Specification of the operations (or methods) the service exposes
Protocol: WSDL
  Description of Web Services in XML format
  Abstract description of operations and their parameters (messages)
    Types: structure of messages
    Messages: used by operations
  Concrete description
    Binding to a concrete network protocol (SOAP)
    Specification of endpoints for accessing the service
    Binding: concrete protocol
    Service: collection of related ports
    Ports: binding and a network address
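To show how the abstract and concrete parts sit in one document, the sketch below parses a deliberately abbreviated WSDL 1.1 description and lists its operations and ports. The description is hypothetical (a book price service at `http://example.org/soap`) and incomplete: a real binding would also describe each operation. The function name `describe` is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "soap": "http://schemas.xmlsoap.org/wsdl/soap/",
}

# Abbreviated, hypothetical service description.
WSDL = """<definitions name="BookPrice"
    targetNamespace="urn:example:books"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="urn:example:books">
  <message name="getPriceRequest"><part name="isbn" type="xsd:string"/></message>
  <message name="getPriceResponse"><part name="price" type="xsd:float"/></message>
  <portType name="BookPricePortType">
    <operation name="getPrice">
      <input message="tns:getPriceRequest"/>
      <output message="tns:getPriceResponse"/>
    </operation>
  </portType>
  <binding name="BookPriceBinding" type="tns:BookPricePortType">
    <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
  </binding>
  <service name="BookPriceService">
    <port name="BookPricePort" binding="tns:BookPriceBinding">
      <soap:address location="http://example.org/soap"/>
    </port>
  </service>
</definitions>"""


def describe(wsdl_text):
    """Return (operation names, {port name: (binding, network address)})."""
    root = ET.fromstring(wsdl_text)
    # Abstract part: operations declared in the port types.
    operations = [op.get("name") for op in root.findall("wsdl:portType/wsdl:operation", NS)]
    # Concrete part: each port couples a binding with a network address.
    ports = {
        port.get("name"): (port.get("binding"), port.find("soap:address", NS).get("location"))
        for port in root.findall("wsdl:service/wsdl:port", NS)
    }
    return operations, ports


operations, ports = describe(WSDL)
```

A client generator would read the same two parts: the abstract part to produce method signatures, the concrete part to know where and how to send the messages.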
Protocol: UDDI Universal Description, Discovery and Integration Directory service where businesses can register and search for Web services (described in WSDL) Communication via SOAP
Protocol: UDDI UDDI Support UDDI is a cross-industry effort driven by all major platform and software providers like Dell, Fujitsu, HP, Hitachi, IBM, Intel, Microsoft, Oracle, SAP, and Sun, as well as a large community of marketplace operators, and e-business leaders Over 220 companies are members of the UDDI community Microsoft (uddi.microsoft.com) IBM (ibm.com/services/uddi) HP (uddi.hp.com) SAP (udditest.sap.com)
Example: Flight Reservation If the industry published a UDDI standard for flight rate checking and reservation, airlines could register their services into a UDDI directory Travel agencies could then search the UDDI directory to find the airline's reservation interface When the interface is found, the travel agency can communicate with the service immediately because it uses a well-defined reservation interface
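The publish-then-discover pattern of this example can be sketched with an in-memory registry. This toy does not use the UDDI API or data model; the class `Directory`, the interface name `flight-reservation`, the airline names and the URLs are all hypothetical.

```python
class Directory:
    """Toy stand-in for a service directory: maps an interface name to its providers."""

    def __init__(self):
        self._entries = {}

    def publish(self, interface, provider, wsdl_url):
        """Register a provider's service description under a shared interface name."""
        self._entries.setdefault(interface, []).append((provider, wsdl_url))

    def find(self, interface):
        """Return all (provider, wsdl_url) pairs registered for the interface."""
        return list(self._entries.get(interface, []))


directory = Directory()
# Airlines register their services under the agreed interface.
directory.publish("flight-reservation", "Airline A", "http://airline-a.example/reservation.wsdl")
directory.publish("flight-reservation", "Airline B", "http://airline-b.example/reservation.wsdl")

# A travel agency looks up every provider of that interface.
providers = directory.find("flight-reservation")
```

Because every registered airline implements the same agreed interface, the agency's client code can talk to any entry returned by the lookup without per-airline adaptation.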
Microsoft’s UDDI uddi.microsoft.com Search Results Search of a Web Service: Xmethods Search Results
Microsoft’s UDDI Description of the selected service Returns book price from Barnes and Noble online store, given ISBN
Microsoft’s UDDI WSDL Description of the Web service
Service Development Two main technologies Java (EJB server) .NET