1 Semantic Annotation for Web Content Adaptation Unit 14 of Spinning the Semantic Web
2 Introduction Web content needs to be adapted for transparent access from a variety of client agents (cellular phones, PDAs) –A large, full-color image may be reduced in size and color depth, and unimportant portions of the content removed, when accessed by certain devices Better presentation and faster delivery to client devices Transcoding: transformation of information from one form to another –Web content transcoding –Crucial for universal Web access under varying conditions that may depend on client capabilities, network connectivity, or user preferences
3 Composite Capability/Preference Profiles (CC/PP)
4 Introduction CC/PP stands for Composite Capability/Preference Profiles, a system for expressing device capabilities and user preferences. The goal of the CC/PP framework is to specify how client devices express their capabilities and preferences (the user agent profile) to the server that originates content (the origin server). The origin server uses the user agent profile to produce and deliver content appropriate to the client device. In addition to computer-based client devices, particular attention is being paid to other kinds of devices such as mobile phones.
5 Devices The Web is accessed by various devices: –PC, notebook, PDA, mobile phone… Each has different capabilities –Hardware: screen size/color, audio, bandwidth… –Software: MPEG, MP3, 3GPP, AMR…
6 CC/PP & RDF The CC/PP framework starts with RDF and then overlays a CC/PP-defined set of semantics that describe profiles. A CC/PP profile is an RDF-based collection of information about the capabilities of the hardware platform and system software, and about the preferences of the user.
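A minimal sketch of what reading such an RDF-based profile could look like. The `ex:` namespace, the property names (`screenWidth`, `colorCapable`, `audioFormat`), and the helper `read_profile` are hypothetical and chosen only for illustration; a real CC/PP profile would use the CC/PP vocabulary and a published device schema instead.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/profile#"  # hypothetical vocabulary, not a CC/PP schema

# A profile with two components (hardware and software), written as RDF/XML.
PROFILE_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="http://example.org/profile#MyPhone">
    <ex:component>
      <rdf:Description rdf:about="http://example.org/profile#Hardware">
        <ex:screenWidth>176</ex:screenWidth>
        <ex:colorCapable>No</ex:colorCapable>
      </rdf:Description>
    </ex:component>
    <ex:component>
      <rdf:Description rdf:about="http://example.org/profile#Software">
        <ex:audioFormat>AMR</ex:audioFormat>
      </rdf:Description>
    </ex:component>
  </rdf:Description>
</rdf:RDF>
"""

def read_profile(xml_text):
    """Return {component name: {attribute: value}} for the profile above."""
    root = ET.fromstring(xml_text)
    profile = {}
    for comp in root.iter(f"{{{EX_NS}}}component"):
        desc = comp.find(f"{{{RDF_NS}}}Description")
        # Use the fragment of the rdf:about URI as the component name.
        name = desc.get(f"{{{RDF_NS}}}about").split("#")[-1]
        # Strip the namespace from each property tag to get a plain key.
        profile[name] = {child.tag.split("}")[-1]: child.text for child in desc}
    return profile

profile = read_profile(PROFILE_XML)
print(profile["Hardware"]["screenWidth"])  # prints 176
```

The nested dictionary mirrors the two-level structure the slide describes: a profile made of components, each holding attribute values.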
7 Advantages of CC/PP By only sending required content, no time or bandwidth is wasted sending unwanted content. This can also lead to faster page loading times. A server can provide information to a more diverse range of browsers. This can be beneficial not only in economic terms, but also in terms of site accessibility. You give the users what they want, not what you think they want. So many…
8 Deployment (Client & Server Proxies)
9 Deployment (Server Proxy only)
10 Deployment (Client Proxy only)
11 Deployment (Ideal Approach)
12 CC/PP Query
13 Content adaptation
14 Two ways to use CC/PP profiles Selection If the web server has a set of pre-written web pages, suitable for a number of different devices, then the profile can be used to decide which of these pre-written pages is most suitable for the web browser. Transformation Web page content can be kept in a neutral format (e.g. XML). This can then be transformed into an appropriate format, using the profile to decide what that format is.
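The two approaches can be sketched as follows. This is only an illustration under stated assumptions: the file names, the width thresholds, the neutral content, and the two templates (HTML and WML) are all hypothetical, and a real server would take the screen width and preferred markup from the client's CC/PP profile.

```python
from string import Template

# Selection: pre-written pages keyed by the minimum screen width they need.
PREWRITTEN = {
    176: "news_phone.html",
    320: "news_pda.html",
    1024: "news_desktop.html",
}

def select_page(screen_width):
    """Pick the richest pre-written page that still fits the device screen."""
    fitting = [w for w in PREWRITTEN if w <= screen_width]
    # Fall back to the smallest variant when nothing fits.
    return PREWRITTEN[max(fitting)] if fitting else PREWRITTEN[min(PREWRITTEN)]

# Transformation: neutral content rendered through a per-format template.
NEUTRAL = {"title": "Quarterly results", "body": "Revenue rose."}
TEMPLATES = {
    "html": Template("<h1>$title</h1><p>$body</p>"),
    "wml": Template('<card title="$title"><p>$body</p></card>'),
}

def transform(content, markup):
    """Render the neutral content in the markup the profile asks for."""
    return TEMPLATES[markup].substitute(content)

print(select_page(240))           # prints news_phone.html
print(transform(NEUTRAL, "html")) # prints <h1>Quarterly results</h1><p>Revenue rose.</p>
```

Selection trades storage and authoring effort for simplicity at request time; transformation keeps one source but needs a rule set per target format.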
15 CC/PP Implementations DICE Hewlett Packard DELI Intel Inria Keio University - Portal UMBC JIGSAW X-Smiles Browser So many…
16 Demonstrations An example of an RDF file and graph A demo page presenting the functionality of the CC/PP protocol
17 Reference ojects/pda_doc_layout/seminar-html/
18 External Annotation Framework
19 Annotation Schemes Inline annotation: embed annotations in a Web document –Created as extra attributes of document elements (HTML browsers ignore unknown attributes in an HTML document) –Ease of annotation maintenance, eliminating the bookkeeping task of associating annotations with their target documents –Require annotators to have document ownership External annotation: separate annotation from the original document –Raise no issues related to document ownership –Facilitate the sharing and reuse of annotations across documents –Avoid the mixing of contents and metadata
20 Applications of Web Content Annotation Discovery –Accurate searches of Web resources Qualification –Descriptions of users’ preferences regarding privacy Adaptation – the focus of this unit
21 Overview of an Annotation-Based Transcoding
22 External Annotation Files Contain metadata that address a part of a document to be annotated –XPath and XPointer are used to associate annotated portions of a document with annotating descriptions A reference may point to a single element or a range of elements If a target element has an ID attribute, the attribute can be used for direct addressing without the need for a long path expression Use RDF as the fundamental syntax of annotation files –User preferences and device capability: Composite Capability/Preference Profiles (CC/PP) –Document profiles (
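The difference between positional and ID-based addressing can be illustrated with a short sketch. The page markup and the `id` value are made up for the example, and Python's `xml.etree.ElementTree` supports only a subset of XPath (and no XPointer ranges), so this shows the addressing idea rather than a full annotation resolver.

```python
import xml.etree.ElementTree as ET

# A well-formed page with three tables; only the first carries an id.
PAGE = """<html><body>
  <table id="header"><tr><td>Menu</td></tr></table>
  <table><tr><td>Search</td></tr></table>
  <table><tr><td>News</td></tr></table>
</body></html>"""

root = ET.fromstring(PAGE)

# Positional path: breaks if a table is inserted before the target.
by_path = root.find("./body/table[3]")

# ID-based addressing: short, and stable under layout changes.
by_id = root.find(".//*[@id='header']")

print(by_path.find("tr/td").text)  # prints News
print(by_id.find("tr/td").text)    # prints Menu
```

The positional expression depends on the document structure staying fixed, which is one reason keeping an annotation consistent with its target document is a concern.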
23 Framework of External Annotation
24 Association How to select an annotation file for a Web document –Implicitly, by means of a structural analysis of the subject document –Explicitly, by means of a tag An annotation file can be associated with a single document file, but the relation is not limited to one-to-one –Many annotation files for one Web document –One annotation file for multiple Web documents Useful when it is necessary to annotate common parts of Web documents, such as page headers, company logo images, and sidebar menus
25 Annotation-Based Transcoding System
26 Overview Content can be adapted on a content server, a proxy, or a client terminal –An adaptation engine should not be forced to reside in any particular location Use a proxy-based approach for content adaptation
27 Transcoding Architecture Intermediary –Computational entities that reside along the Web transaction path –Facilitate an approach to making ordinary information streams into smart streams that enhance the quality of communication An intermediary processor or a transcoding proxy can operate on a document to be delivered and transform the contents with reference to associated annotation files
28 Authoring-Time Transcoding Requirements for authoring-time transcoding –WYSIWYG editor –Let the annotator navigate from an existing annotation to a portion of an annotated document designated by XPath / XPointer –Verify the results of content adaptation through a previewer Authoring-time transcoding is crucial when annotations are employed for content adaptation, rather than discovery or qualification of contents –Content adaptation often changes the structure of original documents as the result of transcoding
29 Authoring-Time Transcoding
30 WYSIWYG Annotation Tool
31 HTML Page Splitting for Small-Screen Devices
32 Annotation Vocabulary An annotation vocabulary for HTML page splitting needs to be specified to constrain the possibilities for decomposition, combination, and partial replacement of contents Annotation of Web Content for Transcoding Alternatives –Provide alternative representations of a document or any set of its elements –e.g., a color image or a grayscale image –A transcoding proxy selects the alternative that best suits the capabilities of the requesting client device Elements in the annotated document can then be altered either by replacement or by on-demand conversion
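How a proxy might pick among alternatives can be sketched as below. The `Alternative` record, its fields, the file names, and the ranking rule (prefer color, then the widest image that fits) are assumptions made for illustration; the source does not specify the selection algorithm.

```python
from dataclasses import dataclass

@dataclass
class Alternative:
    src: str      # where the representation lives (hypothetical file name)
    color: bool   # True for a color image, False for grayscale
    width: int    # pixel width of this representation

def choose_alternative(alternatives, device_color, device_width):
    """Return the richest alternative the device can display, or None."""
    usable = [
        a for a in alternatives
        if a.width <= device_width and (device_color or not a.color)
    ]
    if not usable:
        return None
    # Prefer color over grayscale, then the widest image that still fits.
    return max(usable, key=lambda a: (a.color, a.width))

ALTS = [
    Alternative("logo_color.png", True, 600),
    Alternative("logo_gray.png", False, 600),
    Alternative("logo_gray_small.png", False, 160),
]

print(choose_alternative(ALTS, device_color=False, device_width=176).src)
# prints logo_gray_small.png
```

Returning `None` leaves room for the other option the slide mentions, on-demand conversion, when no stored alternative is suitable.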
33 Annotation Vocabulary (Cont.) Splitting Hints –An HTML file that can be shown as a single page on a normal desktop PC may be divided into multiple pages on clients with smaller display screens –pcd:Group: specifies a set of elements to be considered as a logical unit and provides hints for determining appropriate page break points Selection Criteria –Help a transcoding module select, from alternatives, the one that best suits the client device –pcd:role: value attribute (proper content, side menu, decoration…) –pcd:importance: priority (content of low importance may be ignored or displayed in a smaller font)
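A rough sketch of how group, role, and importance hints could drive page splitting. The `Group` record, the numeric importance scale, the size estimate in display lines, and the packing rule are assumptions for illustration only; the role names are the ones listed above.

```python
from dataclasses import dataclass

@dataclass
class Group:
    name: str          # label for the logical unit (a pcd:Group)
    role: str          # e.g. "proper content", "side menu", "decoration"
    importance: float  # assumed scale: higher means more important
    size: int          # estimated height in display lines (assumption)

def split_page(groups, lines_per_page, min_importance=0.0):
    """Drop unimportant groups, then pack the rest into pages.

    Page breaks fall only between groups, never inside one.
    """
    kept = [g for g in groups if g.importance >= min_importance]
    # Proper content first, then the remaining groups by importance.
    kept.sort(key=lambda g: (g.role != "proper content", -g.importance))
    pages, current, used = [], [], 0
    for g in kept:
        if current and used + g.size > lines_per_page:
            pages.append(current)
            current, used = [], 0
        current.append(g.name)
        used += g.size
    if current:
        pages.append(current)
    return pages

GROUPS = [
    Group("header menu", "side menu", 0.0, 6),
    Group("search form", "decoration", -0.5, 3),
    Group("news body", "proper content", 1.0, 20),
    Group("banner", "decoration", -1.0, 4),
]

print(split_page(GROUPS, lines_per_page=24, min_importance=-0.5))
# prints [['news body'], ['header menu', 'search form']]
```

Because a group is treated as an indivisible unit, a group larger than one page still gets a page of its own rather than being cut in the middle.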
34 Annotation Descriptions
35 Adaptation Engine Runs on an intermediary server called WBI Flow chart –Upon receipt of the request from a client browser, an original page is retrieved for the first time from a content server. –The editor component of the plugin tries to find the locations of annotation files: If a location is specified in a link element in the HTML header section, retrieve the designated annotation file. Otherwise, look up a table that maps the URL of the original page to that of an annotation. If an entry is found, retrieve the designated annotation file. Otherwise, the original page is returned as it is and the session is terminated.
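The lookup order described above can be sketched as follows. This is not the WBI plugin's code: the `rel` value, the function and class names, and the example URLs are hypothetical, and only the order of the checks (link element first, then the mapping table, then give up) comes from the slide.

```python
from html.parser import HTMLParser

ANNOTATION_REL = "annotation"  # hypothetical rel value; the source names none

class AnnotationLinkFinder(HTMLParser):
    """Remember the href of the first <link> that points at an annotation."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == ANNOTATION_REL and self.href is None:
            self.href = attr.get("href")

def locate_annotation(page_url, html_text, url_table):
    """Return the annotation URL, or None if the page should pass through."""
    finder = AnnotationLinkFinder()
    finder.feed(html_text)
    if finder.href:                 # 1. explicit link element in the page
        return finder.href
    return url_table.get(page_url)  # 2. mapping table, else None

PAGE = ('<html><head><link rel="annotation" '
        'href="http://example.org/news.rdf"></head><body></body></html>')
TABLE = {"http://example.org/plain.html": "http://example.org/plain.rdf"}

print(locate_annotation("http://example.org/news.html", PAGE, TABLE))
# prints http://example.org/news.rdf
```

A `None` result corresponds to the last branch of the flow chart: the original page is returned unchanged.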
36 Adaptation Engine (Cont.) Flow Chart (Cont.) –The generator component of the plugin generates the current page to be returned. Taking account of client capabilities included in an HTTP request header, the generator extracts a portion of the document object tree and returns the sub-tree to the client
37 Adaptation Engine – System Flow
38 Application to Real-Life HTML Pages The Web page used as an example is a news page from a corporate Web site The news page consists of three tables stacked from top to bottom. –The top and middle tables correspond respectively to a header menu and a search form. –The bottom table is used for layout.
39 Layout of A Real-Life News Page
40 Annotations for Splitting the News Page
41 Annotation for fragmentation of an actual news page
42 Screen copy of a small display preview on an authoring tool
43 Comparison of display contents on a small-screen device
44 Splitting Result The page splitting not only reduces the content to be delivered, but also places the primary content near the top of the fragmented page, which is provided with navigational features –Placing navigational features (menu bars etc.) near the top of pages –Placing key information at the top of pages –Reducing the amount of information on the page Page fragmentation based on semantic annotation will be more appropriate than page transformation based solely on syntactic information (removing white space, shrinking or removing images…) –The inability to rearrange content semantically is one of the critical limitations of the syntactic transformation approach. –The navigational features achieved by this semantic annotation are noteworthy from the perspective of Web content accessibility.
45 Issues Consistency between an Original Document and Its Annotation –Necessary to provide a way of keeping them synchronized Extensibility –Custom-tailored transcoding module that runs without any external meta-information. –Using a general-purpose transformation engine, such as XSLT, which employs externally provided transformation rules –Task-specific semantics: roles such as header, auxiliary, and layouter supplement semantics that cannot be fully prescribed in the definitions of Web documents
46 Comparison of transcoding approaches in terms of extensibility