Assuming Accurate Layout Information for Web Documents is Available, What Now? Hassan Alam, Rachmat Hartono, Aman Kumar, Fuad Rahman, Yuliya Tarnikova.

1 Assuming Accurate Layout Information for Web Documents is Available, What Now? Hassan Alam, Rachmat Hartono, Aman Kumar, Fuad Rahman, Yuliya Tarnikova and Che Wilcox. Human Computer Interaction Group, BCL Technologies Inc., Santa Clara, CA 95050. www.bcltechnologies.com, fuad@bcltechnologies.com

2 Overview of the talk: Web pages vs. document layout; Why do we need layout information?; Web page summarization for handheld devices; The future: marrying ontology with XML; Conclusion and future work.

3 Related Work: Handcrafting, Transcoding, Adaptive Re-authoring. Handcrafting typically involves a set of content experts crafting web pages by hand for device-specific output. Transcoding replaces HTML tags with suitable device-specific tags, such as HDML, WML and others. Research on web page re-authoring either uses natural language processing explicitly or relies on non-NLP techniques.
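Transcoding, as described on this slide, amounts to swapping tags. The following is a minimal sketch of that idea, not the authors' implementation. The `TAG_MAP` table and the `transcode` function are hypothetical names, and the mapping itself is illustrative rather than a complete or validated HTML-to-WML conversion.

```python
import re

# Hypothetical, illustrative mapping from HTML tag names to device-oriented
# tag names. A real transcoder would need a far more complete table.
TAG_MAP = {"body": "card", "h1": "b", "strong": "b", "em": "i"}

# Matches an opening or closing tag: optional slash, tag name, then attributes.
_TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")

def transcode(html: str) -> str:
    """Replace mapped tags and leave unmapped tags untouched."""
    def swap(match: re.Match) -> str:
        closing, name, _attrs = match.groups()
        target = TAG_MAP.get(name.lower())
        if target is None:
            return match.group(0)  # no mapping known: keep the tag as-is
        # Attributes are dropped in this sketch, since the target tag
        # set may not support them.
        return f"<{closing}{target}>"
    return _TAG_RE.sub(swap, html)

print(transcode("<body><h1>News</h1><p>Hello</p></body>"))
```

A regular expression is enough for a sketch like this, but it does not handle comments, scripts, or malformed markup; a production transcoder would more likely work on a parsed tree.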

4 Web Page Summarization for Handheld Devices: Web Page Data Structure; Content Analysis; Content Processing for Re-authoring; Verbatim; Transcode; Summarize; Node Merging; Representing the Complete Web Page; When to Summarize?; Creating a Label; Creating a Summary.
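The slide names three ways of processing a content node (verbatim, transcode, summarize) and asks when to summarize, but the transcript does not state the criteria. The sketch below therefore assumes a simple character budget as the deciding factor; `choose_mode`, `make_label`, the `budget` parameter and its default of 160 are all hypothetical and only illustrate the shape such a decision could take.

```python
def choose_mode(text: str, has_rich_markup: bool, budget: int = 160) -> str:
    """Pick a processing mode for one content node.

    Assumption (not from the talk): a node that fits within `budget`
    characters is shown in full, otherwise it is summarized.
    """
    if len(text) <= budget:
        # Short content is kept; markup is converted only if present.
        return "transcode" if has_rich_markup else "verbatim"
    return "summarize"

def make_label(text: str, max_words: int = 5) -> str:
    """Create a short label from the first few words of a node's text."""
    words = text.split()
    label = " ".join(words[:max_words])
    return label + ("..." if len(words) > max_words else "")

print(choose_mode("Breaking news", has_rich_markup=False))
print(make_label("Stock markets close higher after a volatile week"))
```

Taking the leading words is the simplest possible labelling strategy; a linguistically informed label would need the content analysis stage that precedes this step.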

5 Web Page Summarization for Handheld Devices

6 The Future: Marrying Ontology with XML. We assume that we have layout information for a web page. What do we do then? How do we use this information? How does that information help us get better re-authoring solutions? We define an ontology for that domain, and we define an XML representation to encode that information.
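To make the idea of coding layout information in XML concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`page`, `block`, `role`, `x`, `y`) and the `layout_to_xml` function are assumptions made for illustration; the talk does not specify its XML vocabulary.

```python
import xml.etree.ElementTree as ET

def layout_to_xml(blocks: list) -> str:
    """Serialize layout blocks into a hypothetical XML vocabulary.

    Each block is a dict with the keys "role", "x", "y" and "text".
    """
    root = ET.Element("page")
    for block in blocks:
        element = ET.SubElement(
            root,
            "block",
            {"role": block["role"], "x": str(block["x"]), "y": str(block["y"])},
        )
        element.text = block["text"]
    return ET.tostring(root, encoding="unicode")

# Example layout for an imaginary news page.
blocks = [
    {"role": "headline", "x": 0, "y": 0, "text": "Top story"},
    {"role": "navigation", "x": 0, "y": 40, "text": "Home | World | Sport"},
]
print(layout_to_xml(blocks))
```

Keeping the semantic role as an attribute separate from the coordinates lets a later re-authoring step reason about what a block is without depending on where it was drawn.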

7 What is Ontology and How do We Define it? An ontology is a specification of a conceptualization. It establishes a joint terminology between members of a community of interest; these members can be human or automated agents. To define an ontology for the domain of web pages, we need: a list of elements; a concept hierarchy; concept associations; rules or axioms.
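The four ingredients listed on this slide can be held in one small data structure. This is a sketch under stated assumptions: the `Ontology` class, its methods, and the example concepts (`page_element`, `text_block`, `headline`, `image`, `caption`) are hypothetical and are not taken from the talk's actual ontology.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

@dataclass
class Ontology:
    elements: Set[str] = field(default_factory=set)          # list of elements
    parent: Dict[str, str] = field(default_factory=dict)     # concept hierarchy
    associations: Set[Tuple[str, str, str]] = field(default_factory=set)
    rules: List[Callable[["Ontology"], bool]] = field(default_factory=list)

    def add_concept(self, name: str, parent: Optional[str] = None) -> None:
        """Register a concept, optionally placing it under a parent."""
        self.elements.add(name)
        if parent is not None:
            self.elements.add(parent)
            self.parent[name] = parent

    def associate(self, subject: str, relation: str, obj: str) -> None:
        """Record a named association between two concepts."""
        self.associations.add((subject, relation, obj))

    def is_a(self, name: str, ancestor: str) -> bool:
        """Walk up the hierarchy (assumed acyclic) looking for `ancestor`."""
        current: Optional[str] = name
        while current is not None:
            if current == ancestor:
                return True
            current = self.parent.get(current)
        return False

    def check_rules(self) -> bool:
        """Return True when every rule holds for this ontology."""
        return all(rule(self) for rule in self.rules)

# Hypothetical web-domain concepts, for illustration only.
web = Ontology()
web.add_concept("page_element")
web.add_concept("text_block", parent="page_element")
web.add_concept("headline", parent="text_block")
web.add_concept("caption", parent="text_block")
web.add_concept("image", parent="page_element")
web.associate("caption", "describes", "image")
# Example axiom: only text blocks may describe something.
web.rules.append(
    lambda o: all(o.is_a(s, "text_block")
                  for s, rel, _ in o.associations if rel == "describes")
)
print(web.is_a("headline", "page_element"), web.check_rules())
```

Rules are modelled here as plain predicates over the whole ontology, which keeps the sketch short; a dedicated ontology language would express axioms declaratively instead.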

8 A List of Elements in the Web Domain

9 Concept Hierarchy and so on…

10 Concept Association and so on…

11 Rules or Axioms and so on…

12 Web Page Summarization for Handheld Devices using Ontology: Web Page Data Structure; Content Analysis; Content Processing for Re-authoring; Verbatim; Transcode; Summarize; Node Merging; Representing the Complete Web Page; When to Summarize?; Creating a Label; Creating a Summary; Output Level Decided; Use Ontology to Re-format the Web Page; XML Structure Derived; Device-Specific Display.

13 What is the Advantage of using Ontology? It improves the quality of the output in many ways. It becomes possible to capture the contextual relationship among various components within the document. It leads to better understanding of the information contained within the document. This additional information can be used in other processes, such as document categorization and contextual search.

14 Future Work. It is assumed that the future of mobile browsing lies in the adoption of semantic web technology. Until that is realized, the proposed approach offers a workable compromise for generating high-fidelity re-authored web pages. This is an exploratory paper offering a specific pathway to the future of web page re-authoring, provided accurate layout information is available. Currently, it is beyond the capability of any algorithm to achieve this level of accuracy. However, approximations to that accuracy are attainable and even practical. It will be interesting to discuss other possibilities in this space.

15 Conclusions. This talk presented some ideas on how to produce better web page re-authoring solutions by using linguistic knowledge and ontology, assuming accurate layout information for web pages is available. It is shown that such an approach will produce high-quality, intelligent summaries of web pages, allowing fast and efficient web browsing on small-display handheld devices.

