Assuming Accurate Layout Information for Web Documents is Available, What Now? Hassan Alam, Rachmat Hartono, Aman Kumar, Fuad Rahman, Yuliya Tarnikova.


Assuming Accurate Layout Information for Web Documents is Available, What Now?
Hassan Alam, Rachmat Hartono, Aman Kumar, Fuad Rahman, Yuliya Tarnikova and Che Wilcox
Human Computer Interaction Group, BCL Technologies Inc., Santa Clara, CA

Overview of the talk
- Web pages vs. document layout
- Why do we need layout information?
- Web page summarization for handheld devices
- The future: Marrying ontology with XML
- Conclusion and future work

Related Work
- Handcrafting: web pages are crafted by hand, typically by a team of content experts, for device-specific output.
- Transcoding: HTML tags are replaced with suitable device-specific tags, such as HDML, WML and others.
- Adaptive re-authoring: research on web page re-authoring either explicitly uses natural language processing (NLP) or relies on non-NLP techniques.
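Transcoding, as described above, is essentially a tag-for-tag substitution. The following is a minimal Python sketch of that idea, not the authors' system. It assumes a small hypothetical HTML-to-WML mapping table (`TAG_MAP`) and uses only the standard-library `html.parser`. Tags outside the table are dropped while their text is kept, and `<script>`/`<style>` contents are discarded. The output is not validated against the WML DTD.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical HTML -> WML tag table; unlisted tags are dropped, their text kept.
TAG_MAP = {
    "p": "p", "div": "p",
    "h1": "b", "h2": "b", "h3": "b",
    "b": "b", "strong": "b",
    "i": "i", "em": "i",
    "a": "a", "br": "br",
}
SKIP_CONTENT = {"script", "style"}  # text inside these tags is discarded


class WmlTranscoder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        target = TAG_MAP.get(tag)
        if target is None or self.skip_depth:
            return
        if target == "br":
            self.out.append("<br/>")  # WML is XML, so empty tags self-close
        elif target == "a":
            href = dict(attrs).get("href") or ""
            self.out.append(f'<a href="{escape(href)}">')
        else:
            self.out.append(f"<{target}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        target = TAG_MAP.get(tag)
        if target is None or target == "br" or self.skip_depth:
            return
        self.out.append(f"</{target}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))


def transcode(html_text):
    """Rewrite an HTML fragment as a single WML card (sketch only)."""
    parser = WmlTranscoder()
    parser.feed(html_text)
    parser.close()
    return f'<wml><card id="main">{"".join(parser.out)}</card></wml>'
```

A table-driven mapping keeps the device-specific knowledge in one place, which is the main appeal of transcoding. It also exposes its limitation: nothing in the table knows what the content means, which is the gap the re-authoring approaches try to fill.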

Web Page Summarization for Handheld Devices
[Diagram components: web page data structure; content analysis; content processing for re-authoring (verbatim, transcode, summarize); node merging; representing the complete web page; when to summarize?; creating a label; creating a summary]
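The diagram names the main decisions in the re-authoring step. As a rough illustration only, this Python sketch merges tiny adjacent nodes, decides per node whether to keep it verbatim, transcode it, or summarize it, and attaches a short label. The character thresholds, the `Block` type, and the lead-sentence "summary" are all hypothetical stand-ins. The slides give no values, and the actual system uses NLP-based summarization.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One node of a hypothetical, flattened web-page data structure."""
    text: str


VERBATIM_MAX = 80    # hypothetical threshold (characters)
TRANSCODE_MAX = 400  # hypothetical threshold (characters)


def choose_action(block):
    """When to summarize: here decided purely by text length (an assumption)."""
    n = len(block.text)
    if n <= VERBATIM_MAX:
        return "verbatim"
    if n <= TRANSCODE_MAX:
        return "transcode"
    return "summarize"


def merge_nodes(blocks, min_len=20):
    """Node merging: fold very short blocks into the preceding block."""
    merged = []
    for b in blocks:
        if merged and len(b.text) < min_len:
            merged[-1] = Block(merged[-1].text + " " + b.text)
        else:
            merged.append(Block(b.text))
    return merged


def make_label(text, max_words=4):
    """Creating a label: the first few words of the block."""
    words = text.split()
    label = " ".join(words[:max_words])
    return (label + "...") if len(words) > max_words else label


def make_summary(text):
    """Creating a summary: naive lead-sentence stand-in for a real summarizer."""
    return text.split(". ")[0].rstrip(".") + "."


def reauthor(blocks):
    """Return (action, label, content) for each merged block."""
    result = []
    for b in merge_nodes(blocks):
        action = choose_action(b)
        content = make_summary(b.text) if action == "summarize" else b.text
        result.append((action, make_label(b.text), content))
    return result
```

On a handheld display, the labels would serve as collapsed entry points and the content would be shown on demand. That is one plausible way to represent the complete web page within a small screen.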


The Future: Marrying Ontology with XML
We assume that we have layout information for a web page. What do we do then?
- How do we use this information?
- How does that information help us obtain better re-authoring solutions?
Our answer: we define an ontology for that domain, and we define an XML format to encode that information.
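To make "an XML format to encode that information" concrete, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. The element and attribute names (`page`, `block`, `role`, `x`, `y`, `width`, `height`) are hypothetical, not the authors' schema. Blocks are emitted in a simple top-to-bottom, left-to-right reading order, which is itself an assumption: real content flow is often harder to determine.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class LayoutBlock:
    block_id: str
    x: int          # left edge in pixels
    y: int          # top edge in pixels
    width: int
    height: int
    role: str       # ontology concept for the block, e.g. "Headline" (hypothetical)
    text: str


def layout_to_xml(url, blocks):
    """Encode layout blocks as XML, sorted into naive reading order."""
    page = ET.Element("page", {"url": url})
    for b in sorted(blocks, key=lambda b: (b.y, b.x)):
        el = ET.SubElement(page, "block", {
            "id": b.block_id,
            "role": b.role,
            "x": str(b.x), "y": str(b.y),
            "width": str(b.width), "height": str(b.height),
        })
        el.text = b.text
    return ET.tostring(page, encoding="unicode")
```

Keeping the geometry and the ontology concept side by side on each `block` element means later stages can reason about either one without re-parsing the original HTML.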

What is Ontology and How do We Define it?
Goal: to define an ontology for the domain of web pages.
- An ontology is a specification of a conceptualization.
- An ontology establishes a joint terminology among members of a community of interest. These members can be humans or automated agents.
An ontology consists of:
- A list of elements
- A concept hierarchy
- Concept associations
- Rules or axioms
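The four parts listed above can be held in one small data structure. This Python sketch is illustrative only and is not an OWL/RDF implementation. It stores elements, a single-parent concept hierarchy, concept associations as (concept, relation, concept) triples, and rules as functions that return violating blocks. The concept names and the caption axiom are hypothetical examples from the web-page domain, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    elements: set = field(default_factory=set)      # list of elements (concept names)
    parent: dict = field(default_factory=dict)      # concept hierarchy: child -> parent
    associations: set = field(default_factory=set)  # (concept, relation, concept) triples
    rules: list = field(default_factory=list)       # axioms: f(ontology, page) -> bad block ids

    def add_concept(self, name, parent=None):
        self.elements.add(name)
        if parent is not None:
            self.elements.add(parent)
            self.parent[name] = parent

    def is_a(self, concept, ancestor):
        # Walk up the hierarchy; reaching None means we passed the root.
        while concept is not None:
            if concept == ancestor:
                return True
            concept = self.parent.get(concept)
        return False

    def associate(self, subject, relation, obj):
        self.associations.add((subject, relation, obj))

    def check(self, page):
        violations = []
        for rule in self.rules:
            violations.extend(rule(self, page))
        return violations


def caption_describes_image(onto, page):
    """Hypothetical axiom: every Caption block must 'describe' some Image block."""
    concept_of = dict(page["blocks"])
    bad = []
    for block_id, concept in page["blocks"]:
        if onto.is_a(concept, "Caption"):
            targets = [t for s, r, t in page["links"] if s == block_id and r == "describes"]
            if not any(onto.is_a(concept_of.get(t), "Image") for t in targets):
                bad.append(block_id)
    return bad
```

Usage: build a tiny hierarchy (WebElement > Media > Image, WebElement > Text > Caption), register the association ("Caption", "describes", "Image") and the axiom, then call `check` on a labeled page to list the blocks that break the rules.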

A List of Elements in the Web Domain

Concept Hierarchy and so on…

Concept Association and so on…

Rules or Axioms and so on…

Web Page Summarization for Handheld Devices using Ontology
[Diagram components: web page data structure; content analysis; content processing for re-authoring (verbatim, transcode, summarize); node merging; representing the complete web page; when to summarize?; creating a label; creating a summary; output level decided; use ontology to re-format the web page; XML structure derived; device-specific display]
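The additional stages in this diagram (output level decided, ontology used to re-format the page, XML structure derived, device-specific display) could be wired together roughly as follows. This is a hypothetical Python sketch. The concept priorities, the screen-width breakpoints, and the `display`/`item` element names are all assumptions made for illustration. Blocks whose concept is unknown to the priority table are dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-concept priorities (lower number = more important).
PRIORITY = {"Headline": 0, "MainText": 1, "Image": 2, "NavigationBar": 3, "Advertisement": 4}


def output_level(screen_width_px):
    """Decide the output level: narrower screens keep fewer concepts (assumed breakpoints)."""
    if screen_width_px < 240:
        return 1
    if screen_width_px < 480:
        return 2
    return 4


def reformat_for_device(blocks, screen_width_px):
    """blocks: list of (ontology concept, text); returns an XML element for display."""
    level = output_level(screen_width_px)
    root = ET.Element("display", {"level": str(level)})
    for role, text in blocks:
        # Unknown concepts get a priority past every level, so they are never kept.
        if PRIORITY.get(role, len(PRIORITY)) <= level:
            item = ET.SubElement(root, "item", {"role": role})
            item.text = text
    return root
```

The point of routing the decision through ontology concepts rather than raw HTML tags is that the same rule ("drop advertisements first, keep headlines last") applies regardless of how a particular site happens to mark up its pages.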

What is the Advantage of using Ontology?
It improves the quality of the output in several ways:
- It becomes possible to capture the contextual relationships among the various components within the document.
- It leads to a better understanding of the information contained within the document.
- This additional information can be used in other processes, such as document categorization and contextual search.

Future Work
We assume that the future of mobile browsing lies in the adoption of semantic web technology. Until that happens, the proposed approach offers a workable compromise for generating high-fidelity re-authored web pages. This is an exploratory paper offering a specific pathway toward the future of web page re-authoring, provided that accurate layout information is available. Currently, no algorithm can achieve this level of accuracy; however, approximations to it are attainable and even practical. It will be interesting to discuss other possibilities in this space.

Conclusions
We presented some ideas on how to produce better web page re-authoring solutions by using linguistic knowledge and ontology, assuming that accurate layout information for web pages is available. It is shown that such an approach will produce high-quality intelligent summaries of web pages, allowing fast and efficient web browsing on small-display handheld devices.