Understanding the Flow of Content in Summarizing HTML Documents A. Rahman H. Alam R. Hartono Document Analysis and Recognition Team (DART) BCL Computers.

Presentation transcript:

Understanding the Flow of Content in Summarizing HTML Documents A. Rahman H. Alam R. Hartono Document Analysis and Recognition Team (DART) BCL Computers Inc. Santa Clara, Calif, USA

Basic Problem Statement
- How do we summarize web-based documents?
- Does the HTML structure give us any clue to understanding the content?
- Does the flow of content have anything to do with the main message?

Why Summarization?
- The display area of handheld devices, e.g. PDAs and cell phones, is too small for useful web browsing
- Download times are still too long for comfortable browsing on wireless devices
- Cost is still too high

Current need?
- Viewing websites on small-screen handheld devices
- Since websites are written in HTML, we need to translate them into formats that wireless devices can support

Current Solutions
- Handcrafting:
  - Custom websites are typically crafted by hand by a set of content experts
- Transcoding:
  - Transcoding replaces HTML tags with suitable device-specific tags (HDML, WML, etc.)
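As a rough illustration of what tag-for-tag transcoding involves, here is a minimal Python sketch. The mapping table `TAG_MAP` and the function `transcode` are hypothetical names chosen for this example, the target tags are only WML-style placeholders, and real transcoders handle far more (attributes, nesting rules, card structure). It is a sketch under those assumptions, not the method of any product named in the slides.

```python
import re

# Hypothetical mapping from HTML tag names to WML-style tag names.
# None means "drop the tag but keep the text inside it".
TAG_MAP = {
    "h1": "b",
    "h2": "b",
    "strong": "b",
    "em": "i",
    "p": "p",
    "font": None,
    "div": None,
}

# Matches an opening or closing tag and captures the slash and the tag name.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def transcode(html: str) -> str:
    """Replace each known HTML tag with its mapped tag; drop all other tags."""

    def swap(match):
        closing, name = match.group(1), match.group(2).lower()
        target = TAG_MAP.get(name)  # unknown tags also yield None
        if target is None:
            return ""
        return f"<{closing}{target}>"

    return TAG_RE.sub(swap, html)


sample = '<h1 class="x">News</h1><div><font size="2">Hello</font></div>'
print(transcode(sample))
```

Note that the output keeps every piece of text from the page: nothing is shortened or ranked, only the markup changes.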

Handcrafting
- Automation
  - Use of XML. There is no standard XML tag set (Document Type Definition, DTD) in use by vendors. XML has been available to web designers for the last 10 years, yet examination of websites shows little use of document structural elements.
  - Webmasters see themselves as artists rather than programmers.
  - XML may meet the same fate as SGML, an earlier attempt to create structured documents.

Handcrafting
- Take an existing website and make it available for wireless access. Aether Systems, Mshift and 2Roam currently offer this type of solution.
- Use a proprietary graphical interface to ease the development of wireless applications from scratch. Covigo and iConverse offer this type of solution.
- Let the user do all coding in languages such as C++ or Java. ThinAirApps offers this type of solution.

Handcrafting
- Labor intensive
- Expensive
- Typically less than 1% of a website gets converted to wireless content

Transcoding
- Transcoding was introduced in Japan during [year missing in source]. It was widely rejected by Japanese users.
- Recently, Google and Pixo introduced this solution for the US market, but have so far failed to attract the attention of end users.

The Alternate Solution
- Separate the content into smaller segments
- Generate a summary of each segment
- Prioritize the summaries from the individual segments
- Put them together to form a summary of the overall document
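The four steps above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: segments are taken to be blank-line-separated blocks of plain text, the "summary" of a segment is simply its first sentence, and priority is approximated by word count. The slides do not say which heuristics the authors used, and the function names are hypothetical.

```python
import re


def split_segments(text):
    """Step 1: separate the content into smaller segments (blank-line blocks)."""
    return [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]


def summarize_segment(segment):
    """Step 2: stand-in summary, here just the first sentence of the segment."""
    return re.split(r"(?<=[.!?])\s+", segment, maxsplit=1)[0]


def priority(segment):
    """Step 3: stand-in priority, here the number of words in the segment."""
    return len(segment.split())


def summarize_document(text, max_items=2):
    """Step 4: combine the summaries of the highest-priority segments."""
    segments = split_segments(text)
    ranked = sorted(segments, key=priority, reverse=True)[:max_items]
    return [summarize_segment(s) for s in ranked]


page = (
    "Home | About | Contact\n\n"
    "The council approved the new budget on Monday. "
    "The vote was close and followed a long debate.\n\n"
    "Weather: rain is expected tomorrow. Bring an umbrella."
)
for line in summarize_document(page):
    print(line)
```

With these stand-in heuristics the short navigation block ranks last and is left out of the two-item summary, which is the kind of effect the prioritization step is meant to achieve.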

Summarization vs. Transcoding
- Long displays
- Long download times
- Finding information is difficult
- No mapping of the importance of content in the original document

Steps to Summarization
- Structural analysis: understanding the relationship of the various segments to the document
- Decomposition: breaking these segments down into operational units
- Contextual analysis: using context to revise the segmentation
(Continued)

Steps to Summarization (Continued)
- Labeling => Segment summary: extraction of a low-level summary of the segment
- Priority: estimating the importance of these segments
- Table of Contents (TOC) => Document summary: putting together a summary of the document
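A minimal sketch of how decomposition, labeling, priority and the TOC might fit together for an HTML page, using Python's standard `html.parser` module. The set of block-level tags, the class `SegmentCollector`, the label length and the use of text length as priority are all assumptions made for this example. The contextual-analysis step is not modeled at all.

```python
from html.parser import HTMLParser

# Assumed set of tags that mark segment boundaries.
BLOCK_TAGS = {"table", "td", "div", "p", "ul", "h1", "h2", "h3"}


class SegmentCollector(HTMLParser):
    """Decomposition: close the current segment at every block-level tag."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._buffer = []

    def _flush(self):
        # Join buffered text and collapse runs of whitespace.
        text = " ".join(" ".join(self._buffer).split())
        if text:
            self.segments.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def label(segment, max_words=5):
    """Labeling: a low-level summary, here the first few words."""
    words = segment.split()
    suffix = "..." if len(words) > max_words else ""
    return " ".join(words[:max_words]) + suffix


def table_of_contents(html):
    """Priority and TOC: list segment labels, longest segment first."""
    collector = SegmentCollector()
    collector.feed(html)
    collector.close()
    ranked = sorted(collector.segments, key=len, reverse=True)
    return [label(s) for s in ranked]


doc = (
    "<div>Home About</div>"
    "<p>Officials confirmed that the bridge will reopen next week after repairs.</p>"
    "<p>Contact us</p>"
)
for entry in table_of_contents(doc):
    print(entry)
```

Each TOC entry could then link to the full text of its segment, so that a small-screen device shows the short list first and fetches a segment only on request.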

Supported Devices and Formats
- PDAs (HTML 3.2)
- Cell phones
  - USA/Europe: WAP
  - Japan: iMode (NTT DoCoMo), J-Sky (J-Phone), EZWeb (KDDI)

Conclusion
- The flow of content is a useful cue in understanding web documents
- Content can be used effectively to summarize web documents
- HTML structure is a good starting point, but is not enough to understand context
- Summarization offers significant advantages over transcoding
- Summarization also makes for a faster browsing experience