Challenges in Web Document Summarization: Some Myths and Reality
A. Rahman, H. Alam
Document Analysis and Recognition Team (DART), BCL Computers Inc., Santa Clara, Calif., USA

Basic Problem Statement
– What are web-based documents?
– What is summarization?
– Textual summarization vs. content summarization
– What myths do we have about summarization?
– What is the reality?

Why Summarization?
– The display area of handheld devices, such as PDAs and cell phones, is too small for useful web browsing.
– Download times are still too slow for comfortable browsing on wireless devices.
– Cost is still too high.

Where is the Money?
– 1.2 billion web pages
– At 2 hours to adapt an existing page for wireless, the job takes 2.4 billion work-hours.
– Assuming $20 per hour, this effort requires an investment of around $50 billion (2.4 billion × $20 = $48 billion).
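The slide's estimate is a simple product: pages × hours per page × hourly rate. The sketch below reproduces it. The function name `adaptation_effort` is hypothetical. The inputs (1.2 billion pages, 2 hours per page, $20 per hour) are the slide's assumptions, not measured figures.

```python
def adaptation_effort(pages: int, hours_per_page: float, rate_per_hour: float) -> tuple[float, float]:
    """Return (total work-hours, total cost in dollars) for hand-adapting pages."""
    hours = pages * hours_per_page
    cost = hours * rate_per_hour
    return hours, cost


# Slide assumptions: 1.2 billion pages, 2 hours per page, $20 per hour.
hours, cost = adaptation_effort(1_200_000_000, 2, 20)
print(f"{hours / 1e9:.1f} billion work-hours, ${cost / 1e9:.0f} billion")
# prints: 2.4 billion work-hours, $48 billion
```

The exact product is $48 billion, which the slide rounds to "around $50 billion".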

Current need?
– Viewing websites on small-screen handheld devices
– Since websites are written in HTML, they must be translated into formats that wireless devices can support.

Myths
Web summarization is easy:
– No scanning
– No image processing
– No word- or character-level recognition
– HTML has structural elements
– Documents are already in electronic format

Current Solutions
– Handcrafting: custom websites are typically crafted by hand by a team of content experts.
– Transcoding: transcoding replaces HTML tags with suitable device-specific tags (HDML, WML, etc.).
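Transcoding, as described above, is essentially tag-for-tag replacement. The sketch below shows the idea using Python's standard `html.parser`.

It makes several simplifying assumptions:
- The tag mapping in `INLINE_MAP` is hypothetical and deliberately tiny.
- Unsupported tags are dropped while their text is kept.
- `<script>` and `<style>` content is discarded.
- The result is wrapped in a single WML card, and the XML prolog and DOCTYPE that a real WML deck carries are omitted.

It is not how any particular commercial transcoder works.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumed, deliberately small mapping; a real transcoder covers far more tags.
INLINE_MAP = {"b": "b", "strong": "b", "i": "i", "em": "i", "u": "u",
              "h1": "b", "h2": "b", "h3": "b"}  # headings become bold text
BREAK_TAGS = {"br", "p", "div", "li", "tr"}      # emitted as <br/>
DROP_TAGS = {"script", "style"}                  # content discarded


class WmlTranscoder(HTMLParser):
    """Replaces HTML tags with WML equivalents, keeping the text in between."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.drop_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self.drop_depth += 1
        elif self.drop_depth:
            return
        elif tag in INLINE_MAP:
            self.parts.append(f"<{INLINE_MAP[tag]}>")
        elif tag == "a":
            href = dict(attrs).get("href") or "#"
            self.parts.append(f'<a href="{escape(href)}">')
        elif tag in BREAK_TAGS:
            self.parts.append("<br/>")
        # Any other tag (table, font, img, ...) is dropped; its text survives.

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            self.drop_depth = max(0, self.drop_depth - 1)
        elif self.drop_depth:
            return
        elif tag in INLINE_MAP:
            self.parts.append(f"</{INLINE_MAP[tag]}>")
        elif tag == "a":
            self.parts.append("</a>")

    def handle_data(self, data):
        text = re.sub(r"\s+", " ", data)  # collapse whitespace runs
        if not self.drop_depth and text.strip():
            self.parts.append(escape(text, quote=False))


def html_to_wml(html: str) -> str:
    parser = WmlTranscoder()
    parser.feed(html)
    parser.close()
    return '<wml><card id="main"><p>' + "".join(parser.parts) + "</p></card></wml>"


print(html_to_wml('<h1>News</h1><p>Hello <strong>world</strong> &amp; <a href="/more">more</a></p>'))
# <wml><card id="main"><p><b>News</b><br/>Hello <b>world</b> &amp; <a href="/more">more</a></p></card></wml>
```

The output shows the limitation the slides point out. Every piece of content is still there, in the original order, so a long page becomes a long WML card.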

Handcrafting
– Take an existing website and make it available for wireless access. Aether Systems, Mshift and 2Roam currently offer this type of solution.
– Use a proprietary graphical interface to ease the development of wireless applications from scratch. Covigo and iConverse offer this type of solution.
– Let the user do all the coding in languages such as C++ or Java. ThinAirApps offers this type of solution.

Handcrafting
– Labor-intensive
– Expensive
– Typically, less than 1% of a website gets converted to wireless content.

Transcoding
– Transcoding was introduced in Japan, where it was widely rejected by users.
– Recently, Google and Pixo introduced this solution for the US market, but they have so far failed to attract the attention of end users.

The Alternate Solution
– Separate the content into smaller segments.
– Generate a summary of each segment.
– Prioritize the summaries of the individual segments.
– Put them together to form a summary of the overall document.

Summarization vs. Transcoding
Transcoding still suffers from:
– Long displays
– Long download times
– Difficulty finding information
– No mapping of the importance of content in the original document

Steps to Summarization
Segmentation: build a tree
Problems:
– Tables
– Frames
– JavaScript
– Graphics
– Other artifacts
– Over-segmentation
– Under-segmentation
– Poor coding
– Browsers are too good! (they render poorly coded pages without complaint)
(Figure: example segmentation tree with nodes CContent, CTable, CRow, CCol, ...)
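The segmentation tree on the slide can be sketched as a parser that nests table, row and cell segments under a content root. The node kinds (CContent, CTable, CRow, CCol) come from the slide's figure. The `Segment` class, the `NODE_KIND` mapping from HTML tags, and the helper names are hypothetical, since the slides do not describe how DART represents them.

The builder also illustrates the "poor coding" problem. Like a browser, it implicitly closes an unclosed row or cell when a new one starts, so markup with missing `</td>` tags still produces a sensible tree.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Node kinds from the slide's tree; mapping HTML tags onto them is assumed.
NODE_KIND = {"table": "CTable", "tr": "CRow", "td": "CCol", "th": "CCol"}


@dataclass
class Segment:
    kind: str
    text: str = ""
    children: list["Segment"] = field(default_factory=list)


class SegmentTreeBuilder(HTMLParser):
    """Builds a CContent > CTable > CRow > CCol tree from table markup."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Segment("CContent")
        self.stack = [self.root]

    def _close(self, kind, stop_at=None):
        # Pop up to and including the nearest open node of `kind`,
        # but never past an enclosing node of kind `stop_at`.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].kind == kind:
                del self.stack[i:]
                return
            if self.stack[i].kind == stop_at:
                return

    def handle_starttag(self, tag, attrs):
        kind = NODE_KIND.get(tag)
        if kind is None:
            return
        if kind != "CTable":
            # Like a browser, implicitly close an unclosed cell/row in this table.
            self._close(kind, stop_at="CTable")
        node = Segment(kind)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        kind = NODE_KIND.get(tag)
        if kind is not None:
            self._close(kind, stop_at=None if kind == "CTable" else "CTable")

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            node = self.stack[-1]
            node.text = f"{node.text} {text}".strip()


def build_tree(html: str) -> Segment:
    builder = SegmentTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def dump(node: Segment, depth: int = 0) -> None:
    print("  " * depth + (f"{node.kind}: {node.text}" if node.text else node.kind))
    for child in node.children:
        dump(child, depth + 1)


# Missing </td> tags, as in much real-world markup.
dump(build_tree("<p>Top story</p><table><tr><td>Nav<td>Story text</tr>"
                "<tr><td>Ad</table>"))
# CContent: Top story
#   CTable
#     CRow
#       CCol: Nav
#       CCol: Story text
#     CRow
#       CCol: Ad
```

Note that table structure does not guarantee a meaningful segment. A layout table may split one story across cells (over-segmentation) or pack several stories into one cell (under-segmentation), which is why the slide lists both as problems.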

Steps to Summarization
Labeling:
– Main Story
– Links
– Navigation Bars
– Advertisement Bars
– Other Stories
– Forms
– Images
Visual cues: font size, headlines, boldness, color, links, flashing, italics, emphasis, underlining
Problems: graphics (text in images requires OCR), JavaScript, CSS
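Labeling can be sketched as rules over features measured for each segment. The labels below are the slide's own. The features (`SegmentFeatures`), the 0.8 link-density threshold, the 30- and 150-word cutoffs, and the rule order are all assumptions for illustration, not the authors' classifier.

```python
from dataclasses import dataclass


@dataclass
class SegmentFeatures:
    """Hypothetical cues measured for one segment."""
    words: int                   # words of visible text
    link_words: int              # words that sit inside links
    has_headline: bool = False   # large or bold heading text (font size, boldness)
    has_form: bool = False
    image_count: int = 0
    ad_keywords: bool = False    # e.g. "sponsored" in text or ad-server URLs


def label_segment(f: SegmentFeatures) -> str:
    """Assign one of the slide's labels using simple, assumed rules."""
    if f.has_form:
        return "Forms"
    if f.ad_keywords:
        return "Advertisement Bars"
    if f.words == 0:
        # No text at all: an image block if it has images, else a minor segment.
        return "Images" if f.image_count else "Other Stories"
    link_ratio = f.link_words / f.words
    if link_ratio > 0.8:
        # Short, link-dense blocks look like menus; longer ones like link lists.
        return "Navigation Bars" if f.words < 30 else "Links"
    if f.has_headline and f.words >= 150:
        return "Main Story"
    return "Other Stories"


print(label_segment(SegmentFeatures(words=12, link_words=12)))  # Navigation Bars
```

The cues that need rendering or execution (text drawn inside images, JavaScript-generated content, CSS-driven font sizes) are exactly what such static features miss. That gap is why the slide lists graphics, JavaScript and CSS as problems.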

Steps to Summarization
– Labeling => Segment Summary: extracting a low-level summary of each segment
– Priority: estimating the importance of these segments
– Table of Contents (TOC) => Document Summary: putting together a summary of the document
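The last three steps can be sketched as a small pipeline: summarize each labeled segment, score its priority, and keep the top-scoring summaries as a table of contents.

The sketch rests on several assumptions that the slides do not specify:
- The segment summary is simply the segment's first sentence.
- Priority is a per-label weight (`LABEL_WEIGHT`, values made up here) discounted by position on the page.
- Each TOC entry is tagged with a hypothetical segment id so that a device could link back to the full segment.

```python
import re
from dataclasses import dataclass

# Assumed importance weights per label; the slides do not give any numbers.
LABEL_WEIGHT = {"Main Story": 3.0, "Other Stories": 2.0, "Links": 1.0,
                "Forms": 1.0, "Navigation Bars": 0.5, "Images": 0.5,
                "Advertisement Bars": 0.0}


@dataclass
class LabeledSegment:
    seg_id: str
    label: str
    text: str
    position: int  # 0 = first segment in reading order


def segment_summary(text: str, max_words: int = 12) -> str:
    """Low-level summary: the first sentence, cut to max_words words."""
    first = re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]
    words = first.split()
    return " ".join(words[:max_words]) + (" ..." if len(words) > max_words else "")


def priority(seg: LabeledSegment) -> float:
    """Label weight, discounted for segments further down the page."""
    return LABEL_WEIGHT.get(seg.label, 1.0) / (1 + 0.1 * seg.position)


def build_toc(segments: list[LabeledSegment], limit: int = 5) -> list[str]:
    """Document summary: the highest-priority segment summaries, in order."""
    ranked = sorted(segments, key=priority, reverse=True)
    kept = [s for s in ranked if priority(s) > 0][:limit]
    return [f"[{s.seg_id}] {segment_summary(s.text)}" for s in kept]


segments = [
    LabeledSegment("nav", "Navigation Bars", "Home News Sports", 0),
    LabeledSegment("s1", "Main Story", "Storm hits the coast. Thousands evacuated.", 1),
    LabeledSegment("ad", "Advertisement Bars", "Buy now!", 2),
    LabeledSegment("s2", "Other Stories", "Markets rally on rate news. Details inside.", 3),
]
for entry in build_toc(segments):
    print(entry)
# [s1] Storm hits the coast.
# [s2] Markets rally on rate news.
# [nav] Home News Sports
```

Unlike transcoding, the output maps importance. The main story comes first, and the advertisement is dropped.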

Conclusion
– Content can be used effectively to summarize web documents.
– Content summarization is more complex than textual summarization.
– HTML structure is a good starting point, but it is not enough to understand context.
– Summarization offers significant advantages over transcoding.
– Summarization also provides a faster browsing experience.
– There is a lot of money in this!