Using HTML Textual and Structural Data for Web Image Search Cheng Thao, Ethan Munson, Jim Dabrowski, Nikolas D. Bohne University of Wisconsin-Milwaukee.



Which image is of George Bush, or contains George Bush?

Which images are similar to this image?

<IMG SRC=" alt="" BORDER=0 VSPACE=3 HSPACE=3> Bill Cosby
<IMG SRC=" alt="" BORDER=0 VSPACE=3 HSPACE=3> Betty White
<IMG SRC=" alt="" BORDER=0 VSPACE=3 HSPACE=3> Tom Brokaw
<IMG SRC=" alt="" BORDER=0 VSPACE=3 HSPACE=3> Pres. George Bush
<IMG SRC=" alt="" BORDER=0 VSPACE=3 HSPACE=3> Ed McMahon
<IMG SRC=" alt="" BORDER=0 VSPACE=3 HSPACE=3> Bob Barker
Does the HTML source tell which image is George Bush?

Introduction
- Image search is difficult
  - performance is slow
  - image identification is a complex, inaccurate task
- Most research on image search has emphasized analysis of image content
- Few Web image search engines
  - commercial: AltaVista, Google
  - research: WebSeek
- Little research in textual image search

HTML overview
- HTML document composed of:
  - head
    - title
    - meta
  - body
    - paragraph, table, text, link, image, …

Sample HTML
HTML overview
first paragraph
Simple Table
Here is a photo of George Bush.

Previous work
- Yelena Tsymbalenko
  - studied HTML constructs to determine which can be used in image search
  - found the following to be effective:
    - title of the page
    - image filename
    - image alt attribute

Research Goals
- What HTML features make good clues to the content of images?
  - Structural features (document, table)
  - File names or URLs
  - Formatting of material (bold, heading)
- How can clues be combined into a single relevance rating?
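One simple way to picture the last question is a weighted sum over the clues that matched the query. This is only a sketch: the slides do not say which combination function the study uses, and the clue names and weights below are hypothetical, chosen for illustration.

```python
# Hypothetical weights for illustration only; these are not values from the study.
EXAMPLE_WEIGHTS = {"alt": 3.0, "image_filename": 2.0, "page_title": 1.0}


def relevance_score(matched_clues, weights=EXAMPLE_WEIGHTS):
    """Sum the weights of every clue whose text matched the query.

    matched_clues is an iterable of clue names; unknown clues count as 0.
    """
    return sum(weights.get(name, 0.0) for name in matched_clues)


if __name__ == "__main__":
    # An image whose alt text and page title both matched the query.
    print(relevance_score(["alt", "page_title"]))
```

In the study itself the weights would come out of the statistical analysis of human relevance ratings rather than being set by hand.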

Image Search Study Process
- Downloading pages with matching text
  - Use an existing search engine to identify matches
  - These pages provide a corpus of images
  - We download pages so that our corpus remains static
    - The download acts as a snapshot
- Clue extraction
  - Analyze each page in the corpus for all possible clues to image content
- Human relevance ratings
  - A human rates whether each image is relevant to the query
- Statistical analysis to find clue-based relevance functions

Process: Downloading Web Pages
(Diagram labels: queries, Downloading Software, query, Search Engine, URLs, Web Pages, images)
Web pages and images are saved to local disk.

Design: Queries in XML
Multiple queries are stored in an XML file.
<query> George George Bush Bush </query>
<query> Bill Bill Clinton Clinton </query>
- Engine: 1 = AltaVista, 2 = Excite, 3 = HotBot, 4 = Google
- Method: 1 = or, 2 = and, 3 = expression
- First query: search for George Bush using AltaVista; pages must contain all the words
- Second query: search for Bill Clinton using HotBot; search for the exact expression
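A query file of this kind could be read as in the sketch below. The engine and method codes come from the slide, but the transcript lost the markup inside each query, so the root element and the child element names (engine, method, terms) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Engine and method codes as listed on the slide.
ENGINES = {1: "Altavista", 2: "Excite", 3: "Hotbot", 4: "Google"}
METHODS = {1: "or", 2: "and", 3: "expression"}

# Hypothetical layout: <queries>, <engine>, <method> and <terms> are assumed
# names, because the original markup inside <query> did not survive.
SAMPLE = """
<queries>
  <query><engine>1</engine><method>2</method><terms>George Bush</terms></query>
  <query><engine>3</engine><method>3</method><terms>Bill Clinton</terms></query>
</queries>
"""


def load_queries(xml_text):
    """Return a list of (engine name, method name, search terms) tuples."""
    root = ET.fromstring(xml_text)
    queries = []
    for query in root.iter("query"):
        engine = ENGINES[int(query.findtext("engine"))]
        method = METHODS[int(query.findtext("method"))]
        terms = query.findtext("terms").strip()
        queries.append((engine, method, terms))
    return queries


if __name__ == "__main__":
    for engine, method, terms in load_queries(SAMPLE):
        print(engine, method, terms)
```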

Process: Clue Extraction
(Diagram labels: queries, Clue Extraction Software, clues)

Data to be analyzed
For each page
- Query used to find the page
- Source URL
- For each image
  - Source URL
  - Attributes
  - Position in document
- For each clue
  - Whether the clue feature occurs in the document at all
  - Whether the feature occurs with text matching the query
    - Position in document for each occurrence
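The record layout above might be modelled roughly as follows. All class and field names are hypothetical; the slides describe what is stored, not how it is stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClueOccurrence:
    feature: str          # hypothetical clue name, e.g. "alt" or "page_title"
    position: int         # position of this occurrence in the document
    matches_query: bool   # whether the clue text matched the query


@dataclass
class ImageRecord:
    source_url: str
    attributes: Dict[str, str]
    position: int         # position of the image in the document
    clues: List[ClueOccurrence] = field(default_factory=list)

    def feature_occurs(self, feature):
        """True if the clue feature occurs at all for this image."""
        return any(c.feature == feature for c in self.clues)

    def feature_matches(self, feature):
        """True if the feature occurs with text matching the query."""
        return any(c.feature == feature and c.matches_query for c in self.clues)


@dataclass
class PageRecord:
    query: str            # query used to find the page
    source_url: str
    images: List[ImageRecord] = field(default_factory=list)
```

Keeping "occurs at all" and "occurs with matching text" as separate questions mirrors the two per-clue items in the list.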

Process: Relevance Rating
(Diagram labels: queries, Relevance Rating Software, query & image, human, relevant / not)
The software presents the images for each query from the database to the user and records the human relevance rating back to the database.

Clues: global
Global clues: clues that apply to every image on the page
- filename of page
- path of page
- host of page
- title element of the web page
- keywords found in meta element
- description found in meta element
Why do we break the URL into three clues? Different parts of the URL contribute differently to the overall relevance of the images on that page.
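Splitting a page URL into the three clues can be sketched with the Python standard library. The function name and the example URL are made up for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def split_url_clues(url):
    """Break a URL into host, path (directory) and filename clues."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    return {
        "host": parts.hostname or "",
        "path": directory,
        "filename": filename,
    }


if __name__ == "__main__":
    print(split_url_clues("http://www.example.org/politics/presidents/bush.html"))
```

Keeping the parts separate lets a later analysis give, say, a filename match a different weight than a host match.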

Clues: global
Apple
<META NAME="keywords" CONTENT="Apple Computer, Power Macintosh, PowerBook, AppleWorks, WebObjects, iMovie, QuickTime, Desktop Movies, Software, Operating Systems, Mac OS, iMac, iBook">
<META NAME="Description" CONTENT="Visit for the latest news, the hottest products, and technical support resources from Apple Computer, Inc.">
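Pulling the title, keywords, and description out of a page like this one could look like the sketch below, which uses the standard-library HTML parser. The class name is hypothetical, and this is not the project's extraction software.

```python
from html.parser import HTMLParser


class GlobalClueParser(HTMLParser):
    """Collect the page title and the keywords/description meta content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


if __name__ == "__main__":
    parser = GlobalClueParser()
    parser.feed('<HTML><HEAD><TITLE>Apple</TITLE>'
                '<META NAME="keywords" CONTENT="Apple Computer, iMac">'
                '</HEAD></HTML>')
    parser.close()
    print(parser.title, parser.meta)
```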

Clues: image file
Image file properties
- external properties
  - filename
  - path
  - host
An image can come from a different host than the page and can have a different path.
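Because the image source may be relative to the page or point at another host, it has to be resolved before its filename, path, and host can be used as clues. A minimal sketch, with a hypothetical function name and made-up URLs:

```python
from urllib.parse import urljoin, urlsplit


def image_location(page_url, img_src):
    """Resolve an image src against its page and report whether hosts match."""
    absolute = urljoin(page_url, img_src)
    same_host = urlsplit(absolute).hostname == urlsplit(page_url).hostname
    return absolute, same_host


if __name__ == "__main__":
    page = "http://www.example.org/news/index.html"
    print(image_location(page, "images/bush.jpg"))
    print(image_location(page, "http://img.example.net/photos/bush.jpg"))
```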

Clues: common attributes
Elements have common attributes
- title: describes what the element is
- id: used in identifying the element
- name: same as id, older HTML
Clues that use these attributes: link, image, object, table, cell, row

Clues: Image Container
- Link to an image
  - text enclosed within the link element
- Embedded image element
  - alt attribute (usually describes what the image is)
- Object element
  - text enclosed within the object element
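The first two container clues can be sketched with the standard-library HTML parser. The class name, the clue labels, and the rule that a link counts as an image link when its href ends in a common image extension are all assumptions made for illustration; object elements are left out to keep the sketch short.

```python
from html.parser import HTMLParser

# Assumed heuristic: a link "points to an image" if its href ends like this.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")


class ImageContainerParser(HTMLParser):
    """Collect (image URL, clue name, clue text) for img alt and link text."""

    def __init__(self):
        super().__init__()
        self.clues = []
        self._link_href = None
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self.clues.append((attrs.get("src") or "", "alt", attrs.get("alt") or ""))
        elif tag == "a":
            href = attrs.get("href") or ""
            if href.lower().endswith(IMAGE_EXTENSIONS):
                self._link_href = href
                self._link_text = []

    def handle_data(self, data):
        if self._link_href is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._link_href is not None:
            # Collapse runs of whitespace in the enclosed link text.
            text = " ".join("".join(self._link_text).split())
            self.clues.append((self._link_href, "link_text", text))
            self._link_href = None


if __name__ == "__main__":
    parser = ImageContainerParser()
    parser.feed('<a href="bush.jpg">Photo of George Bush</a>'
                '<img src="cosby.gif" alt="Bill Cosby">')
    parser.close()
    print(parser.clues)
```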

Clues: table
Table:
- summary attribute: describes the table content
- caption: describes the table content
- row heading
- row
- column heading
- column
- cell
- neighboring cells (above, below, right, left)
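Once a table has been read into a grid of cell texts, the positional clues for the cell holding an image can be looked up by index. The sketch below assumes a plain rectangular table whose first row holds column headings and whose first column holds row headings; it ignores rowspan and colspan, and the function name is hypothetical.

```python
def table_clues(grid, row, col):
    """Return heading and neighboring-cell clues for the cell at (row, col).

    grid is a list of rows, each a list of cell text strings.
    """
    clues = {
        "column_heading": grid[0][col],   # assumes headings in the first row
        "row_heading": grid[row][0],      # assumes headings in the first column
    }
    offsets = {"above": (-1, 0), "below": (1, 0), "left": (0, -1), "right": (0, 1)}
    for name, (d_row, d_col) in offsets.items():
        r, c = row + d_row, col + d_col
        # Skip neighbors that would fall outside the table.
        if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
            clues[name] = grid[r][c]
    return clues


if __name__ == "__main__":
    grid = [
        ["Name", "Photo"],
        ["George Bush", "[image]"],
        ["Bill Clinton", "[image]"],
    ]
    print(table_clues(grid, 1, 1))
```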

Clues: table

Clues: headings
Heading elements (h1, h2, ... h6)
- headings above image
- headings below image
A heading can indicate a topic, and images below the heading may relate to it. Some authors use headings as captions above images, and sometimes below them. Some headings are used only for visual effect, where a font change or bold text should have been used instead.

In this photo, the heading comes after the image. When a heading is used as a topic, it usually comes before the image, but some images have a heading as a caption below them.
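Finding the nearest heading above and below each image can be sketched by recording headings and images in document order. The names are hypothetical, and the sketch treats "above" and "below" as "earlier" and "later" in the source, which is only an approximation of the rendered layout.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingImageParser(HTMLParser):
    """Record ("heading", text) and ("image", src) items in document order."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None   # collects text while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._buffer = []
        elif tag == "img":
            self.items.append(("image", dict(attrs).get("src") or ""))

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            self.items.append(("heading", text))
            self._buffer = None


def heading_clues(html):
    """Map each image src to its nearest heading above and below (or None)."""
    parser = HeadingImageParser()
    parser.feed(html)
    parser.close()
    items = parser.items
    result = {}
    for i, (kind, value) in enumerate(items):
        if kind != "image":
            continue
        above = next((v for k, v in reversed(items[:i]) if k == "heading"), None)
        below = next((v for k, v in items[i + 1:] if k == "heading"), None)
        result[value] = {"above": above, "below": below}
    return result


if __name__ == "__main__":
    print(heading_clues('<h1>Presidents</h1><img src="bush.jpg"><h3>George Bush</h3>'))
```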

Clues: text
- Emphasized text elements
  - bold
  - italic
  - underline
  - strong
  - emphasis
  - big
- Body text
  - text that surrounds the image
  - distance
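The distance clue could be measured in words between the image and the nearest occurrence of a query term. The slides do not define the measure, so the sketch below picks one plausible definition (a word directly next to the image has distance 1), and all names are hypothetical.

```python
from html.parser import HTMLParser


class TokenStream(HTMLParser):
    """Flatten a page into lowercase words plus ("IMG", src) markers."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.tokens.append(("IMG", dict(attrs).get("src") or ""))

    def handle_data(self, data):
        self.tokens.extend(data.lower().split())


def word_distance(html, img_src, term):
    """Smallest number of words between the image and the term, or None."""
    stream = TokenStream()
    stream.feed(html)
    stream.close()
    image_pos = None
    term_positions = []
    count = 0   # number of words seen so far; image markers take no space
    for token in stream.tokens:
        if isinstance(token, tuple):
            if token[1] == img_src and image_pos is None:
                image_pos = count
        else:
            if token.strip(".,;:!?") == term.lower():
                term_positions.append(count)
            count += 1
    if image_pos is None or not term_positions:
        return None
    return min(i - image_pos + 1 if i >= image_pos else image_pos - i
               for i in term_positions)


if __name__ == "__main__":
    page = '<p>Here is a photo of George Bush. <img src="bush.jpg"> Taken in Texas.</p>'
    print(word_distance(page, "bush.jpg", "bush"))
```

A measure like this is sensitive to how the page is laid out, since words that are close in the source may be far apart on screen.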

Current Project Status
- Prototype download and clue extraction software nearly complete
  - now testing implementation
  - data (without human relevance ratings) in early November
- Recruiting students to build on-line relevance rating system
  - hope to get students outside lab to help with ratings via Web interface

Challenges for image search systems
- computing word distance from image
- stylesheets used for presentation
- table patterns
- patterns of HTML element usage
- CGI-returned images
- structural boundaries
- patterns in Web page design
- HTML generators

Cheng Thao,