Deriving Emergent Web Page Semantics D.V. Sreenath*, W.I. Grosky**, and F. Fotouhi* *Wayne State University **University of Michigan-Dearborn.



Semantics
The semantics of a web page are potentially richer than what the page’s author(s) can define
  – Some semantics emerge through context
  – A multimedia document has multiple semantics because it is placed in multiple contexts

Content-Based Retrieval
Development of feature-based techniques for content-based retrieval (CBR) is a mature area, at least for images
CBR researchers should now concentrate on extracting semantics from multimedia documents so that retrievals using concept-based queries can be tailored to individual users
  – The semantic gap
(Semi-)automated multimedia annotation

Multimedia Annotation(s)
Multimedia annotations should be semantically rich
  – Multiple semantics
These semantics can be discovered by placing multimedia information in a natural, context-rich environment
  – A social theory based on how multimedia information is used

Context-Rich Environments
Structural context – Author’s contribution
  – The document’s author places semantically similar pieces of information close to each other
Dynamic context – User’s contribution
  – Short browsing sub-paths are semantically coherent

Context-Rich Environments
The Web is a perfect example of a context-rich environment
Develop multimedia annotations through cross-modal techniques
  – Audio
  – Images
  – Text
  – Video

Goal
Derive document semantics based on user browsing behavior
  – The same document has multiple semantics
    » Different people see different meanings in the same document
  – Over short browsing paths, an individual user’s wants and needs are uniform
    » The pages visited over these short paths exhibit semantics in congruence with these wants and needs

Questions
How can the semantics of a web page be derived given a set of user browsing paths that end at that page?
How can we characterize the semantics of a user browsing path?
How can web page semantics help us navigate the web more efficiently?
How can our approach actually be implemented on the real web?

Our Approach
We use actual browsing paths to find the latent semantics of web pages
  – Textual features
  – Image features
  – Structural features
We hope to find general concepts comprising various textual and image features that frequently co-occur

Semantic Coherence
We believe that a user’s browsing path exhibits semantic coherence
  – The user’s entire path exhibits multiple semantics, especially across pages far from each other on the path
  – Neighboring pages are semantically close to each other, especially in the portions close to the links taken

Semantic Break Points
We would like to characterize the contiguous sub-paths of a user’s browsing path that exhibit similar semantics, and to detect the semantic break points along the path where the semantics change appreciably
  – Collect these sub-paths into a multiset
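
One way to picture break-point detection is sketched below. The slides do not say how a break point is detected, so everything here is an assumption made for illustration: each page is given a hypothetical feature vector, consecutive pages are compared by cosine similarity, and a break is declared when the similarity drops below a hypothetical threshold of 0.5. The resulting sub-paths are collected into a multiset with `collections.Counter`.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def split_at_break_points(page_vectors, threshold=0.5):
    """Split a path into contiguous sub-paths, returned as lists of page indices.

    A break point is placed between two consecutive pages whose similarity
    falls below `threshold` (an assumed criterion, not taken from the slides).
    """
    if not page_vectors:
        return []
    sub_paths = [[0]]
    for i in range(1, len(page_vectors)):
        if cosine(page_vectors[i - 1], page_vectors[i]) < threshold:
            sub_paths.append([i])      # semantics changed: start a new sub-path
        else:
            sub_paths[-1].append(i)    # still coherent: extend the current one
    return sub_paths


# Hypothetical four-page path: two pages about one topic, then two about another.
urls = ["page_a", "page_b", "page_c", "page_d"]
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.0, 0.8, 0.3],
]
sub_paths = split_at_break_points(vectors)

# The multiset of sub-paths: identical sub-paths from other users would add to the count.
multiset = Counter(tuple(urls[i] for i in sp) for sp in sub_paths)
```

In this toy path the only break falls between the second and third pages, so the multiset holds two sub-paths, each with count 1.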

Web Page Semantics
We categorize the semantics of each web page based on a history of the semantically coherent browsing paths of all users that end at that page
A browsing path will be represented by a high-dimensional vector
The various positions of the vector correspond to the presence of
  – textual keywords
  – image features (visual keywords)
  – structural features (structural keywords)
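
The vector representation can be illustrated with a small sketch. The keyword names, the `text:`/`img:`/`struct:` prefixes, and the use of raw counts are all assumptions for the example; the slides only state that vector positions correspond to the presence of textual, visual, and structural keywords.

```python
def path_vector(path_keywords, vocabulary):
    """Return one count per vocabulary position for a browsing path.

    `path_keywords` is a list of pages, each page a list of keyword strings.
    Keywords outside the vocabulary are ignored.
    """
    index = {keyword: i for i, keyword in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for page in path_keywords:
        for keyword in page:
            if keyword in index:
                vector[index[keyword]] += 1
    return vector


# Hypothetical vocabulary mixing the three keyword kinds.
vocabulary = ["text:golf", "text:radio", "img:color_hist_3", "struct:table"]

# Hypothetical two-page browsing path.
path = [
    ["text:golf", "img:color_hist_3"],
    ["text:golf", "struct:table", "text:unknown"],
]
vec = path_vector(path, vocabulary)
```

A real vocabulary would have many thousands of positions, which is why the vector is described as high-dimensional.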

Deriving Emergent Web Page Semantics
From the complete set of web pages under consideration, we extract a set of textual, visual, and structural keywords
For each multiset, M, of sub-paths that we are to analyze, we form three matrices
  – term-path matrix
  – image-path matrix
  – structure-path matrix

Deriving Emergent Web Page Semantics
The (i,j)-th element of each of these matrices is determined by
  – The strength of the presence of the i-th keyword along the j-th browsing path, which is determined by
    » How many times this keyword occurs on the pages along the path
    » How much time the user spends examining these pages
    » How close each occurrence of the i-th keyword is to both the outgoing and incoming anchor positions
  – How many times this browsing path occurs in M
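
The slides list the factors behind each matrix element but not how they are combined. The sketch below shows one possible combination, purely as an assumption: each occurrence contributes its page's dwell time multiplied by the mean of two proximity scores (to the incoming and outgoing anchors), and the sum is scaled by the path's multiplicity in M. The `Occurrence` type, the `proximity` function, and its `scale` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    keyword: str
    dwell_seconds: float      # time the user spent on the page holding this occurrence
    dist_to_in_anchor: int    # assumed distance (e.g. in tokens) to the incoming anchor
    dist_to_out_anchor: int   # assumed distance to the outgoing anchor


def proximity(distance, scale=50.0):
    """Map a distance to a score in (0, 1]; closer occurrences score higher."""
    return 1.0 / (1.0 + distance / scale)


def keyword_strength(occurrences, keyword, path_count):
    """Assumed (i, j) entry: weighted occurrences of `keyword` on one path,
    multiplied by how many times that path occurs in the multiset M."""
    total = 0.0
    for occ in occurrences:
        if occ.keyword != keyword:
            continue
        near = (proximity(occ.dist_to_in_anchor) + proximity(occ.dist_to_out_anchor)) / 2.0
        total += occ.dwell_seconds * near
    return total * path_count


# Hypothetical occurrences collected along one browsing path.
occurrences = [
    Occurrence("golf", dwell_seconds=10.0, dist_to_in_anchor=0, dist_to_out_anchor=50),
    Occurrence("golf", dwell_seconds=4.0, dist_to_in_anchor=50, dist_to_out_anchor=50),
    Occurrence("radio", dwell_seconds=4.0, dist_to_in_anchor=5, dist_to_out_anchor=5),
]
entry = keyword_strength(occurrences, "golf", path_count=2)
```

Counting occurrences is implicit here because every occurrence adds a term to the sum. Other combinations, such as additive weights or a logarithmic dwell-time term, would fit the slide's description equally well.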

Deriving Emergent Web Page Semantics
These matrices may be concatenated in various ways to produce an overall keyword-path matrix
Perform latent semantic analysis to get concepts
A page is then represented by a set of concept classes
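
A minimal sketch of this step, assuming NumPy is available. Stacking the three matrices row-wise is only one of the "various ways" of concatenating them, and the matrix values and the choice of k = 2 concepts are made up for illustration. Latent semantic analysis is carried out here as a truncated singular value decomposition.

```python
import numpy as np


def latent_concepts(term_path, image_path, structure_path, k=2):
    """Stack the three keyword-path matrices and keep the top-k SVD factors.

    Rows are keywords, columns are browsing paths. Returns (U_k, s_k, Vt_k).
    """
    keyword_path = np.vstack([term_path, image_path, structure_path])
    U, s, Vt = np.linalg.svd(keyword_path, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k], s[:k], Vt[:k, :]


# Hypothetical matrices for three browsing paths (columns).
term_path = np.array([[2.0, 0.0, 1.0],
                      [0.0, 3.0, 0.0]])
image_path = np.array([[1.0, 0.0, 1.0]])
structure_path = np.array([[0.0, 1.0, 0.0]])

U_k, s_k, Vt_k = latent_concepts(term_path, image_path, structure_path, k=2)

# Coordinates of each path in the k-dimensional concept space (one row per path).
path_coords = (np.diag(s_k) @ Vt_k).T
```

Paths that share keywords end up near each other in `path_coords`. Grouping those coordinates into concept classes, and assigning a page the classes of the paths that end at it, is a further step the slides mention but do not specify.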

Architecture

Vantage Points

Local Iterative Technique

Bob Hope Path – Page 1

Bob Hope Path – Page 2

Bob Hope Path – Page 3

Bob Hope Path – Page 4

Bob Hope Path – Page 5

Bob Hope Path – Page 6

Bob Hope Path – Page 7

Bob Hope Path – Page 8

Bob Hope Path – Page 9

Vaudeville

Broadway

Radio

Troops

Experiment 1 – Paths/Paths [figure; labels: Bob Hope, Broadway, Golf, Movies, Radio, Troops, Vaudeville]

Experiment 2 – Paths/URLs [figure; labels: Bob Hope, Broadway, Golf, Movies, Radio, Troops, Vaudeville]

Experiment 3 – URLs/URLs [figure; labels: Bob Hope, Broadway, Golf, Movies, Radio, Troops, Vaudeville]

Issues
Data capture – privacy issues
Compute-intensive SVD updating
Dynamic content
Constantly evolving websites