Semantic collaborative web caching. Jean-Marc Pierson, Lionel Brunie, David Coquil. LISI, INSA de LYON

Outline ► Motivations and proxies ► Document indexation ► Temperature of documents ► Collaboration scheme and architecture ► Results, evaluation and discussion ► Conclusion

Sharing information / sharing usage ► Information is scattered ► The volume of information is huge • How do I find my way through the jungle of the information system (IS)? ► Many possible solutions: search engines, agents, ontologies... ► A solution to be explored: help from, and collaboration with, other users

Making users share usage ► ...is an issue that has long been addressed: proxies (figure: server, proxy, users)

► Proxies reduce: • response time • server load • network load ► Proxies can be located close to the server and/or close to the users ► Proxies can collaborate (hierarchical or "flat" collaboration) ► Proxy management policies are based on operational (LRU/MFU-like) information
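For contrast with the temperature-based policy the talk proposes, here is a minimal Python sketch of the kind of operational policy this slide refers to: a least-recently-used (LRU) proxy cache. The class name `LRUProxyCache` and its `get`/`put` methods are illustrative assumptions, not part of the authors' prototype.

```python
from __future__ import annotations

from collections import OrderedDict


class LRUProxyCache:
    """Baseline cache that evicts the least recently used document (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._docs: OrderedDict[str, bytes] = OrderedDict()  # oldest entries first

    def get(self, url: str) -> bytes | None:
        if url not in self._docs:
            return None  # miss: a real proxy would now fetch from the origin server
        self._docs.move_to_end(url)  # mark as most recently used
        return self._docs[url]

    def put(self, url: str, body: bytes) -> None:
        self._docs[url] = body
        self._docs.move_to_end(url)
        if len(self._docs) > self.capacity:
            self._docs.popitem(last=False)  # evict the least recently used entry
```

Only recency is used here; nothing about what the documents are about enters the decision, which is the gap the semantic approach targets.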

Motivations ► Users generally have a limited set of interests ► User caches contain related documents ► Metadata, user profiles, virtual communities and hot topics can provide proxies with semantic and contextual information about the queries they have to serve

Proposition: use this semantic and contextual information to ► optimize proxy management policies and proxy communication policies ► allow users to share usage ► give users a personalized view of the web information space

► Proposition: use collaborative proxies to • improve performance (basic goal) • act as forums and mediators that help users share usage information ► Assumptions: • proxies do not share raw data but documents holding information that can be described by metadata (descriptors) • users are not isolated: they share common interests, experience, objectives or behavior (virtual communities) • information and topics of interest evolve rapidly: "hot" topics

From proxies to adaptive indexes ► The (present + past) content of a proxy de facto provides a view over the global information system ► This view has real added value ► Examples: • which teaching materials about Java are accessed most? • is there any news about football? • which correlated documents did people who read this document access afterwards?

Document indexation ► Indexing tree: an "ontology" of the web space ► Finding one is difficult! ► "Yahoo!"-like

How is the indexation performed? ► Analyze the content of the document… • title • meta tags (content, keywords, …) • links • formatting (headers, bold face, outline) ► … to extract keywords ► Keywords are analyzed to find related concepts ► Concepts are mapped onto the ontology
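The keyword-extraction step can be sketched in Python with the standard-library `html.parser`. This is a minimal sketch, assuming that only the sources listed on the slide (title, meta tags, link text, headers and bold text) contribute keywords; the per-source weights in `SOURCE_WEIGHTS`, the tokenizer and the name `extract_keywords` are hypothetical, since the slides do not say how the sources are weighted. Mapping the resulting keywords to concepts is a separate step.

```python
from __future__ import annotations

import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights per source: the slide lists the sources but not their weighting.
SOURCE_WEIGHTS = {
    "title": 3.0,
    "meta": 2.0,
    "a": 1.0,  # link anchor text
    "h1": 2.0,
    "h2": 2.0,
    "h3": 2.0,
    "b": 1.5,
    "strong": 1.5,
}
WORD = re.compile(r"[a-z]{3,}")  # ASCII-only tokenizer, kept simple for the sketch


class KeywordExtractor(HTMLParser):
    """Scores words found in the title, meta tags, links and formatted text."""

    def __init__(self) -> None:
        super().__init__()
        self.scores: Counter[str] = Counter()
        self._open: list[str] = []  # weighted tags currently open

    def _add(self, text: str, weight: float) -> None:
        for word in WORD.findall(text.lower()):
            self.scores[word] += weight

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)
            if (fields.get("name") or "").lower() in ("keywords", "description"):
                self._add(fields.get("content") or "", SOURCE_WEIGHTS["meta"])
        elif tag in SOURCE_WEIGHTS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag, if any.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i] == tag:
                del self._open[i]
                break

    def handle_data(self, data):
        # Plain body text is ignored: only the sources listed on the slide count.
        if self._open:
            self._add(data, max(SOURCE_WEIGHTS[t] for t in self._open))


def extract_keywords(html: str, top_n: int = 5) -> list[str]:
    parser = KeywordExtractor()
    parser.feed(html)
    parser.close()
    return [word for word, _ in parser.scores.most_common(top_n)]
```

A word repeated across several sources (say, in the title, the keywords meta tag and a bold heading) accumulates the highest score and comes first.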

Weighted indexing tree ► Edges between concepts (parents and children) are weighted ► The weight relates to the probability that a document located under the child node is requested next, given that a document under the parent node was just requested ► It is the "correlation" (in terms of access patterns) between the target node and its siblings

Weighted tree: for instance, someone interested in baseball is more likely to be interested in soccer than in skiing (debatable)
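One possible reading of the edge-weight definition is a conditional frequency estimated from a request log: w(P → C) is the fraction of requests falling under P that are immediately followed by a request falling under child C. The sketch below assumes each request is labeled with the concept its document is indexed under; `edge_weights` and `path_to_root` are hypothetical names, and the authors may estimate the weights differently.

```python
from __future__ import annotations

from collections import Counter


def path_to_root(node: str, parent: dict[str, str]) -> list[str]:
    """Return the node followed by all its ancestors, up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def edge_weights(requests: list[str], parent: dict[str, str]) -> dict[tuple[str, str], float]:
    """Estimate w(P -> C) = P(next request falls under C | current request falls under P).

    `requests` lists, in access order, the concept under which each requested
    document is indexed; `parent` maps each concept to its parent concept.
    """
    under = Counter()     # how often a request fell under each node
    followed = Counter()  # how often it was then followed by a request under a given child
    for current, nxt in zip(requests, requests[1:]):
        next_path = set(path_to_root(nxt, parent))
        for p in path_to_root(current, parent):
            under[p] += 1
            for child in next_path:
                if parent.get(child) == p:  # the child of p on the way to nxt
                    followed[(p, child)] += 1
    return {edge: n / under[edge[0]] for edge, n in followed.items()}
```

With a log in which baseball readers move on to soccer more often than to skiing, the Sports → Soccer edge ends up heavier than Sports → Skiing, in line with the example above.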

Notion of temperature ► Documents are assigned a temperature related to their "hotness": the more a document is accessed, the higher its temperature ► The cache replacement policy uses document temperature: cooler documents are evicted from the cache first; prefetching uses the hottest documents
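A minimal sketch of the two uses of temperature named on this slide, assuming temperatures are kept in a plain dictionary from URL to value (the slides do not specify the storage). The function names `evict_coolest` and `prefetch_candidates` are hypothetical.

```python
from __future__ import annotations

import heapq


def evict_coolest(cached: set[str], temperature: dict[str, float], capacity: int) -> list[str]:
    """Evict the coolest cached documents until the cache fits; return their URLs.

    Documents with no known temperature count as 0.0, so they go first.
    """
    evicted: list[str] = []
    while len(cached) > capacity:
        coolest = min(cached, key=lambda url: temperature.get(url, 0.0))
        cached.remove(coolest)
        evicted.append(coolest)
    return evicted


def prefetch_candidates(cached: set[str], temperature: dict[str, float], k: int) -> list[str]:
    """Return the k hottest known documents that are not cached yet."""
    missing = [url for url in temperature if url not in cached]
    return heapq.nlargest(k, missing, key=lambda url: temperature[url])
```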

Temperature ► Represents the probability that a document will be accessed in the near future ► It combines the number of requests for the document in the last time interval with the semantic links represented by the data structure ► A temperature value is also associated with the internal nodes of the data structure

Temperature computation ► Temperature computation occurs at regular request intervals ► The number of accesses to each document between two consecutive computations is stored in an access table: • if a document has been accessed since the last temperature computation, its temperature increases by the corresponding value in the table, and this value is pushed onto a stack for future cooling • otherwise, its temperature decreases
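The periodic update can be sketched as follows. The slide does not say by how much a non-accessed document cools down; this sketch assumes it loses the most recent increment popped from its cooling stack. `DocTemperature` and `update_temperatures` are hypothetical names. The function returns each document's variation ΔT, which is the quantity the propagation step diffuses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DocTemperature:
    value: float = 0.0
    history: list[float] = field(default_factory=list)  # stack of past increments


def update_temperatures(docs: dict[str, DocTemperature], access_table: dict[str, int]) -> dict[str, float]:
    """Run one periodic computation and return the variation (delta T) of each document.

    The access table is cleared afterwards so that it counts the next interval.
    """
    deltas: dict[str, float] = {}
    for url, doc in docs.items():
        hits = access_table.get(url, 0)
        if hits > 0:
            delta = float(hits)          # heat up by the number of accesses
            doc.history.append(delta)    # remember the increment for future cooling
        elif doc.history:
            delta = -doc.history.pop()   # assumption: cool down by the last stored increment
        else:
            delta = 0.0                  # nothing left to cool
        doc.value += delta
        deltas[url] = delta
    access_table.clear()
    return deltas
```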

Temperature propagation up the data structure ► The temperature variation (ΔT) of each document is diffused along the edges of the data structure ► More precisely, for each (document, concept) pair connected by an edge of weight W, the temperature of the concept increases or decreases by W × ΔT ► The concept's temperature variation may be further diffused to its parent node (subject to a given threshold)

Example: ΔT1 = +3 for document 1. Temperature variation for Soccer (from ΔT1): ΔTs = 3 × 70% = 2.1; for Sports = 2.1 × 40% = 0.84; for Recreation and Sports = 0.84 × 15% = 0.126 [propagation stops here if the threshold is 0.5].
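The upward diffusion and its worked example can be reproduced with a short Python sketch. It assumes each node has a single parent and reads the threshold as follows: a node passes its variation on only while the magnitude of that variation is at least the threshold. `propagate_up`, `upward_phase` and the `up_edges` layout (node mapped to a (parent, weight) pair) are hypothetical.

```python
from __future__ import annotations


def propagate_up(start: str, delta: float, up_edges: dict[str, tuple[str, float]],
                 threshold: float) -> dict[str, float]:
    """Diffuse one document's variation toward the root of the indexing tree."""
    variations: dict[str, float] = {}
    node = start
    # Assumption: a node passes its variation on only while |variation| >= threshold.
    while node in up_edges and abs(delta) >= threshold:
        parent, weight = up_edges[node]
        delta *= weight  # the parent receives W x delta T
        variations[parent] = variations.get(parent, 0.0) + delta
        node = parent
    return variations


def upward_phase(deltas: dict[str, float], up_edges: dict[str, tuple[str, float]],
                 threshold: float) -> dict[str, float]:
    """Accumulate the concept variations caused by all documents' variations."""
    total: dict[str, float] = {}
    for doc, delta in deltas.items():
        for concept, change in propagate_up(doc, delta, up_edges, threshold).items():
            total[concept] = total.get(concept, 0.0) + change
    return total
```

With `up_edges` set to doc1 → Soccer (0.70), Soccer → Sports (0.40) and Sports → Recreation and Sports (0.15), calling `upward_phase({"doc1": 3.0}, up_edges, 0.5)` yields 2.1 for Soccer, 0.84 for Sports and 0.126 for Recreation and Sports (up to floating-point rounding); since 0.126 is below the threshold, nothing is passed further up.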

Temperature retropropagation down the data structure ► Temperature is diffused from concepts down to documents ► Every document under a concept whose temperature has been modified sees its own temperature modified ► Even "non-accessed" documents may see their temperature increase

Example: temperature variation for the Games concept = … × 15% = …; for Baseball = 0.84 × 40% = 0.336; for Document 2 = 2.1 × 50% = 1.05; for Document 3 = 2.1 × 60% = 1.26. In fact, there is one upward phase for all documents, then one downward phase for all concepts.
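The downward phase can be sketched in the same style. The slides do not say which children receive a concept's variation; to stay consistent with the example, which lists only Games, Baseball and Documents 2 and 3, this sketch assumes that nodes already involved in the upward phase (the accessed documents and the concepts above them) are skipped. It also assumes that a node diffuses its variation only while the magnitude is at least the threshold. `downward_phase` and the `down_edges` layout are hypothetical.

```python
from __future__ import annotations


def downward_phase(
    up_variations: dict[str, float],
    sources: set[str],
    down_edges: dict[str, list[tuple[str, float]]],
    threshold: float,
) -> dict[str, float]:
    """Diffuse concept variations from the upward phase down to child nodes."""
    # Assumption: nodes that took part in the upward phase are not heated again,
    # so a variation does not bounce back to where it came from.
    skip = set(up_variations) | sources
    result: dict[str, float] = {}
    pending = list(up_variations.items())
    while pending:
        node, delta = pending.pop()
        if abs(delta) < threshold:
            continue  # too small to be diffused further
        for child, weight in down_edges.get(node, []):
            if child in skip:
                continue
            change = delta * weight
            result[child] = result.get(child, 0.0) + change
            pending.append((child, change))  # concepts pass it on to their own children
    return result
```

Feeding it the upward variations of the example (Soccer 2.1, Sports 0.84, Recreation and Sports 0.126), with Document 1 as the source, a threshold of 0.5, and edges Soccer → Document 2 (50%), Soccer → Document 3 (60%) and Sports → Baseball (40%), gives 1.05, 1.26 and 0.336 respectively.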

Document–concept link (precision) ► When a document is related to two concepts, its node is duplicated and each of the two copies is linked to one of the related concepts ► Otherwise, with a single node, temperature variations would propagate between unrelated documents (by rebound)

A distributed collaborative architecture

Proxy architecture (figure; modules: index, query processing, server/proxy connection, profile, cache, client connection, temperature)

Browser cache vs. user proxy ► Browser local caches are basic and cannot communicate ► Implementing true communicating proxies at the browser/user level makes it possible to: • reduce the load on intermediate proxies • optimize network traffic • reduce response time • manage the user profile • count document hits • customize semantic and contextual information

From proxies to virtual communities ► User profile: topics of interest ► Virtual community = users with similar profiles ► Virtual communities could be used for: • monitoring document usage • associating proxies with specific communities • providing users with relevant information about the content of proxy caches • monitoring the evolution of topics of interest • sharing experience and optimizing queries
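A virtual community could be formed by grouping users whose profiles are similar. The slides specify neither a similarity measure nor a clustering method; the sketch below assumes profiles are topic-to-weight dictionaries, uses cosine similarity, and groups users greedily around the first member of each community. `cosine`, `group_communities` and the 0.8 cut-off are hypothetical.

```python
from __future__ import annotations

import math


def cosine(p: dict[str, float], q: dict[str, float]) -> float:
    """Cosine similarity between two sparse topic-weight profiles."""
    dot = sum(w * q.get(topic, 0.0) for topic, w in p.items())
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def group_communities(profiles: dict[str, dict[str, float]],
                      min_similarity: float = 0.8) -> list[set[str]]:
    """Greedy grouping: a user joins the first community whose founder has a similar profile."""
    communities: list[tuple[str, set[str]]] = []  # (founder, members)
    for user, profile in profiles.items():
        for founder, members in communities:
            if cosine(profile, profiles[founder]) >= min_similarity:
                members.add(user)
                break
        else:  # no similar community found: this user founds a new one
            communities.append((user, {user}))
    return [members for _, members in communities]
```

The greedy pass depends on the order in which users are seen; a real system would probably recompute communities periodically as profiles evolve.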

Collaboration and communities ► Subscription: manual and static, to evolve toward dynamic and automatic ► Relationships between the user proxy and the aggregate proxies in charge of the community: • to find a requested document in another user proxy • to see the most accessed documents in the community ► The proxy organization must reflect the community structure and usage
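The two relationships listed on this slide (locating a document in another member's proxy, and listing the community's most accessed documents) can be sketched as a small aggregate-proxy index. `CommunityProxy` and its methods are hypothetical; a real deployment would also have to handle evictions and stale entries, which this sketch ignores.

```python
from __future__ import annotations

from collections import Counter, defaultdict


class CommunityProxy:
    """Hypothetical aggregate proxy serving one virtual community."""

    def __init__(self) -> None:
        self.holders: dict[str, set[str]] = defaultdict(set)  # url -> user proxies caching it
        self.hits: Counter[str] = Counter()                   # url -> accesses reported by members

    def report(self, user_proxy: str, url: str, cached: bool = True) -> None:
        """A member proxy reports an access and whether it now caches the document."""
        self.hits[url] += 1
        if cached:
            self.holders[url].add(user_proxy)

    def locate(self, url: str, requester: str) -> str | None:
        """Return another member proxy that holds the document, if any."""
        others = sorted(self.holders.get(url, set()) - {requester})
        return others[0] if others else None

    def most_accessed(self, k: int) -> list[str]:
        """Return the k documents most accessed in the community."""
        return [url for url, _ in self.hits.most_common(k)]
```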

Prototype ► Java ► Indexing tree limited to 2 or 3 levels of Yahoo! ► Matching done only with keywords (whether or not they appear in the indexing tree), not with concepts ► Interfaced with ThoughtTreasure (a French-English, WordNet-like resource) for keywords not in the indexing tree
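The prototype's keyword-level matching can be sketched as a lookup in the indexing tree with a thesaurus fallback. Here a plain dictionary of related words stands in for ThoughtTreasure; it is not ThoughtTreasure's interface, and `match_concepts`, `tree_labels` and the node paths are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def match_concepts(keywords: list[str], tree_labels: dict[str, str],
                   synonyms: dict[str, list[str]]) -> Counter[str]:
    """Count votes for indexing-tree nodes.

    tree_labels maps a lowercase keyword to the path of the node it labels.
    """
    votes: Counter[str] = Counter()
    for kw in keywords:
        kw = kw.lower()
        if kw in tree_labels:
            votes[tree_labels[kw]] += 1
            continue
        # Fallback for keywords absent from the tree: try related words
        # (a plain dict stands in for the ThoughtTreasure lookup here).
        for alt in synonyms.get(kw, []):
            if alt in tree_labels:
                votes[tree_labels[alt]] += 1
                break
    return votes
```

Keywords found neither in the tree nor through the fallback are simply dropped, which matches the slide's note that matching works at the keyword level rather than with concepts.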

Evaluation ► The temperature notion has already proved efficient for video archive caching (hit rate) ► Small-scale experiments showed the proxy-web architecture to be robust ► Indexation works well (more than 90% of documents indexed) ► Difficulties arise from the need to handle the contents of web pages in order to test the behavior

Conclusion ► Enhancing the integration of distributed information systems or servers into a global service by means of collaborative proxies ► Management and collaboration based on semantic and contextual information • temperature ► Performance improvement ► Virtual communities ► Attachment of a proxy to each user

Future work ► Test the prototype on a large scale: design a test platform! ► Push intermediate cache management into the heart of the network (active routers) ► Enhance the indexation algorithm ► Apply the technology to Grid computing (cache management)