Published by Marilyn Fisher. Modified over 8 years ago.
1
Semantic collaborative web caching
Jean-Marc Pierson, Lionel Brunie, David Coquil
LISI, INSA de LYON
Jean-Marc.Pierson@insa-lyon.fr
2
Outline
► Motivations and proxies
► Document indexing
► Temperature of documents
► Collaboration scheme and architecture
► Results, evaluation and discussion
► Conclusion
3
Sharing information / sharing usage
► Information is disseminated
► The volume of information is huge
How do I find my way through the jungle of the information system (IS)?
► Many possible solutions: search engines, agents, ontologies...
► A solution to be explored: help from, and collaboration with, other users
4
Making users share usages
► ... is an issue that has been addressed for a long time: proxies
[Figure labels: server, proxy, users]
5
Proxies
► Proxies allow:
    - reducing the response time
    - reducing the server load
    - reducing the network load
► Proxies can be located close to the server and/or close to users
► Proxies can collaborate (hierarchical or "flat" collaboration)
► Proxy management policies are based on operational (LRU/MFU-like) information
6
Motivations
► Users are generally interested in a limited set of topics
► User caches therefore contain related documents
► Metadata, user profiles, virtual communities and hot topics can provide proxies with semantic and contextual information about the queries they have to serve
7
Proposition
Monitor this semantic and contextual information to:
► optimize proxy management policies and proxy communication policies
► allow users to share usages
► give users a personalized view of the web information space
8
► Proposition: use collaborative proxies to:
    - improve performance (basic)
    - act as forums and mediators that help users share usage information
► Assumptions:
    - proxies do not share raw data but documents, which hold information that can be described by metadata (descriptors)
    - users are not isolated: they share some common interest, experience, objective or behavior (virtual communities)
    - information and topics of interest evolve rapidly ("hot" topics)
9
From proxies to adaptive indexes
► The (present + past) content of a proxy de facto provides a view over the global information system
► This view has real added value
► Examples:
    - which teaching materials about Java are accessed the most?
    - is there any news about football?
    - which related documents did people who read this document access afterwards?
10
Document indexing
► Indexing tree: an "ontology" of the web space
► Finding a suitable one is difficult!
► "Yahoo"-like hierarchy
11
How is the indexing performed?
► The content of the document is analyzed...
    - Title
    - Meta-tags (Content, Keywords, ...)
    - Links
    - Formatting (headers, bold face, outline)
► ... to extract keywords
► Keywords are analyzed to find related concepts
► Concepts are then mapped onto the ontology
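The indexing steps on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' indexing algorithm: it covers the title, meta keywords, headers and bold text (links are left out), and both the `CONCEPT_OF` mapping and the concept paths are hypothetical stand-ins for a real Yahoo-like tree.

```python
from html.parser import HTMLParser

# Hypothetical keyword -> concept-path mapping (assumption, not from the slides).
CONCEPT_OF = {
    "soccer": "Recreation and Sports/Sports/Soccer",
    "baseball": "Recreation and Sports/Sports/Baseball",
}


class KeywordExtractor(HTMLParser):
    """Collects keywords from the title, meta keywords, headers and bold text."""

    EMPHASIS_TAGS = {"title", "h1", "h2", "h3", "b", "strong"}

    def __init__(self):
        super().__init__()
        self.keywords = []
        self._depth = 0  # number of emphasis tags currently open

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]
        elif tag in self.EMPHASIS_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.EMPHASIS_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Only text inside an emphasis tag is treated as a keyword source.
        if self._depth > 0:
            self.keywords += data.lower().split()


def index_document(html):
    """Return the sorted concept paths matched by the document's keywords."""
    parser = KeywordExtractor()
    parser.feed(html)
    parser.close()
    return sorted({CONCEPT_OF[k] for k in parser.keywords if k in CONCEPT_OF})


page = ("<html><head><title>Soccer news</title>"
        "<meta name='keywords' content='Baseball, League'></head>"
        "<body><p>plain text <b>soccer</b></p></body></html>")
concepts = index_document(page)
```

Keywords with no entry in the mapping ("news", "league") are simply dropped here; the prototype described later hands such keywords to ThoughtTreasure instead.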
12
Weighted indexing tree
► Edges between concepts (ancestors and children) are weighted
► The weight relates to the probability that, after a request for a document under the parent node, the next request is for a document located under the child node
► It is the "correlation" (in terms of access patterns) between the target node and its siblings
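One possible in-memory representation of such a weighted tree is sketched below. The `Concept` class and its methods are hypothetical names chosen for illustration; the weights are the ones used in the worked example later in the deck (Sports 15%, Soccer 40%, Baseball 40%).

```python
class Concept:
    """A node of the indexing tree; `weight` is the weight of the edge to its parent."""

    def __init__(self, name, parent=None, weight=0.0):
        self.name = name
        self.parent = parent
        self.weight = weight
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the slash-separated path from the root down to this node."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))


root = Concept("Recreation and Sports")
sports = Concept("Sports", parent=root, weight=0.15)
soccer = Concept("Soccer", parent=sports, weight=0.40)
baseball = Concept("Baseball", parent=sports, weight=0.40)
```

Storing the weight on the child keeps each edge in exactly one place, which is convenient when variations are diffused both up and down the same edges.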
13
Weighted tree
For instance, someone interested in baseball is more likely to be interested in soccer than in skiing (open to discussion).
14
Notion of temperature
► Documents are assigned a temperature related to their "hotness": the more a document is accessed, the higher its temperature
► The cache replacement policy uses the temperature of documents: cooler documents are removed from the cache first; prefetching uses the hottest documents
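A minimal sketch of how a temperature-driven replacement and prefetching policy could look. The function names, the set-based cache and the count-based capacity are assumptions made for illustration; the slides do not specify them.

```python
def evict_coolest(cache, temperatures, capacity):
    """Remove the coolest documents until the cache holds at most `capacity` items.

    `cache` is a set of document ids and is modified in place.
    Returns the evicted ids, coolest first.
    """
    evicted = []
    while len(cache) > capacity:
        coolest = min(cache, key=lambda doc: temperatures.get(doc, 0.0))
        cache.remove(coolest)
        evicted.append(coolest)
    return evicted


def prefetch_candidates(cache, temperatures, count):
    """Return the `count` hottest documents that are not cached yet."""
    missing = [doc for doc in temperatures if doc not in cache]
    missing.sort(key=lambda doc: temperatures[doc], reverse=True)
    return missing[:count]


temperatures = {"a": 3.0, "b": 0.5, "c": 1.2, "d": 2.0, "e": 0.1}
cache = {"a", "b", "c"}
evicted = evict_coolest(cache, temperatures, capacity=2)
to_prefetch = prefetch_candidates(cache, temperatures, count=1)
```

Documents without a known temperature are treated as cold (0.0) in this sketch, so they are the first to be evicted.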
15
Temperature
► Represents the probability that a document will be accessed in the near future
► It combines the number of requests for a document in the last time interval with the semantic links represented by the data structure
► A temperature value is also associated with the internal nodes of the data structure
16
Temperature computation
► Temperature computation occurs at regular request intervals
► The number of accesses to each document between two consecutive computations is stored in an access table
    - if a document has been accessed since the last temperature computation, its temperature increases by the corresponding value in the table, and this value is pushed onto a stack for future cooling
    - otherwise, it decreases
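The update rule above might be sketched as follows. The slide does not say by how much an idle document cools; this sketch assumes the most recent increment is popped from the stack and subtracted, which is only one plausible reading. The `TemperatureTable` class and its method names are hypothetical.

```python
from collections import defaultdict


class TemperatureTable:
    """Tracks document temperatures between periodic recomputations."""

    def __init__(self):
        self.temperature = defaultdict(float)
        self.accesses = defaultdict(int)     # access table since the last computation
        self.heat_stack = defaultdict(list)  # per-document stack of past increments

    def record_access(self, doc):
        self.accesses[doc] += 1

    def recompute(self):
        """Apply one temperature computation; return {document: variation}."""
        variations = {}
        for doc in set(self.temperature) | set(self.accesses):
            hits = self.accesses.get(doc, 0)
            if hits > 0:
                delta = float(hits)
                self.heat_stack[doc].append(delta)  # remembered for future cooling
            elif self.heat_stack[doc]:
                # ASSUMPTION: cooling undoes the most recent stored increment.
                delta = -self.heat_stack[doc].pop()
            else:
                delta = 0.0
            self.temperature[doc] += delta
            variations[doc] = delta
        self.accesses.clear()
        return variations


table = TemperatureTable()
for _ in range(3):
    table.record_access("doc1")
first = table.recompute()   # doc1 was accessed three times: it heats up
second = table.recompute()  # no access since the last computation: it cools down
```

The returned variations are what the following slides diffuse along the edges of the indexing tree.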
17
Temperature propagation up the data structure
► The temperature variation (ΔT) for each document is diffused along the edges of the data structure
► More precisely, for each (document, concept) pair linked by an edge of weight W, the temperature of the concept increases or decreases by W * ΔT
► The concept's temperature variation may be further diffused to its parent node (subject to a given threshold)
18
Example: for document 1, ΔT1 = +3
Temperature variation for Soccer (from ΔT1): Δs = 3 * 70% = 2.1
Temperature variation for Sports = 2.1 * 40% = 0.84
Temperature variation for Recreation and Sports = 0.84 * 15% = 0.126 [stops here if the threshold is 0.5]
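The worked example can be reproduced with a short sketch. The `PARENT` table encodes the edge weights from this slide; the function name is hypothetical, and the stopping rule (a concept forwards its variation only while that variation is at least the threshold) is an assumption consistent with the example, not a rule stated on the slides.

```python
# (parent concept, edge weight) for each node, taken from the slide's example.
PARENT = {
    "Document 1": ("Soccer", 0.70),
    "Soccer": ("Sports", 0.40),
    "Sports": ("Recreation and Sports", 0.15),
}


def propagate_up(document, delta, threshold, parent=PARENT):
    """Diffuse a document's temperature variation towards the root.

    Returns {concept: variation}. ASSUMPTION: a concept forwards its
    variation to its parent only while |variation| >= threshold.
    """
    variations = {}
    node = document
    while node in parent:
        concept, weight = parent[node]
        delta *= weight
        variations[concept] = delta
        if abs(delta) < threshold:
            break
        node = concept
    return variations


upward = propagate_up("Document 1", 3.0, threshold=0.5)
```

With a threshold of 0.5 the variation reaches Recreation and Sports (0.84 is still above the threshold) and would go no further, since 0.126 is below it.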
19
Temperature retropropagation down the data structure
► Temperature is diffused from concepts down to documents
► Each document under a concept whose temperature has changed sees its own temperature change
► Even "non-accessed" documents may see their temperature increase
20
Example:
Temperature variation for the Games concept = +0.126 * 15% = 0.0189
Temperature variation for Baseball = 0.84 * 40% = 0.336
Temperature variation for Document 2 = 2.1 * 50% = 1.05
Temperature variation for Document 3 = 2.1 * 60% = 1.26
In practice: one upward phase for all documents, then a downward phase for all concepts.
[Figure annotations: +2.1, 0.84, 0.126]
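The downward phase of this example can be sketched in the same spirit. The `CHILDREN` table uses the weights quoted on the slides; the function name is hypothetical. Two assumptions are made: nodes that already received a variation during the upward phase are skipped, and each variation is pushed down one level only, which is all the example shows.

```python
# child -> edge weight for each concept, taken from the slides' examples.
CHILDREN = {
    "Recreation and Sports": {"Sports": 0.15, "Games": 0.15},
    "Sports": {"Soccer": 0.40, "Baseball": 0.40},
    "Soccer": {"Document 1": 0.70, "Document 2": 0.50, "Document 3": 0.60},
}


def propagate_down(upward, children=CHILDREN):
    """Diffuse upward-phase variations one level down the tree.

    `upward` maps each node touched by the upward phase to its variation.
    Returns {node: variation} for nodes reached only by the downward phase.
    """
    downward = {}
    for concept, delta in upward.items():
        for child, weight in children.get(concept, {}).items():
            if child in upward:
                continue  # ASSUMPTION: already updated on the way up
            downward[child] = downward.get(child, 0.0) + delta * weight
    return downward


upward = {
    "Document 1": 3.0,
    "Soccer": 2.1,
    "Sports": 0.84,
    "Recreation and Sports": 0.126,
}
downward = propagate_down(upward)
```

Documents 2 and 3 were never requested, yet they heat up because they share the Soccer concept with Document 1.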
21
Document-concept link (clarification)
► When a document is related to two concepts, its node is duplicated and each copy is linked to one of the two concepts
► Otherwise, with a single node, temperature variations would propagate (by rebound) between unrelated documents
22
A distributed collaborative architecture
23
Proxy architecture
[Figure: proxy components: Index, Query processing, Server/proxy connection, Profile, Cache, Client connection, Temperature]
24
Browser cache vs. user proxy
► Browser "local caches" are basic and cannot communicate
► Implementing true communicating proxies at the browser/user level allows:
    - reducing the intermediate proxy load
    - optimizing the network traffic
    - reducing the response time
    - managing the user profile
    - counting document hits
    - customizing semantic and contextual information
25
From proxies to virtual communities
► User profile: topics of interest
► Virtual community = users with similar profiles
► Virtual communities could be used for:
    - monitoring document usage
    - associating proxies with specific communities
    - providing users with pertinent information about the content of proxy caches
    - monitoring the evolution of topics of interest
    - sharing experiences and optimizing queries
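The slides do not say how profile similarity is measured. As one possible illustration, the sketch below treats a profile as a set of topics and uses the Jaccard index with an arbitrary cutoff; the measure, the cutoff and all names are assumptions.

```python
def jaccard(topics_a, topics_b):
    """Jaccard index of two topic sets (0.0 when both are empty)."""
    union = topics_a | topics_b
    if not union:
        return 0.0
    return len(topics_a & topics_b) / len(union)


def similar_users(profiles, user, min_similarity=0.5):
    """Return the other users whose profile is close enough to `user`'s."""
    mine = profiles[user]
    return sorted(
        other
        for other, topics in profiles.items()
        if other != user and jaccard(mine, topics) >= min_similarity
    )


profiles = {
    "alice": {"soccer", "baseball", "java"},
    "bob": {"soccer", "baseball"},
    "carol": {"cooking"},
}
community = similar_users(profiles, "alice")
```

Here alice and bob share two of three topics (similarity 2/3), so they would fall in the same community, while carol would not.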
26
Collaboration and communities
► Subscription: manual and static at first, to evolve toward dynamic and automatic
► Relationships between the user proxy and the aggregate proxies in charge of the community:
    - to find a requested document in another user's proxy
    - to see the most accessed documents in the community
► The proxy organization must reflect the community structure and usages
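The two relationships listed above could be sketched as below. The class names, method names and in-process calls are hypothetical; a real deployment would exchange these requests over the network, and the slides give no protocol details.

```python
from collections import Counter


class UserProxy:
    """A per-user proxy holding a local cache and per-document hit counts."""

    def __init__(self, name):
        self.name = name
        self.cache = {}
        self.hits = Counter()

    def store(self, url, content):
        self.cache[url] = content

    def get_local(self, url):
        """Return the cached content and count the hit, or None on a miss."""
        if url in self.cache:
            self.hits[url] += 1
            return self.cache[url]
        return None


class AggregateProxy:
    """Proxy in charge of one community of user proxies."""

    def __init__(self, members):
        self.members = list(members)

    def find(self, url, requester):
        """Look for `url` in the other members' caches; return (owner, content) or None."""
        for proxy in self.members:
            if proxy is requester:
                continue
            content = proxy.get_local(url)
            if content is not None:
                return proxy.name, content
        return None

    def most_accessed(self, count):
        """Return the `count` most accessed URLs across the community."""
        totals = Counter()
        for proxy in self.members:
            totals.update(proxy.hits)
        return [url for url, _ in totals.most_common(count)]


alice, bob = UserProxy("alice"), UserProxy("bob")
bob.store("http://example.org/java", "Java course")
bob.get_local("http://example.org/java")
community = AggregateProxy([alice, bob])
found = community.find("http://example.org/java", requester=alice)
top = community.most_accessed(1)
```

In this sketch a document served to a peer counts as a hit for its owner, so community-wide popularity reflects shared usage as well as local usage.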
27
Prototype
► Java
► Indexing tree limited to 2 or 3 levels of Yahoo!
► Matching is done only on keywords (present or not in the indexing tree), not on concepts
► Interfaced with ThoughtTreasure (a French-English WordNet) for keywords not in the indexing tree
28
Evaluation
► The temperature notion has already proved efficient for caching video archives (hit rate)
► Small-scale experiments showed the web proxy architecture to be robust
► Indexing works well (more than 90% of documents indexed)
► Difficulties: testing the behavior requires handling the actual content of web pages
29
Conclusion
► Enhancing the integration of distributed information systems or servers into a global service by means of collaborative proxies
► Management and collaboration based on semantic and contextual information: temperature
► Performance improvement
► Virtual communities
► Attachment of a proxy to each user
30
Future work
► Test the prototype on a large scale: design a test platform!
► Push the intermediate cache management to the heart of the network (active routers)
► Enhance the indexing algorithm
► Apply the technology to Grid computing (cache management)