Semantic collaborative web caching. Jean-Marc Pierson, Lionel Brunie, David Coquil (LISI, INSA de Lyon)


1 Semantic collaborative web caching. Jean-Marc Pierson, Lionel Brunie, David Coquil (LISI, INSA de Lyon). Jean-Marc.Pierson@insa-lyon.fr

2 Outline ► Motivations and proxies ► Document indexing ► Temperature of documents ► Collaboration schema and architecture ► Results, evaluation and discussion ► Conclusion

3 Sharing information / sharing usage ► Information is disseminated ► The volume of information is huge → How do I find my way in the jungle of the information system (IS)? ► Many possible solutions: search engines, agents, ontologies... ► A solution to be explored: help from, and collaboration with, other users

4 Making users share usage ► ... is an issue that has been addressed for a long time: proxies [Diagram: server, proxy, users]

5 Proxies ► Proxies allow: • reducing the response time • reducing the server load • reducing the network load ► Proxies can be located close to the server and/or close to users ► Proxies can collaborate (hierarchical or "flat" collaboration) ► Proxy management policies are based on operational (LRU/MFU-like) information

6 Motivations ► Users are generally interested in a limited set of topics ► User caches therefore contain related documents ► Metadata, user profiles, virtual communities and hot topics can provide proxies with semantic and contextual information about the queries they have to serve

7 Proposition: monitor this semantic and contextual information to: ► optimize proxy management policies and proxy communication policies ► allow users to share usage ► give users a personalized view of the web information space

8 ► Proposition: use collaborative proxies to: • improve performance (basic) • act as forums and mediators that help users share usage information ► Assumptions: • proxies do not share raw data but documents holding information that can be described by metadata (descriptors) • users are not isolated: they share some common interest, experience, objective or behavior (virtual communities) • information and topics of interest evolve rapidly: "hot" topics

9 From proxies to adaptive indexes ► The (present + past) content of a proxy de facto provides a view over the global information system ► This view has real added value ► Examples: • which teaching materials about Java are the most accessed? • is there any news about football? • which related documents did people who read this document access afterwards?

10 Document indexing ► Indexing tree: an "ontology" of the web space ► It is difficult to find one! ► A "Yahoo!"-like hierarchy

11 How is the indexing performed? ► The content of the document is analyzed… • Title • Meta tags (Content, Keywords, …) • Links • Formatting (headers, bold face, outline) ► … to extract keywords ► Keywords are analyzed to find related concepts ► Concepts are then mapped onto the ontology
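The extraction steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prototype's code: the names KeywordExtractor, extract_keywords, index_document and the CONCEPT_OF table are hypothetical, the keyword-to-concept table stands in for the real indexing tree, and only the standard-library HTML parser is used.

```python
from html.parser import HTMLParser

# ASSUMPTION: a tiny keyword -> concept table standing in for the indexing tree.
CONCEPT_OF = {"soccer": "Soccer", "football": "Soccer", "baseball": "Baseball"}


class KeywordExtractor(HTMLParser):
    """Collects words from the title, meta tags, link text, headers and bold text."""

    MARKED_TAGS = {"title", "a", "h1", "h2", "h3", "b", "strong"}

    def __init__(self):
        super().__init__()
        self.depth = 0          # number of marked tags currently open
        self.keywords = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name in {"keywords", "description"}:
            self._add(attributes.get("content") or "")
        elif tag in self.MARKED_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.MARKED_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Plain body text is ignored; only text inside marked tags counts.
        if self.depth > 0:
            self._add(data)

    def _add(self, text):
        for word in text.replace(",", " ").lower().split():
            word = word.strip(".;:!?()\"'")
            if word:
                self.keywords.add(word)


def extract_keywords(html):
    parser = KeywordExtractor()
    parser.feed(html)
    parser.close()
    return parser.keywords


def index_document(html):
    """Map the extracted keywords to the concepts they are filed under."""
    return {CONCEPT_OF[k] for k in extract_keywords(html) if k in CONCEPT_OF}
```

Keywords missing from the table are simply dropped here; the prototype described later instead falls back on an external lexicon for such keywords.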

12 Weighted indexing tree ► Edges between concepts (parents and children) are weighted ► The weight of an edge relates to the probability that, after a request for a document under the parent node, the next request is for a document located under the child node ► It is the "correlation" (in terms of access patterns) between the target node and its "siblings"
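One possible way to estimate such weights from a request trace is sketched below. The slides do not say how the weights are obtained, so the estimator, the function name estimate_weights and the three-concept PARENT table are all assumptions made for illustration: the weight of a (parent, child) edge is taken as the fraction of requests issued right after a request under the parent that land under that child.

```python
from collections import Counter

# ASSUMPTION: a hypothetical one-level tree, child concept -> parent concept.
PARENT = {"Soccer": "Sports", "Baseball": "Sports", "Skiing": "Sports"}


def estimate_weights(trace, parent=PARENT):
    """Estimate edge weights from a trace of requested concepts.

    weight(parent, child) = share of the requests that follow a request
    under `parent` and fall under `child`.
    """
    follows = Counter()   # (parent, child) -> transitions landing under child
    totals = Counter()    # parent -> transitions starting under parent
    for previous, following in zip(trace, trace[1:]):
        p = parent.get(previous)
        if p is None:
            continue
        totals[p] += 1
        if parent.get(following) == p:
            follows[(p, following)] += 1
    return {edge: count / totals[edge[0]] for edge, count in follows.items()}
```

With the trace Soccer, Baseball, Soccer, Skiing, Baseball, Soccer, two of the five transitions land under Soccer, two under Baseball and one under Skiing, so the Skiing edge receives the smallest weight.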

13 Weighted tree ► For instance, someone interested in baseball is more likely to be interested in soccer than in skiing (open to discussion)

14 Notion of temperature ► Documents are assigned a temperature reflecting their "hotness": the more a document is accessed, the higher its temperature ► The cache replacement policy uses the temperature of documents: the coolest documents are evicted from the cache first; prefetching targets the hottest documents
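A minimal sketch of such a temperature-driven cache follows. The class and method names are hypothetical, and two details are assumptions not stated in the slides: a new document is always admitted (even if it is cooler than everything cached), and prefetching picks from a table of temperatures for documents the proxy knows about but does not hold.

```python
class TemperatureCache:
    """Cache that evicts the coolest document first (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.temperature = {}   # document id -> current temperature

    def put(self, doc, temperature):
        """Insert doc; return the evicted document id, or None."""
        evicted = None
        if doc not in self.temperature and len(self.temperature) >= self.capacity:
            # The coolest cached document is removed first.
            evicted = min(self.temperature, key=self.temperature.get)
            del self.temperature[evicted]
        self.temperature[doc] = temperature
        return evicted

    def prefetch_candidates(self, known_temperatures, n):
        """Return the n hottest known documents that are not cached yet."""
        missing = {d: t for d, t in known_temperatures.items()
                   if d not in self.temperature}
        return sorted(missing, key=missing.get, reverse=True)[:n]
```

For instance, with a capacity of two, inserting documents at temperatures 5.0, 1.0 and then 3.0 evicts the one at 1.0.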

15 Temperature ► Represents the probability that a document will be accessed in the near future ► It combines the number of requests for a document in the last time interval with the semantic links represented by the data structure ► A temperature value is also associated with the internal nodes of the data structure

16 Temperature computation ► Temperature computation occurs at regular request intervals ► The number of accesses to each document between two consecutive computations is stored in an access table: • if a document has been accessed since the last temperature computation, its temperature increases by the corresponding value in the table, and this value is pushed onto a stack for future cooling • otherwise, its temperature decreases
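The bookkeeping described above might look like the sketch below. The slides do not specify by how much an idle document cools down; this sketch ASSUMES that each idle interval pops the most recent increment from the document's stack and subtracts it. The class name Thermometer and its methods are hypothetical.

```python
from collections import defaultdict


class Thermometer:
    """Access table plus per-document cooling stacks (illustrative sketch)."""

    def __init__(self):
        self.temperature = defaultdict(float)
        self.access_table = defaultdict(int)    # hits since the last computation
        self.cooling_stack = defaultdict(list)  # past increments, per document

    def record_access(self, doc):
        self.access_table[doc] += 1

    def compute(self):
        """Run one temperature computation; return the variation per document."""
        deltas = {}
        for doc in set(self.temperature) | set(self.access_table):
            hits = self.access_table.get(doc, 0)
            if hits > 0:
                delta = float(hits)
                self.cooling_stack[doc].append(delta)   # kept for future cooling
            elif self.cooling_stack[doc]:
                # ASSUMPTION: cooling undoes the most recent increment.
                delta = -self.cooling_stack[doc].pop()
            else:
                delta = 0.0
            self.temperature[doc] += delta
            deltas[doc] = delta
        self.access_table.clear()
        return deltas
```

The returned variations are the per-document values that a propagation step along the data structure would then consume.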

17 Temperature propagation up the data structure ► The temperature variation (Δ) of each document is diffused along the edges of the data structure ► More precisely, for each (document, concept) pair connected by an edge of weight W, the temperature of the concept increases or decreases by W * Δ ► The concept's temperature variation may be further diffused to its parent node (subject to a given threshold)

18 Example: ► Temperature variation for document 1: ΔT1 = +3 ► Temperature variation for Soccer (from ΔT1): Δs = 3 * 70% = 2.1 ► Temperature variation for Sports = 2.1 * 40% = 0.84 ► Temperature variation for Recreation and Sports = 0.84 * 15% = 0.126 [stops here if the threshold is 0.5]
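The upward phase of this example can be reproduced with the short sketch below. The edge weights come from the example; the function name propagate_up and the exact reading of the threshold are assumptions: here a concept always receives its variation, and passes it on to its own parent only if the variation it received is at least the threshold in absolute value.

```python
# child -> (parent, edge weight), taken from the example above
EDGES = {
    "Document 1": ("Soccer", 0.70),
    "Soccer": ("Sports", 0.40),
    "Sports": ("Recreation and Sports", 0.15),
}


def propagate_up(node, delta, edges=EDGES, threshold=0.5):
    """Return {concept: variation} caused by a variation `delta` on `node`."""
    variations = {}
    while node in edges:
        parent, weight = edges[node]
        delta = weight * delta
        variations[parent] = delta
        if abs(delta) < threshold:   # ASSUMPTION: too small to diffuse further
            break
        node = parent
    return variations
```

Calling propagate_up("Document 1", 3) yields variations of about 2.1, 0.84 and 0.126 for Soccer, Sports, and Recreation and Sports (up to floating-point rounding); with a threshold of 1.0 the walk would already stop at Sports.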

19 Temperature propagation back down the data structure ► Temperature is diffused from concepts down to documents ► Each document under a concept whose temperature has changed sees its own temperature change ► Even "non-accessed" documents may see their temperature increase

20 Example: ► Temperature variation for the Games concept = 0.126 * 15% = 0.0189 ► Temperature variation for Baseball = 0.84 * 40% = 0.336 ► Temperature variation for Document 2 = 2.1 * 50% = 1.05 ► Temperature variation for Document 3 = 2.1 * 60% = 1.26 ► In practice, there is one upward phase for all documents, then one downward phase for all concepts [Figure annotations: +2.1, 0.84, 0.126]
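The downward phase of the example can be sketched as follows. The weights and upward variations are the ones quoted in the two examples; everything else is an assumption: the function name propagate_down is hypothetical, the diffusion goes down a single level (as in the example), and nodes that were already heated during the upward phase are skipped.

```python
# concept -> {child: edge weight}; weights as quoted in the examples
CHILDREN = {
    "Recreation and Sports": {"Sports": 0.15, "Games": 0.15},
    "Sports": {"Soccer": 0.40, "Baseball": 0.40},
    "Soccer": {"Document 1": 0.70, "Document 2": 0.50, "Document 3": 0.60},
}

# Variations produced by the upward phase for the access to Document 1.
UPWARD = {"Soccer": 2.1, "Sports": 0.84, "Recreation and Sports": 0.126}


def propagate_down(upward, children=CHILDREN, skip=("Document 1",)):
    """Diffuse each concept's variation one level down to its other children."""
    already_heated = set(upward) | set(skip)   # nodes on the upward path
    result = {}
    for concept, delta in upward.items():
        for child, weight in children.get(concept, {}).items():
            if child not in already_heated:
                result[child] = result.get(child, 0.0) + weight * delta
    return result
```

With the default tables, propagate_down(UPWARD) gives roughly 0.0189 for Games, 0.336 for Baseball, 1.05 for Document 2 and 1.26 for Document 3, matching the figures of the example up to floating-point rounding.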

21 Document-concept link (clarification) ► When a document is related to two concepts, its node is duplicated and each of the two copies is linked to one of the two concepts ► Otherwise, with a single node, temperature variations would propagate by rebound between unrelated documents

22 A distributed collaborative architecture

23 Proxy architecture [Diagram components: index, query processing, server/proxy connection, profile, cache, client connection, temperature]

24 Browser cache vs. user proxy ► Browser "local caches" are basic and cannot communicate ► Implementing true communicating proxies at the browser/user level allows: • reducing the intermediate proxy load • optimizing the network traffic • reducing the response time • managing the user profile • counting document hits • customizing semantic and contextual information

25 From proxies to virtual communities ► User profile: topics of interest ► Virtual community = users with similar profiles ► Virtual communities could be used for: • monitoring document usage • associating proxies with specific communities • providing users with pertinent information about the content of proxy caches • monitoring the evolution of the topics of interest • sharing experiences and optimizing queries
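As an illustration of "users with similar profiles", the sketch below represents a profile as a table of topic weights and compares two profiles with cosine similarity. The slides do not name a similarity measure, so cosine similarity, the 0.5 threshold and the function names similarity and community_of are assumptions chosen for the example.

```python
import math


def similarity(profile_a, profile_b):
    """Cosine similarity between two {topic: weight} profiles."""
    dot = sum(w * profile_b.get(topic, 0.0) for topic, w in profile_a.items())
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def community_of(user, profiles, threshold=0.5):
    """Return the other users whose profile is similar enough to `user`'s."""
    return sorted(
        other for other in profiles
        if other != user and similarity(profiles[user], profiles[other]) >= threshold
    )
```

A user interested in soccer and baseball would thus be grouped with a user interested only in soccer, but not with one interested only in Java.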

26 Collaboration and communities ► Subscription: manual and static at first, to evolve towards dynamic and automatic ► Relationships between the user proxy and the aggregate proxies in charge of the community: • finding a requested document in another user's proxy • seeing the most accessed documents in the community ► The proxy organization must reflect the community structure and usage
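The two services expected from an aggregate proxy can be sketched as a small in-memory directory. This is a hypothetical illustration: the class AggregateProxy, its methods and the idea that user proxies report each access to the aggregate proxy are assumptions, and no networking is shown.

```python
from collections import Counter, defaultdict


class AggregateProxy:
    """Directory kept by the proxy in charge of a community (sketch)."""

    def __init__(self):
        self.holders = defaultdict(set)   # url -> user proxies caching it
        self.hits = Counter()             # url -> accesses in the community

    def report_access(self, user_proxy, url):
        """Called by a user proxy after it has fetched and cached `url`."""
        self.holders[url].add(user_proxy)
        self.hits[url] += 1

    def locate(self, url, requester):
        """Other user proxies of the community that hold `url`."""
        return sorted(self.holders.get(url, set()) - {requester})

    def most_accessed(self, n):
        """The n most accessed documents in the community."""
        return [url for url, _ in self.hits.most_common(n)]
```

A user proxy that misses locally would first call locate and fetch the document from a listed peer, falling back to the origin server when the list is empty.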

27 Prototype ► Java ► Indexing tree limited to 2 or 3 levels of Yahoo! ► Matching done only with keywords (whether or not they appear in the indexing tree) and not with concepts ► Interfaced with ThoughtTreasure (a French-English WordNet) for keywords not in the indexing tree

28 Evaluation ► The temperature notion has already proved efficient for video archive caching (hit rate) ► In small-scale experiments, the proxy-web architecture proved robust ► Indexing works well (more than 90% of documents indexed) ► Difficulties stem from the need to handle the contents of web pages in order to test the behavior

29 Conclusion ► Enhancing the integration of distributed information systems or servers into a global service by means of collaborative proxies ► Management and collaboration based on semantic and contextual information → temperature ► Performance improvement ► Virtual communities ► Attachment of a proxy to each user

30 Future work ► Test the prototype on a large scale: design a test platform! ► Push the intermediate cache management into the heart of the network (active routers) ► Enhance the indexing algorithm ► Apply the technology to Grid computing (cache management)

