Functions of a Web Warehouse

Kai Cheng, Yahiko Kambayashi, Seok Tae Lee
Graduate School of Informatics, Kyoto University, Japan

Mukesh Mohania
Western Michigan University, USA
13-16 November 2000, ICDL

Table of Contents
* Survival from “Information Explosion”
* Warehouse-Mediated Content Delivery
* Community-Oriented Web Warehouses
* Technical Issues
* Warehouse-Enhanced Web Caching
* Related Work
* Concluding Remarks
Survival from “Information Explosion”
* Web traffic doubled every 3-6 months
* Exponential growth of the Web
  – 1 billion pages, January 2000
  – 2 billion pages, June 2000
  – A projected 100-fold increase in the next 2 years
* Information overload for both networks and users
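As a rough consistency check (simple arithmetic on the figures above, not part of the original slides), a 100-fold increase over 24 months implies a doubling time of about 3.6 months, and the growth from 1 to 2 billion pages between January and June 2000 is one doubling in about 5 months. Both fall within the 3-6 month range quoted for traffic.

```latex
T_{\mathrm{double}} = \frac{24\ \text{months}}{\log_2 100} \approx \frac{24}{6.64} \approx 3.6\ \text{months}
```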
Scale Up the Web and Internet
* More bandwidth
  – Never keeps pace with traffic growth
* More server capacity
  – How to deal with “hot spots”?
* Site replication
  – Benefits only the replicated servers?
Our Approach
* Tame the chaotic information streams
  – Save redundant data transfers
* Unite the individual users
  – Share each other's findings and efforts
Warehouse-Mediated Content Delivery
* Direct delivery over the Internet
  – QoS: servers and network overloaded
  – Personalized services unrealistic
  – Information hunting difficult
Indirect Content Delivery
[Figure: the Web Warehouse sits between the WWW and its users. Input from the WWW: resource discovery, clustering, searching, navigation, filtering. Storage: buffering. Output: analysis, notification, transformation.]
Community-Oriented Web Warehousing
[Figure: sharing and contribution within a community of users.]
* The community of users: people with special information needs or interests
Examples of User Communities
* Sports fans
* Patients
* Business people
* Researchers
Real/Cyber Communities
(a) Real communities: dependent on location
(b) Cyber communities: independent of location
Technical Issues
* Functions of a Web warehouse
* Web caching vs. Web warehousing
* Data warehousing vs. Web warehousing
* Dynamic hierarchical Web warehouses
Functions of a Web Warehouse
* Buffering
* Transformation
  1. Transcoding: Format A → Format B
  2. Summarizing: Content A → Content B
* Content analysis: data/information → knowledge
* Notification
* Resource discovery
* Storage
* Reusing
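To make the buffering and notification functions concrete, here is a minimal sketch, assuming that users register interest keywords and that a newly stored document "matches" a user when its text contains one of those keywords. The classes `Document` and `WebWarehouse` and the methods `subscribe` and `store` are hypothetical names for illustration, not the authors' design.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    text: str


@dataclass
class WebWarehouse:
    """Hypothetical warehouse: buffers fetched documents and notifies subscribers."""
    buffer: dict[str, Document] = field(default_factory=dict)         # url -> document
    subscriptions: dict[str, set[str]] = field(default_factory=dict)  # user -> keywords

    def subscribe(self, user: str, keywords: set[str]) -> None:
        # Store interests in lower case so matching is case-insensitive.
        self.subscriptions[user] = {k.lower() for k in keywords}

    def store(self, doc: Document) -> list[str]:
        """Buffer a document, then return (sorted) users whose keywords it mentions."""
        self.buffer[doc.url] = doc
        terms = set(doc.text.lower().split())
        return sorted(u for u, kws in self.subscriptions.items() if kws & terms)
```

For example, after `subscribe("Mr. A", {"tennis"})`, storing a document whose text is "Tennis open results" returns `["Mr. A"]`. Keyword overlap is only a stand-in; a real warehouse would presumably use the content-analysis function for matching.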
Research Program
[Figure: research program linking Web caching, content analysis, transformation, and warehousing.]
From Web Caching to Web Warehousing

Item        | Web Caching   | Web Warehousing
------------|---------------|----------------
Object      | Data          | Information
Objective   | Reusing       | Sharing
Storage     | Bounded       | Bound-free
Population  | Responses     | Web views
Model       | FS dependent  | Hypermedia
From Data Warehousing to Web Warehousing

Item        | Data Warehouse        | Web Warehouse
------------|-----------------------|------------------------------------------
Objective   | Decision support      | Information sharing
Model       | RDB/OORDB             | Hypermedia
Population  | View materialization  | Resource discovery, content localization
Resource    | Operational data      | Web documents
Data type   | Structured            | Semi-structured/unstructured
Tie to Web  | [figure: DWH, Web]    | [figure: WWH, Web]
Warehouse as Shared Information Repository
* Real communities
  – Centralized management of warehouses
  – Unicast data transfer
* Cyber communities
  – Distributed management of warehouses
  – Multicast data transfer
Hierarchy of Web Warehouses
[Figure: hierarchy of Web warehouses with nodes such as HP Design, Sports, Skiing, and Tennis, each serving a group of users (e.g., Mr. A, Ms. C, Mrs. D, …; Mr. A, Mr. D, …).]
Dynamic Formation of Web Warehouses (Split)
[Figure: a Sports warehouse covering Tennis and Skiing and serving users A and B splits into separate Tennis and Skiing warehouses.]
Dynamic Formation of Web Warehouses (Union)
[Figure: separate Painting and Drawing warehouses serving users A and B are merged into a single Painting & Drawing warehouse.]
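The split and union operations can be sketched as follows, assuming a warehouse is described only by its topics and by each member's interests. The `Warehouse` class and the `split` and `union` functions are hypothetical illustrations of the idea, not the authors' algorithm; in particular, deciding *when* to split or merge (e.g., based on load or interest overlap) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Hypothetical community warehouse: its topics and its members' interests."""
    name: str
    topics: set[str]
    members: dict[str, set[str]] = field(default_factory=dict)  # user -> interests


def split(parent: Warehouse) -> list[Warehouse]:
    """Split: one child warehouse per topic, serving members interested in it."""
    children = []
    for topic in sorted(parent.topics):
        members = {u: set(i) for u, i in parent.members.items() if topic in i}
        children.append(Warehouse(topic, {topic}, members))
    return children


def union(a: Warehouse, b: Warehouse) -> Warehouse:
    """Union: merge two warehouses into one that serves both communities."""
    members = {u: set(i) for u, i in a.members.items()}
    for u, i in b.members.items():
        members.setdefault(u, set()).update(i)
    return Warehouse(f"{a.name} & {b.name}", a.topics | b.topics, members)
```

Splitting a Sports warehouse whose members are A (Tennis) and B (Skiing) yields a Skiing warehouse serving B and a Tennis warehouse serving A; a member interested in both topics would appear in both children.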
Current Status: Content-Sensitive Caching
[Figure: content-sensitive caching positioned between Web caching and warehousing.]
Content-Sensitive Cache Replacement Policy
* Cache replacement: keep or replace?
* Traditional caching: replacement decisions rely on long-term observation of accesses
* 60% of objects are accessed only once: how can they be differentiated?
* Content-sensitive caching: LRU-SP+
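A statistic like the share of one-access objects can be measured from a proxy access log. The sketch below assumes the log has already been reduced to one URL per request and that the figure is computed over distinct objects; `one_access_fraction` is a hypothetical helper name.

```python
from collections import Counter


def one_access_fraction(requests: list[str]) -> float:
    """Share of distinct objects in a request log that were requested exactly once."""
    counts = Counter(requests)          # object -> number of requests
    if not counts:
        return 0.0
    one_timers = sum(1 for n in counts.values() if n == 1)
    return one_timers / len(counts)
```

For instance, `one_access_fraction(["/a", "/b", "/a", "/c", "/d"])` returns 0.75, since three of the four distinct URLs appear once. Access counts alone cannot separate these objects at eviction time, which is why a content-based signal is attractive.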
LRU-SP+: Content-Sensitive, Size-Adjusted and Popularity-Aware LRU
* Daily indexing: cache content → indices → popular topics
* New document: how similar is it to the popular topics?
* Benefit/size model based on “observed” popularity + “inherent” popularity
* Implementing this model
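The slide outlines LRU-SP+ without giving its formula, so the following is only a minimal sketch of a benefit/size eviction rule under stated assumptions: documents and popular topics are term-frequency vectors, "inherent" popularity is the best cosine similarity between a document and any popular topic, benefit is the observed hit count plus a hypothetical `weight` times inherent popularity, and ties in benefit per byte are broken by least recent access. The names `CachedObject`, `inherent_popularity`, and `choose_victim` are illustrative, not the authors' definitions.

```python
import math
from collections import Counter
from dataclasses import dataclass


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class CachedObject:
    url: str
    size: int              # bytes; assumed positive
    terms: Counter         # term frequencies of the document text
    hits: int = 1          # "observed" popularity: accesses so far
    last_access: int = 0   # logical clock of the most recent access


def inherent_popularity(obj: CachedObject, popular_topics: list[Counter]) -> float:
    """Assumed 'inherent' popularity: best similarity to any popular topic."""
    return max((cosine(obj.terms, t) for t in popular_topics), default=0.0)


def choose_victim(cache: list[CachedObject], popular_topics: list[Counter],
                  weight: float = 1.0) -> CachedObject:
    """Evict the lowest benefit per byte; among ties, the least recently used."""
    def key(obj: CachedObject) -> tuple[float, int]:
        benefit = obj.hits + weight * inherent_popularity(obj, popular_topics)
        return (benefit / obj.size, obj.last_access)
    return min(cache, key=key)
```

With a popular topic about tennis, a recently used but off-topic one-access object is evicted before an older one-access object that resembles the topic, which is the content-sensitive behaviour the slide aims for. With an empty topic list, the rule reduces to hits per byte with LRU tie-breaking.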
Related Work
* LSAM's proxy cache (push)
  – Multicast-based virtual cache
  – Affinity groups and push channels
* INTELSAT's Wormhole content delivery
  – Warehouse-kiosk model
  – Satellite-based delivery platform
Concluding Remarks
* Proposed Web-warehouse-mediated content delivery to cope with the scaling problems
* Discussed the basic functions of a Web warehouse: buffering, transformation, notification, and content analysis
* Introduced our current work: warehouse-enhanced Web caching