Published by Logan Harrington. Modified over 10 years ago.
1
2000
Making DADS distributed
A Nordunet2 project
Jochen Hollmann, Chalmers University of Technology
2
Project Aims
Principles for the design of distributed systems devoted to Digital Libraries (DL)
Project results will contribute:
- Tradeoffs for the design of future DL infrastructures
- Knowledge of how users interact with DLs
- Algorithms for data replication and prefetching
- Detailed experience from actual implementations (DADS)
3
Agenda
- Comparison: centralized and distributed approach
- General techniques for speedup
- System properties and opportunities for improvement
- How to distribute?
- Project plan
4
Centralized Approach
Potential Advantages:
- Low complexity
- Low total cost of ownership
- Simple administration
Potential Disadvantages:
- Single point of failure
- Latency/Overload
- Availability
- Does not scale
- No parallel activities
5
Total Replication to all Clients
Potential Advantages:
- High availability
- Minimal latency
- Data retrieval in parallel
Potential Disadvantages:
- Expensive
- Bandwidth used to distribute
- Difficult to allow updates everywhere
- Does not scale
6
General speedup techniques
- Prefetching: Metadata or heuristics allow a local copy to be requested ahead of time
- Caching: Keep a retrieved copy for future use (and avoid re-transferring it)
- Replication: Select data and distribute copies without a request
[Figure: timelines (t) with the labels Request 1, Request 2, Start Prefetching, Point of Replication, Search result available]
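The difference between caching and prefetching can be sketched in a few lines of Python. This is only an illustration, not the DADS implementation: the class `ArticleCache`, the function `fake_remote`, and the article ids are hypothetical names chosen for the example. The two counters separate transfers the user has to wait for from transfers done ahead of time.

```python
from typing import Callable, Dict, Iterable


class ArticleCache:
    """Local store supporting on-demand caching and ahead-of-time prefetching."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote      # hypothetical call to the central server
        self._store: Dict[str, bytes] = {}
        self.demand_transfers = 0              # transfers the user had to wait for
        self.background_transfers = 0          # transfers done before any request

    def get(self, article_id: str) -> bytes:
        """Caching: transfer on the first request only, then serve the local copy."""
        if article_id not in self._store:
            self._store[article_id] = self._fetch_remote(article_id)
            self.demand_transfers += 1
        return self._store[article_id]

    def prefetch(self, article_ids: Iterable[str]) -> None:
        """Prefetching: transfer copies ahead of time, based on hints."""
        for article_id in article_ids:
            if article_id not in self._store:
                self._store[article_id] = self._fetch_remote(article_id)
                self.background_transfers += 1


def fake_remote(article_id: str) -> bytes:
    """Stand-in for the remote library; returns dummy content."""
    return f"content of {article_id}".encode()


cache = ArticleCache(fake_remote)
cache.get("a1")               # first request: remote transfer
cache.get("a1")               # second request: served from the local copy
cache.prefetch(["a2", "a3"])  # copies fetched before anyone asks
cache.get("a2")               # already local, no waiting
```

Replication would look like `prefetch` driven by the library rather than by a client, pushing selected data to a site without any request from it.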
7
Properties of Articles and the System
Articles
- Contain references to related work selected by the author
- Are catalogued by experts
- Published articles went through an acceptance process
  – high quality data
A Search
- Reduces the number of articles to a small number
- Presents the results before retrieving the article
- May contain patterns that hint at what to replicate
8
General speedup techniques
[Figure: timelines (t) with the labels Request 1, Request 2, Start Prefetching, Point of Replication, Search result available]
- Search in the index
- Selection from the list
- Fetch a paper
- Get related articles
- Manual feedback
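One way to use the references an author selected as a prefetching hint is sketched below. This is an assumption about how such a heuristic could look, not the algorithm used in DADS; the function `prefetch_candidates`, the `references` mapping, and the ids are hypothetical. Once the user selects a paper from the result list, the papers it cites become candidates to fetch ahead of time.

```python
from typing import Dict, List, Set


def prefetch_candidates(
    selected: str,
    references: Dict[str, List[str]],
    cached: Set[str],
    limit: int = 3,
) -> List[str]:
    """Return up to `limit` cited articles that are not yet stored locally."""
    candidates: List[str] = []
    for ref in references.get(selected, []):
        if len(candidates) >= limit:
            break
        # Skip articles already held locally and duplicate citations.
        if ref not in cached and ref not in candidates:
            candidates.append(ref)
    return candidates


# Hypothetical reference lists: article id -> ids of the articles it cites.
references = {"p1": ["p2", "p3", "p2", "p4", "p5"]}
to_fetch = prefetch_candidates("p1", references, cached={"p3"})
```

The order of the reference list is used as the ranking here only for simplicity; usage statistics or the manual feedback step could supply a better one.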
9
How to distribute?
[Figure: hierarchy diagram; only its labels survive]
Levels: Researcher, Research Group, Department, University, Global Library
Scope: In-depth knowledge, Research area, Journals, Field, Everything
Techniques: Prefetching, Caching on article basis, Caching on journal basis, Replication of most used journals
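A lookup through such a hierarchy might be sketched as follows. The slide does not say how the levels cooperate, so this is an assumption: a request goes to the nearest level first, falls back to the next one on a miss, and a hit is copied into every nearer level (caching on article basis). The names `Tier` and `lookup` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class Tier:
    """One level of the hierarchy with its local article store."""

    def __init__(self, name: str, contents: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.contents: Dict[str, str] = dict(contents or {})


def lookup(article_id: str, tiers: List[Tier]) -> Tuple[Optional[str], Optional[str]]:
    """Search tiers from nearest to farthest; copy a hit into all nearer tiers."""
    for index, tier in enumerate(tiers):
        if article_id in tier.contents:
            data = tier.contents[article_id]
            for nearer in tiers[:index]:
                nearer.contents[article_id] = data
            return data, tier.name
    return None, None


tiers = [
    Tier("researcher"),
    Tier("department"),
    Tier("global library", {"p1": "full text of p1"}),
]
first = lookup("p1", tiers)    # served by the global library
second = lookup("p1", tiers)   # now served by the researcher's own store
```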
10
Project Plan
11
Project Plan
- Phase I (Aug 2000 - Apr 2001): Analysis of the current centralized system and construction of a simulation model (using data from DADS)
- Phase II (Apr 2001 - Dec 2001): Design and evaluation of a distributed version and the contained algorithms
- Phase III (Dec 2001 - July 2002): Evaluation and fine-tuning of the algorithms in DADS
12
Phase I: Analysis of the current system
Live System
- Analyze the log files
  – find locality
- Find the bottleneck in the current system
  – hints on what should be logged
  – hints on what can be replicated
  – where do latencies occur
- Understand the system properties
Simulation
- Build a trace-driven simulation model
- Test if the bottlenecks can be reproduced
- Measure the simulation with current and future technology parameters
  – network technology, storage costs
  – what are the problems that will remain
Develop a benchmark!
Develop a metric to quantify the costs.
Both systems should behave identically
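A trace-driven simulation can be as small as replaying the logged sequence of requested articles against a model of a local cache and counting hits. The sketch below is an illustration under stated assumptions, not the project's simulator: it assumes the log has already been reduced to a list of article ids, and it assumes an LRU replacement policy, which the slides do not specify. The name `simulate_lru` is hypothetical.

```python
from collections import OrderedDict
from typing import Iterable


def simulate_lru(trace: Iterable[str], capacity: int) -> float:
    """Replay a request trace against an LRU cache and return the hit rate."""
    cache: "OrderedDict[str, bool]" = OrderedDict()
    hits = 0
    total = 0
    for article_id in trace:
        total += 1
        if article_id in cache:
            hits += 1
            cache.move_to_end(article_id)    # mark as most recently used
        else:
            cache[article_id] = True
            if len(cache) > capacity:
                cache.popitem(last=False)    # evict the least recently used entry
    return hits / total if total else 0.0


# Hypothetical trace of article ids, in the order they were requested.
trace = ["a", "b", "a", "c", "b", "a"]
small = simulate_lru(trace, capacity=2)
large = simulate_lru(trace, capacity=3)
```

Running the same trace with different capacities gives a first impression of how much locality the log contains; a fuller model would also account for network latency and storage cost.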