EFFECTIVE LOAD-BALANCING VIA MIGRATION AND REPLICATION IN SPATIAL GRIDS
ANIRBAN MONDAL, KAZUO GODA, MASARU KITSUREGAWA
INSTITUTE OF INDUSTRIAL SCIENCE, UNIVERSITY OF TOKYO, JAPAN
{anirban,kgoda,kitsure}@tkl.iis.u-tokyo.ac.jp
PRESENTATION OUTLINE
INTRODUCTION
RELATED WORK
SYSTEM OVERVIEW
MIGRATION AND REPLICATION
LOAD-BALANCING
PERFORMANCE STUDY
CONCLUSION AND FUTURE WORK
INTRODUCTION
Prevalence of spatial applications: GIS, CAD, VLSI, resource management, development planning, emergency planning, scientific research.
Unprecedented growth of available spatial data at geographically distributed locations creates the need for efficient networking.
The emergence of GRID computing and powerful networks motivates the design of a SPATIAL GRID.
CHALLENGES
Scale
Heterogeneity
Dynamism
Cross-domain administrative issues
Efficient search and load-balancing mechanisms
We focus on load-balancing. Load-balancing in GRIDs is much more complicated than in traditional environments.
LOAD-BALANCING
Some nodes become hot
  Skewed workloads
  Dynamic access patterns
These hot nodes become bottlenecks
  Increased waiting times
  High response times
MAIN CONTRIBUTIONS
Viewing a spatial GRID as comprising several clusters, each of which is a LAN.
Proposal of an inter-cluster load-balancing algorithm which uses migration/replication of data.
Presentation of a scalable technique for dynamic data placement.
RELATED WORK
Ongoing GRID projects: Earth Systems Grid (ESG), NASA Information Power Grid (IPG), Grid Physics Network (GriPhyN), European DataGrid.
[Thain01] Binding of execution and storage sites together into I/O communities; data-movement system (Kangaroo).
Load-balancing: STATIC (BUBBA, tile technique), DYNAMIC (disk cooling).
Job (process) MIGRATION in CONDOR.
Spatial indexes: R-tree [Guttman:84].
SYSTEM OVERVIEW
Viewing the GRID as a set of clusters
Distance between two clusters
  Communication time between cluster leaders
Neighbours
Definition of load
  Number of disk I/Os in a certain time interval
  Normalized w.r.t. CPU power
Cluster leaders
  Coordinate cluster activities
  Maintain meta-information (data stored at their own cluster and at its neighbours)
Hotspot detection via access statistics
  Use only recent statistics
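As one illustration of this load metric, here is a minimal Python sketch of a per-node monitor that counts recent disk I/Os and normalizes by a relative CPU rating; the window length and the `cpu_power` scale are assumptions for illustration, not values from the paper.

```python
import time

class LoadMonitor:
    """Tracks disk I/Os in a sliding window and normalizes w.r.t. CPU power."""

    def __init__(self, cpu_power, window_secs=60):
        self.cpu_power = cpu_power      # relative CPU rating (assumed scale)
        self.window_secs = window_secs  # only recent statistics count
        self.io_events = []             # timestamps of recorded disk I/Os

    def record_io(self):
        self.io_events.append(time.time())

    def load(self):
        """Disk I/Os within the recent window, normalized by CPU power."""
        cutoff = time.time() - self.window_secs
        self.io_events = [t for t in self.io_events if t >= cutoff]
        return len(self.io_events) / self.cpu_power
```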
DATA MOVEMENT IN GRIDs
MIGRATION & REPLICATION
  Unlike replication, migration implies deletion of the hot data at the source node.
Which option is better: migration or replication? Considerations:
  Load-balancing
  Data availability
  Disk space usage (periodic cleanup)
REPLICA CONSISTENCY?
Decisions concerning migration/replication should be taken at run-time.
DATA MOVEMENT (Cont.)
Impact of heterogeneity on data movement:
  Administrative policies (e.g., security)
  Data management techniques (indexing, hotspot detection, etc.)
  CPU
  Disk space
Moving data entails movement of indexes.
To address variations in indexing schemes, we extract the data from the index at the source node and rebuild the index at the destination node.
Each node has two indexes: an index for its own data and an index for moved data.
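A toy sketch of this extract-and-rebuild step, with plain lists standing in for the nodes' (possibly different) spatial indexes; the `Node` model and the `in_region` predicate are illustrative assumptions.

```python
class Node:
    """Hypothetical minimal node: two separate indexes, as in the slide."""
    def __init__(self):
        self.own_index = []    # stand-in for the node's native spatial index
        self.moved_index = []  # separate index for data moved from elsewhere

def transfer_region(source, dest, in_region):
    """Extract raw objects from the source's index and rebuild an index at
    the destination, rather than shipping index pages between possibly
    different indexing schemes."""
    extracted = [obj for obj in source.own_index if in_region(obj)]
    # Migration (unlike replication) deletes the hot data at the source.
    source.own_index = [obj for obj in source.own_index if not in_region(obj)]
    for obj in extracted:
        dest.moved_index.append(obj)  # in reality: re-insert into dest's index, e.g., an R-tree
    return len(extracted)
```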
DATA MOVEMENT (Cont.)
Impact of variations in disk space on data movement:
  'Pushing' non-hot data to large-capacity peers (large-sized data: migration; small-sized data: replication)
  Replicating small-sized hot data at small-capacity peers
  Large-sized hot data: migration to large-capacity peers if such peers are available, otherwise replication
  Deletion of infrequently accessed replicas
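These placement rules could be condensed into a decision routine along the following lines; the size threshold and the peer lists are hypothetical knobs, and per the previous slide the actual decision is taken at run-time.

```python
LARGE = 64 * 2**20  # hypothetical byte threshold for "large-sized" data

def place(data_size, is_hot, large_peers, small_peers):
    """Pick (action, target peer) for one dataset, following the slide's rules."""
    if not is_hot:
        # Non-hot data is 'pushed' to large-capacity peers:
        # large-sized data is migrated, small-sized data is replicated.
        action = "migrate" if data_size >= LARGE else "replicate"
        return (action, large_peers[0]) if large_peers else (None, None)
    if data_size < LARGE:
        # Small-sized hot data: replicate at small-capacity peers.
        return ("replicate", small_peers[0]) if small_peers else (None, None)
    # Large-sized hot data: migrate to a large-capacity peer if available,
    # otherwise replicate.
    if large_peers:
        return ("migrate", large_peers[0])
    return ("replicate", small_peers[0]) if small_peers else (None, None)
```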
INTER-CLUSTER LOAD-BALANCING
Periodic exchange of load information between neighbours.
A leader L considers itself overloaded if its load exceeds that of its neighbours by 10%.
L determines its hot regions and informs its neighbours about the disk space requirements of the hot regions; the number of hot regions depends upon the load imbalance.
Neighbours with enough disk space reply to L with their load status and disk space information.
These leaders are sorted in ascending order of load into List1.
L assigns hot regions to the members of List1 in a round-robin manner: the hottest region is moved to the first member of List1, the second-hottest region to the second member, and so on.
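Putting the steps together, a minimal sketch of one balancing round; the data structures, the `move` callback, and the reading of "exceeds that of its neighbours" as exceeding the average neighbour load are assumptions on top of the slide.

```python
OVERLOAD_FACTOR = 1.10  # "exceeds that of its neighbours by 10%"

def balance(my_load, neighbours, hot_regions, move):
    """One round of the inter-cluster load-balancing protocol (sketch).

    neighbours:  {leader_id: (load, free_disk)} from the periodic exchange.
    hot_regions: [(heat, size)] sorted hottest-first.
    move:        callback that performs the actual data movement.
    """
    avg = sum(load for load, _ in neighbours.values()) / len(neighbours)
    if my_load <= OVERLOAD_FACTOR * avg:
        return  # L is not overloaded; do nothing this round

    # Neighbours with enough disk space for some hot region "reply";
    # sort the repliers in ascending order of load (List1).
    list1 = sorted(
        (peer for peer, (load, free) in neighbours.items()
         if any(free >= size for _, size in hot_regions)),
        key=lambda peer: neighbours[peer][0],
    )
    if not list1:
        return

    # Round-robin: hottest region goes to the least loaded leader in List1.
    for i, region in enumerate(hot_regions):
        move(region, list1[i % len(list1)])
```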
PERFORMANCE STUDY
16 SUN workstations, each with a 143 MHz Sun UltraSparc I processor and 256 MB RAM, running the Solaris 2.5.1 operating system, connected by a relatively high-speed switch (200 Mbyte/s), the APnet.
Each cluster is modeled by a workstation node; we simulated a transfer rate of 1 Mbit/second among the clusters.
We implemented an R-tree at each cluster to organize the data allocated to it.
A real dataset (Greece Roads); each cluster had more than 200,000 data rectangles.
A Zipf distribution was used to model workload skews.
We investigated only migration in this study.
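The slides do not spell out the Zipf model; a common bounded formulation, in which skew factor theta = 0 is uniform and larger theta is more skewed (matching factors such as the 0.1 and 0.5 used below), looks like this:

```python
import random

def zipf_weights(n, theta):
    """Bounded Zipf over n items: P(i) proportional to 1 / i**theta."""
    weights = [1.0 / (i ** theta) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Example: direct 10,000 queries at 16 clusters with a Zipf factor of 0.5.
probs = zipf_weights(16, 0.5)
targets = random.choices(range(16), weights=probs, k=10_000)
```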
PERFORMANCE OF OUR PROPOSED SCHEME [figure omitted]
SNAPSHOT OF LOAD-BALANCING FOR ZIPF FACTOR OF 0.1 [figure omitted]
VARIATIONS IN WORKLOAD SKEW [figure omitted]
SNAPSHOT OF LOAD DISTRIBUTION FOR ZIPF FACTOR OF 0.5 [figure omitted]
SUMMARY
Huge amounts of available spatial data worldwide, coupled with the emergence of GRID technologies and powerful networks, motivate the design of a spatial GRID.
For performance reasons, effective load-balancing is necessary in such a spatial GRID.
We view a GRID as a set of clusters.
We propose a dynamic inter-cluster load-balancing strategy via migration/replication in GRIDs.
FUTURE SCOPE OF WORK
FAIRNESS IN LOAD-BALANCING
GRANULARITY OF DATA MOVEMENT
DETAILED PERFORMANCE STUDY
  REPLICATION
  DIFFERENT WORKLOAD TYPES
  SCALABILITY
  INTEGRATION INTO EXISTING GRIDs