Data Management on Opportunistic Grids


1 Data Management on Opportunistic Grids
Raphael Y. de Camargo
Advisor: Prof. Fabio Kon
Department of Computer Science, IME / USP
São Paulo, February 2006
4th Workshop InteGrade

2 INTRODUCTION
Portable checkpointing of parallel applications
Strategies for storage of checkpointing data inside a single cluster
  Replication
  Encoding into redundant fragments
Next step is to develop techniques to distribute data throughout the Grid

3 INTRODUCTION
Grid applications require large amounts of computational power and storage space
  Image processing, weather forecasting
  Input, output, and checkpointing data
Fault-tolerance and data liveness
  An entire cluster can fail, losing all application data
Pervasive access to data

4 OBJECTIVES
Develop middleware that handles the storage of large amounts of data in opportunistic Grids
  Independent of InteGrade
Two-level approach for data storage
  Grid level: data is broken into several redundant fragments, which are scattered throughout the Grid
  Cluster level: data is stored inside a cluster

5 SYSTEM DESIGN
Organized as a federation of clusters:
  CDRMs: Cluster Data Repository Managers
  ADRs: Autonomous Data Repositories
  Access Broker

6 SYSTEM DESIGN
Structured overlay network of CDRMs
  Pastry: Distributed Hash Table (DHT)
  CDRMs forward storage requests to ADRs
Broker: provides client access to storage
  Transparently contacts a CDRM and performs storage and read operations

7 PASTRY DHT PROTOCOL
Large id space (e.g. 160 bits)
Each file receives a random key
Each node receives a random nodeId
Ids are generated using a one-way hash function
Given a key, Pastry returns the node with the closest nodeId in O(log N) steps
FreePastry: open-source implementation
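
The key-to-node mapping on this slide can be sketched as follows. This is only an illustration, not FreePastry's API: it assumes SHA-1 as the one-way hash (which gives 160-bit ids), uses hypothetical CDRM names, and replaces Pastry's O(log N) prefix routing with a linear scan that shows where a lookup ends up.

```python
import hashlib

ID_BITS = 160                 # size of the id space, as on the slide
ID_SPACE = 1 << ID_BITS


def make_id(data: bytes) -> int:
    """Derive a 160-bit id with a one-way hash (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def ring_distance(a: int, b: int) -> int:
    """Distance between two ids on the circular id space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def closest_node(key: int, node_ids: list[int]) -> int:
    """Return the nodeId numerically closest to key.

    Pastry reaches this node in O(log N) routing steps; the linear scan
    here only shows which node the lookup ends at.
    """
    return min(node_ids, key=lambda n: ring_distance(n, key))


if __name__ == "__main__":
    # Hypothetical CDRM names, hashed into nodeIds.
    nodes = [make_id(f"cdrm-{i}".encode()) for i in range(8)]
    key = make_id(b"fragment contents")
    owner = closest_node(key, nodes)
    print(f"key   {key:040x}")
    print(f"owner {owner:040x}")
```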

8 DATA SCATTERING
File is broken and encoded into n pieces
  Any k fragments are sufficient to reconstruct the file
Fragment key: hash value of its contents
Fragments are individually routed and stored
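
A minimal sketch of the "k out of n" idea, under stated assumptions: instead of the IDA coding the presentation relies on, it uses a much simpler XOR parity code, which only covers the case n = k + 1 (any single fragment may be lost). The function names and the choice of SHA-1 for fragment keys are illustrative.

```python
import hashlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal blocks plus one XOR parity block (n = k + 1)."""
    size = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(size * k, b"\0")       # zero-pad the last block
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [reduce(xor_bytes, blocks)]


def decode(fragments: dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the file from any k of the n = k + 1 fragments.

    fragments maps a fragment index (0..k, index k being the parity block)
    to its contents; length is the original file size.
    """
    missing = [i for i in range(k) if i not in fragments]
    if len(missing) > 1 or (missing and k not in fragments):
        raise ValueError("need at least k fragments")
    blocks = dict(fragments)
    if missing:
        # XOR of the parity and the surviving data blocks yields the lost block.
        blocks[missing[0]] = reduce(xor_bytes, fragments.values())
    return b"".join(blocks[i] for i in range(k))[:length]


def fragment_key(fragment: bytes) -> str:
    """Fragment key: hash of the fragment contents (SHA-1 assumed)."""
    return hashlib.sha1(fragment).hexdigest()


if __name__ == "__main__":
    data = b"checkpoint data for a parallel application"
    frags = encode(data, k=4)
    survivors = {i: f for i, f in enumerate(frags) if i != 2}  # lose one fragment
    print(decode(survivors, k=4, length=len(data)) == data)
    for frag in frags:
        print(fragment_key(frag))
```

A real deployment would need a code that tolerates more than one lost fragment, which is what IDA provides.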

9 DATA SCATTERING
Storage requests reach CDRMs
  A CDRM forwards the storage request to an available ADR
Requests for storage of fragments return an ADR address
  The client creates a File Information Structure containing the hash and server address of each fragment
  The File Information Structure is stored in the Grid
Data recovery
  The client searches for the File Information Structure
  It gets the fragments directly from the servers
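
The store and recover flow can be sketched as below. The slides do not give the layout of the File Information Structure, so the `FileInfo` and `FragmentInfo` classes, the `FakeGrid` in-memory stand-in for CDRMs and ADRs, and the ADR addresses are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FragmentInfo:
    key: str       # hash of the fragment contents
    server: str    # address of the ADR that stored the fragment


@dataclass
class FileInfo:
    """Hypothetical layout of the File Information Structure."""
    name: str
    fragments: list[FragmentInfo] = field(default_factory=list)


class FakeGrid:
    """In-memory stand-in for CDRMs and ADRs (illustration only)."""

    def __init__(self, adr_addresses: list[str]):
        self.adrs: dict[str, dict[str, bytes]] = {a: {} for a in adr_addresses}

    def store(self, key: str, fragment: bytes) -> str:
        # A CDRM would forward the request to an available ADR; here the
        # ADR is derived from the key so the example stays deterministic.
        addresses = sorted(self.adrs)
        address = addresses[int(key, 16) % len(addresses)]
        self.adrs[address][key] = fragment
        return address

    def fetch(self, address: str, key: str) -> bytes:
        return self.adrs[address][key]


def store_file(grid: FakeGrid, name: str, fragments: list[bytes]) -> FileInfo:
    """Store each fragment and record its hash and the returned ADR address."""
    info = FileInfo(name)
    for frag in fragments:
        key = hashlib.sha1(frag).hexdigest()
        info.fragments.append(FragmentInfo(key, grid.store(key, frag)))
    return info


def recover_fragments(grid: FakeGrid, info: FileInfo) -> list[bytes]:
    """Fetch every fragment directly from the server listed for it."""
    return [grid.fetch(f.server, f.key) for f in info.fragments]


if __name__ == "__main__":
    grid = FakeGrid(["adr-a:9000", "adr-b:9000"])
    info = store_file(grid, "checkpoint-42", [b"frag0", b"frag1", b"frag2"])
    print(recover_fragments(grid, info))
```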

10 IMPROVING PERFORMANCE
Strategies to improve system performance
  Storage of fragments in the local cluster
  File Information Structure in the local CDRM
For moderately sized Grids (e.g. 10k clusters), the target CDRM can be reached in 2 hops
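
The slide does not state which Pastry configuration yields the 2-hop figure. As a rough check, the sketch below assumes Pastry's usual estimate of about ceil(log_{2^b} N) routing hops for N nodes, where b is a configuration parameter (commonly 4). The function name is hypothetical.

```python
def expected_hops(num_nodes: int, b: int) -> int:
    """Smallest h with (2**b)**h >= num_nodes, i.e. ceil(log_{2^b} N)."""
    hops, reach = 0, 1
    while reach < num_nodes:
        reach *= 2 ** b      # each hop resolves one more base-2^b digit
        hops += 1
    return hops


if __name__ == "__main__":
    for b in (4, 5, 6, 7, 8):
        print(f"b = {b}: {expected_hops(10_000, b)} hops for 10k CDRMs")
```

Under this estimate, 10k clusters are within 2 hops once b is at least 7, while b = 4 gives 4 hops.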

11 ADAPTIVE ID ALGORITHM
Pastry: random assignment of nodeIds
  Each node is responsible for a random id range
  Heterogeneity of clusters is not considered
Adaptive nodeId algorithm
  The nodeId of a CDRM is partially random
  The range of ids for a CDRM depends on the CDRM's capacity

12 ADAPTIVE ID ALGORITHM
When a new CDRM starts, it compares its capacity with that of its k neighbors
  Neighboring CDRMs alter their nodeIds
  The id space is then repartitioned
The routing tables of other CDRMs are not updated
  After the repartitioning, some CDRMs will be responsible for an id range that previously belonged to another CDRM
  A CDRM redirects the request to the correct node
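
The slides do not give the exact repartitioning rule. The sketch below assumes the simplest reading: the id range shared by the new CDRM and its neighbors is split into contiguous slices proportional to capacity, and a CDRM that receives a request for a key it no longer owns looks up the current owner and redirects. All names and capacity units are hypothetical.

```python
def repartition(lo: int, hi: int,
                capacities: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Split the id range [lo, hi) into slices proportional to capacity."""
    total = sum(capacities.values())
    ranges = {}
    start, acc = lo, 0
    for name, cap in capacities.items():
        acc += cap
        end = lo + (hi - lo) * acc // total   # the last slice ends exactly at hi
        ranges[name] = (start, end)
        start = end
    return ranges


def owner_of(key: int, ranges: dict[str, tuple[int, int]]) -> str:
    """Find the CDRM whose slice contains key (used when redirecting)."""
    for name, (start, end) in ranges.items():
        if start <= key < end:
            return name
    raise KeyError(key)


if __name__ == "__main__":
    # Two existing CDRMs plus a newly started one with a larger capacity.
    ranges = repartition(0, 1000, {"cdrm-a": 1, "cdrm-b": 3, "cdrm-new": 4})
    print(ranges)
    print(owner_of(600, ranges))
```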

13 INTRA-CLUSTER STORAGE
Each ADR is responsible for a slice of the CDRM id range
  Slice proportional to ADR capacity
  Also employs the adaptive id algorithm
File name information is kept on the ADRs
  Each ADR maintains information about the files of its neighbors for fault-tolerance
ADRs are reached with a single hop
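
One way the single-hop lookup could work is for the CDRM to keep a sorted table of slice boundaries and binary-search it, so that any key maps straight to its ADR. The `ClusterIndex` class, the ADR names, and the capacities below are assumptions for illustration.

```python
import bisect


class ClusterIndex:
    """Maps a key in the CDRM id range [lo, hi) to the ADR owning its slice."""

    def __init__(self, lo: int, hi: int, adr_capacities: dict[str, int]):
        self.lo, self.hi = lo, hi
        self.names: list[str] = []
        self.upper_bounds: list[int] = []     # exclusive end of each slice
        total = sum(adr_capacities.values())
        acc = 0
        for name, cap in adr_capacities.items():
            acc += cap
            self.names.append(name)
            self.upper_bounds.append(lo + (hi - lo) * acc // total)

    def adr_for(self, key: int) -> str:
        """Return the ADR responsible for key with one table lookup."""
        if not self.lo <= key < self.hi:
            raise KeyError(key)
        return self.names[bisect.bisect_right(self.upper_bounds, key)]


if __name__ == "__main__":
    index = ClusterIndex(0, 100, {"adr-1": 1, "adr-2": 1, "adr-3": 2})
    for key in (0, 24, 25, 49, 50, 99):
        print(key, index.adr_for(key))
```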

14 DATA SECURITY
Data stored on mutually distrusting nodes
Data privacy
  IDA coding provides partial privacy by breaking the data into several encoded fragments
  Possible to encrypt data using private keys
Data integrity
  Fragments are checked by hashing their contents
  Fragment hashes are stored in the File Information Structure
  The File Information Structure is protected using key-based hashing
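
The two integrity checks can be sketched with Python's standard `hashlib` and `hmac` modules. The hash functions (SHA-1 for fragments, HMAC-SHA-256 for the structure), the JSON serialization, and the field names are assumptions; the slides only say that fragments are hashed and that the File Information Structure uses key-based hashing.

```python
import hashlib
import hmac
import json


def fragment_ok(fragment: bytes, expected_key: str) -> bool:
    """Check a fetched fragment against the hash recorded for it."""
    actual = hashlib.sha1(fragment).hexdigest()
    return hmac.compare_digest(actual, expected_key)


def seal(fis: dict, secret: bytes) -> dict:
    """Attach a keyed hash (HMAC) to a File Information Structure."""
    payload = json.dumps(fis, sort_keys=True).encode()
    tag = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return {"fis": fis, "mac": tag}


def verify(sealed: dict, secret: bytes) -> bool:
    """Recompute the keyed hash and compare it in constant time."""
    payload = json.dumps(sealed["fis"], sort_keys=True).encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sealed["mac"])


if __name__ == "__main__":
    secret = b"client secret key"            # hypothetical client-held key
    frag = b"encoded fragment"
    fis = {"name": "output.dat",
           "fragments": [{"key": hashlib.sha1(frag).hexdigest(),
                          "server": "adr-a:9000"}]}
    sealed = seal(fis, secret)
    print(verify(sealed, secret))
    print(fragment_ok(frag, fis["fragments"][0]["key"]))
    sealed["fis"]["fragments"][0]["server"] = "evil:1"   # tampering
    print(verify(sealed, secret))
```

Only a holder of the secret key can produce a valid tag, so a storage node cannot silently alter the fragment hashes or server addresses.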

15 CASE-STUDY: INTEGRADE
InteGrade architecture
  Organized as a federation of clusters
  Clusters contain 2 kinds of nodes
    Cluster Manager
    Resource Provider
CDRMs are located with the GRMs
ADRs are located with the LRMs

16 CASE-STUDY: INTEGRADE
Used to store application data
  Semantic information: Execution Manager and Data Repository
Checkpointing data
  Stored in the local cluster
  Periodically stored in the Grid
Output data
  Stored incrementally
Input data
  Can be obtained directly from the ASCT

17 SIMULATION
Simulations to evaluate our system:
Adaptive id algorithm
  Amount of reconfiguration
  Number of server forwardings
Fault-tolerance properties
  Intra-cluster storage
  Data liveness
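
As a back-of-the-envelope companion to the planned data liveness simulations, the sketch below computes the probability that at least k of n fragments are reachable. It assumes every fragment is available independently with the same probability p, which is a simplification given that correlated failures are one of the things the simulations are meant to study. The function name and the example parameters are illustrative, not results from the presentation.

```python
from math import comb


def liveness(n: int, k: int, p: float) -> float:
    """P(at least k of n fragments available), each available with prob. p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Illustrative parameters only.
    for n, k in ((1, 1), (4, 2), (8, 4), (16, 8)):
        print(f"n={n:2d} k={k:2d} liveness={liveness(n, k, 0.7):.4f}")
```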

18 FAILURE MODELS
Two failure cases
  Communication between clusters
    End-to-end internet connectivity
  Failures in machines
Changes in machine states
  Machine goes from idle to occupied
  Analysing correlations in usage patterns

19 CURRENT STATUS
We are currently in the design phase
The next step is to implement a prototype of the system and perform simulations
Afterwards, we will implement the system on InteGrade and perform simulations to test its scalability in a real environment

20 CONCLUSIONS
System that allows storage of large amounts of data on opportunistic Grids
Self-organizing and fault-tolerant
Considers the heterogeneity of the Grid

21 QUESTIONS
For more information, please visit the project page:

