www.Grid.org.il
Distributed Data Management for Compute Grid
Presented by Michael Di Stefano
Founder of
Author of
Meeting: Tuesday, September 13th, 2005
Slide - 2 - Agenda
Data Management - The Next Grid Problem
Evolution in Compute Topology
Objectives of Data Management
New Topology – New Data Management Techniques
New Techniques, New Research, Emergence of Standards
Slide - 3 - Two Components of The Grid
Compute Grid
The Grid Operating System - provides the core services for grid computing
–Physical Resource Accounting
–Process Task Queues
–Management of Task/Resource Execution
Data Grid
Data Management System of Grid - manages all aspects
–Enterprise Data
–Data Scheduling
–Replication
–Availability
–Legacy Access
Slide - 4 - Compute Grids
Roll your own Compute Grid
Free Versions of Compute Grids
Product and Supported Compute Grids
Slide - 5 - Data Grids
Data Grid Engine - Movement of Bits and Bytes
–FTP
–Sockets
–Middleware (messaging)
–Caches
Application's Perspective
–Multiple Data Characteristics
–Quality of Service
Data Management, not Bit/Byte Movement
Slide - 6 - Evolution in Computing
Mainframe
Mini
Client/Server
Slide - 7 - 15 Years of Distributed Computing Evolution
Sockets
CORBA
Messaging
Internet
Application Servers
Tight Bindings
Loose Coupling
Publish / Subscribe
Grid Topology Emerging from the “Evolutionary Mist”
Client/Server
© Integrasoft, L.L.C. 2005
Slide - 8 - Evolution
Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005
Slide - 9 - The Grid Topology
Client / Server compared with Compute Grid, at the Physical and Operational levels
Client / Server
–Operating System
–Physical CPU
–Peripherals
–Execution Threads
–Close Proximity (Mother Board)
Compute Grid
–Operating System
–Physical Nodes
–Resource/Node Management
–Inventory of Work/Tasks
–Resource Inventory
–Matching of Task to Resource
–Diverse CPU Families
–Diverse Geography
–Diverse Network Bandwidth
Slide - 10 - Application on the Grid
Multiple Data Sources and Destinations
–Client Information
–Portfolio Information
–Market Data
Quality of Service Levels
–Application in its entirety
–Application components
Speed of Access
–Query
–Updates (Transactional, Optimistic)
Slide - 11 - How QoS is Delivered Today
Relational Databases
–SQL Query
–Transactional Updates
–Stored Procedures
Middleware
–Queuing
–Various delivery modes
–Publish and Subscribe
–Easy Programmatic API
Other
–Object Databases
–Object Relational
Data flow and movement are optimized, designed to meet Application QoS for a Client/Server topology.
Slide - 12 - Application Today in Client/Server
Threads
RAM
Connection Pools
Tailored Middleware
Business Application
Server Machine
Slide - 13 - What Happens in a Grid
Business Application
Server Machine
Compute Grid
Slide - 14 - The Data Access Funnel
Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005
Slide - 15 - Data Grid Eliminates the Funnel
Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005
Slide - 16 - Goals of Data Management in Grid
The Big 3 Goals of Data Management in Grid
Optimize Data Affinity
–Minimize Data Movement
–Optimize the use of Network resources
Maintain Business Application QoS for Data Management
Integrate Legacy Systems into the Grid
Slide - 17 - How to Achieve the Goals of the Data Grid
What the Architect/Developer must Address
How many copies or “Replicas” of data are needed in the Data Grid?
How fine is the granularity of the “Data Atoms” to be replicated?
How best to “Distribute” Data Atoms across the Data Grid?
What level of “Synchronization” is required?
How to “logically group” data along business lines?
How to “Integrate” and “Operate” legacy data sources?
How to manage “Events” in the Data Grid?
How to synchronize data sources external to the Data Grid?
Slide - 18 - Data Management in Grid
Granularity of Data Atoms
Replication
Distribution
Logical Data Groupings (Data Regions)
Synchronization
–InterRegion
–IntraRegion
–External Data Sources
Events
Integration with Legacy Systems
Nothing to do with the mechanics of the bits and bytes
These are Data Management Issues
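The replication and distribution questions above can be made concrete with a small sketch. This is only an illustration, not the method from the talk or the book: the function name place_atom, the node names, and the hash-then-wrap placement rule are all assumptions chosen for brevity. It shows one way a replica count and a distribution rule might decide which nodes hold copies of a single data atom.

```python
import hashlib
from typing import List


def place_atom(atom_key: str, nodes: List[str], replicas: int) -> List[str]:
    """Return the nodes that hold copies of one data atom.

    Hypothetical rule: hash the atom key to pick a primary node, then place
    the remaining replicas on the nodes that follow it, wrapping around.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    # A stable hash, so every participant computes the same placement.
    digest = hashlib.sha256(atom_key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


# Illustrative node names and atom key (assumptions, not from the slides).
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_atom("portfolio:1234", nodes, replicas=2)
```

Raising the replica count trades extra storage and synchronization work for more candidate nodes on which a task can find its data locally.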
Slide - 19 - Data Management is NOT Caching
Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005
Moves the bits and bytes
–Cache
–Grid FTP
–Others
Data Management to deliver Business Application’s QoS given the “compute topology”
Slide - 20 - Engines of a Data Grid
Cache
–Java based engines such as JCache, JavaSpaces, …
–Various C++ Caches
–Recycled Object Data Base Technology
FTP
–Grid FTP
Meta Data Services
File Systems
–NFS
–Distributed File Systems
Slide - 21 - Right Tool for the Job
Business Applications require specific QoS levels from the Data Grid
–Complex Analysis of Large Data Sets
–Dependency of small fast moving data sets
–Large Static Data Sets
–…
Slide - 22 - Business Drivers Fueling Grid
Slide - 23 - Business Drivers Fueling Grid
Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005
Slide - 24 - Limited Patience of Business
Slide - 25 - No Data Management Tools
Difficult Custom Code
Long Time to Delivery
No Reuse
Business Perspective
Increased Complexity
Improved Performance
Financial ROI
Grid fails to gain widespread acceptance
Slide - 26 - Business Perspective
Financial ROI
With Data Management for Grid
Easy to use/understand
Reuse
Effort on business
Increased Complexity
Improved Performance
Fast Time to Market
Ease of Migration to Grid
Changes Data Centers
Slide - 27 - Data Management in Grid
Granularity of Data Atoms
Replication
Distribution
Data Regions
Synchronization
Integration with Legacy Systems
If Distributed Data Management is not addressed, Grid will fail to gain wide acceptance.
Slide - 28 - Measuring QoS to Determine Data Grid
Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005
Slide - 29 - Measuring QoS to Determine Data Grid
Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005
Application QoS( Work(), Data(), Time(), Geography(), Query() )
Where:
Work( batch/atomic, sync/async )
Data( overall size, atomic size, transient, query )
Time( RealTime, Non-RealTime, Near-RealTime )
Geography( Topology, Bandwidth )
Query( Basic, Complex )
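As a rough sketch, the five-part QoS descriptor above could be written down as a typed record. The field names, the units (bytes, Mbit/s), and the atom_count helper are assumptions added for illustration; the slide only names the dimensions and their possible values.

```python
from dataclasses import dataclass
from enum import Enum


class Timing(Enum):
    REAL_TIME = "RealTime"
    NEAR_REAL_TIME = "Near-RealTime"
    NON_REAL_TIME = "Non-RealTime"


class QueryKind(Enum):
    BASIC = "Basic"
    COMPLEX = "Complex"


@dataclass(frozen=True)
class Work:
    batch: bool        # True = batch, False = atomic
    synchronous: bool  # True = sync, False = async


@dataclass(frozen=True)
class Data:
    overall_size_bytes: int  # unit is an assumption
    atomic_size_bytes: int   # unit is an assumption
    transient: bool
    queried: bool


@dataclass(frozen=True)
class Geography:
    topology: str
    bandwidth_mbps: float    # unit is an assumption


@dataclass(frozen=True)
class ApplicationQoS:
    work: Work
    data: Data
    time: Timing
    geography: Geography
    query: QueryKind

    def atom_count(self) -> int:
        """Hypothetical helper: number of data atoms, rounded up."""
        return -(-self.data.overall_size_bytes // self.data.atomic_size_bytes)


# Illustrative values only.
qos = ApplicationQoS(
    work=Work(batch=True, synchronous=False),
    data=Data(overall_size_bytes=10 * 2**30, atomic_size_bytes=4096,
              transient=False, queried=True),
    time=Timing.NEAR_REAL_TIME,
    geography=Geography(topology="single data center", bandwidth_mbps=1000.0),
    query=QueryKind.COMPLEX,
)
```

Writing the descriptor down this way makes it possible to compare two applications dimension by dimension before choosing a data grid engine for each.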
Slide - 30 - Objective of Data Grid - Data Affinity
Low cost of CPU
Data size is determined by application
Network bandwidth is limited
Data and Work need to be co-located
Virtual Centrally Managed Data Base
Physically Distributed
Slide - 31 - How to Achieve Data Affinity
Locate data and work close together to minimize data movement across the network
Reactive: Data Grid distributes data in anticipation of where work will be assigned.
Distributed Data Management policies of
–Regionalization
–Replication
–Distribution
–Synchronization
Proactive: Routing of Task to Data. Compute Grid Task Scheduler queries Data Locality Information from Data Grid
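Routing a task to its data can be sketched as a lookup against locality information. The code below is a minimal illustration under stated assumptions: the locality map, the node and atom names, and the "largest overlap wins" rule are hypothetical and are not taken from any particular scheduler or data grid product.

```python
from typing import Dict, List, Set


def route_task(needed_atoms: Set[str],
               locality: Dict[str, Set[str]],
               idle_nodes: List[str]) -> str:
    """Pick the idle node that already holds the most of the task's data atoms.

    `locality` stands in for the data locality information a task scheduler
    would query from the data grid (hypothetical shape: node -> atom keys).
    """
    if not idle_nodes:
        raise ValueError("no idle node available")
    # max() returns the first best candidate, so list order breaks ties.
    return max(idle_nodes,
               key=lambda node: len(needed_atoms & locality.get(node, set())))


# Illustrative locality map (assumed data, not from the slides).
locality = {
    "node-a": {"client:7", "market:EURUSD"},
    "node-b": {"portfolio:1234", "market:EURUSD", "client:7"},
    "node-c": set(),
}
chosen = route_task({"portfolio:1234", "client:7"}, locality,
                    ["node-a", "node-b", "node-c"])
```

Here node-b is chosen because it already holds both atoms the task needs, so no data has to cross the network before the task starts.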
Slide - 32 - Distributed Data Management
Data Regions
Replication
Distribution
Synchronization
Load and Store
Event
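To show how the load, store, and event policies listed above might fit together inside one data region, here is a small in-memory sketch. The DataRegion class, its method names, and the write-through behavior are assumptions made for illustration; a plain dictionary stands in for a legacy data source.

```python
from typing import Callable, Dict, List


class DataRegion:
    """Hypothetical in-memory data region with load, store, and event hooks."""

    def __init__(self, name: str,
                 loader: Callable[[str], str],
                 storer: Callable[[str, str], None]) -> None:
        self.name = name
        self._loader = loader
        self._storer = storer
        self._atoms: Dict[str, str] = {}
        self._listeners: List[Callable[[str, str], None]] = []

    def on_update(self, listener: Callable[[str, str], None]) -> None:
        # Event policy: register a callback to run after every update.
        self._listeners.append(listener)

    def get(self, key: str) -> str:
        # Load policy: fetch from the backing source on the first miss only.
        if key not in self._atoms:
            self._atoms[key] = self._loader(key)
        return self._atoms[key]

    def put(self, key: str, value: str) -> None:
        self._atoms[key] = value
        # Store policy: write through to the backing source.
        self._storer(key, value)
        for listener in self._listeners:
            listener(key, value)


# A dict plays the role of the legacy system (illustrative data).
legacy_store = {"client:7": "Client A"}
events = []
region = DataRegion("clients", legacy_store.__getitem__, legacy_store.__setitem__)
region.on_update(lambda key, value: events.append((key, value)))
first = region.get("client:7")
region.put("client:7", "Client B")
```

After the put call, the region, the legacy store, and the listener all see the new value; a real data grid would also apply replication and synchronization policies at this point.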
Slide - 33 - Distributed Data Management Policies
Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005
Slide - 34 - Advanced Topics in Distributed Data Management
Natural Attraction Forces of Data Bodies Within a Data Grid To Describe Efficient Data Distribution Patterns
White Paper
Michael Di Stefano
September 2004
Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005
Slide - 36 - Purchasing Information
Please visit www.integrasoftware.com to purchase your copy of “Distributed Data Management for Grid Computing” and to receive a 15% discount.