Grid Computing – Issues in Data Grids and Solutions
Sudhindra Rao

Grid Computing, OSCAR Lab

Outline
  Grid Computing – introduction
  Computational Grids
  Data Grids
  Data Management
  Related Work
  Technologies – JavaSpaces, OceanStore
  Our research plan
  Discussion

What is grid computing?
  Use a network of PCs
  Faster networks, cheaper PCs, lots of idle time
  Easy to build, maintain, and scale
  Generic solution for scientific and business problems alike
  Some form of grid computing is already in use: Argonne National Lab, Google, etc.

Why today?
  Capabilities: Security, Manageability, Agility
  Goals: Efficiency, Profitability, Control
  Uncertainty, Complexity, Distribution
  New Opportunities, World Events, Market Dynamics
  Grid Computing, Maturing Technology

Applications – data grids
  Geographic distribution of data
  Computations on large-scale data
  Compute-intensive analytics: value at risk; credit risk; real-time risk management; automated trade programs
  OLAP data analysis: anti-money laundering; credit card (risk and customer data mining); billing
  Data center operations: in-process system migration; high fault tolerance; geographic data center independence for failover and business applications
  Compute utility services: data center compute farms; corporate compute utility services creating a low-cost infrastructure similar to the electric grid

Evolution of distributed computing
  [Figure: Distributed Computing Evolution. Categories: Middleware, Client/Server, Grid Computing. Labels: File sharing, CORBA, Data translation, Data queues, Publish/Subscribe, Smart routing, Pipes/sockets, Clusters, Data grids, Utility service]

Compute grid
  Distributed pool of resources
  Completing a task for a user
  User requests and reserves resources
  Some kind of middleware manages resources and tasks
  Resilient and fault tolerant
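
The request, reserve, and run cycle on this slide can be illustrated with a small sketch. It is a toy model under stated assumptions, not the API of any real grid middleware: the `Node` and `Middleware` classes, the node names, and the retry-on-failure policy are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One machine in the pool (hypothetical model)."""
    name: str
    healthy: bool = True
    reserved_by: str = ""  # empty string means the node is free


@dataclass
class Middleware:
    """Toy resource manager: tracks reservations and runs tasks."""
    nodes: List[Node] = field(default_factory=list)

    def reserve(self, user: str, count: int) -> List[Node]:
        """Grant `count` free, healthy nodes to `user` or fail."""
        free = [n for n in self.nodes if n.healthy and not n.reserved_by]
        if len(free) < count:
            raise RuntimeError("not enough free nodes")
        granted = free[:count]
        for node in granted:
            node.reserved_by = user
        return granted

    def release(self, user: str) -> None:
        """Return all of the user's nodes to the pool."""
        for node in self.nodes:
            if node.reserved_by == user:
                node.reserved_by = ""

    def run(self, user: str, task: Callable[[Node], int]) -> int:
        """Try the task on each of the user's nodes until one succeeds."""
        for node in self.nodes:
            if node.reserved_by != user:
                continue
            try:
                return task(node)
            except RuntimeError:
                node.healthy = False  # mark the node failed, try the next one
        raise RuntimeError("task failed on every reserved node")


def square_of_seven(node: Node) -> int:
    # Simulate a crash on one specific node to exercise the retry path.
    if node.name == "pc-1":
        raise RuntimeError("node crashed")
    return 7 * 7


grid = Middleware([Node("pc-1"), Node("pc-2"), Node("pc-3")])
grid.reserve("alice", 2)
result = grid.run("alice", square_of_seven)
grid.release("alice")
```

Here the task fails on `pc-1`, the middleware marks that node unhealthy, and the same task completes on `pc-2`. A real system would also reschedule partial work and monitor node health independently of task failures.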

Data grid
  Compute grid – coordinating set of tasks
  Multiple applications/worker threads accessing a single datastore
  [Figure: Client connected to Server over a network pipe with 1-1 connectivity; Business App Server; Data Storage]

  Compute grid – coordinating set of tasks
  Data grid – manages data
  Data grid – eliminates data access bottlenecks
  [Figure: Data Storage]

Data grid architecture
  Mechanism neutrality
  Policy neutrality
  Compatibility with compute grid
  Uniformity with information infrastructure
  Services
    Storage service
    Grid storage API
    Metadata service
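
One way to picture how the storage service, the grid storage API, and the metadata service fit together is the sketch below. It is a minimal illustration, assuming a key/value storage interface and a metadata catalog that maps logical names to replica locations; the class names, site names, and the `fetch` helper are hypothetical and do not come from any particular data grid toolkit.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class StorageService(ABC):
    """Uniform storage API: callers never see the underlying mechanism."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStorage(StorageService):
    """One possible backend; a disk or tape backend would expose the same API."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class MetadataService:
    """Maps a logical name to the (site, key) pairs that hold replicas."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[Tuple[str, str]]] = {}

    def register(self, logical_name: str, site: str, key: str) -> None:
        self._replicas.setdefault(logical_name, []).append((site, key))

    def locate(self, logical_name: str) -> List[Tuple[str, str]]:
        return list(self._replicas.get(logical_name, []))


def fetch(logical_name: str, metadata: MetadataService,
          sites: Dict[str, StorageService]) -> bytes:
    """Return the data from the first replica whose site is reachable."""
    for site, key in metadata.locate(logical_name):
        if site in sites:
            return sites[site].get(key)
    raise KeyError(logical_name)


site_a, site_b = InMemoryStorage(), InMemoryStorage()
site_a.put("blob-17", b"simulation output")
site_b.put("copy-of-17", b"simulation output")

catalog = MetadataService()
catalog.register("run42/output", "site-a", "blob-17")
catalog.register("run42/output", "site-b", "copy-of-17")

# Only site-b is reachable, so the second replica is used.
payload = fetch("run42/output", catalog, {"site-b": site_b})
```

Keeping the replica-selection rule in `fetch`, outside both services, is one reading of policy neutrality: the services provide the mechanism and the caller decides which replica to prefer.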

Data grid architecture
  Expectations
    Coordination between compute and data grid
    Data delivery to facilitate task and resource management
    Sharing data distribution and location information
    Leveraging data locality
  Guarantees
    Dependability
    Consistency
    Pervasiveness
    Security
    Inexpensive

Data delivery: QoS requirements
  [Figure: application complexity (work, time, data, transactional) plotted against data grid QoS (Level 0, Level 1). Example applications: OLAP, real-time datamart, Monte Carlo simulation]
  Application classes shown in the figure:
    Work     Time           Data      Transactional
    Batch    Synchronous    Static    No
    Atomic   Synchronous    Static    No
    Atomic   Asynchronous   Static    No
    Atomic   Asynchronous   Dynamic   No
    Atomic   Synchronous    Static    Yes
    Atomic   Asynchronous   Dynamic   Yes
    Atomic   Asynchronous   Static    Yes
    Batch    Synchronous    Static    Yes

Related Work
  Grid File System: provides primitives like a file system; Level 0 QoS
  NFSv4: high performance, extensible, secure; in the works
  Secure File System: self-certifying paths, unique identifiers, global namespace, key-based certification

Technologies related to data grids: JavaSpaces
  Source: “Make Room for JavaSpaces, Part I: Ease the Development of Distributed Apps with JavaSpaces”, Eric Freeman and Susanne Hupfer
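
JavaSpaces is built around a shared space of entries that processes write, read, and take by matching against a template in which unset fields act as wildcards. The sketch below imitates that model in Python to show the idea; it is not the JavaSpaces API. The `Space` class and the tuple layout are hypothetical, and unlike the real `read` and `take`, which can block with a timeout, these versions return `None` immediately when nothing matches.

```python
from typing import List, Optional, Tuple

Entry = Tuple[object, ...]
Template = Tuple[Optional[object], ...]


class Space:
    """Toy in-process tuple space with template matching."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def write(self, entry: Entry) -> None:
        """Place an entry in the space."""
        self._entries.append(entry)

    @staticmethod
    def _matches(template: Template, entry: Entry) -> bool:
        # None in the template is a wildcard; other fields must be equal.
        return len(template) == len(entry) and all(
            t is None or t == e for t, e in zip(template, entry)
        )

    def read(self, template: Template) -> Optional[Entry]:
        """Return a matching entry and leave it in the space."""
        for entry in self._entries:
            if self._matches(template, entry):
                return entry
        return None

    def take(self, template: Template) -> Optional[Entry]:
        """Return a matching entry and remove it from the space."""
        for index, entry in enumerate(self._entries):
            if self._matches(template, entry):
                return self._entries.pop(index)
        return None


space = Space()

# A master writes task entries: ("task", task_id, input_value).
space.write(("task", 1, 10))
space.write(("task", 2, 20))

# A worker takes tasks until none are left and writes results back.
while True:
    task = space.take(("task", None, None))
    if task is None:
        break
    _, task_id, value = task
    space.write(("result", task_id, value * 2))

second_result = space.read(("result", 2, None))
```

Because `take` removes the entry it returns, two workers sharing the space never process the same task, which is what makes the pattern useful for distributing work.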

OceanStore
  Global replication of data
  Promiscuously caches data
  Version-based archival storage
  Applications can control their consistency requirements to manage performance
  Internal event monitors analyze access patterns to move data and provide redundancy
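
The idea behind version-based archival storage can be sketched as an object whose updates never overwrite earlier state. This is only an illustration of the concept, assuming a simple in-memory list of versions; the `VersionedObject` class is hypothetical and says nothing about how OceanStore itself encodes, replicates, or locates versions.

```python
from typing import List, Optional


class VersionedObject:
    """Append-only object: every update creates a new read-only version."""

    def __init__(self, initial: bytes) -> None:
        self._versions: List[bytes] = [initial]

    def update(self, data: bytes) -> int:
        """Store a new version and return its version number."""
        self._versions.append(data)
        return len(self._versions) - 1

    def read(self, version: Optional[int] = None) -> bytes:
        """Read the latest version, or a specific archived one."""
        if version is None:
            return self._versions[-1]
        if not 0 <= version < len(self._versions):
            raise IndexError(f"no such version: {version}")
        return self._versions[version]


document = VersionedObject(b"draft")
final_version = document.update(b"final")

latest = document.read()      # newest state
archived = document.read(0)   # the original is still retrievable
```

Since old versions are immutable, they can be cached or replicated anywhere without coordination; only the choice of which version counts as "latest" needs agreement.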

Grid Fabric - Integrasoft
  Business solution provided for financial institutions and share traders
  Designed to complement compute grid
  Works closely with compute grid to schedule tasks based on data availability
  Moves data closer to computation
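
Scheduling tasks based on data availability can be illustrated with a simple placement rule: run each task on the node that already holds most of its input, and move only what is missing. The sketch below is a hypothetical heuristic written for this illustration, not Integrasoft's algorithm; the `schedule` function, node names, and dataset names are all assumptions.

```python
from typing import Dict, Set, Tuple


def schedule(tasks: Dict[str, Set[str]],
             nodes: Dict[str, Set[str]]) -> Dict[str, Tuple[str, Set[str]]]:
    """Place each task on the node holding the most of its input data.

    Returns, per task, the chosen node and the datasets that still
    have to be moved to that node. Ties go to the alphabetically
    first node name.
    """
    plan: Dict[str, Tuple[str, Set[str]]] = {}
    for task, needed in tasks.items():
        best = max(sorted(nodes), key=lambda name: len(needed & nodes[name]))
        plan[task] = (best, needed - nodes[best])
    return plan


# Which datasets each node already stores.
nodes = {
    "node-a": {"prices", "trades"},
    "node-b": {"accounts"},
}

# Which datasets each task reads.
tasks = {
    "risk": {"prices", "trades"},
    "billing": {"accounts", "customers"},
}

plan = schedule(tasks, nodes)
```

In this example `risk` runs on `node-a` with nothing to transfer, while `billing` runs on `node-b` and only `customers` has to be delivered. A production scheduler would also weigh dataset sizes, node load, and network cost.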

SOA and Data grids
  Moore’s law and Metcalfe’s law
  Network-based computation and grid computing with SOA
  Intelligent infrastructure – SONA
  [Figure: Web Services, Business process, Data Grid, and State, linked by the relations "Delivers", "has", and "Requires"]

Web 2.0

Our research – Motivation
  Issues in data management
    Data tightly coupled to computation
    Data cached locally
    Distribution is haphazard and reuse is minimal
    Data pulled by computation, not delivered
    Mechanisms still improvise based on experience with smaller systems

Data Grid and DBMS
  [Figure: Grid DBMS, with the properties Security, Transparency, Robustness, Efficiency, Intelligence, Fragmentation, Heterogeneity]
  DBMS compared with data regions:
    DBMS                 Aspect                 Data regions
    Tables               Schema                 Ordered structure
    Triggers             Events
    Stored procedures    Optimizations          Distributed procedures
    Intra-table fields   Indexing               Cross-structure
    Table/row level      Locking                Data atom level
    Table joins          Relation               Data atom
    SQL                  Query                  Programmatic string base
    Indexes              Repeated data access   Tags

Data grids as extended DBMS
  Data grid – eliminates data access bottlenecks
  Persistence mechanism – with data regions
  [Figure: Data Storage; the legend marks replicas and relations]

Datacentric grids
  Automated space management and garbage collection
  Space and data objects lifetime mechanism
  I/O allocation on storage system
  Estimating access from magnetic storage
  Co-scheduling of compute and storage resources
  Space reservation dilemma
  Thin clients
  Code mobility towards data
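
Space reservation, object lifetimes, and garbage collection interact in a way that a short sketch makes concrete: a reservation carries an expiry time, and expired reservations are reclaimed before new requests are judged against capacity. This is an assumed model for illustration only; the `StorageSpace` and `Reservation` classes, the capacity units, and the explicit `now` clock values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reservation:
    owner: str
    size: int          # reserved capacity, in arbitrary units
    expires_at: float  # time after which the space may be reclaimed


class StorageSpace:
    """Fixed-capacity store with lifetime-bound reservations."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._reservations: Dict[str, Reservation] = {}

    def used(self) -> int:
        return sum(r.size for r in self._reservations.values())

    def collect(self, now: float) -> List[str]:
        """Garbage-collect reservations whose lifetime has ended."""
        expired = [name for name, r in self._reservations.items()
                   if r.expires_at <= now]
        for name in expired:
            del self._reservations[name]
        return expired

    def reserve(self, name: str, owner: str, size: int,
                now: float, lifetime: float) -> None:
        """Reserve space, reclaiming expired reservations first."""
        self.collect(now)
        if self.used() + size > self.capacity:
            raise RuntimeError("insufficient space")
        self._reservations[name] = Reservation(owner, size, now + lifetime)


space = StorageSpace(capacity=100)
space.reserve("scratch", "alice", 80, now=0, lifetime=10)

# At time 5 the scratch space is still live, so a 50-unit request fails.
try:
    space.reserve("results", "bob", 50, now=5, lifetime=10)
    early_request_succeeded = True
except RuntimeError:
    early_request_succeeded = False

# At time 10 the scratch reservation has expired and is reclaimed.
space.reserve("results", "bob", 50, now=10, lifetime=10)
used_after = space.used()
```

The example also hints at the reservation dilemma on the slide: reserving generously blocks other users until the lifetime runs out, while reserving too little risks running out of space mid-computation.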

Expected Results
  Can we move computation closer to data?
  Data grid with features of persistence?
  Performance improvement using tags?
  Loosely coupled data grid and compute grid?
  Scalability of unique naming in file systems?

Thank you!