Grid Computing Meets the Database Chris Smith Platform Computing Session # 36686.



© Platform Computing Inc
“The best thing about the Grid is that it is unstoppable.” The Economist, June 21,

What is Grid computing?
Grid: transparent, secure and coordinated computing resource sharing across geographically disparate sites.

Benefits of Grid Computing
Grid technology is used to aggregate computing resources across the entire organization, regardless of location or business unit.
- Provides virtually unlimited computing capacity
- Delivers reliable, “always-on” computing infrastructure
- Virtualizes IT infrastructure for end-users
- Coordinates the usage of heterogeneous computing resources in order to accomplish business processing tasks

Example Use Cases
- Batch Process Automation
- Multi-Site Capacity Computing
- Service Virtualization

Batch Process Automation

What is Platform JobScheduler?
Intelligent batch process automation
- Grid-enabled enterprise batch process automation software
- Provides a graphical Design Studio and management console to design and control the scheduling of Oracle jobs and compute jobs with various dependencies (line-of-business processes) across a virtualized environment

Simplified Scheduling Environment for Oracle jobs and Compute jobs
- Single point of control to design and monitor: job events, file events, time events
- Central repository for storing/sharing: jobs, business flows, sub flows, proxy dependencies
- Consistent, flexible and extensible
- Automated exception handling: re-running jobs, killing jobs, triggering other jobs

More Efficient Use of Computing Resources for Oracle jobs and Compute jobs
- Resource virtualization
- Ensures the reliability of mission-critical business flows and always-on availability of resources
- Provisions additional databases for specific tasks across time
- Matches demand for resources with the supply of resources

JobScheduler Architecture
[Diagram. Labels: Client; Process Designing/Control; Load XML; Save XML; Log; Jobflow Server; Scheduling (time, job, file, other events); Grid Master & Grid Agents; Grid-Enabled Application Execution Infrastructure; Oracle Database]

JobScheduler and Oracle scheduler integration
[Diagram. Labels: Platform JobScheduler client; Platform JobScheduler server; LSF Master host; LSF host; LSF Cluster; Oracle client; Oracle instance; C; B; orajobstart; elim.oracle.C; elim.oracle.B]

ETL using Platform JobScheduler
A common use of the Platform JobScheduler and Oracle scheduler integration is for ETL into a data warehouse.
Example: a brokerage firm wants to load the day’s trading data into its data warehouse for analysis (e.g. risk positions, trending, etc.)
- ETL flow is triggered by:
  - Time of day event
  - Arrival of market data in flat-file format
  - Completion of a stored procedure which collects location brokerage data
- Data is cleansed and loaded with SQL*Loader into the database
- Stored procedures are invoked which do some analysis and initial reporting
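The trigger conditions for the ETL flow can be pictured with a small Python sketch. This is not Platform JobScheduler code: the event names and the `EtlFlowTrigger` class are hypothetical, and the sketch assumes the flow waits for all three events (the slide does not say whether they combine as AND or OR).

```python
from dataclasses import dataclass, field

# Hypothetical event names; the slide only names the three kinds of trigger.
REQUIRED_EVENTS = frozenset(
    {"time_of_day", "market_data_file_arrived", "brokerage_proc_completed"}
)


@dataclass
class EtlFlowTrigger:
    """Tracks which trigger events have been seen for one day's ETL flow."""

    seen: set = field(default_factory=set)

    def record(self, event: str) -> bool:
        """Record an event and report whether the flow may now start."""
        if event not in REQUIRED_EVENTS:
            raise ValueError(f"unknown event: {event}")
        self.seen.add(event)
        return self.ready()

    def ready(self) -> bool:
        # Assumption: all three events are required (a logical AND).
        return self.seen >= REQUIRED_EVENTS


trigger = EtlFlowTrigger()
results = [
    trigger.record(event)
    for event in (
        "time_of_day",
        "market_data_file_arrived",
        "brokerage_proc_completed",
    )
]
```

Here `results` stays false until the last of the three events is recorded, at which point the cleanse/load/report steps would be released.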

Multi-Site Capacity Computing

Increasing Computing Capacity with Platform MultiCluster
- A parameter space study is done on tens of thousands of individual sets of parameters, resulting in tens of thousands of analysis jobs
- The local cluster doesn’t have enough capacity, so Platform MultiCluster is used to forward analysis jobs to clusters located at other sites of the organization
- The DBMS_STREAMS_ADM.MAINTAIN_TABLESPACES procedure provided with Oracle Database 10g is used to replicate input data for the analysis at the remote site
- Database-aware scheduling is used to make intelligent decisions about which sites are suitable for receiving jobs
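To see how a parameter space study turns into a large number of independent jobs, here is a minimal Python sketch. The parameter names and values are invented for illustration; the talk does not describe the actual study, and real submission to a cluster is left out.

```python
from itertools import product


def parameter_sets(grid):
    """Yield one dict per combination of the values in `grid`."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical parameter grid; every combination becomes one analysis job.
grid = {
    "temperature": [280, 300, 320],
    "ph": [6.5, 7.0, 7.5],
    "ligand_id": range(100),
}

jobs = [
    {"job_index": index, "params": params}
    for index, params in enumerate(parameter_sets(grid))
]
```

Even this small grid yields 3 x 3 x 100 = 900 jobs, so a few more parameters or values quickly reach the tens of thousands mentioned on the slide.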

Platform MultiCluster Job Forwarding Model
[Diagram. Labels: Site A; Site B; Compute Servers; Compute Servers; Send queue; Receive queue]
You submit, we do:
- Job transfer
- Data staging
- Account mapping
- Accounting

Enterprise Grid Architecture

Workload driven data management
[Diagram. Labels: Application; Pre-exec script; Master molecular database (MOL); Tablespaces for MOL; Streams maintained version of MOL; Tablespaces for MOL; Streams DML updates]
1. Job forwarded
2. Run pre-exec
3. Connect to MOL and run MAINTAIN_TABLESPACES
4. MOL metadata and tablespaces transferred
5. pre-exec finished
6. Job is run
7. Job uses copy

Database aware scheduling
[Diagram. Labels: MOL; MOL2; Site 1; Site 2; Site 3; Data Management Service]
Data Management Service:
- Site 1: MOL, MOL2
- Site 2: (none)
- Site 3: MOL
1. Poll for datasets
2. Update cache info
3. bsub -extsched MOL
4. Local site is overloaded. The database-aware scheduler plug-in decides to forward the job to site 3, since it has the MOL database
5. Job forwarded to site 3
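The forwarding decision on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual scheduler plug-in: `choose_site` and its arguments are hypothetical, and the sketch assumes the job is submitted at site 1, which the slide does not state.

```python
from typing import Dict, Optional, Set


def choose_site(
    required_db: str,
    local_site: str,
    dataset_cache: Dict[str, Set[str]],
    overloaded: Set[str],
) -> Optional[str]:
    """Pick a site that holds `required_db` and is not overloaded."""
    # Prefer running locally when the data is there and there is capacity.
    if (
        required_db in dataset_cache.get(local_site, set())
        and local_site not in overloaded
    ):
        return local_site
    # Otherwise forward to the first remote site that has the database.
    for site in sorted(dataset_cache):
        if site == local_site or site in overloaded:
            continue
        if required_db in dataset_cache[site]:
            return site
    return None  # no suitable site; the job would have to wait


# Dataset cache as listed on the slide.
cache = {"site1": {"MOL", "MOL2"}, "site2": set(), "site3": {"MOL"}}

# Assumption: the job is submitted at site 1, which is overloaded.
target = choose_site("MOL", "site1", cache, overloaded={"site1"})
```

With site 1 overloaded, `target` is `"site3"`: site 2 is skipped because it holds no copy of MOL.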

Service Virtualization

Demo Lab Hardware: A Common Web Service/Application Environment
[Diagram. Labels: CISCO Hardware Load Balancer; Node; Web Server & App Server (Linux); Oracle RAC (Linux AS 2.1); NAS/SAN; Public network; Web; Interconnect network; Storage network]

Oracle RAC Provisioning Demo System
[Diagram. Labels: Provisioner; Agent Manager; Service Agent; App Agent; RAC Agent; managed nodes Node5, Node6, Node8; RAC managed cluster Node1 … Node4; Web Server instances; App Server instances; Application instances; RAC instances; Web Layer/Nodes (Linux); Application Layer/Nodes (Linux); RAC Layer/Nodes (Linux AS 2.1)]

Proof of Concept Demos
- Dynamic provisioning within the database layer
- Dynamic provisioning across the database and application layers

Provisioning Within DB Layer
[Diagram. Labels: Web Layer; App Layer; RAC Layer; Node; Web Server; App Server; dbHR; dbFinance]
- Show one RAC node running dbFinance, two RAC nodes running dbHR, and one idle RAC node
- There is heavy data access to dbFinance and only light data access to dbHR
- Without dynamic provisioning, the response time of dbFinance is very slow while other RAC nodes are idle
- With dynamic provisioning, the idle node is added to dbFinance, and one dbHR node is shut down and moved to dbFinance
- The response time of dbFinance improves

Provisioning Across DB & App Layers
[Diagram. Labels: Web Layer; App Layer; RAC Layer; Node; Web Server; App Server; dbHR; dbFinance]
- Show one RAC node running dbFinance, one RAC node running dbHR, and two idle RAC nodes
- Many applications need to run on the App Layer
- Without dynamic provisioning, the response time of the App Layer is very slow while some RAC nodes are idle
- With dynamic provisioning, some applications run on the two idle RAC nodes
- The response time of the App Layer improves
- When there is data access to dbFinance, more database instances are needed
- Applications on the RAC nodes are gracefully preempted, and two more dbFinance instances are started

RAC Agent
Gathers metrics:
- numInstances: instances in a given database
- instanceState: operation state of an instance
- dbLoad: various load metrics from a database
  - User Calls, Recursive Calls
  - Physical Reads, Physical Writes
  - Consistent Gets, DB Block Gets
Takes actions:
- startInstance: start an instance on a candidate
- stopInstance: stop an instance on a candidate

Policy Functions
- Discover State of System: what is the current state of the candidates?
- Database High Load: if a candidate is free, start an instance of the loaded database.
- Database Low Load: if a candidate was added, shut down the database instance on the candidate.
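The three policy functions can be illustrated with a short Python sketch. It is a simplified model rather than the demo's implementation: `ProvisioningPolicy` and its methods are hypothetical, the calls to the RAC Agent's startInstance/stopInstance actions are only marked by comments, and the host names pe02 and pe03 are the free candidates reported in the Scenario 1 results.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProvisioningPolicy:
    """Tracks free candidate hosts and the hosts added to each database."""

    free_hosts: List[str]
    # Database name -> hosts added by this policy, oldest first.
    added: Dict[str, List[str]] = field(default_factory=dict)

    def on_high_load(self, db: str) -> Optional[str]:
        """If a candidate is free, start an instance of the loaded database."""
        if not self.free_hosts:
            return None
        host = self.free_hosts.pop(0)
        # A real agent would invoke startInstance on `host` here.
        self.added.setdefault(db, []).append(host)
        return host

    def on_low_load(self, db: str) -> Optional[str]:
        """If a candidate was added, shut down its instance and free it."""
        hosts = self.added.get(db, [])
        if not hosts:
            return None
        host = hosts.pop()  # the last added candidate goes first
        # A real agent would invoke stopInstance on `host` here.
        self.free_hosts.append(host)
        return host


policy = ProvisioningPolicy(free_hosts=["pe02", "pe03"])
started = policy.on_high_load("HR")   # takes pe02 from the free list
stopped = policy.on_low_load("HR")    # returns pe02 to the free list
```

Releasing the last added candidate first keeps the original instances of a database untouched when load drops.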

Scenario 1: Results
Discovery
- Discover that pe02 and pe03 are free
High Load
- Detect high load on the HR database
- Have a candidate free
- Remove the candidate from the free host list
- Start another instance of the HR database
- Add the candidate to the list of HR instances

Scenario 1: Results Continued
High Load
- Add the remaining candidate to the HR instances
Low Load
- Detect low load on the HR database
- Detect that candidate hosts are in use
- Remove the last added candidate from the list of HR instances
- Stop the HR instance on the candidate
- Return the candidate to the list of free hosts

Questions?