Nondeterministic Queries in a Relational Grid Information Service Peter A. Dinda Dong Lu Prescience Lab Department of Computer Science Northwestern University.

Presentation transcript:

Nondeterministic Queries in a Relational Grid Information Service
Peter A. Dinda, Dong Lu
Prescience Lab, Department of Computer Science, Northwestern University

2 Overview
RGIS: GIS system based on the relational data model using SQL
Complex compositional queries can be posed
–“Find me 16 hosts on the same LAN that together have 32 GB of RAM”
Can be very expensive to answer
–Joins: worst case O(n^m) for m tables of size n
Introduce nondeterminism
–User gets random sample of result set
–Automated query transformation

3 Outline
Overview
Model
Implementation
Nondeterministic queries
Performance evaluation
Related work
Conclusions
D. Lu and P. Dinda, Synthesizing Realistic Computational Grids, SC 2003
D. Lu, J. Skicewicz, and P. Dinda, Scoped and Approximate Queries in a Relational Grid Information Service, Grid 2003

4 RGIS Model of a Grid
Annotated network topology graph
Annotation examples
–Hosts: memory, disk, OS, NICs, etc.
–Router/Switch: backplane bandwidth, ports
–Link: latency and bandwidth
Highly dynamic data in streams, not DB
Virtualization, Futures, Leases
–Virtual machines
[Figure labels: module, endpoint, maclink, macswitch, iplink, router, host, connectorswitch, connectorlink; layers: Network, Data link, Physical, Software]

6 [Figure labels: Software, Network, Data Link, Physical, Metadata, Types, Security]

8 RGIS Design (Per Site)

9 RGIS Design (Intersite)
Site RGIS server pushes local updates to friend sites
Site RGIS server consolidates updates from site and friend sites
Site RGIS server answers all queries originating from its site
[Figure labels: RGIS Server; Update Push To Friend Site; sites A, B, C]

10 Insert/Update/Delete
Dual Xeon 1 GHz, 2 GB, 8x36 GB RAID5, Oracle 9i

11 2,700 lines of authored SQL
4,000 lines of generated PL/SQL
22,000 lines of authored Perl
Main dependencies
–DBI to Oracle 9i
–SOAP::Lite
–CGI
Not finished yet!

12 RGIS Design (Per Site) This talk

14 Motivation
Queries for compositions of resources easily expressed in SQL:
“Find 2 hosts with Linux that together have 3 GB of RAM”
select h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
But such queries can be very expensive to execute
However, we typically don’t need the entire result set, just some rows, and not always the same ones
And we need them in a bounded amount of time

15 Why Not Just Limit?
Oracle rownum, MySQL limit clause
–“Return first k rows of result set”
Problem: Always get the SAME answer
Problem: May STILL take a long time
–Results not discovered until near the end
Problem: Query time related to DATA as well as k

16 Query Approaches
All results
Scoped results (available in Grid 2003 paper)
Approximate results (available in Grid 2003 paper)
Nondeterministic results (this paper)
–Return random sample of result set

17 Nondeterministic Version of Query
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
within 2 seconds

18 Implementing non-deterministic queries
Using Oracle-Specific Extensions
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
within 2 seconds
Query Manager and Rewriter
SELECT H1.INSERTID, H2.INSERTID
FROM HOSTS H1 SAMPLE(P), HOSTS H2 SAMPLE(P)
WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
Random sample of input tables with Selection Probability P determined by time constraint and server load
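The rewriting step on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the RGIS rewriter: the function name `rewrite_with_sample`, the regular expression, and the `sample_percent` parameter are all hypothetical. The output mirrors the form shown on the slide; the exact placement and units of Oracle's SAMPLE clause should be checked against the Oracle documentation before use.

```python
import re

# Hypothetical grammar for the slide's extended syntax:
#   select nondeterministically <cols> from <tables> where <cond> within <n> seconds
_NONDET = re.compile(
    r"^\s*select\s+nondeterministically\s+(?P<cols>.+?)"
    r"\s+from\s+(?P<tables>.+?)"
    r"\s+where\s+(?P<where>.+?)"
    r"\s+within\s+(?P<secs>\d+(?:\.\d+)?)\s+seconds\s*$",
    re.IGNORECASE | re.DOTALL,
)

def rewrite_with_sample(query, sample_percent):
    """Return (sql, deadline_seconds) for a nondeterministic query.

    Each input table gets a SAMPLE clause appended, as on the slide.
    Choosing sample_percent from the deadline and server load is not shown.
    """
    m = _NONDET.match(query)
    if m is None:
        raise ValueError("not a nondeterministic query")
    tables = [t.strip() for t in m.group("tables").split(",")]
    sampled = ", ".join(f"{t} SAMPLE({sample_percent:g})" for t in tables)
    sql = f"SELECT {m.group('cols')} FROM {sampled} WHERE ({m.group('where')})"
    return sql, float(m.group("secs"))
```

Calling `rewrite_with_sample` on the slide's query returns the sampled SQL string together with the 2-second deadline, which a query manager could then enforce separately.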

19 Implementing non-deterministic queries
Using Our Schema (Not Oracle-Specific)
Rest of Talk
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
within 2 seconds
Query Manager and Rewriter
SELECT H1.INSERTID, H2.INSERTID
FROM HOSTS H1, HOSTS H2, INSERTIDS TEMP_H1, INSERTIDS TEMP_H2
WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
AND (H1.INSERTID=TEMP_H1.INSERTID AND TEMP_H1.rand > AND TEMP_H1.rand AND TEMP_H2.rand <= )
Random sample of input tables with Selection Probability P determined by time constraint and server load

20 Implementing non-deterministic queries
[Figure labels: Host, insertid, random_number; range 0 to N; window x to x+y]
Random Starting Point
y=P*N
Reshuffling Requirement
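The schema-based sampling idea (a random window of width y = P*N over a per-host random number) can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not the RGIS implementation: the table layout, the helper names `build_demo_db` and `sample_pairs`, the wrap-around window predicate, and the use of a shuffled permutation for the random numbers are all hypothetical, since the transcript loses part of the slide's predicate and does not spell out the reshuffling requirement.

```python
import random
import sqlite3

def build_demo_db(mem_mb_values, seed=0):
    """Create hosts plus an insertids table holding one shuffled slot per host."""
    rng = random.Random(seed)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)")
    db.execute("CREATE TABLE insertids (insertid INTEGER PRIMARY KEY, rand INTEGER)")
    slots = list(range(len(mem_mb_values)))
    rng.shuffle(slots)  # assumed reading of "reshuffling": redo this periodically
    for insertid, (mem_mb, slot) in enumerate(zip(mem_mb_values, slots)):
        db.execute("INSERT INTO hosts VALUES (?, 'LINUX', ?)", (insertid, mem_mb))
        db.execute("INSERT INTO insertids VALUES (?, ?)", (insertid, slot))
    return db

FULL_QUERY = """
SELECT h1.insertid, h2.insertid
FROM hosts h1, hosts h2
WHERE h1.os = 'LINUX' AND h2.os = 'LINUX' AND h1.mem_mb + h2.mem_mb >= 3072
"""

# Same query, restricted to hosts whose slot falls in a window of width y
# starting at a random point x (wrapping around at n; the wrap is an assumption).
SAMPLED_QUERY = """
SELECT h1.insertid, h2.insertid
FROM hosts h1, hosts h2, insertids t1, insertids t2
WHERE h1.os = 'LINUX' AND h2.os = 'LINUX' AND h1.mem_mb + h2.mem_mb >= 3072
  AND h1.insertid = t1.insertid AND h2.insertid = t2.insertid
  AND ((t1.rand - :x1 + :n) % :n) < :y
  AND ((t2.rand - :x2 + :n) % :n) < :y
"""

def sample_pairs(db, p, rng):
    """Run the sampled query with selection probability p per input table."""
    n = db.execute("SELECT COUNT(*) FROM insertids").fetchone()[0]
    y = max(1, round(p * n))  # window width y = P*N
    params = {"n": n, "y": y, "x1": rng.randrange(n), "x2": rng.randrange(n)}
    return db.execute(SAMPLED_QUERY, params).fetchall()
```

Every row returned by `sample_pairs` is also a row of the full join, and different random starting points give different subsets, which is the behavior a plain row limit does not provide.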

21 Deadlines
Hard-limiting
–Time-limited thread or process forked
Climbing
–Start with low probability p, issue query; if no results, double probability and try again; keep going until no more time or have results
Estimation
–Like climbing, but do polynomial estimation over previous runs to estimate if next run will exceed deadline
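The climbing and estimation strategies can be sketched as a small Python loop. This is one possible reading of the slide, not the RGIS code: the names `climb` and `lagrange_extrapolate`, the starting probability of 0.01, and the choice of Lagrange extrapolation through the last three runs as the "polynomial estimation" are assumptions. The sketch does not interrupt a query that is already running; that is what hard-limiting is for.

```python
import time

def lagrange_extrapolate(points, x):
    """Evaluate the polynomial through points [(x_i, y_i), ...] at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def climb(run_query, deadline_s, p_start=0.01, use_estimation=False,
          clock=time.monotonic):
    """Double the selection probability until rows appear or time runs out.

    run_query(p) is a caller-supplied function returning a list of rows.
    Returns (rows, p), where p is the last probability considered.
    """
    start = clock()
    history = []  # (p, elapsed seconds) for each completed run
    p = p_start
    while True:
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            return [], p
        if use_estimation and len(history) >= 2:
            # Predict the cost of the next run from the most recent runs.
            if lagrange_extrapolate(history[-3:], p) > remaining:
                return [], p
        t0 = clock()
        rows = run_query(p)
        history.append((p, clock() - t0))
        if rows or p >= 1.0:
            return rows, p
        p = min(1.0, 2 * p)
```

With `use_estimation=False` this is plain climbing; with `use_estimation=True` the loop gives up early when the predicted run time exceeds the time left, instead of starting a run it expects to miss the deadline.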

23 GridG: Synthesizing Realistic Computational Grids
Generates a Grid as an annotated layer 3 topology
–Hosts, routers, links
Graph conforms to power laws of Internet topology
Annotations include:
–memory, clock speed, CPU type, number of CPUs, operating system type, link bandwidths, router bandwidths, etc.
–Memory distribution according to Smith study of MDS contents

24 Test Grids
Grid Size (Hosts)    Query
50,000               “Find n hosts with 3 GB of memory”
500,000              “Find n hosts with 3 GB of memory”
5,000,000            “Find n hosts with 3 GB of memory”
10,000               “Find 2 close hosts”
50,000               “Find 2 close hosts”
100,000              “Find 2 close hosts”

25 Nondeterministic query performance
Meaningful tradeoff between query processing time and result set size is possible
Select two hosts that together have >3GB of RAM

26 Nondeterministic query performance
Can use tradeoff to control query time independent of query complexity
Select n hosts that together have >3GB of RAM, holding query time constant

27 Deadlines
Find 2 hosts with collective 600 GB RAM (VERY RARE) in 50K host grid
[Figure labels: Max, Min]

28 Extending RGIS to Support Grid Computing On Virtual Machines
Virtuals
–Each RGIS object has a unique id
–Virtualization table associates unique id of virtual resources with unique ids of their constituent physical resources
–Virtual nature of resource is hidden unless query explicitly requests it
Futures
–An RGIS object that does not exist yet
–Futures table of unique ids
–Future nature of resource hidden unless query explicitly requests it
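The virtualization and futures tables can be illustrated with a toy schema in Python's `sqlite3`. Everything here is a hypothetical sketch: the table and column names, the sample rows, and the helper functions are invented for illustration, and "hidden" is read as meaning that an ordinary query sees virtual and future resources as plain objects.

```python
import sqlite3

def build_demo_db():
    """Toy schema: every object has a unique id; side tables mark special ones."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT, mem_mb INTEGER);
    CREATE TABLE virtualization (virtual_id INTEGER, physical_id INTEGER);
    CREATE TABLE futures (id INTEGER PRIMARY KEY);
    INSERT INTO hosts VALUES (1, 'phys-a', 4096);
    INSERT INTO hosts VALUES (2, 'vm-1', 1024);
    INSERT INTO hosts VALUES (3, 'planned-b', 8192);
    INSERT INTO virtualization VALUES (2, 1);  -- vm-1 runs on phys-a
    INSERT INTO futures VALUES (3);            -- planned-b does not exist yet
    """)
    return db

def all_hosts(db):
    """Ordinary query: virtual and future hosts look like any other host."""
    return [r[0] for r in db.execute("SELECT name FROM hosts ORDER BY id")]

def virtual_hosts(db):
    """Explicit query: each virtual host with a constituent physical host."""
    return db.execute("""
        SELECT v.name, p.name
        FROM virtualization z
        JOIN hosts v ON v.id = z.virtual_id
        JOIN hosts p ON p.id = z.physical_id
        ORDER BY v.id, p.id
    """).fetchall()

def future_hosts(db):
    """Explicit query: hosts that are listed in the futures table."""
    return [r[0] for r in db.execute(
        "SELECT h.name FROM hosts h JOIN futures f ON f.id = h.id ORDER BY h.id")]
```

The design point the sketch tries to show is that the side tables only add information: queries that never join against them are unaffected by whether a resource is virtual or not yet in existence.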

29 Related Work
SLP, X.500, LDAP
Condor ClassAds
MDS
R-GMA
Redline
Random sampling from databases
–Olken, others

30 Conclusions
GIS system based on relational data model
Powerful queries, but expensive to execute
Nondeterminism to control query time
–Can be implemented without RDBMS support
–Automated query translation in RGIS
Several techniques to implement deadlines for queries

31 People and Acknowledgements
Students
–Jason Skicewicz, Andrew Weinrich (Web + SOAP), Jack Lange (CDN)
Collaborator
–Relational Grid Resources Project at Indiana (Beth Plale)
Funder
–NSF

32 For More Information URGIS Site – Prescience Lab – Join The User Comfort Study! Special Advertising Section