Peter Dinda Department of Computer Science Northwestern University Beth Plale Department.

Slides:



Advertisements
Similar presentations
Case Study: Photo.net March 20, What is photo.net? An online learning community for amateur and professional photographers 90,000 registered users.
Advertisements

Three Questions You Should Ask William Gropp Mathematics and Computer Science.
Efficient Event-based Resource Discovery Wei Yan*, Songlin Hu*, Vinod Muthusamy +, Hans-Arno Jacobsen +, Li Zha* * Chinese Academy of Sciences, Beijing.
Logically Centralized Control Class 2. Types of Networks ISP Networks – Entity only owns the switches – Throughput: 100GB-10TB – Heterogeneous devices:
1 Scoped and Approximate Queries in a Relational Grid Information Service Dong Lu, Peter A. Dinda, Jason A. Skicewicz Prescience Lab, Dept. of Computer.
Nondeterministic Queries in a Relational Grid Information Service Peter A. Dinda Dong Lu Prescience Lab Department of Computer Science Northwestern University.
Enabling Efficient On-the-fly Microarchitecture Simulation Thierry Lafage September 2000.
Final Project of Information Retrieval and Extraction by d 吳蕙如.
SKELETON BASED PERFORMANCE PREDICTION ON SHARED NETWORKS Sukhdeep Sodhi Microsoft Corp Jaspal Subhlok University of Houston.
Communication Pattern Based Node Selection for Shared Networks
1 Virtual Machine Resource Monitoring and Networking of Virtual Machines Ananth I. Sundararaj Department of Computer Science Northwestern University July.
Beneficial Caching in Mobile Ad Hoc Networks Bin Tang, Samir Das, Himanshu Gupta Computer Science Department Stony Brook University.
A Decentralized Relational Information Service for Large Scale Distributed Computing Thesis Proposal April 2 nd, 2004 Dong Lu Committee Peter A. Dinda.
Virtuoso: Distributed Computing Using Virtual Machines Peter A. Dinda Prescience Lab Department of Computer Science Northwestern University
Virtuoso: Distributed Computing Using Virtual Machines Peter A. Dinda Prescience Lab Department of Computer Science Northwestern University
1 Dong Lu, Peter A. Dinda Prescience Laboratory Department of Computer Science Northwestern University Evanston, IL GridG: Synthesizing Realistic.
1 Introduction to Load Balancing: l Definition of Distributed systems. Collection of independent loosely coupled computing resources. l Load Balancing.
Hardness of Approximation and Greedy Algorithms for the Adaptation Problem in Virtual Environments Ananth I. Sundararaj, Manan Sanghi, John R. Lange and.
Distributed and Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence Kalin Kanov Department of Computer Science Johns Hopkins.
Homework 2 In the docs folder of your Berkeley DB, have a careful look at documentation on how to configure BDB in main memory. In the docs folder of your.
1 Dong Lu, Peter A. Dinda Prescience Laboratory Computer Science Department Northwestern University Virtualized.
JDBC. In This Class We Will Cover: What SQL is What ODBC is What JDBC is JDBC basics Introduction to advanced JDBC topics.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Inferring the Topology and Traffic Load of Parallel Programs in a VM environment Ashish Gupta Peter Dinda Department of Computer Science Northwestern University.
Chapter 27 Q and A Victor Norman IS333 Spring 2015.
Hash, Don’t Cache: Fast Packet Forwarding for Enterprise Edge Routers Minlan Yu Princeton University Joint work with Jennifer.
Capacity Planning in SharePoint Capacity Planning Process of evaluating a technology … Deciding … Hardware … Variety of Ways Different Services.
Accelerating SQL Database Operations on a GPU with CUDA Peter Bakkum & Kevin Skadron The University of Virginia GPGPU-3 Presentation March 14, 2010.
Edge Based Cloud Computing as a Feasible Network Paradigm(1/27) Edge-Based Cloud Computing as a Feasible Network Paradigm Joe Elizondo and Sam Palmer.
Report : Zhen Ming Wu 2008 IEEE 9th Grid Computing Conference.
Best Western Green Bay CHEMS 2013 SYSTEM ARCHITECTURE.
Codeigniter is an open source web application. It occupies a very small amount of space in the memory and is most useful for developers who aim to develop.
Oracle Database Administration Lecture 6 Indexes, Optimizer, Hints.
03/27/2003CHEP20031 Remote Operation of a Monte Carlo Production Farm Using Globus Dirk Hufnagel, Teela Pulliam, Thomas Allmendinger, Klaus Honscheid (Ohio.
MonetDB/X100 hyper-pipelining query execution Peter Boncz, Marcin Zukowski, Niels Nes.
Ashwani Roy Understanding Graphical Execution Plans Level 200.
Indiana University’s Name for its Sakai Implementation Oncourse CL (Collaborative Learning) Active Users = 112,341 Sites.
Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language Konstantinos Sagonas Jesper Wilhelmsson Uppsala.
JLab Scientific Computing: Theory HPC & Experimental Physics Thomas Jefferson National Accelerator Facility Newport News, VA Sandy Philpott.
Multiprossesors Systems.. What are Distributed Databases ? “ A Logically interrelated collection of shared data ( and a description of this data) physically.
Kurt Mueller San Diego Supercomputer Center NPACI HotPage Updates.
1 CMPE 511 HIGH PERFORMANCE COMPUTING CLUSTERS Dilek Demirel İşçi.
8 1 Chapter 8 Advanced SQL Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel.
Resource Predictors in HEP Applications John Huth, Harvard Sebastian Grinstein, Harvard Peter Hurst, Harvard Jennifer M. Schopf, ANL/NeSC.
1 Database mini workshop: reconstressing athena RECONSTRESSing: stress testing COOL reading of athena reconstruction clients Database mini workshop, CERN.
Experiences with OGSA-DAI : Portlet Access and Benchmark Deepti Kodeboyina and Beth Plale Computer Science Dept. Indiana University.
U N I V E R S I T Y O F S O U T H F L O R I D A Hadoop Alternative The Hadoop Alternative Larry Moore 1, Zach Fadika 2, Dr. Madhusudhan Govindaraju 2 1.
Big traffic data processing framework for intelligent monitoring and recording systems 學生 : 賴弘偉 教授 : 許毅然 作者 : Yingjie Xia a, JinlongChen a,b,n, XindaiLu.
Streaming Big Data with Self-Adjusting Computation Umut A. Acar, Yan Chen DDFP January 2014 SNU IDB Lab. Namyoon Kim.
ECHO A System Monitoring and Management Tool Yitao Duan and Dawey Huang.
Revision Mid 2, Cache Prof. Sin-Min Lee Department of Computer Science.
1 Introduction to Computers Prof. Sokol Computer and Information Science Brooklyn College.
Copyright ©2003 Dell Inc. All rights reserved. Scaling-Out with Oracle® Grid Computing on Dell™ Hardware J. Craig Lowery, Ph.D. Software Architect and.
Douglas Thain, John Bent Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny Computer Sciences Department, UW-Madison Gathering at the Well: Creating.
AMGA-Bookkeeping Carmine Cioffi Department of Physics, Oxford University UK Metadata Workshop Oxford, 05 July 2006.
Hierarchical Memory Systems Prof. Sin-Min Lee Department of Computer Science.
Configuring SQL Server for a successful SharePoint Server Deployment Haaron Gonzalez Solution Architect & Consultant Microsoft MVP SharePoint Server
MapReduce: Simplied Data Processing on Large Clusters Written By: Jeffrey Dean and Sanjay Ghemawat Presented By: Manoher Shatha & Naveen Kumar Ratkal.
CERN IT Department CH-1211 Genève 23 Switzerland t Load testing & benchmarks on Oracle RAC Romain Basset – IT PSS DP.
Database 12.2 and Oracle Enterprise Manager 13c Liana LUPSA.
Cheltenham Courseware
Introduction to Load Balancing:
NASA Space Communications Symposium
2016 Citrix presentation.
100% Exam Passing Guarantee & Money Back Assurance
Grid Information Services: alternate models
Migration Strategies – Business Desktop Deployment (BDD) Overview
Ch 4. The Evolution of Analytic Scalability
Performance And Scalability In Oracle9i And SQL Server 2000
Performance And Scalability In Oracle9i And SQL Server 2000
Presentation transcript:

Peter Dinda Department of Computer Science Northwestern University Beth Plale Department of Computer Science Indiana University “Find 2 hosts with Linux that together have 3 GB of RAM” select h1.insertid, h2.insertid from hosts h1, hosts h2 where h1.os=‘LINUX’ and h2.os=‘LINUX’ and h1.mem_mb+h2.mem_mb>=3072 But such queries can be very expensive to execute However, we typically don’t need the entire result set, just some rows, and not always the same ones And we need them in a bounded amount of time Approach: return random sample of result set select nondeterministically h1.insertid, h2.insertid from hosts h1, hosts h2 where h1.os=‘LINUX’ and h2.os=‘LINUX’ and h1.mem_mb+h2.mem_mb>=3072 within 2 seconds SELECT H1.INSERTID, H2.INSERTID FROM HOSTS H1, HOSTS H2, INSERTIDS TEMP_H1, INSERTIDS TEMP_H2 WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072) AND (H1.INSERTID=TEMP_H1.INSERTID AND TEMP_H1.rand > AND TEMP_H1.rand AND TEMP_H2.rand <= ) Query Manager and Rewriter Random sample of input tables Probability of inclusion determined by time constraint and server load Select two hosts that together have >3GB of RAM 500,000 host grid generated by GridG Memory distribution according to Smith study of MDS contents Dual Xeon 1 GHz, 2 GB, 240 GB RAID, RGIS2, Oracle 9i Enterprise Average of five trials A Framework for Unifying Static and Dynamic Grid Information Non-deterministic Queries For Fast, Efficient Compositional Queries GridG: A Tool For Generating Realistic Computational Grids Select n hosts that together have >3GB of RAM 500,000 host grid generated by GridG Memory distribution according to Smith study of MDS contents Dual Xeon 1 GHz, 2 GB, 240 GB RAID, RGIS2, Oracle 9i Enterprise Average of five trials Generates a Grid as an annotated topology graph Hosts, routers, links Graph conforms to power laws of Internet topology Annotations include memory, clock speed, cpu type, number of CPUs, operating system type, link bandwidths, router bandwidths, etc. To appear in Performance Evaluation Review Queries for Compositions of Resources Easily Expressed in SQL Distributed resources (hosts, clusters, switches, routers) Monitoring (NWS, Remos, RPS, vmstat) SQL query Query for ‘static’ data passed to database Query for highly dynamic data passed to stream queries Query rewrite Query response Relational DB back end Stream query Query routing and Results Merge Controlled updates RGIS Servers for Objects With Relatively Static Properties -- SQL queries run over streaming data so results are extremely timely -- Stream queries can control update frequency while returning freshest possible data if freshest is required. If query is investigative (i.e. exploratory), then database can be queried to return results that are ‘good enough’. -- Stream queries compute freshness and provenance metrics for dynamic data. Advantages of Stream Queries: dQUOB-based Servers for Streams and Objects with Dynamic Properties Nondeterministic Queries Implemented Using Rewriting