Kangseok Kim, Marlon E. Pierce Community Grids Laboratory, Indiana University

Kangseok Kim, Marlon E. Pierce (Community Grids Laboratory, Indiana University)
Rajarshi Guha (School of Informatics, Indiana University)

Dataset sizes are growing rapidly in a variety of fields, e.g.
- Scientific observations for e-Science
- Sensors (video, environmental)
- Data fetched from the Internet that defines users' interests
We need data management, partitioning, and processing strategies that are scalable.
We also need to find effective ways to use our overabundance of computing power.
- Cloud computing and virtualization
- Partitioning a database over virtual private servers can be a critical factor for scalability and performance.
- Virtual private servers are used to facilitate concurrent access to individual applications (databases) residing on multiple virtual platforms, on one or more physical machines, with effective resource use and management, as compared to a single application (database) on a physical machine.

The database system is composed of three tiers:
- web service (WS) client (front-end)
- web service and message service system (middleware)
- agents and a collection of databases (back-end)
The distributed database system allows WS clients to access data from databases distributed over virtual private servers.
Databases are distributed over multiple virtual private servers by fragmenting the data using two different methods:
- data clustering
- horizontal (or equal) partitioning
The distributed database system is a network of two or more PostgreSQL databases that reside on one or more virtual private servers.
- The lab uses 8 virtual private servers on one physical machine, with OpenVZ virtualization technology.
A WS client can simultaneously access (or query) the data in several databases in a single distributed environment.
- The SQMD (Single Query Multiple Database) mechanism transmits a single query that synchronously operates on multiple databases, using the publish/subscribe paradigm.
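The SQMD flow described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes an in-process `Broker` class in place of the real message service, uses `sqlite3` in-memory databases as stand-ins for the PostgreSQL back-ends, and the names `Agent`, `Broker`, `web_service_query`, and the `compounds` table are all hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


class Agent:
    """Hypothetical DB agent that owns one database fragment."""

    def __init__(self, rows):
        # sqlite3 is only a stand-in for PostgreSQL here (assumption).
        # check_same_thread=False lets a broker worker thread use the connection.
        self.conn = sqlite3.connect(":memory:", check_same_thread=False)
        self.conn.execute("CREATE TABLE compounds (cid INTEGER, score REAL)")
        self.conn.executemany("INSERT INTO compounds VALUES (?, ?)", rows)

    def on_query(self, sql, params):
        return self.conn.execute(sql, params).fetchall()


class Broker:
    """Hypothetical in-process publish/subscribe broker."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, topic, agent):
        self.subscribers.setdefault(topic, []).append(agent)

    def publish(self, topic, sql, params=()):
        # Deliver the same query to every subscribed agent concurrently.
        agents = self.subscribers.get(topic, [])
        if not agents:
            return []
        with ThreadPoolExecutor(max_workers=len(agents)) as pool:
            futures = [pool.submit(a.on_query, sql, params) for a in agents]
            return [f.result() for f in futures]


def web_service_query(broker, sql, params=()):
    """Publish one query, then serially aggregate the per-database responses."""
    merged = []
    for rows in broker.publish("query", sql, params):
        merged.extend(rows)
    return merged


def horizontal_partition(rows, n):
    """Split rows into n fragments of (nearly) equal size."""
    return [rows[i::n] for i in range(n)]


rows = [(cid, cid / 10.0) for cid in range(1, 21)]
broker = Broker()
for fragment in horizontal_partition(rows, 4):
    broker.subscribe("query", Agent(fragment))

hits = web_service_query(
    broker, "SELECT cid FROM compounds WHERE score <= ?", (0.5,)
)
print(sorted(cid for (cid,) in hits))  # [1, 2, 3, 4, 5]
```

The client issues one query and never sees how the rows are spread over the four fragments. That is the transparency the middleware tier is meant to provide.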

[Figure: system architecture. The WS client (front-end user interface) exchanges query/response messages with the web service on the web server. The web service exchanges query/response messages through the message/service system (broker) with a DB agent (JDBC to PostgreSQL) on each DB host server. Broker topics: 1. Query/Response; 2. Heart-beat.]

Example query:

    SELECT cid, structure FROM pubchem_3d WHERE cube_enlarge(COORDS, R, 12) @> momsim

- cid: compound ID
- pubchem_3d: 3D structures for the public repository of chemical information, including connection tables, properties, and biological assay results
- COORDS: 12-D shape descriptor of the query molecule
- R: user-specified distance cutoff; points in the database whose distance to the query point is within R are retrieved
- cube_enlarge: PostgreSQL function that generates the bounding hypercube from the query point
- momsim: 12-D CUBE field
The example query finds all rows of the database for which the 12-D shape descriptor lies in the hypercubical region defined by cube_enlarge.

Total number of hits for varying R, using the above query (columns in order of increasing R):

    Total number of response data |    495 |     6,870 |    37,049 |    113,123 |    247,171
    Size in bytes                 | 80,837 | 1,121,181 | 6,043,337 | 18,447,438 | 40,302,297
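To make the hypercube filter concrete, the sketch below emulates it in plain Python. It is an illustration only and does not reproduce the PostgreSQL cube extension: `cube_around`, `cube_contains`, and `within_radius` are hypothetical helper names, and the three sample points are invented. It also shows why a bounding hypercube returns more rows than a true distance cutoff would.

```python
import math


def cube_around(point, r):
    """Return (lower, upper) corners of the hypercube of half-width r around point."""
    return [x - r for x in point], [x + r for x in point]


def cube_contains(cube, point):
    """True if point lies inside the axis-aligned hypercube (bounds inclusive)."""
    lower, upper = cube
    return all(lo <= x <= hi for lo, x, hi in zip(lower, point, upper))


def within_radius(query, point, r):
    """True if the Euclidean distance between query and point is at most r."""
    return math.dist(query, point) <= r


# Invented 12-D descriptors, for illustration only.
query = [0.0] * 12
r = 1.0
points = {
    "a": [0.5] * 12,          # inside the cube, but sqrt(3) from the query
    "b": [0.2] * 12,          # inside the cube and inside the sphere
    "c": [1.5] + [0.0] * 11,  # outside the cube in the first dimension
}

cube = cube_around(query, r)
cube_hits = [name for name, p in points.items() if cube_contains(cube, p)]
sphere_hits = [name for name, p in points.items() if within_radius(query, p, r)]

print(cube_hits)    # ['a', 'b']
print(sphere_hits)  # ['b']
```

Point `a` is an extra hit: every coordinate is within R of the query, so it falls inside the hypercube, yet its Euclidean distance to the query exceeds R.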

[Figure: latency breakdown along the path Web Service (WS) client, WS, broker, DB agent. Labels: T_query, T_response, T_client2ws (transit cost), T_ws2db (WS cost), T_aggregation, T_agent2db.]

Total latency = T_client2ws + T_ws2db

The latency components are:
- T_client2ws (transit cost): time to transmit a query (T_query) to, and receive a response (T_response) from, the web service running on the web server
- T_aggregation: time spent in the web service serially aggregating the responses from the databases
- T_agent2db: time between an agent submitting a query to a database server and retrieving the responses, including the corresponding execution time of the agent
As the distance R increases, the time needed to perform a query in the database increases because the result set grows, so the query processing cost clearly becomes the largest portion of the total cost.
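As a worked instance of the relation Total latency = T_client2ws + T_ws2db, the sketch below computes the total and each component's share. The timings are hypothetical placeholders, not measurements from the experiments, and the function names are invented for illustration.

```python
def total_latency(t_client2ws, t_ws2db):
    """Total latency as the sum of the transit cost and the WS cost."""
    return t_client2ws + t_ws2db


def cost_shares(t_client2ws, t_ws2db):
    """Fraction of the total latency contributed by each component."""
    total = total_latency(t_client2ws, t_ws2db)
    return {"transit": t_client2ws / total, "ws": t_ws2db / total}


# Hypothetical timings in seconds (assumption, for illustration only).
t_client2ws = 0.25
t_ws2db = 1.75

print(total_latency(t_client2ws, t_ws2db))  # 2.0
print(cost_shares(t_client2ws, t_ws2db))    # {'transit': 0.125, 'ws': 0.875}
```

With numbers of this shape, the WS-side cost dominates. That matches the observation that query processing becomes the largest portion of the total as R grows.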

We show the performance of a query/response interaction mechanism between a client and distributed databases. We measure the overheads of virtualized deployments compared to real (physical) host deployments, and compare two data partitioning strategies: horizontal partitioning vs. data clustering.
In our experiment with virtual private servers:
- with the data clustering method, we allocated memory to each virtual server in proportion to the size of its cluster
- with the horizontal partitioning method, we allocated the same amount of memory to each server
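The two memory allocation rules can be written down directly. The sketch below is illustrative only: the memory budget and cluster sizes are hypothetical, and `proportional_memory` and `equal_memory` are invented names, not part of the experimental setup.

```python
def proportional_memory(total_mb, cluster_sizes):
    """Give each server memory in proportion to the size of its cluster."""
    total_rows = sum(cluster_sizes)
    return [total_mb * size / total_rows for size in cluster_sizes]


def equal_memory(total_mb, n_servers):
    """Give every server the same share of memory."""
    return [total_mb / n_servers] * n_servers


# Hypothetical budget and cluster sizes for 8 virtual private servers.
total_mb = 1024
cluster_sizes = [1000, 1000, 2000, 4000, 8000, 16000, 32000, 64000]

print(proportional_memory(total_mb, cluster_sizes))
# [8.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0]
print(equal_memory(total_mb, 8))
# [128.0, 128.0, 128.0, 128.0, 128.0, 128.0, 128.0, 128.0]
```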

- Horizontal partitioning is faster than data clustering, because the fragments produced by the data clustering method can differ in the number of records they hold.
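The size difference between the two kinds of fragments can be quantified with a simple imbalance ratio. The sketch below is an illustration under assumptions: the cluster sizes are hypothetical, and the ratio of the largest fragment to the mean fragment is only one possible indicator, chosen here on the assumption that the largest fragment bounds the latency of a query that runs on all fragments at once.

```python
def horizontal_fragment_sizes(n_rows, n_fragments):
    """Sizes of n_fragments equal partitions (sizes differ by at most one row)."""
    base, extra = divmod(n_rows, n_fragments)
    return [base + 1 if i < extra else base for i in range(n_fragments)]


def imbalance(sizes):
    """Ratio of the largest fragment to the mean fragment size (1.0 = balanced)."""
    return max(sizes) / (sum(sizes) / len(sizes))


# Hypothetical cluster sizes; real clusters depend on the data.
clustered = [1000, 1000, 2000, 4000, 8000, 16000, 32000, 64000]
horizontal = horizontal_fragment_sizes(sum(clustered), len(clustered))

print(horizontal)             # [16000, 16000, 16000, 16000, 16000, 16000, 16000, 16000]
print(imbalance(horizontal))  # 1.0
print(imbalance(clustered))   # 4.0
```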

- As the number of responses produced by a query against a large cluster increases, the time needed to perform the query in that cluster increases as well.
- In other words, the total active (hash) index set for the query grows as the distance R increases.
- To avoid disk access as much as possible, and thus improve query processing performance, the total index set needs to fit in main memory.


The SQMD mechanism, based on the publish/subscribe paradigm, transmits a single query that simultaneously operates on multiple databases. It hides the details of data distribution in the middleware, so the distributed databases are transparent to heterogeneous web service clients.
The results of experiments with our distributed system indicate that performance with virtual private servers on one machine (host) is comparable to performance with eight physical machines (hosts).
Future work:
- We need to decrease the workload of aggregating query results in the web service.
- We will investigate the use of the M-tree index. M-tree indexes allow queries over hyperspherical regions, which would avoid the extra hits we currently obtain due to the hypercube representation.
- To eliminate unnecessary query processing on some of the databases distributed by the data clustering method, we should consider query optimization that localizes a query to specific databases.
- We will extend the evaluation to the (optimized) effective use of resources other than memory in our distributed database system.
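One possible shape for the query localization mentioned under future work is sketched below. It is a hypothetical design, not something the presentation specifies. It assumes each clustered database publishes an axis-aligned bounding box of its points, so a query is sent only to databases whose box intersects the query hypercube. The helper names and the 2-D sample points are invented to keep the example short.

```python
def bounding_box(points):
    """Axis-aligned bounding box (lower, upper) of a non-empty list of points."""
    dims = range(len(points[0]))
    lower = [min(p[i] for p in points) for i in dims]
    upper = [max(p[i] for p in points) for i in dims]
    return lower, upper


def boxes_intersect(a, b):
    """True if two axis-aligned boxes overlap in every dimension."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return all(
        al <= bh and bl <= ah
        for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi)
    )


def databases_to_query(cluster_boxes, query, r):
    """Names of databases whose bounding box intersects the query hypercube."""
    query_cube = ([x - r for x in query], [x + r for x in query])
    return [
        name for name, box in cluster_boxes.items()
        if boxes_intersect(box, query_cube)
    ]


# Invented 2-D clusters standing in for the 12-D descriptors.
cluster_boxes = {
    "db0": bounding_box([(0.0, 0.0), (1.0, 1.0)]),
    "db1": bounding_box([(5.0, 5.0), (6.0, 6.0)]),
    "db2": bounding_box([(1.5, 0.5), (2.5, 1.5)]),
}

print(databases_to_query(cluster_boxes, query=(1.2, 1.0), r=0.5))
# ['db0', 'db2']
```

Here `db1` is skipped because none of its points can fall inside the query region. The filter is conservative: a selected database may still return no rows.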