1 Ömer Korçak Cmpe 521 Mariposa: A Wide Area Distributed Database System.

Presentation transcript:

1 Ömer Korçak Cmpe 521 Mariposa: A Wide Area Distributed Database System

2 What is Mariposa?
A wide-area distributed database management system.
An ongoing project at UC Berkeley.
Addresses fundamental problems with the standard approach to distributed database management.

3 Why Mariposa?
To date, distributed database management systems have been designed for local-area networks (LANs):
– Single administrative structure: a few servers operating within one administrative domain, such as one company or one department within a company.
– Uniformity: these systems assume that all processors and network connections in the system are uniform.
– Static data allocation: data movement in these systems is a very “heavyweight” operation and is performed manually by a database administrator.
The requirements for wide-area distributed database systems differ dramatically from those of LAN systems:
– Individual sites usually report to different system administrators.
– Sites have different access and charging algorithms.
– Sites install site-specific data type extensions.
– Sites have different constraints on servicing remote requests.
– Many sites may participate in a WAN distributed DBMS.

4 Main Goals of Mariposa
➢ Scalability to a large number of cooperating sites. In a WAN environment, there may be a large number of sites. The authors' goal is to scale to 10,000 servers.
➢ Local autonomy. Each site must have control over its resources. This includes which objects to store and which queries to run. Query and data allocation cannot be done by a central, authoritarian query optimizer.
➢ Data mobility. It should be easy and efficient to change the “home” of an object. Preferably, the object should remain available during movement.
➢ No global synchronization. Updates and schema changes should not force a site to synchronize with all other sites. Otherwise, many common operations will have exceptionally poor response time.
➢ Easily configurable policies. It should be easy for a local database administrator to change the behavior of a Mariposa site. A Mariposa system should respond gracefully to changes in user activity and data access patterns to maintain low response time and high system throughput.

5 Traditional distributed DBMSs do not meet these requirements:
– Use of an authoritarian, centralized query optimizer does not scale well.
– The high cost of moving an object between sites restricts data mobility.
– Schema changes typically require global synchronization.
– Centralized management designs inhibit local autonomy and flexible policy configuration.
One could claim that these are implementation issues. However, the authors argue that traditional distributed DBMSs cannot meet these requirements for fundamental architectural reasons.
A new architecture is introduced. It uses a microeconomic framework.

6 Overview of the Architecture
All distributed DBMS issues (query optimization, data movement, name service, etc.) are reformulated in microeconomic terms.
Implementation of the economic paradigm involves a number of entities and mechanisms.
All Mariposa clients and servers have an account with a network bank.
The user allocates a budget for each query.
The goal of the query processing system is to solve the query within the allotted budget by contracting with various Mariposa processing sites to perform portions of the query.
Each query is administered by a broker, which obtains bids for pieces of the query from various sites.

7 Mechanism
Instead of using centralized metadata to determine where to run the query, the broker finds sites that might want to bid on the query, using a distributed advertising service.
A server can join the Mariposa system by buying objects from other sites, advertising its services and then bidding on queries. It can leave the system by selling its objects and ceasing to bid.
➢ Large number of sites supported, highly scalable system.
Each site makes storage decisions to buy and sell fragments, based on optimizing the revenue it expects to collect.
Mariposa objects have no notion of a home. Their owner can change rapidly as objects are moved.
➢ Data mobility, free trade of objects.
➢ Global synchronization is avoided by the use of some economic paradigms, such as replication.
Each site is free to bid on any business of interest.
➢ Total local autonomy.

8 The Mariposa Architecture

9 An Example
A company that sells widgets has offices in San Francisco, Chicago, New York and Miami.
The company's database includes a table called WIDGETS, which contains pricing and inventory information on all the company’s widgets.
Widgets are warehoused in New York and Miami.
– The company keeps half of the WIDGETS table in New York and half in Miami.
In Mariposa, splitting a table is called fragmentation, and the pieces of a table are called fragments. Here the WIDGETS table is fragmented into WIDGETS1 and WIDGETS2.
If the purchasing manager in San Francisco wants to retrieve all the records from the WIDGETS table, he would enter “SELECT * FROM WIDGETS”.
The site where the query is entered, San Francisco in this case, is called the home site.
The query is sent from the frontend application to the Mariposa program running on the server in San Francisco.
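The fragmentation described in this example can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the rows, the `warehouse` column, and the rule of splitting by warehouse are all hypothetical, since the slides say only that half of the table is kept in each city.

```python
# Hypothetical WIDGETS rows; the column names are invented for illustration.
WIDGETS = [
    {"id": 1, "price": 9.50, "stock": 40, "warehouse": "New York"},
    {"id": 2, "price": 4.25, "stock": 15, "warehouse": "Miami"},
    {"id": 3, "price": 7.00, "stock": 22, "warehouse": "New York"},
    {"id": 4, "price": 3.75, "stock": 60, "warehouse": "Miami"},
]


def fragment_by(rows, column):
    """Split a table into fragments, one per distinct value of `column`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def select_all(fragments):
    """Answer SELECT * by scanning every fragment and merging the rows."""
    merged = []
    for rows in fragments.values():
        merged.extend(rows)
    return merged


# Assumed split rule: one fragment per warehouse city.
fragments = fragment_by(WIDGETS, "warehouse")
all_rows = select_all(fragments)
```

Scanning both fragments and merging returns every row of the original table, which is what the purchasing manager's query needs.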

10 Example (cont.)
The query is passed through:
– The parser, which checks the syntactic correctness of the query.
– The optimizer, which produces a query plan that describes the order in which the different steps in the plan will be executed.
– The fragmenter, which changes the plan produced by the optimizer to reflect the data fragmentation. The resulting plan is called the fragmented query plan.
In order to do their work, the parser, optimizer and fragmenter need information about data types, fragment locations, etc. This information is maintained by a Mariposa name server.
– In our example, the name server is in the Chicago office.
The fragmented query plan describes the operations that will be performed in order to execute the query, and the order in which they will be carried out.
– In our example, the purchasing manager’s query, “SELECT * FROM WIDGETS”, is represented by a query plan which scans the two WIDGETS fragments, WIDGETS1 and WIDGETS2, and merges the results.
The fragmented query plan is passed to the query broker, whose job is to decide where each piece of the fragmented query plan will be executed.
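The parser, optimizer and fragmenter stages can be sketched as three small Python functions. This is a toy model, not Mariposa's implementation: the tuple-based plan format, the function names, and the `CATALOG` dictionary standing in for name-server metadata are all assumptions made for illustration, and the parser accepts only queries of the form used in the example.

```python
def parse(sql):
    """Toy parser: accept only 'SELECT * FROM <table>' and return the table name."""
    tokens = sql.strip().rstrip(";").split()
    if len(tokens) != 4 or [t.upper() for t in tokens[:3]] != ["SELECT", "*", "FROM"]:
        raise ValueError(f"unsupported query: {sql!r}")
    return tokens[3].upper()


def optimize(table):
    """Toy optimizer: the only plan for SELECT * is a single scan of the table."""
    return ("scan", table)


def fragment(plan, catalog):
    """Toy fragmenter: rewrite a table scan as a merge of per-fragment scans."""
    _, table = plan
    parts = catalog.get(table, [table])
    if len(parts) == 1:
        return ("scan", parts[0])
    return ("merge", [("scan", part) for part in parts])


# Hypothetical stand-in for the fragment metadata a name server would supply.
CATALOG = {"WIDGETS": ["WIDGETS1", "WIDGETS2"]}

fragmented_plan = fragment(optimize(parse("SELECT * FROM WIDGETS")), CATALOG)
```

In this sketch, the fragmented plan for the example query is a merge over scans of WIDGETS1 and WIDGETS2, matching the plan described in the slide.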

11 Example (cont.)
The query broker contacts the bidder module at each potential processing site.
The broker waits for responses from the bidders before selecting the best ones.
After the query broker has specified the processing sites, the backend’s coordinator module takes over.
The coordinator notifies the remote sites to begin processing, collects the results, and returns the answer to the client program.

14 Broker
Responsible for getting the query performed on behalf of the user.
Receives a budget from the user to pay for the query.
Finds possible bidders by examining the Ad Table.
– Finding bidders is achieved through an advertising mechanism.
– Servers announce their willingness to perform various services by posting ads.
– Name servers keep a record of these ads in an Ad Table.
Contacts possible bidders and acts as auctioneer.
Coordinates the query execution.
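The broker's role as auctioneer can be sketched in Python as follows. The selection rule used here (take the cheapest bid for each plan piece, then check the total against the budget) is an assumption chosen for brevity; the slides say only that the broker selects the best bids within the user's budget. The `Bid` class, `AD_TABLE`, `QUOTES`, and all prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    site: str
    piece: str
    price: float


def collect_bids(piece, ad_table, quotes):
    """Ask only the sites that advertised for this piece, as listed in the Ad Table."""
    return [
        Bid(site, piece, quotes[(site, piece)])
        for site in ad_table.get(piece, [])
        if (site, piece) in quotes
    ]


def run_auction(pieces, budget, ad_table, quotes):
    """Pick the cheapest bid per piece (assumed rule) and enforce the budget."""
    winners = {}
    for piece in pieces:
        bids = collect_bids(piece, ad_table, quotes)
        if not bids:
            raise LookupError(f"no bids for {piece}")
        winners[piece] = min(bids, key=lambda bid: bid.price)
    total = sum(bid.price for bid in winners.values())
    if total > budget:
        raise ValueError(f"cheapest bids cost {total}, budget is {budget}")
    return winners, total


# Hypothetical ads and quoted prices for the two WIDGETS fragments.
AD_TABLE = {"WIDGETS1": ["New York", "Chicago"], "WIDGETS2": ["Miami"]}
QUOTES = {
    ("New York", "WIDGETS1"): 3.0,
    ("Chicago", "WIDGETS1"): 4.5,
    ("Miami", "WIDGETS2"): 2.5,
}

winners, total = run_auction(["WIDGETS1", "WIDGETS2"], 10.0, AD_TABLE, QUOTES)
```

A real broker would likely weigh more than price, for example promised response time, so the `min` call is the part most worth replacing in a fuller model.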

15 Bidder
One bidder per site.
Responds to queries issued by brokers.
Defines bids so as to maximize system use and site revenue.
Follows predefined site policies.
Storage Manager
Stores the fragments and their revenue history.
– Revenue history is a good predictor of future revenue.
Decides when to buy and sell fragments so as to maximize memory use and site revenue.
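The idea that revenue history predicts future revenue can be sketched as a simple Python rule for the storage manager. Everything specific here is an assumption: the moving-average predictor, the `window` parameter, and the comparison against a storage cost or asking price are illustrative choices, not the policy described by the Mariposa authors.

```python
from statistics import mean


def predicted_revenue(history, window=3):
    """Predict next-period revenue as the mean of the most recent periods."""
    recent = history[-window:]
    return mean(recent) if recent else 0.0


def should_sell(history, storage_cost, window=3):
    """Sell a fragment when its predicted revenue no longer covers its storage cost."""
    return predicted_revenue(history, window) < storage_cost


def should_buy(history, asking_price, window=3):
    """Buy a fragment when its predicted revenue exceeds the asking price."""
    return predicted_revenue(history, window) > asking_price


# Hypothetical per-period revenue earned by one fragment at this site.
widgets1_history = [10, 2, 4, 6]
forecast = predicted_revenue(widgets1_history)
```

A site administrator could tune `window` or swap in a different predictor, which fits the goal of easily configurable local policies.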

16 Name Server
Mariposa uses a decentralized naming facility.
Four structures are used in object naming:
– Internal names: location dependent. Used to determine the physical location of a fragment.
– Full names: completely specified names that uniquely identify an object. Location independent (a full name remains valid when an object moves).
– Common names: user-specific, partially specified names. Using them avoids the tedium of using full names. Simple rules permit translation from a common name to a full name.
– Name contexts: sets of affiliated names. Names within a context are expected to share some feature.
➢ Names do not have to be globally registered.
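The relationship between common, full and internal names can be sketched in Python. The name formats used here (a dotted `context.name` full name and a `site:full_name` internal name), the `acme.sales` context, and the `LocationTable` class are all hypothetical; the slides do not specify Mariposa's actual naming syntax or translation rules.

```python
def to_full_name(common_name, context):
    """Translate a common name to a full name by qualifying it with a name context."""
    if "." in common_name:
        # Assumed convention: a dotted name is already fully specified.
        return common_name
    return f"{context}.{common_name}"


class LocationTable:
    """Maps location-independent full names to location-dependent internal names."""

    def __init__(self):
        self._site_of = {}

    def place(self, full_name, site):
        """Record (or update) the site that currently stores the object."""
        self._site_of[full_name] = site

    def internal_name(self, full_name):
        """Build the internal name from the object's current site."""
        return f"{self._site_of[full_name]}:{full_name}"


table = LocationTable()
full = to_full_name("WIDGETS1", "acme.sales")

table.place(full, "New York")
before_move = table.internal_name(full)

# Moving the fragment changes its internal name but not its full name.
table.place(full, "Miami")
after_move = table.internal_name(full)
```

The full name stays valid across the move, while the internal name changes with the fragment's physical location.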

17 Conclusions
The authors present a distributed microeconomic approach for managing query execution and storage.
The economic model reduces the scheduling complexity of distributed interactions because it does not seek globally optimal solutions.
They tested the power and flexibility of Mariposa through experiments run over a WAN, and the results were positive.
Mariposa is an ongoing project, and the authors are continuing to implement more sophisticated features.
Authors: Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu.