© Copyright 2000 M. Rodriguez-Martinez, All Rights Reserved MOCHA : A Self-Extensible Database Middleware System for Distributed Data Sources Manuel Rodriguez-Martinez.

Presentation transcript:

MOCHA: A Self-Extensible Database Middleware System for Distributed Data Sources
Manuel Rodriguez-Martinez, Nick Roussopoulos
© Copyright 2000 M. Rodriguez-Martinez, All Rights Reserved

SIGMOD 2000, M. Rodriguez-Martinez – N. Roussopoulos
Slide 2: Motivation
Data sources are distributed and heterogeneous: a fact of life...
(Diagram labels: Client, Internet, Oracle 8i, Informix, XML Data, Text Data)

Slide 3: Client-Server Connectivity
2-tier architecture means FAT clients
Not a good idea
(Diagram labels: Client, Internet, Oracle 8i, Informix, XML Data, Text Data)

Slide 4: Middleware Integration Service
Middleware is a 3-tier connectivity solution – thin clients
(Diagram labels: Client, Internet, Integration Server, Catalog, Translator, Oracle 8i, Informix, XML Data, Text Data)

Slide 5: Problem 1: Code Deployment
User-defined types and functions
– Polygon
– Composite() – image aggregation
Porting and manual installation of code
– Operating system
– Hardware platform
Expensive software maintenance
– Updates
– Version management
Security
– Software certification

Slide 6: Problem 1: Code Deployment
Not scalable – expensive system growth
(Diagram labels: Client, Internet, Integration Server, Catalog, Translator, Oracle 8i, Informix, XML Data, Text Data)

Slide 7: Problem 2: Query Processing
Operator placement options
– Limited by site-dependent software
  Composite() – got to have it before using it!
Most processing at Integration Server
– Powerful Data Servers are under-utilized
  I/O Nodes
– Excessive data movement over the network
  Network bottleneck
  Unfeasible in WANs, Internet

Slide 8: Problem 2: Query Processing
Not scalable – inefficient evaluation of queries
(Diagram labels: Client, Internet, Integration Server, Catalog, Translator, Oracle 8i, Informix, XML Data, Text Data; data volume shown: 100MB)

Slide 9: MOCHA Solution: Ship Code!
  Select location, Composite(image)
  From Rasters
  Where week BETWEEN t1 and t2
  Group By location
(Diagram labels: Client, QPC, Code Repository, Catalog, Internet, DAP, Oracle, Informix; site names: Virginia, Maryland, Virginia, Texas)

Slide 10: MOCHA Solution: Filter Data!
  Select location, Composite(image)
  From Rasters
  Where week BETWEEN t1 and t2
  Group By location
(Diagram labels: Client, QPC, Code Repository, Catalog, Internet, DAP, Oracle, Informix; site names: Virginia, Maryland, Virginia, Texas)
(Data volumes shown: 200MB tuples and 100MB tuples at the data sources; results of 200KB and 150KB leaving the DAPs; results of 350KB delivered in total)
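The volumes on this slide can be tallied in a few lines. The sketch below is only an illustration under stated assumptions: it takes 1 MB = 1000 KB, ignores the size of the shipped Java classes, and uses hypothetical names (`RAW_TUPLES_KB`, `RESULTS_KB`, and the two functions) that are not part of MOCHA.

```python
# Volumes read off the slide, in KB (assumption: 1 MB = 1000 KB).
RAW_TUPLES_KB = [200_000, 100_000]  # tuples stored at the two data sources
RESULTS_KB = [200, 150]             # distilled results leaving the two DAPs


def data_shipping_kb(raw_tuples_kb):
    """Network volume if the raw tuples are sent to the integration server."""
    return sum(raw_tuples_kb)


def code_shipping_kb(results_kb):
    """Network volume if Composite() runs at the DAPs and only results are sent."""
    return sum(results_kb)


if __name__ == "__main__":
    raw = data_shipping_kb(RAW_TUPLES_KB)
    filtered = code_shipping_kb(RESULTS_KB)
    print(f"data shipping: {raw} KB, code shipping: {filtered} KB")
    print(f"raw volume is roughly {raw / filtered:.0f} times larger")
```

The 350KB total matches the figure shown on the slide; the comparison against 300MB of raw tuples is derived from the slide's numbers rather than stated on it.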

Slide 11: MOCHA Goals
Automatic deployment of code (self-extensible)
– QPC ships compiled Java classes
  User-defined types and functions
– XML for their metadata (easy exchange)
Data processing at data source sites
– Utilize powerful machines
  On-site data distillation
Processing based on data movement reduction
– Filter data at the data sources
– Expand data near the clients

Slide 12: The MOCHA Architecture
Multi-threaded distributed objects
(Diagram labels: Client, QPC, Code Repository, Catalog, DAP, Informix, Oracle, Coordination Thread, Execution Thread)

Slide 13: QPC: The Integration Server
QPC controls and coordinates query execution
(Diagram labels: Client API, Query Parser, Catalog Manager, Query Optimizer, Execution Engine, Code Loader, SQL & XML Proc. Interface, DAP Access API, XML Catalog, Code Repository, DAP)

Slide 14: DAP: The Facilitator of Data
DAP provides QPC with remote access to the data
(Diagram labels: DAP Access API, Control Module, Execution Engine, Code Loader, SQL & XML Proc. Interface, Data Source Access Layer with JDBC, I/O API, DOM and JNI, Data Source)
(Data volumes shown: 100MB tuples; results 150KB)

Slide 15: Road Map
– Introduction
– Problem Definition
– MOCHA Architecture
– Query Processing
– Experiments
– Summary

Slide 16: Processing The Queries
Issue 1: Placement and deployment of operators
– Which operators go to QPC, and which go to the DAPs?
Issue 2: How to determine this placement?
– Dynamic programming [SAC+79], [ML86]
– But search space is enormous
  Placement of UDF, joins, execution sites …
  Plenty of bad plans
=> In MOCHA: Query optimization based on heuristics
– Network usually is the critical factor: optimize for it first
– CPU and I/O are cheaper: optimize for them later
– Quickly converge to a good plan

Slide 17: Operator Placement
Data-Reducing Operators
– Filter the data
– Aggregates, predicates, projections, semi-joins
  Composite(), Overlaps(), AvgEnergy()
Push to the DAPs
– Code Shipping policy (unique to MOCHA)
– Only send back distilled results
  + Less data movement
Cost:
– Computation cost
– Transfer of filtered results

Slide 18: Operator Placement
Data-Inflating Operators
– Expand the data
– Projections, image processing, some joins …
  DoubleResolution(), RotateSolid()
Pull to the QPC
– Data Shipping policy [FJK96]
– Only send back raw arguments
  + Less data movement
Cost:
– Computation cost
– Transfer of raw argument values

Slide 19: Placement Metric: VRF
Volume Reduction Factor: given an operator $\alpha$ and a relation $R$,
  $VRF(\alpha) = \dfrac{VDT}{VDA}$
where
– VDT is the volume of data transmitted after applying $\alpha$ to $R$
– VDA is the volume of data originally present in $R$
$\alpha$ is Data-Reducing if $VRF(\alpha) < 1$, e.g. Composite()
$\alpha$ is Data-Inflating if $VRF(\alpha) \geq 1$, e.g. DoubleRes()
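As a rough illustration of how this metric could drive placement, the sketch below assumes that VRF is the ratio VDT / VDA and that an operator with VRF below 1 is pushed to a DAP while any other operator stays at the QPC. The class `OperatorStats`, the functions `vrf` and `place`, and the byte counts in the usage lines are hypothetical and are not taken from the MOCHA code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorStats:
    """Hypothetical per-operator statistics, in bytes."""
    name: str
    vdt_bytes: float  # volume of data transmitted after applying the operator
    vda_bytes: float  # volume of data originally present in the relation


def vrf(stats: OperatorStats) -> float:
    """Volume Reduction Factor, assumed here to be VDT / VDA."""
    if stats.vda_bytes <= 0:
        raise ValueError("VDA must be positive")
    return stats.vdt_bytes / stats.vda_bytes


def place(stats: OperatorStats) -> str:
    """Data-reducing operators (VRF < 1) go to the DAP, all others to the QPC."""
    return "DAP" if vrf(stats) < 1 else "QPC"


if __name__ == "__main__":
    # Illustrative numbers only: an aggregate that shrinks its input and
    # an image operator that quadruples it.
    composite = OperatorStats("Composite", vdt_bytes=150e3, vda_bytes=100e6)
    double_res = OperatorStats("DoubleRes", vdt_bytes=400e6, vda_bytes=100e6)
    for op in (composite, double_res):
        print(op.name, round(vrf(op), 4), place(op))
```

Treating VRF equal to 1 as data-inflating is a choice made for this sketch; at that value either site moves the same volume of data.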

Slide 20: Goal: Plans with small CVRF
Cumulative Volume Reduction Factor: given a plan $P$ to solve query $Q$ over relations $R_1, \ldots, R_n$,
  $CVRF(P) = \dfrac{CVDT}{CVDA}$
where
– CVDT is the volume of data transmitted by applying all operators in $P$ to $R_1, \ldots, R_n$
– CVDA is the volume of data originally present in $R_1, \ldots, R_n$
Optimizer searches for plans that move a minimal amount of data.
(Diagram: search space of plans, labelled with CVRF(Plan) and the interval [0,1])
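The search for a plan with small CVRF can be pictured with a toy example. The sketch below assumes that CVRF is the ratio CVDT / CVDA and that each candidate plan is summarized by the bytes it sends over the network per relation; the function names, the plan names, and the byte counts are hypothetical. It is an exhaustive minimum over a handful of plans, not the heuristic optimizer the slides describe.

```python
def cvrf(transmitted_bytes, relation_bytes):
    """Cumulative Volume Reduction Factor, assumed here to be CVDT / CVDA."""
    cvda = sum(relation_bytes)
    if cvda <= 0:
        raise ValueError("CVDA must be positive")
    return sum(transmitted_bytes) / cvda


def best_plan(plans, relation_bytes):
    """Return the name of the candidate plan with the smallest CVRF."""
    return min(plans, key=lambda name: cvrf(plans[name], relation_bytes))


if __name__ == "__main__":
    relations = [200e6, 100e6]  # bytes originally present in R1 and R2
    plans = {
        # Each plan lists the bytes it transmits for R1 and R2.
        "operators at QPC": [200e6, 100e6],   # raw tuples cross the network
        "operators at DAPs": [200e3, 150e3],  # only distilled results do
    }
    for name, sent in plans.items():
        print(f"{name}: CVRF = {cvrf(sent, relations):.5f}")
    print("chosen:", best_plan(plans, relations))
```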

Slide 21: Performance Evaluation
Goals of this study:
– Measure how good code shipping can be
– Validate heuristics being proposed
  VRF
  CVRF
– Guide implementation of the optimizer
Configured MOCHA with plans that place operators based on heuristics.

Slide 22: Experimental Environment
Sequoia 2000 Benchmark
– Scientific data: points, polygons, satellite images
– Distributed applications
Software and Hardware:
– JDK 1.2
– QPC: Sun Ultra 60, Solaris 2.6
– DAPs: Sun Ultra 1, Sun Ultra 5, Solaris 2.6
– Data Sources
  2 Informix IUS 9.12 Server
– 10 Mbps Ethernet

Slide 23: Reducing vs. Inflating
(Chart: running time in seconds for QPC and DAP placement, by query class Q1, Q2, Q3)
Query classes
– Composite of all images
– Clipping and sub-setting
– Double resolution of images
Performance gains
– Composites
  99% data reduction
  4-1 better performance
– Clipping and expansion
  80% data reduction
  3-1 better performance
Validates heuristics

Slide 24: VRF vs Selectivity
Select graph identifiers based on number of vertices and arc length
Selectivity [HS93] and cardinality [HKWY97] are not enough for distributed predicate placement
Need to also consider size of arguments for predicates!
Consider 50% selectivity
– DAP CVRF = 0.01
– QPC CVRF = 1
(Chart: running time in seconds versus selectivity, for QPC and DAP placement)
VRF is a better metric
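A small calculation shows why selectivity alone can mislead. The sketch below assumes each tuple holds a small identifier plus a large graph argument, and that only identifiers of qualifying tuples are returned. The 8-byte identifier and 392-byte graph are hypothetical sizes chosen so that the output lines up with the slide's 0.01 and 1 figures; the slide itself does not give tuple sizes, and the function names are illustrative.

```python
def cvrf_at_dap(selectivity, id_bytes, arg_bytes):
    """Predicate runs at the DAP: only identifiers of qualifying tuples are sent."""
    return selectivity * id_bytes / (id_bytes + arg_bytes)


def cvrf_at_qpc(id_bytes, arg_bytes):
    """Predicate runs at the QPC: every identifier and its argument must be sent."""
    sent_per_tuple = id_bytes + arg_bytes
    return sent_per_tuple / (id_bytes + arg_bytes)


if __name__ == "__main__":
    ID_BYTES, GRAPH_BYTES = 8, 392  # hypothetical sizes, not from the slides
    print("DAP CVRF:", cvrf_at_dap(0.5, ID_BYTES, GRAPH_BYTES))
    print("QPC CVRF:", cvrf_at_qpc(ID_BYTES, GRAPH_BYTES))
```

Under these assumptions a predicate that keeps half of the tuples still cuts network volume a hundredfold when evaluated at the DAP, because the bulky argument never leaves the data source.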

Slide 25: Implementation Status
Operational System
– SIGMOD 2000 Demo
Experimental deployment of MOCHA
– NASA Earth Scientists (ESIP Federation)
– Goddard Space Flight Center
– NCSA Land Cover Visualization Tool

Slide 26: Summary and Conclusions
Proposed a new Middleware Architecture: MOCHA
– Automatic Code Deployment (self-extensible)
  Shipping Java classes
– Query processing based on data movement reduction
Proposed VRF metric for placement of functions
– Better than selectivity and result cardinality
Future work
– Deployment of MOCHA for NASA ESIP Federation
– Full implementation of MOCHA Optimizer
More Info:
–

Slide 27: Problem 2: Query Processing
Not scalable – inefficient evaluation of queries
(Diagram labels: Client, Internet, Integration Server, Catalog, Translator, Oracle 8i, Informix, XML Data, Text Data; data volumes shown: 100MB, 200MB)