Disklets
–Take streams as inputs and generate streams as outputs
–Streams are accessed using an interface that delivers data in buffers of known size
–Cannot allocate or free memory
–Have access only to pre-allocated buffers and scratch memory
–Cannot initiate I/O operations

Streams
–Disk-resident streams: files or ranges in files
–Host-resident streams: used by host-resident code to interact with disklets
–Pipe streams: used to pipe the results of one disklet into another
–Streams are accessed using an interface that delivers data in buffers whose size is known a priori
–Disklets must have at least one input stream and one output stream
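The three stream kinds can be illustrated with Python generators. This is a sketch under assumptions, not the Active Disks interface: the buffer size, the function names and the two toy disklets are hypothetical. A disk-resident stream hands out a file range in fixed-size buffers, two disklets are chained by a pipe stream, and host-resident code consumes the final output.

```python
from typing import Iterator

# Hypothetical buffer size; the slide only says it is known a priori.
BUFFER_SIZE = 4


def disk_resident_stream(data: bytes, start: int, end: int,
                         buf_size: int = BUFFER_SIZE) -> Iterator[bytes]:
    """Deliver the byte range [start, end) of a 'file' in fixed-size buffers."""
    for off in range(start, end, buf_size):
        yield data[off:min(off + buf_size, end)]


def upper_disklet(inp: Iterator[bytes]) -> Iterator[bytes]:
    """Toy disklet with one input and one output stream."""
    for buf in inp:
        yield buf.upper()


def drop_vowels_disklet(inp: Iterator[bytes]) -> Iterator[bytes]:
    """Second toy disklet, fed through a pipe stream."""
    for buf in inp:
        yield bytes(b for b in buf if b not in b"AEIOUaeiou")


def host_resident_sink(inp: Iterator[bytes]) -> bytes:
    """Host-resident code consuming the final output stream."""
    return b"".join(inp)


file_contents = b"active disks run disklets"
# Pipe stream: the output of upper_disklet is the input of drop_vowels_disklet.
pipeline = drop_vowels_disklet(upper_disklet(disk_resident_stream(file_contents, 0, 12)))
result = host_resident_sink(pipeline)
```

Here `result` is `b"CTV DSKS"`: bytes 0 to 12 (`b"active disks"`) are read in three 4-byte buffers, upper-cased, then stripped of vowels.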

Additional Properties of Disklets
–An initialization function that is run when the disklet is installed
–A processing function (read/write) that is run as data is read or written
–Long-term scratch space, a set of parameters to customize behavior, and a finalization function that is run when the disklet terminates
–A disklet cannot initiate I/O: all I/O operations are initiated by the host-resident program and checked for validity by the host-resident file system
  –Disklets cannot corrupt the file system
  –The OS layer on the disk need not provide file-system functionality
–A disklet is allowed to skip sub-ranges in its input stream by notifying the OS layer on the disk
  –This makes possible algorithms that use indexing through two streams: data delivered on the index stream is used to decide which parts of the data stream are read and which are skipped
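A minimal sketch of the disklet life cycle (initialization, processing, finalization) combined with the two-stream indexing pattern. All class and method names are hypothetical, and the sketch ignores the memory restrictions on real disklets. The index stream lists the data blocks to read in increasing order; the disklet tells the stand-in OS layer to skip everything else, so skipped blocks are never fetched.

```python
from typing import Dict, List, Sequence, Tuple


class DataStream:
    """Hypothetical view of a disk-resident data stream held by the on-disk
    OS layer, split into fixed-size blocks."""

    def __init__(self, blocks: Sequence[Sequence[int]]) -> None:
        self.blocks = blocks
        self.pos = 0          # next block the OS layer would read
        self.blocks_read = 0  # how many blocks were actually fetched

    def skip_to(self, block_no: int) -> None:
        # The disklet notifies the OS layer that blocks before block_no are not needed.
        if block_no < self.pos:
            raise ValueError("streams only move forward; the index must be sorted")
        self.pos = block_no

    def read(self) -> Sequence[int]:
        buf = self.blocks[self.pos]
        self.pos += 1
        self.blocks_read += 1
        return buf


class FilterDisklet:
    """Toy disklet exposing the three entry points from the slide."""

    def init(self, params: Dict[str, int]) -> None:
        # Run once at installation: read parameters, set up long-term scratch state.
        self.threshold = params["threshold"]
        self.matches = 0

    def process(self, buf: Sequence[int]) -> List[int]:
        # Run as each buffer is read; returns the output buffer.
        out = [v for v in buf if v > self.threshold]
        self.matches += len(out)
        return out

    def finalize(self) -> int:
        # Run when the disklet terminates.
        return self.matches


def run_indexed(disklet: FilterDisklet, index_stream: Sequence[int],
                data: DataStream, params: Dict[str, int]) -> Tuple[List[int], int]:
    """Each entry on the index stream names a data block to read; every
    block not named is skipped and never fetched."""
    disklet.init(params)
    output: List[int] = []
    for block_no in index_stream:
        data.skip_to(block_no)
        output.extend(disklet.process(data.read()))
    return output, disklet.finalize()
```

With blocks `[[1, 9], [2, 3], [10, 11], [0, 20]]`, index `[0, 2]` and threshold 5, only two of the four blocks are read.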

More Disklet Properties
–A disklet cannot allocate or free memory: all memory management is done by the operating-system layer on the disk
–Memory accesses must stay within a sandbox defined by the buffers for the input streams and the long-term scratch space
–The disklet binary is analyzed at download time; disklets that may violate memory safety are rejected
–Communication between a disklet and its environment is restricted to its input and output streams
–Sources and sinks are specified by the host-resident program as part of disklet installation
  –A disklet cannot determine where its input comes from or where its output goes
Figure: disklet pseudocode
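The figure's pseudocode is not reproduced here; instead, a hedged Python sketch of a select-style disklet that only touches memory handed to it. The record layout, function name and one-slot scratch list are hypothetical, and Python itself still allocates internally, so this shows the discipline (read input buffer, write pre-allocated output buffer, update scratch) rather than real sandboxing. A select never emits more bytes than it reads, so an output buffer as large as the input always suffices.

```python
import struct
from typing import List

# Hypothetical fixed-size record: (key: int32, value: int32), little-endian, 8 bytes.
RECORD = struct.Struct("<ii")


def select_disklet(in_buf: memoryview, out_buf: memoryview,
                   scratch: List[int], threshold: int) -> int:
    """Copy every record whose value exceeds `threshold` from the input
    buffer into the output buffer. Both buffers and the scratch slot are
    supplied by the OS layer; the disklet only reads and writes inside them.
    Returns the number of bytes written to out_buf."""
    if len(out_buf) < len(in_buf):
        raise ValueError("output buffer must be at least as large as the input")
    written = 0
    for off in range(0, len(in_buf) - RECORD.size + 1, RECORD.size):
        key, value = RECORD.unpack_from(in_buf, off)
        if value > threshold:
            RECORD.pack_into(out_buf, written, key, value)
            written += RECORD.size
    scratch[0] += written // RECORD.size  # long-term count of selected records
    return written
```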

DiskOS Services
–Memory management: the stream-based model simplifies memory management, because memory is allocated in contiguous blocks whose size and lifetime are known a priori
–Stream communication: all stream buffers are preallocated
–Disklet scheduling: a disklet is ready to run when new data is available on one or more of its streams
–Host-level support: installation of disklets and management of host-resident streams
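The scheduling rule ("ready when new data is available") can be sketched as a round-robin loop over per-disklet input queues. This is an illustration, not DiskOS code: the `Disklet` record, the queue naming and the `schedule` function are hypothetical, and each disklet is simplified to one input and one output stream. A queue named `"host"` stands for a host-resident stream.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, Dict, List


@dataclass
class Disklet:
    name: str                 # its input queue is keyed by this name
    fn: Callable[[Any], Any]  # processes one input buffer
    output: str               # name of the queue its output is appended to


def schedule(disklets: List[Disklet], queues: Dict[str, Deque[Any]]) -> List[str]:
    """Round-robin over disklets, running each one that is ready, i.e. that
    has a buffer waiting on its input stream. Stops when no disklet is ready.
    Returns the order in which disklets ran."""
    trace: List[str] = []
    progress = True
    while progress:
        progress = False
        for d in disklets:
            if queues[d.name]:
                buf = queues[d.name].popleft()
                queues[d.output].append(d.fn(buf))
                trace.append(d.name)
                progress = True
    return trace
```

For a pipe of a "double" disklet into an "inc" disklet fed with `[1, 2]`, the two disklets alternate and the host stream receives `[3, 5]`.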

Utility of Active Disks
–Figure 4: compares conventional disks and Active Disks for 4-disk and 32-disk configurations
  –Select and Groupby are not helped much by Active Disks: they perform little computation per byte of data
  –Cube, Sort, Conv and Earth show major improvements, at least in part due to the ability to distribute processing
–Figure 5: impact of variations in interconnect bandwidth (40 MB/s Ultra-SCSI, 200 MB/s Fibre Channel, 400 MB/s)
  –Cube, Sort, Conv and Earth are compute-limited, so bandwidth does not matter
  –Active Disks help Select, Groupby and Cube: these are bandwidth-limited with conventional disks but not with Active Disks
–Figure 7: scalability
–Figure 8: impact of variation in the central processor
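Why a low-computation task such as Select can be bandwidth-limited with conventional disks but not with Active Disks can be seen from a back-of-the-envelope bottleneck model. The model below is an assumption for illustration, not the one used in the evaluation: a run takes as long as its slowest stage, conventional disks ship every byte over the interconnect to the host, and Active Disks filter locally so only a `selectivity` fraction crosses the interconnect. All rates except the interconnect bandwidths on the slide are hypothetical.

```python
def conventional_time(data_mb: float, n_disks: int, disk_mb_s: float,
                      link_mb_s: float, host_mb_s: float) -> float:
    # Every byte crosses the interconnect and is processed by the host CPU.
    transfer = data_mb / min(n_disks * disk_mb_s, link_mb_s)
    compute = data_mb / host_mb_s
    return max(transfer, compute)


def active_disk_time(data_mb: float, n_disks: int, disk_mb_s: float,
                     link_mb_s: float, disk_cpu_mb_s: float,
                     selectivity: float) -> float:
    # Each disk reads and filters its own data; only the selected fraction
    # of the data crosses the interconnect.
    read = data_mb / (n_disks * disk_mb_s)
    compute = data_mb / (n_disks * disk_cpu_mb_s)
    transfer = data_mb * selectivity / link_mb_s
    return max(read, compute, transfer)
```

With 1000 MB spread over 4 disks at 20 MB/s each and a 40 MB/s interconnect, the conventional run is link-bound (25 s) while the Active Disk run is bound only by the disks themselves (12.5 s); raising the interconnect to 200 MB/s brings the conventional run down to 12.5 s as well, so the advantage disappears once bandwidth stops being the bottleneck.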

MOCHA
–Ships Java code implementing query operators and user-defined functions
–Query plans push data-reducing operations to the data source sites, while data-inflating operations are executed at client sites
–Implemented in Java; runs on top of Informix and Oracle
–Data integration server: provides client applications with a uniform view of, and access mechanisms to, the data at each source
  –Imposes a global data model on top of the local data model used by each source
  –Database server configured to access a remote data source through a database gateway
  –Mediator used as the integration server: wrappers access information from the data sources and translate it into the global model

More MOCHA
–User-defined, application-specific data types and query operators are contained in libraries that must be linked to clients, integration servers, gateways or wrappers
–MOCHA targets optimized execution of:
  –Complex data types and query operators not provided by commercial systems
  –User-defined functions
–Ships code for:
  –Data-reducing operators, i.e. filters: aggregates, predicates, data-mining operators
  –Data-inflating operators: decompression

Applications
–Integration of sites with images, audio, text, objects and programs
  –Invoke objects and user-defined functions
  –Efficiently execute user-defined queries
–Consider an earth science application that manipulates distributed data
  –Assume one site per state
  –Schema: Rasters(time: Integer, band: Integer, location: Rectangle, image: Raster) stores weekly energy readings from satellites; time is the week number, band is the energy band, location is the rectangle covering the region under study, and image is the raster image
  –The Rectangle and Raster classes need to be implemented at each site
  –Ongoing local changes to these classes need to be tracked

Data Shipping vs. Query Shipping
–Data shipping
  –Most operators in a query are evaluated by the integration server at an integration site
  –Wrappers and gateways are used to extract data items from the sources and translate them into the middleware schema for further processing
  –Cannot assume that all sites have the same ability to process queries
–Query shipping
  –One or more query operators are evaluated at the data source, and the results are sent back to the integration server
–Hybrid shipping: combines data and query shipping
–Processing at a data source can only be carried out by operators already implemented there
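The constraint that a source can only run operators it already implements can be turned into a simple hybrid-shipping split for a linear operator pipeline. This is a hypothetical sketch, not MOCHA's planner: the longest prefix of operators implemented at the source is query-shipped, and everything from the first missing operator onward runs at the integration server on shipped data. The operator names are illustrative.

```python
from typing import List, Sequence, Set, Tuple


def split_pipeline(pipeline: Sequence[str],
                   source_ops: Set[str]) -> Tuple[List[str], List[str]]:
    """Hybrid shipping for a linear pipeline of operators (bottom-up order).
    Returns (operators run at the source, operators run at the integration
    server)."""
    at_source: List[str] = []
    for i, op in enumerate(pipeline):
        if op not in source_ops:
            # From here on, inputs must be shipped to the integration server.
            return at_source, list(pipeline[i:])
        at_source.append(op)
    return at_source, []
```

If a source lacks `AvgEnergy`, the plan `scan → filter → AvgEnergy → project` degrades to query shipping for `scan` and `filter` and data shipping for the rest, even though the source could have run `project`.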

Example of a Query Involving a Data-Reducing Operator
SELECT time, location, AvgEnergy(image)
FROM Rasters
WHERE AvgEnergy(image) < 100
–Assume 200 entries in table Rasters; image has size 1 MB; time and band are 4 bytes each; location is 16 bytes; AvgEnergy returns an 8-byte double-precision value
–Evaluate the query at the data source: each result row is 4 + 16 + 8 = 28 bytes, so in the worst case (all 200 rows qualify) you move 200 × 28 = 5,600 bytes, about 5.6 KB
–Evaluate the query at the client: you have to move all 200 images, about 200 MB
–To evaluate the query at the data source, AvgEnergy must be implemented there
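The data-volume arithmetic of this example, written out so the two placements can be compared directly. Treating 1 MB as 10^6 bytes is an assumption; the function names are illustrative.

```python
ROWS = 200
IMAGE_BYTES = 1_000_000   # "1 MB", taken here as 10**6 bytes
TIME_BYTES = 4
LOCATION_BYTES = 16
AVG_ENERGY_BYTES = 8      # double-precision result of AvgEnergy


def bytes_if_evaluated_at_source(qualifying_rows: int) -> int:
    # Only the projected result travels: time, location, AvgEnergy(image).
    return qualifying_rows * (TIME_BYTES + LOCATION_BYTES + AVG_ENERGY_BYTES)


def bytes_if_evaluated_at_client(rows: int = ROWS) -> int:
    # The client must compute AvgEnergy(image) itself, so time, location and
    # the full image of every row travel (band is not referenced by the query).
    return rows * (TIME_BYTES + LOCATION_BYTES + IMAGE_BYTES)
```

In the worst case the source-side plan moves 5,600 bytes and the client-side plan 200,004,000 bytes, a ratio of exactly 35,715.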

MOCHA Architecture Details
–Major components of MOCHA (overview in Figure 1)
–Client application: applets, servlets, stand-alone client applications
–Query Processing Coordinator (QPC) (Figure 2)
  –Controls the execution of all queries and commands
  –Parses and optimizes queries, and monitors the execution process
  –Provides access to a repository containing function classes and metadata
  –Provides access to distributed data sites modeled as object-relational sources
  –Infrastructure to carry out SQL queries posed over distributed data sources
  –Processes queries over XML repositories
  –Procedural interface through which HTTP requests, FTP downloads and file-system access requests can access data sources
  –Extensible query engine based on iterators: iterators are used to carry out local selections, local joins, remote selections, distributed joins, sorting, etc.
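An iterator-based engine composes operators that each expose open, next and close, with `next()` returning one row at a time. The Python sketch below is a generic iterator model under that assumption, not MOCHA's Java classes: `Scan`, `Select`, `Project` and `run` are hypothetical names, and `None` marks the end of a stream.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Row = Tuple[Any, ...]


class Scan:
    """Leaf iterator over an in-memory table (stand-in for a data source)."""

    def __init__(self, rows: Sequence[Row]) -> None:
        self.rows = rows
        self.pos = 0

    def open(self) -> None:
        self.pos = 0

    def next(self) -> Optional[Row]:
        if self.pos >= len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self) -> None:
        pass


class Select:
    """Passes through only the rows that satisfy the predicate."""

    def __init__(self, child: Any, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        row = self.child.next()
        while row is not None and not self.predicate(row):
            row = self.child.next()
        return row

    def close(self) -> None:
        self.child.close()


class Project:
    """Keeps only the listed column positions."""

    def __init__(self, child: Any, columns: Sequence[int]) -> None:
        self.child = child
        self.columns = columns

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        row = self.child.next()
        return None if row is None else tuple(row[c] for c in self.columns)

    def close(self) -> None:
        self.child.close()


def run(root: Any) -> List[Row]:
    """Drive a plan: open it, pull rows until end of stream, close it."""
    root.open()
    out: List[Row] = []
    row = root.next()
    while row is not None:
        out.append(row)
        row = root.next()
    root.close()
    return out
```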

More Architectural Details
–Data Access Provider (DAP)
  –Uniform access mechanism to a remote data source
  –Extensible query execution engine that can load and use application-specific code obtained from the network with the help of the QPC
  –The DAP runs close to the data source
  –MOCHA pushes user code and queries down to the DAP
  –Figure 3
–Data Server
  –Stores the data for a particular data site
  –Support for object-relational systems (Oracle 8i, Informix) and flat-file systems

More Architectural Details
–Catalog
  –Metadata about user-defined function types, user-defined operators, the selectivity of various operators, and views defined over data sources
  –Views, data types and operators are uniquely identified by a Uniform Resource Identifier (URI)
  –Encoded in the Resource Description Framework (RDF)
–Data
  –The MWObject interface identifies a class as one implementing a MOCHA data type
  –It specifies the methods used to read and write each data value to and from the network
  –The MWLargeObject and MWSmallObject interfaces partition objects into two groups: large objects and small objects
  –Figure 5
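A Python analogue of the MWObject idea: every MOCHA data type knows how to write its values to, and read them back from, a network stream. MOCHA's real interfaces are Java, and the method names `write_to`/`read_from` and the wire format below are hypothetical. The Rectangle type from the Rasters schema is used as a small object; encoding it as four 32-bit floats happens to match the 16 bytes assumed for location in the query example.

```python
import io
import struct
from abc import ABC, abstractmethod
from typing import BinaryIO


class MWObject(ABC):
    """Python analogue of MOCHA's MWObject interface (method names hypothetical)."""

    @abstractmethod
    def write_to(self, out: BinaryIO) -> None:
        """Serialize this value onto the network stream."""

    @classmethod
    @abstractmethod
    def read_from(cls, inp: BinaryIO) -> "MWObject":
        """Rebuild a value from the network stream."""


class MWSmallObject(MWObject, ABC):
    """Marks types whose values are small (MWLargeObject would mark large ones)."""


class Rectangle(MWSmallObject):
    # Hypothetical wire format: four big-endian 32-bit floats = 16 bytes.
    _FMT = struct.Struct(">4f")

    def __init__(self, x1: float, y1: float, x2: float, y2: float) -> None:
        self.coords = (x1, y1, x2, y2)

    def write_to(self, out: BinaryIO) -> None:
        out.write(self._FMT.pack(*self.coords))

    @classmethod
    def read_from(cls, inp: BinaryIO) -> "Rectangle":
        return cls(*cls._FMT.unpack(inp.read(cls._FMT.size)))

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Rectangle) and self.coords == other.coords
```

Using an explicit, type-owned encoding rather than a generic serializer mirrors the slide's point that each MOCHA type specifies its own read/write methods.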

Automatic Code Deployment
–Compiled Java classes are shipped
–When the administrator incorporates a new or updated data type, he or she:
  –Stores the Java class in a well-known code repository
  –Registers the new type or operator by adding entries to the system catalog giving the name of the type or operator, its associated URI, and other information such as version number and user privileges
–On a request from a client:
  –The QPC generates a list of the data types and operators needed to process the query
  –The QPC accesses the catalog and maps each type or operator to the specific class that implements it
  –Each class is retrieved from the code repository by the QPC's code loader
  –The QPC distributes the pieces of the plan to be executed by each of the DAPs running on the targeted data sites
  –The QPC ships classes to the client and the DAPs, then ships the classes for the query operators to be executed by the DAPs
  –Figure 4
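The two halves of the deployment procedure (administrator registration, then per-query lookup by the QPC) can be sketched with a catalog keyed by name and a code repository keyed by class name. Everything here is hypothetical: the entry fields follow the slide, but the class names, the `example.org` URIs and the placeholder byte strings standing in for compiled classes are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    name: str        # name of the type or operator
    uri: str         # unique identifier (URI)
    class_name: str  # implementing class
    version: int


class Catalog:
    def __init__(self) -> None:
        self._by_name: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # A new or updated entry replaces any earlier one with the same name.
        self._by_name[entry.name] = entry

    def lookup(self, name: str) -> CatalogEntry:
        return self._by_name[name]  # KeyError if never registered


def install(catalog: Catalog, repository: Dict[str, bytes],
            entry: CatalogEntry, compiled: bytes) -> None:
    """Administrator step: store the class in the repository, then register it."""
    repository[entry.class_name] = compiled
    catalog.register(entry)


def classes_for_query(needed: List[str], catalog: Catalog,
                      repository: Dict[str, bytes]) -> List[Tuple[str, bytes]]:
    """QPC step: map each needed type or operator to its implementing class
    and load that class's code from the repository, ready to ship."""
    shipment: List[Tuple[str, bytes]] = []
    for name in needed:
        entry = catalog.lookup(name)
        shipment.append((entry.uri, repository[entry.class_name]))
    return shipment
```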

Query Operators
–Projections and predicates (Figure 6a)
–Accumulators (Figure 6b): Reset, Update, Summarize
–Memory management
  –Object preallocation and reuse
  –The iterator creates one structure to buffer the columns read from the database
  –One column to store the results returned by each call to Next()
–Communications
  –Java RMI was used to marshal and unmarshal objects; this was inefficient and sometimes gave incorrect results
  –Instead, the methods associated with MWObject are used to marshal and unmarshal
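The Reset/Update/Summarize protocol, combined with object reuse, can be sketched as an average accumulator that is allocated once and reset at each group boundary. The Python names are hypothetical, and the input is assumed to arrive sorted by group key.

```python
from typing import Any, Iterable, List, Optional, Tuple


class AvgAccumulator:
    """Accumulator with the three entry points named on the slide."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Start a new group without allocating a new accumulator.
        self.total = 0.0
        self.count = 0

    def update(self, value: float) -> None:
        self.total += value
        self.count += 1

    def summarize(self) -> Optional[float]:
        return self.total / self.count if self.count else None


_NO_GROUP = object()  # sentinel so that any key value, even None, is a valid group


def grouped_avg(rows: Iterable[Tuple[Any, float]]) -> List[Tuple[Any, Optional[float]]]:
    """Average per group over rows sorted by key, reusing one accumulator."""
    acc = AvgAccumulator()  # preallocated once and reused for every group
    results: List[Tuple[Any, Optional[float]]] = []
    current: Any = _NO_GROUP
    for key, value in rows:
        if key != current:
            if current is not _NO_GROUP:
                results.append((current, acc.summarize()))
            acc.reset()
            current = key
        acc.update(value)
    if current is not _NO_GROUP:
        results.append((current, acc.summarize()))
    return results
```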

Query Processing
–Cost-based approach: evaluation of data-reducing operators is moved to the DAPs running on the data sites; evaluation of data-inflating operators is moved to the QPC
–The execution cost of an operator X is approximated as
  Cost(X) = CompCost(X) + NetworkCost(X)
  –CompCost(X): total cost of computing X over input relation R
  –NetworkCost(X): total cost of data movement while executing X on R
    –If X is evaluated at a DAP, this is the cost of moving to the QPC the results generated by applying X to all tuples in R
    –If X is evaluated at the QPC, this is the cost of moving to the QPC each of the arguments to X in each of the tuples in R
–Volume reduction factor: total volume transmitted after applying X to R, divided by the total volume in R
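The cost formula can be applied directly to decide where to run a single operator. This sketch makes two simplifying assumptions that are not from the slides: network cost is volume divided by a fixed bandwidth, and per-tuple argument and result sizes are constant. The function names are hypothetical. A data-reducing operator (AvgEnergy: 1 MB image in, 8 bytes out) lands on the DAP; a data-inflating one (decompression: 100 KB in, 1 MB out) lands on the QPC.

```python
def network_cost(volume_bytes: float, bandwidth_bytes_per_s: float) -> float:
    # Assumption: moving data costs volume / bandwidth seconds.
    return volume_bytes / bandwidth_bytes_per_s


def cost_at_dap(n_tuples: int, comp_per_tuple: float,
                result_bytes: float, bw: float) -> float:
    # CompCost(X) + cost of shipping X's results for all tuples to the QPC.
    return n_tuples * comp_per_tuple + network_cost(n_tuples * result_bytes, bw)


def cost_at_qpc(n_tuples: int, comp_per_tuple: float,
                arg_bytes: float, bw: float) -> float:
    # CompCost(X) + cost of shipping X's arguments for all tuples to the QPC.
    return n_tuples * comp_per_tuple + network_cost(n_tuples * arg_bytes, bw)


def volume_reduction_factor(result_bytes: float, arg_bytes: float) -> float:
    # Volume transmitted after applying X over volume before (per tuple here).
    return result_bytes / arg_bytes


def place(n_tuples: int, comp_dap: float, comp_qpc: float,
          arg_bytes: float, result_bytes: float, bw: float) -> str:
    """Return where X is cheaper to evaluate under this cost model."""
    dap = cost_at_dap(n_tuples, comp_dap, result_bytes, bw)
    qpc = cost_at_qpc(n_tuples, comp_qpc, arg_bytes, bw)
    return "DAP" if dap <= qpc else "QPC"
```

A volume reduction factor below 1 marks a data-reducing operator and one above 1 a data-inflating operator, which is why the placement rule on the slide falls out of the cost comparison when computation costs are similar at both sites.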

Optimization Algorithm
–Cumulative volume reduction factor: CVRF(P) = CVDT / CVDA
  –CVDT: total data volume to be transmitted over the network after applying all the operators in plan P to R1, …, Rn
  –CVDA: total data volume in R1, …, Rn
  –The goal is to minimize CVRF
–The algorithm in Figure 7 is a heuristic that attempts to do this
  –Plans for single-relation expressions are selected so as to best place complex functions
  –Complex predicates are sorted in increasing order of a metric involving selectivity and computational cost
  –Once the single-table access plans are built, the algorithm (Figure 7a) explores all the different possibilities for performing a join, incrementally building a left-deep plan in which a new relation Rj is added to an existing join plan Sj for a subset of the relations
  –After the join plan is complete, the algorithm places the complex operators
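Two pieces of the heuristic are small enough to sketch: computing CVRF and ordering complex predicates. The slides do not give the exact ordering metric; the sketch assumes the rank (selectivity − 1) / cost-per-tuple known from predicate-migration work, sorted in increasing order so that cheap, highly selective predicates run first. The `Predicate` record and its example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Predicate:
    name: str
    selectivity: float     # fraction of tuples that pass, 0 < s <= 1
    cost_per_tuple: float  # computational cost per tuple, > 0


def rank(p: Predicate) -> float:
    # Assumed metric, not necessarily MOCHA's: cheap predicates that discard
    # many tuples get the most negative rank.
    return (p.selectivity - 1.0) / p.cost_per_tuple


def order_predicates(preds: Sequence[Predicate]) -> List[Predicate]:
    # Evaluate predicates in increasing order of the metric.
    return sorted(preds, key=rank)


def cvrf(cvdt_bytes: float, relation_bytes: Sequence[float]) -> float:
    # CVRF(P) = CVDT / CVDA, where CVDA is the total volume of R1, ..., Rn.
    return cvdt_bytes / sum(relation_bytes)
```

Under this metric an expensive predicate such as AvgEnergy is evaluated last even if it is very selective, because its per-tuple cost dominates the rank.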