1 The Future of MOCHA Nick Roussopoulos October 5, 2001

2 Stanford Oct 5, 2001 Nick Roussopoulos 2 The Problem
Data sources for an enterprise are:
– Distributed: Internet, intranets, extranets
– Heterogeneous: Web servers, relational databases, file systems
– Mission-critical: weather service, ocean temperature, stock status, …
– Costly to replace or upgrade: risk of breaking them and losing the investment
Distributed and heterogeneous data sources

3 The Problem
[Diagram labels: Client, Internet, Oracle 8i, Informix, XML Data, Text Data]
High volume access from everywhere

4 Client-Server
2-tier architecture: complex, FAT clients
Bad idea
[Diagram labels: Client, Internet, Oracle 8i, Informix, XML Data, Text Data]

5 Middleware
3-tier architecture
Thin & fit clients
[Diagram labels: Client, Internet, Integration Server, Catalog, Translator, Oracle 8i, Informix, XML Data, Text Data]

6 Nice but…
Most middleware solutions are static
– Not flexible for dynamic environments
– Not scalable to hundreds of client and server sites
Development cost is high
– One site at a time, at a fixed cost
Maintenance cost is high
– Upgrades are practically redevelopments

7 A dynamic world needs
Code extensibility & auto-deployment
Need for user-defined types and functions
– Polygon
– Composite(): image aggregation
Porting and manual installation of code (C/C++)
– Operating system
– Hardware platform
High cost of code maintenance
– Updates on all platforms
– Version management
Security in hostile platforms

8 Code Deployment Problem
[Diagram labels: Client, Internet, Integration Server, Catalog, Translator, Oracle 8i, Informix, XML Data, Text Data]
Not scalable

9 Query Processing
Query execution options
– Limited by site-dependent software: Composite() must be ported before use
Most processing is done at the Integration Server
– Powerful Data Servers are under-utilized as I/O nodes
– Excessive data movement over the network: network bottleneck, slow Internet access

10 Query Processing Problem
[Diagram labels: Client, Internet, Integration Server, Catalog, Translator, Oracle 8i, Informix, XML Data, Text Data; data volumes moved: 100MB, 200MB]
Inefficient & not scalable

11 Solution: MOCHA
Middleware Based On a Code SHipping Architecture

12 MOCHA Solution: Ship Java Code
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
[Diagram labels: Client, QPC, Code Repository, Catalog, Internet, DAP, Oracle, Informix, Mochlets; sites: Virginia, Maryland, Virginia, Texas]
No code porting & no maintenance

13 MOCHA Solution: Filter Data @ Source
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
[Diagram labels: Client, QPC, Code Repository, Catalog, Internet, DAP, Oracle, Informix; sites: Virginia, Maryland, Virginia, Texas; data volumes: 200MB tuples, 100MB tuples, results 200KB, results 150KB, results 350KB]
No bandwidth waste
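The volumes on this slide can be turned into a quick bandwidth comparison. The sketch below is only an illustration: the site names, the decimal unit conversion, and the pairing of each result size with a particular source are assumptions, since the diagram itself did not survive in the transcript.

```python
# Volumes read off the slide. Decimal units (1 MB = 1000 KB) and the
# site names are assumptions made only for this illustration.
KB_PER_MB = 1000

raw_kb = {"site_a": 200 * KB_PER_MB, "site_b": 100 * KB_PER_MB}  # tuples stored at each source
result_kb = {"site_a": 200, "site_b": 150}                        # Composite() output per source

# Data shipping: every tuple crosses the network to the middle tier.
data_shipping_kb = sum(raw_kb.values())
# Code shipping: Composite() runs at the source, so only results cross the network.
code_shipping_kb = sum(result_kb.values())

saved_fraction = 1 - code_shipping_kb / data_shipping_kb

print(f"data shipping moves {data_shipping_kb} KB")
print(f"code shipping moves {code_shipping_kb} KB")
print(f"network traffic avoided: {saved_fraction:.2%}")
```

Under these assumptions the two DAPs together send 350KB instead of 300MB, which is the point the slide makes with "No bandwidth waste".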

14 Software architecture
[Diagram labels: Client, QPC, Code Repository, Catalog, DAP, DBMS, OS, File]

15 QPC: The Query Processing Coordinator
The QPC controls and coordinates query execution
[Diagram labels: Client API, Query Parser, Catalog Manager, Query Optimizer, Execution Engine, Code Loader, SQL & XML Proc. Interface, DAP Access API, XML Catalog, Code Repository, DAP]

16 DAP: The Data Access Provider
The DAP provides the QPC with remote access to the data
[Diagram labels: DAP Access API, Control Module, Execution Engine, Code Loader, SQL & XML Proc. Interface, Data Source Access Layer (JDBC, I/O API, DOM, JNI), Data Source]

17 Data Server: Storage System
Stores and manages the data sets
– database, web server, file system, XML repository
[Diagram label: Data Server]

18 Processing a Query in MOCHA
1. Query Parsing
2. Resource Discovery
3. Query Optimization
4. Metadata and Control Exchange
5. Code Deployment Phase
6. Query Execution
Table Rasters (location, image, week, band)
Query:
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location

19 Plan Generation
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
[Diagram labels: Client, QPC, Code Repository, Catalog, DAP, Informix, Oracle, Coordination Thread, Execution Thread]

20 Automatic Code Deployment
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
[Diagram labels: Client, QPC, Code Repository, Catalog, DAP, Informix, Oracle, Coordination Thread, Execution Thread]

21 Data Processing
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
[Diagram labels: Client, QPC, Code Repository, Catalog, DAP, Informix, Oracle, Coordination Thread, Execution Thread]

22 Features of MOCHA
Automatic code deployment
– “Plug-N-Play”
– no system-wide installations
Metadata and schema mapping framework
– XML, RDF
– easy to exchange and map schemas
– semi-automatic mapping
Query optimization based on code shipping
– reduce data movement overhead: filters at the source, expands at the client
– metrics for code (operator) placement
– optimization for selection, union and join plans

23 MOCHA Demo: Global Land Cover Facility
Integrates the following DAP sites
– University of New Hampshire (Webster), NASA GSFC, UMD-CS, UMD-Geography, UMD-UMIACS SP-2 HPSS
GLCF hosts the QPC
Operations supported:
– Coverage queries
– Visualization of preview images for data sets (MODIS, TM, AVHRR) and GIS features
Dynamic sub-setting of TM scenes
Composites of GIS features and AVHRR images

24 Multi-Sensor Analysis of the Los Alamos Fire Event Using MOCHA
Data synergy and multi-resolution instrument analysis using MOCHA
– Access data residing at various data sources
– Utilize image processing tools
Fire analysis required a multi-resolution approach
– MOCHA is independent of instrument or resolution specifics
High resolution: IKONOS and TM data
Moderate resolution: 250m MODIS
Coarse resolution: AVHRR and DMSP

25 MOCHA Search Utility

26 MOCHA Search Utility (cont’d)

27 MOCHA Search Utility (cont’d)

28 MOCHA Query Results

29 MOCHA ETM+ Subsetting Utility

30 May 9, 2000 Los Alamos (Bands 1,2,3)

31 May 9, 2000 Los Alamos (Bands 7,5,4)

32 Multi-Sensor Query

33 Tabular Query Results

34 MODIS: May 11, 2000: During Fire

35 MODIS: May 24, 2000: After Fire

36 DMSP: Night Visibility of Fire

37 IKONOS 4m resolution

38 IKONOS 4m Subset

39 IKONOS 1m resolution

40 IKONOS 1m Subset

41 MOCHA Metadata Publishing Framework
Provides information about system resources
– data sources
– schemas and mappings
– user-defined types and functions
Automates operation of MOCHA
Incremental system growth
– neither fixed nor hardwired parameters
– no extension by re-compilation
Share metadata with others (Internet)
– machine-readable form

42 Metadata Publishing Framework
Query:
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
Table Rasters (location, image, week, band)
1. What kind of metadata is needed?
2. How to specify it?

43 MOCHA Catalog Organization
Metadata about “resources”
– Local and global tables
– UDF data types and operators
– Schema mapping rules
– DAPs
Each one has a Uniform Resource Identifier (URI)
– global namespace
– e.g.: mocha://cs1.umd.edu/EarthSci/Polygon
Modeled with RDF, serialized with XML
– easy to understand, use and exchange

44 RDF Model: Data Types
Resource: mocha://cs1.umd.edu/EarthSci/Raster
– mocha:Type: Raster
– mocha:Class: Raster.class
– mocha:Repository: cs1.umd.edu/EarthSci
– mocha:Size: 1 megabyte
– mocha:Creator: user1@cs.umd.edu

45 XML Serialization: Data Types
W3C standards
Easy to specify using GUI tools
Easy to exchange
Crawlers can harvest it
Stored in
– DB
– File system
<rdf:Description about="mocha://cs1.umd.edu/EarthSci/Raster">
  <mocha:Type>Raster</mocha:Type>
  <mocha:Class>Raster.class</mocha:Class>
  <mocha:Repository>cs1.umd.edu/EarthSci</mocha:Repository>
  <mocha:Size>1 MB</mocha:Size>
  <mocha:Creator>user1@cs1.umd.edu</mocha:Creator>
</rdf:Description>
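A catalog entry of this shape can be produced with a few lines of code. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, not MOCHA's own Java tooling. The `mocha` namespace URI and the helper name `describe_resource` are hypothetical, and the sketch writes the attribute as `rdf:about` rather than the unqualified `about` shown on the slide.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # the W3C RDF syntax namespace
MOCHA_NS = "http://example.org/mocha#"                    # hypothetical; the slides do not give one

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("mocha", MOCHA_NS)


def describe_resource(uri, properties):
    """Build an rdf:RDF element holding one rdf:Description for `uri`."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                                {f"{{{RDF_NS}}}about": uri})
    for name, value in properties.items():
        # Each catalog property becomes one child element with a text value.
        ET.SubElement(description, f"{{{MOCHA_NS}}}{name}").text = value
    return root


raster = describe_resource(
    "mocha://cs1.umd.edu/EarthSci/Raster",
    {
        "Type": "Raster",
        "Class": "Raster.class",
        "Repository": "cs1.umd.edu/EarthSci",
        "Size": "1 MB",
        "Creator": "user1@cs1.umd.edu",
    },
)
xml_text = ET.tostring(raster, encoding="unicode")
print(xml_text)
```

Because the entry is plain XML, the same text can be stored in a database or file system and parsed back by any consumer, which matches the "easy to exchange" claim on the slide.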

46 Other Resources in MOCHA
Local and global tables
– data sources + columns + types
UDF functions
– argument types + return type
– code repository
Schema mapping rules
DAPs
– URL
– login information

47 Schema Mapping in MOCHA
[Diagram: table Rasters (location, image, week, band) and table RastersMD (point1, point2, photo, date, band)]
– Direct column mappings
– Complex expressions: rect(), week()

48 MOCHA Schema Mapping Rules
Use XML to encode mapping rules
Schema mapping sub-plans (SMP)
– leaf nodes of the plan tree
– image ← photo
– location ← rect(point1, point2)
– …
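The mapping from RastersMD to Rasters can be pictured as one rule per global column. The sketch below is an assumption-laden illustration: MOCHA encodes its rules in XML, whereas this uses a plain Python dictionary, and the bodies of `rect()` and `week()`, the sample row, and the helper `to_global` are all hypothetical.

```python
from datetime import date


def rect(point1, point2):
    """Hypothetical stand-in for rect(): a bounding box as a pair of corners."""
    return (point1, point2)


def week(day):
    """Hypothetical stand-in for week(): the ISO week number of a date."""
    return day.isocalendar()[1]


# One rule per global column of Rasters; each rule reads a local RastersMD row.
MAPPING_RULES = {
    "location": lambda row: rect(row["point1"], row["point2"]),  # complex expression
    "image": lambda row: row["photo"],                           # direct column mapping
    "week": lambda row: week(row["date"]),                       # complex expression
    "band": lambda row: row["band"],                             # direct column mapping
}


def to_global(local_row):
    """Apply every mapping rule to turn a RastersMD row into a Rasters row."""
    return {column: rule(local_row) for column, rule in MAPPING_RULES.items()}


# Made-up sample row, used only to exercise the rules.
local_row = {
    "point1": (35.8, -106.4),
    "point2": (36.0, -106.2),
    "photo": b"raw image bytes",
    "date": date(2000, 5, 9),
    "band": 3,
}
global_row = to_global(local_row)
print(global_row["location"], global_row["week"], global_row["band"])
```

Keeping the rules as data rather than code is what lets them be attached to a query plan and extended without recompiling the system.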

49 Query Optimization Problem
Issue 1: Cost of query execution
– What is the dominant factor?
Issue 2: Placement of UDF operator execution
– Which go to the QPC?
– Which go to the DAPs?
Issue 3: How to generate query plans?
– Dynamic programming [SAC+79], [ML86]
– But the search space is enormous and full of “bad” plans: placement of UDFs, joins, execution sites, …

50 MOCHA Optimization Framework
Query optimization based on heuristics
cost = network + CPU + I/O
– Network is the dominant factor (WAN): optimize for it first
– CPU and I/O are cheaper: optimize for them later
Operator placement: Enhanced Hybrid Shipping
[Diagram labels: Code, Data]

51 Operator Placement in MOCHA
Data-Reducing Operators
– “Filter” the data
– aggregates, predicates, projections, semi-joins
– e.g. Composite(), Overlaps(), AvgEnergy()
Push to the DAPs
– Return distilled results
– Less data movement
[Diagram label: Composite()]

52 Operator Placement in MOCHA
Data-Inflating Operators
– “Expand” the data
– projections, image processing, some joins, …
– e.g. DoubleResolution(), RotateSolid()
Pull to the QPC
– Data Shipping policy [FJK96]
– Only send back raw arguments
– Less data movement
[Diagram label: DoubleRes()]

53 Placement Metric: VRF
Volume Reduction Factor: given operator f and relation R,
VRF(f) = VDT / VDA
– VDT: volume of data transmitted after applying f to R
– VDA: volume of data originally present in R
f is Data-Reducing ⇔ VRF < 1, e.g. Composite()
f is Data-Inflating ⇔ VRF ≥ 1, e.g. DoubleRes()
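The placement rule is small enough to sketch directly, taking VRF as the ratio VDT / VDA. The function names and the sample volumes below are assumptions for illustration only. The sketch also ignores the plan cost that the MOCHA optimizer weighs alongside VRF.

```python
def vrf(vdt, vda):
    """Volume Reduction Factor: volume transmitted after f / volume originally in R."""
    if vda <= 0:
        raise ValueError("VDA must be positive")
    return vdt / vda


def placement(vrf_value):
    """Data-reducing operators go to the DAP, data-inflating ones to the QPC."""
    return "DAP" if vrf_value < 1 else "QPC"


# Illustrative volumes in KB; these numbers are assumptions, not measurements.
composite_vrf = vrf(vdt=200, vda=200_000)        # aggregation shrinks the data
double_res_vrf = vrf(vdt=800_000, vda=200_000)   # doubling both dimensions quadruples pixels

for name, value in [("Composite", composite_vrf), ("DoubleRes", double_res_vrf)]:
    print(f"{name}: VRF = {value:g}, run at {placement(value)}")
```

An operator with VRF exactly 1 neither shrinks nor grows the data, and the rule above sends it to the QPC along with the data-inflating operators.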

54 Goal: Plans with small CVRF
Cumulative Volume Reduction Factor: given a plan P to solve query Q over relations R1, …, Rn,
CVRF(P) = CVDT / CVDA
– CVDT: volume of data transmitted by applying all operators in P to R1, …, Rn
– CVDA: volume of data originally present in R1, …, Rn
The optimizer searches for plans that move a minimal amount of data.
[Diagram: search space, CVRF(Plan) ∈ [0,1]]
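Choosing among candidate plans by CVRF can be sketched as follows, taking CVRF as CVDT / CVDA. The relation names, volumes, and plan labels are hypothetical. A real optimizer would enumerate plans rather than receive them in a dictionary.

```python
def cvrf(transmitted, original):
    """Cumulative VRF: total volume a plan transmits / total volume in its relations."""
    total_original = sum(original.values())
    if total_original <= 0:
        raise ValueError("relations must hold some data")
    return sum(transmitted.values()) / total_original


# Volumes in KB for three relations; all numbers are illustrative assumptions.
original = {"R1": 500_000, "R2": 250_000, "R3": 250_000}

candidate_plans = {
    "all operators at DAPs": {"R1": 5_000, "R2": 2_500, "R3": 2_500},
    "mixed placement": {"R1": 5_000, "R2": 250_000, "R3": 2_500},
    "all operators at QPC": {"R1": 500_000, "R2": 250_000, "R3": 250_000},
}

scores = {name: cvrf(sent, original) for name, sent in candidate_plans.items()}
best_plan = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: CVRF = {score:.4f}")
print("chosen:", best_plan)
```

The plan that ships every tuple to the QPC scores 1, so any plan that filters at the sources scores below it.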

55 MOCHA Query Optimizer
System R style
– Left-deep plans (joins at QPC)
– cost: execution time (network + CPU + I/O)
– operator placement: VRF and plan cost
– selections, unions and joins
Placement policy: Enhanced Hybrid Shipping
– Code Shipping: operators at DAPs
– Data Shipping: operators at QPC
– generalizes Hybrid Shipping [FJK96]

56 Sequoia 2000 Benchmark
Goals of first experiment:
– Measure how good code shipping can be
– Validate the heuristics being proposed: VRF, CVRF
Configured MOCHA with plans that place operators
– at DAP with code shipping
– at QPC with data shipping

57 Reducing vs. Inflating
[Chart: running time (secs) per query class Q1, Q2, Q3, for QPC and DAP placement]
Query classes
– Q1: Composite of all images
– Q2: Clipping and sub-setting
– Q3: Double resolution of images
Performance
– composites: 99% data reduction, 4-to-1 better performance
– clipping and expansion: 80% data reduction, 3-to-1 better performance
Validates heuristics

58 VRF vs. Selectivity
Selectivity and cardinality are not enough for distributed predicate placement
Consider 50% selectivity
– DAP: CVRF = 0.01
– QPC: CVRF = 1
[Chart: running time (secs) for QPC and DAP placement at selectivity 0, 0.25, 0.50, 0.75, 1]
VRF is a better metric
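One way a 50% selective predicate can still yield a CVRF of 0.01 is when the predicate reads a large attribute that the query does not return. The sketch below constructs such a case. The tuple layout and every size in it are assumptions chosen to reproduce the slide's two figures, not data from the experiment.

```python
# Illustrative relation: each tuple carries a large image plus small attributes.
# All sizes and counts are assumptions chosen to reproduce the slide's 0.01 vs 1.
TUPLES = 1_000
IMAGE_KB = 980        # argument of the predicate
ATTRIBUTES_KB = 20    # columns the query actually returns
SELECTIVITY = 0.5     # half of the tuples satisfy the predicate

original_kb = TUPLES * (IMAGE_KB + ATTRIBUTES_KB)

# Predicate at the DAP: only the small attributes of qualifying tuples travel.
sent_from_dap_kb = TUPLES * SELECTIVITY * ATTRIBUTES_KB
# Predicate at the QPC: every full tuple must travel before it can be tested.
sent_to_qpc_kb = original_kb

cvrf_dap = sent_from_dap_kb / original_kb
cvrf_qpc = sent_to_qpc_kb / original_kb
print(f"selectivity {SELECTIVITY}: CVRF at DAP = {cvrf_dap}, at QPC = {cvrf_qpc}")
```

Selectivity alone would predict that half the data moves in either case. The volume-based metric captures the hundredfold difference between the two placements.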

59 WAN Experiment
Sites used:
– University of Maryland (QPC)
– University of Puerto Rico
– Oregon Graduate Institute
– University of North Dakota
– University of Alabama

60 Union with Data-Reducing
EHS (Enhanced Hybrid Shipping) is the better option
– Filters data
– 2-to-1 better performance
– Minimal resource usage
Q6:
Select landuse, location
From polygons
Where perimeter(location) > 2000.0
Sites: UPR and OGI

61 Union with Reducing and Inflating
Q5:
Select landuse, location, triangulate(location)
From Polygons
Where perimeter(location) > 2000.0
EHS is better than DS (data shipping) and QS (query shipping)
– 2-to-1 better than QS
– 6-to-1 better than DS
– Consumes least resources
Sites: UPR and OGI

62 Join with Data-Reducing
EHS is the better option
– 3-to-1 better performance
– Minimal resource usage
Same pattern as with unions
– Data movement is the key
Q8:
Select P.landuse, R.location, R.week
From polygons P, rasters R
Where overlaps(P.location, R.location)
And perimeter(P.location) > 2000.0
Sites: UPR and OGI

63 Union with Extra Load
EHS is still the better option
Extra load has an impact on both
Not clear if data shipping wins in real situations
Q9:
Select landuse, location
From polygons
Where perimeter(location) > 2000.0
Sites: UPR and OGI
[Chart labels: Load 10, Load 20]

64 MOCHA System Status
Operational MOCHA prototype
– It’s real!
– over 40,000 lines of 100% Java code (JDK 1.3)
– People involved: Manuel Rodriguez-Martinez (lead), Mike McGann, Steve Kelley, Vadim Katz; John Townshend, Frank Lindsay, Ben White (Geographers); Joseph JaJa (Algorithms)
– Tested with NASA ESIP Federation, Los Alamos fire
– Supports: Oracle, Postgres, Informix, Sybase, HPSS

65 Features of MOCHA
Automatic code deployment
Scalable middleware architecture
Query optimization based on data movement reduction
Metadata publishing framework [RMR00a]
– RDF and XML
– Publish schemas, mappings, types and functions
– Drives automatic code deployment
Schema mapping rules
– expressed in XML
– attach as leaf nodes in query plan
– extensible

66 MOCHA Publications
Research papers and talks
– ACM SIGMOD 2000
– EDBT 2000
Demos
– ACM SIGMOD 2000
– SSDBM 2001
– NASA ESIP meetings and workshops
– U.S. National Academy of Sciences

67 The Future of MOCHA
A Million-Site MOCHA

68 The Future of MOCHA
The role of MOCHA in distributed software systems
– sensors
– satellites
– network switches and routers
– laptops, palm computers
– custom-built devices
– cars, planes, boats
– people (firemen), animals (whales)

69 Network of MOCHA-enabled sensors
Sensors are deployed in an area using ad hoc network techniques
Sensors run Java JDK 1.3
Lighter sensors run Java JDK 1.3 Micro Edition
[Diagram label: DAP]

70 Organization of sensors
[Diagram labels: Leader, Normal Sensor, Groups]
Sensors are grouped together for a specific goal or service
– data acquisition
– data aggregation, analysis
– data streaming
Group leaders are responsible for
– establishing themselves (broadcast, voting, …)
– coordination among sensors
– making decisions (agents)
– participating in other higher-level groups (hybrid P2P)

71 Concrete Example (from NASA)
Constellation of satellites (with sensors)
A group observes Gamma radiation
– aggregates measurements
– determines an important radiation event
The group leader tells other peer group leaders to instruct their sensors to observe the Gamma radiation event (reaction)
– the system adapts to changes in the environment

72 MOCHA’s Code Shipping feature for
upgrades to fix bugs
fresh code to gather data
– at a different resolution
– new aggregates or functions
dynamically configured code
– application-specific security protocol
– location-dependent encryption

