IrisNet Cache-and-Query for Wide Area Sensor Databases Amol Deshpande, UC Berkeley Suman Nath, CMU Phillip Gibbons, Intel Research Pittsburgh Srinivasan.

IrisNet Cache-and-Query for Wide Area Sensor Databases
Amol Deshpande, UC Berkeley
Suman Nath, CMU
Phillip Gibbons, Intel Research Pittsburgh
Srinivasan Seshan, CMU
Presented by David Yates, April 9, 2004

IrisNet June Outline Overview of IrisNet Example application: Parking Space Finder Query processing in IrisNet Data partitioning Distributed query execution Conclusions Critique

Internet-scale Resource-intensive Sensor Network Services (IrisNet)
Motivation
- Proliferation of resource-intensive sensors attached to powerful devices
  - Webcams, pressure gauges, microphones
- Rich data sources with high data volumes
- Typically distributed over wide geographical areas
- Useful services that use such sensors are missing
IrisNet: an infrastructure to support deployment of sensor services over such sensors

IrisNet: Design Goals
- Ease of deployment of sensor services
  - Minimal requirements from the service provider
  - Distributed data storage and querying for high throughputs
- Ease of querying
  - XML as the data format, XPATH as the query language
  - Natural geographical hierarchy on data as well as queries
  - Continuously evolving data
- Location transparency
  - Logical view of the entire distributed database as a single centralized XML document

IrisNet Architecture
- Sensing Agents (SA)
  - PDA/PC-class processor, MBs–GBs storage
  - Collect & process data from sensors, as dictated by “senselet” code uploaded by OAs
  - Processed data sent to the OAs for update in-place
- Organizing Agents (OA)
  - PC/Server-class processor, GBs storage
  - Provide data storage, discovery, querying facilities
  - Use an off-the-shelf database to store data locally
  - Interface with the local database using XPATH/XSLT

Example Application: Parking Space Finder (PSF)
- Webcams monitor parking spaces and provide real-time information about their availability
- Image processing to extract availability information
- Natural geographical hierarchy on the data

Example XML Fragment for PSF
[XML markup not preserved; surviving element values: 200 … no yes … …]

Example Queries
Users issue queries against the document as a whole
- Find all available parking spots in Oakland
  … = "no"]
- Find all blocks in Allegheny that have more than 20 metered parking spots
  … //Block[count(./pSpace[metered = "yes"]) > 20]
- Find the cheapest parking spot in Oakland Block 1
  … /pSpace[not(../pSpace/price < ./price)]
Challenge: evaluate arbitrary XPATH queries against the document even though the document may be partitioned across multiple OAs
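
The three example queries can be sketched against a small made-up fragment using Python's standard xml.etree.ElementTree. This is an illustration under assumptions, not the IrisNet implementation: the Neighborhood wrapper, the inUse element and all values are hypothetical (the slide's XML markup did not survive); only Block, pSpace, metered and price appear in the queries. ElementTree supports only a subset of XPath, so the count() and not() predicates are expressed in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical PSF fragment; element names other than Block, pSpace,
# metered and price are assumptions made for this sketch.
DOC = """
<Neighborhood id="Oakland">
  <Block id="1">
    <pSpace id="1"><inUse>no</inUse><metered>yes</metered><price>25</price></pSpace>
    <pSpace id="2"><inUse>yes</inUse><metered>yes</metered><price>50</price></pSpace>
    <pSpace id="3"><inUse>no</inUse><metered>no</metered><price>0</price></pSpace>
  </Block>
  <Block id="2">
    <pSpace id="1"><inUse>no</inUse><metered>yes</metered><price>75</price></pSpace>
  </Block>
</Neighborhood>
"""
root = ET.fromstring(DOC)


def available_spaces(neighborhood):
    """All pSpace elements whose inUse child is 'no'."""
    return [s for s in neighborhood.findall("./Block/pSpace")
            if s.findtext("inUse") == "no"]


def blocks_with_metered(neighborhood, minimum):
    """Blocks with more than `minimum` metered spaces,
    i.e. count(./pSpace[metered = "yes"]) > minimum."""
    return [b for b in neighborhood.findall("./Block")
            if sum(s.findtext("metered") == "yes"
                   for s in b.findall("pSpace")) > minimum]


def cheapest_spaces(block):
    """Spaces in a block for which no sibling space has a lower price."""
    spaces = block.findall("pSpace")
    lowest = min(float(s.findtext("price")) for s in spaces)
    return [s for s in spaces if float(s.findtext("price")) == lowest]
```

On a single machine these are ordinary tree walks; the difficulty the slide points to is that the Block and pSpace subtrees may live on different OAs.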

Data Partitioning and Query Processing: Overview
- Maintain data partitioning invariants
  - Used to guarantee that an OA always has sufficient information to participate correctly in a query
- Use DNS to maintain the data distribution information and to route queries to data
- Convert the XPATH query to an XSLT query that:
  - Walks the document recursively
  - Evaluates part of the query that can be done locally
  - Gathers missing information by asking subqueries

Partitioning Granularity
Definition: An IDable node in the document
- Has an “id” attribute with value unique among its siblings
- All its ancestors in the document are IDable

Partitioning Granularity
Definition: Local Information of an IDable node
- All its attributes and all its non-IDable descendants
- IDs of all its IDable children
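
The two definitions (IDable node, local information) can be written down as a short sketch over an ElementTree document. The function names and the dictionary layout are hypothetical, and the sketch assumes that "unique among its siblings" compares all siblings regardless of tag; the slides do not say whether only same-tag siblings count.

```python
import xml.etree.ElementTree as ET


def idable_nodes(root):
    """Return the set of IDable elements: each has an "id" attribute that is
    unique among its siblings, and all of its ancestors are IDable."""
    result = set()

    def visit(node, siblings):
        node_id = node.get("id")
        if node_id is None:
            return  # no id: neither this node nor its descendants are IDable
        if sum(1 for s in siblings if s.get("id") == node_id) != 1:
            return  # id clashes with a sibling
        result.add(node)
        children = list(node)
        for child in children:
            visit(child, children)

    visit(root, [root])
    return result


def local_information(node, idable):
    """Local information of an IDable node: its attributes, its non-IDable
    descendants (serialized subtrees), and the IDs of its IDable children."""
    return {
        "attributes": dict(node.attrib),
        "non_idable_subtrees": [ET.tostring(c, encoding="unicode").strip()
                                for c in node if c not in idable],
        "idable_child_ids": [(c.tag, c.get("id")) for c in node if c in idable],
    }
```

Because every descendant of a non-IDable node is itself non-IDable, serializing each non-IDable child subtree covers all non-IDable descendants.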

Data Partitioning
- Data storage, ownership always in units of local information corresponding to the IDable nodes in the document
  - These form a nearly-disjoint partitioning of the overall document
  - Granularity can be controlled using the “id” attributes
  - A partitioning unit can be uniquely identified using the “id”s on the path to the root of the document
- Data ownership: each partitioning unit owned by exactly one OA

Data Partitioning
- Data stored locally at each OA: a document fragment consisting of a union of partitioning units
- Constraints:
  - Must store the document fragment it owns
  - If it stores the “id” of an IDable node, it must also store the local information of all its ancestors
  - We minimize the amount of information that must be stored (details in paper): only the IDs of all ancestors, and of their children, are needed
- Invariant: if an OA has the “id” of an IDable node, it either
  - Has the local information for the node, or
  - Has the “id”s on the path to the root, allowing it to locate the local information for that node

Data Partitioning: Example
[figure: nodes owned by OA 1 and by OA 2]

Data Partitioning: Example
Data storage configuration at OA 1
[figure legend: local information required; local information optional]

Data Partitioning: Example
Data storage configuration at OA 2
[figure legend: local information required; local information optional]

Mapping Data to OAs
- Mapping of nodes to physical OAs maintained using DNS
- For each IDable node, create a unique DNS-style name by concatenating the IDs on the path to the root
- Mapped to OA 1:
  - Allegheny-County….iris.net
  - Pittsburgh-City.Allegheny-County….iris.net
- Mapped to OA 2:
  - Oakland-Neighborhood.Pittsburgh-City.Allegheny-County….iris.net
  - 1-Block.Oakland-Neighborhood.Pittsburgh-City.Allegheny-County….iris.net
  - 1-pSpace.1-Block.Oakland-Neighborhood.Pittsburgh-City.Allegheny-County….iris.net
  - …
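
The naming rule can be sketched in a few lines: each label is "<id>-<tag>", ordered from the node up to the root, followed by a service suffix. The function names are hypothetical, and a plain dictionary stands in for the DNS lookup; the real system resolves these names through DNS.

```python
def dns_name(id_path, suffix):
    """Build a DNS-style name from (tag, id) pairs listed root-first."""
    labels = [f"{node_id}-{tag}" for tag, node_id in reversed(id_path)]
    return ".".join(labels + [suffix])


def owner_of(id_path, suffix, table):
    """Look up the owning OA; `table` is a dict standing in for DNS."""
    return table.get(dns_name(id_path, suffix))


# Example with the hierarchy used on the slides.
OAKLAND = [("State", "Pennsylvania"), ("County", "Allegheny"),
           ("City", "Pittsburgh"), ("Neighborhood", "Oakland")]
SUFFIX = "parking.intel-iris.net"
TABLE = {dns_name(OAKLAND, SUFFIX): "OA 2"}  # hypothetical mapping entry
```

Since the name is derived purely from the IDs on the path, any party that knows the path can compute it without consulting a central directory.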

Self-Starting Distributed Queries
- Each query has a hierarchical prefix
- Simple parsing of the query to extract the least common ancestor (LCA) of the possible query result
- Send the query to Oakland-Neighborhood.Pittsburgh-City.Allegheny-County.Pennsylvania-State.parking.intel-iris.net
- Name extracted from query without any global or per-service state
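
A minimal sketch of the "simple parsing" step: consume the leading steps of the query that pin down a single node by id, and turn them into the DNS name of the LCA. The query syntax /Tag[@id='value'] is an assumption (the original XPath text did not survive on the slides), and lca_dns_name is a hypothetical helper, not IrisNet code.

```python
import re

# One leading step of the assumed form /Tag[@id='value'].
STEP = re.compile(r"/(\w+)\[@id\s*=\s*['\"]([^'\"]+)['\"]\]")


def lca_dns_name(query, suffix):
    """Return the DNS name for the deepest node fixed by the query's
    hierarchical prefix; parsing stops at the first step without a lone id test."""
    labels = []
    pos = 0
    while True:
        match = STEP.match(query, pos)
        if match is None:
            break
        labels.append(f"{match.group(2)}-{match.group(1)}")
        pos = match.end()
    return ".".join(list(reversed(labels)) + [suffix])


QUERY = ("/State[@id='Pennsylvania']/County[@id='Allegheny']"
         "/City[@id='Pittsburgh']/Neighborhood[@id='Oakland']"
         "/Block/pSpace[inUse='no']")  # inUse is a made-up element name
```

The function needs only the query string and the service suffix, which matches the slide's point that no global or per-service state is required.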

QEG Details
- Nesting depth of an XPATH query
  - Maximum depth at which a location path that traverses over IDable nodes occurs in the query
  - Examples:
    - … → 0
    - … → 0
    - /a[./b/c]/b → 1 (if b is IDable)
    - … → 2
- Complexity of evaluating a query increases with nesting depth

Queries with Nesting Depth = 0
- Any predicate in the query can be evaluated using just the local information for an IDable node
- Example: … > 10]
- Sketch of the XSLT program:
  - Walk the document recursively
  - If local information for the node under consideration is available, evaluate the part of the query that refers to that node; otherwise tag the returned answer with the tag “asksubquery”
  - Postprocessor finds the missing information by asking subqueries
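
The walk can be sketched in Python rather than XSLT. This is an illustration under assumptions: a node stored without its local information is marked with a made-up attribute status="id-only", the query is reduced to a list of (tag, id-or-None) steps plus a predicate on the final node, and the postprocessor that actually sends subqueries is left out.

```python
import xml.etree.ElementTree as ET


def evaluate(node, steps, predicate, path=()):
    """Walk the local fragment recursively.

    Returns a list of ("answer", id_path) for nodes answered locally and
    ("asksubquery", id_path, remaining_steps) where local information is missing.
    """
    path = path + ((node.tag, node.get("id")),)
    if node.get("status") == "id-only":      # hypothetical marker
        return [("asksubquery", path, steps)]
    if not steps:
        return [("answer", path)] if predicate(node) else []
    tag, wanted_id = steps[0]
    results = []
    for child in node:
        if child.tag == tag and wanted_id in (None, child.get("id")):
            results.extend(evaluate(child, steps[1:], predicate, path))
    return results


# Hypothetical fragment held by one OA: Block 1 is stored, Block 2 is known by id only.
FRAGMENT = ET.fromstring("""
<Neighborhood id="Oakland">
  <Block id="1">
    <pSpace id="1"><price>25</price></pSpace>
    <pSpace id="2"><price>5</price></pSpace>
  </Block>
  <Block id="2" status="id-only"/>
</Neighborhood>
""")

results = evaluate(FRAGMENT, [("Block", None), ("pSpace", None)],
                   lambda space: float(space.findtext("price")) > 10)
```

Each "asksubquery" entry carries the id path of the node, which is enough to build its DNS name and forward the remaining steps to the OA that owns it.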

Caching
- A site can add to its document any fragment as long as the data partitioning constraints are satisfied
- We generalize subqueries to fetch the smallest superset of the answer that satisfies the constraints and cache it
- Data time-stamped at the time of caching
- Queries can specify freshness requirements
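
The time-stamp and freshness idea can be sketched as follows. All names are hypothetical, the cache is a plain dictionary, and fetch stands in for issuing a subquery to the owning OA; how IrisNet stores timestamps inside the XML fragment is not shown on the slides.

```python
import time


def answer(cache, key, tolerance_seconds, fetch, now=None):
    """Serve `key` from the cache if it was cached within the query's
    freshness tolerance; otherwise fetch it again and re-stamp it."""
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is not None:
        data, cached_at = entry
        if now - cached_at <= tolerance_seconds:
            return data
    data = fetch(key)
    cache[key] = (data, now)   # data is time-stamped at the time of caching
    return data
```

A query that tolerates 30-second-old data is served from the cache more often than one that tolerates 10 seconds, so the consistency level is chosen per query rather than fixed by the system.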

Further Details in Paper
- Queries with Nesting Depth > 0
- Schema changes
- Data partitioning changes
- Implementation details and experimental study

Conclusions
- Identified the challenges in query processing over a distributed XML document
- Developed formal framework and techniques that
  - Allow for flexible document partitioning
  - Integrate caching seamlessly
  - Correctly and efficiently answer XPATH queries
- Experimental results demonstrate the advantages of flexible data partitioning and caching

Further Information
- IrisNet project website

Outline
- Overview of IrisNet
- Example application: Parking Space Finder
- Query processing in IrisNet
  - Data partitioning
  - Distributed query execution
- Conclusions
- Performance Study
- Critique

Performance Study Setup
- Current prototype written in Java
- A cluster of nine 2 GHz Pentium IV machines
- Apache Xindice used as the backend XML database
- Artificially generated database
  - 2400 parking spaces with 2 cities, 6 neighborhoods and 120 blocks
- Five query workloads
  - QW-1: Asking for a single block
  - QW-2: Asking for two blocks from a single neighborhood
  - QW-3: Asking for two blocks from two neighborhoods
  - QW-4: Asking for two blocks from two cities
  - QW-Mix: 40% each of QW-1 and QW-2, 15% QW-3, 5% QW-4
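
The sizes and the query mix can be checked with a short sketch. It assumes the hierarchy is spread evenly (3 neighborhoods per city, 20 blocks per neighborhood, 20 spaces per block), which the slide does not state, and it reads QW-Mix as 40% each for QW-1 and QW-2 so that the shares sum to 100%. All names are hypothetical.

```python
import random

CITIES, NEIGHBORHOODS, BLOCKS, SPACES = 2, 6, 120, 2400


def build_hierarchy():
    """Map (city, neighborhood, block) to a list of space ids,
    spreading every level evenly (an assumption)."""
    per_city = NEIGHBORHOODS // CITIES     # 3 neighborhoods per city
    per_nbhd = BLOCKS // NEIGHBORHOODS     # 20 blocks per neighborhood
    per_block = SPACES // BLOCKS           # 20 spaces per block
    return {(c, n, b): list(range(per_block))
            for c in range(CITIES)
            for n in range(per_city)
            for b in range(per_nbhd)}


QW_MIX = {"QW-1": 0.40, "QW-2": 0.40, "QW-3": 0.15, "QW-4": 0.05}


def sample_workload(n, seed=0):
    """Draw n query types according to the QW-Mix proportions."""
    rng = random.Random(seed)
    return rng.choices(list(QW_MIX), weights=list(QW_MIX.values()), k=n)
```

The mix is dominated by queries that stay within one neighborhood (QW-1 and QW-2), so only about one query in five crosses a neighborhood or city boundary.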

Architectures Compared

Caching
- Architecture already allows for caching data
  - An OA is allowed to store more data than it owns
- Data time-stamped at the time of caching
- Queries can specify freshness tolerance

Query Throughputs

Data Partitioning: Example 2
[figure: nodes owned by OA 1 and by OA 2]
- e.g. OA 2 must store local information of the County (Allegheny) node

Conclusions
- Location transparency: distributed DB hidden from user
- Flexible data partitioning
- Low latency queries & query scalability
  - Direct query routing to LCA of the answer
  - Query-driven caching, supporting partial matches
  - Load shedding; no per-service state needed at web servers
- Support query-based consistency
- Use off-the-shelf DB components

Example XML Fragment for PSF
[XML markup not preserved; surviving element values: … 8 no yes … …]

What I liked (strengths)
- In general, this is a very good idea paper, but a mediocre evaluation paper
- Application scenario is different from other sensor database work; data model is novel and doesn’t share constraints with some other work
- Location transparency is elegant – logical view of distributed database as a single centralized database
- XML has some distinct advantages, e.g., facilitates dynamic update of database schema
- XML also provides standard query interfaces, e.g., XPATH and XSLT
- Query-based consistency that supports an application bypassing a cache if data is too stale (i.e., old)
- Partial match caching is a clever optimization that leverages the cache invariants in the distributed XML database

What I didn’t like (weaknesses)
- Proposed cache-and-query system is tied to TCP/IP networks, and to DNS in particular
- Implemented distributed query processing without true distributed caching; authors admit that selective bypassing of caching is needed (at a minimum)
- The experimental setup used is not realistic (distributed database that isn’t really distributed)
- Evaluation is only for queries (without concurrent updates); really need both, e.g.:
  - 100% queries (baseline)
  - 95% queries with 5% updates
  - 90% queries with 10% updates
  - 80% queries with 20% updates
  - 60% queries with 40% updates

Possible Future Work
- Perform evaluation in distributed environment with more realistic network problems (e.g., network latency, packet delay and loss); perhaps this would make caching more important
- Add distributed caching, e.g., selective bypass of caches
- Perform evaluation with query + update workload
- Experiment with caching policies other than “cache everything everywhere”
- Explore other distributed database schemes (for XML)
- Explore other techniques for distributing data and distributing caching