Future and Emerging Technologies (FET) Future and Emerging Technologies (FET) The roots of innovation Proactive initiative on: Global Computing (GC) Proactive.

Presentation transcript:

Future and Emerging Technologies (FET): The roots of innovation
Proactive initiative on: Global Computing (GC)
DBGlobe IST rd Meeting, Athens, November 29, 2002
UoI Presentation

Outline
- Directories :: Resource Location
- Data Delivery

Resource Discovery: Summaries for Resource Discovery
- Maintain summaries (e.g., Bloom filters) to assist the search for a service (resource)
- Directories for XML metadata and appropriate summaries

Motivation (DBGlobe): large-scale and dynamic environment; how to locate a resource
System model:
- Sites that store hierarchical descriptions of services (in XML) or XML documents
- Path queries
Limitations (so far):
- We consider only XML trees (no cycles)
- No value queries
Joint work with Georgia Koloniari

Resource Discovery
An example XML description and the corresponding XML tree:
  device
    printer
      color
      postscript
    camera
      digital
Path queries:
- From the root: //device/printer
- Partial: camera/digital
Overall approach: maintain Bloom-based indexes to check whether a document (item) exists at a site (peer)

Resource Discovery: Bloom Filters
Goal: test whether an element b exists in a set A = {a_1, a_2, …, a_n} of n elements (keys).
- Allocate a vector v of m bits, initially all set to 0.
- Choose k independent hash functions h_1, h_2, …, h_k, each with range {1, …, m}.
- For each element a ∈ A, set the bits at positions h_1(a), h_2(a), …, h_k(a) to 1. (A particular bit might be set to 1 multiple times.)
- Given a query for b, check the bits at positions h_1(b), h_2(b), …, h_k(b). If any is 0, then b is certainly not in A. Otherwise we assume that b is in A, although this may be a false positive.
[Figure: an element a is hashed by h_1(a) = P1, h_2(a) = P2, h_3(a) = P3, h_4(a) = P4 into the m-bit vector v]
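The insert and lookup steps on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the DBGlobe implementation: the class name BloomFilter, the defaults m = 64 and k = 4, and the use of salted SHA-256 to simulate k independent hash functions are all assumptions made for the example. Positions run from 0 to m-1 here rather than 1 to m.

```python
import hashlib


class BloomFilter:
    """Bit vector v of m bits with k hash functions (hypothetical helper)."""

    def __init__(self, m=64, k=4):
        self.m, self.k = m, k
        self.bits = 0  # a Python int used as the bit vector, all bits 0

    def _positions(self, item):
        # Assumption: k hash functions are simulated by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Set the bits at h_1(a), ..., h_k(a) to 1.
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        # False means "certainly absent"; True may be a false positive.
        return all((self.bits >> p) & 1 for p in self._positions(item))


services = BloomFilter()
for label in ["device", "printer", "camera"]:
    services.add(label)

print("printer" in services)  # True: inserted elements are always found
```

A lookup can never miss an inserted element; the only possible error is answering "present" for an element that was never inserted, and that error rate shrinks as m grows relative to n.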

Resource Discovery: Breadth (or Level) Blooms
The Breadth Bloom Filter (BBF) for an XML tree T with j levels is a set of Bloom filters {BBF_0, BBF_1, BBF_2, …, BBF_i}, i ≤ j.
- One Bloom filter, denoted BBF_i, for each level i of the tree.
- BBF_i: the labels (attributes) of all nodes at level i.
- BBF_0: all attributes that appear in any node of the XML tree T.
- The BBF_i's are not all of the same size.
- We may skip levels.
Example (for the device/printer/camera tree):
  BBF_0 = {device, printer, camera, color, postscript, digital}
  BBF_1 = {device}
  BBF_2 = {printer, camera}
  BBF_3 = {color, postscript, digital}
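A rough sketch of how a Breadth Bloom Filter could be built and queried for the example tree follows. The slide does not spell out the lookup procedure, so the matching logic in may_match (check BBF_0 first, then look for consecutive levels that contain consecutive labels) is an assumption, as are the helper names, the tuple-based tree encoding, and the choice of one filter size for every level (the slide notes that sizes may differ).

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (hypothetical helper, salted SHA-256 hashes)."""

    def __init__(self, m=64, k=4):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            d = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


def labels_per_level(tree, depth=1, out=None):
    """Collect node labels per level; a tree is (label, [children])."""
    out = {} if out is None else out
    label, children = tree
    out.setdefault(depth, set()).add(label)
    for child in children:
        labels_per_level(child, depth + 1, out)
    return out


def build_bbf(tree, m=64, k=4):
    """bbf[0] holds all labels; bbf[i] holds the labels at level i."""
    per_level = labels_per_level(tree)
    bbf = [BloomFilter(m, k) for _ in range(len(per_level) + 1)]
    for depth, labels in per_level.items():
        for label in labels:
            bbf[0].add(label)
            bbf[depth].add(label)
    return bbf


def may_match(bbf, path, from_root):
    """Assumed lookup: labels must sit on consecutive levels."""
    labels = path.split("/")
    if not all(label in bbf[0] for label in labels):
        return False
    last_start = len(bbf) - len(labels)
    starts = [1] if from_root else range(1, last_start + 1)
    return any(
        s + len(labels) <= len(bbf)
        and all(labels[i] in bbf[s + i] for i in range(len(labels)))
        for s in starts
    )


tree = ("device", [("printer", [("color", []), ("postscript", [])]),
                   ("camera", [("digital", [])])])
bbf = build_bbf(tree)
print(may_match(bbf, "device/printer", from_root=True))   # True
print(may_match(bbf, "camera/digital", from_root=False))  # True
```

Because the filter records only levels and not parent-child links, a query such as printer/digital would also be reported as a possible match; this is one source of false positives beyond those of the underlying Bloom filters.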

Resource Discovery: Depth (or Path) Blooms
The Depth Bloom Filter (DBF) for an XML tree T with j levels is a set of Bloom filters {DBF_0, DBF_1, DBF_2, …, DBF_{i-1}}, i ≤ j.
- One Bloom filter, denoted DBF_i, for each path length i (paths with i+1 nodes) of the tree.
- DBF_i: the labels (attributes) of all paths of length i.
- DBF_0: all attributes that appear in any node of the XML tree T.
- Special symbol for "root" paths.
Example (for the device/printer/camera tree):
  DBF_0 = {device, printer, camera, color, postscript, digital}
  DBF_1 = {device/printer, device/camera, printer/color, printer/postscript, camera/digital}
  DBF_2 = {device/printer/color, device/printer/postscript, device/camera/digital}
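The Depth Bloom Filter can be sketched in the same style. As before, this is an illustration under assumptions rather than the authors' code: the helper names are hypothetical, the lookup in may_match (every contiguous sub-path of the query must appear in the filter for its length) is an assumed procedure, and the special symbol for root paths is left out for brevity.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (hypothetical helper, salted SHA-256 hashes)."""

    def __init__(self, m=128, k=4):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            d = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


def all_paths(tree, ancestors=()):
    """Yield every parent-to-descendant path ending at each node."""
    label, children = tree
    chain = ancestors + (label,)
    for start in range(len(chain)):
        yield chain[start:]
    for child in children:
        yield from all_paths(child, chain)


def build_dbf(tree, m=128, k=4):
    """dbf[i] holds all paths of length i, i.e. with i + 1 nodes."""
    paths = list(all_paths(tree))
    dbf = [BloomFilter(m, k) for _ in range(max(len(p) for p in paths))]
    for p in paths:
        dbf[len(p) - 1].add("/".join(p))
    return dbf


def may_match(dbf, path):
    """Assumed lookup: every contiguous sub-path must be in its filter."""
    labels = path.split("/")
    if len(labels) > len(dbf):
        return False  # longer than any indexed path
    for length in range(1, len(labels) + 1):
        for start in range(len(labels) - length + 1):
            sub = "/".join(labels[start:start + length])
            if sub not in dbf[length - 1]:
                return False
    return True


tree = ("device", [("printer", [("color", []), ("postscript", [])]),
                   ("camera", [("digital", [])])])
dbf = build_dbf(tree)
print(may_match(dbf, "device/printer/color"))  # True
print(may_match(dbf, "camera/digital"))        # True
```

Storing whole paths preserves parent-child information that the breadth variant loses, at the price of more entries per filter as the tree gets deeper.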

Resource Discovery: Preliminary performance results
- Both outperform (in terms of false positives) a simple Bloom filter of the same size
- Depth (path) Blooms are very sensitive to the number of levels
- Depth (path) Blooms need more space
- Updates are handled efficiently (only the corresponding vectors are updated)

Resource Discovery: Distribution
Each site maintains:
- a local filter: a Bloom filter for its local resources
- one or more summary filters, where a summary filter is the merge of the Bloom filters of a set X of other sites
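The slide does not say how filters are merged. A common way to merge Bloom filters that share the same size and hash functions is a bitwise OR of their bit vectors, and the sketch below assumes exactly that; the names BloomFilter, merge, site_a and site_b are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (hypothetical helper, salted SHA-256 hashes)."""

    def __init__(self, m=64, k=4):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            d = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


def merge(filters):
    """Summary filter: bitwise OR of filters with identical m and k."""
    summary = BloomFilter(filters[0].m, filters[0].k)
    for f in filters:
        if (f.m, f.k) != (summary.m, summary.k):
            raise ValueError("filters must share size and hash functions")
        summary.bits |= f.bits
    return summary


site_a, site_b = BloomFilter(), BloomFilter()
site_a.add("printer")   # local filter of one site
site_b.add("camera")    # local filter of another site

summary = merge([site_a, site_b])
print("printer" in summary and "camera" in summary)  # True
```

A summary built this way answers "possibly present" for anything held by any of the merged sites, so its false-positive rate grows with the number of sites it covers.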

Resource Discovery: Horizons
- Keep information for up to horizon = d neighbors (as in routing indexes)
- A merged filter for each path: the merge of the Blooms of all sites on the path, up to length equal to the horizon
[Figure labels, as extracted: "Merged of nodes 1,", "Merged of nodes 3,", "Merged of nodes 6, 7, 8"]

Resource Discovery: Hierarchical
- Leaf sites: local filter
- Internal sites: summaries for all nodes in their subtrees
- Root sites: summaries for other root sites
[Figure: hierarchy of peers, with root peers at the top]

Resource Discovery: Future work
- Evaluate distribution strategies
- Other ways of summarizing data (related work on selectivity estimation)
- See how this can be related to ontologies (meaningful path queries), and whether/how it can be integrated with querying

Outline
- Directories :: Resource Location
- Data Delivery

Data Delivery: For the 1st deliverable on the topic
A survey on different modes to transmit data:
- Push/pull
- Continuous (periodic)/aperiodic
- Multicast/unicast
- Directed diffusion (communication only with neighbor nodes)

Data Delivery: For the 1st deliverable on the topic
- The different data delivery modes in DBGlobe
- Tradeoffs of using one over the other (e.g., in registering services, directory (location) updates)
- To be extended for D10 (Data Delivery and Querying)

Data Delivery: Data Delivery Modes and Coherence
Focus: how to achieve temporal (currency) and semantic (transaction-based) coherency of data under different modes of data delivery

Data Delivery: The Data Broadcast Model
[Figure: Server, Broadcast Channel, Client]
- The server broadcasts data from a database to a large number of clients
- Push mode, with no direct communication with the server
- Data updates at the server
- Periodic updates for the values on the channel
- Efficient way to disseminate information to large client populations with similar interests
- Physical support in wireless networks (satellite, cellular)
- Alternative way of transmitting information for data-intensive applications (e.g., web)

Data Delivery
Clients must read consistent and current data without contacting the server directly.
- Multiple versions: not just one value per item, but k such values [Pitoura & Chrysanthis, IEEE TC 2003]
- Temporal and semantic coherency (theory and protocols) [Pitoura, Chrysanthis & Ramamritham, ICDT03]

Currency  (x, u)  RS(R) CI(x, R) Currency Interval of an item x in RS(R) - CI(x, R) - is [c b, c e ) where c b is the time instance when the value was stored in the database, c e is the time insatnce of the next change of this value in the database  , say [c b, c e ) overlapping- equal to c e - RS(R) is a subset an actual database state at the server older value OV_Currency(R) = c e -, where c e is the smallest among the right limits of CI(x, R) Data Delivery Currency Interval for a set (readset) Two properties: Temporal spread (discrepancies among database states) Temporal Lag (how old with regards some point in time (e.g., T_commit)

Protocols and their properties  Timestamps (versioning)  Invalidation Reports  Propagation Data Delivery

Data Delivery: Consistency
Degrees of consistency:
- C0
- C1: RS(R) ⊆ DS
- C2: R is serializable with the set of server transactions that read values read (directly or indirectly) by R
- C3: R is serializable with all server transactions
- C4: R is serializable with all server transactions, and the serializability order of the server transactions that R observes is consistent with the commit order of transactions at the server

Data Delivery: Protocols and their properties
- Relation to temporal coherency
- Based on broadcasting the serialization graph of the server (or parts of it)

Data Delivery: Future Work
- Multiple servers model
- Applications in sensor networks

DBGlobe IST