1 WebdamExchange and WebdamLog: some models for web data management Emilien Antoine, Meghyn Bienvenu, Alban Galland Webdam WS, 04/03/2011.


2 Organization
- Introduction
- Representing all Web information as logical sentences
- Representing all Web data management as logical rules
- Some clues about WebdamPoor
- Some clues about implementation
- Conclusion

Introduction

4 Context of the work presented here
Joint work with many people: Émilien Antoine, Serge Abiteboul, Meghyn Bienvenu, David Gross-Amblard, Marilena Oita, Amélie Marian, Bruno Marnette, Neoklis Polyzotis, Philippe Rigaux, Marie-Christine Rousset…

5 Context: Web data management
- Scale: lots of users, servers, large volume of data…
- Distribution heterogeneity: Cloud (social networks), P2P (DHT, gossiping)…
- Security heterogeneity: login, https, crypto, hidden URL…
- Terminology heterogeneity: annotation, semantic Web, ontologies…
- Incomplete information: inconsistencies, belief, trust…
The heterogeneity keeps increasing as new systems and new applications arrive.
- Consequence 1: data integration and management are difficult
- Consequence 2: users cannot keep control over their own data

6 Thesis: Web data = distributed knowledge
Work plan
1. Represent all Web information as logical sentences
2. Represent all Web data management as logical rules
3. Develop a system to validate these ideas
Motivation for the approach
- Facilitate the design/implementation of complex systems
- Facilitate the control/surveillance of complex systems
- Use reasoning to optimize query evaluation
- Use reasoning for semantics/ontologies
- Use reasoning to manage access control and protect data
- Use reasoning to analyze properties of systems

7 Motivating example
Alice: get me the pictures of my friends where I am with Bob
What is going on:
- Find the friends of Alice (Alice's iPhone may remember them)
- For each answer, say Sue, find where Sue keeps her pictures (she may keep her pictures on Picasa)
- Find the means to access Sue’s pictures (Alice may ask a common friend for the private URL)
- Find the photos with Bob and Alice (e.g. by querying the meta-data)

8 Motivating example
Alice: get me the pictures of my friends where I am with Bob
Issues: heterogeneity of friends
- Heterogeneity of hosting: some keep their pictures on trusted servers such as Picasa, some put them on an untrusted DHT, some have them on their smartphones…
- Heterogeneity of access control: some pictures are public, some are protected by login/password, some by a private URL, some by cryptography…
- Heterogeneity of data description: friends may use different models of meta-data (taxonomies, ontologies…)

9 Complicated application organization… Example of our SocialRock demo:

Representing all Web information as logical sentences

11 The information belongs to someone
- Each piece of information belongs to a principal
- A principal has an identity (URI) which can be authenticated
- Two kinds of principal: peer and virtual principal
- A peer: alice-laptop, alice-iPhone, picasa, facebook, dht-peer-124, …
  - Storage and processing capabilities
  - A peer typically has a URL and can be sent query/update requests
- A virtual principal: alice, alice-friends, roc14
  - A virtual principal relies on peers for storage and processing

12 The kind of information we are talking about
- Data: pictures, movies, music, emails, ebooks, reports
- Localization: bookmarks, knowledge such as "Alice has an account in Facebook", "Sue puts her pictures in Picasa"
- Access: login/password, access rights on servers
- Annotations/Ontologies: semantic tags in Picasa, RDFS, OWL
- Services: search engines, yellow pages, dictionaries…
- Incomplete information: beliefs, probabilistic information…
- And more…

13 Logical statements to represent information Data: Document: Collection: Localization: picasa/alice) Access right: Access secret : “HG-FT23”) Ontologies: human-being) Services: $City, $Y) Belief: Etc.

14 WebdamExchange focus: authenticated knowledge
- Base statement: someone states (….)
  - It is annotated with a proof that “someone” can write data of alice
  - In the cryptographic setting, it is a signature of the whole statement using the write secret key of alice
- Keeping a trace of provenance: alice-laptop states (….) requester bob at 12:30, 10/08/2009
  - alice-laptop is the performer (the peer who did the update of the data of Alice)
  - bob is the requester (the peer or the user who requested the update)
- The content is possibly encrypted: alice-laptop states (….) protected for requester bob at 12:30, 10/08/2009

15 WebdamExchange focus: authenticated knowledge
- Communication: external knowledge is knowledge about other principals: alice-laptop says (alice-laptop states (….) requester bob at 12:30, 10/08/2009) to sue-iphone at 13:15, 15/10/2009
  - alice-laptop is the performer of the communication
  - sue-iphone is the receiver of the communication
- External knowledge is authenticated by the performer and is stored by the receiver.
- The external knowledge keeps a trusted trace of the provenance, and communications are piled up: sue-iphone says (alice-laptop says (alice-laptop states (….) requester bob at 12:30, 10/08/2009) to sue-iphone at 13:15, 15/10/2009) to bob-iphone at 13:10, 15/10/2009
- The time is the time of the performer; there is no global clock
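
The nesting of "states" and "says" on the two slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides speak of signatures with a write secret key, whereas the sketch uses an HMAC with a per-principal secret as a stand-in, and the names SECRETS, states, says, verify and provenance are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical per-principal secrets. The slides assume signature keys;
# an HMAC over a shared secret is only a stand-in for illustration.
SECRETS = {
    "alice-laptop": b"secret-of-alice-laptop",
    "sue-iphone": b"secret-of-sue-iphone",
}


def _tag(principal, payload):
    """Authenticate a payload on behalf of a principal."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SECRETS[principal], body, hashlib.sha256).hexdigest()


def states(performer, requester, time, content):
    """Base statement: `performer` states `content` for `requester`."""
    payload = {"kind": "states", "performer": performer,
               "requester": requester, "time": time, "content": content}
    return {"payload": payload, "tag": _tag(performer, payload)}


def says(performer, receiver, time, inner):
    """Communication: `performer` relays an authenticated statement."""
    payload = {"kind": "says", "performer": performer,
               "receiver": receiver, "time": time, "inner": inner}
    return {"payload": payload, "tag": _tag(performer, payload)}


def verify(statement):
    """Check every layer of a piled-up statement, outermost first."""
    payload = statement["payload"]
    expected = _tag(payload["performer"], payload)
    if not hmac.compare_digest(expected, statement["tag"]):
        return False
    if payload["kind"] == "says":
        return verify(payload["inner"])
    return True


def provenance(statement):
    """List the performers from the outermost communication inwards."""
    chain = []
    while True:
        payload = statement["payload"]
        chain.append(payload["performer"])
        if payload["kind"] != "says":
            return chain
        statement = payload["inner"]
```

Each layer carries the local time of its own performer, so nothing in the sketch compares timestamps across peers, in line with the absence of a global clock.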

16 The model covers a wide range of data
- The model does not prescribe any particular architecture for distribution
  - Gossiping, DHT, centralized server
  - Combination of these
  - Based on an abstract notion of localization
- The model does not prescribe how access control is enforced, e.g.:
  - Documents in Web servers with access protected by login/password
  - Documents protected by cryptographic keys in public sites
  - Based on an abstract notion of secret and hint
- See presentation of Emilien on WebdamPoor

17 Summary of WebdamExchange
- All the information forms a trusted knowledge base
- Each peer manages some portion of the knowledge base
- Now, we have to use this distributed knowledge base … for the management of the distributed knowledge base!

Representing all Web data management as logical rules

19 From WebdamExchange to Webdamlog
- The logical part of the WebdamExchange statements can easily be translated into datalog facts.
- Now we want to perform reasoning on these facts in order to locate, exchange, and update information
- Example: use logical reasoning among peers to locate the pictures of Alice’s friends in which she appears with Bob
- This motivates Webdamlog, a rule-based language for web data management

20 Why datalog?
- Datalog: very popular in the 90’s, prehistory by Web time
  (+) Natural syntax; reasonably expressive; easy to extend
  (-) Recursion not really essential in most applications
- Datalog extensions
  - Negation and aggregate functions: lots of work on these
  - Updates, time, trees, distribution: less work on these
- We use a datalog-like language influenced by
  - Active XML: for distribution and delegation
  - Hellerstein’s Dedalus: for time and performance

21 Webdamlog
- Facts (messages) of the form m@p(a1, ..., an)
- Rules of the form R@P(U) :- (¬) R1@P1(U1), …, (¬) Rn@Pn(Un)
  - R, Ri are relation terms
  - P, Pi are peer terms
  - U, Ui are tuples of terms
  - Safety condition
- Intuition: if the body holds for some valuation v, the fact v(R)@v(P)(v(U)) is sent to the peer v(P)
- What happens if the body of the rule mentions different peers? Peers need to collaborate to evaluate the rule → rule delegation
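
One way to picture rule delegation is as partial evaluation: a peer evaluates the leading body atoms it holds locally, then ships the rest of the rule to the next peer mentioned in the body. The Python sketch below illustrates this under several assumptions: negation is ignored, body atoms are processed left to right, peer terms are bound by earlier atoms (a crude reading of the safety condition), and the relation names friend, photos_at, tagged and album are hypothetical.

```python
def is_var(term):
    """Variables are written with a leading dollar sign, as on the slides."""
    return isinstance(term, str) and term.startswith("$")


def substitute(atom, valuation):
    """Apply a valuation to the relation, peer and arguments of an atom."""
    rel, peer, args = atom
    return (valuation.get(rel, rel), valuation.get(peer, peer),
            tuple(valuation.get(a, a) for a in args))


def match(atom, fact, valuation):
    """Extend `valuation` so that `atom` equals `fact`, or return None."""
    rel, peer, args = atom
    f_rel, f_peer, f_args = fact
    if len(args) != len(f_args):
        return None
    new = dict(valuation)
    for term, value in zip((rel, peer) + tuple(args),
                           (f_rel, f_peer) + tuple(f_args)):
        term = new.get(term, term)
        if is_var(term):
            new[term] = value
        elif term != value:
            return None
    return new


def delegate(rule, peer, facts, valuation=None):
    """Partially evaluate `rule` at `peer`.

    Returns (target_peer, residual_rule) pairs. A residual rule with an
    empty body is a derived fact to send to the peer of its head.
    """
    head, body = rule
    valuation = valuation or {}
    if not body:
        grounded = substitute(head, valuation)
        return [(grounded[1], (grounded, []))]
    first = substitute(body[0], valuation)
    if first[1] != peer:
        # The next atom lives elsewhere: ship the rest of the rule there.
        residual = (substitute(head, valuation),
                    [substitute(a, valuation) for a in body])
        return [(first[1], residual)]
    results = []
    for fact in facts:
        extended = match(first, fact, valuation)
        if extended is not None:
            results.extend(delegate((head, body[1:]), peer, facts, extended))
    return results
```

For instance, with a local fact saying that Sue's pictures are in relation photos at peer picasa, a rule whose first two atoms are at alice-iphone leaves a residual rule over photos@picasa to be installed at picasa.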

22 Webdamlog
System:
- A finite set of peers
- Each peer p has a local program P(p) and a delegated program D(p), which are both finite sets of rules
- Each peer p also has a database I(p) consisting of a finite set of facts
Semantics:
- In a state (P, D, I), choose randomly some peer p
- Evaluate (P(p) ∪ D(p))(I(p))
- This defines the new DB I’(p)
- Send facts and update delegations of the other peers to define (D’(q), I’(q)) for each peer q ≠ p
- The changes to each q are installed instantaneously; we will see how to avoid this if desired
- Choose another peer and keep going (in a fair way)
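
The run semantics can be simulated in a few lines. The sketch below is a simplification, not the system described in the talk: it handles positive rules only, omits delegation (every rule body is local to the peer that owns the rule), treats facts as persistent, and obtains fairness by visiting every peer once per round in a shuffled order. The relation names and the make_system helper are hypothetical.

```python
import random


def is_var(term):
    return isinstance(term, str) and term.startswith("$")


def match(atom, fact, valuation):
    """Extend `valuation` so that `atom` equals `fact`, or return None.

    Atoms and facts are flat tuples: (relation, peer, arg1, arg2, ...).
    """
    if len(atom) != len(fact):
        return None
    new = dict(valuation)
    for term, value in zip(atom, fact):
        term = new.get(term, term)
        if is_var(term):
            new[term] = value
        elif term != value:
            return None
    return new


def evaluate(rules, facts):
    """Apply each positive rule once to `facts` and return the derived facts."""
    derived = set()
    for head, body in rules:
        valuations = [{}]
        for atom in body:
            valuations = [v2 for v in valuations for fact in facts
                          if (v2 := match(atom, fact, v)) is not None]
        derived.update(tuple(v.get(t, t) for t in head) for v in valuations)
    return derived


def step(system, p):
    """Peer p evaluates its program; each derived fact goes to its peer."""
    for fact in evaluate(system[p]["rules"], system[p]["facts"]):
        system[fact[1]]["facts"].add(fact)  # installed instantaneously


def run(system, seed, rounds=5):
    """Visit every peer once per round, in a seeded random order."""
    rng = random.Random(seed)
    peers = sorted(system)
    for _ in range(rounds):
        rng.shuffle(peers)
        for p in peers:
            step(system, p)
    return {p: frozenset(system[p]["facts"]) for p in system}


def make_system():
    """Two peers: alice-iphone asks picasa for the photos of her friends."""
    return {
        "alice-iphone": {
            "facts": {("friend", "alice-iphone", "Sue")},
            "rules": [(("request", "picasa", "$X"),
                       [("friend", "alice-iphone", "$X")])],
        },
        "picasa": {
            "facts": {("photos", "picasa", "Sue", "pic1.jpg")},
            "rules": [(("album", "alice-iphone", "$Photo"),
                       [("request", "picasa", "$X"),
                        ("photos", "picasa", "$X", "$Photo")])],
        },
    }
```

Running the same system with different seeds changes the order in which peers fire but, for this positive program, not the final state.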

23 Features of Webdamlog illustrated Alice: get me the pictures of my friends where I am with Bob $R, $P), $Photo, $Meta), “Alice”), “Bob”) photos, picasa) :- member($X, picasa) -Peers and relations treated as data: they are reified will instantiate with concrete relation and peer is extensional, occurs in data at alice-iphone intensional, derived from data + rules

24 Features of Webdamlog illustrated
Alice: get me the pictures of my friends where I am with Bob
$R, $P), $Photo, $Meta), “Alice”), “Bob”) photos, picasa) :- member($X, picasa)
- Partial evaluation at alice-iphone ($X → Sue, $R → photos, $P → picasa)
- Then alice-iphone installs the rest of the rule at picasa: :- “Alice”), “Bob”)
- Peer picasa will send the photos as extensional facts to alice-iphone.
- When Alice terminates her query, she cancels all the delegations.

25 What can we show?
- In general, asynchronicity yields non-deterministic systems
- We identified two types of Webdamlog systems (only positive rules / appropriately stratified negation) for which we have:
  - convergence: all runs eventually reach the same state
  - simulation by a centralized datalog program
- It is interesting to compare the expressivity of different variants of Webdamlog: full / limited / no delegation, presence of timestamps or ordering of peers…
- For an appropriate notion of simulation, one can show that full delegation > limited delegation > no delegation

26 More refined asynchronicity
- To model transmission of facts from peer p to peer q, we may use a “peer” net_pq that captures the network
- Replace at p by pq (u)
- net_pq should just relay messages: :- pq ($U)
- Problem: all messages stored in net_pq arrive at the same time
- Better with time pq (u,t) where t is the time at p
  :- pq (U,T), min(T, pq (U,T)), using the min aggregate function
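
The effect of the min aggregate can be illustrated with an imperative analogue: a relay that buffers timestamped facts and releases only the one with the smallest sender time at each step. This is a sketch of the idea rather than the Webdamlog rule itself, and the class name NetPeer and its methods are hypothetical.

```python
import heapq


class NetPeer:
    """Hypothetical relay standing between a sender p and a receiver q."""

    def __init__(self):
        self._buffer = []  # heap of (time at p, sequence number, fact)
        self._seq = 0      # tie-breaker so that facts are never compared

    def send(self, fact, time):
        """Called by p: buffer a fact stamped with p's local time."""
        heapq.heappush(self._buffer, (time, self._seq, fact))
        self._seq += 1

    def deliver(self):
        """Relay to q only the buffered fact with the smallest time."""
        if not self._buffer:
            return None
        _, _, fact = heapq.heappop(self._buffer)
        return fact
```

Because each call to deliver releases a single fact, messages buffered together no longer arrive at the same time, and they reach q in the order of p's local clock.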

27 Summary of Webdamlog
- Peers asynchronously run their own datalog programs
- They interact by exchanging facts and delegating rules
- Some things to look at:
  - Evaluation and optimization of queries
  - Acquisition of new rules
  - Reasoning with social information (trust, provenance, etc.)