1 WebdamExchange and WebdamLog: some models for web data management Alban Galland INRIA Saclay & ENS Cachan Grenoble, 10/12/2010.

Presentation transcript:

1 WebdamExchange and WebdamLog: some models for web data management Alban Galland INRIA Saclay & ENS Cachan Grenoble, 10/12/2010

2 Organization
- Introduction
- Representing all Web information as logical sentences
- Representing all Web data management as logical rules
- Some clues about implementation
- Conclusion

Introduction

4 Context of the work presented here
- ERC Grant Webdam on Web Data Management of Serge Abiteboul, with two INRIA teams: Leo-Iasi (ex Gemo, INRIA Saclay) and Dahu (LSV, ENS Cachan)
- Joint work with many people: Émilien Antoine, Serge Abiteboul, Meghyn Bienvenu, David Gross-Amblard, Amélie Marian, Bruno Marnette, Neoklis Polyzotis, Philippe Rigaux, Marie-Christine Rousset…

5 Context: Web data management
- Scale: lots of users, servers, large volume of data…
- Distribution heterogeneity: Cloud (social networks), P2P (DHT, gossiping)…
- Security heterogeneity: login, https, crypto, hidden URL…
- Terminology heterogeneity: annotation, semantic Web, ontologies…
- Incomplete information: inconsistencies, belief, trust…
- The heterogeneity keeps increasing as new systems and new applications arrive
- Consequence 1: data integration and management are difficult
- Consequence 2: users cannot keep control over their own data

6 Thesis: Web data = distributed knowledge
- Work plan
  1. Represent all Web information as logical sentences
  2. Represent all Web data management as logical rules
  3. Develop a system to validate these ideas
- Motivation for the approach
  - Facilitate the design/implementation of complex systems
  - Facilitate the control/surveillance of complex systems
  - Use reasoning to optimize query evaluation
  - Use reasoning for semantics/ontologies
  - Use reasoning to manage access control and protect data
  - Use reasoning to analyze properties of systems

7 Motivating example
Alice: get me the pictures of my friends where I am with Bob.
What is going on:
- Find the friends of Alice (Alice's iPhone may remember them)
- For each answer, say Sue, find where Sue keeps her pictures (she may keep her pictures on Picasa)
- Find the means to access Sue’s pictures (Alice may ask a common friend for the private URL)
- Find the photos with Bob and Alice (e.g. by querying the meta-data)

8 Motivating example
Alice: get me the pictures of my friends where I am with Bob.
Issues: heterogeneity of friends
- Heterogeneity of hosting: some keep their pictures on trusted servers such as Picasa, some put them on an untrusted DHT, some have them on their smartphones…
- Heterogeneity of access control: some pictures are public, some use login/password, some use a private URL, some use cryptography…
- Heterogeneity of data description: friends may use different models of meta-data (taxonomies, ontologies…)

Representing all Web information as logical sentences

10 The information belongs to someone
- Each piece of information belongs to a principal
- A principal has an identity (URI) which can be authenticated
- Two kinds of principal: peer and virtual principal
- A peer: alice-laptop, alice-iPhone, picasa, facebook, dht-peer-124, …
  - Storage and processing capabilities
  - A peer typically has a URL and can be sent query/update requests
- A virtual principal: alice, alice-friends, roc14
  - A virtual principal relies on peers for storage and processing

11 The kind of information we are talking about
- Data: pictures, movies, music, emails, ebooks, reports
- Localization: bookmarks, knowledge such as "Alice has an account on Facebook" or "Sue puts her pictures on Picasa"
- Access: login/password, access rights on servers
- Annotations/Ontologies: semantic tags in Picasa, RDFS, OWL
- Services: search engines, yellow pages, dictionaries…
- Incomplete information: beliefs, probabilistic information…
- And more…

12 Logical statements to represent information
- Data:
  - Document:
  - Collection:
- Localization: picasa/alice)
- Access right:
- Access secret: “HG-FT23”)
- Ontologies: human-being)
- Services: $City, $Y)
- Belief:
- Etc.

13 WebdamExchange focus: authenticated knowledge
- Base statement: someone states (….)
  - It is annotated with a proof that “someone” can write data of alice
  - In the cryptographic setting, the proof is a signature of the whole statement using the write secret key of alice
- Keeping track of provenance: alice-laptop states (….) requester bob at 12:30, 10/08/2009
  - alice-laptop is the performer (the peer who did the update of the data of Alice)
  - bob is the requester (the peer or the user who requested the update)
- The content is possibly encrypted: alice-laptop states (….) protected for requester bob at 12:30, 10/08/2009
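The idea of a base statement carrying a proof of write permission can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Statement` fields, the JSON serialization, and the use of HMAC-SHA256 are all hypothetical choices made here, and HMAC is a symmetric stand-in for whatever signature scheme the cryptographic setting of the slides actually uses.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Statement:
    performer: str  # peer that performed the update, e.g. "alice-laptop"
    owner: str      # principal whose data is written, e.g. "alice"
    content: str    # logical part of the statement (hypothetical encoding)
    requester: str  # peer or user who requested the update
    time: str       # local time of the performer


def canonical(stmt: Statement) -> bytes:
    """Serialize the whole statement deterministically before signing."""
    return json.dumps(asdict(stmt), sort_keys=True).encode("utf-8")


def sign(stmt: Statement, write_secret: bytes) -> str:
    # ASSUMPTION: HMAC-SHA256 stands in for the real signature scheme.
    return hmac.new(write_secret, canonical(stmt), hashlib.sha256).hexdigest()


def verify(stmt: Statement, proof: str, write_secret: bytes) -> bool:
    """Check that the proof covers exactly this statement."""
    return hmac.compare_digest(sign(stmt, write_secret), proof)


# Hypothetical example values.
alice_write_secret = b"hypothetical-write-secret-of-alice"
stmt = Statement("alice-laptop", "alice", "picture(beach.jpg)",
                 "bob", "12:30, 10/08/2009")
proof = sign(stmt, alice_write_secret)
tampered = replace(stmt, requester="sue")  # same proof, different requester
```

Because the proof covers the whole statement, changing the requester or the time invalidates it, which is what lets the provenance annotations be trusted.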

14 WebdamExchange focus: authenticated knowledge
- Communication: external knowledge is knowledge about other principals: alice-laptop says (alice-laptop states (….) requester bob at 12:30, 10/08/2009) to sue-iphone at 13:15, 15/10/2009
  - alice-laptop is the performer of the communication
  - sue-iphone is the receiver of the communication
- External knowledge is authenticated by the performer and is stored by the receiver.
- External knowledge keeps a trusted trace of the provenance, and communications are piled up: sue-iphone says (alice-laptop says (alice-laptop states (….) requester bob at 12:30, 10/08/2009) to sue-iphone at 13:15, 15/10/2009) to bob-iphone at 13:10, 15/10/2009
- The time is the local time of the performer; there is no global clock
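The piling-up of communications can be pictured as a nested data structure. The sketch below is an assumption-laden illustration, not the WebdamExchange format: the class names `States` and `Says` and the helper `trace` are hypothetical, and authentication is left out entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class States:
    performer: str  # peer that performed the update
    content: str    # logical part of the statement
    requester: str  # who requested the update
    time: str       # local time of the performer


@dataclass(frozen=True)
class Says:
    performer: str                 # peer that sends the knowledge
    inner: Union["Says", States]   # the knowledge being forwarded
    receiver: str                  # peer that stores it
    time: str                      # local time of the sending peer


def trace(knowledge: Union[Says, States]) -> List[Tuple[str, str, str]]:
    """Unwind the nesting, returning (kind, performer, local time)
    from the original statement outward."""
    hops = []
    while isinstance(knowledge, Says):
        hops.append(("says", knowledge.performer, knowledge.time))
        knowledge = knowledge.inner
    hops.append(("states", knowledge.performer, knowledge.time))
    hops.reverse()
    return hops


# The example from the slide; the content string is a placeholder.
base = States("alice-laptop", "(....)", "bob", "12:30, 10/08/2009")
first = Says("alice-laptop", base, "sue-iphone", "13:15, 15/10/2009")
second = Says("sue-iphone", first, "bob-iphone", "13:10, 15/10/2009")
history = trace(second)
```

In the slide's example the second hop is stamped 13:10 although the first hop is stamped 13:15. Each timestamp is local to its performer, so the trace records an order of hops by nesting, not by comparing clocks.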

15 The model covers a wide range of data
- The model does not prescribe any particular architecture for distribution
  - Gossiping, DHT, centralized server
  - Combination of these
  - Based on an abstract notion of localization
- The model does not prescribe how access control is enforced, e.g.:
  - Documents in Web servers with access protected by login/password
  - Documents protected by cryptographic keys in public sites
  - Based on an abstract notion of secret and hint

16 Summary of WebdamExchange
- All the information forms a trusted knowledge base
- Each peer manages some portion of the knowledge base
- Now, we have to use this distributed knowledge base … for the management of the distributed knowledge base!

Representing all Web data management as logical rules

18 From WebdamExchange to WebdamLog
- The logical part of the WebdamExchange statements can easily be translated into datalog facts.
- Most of the reasoning of the system can be done using the logical form and datalog-like rules
- This motivates WebdamLog, a rule-based language for web data management

19 Why datalog?
- Datalog: very popular in the 90’s, which is prehistory by Web time
  (+) Nicer, more compact syntax; easy to extend
  (-) Recursion is not really essential
- Datalog extensions
  - Negation and aggregate functions: a lot of work on these
  - Updates, time, trees, distribution: less work on these
- We use a datalog-like language influenced by
  - Active XML, for distribution and intensional data
  - Hellerstein’s Dedalus, for time and performance

20 WebdamLog
- Facts are of the form:
- Rules are of the form: :- (not) …, (not)
  - R, Ri are message terms
  - P, Pi are peer terms
  - U, Ui are tuples of terms
  - Safety condition
- Intuition: if the body holds for some valuation v, the message is sent to the peer v(P)
- Issue: what happens if the body of the rule mentions different peers?

21 WebdamLog
- System:
  - A finite set of peers
  - Each peer p has a local program P(p) and a delegated program D(p), each consisting of a finite set of rules
  - Each peer p has a database I(p), consisting of a finite set of facts of the form
- Semantics: in a state (P,D,I), choose some peer p at random
  - Evaluate (P(p) ∪ D(p))(I(p))
  - This defines the new database I’(p)
  - It also adds facts to the other peers and updates their delegated rules, defining (D’(q),I’(q)) for each q
  - The changes to each q are installed synchronously (we will see how to avoid this if desired)
  - Choose another peer and keep going (in a fair way)
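The step-by-step semantics can be made concrete with a toy simulation. The following Python sketch is a heavy simplification and rests on several assumptions that are not in the slides: atoms are encoded as `(relation, peer, arguments)` triples, variables are strings starting with `$`, rules are positive only, all facts persist, every rule body is local to its peer, and rule delegation is ignored. The relation and peer names in the example are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, str, Tuple[str, ...]]  # (relation, peer, arguments)
Rule = Tuple[Atom, List[Atom]]           # (head, body)


def match(atom: Atom, fact: Atom, val: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend valuation val so that atom equals fact, or return None."""
    if len(atom[2]) != len(fact[2]):
        return None
    val = dict(val)
    for term, value in zip((atom[0], atom[1]) + atom[2],
                           (fact[0], fact[1]) + fact[2]):
        if term.startswith("$"):
            if val.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return val


def subst(atom: Atom, val: Dict[str, str]) -> Atom:
    """Apply a valuation to every term of an atom."""
    rel, peer, args = atom
    return (val.get(rel, rel), val.get(peer, peer),
            tuple(val.get(a, a) for a in args))


def step(p: str, programs: Dict[str, List[Rule]], db: Dict[str, Set[Atom]]) -> bool:
    """Activate peer p once; derived facts go to the peer named in the head."""
    changed = False
    for head, body in programs[p]:
        vals: List[Dict[str, str]] = [{}]
        for atom in body:
            vals = [v2 for v in vals for f in db[p]
                    for v2 in [match(atom, f, v)] if v2 is not None]
        for val in vals:
            fact = subst(head, val)
            target = db.setdefault(fact[1], set())
            if fact not in target:
                target.add(fact)
                changed = True
    return changed


def run(programs: Dict[str, List[Rule]], db: Dict[str, Set[Atom]], seed: int = 0) -> None:
    """Activate peers in shuffled rounds (a fair schedule) until nothing changes."""
    rng = random.Random(seed)
    peers = sorted(programs)
    while True:
        rng.shuffle(peers)
        if not any([step(p, programs, db) for p in peers]):
            return


# Hypothetical example: sue-laptop pushes its photos to alice-iphone.
programs = {
    "sue-laptop": [(("album", "alice-iphone", ("$Id",)),
                    [("photos", "sue-laptop", ("$Id",))])],
    "alice-iphone": [(("seen", "alice-iphone", ("$Id",)),
                      [("album", "alice-iphone", ("$Id",))])],
}
db = {
    "sue-laptop": {("photos", "sue-laptop", ("p1.jpg",)),
                   ("photos", "sue-laptop", ("p2.jpg",))},
    "alice-iphone": set(),
}
run(programs, db)
```

Shuffled rounds are one simple way to get a fair schedule; since the rules here are positive and create no new constants, the final databases do not depend on the order in which peers are activated.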

22 Features of WebdamLog illustrated
Alice: get me the pictures of my friends where I am with Bob.
:- “Alice”), “Bob”)
- Peers and messages as data: they are reified
- is extensional, in I(alice-iphone)
- is intensional, in P(alice-iphone) ∪ D(alice-iphone)
- is bound to a relation of (possibly) another peer
- is a service of that peer

23 Features of WebdamLog illustrated
Delegation of rules
Alice: get me the pictures of my friends where I am with Bob.
:- “Alice”), “Bob”)
:-
- Then alice-iphone installs the following rule at picasa/sue: :- “Alice”), “Bob”)
- picasa/sue will send the photos as extensional facts to alice-iphone.
- When Alice terminates her query, all the delegations are cancelled.
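A rough way to picture delegation: a peer evaluates the local prefix of a rule body, and as soon as the next atom lives at another peer, it ships the rest of the rule, with the valuation found so far applied, to that peer. The Python sketch below follows that reading. Everything in it is an assumption for illustration: the `(relation, peer, arguments)` encoding, the `$` variables, and the relation names `friend`, `photosAt`, `contains` and `answer` are hypothetical, and cancelling delegations is not modelled.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, str, Tuple[str, ...]]  # (relation, peer, arguments)
Rule = Tuple[Atom, List[Atom]]           # (head, body)


def match(atom: Atom, fact: Atom, val: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend valuation val so that atom equals fact, or return None."""
    if len(atom[2]) != len(fact[2]):
        return None
    val = dict(val)
    for term, value in zip((atom[0], atom[1]) + atom[2],
                           (fact[0], fact[1]) + fact[2]):
        if term.startswith("$"):
            if val.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return val


def subst(atom: Atom, val: Dict[str, str]) -> Atom:
    """Apply a valuation to every term of an atom."""
    rel, peer, args = atom
    return (val.get(rel, rel), val.get(peer, peer),
            tuple(val.get(a, a) for a in args))


def delegate(rule: Rule, local: str, facts: Set[Atom]) -> List[Tuple[str, Rule]]:
    """Return (target peer, residual rule) pairs to install remotely.

    Assumes a safe rule: the peer of each atom is bound by the time
    the atom is reached, reading the body left to right.
    """
    head, body = rule
    out: List[Tuple[str, Rule]] = []
    vals: List[Dict[str, str]] = [{}]
    for i, atom in enumerate(body):
        next_vals: List[Dict[str, str]] = []
        for val in vals:
            target = subst(atom, val)[1]
            if target != local:
                # Ship the rest of the rule, partially instantiated.
                residual = (subst(head, val), [subst(b, val) for b in body[i:]])
                out.append((target, residual))
                continue
            for fact in facts:
                v2 = match(atom, fact, val)
                if v2 is not None:
                    next_vals.append(v2)
        vals = next_vals
    return out


# Hypothetical encoding of Alice's query.
rule = (("answer", "alice-iphone", ("$Id",)),
        [("friend", "alice-iphone", ("$F",)),
         ("photosAt", "alice-iphone", ("$F", "$Peer")),
         ("contains", "$Peer", ("$Id", "Alice")),
         ("contains", "$Peer", ("$Id", "Bob"))])
local_facts = {("friend", "alice-iphone", ("sue",)),
               ("photosAt", "alice-iphone", ("sue", "picasa/sue"))}
installed = delegate(rule, "alice-iphone", local_facts)
```

With these facts, `installed` holds a single residual rule for `picasa/sue` whose head still targets `alice-iphone`, so the matching photos come back to Alice's peer as facts.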

24 Managing rules at other peers
- This is complex
  - Regarding implementation, one manages instantiations of rules, i.e., rules and valuations
  - The content of valuations may be constantly changing
  - There could be some negations in the rules
- This is a security risk
  - Someone else is installing data (facts) or code (rules) in a peer
  - Need to control that carefully

25 Does it mean something?
- Some not-so-trivial theorems for the positive case and the stratified-negation case, ensuring Church-Rosser properties (convergence)
- Natural simulation by centralized systems
- Some even-less-trivial theorems comparing the expressivity of different variations of WebdamLog: without exchanging rules, without exchanging intensional data, with time-stamps…

26 More refined asynchronicity
- To model messages from peer p to peer q, we may use a “peer” net_pq that captures the network
- Replace a call at p by pq (u)
- net_pq should just relay messages: :- pq ($U)
- Problem: all messages from p to q in the net arrive at the same time
- Better with time: pq (u,t), where t is the time of the send at p
  :- pq (U,T), min( T, pq (U,T)), using the min aggregate function
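One way to read the timestamped relay is as a channel that releases one message at a time, always the one with the smallest send time. The Python sketch below illustrates that reading only; the class `NetChannel` and its methods are hypothetical names, send times are plain integers, and ties are broken by insertion order, which the slide does not specify.

```python
import heapq
from typing import List, Optional, Tuple


class NetChannel:
    """Toy stand-in for the relay between a peer p and a peer q."""

    def __init__(self) -> None:
        self._pending: List[Tuple[int, int, str]] = []
        self._seq = 0  # tie-breaker so equal send times keep insertion order

    def send(self, message: str, sent_at: int) -> None:
        """Record a message with the time of the send at p."""
        heapq.heappush(self._pending, (sent_at, self._seq, message))
        self._seq += 1

    def deliver(self) -> Optional[str]:
        """Relay the pending message with the minimal send time, if any."""
        if not self._pending:
            return None
        _, _, message = heapq.heappop(self._pending)
        return message


channel = NetChannel()
channel.send("third", sent_at=30)
channel.send("first", sent_at=10)
channel.send("second", sent_at=20)
delivered = [channel.deliver() for _ in range(4)]
```

Messages then arrive one by one in send order instead of all at once, which is the effect the min aggregate is meant to achieve.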

27 Summary of WebdamLog
- Peers run their datalog programs asynchronously
- They exchange facts and delegations of rules

Some clues about implementation

29 Implementation
- We are implementing two kinds of peers
  - WEP (Webdam Exchange Peer): all functionalities
  - IWEP (iPad Webdam Exchange Peer): limited functionalities; relies on proxies
- We are implementing a social network on top of the system

Conclusion

31 Some cool results and still a lot of work
- The WebdamExchange and WebdamLog models capture some nice problems of web data management: distribution, access control…
- Their clean semantics allow us to prove theorems!
- We are implementing the corresponding system!
- Many issues are still open
  - Concurrency, optimization, implementation
  - Defining and verifying protocols (access control is not violated, one gets all the information one has access to)
  - Looking for a killer application