Viewing the Web as a Distributed Knowledge Base Serge Abiteboul INRIA Saclay, Collège de France and ENS Cachan ICDE 2012Mai 30, 2012.

Slides:



Advertisements
Similar presentations
Numbers Treasure Hunt Following each question, click on the answer. If correct, the next page will load with a graphic first – these can be used to check.
Advertisements

Symantec 2010 Windows 7 Migration EMEA Results. Methodology Applied Research performed survey 1,360 enterprises worldwide SMBs and enterprises Cross-industry.
Repaso: Unidad 2 Lección 2
Symantec 2010 Windows 7 Migration Global Results.
1 A B C
1 Leveraging social networking in your business marketing Leveraging social networking in your business marketing.
Simplifications of Context-Free Grammars
Variations of the Turing Machine
3rd Annual Plex/2E Worldwide Users Conference 13A Batch Processing in 2E Jeffrey A. Welsh, STAR BASE Consulting, Inc. September 20, 2007.
AP STUDY SESSION 2.
1
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Processes and Operating Systems
Copyright © 2011, Elsevier Inc. All rights reserved. Chapter 6 Author: Julia Richards and R. Scott Hawley.
STATISTICS HYPOTHESES TEST (I)
STATISTICS INTERVAL ESTIMATION Professor Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University.
STATISTICS POINT ESTIMATION Professor Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University.
Slide 1 FastFacts Feature Presentation October 16 th, 2008 We are using audio during this session, so please dial in to our conference line… Phone number:
David Burdett May 11, 2004 Package Binding for WS CDL.
1. 2 Begin with the end in mind! 3 Understand Audience Needs Stakeholder Analysis WIIFM Typical Presentations Expert Peer Junior.
1. 2 Begin with the end in mind! 3 Understand Audience Needs Stakeholder Analysis WIIFM Typical Presentations Expert Peer Junior.
Local Customization Chapter 2. Local Customization 2-2 Objectives Customization Considerations Types of Data Elements Location for Locally Defined Data.
Custom Services and Training Provider Details Chapter 4.
CALENDAR.
1 Term 2, 2004, Lecture 9, Distributed DatabasesMarian Ursu, Department of Computing, Goldsmiths College Distributed databases 3.
1 Click here to End Presentation Software: Installation and Updates Internet Download CD release NACIS Updates.
The 5S numbers game..
Knowledge Extraction from Technical Documents Knowledge Extraction from Technical Documents *With first class-support for Feature Modeling Rehan Rauf,
Biostatistics Unit 5 Samples Needs to be completed. 12/24/13.
Break Time Remaining 10:00.
Turing Machines.
Anything But Typical Learning to Love JavaScript Prototypes Page 1 © 2010 Razorfish. All rights reserved. Dan Nichols March 14, 2010.
Table 12.1: Cash Flows to a Cash and Carry Trading Strategy.
PP Test Review Sections 6-1 to 6-6
EIS Bridge Tool and Staging Tables September 1, 2009 Instructor: Way Poteat Slide: 1.
Outline Minimum Spanning Tree Maximal Flow Algorithm LP formulation 1.
Operating Systems Operating Systems - Winter 2010 Chapter 3 – Input/Output Vrije Universiteit Amsterdam.
Exarte Bezoek aan de Mediacampus Bachelor in de grafische en digitale media April 2014.
Lexical Analysis Arial Font Family.
Dynamic Access Control the file server, reimagined Presented by Mark on twitter 1 contents copyright 2013 Mark Minasi.
Why Do You Want To Work For Us?
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
Adding Up In Chunks.
MaK_Full ahead loaded 1 Alarm Page Directory (F11)
Facebook Pages 101: Your Organization’s Foothold on the Social Web A Volunteer Leader Webinar Sponsored by CACO December 1, 2010 Andrew Gossen, Senior.
1 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt 10 pt 15 pt 20 pt 25 pt 5 pt Synthetic.
Artificial Intelligence
By CA. Pankaj Deshpande B.Com, FCA, D.I.S.A. (ICA) 1.
1 Using Bayesian Network for combining classifiers Leonardo Nogueira Matos Departamento de Computação Universidade Federal de Sergipe.
: 3 00.
5 minutes.
1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)
1 hi at no doifpi me be go we of at be do go hi if me no of pi we Inorder Traversal Inorder traversal. n Visit the left subtree. n Visit the node. n Visit.
1 Let’s Recapitulate. 2 Regular Languages DFAs NFAs Regular Expressions Regular Grammars.
Speak Up for Safety Dr. Susan Strauss Harassment & Bullying Consultant November 9, 2012.
1 Titre de la diapositive SDMO Industries – Training Département MICS KERYS 09- MICS KERYS – WEBSITE.
Converting a Fraction to %
Numerical Analysis 1 EE, NCKU Tien-Hao Chang (Darby Chang)
Clock will move after 1 minute
Chapter 11 Creating Framed Layouts Principles of Web Design, 4 th Edition.
Physics for Scientists & Engineers, 3rd Edition
Select a time to count down from the clock above
Copyright Tim Morris/St Stephen's School
9. Two Functions of Two Random Variables
Introduction Peter Dolog dolog [at] cs [dot] aau [dot] dk Intelligent Web and Information Systems September 9, 2010.
1 DIGITAL INTERACTIVE MEDIA Wednesday, October 28, 2009.
1 Decidability continued…. 2 Theorem: For a recursively enumerable language it is undecidable to determine whether is finite Proof: We will reduce the.
1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)
A formal study of collaborative access control in distributed datalog Serge Abiteboul – Inria & ENS Cachan Pierre Bourhis CNRS & Lille Univ. & Inria Victor.
Presentation transcript:

Viewing the Web as a Distributed Knowledge Base Serge Abiteboul INRIA Saclay, Collège de France and ENS Cachan ICDE 2012Mai 30, 2012

2 The Web as a distributed knowledge base WebdamLog: a rule-based language for the Web The WebdamLog system Inconsistencies and uncertainty Conclusion

Mai 30, The Web hypertext universal library of text and multimedia personal/private data social data

Mai 30, A typical Web users data What kinds of data? data: photos, music, movies, reports, metadata: photo taken by Alice in Paris on... ontologies: Alices ontology and mapping with other ontologies localization: Alices pictures are on Picasa, back-ups are at INRIA security: Facebook credentials (Alice, ) annotations: Alice likes Elvis website beliefs: Alice believes Elvis is alive external knowledge: Bob keeps copies of Alices pictures time, provenance,... all kinds Social data

Mai 30, A typical Web users data What kinds of data? Where is the data? laptop, desktop, smartphone, tablet, car computer mail, address book, agenda Facebook, LinkedIn, Picasa, YouTube, Tweeter svn, Google docs also access to data / information of family, friends, companies associations all kinds everywhere

Mai 30, A typical Web users data What kinds of data? Where is the data? all kinds everywhere What kind of organization? terminology: different ontologies systems: personal machines, social networks distribution: different localization security: different protocols quality: incomplete / inconsistent information heterogeneous

Mai 30, Example of processing Alice and Bob are getting engaged. Their friends want to offer them an album of photos where they are together To make such a photo album Find friends of Alice & Bob (say with Facebook) for each friend, find where she keeps her photos (say, Picassa) find the means to access her photos possibly via friends find the photos that feature Bob and Alice together, e.g., using tags or face recognition software possibly ask someone to verify the results Some reasoning is needed to execute these tasks (automatically)!

Mai 30, 2012 A typical Web user Overwhelmed by the mass of information Cannot find the information needed Is not aware of important events Cannot manage/control how others access and use his/her own data 8

Mai 30, 2012 YOU need help! How can systems help? We need to move from a Web of text to a Web of knowledge In the spirit of semantic Web To better support user needs, Systems need to analyze what is happening and construct knowledge Systems should exchange knowledge Systems should reason and infer knowledge 9

Mai 30, 2012 Thesis All this forms a distributed knowledge base with processing based on automated reasoning 10

Mai 30, Issues Distributed reasoning Exchanging facts and rules Contradictions Missing and noisy data WebdamLog Ignore for now

Mai 30, The Web as a distributed knowledge base WebdamLog: a rule-based language for the Web The WebdamLog system Inconsistencies and uncertainty Conclusion

Mai 30, WebdamLog: a datalog-style language Why datalog? A prehistoric language by Web time... + nice and compact syntax + well-studied with many extensions + recursion essential in a distributed setting: cycles in the network Extensional facts friend(peter,paul) friend(paul, mary) friend(mary,sue) Datalog programfof(x,y) :- friend(x,y) fof(x,y) :- friend(x,z), fof(z,y) Intentional facts fof(peter,paul) fof(peter,mary) fof(peter, sue) fof(paul, mary)fof(paul, sur) fof(mary,sue)

Mai 30, WebdamLog Extends datalog negation, updates, distribution, delegation, time For a world that is distributed: autonomous and asynchronous peers dynamic: knowledge evolves; peers come and go Influenced by Active XML (INRIA) - for distribution & intentional data Dedalus (UC Berkeley) - for time & implementation

Mai 30, Warning Not as simple Not as beautiful More procedural But this is needed for real Web applications! WebdamLog is not datalog

Mai 30, Schema (π, E, I, σ) π possibly infinite set of peer IDs E set of extensional relations of the form I set of intentional relations of the form σ sorting function for each is an integer (its sort)

Mai 30, Facts Facts are of the form 1,..., a n ), where m is a relation name& p is a peer name a 1,..., a n are data values (n is the arity of the set of data values includes the relations and peer names Examples paul) extensional paul) intentional

Mai 30, Examples of facts data & metadata: Paris, 11/11/2011) ontology: theKing) annotations: encyclopedia) localization: picasa/alice) access rights: friends, read) security:

Mai 30, Rules Rules are of the form :- (not) $R 1 ($U 1 ),..., (not) $R n ($U n ) where $R, $R i are relation terms $P, $P i are peer terms $U, $U i are tuples of terms Safety condition $R and $P must appear positively bound in the body each variable in a negative literal must appear positively bound in the body A term is a variable or a constant Examples coming up, stay tuned

Mai 30, Semantics A state (I, Γ, Γ * ) : each peer p has extensional facts I(p), defining the local state of p local rules Γ(p), defining the program of p rules Γ * (p,q) that have been delegated to p by some peer q

Mai 30, State transition Choose some peer p randomly – asynchronously Compute the transition of p the database updates at p the messages sent to other peers the delegations of rules to other peers Keep going forever (I 0, Γ 0, ) (I 1, Γ 1, Γ 1 * )... (I n, Γ n, Γ n * )... Fair sequence: each peer is selected infinitely often

Mai 30, 2012 The semantics of rules Classification based on locality and nature of head predicates (intentional or extensional) Local rule at my-laptop: all predicates in the body of the rules are from my-laptop Local with local intentional headclassic datalog Local with local extensional headdatabase update Local with non-local extensional headmessaging between peers Local with non-local intentional head view delegation Non-localgeneral delegation 22

Mai 30, Local rules with local intentional head Example: Rule at peer my-laptop friend is extensional, fof is intentional $y) :- :- iphone($z,$y) fof is the transitive closure of friend Datalog = WebdamLog with only local rules and local intentional head

Mai 30, Local rules with local extensional head A new fact is inserted into the local database $loc) :- $loc),

Mai 30, Local rules with non-local extensional head A new fact is sent to an external peer via a message Happy birthday!) :- $message, $peer, $date) Extensional facts: 6) sendmail, gmail.com, March 6) Happy birthday)

Mai 30, Local rules with non-local intentional head View delegation! $boy) :- $loc), $loc) Semantics of gossip-site is a join of relations girls and boys from my-iphone Formally, my-iphone delegates a rule gossip-site (g,b) for each g, b, l,

Mai 30, Non-local rules: general delegation (at my-iphone): $boy) :- $loc), $loc) Suppose that Julia's birthday) holds. Then my-iphone installs the following rule at alice-iphone (at alice-iphone): $boy) :- Julia's birthday) When Julia's birthday) no longer holds, my-iphone uninstalls the rule

Mai 30, Non-local rules: general delegation (at my-iphone): $boy) :- $loc), $loc) An alternative, more database-ish, way of looking at this: at my-iphone : $loc):- $loc) at alice-iphone : $boy) :- $loc), $loc) view delegation

Mai 30, Complexity of delegation: illustration fof(x,y) :- friend(x,y) (at p) :- $q ), $q (x,y) If q 1 ) holds, this rule installs (at q 1 ) :- 1 (x,y) If contains tuples 1 ),...., ) This rule will install rules! for i=1 to (at q i ) :- i (x,y) Data complexity transformed into program complexity

Mai 30, Summary of results [PODS 2011] Formal definition of the semantics of WebdamLog Results on expressivity the model with delegation is more general, unless all peers and programs are known in advance Convergence is very hard to achieve positive WebdamLog strongly stratified programs with negation

Mai 30, The Web as a distributed knowledge base WebdamLog: a rule-based language for the Web The WebdamLog system Inconsistencies and uncertainty Conclusion

Mai 30, WebdamLog peers [demo ICDE 2011, WebDB 2011] Support communication with other peers Support common security protocols Support wrappers to external systems such as Facebook Manage knowledge store knowledge (facts and rules) exchange knowledge with other peers perform reasoning

Mai 30, 2012 WebdamLog peers communication security engine peer Web services wrapper s

Mai 30, WebdamLog engine [ongoing work] Based on Bud developed at UC Berkeley, implemented in Ruby, open-source supports Bloom - an extension of datalog implements communication between peers serious experiments

Mai 30, WebdamLog inference: beyond Bud Translation of WebdamLog to Bloom (Buds language) Features of WebdamLog not supported in Bud 1. Variable relation and peer names 2. Delegation: non-local rules, non-local relations in the body 3. Adding and removing rules at runtime: needed because of delegation

36 Example of runtime inference ( rule 1 at p) $boy) :- $loc), $loc) (rule 2 at q) $boy) :- $boy), allPeers($peer) (rule 3 at q) $boy) :- $boy) direct knowledge hearsay

Mai 30, Adding facts at runtime John) × q rule 3 × × Julias birthday) Julia's birthday) rule 1 × × John) × + Maintain a provenance graph for update management

Mai 30, Removing facts at runtime Julia's birthday) rule 1 Julias birthday) × Avoid recomputation at each update using provenance × × John) × q rule 3 × × John) +

Mai 30, Provenance graphs Records the history of derivation Provenance semiring semantics [Green et al. 07] alternative or joint use of data facts, rules, peers are nodes Useful for performance optimization Other uses explain results to users specify and verify access rights

Mai 30, The Web as a distributed knowledge base WebdamLog: a rule-based language for the Web The WebdamLog system Inconsistencies and uncertainty Conclusion

Mai 30, Motivation Contradictions (in intentional or extensional data) come from errors, lies, rumors, updates FD violations: some think Alice was born in Paris, others that she was born in London opinions: some think Brahms is great; others dont Uncertainty comes from lack of information contradictions Probabilities may be used to measure uncertainty 80% think Alice was born in Paris, 20% in London sources: we observed that Peter is wrong 20% of the time

Mai 30, Roadmap We consider reasoning in an uncertain and inconsistent world We do this first for the centralized setting then with distribution finally with probabilities Datalog + FDs WebdamLo g and sampling

Mai 30, Datalog example Where is Alice? A relation IsIn(person, city, peer) with the FD (person, peer) city peer believes person to be in city Consider a datalog rule IsIn($per, $city, $p) :- IsIn($per, city, $p), friend($p, $p) IsIn(Alice, London, Bob)IsIn(Alice, Paris, Sue) friend(my-iphone, Bob)friend(my-iphone, Sue)

Mai 30, 2012 Datalog with nondeterministic fact-at-a-time semantics Immediate consequence operator: a single fact is derived only if it does not contradict known facts A possible world is a maximal consequence. Example: IsIn($per, $city, $p) :- IsIn($per, city, $p), friend($p, $p) IsIn(Alice, London, Bob)IsIn(Alice, Paris, Sue) friend(my-iphone, Bob)friend(my-iphone, Sue) Infer: IsIn(Alice, London, my-iphone) 44 In practice set-at-a-time semantics is more efficient Infer: IsIn(Alice, Paris, my-iphone)

Mai 30, 2012 Discussion Inflationary non-deterministic semantic (stubborn choices) Related to 2-stable models Proof theory Possible facts NP-complete Sure facts coNP-complete Many possible alternative semantics 45

Mai 30, 2012 Distributed setting: use WebdamLog To simplify, we focus only on local and deductive rules The semantics is inflationary and non-deterministic A subtlety: Each peer has to recall the choices made to always make the same choice in the future (when talking to other peers): stubborn The causes of uncertainty Uncertainty in base facts Uncertainty in the order of peer activations Uncertainty in choosing immediate consequences 46

Mai 30, 2012 Probabilities Probabilistic interpretation to measure uncertainty For base facts, use independent probabilistic events Uniform distribution for the next peer to activate Uniform distribution in choosing the next immediate consequence Can be done efficiently if there is a single FD & more complicated otherwise 47

Mai 30, 2012 Example: captures voting Bobs rules :- Suppose each peer has similar rules Claim: For acyclic networks, the probability of a peer inferring a fact is exactly its relative support at his friends Note: this also give semantics for more complicated cases such as networks with cycles 48

Mai 30, 2012 Query answering Resulting tuples of a query q have associated probabilities Exact evaluation using c-tables Too costly in practice Sampling technique Each peer makes probabilistic choices along the way Converges to the probability of q when the number of samples grows 49

Mai 30, The Web as a distributed knowledge base WebdamLog: a rule-based language for the Web The WebdamLog system Inconsistencies and uncertainty Conclusion

Mai 30, 2012 Thesis Let us turn the Web into a distributed knowledge base with billions of users supported by billions of systems analyzing information extracting knowledge exchanging knowledge inferring knowledge 51

Mai 30, 2012 Contribution WebdamLog A language for distributed data management [PODS 2011] Datalog with distribution, updates, messaging Main novelty: delegation System implementation Handles heterogeneity, localization and access control [WebDB 2011] WebdamlExchange peer In Java [demo ICDE 2011] WebdamLog engine based on Bud 52

Mai 30, 2012 On-going work The implementation More optimization strategies such as Magic Set Probabilistic WebdamLog Query processing Explaining results to users: top-k proofs Collaboration between peers to answer queries Lots of fun & many open questions 53

Mai 30, 2012 Issues Access control based on provenance Concurrency control Difficulty: right revocation Optimization Links with optimization in Active XML Verification of applications Links with business artifacts 54

ICDE 2012Mai 30, 2012 Joint work with Emilien Antoine, Meghyn Bienvenu, Daniel Deutch, Alban Galland, Kristian Lyndbaek, Julia Stoyanovich, Jules Testard

After a short break Two authors of the Web Data Management Book (aka Jorge) Two friends

Mai 30, 2012 Marie-Christine Rousset Reasoning in the Semantic Web Professor of CS at the Univ. Grenoble. PhD (1983) and a Thèse d'Etat (1988) in CS from Univ. Paris-Sud. Best paper award from AAAI in 1996 Junior member of Institut Universitaire de France and Senior member in 2011-now Interest;: Knowledge Representation, Information Integration, Pattern Mining and the Semantic Web. 57

Mai 30, 2012 Pierre Senellart Social networks Associate Professor at Telecom ParisTech PhD from Univ. of Paris-Sud Information Director for Journal of ACM Interest: Data management, Web data management, Probabilistic data Interest: Natural language processing Software Engineer at SYSTRAN SA 58