On the Origin of Data
Daniel Deutch
Blavatnik School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences
Data Evolvement
This is the era of data.
– Databases, text, blogs, social data, …
– Huge volumes
– Evolving through automatic tools
– Sent between applications and users
Data Provenance
Understanding how and why data has evolved is of fundamental importance:
– For authentication: both the origin and the propagators of data should be trustworthy
– For access control: confidentiality constraints interplay with the transformation
– For hypothetical reasoning: what if we change a piece of data? How can we optimally affect data evolvement?
Example
Alice posted photos with David.
David is worried about Eve seeing his photos.
[Figure: expression built from OR, AND and NOT; operand labels not recoverable]
Tracking Provenance
The logic is already implemented (e.g. to decide what photos to show).
We develop tools to “instrument” applications with provenance tracking.
Simply maintaining an “activity log” is not good enough:
– We also want the possible “reasons” for activities
– E.g. “not blacklisted” is not an activity
Instead, we create formulas in generic algebraic constructions based on semirings.
We also develop tools that use the provenance information for analysis.
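To illustrate why a formula says more than an activity log, here is a minimal sketch in which a decision procedure returns its answer together with the formula that justifies it. The rule itself (friend, OR friend-of-friend AND NOT blacklisted) and all names such as `can_see_photo` and `blacklisted(Eve)` are hypothetical; the slides only indicate that the formula uses OR, AND and NOT.

```python
# Each value is a pair (truth value, formula text), so every combinator
# computes the decision and records the reason for it at the same time.

def lit(name, holds):
    """A base fact, e.g. a friendship edge or a blacklist entry."""
    return (holds, name)

def p_not(a):
    return (not a[0], f"NOT {a[1]}")

def p_and(a, b):
    return (a[0] and b[0], f"({a[1]} AND {b[1]})")

def p_or(a, b):
    return (a[0] or b[0], f"({a[1]} OR {b[1]})")

def can_see_photo(friend, friend_of_friend, blacklisted):
    """Hypothetical visibility rule, instrumented with provenance."""
    return p_or(friend, p_and(friend_of_friend, p_not(blacklisted)))

decision, formula = can_see_photo(
    lit("friend(Alice,Eve)", False),
    lit("friend_of_friend(Alice,Eve)", True),
    lit("blacklisted(Eve)", False),
)
print(decision)
print(formula)
```

The recorded formula contains the term `NOT blacklisted(Eve)`, a reason that would never appear in a log of activities because nothing "happened" to produce it.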
Generic Expression
[Figure: expression built from OR, AND and NOT; operand labels not recoverable]
Trust: False OR ( (True OR True) AND NOT False ) = True
Number of paths (if Alice and Eve are not friends): 0 + ( (1 + 1) x 1 ) = 2
Latency: min ( (0:05 min 0:08) + 0:00 ) = 0:05
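The three evaluations on this slide can be reproduced by interpreting one expression in three different semirings. The sketch below is an illustration under stated assumptions, not the authors' implementation: the variable names (`direct`, `path1`, `path2`, `not_blocked`) are made up, the negated condition is modelled as an ordinary variable that takes the semiring's "one" when the blocking fact is absent, latencies are written as the plain numbers 5 and 8, and a missing direct connection is given latency infinity.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    zero: Any                          # neutral element of plus
    one: Any                           # neutral element of times
    plus: Callable[[Any, Any], Any]    # interprets OR
    times: Callable[[Any, Any], Any]   # interprets AND

BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
TROPICAL = Semiring(math.inf, 0, min, lambda a, b: a + b)

# direct OR ((path1 OR path2) AND not_blocked); names are hypothetical.
EXPR = ("or", "direct", ("and", ("or", "path1", "path2"), "not_blocked"))

def evaluate(expr, sr, env):
    """Recursively interpret the expression in the given semiring."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    l, r = evaluate(left, sr, env), evaluate(right, sr, env)
    if op == "or":
        return sr.plus(l, r)
    if op == "and":
        return sr.times(l, r)
    raise ValueError(f"unknown operator: {op}")

trust = evaluate(EXPR, BOOLEAN,
                 {"direct": False, "path1": True, "path2": True, "not_blocked": True})
paths = evaluate(EXPR, COUNTING,
                 {"direct": 0, "path1": 1, "path2": 1, "not_blocked": 1})
latency = evaluate(EXPR, TROPICAL,
                   {"direct": math.inf, "path1": 5, "path2": 8, "not_blocked": 0})
print(trust, paths, latency)
```

Only the `Semiring` instance and the values assigned to the variables change between the three calls; the expression and the evaluator stay the same, which is the point of keeping provenance as a generic expression.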
Provenance for SQL Queries
Amsterdamer, D., Tannen, Provenance for Aggregate Queries [PODS ’11]
Amsterdamer, D., Tannen, On the Limitations of Provenance for Queries with Difference [TaPP ’11]
D., Milo, Roy, Tannen, Circuits for Datalog Provenance [ICDT ’14]
Amsterdamer, D., Green, Karvounarakis, Tannen, Semiring-based Provenance for SQL Queries (In preparation)
D., Moskovitch, Provenance for Relational Updates (In preparation)

Emps
Dep.    Emp     Prov.
Eng.    Alice   S
Eng.    Bob     T
Sales   Carol   S

GoodEmps
Emp     Prov.
Alice   C
Bob     S
Carol   T

π Dep (Emps ⋈ GoodEmps)
Dep.    Prov.
Eng.    S·C + T·S = S + T = S
Sales   S·T = T
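The annotated query result on this slide can be recomputed with a short sketch. It reads C, S and T as ordered levels with C < S < T, takes "·" as the more restrictive of two levels and "+" as the less restrictive one; this reading is an assumption, chosen because it matches the arithmetic shown (S·C = S, T·S = T, S + T = S). The function name `project_dep_of_join` is hypothetical, and the join is taken to be on the Emp column.

```python
# Assumed ordering of annotation levels: C < S < T.
LEVELS = {"C": 0, "S": 1, "T": 2}

def times(a, b):
    """Joint use of two tuples: keep the more restrictive level."""
    return a if LEVELS[a] >= LEVELS[b] else b

def plus(a, b):
    """Alternative derivations: keep the less restrictive level."""
    return a if LEVELS[a] <= LEVELS[b] else b

emps = [("Eng.", "Alice", "S"), ("Eng.", "Bob", "T"), ("Sales", "Carol", "S")]
good_emps = [("Alice", "C"), ("Bob", "S"), ("Carol", "T")]

def project_dep_of_join(emps, good_emps):
    """Join the two tables on Emp, then project on Dep, combining annotations."""
    good = dict(good_emps)  # assumes each employee appears once in GoodEmps
    result = {}
    for dep, emp, prov in emps:
        if emp not in good:
            continue
        joined = times(prov, good[emp])           # join multiplies annotations
        if dep in result:
            result[dep] = plus(result[dep], joined)  # projection adds them
        else:
            result[dep] = joined
    return result

result = project_dep_of_join(emps, good_emps)
print(result)
```

For Eng. this computes S·C + T·S and for Sales it computes S·T, following the same steps as the table above.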
Provenance for Social and Web Data
Bienvenu, D., Suchanek, Provenance for Web 2.0 Data [Secure Data Management ’12]
Abiteboul, Bienvenu, D., Deduction in the Presence of Distribution and Contradictions [WebDB ’12]
Abiteboul, D., Vianu, Deduction with Contradictions in Datalog [ICDT ’14]
Amarilli, D., Senellart, Provenance for Order-Aware Transformations (In preparation)
PROPOLIS: Provenance for Process Analysis
D., Moskovitch, Tannen, PROPOLIS: Provisioned Analysis of Data-Centric Processes [VLDB ’13]
D., Moskovitch, Tannen, A Provenance Framework for Data-Dependent Process Analysis (Submitted)
D., Moskovitch, Provenance for Distributed Processes (In preparation)
Thank you!