Bootstrapping Privacy Compliance in Big Data Systems
Shayak Sen, Saikat Guha et al.
Carnegie Mellon University, Microsoft Research
Presenter: Cheng Li

We have your everything: your bank account, your mobile, your social network, your shopping account.

We will keep it a secret.

This is how we work:
◦ The legal team crafts the privacy policy.
◦ A privacy champion interprets the policy.
◦ Developers write the code.
◦ The audit team verifies compliance.

Life could be much easier. [Figure: encode, refine, code analysis]

Outline
◦ Introduction
◦ LEGALEASE: Goal, Syntax, Domain-Specific Attribute, Formal Semantics, Properties
◦ GROK
◦ Validation
◦ Discussion
◦ Conclusion

LEGALEASE Goals
◦ Usability: policy clauses are structured much like clauses in English-language policies.
◦ Expressivity: clauses are built around an attribute abstraction that allows the language to evolve as policy evolves.
◦ Compositional reasoning: LEGALEASE imposes meaningful syntactic restrictions that allow compositional reasoning.

LEGALEASE Syntax
◦ Domain-specific attributes are defined in a concept lattice.
◦ LEGALEASE policies are checked at each node in the data dependency graph. Each node is labeled with attribute names and, for each attribute, a set of values.
◦ ALLOW: permits nodes whose values are a subset of the clause's attribute values.
◦ DENY: forbids nodes whose values overlap the clause's attribute values.

LEGALEASE Example
◦ Policy text: "Full IP address will not be used for advertising. IP address may be used for detecting abuse. In such cases it will not be combined with account information."
◦ LEGALEASE encoding:

DENY DataType IPAddress
     UseForPurpose Advertising
EXCEPT
  ALLOW DataType IPAddress:Truncated
ALLOW DataType IPAddress
      UseForPurpose AbuseDetect
EXCEPT
  DENY DataType IPAddress, AccountInfo
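
To make the clause structure concrete, here is a minimal Python sketch that represents a clause as a tree (kind, attribute values, nested exceptions) and prints it with one attribute per line and indented exceptions. The names Clause and render are hypothetical, not part of LEGALEASE or GROK; the check that an exception must be of the opposite kind follows the pattern of the example (DENY ... EXCEPT ALLOW, ALLOW ... EXCEPT DENY).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Clause:
    """A LEGALEASE clause: ALLOW or DENY, attribute values, and nested exceptions."""
    kind: str                                      # "ALLOW" or "DENY"
    attrs: dict[str, list[str]] = field(default_factory=dict)
    exceptions: list[Clause] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.kind not in ("ALLOW", "DENY"):
            raise ValueError(f"unknown clause kind: {self.kind}")
        # As in the example: ALLOW ... EXCEPT DENY ..., DENY ... EXCEPT ALLOW ...
        if any(exc.kind == self.kind for exc in self.exceptions):
            raise ValueError("an exception must be of the opposite kind")


def render(clause: Clause, indent: int = 0) -> str:
    """Print a clause tree with one attribute per line and indented exceptions."""
    pad = " " * indent
    lines = [pad + clause.kind] if not clause.attrs else []
    for i, (name, values) in enumerate(clause.attrs.items()):
        head = clause.kind if i == 0 else " " * len(clause.kind)
        lines.append(f"{pad}{head} {name} {', '.join(values)}")
    if clause.exceptions:
        lines.append(pad + "EXCEPT")
        lines.extend(render(exc, indent + 2) for exc in clause.exceptions)
    return "\n".join(lines)


# The first clause of the example policy.
no_full_ip_for_ads = Clause(
    "DENY",
    {"DataType": ["IPAddress"], "UseForPurpose": ["Advertising"]},
    [Clause("ALLOW", {"DataType": ["IPAddress:Truncated"]})],
)
print(render(no_full_ip_for_ads))
```

Keeping exceptions as children of the clause they qualify mirrors how the English text reads ("will not ... except ...") and is what makes the nested evaluation rules of the formal semantics straightforward to implement.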

LEGALEASE Domain-Specific Attributes
◦ Attribute values are organized as a concept lattice.
◦ Advantages of a concept lattice:
  - It abstracts away the meaning of attribute values: the language reasons only about the lattice order.
  - The lattice structure lets users concisely define sets of elements through their least upper bound.
  - The lattice structure allows the policy to be statically checked for certain classes of errors.
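
A minimal sketch of how a concept lattice lets one name stand for a whole set of values, and how the least upper bound of a set is found. For simplicity it assumes a tree-shaped hierarchy stored as a child-to-parent map with an implicit bottom element; the element names (UniqueIdentifier, Demographic, Age, Gender) are hypothetical, not taken from the deployed lattice.

```python
# Hypothetical DataType hierarchy, stored child -> parent. A tree under "TOP"
# (with an implicit bottom element) is a simplifying assumption.
PARENT = {
    "UniqueIdentifier": "TOP",
    "IPAddress": "UniqueIdentifier",
    "AccountInfo": "UniqueIdentifier",
    "Demographic": "TOP",
    "Age": "Demographic",
    "Gender": "Demographic",
}
ELEMENTS = ["TOP", *PARENT]


def ancestors(elem: str) -> list[str]:
    """elem followed by everything above it, ordered bottom-up and ending at TOP."""
    chain = [elem]
    while chain[-1] != "TOP":
        chain.append(PARENT[chain[-1]])
    return chain


def leq(a: str, b: str) -> bool:
    """a ⊑ b: a equals b or lies below it."""
    return b in ancestors(a)


def lub(elems: set[str]) -> str:
    """Least upper bound of a non-empty set: its lowest common ancestor in the tree."""
    common = None
    for e in elems:
        chain = ancestors(e)
        common = chain if common is None else [c for c in common if c in chain]
    return common[0]  # still ordered bottom-up, so the first entry is the least


def covers(elem: str) -> set[str]:
    """All named values that writing `elem` in a policy refers to."""
    return {e for e in ELEMENTS if leq(e, elem)}
```

With this hierarchy, writing UniqueIdentifier in a clause covers IPAddress and AccountInfo without listing them, and the least upper bound of {IPAddress, AccountInfo} is UniqueIdentifier.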

LEGALEASE Attributes Defined in the Implementation
◦ InStore attribute: encodes policies around the collection and storage of data.
◦ UseForPurpose attribute: encodes how data is used.
◦ AccessByRole attribute: encodes internal, access-control-based policies.
◦ DataType attribute:
  - Policy datatypes: categories of data types.
  - Limited typestate: a limited way of tracking the history of data (e.g., whether an IP address has been truncated).
  - Combining policy datatypes and typestates: t:s, where t is a policy datatype and s is a typestate.
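
One natural reading of the t:s combination is a product lattice ordered componentwise. The sketch below assumes that reading, uses hypothetical typestates (Truncated, Hashed), and treats a bare datatype such as IPAddress as that datatype in any state (typestate TOP).

```python
# Two hypothetical hierarchies (child -> parent), each rooted at "TOP".
DATATYPE_PARENT = {"IPAddress": "TOP", "AccountInfo": "TOP"}
# Typestate "TOP" means "in any state"; Truncated and Hashed are hypothetical.
TYPESTATE_PARENT = {"Truncated": "TOP", "Hashed": "TOP"}


def leq_in(parent: dict[str, str], a: str, b: str) -> bool:
    """a ⊑ b in a tree-shaped hierarchy: walk up from a until b or TOP is reached."""
    while a != b and a != "TOP":
        a = parent[a]
    return a == b


def parse(label: str) -> tuple[str, str]:
    """Split 't:s' into (t, s); a bare 't' is read as 't' in any typestate."""
    t, _, s = label.partition(":")
    return t, s or "TOP"


def leq_ts(x: str, y: str) -> bool:
    """Product order: t1:s1 ⊑ t2:s2 iff t1 ⊑ t2 and s1 ⊑ s2."""
    (t1, s1), (t2, s2) = parse(x), parse(y)
    return leq_in(DATATYPE_PARENT, t1, t2) and leq_in(TYPESTATE_PARENT, s1, s2)
```

Under this order IPAddress:Truncated ⊑ IPAddress but not the reverse, so a clause that allows IPAddress:Truncated does not by itself allow a full IPAddress, which is the distinction the example policy depends on.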

LEGALEASE Formal Semantics
◦ Notation:
  - $T$: a vector of sets of lattice elements, one set per attribute.
  - $T^x$: the value of attribute $x$ in $T$.
  - $T_G$: the vector labeling a graph node.
  - $T_C$: the vector of a policy clause.

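LEGALEASE Formal Semantics
◦ ALLOW $T_C$ applies to a graph node $T_G$ if $T_G \sqsubseteq T_C$, where $T_G \sqsubseteq T_C$ holds iff $T_G^x \sqsubseteq T_C^x$ for each attribute $x$.
◦ DENY $T_C$ applies to $T_G$ if $T_G^x \sqcap T_C^x \neq \bot$ for each attribute $x$.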

LEGALEASE Formal Semantics
◦ A graph node is allowed by an ALLOW clause if and only if the clause applies to it and it is allowed by each of the clause's exceptions.

LEGALEASE Formal Semantics
◦ A graph node is denied by a DENY clause if and only if the clause applies to it and it is denied by each of the clause's exceptions.
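
The applicability and exception rules can be turned into a small evaluator. This Python sketch illustrates them under simplifying assumptions and is not the authors' implementation: all attributes share one hypothetical tree-shaped hierarchy (with IPAddress:Truncated modelled as a child of IPAddress), two values overlap when one lies below the other, an attribute that a node or clause leaves unspecified is treated as TOP, and a clause allows exactly the nodes it does not deny.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Vector = dict[str, set[str]]  # attribute name -> set of lattice elements

# Hypothetical hierarchy shared by all attributes (child -> parent). Treating
# IPAddress:Truncated as a child of IPAddress is a simplification.
PARENT = {
    "IPAddress": "TOP", "AccountInfo": "TOP", "IPAddress:Truncated": "IPAddress",
    "Advertising": "TOP", "AbuseDetect": "TOP",
}


def leq(a: str, b: str) -> bool:
    """a ⊑ b: walk up from a until reaching b or TOP."""
    while a != b and a != "TOP":
        a = PARENT[a]
    return a == b


def overlaps(a: str, b: str) -> bool:
    """In a tree, a ⊓ b ≠ ⊥ exactly when one element lies below the other."""
    return leq(a, b) or leq(b, a)


def value(vec: Vector, attr: str) -> set[str]:
    """Assumption: an attribute a node or clause leaves unspecified is TOP."""
    return vec.get(attr, {"TOP"})


@dataclass
class Clause:
    kind: str                                   # "ALLOW" or "DENY"
    attrs: Vector
    exceptions: list[Clause] = field(default_factory=list)


def applies(c: Clause, node: Vector) -> bool:
    names = set(c.attrs) | set(node)
    if c.kind == "ALLOW":  # T_G^x ⊑ T_C^x for every attribute x
        return all(all(any(leq(g, v) for v in value(c.attrs, x)) for g in value(node, x))
                   for x in names)
    # DENY: T_G^x ⊓ T_C^x ≠ ⊥ for every attribute x
    return all(any(overlaps(g, v) for g in value(node, x) for v in value(c.attrs, x))
               for x in names)


def allows(c: Clause, node: Vector) -> bool:
    if c.kind == "ALLOW":  # applies, and every (DENY) exception allows the node
        return applies(c, node) and all(allows(e, node) for e in c.exceptions)
    return not denies(c, node)


def denies(c: Clause, node: Vector) -> bool:
    if c.kind == "DENY":  # applies, and every (ALLOW) exception denies the node
        return applies(c, node) and all(denies(e, node) for e in c.exceptions)
    return not allows(c, node)


policy = Clause("DENY", {"DataType": {"IPAddress"}, "UseForPurpose": {"Advertising"}},
                [Clause("ALLOW", {"DataType": {"IPAddress:Truncated"}})])
full_ip_ads = {"DataType": {"IPAddress"}, "UseForPurpose": {"Advertising"}}
truncated_ads = {"DataType": {"IPAddress:Truncated"}, "UseForPurpose": {"Advertising"}}
ip_abuse = {"DataType": {"IPAddress"}, "UseForPurpose": {"AbuseDetect"}}
print(denies(policy, full_ip_ads), denies(policy, truncated_ads), denies(policy, ip_abuse))
# prints: True False False
```

On the first clause of the example policy, a node carrying a full IP address used for advertising is denied, a truncated one is allowed through the exception, and an abuse-detection node falls outside the clause. Because allowing is defined as not denying in this sketch, each node gets exactly one verdict.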

LEGALEASE Properties
◦ Totality: every clause $C$ either allows or denies each graph node $T_G$.
◦ Unicity: no clause $C$ both allows and denies the same graph node $T_G$.
◦ Monotonicity: if $C_1 \sqsubseteq C_2$, then for any $T_G$, $C_1$ allowing $T_G$ implies that $C_2$ allows $T_G$, and $C_2$ denying $T_G$ implies that $C_1$ denies $T_G$.
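
Monotonicity can at least be spot-checked by brute force in a special case. This sketch assumes exception-free ALLOW clauses over a single attribute and reads C1 ⊑ C2 as T_C1 ⊑ T_C2, with a set ordered below another when each of its elements lies below some element of the other; under that reading the property reduces to transitivity of ⊑. The hierarchy is hypothetical, and this is not a proof of the general property.

```python
from itertools import combinations

# Hypothetical single-attribute hierarchy (child -> parent), rooted at "TOP".
PARENT = {"IPAddress": "TOP", "AccountInfo": "TOP", "IPAddress:Truncated": "IPAddress"}
ELEMENTS = ["TOP", *PARENT]


def leq(a: str, b: str) -> bool:
    while a != b and a != "TOP":
        a = PARENT[a]
    return a == b


def set_leq(xs: set[str], ys: set[str]) -> bool:
    """xs ⊑ ys: every element of xs lies below some element of ys."""
    return all(any(leq(x, y) for y in ys) for x in xs)


def nonempty_subsets(items: list[str]) -> list[set[str]]:
    return [set(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]


def allow_monotone() -> bool:
    """If T_C1 ⊑ T_C2, is every node allowed by ALLOW T_C1 also allowed by ALLOW T_C2?"""
    subsets = nonempty_subsets(ELEMENTS)
    for c1 in subsets:
        for c2 in subsets:
            if not set_leq(c1, c2):
                continue
            for node in subsets:
                # An exception-free ALLOW clause allows a node exactly when it applies.
                if set_leq(node, c1) and not set_leq(node, c2):
                    return False
    return True


print(allow_monotone())  # True: the check reduces to transitivity of ⊑
```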

Outline
◦ Introduction
◦ LEGALEASE
◦ GROK
◦ Validation
◦ Discussion
◦ Conclusion

GROK System
◦ Nodes are labeled with attributes.
◦ Each label carries a confidence value.
◦ Nodes are represented at different granularities.

GROK Data Flow Edges and Labeling Nodes
◦ Log analysis: use logs to bootstrap the coarse-grained data flow graph.
  - Label file nodes with the InStore attribute and entity nodes with the AccessByRole attribute (high confidence).
  - Label each job with the UseForPurpose attribute (low confidence).
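
Log analysis can be pictured as a single pass over job records. The record fields, the path-to-store rule, the job-to-entity edge, and the purpose-guessing function below are all hypothetical stand-ins for whatever the real cluster logs provide; the point is the shape of the output: file, job, and entity nodes carrying (attribute, value, confidence) labels, connected by read and write edges.

```python
from dataclasses import dataclass

HIGH, LOW = "high", "low"


@dataclass
class LogRecord:
    """Hypothetical shape of one cluster-log entry for a job run."""
    job: str
    entity: str        # user or service account that submitted the job
    role: str          # that entity's role (hypothetical source: a directory lookup)
    reads: list[str]
    writes: list[str]


def store_of(path: str) -> str:
    """Hypothetical rule: the first path component names the data store."""
    return path.strip("/").split("/")[0]


def bootstrap(records, purpose_guess):
    """Build coarse nodes with (attribute, value, confidence) labels, plus edges."""
    labels = {}
    edges = set()

    def label(node, attr, val, conf):
        labels.setdefault(node, set()).add((attr, val, conf))

    for r in records:
        job, entity = ("job", r.job), ("entity", r.entity)
        label(entity, "AccessByRole", r.role, HIGH)           # read straight from the log
        label(job, "UseForPurpose", purpose_guess(r), LOW)    # only a guess
        edges.add((job, entity))  # hypothetical modelling: the submitter sees the job's data
        for path in r.reads:
            label(("file", path), "InStore", store_of(path), HIGH)
            edges.add((("file", path), job))
        for path in r.writes:
            label(("file", path), "InStore", store_of(path), HIGH)
            edges.add((job, ("file", path)))
    return labels, edges


records = [LogRecord("j1", "ads-pipeline", "AdsEngineer",
                     ["/rawlogs/clicks"], ["/features/ads"])]
labels, edges = bootstrap(
    records, lambda r: "Advertising" if r.entity.startswith("ads") else "Unknown")
```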

Log Analysis

GROK Data Flow Edges and Labeling Nodes
◦ Syntactic analysis: label the DataType attribute by syntactically analyzing the source code of the jobs that read or write the data (low confidence).
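
One simple way to realize such a syntactic pass is to match the column names a job touches against name patterns. The patterns, and the choice of column names as the signal, are assumptions for illustration; every label it emits stays at low confidence, as on the slide.

```python
import re

# Hypothetical name patterns; a real deployment would tune these per code base.
# fullmatch() must cover the whole column name, so e.g. "zip" is not read as "ip".
PATTERNS = [
    (re.compile(r"(client|user|remote)?_?ip(_?addr(ess)?)?", re.IGNORECASE), "IPAddress"),
    (re.compile(r"user_?agent|ua", re.IGNORECASE), "UserAgent"),
    (re.compile(r"(account|acct|user)_?id", re.IGNORECASE), "AccountInfo"),
]


def guess_datatypes(column_names):
    """Label each recognizable column name with a DataType at low confidence."""
    labels = {}
    for col in column_names:
        for pattern, datatype in PATTERNS:
            if pattern.fullmatch(col):
                labels[col] = (datatype, "low")
                break
    return labels


print(guess_datatypes(["ClientIP", "zip", "UserAgent", "user_id", "query"]))
```

Name matching is cheap enough to run over every job, but it can only guess; that is why such labels start at low confidence and are later refined by data flow analysis and developer verification.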

Syntactic Analysis

GROK Data Flow Edges and Labeling Nodes
◦ Semantic analysis: refine file nodes into collections of column nodes, and refine job nodes into sub-graphs of nodes.
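
A sketch of the refinement step, assuming some front end (hypothetical, not shown) has already extracted, for each output column of a job, the input columns it depends on. As a simplification it collapses the job's internal sub-graph into direct column-to-column edges tagged with the job name.

```python
def refine(job, inputs, outputs, depends_on):
    """Replace a coarse file -> job -> file path with column-level edges.

    inputs, outputs: {file: [column, ...]} for the job's input and output files.
    depends_on: {(out_file, out_col): [(in_file, in_col), ...]}, assumed to come
    from a (hypothetical) analysis of the job's script.
    Returns edges (src_column_node, dst_column_node, job).
    """
    edges = set()
    for out_file, out_cols in outputs.items():
        for out_col in out_cols:
            for in_file, in_col in depends_on.get((out_file, out_col), []):
                if in_col not in inputs.get(in_file, []):
                    raise ValueError(f"{in_file}.{in_col} is not an input column of {job}")
                edges.add(((in_file, in_col), (out_file, out_col), job))
    return edges


edges = refine(
    "j1",
    inputs={"clicks": ["ClientIP", "Url", "Timestamp"]},
    outputs={"features": ["Ip24", "Url"]},
    depends_on={("features", "Ip24"): [("clicks", "ClientIP")],
                ("features", "Url"): [("clicks", "Url")]},
)
```

After refinement, clicks.Timestamp no longer appears to flow into features, which the coarse file-level edge would have implied.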

Semantic Analysis

GROK Data Flow Analysis
◦ Copy the DataType attribute of a node to all nodes its data flows to.
◦ Join attributes that have the same confidence value.
◦ If data flows through a UDF (user-defined function), check whether the typestate may have been modified; if so, assign a low confidence value.
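
These rules amount to a fixed-point (worklist) data flow analysis. In this sketch each node keeps separate sets of high- and low-confidence DataType labels, labels of the same confidence are joined by set union, and, conservatively, any edge marked as passing through a UDF demotes every label to low confidence instead of inspecting the UDF. The input format is hypothetical.

```python
from collections import deque


def propagate(labels, edges, udf_edges=frozenset()):
    """Push DataType labels along data-flow edges until nothing changes.

    labels: {node: {"high": set_of_datatypes, "low": set_of_datatypes}}
    edges: iterable of (src, dst); udf_edges: the subset that passes through a UDF.
    """
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    result = {n: {"high": set(v["high"]), "low": set(v["low"])} for n, v in labels.items()}
    work = deque(result)
    while work:
        src = work.popleft()
        for dst in succ.get(src, []):
            out = result.setdefault(dst, {"high": set(), "low": set()})
            before = (len(out["high"]), len(out["low"]))
            if (src, dst) in udf_edges:
                # Conservative: a UDF may have changed the typestate, so demote everything.
                out["low"] |= result[src]["high"] | result[src]["low"]
            else:
                # Copy labels; labels of equal confidence are joined by set union.
                out["high"] |= result[src]["high"]
                out["low"] |= result[src]["low"]
            if (len(out["high"]), len(out["low"])) != before:
                work.append(dst)
    return result


result = propagate({"clicks.ClientIP": {"high": {"IPAddress"}, "low": set()}},
                   [("clicks.ClientIP", "tmp.Ip"), ("tmp.Ip", "features.Ip24")],
                   udf_edges={("tmp.Ip", "features.Ip24")})
```

Because label sets only grow and there are finitely many labels, the loop terminates even on cyclic graphs. The blanket demotion at UDF edges is the kind of conservative treatment the Discussion identifies as the main source of imprecision.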

GROK Verifying Labels
◦ Attributes verified by developers are assigned a high confidence value.
◦ [Figure: low-confidence attributes (e.g., low = IPAddress, low = UserAgent) are reverse-mapped to their related source files, and the developer of the highest-ranking source file is contacted.]
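
The slide does not say how source files are ranked. The sketch below assumes a simple heuristic, ranking each file by how many outstanding low-confidence labels map back to it so that one developer answer resolves as many labels as possible, and then promotes confirmed labels to high confidence. All names and input shapes are hypothetical.

```python
from collections import Counter


def pick_file_to_verify(low_labels, related_files):
    """Return the source file tied to the most outstanding low-confidence labels.

    low_labels: [(node, datatype), ...] still at low confidence.
    related_files: {(node, datatype): [source files that read or wrote the node]}.
    """
    counts = Counter()
    for label in low_labels:
        counts.update(set(related_files.get(label, [])))
    if not counts:
        return None
    return counts.most_common(1)[0][0]  # ties are broken arbitrarily


def apply_verification(labels, confirmed):
    """Promote developer-confirmed (node, datatype) labels to high confidence."""
    for node, datatype in confirmed:
        labels[node]["low"].discard(datatype)
        labels[node]["high"].add(datatype)


low = [("col1", "IPAddress"), ("col2", "IPAddress"), ("col3", "UserAgent")]
related = {("col1", "IPAddress"): ["a.py", "b.py"],
           ("col2", "IPAddress"): ["a.py"],
           ("col3", "UserAgent"): ["c.py"]}
print(pick_file_to_verify(low, related))  # a.py
```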

GROK Implementation
◦ Static semantic analyzer: processes individual jobs from the cluster log into the nodes and edges of the data dependency graph, without attributes.
◦ Data flow analyzer: collates all the graph nodes and applies syntactic analysis and conservative data flow analysis, augmenting the graph with attributes.

Validation Scale
◦ A 100-day period, with 77 thousand jobs each day, submitted by over 7 thousand entities in over 300 functional units.
◦ 1.1 million unique lines of code, 21% of which change on a day-to-day basis.
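
Multiplying out these figures gives a rough sense of the volume, assuming 77 thousand is a typical daily job count and that the 21% refers to the 1.1 million unique lines:

```latex
\begin{aligned}
\text{jobs over the period} &\approx 7.7\times 10^{4}\ \tfrac{\text{jobs}}{\text{day}} \times 100\ \text{days} = 7.7\times 10^{6}\ \text{jobs} \\
\text{lines changing per day} &\approx 0.21 \times 1.1\times 10^{6} \approx 2.3\times 10^{5}\ \text{lines}
\end{aligned}
```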

Validation Coverage
◦ [Figure: coverage as analyses are added in turn: syntactic analysis simulated on the real-world data dependency graph (DDG), then data flow analysis, then manual verification.]

Validation Usability
◦ Online survey.
◦ 12 participants, drawn from Microsoft's privacy champions.
◦ A majority of participants were able to use LEGALEASE to encode policy clauses.

Validation Expressiveness

Discussion
◦ Expressiveness: LEGALEASE cannot express policies based on first-order temporal logic. However, it is expressive enough for the real-world privacy policies considered.
◦ Inferring sensitive data: unless the data is explicitly labeled, GROK cannot detect when sensitive data is inferred from non-sensitive data.
◦ Precision: the major source of imprecision is the overly conservative treatment of UDFs.

Discussion
◦ False negatives: the authors are unable to characterize the exact nature of false negatives in the system due to a lack of ground truth.
◦ Assurance: the system cannot guarantee its results in the face of adversarial developer behavior.

Conclusion
◦ Automated privacy compliance checking:
  - LEGALEASE: states privacy policies as restrictions on information flows.
  - GROK: a data inventory that maps low-level data types in code to high-level policy concepts.
◦ Evaluation results show that:
  - LEGALEASE is expressive enough to capture real-world privacy policies.
  - GROK can bootstrap labeling the data dependency graph with LEGALEASE attributes at massive scale.