OpenDP: A Pitch for a Community Effort


OpenDP: A Pitch for a Community Effort
Simons Workshop on Data Privacy: From Foundations to Applications, March 8, 2019
Salil Vadhan, Harvard University
with support from:
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of our funders.

The Proposal
A community effort to build a trustworthy, open-source suite of differential privacy (DP) tools that custodians of sensitive data can easily adopt to make that data available for research.
Why?
- High demand, compelling use cases.
- Magnify the impact of academic research on DP.

Motivating Use Cases
- Archival data repositories (Dataverse, ICPSR, Zenodo) enabling secondary reuse and replication.
- Government agencies making data available to the public, both for official statistics and open data mandates.
- Companies sharing data for academic research (e.g. Social Science One).
⇒ Focus on the centralized model for DP.

Current State of DP Tools
- Heavily engineered systems for specific applications (e.g. 2020 Decennial Census).
- Academic research prototypes that don't integrate with each other, have not been externally vetted, and/or are not easily adopted.
- Closed commercial products (e.g. LeapYear).

Principles
- Open Source
  - worldwide open-source community
  - processes and incentives for contribution
- Security & Privacy
  - careful vetting of any security-critical or privacy-critical code
  - can ship code to the sensitive data
- Scalability
  - handle petabyte-scale data
- Extensibility
  - can grow with the continuing research developments in the field

Components
- Library of vetted DP algorithms
  - Language(s)? R, Python, SQL, …?
  - Formal vs. human verification?
- Budgeting interfaces
  - PSI-style GUI
  - PINQ-style programming interface
- Data formats
  - Flat tables vs. multi-relational vs. …?
- APIs
- Containers (e.g. Docker)
- Authentication & authorization (e.g. OAuth2)
- Large-scale data engine (e.g. Spark/SparkR)
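To make "library of vetted DP algorithms" concrete, here is a minimal Python sketch of one textbook algorithm such a library might contain: a counting query released with the Laplace mechanism. The function names (laplace_noise, dp_count) are hypothetical and not taken from the slides. The floating-point sampling shown here is for illustration only; naive floating-point Laplace samplers are known to be vulnerable, which is part of why vetting matters.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(rows, predicate, epsilon, rng=None):
    """Release an epsilon-DP count of the rows matching `predicate`.

    Adding or removing one row changes a count by at most 1, so the
    sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Illustration only: default to the OS entropy source when no
    # generator is supplied; pass a seeded random.Random for testing.
    rng = rng or random.SystemRandom()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    people = [{"age": a} for a in range(100)]
    print(dp_count(people, lambda r: r["age"] >= 50, epsilon=1.0))
```

Smaller values of epsilon give stronger privacy and noisier answers; the true count in the usage example is 50, and each call returns a different noisy value around it.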

Using the Tools
- Full System
  - Web service
  - Easily deployed & configured with no DP expertise
  - Generates DP code to run on remote dataset in secure storage
  - Tracks budgets
  - Allows for analyst queries
- Just the library
  - For data custodians that understand DP
  - Take advantage of vetted implementations of state-of-the-art algorithms
- And other subsets…
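The "tracks budgets" item can be illustrated with a small accountant. The sketch below assumes basic sequential composition, where the epsilons of successive queries on the same dataset simply add up; the class name BudgetTracker and its methods are hypothetical, and a real system might use tighter composition theorems.

```python
class BudgetExceeded(Exception):
    """Raised when a query would spend more than the remaining budget."""


class BudgetTracker:
    """Tracks cumulative epsilon spent on one dataset.

    Assumes basic sequential composition: total privacy loss is the
    sum of the epsilons of all queries answered so far.
    """

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return self.total_epsilon - self.spent

    def charge(self, epsilon):
        """Reserve `epsilon` for a query, or refuse if it would overspend."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject
        # a query that exactly exhausts the budget.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise BudgetExceeded(
                f"requested {epsilon}, only {self.remaining} remaining"
            )
        self.spent += epsilon


if __name__ == "__main__":
    tracker = BudgetTracker(total_epsilon=1.0)
    tracker.charge(0.3)  # first analyst query
    tracker.charge(0.5)  # second analyst query
    try:
        tracker.charge(0.3)  # would exceed the total budget
    except BudgetExceeded as err:
        print("refused:", err)
```

Charging the budget before a query runs, rather than after, means a refused query never touches the sensitive data.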

Assembling a Team
- Principal DP Scientist(s)
- DP Researchers (e.g. postdocs, faculty sabbaticals)
- Chief Technology Officer
- Project Manager
- Security Committee
- Software Engineers
- Open-Source Coordinator
- Steering Committee
- Domain experts/data scientists?

Next Steps
- Planning workshop
  - With data custodians, domain scientists, reps of other open-source projects,…
- Raising funds
- Assembling a team
- Targeting the first uses (one year out?)

For Discussion
- What will it take to get you all engaged?
- Can generic DP methods be useful enough?
- Are we focusing on the right use cases?
- Can it be architected to evolve with the development of the field?
- What kind of team & governance is needed for success?
- Is manual vetting sufficiently scalable?
- Is this realistic?