Valid Statistical Analysis for Logistic Regression with Multiple Sources. Rob Hall (Dept. of Machine Learning, CMU). Joint work with Yuval Nardi and Steve Fienberg.

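Setting
Two sources hold overlapping records on the same patients ("?" = not recorded by that source):

Source 1
Patient ID  Tobacco  Age  Weight  Heart Disease
0001        ?        ?    170     ?
0002        ?        ?    150     N
0003        N        45   165     N

Source 2
Patient ID  Tobacco  Age  Weight  Heart Disease
0001        Y        35   ?       Y
0002        Y        40   ?       ?
0004        N        50   165     N

Model of interest: logistic regression (or any GLM) fitted to the combined data.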

Alternatives
Multiple organizations, each with its own database, want to carry out a statistical calculation (e.g., a regression). Each would benefit from analyzing the pooled data, but they are not allowed, or not willing, to share their data (e.g., because of HIPAA). Two possible approaches: share transformed data, or use secure multiparty computation.

In an Ideal World
The hospitals send their data to a "trusted party," which computes the regression and sends the same coefficients back to each hospital. This is an idealized scenario: trusted parties don't exist in practice. Using cryptography, however, we can carry out the computation as if one did.

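Secure Multiparty Computation
A protocol computes a "functionality" $(x_1, x_2) \mapsto (f(x_1, x_2), f(x_1, x_2))$: Party 1 holds data $x_1$, Party 2 holds data $x_2$, and each party gets a copy of the output. During the protocol, messages are exchanged and coins are flipped; everything a party sees (its input, its coins, and the messages it receives) is its "view." In the "semi-honest" model, the protocol is secure whenever each party's messages can be simulated from that party's own input and output alone: there is an efficient simulator $S_i$ such that $S_i(x_i, f(x_1, x_2))$ is computationally indistinguishable from $\mathrm{view}_i$.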

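Additive Random Shares
Split a secret quantity $x$ so that each party holds a share: $x = x_1 + x_2 + \dots + x_k \bmod N$ for a public modulus $N$. Marginally, each share is uniformly distributed on $\mathbb{Z}_N = \{0, 1, \dots, N-1\}$, so messages consisting of shares are easy to simulate: the simulator just outputs uniform random values. Finite-precision reals are only slightly trickier.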
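A minimal Python sketch of additive sharing, for illustration only. The modulus `N`, the fixed-point scale `SCALE`, and the helper names (`share`, `reconstruct`, `encode`, `decode`, `add_shares`) are hypothetical choices not taken from the talk; fixed-point encoding is one common way to handle finite-precision reals and is an assumption here.

```python
import secrets

# Hypothetical public modulus, chosen only for illustration.
N = 2**61 - 1
# Assumed fixed-point scaling factor for encoding finite-precision reals.
SCALE = 2**20


def share(x: int, n_parties: int) -> list[int]:
    """Split x in Z_N into n_parties additive shares; any n_parties - 1 of them are uniform."""
    shares = [secrets.randbelow(N) for _ in range(n_parties - 1)]
    return shares + [(x - sum(shares)) % N]


def reconstruct(shares: list[int]) -> int:
    """Recover the secret by modular addition of all shares."""
    return sum(shares) % N


def add_shares(a: list[int], b: list[int]) -> list[int]:
    """Each party adds its own two shares locally: this yields shares of the sum."""
    return [(u + v) % N for u, v in zip(a, b)]


def encode(r: float) -> int:
    """Fixed-point encode a real; negative values wrap into the top half of Z_N."""
    return round(r * SCALE) % N


def decode(v: int) -> float:
    """Invert encode(): the top half of Z_N represents negative numbers."""
    if v > N // 2:
        v -= N
    return v / SCALE
```

For example, `decode(reconstruct(share(encode(-3.25), 3)))` recovers `-3.25`. Because addition of shares is local, sums cost no communication; only products (next slide) need an interactive protocol.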

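Multiplication
To get shares of $xy$ from shares $x = x_1 + x_2$ and $y = y_1 + y_2$, expand
$xy = x_1 y_1 + x_2 y_2 + x_1 y_2 + x_2 y_1 \pmod N$.
The first two terms are local products; the cross terms involve values held by different parties. For a cross term such as $x_1 y_2$, use additively homomorphic encryption $E(\cdot)$ (multiplying ciphertexts adds the plaintexts):
– Party 1 encrypts $x_1$ and sends $E(x_1)$.
– Party 2 picks a uniform random $r \in \mathbb{Z}_N$, computes $E(x_1)^{y_2} \cdot E(-r) = E(x_1 y_2 - r)$, and sends it back.
– Party 1 decrypts to obtain $x_1 y_2 - r$.
Since $x_1$ is encrypted when sent, that message is easy to simulate, and the resulting shares $x_1 y_2 - r$ (Party 1) and $r$ (Party 2) are each uniform in $\mathbb{Z}_N$.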
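The cross-term protocol can be sketched with a toy Paillier cryptosystem. Paillier is one additively homomorphic scheme; the talk does not name the scheme on this slide, so that choice is an assumption, as are the tiny hard-coded primes and the function names `encrypt`, `decrypt`, and `secure_product`. Here Party 1 sends encryptions of both of its shares, so Party 2 can mask both cross terms in a single reply.

```python
import math
import secrets

# Toy Paillier key built from two known Mersenne primes (2**61 - 1 and 2**89 - 1).
# Far too small and structured for real use; chosen only so the example runs fast.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def encrypt(m: int) -> int:
    """E(m) = g^m * r^N mod N^2 with g = N + 1; r is coprime to N with overwhelming probability."""
    r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """m = L(c^LAM mod N^2) * MU mod N, where L(u) = (u - 1) // N."""
    u = pow(c, LAM, N2)
    return (u - 1) // N * MU % N


def secure_product(x1: int, y1: int, x2: int, y2: int) -> tuple[int, int]:
    """Party 1 holds shares (x1, y1), Party 2 holds (x2, y2).
    Returns new shares (z1, z2) with z1 + z2 = (x1 + x2) * (y1 + y2) mod N."""
    # Party 1 -> Party 2: encryptions of its shares under Party 1's key.
    cx, cy = encrypt(x1), encrypt(y1)
    # Party 2: homomorphically forms E(x1*y2 + x2*y1 - r) with a fresh uniform mask r.
    r = secrets.randbelow(N)
    c = pow(cx, y2, N2) * pow(cy, x2, N2) * encrypt(-r) % N2
    # Party 2 -> Party 1: c. Party 1 decrypts and sees only the masked cross terms.
    masked = decrypt(c)
    z1 = (x1 * y1 + masked) % N  # local product plus masked cross terms
    z2 = (x2 * y2 + r) % N       # local product plus the mask
    return z1, z2
```

Applied entrywise and summed, the same primitive gives shares of matrix products such as $X^\top X$ when the columns of $X$ are split between parties.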

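Linear Regression
The MLE is $\hat\beta = (X^\top X)^{-1} X^\top y$, computed in four steps:
1. Compute shares of $X^\top X$ and $X^\top y$.
2. Secure matrix inversion, similar to Newton's method on the function $f(Z) = Z^{-1} - X^\top X$, which gives the iteration $Z_{k+1} = Z_k (2I - X^\top X \, Z_k)$.
3. Secure matrix multiply: $(X^\top X)^{-1} X^\top y$.
4. Modular addition of shares to reveal $\hat\beta$.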
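A plaintext sketch of the arithmetic in steps 2 and 3, using only additions and multiplications. In the protocol, each matrix product would instead be a secure multiply on shares. The starting point $Z_0 = A^\top / (\|A\|_1 \|A\|_\infty)$ is a standard choice that guarantees convergence for nonsingular $A$; whether the talk uses it, and the fixed iteration count, are assumptions, as are the helper names.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def newton_inverse(A, iters=50):
    """Approximate A^{-1} by the Newton iteration Z <- Z (2I - A Z)."""
    n = len(A)
    norm1 = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))  # max column sum
    norm_inf = max(sum(abs(v) for v in row) for row in A)               # max row sum
    Z = [[v / (norm1 * norm_inf) for v in row] for row in transpose(A)]
    for _ in range(iters):
        AZ = matmul(A, Z)
        R = [[(2.0 if i == j else 0.0) - AZ[i][j] for j in range(n)] for i in range(n)]
        Z = matmul(Z, R)
    return Z


def ols(X, y):
    """beta = (X^T X)^{-1} X^T y, with X as a list of rows and y as a flat list."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = matmul(Xt, [[v] for v in y])
    return [b[0] for b in matmul(newton_inverse(XtX), Xty)]
```

With an intercept column and exactly linear data, `ols([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 3, 5, 7])` returns values numerically close to `[1, 2]`. The iteration suits secure computation because, unlike Gaussian elimination, it needs no division or pivoting on secret values.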

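Logistic Regression (IRLS)
Newton-Raphson iterates:
$\beta^{(t+1)} = \beta^{(t)} + (X^\top W X)^{-1} X^\top (y - p)$, where $p_i = \sigma(x_i^\top \beta^{(t)})$, $W = \mathrm{diag}(p_i (1 - p_i))$, and $\sigma(z) = 1 / (1 + e^{-z})$.
Approximate the sigmoid by the empirical CDF of $m$ draws $z_1, \dots, z_m$ from the standard logistic distribution, whose CDF is exactly $\sigma$:
$\sigma(z) \approx \frac{1}{m} \sum_{j=1}^{m} \mathbf{1}\{z > z_j\}$.
Each indicator is a "greater than" comparison, and secure computation of "greater than" is well known. The approximation error decreases as $m$ grows.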
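A plaintext sketch of the empirical-CDF approximation, assuming the reference points are random draws from the standard logistic distribution obtained by inverse-CDF sampling ($z = \log(u / (1 - u))$ for uniform $u$). The function names, the seed, and the sampling scheme are illustrative assumptions; in the secure protocol, each `t > s` test would be a secure comparison on shares.

```python
import math
import random


def logistic_samples(m, seed=0):
    """Draw m samples from the standard logistic distribution by inverse-CDF sampling."""
    rng = random.Random(seed)
    out = []
    for _ in range(m):
        u = rng.random()
        while u == 0.0:  # random() is in [0, 1); skip 0 to avoid log(0)
            u = rng.random()
        out.append(math.log(u / (1.0 - u)))
    return out


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def ecdf_sigmoid(t, samples):
    """Approximate sigmoid(t) by the fraction of samples below t.
    Each term is a single 'greater than' comparison."""
    return sum(1 for s in samples if t > s) / len(samples)
```

By the Dvoretzky-Kiefer-Wolfowitz inequality, the worst-case error over all $t$ is of order $1/\sqrt{m}$ with high probability, so it shrinks as $m$ grows. In an IRLS step, these approximate values would stand in for $p_i$, and hence for $W$.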

CPS - Experimental Verification
[Figure]

CPS - Experimental Verification
[Figure: No. in Household]

CPS - Experimental Verification
[Figure: Age(3)]

Ongoing Work
– Faster approximations to logistic functions.
– Record linkage (assumed to be already done here).
– Imputation of missing data.
– Secure computation of goodness-of-fit statistics.
– Log-linear models.
– Other GLMs.

Questions?
For the technical details and a working implementation, please see: