1 Longest Common Subsequence as Private Search Payman Mohassel and Mark Gondree U of CalgaryNPS.

Slides:



Advertisements
Similar presentations
Computational Privacy. Overview Goal: Allow n-private computation of arbitrary funcs. –Impossible in information-theoretic setting Computational setting:
Advertisements

Longest Common Subsequence
Efficient Information Retrieval for Ranked Queries in Cost-Effective Cloud Environments Presenter: Qin Liu a,b Joint work with Chiu C. Tan b, Jie Wu b,
DYNAMIC PROGRAMMING ALGORITHMS VINAY ABHISHEK MANCHIRAJU.
Dynamic Programming Nithya Tarek. Dynamic Programming Dynamic programming solves problems by combining the solutions to sub problems. Paradigms: Divide.
CPSC 335 Dynamic Programming Dr. Marina Gavrilova Computer Science University of Calgary Canada.
Overview What is Dynamic Programming? A Sequence of 4 Steps
Chapter 7 Dynamic Programming.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Fall, 2006 Lecture 1 (Part 3) Design Patterns for Optimization Problems.
Secure Outsourcing of Sequence Comparisons Mikhail Atallah and Jiangtao Li CERIAS and Department of Computer Sciences Purdue University PET2004: Workshop.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Fall, 2001 Lecture 2 (Part 1) Tuesday, 9/11/01 Dynamic Programming.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Spring, 2002 Lecture 2 Tuesday, 2/5/02 Dynamic Programming.
How Should We Solve Search Problems Privately? Kobbi Nissim – BGU A. Beimel, T. Malkin, and E. Weinreb.
Sequence Alignment Storing, retrieving and comparing DNA sequences in Databases. Comparing two or more sequences for similarities. Searching databases.
Design of Optimal Multiple Spaced Seeds for Homology Search Jinbo Xu School of Computer Science, University of Waterloo Joint work with D. Brown, M. Li.
Finding approximate palindromes in genomic sequences.
Designing Multiple Simultaneous Seeds for DNA Similarity Search Yanni Sun, Jeremy Buhler Washington University in Saint Louis.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Spring, 2009 Design Patterns for Optimization Problems Dynamic Programming.
A fast parallel algorithm for finding the longest common sequence of multiple biosequences. Yixin Chen, Andrew Wan, Wei Liu Washington University / Yangzhou.
Sequence Alignment Bioinformatics. Sequence Comparison Problem: Given two sequences S & T, are S and T similar? Need to establish some notion of similarity.
Regular Expression Constrained Sequence Alignment Abdullah N. Arslan Assistant Professor Computer Science Department.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Fall, 2002 Lecture 1 (Part 3) Tuesday, 9/3/02 Design Patterns for Optimization.
Dynamic Programming1. 2 Outline and Reading Matrix Chain-Product (§5.3.1) The General Technique (§5.3.2) 0-1 Knapsack Problem (§5.3.3)
7 -1 Chapter 7 Dynamic Programming Fibonacci Sequence Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, … F i = i if i  1 F i = F i-1 + F i-2 if.
Computing Sketches of Matrices Efficiently & (Privacy Preserving) Data Mining Petros Drineas Rensselaer Polytechnic Institute (joint.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Spring, 2008 Design Patterns for Optimization Problems Dynamic Programming.
Finding the optimal pairwise alignment We are interested in finding the alignment of two sequences that maximizes the similarity score given an arbitrary.
Analysis of Algorithms
Sequence similarity. Motivation Same gene, or similar gene Suffix of A similar to prefix of B? Suffix of A similar to prefix of B..Z? Longest similar.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Fall, 2005 Design Patterns for Optimization Problems Dynamic Programming.
1 Theory I Algorithm Design and Analysis (11 - Edit distance and approximate string matching) Prof. Dr. Th. Ottmann.
Dynamic Programming Introduction to Algorithms Dynamic Programming CSE 680 Prof. Roger Crawfis.
Lecture 5 Dynamic Programming. Dynamic Programming Self-reducibility.
Comp. Genomics Recitation 2 12/3/09 Slides by Igor Ulitsky.
Dynamic Programming. Well known algorithm design techniques:. –Divide-and-conquer algorithms Another strategy for designing algorithms is dynamic programming.
ADA: 7. Dynamic Prog.1 Objective o introduce DP, its two hallmarks, and two major programming techniques o look at two examples: the fibonacci.
Dynamic Programming Dynamic Programming Dynamic Programming is a general algorithm design technique for solving problems defined by or formulated.
Dynamic Programming.
7 -1 Chapter 7 Dynamic Programming Fibonacci sequence Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, … F i = i if i  1 F i = F i-1 + F i-2 if.
Motif finding with Gibbs sampling CS 466 Saurabh Sinha.
Algorithm Paradigms High Level Approach To solving a Class of Problems.
Decidable Questions About Regular languages 1)Membership problem: “Given a specification of known type and a string w, is w in the language specified?”
Private Approximation of Search Problems Amos Beimel Paz Carmi Kobbi Nissim Enav Weinreb (Technion)
Identifying abnormal sequence alignments Ricardo Restrepo School of Mathematics Georgia Insitute of Technology.
1 CPSC 320: Intermediate Algorithm Design and Analysis July 28, 2014.
Greedy Methods and Backtracking Dr. Marina Gavrilova Computer Science University of Calgary Canada.
Sequence Comparison Algorithms Ellen Walker Bioinformatics Hiram College.
Toolkits version 1.0 Special Cource on Computer Architectures
Expected accuracy sequence alignment Usman Roshan.
CS307P-SYSTEM PRACTICUM CPYNOT. B13107 – Amit Kumar B13141 – Vinod Kumar B13218 – Paawan Mukker.
Computer Sciences Department1.  Property 1: each node can have up to two successor nodes (children)  The predecessor node of a node is called its.
CS38 Introduction to Algorithms Lecture 10 May 1, 2014.
What Dynamic Programming (DP) is a fundamental problem solving technique that has been widely used for solving a broad range of search and optimization.
Part 2 # 68 Longest Common Subsequence T.H. Cormen et al., Introduction to Algorithms, MIT press, 3/e, 2009, pp Example: X=abadcda, Y=acbacadb.
CS307P-SYSTEM PRACTICUM CPYNOT. B13107 – Amit Kumar B13141 – Vinod Kumar B13218 – Paawan Mukker.
9/27/10 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Adam Smith Algorithm Design and Analysis L ECTURE 16 Dynamic.
TU/e Algorithms (2IL15) – Lecture 4 1 DYNAMIC PROGRAMMING II
Chapter 2-II Scanning Sung-Dong Kim Dept. of Computer Engineering, Hansung University.
Lecture 5 Dynamic Programming
Summary of lectures Introduction to Algorithm Analysis and Design (Chapter 1-3). Lecture Slides Recurrence and Master Theorem (Chapter 4). Lecture Slides.
String Processing.
Lecture 5 Dynamic Programming
Dynamic Programming 1/15/2019 8:22 PM Dynamic Programming.
CSE 589 Applied Algorithms Spring 1999
Longest Common Subsequence
String Processing.
Lecture 5 Dynamic Programming
Lecture 5 Dynamic Programming
Longest Common Subsequence
Presentation transcript:

1 Longest Common Subsequence as Private Search Payman Mohassel and Mark Gondree U of CalgaryNPS

2 Longest Common Subsequence A = a d e f t c g B = c e d e t f g LCS(A,B) = detg Strings A, B  |A| = n, |B| = m LCS(A,B)  Subsequence of A  Subsequence of B  Maximum length

3 Three Variants LCS length : 4  Length of LCS LCS string : “ detg ”  A string that is an LCS LCS embed : ({2,3,5,7},{3,4,5,7})  Position of an LCS in A and B A = a d e f t c g B = c e d e t f g

4 Motivation Genomic computation  Measure of similarity between DNA sequences File comparison  Used in diff program Many problems related to LCS  Edit distance (sequence alignment)  Shortest common supersequence  Longest increasing sequence Beyond LCS  Problems with dynamic programming solutions  Problems formulated using DFAs

5 Privacy Concerns DNA data is sensitive  Patients’ privacy: health, background Many institutions hold genomic data  Governments  Pharmaceutical companies  Research institutes Privacy rules (e.g. HIPAA)  Can lead to fines or legal action  Focus on data access and sharing, not computation

6 Secure Two-Party LCS AB Alice Bob LCS-Len(A,B) Atallah et. al. WPES 03 Jha et. al. S&P 08 With implementation Using secure computation techniques What if we output more than length?

7 Output More Than Length LCS string or embedding More than one possible output  Can be exponentially many Which one do we output?  There are privacy concerns  Output can work as a “covert channel” Private search  Beimel et. al. STOC 06, CRYPTO 07  Output sampling, equivalence protecting algorithms

8 Output Sampling Algorithms Choice of output leaks information  E.g. lexicographically first LCS reveals Any string before the output not an LCS Sample solution space randomly Not possible for all problems  Stable matching

9 Output Sampling for LCS Fill in the LCS length table Fill in the LCS counting table Devise a randomized backtracking strategy using the tables

10 Step 1: LCS Length a b f c e a d afbcaaad Length of LCS between “afbcaa” and “abfc” Between “afbcaaad” and “abfcead” Dynamic programming O(mn) time

11 a b f c e a d afbcaaad Step 2: LCS Counting Number of LCS strings between “afbcaa” and “abfce” Between “afbcaaad” and abfcead” Greenberg, 2003 O(mnlogS) time S = number of solutions Different for string and embed

12 Step 3: Randomized Backtracking Backtracking on the counting table Create multiple choices at each step To get a uniformly random solution  Cover all possible solutions  No solution overlap between choices  Weighted coin-toss based on number of solutions in each choice LCS embed strategy: at (i,j)  A[i] and B[j] are in LCS  A[i] in LCS, B[j] not  B[j] in LCS, A[i] not  Neither one in LCS Different strategy for LCS string O(mnlogS) time

13 Equivalence Protecting Algorithms What if protocol is run multiple times?  Leaks additional information Protect privacy of inputs with the same set of solutions

14 Equivalence Classes (A 1,B 1 )(A 2,B 2 ) LCS-Str(A,B)  Set of all LCS strings between A and B (A 1,B 1 ) ≈ (A 2,B 2 ) LCS-Str(A 1,B 1 ) = LCS-Str(A 2,B 2 ) (A 3,B 3 ) (A 3,B 3 ) ≈(A 2,B 2 ) LCS-Str(A 3,B 3 ) ≠ LCS-Str(A 2,B 2 )

15 Equivalence Protecting Algorithms Samples LCS-Str(A,B) randomly Sampling is seeded For every (C,D) ≈ (A,B) use the same seed (A,B) (C,D) (E,F) output 1 output 2

16 A Generic Construction Compute a (A rep,B rep )  Representative of (A,B)’s equivalence class  Canonical representative algorithm Compute the seed s = F(A rep,B rep ) Run an output sampling algorithm on (A rep,B rep ) using s for randomness Canonical representative F Output sampling (A rep,B rep ) (A,B) (A rep,B rep ) s Output Equivalence protecting algorithm

17 Equivalence Protecting Algorithms Problem reduced to  Canonical representative algorithms LCS embed  Direct solution LCS string  Via reduction to DFAs

18 Step 1 LCS-to-DFA (A,B) M A,B M A,B is a DFA of size O(mn) L(M A,B ) = LCS-Str(A,B) M A generates all subsequence of A M B generates all subsequences of B Combine and prune the DFAs Runs in O(mn) time Equivalence protecting algorithm that outputs a word in L(M A,B )

19 Step 2 DFA-Rep M A,B M rep For M, M’ where L(M) = L(M’) DFA-Rep(M) = DFA-Rep(M’) L(M A,B ) = L(M rep ) Generate a minimal-state DFA, M rep Myhill-Nerode theorem: M rep is unique Runs O(mn) time

20 Equivalence Protecting Algorithm for LCS String LCS-to-DFA (A,B) DFA-Rep M A,B M rep Sample a word in L(M rep ) F seed output

21 Summary Private search algorithms for LCS  LCS string  LCS embed Matching best counting algorithms  O(mnlogS) time Can be implemented in a variety settings Private search algorithms for  DFA represented functionalities  Problems with D.P. solutions

22 Open Problems Privacy  Alternative privacy definitions The setting can effect privacy concerns Take into account input distribution Algorithms  Better private search for LCS Computationally indistinguishable form uniform  Better counting for LCS