Secure Outsourcing of Sequence Comparisons Mikhail Atallah and Jiangtao Li CERIAS and Department of Computer Sciences Purdue University PET2004: Workshop.

Presentation transcript:

Secure Outsourcing of Sequence Comparisons Mikhail Atallah and Jiangtao Li CERIAS and Department of Computer Sciences Purdue University PET2004: Workshop on Privacy Enhancing Technologies May 26th 2004, Toronto, Canada

Outline Motivation Sequence comparisons Framework and building blocks Outsourcing Protocols for sequence comparisons Two special cases Conclusion and future work

Motivation Grid computing Computational outsourcing Privacy concerns

Secure Outsourcing
– A computationally weak client can outsource a computationally intensive task to one or more external agents
– Agents must learn nothing about the client’s data or the result of the computation
– Client does work (computation/communication) that is linear in the size of the data
– External agents do all the super-linear work
– External agents’ work should be as close as possible to the complexity bounds of the best known algorithm for the outsourced problem

Roadmap Motivation Sequence comparisons Framework and building blocks Outsourcing Protocols for sequence comparisons Two special cases Conclusion and future work

Secure Outsourcing of Sequence Comparisons
Problem definition
– Client has λ=λ_1…λ_n and μ=μ_1…μ_m
– Wants to compute the similarity between λ and μ
– Lacks computational power and space to compute it locally
What we have achieved
– Client outsources the computation of edit distance between λ and μ to two agents
– Agents learn nothing about λ, μ, and the result, if they do not collude

String Editing Problem String edit distance –Least-cost set of insertions, deletions, and substitutions to transform one string into the other Dynamic programming solution –Θ(mn) time complexity Applications –Molecular sequence comparison, text editing, pattern matching, speech recognition, etc.

Dynamic Programming Algorithm for String Editing
M(i,j) is the minimum cost of transforming the prefix of λ of length i into the prefix of μ of length j.
[Figure: the recurrence for M(i,j), with its insertion, deletion, and substitution cost terms labeled, and the table M indexed 0…n by 0…m]
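The slide's recurrence did not survive extraction. The sketch below assumes the standard form, which matches the cost terms listed on the later "Preliminary Version (cont.)" slide: M(i,j) is the minimum of M(i-1,j-1)+S(λ_i,μ_j), M(i-1,j)+D(λ_i), and M(i,j-1)+I(μ_j). The function names and the unit-cost helper are illustrative, not from the talk.

```python
def edit_distance(lam, mu, ins, dele, sub):
    """Fill the (n+1) x (m+1) table M and return M[n][m].

    ins(c), dele(c) and sub(x, y) are caller-supplied cost functions
    standing in for the slide's I, D and S.
    """
    n, m = len(lam), len(mu)
    M = [[0] * (m + 1) for _ in range(n + 1)]
    # Borders: delete a whole prefix of lam, or insert a whole prefix of mu.
    for i in range(1, n + 1):
        M[i][0] = M[i - 1][0] + dele(lam[i - 1])
    for j in range(1, m + 1):
        M[0][j] = M[0][j - 1] + ins(mu[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = min(
                M[i - 1][j - 1] + sub(lam[i - 1], mu[j - 1]),  # substitution
                M[i - 1][j] + dele(lam[i - 1]),                # deletion
                M[i][j - 1] + ins(mu[j - 1]),                  # insertion
            )
    return M[n][m]


def unit_cost_distance(a, b):
    """Hypothetical helper: every insertion, deletion, mismatch costs 1."""
    return edit_distance(a, b, lambda c: 1, lambda c: 1,
                         lambda x, y: 0 if x == y else 1)
```

The two nested loops are the Θ(mn) work that the client wants to hand off.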

Roadmap Motivation Sequence comparisons Framework and building blocks Outsourcing Protocols for sequence comparisons Two special cases Conclusion and future work

Disguising Sequences
Hiding the sequences’ lengths
– Client introduces a new symbol ‘$’ such that I($)=D($)=0 and S(a,$)=S($,a)=+∞
– Client pads λ and μ with ‘$’ symbols
– Note that distance(λ, μ) remains the same
Splitting λ and μ
– Client splits λ into λ’=λ_1’…λ_n’ and λ”=λ_1”…λ_n” such that λ_i = λ_i’+λ_i” mod σ for 1≤i≤n
– Similarly splits μ into μ’ and μ”
– Client sends λ’ and μ’ to Agent 1, and λ” and μ” to Agent 2
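The splitting step can be sketched as follows, assuming symbols are encoded as integers 0…σ-1 and that the first share is drawn uniformly at random (the slide does not say how the shares are chosen). The names `split_mod` and `reconstruct` are illustrative. Padding with ‘$’ is left out because the slide does not specify where the padding goes.

```python
import random


def split_mod(seq, sigma):
    """Split each symbol x into (r, x - r mod sigma) with r uniform.

    Each share on its own is uniformly distributed, so a single agent
    sees nothing about the symbol.
    """
    share1 = [random.randrange(sigma) for _ in seq]
    share2 = [(x - r) % sigma for x, r in zip(seq, share1)]
    return share1, share2


def reconstruct(share1, share2, sigma):
    """Recombine the shares: lambda_i = lambda_i' + lambda_i'' mod sigma."""
    return [(x + y) % sigma for x, y in zip(share1, share2)]
```

The client's work here is one random draw and one subtraction per symbol, which keeps it linear in the input size.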

Agents Compute M’, M”
[Figure: the two agents hold the tables M’ and M” as additive shares, with M = M’+M”]

Secure Table Lookup Protocol
Input: A1 has a’ and b’, A2 has a” and b”, where a = a’+a” mod σ and b = b’+b” mod σ.
Output: A1 gets c’, A2 gets c”, with c’+c” = S(a,b).
Idea: One agent rotates the σ×σ substitution cost table S by a’ rows and b’ columns into S’, so that S’(a”,b”) = S(a,b).
[Figure: substitution cost table S and the cost table S’ rotated by 1 row and 1 column. Example: a=2, b=1, S(a,b)=2; a’=1, b’=1; a”=1, b”=0]
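A minimal sketch of the rotation, assuming it is defined as S’(i,j) = S(i+a’ mod σ, j+b’ mod σ); the slide only states the resulting property S’(a”,b”) = S(a,b). The function name and the 3×3 table in the usage note are made up for illustration.

```python
def rotate_table(S, a1, b1):
    """Return S' with S'[i][j] = S[(i + a1) % sigma][(j + b1) % sigma].

    With a = a1 + a2 mod sigma and b = b1 + b2 mod sigma, this gives
    S'[a2][b2] == S[a][b].
    """
    sigma = len(S)
    return [[S[(i + a1) % sigma][(j + b1) % sigma] for j in range(sigma)]
            for i in range(sigma)]
```

For instance, with the hypothetical table `S = [[0, 1, 2], [1, 0, 1], [2, 2, 0]]` and the slide's shares a’=1, b’=1, a”=1, b”=0, the entry `rotate_table(S, 1, 1)[1][0]` is `S[2][1]`.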

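Secure Table Lookup Protocol (cont.)
Protocol:
– A1 picks a homomorphic encryption scheme E with E(x)·E(y) = E(x+y)
– A1 rotates S into S’ and sends E(S’(i,j)) for all i, j to A2
– A2 picks the (a”,b”) entry, which is E(S(a,b)), chooses r, computes E(S(a,b))·E(r) = E(S(a,b)+r), and sends it to A1
– A1 decrypts to obtain c’ = S(a,b)+r; A2 sets c” = -r
Performance:
– 3 rounds
– O(σ²) computation and communication
Security:
– A1 and A2 learn nothing about a, b, and S(a,b)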
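The message flow can be simulated in one process. The slides only require E(x)·E(y) = E(x+y) and do not name a scheme; this sketch assumes a toy Paillier instance with tiny hard-coded primes, which is insecure and for illustration only. All names (`encrypt`, `decrypt`, `secure_lookup`) are hypothetical, and shares are taken modulo N rather than over the integers.

```python
import math
import random

# Toy Paillier parameters (assumption: demo-sized primes, NOT secure).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse, Python 3.8+


def encrypt(m):
    """Paillier encryption with g = N + 1: c = (1 + m*N) * r^N mod N^2."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c):
    """Paillier decryption: L(c^LAM mod N^2) * MU mod N, L(u) = (u-1)/N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def secure_lookup(S, a1, b1, a2, b2):
    """Simulate both agents; returns (c1, c2) with c1 + c2 = S(a,b) mod N."""
    sigma = len(S)
    # A1: rotate S by (a1, b1) and encrypt every entry under its own key.
    enc = [[encrypt(S[(i + a1) % sigma][(j + b1) % sigma])
            for j in range(sigma)] for i in range(sigma)]
    # A2: pick entry (a2, b2), blind it homomorphically with random r.
    r = random.randrange(N)
    blinded = enc[a2][b2] * encrypt(r) % N2
    c2 = (-r) % N
    # A1: decrypt the blinded entry to get its share.
    c1 = decrypt(blinded)
    return c1, c2
```

Encrypting all σ² entries of S’ is where the O(σ²) cost on the slide comes from. The sketch assumes finite, non-negative table entries smaller than N.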

Min-Finding Protocol [AKD]
Input: A1 has (a_1,…,a_k), A2 has (b_1,…,b_k). Let c_i = a_i+b_i for i=1,…,k.
Output: A1 gets α, A2 gets β, such that α+β = min(c_1,…,c_k).
Observation:
– Blind-and-permute
– Using secure comparison protocol
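The slide gives the protocol's input/output contract but not its steps. The sketch below is only a specification of that contract, computed by a trusted stand-in that sees both inputs; it is not the blind-and-permute protocol itself. The function name and the `bound` parameter are assumptions.

```python
import random


def ideal_min_shares(a, b, bound=10**6):
    """Trusted stand-in for min-finding.

    Reconstructs c_i = a_i + b_i, takes the minimum, and returns fresh
    additive shares (alpha, beta) with alpha + beta = min(c_1, ..., c_k).
    A real protocol would reach the same output without any party ever
    seeing the c_i.
    """
    c_min = min(x + y for x, y in zip(a, b))
    alpha = random.randint(-bound, bound)
    return alpha, c_min - alpha
```

Treating the building block as a black box like this is enough to reason about how the outer dynamic-programming protocol combines shares.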

Roadmap Motivation Sequence comparisons Framework and building blocks Outsourcing Protocols for sequence comparisons Two special cases Conclusion and future work

Preliminary Version
[Figure: the client splits λ, μ and sends λ’, μ’ to A1 and λ”, μ” to A2; the agents interact to compute M’ and M”; the client receives M’(n,m) and M”(n,m)]

Preliminary Version (cont.)
Sum           Agent 1               Agent 2
M(i-1,j-1)    M’(i-1,j-1)           M”(i-1,j-1)
M(i-1,j)      M’(i-1,j)             M”(i-1,j)
M(i,j-1)      M’(i,j-1)             M”(i,j-1)
D(λ_i)        M’(i,0)-M’(i-1,0)     M”(i,0)-M”(i-1,0)
I(μ_j)        M’(0,j)-M’(0,j-1)     M”(0,j)-M”(0,j-1)
S(λ_i,μ_j)    c’                    c”
Two agents run min-finding protocol
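The table above can be exercised end to end in a single-process simulation. This sketch makes several assumptions that the slides do not state: the client is the one who shares the border entries M(i,0) and M(0,j), shares of costs are plain integers (no modulus), and the table lookup and min-finding sub-protocols are replaced by trusted stand-ins. All function names are hypothetical.

```python
import random


def share(x, bound=10**6):
    """Split integer x into two additive shares."""
    r = random.randint(-bound, bound)
    return r, x - r


def ideal_lookup(S, a1, a2, b1, b2):
    """Stand-in for secure table lookup: shares (c', c'') of S(a, b)."""
    sigma = len(S)
    return share(S[(a1 + a2) % sigma][(b1 + b2) % sigma])


def ideal_min(xs1, xs2):
    """Stand-in for min-finding over share-wise sums."""
    return share(min(x + y for x, y in zip(xs1, xs2)))


def outsourced_distance(lam, mu, S, I, D):
    """Return (M'(n,m), M''(n,m)); symbols are ints in 0..sigma-1."""
    sigma, n, m = len(S), len(lam), len(mu)
    # Client: split every symbol mod sigma.
    lam1 = [random.randrange(sigma) for _ in lam]
    lam2 = [(x - r) % sigma for x, r in zip(lam, lam1)]
    mu1 = [random.randrange(sigma) for _ in mu]
    mu2 = [(x - r) % sigma for x, r in zip(mu, mu1)]
    # Client (assumed): share the borders, O(n + m) work.
    M1 = [[0] * (m + 1) for _ in range(n + 1)]
    M2 = [[0] * (m + 1) for _ in range(n + 1)]
    M1[0][0], M2[0][0] = share(0)
    total = 0
    for i in range(1, n + 1):
        total += D[lam[i - 1]]
        M1[i][0], M2[i][0] = share(total)
    total = 0
    for j in range(1, m + 1):
        total += I[mu[j - 1]]
        M1[0][j], M2[0][j] = share(total)
    # Agents: fill the interior using only their own shares.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c1, c2 = ideal_lookup(S, lam1[i - 1], lam2[i - 1],
                                  mu1[j - 1], mu2[j - 1])
            d1, d2 = M1[i][0] - M1[i - 1][0], M2[i][0] - M2[i - 1][0]
            e1, e2 = M1[0][j] - M1[0][j - 1], M2[0][j] - M2[0][j - 1]
            M1[i][j], M2[i][j] = ideal_min(
                [M1[i - 1][j - 1] + c1, M1[i - 1][j] + d1, M1[i][j - 1] + e1],
                [M2[i - 1][j - 1] + c2, M2[i - 1][j] + d2, M2[i][j - 1] + e2])
    return M1[n][m], M2[n][m]
```

The border differences give each agent a share of D(λ_i) and I(μ_j) without either agent learning the symbol, which is exactly what the fourth and fifth rows of the table express.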

Preliminary Version (cont.)
Performance:
– Client: O(m+n)
– Agents: O(σ²mn)
Bottleneck: The secure table lookup protocol is executed mn times, and each execution needs O(σ²) computation/communication.

Improved Version
Improvement: Batch the computation of S(λ_i,μ_j) together for each row.
Performance: Computation of S(λ_i,μ_j) for row i needs O(σm) communication. Overall performance improves by a factor of σ.

Batched Secure Table Lookup Protocol
Input:
– A1 has a’, b_1’,…,b_m’ and A2 has a”, b_1”,…,b_m”, such that a = a’+a” mod σ and b_k = b_k’+b_k” mod σ for k=1,…,m.
Output:
– A1 has c’_1,…,c’_m and A2 has c”_1,…,c”_m, such that c’_i+c”_i = S(a,b_i).
Main Idea:
[Figure: the σ×σ substitution cost table S is rotated by a’ rows into S’; row a” of S’ is row a of S; row a rotated by b” has entry b of row a at position b’]

Batched Secure Table Lookup Protocol (cont.)
– A2 holds the encrypted row a: E(S(a,0)), …, E(S(a,σ-1))
– For each j=1,…,m, A2 rotates the row by b_j” and blinds it with r_j: E(S(a,b_j”)+r_j), E(S(a,b_j”+1)+r_j), E(S(a,b_j”+2)+r_j), …, E(S(a,b_j”-1)+r_j)
– Oblivious Transfer: A1 gets the b_j’ entry, E(S(a,b_j”+b_j’)+r_j) = E(S(a,b_j)+r_j)
Performance:
– S’ table sent only once, OT executed m times
– O(σ²) + O(σm)
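The index arithmetic of the batched protocol can be checked with a plaintext simulation. This sketch deliberately omits the encryption and replaces oblivious transfer with direct indexing, so it demonstrates correctness of the shares only, not privacy. The function name and the `bound` parameter are assumptions.

```python
import random


def batched_lookup(S, a1, a2, b1_list, b2_list, bound=10**6):
    """Return (c1_list, c2_list) with c1[j] + c2[j] == S[a][b_j].

    a = a1 + a2 mod sigma and b_j = b1_list[j] + b2_list[j] mod sigma.
    In the real protocol A2 would see only ciphertexts of `row`.
    """
    sigma = len(S)
    # A1: rotate S by a1 rows; this table is sent once.
    rotated = [S[(i + a1) % sigma] for i in range(sigma)]
    # A2: row a2 of the rotated table is row a of S.
    row = rotated[a2]
    c1_list, c2_list = [], []
    for b1, b2 in zip(b1_list, b2_list):
        r = random.randint(-bound, bound)
        # A2: rotate the row by b2 and blind every entry with the same r.
        blinded = [row[(b2 + k) % sigma] + r for k in range(sigma)]
        # A1: would fetch entry b1 via oblivious transfer (simulated here).
        c1_list.append(blinded[b1])
        c2_list.append(-r)
    return c1_list, c2_list
```

Each of the m iterations touches σ entries, which matches the O(σm) term on the slide; the one-time table transfer accounts for the O(σ²) term.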

Roadmap Motivation Sequence comparisons Framework and building blocks Outsourcing Protocols for sequence comparisons Two special cases Conclusion and future work

Two Special Cases
Case S(a,b) = |a-b|
– Sequences are additively split without modular reduction, i.e., λ=λ’+λ” and μ=μ’+μ”
– Use max-finding protocol
– Avoid table lookup
Case I(a)=D(a)=1, S(a,b)=+∞ if a≠b
– Longest common subsequence problem
– Set alphabet to be {0,2,4,…,2σ-2}
– Run the protocol of case S(a,b)=|a-b|
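The slides do not spell out why the even alphabet works. One reading is that distinct even symbols differ by at least 2, so a substitution never beats a deletion plus an insertion, and the |a-b| cost behaves like +∞ on mismatches. Under that reading the distance equals n + m - 2·LCS. The sketch below checks this numerically; the function names are illustrative and the plain LCS routine is included only as a reference.

```python
def edit_distance_abs(lam, mu):
    """Edit distance with I = D = 1 and S(a, b) = |a - b|."""
    n, m = len(lam), len(mu)
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        M[i][0] = i
    for j in range(1, m + 1):
        M[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = min(M[i - 1][j - 1] + abs(lam[i - 1] - mu[j - 1]),
                          M[i - 1][j] + 1,
                          M[i][j - 1] + 1)
    return M[n][m]


def lcs_length(x, y):
    """Textbook LCS table, used here only to cross-check the reduction."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]


def lcs_via_even_alphabet(x, y, alphabet):
    """Map symbols to {0, 2, 4, ...} and recover LCS from the distance."""
    code = {ch: 2 * k for k, ch in enumerate(alphabet)}
    dist = edit_distance_abs([code[c] for c in x], [code[c] for c in y])
    return (len(x) + len(y) - dist) // 2
```

With symbols mapped to 0, 1, 2, … instead, adjacent symbols would substitute at cost 1 and the equivalence would break, which is presumably why the alphabet is doubled.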

Roadmap Motivation Sequence comparisons Framework and building blocks Outsourcing Protocols for sequence comparisons Two special cases Conclusion and future work

Conclusion and Future Work
Conclusion
– We developed an efficient protocol for outsourcing the string editing problem in a privacy-preserving way
Future Work
– Other compute-intensive problems

Thank You!