Randomized Algorithms William Cohen

Outline
– SGD with the hash trick (review)
– Background on other randomized algorithms:
  – Bloom filters
  – Locality sensitive hashing

Learning as optimization for regularized logistic regression
Algorithm:
– Initialize arrays W, A of size R and set k=0
– For each iteration t=1,…,T
  – For each example (x_i, y_i):
    Let V be a hash table so that p_i = … ; k++
    For each hash value h with V[h] > 0:
      » W[h] *= (1 − 2λμ)^(k − A[h])
      » W[h] = W[h] + λ(y_i − p_i)·V[h]
      » A[h] = k

Learning as optimization for regularized logistic regression
– Initialize arrays W, A of size R and set k=0
– For each iteration t=1,…,T
  – For each example (x_i, y_i):
    k++; let V be a new hash table
    For each j with x_i[j] > 0: V[hash(j) % R] += x_i[j]
    Let ip = 0
    For each h with V[h] > 0:
      – W[h] *= (1 − 2λμ)^(k − A[h])   // regularize W[h] lazily
      – ip += V[h]·W[h]
      – A[h] = k
    p = 1/(1 + exp(−ip))
    For each h with V[h] > 0:
      – W[h] = W[h] + λ(y_i − p)·V[h]
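
To make the two slides above concrete, here is a minimal Python sketch of the hashed-feature SGD loop with lazy regularization. The function name, the dict-based feature representation, and the use of Python's built-in hash() are illustrative assumptions, not part of the original slides.

```python
import math

def train_hashed_lr(examples, R=2**20, lam=0.1, mu=1e-4, T=5):
    """SGD for L2-regularized logistic regression with the hash trick.

    examples: list of (features, y), where features maps feature name -> value
              and y is 0 or 1.  lam = learning rate, mu = regularization strength.
    Regularization is applied lazily: A[h] records the clock value k at which
    bucket h was last updated/regularized.
    """
    W = [0.0] * R          # hashed weight vector
    A = [0] * R            # last-update clock per bucket
    k = 0
    for t in range(T):
        for features, y in examples:
            k += 1
            # hash this example's features into R buckets
            V = {}
            for j, xj in features.items():
                h = hash(j) % R
                V[h] = V.get(h, 0.0) + xj
            # lazy regularization + inner product over touched buckets
            ip = 0.0
            for h, v in V.items():
                W[h] *= (1 - 2 * lam * mu) ** (k - A[h])
                ip += v * W[h]
                A[h] = k
            p = 1.0 / (1.0 + math.exp(-ip))
            # gradient step on the touched buckets only
            for h, v in V.items():
                W[h] += lam * (y - p) * v
    return W
```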

An example: 2^26 entries at 8 bytes/weight (about 0.5 GB for the weight vector)

Results

A variant of feature hashing
– Hash each feature multiple times with different hash functions
– Now each w has k chances to not collide with another useful w'
– An easy way to get multiple hash functions:
  Generate some random strings s_1,…,s_L
  Let the k-th hash function for w be the ordinary hash of the concatenation w + s_k
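
A minimal sketch of the "random salt strings" construction just described; the helper name, the salt length, and the bucket count are assumptions made for illustration.

```python
import random
import string

def make_hash_functions(L, R, seed=0):
    """Build L hash functions over R buckets by salting an ordinary hash
    with L fixed random strings (one salt per hash function)."""
    rng = random.Random(seed)
    salts = ["".join(rng.choices(string.ascii_letters, k=16)) for _ in range(L)]
    # each function hashes the concatenation of the feature name and its salt
    return [lambda w, s=s: hash(w + s) % R for s in salts]

# usage: hash the feature "word=cat" three times into (hopefully distinct) buckets
hashes = make_hash_functions(L=3, R=100_000_000)
buckets = [h("word=cat") for h in hashes]
```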

A variant of feature hashing
Why would this work?
Claim: with 100,000 features and 100,000,000 buckets:
– k=1 → Pr(any duplication) ≈ 1
– k=2 → Pr(any duplication) ≈ 0.4
– k=3 → Pr(any duplication) ≈ 0.01
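
A back-of-the-envelope check of the k=1 case, using the standard birthday-paradox approximation (this calculation is mine, not from the slides):

```latex
% Birthday-style estimate for the k=1 case: n = 10^5 features, m = 10^8 buckets
\[
\Pr(\text{no collision}) \approx e^{-n(n-1)/(2m)}
  \approx e^{-(10^5)^2 / (2\cdot 10^8)} = e^{-50} \approx 0,
\qquad
\Pr(\text{any duplication}) \approx 1 .
\]
```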

Hash Trick - Insights
– Save memory: don't store hash keys
– Allow collisions, even though it distorts your data some
– Let the learner (downstream) take up the slack
Here's another famous trick that exploits these insights….

Bloom filter interface
Interface to a Bloom filter:
– BloomFilter(int maxSize, double p);
– void bf.add(String s);          // insert s
– bool bf.contains(String s);
    // if s was added, return true;
    // else with probability at least 1-p, return false;
    // else with probability at most p, return true
– I.e., a noisy “set” where you can test membership (and that’s it)

Bloom filters
Another implementation:
– Allocate M bits, bit[0],…,bit[M-1]
– Pick K hash functions hash(1,s), hash(2,s),…
  E.g.: hash(i,s) = hash(s + randomString[i])
– To add string s:
  For i=1 to K, set bit[hash(i,s)] = 1
– To check contains(s):
  For i=1 to K, test bit[hash(i,s)]
  Return “true” if they’re all set; otherwise return “false”
– We’ll discuss how to set M and K soon, but for now:
  Let M = 1.5*maxSize   // less than two bits per item!
  Let K = 2*log(1/p)    // about right with this M
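
A runnable sketch of the bit-vector implementation just described, using the slide's rough sizing rules (M = 1.5·maxSize, K = 2·log2(1/p)); the class name and the salted use of Python's hash() are illustrative choices, not the slides' code.

```python
import math
import random

class BloomFilter:
    """Bit-vector Bloom filter following the slide's sketch; the sizing rules
    here are the slide's rough choices, not the textbook-optimal ones."""

    def __init__(self, max_size, p):
        self.M = max(1, int(1.5 * max_size))                   # number of bits
        self.K = max(1, round(2 * math.log2(1.0 / p)))         # number of hashes
        self.bits = [False] * self.M
        rng = random.Random(42)
        # one random salt per hash function: hash(i, s) = hash(s + salt[i])
        self.salts = [str(rng.random()) for _ in range(self.K)]

    def _positions(self, s):
        return [hash(s + salt) % self.M for salt in self.salts]

    def add(self, s):
        for pos in self._positions(s):
            self.bits[pos] = True

    def contains(self, s):
        return all(self.bits[pos] for pos in self._positions(s))

# usage
bf = BloomFilter(max_size=1000, p=0.05)
bf.add("apple")
print(bf.contains("apple"), bf.contains("pear"))   # True, probably False
```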

Bloom filters
Analysis:
– Assume hash(i,s) is a random function
– Look at Pr(bit j is unset after n add’s):
  Pr(bit j unset) = (1 − 1/m)^(kn) ≈ e^(−kn/m)
– … and Pr(collision), i.e. a false positive for an item that was never added:
  Pr(collision) ≈ (1 − e^(−kn/m))^k
– … fix m and n and minimize over k:
  k = (m/n)·ln 2

Bloom filters
Analysis (continued):
– Plug the optimal k = (m/n)·ln 2 back into Pr(collision):
  p = (1 − e^(−kn/m))^k = (1/2)^((m/n)·ln 2) ≈ 0.6185^(m/n)
– Now we can fix any two of p, n, m and solve for the 3rd
– E.g., the value of m in terms of n and p:
  m = n·ln(1/p) / (ln 2)^2 ≈ 1.44·n·log_2(1/p)
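
A small sizing helper based on the standard formulas above; the function name and the example numbers are mine.

```python
import math

def bloom_size(n, p):
    """Return (m_bits, k_hashes, bits_per_item) for n items and target
    false-positive rate p, using m = n*ln(1/p)/(ln 2)^2 and k = (m/n)*ln 2."""
    m = math.ceil(n * math.log(1 / p) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))
    return m, k, m / n

# e.g. 1M items at 1% false positives -> about 9.6 bits per item and k = 7
print(bloom_size(1_000_000, 0.01))
```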

Bloom filters: demo

Bloom filters
An example application: finding items in “sharded” data
– Easy if you know the sharding rule
– Harder if you don’t (like Google n-grams)
Simple idea:
– Build a BF of the contents of each shard
– To look for a key, load in the BFs one by one, and search only the shards that probably contain the key
– Analysis: you won’t miss anything… but you might look in some extra shards
– You’ll hit O(1) extra shards (in expectation) if you set p = 1/#shards
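
A sketch of the sharded-lookup idea, reusing the BloomFilter sketch from earlier; the helper names and the dict-of-shards representation are assumptions for illustration.

```python
def build_shard_filters(shards, p):
    """shards: dict shard_id -> iterable of keys stored in that shard.
    Returns one Bloom filter per shard (BloomFilter is the sketch above)."""
    filters = {}
    for shard_id, keys in shards.items():
        keys = list(keys)
        bf = BloomFilter(max_size=len(keys), p=p)
        for k in keys:
            bf.add(k)
        filters[shard_id] = bf
    return filters

def candidate_shards(filters, key):
    """Shards that *probably* contain key; with p = 1/len(filters) only about
    one extra shard is returned in expectation."""
    return [sid for sid, bf in filters.items() if bf.contains(key)]
```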

Bloom filters
An example application: discarding singleton features from a classifier
– Scan through the data once and check each w:
  if bf1.contains(w): bf2.add(w)
  else: bf1.add(w)
– Now:
  bf1.contains(w) ⇒ w appears ≥ once
  bf2.contains(w) ⇒ w appears ≥ 2x
– Then train, ignoring words not in bf2

Bloom filters
An example application: discarding rare features from a classifier
– seldom hurts much, and can speed up experiments
– Scan through the data once and check each w:
  if bf1.contains(w):
    if bf2.contains(w): bf3.add(w)
    else: bf2.add(w)
  else: bf1.add(w)
– Now:
  bf2.contains(w) ⇒ w appears ≥ 2x
  bf3.contains(w) ⇒ w appears ≥ 3x
– Then train, ignoring words not in bf3
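
A sketch that generalizes the two cascades above (dropping singletons, dropping rare features) to an arbitrary minimum count. It reuses the BloomFilter sketch from earlier; the function name and default parameters are illustrative.

```python
def keep_frequent_features(tokens, min_count=3, expected_vocab=1_000_000, p=0.01):
    """One pass over a token stream; returns a membership test for features seen
    at least min_count times, via a cascade of Bloom filters where level i
    (0-based) holds features seen at least i+1 times."""
    bfs = [BloomFilter(expected_vocab, p) for _ in range(min_count)]
    for w in tokens:
        # each occurrence promotes w by one level: add it to the first
        # level that does not yet contain it
        for level in range(min_count):
            if not bfs[level].contains(w):
                bfs[level].add(w)
                break
    return lambda w: bfs[min_count - 1].contains(w)

# usage: is_frequent = keep_frequent_features(corpus_tokens, min_count=3)
```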

Bloom filters More on Thursday….

LSH: key ideas
Bloom filter:
– Set represented by a small bit vector
– Can answer containment queries fairly accurately
Locality sensitive hashing:
– map feature vector x to bit vector bx
– ensure that bx preserves “similarity” of the original vectors

Random Projections

Random projections [figure: labels u, −u, and a margin of 2γ]

[same figure] Any other direction will keep the distant points distant. So if I pick a random r, and r.x and r.x' are closer than γ, then probably x and x' were close to start with.

Random projections [same figure] To make those points “close” we need to project to a direction orthogonal to the line between them.

Random projections [same figure] Put this another way: when is r.x > 0 and r.x' < 0?

Random projections [figure: the region of directions r with r.x > 0 and r.x' < 0; the annotation reads “somewhere in here is ok”]

Random projections [same figure] Claim: for a random direction r, Pr(r.x > 0 and r.x' < 0) = θ(x, x') / 2π, where θ(x, x') is the angle between x and x'.

Some math
Pick a random vector r and define h_r(x) = 1 if r.x ≥ 0, else 0
Claim: Pr(h_r(x) ≠ h_r(x')) = θ(x, x') / π, where θ(x, x') is the angle between x and x'
So: Pr(h_r(x) = h_r(x')) = 1 − θ(x, x') / π
And: θ(x, x') = π · Pr(h_r(x) ≠ h_r(x')), so the cosine similarity cos θ(x, x') can be estimated from the fraction of bits on which the signatures bx and bx' disagree
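
A quick Monte Carlo sanity check of the claim that the disagreement probability equals θ(x, x')/π; this small experiment is mine, not from the slides.

```python
import math
import random

def angle(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / (nx * ny))))

def disagreement_rate(x, y, trials=20000, seed=0):
    """Fraction of random Gaussian directions r with sign(r.x) != sign(r.y)."""
    rng = random.Random(seed)
    d = len(x)
    bad = 0
    for _ in range(trials):
        r = [rng.gauss(0, 1) for _ in range(d)]
        sx = sum(ri * xi for ri, xi in zip(r, x)) >= 0
        sy = sum(ri * yi for ri, yi in zip(r, y)) >= 0
        bad += (sx != sy)
    return bad / trials

x, y = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]
print(disagreement_rate(x, y), angle(x, y) / math.pi)   # both ~0.25
```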

LSH: key ideas
Goal:
– map feature vector x to bit vector bx
– ensure that bx preserves “similarity”
Basic idea: use random projections of x
– Repeat many times:
  Pick a random hyperplane r
  Compute the inner product of r with x
  Record whether x is “close to” r (r.x ≥ 0): that is the next bit in bx
Theory says that if x' and x have small cosine distance then bx and bx' will have small Hamming distance
Famous use: near-duplicate web page detection in AltaVista (Broder, 97)

LSH: key ideas
Naïve algorithm:
– Initialization:
  For i=1 to outputBits:
    For each feature f:
      » Draw r[i,f] ~ Normal(0,1)
– Given an instance x:
  For i=1 to outputBits:
    LSH[i] = sum( x[f]*r[i,f] for f with non-zero weight in x ) > 0 ? 1 : 0
  Return the bit-vector LSH
– Problem: you need many r’s to be accurate, and storing them is expensive
  Ben will give us more ideas Thursday
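
A minimal sketch of the naïve algorithm; drawing r lazily with a defaultdict makes the memory problem visible, since the table grows with outputBits × (number of distinct features). Names are illustrative.

```python
import random
from collections import defaultdict

def make_projections(seed=0):
    """Lazily draw one Normal(0,1) weight per (bit index, feature) pair."""
    rng = random.Random(seed)
    return defaultdict(lambda: rng.gauss(0, 1))   # key: (i, f)

def lsh_signature(x, r, output_bits):
    """x: dict feature -> value.  Returns an output_bits-long 0/1 vector."""
    return [1 if sum(v * r[(i, f)] for f, v in x.items()) > 0 else 0
            for i in range(output_bits)]

# usage
r = make_projections()
sig = lsh_signature({"word=cat": 1.0, "word=dog": 0.5}, r, output_bits=64)
```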

LSH: “pooling” (van Durme)
Better algorithm:
– Initialization:
  Create a pool:
    – Pick a random seed s
    – For i=1 to poolSize:
      » Draw pool[i] ~ Normal(0,1)
  For i=1 to outputBits:
    – Devise a random hash function hash(i,f):
      » E.g.: hash(i,f) = hashcode(f) XOR randomBitString[i]
– Given an instance x:
  For i=1 to outputBits:
    LSH[i] = sum( x[f] * pool[hash(i,f) % poolSize] for f in x ) > 0 ? 1 : 0
  Return the bit-vector LSH
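
A sketch of the pooling variant: a shared pool of Normal(0,1) values replaces the full r[i,f] table, and the per-bit hash is simulated by XORing Python's hash of the feature with a random 64-bit mask (an assumption standing in for the slide's randomBitString).

```python
import random

def make_pool(pool_size, seed=0):
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(pool_size)]

def make_bit_masks(output_bits, seed=1):
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(output_bits)]

def pooled_lsh(x, pool, masks):
    """x: dict feature -> value.  Each (bit, feature) pair is hashed into the
    shared pool instead of having its own stored random weight."""
    sig = []
    for mask in masks:
        s = sum(v * pool[(hash(f) ^ mask) % len(pool)] for f, v in x.items())
        sig.append(1 if s > 0 else 0)
    return sig

# usage
pool = make_pool(pool_size=10_000)
masks = make_bit_masks(output_bits=64)
sig = pooled_lsh({"word=cat": 1.0, "word=dog": 0.5}, pool, masks)
```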

LSH: key ideas
Advantages:
– with pooling, this is a compact re-encoding of the data: you don’t need to store the r’s, just the pool
– leads to a very fast nearest-neighbor method: just look at other items with bx' = bx (there are also very fast nearest-neighbor methods for Hamming distance)
– similarly, leads to very fast clustering: a cluster = all things with the same bx vector
More Thursday….

Finding neighbors of LSH vectors
After hashing x_1,…,x_n we have bx_1,…,bx_n
How do I find the closest neighbors of bx_i?
One approach (Indyk & Motwani):
– Pick a random permutation p of the bits
– Permute bx_1,…,bx_n to get bx_1^p,…,bx_n^p
– Sort them
– Check the B closest neighbors of bx_i^p in that sorted list, and keep the k closest
– Repeat many times
Each pass finds some close neighbors of every element
– So, you can do single-link clustering with sequential scans
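
A sketch of the permute-sort-scan idea above; window and passes correspond to the slide's B and "repeat many times", and all names are illustrative assumptions.

```python
import random

def permuted_neighbor_candidates(signatures, window=5, passes=10, seed=0):
    """signatures: list of equal-length 0/1 lists.  Returns a set of candidate
    close pairs (i, j), found by repeatedly permuting the bit positions,
    sorting the permuted signatures lexicographically, and pairing items
    that land within a small window of each other in the sorted order."""
    rng = random.Random(seed)
    n_bits = len(signatures[0])
    candidates = set()
    for _ in range(passes):
        perm = list(range(n_bits))
        rng.shuffle(perm)
        order = sorted(range(len(signatures)),
                       key=lambda i: [signatures[i][b] for b in perm])
        for pos, i in enumerate(order):
            for j in order[pos + 1: pos + 1 + window]:
                candidates.add((min(i, j), max(i, j)))
    return candidates
```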