Multi-Label Prediction via Compressed Sensing
By Daniel Hsu, Sham M. Kakade, John Langford, Tong Zhang (NIPS 2009)
Presented by: Lingbo Li, ECE, Duke University
* Some notes are copied directly from the original paper.
Outline
Introduction; Preliminaries; Learning Reduction; Compression and Reconstruction; Empirical Results; Conclusion
Introduction
Large database of images. Goal: predict who or what is in a given image. Samples: images $x$ with corresponding label vectors $y \in \{0,1\}^d$, where $d$ is the total number of entities in the whole database. One-against-all algorithm: learn a binary predictor for each label (class). Computation is expensive when $d$ is large. Assumption: the output vector $y$ is sparse.
Introduction
Main idea: “Learn to predict compressed label vectors, and then use sparse reconstruction algorithm to recover uncompressed labels from these predictions.” Compressed sensing: with high probability over a random choice of compression matrix, any $k$-sparse vector $y \in \mathbb{R}^d$ can be compressed to $m = O(k \log d)$ measurements, i.e., logarithmic in the dimension $d$, and $y$ can still be perfectly reconstructed from them.
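Here is a back-of-the-envelope count that puts the logarithmic claim in numbers. It uses the image-data setting from the experiments: $d = 1000$ kept labels and about $k = 4$ labels per image. The constant hidden in $O(\cdot)$ is ignored, so the result only indicates the order of magnitude, compared with $d = 1000$ binary predictors for one-against-all:

$$
m = O(k \log d), \qquad k \ln d = 4 \ln 1000 \approx 4 \times 6.91 \approx 27.6 \ll d = 1000
$$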
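Preliminaries
$\mathcal{X}$: input space. $\mathcal{Y} = \mathbb{R}^d$: output (label) space, where $d$ is the number of labels. Training data: $(x_1, y_1), \ldots, (x_n, y_n)$ drawn i.i.d. from a distribution over $\mathcal{X} \times \mathcal{Y}$. Goal: learn a predictor $F: \mathcal{X} \to \mathcal{Y}$ with low mean-squared error $\mathbb{E}\,\|F(x) - y\|_2^2$. Assume $d$ is very large and that the expected value $\mathbb{E}[y \mid x]$ is sparse, with only a few non-zero entries.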
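Learning reduction
Linear compression function $A: \mathbb{R}^d \to \mathbb{R}^m$, where $m \ll d$. Goal: learn a predictor $H: \mathcal{X} \to \mathbb{R}^m$. Instead of predicting the label $y$ directly with the predictor $F$, predict the compressed label $Ay$ with the predictor $H$. Samples $(x_i, y_i)$ become compressed samples $(x_i, Ay_i)$, and $H$ is trained to minimize $\mathbb{E}\,\|H(x) - Ay\|_2^2$.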
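The training step of the reduction can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. `train_reduction`, `fit_mean`, and the toy data are hypothetical. `fit` stands in for any scalar regression method, since the slide leaves the base learner open. One regressor is trained per compressed coordinate, so the number of learning problems is $m$ rather than $d$.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical base-learner signature: fit(inputs, scalar_targets) -> predictor
Fit = Callable[[List[Vector], List[float]], Callable[[Vector], float]]


def matvec(A: Matrix, v: Vector) -> Vector:
    """Return the matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def train_reduction(X: List[Vector], Y: List[Vector], A: Matrix, fit: Fit):
    """Learn H: x -> R^m from compressed samples (x_i, A y_i).

    One scalar regressor is trained per compressed coordinate, i.e.
    len(A) = m problems instead of d = len(Y[0]) one-against-all problems.
    """
    Z = [matvec(A, y) for y in Y]  # compressed label vectors A y_i
    regressors = [fit(X, [z[j] for z in Z]) for j in range(len(A))]

    def H(x: Vector) -> Vector:
        # Stack the m scalar predictions into a predicted compressed label.
        return [h(x) for h in regressors]

    return H


def fit_mean(inputs: List[Vector], targets: List[float]):
    """Toy base learner (an assumption for this sketch): predicts the mean target."""
    mean = sum(targets) / len(targets)
    return lambda x: mean


# Usage: d = 3 labels compressed to m = 2 coordinates.
A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
X = [[0.0], [1.0]]
Y = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
H = train_reduction(X, Y, A, fit_mean)
print(H([0.5]))  # [1.0, 0.5]
```

Any real regressor, such as least squares on features of $x$, can replace `fit_mean` without changing the reduction itself.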
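Reduction: training and prediction
Reconstruction algorithm $R: \mathbb{R}^m \to \mathbb{R}^d$. The final predictor is $F(x) = R(H(x))$. Requirement: if $h$ is close to $Ay$, then $R(h)$ should be close to $y$.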
Compression Functions Examples of valid compression functions:
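Two standard compressed-sensing choices are sketched below. The first is a dense Gaussian matrix. The second takes $m$ random rows of a Hadamard matrix, which is the compression function used in the experiments. The $1/\sqrt{m}$ scaling, the Sylvester construction (which requires $d$ to be a power of two), and the function names are assumptions of this sketch.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def gaussian_matrix(m: int, d: int, rng: random.Random) -> Matrix:
    """m x d matrix with i.i.d. N(0, 1/m) entries, so that E||Ay||^2 = ||y||^2."""
    sigma = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(m)]


def hadamard_entry(i: int, j: int) -> int:
    """Entry (i, j) of the Sylvester Hadamard matrix: (-1)^(number of 1 bits in i & j)."""
    return -1 if bin(i & j).count("1") % 2 else 1


def subsampled_hadamard(m: int, d: int, rng: random.Random) -> Matrix:
    """m distinct random rows of the d x d Hadamard matrix, scaled by 1/sqrt(m)."""
    if d <= 0 or d & (d - 1):
        raise ValueError("this construction needs d to be a power of two")
    rows = rng.sample(range(d), m)
    scale = 1.0 / math.sqrt(m)
    return [[scale * hadamard_entry(i, j) for j in range(d)] for i in rows]


rng = random.Random(0)
A = subsampled_hadamard(4, 16, rng)
print(len(A), len(A[0]))  # 4 16
```

The Hadamard variant never needs the full $d \times d$ matrix in memory, because each entry is computed from its indices.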
Reconstruction Algorithms
Examples of valid reconstruction algorithms (iterative and greedy): Orthogonal Matching Pursuit (OMP); Forward-Backward Greedy (FoBa); Compressive Sampling Matching Pursuit (CoSaMP).
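Below is a minimal pure-Python sketch of OMP, for illustration only. At each of $k$ steps it adds the column of $A$ most correlated with the current residual, then refits by least squares on the chosen support. The function names and the toy matrix are choices made for this sketch, not details from the paper. The same applies to solving the least-squares step through the normal equations, which is fine for small $k$ but numerically poor in general. The sketch also assumes $k \le m$ and that the chosen columns are linearly independent.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(M: Matrix, b: Vector) -> Vector:
    """Solve the square system M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z


def omp(A: Matrix, h: Vector, k: int) -> Vector:
    """Orthogonal Matching Pursuit: greedily build a k-sparse y with A y close to h."""
    m, d = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(d)]  # cols[j] = column j of A
    support: List[int] = []
    residual = h[:]
    coef: Vector = []
    for _ in range(k):
        # Select the unused column most correlated with the residual.
        scores = [abs(sum(c * r for c, r in zip(cols[j], residual)))
                  if j not in support else -1.0 for j in range(d)]
        support.append(max(range(d), key=lambda j: scores[j]))
        # Refit all coefficients on the support (normal equations).
        G = [[sum(a * b for a, b in zip(cols[p], cols[q])) for q in support] for p in support]
        rhs = [sum(a * b for a, b in zip(cols[p], h)) for p in support]
        coef = solve(G, rhs)
        residual = [h[i] - sum(coef[t] * cols[s][i] for t, s in enumerate(support))
                    for i in range(m)]
    y = [0.0] * d
    for t, s in enumerate(support):
        y[s] = coef[t]
    return y


# Toy example: m = 3 measurements of a 2-sparse y in R^4, y = [0, 0, 3, 2].
A = [[1.0, 0.0, 0.0, 0.6], [0.0, 1.0, 0.0, 0.8], [0.0, 0.0, 1.0, 0.0]]
h = [1.2, 1.6, 3.0]  # equals A @ [0, 0, 3, 2]
print(omp(A, h, 2))  # approximately [0.0, 0.0, 3.0, 2.0]
```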
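General Robustness Guarantees
What if the reduction creates a problem that is harder to solve than the original one? The guarantees show that if $H$ has low error on the compressed problem, then $F = R \circ H$ has low error on the original problem, up to a sparsity error term. The sparsity error is defined in terms of the gap between $\bar{y} = \mathbb{E}[y \mid x]$ and $\bar{y}_{(k)}$, the best $k$-sparse approximation of $\bar{y}$ (keep its $k$ largest-magnitude entries and zero the rest).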
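Below is a minimal sketch of the ingredients named on this slide. It computes the best $k$-sparse approximation $v_{(k)}$ and one standard compressed-sensing way of measuring the gap, the $\ell_1$ tail $\|v - v_{(k)}\|_1$. Using the $\ell_1$ tail here is an assumption, not necessarily the paper's exact sparsity-error definition. The vector `ybar` is hypothetical.

```python
from typing import List

Vector = List[float]


def best_k_sparse(v: Vector, k: int) -> Vector:
    """v_(k): keep the k largest-magnitude entries of v, zero the rest (ties go to lower index)."""
    keep = set(sorted(range(len(v)), key=lambda i: -abs(v[i]))[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def l1_tail(v: Vector, k: int) -> float:
    """||v - v_(k)||_1: l1 mass of v outside its best k-sparse approximation (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(v, best_k_sparse(v, k)))


# Hypothetical conditional label means E[y | x] for d = 5 labels.
ybar = [0.9, 0.05, 0.0, 0.6, 0.1]
print(best_k_sparse(ybar, 2))      # [0.9, 0.0, 0.0, 0.6, 0.0]
print(round(l1_tail(ybar, 2), 6))  # 0.15
```

If $\mathbb{E}[y \mid x]$ has at most $k$ non-zero entries, the tail is exactly zero and this term vanishes.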
Linear Prediction
If there is a perfect linear predictor of $y$, i.e., $\mathbb{E}[y \mid x] = Bx$ for some matrix $B$, then there is also a perfect linear predictor of the compressed label $Ay$.
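The argument takes one line, by linearity of expectation. Here $B$ denotes the weight matrix of the perfect linear predictor of $y$:

$$
\mathbb{E}[y \mid x] = Bx \;\Longrightarrow\; \mathbb{E}[Ay \mid x] = A\,\mathbb{E}[y \mid x] = (AB)\,x,
$$

so $x \mapsto (AB)\,x$ is a perfect linear predictor of $Ay$.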
Experimental Results
Experiment 1: image data (collected by the ESP Game): 65k images, 22k unique labels. Keep the 1k most frequent labels; the least frequent occurs 39 times and the most frequent about 12k times, with 4 labels per image on average. Half of the data is used for training and half for testing. Experiment 2: text data (collected from …): 16k labeled web pages, 983 unique labels. The least frequent occurs 21 times and the most frequent about 6,500 times, with 19 labels per web page on average. Half of the data is used for training and half for testing. Compression function $A$: $m$ random rows of the Hadamard matrix. Reconstruction algorithms tested: the greedy and iterative algorithms OMP, FoBa, and CoSaMP, plus Lasso. Correlation decoding (CD) serves as the baseline for comparison.
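CD is only named on the slide. This sketch assumes a common form of correlation decoding. Each label $j$ is scored by the correlation between column $j$ of $A$ and the predicted compressed vector $h$, which amounts to computing $A^\top h$, and labels are then ranked by score. The function names and toy values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def correlation_decode(A: Matrix, h: Vector) -> Vector:
    """Score each label j by <a_j, h>, where a_j is column j of A; this equals A^T h."""
    m, d = len(A), len(A[0])
    return [sum(A[i][j] * h[i] for i in range(m)) for j in range(d)]


def top_k_labels(scores: Vector, k: int) -> List[int]:
    """Indices of the k highest scores (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


A = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # toy 2 x 3 compression matrix
h = [0.0, 1.0]                          # e.g. a predicted compressed label H(x)
scores = correlation_decode(A, h)
print(scores)                   # [0.0, 1.0, 0.5]
print(top_k_labels(scores, 1))  # [1]
```

CD needs only a single matrix-vector product per example, which makes it a cheap baseline next to the iterative decoders.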
Experimental Results
Measure …; measure the precision. Figures: top two panels, image data; bottom panel, text data.
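The slide does not say which precision variant was reported. This sketch assumes precision at $k$, a common choice for ranked multi-label output: the fraction of the $k$ highest-scoring labels that are true labels, averaged over the test set. The function names and the toy test set are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(scores: Sequence[float], true_labels: Set[int], k: int) -> float:
    """Fraction of the k highest-scoring labels that are true labels."""
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])[:k]
    return sum(1 for j in ranked if j in true_labels) / k


def mean_precision_at_k(examples: List[Tuple[Sequence[float], Set[int]]], k: int) -> float:
    """Average precision@k over (scores, true label set) pairs from a test set."""
    return sum(precision_at_k(s, t, k) for s, t in examples) / len(examples)


test_set = [([0.9, 0.1, 0.8, 0.3], {0, 3}), ([0.2, 0.7, 0.1, 0.6], {1, 3})]
print(mean_precision_at_k(test_set, 2))  # 0.75
```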
Conclusion
Compressed sensing is applied to the multi-label prediction problem with output sparsity. The reduction is efficient: the number of predictions is logarithmic in the number of original labels. Robustness guarantees carry over from the compressed case to the original case, and vice versa in the linear prediction setting.