Selective Block Minimization for Faster Convergence of Limited Memory Large-scale Linear Models
Kai-Wei Chang and Dan Roth

Motivation
- Linear classifiers handle large-scale data well, but a key challenge for large-scale classification is dealing with data sets that cannot fit in memory. Examples include spam filtering, classifying query logs, and classifying web data and images.
- Recent public challenges are also large scale: the spam filtering data in the Pascal challenge is 20GB, and the ImageNet challenge is roughly 130GB.
- Existing methods for learning from large-scale data: when the data is smaller than memory, efficient training methods are well developed; when the data exceeds disk size, distributed solutions are used.
- When the data cannot fit into memory:
  - A batch learner suffers from disk swapping.
  - An online learner requires a large number of iterations to converge, and therefore a large amount of I/O.
  - Block minimization [Yu et al. 2010] requires many block loads because it treats all training examples uniformly.
- Our challenge: efficiently handle data larger than the memory capacity of a single machine.
- Training time = running time in memory (training) + time spent accessing data on disk. Two orthogonal directions reduce disk access: 1) apply compression to shorten loading time, and 2) learn better from the data that is in memory, i.e., make better use of memory during learning.

Block Minimization [Yu et al. 2010]
- Split the data into blocks and store them in compressed files.
- At each step, load one data block and train on it by solving a sub-problem.
- Drawback: the method spends the same time and memory on important and unimportant samples.

Selective Block Minimization (SBM)
- Intuition: focus on the informative samples. If we had an oracle for the support vectors, we would only need to train the model on the support vectors.
- SBM therefore selects informative examples and caches them in memory.
- At step (t, j) it forms a working set consisting of 1) the new data block B_j loaded from disk and 2) the informative samples currently cached in memory, and it solves the sub-problem restricted to this working set (a hedged sketch of the sub-problem and of the overall loop follows the Implementation Issues section below).
- Sub-problems can be solved with a linear classification package such as LIBLINEAR.
- Finally, SBM updates the model and selects the new cache from the union of the old cache and B_j.

Selecting Cached Data
- The selection step is related to selective sampling and to the shrinking strategy.
- Samples close to the separator are usually the important ones, so these are the samples to cache.
- Define a scoring function and keep the samples with the higher scores in the cache; this acts as a disk-level shrinking strategy.

Implementation Issues
- Storing the cached samples in memory needs a careful design to avoid duplicating data.
- Dealing with a large set of dual variables α: all α variables must be stored even when the corresponding instances are not in memory, and this set can be very large when the number of instances is large. 1) If α is sparse (the number of support vectors is much smaller than the number of instances), the nonzero α values can be stored in a hash table (see the sketch below). 2) Otherwise, the unused α values can be stored on disk.
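The sub-problem solved at step (t, j) appears on the poster only as a figure. The following is a minimal sketch of a block sub-problem of the kind used in block minimization for the L1-loss linear SVM dual (the problem LIBLINEAR solves), assuming the working set W_{t,j} is the union of the new block B_j and the cached samples; the symbol W_{t,j} and the exact formulation are our notation, not copied from the poster.

\[
f(\boldsymbol{\alpha}) \;=\; \tfrac{1}{2}\Big\|\sum_i \alpha_i y_i \mathbf{x}_i\Big\|^2 \;-\; \sum_i \alpha_i
\]
\[
\min_{\mathbf{d}} \; f(\boldsymbol{\alpha} + \mathbf{d})
\quad \text{s.t.} \quad d_i = 0 \;\; \forall\, i \notin W_{t,j}, \qquad
0 \le \alpha_i + d_i \le C \;\; \forall\, i \in W_{t,j}
\]

Only the dual variables in W_{t,j} change, so the sub-problem fits in memory, and the weight vector w = Σ_i α_i y_i x_i can be maintained incrementally as blocks are processed.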
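To make the SBM loop and the cache-selection step above concrete, here is a minimal Python sketch. It is not the LIBLINEAR-branch implementation: load_block and solve_subproblem are assumed helpers, and scoring samples by their distance |w·x| to the separator is just one plausible instantiation of the scoring function mentioned under Selecting Cached Data.

import numpy as np

def sbm_train(block_files, load_block, solve_subproblem, cache_size, n_outer=5):
    """Sketch of Selective Block Minimization (SBM).

    load_block(path)          -> (X, y) for one compressed data block on disk
    solve_subproblem(w, X, y) -> updated weight vector after solving the
                                 sub-problem on the working set (e.g., LIBLINEAR)
    """
    w = None
    cache_X, cache_y = None, None            # informative samples kept in memory
    for t in range(n_outer):                 # outer passes over the data
        for path in block_files:             # inner steps j: one block at a time
            X_new, y_new = load_block(path)  # 1) new block B_j from disk
            if cache_X is None:
                X, y = X_new, y_new
            else:                            # 2) plus the cached samples
                X = np.vstack([X_new, cache_X])
                y = np.concatenate([y_new, cache_y])
            if w is None:
                w = np.zeros(X.shape[1])
            w = solve_subproblem(w, X, y)    # solve the block sub-problem
            # Cache selection (disk-level shrinking): keep the cache_size samples
            # closest to the separator, i.e., those with the smallest |w . x|.
            order = np.argsort(np.abs(X @ w))
            keep = order[:cache_size]
            cache_X, cache_y = X[keep], y[keep]
    return w

Setting cache_size = 0 recovers plain block minimization, and running the outer loop once (n_outer = 1) over a stream of blocks corresponds to the single-pass streaming setting described below under Learning From Streaming Data.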
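For the hash-table storage of the dual variables mentioned under Implementation Issues, a minimal sketch follows; the class name and interface are illustrative assumptions rather than the data structure used in the LIBLINEAR branch.

class SparseAlphaStore:
    """Keep only the nonzero dual variables in a hash table, so memory scales
    with the number of support vectors rather than with the total number of
    training instances."""

    def __init__(self):
        self._alpha = {}                  # instance id -> nonzero alpha value

    def get(self, idx):
        return self._alpha.get(idx, 0.0)  # absent ids implicitly have alpha = 0

    def set(self, idx, value):
        if value != 0.0:
            self._alpha[idx] = value
        else:
            self._alpha.pop(idx, None)    # drop zeros to keep the table sparse

    def num_support_vectors(self):
        return len(self._alpha)

When α is dense instead of sparse, the same interface could page unused entries to disk, which is the second option listed above.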
Learning From Streaming Data
- A large data set can be treated as a stream and processed in an online manner.
- Most online learners cannot obtain a good model with only a single pass over the data because they apply a simple update rule to one sample at a time.
- Applying SBM with a single outer iteration, so that one data block is considered at a time, can do better.
- There is no need to store α once the corresponding instances are removed from memory, so more data can be kept in memory, and the model is more stable and can adjust to the available resources.

Experiment Settings
- Memory use is restricted to at most 2GB.
- Compared methods: online methods (VW, Perceptron, MIRA, CW); the block minimization framework [Yu et al. 2010] (BMD, BM-Pegasos); and the proposed SBM with various settings.
- The poster reports training time and testing accuracy, a training-time analysis, and experiments on streaming data (shown as figures, omitted here).

Contributions
- SBM significantly reduces the I/O cost: on the spam filtering data, SBM obtains an accurate model with a single pass over the data set, whereas the previous best method needs 10 passes. As a result, SBM efficiently produces a stable model.
- SBM maintains good convergence properties: a method that selectively caches data and treats samples non-uniformly can usually converge only to an approximate solution, but SBM provably converges linearly to the global optimum over the entire data set, regardless of the cache-selection strategy used.
- SBM has been implemented in a branch of LIBLINEAR.

This research is sponsored by the DARPA Machine Reading Program and the Bootstrapping Learning Program.
The code to regenerate the experiments can be downloaded at: