Understanding Errors in Approximate Distributed Latent Dirichlet Allocation
Alexander Ihler and David Newman
Dept. of Computer Science, University of California, Irvine; NICTA Victoria Research Lab, U. Melbourne, Australia

Latent Dirichlet Allocation
- Un-collapsed Gibbs sampling is easy to make parallel.
- Collapsed Gibbs sampling is fundamentally sequential: each sample depends (slightly) on all the others.

AD-LDA and modifications
AD-LDA: just run the collapsed Gibbs sampler (CGS) in parallel anyway (Newman et al., 2008; 2009; and extensions). Distribute the documents across P nodes:
- Ignore the dependence between parallel samples.
- (z, a) are local, not shared; (b, c) are copied across all nodes.
- In practice, this works great; anecdotal examples perform the same as LDA.
- But can we know how it will do on new data?
[Figure: the topic assignments z_di laid out as a grid of document blocks by word blocks; each block is labeled with its epoch (1-3) and processor (A-C), so that within an epoch no two concurrent jobs share a document block or a word block.]

Scaling and Experimental Results
- Parallel efficiency and scaling are similar to AD-LDA, with the same strengths and weaknesses (local vs. shared data).
- Experiments use a shared-memory, multicore implementation.
- We investigate how the scaling behaves with the data set size N, the number of processors / blocks P, and the number of topics T.
- Scaling is fairly predictable using a simple approximation, but it may deteriorate for very large T.
- Extensions to DP models?
[Figure from Newman et al. (2009).]
[Figure: speedup factor versus number of cores (up to 8) for the Enron, NIPS, and KOS corpora, compared with ideal speedup.]
[Figure: sample error probability (log scale, roughly 10^-3 to 10^-2) for Enron over the course of the run, showing the Enron bound and the measured Enron error.]
[Figure: sample error probability versus number of data (10^6 to 10^7); bounds, measured errors, and a reference curve for the KOS, NIPS, and Enron corpora.]
We can also compute the "true" error (just not in parallel). Its shape matches that of the error bound: it peaks early on and falls to a steady-state level.

Error bounds
How can we maintain an error bound? Settle for a "per-step" bound:
- What is the probability of a mistake at each step? This is not a cumulative guarantee, but it puts the error on an equal footing with other sources of error.
- Even this is hard: the distributions are changing at each step, and a processor cannot see the others until they are done.
So compute a retrospective bound instead, using Hilbert's projective metric, which has been used to analyze belief propagation and has nice properties:
- Adding a non-negative vector h never increases the metric.
- It is invariant to inversion, to element-wise scaling, and to scalar normalization.
Separate the "constant" counts from the "changing" counts, measure the error in the "constant" part at the end, and use it to bound the error at every step (see the sketch after this list):
- Start with initial counts a, b, c0 = v0 + h0.
- P=1 updates a, b, and v0 -> v1.
- AD-LDA: P=2 uses v0 instead of v1, and updates a, b, h.
- When done, measure d(v0, v1); this is O(T) work.
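To make the retrospective measurement concrete, here is a minimal sketch in Python, assuming the standard definition of Hilbert's projective metric on strictly positive vectors; the function name, the toy counts, and the random increments are illustrative only, not the authors' code.

```python
import numpy as np

def hilbert_metric(u, v):
    """Hilbert's projective metric between strictly positive vectors:
    d(u, v) = log(max_i u_i / v_i) - log(min_i u_i / v_i)."""
    r = np.log(u) - np.log(v)
    return r.max() - r.min()

rng = np.random.default_rng(0)
T = 8                                           # number of topics (toy size)
v0 = rng.integers(1, 50, size=T).astype(float)  # "changing" counts: the stale copy used by P=2
v1 = v0 + rng.integers(0, 5, size=T)            # the same counts after P=1's updates

d = hilbert_metric(v0, v1)                      # the O(T) retrospective measurement
print(f"d(v0, v1) = {d:.4f}")

# Invariances listed above (exact for this definition of the metric):
w = rng.uniform(0.5, 2.0, size=T)
assert np.isclose(hilbert_metric(w * v0, w * v1), d)                # element-wise scaling
assert np.isclose(hilbert_metric(1.0 / v0, 1.0 / v1), d)            # inversion
assert np.isclose(hilbert_metric(v0 / v0.sum(), v1 / v1.sum()), d)  # scalar normalization
```

Working with log-ratios keeps the measurement a single pass over the T topic counts, matching the O(T) cost noted above.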
Short of actually running sequential LDA, there is no way to know how AD-LDA will behave on new data.

First modification: additional partitioning
- Subdivide the data across both documents and words, and organize the computation into orthogonal epochs so that no two concurrent jobs overlap.
- There is less work per epoch, but also fewer inconsistencies: "b" is no longer shared, only "c" is.
- "c" is a bulk, stable quantity; if it were held constant, the samples would be exact.

Properties of HPM
- In addition to the invariances listed under Error bounds, Hilbert's projective metric bounds the L1 norm.

Topic models for text corpora
- Topics are bags of words; documents are mixtures of topics.
[Figure: two example topics shown as bags of words ("case, filed, injunction, court, suit" and "security, check, background, privacy, information") and two example documents written as mixtures of those topics, with weights 0.6 / 0.4 and 0.9 / 0.1.]
- Massive data sets, even at linear complexity, call for parallel or distributed algorithms (Nallapati et al. 2007; Newman et al. 2008; Asuncion et al. 2009; Wang et al. 2009; Yan et al. 2009; ...).
- Gibbs sampling: the collapsed sampler converges more quickly (see the sketch below).
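As a minimal sketch of one collapsed Gibbs sweep, assuming the standard LDA conditional p(z_i = t) proportional to (n_dt + alpha)(n_wt + beta) / (n_t + W*beta); the function and variable names are illustrative only, chosen to mirror the a, b, c counts used above.

```python
import numpy as np

def cgs_sweep(words, docs, z, n_dt, n_wt, n_t, alpha=0.1, beta=0.01, rng=None):
    """One sweep of collapsed Gibbs sampling for LDA.
    words[i], docs[i], z[i]: word id, document id, and current topic of token i.
    n_dt: document-topic counts (the "a" counts), n_wt: word-topic counts ("b"),
    n_t: total tokens per topic ("c", the bulk quantity shared in AD-LDA)."""
    if rng is None:
        rng = np.random.default_rng()
    W = n_wt.shape[0]                # vocabulary size
    T = n_t.shape[0]                 # number of topics
    for i in range(len(words)):
        w, d, t_old = words[i], docs[i], z[i]
        # remove token i from all counts
        n_dt[d, t_old] -= 1
        n_wt[w, t_old] -= 1
        n_t[t_old] -= 1
        # conditional over topics, given every other assignment
        p = (n_dt[d] + alpha) * (n_wt[w] + beta) / (n_t + W * beta)
        t_new = rng.choice(T, p=p / p.sum())
        # add token i back under its new topic
        n_dt[d, t_new] += 1
        n_wt[w, t_new] += 1
        n_t[t_new] += 1
        z[i] = t_new
    return z
```

Every update reads and writes the shared counts n_wt and n_t, which is exactly the (slight) dependence between samples that AD-LDA ignores when each processor samples against its own copy.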