Forgetting Counts: Constant Memory Inference for a Dependent Hierarchical Pitman-Yor Process
Nicholas Bartlett, David Pfau, Frank Wood
Presented by Yingjian Wang, Nov. 17, 2010
Outline
Background
The sequence memoizer
Forgetting
The dependent HPY
Experiment results
Background
2006, Teh, ‘A hierarchical Bayesian language model based on Pitman-Yor processes’: an n-gram Markov chain language model with the HPY prior.
2009, Wood, ‘A Stochastic Memoizer for Sequence Data’: the Sequence Memoizer (SM), with a linear space/time inference scheme (lossless).
2010, Gasthaus, ‘Lossless compression based on the Sequence Memoizer’: combines the SM with an arithmetic coder to build a compressor (PLUMP/dePLUMP).
2010, Bartlett, ‘Forgetting Counts: Constant Memory Inference for a Dependent HPY’: develops constant memory/space inference for the SM by using a dependent HPY (with loss).
SM-Two concepts
Memoizer (Donald Michie, 1968): a device which returns previously computed results for the same input instead of recalculating them, in order to save time.
Stochastic Memoizer (Wood, 2009): the returned results can change, since the predictive probability is based on a stochastic process.
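The deterministic memoizer idea can be illustrated with a minimal Python sketch. The function `slow_square` and the counter `calls` are hypothetical names chosen for this illustration; they do not come from the papers cited on the slides.

```python
from functools import lru_cache

calls = 0  # counts how many times the function body actually runs


@lru_cache(maxsize=None)
def slow_square(n: int) -> int:
    """Stand-in for an expensive computation (hypothetical example)."""
    global calls
    calls += 1
    return n * n


first = slow_square(12)   # computed: the body runs once
second = slow_square(12)  # same input: the stored result is returned
```

A stochastic memoizer differs in that what is stored defines a probability distribution, so repeated queries with the same input need not return the same value.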
SM-model and trie
Model:
The prefix trie: O(T^2) restaurants for a sequence of length T.
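For reference, the hierarchical model of the Sequence Memoizer, as given in Wood et al. (2009), can be written as follows. The notation is an assumption made here: $U$ is the uniform distribution over the alphabet $\Sigma$, $u$ is a context (a string of past symbols), $\sigma(u)$ is $u$ with its earliest symbol removed, and $d_{|u|}$ is the discount at depth $|u|$.

```latex
\begin{align*}
G_{[]} \mid d_0, U &\sim \mathrm{PY}(d_0, 0, U) \\
G_u \mid d_{|u|}, G_{\sigma(u)} &\sim \mathrm{PY}(d_{|u|}, 0, G_{\sigma(u)})
  \qquad \forall u \in \Sigma^{+} \\
x_i \mid x_{1:i-1} = u &\sim G_u \qquad i = 1, \dots, T
\end{align*}
```

Each $G_u$ corresponds to one restaurant, and every distinct context that occurs in the sequence gets its own node in the trie.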
SM-the NSP (1)
The Normalized Stable Process (Perman, 1990) is a special case of the Pitman-Yor process:
Concentration parameter c = 0: the Pitman-Yor process becomes a Normalized Stable Process.
Discount parameter d = 0: the Pitman-Yor process becomes a Dirichlet Process.
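The role of the two parameters can be seen in the standard Chinese restaurant seating rule for a Pitman-Yor process. The sketch below is illustrative only: the function name `py_predictive` and the example table counts are assumptions made here, with `d` the discount and `c` the concentration.

```python
from typing import List


def py_predictive(table_counts: List[int], d: float, c: float) -> List[float]:
    """Seating probabilities for the next customer in one restaurant.

    Entries 0..K-1 are the K existing tables; the last entry is a new table.
    Assumes 0 <= d < 1 and c > -d.
    """
    n = sum(table_counts)
    k = len(table_counts)
    if n == 0:
        return [1.0]  # the first customer always opens a new table
    denom = c + n
    probs = [(n_k - d) / denom for n_k in table_counts]  # join table k
    probs.append((c + d * k) / denom)                    # open a new table
    return probs


# Normalized stable case (c = 0): new tables are driven by the discount only.
nsp = py_predictive([3, 1], d=0.5, c=0.0)
# Dirichlet process case (d = 0): new tables are driven by the concentration only.
dp = py_predictive([3, 1], d=0.0, c=1.0)
```

With `c = 0` the probability of a new table is `d * K / n`, and with `d = 0` it is `c / (c + n)`, which is the familiar Dirichlet process rule.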
SM-the NSP (2)
Collapse the middle restaurants:
Theorem: If: Then:
Prefix tree: O(T) restaurants (Weiner, 1973; Ukkonen, 1995)
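For reference, the collapsing result used by the Sequence Memoizer, as stated in Wood et al. (2009), is the following. The symbols $G_0, G_1, G_2$ and $d_1, d_2$ are notation assumed here, with $\mathrm{PY}(d, c, G)$ denoting a Pitman-Yor process with discount $d$, concentration $c$ and base distribution $G$.

```latex
\[
G_1 \mid G_0 \sim \mathrm{PY}(d_1, 0, G_0), \quad
G_2 \mid G_1 \sim \mathrm{PY}(d_2, 0, G_1)
\;\;\Longrightarrow\;\;
G_2 \mid G_0 \sim \mathrm{PY}(d_1 d_2, 0, G_0)
\quad \text{(with } G_1 \text{ marginalized out).}
\]
```

Because the concentration is zero at every level, a chain of non-branching restaurants can be replaced by a single restaurant whose discount is the product of the discounts along the chain, which is what reduces the trie to a prefix tree.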
SM-linear space inference
Forgetting
Motivation: achieve constant-memory inference on top of the SM.
Method: forget, i.e. delete, restaurants.
Restaurants are the basic memory units in the context tree.
How to delete? Two deletion schemes: random deletion and greedy deletion.
Deletion schemes
Random deletion: delete one leaf restaurant chosen uniformly at random.
Greedy deletion: delete the leaf restaurant whose removal least negatively impacts the estimated likelihood of the observed sequence.
(Figure: leaf restaurants of the context tree.)
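The two schemes can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: leaves are held in a plain dictionary, and `loss_if_deleted` is a hypothetical callback standing in for the estimated drop in likelihood caused by removing a leaf. In the usage example, the customer count of a leaf is used as a crude proxy for that loss, which is an assumption made only for the demonstration.

```python
import random
from typing import Callable, Dict, Hashable


def random_deletion(leaves: Dict[Hashable, object], rng: random.Random) -> Hashable:
    """Remove one leaf chosen uniformly at random and return its key."""
    key = rng.choice(list(leaves))
    del leaves[key]
    return key


def greedy_deletion(
    leaves: Dict[Hashable, object],
    loss_if_deleted: Callable[[Hashable], float],
) -> Hashable:
    """Remove the leaf with the smallest estimated loss and return its key."""
    key = min(leaves, key=loss_if_deleted)
    del leaves[key]
    return key


# Hypothetical leaves: context string -> number of customers seated there.
greedy_leaves = {"ab": 5, "ba": 1, "bb": 3}
removed_greedy = greedy_deletion(greedy_leaves, lambda k: greedy_leaves[k])

random_leaves = {"ab": 5, "ba": 1, "bb": 3}
removed_random = random_deletion(random_leaves, random.Random(0))
```

Calling either function whenever the number of restaurants exceeds a fixed budget keeps memory bounded. Random deletion needs no bookkeeping, while greedy deletion needs a loss estimate for every leaf.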
The SMC algorithm
The dependent HPY
But wait: what do we get after a deletion followed by an addition? Will the processes be independent?
No, since the seating arrangement in the parent restaurant has been changed.
The experiment results