
1 Smoothing 10/27/2017

2 References Not Mentioned (much)
Capture-Recapture
Kneser-Ney
Additive Smoothing
Laplace Smoothing
Jeffreys
Dirichlet Prior
What's wrong with adding one?
10/27/2017

3 10/27/2017

4 Good-Turing Assumptions: Modest, but…
10/27/2017

5 10/27/2017

6 Jurafsky (Lecture 4-7) https://www.youtube.com/watch
10/27/2017

7 Tweets
Don't forget about smoothing just because your language model has gone neural. Sparse data is real. Always has been, always will be. 10/27/2017

8 Good-Turing Smoothing & Large Counts
Smoothing may illustrate some of the limitations of the statistical approach to natural language processing. The Good-Turing method … suffers from noise as r gets larger… While smoothing produced the desired model and is substantiated by rigorous math, the process also gives the impression that the mathematics was "bent" to create a desired model. While many statistical approaches, such as clustering and smoothing, have given far better performance on many natural language processing tasks, it raises the question of whether other approaches can enhance the currently dominant statistical approach in natural language processing. 10/27/2017
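To make the "noise as r gets larger" point concrete, here is a minimal sketch of the basic Good-Turing adjusted count r* = (r+1) * N_{r+1} / N_r; the word counts below are made up purely for illustration.

```python
# Minimal sketch of the basic Good-Turing adjusted count r* = (r+1) * N_{r+1} / N_r.
# The word counts below are hypothetical, chosen to show why large r is noisy.
from collections import Counter

# word -> observed count (toy data)
counts = {"the": 120, "of": 87, "smoothing": 4, "model": 3, "turing": 3,
          "good": 2, "sparse": 2, "neural": 1, "corpus": 1, "tweet": 1}

# N_r: how many word types were seen exactly r times
freq_of_freq = Counter(counts.values())

def good_turing_adjusted(r):
    """Basic (unsmoothed) Good-Turing adjusted count for an observed count r."""
    n_r = freq_of_freq.get(r, 0)
    n_r_plus_1 = freq_of_freq.get(r + 1, 0)
    if n_r == 0:
        return None  # count r was never observed
    return (r + 1) * n_r_plus_1 / n_r

for r in sorted(freq_of_freq):
    print(r, good_turing_adjusted(r))
# In real corpora N_r stays well populated for small r, but as r grows the
# N_{r+1} cells become empty or tiny, so r* jumps around or collapses to 0
# (already at r = 4 in this tiny sample) -- the noise that Simple Good-Turing
# repairs by smoothing the N_r curve before applying the formula.
```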

9 References 10/27/2017

10 10/27/2017

11 10/27/2017

12 10/27/2017

13 Is Smoothing Obsolete? (1 of 2)
Is it true that "there is no data like more data" even when that data is sparse? Were Allison et al. correct when they predicted that improved data-processing capabilities would make smoothing techniques less necessary? If the entire data set could be used to calculate population frequencies, MLE would return as the popular choice for frequency estimation, since it is a straightforward and precise measure. But as long as we work with corpora too large and sparse not to sample, smoothing remains very important. 10/27/2017
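As a toy illustration of the contrast above (all words and counts hypothetical): MLE relative frequencies are exact when the whole population can be counted, but on a sparse sample they assign probability zero to anything that did not happen to be sampled.

```python
# Hypothetical illustration: MLE relative frequencies are exact when the whole
# population can be counted, but on a sparse sample they give probability 0 to
# anything unseen -- the gap that smoothing fills.
from collections import Counter

population = ["the"] * 900 + ["smoothing"] * 90 + ["kneser"] * 10
sample = population[::100]            # a sparse 1% systematic sample (toy)

def mle(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(mle(population).get("kneser", 0.0))  # 0.01, the true population frequency
print(mle(sample).get("kneser", 0.0))      # 0.0 -- "kneser" never makes it into the sample
```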

14 Is Smoothing Obsolete? (2 of 2)
Good-Turing smoothing was in some sense a general solution to the data sparseness problem. Improvements on the Good-Turing solution include Simple Good-Turing (Gale and Sampson, 1995) and Katz (Katz, 1987). Many survey papers, such as Chen and Goodman's, compare smoothing methods or try to combine various smoothing methods to produce a more accurate result. Clearly smoothing techniques have been important to NLP, information retrieval, and biology. Smoothing will only become obsolete if our ability to process every word in a corpus (or every web page in a search result, as Gao et al. were working with) scales with the growth of corpora. Thus the assertion that smoothing might become less and less necessary does not hold. 10/27/2017
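Since Simple Good-Turing (Gale and Sampson, 1995) is named above, here is a simplified sketch of its central idea: fit the noisy frequency-of-frequencies N_r with a least-squares line in log-log space and use the fitted curve in the Good-Turing formula. The full method also averages N_r over gaps in r and switches between raw and smoothed estimates; those steps are omitted here, and the N_r table below is made up.

```python
# Simplified sketch of Simple Good-Turing (Gale & Sampson, 1995): smooth the
# noisy frequency-of-frequencies N_r with a line in log-log space, then use
# the fitted curve in the Good-Turing formula. (Gap averaging and the
# raw-vs-smoothed switch rule of the full method are omitted.)
import math

N = {1: 120, 2: 40, 3: 24, 4: 13, 5: 15, 7: 5, 10: 2, 12: 1}  # r -> N_r (toy)

xs = [math.log(r) for r in N]
ys = [math.log(nr) for nr in N.values()]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

def smoothed_N(r):
    """Fitted frequency-of-frequencies: exp(intercept) * r ** slope."""
    return math.exp(intercept + slope * math.log(r))

def sgt_adjusted_count(r):
    """Good-Turing r* computed from the smoothed N_r curve."""
    return (r + 1) * smoothed_N(r + 1) / smoothed_N(r)

for r in (1, 5, 10, 12):
    print(r, round(sgt_adjusted_count(r), 2))  # no zero or erratic values at large r
```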

15 “It never pays to think until you’ve run out of data” – Eric Brill
Moore's Law Constant: Data Collection Rates → Improvement Rates
Banko & Brill: Mitigating the Paucity-of-Data Problem (HLT 2001)
No consistently best learner
There is no data like more data
Quoted out of context
Fire everybody and spend the money on data
6/27/2014 Fillmore Workshop

16 There is no data like more data
Counts are growing 1000x per decade (same as disks)
Rising Tide of Data Lifts All Boats
8/24/2017

17 Neural Nets Have More Parameters
More parameters → more powerful
But they also require:
More data
More smoothing
Is there implicit smoothing with standard training mechanisms?
Dropout
Cross-validation
As well as explicit smoothing:
f(x) weighting function in GloVe (c^0.75)
10/27/2017
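For reference, a sketch of the GloVe weighting function mentioned above: the GloVe paper defines f(x) = (x/x_max)^α for x < x_max and 1 otherwise, with α = 0.75 and x_max = 100 reported as the defaults.

```python
# Sketch of the GloVe co-occurrence weighting function f(x): it damps the loss
# contribution of rare co-occurrence counts and caps the influence of very
# frequent ones (alpha = 0.75 and x_max = 100 are the defaults reported in the
# GloVe paper).
def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weight applied to the squared-error term for a co-occurrence count x."""
    return (x / x_max) ** alpha if x < x_max else 1.0

for count in (1, 10, 100, 10_000):
    print(count, round(glove_weight(count), 4))
# 1 -> 0.0316, 10 -> 0.1778, 100 -> 1.0, 10000 -> 1.0
```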

18 10/27/2017

19 10/27/2017

20 Smoothing & Neural Nets
Despite the recent success of recurrent neural nets (RNNs), [which] do not rely on word counts, smoothing remains an important problem in the field of language modeling. … RNNs at first glance appear to sidestep the issue of smoothing altogether. However…
low-frequency words
<UNK> tags
weighting functions used in word embeddings
In this paper we … [argue] that smoothing remains a problem even as the community shifts [to embeddings & neural networks]
10/27/2017
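A minimal sketch of the <UNK> step alluded to above (the threshold and example tokens are arbitrary): words whose training count falls below a cutoff are replaced with a single <UNK> token, pooling the probability mass for rare and unseen words.

```python
# Minimal sketch of <UNK> preprocessing: map words whose training count falls
# below a threshold to a single <UNK> token, so the neural LM pools (implicitly
# smooths) the probability mass for rare and unseen words.
from collections import Counter

def unkify(tokens, min_count=2, unk="<UNK>"):
    counts = Counter(tokens)
    return [t if counts[t] >= min_count else unk for t in tokens]

train = "the model saw the rare word zyzzyva once".split()
print(unkify(train))
# ['the', '<UNK>', '<UNK>', 'the', '<UNK>', '<UNK>', '<UNK>', '<UNK>']
```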

21 Smoothing & Embeddings
The advances in smoothing, and their application to estimating low-frequency n-grams in traditional language models, do not seem to have passed down to word embedding models, which are commonly believed to outperform n-gram models on most NLP tasks. It argues that, empirically, all context counts need to be raised to the power 0.75: a 'magical' fixed hyperparameter that achieves the best accuracy no matter which task we are performing or what training data we are drawing from. 10/27/2017
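The 0.75 exponent referred to here appears, for example, in word2vec's negative-sampling distribution. A minimal sketch (with hypothetical counts) of what raising context counts to the 0.75 power before normalizing does:

```python
# Sketch of the count ** 0.75 smoothing the slide refers to, as used e.g. in
# word2vec's negative-sampling distribution: raising counts to the 0.75 power
# before normalizing shifts probability mass from very frequent words to rare ones.
def smoothed_distribution(counts, power=0.75):
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: weight / total for w, weight in weights.items()}

counts = {"the": 1000, "smoothing": 10, "zipf": 1}     # hypothetical counts
print(smoothed_distribution(counts, power=1.0))         # plain relative frequencies
print(smoothed_distribution(counts, power=0.75))        # rare words gain probability mass
```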

22 Although the embedding models penalize the overestimation of small counts,
they improperly down-weight the large counts as well. An alternative method, for example, would be to apply discounting based on the Good-Turing estimate of the probability of unseen or low-frequency words. There are many other smoothing methods, and revisiting such improvements could have a substantial impact on performance. 10/27/2017
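As a purely hypothetical sketch of the alternative suggested above: replace the fixed c^0.75 transform with Good-Turing discounted counts c* as the weights. The toy counts and the mapping from discounted counts to weights are assumptions, not an established recipe.

```python
# Purely hypothetical sketch: weight contexts by Good-Turing discounted counts c*
# instead of the fixed c ** 0.75 transform. The counts and the count-to-weight
# mapping are illustrative assumptions only.
from collections import Counter

context_counts = {"the": 5, "of": 4, "smoothing": 2, "sparse": 2, "unk": 1, "rare": 1}
N = Counter(context_counts.values())          # frequency of frequencies

def gt_discounted(c):
    """c* = (c + 1) * N_{c+1} / N_c, falling back to c where the table is empty."""
    if N.get(c) and N.get(c + 1):
        return (c + 1) * N[c + 1] / N[c]
    return float(c)

weights = {w: gt_discounted(c) for w, c in context_counts.items()}
total = sum(weights.values())
print({w: round(v / total, 3) for w, v in weights.items()})
```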

