Chapter 8: Extensions and Applications
Learning from Massive Datasets
Can the data set be held in main memory? Some learning schemes are incremental (the Naïve Bayes method, for example); some are not. The time it takes to build the model also matters: it should be linear, or near linear, in the size of the data. What can be done when the data set is too large? One option is to train on a small subset of the data, accepting a law of diminishing returns. Some schemes do better with more data, but there is also a danger of overfitting. Parallelization is another way: develop parallelized versions of learning schemes.
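The sketch below illustrates the incremental idea: a Naïve Bayes model updated one chunk at a time, so the full data set never has to sit in memory at once. It assumes scikit-learn is available; the synthetic data, chunk size, and helper function chunks_of are illustrative only.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def chunks_of(X, y, chunk_size=10_000):
    """Yield successive (features, labels) chunks of an array pair."""
    for start in range(0, len(y), chunk_size):
        yield X[start:start + chunk_size], y[start:start + chunk_size]

# Synthetic stand-in for a data set too large to process in one pass.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = GaussianNB()
classes = np.unique(y)  # partial_fit must see all class labels up front
for X_chunk, y_chunk in chunks_of(X, y):
    model.partial_fit(X_chunk, y_chunk, classes=classes)

print("training accuracy:", model.score(X, y))

Each call to partial_fit updates the sufficient statistics (per-class counts, means, and variances), which is why Naïve Bayes lends itself naturally to this one-chunk-at-a-time regime; training time stays linear in the number of instances.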
Incorporating Domain Knowledge
Metadata, that is, data about data, can express domain knowledge in semantic, causal, and functional forms.

Text and Web Mining
Learning methods also apply to text and web data. Some applications are adversarial situations, where the source of the data actively tries to defeat the learner; junk email filtering is an example.
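As a minimal sketch of junk email filtering as text classification, the following assumes scikit-learn; the tiny message list and labels are made up purely for illustration.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "win a free prize now", "cheap meds limited offer",
    "meeting agenda for monday", "project report attached",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words features feeding a Naive Bayes text classifier.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

print(spam_filter.predict(["free offer, click now", "see attached agenda"]))

Because the situation is adversarial, spammers change their wording to evade whatever the filter has learned, so in practice such a model must be retrained as the distribution of incoming mail shifts.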