Critique of the dirty dozen: 12 years of KDD
Daryl Pregibon
AT&T Shannon Laboratory
daryl@research.att.com
KDD2001, San Francisco, CA
Summary
  There remains tremendous opportunity for data mining on the horizon.
  To take full advantage of these opportunities, some changes are necessary.
The KDD Community (who we are): AI, DB, Stats/ML
KDD Activities (what we do): Theory, Methods, Applications
We do too much of e-verything
  e-commerce, e-business, e-tailing, e-this, e-that
  e-nough already!
We focus too much on predictive accuracy
  Data mining should be about storytelling, i.e., understanding and interpretability.
  Why can’t we strive to have both: highly accurate predictions and interpretability?
We don’t do enough of...
  Foundations/fundamentals
    Is there a Shannon-like theory for capacity in a data mining channel?
    We have many ways to quantify the amount of data in a DB (#rows, #tables, #bytes), so why can’t we do the same for the amount of information in a DB?
  Scientific applications
    Genomic DBs change the dynamic --- will the KDD community respond?
  Automation
    We already have more data than anyone could ever look at --- where are the data mining agents? The classibots? The regressibots?
  Knowledge Discovery in Data as a process
    More than just tactics!
  Education
    How do we train the data mining generation?
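To make the foundations question above concrete, here is a minimal sketch of one crude way to attach an "amount of information" to a table: the empirical Shannon entropy of each column, in bits. This is an illustration only, not a measure proposed in the talk. The function names column_entropy and table_entropy are hypothetical, and the measure ignores dependencies between columns, which is part of why the question remains open.

```python
import math
from collections import Counter


def column_entropy(values):
    """Empirical Shannon entropy (in bits) of one column's values."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    # H = sum over distinct values of p * log2(1 / p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def table_entropy(rows):
    """Per-column entropy for a list of equal-length row tuples."""
    columns = list(zip(*rows))
    return [column_entropy(col) for col in columns]


# Toy table (made-up data): the first column alternates between two values,
# the second column is constant.
rows = [("a", 0), ("b", 0), ("a", 0), ("b", 0)]
bits = table_entropy(rows)
print(bits)
```

Both columns have the same number of rows and bytes, yet the constant column carries no information under this measure while the alternating one carries one bit per row. That is the gap between "amount of data" and "amount of information" that the slide points at.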