Published by Leah Williamson. Modified over 11 years ago.
1
Data Mining: Potentials and Challenges
Rakesh Agrawal & Jeff Ullman
2
Observations
Transfer of data mining research into deployed applications and commercial products
– Greater success in vertical applications
– Horizontal tools, for example:
  SAS Enterprise Miner: sophisticated statisticians segment
  DB2 Intelligent Miner: database applications requiring mining
Emergence of the application of data mining in non-conventional domains
– Combination of structured and unstructured data
New challenges due to security/privacy concerns
DARPA initiative to fund data mining research
3
Identifying Social Links Using Association Rules
Input: Crawl of about 1 million pages
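A minimal sketch of how association rules might surface social links. The slide gives only the input (a crawl), so the setup here is an assumption: each page is treated as a transaction holding the names it mentions, and a rule "a implies b" is kept when the pair co-occurs often enough (support) and b appears on a large enough share of a's pages (confidence). The function name `pair_rules`, the thresholds, and the sample pages are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(pages, min_support=2, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence): 'lhs on a page implies rhs'."""
    item_count = Counter()   # number of pages mentioning each name
    pair_count = Counter()   # number of pages mentioning both names of a pair
    for names in pages:
        unique = sorted(set(names))
        item_count.update(unique)
        pair_count.update(combinations(unique, 2))

    rules = []
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical pages, each reduced to the names found on it.
pages = [
    ["alice", "bob"],
    ["alice", "bob", "carol"],
    ["alice", "dave"],
    ["bob", "alice"],
    ["carol"],
]
rules = pair_rules(pages)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support}, confidence={confidence:.2f}")
```

Only pairs are considered here; a real crawl of this size would need a frequent-itemset algorithm with pruning rather than counting every pair.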
4
Website Profiling using Classification
Input: Example pages for each category during training
5
Discovering Trends Using Sequential Patterns & Shape Queries
Input: i) patent database, ii) shape of interest
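One way to picture a shape query, as a rough sketch: reduce each time series (for instance, yearly counts of a phrase in the patent database) to a sequence of up/down/flat moves, then look for the shape of interest as a contiguous run. The slide does not specify how shapes are encoded, so the three-symbol alphabet, the `tolerance` parameter, and the sample counts are assumptions made for illustration.

```python
def to_shape(series, tolerance=0):
    """Describe each step of the series as 'up', 'down', or 'flat'."""
    symbols = []
    for prev, cur in zip(series, series[1:]):
        if cur - prev > tolerance:
            symbols.append("up")
        elif prev - cur > tolerance:
            symbols.append("down")
        else:
            symbols.append("flat")
    return symbols

def matches_shape(series, shape, tolerance=0):
    """True if the shape occurs as a contiguous run of steps in the series."""
    symbols = to_shape(series, tolerance)
    n = len(shape)
    return any(symbols[i:i + n] == list(shape)
               for i in range(len(symbols) - n + 1))

# Hypothetical yearly counts per phrase.
phrase_counts = {
    "phrase_a": [2, 5, 9, 4, 1],
    "phrase_b": [7, 7, 6, 6, 5],
}
shape_of_interest = ["up", "up", "down"]
hits = [phrase for phrase, counts in phrase_counts.items()
        if matches_shape(counts, shape_of_interest)]
print(hits)
```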
6
Discovering Micro-communities
Frequently co-cited pages are related.
Pages with large bibliographic overlap are related.
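The two relatedness signals can be sketched on a small link graph. Co-citation counts how many pages link to both members of a pair; bibliographic overlap compares the outgoing links of two pages. The slide does not name an overlap measure, so the Jaccard ratio used here is an assumption, and the graph and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cocitation_counts(links):
    """links maps a citing page to the set of pages it links to.

    Returns how many citing pages link to both members of each pair.
    """
    counts = defaultdict(int)
    for cited in links.values():
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return dict(counts)

def bibliographic_overlap(links, a, b):
    """Jaccard overlap of the outgoing links of pages a and b (assumed measure)."""
    refs_a = links.get(a, set())
    refs_b = links.get(b, set())
    union = refs_a | refs_b
    return len(refs_a & refs_b) / len(union) if union else 0.0

# Hypothetical link graph: p1..p3 are citing pages, x..z are cited pages.
links = {
    "p1": {"x", "y"},
    "p2": {"x", "y", "z"},
    "p3": {"y", "z"},
}
print(cocitation_counts(links))
print(bibliographic_overlap(links, "p1", "p2"))
```

Pairs that score highly on either signal are candidates for the same micro-community; thresholds would have to be tuned on real data.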
7
New Challenges
Privacy-preserving data mining
Data mining over compartmentalized databases
8
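Inducing Classifiers over Privacy Preserved Numeric Data
[Figure: original records 30 | 25K | … (labeled Alice's age, Alice's salary) and 50 | 40K | … (labeled John's age) each pass through a Randomizer, yielding 65 | 50K | … and 35 | 60K | …. For example, 30 becomes 65 (30+35). The randomized values feed Reconstruct Age Distribution and Reconstruct Salary Distribution, whose outputs go to the Decision Tree Algorithm, which produces the Model.]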
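A minimal sketch of the randomize-then-reconstruct idea in the figure. Each value is perturbed by adding noise, as in "30 becomes 65 (30+35)", and the miner later estimates the distribution of the original values from the perturbed ones. The slide does not say which noise distribution or reconstruction procedure is used; uniform noise and an iterative Bayes-style update over fixed bins are assumptions here, and all names and parameters are hypothetical.

```python
import random

def randomize(values, noise_width, rng):
    """Add independent uniform noise in [-noise_width, +noise_width] to each value."""
    return [v + rng.uniform(-noise_width, noise_width) for v in values]

def reconstruct(randomized, bin_centers, noise_width, iterations=50):
    """Estimate the distribution of original values over bin_centers.

    Uses only the randomized values and the known noise distribution.
    """
    def noise_density(d):
        return 1.0 / (2 * noise_width) if abs(d) <= noise_width else 0.0

    k = len(bin_centers)
    estimate = [1.0 / k] * k  # start from a uniform guess
    for _ in range(iterations):
        updated = [0.0] * k
        for w in randomized:
            # Posterior weight of each bin given the observed randomized value.
            weights = [noise_density(w - c) * p
                       for c, p in zip(bin_centers, estimate)]
            total = sum(weights)
            if total == 0.0:
                continue  # no bin can explain this value; skip it
            for j in range(k):
                updated[j] += weights[j] / total
        norm = sum(updated)
        estimate = [u / norm for u in updated]
    return estimate

# Hypothetical data: 700 people aged 30 and 300 aged 60.
rng = random.Random(0)
ages = [30] * 700 + [60] * 300
randomized = randomize(ages, 20, rng)
bins = list(range(20, 90, 10))
estimate = reconstruct(randomized, bins, 20)
for center, share in zip(bins, estimate):
    print(center, round(share, 3))
```

Only the reconstructed distributions, not the individual original values, would then be handed to the decision tree algorithm.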
9
Other recent work
Cryptographic approach to privacy-preserving data mining
– Lindell & Pinkas, Crypto 2000
Privacy-preserving discovery of association rules
– Vaidya & Clifton, KDD 2002
– Evfimievski et al., KDD 2002
– Rizvi & Haritsa, VLDB 2002
10
Computation over Compartmentalized Databases
11
Some Hard Problems
Past may be a poor predictor of future
– Abrupt changes
– Wrong training examples
Actionable patterns (principled use of domain knowledge?)
Over-fitting vs. not missing the rare nuggets
Richer patterns
Simultaneous mining over multiple data types
When to use which algorithm?
Automatic, data-dependent selection of algorithm parameters
12
Discussion
Should data mining be viewed as rich querying and deeply integrated with database systems?
– Most current work makes little use of database functionality
Should analytics be an integral concern of database systems?
Issues in data mining over heterogeneous data repositories (relationship to the heterogeneous systems discussion)
13
Summary
Data mining has shown promise but needs much further research.
"We stand on the brink of great new answers, but even more, of great new questions" -- Matt Ridley