1
Data Mining: Next 10 Years
Rakesh Agrawal
IBM Almaden Research Center
Position from KDD-2001 Revisited
2
Foundations
What is data mining?
  A collection of techniques?
  A set of composable operations (à la Relational Algebra)?
Hints:
  Inductive Databases (Mannila)
  Relational Calculus + Statistical Quantifiers (Imielinski)
Most Impact Research Opportunity?
No tangible progress
3
Privacy Implications
Can we build accurate data models while preserving privacy of individual records?
Hints:
  Randomization (Agrawal & Srikant): Replace x by x+y, where y is drawn from a known distribution
  Anonymization (Crypto literature)
“Must Do” Problem
Interesting cryptographic and randomization approaches
Lots of research opportunities
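The randomization hint (replace x by x+y) can be illustrated with a minimal sketch. It assumes zero-mean Gaussian noise with a publicly known standard deviation, and it recovers only the mean and variance of the original values by subtracting the known noise variance. This is a simple moment-based estimate chosen for brevity, not the distribution reconstruction procedure of Agrawal & Srikant. The names `randomize` and `estimate_moments`, the synthetic `ages` data, and the noise level are all hypothetical.

```python
import random
import statistics


def randomize(values, noise_sd, rng):
    """Replace each x by x + y, with y drawn from a known N(0, noise_sd^2)."""
    return [x + rng.gauss(0.0, noise_sd) for x in values]


def estimate_moments(perturbed, noise_sd):
    """Estimate mean and variance of the original values from perturbed ones.

    The noise has zero mean and is independent of the data, so the mean is
    unchanged in expectation and the variances add: Var(x+y) = Var(x) + Var(y).
    """
    mean = statistics.fmean(perturbed)
    variance = max(statistics.pvariance(perturbed) - noise_sd ** 2, 0.0)
    return mean, variance


rng = random.Random(0)                                # fixed seed for repeatability
ages = [rng.uniform(20, 60) for _ in range(20000)]    # synthetic "private" records
noise_sd = 15.0                                       # assumed public noise level

perturbed = randomize(ages, noise_sd, rng)            # what the data miner sees
est_mean, est_var = estimate_moments(perturbed, noise_sd)

true_mean = statistics.fmean(ages)
true_var = statistics.pvariance(ages)
print(f"mean: true={true_mean:.2f} estimated={est_mean:.2f}")
print(f"variance: true={true_var:.2f} estimated={est_var:.2f}")
```

Individual perturbed records say little about the underlying value, while aggregate statistics remain recoverable because the noise distribution is known.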
4
Web Mining: Beyond Click Streams
Mining knowledge bases from the web
  Completeness
  Accuracy
  Malicious Spam
Hints: Brin’s Book experiment etc. etc.
Help make semantic web real
Some progress (e.g. WWW-03 paper)
Alternative: Mass Collaboration (e.g. ISWC-03 paper)
5
Web Mining: Beyond hrefs
What other social behaviors exist on the web and how to make use of them?
Hints: Viral marketing paper in this conf etc. etc.
Some ideas (e.g. WWW-03 paper on opinion mining from newsgroups)
Importance too early to judge
6
Actionable Patterns
Principled use of domain knowledge for
  discarding uninteresting patterns
  performance
Hints: Papers in the recent KDD conferences
Extremely important
Safeguard against false positives, particularly when mining rare events
Long way to go
7
Simultaneous mining over multiple data types
Not just
  Relational tables
  Time series
  Textual documents
But patterns across all of them
Add web, audio and video data types to the list
8
Some more problems
Online, incremental algorithms over data streams
When to retire the past data
Long sequential patterns
Discovering richer patterns (trees and DAGs)
Automatic, data-dependent selection of algorithm parameters
Still interesting. Some progress (particularly, mining stream data), but much more remains to be done.
9
What not to work on?
The field is too young! Let every flower bloom!!!
Too early to say we don’t need new algorithms
Impressive results of the PVSM algorithm
Emphasize evaluation and benchmarks
Interesting research issues
No change in position
10
Applications most likely to benefit from data mining
Web applications (I think)
Bioinformatics (I hope!)
Bioinformatics – upgrading to “I think”
11
Inhibitors
Insufficient skill base (Education)
Usability
No change in position
12
Grand Challenge
Find
  What’s there
  What has changed
Across sovereign data repositories
13
The true delight is in the finding out, rather than in the knowing.
Isaac Asimov
Exciting prospect for the field