Data Mining: Next 10 Years Rakesh Agrawal IBM Almaden Research Center Position from KDD-2001 Revisited.

1 Data Mining: Next 10 Years Rakesh Agrawal IBM Almaden Research Center Position from KDD-2001 Revisited

2 Foundations: What is data mining? A collection of techniques? A set of composable operations (a la relational algebra)? Hints: inductive databases (Mannila); relational calculus + statistical quantifiers (Imielinski). Most impactful research opportunity? No tangible progress.

3 Privacy Implications: Can we build accurate data models while preserving the privacy of individual records? Hints: randomization (Agrawal & Srikant): replace x by x+y, where y is drawn from a known distribution; anonymization (crypto literature). A “must do” problem. Interesting cryptographic and randomization approaches. Lots of research opportunities.
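
The randomization hint on this slide (replace x by x+y, with y drawn from a known distribution) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the slides: the uniform noise, the function names `randomize` and `estimate_mean_and_variance`, and the synthetic "ages" data. The sketch only recovers the mean and variance of the original values from the perturbed ones; it is not the distribution-reconstruction procedure of the cited work.

```python
import random
import statistics


def randomize(values, half_width, rng):
    """Replace each x by x + y, with y ~ Uniform(-half_width, half_width).

    The noise distribution is public; the individual noise draws are not.
    """
    return [x + rng.uniform(-half_width, half_width) for x in values]


def estimate_mean_and_variance(perturbed, half_width):
    """Estimate the mean and variance of the original values.

    The noise has zero mean, so the perturbed mean estimates the true mean.
    The noise is independent of the data, so variances add and the known
    noise variance can be subtracted.
    """
    noise_variance = half_width ** 2 / 3  # variance of Uniform(-a, a)
    mean = statistics.fmean(perturbed)
    variance = max(statistics.pvariance(perturbed) - noise_variance, 0.0)
    return mean, variance


# Hypothetical data: 20,000 synthetic "ages" (illustration only).
rng = random.Random(0)
ages = [rng.gauss(40, 10) for _ in range(20000)]

released = randomize(ages, half_width=30, rng=rng)
est_mean, est_var = estimate_mean_and_variance(released, half_width=30)

print(f"true mean {statistics.fmean(ages):.2f}, estimated {est_mean:.2f}")
print(f"true variance {statistics.pvariance(ages):.2f}, estimated {est_var:.2f}")
```

Each released value can be off by up to 30 in either direction, so a single record says little about the individual, yet the aggregate statistics remain recoverable because the noise distribution is known.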

4 Web Mining: Beyond Click Streams. Mining knowledge bases from the web: completeness; accuracy; malicious spam. Hints: Brin’s book experiment, etc. Help make the semantic web real. Some progress (e.g. WWW-03 paper). Alternative: mass collaboration (e.g. ISWC-03 paper).

5 Web Mining: Beyond hrefs. What other social behaviors exist on the web, and how can we make use of them? Hints: viral marketing paper in this conference, etc. Some ideas (e.g. WWW-03 paper on opinion mining from newsgroups). Importance too early to judge.

6 Actionable Patterns: Principled use of domain knowledge for discarding uninteresting patterns and for performance. Hints: papers in the recent KDD conferences. Extremely important. Safeguard against false positives, particularly when mining rare events. Long way to go.

7 Simultaneous mining over multiple data types: not just relational tables, time series, and textual documents, but patterns across all of them. Add web, audio, and video data types to the list.

8 Some more problems: online, incremental algorithms over data streams; when to retire the past data; long sequential patterns; discovering richer patterns (trees and DAGs); automatic, data-dependent selection of algorithm parameters. Still interesting. Some progress (particularly in mining stream data), but much more remains to be done.

9 What not to work on? The field is too young! Let every flower bloom!!! Too early to say we don’t need new algorithms. Impressive results of the PVSM algorithm. Emphasize evaluation and benchmarks. Interesting research issues. No change in position.

10 Applications most likely to benefit from data mining: web applications (I think); bioinformatics (I hope!). Bioinformatics: upgrading to “I think”.

11 Inhibitors: insufficient skill base (education); usability. No change in position.

12 Grand Challenge: Find what’s there and what has changed, across sovereign data repositories.

13 “The true delight is in the finding out, rather than in the knowing.” (Isaac Asimov) Exciting prospect for the field.

