Published by Melina Riley. Modified over 9 years ago.
1
Mainlining Data Mining
Jim Gray, Microsoft
Panel talk at ICDE2000, San Diego, 2 Mar 2000
2
Is data mining still a niche technology?
97,363 items on Northern Light re “data mining”
9,075,288 items re “data base” or “database”
Is 100,000 items a niche? (OR: 14K, XML: 250K)
Today: data mining tools for experts (statisticians).
(Decision Trees, Clusters, K-means, Neural nets…)
High tech and High Touch, aka: consulting and license fees
And the vendors like it that way.
Claim that you MUST understand the technology to use it.
3
But.. The Petabytes are Coming!!
We will be/are drowning in data/email/web..
Abstraction & categorization are key technologies
But,
–They have to work.
–They have to be trivial to learn.
Successful Ubiquitous data mining (clustering/classifiers…)
–Mail Filters/Classifiers
–Resume readers
–Shopping recommendations, Community finders
–Web search engines
4
Key technical/research issues for transition to the mainstream?
PROCESS PROBLEMS:
Getting data into tool is hell
Scrubbing data is hell
Then comes the easy part: mining
Then comes the really hard part: visualization and understanding
Most of us:
–Can’t understand neural nets (that’s bad).
–Can’t understand statistics (that’s a fact).
5
Key technical/research issues for transition to the mainstream?
Opportunities:
It’s not just numbers
Text mining
Time series
Domain specific
–Web logs
–Protein patterns
–Spatial (e.g. geology, astronomy)
–Image
6
New opportunities for KDM?
Make data capture/scrub/import trivial
Provide intuitive manipulation interfaces
Provide simpler analysis concepts
–support/confidence concept
–precision/recall
–ranking
–pivot & rollup & cube
Provide interactive visual data explorer.
Case in point: I have yet to see a nice data cube visualizer.
[Figure: data cube over Make (CHEVY, FORD), Year (1990, 1991, 1992, 1993), and Color (RED, WHITE, BLUE), with aggregates By Make, By Year, By Color, By Make & Year, By Make & Color, By Color & Year, and Sum]
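The pivot/rollup/cube idea on this slide can be sketched in a few lines. This is only an illustration, not anything from the talk: the `SALES` rows and their unit counts are invented, and `cube` is a hypothetical helper name. It computes one total for every subset of the three dimensions (Make, Year, Color), which gives the eight groupings a data cube holds: the full detail, By Make & Year, By Make & Color, By Color & Year, By Make, By Year, By Color, and the grand Sum.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (make, year, color, units). The numbers are invented.
SALES = [
    ("CHEVY", 1990, "RED", 5),
    ("CHEVY", 1990, "BLUE", 3),
    ("CHEVY", 1991, "RED", 4),
    ("FORD", 1990, "RED", 2),
    ("FORD", 1991, "WHITE", 6),
    ("FORD", 1992, "BLUE", 1),
]
DIMS = ("make", "year", "color")


def cube(rows, dims=DIMS):
    """Return {grouping: {key: total}} for every subset of the dimensions.

    The empty grouping () holds the grand total under the empty key ().
    """
    result = {}
    for size in range(len(dims) + 1):
        for group in combinations(range(len(dims)), size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in group)
                totals[key] += row[-1]  # last field is the measure (units)
            result[tuple(dims[i] for i in group)] = dict(totals)
    return result


c = cube(SALES)
print("Sum:", c[()][()])
print("By Make:", c[("make",)])
print("By Make & Color:", c[("make", "color")])
```

A naive version like this rescans the rows once per grouping; it is meant to show what the aggregates are, not how a database would compute them efficiently.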
7
Research challenges that will impact data mining?
Simpler analysis concepts
Visualization tools to navigate data
Better algorithms = Better answers