Mainlining Data Mining: Jim Gray Microsoft Panel talk at ICDE2000 San Diego, 2 Mar 2000.

1 Mainlining Data Mining: Jim Gray Microsoft Panel talk at ICDE2000 San Diego, 2 Mar 2000

2 Is data mining still a niche technology?
97,363 items on Northern Light re “data mining”
9,075,288 items re “data base” or “database”
Is 100,000 items a niche? (OR: 14K, XML: 250K)
Today data mining tools for experts (statisticians). (Decision Trees, Clusters, K-means, Neural nets…)
High tech and High Touch aka: consulting and license fees
And the vendors like it that way.
Claim that you MUST understand the technology to use it.

3 But.. The Petabytes are Coming!!
We will be/are drowning in data/email/web..
Abstraction & categorization are key technologies
But,
–They have to work.
–They have to be trivial to learn.
Successful Ubiquitous data mining (clustering/classifiers…)
–Mail Filters/Classifiers
–Resume readers
–Shopping recommendations, Community finders
–Web search engines

4 Key technical/research issues for transition to the mainstream?
PROCESS PROBLEMS:
Getting data into tool is hell
Scrubbing data is hell
Then comes the easy part: mining
Then comes the really hard part: visualization and understanding
Most of us:
–Can’t understand neural nets (that’s bad).
–Can’t understand statistics (that’s a fact).

5 Key technical/research issues for transition to the mainstream?
Opportunities:
It’s not just numbers
Text mining
Time series
Domain specific
–Web logs
–Protein patterns
–Spatial (e.g. geology, astronomy)
–Image

6 New opportunities for KDM?
Make data capture/scrub/import trivial
Provide intuitive manipulation interfaces
Provide simpler analysis concepts
support/confidence concept precision/recall ranking pivot & rollup & cube
Provide interactive visual data explorer.
Case in point: I have yet to see a nice data cube visualizer.
[Figure: data cube over Make (CHEVY, FORD), Year (1990 to 1993), and Color (RED, WHITE, BLUE), with aggregates By Make, By Year, By Color, By Make & Year, By Make & Color, By Color & Year, and Sum]
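The pivot, rollup, and cube idea on this slide can be illustrated with a small sketch. It is not from the talk: the function name `data_cube`, the `ALL` marker, and the unit counts are made up for illustration, and only the Make/Year/Color labels come from the slide's figure. The sketch sums a measure over every combination of grouped and rolled-up dimensions.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # hypothetical marker meaning "rolled up over this dimension"


def data_cube(rows, dims, measure):
    """Sum `measure` for every combination of grouped / rolled-up dimensions."""
    cube = defaultdict(int)
    for row in rows:
        # Each row contributes to its own value and to ALL in every dimension,
        # so it lands in 2 ** len(dims) cells of the cube.
        choices = [(row[d], ALL) for d in dims]
        for key in product(*choices):
            cube[key] += row[measure]
    return dict(cube)


# Made-up sales figures; only the labels are taken from the slide.
sales = [
    {"make": "CHEVY", "year": 1990, "color": "RED", "units": 5},
    {"make": "CHEVY", "year": 1990, "color": "BLUE", "units": 3},
    {"make": "FORD", "year": 1990, "color": "RED", "units": 4},
    {"make": "FORD", "year": 1991, "color": "WHITE", "units": 2},
]

cube = data_cube(sales, ["make", "year", "color"], "units")
print(cube[("CHEVY", ALL, ALL)])  # By Make         -> 8
print(cube[(ALL, 1990, "RED")])   # By Color & Year -> 9
print(cube[(ALL, ALL, ALL)])      # Sum             -> 14
```

Every "By ..." face in the figure corresponds to the cells where the remaining dimensions are set to `ALL`; a visualizer of the kind the slide asks for would let a user move between those faces interactively.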

7 Research challenges that will impact data mining?
Simpler analysis concepts
Visualization tools to navigate data
Better algorithms = Better answers
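As a rough illustration of how small the "simpler analysis concepts" named in the talk (support/confidence, precision/recall) are in practice, here is a minimal sketch using their standard textbook definitions. The function names and the sample baskets are assumptions made up for this example, not material from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    ante = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante if ante else 0.0


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant. Recall: share of relevant items retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up shopping baskets for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))                 # 0.5
print(round(confidence(baskets, {"bread"}, {"milk"}), 2))  # 0.67
print(precision_recall([1, 2, 3, 4], [2, 4, 6]))
```

Each measure reduces to a ratio of counts, which is the kind of concept a non-statistician can be expected to read and trust.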

