Published by Darin Pardon. Modified over 10 years ago.
1
Data Mining: Crossing the Chasm
Rakesh Agrawal
IBM Almaden Research Center
2
Thesis
The greatest challenge facing data mining is to make the transition from being an early market technology to a mainstream technology
We have the opportunity to make this transition successful
3
Outline
Chasm in the technology adoption life cycle, à la Geoffrey Moore†
Experience with Quest/Intelligent Miner
Ideas for successful chasm crossing
† Geoffrey A. Moore. Crossing the Chasm. Harper Business. http://www.chasmgroup.com
4
Technology Adoption Life Cycle
Innovators (Techies): Try it!
Early Adopters (Visionaries): Get ahead of the herd!
Early Majority (Pragmatists): Stick with the herd!
Late Majority (Conservatives): Hold on!
Laggards (Skeptics): No way!
Psychographic profile of each group is different
5
Innovators: Technology Enthusiasts
Intrigued by any fundamental advance in technology
Like to alpha test new products
Can ignore the missing elements
Want access to top technologists
Want no-profit pricing (preferably free)
Gatekeepers to early adopters
6
Early Adopters: Visionaries
Driven by vision of dramatic competitive advantage via revolutionary breakthroughs
Great imagination for strategic applications
Not so price-sensitive
Want rapid time to market
Demand high degree of customization
Fund the development of early market
7
Early Majority: Pragmatists
Want sustainable productivity improvement through evolutionary change
Astute managers of mission-critical apps
Understand real-world issues and tradeoffs
Focus on proven applications; want to see the solution in production
Bulwark of the mainstream market
8
Late Majority: Conservatives
Want to stay even with the competition
Risk averse
Price sensitive
Need completely pre-assembled solutions
Extend technology life cycles
9
Laggards: Skeptics
Driven to maintain status quo
Good at debunking marketing hype
Disbelieve productivity-improvement arguments
Can be formidable opposition to early adoption of a technology
Retard the development of high-tech markets
10
Crack in the curve
Figure labels: Early Market, Chasm, Mainstream Market
The greatest peril in the development of a high-tech market lies in making the transition from an early market dominated by a few visionaries to a mainstream market dominated by pragmatists.
11
Visionaries vs. Pragmatists
Visionaries: Adventurous; First strike capability; Early buy-in; State of the art; Think big; Spend big
Pragmatists: Prudent; Staying power; Wait-and-see; Industry standard; Manage expectation; Spend to budget
12
Is data mining following this curve?
Yes!!!
My personal viewpoint based on Quest/Intelligent Miner experience
13
Quest
Started as a skunkworks project in the early nineties
Inspired by needs articulated by industry visionaries:
–Transaction data collected over a long period
–Current tools/SQL don’t cut it
–About ready to throw the data away
14
Approach
Examine “real” applications
Identify operations that cut across applications
Design fast, scalable algorithms for each operation
Develop applications by composing operations
15
Operations
New operations (completeness, scalability):
–Associations
–Sequential patterns
–Similar time series
Adopted from statistics/learning (scalability):
–Classification
–Clustering
–Deviations
http://www.almaden.ibm.com/cs/quest
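The association operation and its "completeness" property can be illustrated with a small level-wise search that returns every itemset and rule meeting the given thresholds. This is only a minimal sketch for illustration, not the Quest implementation; the function names, the toy baskets, and the thresholds are assumptions introduced here.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Level-wise search: a size-k candidate is counted only if all of its
    (k-1)-subsets were frequent at the previous level.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    items = sorted({item for b in baskets for item in b})
    candidates = [frozenset([i]) for i in items]
    result = {}
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            support = sum(1 for b in baskets if c <= b) / n
            if support >= min_support:
                level[c] = support
        result.update(level)
        k += 1
        alive = sorted({i for s in level for i in s})
        candidates = [
            frozenset(c)
            for c in combinations(alive, k)
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return result


def association_rules(itemsets, min_confidence):
    """Return (lhs, rhs, support, confidence) for every rule meeting min_confidence."""
    rules = []
    for itemset, support in itemsets.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the table.
                confidence = support / itemsets[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), support, confidence))
    return rules


# Toy market baskets (hypothetical data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for rule in association_rules(frequent_itemsets(transactions, 0.5), 0.9):
    print(rule)
```

Because the search enumerates all candidates that survive pruning, no qualifying rule is missed; the cost of that guarantee is what makes scalability the central design concern.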
16
Bringing Quest to market
Visionaries who inspired Quest did not become first customers:
–Wanted evidence that the technology “worked”
Frustrating attempts to interest major IBM customers:
–Integration with existing applications
–Too-far-out technology
–Resistance from in-house analytic groups
17
First hits
Small information-based companies who provided data in exchange for free results
CIO who wanted to be seen as the technology pioneer in his industry
CIO who wanted the success story to feature in the company’s annual report
Led to the formation of a group offering services using Quest
18
Characteristics of engagements
Mostly associations and sequential patterns
Completeness a big plus
Unanticipated uses
Feedback for further development
19
Into the product land
Formation of a small “out-of-plan” product group to productize Quest
Facilitated by a closet mathematician
Successes of the services group used for market validation
Continued development and infusion of technology
20
Intelligent Miner
Serious product
Integrates technologies from various groups
Fast, scalable, runs on multiple platforms
Several “early market” success stories
http://www.software.ibm.com/data/iminer/
21
Are we in the chasm?
Perceived to be sophisticated technology, usable only by specialists
Long, expensive projects
Stand-alone, loosely coupled with data infrastructures
Difficult to infuse into existing mission-critical applications
22
Chasm Crossing
Personal speculations on some technical challenges
Do not imply IBM research/product directions
23
XML-based Data Mining Standard (1)
Model Building:
–A pair of standard DTDs for each operation
–Interchangeable library of operator implementations
Diagram labels: Operator, Model, Parameters, Data Specs, Standard DTD, Library
Ack: Mattos, Pirahesh, Schwenkries
24
XML-based Data Mining Standard (2)
Model Deployment:
–A mapping XML object maps the names and formats used in the model object to those of the data record
–Model could have been developed on a different system
Diagram labels: Application, Result, Mapping, Standard DTDs, Standard DTD, Library, Model, Data Record
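The deployment idea above, a model object plus a separate mapping object that ties model field names and formats to the application's data record, might look roughly like the following. This is a hedged sketch: the element names, attributes, and the one-rule classification model are all invented for illustration and do not correspond to any published DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical model object, possibly built on a different system.
MODEL_XML = """
<model operation="classification">
  <rule field="income" op="gt" value="50000" label="good"/>
  <default label="review"/>
</model>
"""

# Hypothetical mapping object: model field name -> record column and format.
MAPPING_XML = """
<mapping>
  <field model="income" record="ANNUAL_INCOME_USD" type="float"/>
</mapping>
"""

CASTS = {"float": float, "int": int, "str": str}
OPS = {"gt": lambda a, b: a > b, "le": lambda a, b: a <= b}


def load_mapping(xml_text):
    """Parse the mapping object into {model_field: (record_column, cast)}."""
    root = ET.fromstring(xml_text.strip())
    return {
        f.get("model"): (f.get("record"), CASTS[f.get("type")])
        for f in root.findall("field")
    }


def score(model_xml, mapping_xml, record):
    """Apply the model to one data record, resolving names through the mapping."""
    mapping = load_mapping(mapping_xml)
    model = ET.fromstring(model_xml.strip())
    for rule in model.findall("rule"):
        column, cast = mapping[rule.get("field")]
        value = cast(record[column])
        if OPS[rule.get("op")](value, cast(rule.get("value"))):
            return rule.get("label")
    return model.find("default").get("label")


print(score(MODEL_XML, MAPPING_XML, {"ANNUAL_INCOME_USD": "72000"}))
```

Keeping the mapping separate from the model is what lets the same model object be reused against records whose column names and formats differ from those of the system where it was built.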
25
Implications
Standard interfaces for application developers to incorporate data mining
Coupling with relational databases
–mappings from DTDs to relational schemas
–implementation using existing infrastructure
26
Data Mining Benchmarks
UC Irvine repository
Generating synthetic benchmarks modeled after real data sets is a hard problem
–How to map names into meaningful literals
–How to preserve empirical distributions
Ack: Srikant, Ullman
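To make the "preserve empirical distributions" difficulty concrete, here is a minimal sketch that resamples a single column from its observed value frequencies. The function names and toy data are assumptions. Note what it does not do: sampling each column independently preserves only the marginal distributions and discards correlations between columns, which is part of why the general problem is hard.

```python
import random
from collections import Counter


def empirical_distribution(values):
    """Return {value: relative frequency} for one column of real data."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def synthesize_column(values, n, seed=0):
    """Draw n synthetic values whose frequencies follow the real column."""
    dist = empirical_distribution(values)
    rng = random.Random(seed)
    return rng.choices(list(dist), weights=list(dist.values()), k=n)


# Toy "real" column (hypothetical data): 70% of records are "a", 30% are "b".
real = ["a"] * 70 + ["b"] * 30
synthetic = synthesize_column(real, 10_000)
print(empirical_distribution(synthetic))
```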
27
Auto-focus data mining
Automatic parameter tuning
Automatic algorithm selection (à la join method selection in database query optimization)
Ack: Andreas Arning
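One simple way to picture automatic algorithm selection and parameter tuning is a search over (algorithm, parameter) candidates scored on held-out data. The sketch below is an assumption-laden illustration: the slide's analogy to query optimization suggests a cost-model-driven choice, whereas this example substitutes plain validation accuracy, and the two toy classifiers and all names are hypothetical.

```python
def majority_classifier(train, _param=None):
    """Always predict the most common training label (takes no real parameter)."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top


def threshold_classifier(train, threshold):
    """Predict 1 when the single numeric feature exceeds the threshold."""
    return lambda x: 1 if x > threshold else 0


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


def auto_focus(candidates, train, validate):
    """Try every (algorithm, parameter) pair and return the best
    (score, name, parameter) by validation accuracy."""
    best = None
    for name, build, params in candidates:
        for p in params:
            s = accuracy(build(train, p), validate)
            if best is None or s > best[0]:
                best = (s, name, p)
    return best


# Toy data (hypothetical): one numeric feature, binary label.
train = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1)]
validate = [(1, 0), (4, 0), (5, 1), (8, 1)]
candidates = [
    ("majority", majority_classifier, [None]),
    ("threshold", threshold_classifier, [2, 4.5, 7]),
]
print(auto_focus(candidates, train, validate))
```

The user supplies only the data and the candidate list; the choice of algorithm and parameter value is made for them, which is the point of the auto-focus idea.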
28
Web: Greatest opportunity
Huge collection of data (e.g., Yahoo collecting ~50GB every day)
Universal digital distribution medium makes data mining results actionable in fundamentally new ways
But watch for the privacy pitfall
29
Privacy-preserving data mining
Technical vs. legislated solutions
Implication for data mining algorithms when some fields of a data record have been fudged according to the user’s privacy sensitivity
Ack: R. Srikant
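As a rough illustration of what "fudged according to the user's privacy sensitivity" could mean on the technical side, the sketch below has each user add zero-mean uniform noise to a field, with a noise width they choose. Individual values become unreliable, yet an aggregate such as the mean can still be estimated. The noise model, function names, and data are assumptions made here for illustration, not a method stated on the slide.

```python
import random


def fudge(value, sensitivity, rng):
    """Return value plus zero-mean uniform noise in [-sensitivity, +sensitivity]."""
    return value + rng.uniform(-sensitivity, sensitivity)


def demo(seed=0, n=5000):
    """Compare the mean of true ages with the mean of the released, fudged ages."""
    rng = random.Random(seed)
    ages = [rng.randint(18, 80) for _ in range(n)]
    # Each user picks a sensitivity: 0 = release exactly, 20 = heavy fudging.
    sensitivities = [rng.choice([0, 5, 20]) for _ in range(n)]
    released = [fudge(a, s, rng) for a, s in zip(ages, sensitivities)]
    return ages, released


ages, released = demo()
print(sum(ages) / len(ages), sum(released) / len(released))
```

Algorithms that need more than simple aggregates, for example the full distribution of a field, would have to account for the noise explicitly, which is the implication the slide points to.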
30
Personalization
The Internet might provide, for the first time, the tools necessary for users to capture information about themselves and to selectively release this information†
Will we be providing these tools?
† John Hagel, Marc Singer. Net Worth. Harvard Business School Press.
31
What about Association Rules?
Very long patterns
Separating wheat from chaff
Principled introduction of domain knowledge
32
What else?
Formal foundations of data mining
33
Summary
Closely couple data mining with database systems
Embed data mining into applications
Focus on web
Standard interfaces
Benchmarks
Auto-focus
Personalization
Privacy
34
Concluding remarks
Data mining, a great technology
–Combination of intriguing theoretical questions with large commercial interest in the technology
Poised for transitioning into mainstream technology
Will we rise to the challenge as a community?
35
Acknowledgments