Advanced Topics in Data Mining and Research Directions
CSE5610 Intelligent Software Systems
Semester 1, 2006
Outline
- Mining different data types: spatial, temporal, time series, data streams, multimedia, XML, Web, text, etc.
- Distributed Data Mining (DDM)
- Mobile & Ubiquitous Data Mining (UDM)
- Data Mining E-Services
- Anytime, Anywhere Data Mining E-Services
Generations of Data Mining
Four generations of data mining systems (Robert Grossman):
- First generation: stand-alone, centralised, single algorithm
- Second generation: integration with databases; support for high-dimensionality and complex data types
- Third generation: distribution and heterogeneity
- Fourth generation: support for mining embedded, mobile and ubiquitous data sources
Distributed Data Mining
Distributed Data Mining
- Data is inherently distributed: multinational corporations (MNCs) operating in global markets mean that users are physically and geographically separated from the data sources.
- The traditional data mining model, in which users, data and computational resources are co-located, is therefore inadequate.
Distributed Data Mining (DDM)
DDM is motivated by:
- the inherent distribution of data and other resources, because organisations themselves are distributed;
- large data volumes, whose transfer to a central site incurs exorbitant communication costs;
- the need to mine heterogeneous data, whose integration is both non-trivial and expensive;
- the performance and scalability bottlenecks of data mining.
Distributed Data Mining (DDM)
DDM = Data Mining (DM) + Knowledge Integration (KI)
- DM: performing traditional knowledge discovery at each distributed data site.
- KI: merging the results generated at the individual sites into a cohesive, unified body of knowledge.
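To make the DM + KI split concrete, here is a minimal Python sketch under our own assumptions: each site counts single items and item pairs in its local transactions (the DM step), and only those counts, not the raw data, are combined into global support values (the KI step). The functions `mine_site` and `integrate` and the sample sites are hypothetical illustrations, not part of any system named in these notes.

```python
from collections import Counter
from itertools import combinations


def mine_site(transactions):
    """Local DM step (hypothetical): count single items and item pairs at one site."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for item in items:
            counts[(item,)] += 1
        for pair in combinations(items, 2):
            counts[pair] += 1
    # Only the counts and the number of transactions leave the site.
    return counts, len(transactions)


def integrate(local_results, min_support):
    """KI step (hypothetical): merge local counts into global support values."""
    total = Counter()
    n = 0
    for counts, size in local_results:
        total.update(counts)  # Counter.update adds counts rather than replacing them
        n += size
    return {itemset: c / n for itemset, c in total.items() if c / n >= min_support}


# Two hypothetical sites holding their own transactions.
site_a = [["bread", "milk"], ["bread", "butter"]]
site_b = [["bread", "milk"], ["milk"]]

global_patterns = integrate([mine_site(site_a), mine_site(site_b)], min_support=0.5)
print(global_patterns)
```

Exact merging works here because counts are additive. If each site pruned its itemsets with a local threshold before sending them, the KI step would need extra work to recover patterns that are globally frequent but locally infrequent at some sites.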
Parallel Data Mining (PDM)
- The principal distinction between DDM and parallel data mining is that parallel mining runs on parallel processors, with or without shared memory.
- Parallel data mining also includes the development of parallel versions of traditional data mining techniques.
- The two approaches can be integrated, e.g. DecisionCentre.
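As an illustration of a parallel version of a traditional technique, the sketch below parallelises one iteration of k-means (a clustering algorithm chosen here purely for illustration): the data is split into partitions, each worker computes per-cluster partial sums for its partition, and the partial results are reduced into new centroids. The function names and sample data are hypothetical, and Python threads stand in for parallel processors; for CPU-bound work a process pool would normally be used instead, because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partial_sums(partition, centroids):
    """Assign each point in one partition to its nearest centroid; return per-cluster sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in partition:
        j = min(range(len(centroids)), key=lambda k: squared_distance(point, centroids[k]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def parallel_kmeans_step(partitions, centroids, workers=4):
    """One k-means iteration: partitions are processed in parallel, then partial results are reduced."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda part: partial_sums(part, centroids), partitions))
    dim = len(centroids[0])
    new_centroids = []
    for j, old in enumerate(centroids):
        total = sum(counts[j] for _, counts in partials)
        if total == 0:
            new_centroids.append(list(old))  # an empty cluster keeps its previous centroid
        else:
            new_centroids.append(
                [sum(sums[j][d] for sums, _ in partials) / total for d in range(dim)]
            )
    return new_centroids


partitions = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
centroids = [[1.0, 1.0], [9.0, 9.0]]
print(parallel_kmeans_step(partitions, centroids))
```

The reduce step only needs sums and counts, so the result is identical to a sequential k-means iteration over the combined data.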
DDM: Algorithms & Architectures
Research in distributed data mining can be divided into two broad categories [Fu01]:
- Data mining algorithms: focus on efficient techniques for knowledge integration.
- Distributed data mining architectures: focus on the development of distributed data mining architectures, emphasising the processes and technologies that support the construction of software systems to perform distributed data mining.
Taxonomy of DDM Architectures
Classification of DDM Systems
DDM architectural model: DDM systems
- Client-server: DecisionCentre [CDG99], IntelliMiner [PaS99, PaS01], InterAct [PaD02]
- Agents (mobile agent, stationary agent): JAM [SPT97], InfoSleuth [UMG98, MUU99], BODHI [KPH99], Papyrus [Ram98], PADMA [KHS97a, KHS97b]
Client-Server DDM
Mobile Agent Model for DDM
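A minimal simulation of the mobile agent idea, under stated assumptions: the agent carries its code and accumulated state along an itinerary of sites, mines each site's data in place, and returns only the summary. Here the "mining" is just a running count and mean of a numeric attribute, and migration is simulated by sequential visits; the `Site` and `MiningAgent` classes are hypothetical and do not reflect the API of any agent platform or of the systems listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical data site; its records stay here and are never shipped."""
    name: str
    records: list


@dataclass
class MiningAgent:
    """A hypothetical mobile agent that carries its code and partial results between sites."""
    itinerary: list
    visited: list = field(default_factory=list)
    count: int = 0
    total: float = 0.0

    def mine_locally(self, site):
        # Executed "at" the site: only summary statistics are added to the agent's state.
        self.count += len(site.records)
        self.total += sum(site.records)
        self.visited.append(site.name)

    def run(self):
        # Migration is simulated by visiting each site in turn.
        for site in self.itinerary:
            self.mine_locally(site)
        mean = self.total / self.count if self.count else None
        return {"visited": self.visited, "records": self.count, "mean": mean}


agent = MiningAgent(itinerary=[Site("site-1", [10, 20]), Site("site-2", [30])])
print(agent.run())
```

Compared with the client-server model, only the agent and its small state cross the network, rather than every site's raw records.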
Hybrid Model for DDM
Ubiquitous Data Mining
Ubiquitous Data Mining (UDM)
Mining data in a resource-constrained environment to support the time-critical information needs of mobile users.
Typical characteristics:
- Mobile user: frequent disconnections
- Handheld device: resource constraints (memory, battery, processor, screen real-estate)
- Time-critical
- Real-time and on-line
- Data streams
Example scenarios
Many challenges
Current Research
- Kargupta’s group
- Monash Univ.
  - AgentUDM
  - Adaptive, cost-efficient and light-weight data mining techniques for data streams
    - Mohamed Medhat
    - LWC, LWF & LWClass (light-weight clustering, frequent items and classification)
    - Watch this space!
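The light-weight techniques above target data streams under tight memory limits. The sketch below shows the general flavour of one-pass, bounded-memory stream clustering: each arriving point is merged into its nearest centre if it lies within a distance threshold, otherwise it starts a new centre, and once a hypothetical memory budget (`max_centers`) is reached every point is merged into its nearest centre. It is an assumption-laden illustration, not the published LWC algorithm; in particular, no resource-aware adaptation of the threshold is reproduced.

```python
import math


class OnePassClusterer:
    """Hypothetical bounded-memory, one-pass clusterer for a data stream (not the published LWC)."""

    def __init__(self, max_centers, threshold):
        self.max_centers = max_centers  # assumed memory budget, in number of centres
        self.threshold = threshold      # assumed merge distance
        self.centers = []               # each entry: [coordinates, weight]

    def update(self, point):
        if self.centers:
            idx, dist = min(
                ((i, math.dist(center, point)) for i, (center, _) in enumerate(self.centers)),
                key=lambda pair: pair[1],
            )
            # Merge if close enough, or if the memory budget leaves no room for a new centre.
            if dist <= self.threshold or len(self.centers) >= self.max_centers:
                center, weight = self.centers[idx]
                merged = [(c * weight + p) / (weight + 1) for c, p in zip(center, point)]
                self.centers[idx] = [merged, weight + 1]
                return
        self.centers.append([list(point), 1])


clusterer = OnePassClusterer(max_centers=2, threshold=1.5)
for point in [(0, 0), (1, 0), (10, 10), (10, 11), (20, 20)]:
    clusterer.update(point)
print(clusterer.centers)
```

Each point is seen once and memory never exceeds `max_centers` centres, which is the property that matters on a handheld device; the price is that late outliers are absorbed into existing clusters.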
Data Mining E-Services
Data Mining E-Services
“…data analysis and mining functions themselves will be offered as business intelligence e-services that accept operational data from clients and return models or rules” (Umesh Dayal, 2001)
Why?
- Knowledge is a key resource
- The cost of data mining infrastructure
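The e-service idea in the quote (operational data in, models or rules out) can be sketched as a single request/response function. In this minimal sketch the hypothetical `mining_service` accepts a JSON request of transactions plus minimum support and confidence, mines single-item association rules from frequent item pairs, and returns them as JSON. The request format and field names are our assumptions, and a real e-service would sit behind a network protocol rather than a local function call.

```python
import json
from collections import Counter
from itertools import combinations


def mining_service(request_json):
    """Hypothetical e-service endpoint: operational data in, association rules out (both JSON)."""
    request = json.loads(request_json)
    transactions = [set(t) for t in request["transactions"]]
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < request["min_support"]:
            continue
        # A frequent pair yields up to two rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= request["min_confidence"]:
                rules.append({"if": lhs, "then": rhs, "support": support, "confidence": confidence})
    return json.dumps({"rules": rules})


request = json.dumps({
    "transactions": [["bread", "milk"], ["bread", "milk", "butter"], ["bread"], ["milk"]],
    "min_support": 0.5,
    "min_confidence": 0.6,
})
print(mining_service(request))
```

The client never installs mining software: it only serialises its data and reads back rules, which is the cost argument made on this slide.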
Data Mining E-Services
Current commercial landscape:
- Several application service providers (ASPs): DigiMine, Information Discovery, WhiteCross Systems, ListAnalyst.com, etc.
- Mode of operation
Hybrid Model & Data Mining ASPs:
- Optimise response time, which leads to improved throughput
- QoS estimation
- Location preferences of clients
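Optimising response time in a hybrid model involves estimating, per request, whether it is cheaper to ship the client's data to the ASP (client-server) or to send a mobile agent to the data. The sketch below uses a deliberately simple cost model that we assume for illustration: transfer time is size divided by bandwidth, plus an estimated mining time at the chosen location, plus returning the results. The `Job` and `Conditions` fields and all figures are hypothetical; real QoS estimation would also have to account for server load, queueing and network variability.

```python
from dataclasses import dataclass


@dataclass
class Job:
    data_mb: float    # size of the client's dataset, in megabytes
    result_mb: float  # size of the mined model or rules returned to the client
    agent_mb: float   # size of the mobile agent (code plus state)


@dataclass
class Conditions:
    bandwidth_mbps: float    # client <-> ASP link, in megabits per second
    server_secs: float       # estimated mining time on the ASP's servers
    client_site_secs: float  # estimated mining time if the agent mines at the client's site


def transfer_secs(size_mb, bandwidth_mbps):
    return size_mb * 8 / bandwidth_mbps  # megabytes -> megabits, divided by link speed


def estimate_response_times(job, cond):
    """Assumed cost model: transfer out + mining time + transfer of results back."""
    result_back = transfer_secs(job.result_mb, cond.bandwidth_mbps)
    return {
        "client-server": transfer_secs(job.data_mb, cond.bandwidth_mbps) + cond.server_secs + result_back,
        "mobile-agent": transfer_secs(job.agent_mb, cond.bandwidth_mbps) + cond.client_site_secs + result_back,
    }


def choose_mode(job, cond):
    estimates = estimate_response_times(job, cond)
    return min(estimates, key=estimates.get), estimates


conditions = Conditions(bandwidth_mbps=10, server_secs=60, client_site_secs=300)
print(choose_mode(Job(data_mb=500, result_mb=1, agent_mb=2), conditions))
print(choose_mode(Job(data_mb=5, result_mb=1, agent_mb=2), conditions))
```

With a large dataset and a slow link the agent-based mode wins; shrink the dataset and shipping the data to the faster ASP servers becomes cheaper.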
Anytime, Anywhere Data Mining E-Services
My Thoughts
- Data is a commodity; analysis is a service
- Access anytime, anywhere
- By anyone:
  - from large corporations to small businesses to individuals
  - from home buyers to mobile salespersons to grocery shoppers
My Thoughts
A preliminary model for delivery:
- Data-centric grids