SDM'07 Panel. Data Mining Research: Current Status and Future Opportunities. Christos Faloutsos, CMU
Questions. Q1: What are future challenges and opportunities for data mining that are not presently receiving as much attention as they deserve? Q2: Are there things we are doing now that we should rethink in light of future challenges and opportunities for data mining? SDM'07 C. Faloutsos, CMU
Past + current successes. Cross-disciplinarity: DM = Stat, ML, DB. Fascinating apps: bio-informatics, privacy, security, streams, social network mining, ...
Machine Learning to support Systems Biology: Subcellular Location (Bob Murphy). Cell images of many proteins; feature extraction, graphical models, clustering of proteins by pattern; combine to enable accurate simulation of cell behavior; generative models for each pattern.
Q1: Challenges to focus on. Scalability (mining tera- and petabytes); stream mining (anomaly, intrusion detection, sensors); graph mining (text/web mining, marketing, ...); autonomic systems; search engines; national security; ...
Scalability. Google: > 450,000 processors in clusters of ~2000 processors each; target: hundreds of TB to several petabytes. Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture,” IEEE Micro, 2003.
E.g.: self-* system @ CMU. >200 nodes; 40 racks of computing equipment; 774 kW of power; target: 1 petabyte; goal: self-correcting, self-securing, self-monitoring, self-... (PT bytes, self-*, Linux, gigabit link?)
DM for tera- and petabytes. Two-way street: <- DM can use such infrastructures to find patterns; -> DM can help such infrastructures become self-healing, self-adjusting, ‘self-*’.
Q2: What to do differently. Emphasis on Systems-DM collaboration.