Clustering, performance evaluation, and Term Project
1. Term Project
2. Resource for review
Term Project
Questions?
Examples:
– Research problems in Data Mining
– Industry problems in Data Mining/Data Warehousing
– Explore new data with existing/new tools (C5, Cubist, Weka)
– Explore data in comparative analysis (different algorithms, tool extensions, data selection, preprocessing)
– Focus on solving a problem (application or technical) and conduct a literature survey
Clustering (Dunham’s ppt, Part II: clustering)
– Similarity and distance measures
– Hierarchical algorithms (single link…)
– Partition algorithms (K-Means, PAM,…)
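To make the partition idea concrete, here is a minimal K-Means sketch in plain Python using Euclidean distance as the distance measure. It is only an illustration, not the version from Dunham's slides: the function names (`euclidean`, `k_means`) and the choice to seed the centroids with the first k points are assumptions made here to keep the example short and reproducible.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iter=100):
    """Partition points into k clusters; returns (centroids, assignments)."""
    # Assumption for reproducibility: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the partition is stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments
```

Calling `k_means([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` separates the two points near the origin from the two points near (10, 10). In practice the result depends on the initial centroids, so random restarts are a common refinement.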
Additional Notes on EM Algorithms: Clustering
Witten’s book, pdf:
– Background and introduction to statistical clustering (EM algorithm)
Dunham’s book, 47-51, Part I:
– Basic concept of the EM algorithm
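As a rough illustration of the basic concept, the sketch below runs EM on a one-dimensional mixture of two Gaussians. It is not taken from either book. The function names, the initialisation (means at the data extremes, shared variance, equal weights), and the variance floor `min_var` are all assumptions chosen to keep the example small.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, n_iter=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture; returns (weights, means, variances)."""
    # Assumed initialisation: means at the extremes, shared variance, equal weights.
    means = [min(data), max(data)]
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [max(overall_var, min_var)] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: probability that each point belongs to each component.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(2)]
            total = sum(p)
            resp.append([pj / total for pj in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # component received no weight; keep its old parameters
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids a collapsed component
    return weights, means, variances
```

Unlike K-Means, each point contributes to both clusters in proportion to its membership probability, which is the main difference between statistical clustering and a hard partition.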
Performance Evaluation
– Witten’s book, Chapter 5 (see on-line notes)
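One common way to estimate performance is k-fold cross-validation. The sketch below assumes that this is the kind of evaluation meant here; the slide itself does not name a method. The helper names and the majority-class baseline classifier are hypothetical, chosen only so the example runs without any library.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def majority_class(labels):
    """Most frequent label; ties go to the smallest label in sorted order."""
    return max(sorted(set(labels)), key=labels.count)


def cross_validated_accuracy(labels, k):
    """Accuracy of a majority-class baseline, estimated by k-fold cross-validation."""
    correct = 0
    for train, test in k_fold_indices(len(labels), k):
        # "Train" on the k-1 remaining folds, then score on the held-out fold.
        predicted = majority_class([labels[i] for i in train])
        correct += sum(1 for i in test if labels[i] == predicted)
    return correct / len(labels)
```

Every instance is used for testing exactly once, so the estimate uses all the data without ever testing on an instance the model was trained on. A real study would shuffle (and usually stratify) the data before splitting; this sketch keeps the folds contiguous for simplicity.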