1
Clustering and Term Project
Plan for this week
2
Term Project
Questions?
Examples:
  Research problems in Data Mining
  Industry problems in Data Mining
  Explore new data with existing/new tools
  Explore data with different process (tools, data selection, preprocessing)
  Focus on solving a problem (application or technical)
3
Data exploration Process (time%, importance%) --Dorian Pyle
Exploring the problem space (10, 15)
Exploring the solution space (9, 14)
Specifying the implementation method (1, 51)
  (increases profitability, reduces waste, decreases fraud, or meets X goal)
Mining the data
  Preparing the data (60, 15)
  Surveying the data (15, 3)
  Modeling the data (5, 2)
4
Ten Golden Rules for Miners --Dorian Pyle
1. Select clearly defined problems that will yield tangible benefits.
2. Specify the required solution.
3. Define how the solution delivered is going to be used.
4. Understand as much as possible about the problem and data set (the domain).
5. Let the problem drive the modeling (tool and data preparation for model building).
5
Ten Golden Rules for Miners (cont.)
6. Stipulate assumptions.
7. Refine the model iteratively.
8. Make the model as simple as possible.
9. Define instability in the model (critical areas where a small change in input causes a large change in output).
10. Define uncertainty in the model (low confidence areas).
6
Selection of Research Paper for Review
Algorithm-centered
Application-centered
Survey-centered
Selection due Mar. 24
7
Plan of the Week
Monday (Dunham’s ppt Part II clustering 74-128)
  Similarity and distance measures
  Hierarchical algorithms (single link…)
  Partition algorithms (K-Means, MST,…)
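As a small illustration of two of Monday's topics, here is a minimal sketch of a Euclidean distance measure and a basic K-Means partition loop. It is not taken from Dunham's slides: the function names, the toy points, and the choice of using the first k points as starting centroids are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iter=100):
    """Basic K-Means sketch.

    Assumption: the first k points serve as the initial centroids, which
    keeps the example deterministic. Real implementations usually pick
    starting centroids at random or with a seeding heuristic.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the partition is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignment


# Hypothetical toy data: two visually separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.0), (8.0, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)
```

On this toy data the loop settles with the first three points in one cluster and the last three in the other. Swapping `euclidean` for another distance measure changes only the assignment step.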
8
Plan of the Week (cont.)
Wednesday (Witten’s book, pdf; Dunham’s book 47-51)
  Statistical based clustering (EM algorithm)
  Case study: a data mining application using Cubist
  Term Project: directions and discussion