1
Linear Clustering Algorithm
by Horne Ken, Khan Farhana & Padubidri Shweta
2
Overview
Introduction
Data Preprocessing
Data Mining
Data Visualization
Experiment
Conclusion
3
Responsibilities
Data Preprocessing: Farhana & Ken
Data Mining: Ken
Data Visualization: Shweta
4
Overview
A Linear Clustering Algorithm
Applications
1. Feature selection – Choose features based on information gain
2. Discretization – Partition based on data set characteristics
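The slides do not say how information gain is computed, so the following is only a minimal sketch of the usual entropy-based definition, assuming each data point carries a class label. The function names `entropy` and `information_gain`, and the `threshold` parameter, are hypothetical and not taken from the presentation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the data at `threshold`.

    Assumption: a binary split (value < threshold vs. value >= threshold);
    the presentation does not specify the split rule.
    """
    left = [lab for val, lab in zip(values, labels) if val < threshold]
    right = [lab for val, lab in zip(values, labels) if val >= threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


if __name__ == "__main__":
    values = [1, 2, 8, 9]
    labels = ["a", "a", "b", "b"]
    # A threshold that separates the two classes cleanly gives the largest gain.
    print(information_gain(values, labels, 5))
    print(information_gain(values, labels, 1.5))
```

Under this assumption, a feature whose best split yields a higher gain would be preferred for feature selection, and the split point itself could serve as a discretization boundary.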
5
Data Preprocessing
DataFerrett (Federated Electronic Research, Review, Extraction & Tabulation Tool)
Install the software
Web version: http://www.thedataweb.org/what_ferrett.html
6
Data Pre-processing: Steps
Extracted data from CPS (Current Population Survey)
Pre-processing
  Number of features: 43
  Years: 2007-2008
  115,000 rows/month over 50 states
After preprocessing
  Number of features: 23
  Normalization
7
Data Mining Algorithm
Choose an ordinal attribute (X)
Order data points based on the attribute
List potential partition points (between successive values of X)
For each potential partition point P
  Calculate distance of data points where X < P
  Calculate distance of data points where X > P
Results
Can partition data points
Order data points by information gain
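The steps above can be sketched roughly as follows. The slide does not define the "distance" it calculates, so this sketch assumes the sum of squared Euclidean distances from each point to the centroid of its side of the partition, and it takes midpoints between successive distinct values as the candidate partition points. All function names are hypothetical, and this is not the authors' Java implementation.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def candidate_partition_points(values: Sequence[float]) -> List[float]:
    """Midpoints between successive distinct values of X, in ascending order."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def within_group_distance(rows: Sequence[Row]) -> float:
    """Assumed distance measure: sum of squared distances to the group centroid."""
    if not rows:
        return 0.0
    dims = len(rows[0])
    centroid = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
    return sum((r[d] - centroid[d]) ** 2 for r in rows for d in range(dims))


def score_partition_points(rows: Sequence[Row], x_index: int) -> List[Tuple[float, float]]:
    """Score every candidate partition point P on attribute X.

    Returns (P, total distance) pairs, smallest total distance first.
    """
    # Order the data points by the chosen ordinal attribute X.
    ordered = sorted(rows, key=lambda r: r[x_index])
    scored = []
    for p in candidate_partition_points([r[x_index] for r in ordered]):
        left = [r for r in ordered if r[x_index] < p]   # points where X < P
        right = [r for r in ordered if r[x_index] > p]  # points where X > P
        total = within_group_distance(left) + within_group_distance(right)
        scored.append((p, total))
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    data = [(1.0, 10.0), (2.0, 11.0), (8.0, 50.0), (9.0, 51.0)]
    best_point, best_distance = score_partition_points(data, x_index=0)[0]
    print(best_point, best_distance)
```

In this sketch the partition point with the smallest total distance splits the data into the two tightest groups; ranking by information gain, as the slide mentions, would need a separate scoring function.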
8
Data Mining Test dataset
9
Data Mining Test dataset 2
10
Experimental Setup
Environment
1. DataFerrett: Data Pre-processing
2. Java Platform: Implement the Data Mining Algorithm
3. Data Visualization
   1. Google App Engine Datastore API (Python, JavaScript and Django Framework)
   2. Google Chart API
Hardware: Windows XP laptop, Core2 2.16 GHz, 2.00 GB RAM (that hurt)
11
Visualization Demo
Link to the website: http://householdstructure-project.appspot.com/
12
Conclusions
Preliminary results are encouraging
Discretization was successful
Lessons learnt and future work
  Comparison with other methods on well-known datasets
  Evaluate performance in feature selection
  OPTIMIZE
  Don't pick a novel dataset & novel algorithm at the same time
13
Thank you
Questions