CLUTO: A Clustering Toolkit
By Roseline Antai
What is CLUTO?
CLUTO is a software package for clustering high-dimensional datasets and for analyzing the characteristics of the resulting clusters.
CLUTO's clustering programs: vcluster and scluster
The major difference between them is the input format:
vcluster: takes the actual multidimensional representation of the objects to be clustered (one vector per object).
scluster: takes the similarity matrix (or graph) between these objects.
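To make the input difference concrete, here is a minimal Python sketch (standard library only) that starts from the kind of object vectors vcluster reads and derives the pairwise similarity matrix of the kind scluster expects. The function names are hypothetical, and cosine is used only as an example; CLUTO supports other similarity functions too.

```python
import math

def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return dot / (nx * ny)

def similarity_matrix(rows):
    """Turn n object vectors (vcluster-style input) into an
    n x n similarity matrix (scluster-style input)."""
    return [[cosine(r, s) for s in rows] for r in rows]

# Three objects, each described by four features (made-up data).
vectors = [
    [1.0, 0.0, 2.0, 0.0],
    [2.0, 0.0, 4.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
]
sims = similarity_matrix(vectors)
for row in sims:
    print(" ".join(f"{v:.2f}" for v in row))
```

The first two objects point in the same direction, so their similarity is 1.00 even though their magnitudes differ; the third shares no features with them and scores 0.00.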
Calling Sequence
vcluster [optional parameters] MatrixFile NClusters
scluster [optional parameters] GraphFile NClusters
NClusters is the number of clusters to find.
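Both programs read their input from a file. As we understand the dense formats in the CLUTO manual, a MatrixFile starts with a line giving the number of rows and columns, followed by one line of values per row, and a dense GraphFile starts with the number of vertices, followed by an n x n block of similarities. The sketch below renders both layouts as text. Treat the exact layout as an assumption to check against the manual for your CLUTO version (sparse formats also exist and are not shown); the function names are hypothetical.

```python
def dense_matrix_text(rows):
    """Render vectors in the (assumed) dense MatrixFile layout:
    first line 'nrows ncols', then one whitespace-separated row per line."""
    nrows, ncols = len(rows), len(rows[0])
    lines = [f"{nrows} {ncols}"]
    for r in rows:
        if len(r) != ncols:
            raise ValueError("all rows must have the same number of columns")
        lines.append(" ".join(f"{v:g}" for v in r))
    return "\n".join(lines) + "\n"

def dense_graph_text(sims):
    """Render an n x n similarity matrix in the (assumed) dense GraphFile
    layout: first line 'n', then n lines of n similarities."""
    n = len(sims)
    lines = [str(n)]
    for r in sims:
        if len(r) != n:
            raise ValueError("similarity matrix must be square")
        lines.append(" ".join(f"{v:g}" for v in r))
    return "\n".join(lines) + "\n"

# Two objects with three features each, and their 2 x 2 similarity matrix.
print(dense_matrix_text([[1, 0, 2], [0, 3, 1]]), end="")
print(dense_graph_text([[1, 0.5], [0.5, 1]]), end="")
```

Writing the first string to a file such as example.mat (a hypothetical name) would give a MatrixFile argument for vcluster; the second would serve as a GraphFile for scluster.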
Optional Parameters
Standard specification: -paramname or -paramname=value
Three categories:
Clustering algorithm parameters
Reporting and Analysis parameters
Cluster Visualization parameters
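When vcluster or scluster is driven from a script, the -paramname / -paramname=value convention maps naturally onto an argument list. The helper below is a hypothetical sketch: it takes option names without the leading dash, treats True as a value-less flag, skips False/None, and appends the input file and cluster count last, matching the calling sequence. The file name data.mat is made up.

```python
import shlex

def cluto_args(program, params, input_file, nclusters):
    """Build an argument list following the '-paramname' and
    '-paramname=value' convention.

    params maps option names (without the leading '-') to a value, to True
    for flag-style options, or to False/None to leave the option out.
    """
    args = [program]
    for name, value in params.items():
        if value is False or value is None:
            continue
        if value is True:
            args.append(f"-{name}")
        else:
            args.append(f"-{name}={value}")
    args += [input_file, str(nclusters)]
    return args

argv = cluto_args("vcluster",
                  {"clmethod": "rb", "sim": "cos", "fulltree": True},
                  "data.mat", 4)
print(shlex.join(argv))  # vcluster -clmethod=rb -sim=cos -fulltree data.mat 4
```

Keeping the arguments as a list (rather than one string) means it can be handed directly to subprocess.run without shell quoting issues.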
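Clustering algorithm parameters
These control how CLUTO computes the clustering solution. Examples:
1. -clmethod=string: the clustering method (rb, agglo, direct, graph, etc.)
2. -sim=string: the similarity function (cos, corr, dist, jacc)
3. -crfun=string: the criterion function to optimize (i1, i2, etc.)
4. -fulltree: builds a complete hierarchical tree on top of the clustering solution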
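For intuition about the -sim and -crfun values, here is a sketch of the textbook definitions behind them: cosine, Pearson correlation, extended Jaccard, Euclidean distance, and the I1/I2 criterion functions from the clustering literature associated with CLUTO (I1 sums ||D_r||^2 / n_r and I2 sums ||D_r|| over clusters, where D_r is the sum of the unit-length vectors in cluster r; both are maximized). These are standard formulas, not CLUTO's source code, and CLUTO's internal computation may differ in detail.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def sim_cos(x, y):
    # Cosine similarity (assumes neither vector is all zeros).
    return dot(x, y) / (norm(x) * norm(y))

def sim_corr(x, y):
    # Pearson correlation = cosine of the mean-centred vectors
    # (assumes neither vector is constant).
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sim_cos([a - mx for a in x], [b - my for b in y])

def sim_jacc(x, y):
    # Extended Jaccard coefficient: x.y / (|x|^2 + |y|^2 - x.y).
    d = dot(x, y)
    return d / (dot(x, x) + dot(y, y) - d)

def dist(x, y):
    # Euclidean distance (smaller means more similar).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def criterion(rows, labels, which):
    """I1 or I2 for a clustering, computed from composite vectors D_r
    (the sum of the unit-length rows in each cluster). Assumes no zero rows."""
    composites, sizes = {}, {}
    for r, c in zip(rows, labels):
        n = norm(r)
        comp = composites.setdefault(c, [0.0] * len(r))
        for k, a in enumerate(r):
            comp[k] += a / n
        sizes[c] = sizes.get(c, 0) + 1
    if which == "i1":
        return sum(dot(D, D) / sizes[c] for c, D in composites.items())
    if which == "i2":
        return sum(norm(D) for D in composites.values())
    raise ValueError("which must be 'i1' or 'i2'")

rows = [[1, 0], [2, 0], [0, 1]]
good, bad = [0, 0, 1], [0, 1, 1]
for name in ("i1", "i2"):
    print(name, f"{criterion(rows, good, name):.3f}", f"{criterion(rows, bad, name):.3f}")
# i1 3.000 2.000
# i2 3.000 2.414
```

Both criteria score the grouping that keeps the two parallel vectors together higher, which is the behaviour an optimizer such as -clmethod=rb uses to choose between candidate clusterings.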
Reporting and Analysis Parameters
These control how much information vcluster and scluster report about the clusters, as well as the analysis performed on the discovered clusters. Examples:
1. -clustfile=string (default: MatrixFile.clustering.NClusters, or GraphFile.clustering.NClusters for scluster)
2. -clabelfile=string (name of the file that stores the labels of the columns; used when -showfeatures, -showsummaries or -labeltree is specified)
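3. -rlabelfile=string (name of the file that stores the labels of the rows, i.e., the objects to be clustered)
4. -rclassfile=string (name of the file that stores the class labels of the rows; used to evaluate the clustering with external quality measures such as entropy and purity)
5. -showtree (displays the hierarchical tree)
6. -showfeatures (displays the descriptive and discriminating features of each cluster)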
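CLUTO reports external quality measures itself when -rclassfile is given. To show what such measures compute, the sketch below recomputes size-weighted entropy and purity from a clustering file and a class file, using the standard definitions from the clustering literature. It assumes the clustering file holds one cluster number per line (with -1 for objects left unclustered) and the class file one label per line; the helper names and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def default_clustfile(input_file, nclusters):
    # Default output name: MatrixFile.clustering.NClusters
    return f"{input_file}.clustering.{nclusters}"

def read_lines(text):
    """One entry per non-empty line (assumed layout of clustfile and rclassfile)."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def entropy_purity(clusters, classes):
    """Size-weighted entropy (normalized by log q, lower is better) and purity
    (higher is better). Rows with cluster id -1 are ignored."""
    members = defaultdict(Counter)
    for c, k in zip(clusters, classes):
        if c != -1:
            members[c][k] += 1
    q = len(set(classes))
    n = sum(sum(cnt.values()) for cnt in members.values())
    total_e = total_p = 0.0
    for cnt in members.values():
        nr = sum(cnt.values())
        e = -sum((m / nr) * math.log(m / nr) for m in cnt.values())
        if q > 1:
            e /= math.log(q)
        total_e += nr / n * e
        total_p += nr / n * (max(cnt.values()) / nr)
    return total_e, total_p

clust_text = "0\n0\n1\n1\n"            # made-up clustering file contents
class_text = "Imp\nImp\nDeo\nImp\n"    # made-up class file contents
clusters = [int(x) for x in read_lines(clust_text)]
classes = read_lines(class_text)
print(default_clustfile("data.mat", 2))
print(entropy_purity(clusters, classes))
```

Here cluster 0 contains only Imp (entropy 0, purity 1) and cluster 1 is an even Deo/Imp mix (entropy 1, purity 0.5), so the weighted totals are 0.5 and 0.75.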
Cluster Visualization Parameters
These produce simple plots of the original input matrix that show how the different objects (rows) and features (columns) are clustered together. Examples:
1. -plottree=string: gives a graphic representation of the entire hierarchical tree
2. -plotmatrix=string: shows how the rows of the original matrix are clustered together
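The core idea of a -plotmatrix style picture is to redraw the input matrix with the rows of each cluster placed next to each other. The sketch below shows only that regrouping step. It is a simplification: it keeps rows in input order within a cluster and puts unclustered rows (id -1) last, whereas CLUTO's actual plot may order rows within a cluster differently. The function names are hypothetical.

```python
def cluster_row_order(clusters):
    """Row indices regrouped so rows of the same cluster are adjacent.
    Clusters appear in increasing id order; rows keep their input order
    within a cluster; unclustered rows (id -1) go last."""
    return sorted(range(len(clusters)),
                  key=lambda i: (clusters[i] == -1, clusters[i], i))

def reorder_rows(rows, clusters):
    """Return the matrix rows in cluster-grouped order."""
    return [rows[i] for i in cluster_row_order(clusters)]

clusters = [1, 0, 1, -1, 0]
rows = [["a"], ["b"], ["c"], ["d"], ["e"]]
print(cluster_row_order(clusters))    # [1, 4, 0, 2, 3]
print(reorder_rows(rows, clusters))   # [['b'], ['e'], ['a'], ['c'], ['d']]
```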
A practical example
../cluto/Linux/vcluster -clmethod=rb -sim=cos -fulltree \
    -rlabelfile=Final_Results/rlabelfile -rclassfile=Final_Results/classfile \
    -showtree -plotformat=gif \
    -plottree=Final_Results/Images/PT-Final10d \
    -plotmatrix=Final_Results/Images/PM-Final10d \
    -plotclusters=Final_Results/Images/PC-Final10d \
    -showfeatures Final_Results/FinalOutput10d-Vt.mat 4
Classfile and rlabelfile
One label per object (row of the input matrix), in row order:
Evo, Sem, Imp, Imp, Deo, Deo, Imp, Imp, Deo, Deo, Imp, Deo, Deo, Imp, Sem, Deo, Sem, Imp, Imp, Evo
Plotclusters output
The plot uses red to denote positive values and green to denote negative values. Bright red/green indicate large positive/negative values, whereas colors close to white indicate values close to zero.
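The colour scheme described above can be approximated as a linear ramp from white to red for positive values and from white to green for negative values. The sketch below is an illustration under that assumption only: scaling by the largest absolute value (max_abs) and the linear interpolation are our choices, not a documented description of how CLUTO shades its plots.

```python
def value_to_rgb(value, max_abs):
    """Map a value to an (r, g, b) colour: white near 0, shading toward pure
    red at +max_abs and pure green at -max_abs (linear, clamped)."""
    if max_abs <= 0:
        return (255, 255, 255)
    t = min(abs(value) / max_abs, 1.0)   # 0 -> white, 1 -> full colour
    fade = round(255 * (1 - t))          # level kept in the other two channels
    if value > 0:
        return (255, fade, fade)         # reddish for positive values
    if value < 0:
        return (fade, 255, fade)         # greenish for negative values
    return (255, 255, 255)               # exactly zero stays white

for v in (-2, -1, 0, 1, 2):
    print(v, value_to_rgb(v, 2))
```

Large magnitudes therefore come out as bright red or green, while values near zero stay close to white, matching the description of the plot.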