Analysis of Clustering Algorithms

Ethan Summers and Kathryn Cooper
College of Information Science and Technology
University of Nebraska at Omaha, Omaha, NE 68182

ABSTRACT

In bioinformatics, choosing the right algorithm for a problem is very important. Choosing the wrong algorithm, or one that is less efficient, can make or break a project, so analyzing algorithms beforehand is key. The goal of this project is to analyze three clustering algorithms for protein-protein interaction (PPI) networks and compare their function and results. A clustering algorithm takes a dataset, in this case a simulated PPI network, and groups together similar data points based on some similarity criterion. It is important to know the differences between these algorithms in order to get the desired results.

Results

Algorithm   Clusters   Nodes in Clusters                            Unclustered nodes
K-medoid    3          Cluster 0: 8, Cluster 1: 10, Cluster 2: 8    0
MCODE       2          Cluster 0: 5, Cluster 1: 4                   17
MCL         3          Cluster 0: 19, Cluster 1: 3, Cluster 2: 2    2

[Figure: K-medoid clustering. Cluster 0: Red (color CC0000), Cluster 1: Green, Cluster 2: Purple]

K-medoid
- Even clusters
- Forced cluster amount
- No unclustered nodes
- Seems to be good if there needs to be a specific number of clusters and all nodes must be used

MCODE
- Even clusters
- Fewer clusters
- Many unclustered nodes
- Seems to be good for pulling out specialized clusters, only where there is a connection

MCL
- Skewed nodes
- Very large single cluster
- Few unclustered nodes
- Seems to create one large cluster; this could be due to it being the first

Methods

- PPI data was simulated by random assignment of both values and edges.
- A number of network clustering algorithms were investigated for use. Many of these were not used, including WDCM, MSARC, e-CCC biclustering, and SPICI.
- Ultimately, three algorithms were chosen and run using Cytoscape. Cytoscape is a UI that was used to run the algorithms on a randomly generated list of nodes and edges.
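The random simulation described in Methods could be sketched roughly as follows. This is a minimal illustration and not the authors' actual procedure: the function names `simulate_ppi` and `to_edge_table`, the node labels A through Z, the edge count of 26, the fixed seed, and the rounding of weights to one decimal are all assumptions made for the example. The output is a tab-separated edge list that a network tool could import as a table.

```python
import random
import string


def simulate_ppi(nodes, n_edges, seed=None):
    """Return n_edges random (node1, node2, weight) tuples.

    Hypothetical helper: pairs are distinct and contain no self-loops;
    weights are uniform random values rounded to one decimal (0.0 to 1.0).
    """
    rng = random.Random(seed)
    # Every unordered pair of distinct nodes is a candidate edge.
    pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
    chosen = rng.sample(pairs, n_edges)
    return [(a, b, round(rng.random(), 1)) for a, b in chosen]


def to_edge_table(edges):
    """Format edges as a tab-separated table with a header row."""
    lines = ["Node1\tNode2\tEdge"]
    lines += [f"{a}\t{b}\t{w}" for a, b, w in edges]
    return "\n".join(lines)


nodes = list(string.ascii_uppercase)             # assumed labels: A through Z
edges = simulate_ppi(nodes, n_edges=26, seed=0)  # assumed edge count and seed
print(to_edge_table(edges))
```

Fixing the seed keeps the simulated network reproducible, so the same input can be fed to each clustering algorithm being compared.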
The algorithms used were K-medoids, MCODE, and MCL. These algorithms were chosen for their usability and ease of use with existing file formats.

"Base" network: a randomized set with 26 nodes and edges, with edge weights chosen randomly between 0 and 1.

[Table: edge list of the base network, with columns Node1, Node2, and Edge (weight)]

[Figure: MCODE clustering. Cluster 0: Red, Cluster 1: Green]

[Figure: MCL clustering. Cluster 0: Red, Cluster 1: Green, Cluster 2: Purple]

Conclusion and Future Directions

- There is no set format to be used for this type of system, so finding a GUI is very helpful.
- There is a general sense of people making systems only for their own work, due to a lack of examples, guides, and standardization.
- For future work, look for software platforms like Cytoscape rather than individual tools.

References

Bader, G. D., & Hogue, C. W. (2003). An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 4(1), 2.

Morris, J. H., Apeltsin, L., Newman, A. M., Baumbach, J., Wittkop, T., Su, G., ... & Ferrin, T. E. (2011). clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics, 12(1), 436.

Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., ... & Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research, 13(11),
