Data Mining Workbenches: An Overview & Comparison Focusing on Open-Source Packages
CS240A notes by C. Zaniolo
Most Popular Data Mining Software
The Rexer Analytics Survey (early 2007) asked data miners which tools they use often and which they use occasionally. Clearly more popular than the rest were:
- SPSS or SPSS Clementine
- "Own Code"
- SAS or SAS Enterprise Miner
Followed by:
- R
- Weka
- C4.5 / C5.0
Critical Mass and Popularity
Top ten most used packages in the KDnuggets Survey (May 2007):
- SPSS / SPSS Clementine
- Salford Systems CART/MARS/TreeNet/RF
- Yale (now RapidMiner)
- SAS / SAS Enterprise Miner
- Angoss Knowledge Studio / Knowledge Seeker
- KXEN
- Weka
- R
- Microsoft SQL Server?
- MATLAB?
Note: Microsoft Excel is omitted because it is not really "data mining" software, and the tools offered by a single vendor (SPSS and SAS) have been merged. The full survey results are available from KDnuggets.
Comments
Gregory Piatetsky-Shapiro, KDnuggets Editor: Votes from tool vendors were removed. Compared with the 2008 KDnuggets poll on data mining tools/software used, the big changes are the growth of SPSS, RapidMiner, and R.
Popular Data Mining Software (cont.)
The Rexer Analytics Survey is conducted every year, and its summary report can be obtained free of charge.
2009 SURVEY HIGHLIGHTS:
- The open-source tools Weka and R moved substantially up the data miners' tool rankings this year, and are now used by large numbers of both academic and for-profit data miners.
- SAS Enterprise Miner dropped in the data miners' tool rankings.
2010 SURVEY HIGHLIGHTS:
- R: After a steady rise over the past few years, R overtook the other tools to become the tool used by the most data miners (43%).
- STATISTICA has also been climbing in the rankings.
- STATISTICA, IBM SPSS Modeler, and R received the strongest satisfaction ratings in both 2010 and 2009.