1
Apache Mahout
Qiaodi Zhuang, Xijing Zhang
2
What is Mahout? Mahout is a scalable machine learning library from Apache. It uses the MapReduce paradigm, which in combination with Hadoop offers an inexpensive way to solve machine learning problems. [1] Anil, Robin, Ted Dunning, and Ellen Friedman. Mahout in Action. Manning, 2011.
3
Problem & Challenge: Many datasets are now far too large for a single machine and cannot fit into main memory [2].
4
Mahout’s Algorithms:
Clustering: K-means, Fuzzy K-means
Classification: SVM, Random Forests
Recommender
Pattern Mining
Regression
5
K-means Algorithm:
Input: a database D of m records r1, ..., rm, and a desired number of clusters k
Output: a set of k clusters that minimizes the squared-error criterion
Begin
  Randomly choose k records as the centroids for the k clusters;
  repeat
    assign each record ri to the cluster whose centroid (mean) is closest to ri among the k clusters;
    recalculate the centroid (mean) of each cluster from the records assigned to that cluster;
  until no change;
End;
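The pseudocode above can be sketched in plain Python as follows. This is a minimal single-machine illustration, not Mahout code. The function names `kmeans` and `squared_distance`, the `seed` parameter, and the `max_iter` safety cap are assumptions added for the example and are not part of the slide.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(records, k, seed=0, max_iter=100):
    """Cluster `records` into k groups, following the slide's pseudocode.

    `seed` and `max_iter` are illustrative additions: the first makes the
    random start reproducible, the second caps the number of passes.
    """
    rng = random.Random(seed)
    # Randomly choose k records as the initial centroids.
    centroids = [list(r) for r in rng.sample(records, k)]
    assignment = None

    for _ in range(max_iter):
        # Assign each record to the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(r, centroids[j]))
            for r in records
        ]
        # "until no change": stop once no record switches cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment

        # Recalculate each centroid as the mean of its assigned records.
        for j in range(k):
            members = [r for r, a in zip(records, assignment) if a == j]
            if members:  # leave the centroid of an empty cluster unchanged
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

    return centroids, assignment
```

Calling `kmeans(records, 2)` on a list of numeric tuples returns the final centroids and, for each record, the index of its cluster. Like the pseudocode, the result depends on the random starting centroids.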
6
K-means Clustering in Mahout
[3] K-means Clustering in the Cloud -- A Mahout Test, R. M. Esteves et al., IEEE Advanced Information Networking and Applications, 2011.
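One common way to fit a k-means iteration into the MapReduce paradigm is to let the map step assign each record to its nearest centroid and the reduce step average the records of each cluster. The sketch below simulates that split in a single Python process. It is an assumption about the general approach, not Mahout's actual API or the setup used in [3], and every function name in it is hypothetical.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def map_phase(points, centroids):
    """Map step: emit one (cluster_id, point) pair per input record."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Reduce step: average the points that share a cluster_id."""
    groups = defaultdict(list)
    for cluster_id, point in pairs:
        groups[cluster_id].append(point)

    new_centroids = list(centroids)  # clusters with no points keep their centroid
    for cluster_id, members in groups.items():
        new_centroids[cluster_id] = tuple(
            sum(col) / len(members) for col in zip(*members)
        )
    return new_centroids


def kmeans_iteration(splits, centroids):
    """Run one k-means pass over several input splits.

    Each split stands in for a block of the dataset that a separate
    mapper would process; here the splits are processed sequentially.
    """
    pairs = []
    for split in splits:
        pairs.extend(map_phase(split, centroids))
    return reduce_phase(pairs, centroids)
```

A driver would call `kmeans_iteration` repeatedly, feeding the returned centroids back in until they stop changing. In a real Hadoop job the map calls would run in parallel on different machines, which is what lets the data exceed a single machine's memory.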
7
Evaluation: The dataset is from the 1999 KDD Cup.
It has 4,940,000 records, each with 41 attributes and 1 label (converted to numeric form). A 1.1 GB dataset was used. This file was randomly segmented into smaller files [3].
8
[3] K-means Clustering in the Cloud -- A Mahout Test, R. M. Esteves et al., IEEE Advanced Information Networking and Applications, 2011.
9
Future
Classification: decision trees such as J48 and ID3
Clustering: DBSCAN and Cobweb clustering techniques
Association Rules: Apriori
10
References:
[1] Anil, Robin, Ted Dunning, and Ellen Friedman. Mahout in Action. Manning, 2011.
[2]
[3] K-means Clustering in the Cloud -- A Mahout Test, R. M. Esteves et al., IEEE Advanced Information Networking and Applications, 2011.
[4]
[5]
11
Questions?
12
Thank you!