Apache Mahout Qiaodi Zhuang Xijing Zhang.


1 Apache Mahout Qiaodi Zhuang Xijing Zhang

2 What is Mahout? Mahout is a scalable machine learning library from Apache. It uses the MapReduce paradigm, which, combined with Hadoop, offers an inexpensive way to solve machine learning problems.
[1]. Anil, Robin, Ted Dunning, and Ellen Friedman. Mahout in Action. Manning, 2011.

3 Problem & Challenge
Many datasets are now far too large for a single machine and cannot fit into main memory [2].

4 Mahout’s Algorithms:
Clustering: K-means, Fuzzy K-means
Classification: SVM, Random Forests
Recommender
Pattern Mining
Regression

5 K-means Algorithm:
Input: a database D of m records r1, ..., rm, and a desired number of clusters k
Output: a set of k clusters that minimizes the squared-error criterion
Begin
  Randomly choose k records as the centroids for the k clusters;
  repeat
    assign each record ri to the cluster whose centroid (mean) is closest to ri among the k clusters;
    recalculate the centroid (mean) of each cluster from the records assigned to that cluster;
  until no change;
End;
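The pseudocode on this slide can be sketched in plain Python as follows. This is a minimal single-machine illustration, not Mahout code: the function names, the `seed` and `max_iter` parameters, and the choice of squared Euclidean distance are assumptions added for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of records."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(records, k, seed=0, max_iter=100):
    """Return (centroids, assignment) for the given records.

    seed and max_iter are assumptions for reproducibility and safety;
    the slide's pseudocode simply loops "until no change".
    """
    rng = random.Random(seed)
    # Randomly choose k records as the initial centroids.
    centroids = rng.sample(records, k)
    assignment = None
    for _ in range(max_iter):
        # Assign each record to the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(r, centroids[c]))
            for r in records
        ]
        # Stop when no record changed cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recalculate each centroid from the records assigned to it.
        for c in range(k):
            members = [r for r, a in zip(records, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return centroids, assignment
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` separates the three points near the origin from the three points near (10, 10). Which cluster gets index 0 depends on the random initial choice.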

6 K-means Clustering in Mahout
[3]. K-means Clustering in the Cloud -- A Mahout Test, R. M. Esteves et al., IEEE Advanced Information Networking and Applications, 2011.
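As a rough sketch of how one k-means iteration can be phrased in map/reduce terms, the plain-Python example below assigns records to centroids in a "map" step and averages each group in a "reduce" step. It does not use Mahout's or Hadoop's API; every function name here is hypothetical and the whole thing runs in a single process.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(records, centroids):
    """Map: emit (index of nearest centroid, record) for every record."""
    for record in records:
        nearest = min(
            range(len(centroids)),
            key=lambda c: squared_distance(record, centroids[c]),
        )
        yield nearest, record


def reduce_phase(pairs, centroids):
    """Reduce: average all records that share a centroid index."""
    groups = defaultdict(list)
    for index, record in pairs:
        groups[index].append(record)
    # Centroids that received no records keep their previous position.
    new_centroids = list(centroids)
    for index, members in groups.items():
        new_centroids[index] = tuple(
            sum(col) / len(members) for col in zip(*members)
        )
    return new_centroids


def kmeans_iteration(records, centroids):
    """One full iteration: map every record, then reduce per centroid."""
    return reduce_phase(map_phase(records, centroids), centroids)
```

A driver would call `kmeans_iteration` repeatedly, feeding each result back in as the next set of centroids, until the centroids stop changing. In a real cluster the map step would run in parallel over chunks of the input file.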

7 Evaluation
The dataset is from the 1999 KDD Cup. It has 4,940,000 records, each with 41 attributes and 1 label (converted to numerical values). A 1.1 GB dataset was used. This file was randomly segmented into smaller files.
[3]. K-means Clustering in the Cloud -- A Mahout Test, R. M. Esteves et al., IEEE Advanced Information Networking and Applications, 2011.

8 [3]. K-means Clustering in the Cloud -- A Mahout Test, R. M. Esteves et al., IEEE Advanced Information Networking and Applications, 2011.

9 Future
Classification: Decision trees such as J48 and ID3
Clustering: DBSCAN and Cobweb clustering techniques
Association Rules: Apriori

10 References:
[1]. Anil, Robin, Ted Dunning, and Ellen Friedman. Mahout in Action. Manning, 2011.
[2].
[3]. K-means Clustering in the Cloud -- A Mahout Test, R. M. Esteves et al., IEEE Advanced Information Networking and Applications, 2011.
[4].
[5].

11 Questions?

12 Thank you!

