Redpoll: A machine learning library based on Hadoop. Jeremy, CS Dept., Jinan University, Guangzhou.

1 Redpoll: A machine learning library based on Hadoop. Jeremy Chow (coderplay@gmail.com), CS Dept., Jinan University, Guangzhou

2 Introduction: What is Redpoll? Who will use Redpoll? Motivation: the challenge posed by large-scale datasets; more practical for mining textual corpora; close to us Chinese users; Apache licensed.

3 Basic principles: decomposition of the work into mappers and a reducer. Assume that we have a set of m data points, each of length n.
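As a rough illustration of this decomposition, the sketch below assumes the quantity to compute can be written as a sum over the m data points (here, their mean). It is a plain-Python simulation of the mapper/reducer split, not Redpoll's Java API; `split_data`, `mapper`, and `reducer` are hypothetical names.

```python
from typing import List, Tuple

Vector = List[float]


def split_data(points: List[Vector], num_mappers: int) -> List[List[Vector]]:
    """Partition the m points into at most num_mappers contiguous splits."""
    size = -(-len(points) // num_mappers)  # ceiling division
    return [points[i:i + size] for i in range(0, len(points), size)]


def mapper(split: List[Vector]) -> Tuple[Vector, int]:
    """Emit the partial sum of the points in one split, plus their count."""
    partial = [0.0] * len(split[0])
    for point in split:
        for j, value in enumerate(point):
            partial[j] += value
    return partial, len(split)


def reducer(partials: List[Tuple[Vector, int]]) -> Vector:
    """Combine the per-mapper partial sums into the global mean vector."""
    total = [0.0] * len(partials[0][0])
    count = 0
    for partial, c in partials:
        for j, value in enumerate(partial):
            total[j] += value
        count += c
    return [t / count for t in total]


points = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # m = 4, n = 2
partials = [mapper(s) for s in split_data(points, num_mappers=2)]
print(reducer(partials))  # [4.0, 5.0]
```

With p mappers, each mapper reads about m/p points but ships only one length-n vector and a count, so the reducer receives roughly p·n numbers instead of m·n. That is why statistics expressible as sums over data points tend to decompose well on Hadoop.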

4 Performance bottlenecks: network bandwidth; I/O speed; algorithm implementations; Hadoop itself.

5 Current work: Vector Writable utils; Distance Measure utils; Naive Bayes; Canopy; K-means; an infrastructure for textual data mining (DM); an example of mining Sogou news.
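Two items on this list, the distance measure utilities and K-means, fit together naturally, because K-means only needs a pluggable distance function. The sketch below simulates one K-means iteration as a MapReduce round in plain Python, with the distance measure passed in as a function. It is not Redpoll's API: `kmeans_map`, `kmeans_reduce`, and `kmeans_iteration` are hypothetical names, and Euclidean and Manhattan distances are just two example measures.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
DistanceMeasure = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans_map(point: Vector, centers: List[Vector],
               measure: DistanceMeasure) -> Tuple[int, Tuple[List[float], int]]:
    """Emit (index of the nearest center, (point, 1)) for one input point."""
    nearest = min(range(len(centers)), key=lambda i: measure(point, centers[i]))
    return nearest, (list(point), 1)


def kmeans_reduce(values: List[Tuple[List[float], int]]) -> List[float]:
    """Average all points assigned to one center to get its new position."""
    total = [0.0] * len(values[0][0])
    count = 0
    for vec, c in values:
        for j, x in enumerate(vec):
            total[j] += x
        count += c
    return [t / count for t in total]


def kmeans_iteration(points: List[Vector], centers: List[Vector],
                     measure: DistanceMeasure = euclidean) -> List[List[float]]:
    """One K-means iteration expressed as a map phase, a shuffle, and a reduce phase."""
    grouped: Dict[int, List[Tuple[List[float], int]]] = defaultdict(list)
    for p in points:  # map phase; grouping by key simulates the shuffle
        key, value = kmeans_map(p, centers, measure)
        grouped[key].append(value)
    new_centers = [list(c) for c in centers]  # a center with no points keeps its position
    for key, values in grouped.items():  # reduce phase, one call per center
        new_centers[key] = kmeans_reduce(values)
    return new_centers


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(kmeans_iteration(points, [[0.0, 0.0], [10.0, 10.0]]))  # [[0.0, 0.5], [10.0, 10.5]]
```

A driver would presumably repeat this round until the centers stop moving, making the new centers available to the next round's mappers.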

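6 An example: Canopy. Clustering of large, high-dimensional datasets; two different distance measures (a cheap, approximate one and an expensive, accurate one); two stages (cheaply group the points into overlapping canopies, then run the expensive clustering only among points that share a canopy); computation savings; applicable in many domains and with many clustering methods (EM, GAC, K-means).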
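The first stage can be sketched as follows, assuming the classic formulation of canopy clustering with a loose threshold T1 and a tight threshold T2 (T1 > T2). Manhattan distance stands in for the cheap metric purely for illustration, and `canopy` and `cheap_distance` are hypothetical names rather than Redpoll's API. In the second stage, not shown, an expensive method such as K-means or EM would compute exact distances only between points that share a canopy.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cheap_distance(a: Vector, b: Vector) -> float:
    """Stand-in for the cheap, approximate metric: Manhattan (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def canopy(points: List[Vector], t1: float, t2: float,
           distance: Callable[[Vector, Vector], float] = cheap_distance) -> List[List[int]]:
    """Stage 1: group point indices into possibly overlapping canopies.

    t1 is the loose threshold (canopy membership) and t2 the tight one
    (removal from the candidate list); the method requires t1 > t2 > 0.
    """
    if not 0 < t2 < t1:
        raise ValueError("canopy clustering requires t1 > t2 > 0")
    remaining = list(range(len(points)))  # indices that may still become centers
    canopies: List[List[int]] = []
    while remaining:
        center = remaining[0]  # take the next point off the list as a new center
        # Every point within t1 of the center joins its canopy, even if it was
        # already removed from the list; this is what makes canopies overlap.
        canopies.append([i for i in range(len(points))
                         if distance(points[center], points[i]) < t1])
        # Points within t2 (including the center itself) can no longer start a canopy.
        remaining = [i for i in remaining
                     if i != center and distance(points[center], points[i]) >= t2]
    return canopies


points = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [20, 20]]
print(canopy(points, t1=3.0, t2=2.5))  # [[0, 1, 2], [3, 4], [5]]
```

The savings come from the loop touching each candidate with the cheap metric only. Points in different canopies are never compared with the expensive metric in stage 2.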

7 An example: Canopy (cont'd). CanopyDriver: input → CanopyMapper → CanopyReducer → output. ClusterDriver & ClusterMapper: assign each point to canopies.
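The data flow on this slide can be simulated in plain Python, as a sketch under assumptions. The functions only mirror the roles named on the slide (CanopyMapper, CanopyReducer, ClusterMapper, and the two drivers). The sketch assumes that each mapper emits the center points of its local canopies and that a single reducer re-runs canopy selection over all of them. Manhattan distance is an arbitrary choice, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Cheap distance used in every stage (Manhattan); an assumption of this sketch."""
    return sum(abs(x - y) for x, y in zip(a, b))


def canopy_centers(points: List[Vector], t2: float) -> List[Vector]:
    """Sequential canopy selection: keep taking the next point as a center and
    drop every remaining point within t2 of it."""
    remaining = list(points)
    centers: List[Vector] = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if distance(center, p) >= t2]
    return centers


def canopy_mapper(split: List[Vector], t2: float) -> List[Vector]:
    """CanopyMapper role: find local canopy centers within one input split."""
    return canopy_centers(split, t2)


def canopy_reducer(local_centers: List[Vector], t2: float) -> List[Vector]:
    """CanopyReducer role: merge all local centers by running canopy selection on them."""
    return canopy_centers(local_centers, t2)


def cluster_mapper(point: Vector, centers: List[Vector], t1: float) -> List[int]:
    """ClusterMapper role: assign a point to every canopy whose center is within t1."""
    return [i for i, c in enumerate(centers) if distance(point, c) < t1]


def run_canopy_job(splits: List[List[Vector]], t1: float,
                   t2: float) -> Tuple[List[Vector], List[List[int]]]:
    """Driver roles: one canopy round, then one point-assignment round."""
    if not 0 < t2 < t1:
        raise ValueError("canopy clustering requires t1 > t2 > 0")
    local = [c for split in splits for c in canopy_mapper(split, t2)]
    centers = canopy_reducer(local, t2)  # a single reducer sees every local center
    assignments = [cluster_mapper(p, centers, t1) for split in splits for p in split]
    return centers, assignments


splits = [[[0.0], [1.0], [10.0]], [[0.5], [10.5], [20.0]]]  # two input splits
centers, assignments = run_canopy_job(splits, t1=3.0, t2=2.0)
print(centers)      # [[0.0], [10.0], [20.0]]
print(assignments)  # [[0], [0], [1], [0], [1], [2]]
```

Because one reducer must see every local center, this layout assumes the number of canopies is small compared with the data. The assignment round, by contrast, is fully parallel, since each point is handled independently.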

8 What's next? SVM (Support Vector Machine): fast in training and prediction; optimal hyperplane; kernels; duality; decomposition; a parallelized approach.
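Duality and kernels meet in the SVM decision function. In the dual form, a new point x is scored only through kernel evaluations against the support vectors: f(x) = Σ_i α_i y_i K(x_i, x) + b. The sketch below covers prediction only, using hypothetical function names and hand-picked α values. Training (solving the dual problem for α, which is where decomposition methods such as SMO come in) is not shown.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a: Vector, b: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def decision_function(x: Vector, support_vectors: List[Vector], labels: List[int],
                      alphas: List[float], b: float, kernel: Kernel) -> float:
    """Dual-form SVM score: sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b


def predict(x: Vector, support_vectors: List[Vector], labels: List[int],
            alphas: List[float], b: float, kernel: Kernel = linear_kernel) -> int:
    """Classify x as +1 or -1 by the sign of the dual decision function."""
    score = decision_function(x, support_vectors, labels, alphas, b, kernel)
    return 1 if score >= 0 else -1


# Hard-margin solution for two points, (1, 1) labelled +1 and (-1, -1) labelled -1:
# alphas = 0.25 each and b = 0 give w = (0.5, 0.5), so both points lie on the margin.
sv, y, alpha = [[1.0, 1.0], [-1.0, -1.0]], [1, -1], [0.25, 0.25]
print(decision_function([2.0, 3.0], sv, y, alpha, 0.0, linear_kernel))  # 2.5
print(predict([-1.0, 0.0], sv, y, alpha, 0.0))                          # -1
```

Swapping `linear_kernel` for `rbf_kernel` changes the decision boundary without touching the rest of the code, which is the practical payoff of the dual form. Prediction also parallelizes trivially, because each mapper can score its own split of the test data.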

9 Planned algorithms: EM (Expectation Maximization); LSI (Latent Semantic Indexing); SVD (Singular Value Decomposition); PCA (Principal Component Analysis); PageRank; KNN (k-Nearest Neighbors); linear regression; and more.
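PageRank, one of the planned algorithms, maps onto the same MapReduce pattern: each power-iteration step is one round. The sketch below is a plain-Python simulation with hypothetical function names. It uses the commonly cited damping factor d = 0.85 and a fixed iteration count. It also assumes every page has at least one out-link and links only to pages in the graph, because this simplified form does not redistribute the rank of dangling pages.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]


def pagerank_map(node: str, rank: float, out_links: List[str]) -> List[Tuple[str, float]]:
    """Split a page's current rank evenly among the pages it links to."""
    share = rank / len(out_links)
    return [(target, share) for target in out_links]


def pagerank_reduce(shares: List[float], num_pages: int, d: float) -> float:
    """New rank of one page: teleport term plus damped sum of incoming shares."""
    return (1 - d) / num_pages + d * sum(shares)


def pagerank_iteration(graph: Graph, ranks: Dict[str, float],
                       d: float = 0.85) -> Dict[str, float]:
    """One power-iteration step: map every page, shuffle by target, reduce per page."""
    incoming: Dict[str, List[float]] = defaultdict(list)  # shuffle: target -> shares
    for node, out_links in graph.items():
        for target, share in pagerank_map(node, ranks[node], out_links):
            incoming[target].append(share)
    return {node: pagerank_reduce(incoming[node], len(graph), d) for node in graph}


def pagerank(graph: Graph, iterations: int = 50, d: float = 0.85) -> Dict[str, float]:
    """Run a fixed number of iterations from a uniform starting distribution."""
    if any(not links for links in graph.values()):
        raise ValueError("this sketch assumes every page has at least one out-link")
    if any(t not in graph for links in graph.values() for t in links):
        raise ValueError("this sketch assumes links point only to pages in the graph")
    ranks = {node: 1.0 / len(graph) for node in graph}
    for _ in range(iterations):  # one MapReduce round per iteration
        ranks = pagerank_iteration(graph, ranks, d)
    return ranks


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(graph))
```

Under these assumptions the ranks keep summing to 1 after every round, which makes a cheap sanity check when the job runs on a cluster.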

10 You are welcome to join us! Development; documentation; source code management; suggestions; anything else that can help us.

11 http://code.google.com/p/redpoll Check it out!

