Explorations into Internet Distributed Computing Kunal Agrawal, Ang Huey Ting, Li Guoliang, and Kevin Chu
Project Overview
- Design and implement a simple internet distributed computing framework
- Compare application development in this environment with a traditional parallel computing environment
Grapevine An Internet Distributed Computing Framework - Kunal Agrawal, Kevin Chu
What is Internet Distributed Computing?
Motivation
- Supercomputers are very expensive
- Large numbers of personal computers and workstations around the world are naturally networked via the internet
- Huge amounts of computational resources are wasted because many computers spend most of their time idle
- Growing interest in grid computing technologies
Other Distributed Computing Efforts
Internet Distributed Computing Issues
- Node reliability
- Network quality
- Scalability
- Security
- Cross-platform portability of object code
- Computing paradigm shift
Overview Of Grapevine
[Architecture diagram: a Client Application submits work to the Grapevine Server, which distributes tasks to multiple Grapevine Volunteers.]
Grapevine Features
- Written in Java
- Parameterized tasks
- Inter-task communication
- Result reporting
- Status reporting
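Grapevine's actual API is not shown in these slides; as a hypothetical sketch (the `Task` interface and `SquareTask` names are illustrative, not Grapevine's real classes), a parameterized task might be a serializable unit of work that the server can ship to a volunteer node:

```java
import java.io.Serializable;

// Hypothetical sketch of a Grapevine-style parameterized task.
// These names are illustrative, not Grapevine's actual API.
public class GrapevineTaskSketch {

    // A task carries its own parameters and is serializable so the
    // server could ship it over the network to a volunteer node.
    interface Task<R> extends Serializable {
        R run();
    }

    // Example: a task parameterized by an integer, returning its square.
    static class SquareTask implements Task<Integer> {
        private final int n;
        SquareTask(int n) { this.n = n; }
        public Integer run() { return n * n; }
    }

    public static void main(String[] args) {
        Task<Integer> t = new SquareTask(7);
        // In Grapevine, a volunteer would execute run() and report the
        // result back to the server (result reporting).
        System.out.println("result = " + t.run());
    }
}
```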
Unaddressed Issues
- Node reliability
- Load balancing
- Unintrusive operation
- Interruption semantics
- Deadlock
Meta Classifier - Ang Huey Ting, Li Guoliang
Classifier
- Function(instance) = {True, False}
- Machine learning approach: build a model on the training set, then use the model to classify new instances
- Publicly available packages: WEKA (in Java), MLC++
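The classifier abstraction above can be illustrated in a few lines of Java. This toy one-feature threshold learner is a stand-in for the models WEKA or MLC++ would actually build; the interface and method names are assumptions for illustration:

```java
import java.util.Arrays;

// Toy illustration of f(instance) -> {true, false}: train a model on
// labeled data, then use it to classify new instances. A real system
// would use a WEKA/MLC++ learner instead of this threshold rule.
public class ClassifierSketch {

    interface Classifier {
        boolean classify(double x);
    }

    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(0.0);
    }

    // "Training": place the decision threshold midway between the
    // means of the positive and negative examples.
    static Classifier train(double[] positives, double[] negatives) {
        double threshold = (mean(positives) + mean(negatives)) / 2.0;
        boolean positivesAbove = mean(positives) > mean(negatives);
        return x -> positivesAbove ? x >= threshold : x < threshold;
    }

    public static void main(String[] args) {
        Classifier c = train(new double[]{5, 6, 7}, new double[]{1, 2, 3});
        System.out.println(c.classify(6.5)); // true
        System.out.println(c.classify(1.5)); // false
    }
}
```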
Meta Classifier
- An assembly (ensemble) of classifiers
- Gives better performance than a single classifier
- Two ways of generating an assembly of classifiers: different training data sets, or different algorithms
- Predictions are combined by voting
Building a Meta Classifier
- Different training data sets - bagging: randomly generated 'bags', selected with replacement, create different 'flavors' of the training set
- Different algorithms, e.g. Naïve Bayesian, neural net, SVM; different algorithms work well on different training sets
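The bag-generation step above can be sketched directly: each bag has the same size as the training set and is drawn by selection with replacement, so each bag is a different "flavor" of the data (the method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of bagging's bag generation: sample the training set with
// replacement, producing bags the same size as the original set.
public class BaggingSketch {

    static <T> List<T> makeBag(List<T> trainingSet, Random rng) {
        List<T> bag = new ArrayList<>(trainingSet.size());
        for (int i = 0; i < trainingSet.size(); i++) {
            // Selection WITH replacement: the same example may be
            // drawn several times, others may be left out entirely.
            bag.add(trainingSet.get(rng.nextInt(trainingSet.size())));
        }
        return bag;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5);
        Random rng = new Random(42);
        for (int b = 0; b < 3; b++) {
            System.out.println("bag " + b + ": " + makeBag(data, rng));
        }
    }
}
```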
Why Parallelise?
- Computationally intensive: one classifier = 0.5 hr, so a meta classifier (assembly of 10 classifiers) = 10 × 0.5 = 5 hr sequentially
- Distributed environment - Grapevine: build the classifiers in parallel and independently; little communication required
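The speedup argument rests on the classifier builds being independent. This local sketch uses a Java thread pool in place of Grapevine volunteers (the `buildClassifier` stand-in and all names are assumptions): ten build jobs run concurrently, and only their results are collected:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Local stand-in for farming out independent classifier builds:
// a thread pool plays the role of the Grapevine volunteers.
public class ParallelBuildSketch {

    // Stand-in for a 0.5-hour classifier build on one bag.
    static String buildClassifier(int bagId) {
        return "classifier-" + bagId;
    }

    static List<String> buildAll(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        List<Future<String>> futures = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int bagId = i;
            // Each build is independent: no communication between tasks.
            futures.add(pool.submit(() -> buildClassifier(bagId)));
        }
        List<String> results = new ArrayList<>();
        try {
            for (Future<String> f : futures) results.add(f.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(buildAll(10));
    }
}
```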
Distributed Meta Classifiers
- WEKA: machine learning package from the University of Waikato, New Zealand
- Implemented in Java
- Includes most of the popular machine learning algorithms
Distributed Meta-Classifiers on Grapevine: Distributed Bagging
- Generate the different bags
- Define the bag and algorithm for each task
- Submit the tasks to Grapevine
- Nodes build the classifiers
- Receive the results
- Perform voting
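The final step of the workflow above, combining the returned classifiers' predictions, can be a simple majority vote (a minimal sketch; ties here default to false):

```java
import java.util.List;

// Last step of distributed bagging: once the volunteers' classifiers
// have each predicted on an instance, combine by majority vote.
public class VotingSketch {

    static boolean majorityVote(List<Boolean> predictions) {
        long yes = predictions.stream().filter(p -> p).count();
        // Strict majority of "true" votes wins; a tie yields false.
        return yes * 2 > predictions.size();
    }

    public static void main(String[] args) {
        System.out.println(majorityVote(List.of(true, true, false)));  // true
        System.out.println(majorityVote(List.of(true, false, false))); // false
    }
}
```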
Preliminary Study
- Bagging on quick propagation, implemented in C with OpenMP
Trial Domain
- Benchmark corpus: Reuters-21578 for text categorization
- Training and test documents, 90+ categories
- Perform feature selection
- Preprocess documents into feature vectors
Summary
- Successful internet distributed computing requires addressing many issues outside of traditional computer science
- Distributed computing is not for everyone