Explorations into Internet Distributed Computing
Kunal Agrawal, Ang Huey Ting, Li Guoliang, and Kevin Chu

Project Overview
Design and implement a simple internet distributed computing framework, and compare application development for this environment with application development for traditional parallel computing environments.

Grapevine: An Internet Distributed Computing Framework (Kunal Agrawal, Kevin Chu)

What is Internet Distributed Computing?

Motivation
Supercomputers are very expensive. Large numbers of personal computers and workstations around the world are already networked via the internet. Huge amounts of computational resources are wasted because many computers spend most of their time idle. There is growing interest in grid computing technologies.

Other Distributed Computing Efforts

Internet Distributed Computing Issues
Node reliability, network quality, scalability, security, cross-platform portability of object code, and a shift in computing paradigm.

Overview of Grapevine
[Figure: architecture diagram showing a client application, the Grapevine server, and three Grapevine volunteers]

Grapevine Features
Written in Java; parameterized tasks; inter-task communication; result reporting; status reporting.
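
The client/server/volunteer structure and some of the features above can be illustrated with a minimal, in-process Python sketch. It is not Grapevine, which is written in Java and runs across the internet. In this sketch, threads stand in for volunteer machines. The names `Task`, `Server`, `volunteer`, `run_on_volunteers`, and `square` are hypothetical. The sketch models parameterized tasks, result reporting, and status reporting only; inter-task communication is not modeled.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Task:
    """A parameterized task: a function plus keyword parameters for it."""
    task_id: int
    func: Callable[..., Any]
    params: Dict[str, Any] = field(default_factory=dict)


class Server:
    """Hypothetical stand-in for the Grapevine server: queues tasks, collects reports."""

    def __init__(self) -> None:
        self.pending: queue.Queue = queue.Queue()
        self.results: Dict[int, Any] = {}
        self.status: Dict[int, str] = {}
        self._lock = threading.Lock()

    def submit(self, task: Task) -> None:
        """Called by the client application to hand a task to the server."""
        self.report_status(task.task_id, "queued")
        self.pending.put(task)

    def report_status(self, task_id: int, state: str) -> None:
        """Status reporting: volunteers tell the server what they are doing."""
        with self._lock:
            self.status[task_id] = state

    def report_result(self, task_id: int, value: Any) -> None:
        """Result reporting: store the value and mark the task as done."""
        with self._lock:
            self.results[task_id] = value
            self.status[task_id] = "done"


def volunteer(server: Server) -> None:
    """A volunteer pulls tasks until the queue is empty, running each one."""
    while True:
        try:
            task = server.pending.get_nowait()
        except queue.Empty:
            return  # a real volunteer would keep polling the server instead
        server.report_status(task.task_id, "running")
        server.report_result(task.task_id, task.func(**task.params))


def run_on_volunteers(server: Server, n_volunteers: int = 3) -> None:
    """Start n_volunteers threads (standing in for remote machines) and wait."""
    workers = [threading.Thread(target=volunteer, args=(server,))
               for _ in range(n_volunteers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    server = Server()
    for i in range(5):
        server.submit(Task(i, square, {"x": i}))  # parameterized task
    run_on_volunteers(server)
    print(sorted(server.results.items()))  # [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
```

A pull model, where volunteers ask the server for work, is one plausible design for machines behind firewalls. The slides do not state which model Grapevine uses.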

Unaddressed Issues
Node reliability, load balancing, non-intrusive operation, interruption semantics, and deadlock.

Meta Classifier (Ang Huey Ting, Li Guoliang)

Classifier
A classifier is a function that maps an instance to True or False: Function(instance) ∈ {True, False}.
Machine learning approach: build a model on the training set, then use the model to classify new instances.
Publicly available packages: WEKA (in Java) and MLC++.
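
To illustrate "build a model on the training set, then use it to classify new instances", here is a tiny binary classifier. The nearest-centroid algorithm is chosen only because it is short. It is not an API of WEKA or MLC++, and `train_nearest_centroid` and its helpers are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, bool]  # (feature vector, True/False label)


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_centroid(training_set: List[Example]) -> Callable[[Vector], bool]:
    """Build a model (one centroid per class) and return it as a classifier.

    Assumes the training set contains at least one True and one False example.
    """
    c_true = centroid([x for x, label in training_set if label])
    c_false = centroid([x for x, label in training_set if not label])

    def classify(instance: Vector) -> bool:
        # Function(instance) -> True or False: pick the closer class centroid.
        return squared_distance(instance, c_true) <= squared_distance(instance, c_false)

    return classify


if __name__ == "__main__":
    train = [([0.0, 0.0], False), ([1.0, 0.0], False),
             ([9.0, 9.0], True), ([10.0, 8.0], True)]
    clf = train_nearest_centroid(train)
    print(clf([9.5, 9.0]), clf([0.5, 0.2]))  # True False
```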

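Meta Classifier
A meta classifier is an assembly of classifiers whose predictions are combined by voting; it can give better performance than a single classifier. There are two ways of generating the assembly: use different training data sets, or use different algorithms.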
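
Combining the members of an assembly by voting can be sketched as simple majority voting over their True/False predictions. `majority_vote` is a hypothetical helper. Breaking ties towards False is an arbitrary assumption of this sketch.

```python
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], bool]


def majority_vote(ensemble: List[Classifier], instance: Sequence[float]) -> bool:
    """Meta-classifier decision: True iff more than half of the members vote True.

    Ties (possible with an even number of members) go to False, an
    arbitrary choice made for this sketch.
    """
    yes_votes = sum(1 for classify in ensemble if classify(instance))
    return 2 * yes_votes > len(ensemble)


if __name__ == "__main__":
    # Three toy members that threshold the first feature at different points.
    ensemble = [lambda x: x[0] > 1.0, lambda x: x[0] > 2.0, lambda x: x[0] > 3.0]
    print(majority_vote(ensemble, [2.5]))  # True: two of the three vote True
```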

Building a Meta Classifier
Different training datasets (bagging): randomly generate 'bags' by selecting examples with replacement, creating different 'flavors' of the training set.
Different algorithms: e.g. naïve Bayes, neural networks, SVMs. Different algorithms work well on different training sets.
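
The bag-generation step of bagging, sampling with replacement, might look like the following. The function `make_bags` and its `seed` parameter are illustrative assumptions.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_bags(training_set: Sequence[T], n_bags: int, seed: int = 0) -> List[List[T]]:
    """Draw n_bags bootstrap samples ("bags") from the training set.

    Each bag is as large as the training set and is sampled uniformly with
    replacement, so it repeats some examples and omits others, giving a
    different "flavor" of the training set.
    """
    rng = random.Random(seed)  # a seeded generator makes the bags reproducible
    return [rng.choices(training_set, k=len(training_set)) for _ in range(n_bags)]


if __name__ == "__main__":
    for bag in make_bags(list("abcde"), n_bags=3, seed=1):
        print(bag)
```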

Why Parallelise?
Building classifiers is computationally intensive: one classifier takes 0.5 hr, so a meta classifier (an assembly of 10 classifiers) built one classifier after another takes 10 × 0.5 = 5 hr. In a distributed environment such as Grapevine, the classifiers can be built independently in parallel, and little communication is required.
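
The timing argument can be extended to p volunteer nodes under idealized assumptions that the slides do not state: identical build times, fully independent classifiers, and zero communication overhead. The helper `ideal_build_time` is hypothetical.

```python
import math


def ideal_build_time(n_classifiers: int, hours_each: float, n_nodes: int) -> float:
    """Idealized wall-clock hours to build an ensemble on n_nodes nodes.

    Assumes identical build times, fully independent classifiers and zero
    communication or scheduling overhead, so nodes work in "rounds" of
    at most n_nodes classifiers each.
    """
    rounds = math.ceil(n_classifiers / n_nodes)
    return rounds * hours_each


if __name__ == "__main__":
    for nodes in (1, 4, 10):
        print(nodes, ideal_build_time(10, 0.5, nodes))  # 5.0, 1.5, 0.5 hours
```

With one node, this reproduces the sequential 5 hr. With 10 nodes it is bounded below by the 0.5 hr of a single classifier.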

Distributed Meta Classifiers
WEKA is a machine learning package from the University of Waikato, New Zealand (ka/). It is implemented in Java and includes most popular machine learning tools.

Distributed Meta-Classifiers on Grapevine
Distributed bagging: generate different bags; define the bag and the algorithm for each task; submit the tasks to Grapevine; the nodes build the classifiers; receive the results; perform voting.
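
The distributed bagging steps can be sketched end to end in Python. This is a stand-in built on assumptions, not the project's Grapevine code. A local `concurrent.futures.ThreadPoolExecutor` plays the role of the volunteer nodes. The learners (`train_centroid`, `train_majority`), the `ALGORITHMS` registry, and the other names are hypothetical placeholders for WEKA algorithms.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], bool]
Classifier = Callable[[Sequence[float]], bool]


def train_centroid(bag: List[Example]) -> Classifier:
    """Placeholder learner: nearest class centroid (stands in for a WEKA algorithm)."""
    dims = len(bag[0][0])

    def centroid(label: bool) -> List[float]:
        points = [x for x, y in bag if y == label]
        if not points:  # a bootstrap bag can miss a class entirely
            return [float("inf")] * dims
        return [sum(col) / len(points) for col in zip(*points)]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    c_true, c_false = centroid(True), centroid(False)
    return lambda x: dist(x, c_true) <= dist(x, c_false)


def train_majority(bag: List[Example]) -> Classifier:
    """Placeholder learner: always predict the bag's more common label."""
    label = 2 * sum(1 for _, y in bag if y) > len(bag)
    return lambda x: label


# Hypothetical registry so that each task can name its algorithm.
ALGORITHMS: Dict[str, Callable[[List[Example]], Classifier]] = {
    "centroid": train_centroid,
    "majority": train_majority,
}


def build_task(task: Tuple[List[Example], str]) -> Classifier:
    """The work one node does: build a classifier from (bag, algorithm)."""
    bag, algorithm = task
    return ALGORITHMS[algorithm](bag)


def distributed_bagging(train: List[Example], algorithm: str, n_bags: int,
                        n_nodes: int = 4, seed: int = 0) -> Classifier:
    rng = random.Random(seed)
    # 1. Generate different bags (sampling with replacement).
    bags = [rng.choices(train, k=len(train)) for _ in range(n_bags)]
    # 2. Define the bag and the algorithm for each task.
    tasks = [(bag, algorithm) for bag in bags]
    # 3-5. Submit the tasks, let the nodes build classifiers, receive results.
    #      A local thread pool is only a stand-in for Grapevine volunteers.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        ensemble = list(pool.map(build_task, tasks))

    # 6. Perform voting: the meta classifier returns the majority prediction.
    def meta_classifier(x: Sequence[float]) -> bool:
        return 2 * sum(1 for clf in ensemble if clf(x)) > len(ensemble)

    return meta_classifier


if __name__ == "__main__":
    data = [([float(i), 0.0], i >= 10) for i in list(range(5)) + list(range(10, 15))]
    meta = distributed_bagging(data, "centroid", n_bags=7, seed=1)
    print(meta([12.0, 0.0]), meta([1.0, 0.0]))
```

Each task carries its whole bag, so nodes need no further communication until they return a classifier. This matches the slides' point that little communication is required.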

Preliminary Study
Bagging with Quick Propagation (Quickprop) neural networks, parallelized with OpenMP and implemented in C.

Trial Domain
Benchmark corpus: Reuters-21578 for text categorization, with separate training and test documents and 90+ categories. Perform feature selection, then preprocess the documents into feature vectors.
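
The preprocessing steps can be illustrated with a small sketch. It ranks terms by document frequency and turns each document into a binary bag-of-words vector. Document frequency is only one simple feature-selection criterion, used here as an assumption, since the slides do not say which criterion was applied. The tokenizer and function names are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def select_features(docs: List[str], k: int) -> List[str]:
    """Keep the k terms that occur in the most training documents.

    Ties are broken alphabetically so the selected vocabulary is deterministic.
    """
    df: Counter = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def to_vector(doc: str, vocabulary: List[str]) -> List[int]:
    """Binary feature vector: 1 if the selected term occurs in the document."""
    present = set(tokenize(doc))
    return [1 if term in present else 0 for term in vocabulary]


if __name__ == "__main__":
    train_docs = ["Oil prices rise", "Oil exports fall", "Wheat prices fall"]
    vocab = select_features(train_docs, k=3)
    print(vocab)                                    # ['fall', 'oil', 'prices']
    print(to_vector("Oil prices fall again", vocab))  # [1, 1, 1]
```

The vocabulary is chosen from the training documents only, and the same vocabulary is then used to vectorize the test documents.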

Summary
Successful internet distributed computing requires addressing many issues outside traditional computer science. Distributed computing is not for everyone.