
| 1. Introducing Scalability into Smart Grid
Presented by Vasileios Zois, CS at USC, 09/20/2013

| 2. Smart Grid Project Services
 Manage Data
  – Sparse Data
  – Heterogeneous Data
  – Semantic Representation
 Train Prediction Models
  – Data-Intensive Application
  – On-Demand Procedure
 Make Predictions & Update Models
  – Fast Access to Trained Models
  – Update with New Values

| 3. Steps to Scalability
 Management of Data
  – Choose the Underlying Technology
  – Evaluate the Provided Services
 Training of Models
  – Design Training Tools
  – Take Advantage of the Infrastructure
  – Provide Efficient Solutions to Training
 Access & Update Trained Models
  – Update: Change Invariants that Affect Prediction
  – Do It Efficiently

| 4. Managing Data
 Requirements
  – Efficient Usage of Storage
  – Client Access to Data
  – Semantic Organization of Data
 Possible Solutions
  – Distributed File System (HDFS)
    » Raw Data
    » Work Out a Structure (XML, Ontology Schemas)
  – Column-Oriented NoSQL Systems (HBase, Cassandra) – see the sketch after this slide
    » Structure Offered – Column Families
    » Implemented Operations
    » Still Needs Reasoning Operations
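Nothing in the deck fixes a schema, so the following is only a minimal sketch of the column-family idea using the happybase Python client for HBase; the table name, families, and row-key layout are hypothetical.

```python
# Hypothetical HBase schema for meter readings, via the happybase client.
import happybase

connection = happybase.Connection('hbase-master.example.org')

# One-time setup: a family for raw readings, one for customer metadata.
connection.create_table('meter_readings', {'raw': dict(), 'meta': dict()})

table = connection.table('meter_readings')

# Row key: meter id plus zero-padded timestamp, so one meter's readings
# are stored contiguously and can be scanned by prefix.
table.put(b'meter42:0001379660400', {
    b'raw:kwh': b'1.37',                    # consumption for this interval
    b'meta:customer_type': b'residential',
})

# Retrieve every stored reading for one meter.
for key, data in table.scan(row_prefix=b'meter42:'):
    print(key, data[b'raw:kwh'])
```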

| 5. Prediction Models
 Regression Tree
  – Supports Features
  – Tree Building
  – Scalable Implementation: OpenPlanet
 ARIMA Model
  – Short-Term Prediction
  – Does Not Support Features?
  – On-Demand Training (sketched after this slide)
    » Small Prediction Window
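The deck names the models without code; below is a minimal sketch of the on-demand, small-window ARIMA case using statsmodels, with a synthetic hourly series and an illustrative (p, d, q) order.

```python
# On-demand short-term forecasting with ARIMA (statsmodels); the data
# and the (2, 1, 2) order are illustrative stand-ins.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Two weeks of synthetic hourly consumption with a daily cycle.
rng = np.random.default_rng(0)
hours = np.arange(24 * 14)
hourly_kwh = 1.0 + 0.3 * np.sin(2 * np.pi * hours / 24) \
               + 0.05 * rng.standard_normal(hours.size)

# Small prediction window: fit when a forecast is requested,
# then predict only the next 24 hours.
model = ARIMA(hourly_kwh, order=(2, 1, 2)).fit()
next_day = model.forecast(steps=24)
print(next_day[:3])
```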

| 6. Scalable Prediction
 Brute Force
  – Efficient Use of Resources
  – Build a System from Scratch
 Decrease the Problem Size
  – Group Data and Pick Representatives
  – Cluster Data with Similar Features
  – Introduce Features into the ARIMA Model (sketched after this slide)
    » Use Features to Cluster the Data
    » Execute the Model on the Clustered Data
    » Customer → SuperCustomer
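As referenced in the list above, a rough sketch of the super-customer scheme, assuming scikit-learn for the feature clustering and statsmodels for the per-cluster ARIMA; the feature vectors and load curves are synthetic stand-ins.

```python
# Cluster customers on their features, aggregate each cluster's load,
# and train one ARIMA model per "super-customer".
import numpy as np
from sklearn.cluster import KMeans
from statsmodels.tsa.arima.model import ARIMA

n_customers, n_hours, k = 1000, 24 * 14, 20
rng = np.random.default_rng(1)
features = rng.standard_normal((n_customers, 5))   # e.g. size, usage profile
loads = rng.random((n_customers, n_hours))         # per-customer hourly kWh

# Use the features to group similar customers.
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)

# 1000 per-customer models collapse into 20 cluster-level models.
models = {}
for c in range(k):
    super_load = loads[labels == c].sum(axis=0)
    models[c] = ARIMA(super_load, order=(2, 1, 2)).fit()

print(models[0].forecast(steps=24)[:3])
```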

| 7. Parallel Clustering
 Problem
  – Computationally Expensive
  – High-Dimensional
  – Inevitable Parallelization
 Challenges to Parallelization
  – Partitioning the Data to Achieve Load Balance
  – Reducing the Communication Cost
 Approaches
  – Hierarchical Clustering: PBirch
  – Clustering with Evolutionary Strategies
  – Density-Based Clustering: PDBSCAN
  – Model-Based Clustering: The AutoClass System

| 8. Parallel Hierarchical Clustering: PBirch
 PBirch
  – Single Program, Multiple Data (SPMD)
  – Message Passing Interface (MPI)
 Steps
  – Distribute the Data Equally
  – Build a CF-Tree on Each Processor
  – Execute Clustering on the Leaf Nodes: Parallel K-Means (sketched after this slide)
 Results
  – Linear Speedup
  – Increased Communication Latency
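A simplified SPMD sketch of the parallel k-means step with mpi4py (run with e.g. `mpiexec -n 4 python pkmeans.py`); PBirch's CF-tree construction is omitted, each rank simply generates its own partition, and the initial centers are arbitrary.

```python
# Each rank assigns its local points, then an all-reduce combines the
# per-cluster sums and counts; only k centers travel over the network.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

k, dim = 3, 2
local = np.random.default_rng(rank).standard_normal((1000, dim))

# Every rank starts from identical (arbitrary) centers.
centers = np.arange(k * dim, dtype=float).reshape(k, dim)

for _ in range(10):
    # Local step: nearest center for each point in this rank's partition.
    dists = ((local[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = dists.argmin(axis=1)

    # Communication step: global per-cluster sums and counts.
    sums = np.zeros((k, dim))
    counts = np.zeros(k)
    for c in range(k):
        sums[c] = local[assign == c].sum(axis=0)
        counts[c] = (assign == c).sum()
    comm.Allreduce(MPI.IN_PLACE, sums, op=MPI.SUM)
    comm.Allreduce(MPI.IN_PLACE, counts, op=MPI.SUM)
    centers = sums / np.maximum(counts, 1)[:, None]

if rank == 0:
    print(centers)
```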

| 9. Clustering with Evolutionary Strategies
 Model
  – Stochastic Optimization
  – Biological Evolution Concepts: Recombination, Mutation
  – Motive: Huge Range of Possible Solutions
 Parallelization Techniques
  – Master–Slave Model (sketched after this slide)
    » Master in Charge of the Parent Solutions
    » Slaves in Charge of Recombination and Mutation
    » Fits into the MapReduce Model
 Proposed Solution
  – thes.pdf
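A rough single-machine sketch of the master–slave pattern described above, assuming a (μ, λ) evolution strategy over candidate center sets; a multiprocessing pool stands in for the slaves.

```python
# Master keeps the mu parent solutions; slaves recombine, mutate, and
# evaluate lambda offspring per generation, map-reduce style.
import numpy as np
from multiprocessing import Pool

data = np.random.default_rng(2).standard_normal((500, 2))
k, mu, lam, sigma = 3, 5, 20, 0.3

def fitness(centers):
    # Sum of squared distances from each point to its nearest center.
    d = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def offspring(job):
    # Slave: recombine two random parents, apply Gaussian mutation, score.
    seed, parents = job
    rng = np.random.default_rng(seed)
    a, b = parents[rng.integers(mu)], parents[rng.integers(mu)]
    child = (a + b) / 2 + sigma * rng.standard_normal(a.shape)
    return fitness(child), child

if __name__ == '__main__':
    rng = np.random.default_rng(0)
    parents = [rng.standard_normal((k, 2)) for _ in range(mu)]
    with Pool() as pool:
        for gen in range(30):
            jobs = [(gen * lam + i, parents) for i in range(lam)]
            scored = pool.map(offspring, jobs)        # slaves run in parallel
            scored.sort(key=lambda t: t[0])
            parents = [c for _, c in scored[:mu]]     # master keeps the best mu
    print('best SSE:', fitness(parents[0]))
```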

| 10. Parallel Density-Based Clustering: PDBSCAN
 PDBSCAN
  – Based on the Original DBSCAN Algorithm
  – Shared-Nothing Architecture
 Execution (sketched after this slide)
  – Divide the Input into Several Partitions
  – Concurrently Cluster the Data Locally with DBSCAN
  – Merge the Local Clusters into Global Clusters
 dR*-Tree Introduced
  – Decreased Communication Cost
  – Efficient Access to the Data
  – Distributed Data Pages
  – Indices Replicated on All Machines
 Results
  – Near-Linear Speedup in the Number of Machines
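A much-simplified single-machine sketch of the partition / local-cluster / merge flow above, using scikit-learn's DBSCAN for the local step; the distributed dR*-tree is not modeled, and the overlap-based merge is deliberately naive (a cluster spanning two earlier clusters is attached to just one of them).

```python
# Split on x with an eps-wide overlap so border points land in both
# partitions, cluster each partition locally, then merge local clusters
# that share a point in the overlap region.
import numpy as np
from sklearn.cluster import DBSCAN

eps, min_pts, mid = 0.3, 5, 0.5
points = np.random.default_rng(3).random((2000, 2))

left = np.where(points[:, 0] < mid + eps)[0]
right = np.where(points[:, 0] >= mid - eps)[0]

global_label = -np.ones(len(points), dtype=int)
next_id = 0
for part in (left, right):
    local = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(points[part])
    for lid in set(local.tolist()) - {-1}:
        members = part[local == lid]
        # Reuse a global id if any member was already labeled; else open one.
        hits = set(global_label[members].tolist()) - {-1}
        if hits:
            gid = hits.pop()
        else:
            gid = next_id
            next_id += 1
        global_label[members] = gid

print('global clusters:', next_id)
```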

| 11. Parallel Model-Based Clustering
 AutoClass System
  – Bayesian Classification
  – Probability of an Instance Belonging to a Class
 Approach (sketched after this slide)
  – SIMD: Single Instruction, Multiple Data
  – Divide the Input among the Processors
  – Update the Classification Parameters Locally
  – No Need for Load Balancing
 Results
  – Good Scaling
  – Past a Certain Threshold, Communication Starts to Hinder Performance
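AutoClass's Bayesian mixture is more involved than shown here; the sketch below only illustrates the data-parallel pattern the slide describes, with the same E-step code run over fixed slices of the input and only small per-class sufficient statistics being reduced. A diagonal Gaussian mixture stands in for the real model.

```python
# Each "processor" (slice) computes class responsibilities and local
# sufficient statistics; a reduction combines them before the M-step.
import numpy as np

def local_stats(chunk, means, variances, priors):
    # E-step on one slice: log-likelihood of each point under each class.
    log_p = (-0.5 * (((chunk[:, None, :] - means) ** 2) / variances
                     + np.log(2 * np.pi * variances)).sum(axis=2)
             + np.log(priors))
    resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    # The statistics are tiny compared with the slice itself.
    return resp.sum(axis=0), resp.T @ chunk, resp.T @ chunk ** 2

rng = np.random.default_rng(4)
data = rng.standard_normal((10000, 3))
k = 4
means = rng.standard_normal((k, 3))
variances = np.ones((k, 3))
priors = np.full(k, 1.0 / k)

for _ in range(20):
    # Same instructions over every slice; no load balancing is needed
    # because the slices have fixed, equal sizes.
    stats = [local_stats(c, means, variances, priors)
             for c in np.array_split(data, 8)]
    n = sum(s[0] for s in stats)
    sx = sum(s[1] for s in stats)
    sxx = sum(s[2] for s in stats)
    # M-step from the reduced statistics.
    priors = n / n.sum()
    means = sx / n[:, None]
    variances = np.maximum(sxx / n[:, None] - means ** 2, 1e-6)

print(priors)
```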

| 12. Clustering by Sorting Potential Values
 Main Idea
  – Potential Model Derived from the Gravitational Force Model in Euclidean Space
  – Parameters:
    » Gravitational Constant G
    » Bandwidth Distance B (Maximum Distance from the Center of a Cluster)
    » Threshold Distance δ (Avoids the Singularity Problem)
 Execution (sketched after this slide)
  – Calculate the Potential at Each Point
  – Sort the Points According to the Calculated Potential
  – Choose Cluster Centers by Iterating over the Sorted Array
  – If the Distance Between Two Points in the Array Exceeds B, Create a New Cluster
 Results
  – Near-Optimal Solution
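One plausible reading of the procedure as a numpy sketch; G, B, and δ are illustrative, and the slide's "distance between two points in the array" is interpreted here as the distance from the current point to every cluster center chosen so far.

```python
# Compute a gravitational-style potential at every point, sort, then
# sweep the sorted order, opening a new cluster whenever a point lies
# farther than the bandwidth B from all existing centers.
import numpy as np

G, B, delta = 1.0, 0.8, 1e-3
pts = np.random.default_rng(5).random((400, 2)) * 4

# Pairwise distances; delta caps the denominator to avoid the
# singularity when a distance approaches zero.
diff = pts[:, None, :] - pts[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))
potential = -(G / np.maximum(dist, delta)).sum(axis=1)

order = np.argsort(potential)          # deepest potential first
centers = [pts[order[0]]]
labels = np.empty(len(pts), dtype=int)
for i in order:
    d = np.linalg.norm(np.asarray(centers) - pts[i], axis=1)
    if d.min() > B:                    # farther than B from every center
        centers.append(pts[i])
        d = np.append(d, 0.0)
    labels[i] = d.argmin()

print('clusters:', len(centers))
```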

| 13. Any Questions?

| 14. Thank You for Your Attention! Vasilis Zois