Published by Lynn Bennett. Modified over 8 years ago.
1
Introducing Scalability into Smart Grid
Presented by Vasileios Zois, CS at USC, 09/20/2013
2
Smart Grid Project Services
Manage Data
– Sparse Data
– Heterogeneous Data
– Semantic Representation
Train Prediction Models
– Data Intensive Application
– On Demand Procedure
Make Prediction & Update Models
– Fast Access to Trained Models
– Update with New Values
3
Steps to Scalability
Management of Data
– Choose Underlying Technology
– Evaluate Provided Services
Training of Models
– Design Training Tools
– Take Advantage of Infrastructure
– Give Efficient Solutions to Training
Access & Update Training Models
– Update: Change Invariants that Affect Prediction
– Do it Efficiently
4
Managing Data
Requirements
– Efficient Usage of Storage
– Client Access to Data
– Semantic Organization of Data
Possible Solutions
– Distributed File System (HDFS)
    » Raw Data
    » Work out a Structure (XML, Ontology Schemas)
– Column Oriented NoSQL Systems (HBase, Cassandra)
    » Structure Offered: Column Families
    » Implemented Operations
    » Still Needs Reasoning Operations
5
Prediction Models
Regression Tree
– Supports Features
– Tree Building
– Scalable Implementation: OpenPlanet
ARIMA Model
– Short Term Prediction
– Does Not Support Features?
– On Demand Training
    » Small Prediction Window
6
Scalable Prediction
Brute Force
– Efficient Use of Resources
– Build a System from Scratch
Decrease Problem Size
– Group Data and Pick Representatives
– Clustering of Data with Similar Features
– Introduce Features into ARIMA Model
    » Use Features to Cluster Data
    » Execute Model on Clustered Data
    » Customer → SuperCustomer
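The idea of grouping customers and picking a representative can be sketched as follows. This is a minimal illustration, not the project's method: it assumes customers count as "similar" only when their feature tuples match exactly (a real system would use a clustering algorithm), and the function name `super_customers` and the input layout are hypothetical.

```python
from collections import defaultdict

def super_customers(customers):
    """Average the load series of customers that share the same feature tuple.

    `customers` is a list of (features, series) pairs; all series in one
    group are assumed to have the same length.
    """
    groups = defaultdict(list)
    for features, series in customers:
        groups[features].append(series)
    # One averaged series ("super customer") per feature group.
    return {
        features: [sum(values) / len(values) for values in zip(*members)]
        for features, members in groups.items()
    }
```

A forecasting model would then be trained once per super customer instead of once per customer, which is where the reduction in problem size comes from.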
7
Parallel Clustering
Problem
– Computationally Expensive
– High Dimensional
– Inevitable Parallelization
Challenges to Parallelization
– Partitioning of Data to Achieve Load Balance
– Reduction of the Communication Cost
Approaches
– Hierarchical Clustering: PBirch
– Evolutionary Strategies Clustering
– Density Based Clustering: PDBSCAN
– Model Based Clustering: Autoclass System
8
Parallel Hierarchical Clustering
PBirch
– Single Program Multiple Data (SPMD)
– Message Passing Interface (MPI)
Steps
– Distribute Data Equally
– Build Tree on Each Processor
– Execute Clustering on Leaf Nodes: Parallel K-means
Results
– Linear Speedup
– Increased Communication Latency
– http://www.cs.gsu.edu/~wkim/index_files/papers/pbirch.pdf
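The pattern behind the last step (local work on each partition followed by a global reduction) can be sketched for a single k-means iteration. This is an assumption-laden illustration: the tree-building step is omitted, the processors are simulated by a plain loop instead of MPI, and the helper names `split_evenly`, `local_stats`, and `kmeans_step` are hypothetical.

```python
from math import dist

def split_evenly(points, n_workers):
    """Deal points round-robin so every worker gets an (almost) equal share."""
    return [points[i::n_workers] for i in range(n_workers)]

def local_stats(chunk, centers):
    """Per-worker step: coordinate sums and counts for each nearest center."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in chunk:
        k = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts

def kmeans_step(chunks, centers):
    """One iteration: every chunk reports local stats, then a global reduce."""
    dim = len(centers[0])
    total_sums = [[0.0] * dim for _ in centers]
    total_counts = [0] * len(centers)
    for chunk in chunks:  # under MPI, each rank would run this body itself
        sums, counts = local_stats(chunk, centers)
        for k in range(len(centers)):
            total_counts[k] += counts[k]
            for d in range(dim):
                total_sums[k][d] += sums[k][d]
    # A center that attracted no points keeps its previous position.
    return [
        [s / total_counts[k] for s in total_sums[k]]
        if total_counts[k] else list(centers[k])
        for k in range(len(centers))
    ]
```

Only the per-center sums and counts cross partition boundaries, never the points themselves, which is why the communication cost depends on the number of centers rather than on the data size.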
9
Clustering with Evolutionary Strategies
Model
– Stochastic Optimization
– Biological Evolution Concepts
– Recombination, Mutation
– Motive: Huge Range of Possible Solutions
Parallelization Techniques
– Master-Slave Model
    » Master in Charge of Parent Solutions
    » Slave in Charge of Recombination and Mutation
    » Fits into MapReduce Model
Proposed Solution
– http://www.cs.gsu.edu/~wkim/index_files/papers/clusteringwithes.pdf
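A minimal sketch of how recombination, mutation, and the master/worker split could fit together, assuming a simple (mu + lambda) evolution strategy over sets of cluster centers. This is not the algorithm from the linked paper; the fitness function, parameter defaults, and the names `sse`, `make_offspring`, and `evolve` are illustrative assumptions.

```python
import random
from math import dist

def sse(centers, points):
    """Fitness: sum of squared distances from each point to its nearest center."""
    return sum(min(dist(p, c) for c in centers) ** 2 for p in points)

def make_offspring(parent_a, parent_b, sigma, rng):
    """Worker task: recombine two parents, then mutate every coordinate."""
    # Recombination: take each center from one parent or the other.
    child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
    # Mutation: add Gaussian noise to every coordinate.
    return [tuple(x + rng.gauss(0.0, sigma) for x in c) for c in child]

def evolve(points, k, mu=4, lam=12, sigma=0.5, generations=40, seed=0):
    """Master loop: keep the mu best solutions among parents and offspring."""
    rng = random.Random(seed)
    parents = [rng.sample(points, k) for _ in range(mu)]
    for _ in range(generations):
        # Each call to make_offspring is independent, so it could be a map task.
        offspring = [
            make_offspring(*rng.sample(parents, 2), sigma, rng)
            for _ in range(lam)
        ]
        parents = sorted(parents + offspring, key=lambda s: sse(s, points))[:mu]
    return min(parents, key=lambda s: sse(s, points))
```

Because parents compete with their offspring, the best fitness never gets worse from one generation to the next. Offspring creation and evaluation are independent per child, which is what makes the map step natural; selection is the reduce step.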
10
Parallel Density Based Clustering
PDBSCAN
– Based on Original DBSCAN Algorithm
– Shared Nothing Architecture
Execution
– Divide Input into Several Partitions
– Concurrently Cluster Data Locally with DBSCAN
– Merge Local Clusters into Global Clusters
dR*-Tree Introduced
– Decreased Communication Cost
– Efficient Access of Data
– Distributed Data Pages
– Replicated Indices on All Machines
Results
– Near Linear Speedup in the Number of Machines
– http://www.cs.gsu.edu/~wkim/index_files/papers/fastParallel_XU.pdf
11
Parallel Model Based Clustering
Auto-class System
– Bayesian Classification
– Probability of an Instance Belonging to a Class
Approach
– SIMD (Single Instruction Multiple Data)
– Divide Input among Processors
– Update Parameters for Classification Locally
– No Need for Load Balancing
Results
– Good Scaling
– After a Certain Threshold the Communication Starts to Hinder the Performance
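The two ideas on this slide, class-membership probabilities and local parameter updates, can be sketched with a one-dimensional Gaussian model. This is an illustrative assumption rather than the Auto-class model itself: classes are given as hypothetical `(weight, mean, std)` triples, only the means are re-estimated, and the processors are simulated by a loop.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))

def memberships(x, classes):
    """Posterior P(class | x) via Bayes' rule for (weight, mean, std) classes."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in classes]
    total = sum(joint)
    return [j / total for j in joint]

def local_update(chunk, classes):
    """Per-processor step: membership-weighted counts and sums for one chunk."""
    counts = [0.0] * len(classes)
    sums = [0.0] * len(classes)
    for x in chunk:
        for k, p in enumerate(memberships(x, classes)):
            counts[k] += p
            sums[k] += p * x
    return counts, sums

def merged_means(chunks, classes):
    """Global step: add the per-chunk statistics and re-estimate class means."""
    counts = [0.0] * len(classes)
    sums = [0.0] * len(classes)
    for chunk in chunks:  # each chunk would live on its own processor
        c, s = local_update(chunk, classes)
        counts = [a + b for a, b in zip(counts, c)]
        sums = [a + b for a, b in zip(sums, s)]
    return [s / c for s, c in zip(sums, counts)]
```

Every instance costs the same amount of work regardless of its value, so equal-sized chunks are already balanced. The merge step is the only communication, and it is what eventually limits scaling as the processor count grows.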
12
Clustering By Sorting Potential Values
Main Idea
– Potential Model
– Derived from Gravitational Force Model in Euclidean Space
– Parameters:
    » Gravitational Constant
    » Bandwidth Distance B (Max Distance from Center of Cluster)
    » δ Threshold Distance (Avoids Singularity Problem)
Execution
– Calculate Potential at Each Point
– Sort Points According to the Calculated Potential
– Choose Cluster Centers by Iterating over the Sorted Array
– If Distance between Two Points in Array > B, Create New Cluster
Results
– Near Optimal Solution
– http://www.sciencedirect.com/science/article/pii/S0031320312001136
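The execution steps can be sketched as below. This is a simplified reading of the slide, not the published algorithm: the potential is assumed to be a sum of -G / max(distance, δ) terms, and the "distance between two points" test is assumed to compare each point with its nearest already-visited point. The function names are hypothetical.

```python
from math import dist

def potentials(points, delta, g=1.0):
    """Gravity-like potential at each point; distances below delta are clamped."""
    return [
        sum(-g / max(dist(p, q), delta) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def cluster_by_potential(points, bandwidth, delta, g=1.0):
    """Return one cluster label per point, visiting points by rising potential."""
    phi = potentials(points, delta, g)
    order = sorted(range(len(points)), key=lambda i: phi[i])
    labels = [None] * len(points)
    n_clusters = 0
    visited = []
    for i in order:
        # Nearest point among those already visited (all have lower potential).
        parent = min(
            visited, key=lambda j: dist(points[i], points[j]), default=None
        )
        if parent is None or dist(points[i], points[parent]) > bandwidth:
            labels[i] = n_clusters      # too far from everything: new cluster
            n_clusters += 1
        else:
            labels[i] = labels[parent]  # join the nearest visited point's cluster
        visited.append(i)
    return labels
```

Clamping distances at δ keeps the potential finite when two points coincide, which is the singularity the slide mentions. Points in dense regions get the lowest potential, so they are visited first and become cluster centers.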
13
Any Questions?
14
Thank you for your attention!
Vasilis Zois
vzois@usc.edu