| Introducing Scalability into Smart Grid, presented by Vasileios Zois, CS at USC, 09/20/2013


| Smart Grid Project Services
• Manage Data
  – Sparse Data
  – Heterogeneous Data
  – Semantic Representation
• Train Prediction Models
  – Data Intensive Application
  – On Demand Procedure
• Make Prediction & Update Models
  – Fast Access to Trained Models
  – Update with new values

| Steps to Scalability
• Management of Data
  – Choose Underlying Technology
  – Evaluate provided services
• Training of Models
  – Design Training Tools
  – Take Advantage of Infrastructure
  – Give Efficient Solutions to Training
• Access & Update Training Models
  – Update: Change Invariants that Affect Prediction
  – Do it Efficiently

| Managing Data
• Requirements
  – Efficient Usage of Storage
  – Client Access to Data
  – Semantic Organization of Data
• Possible Solutions
  – Distributed File System (HDFS)
    » Raw Data
    » Work out a Structure (XML, Ontology Schemas)
  – Column Oriented NoSQL Systems (HBase, Cassandra)
    » Structure offered: Column Families
    » Implemented Operations
    » Still Needs Reasoning Operations

| Prediction Models
• Regression Tree
  – Support Features
  – Tree Building
  – Scalable Implementation: OpenPlanet
• ARIMA Model
  – Short Term Prediction
  – Does Not Support Features?
  – On Demand Training
    » Small Prediction Window
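
To make the "short term prediction, small prediction window" idea concrete, here is a minimal sketch of the simplest member of the ARIMA family, an AR(1) model (ARIMA(1,0,0)), fitted by least squares. The function names `fit_ar1` and `forecast` and the sample series are illustrative assumptions, not part of the presentation.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] (an AR(1) model)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    phi = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
           / sum((x - mx) ** 2 for x in xs))
    c = my - phi * mx
    return c, phi


def forecast(series, steps):
    """Roll the fitted model forward over a small prediction window."""
    c, phi = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


# Made-up readings that follow x[t] = 2.5 + 0.5 * x[t-1] exactly.
series = [1.0, 3.0, 4.0, 4.5, 4.75]
c, phi = fit_ar1(series)
next_two = forecast(series, 2)
```

The model uses only the series itself, which matches the slide's remark that ARIMA does not take features as input; training is cheap enough to repeat on demand.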

| Scalable Prediction
• Brute Force
  – Efficient use of resources
  – Build a system from scratch
• Decrease Problem Size
  – Group Data and Pick Representatives
  – Clustering of Data with Similar Features
  – Introduce Features into ARIMA model
    » Use features to cluster Data
    » Execute Model on Clustered Data
    » Customer → SuperCustomer
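
The "Customer → SuperCustomer" reduction can be sketched as follows. This is a simplified illustration: fixed-width bucketing on a single feature stands in for a real clustering algorithm, and the feature values, load series, and the name `build_super_customers` are all hypothetical.

```python
from collections import defaultdict


def build_super_customers(features, loads, bucket_size):
    """Group customers with similar feature values and average each group's
    load series into one representative ("super customer") series."""
    groups = defaultdict(list)
    for customer, value in features.items():
        # Bucketing is a stand-in for clustering on similar features.
        groups[int(value // bucket_size)].append(customer)
    supers = {}
    for bucket, members in groups.items():
        series = [loads[c] for c in members]
        # Element-wise mean across the members' time series.
        supers[bucket] = [sum(step) / len(step) for step in zip(*series)]
    return dict(groups), supers


# Made-up data: one numeric feature and a two-step load series per customer.
features = {"a": 90.0, "b": 110.0, "c": 400.0}
loads = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [10.0, 12.0]}
groups, supers = build_super_customers(features, loads, bucket_size=200.0)
```

A prediction model would then be trained once per super customer rather than once per customer, which is where the reduction in problem size comes from.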

| Parallel Clustering
• Problem
  – Computationally Expensive
  – High Dimensional
  – Inevitable Parallelization
• Challenges to Parallelization
  – Partitioning of Data to achieve Load Balance
  – Reduction of the Communication Cost
• Approaches
  – Hierarchical Clustering: PBirch
  – Evolutionary Strategies Clustering
  – Density Based Clustering: PDBSCAN
  – Model Based Clustering: Autoclass System

| Parallel Hierarchical Clustering
• PBirch
  – Single Program Multiple Data (SPMD)
  – Message Passing Interface (MPI)
• Steps
  – Distribute Data Equally
  – Build Tree on Each Processor
  – Execute Clustering on Leaf Nodes: Parallel K-means
• Results
  – Linear Speedup
  – Increased Communication Latency
  – http://www.cs.gsu.edu/~wkim/index_files/papers/pbirch.pdf
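
The "distribute data equally, work locally, then combine" pattern in the steps above can be sketched with a plain K-means update. This is only an illustration under simplifying assumptions: the BIRCH tree is omitted, the workers are simulated in one process, and the combining loop plays the role an MPI reduction would play in an SPMD program. All names are hypothetical.

```python
def partition(points, workers):
    """Deal points out round-robin so each worker gets an equal share (+/- 1)."""
    return [points[i::workers] for i in range(workers)]


def local_stats(chunk, centers):
    """One worker's step: per-center coordinate sums and point counts."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in chunk:
        nearest = min(
            range(len(centers)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def kmeans_step(chunks, centers):
    """Combine the workers' partial results and recompute every center."""
    dim = len(centers[0])
    total = [[0.0] * dim for _ in centers]
    count = [0] * len(centers)
    for sums, counts in (local_stats(c, centers) for c in chunks):
        for k in range(len(centers)):
            count[k] += counts[k]
            for d in range(dim):
                total[k][d] += sums[k][d]
    # A center that attracted no points keeps its previous position.
    return [
        [t / count[k] for t in total[k]] if count[k] else list(centers[k])
        for k in range(len(centers))
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
chunks = partition(points, workers=2)
centers = [[0.0, 0.0], [10.0, 10.0]]
for _ in range(5):
    centers = kmeans_step(chunks, centers)
```

Only the per-center sums and counts cross worker boundaries, never the points themselves, which is why the communication cost grows with the number of centers and workers rather than with the data size.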

| Clustering with Evolutionary Strategies
• Model
  – Stochastic Optimization
  – Biological Evolution Concepts
  – Recombination, Mutation
  – Motive: Huge Range of Possible Solutions
• Parallelization Techniques
  – Master-Slave Model
    » Master in charge of parent solutions
    » Slave in charge of recombination and mutation
    » Fits into MapReduce model
• Proposed Solution
  – http://www.cs.gsu.edu/~wkim/index_files/papers/clusteringwithes.pdf
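
A rough sketch of the division of labour described above: the offspring function does the work the slide assigns to the slaves (recombination and mutation), while the selection loop does the master's work on the parent pool. The (mu + lambda) selection scheme, the sum-of-squared-error fitness, and every parameter value are assumptions chosen for illustration; the slide does not specify them.

```python
import random


def sse(centers, data):
    """Fitness: sum of squared distances from each value to its nearest center."""
    return sum(min((x - c) ** 2 for c in centers) for x in data)


def make_offspring(parents, rng, sigma):
    """Slave-side work: recombine two parents, then mutate the child."""
    a, b = rng.sample(parents, 2)
    child = [(x + y) / 2 for x, y in zip(a, b)]         # intermediate recombination
    return [c + rng.gauss(0.0, sigma) for c in child]   # Gaussian mutation


def evolve(data, k, mu=4, lam=12, generations=30, sigma=0.5, seed=0):
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    parents = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(mu)]
    history = []
    for _ in range(generations):
        # Each offspring is independent, so this list could be built in parallel.
        children = [make_offspring(parents, rng, sigma) for _ in range(lam)]
        # Master-side work: keep the mu best of parents plus children.
        parents = sorted(parents + children, key=lambda s: sse(s, data))[:mu]
        history.append(sse(parents[0], data))
    return parents[0], history


data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
best, history = evolve(data, k=2)
```

Because the parents compete with their children for survival, the best fitness in `history` can never get worse from one generation to the next. Generating and scoring offspring is the map-like step, and selection is the reduce-like step.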

| Parallel Density Based Clustering
• PDBSCAN
  – Based on original DBSCAN Algorithm
  – Shared Nothing Architecture
• Execution
  – Divide Input into Several Partitions
  – Concurrently Cluster Data Locally with DBSCAN
  – Merge Local Clusters into Global Clusters
• dR*-Tree Introduced
  – Decreased Communication Cost
  – Efficient Access of Data
  – Distributed Data Pages
  – Replicated Indices on all Machines
• Results
  – Near Linear Speedup in the Number of Machines
  – http://www.cs.gsu.edu/~wkim/index_files/papers/fastParallel_XU.pdf
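
The three execution steps (partition, cluster locally, merge) can be sketched on one-dimensional data as below. This is a simplified illustration, not the PDBSCAN implementation: the dR*-tree index is left out, the partitions are processed one after another instead of concurrently, and the merge rule (join two local clusters when any of their points lie within `eps`) is an assumption made to keep the example short.

```python
def dbscan(points, eps, min_pts):
    """Plain DBSCAN on 1-D values; returns a list of clusters (noise is dropped)."""
    def neighbors(i):
        return [j for j in range(len(points)) if abs(points[i] - points[j]) <= eps]

    labels = [None] * len(points)        # None = unvisited, -1 = noise
    clusters = []
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cid = len(clusters)
        clusters.append([points[i]])
        labels[i] = cid
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:          # former noise becomes a border point
                labels[j] = cid
                clusters[cid].append(points[j])
            if labels[j] is not None:
                continue
            labels[j] = cid
            clusters[cid].append(points[j])
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:     # j is a core point: keep expanding
                queue.extend(nbrs)
    return clusters


def merge(local_clusters, eps):
    """Join local clusters that have points within eps of each other."""
    merged = []
    for cluster in local_clusters:
        cluster = list(cluster)
        keep = []
        for other in merged:
            if any(abs(a - b) <= eps for a in cluster for b in other):
                cluster.extend(other)
            else:
                keep.append(other)
        merged = keep + [cluster]
    return [sorted(c) for c in merged]


data = [1.0, 1.5, 2.0, 2.5, 3.0, 8.0, 8.5, 9.0]
# Hypothetical split point that cuts through the first dense region.
partitions = [[x for x in data if x < 2.2], [x for x in data if x >= 2.2]]
local = [c for part in partitions for c in dbscan(part, eps=0.6, min_pts=2)]
global_clusters = merge(local, eps=0.6)
```

The split deliberately cuts one dense region in two, so the local runs report three clusters and the merge step has to stitch two of them back together.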

| Parallel Model Based Clustering
• Auto-class System
  – Bayesian Classification
  – Probability of an Instance belonging to a class
• Approach
  – SIMD: Single Instruction Multiple Data
  – Divide Input into Processors
  – Update Parameters for Classification Locally
  – No Need for Load Balancing
• Results
  – Good Scaling
  – After a certain threshold, communication starts to hinder performance
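
The idea of computing class membership probabilities and updating parameters locally can be sketched with an EM-style update for a one-dimensional Gaussian mixture. This is a stand-in chosen for brevity, not the Auto-class algorithm itself; the class parameters, the data, and all function names are hypothetical.

```python
import math


def posterior(x, classes):
    """P(class | x) for 1-D Gaussian classes given as (weight, mean, variance)."""
    likes = [
        w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        for w, m, v in classes
    ]
    total = sum(likes)
    return [like / total for like in likes]


def local_statistics(chunk, classes):
    """Per-processor step: weighted counts, sums and squared sums per class."""
    stats = [[0.0, 0.0, 0.0] for _ in classes]
    for x in chunk:
        for k, r in enumerate(posterior(x, classes)):
            stats[k][0] += r
            stats[k][1] += r * x
            stats[k][2] += r * x * x
    return stats


def update(chunks, classes):
    """Global step: add up the local statistics and re-estimate every class."""
    totals = [[0.0, 0.0, 0.0] for _ in classes]
    for stats in (local_statistics(c, classes) for c in chunks):
        for k in range(len(classes)):
            for i in range(3):
                totals[k][i] += stats[k][i]
    n = sum(t[0] for t in totals)
    new_classes = []
    for r, sx, sxx in totals:
        mean = sx / r
        var = max(sxx / r - mean ** 2, 1e-6)   # floor keeps the variance positive
        new_classes.append((r / n, mean, var))
    return new_classes


chunks = [[0.9, 1.1, 5.0], [1.0, 4.9, 5.1]]       # one list per processor
classes = [(0.5, 0.0, 1.0), (0.5, 6.0, 1.0)]      # initial guesses
for _ in range(10):
    classes = update(chunks, classes)
```

Every processor runs the same instructions on an equal share of the input, so no extra load balancing is needed. The only data exchanged is three numbers per class per processor, and that exchange is the communication that eventually limits scaling.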

| Clustering By Sorting Potential Values
• Main Idea
  – Potential Model
  – Derived from Gravitational Force Model in Euclidean Space
  – Parameters:
    » Gravitational Constant
    » Bandwidth Distance B (Max Distance from center of cluster)
    » δ threshold distance (avoid singularity problem)
• Execution
  – Calculate Potential at each Point
  – Sort Points According to the Calculated Potential
  – Choose Cluster Centers by iteration over sorted array
  – If distance between two points in array > B create new cluster
• Results
  – Near optimal Solution
  – http://www.sciencedirect.com/science/article/pii/S0031320312001136
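
One possible reading of the execution steps above, sketched on one-dimensional data. The potential formula, the choice to visit the lowest potential first, and the rule that a point farther than B from every existing center starts a new cluster are assumptions based on the slide text; the linked paper may define them differently. Parameter names `g`, `delta` and `bandwidth` mirror the slide's gravitational constant, δ and B.

```python
def potentials(points, g=1.0, delta=0.1):
    """Potential at each point: -g * sum(1 / max(distance, delta)).
    Clamping the distance at delta avoids the singularity at distance zero."""
    return [
        -g * sum(1.0 / max(abs(p - q), delta)
                 for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def cluster_by_potential(points, bandwidth, g=1.0, delta=0.1):
    pot = potentials(points, g, delta)
    # Lowest potential first: these points sit in the densest regions.
    order = sorted(range(len(points)), key=lambda i: pot[i])
    centers, labels = [], {}
    for i in order:
        dists = [abs(points[i] - c) for c in centers]
        if not dists or min(dists) > bandwidth:
            centers.append(points[i])            # too far from every center
            labels[i] = len(centers) - 1
        else:
            labels[i] = dists.index(min(dists))  # join the nearest center
    return centers, [labels[i] for i in range(len(points))]


points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers, labels = cluster_by_potential(points, bandwidth=2.0)
```

In this sample the middle point of each group has the lowest potential, so both are visited first and become the two centers. Computing all potentials takes a quadratic number of distance evaluations, after which one sort and one pass over the sorted array finish the clustering.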

| Any Questions?

| Thank you for your attention! Vasilis Zois, vzois@usc.edu
