Cloud Service Placement via Subgraph Matching
Bo Zong (UCSB), Ramya Raghavendra (IBM T.J. Watson), Mudhakar Srivatsa (IBM T.J. Watson), Xifeng Yan (UCSB), Ambuj K. Singh (UCSB), Kang-Won Lee (IBM T.J. Watson)
Cloud datacenters
Datacenters shared by multiple customers
Popular cloud datacenters: providers include Amazon EC2, Microsoft Azure, Rackspace, etc.; customers rent virtual machines to host their services
A profitable business: in 2013, Microsoft Azure sold $1 billion in services; by 2015, Amazon Web Services was expected to be worth ~$50 billion
Cloud service placement
User-defined cloud service: node = functional component; edge = data communication; label = required resources
Placement: place the service into a cloud datacenter, i.e., find a set of servers to host the service
[Figure: example cloud service graph v1-v7 with per-node memory requirements (2G-32G) and per-edge bandwidth requirements (1 Mbps-1 Gbps)]
How to find a compatible placement?
Cloud abstraction: node = server; edge = communication link; label = available resources
A compatible placement requires an isomorphic substructure with sufficient resources
Compatible placements benefit both customers and providers: customers get the expected performance, providers get higher throughput
[Figure: mapping the example cloud service onto the cloud abstraction graph]
Outline: Introduction; Problem statement; Gradin (graph index for frequent label updates, accelerating fragment join); Experiment results
Dynamic subgraph matching
Input: a data graph G with frequently updated numerical labels, and a query graph Q
Output: Q's compatible subgraphs in G
The problem is NP-hard
[Figure: a 4-node query Q with label requirements (0.1, 0.7, 0.6, 0.3) and candidate subgraphs S1-S3; two are not compatible, one is compatible]
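To make the compatibility notion concrete, here is a minimal Python sketch of the check the problem statement implies, assuming each node carries a single numerical label and a candidate node mapping is already given; the function name and data layout are illustrative, not the paper's implementation.

```python
# Illustrative compatibility check: a candidate subgraph is compatible with the
# query if the mapped structure matches and every numerical requirement is met.

def is_compatible(q_edges, q_labels, data_edges, data_labels, mapping):
    """mapping: query node -> data node (a candidate isomorphism)."""
    # every query edge must be present between the mapped data nodes
    for u, v in q_edges:
        if (mapping[u], mapping[v]) not in data_edges and \
           (mapping[v], mapping[u]) not in data_edges:
            return False
    # every data-node label must dominate the query's requirement
    return all(data_labels[mapping[u]] >= q_labels[u] for u in q_labels)
```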
Existing techniques for subgraph matching
Graph indexing techniques (exact, approximate, or probabilistic subgraph matching): cannot handle graphs with partially ordered numerical labels, and cannot scale to frequent label updates
Non-indexing techniques: cannot scale with the size of the data and query graphs
Outline: Introduction; Problem statement; Gradin (graph index for frequent label updates, accelerating fragment join); Experiment results
Offline graph index construction
Indexing graphs of numerical labels: decompose graph information into structure and labels
Structure: canonical labeling of each fragment (subgraph)
Labels: a label vector per fragment, e.g., fragment (v3, v2, v1, v4) with label vector (0.6, 0.7, 0.1, 0.3)
[Figure: data graph G decomposed into fragments s1-s5, each indexed by its canonical labeling and label vector]
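The offline step can be pictured with a small sketch: enumerate fragments, compute a structural key (standing in for canonical labeling), and record each fragment's label vector. This assumes star-shaped fragments and a trivial key; `build_star_index` is a hypothetical name, not Gradin's actual code.

```python
from collections import defaultdict
from itertools import combinations

def build_star_index(adj, labels, max_leaves=3):
    """adj: node -> set of neighbor nodes; labels: node -> numeric label."""
    index = defaultdict(list)  # structural key -> list of (node tuple, label vector)
    for center, nbrs in adj.items():
        for k in range(1, max_leaves + 1):
            for leaves in combinations(sorted(nbrs), k):
                nodes = (center,) + leaves
                key = ("star", k)                         # stand-in for canonical labeling
                vector = tuple(labels[v] for v in nodes)  # label vector of the fragment
                index[key].append((nodes, vector))
    return index
```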
Online query processing
Decompose the query graph Q into query fragments q1, ..., qk
Index filtering: use the graph index to retrieve the compatible graph fragments Cq1, ..., Cqk for each query fragment
Fragment join: join the compatible graph fragments to output Q's compatible subgraphs
Challenges: large search space and frequent label updates
Outline: Introduction; Problem statement; Gradin (graph index for frequent label updates, accelerating fragment join); Experiment results
Frequent label updates
Where do label updates come from? Cloud services come and go; datacenter hardware gets upgraded
When 30% of node labels are updated in a 3000-node data graph, more than 5M graph fragments must be updated
A state-of-the-art R-tree variant takes > 5 minutes; an inverted index scans all fragments on every search
FracFilter: fast fragment searching and fast index updating
FracFilter: construction
Build a grid index for each indexed structure
Partition the label-vector space into grids; grid density λ is the number of partitions in each dimension
Map each fragment's label vector into its grid cell
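A minimal sketch of the grid construction, assuming label values lie in [0, 1] and using illustrative names (`cell_of`, `build_fracfilter`); this is a sketch of the idea, not the paper's implementation.

```python
from collections import defaultdict

def cell_of(vector, lam):
    # clamp to lam - 1 so that a value of exactly 1.0 still falls in the last cell
    return tuple(min(int(x * lam), lam - 1) for x in vector)

def build_fracfilter(fragments, lam):
    """fragments: list of (fragment_id, label_vector) for one indexed structure."""
    grid = defaultdict(list)   # cell coordinates -> fragments in that cell
    for frag_id, vec in fragments:
        grid[cell_of(vec, lam)].append((frag_id, vec))
    return grid
```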
FracFilter: search
Input: the label vector of a query fragment; Output: compatible graph fragments
Procedure: locate the grid cell of the query fragment; fragments in region R1 are compatible, fragments in R3 are not compatible, and only fragments in the boundary region R2 need verification
Parsimonious verification reduces ~80% of comparisons with no loss in result quality
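Continuing the sketch above, the search step classifies cells relative to the query's cell: cells that strictly dominate it (R1) are accepted without comparisons, cells dominated in some dimension (R3) are skipped, and only boundary cells (R2) are verified. This assumes "compatible" means the fragment's vector dominates the query's vector coordinate-wise, and uses the same cell mapping as the construction sketch.

```python
def search_fracfilter(grid, query_vec, lam):
    """Return fragment ids whose label vectors dominate query_vec coordinate-wise."""
    # cell of the query vector (same mapping as in the construction sketch)
    qcell = tuple(min(int(x * lam), lam - 1) for x in query_vec)
    results = []
    for cell, frags in grid.items():
        if any(c < q for c, q in zip(cell, qcell)):
            continue                                   # R3: cannot be compatible
        if all(c > q for c, q in zip(cell, qcell)):
            results.extend(fid for fid, _ in frags)    # R1: compatible, no comparison
        else:
            # R2: boundary cells, verify each fragment explicitly
            results.extend(fid for fid, vec in frags
                           if all(x >= y for x, y in zip(vec, query_vec)))
    return results
```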
Tune search performance by grid density
Analytical result: on average, a search needs $\frac{d\, n_s}{\lambda^{d}}\left(\frac{\lambda+1}{2}\right)^{d-1}$ comparisons, where $\lambda$ is the grid density, $d$ is the number of dimensions for structure $s$, and $n_s$ is the number of graph fragments
Assumption 1: graph fragments are uniformly distributed; Assumption 2: query fragments are uniformly distributed
As $\lambda$ increases, FracFilter searching becomes more efficient, but the efficiency gain diminishes
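One hedged way to read the bound, consistent with the two uniformity assumptions on the slide (this is an informal decomposition, not the paper's derivation):

```latex
% Informal reading: expected comparisons ~ (fragments per cell) x (boundary cells to verify)
\[
\underbrace{\frac{n_s}{\lambda^{d}}}_{\text{expected fragments per cell}}
\;\times\;
\underbrace{d\left(\frac{\lambda+1}{2}\right)^{d-1}}_{\approx\ \text{expected number of boundary (R2) cells}}
\;=\;
\frac{d\, n_s}{\lambda^{d}}\left(\frac{\lambda+1}{2}\right)^{d-1}
\]
% As lambda grows, fragments per cell shrink like 1/lambda^d while the boundary
% term grows like lambda^{d-1}, so the product falls roughly as 1/lambda:
% searching gets cheaper, with diminishing returns, as the slide observes.
```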
FracFilter: update
To update FracFilter for a graph fragment: identify the old grid cell and delete the fragment, then find the new grid cell and insert the fragment
Analytical result: on average, $2\left(1 - \frac{1}{\lambda^{d}}\right)$ operations per update, where $\lambda$ is the grid density and $d$ is the number of dimensions
Assumption: graph fragments are uniformly distributed
As $\lambda$ increases, FracFilter updating becomes less efficient
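A sketch of the update path, assuming the defaultdict grid built by the construction sketch above; the delete + insert pair fires exactly when the new vector lands in a different cell, which under the uniformity assumption happens with probability $1 - 1/\lambda^{d}$, matching the expectation on the slide.

```python
def update_fracfilter(grid, frag_id, old_vec, new_vec, lam):
    """Move a fragment whose label vector changed from old_vec to new_vec."""
    old_cell = tuple(min(int(x * lam), lam - 1) for x in old_vec)
    new_cell = tuple(min(int(x * lam), lam - 1) for x in new_vec)
    if old_cell == new_cell:
        # same cell: just refresh the stored vector, no delete/insert needed
        grid[old_cell] = [(f, new_vec if f == frag_id else v) for f, v in grid[old_cell]]
        return
    grid[old_cell] = [(f, v) for f, v in grid[old_cell] if f != frag_id]  # delete
    grid[new_cell].append((frag_id, new_vec))                             # insert
```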
Outline: Introduction; Problem statement; Gradin (graph index for frequent label updates, accelerating fragment join); Experiment results
Minimum fragment cover
Large search space: even a small query graph can have many query fragments, e.g., a 10-star query has 720 3-star query fragments
Only some of them are needed to cover the query graph; which query fragments do we need? Minimum fragment cover
Minimum fragment cover
Input: query fragments $\{q_1, q_2, \ldots, q_k\}$ of Q and their matched graph fragments $\{C_1, C_2, \ldots, C_k\}$
Output: a subset $F \subseteq \{q_1, q_2, \ldots, q_k\}$ such that the fragments in F cover all edges of Q and $J = \sum_{q_i \in F} \log(|C_i|)$ is minimized
The problem is NP-hard; a greedy algorithm approximates it within $O(\log(m'))$, where $m'$ is the number of edges in the query graph
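The greedy approximation the slide mentions can be sketched as the standard weighted set-cover greedy, with each query fragment weighted by $\log(|C_i|)$; the function name and data layout below are illustrative assumptions, and every fragment is assumed to have at least one match.

```python
import math

def minimum_fragment_cover(query_fragments, query_edges):
    """query_fragments: list of (edge_set, num_matched_graph_fragments).
    query_edges: set of edges of Q. Returns indices of the chosen fragments."""
    uncovered = set(query_edges)
    cover = []
    while uncovered:
        best_i, best_ratio = None, float("inf")
        for i, (edges, c_size) in enumerate(query_fragments):
            gain = len(uncovered & set(edges))
            if gain == 0:
                continue
            # cost log|C_i| per newly covered edge (assumes c_size >= 1)
            ratio = math.log(max(c_size, 1)) / gain
            if ratio < best_ratio:
                best_i, best_ratio = i, ratio
        if best_i is None:
            break  # remaining edges cannot be covered by any fragment
        cover.append(best_i)
        uncovered -= set(query_fragments[best_i][0])
    return cover
```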
Large search space (cont.)
Joining two query fragments can be daunting: if q1 and q2 each have 1M matched graph fragments, joining them needs ~10^12 comparisons
How to make it faster? Fingerprint-based pruning
Fingerprint-based pruning
Given a fragment join order over the minimum fragment cover, extract fragment fingerprints in linear time
Organize the graph fragments in each Ci by their fingerprints f1, f2, ..., fn
[Figure: join order q1, q2, q3 with example fingerprints (v1, v2), (v2, v3, v5), (v3, v5, v4)]
Fingerprint-based pruning (cont.)
Prune unnecessary search when joining the next fragment q_{i+1}:
Given a graph fragment g that is a partial match of q1, ..., qi, extract the fingerprint f' that g is looking for
Check only the graph fragments in C_{i+1} that have fingerprint f'
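A sketch of one join step with fingerprint buckets, under the assumption that matches are stored as dicts from query vertices to data nodes; names like `index_by_fingerprint` are illustrative. A full join would also re-check injectivity and any edges between newly combined vertices, which is omitted here.

```python
from collections import defaultdict

def index_by_fingerprint(candidates, shared_vertices):
    """candidates: matches of q_{i+1}, each a dict query vertex -> data node."""
    buckets = defaultdict(list)
    for match in candidates:
        fp = tuple(match[v] for v in shared_vertices)  # fingerprint of this fragment
        buckets[fp].append(match)
    return buckets

def join_step(partial_matches, buckets, shared_vertices):
    """Join partial matches of q_1..q_i with bucketed matches of q_{i+1}."""
    joined = []
    for g in partial_matches:
        fp = tuple(g[v] for v in shared_vertices)      # the fingerprint f' that g needs
        for cand in buckets.get(fp, []):               # scan only the matching bucket
            merged = dict(g)
            merged.update(cand)
            joined.append(merged)                      # (injectivity checks omitted)
    return joined
```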
Outline: Introduction; Problem statement; Gradin (graph index for frequent label updates, accelerating fragment join); Experiment results
Setup
Data graphs: BCUBE (3000~15000 nodes, average degree 18~20) and CAIDA (26,475 nodes, 106,762 edges)
Numerical labels and updates: generated by a label generator derived from ClusterData
Query graphs: all possible graphs with no more than 10 edges, with labels drawn from Uniform(0, 1)
Baselines: VF2, UpdAll, UpdNo, NaiveGrid, and NaiveJoin
Query processing
Gradin returns up to 100 compatible subgraphs in < 2 seconds
It is up to 10 times faster than UpdNo and up to 5 times faster than NaiveGrid, and close to UpdAll
Index updating
Gradin processes 9M updates in 2 seconds, up to 10 times faster than UpdAll
Scalability
Gradin is up to 20 times faster than UpdAll on index updating
Gradin is 10 times and 8 times faster than UpdNo and NaiveGrid, respectively
Conclusion
Cloud service placement via subgraph matching: we proposed using subgraph matching queries to search for compatible subgraphs for cloud service placement
Gradin: a graph indexing technique for graphs with numerical labels that supports fast query processing and fast index updating, features that fit cloud service placement
Experimental results show that Gradin outperforms the baseline algorithms on query processing and index updating
Thank you