Probably Approximately


Presentation transcript:

ProbeSim: Scalable Single-Source and Top-k SimRank Computations on Dynamic Graphs
Yu Liu(1), Bolong Zheng(2), Xiaodong He(1), Zhewei Wei(1), Xiaokui Xiao(3), Kai Zheng(2), Jiaheng Lu(4)
(1) DEKE, MOE and School of Information, Renmin University of China
(2) Department of Computer Science, University of Queensland
(3) School of Computer Engineering, National University of Singapore
(4) Department of Computer Science, University of Helsinki

Motivation and Background
Definition of SimRank [Jeh & Widom, KDD02]: recursive equation; random surfer-pairs model.
Applications.
Problem statement: single-source queries; top-k queries (example: query vertex 3, k = 3).
The Monte Carlo (MC) algorithm [Fogaras & Racz], a sampling-based algorithm: single pair; single source/top-k.

State of the art (method; time; space; drawbacks):
TopSim [Lee et al., ICDE12]; time ~O(|D|^(2t)); space -; (1) heuristic, so no accuracy guarantee; (2) slow on large graphs.
TSF [Shao et al., VLDB15]; time O(R_g R_q t |V|); space O(R_g |V|); (1) relies on an assumption, so no accuracy guarantee; (2) large index.
SLING [Tian & Xiao, SIGMOD16]; time O(|V|/ε), O(|E| log^2(1/ε)); space O(|V|/ε); (1) large index and long preprocessing time; (2) does not support dynamic graphs.

The ProbeSim Algorithm
Basic idea: forward random walk + backward searching strategy.
Intuition: ProbeSim vs. MC.
Theoretical guarantee: probably approximately correct (PAC).

Optimizations
Probe deterministically.
Batch up.
Reverse reachability tree.

Experiments and Conclusion
Small datasets.
Large datasets; pooling.
Conclusion:
ProbeSim is the first single-source and top-k SimRank algorithm for billion-scale dynamic graphs with a theoretical accuracy guarantee.
It outperforms existing methods in query efficiency, accuracy and scalability.
It provides the first evaluation on large graphs, by pooling.
Code is available at https://github.com/dokirabbithole/ProbeSim_vldb_pub
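The slides name the SimRank recursion, the random-walk view, MC and the "forward random walk + backward searching" idea, but the formulas themselves were lost. The following is a minimal Python sketch under stated assumptions, not the authors' implementation (their code is in the repository linked in the transcript). It assumes a small graph stored as a dict mapping each node v to its in-neighbor list I(v), and a decay factor c = 0.6 chosen only for illustration. All function names (simrank_power, sqrt_c_walk, mc_single_pair, probe, probesim_single_source) are hypothetical.

The sketch provides three things. First, exact SimRank, computed by iterating the standard recursion from Jeh & Widom: s(u,u) = 1, and s(u,v) = c/(|I(u)||I(v)|) * sum over u' in I(u), v' in I(v) of s(u',v'), taken as 0 when either in-neighbor set is empty. Second, an MC single-pair estimate. This uses the fact that s(u,v) equals the probability that two reverse walks from u and v, each continuing with probability sqrt(c) to a uniform in-neighbor, are at the same node at the same step. Third, a single-source estimate in the spirit of the forward walk + backward probe. For each step i of a sampled walk from u, a deterministic backward probe gives every v the probability that v's walk first meets the sampled walk at step i.

```python
import math
import random
from collections import defaultdict


def out_neighbors(in_nbrs):
    """Invert the in-neighbor lists I(v) into out-neighbor lists."""
    out = defaultdict(list)
    for v, preds in in_nbrs.items():
        for p in preds:
            out[p].append(v)
    return out


def simrank_power(in_nbrs, c=0.6, iters=30):
    """Exact SimRank by iterating the recursive equation (error <= c**iters)."""
    nodes = list(in_nbrs)
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    new[(a, b)] = 0.0
                else:
                    total = sum(s[(x, y)] for x in in_nbrs[a] for y in in_nbrs[b])
                    new[(a, b)] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        s = new
    return s


def sqrt_c_walk(in_nbrs, start, c, rng):
    """Reverse walk: at each step continue with prob sqrt(c) to a uniform in-neighbor."""
    walk = [start]
    node = start
    while in_nbrs[node] and rng.random() < math.sqrt(c):
        node = rng.choice(in_nbrs[node])
        walk.append(node)
    return walk


def mc_single_pair(in_nbrs, u, v, c, n_walks, rng):
    """MC estimate of s(u, v): fraction of walk pairs that meet at the same step."""
    hits = 0
    for _ in range(n_walks):
        wu = sqrt_c_walk(in_nbrs, u, c, rng)
        wv = sqrt_c_walk(in_nbrs, v, c, rng)
        if any(a == b for a, b in zip(wu, wv)):
            hits += 1
    return hits / n_walks


def probe(in_nbrs, out_nbrs, walk, i, c):
    """For every v, the probability that v's walk reaches walk[i] at step i
    without being at walk[j] at any earlier step j < i (first meeting at step i)."""
    frontier = {walk[i]: 1.0}
    for k in range(1, i + 1):
        forbidden = walk[i - k]  # nodes at this level stand for step i - k of v's walk
        nxt = defaultdict(float)
        for x, p in frontier.items():
            for y in out_nbrs[x]:  # y has x as an in-neighbor
                if y != forbidden:
                    nxt[y] += p * math.sqrt(c) / len(in_nbrs[y])
        frontier = nxt
    return frontier


def probesim_single_source(in_nbrs, u, c, n_walks, rng):
    """Average the backward-probe scores over n_walks sampled walks from u."""
    out_nbrs = out_neighbors(in_nbrs)
    est = defaultdict(float)
    for _ in range(n_walks):
        walk = sqrt_c_walk(in_nbrs, u, c, rng)
        for i in range(1, len(walk)):
            for v, p in probe(in_nbrs, out_nbrs, walk, i, c).items():
                est[v] += p / n_walks
    est[u] = 1.0  # s(u, u) = 1 by definition
    return dict(est)
```

For example, with g = {0: [1, 2], 1: [0], 2: [0, 1], 3: [2]}, probesim_single_source(g, 3, 0.6, 20000, random.Random(7)) returns estimates of s(3, v) for every v. These can be compared against simrank_power(g). This naive version runs a full backward probe for every step of every sampled walk. The optimizations listed in the slides (deterministic probing, batching, a reverse reachability tree) are not reproduced here.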