Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center University of Illinois.

Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, Philip S. Yu
University of Illinois at Urbana-Champaign, IBM T. J. Watson Research Center, University of Illinois at Chicago

Outline: Motivation, Framework, Efficient Computation, Experiments, Conclusion

Online Analytical Processing
Jim Gray, 1997: OLAP as a powerful analytical tool

The Usefulness of OLAP
- Multi-dimensional: different perspectives
- Multi-level: different granularities
Can we offer roll-up/drill-down and slice/dice on graph data? Traditional OLAP cannot, because it ignores the links among data objects.

The Prevalence of Graphs
- Chemical compounds, computer vision objects, circuits, XML
- Especially various information networks: biological networks, bibliographic networks, social networks, the World Wide Web (WWW)

Applications
- WWW: at least 3 billion nodes and at least 50 billion arcs
- Facebook: at least 100 million active users
Combining topological structures with node/edge attributes makes such networks a great challenge to view and analyze. We propose Graph OLAP to tackle this issue.

Scenario #1: A bibliographic network
The collaboration patterns among researchers at SIGMOD 2004

Scenario #2

Outline: Motivation; Framework (data model; two types of Graph OLAP; dimension, measure and OLAP operations); Efficient Computation; Experiments; Conclusion

Data Model
We have a collection of network snapshots $\mathcal{G} = \{\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_N\}$. Each snapshot is $\mathcal{G}_i = (I_{1,i}, I_{2,i}, \ldots, I_{k,i}; G_i)$, where:
- $I_{1,i}, I_{2,i}, \ldots, I_{k,i}$ are $k$ informational attributes describing the snapshot as a whole;
- $G_i = (V_i, E_i)$ is an attributed graph, with attributes attached to its nodes $V_i$ and edges $E_i$.
Since $G_1, G_2, \ldots, G_N$ only represent different observations of a network, $V_1, V_2, \ldots, V_N$ actually correspond to the same set of objects.
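A minimal sketch of this data model, assuming a simple in-memory Python encoding. The `Snapshot` class, its field names, and the two toy snapshots are hypothetical illustrations, not the paper's implementation; edges are treated as undirected and keyed by the unordered pair of endpoints.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of one snapshot: informational attributes describe the
# snapshot as a whole; the graph is a node set plus attributed, undirected edges.
@dataclass
class Snapshot:
    info: dict                                  # I_{1,i}, ..., I_{k,i}
    nodes: set = field(default_factory=set)     # V_i
    edges: dict = field(default_factory=dict)   # E_i: frozenset({u, v}) -> edge attributes

    def add_edge(self, u, v, **attrs):
        self.nodes.update((u, v))
        self.edges[frozenset((u, v))] = attrs

# Two observations of the same (toy) bibliographic network.
g1 = Snapshot({"venue": "SIGMOD", "year": 2004})
g1.add_edge("R. Agrawal", "R. Srikant", papers=1)
g2 = Snapshot({"venue": "VLDB", "year": 2004})
g2.add_edge("R. Agrawal", "R. Srikant", papers=2)
snapshots = [g1, g2]

# The node sets of all snapshots are drawn from one shared universe of objects.
universe = set().union(*(s.nodes for s in snapshots))
```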

Two Types of OLAP
- Informational OLAP (abbr. I-OLAP)
- Topological OLAP (abbr. T-OLAP)

Informational OLAP
Dimensions come from the informational attributes attached at the whole-snapshot level, the so-called Info-Dims, e.g., scenario #1

I-OLAP Characteristics
- Overlay multiple pieces of information
- Do not change the objects whose interactions are being looked at: in the underlying snapshots, each node is a researcher; in the summarized view, each node is still a researcher
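As a rough illustration of I-OLAP roll-up, the sketch below overlays the snapshots selected by an Info-Dim and sums a collaboration-frequency edge measure. The function `i_rollup`, the tuple-based snapshot encoding, and the data are hypothetical; it assumes each author pair is always stored in the same order.

```python
from collections import Counter

# Hypothetical snapshots: (informational attributes, {coauthor pair: #papers}).
snapshots = [
    ({"venue": "SIGMOD", "year": 2003}, {("Agrawal", "Srikant"): 1}),
    ({"venue": "SIGMOD", "year": 2004}, {("Agrawal", "Srikant"): 2, ("Han", "Yan"): 1}),
    ({"venue": "VLDB", "year": 2004}, {("Han", "Yan"): 3}),
]

def i_rollup(snapshots, **info_dims):
    """Overlay every snapshot whose Info-Dims match, summing edge weights.

    Nodes keep their identity (researchers stay researchers); only the
    edge measure is aggregated across snapshots.
    """
    overlay = Counter()
    for info, edges in snapshots:
        if all(info.get(k) == v for k, v in info_dims.items()):
            overlay.update(edges)        # Counter.update adds counts per key
    return dict(overlay)

sigmod_all_years = i_rollup(snapshots, venue="SIGMOD")   # the [SIGMOD, all-years] cell
```

The keys of the result are still pairs of researchers, matching the characteristic that I-OLAP does not change the objects being looked at.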

Topological OLAP
Dimensions come from the node/edge attributes inside individual networks, the so-called Topo-Dims, e.g., scenario #2

T-OLAP Characteristics
- Zoom in/zoom out
- The network topology changes: “generalized” nodes and “generalized” edges. In the underlying network, each node is a researcher; in the summarized view, each node becomes an institute that comprises multiple researchers
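For contrast, a hypothetical T-OLAP roll-up sketch: researchers are mapped to institutes through a Topo-Dim, and parallel edges are merged by summing their weights. The `institute` mapping, `t_rollup`, and the data are made up for illustration; keeping intra-institute collaboration as self-loops is one possible design choice, not something the slides specify.

```python
from collections import Counter

# Hypothetical researcher-level network: {(u, v): #coauthored papers}.
edges = {("Alice", "Bob"): 2, ("Alice", "Carol"): 1, ("Bob", "Dave"): 4}
# Hypothetical Topo-Dim: the institute of each researcher.
institute = {"Alice": "UIUC", "Bob": "UIUC", "Carol": "IBM", "Dave": "UIC"}

def t_rollup(edges, group):
    """Merge nodes by `group` and merge parallel edges by summing weights.

    Edges inside one group become self-loops (g, g), so intra-group
    collaboration is kept rather than silently dropped.
    """
    merged = Counter()
    for (u, v), w in edges.items():
        gu, gv = sorted((group[u], group[v]))   # canonical order for undirected edges
        merged[(gu, gv)] += w
    return dict(merged)

institute_view = t_rollup(edges, institute)
```

Unlike the I-OLAP overlay, the nodes of the result are "generalized" nodes (institutes), and each edge aggregates all researcher-level edges between two institutes.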

Measures in Graph OLAP
- The measure is an aggregated graph: an I-aggregated graph or a T-aggregated graph
- Other measures, such as node count and average degree, can be treated as derived from it
- The graph plays a dual role: data source and aggregate measure

Generality of the Framework
- Measures could be complex, e.g., maximum flow, shortest path, centrality
- I-OLAP and T-OLAP can be combined into a hybrid case

Graph OLAP Operations

| Operation | Graph I-OLAP | Graph T-OLAP |
|---|---|---|
| Roll-up | Overlay multiple snapshots to form a higher-level summary via an I-aggregated graph | Shrink the topology and obtain a T-aggregated graph that represents a compressed view, whose topological elements (i.e., nodes and/or edges) have been merged and replaced by corresponding higher-level ones |
| Drill-down | Return to the set of lower-level snapshots from the higher-level overlaid (aggregated) graph | The reverse operation of roll-up |
| Slice/dice | Select a subset of qualifying snapshots based on Info-Dims | Select a subgraph of the network based on Topo-Dims |
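A small sketch of the two kinds of slice/dice, under the assumption that snapshots are dictionaries of Info-Dims plus an edge list and that Topo-Dims are per-node attributes. The names `i_slice`, `t_slice`, the attribute `inst`, and the data are hypothetical.

```python
# Hypothetical snapshots (Info-Dims + edge list) and node attributes (Topo-Dims).
snapshots = [
    {"area": "DB", "year": 2004, "edges": [("a", "b"), ("b", "c")]},
    {"area": "DM", "year": 2004, "edges": [("a", "c")]},
    {"area": "DB", "year": 2005, "edges": [("c", "d")]},
]
node_attrs = {
    "a": {"inst": "UIUC"}, "b": {"inst": "UIUC"},
    "c": {"inst": "IBM"}, "d": {"inst": "UIUC"},
}

def i_slice(snapshots, pred):
    """I-OLAP slice/dice: keep whole snapshots whose Info-Dims satisfy pred."""
    return [s for s in snapshots if pred(s)]

def t_slice(edges, node_attrs, pred):
    """T-OLAP slice/dice: keep the subgraph induced by nodes satisfying pred."""
    keep = {n for n, attrs in node_attrs.items() if pred(attrs)}
    return [(u, v) for u, v in edges if u in keep and v in keep]

db_snapshots = i_slice(snapshots, lambda s: s["area"] == "DB")
uiuc_edges = t_slice(snapshots[0]["edges"], node_attrs, lambda a: a["inst"] == "UIUC")
```

The I-OLAP version filters whole snapshots and leaves each graph intact, while the T-OLAP version cuts into a single network and keeps only the induced subgraph of the qualifying nodes.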

Outline: Motivation; Framework; Efficient Computation (measure classification; optimizations; constraint pushing); Experiments; Conclusion

Two Categories of Strategies
- Top-down: generalized cells are computed later. How can intermediate results be combined and leveraged?
- Bottom-up: generalized cells are computed first. How can the computation stop early?

Measure Classification
How can intermediate results be combined and leveraged?
- Distributive: the computation of high-level cells can be built directly on low-level cells
- Algebraic: not distributive, but can easily be derived from several distributive measures
- Holistic: neither distributive nor algebraic

Examples
- Distributive: collaboration frequency. Use distributiveness to drive the computation up the cuboid lattice.
- Algebraic: maximum flow (shown later). Semi-distributive.
- Holistic: centrality. Need to go down to the raw data and start from scratch.
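For a distributive measure such as collaboration frequency, higher-level cells can be assembled from lower-level ones without touching the raw snapshots. The sketch below assumes cells keyed by hypothetical (area, year) tuples, with `'*'` marking an aggregated dimension; `roll_up` and the data are illustrative only. A holistic measure such as centrality could not be computed this way.

```python
from collections import Counter, defaultdict

# Hypothetical base cells of the cube: (area, year) -> {author pair: frequency}.
base = {
    ("DB", 2004): Counter({("Agrawal", "Srikant"): 2}),
    ("DB", 2005): Counter({("Agrawal", "Srikant"): 1, ("Han", "Yan"): 1}),
    ("DM", 2004): Counter({("Han", "Yan"): 2}),
}

def roll_up(cells, keep):
    """Build higher-level cells by merging child cells only (no raw data).

    `keep` holds the indices of the dimensions that survive; the others
    become '*'. This is valid because collaboration frequency is
    distributive: a merged cell's frequency is the sum of its children's.
    """
    parents = defaultdict(Counter)
    for key, measure in cells.items():
        parent = tuple(k if i in keep else "*" for i, k in enumerate(key))
        parents[parent] += measure       # Counter addition sums edge-wise
    return dict(parents)

by_area = roll_up(base, keep={0})        # (area, *) cells from the base cells
apex = roll_up(by_area, keep=set())      # (*, *) cell reuses the (area, *) cells
```

The apex cell is computed from two intermediate cells rather than from all base cells, which is the reuse that distributiveness allows when moving up the lattice.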

Optimizations
Special measures may have special properties that help optimize the computation. We discuss two of them with regard to I-OLAP: localization and attenuation.

Localization
During computation, only a neighborhood of the networks needs to be consulted.
- Perfect (i.e., 0-neighborhood) localization: e.g., the collaboration frequency of “R. Agrawal” and “R. Srikant” for [SIGMOD, all-years] depends only on their collaboration frequencies at each SIGMOD conference
- k-neighborhood localization is less ideal, but still useful: e.g., the number of common friends shared by “R. Agrawal” and “R. Srikant” (a 1-neighborhood)
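The two localization levels can be sketched as follows, assuming snapshots stored as adjacency sets with unweighted edges (so collaboration frequency here counts the snapshots in which the pair is linked). Function names and data are hypothetical. The first function reads only the pair's own edge in each snapshot; the second reads only the two authors' neighbor lists.

```python
# Hypothetical snapshots as adjacency sets: {node: set of neighbors}.
snapshots = [
    {"A": {"S", "X"}, "S": {"A", "X"}, "X": {"A", "S"}, "Y": set()},
    {"A": {"Y"}, "S": {"Y", "Z"}, "Y": {"A", "S"}, "Z": {"S"}},
]

def collab_frequency(snapshots, u, v):
    """0-neighborhood: only the (u, v) edge of each snapshot is read."""
    return sum(1 for g in snapshots if v in g.get(u, ()))

def common_friends(snapshots, u, v):
    """1-neighborhood: only the neighbor lists of u and v are read.

    In the overlaid graph, a node is a neighbor of u if it is one in any
    snapshot, so neighbor sets are unioned across snapshots before
    intersecting them.
    """
    nu = set().union(*(g.get(u, set()) for g in snapshots))
    nv = set().union(*(g.get(v, set()) for g in snapshots))
    return len((nu & nv) - {u, v})
```

Neither function ever looks at the rest of the graph, which is what makes these measures cheap to aggregate across snapshots.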

Attenuation
Consider the transporting capability (i.e., maximum flow) from source S to destination T over multiple transportation networks, each operated by a separate company. With regard to I-OLAP, each network is a “snapshot”, and overlaying more than one snapshot means sharing link capacities among companies.

Attenuation
- Data graph C: nodes are cities; an edge carries the capacity of a link
- Measure graph F: nodes are cities; an edge carries the quantity that passes through the link when the maximum flow is transmitted

Attenuation
Maximum flow is algebraic: F can be derived from C by simply running a maximum flow algorithm, and the capacity graph C is obviously distributive.
Lemma. Let $F$ be a flow in $C$ and let $C_F$ be its residual graph, where residual means $C_F = C - F$ (with flows written skew-symmetrically, $F(v,u) = -F(u,v)$, so that $C - F$ also contains the reverse residual capacities). Then $F'$ is a maximum flow in $C_F$ if and only if $F + F'$ is a maximum flow in $C$.

Attenuation
Consider two overlaid snapshots whose maximum flows $F_1, F_2$ have already been calculated from $C_1, C_2$.
- Without attenuation: compute the overall maximum flow $F$ from $C_1 + C_2$
- With attenuation: take $F_1 + F_2$ as the basis (it is a feasible flow in $C_1 + C_2$, so the lemma applies), compute the residual maximum flow $F'$ from $(C_1 - F_1) + (C_2 - F_2)$, and augment it onto $F_1 + F_2$
Thus, the input attenuates from $C_1 + C_2$ to $(C_1 + C_2) - (F_1 + F_2)$, which substantially reduces the effort.
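A sketch of attenuation, assuming capacity graphs given as dense matrices and using Edmonds-Karp (BFS augmenting paths) as the maximum flow algorithm; the slides do not say which max-flow algorithm the authors used. `max_flow`, `add`, and the two toy snapshots are hypothetical. Flows are kept skew-symmetric, so `cap - flow` is exactly the residual graph $C_F = C - F$ from the lemma.

```python
from collections import deque

def max_flow(cap, s, t, flow=None):
    """Edmonds-Karp on an n x n capacity matrix, optionally warm-started.

    `flow` must be a feasible, skew-symmetric flow (flow[v][u] == -flow[u][v])
    in `cap`. Residual capacity is cap - flow, including reverse edges.
    Returns (flow value, flow matrix).
    """
    n = len(cap)
    flow = [row[:] for row in flow] if flow else [[0] * n for _ in range(n)]
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:          # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:                       # no augmenting path: flow is maximum
            return sum(flow[s]), flow
        bottleneck, v = float("inf"), t
        while v != s:                             # smallest residual capacity on the path
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:                             # augment, keeping skew symmetry
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u

def add(a, b):
    """Element-wise sum of two matrices (overlaying two snapshots)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Two hypothetical companies over cities 0 = S, 1, 2, 3 = T.
C1 = [[0, 2, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
C2 = [[0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 2], [0, 0, 0, 0]]

F1_val, F1 = max_flow(C1, 0, 3)
F2_val, F2 = max_flow(C2, 0, 3)
C = add(C1, C2)
cold_val, _ = max_flow(C, 0, 3)                  # without attenuation
warm_val, _ = max_flow(C, 0, 3, add(F1, F2))     # with attenuation: start from F1 + F2
```

Both runs reach the same value; the warm start only has to search the residual capacities $(C_1 + C_2) - (F_1 + F_2)$ for the extra flow that the overlay makes possible.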

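Constraint Pushing
- Iceberg graph cube: partial materialization of only the cells satisfying some interestingness requirement
- Push the constraints into the computation:
- Anti-monotone, e.g., maximum flow $|f| \ge \delta_{|f|}$: drilling down overlays fewer snapshots and thus less capacity, so if a cell fails the constraint, all of its descendants fail too
- Monotone, e.g., diameter $d \ge \delta_d$: drilling down removes edges and can only lengthen shortest paths, so if a cell satisfies the constraint, all of its descendants do too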
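A bottom-up (BUC-style) sketch of pushing an anti-monotone constraint into iceberg cube computation. For brevity it uses the overlaid collaboration frequency of one fixed author pair as a stand-in measure; like maximum flow, it can only grow when more snapshots are overlaid, so the same pruning argument applies. The data, `DIMS`, and `iceberg` are hypothetical.

```python
# Hypothetical snapshots: Info-Dims plus the collaboration frequency of one
# fixed author pair in that snapshot.
snapshots = [
    {"area": "DB", "year": 2004, "freq": 3},
    {"area": "DB", "year": 2005, "freq": 1},
    {"area": "DM", "year": 2004, "freq": 1},
    {"area": "DM", "year": 2005, "freq": 0},
]
DIMS = ("area", "year")

def iceberg(snaps, delta, cell=None, start=0, out=None):
    """Materialize only the cells whose overlaid measure is >= delta.

    Generalized cells are computed first. A child cell fixes one more
    dimension, so it overlays a subset of its parent's snapshots and its
    measure can only be smaller or equal. Hence, once a cell fails, none
    of its descendants is ever visited (early stop).
    """
    if cell is None:
        cell = {d: "*" for d in DIMS}
    if out is None:
        out = {}
    measure = sum(s["freq"] for s in snaps)
    if measure < delta:                      # anti-monotone: prune cell and descendants
        return out
    out[tuple(cell[d] for d in DIMS)] = measure
    for i in range(start, len(DIMS)):        # fix only later dimensions (BUC order)
        dim = DIMS[i]
        for value in sorted({s[dim] for s in snaps}):
            part = [s for s in snaps if s[dim] == value]
            iceberg(part, delta, {**cell, dim: value}, i + 1, out)
    return out

result = iceberg(snapshots, delta=2)
```

Here the (DM, *) cell fails, so its children (DM, 2004) and (DM, 2005) are never computed at all.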

Outline: Motivation, Framework, Efficient Computation, Experiments, Conclusion

OLAP on a Bibliographic Network
- We take the coauthorship data from DBLP
- Measure: information centrality
- Two Info-Dims:
- Area: Database (DB): PODS/SIGMOD/VLDB/ICDE/EDBT; Data Mining (DM): ICDM/SDM/KDD/PKDD; Information Retrieval (IR): SIGIR/WWW/CIKM
- Time

OLAP on a Bibliographic Network

Efficiency
- A test that computes maximum flow as the measure
- Synthetically generated flow networks (details in the paper), with each “snapshot” representing an individual player in the transportation industry
- Two algorithms, both of which, like the Multi-Way method, calculate low-level cells before merging them into high-level ones: one takes advantage of the attenuation heuristic, the other does not

Efficiency

Outline: Motivation, Framework, Efficient Computation, Experiments, Conclusion

- We propose a Graph OLAP framework to perform multi-dimensional, multi-level analysis on network data
- The measure is an aggregated graph
- Informational/topological dimensions lead to I-OLAP/T-OLAP

Conclusion
Focusing mainly on I-OLAP, we discuss how a graph cube can be efficiently computed and materialized:
- Measure classification: distributive, algebraic, holistic
- Optimizations: localization, attenuation
- Constraint pushing

Future Work
- Technical issues for T-OLAP
- Selective drilling and discovery-driven InfoNet-OLAP