Outlier Detection for Information Networks Manish Gupta 15th Jan 2013


Problem 1: TopK Outlier Cuboid Detection for Graph OLAP
Consider the DBLP co-authorship network.
One can store it in OLAP with research areas and years as dimensions.
Query: In which research area, and in which set of years, were there exceptionally high collaborations between Stanford, IITBombay and Berkeley authors?
Research area and years determine different levels of cuboids.
Given: A subgraph query and a weighted network (like DBLP)
Find: TopK outlier cuboids from graph OLAP such that the percentage of edge weight covered by matches is exceptionally high
A possible result: (DM+DB, )
We hope to explore an application of genetic algorithms in this project.
[Figure: nodes labeled Stanford, IITBombay, Berkeley]
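The "percentage of edge weight covered by matches" can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the method proposed in the slides: the toy edge list, the names `EDGES`, `QUERY`, `coverage_by_cuboid` and `top_k_outlier_cuboids`, the simplification that a "match" is any edge between two different query affiliations, and the use of a z-score to decide what counts as exceptionally high. Only base cells (one area, one year) are scored; merged cuboids such as DM+DB and the genetic-algorithm search are not covered.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Toy weighted co-authorship edges (all values invented for illustration):
# (affiliation_u, affiliation_v, research_area, year, edge_weight)
EDGES = [
    ("Stanford", "Berkeley", "DM", 2008, 5.0),
    ("Stanford", "IITBombay", "DM", 2008, 3.0),
    ("MIT", "CMU", "DM", 2008, 2.0),
    ("Stanford", "Berkeley", "DB", 2008, 1.0),
    ("MIT", "CMU", "DB", 2008, 9.0),
    ("IITBombay", "Berkeley", "DM", 2009, 1.0),
    ("MIT", "UIUC", "DM", 2009, 9.0),
    ("MIT", "CMU", "DB", 2009, 10.0),
]

# Assumed simplification of the subgraph query: an edge "matches" when it
# joins two different affiliations from this set.
QUERY = {"Stanford", "IITBombay", "Berkeley"}


def coverage_by_cuboid(edges, query):
    """Fraction of edge weight covered by matches in each (area, year) cell."""
    total = defaultdict(float)
    matched = defaultdict(float)
    for u, v, area, year, weight in edges:
        cell = (area, year)
        total[cell] += weight
        if u in query and v in query and u != v:
            matched[cell] += weight
    # Weights are assumed positive, so every total is non-zero.
    return {cell: matched[cell] / total[cell] for cell in total}


def top_k_outlier_cuboids(edges, query, k):
    """Rank cells by how far their coverage lies above the mean coverage."""
    coverage = coverage_by_cuboid(edges, query)
    mu = mean(coverage.values())
    sigma = pstdev(coverage.values())
    scores = {
        cell: (value - mu) / sigma if sigma > 0 else 0.0
        for cell, value in coverage.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    for cell, score in top_k_outlier_cuboids(EDGES, QUERY, k=2):
        print(cell, round(score, 2))
```

On this toy data the (DM, 2008) cell has the highest coverage, since 8 of its 10 units of edge weight lie between query affiliations. A fuller treatment would also enumerate the coarser cuboids obtained by merging areas or year ranges.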

Problem 2: Outlier Substructures in an Information Network
Given: A heterogeneous information network and a heterogeneous query
Consider the DBLP network of authors, conferences and title terms.
Consider a simple query: a data mining researcher
Patterns: Most data mining researchers
– Are connected to other data mining authors, conferences or terms
– Are connected to very few very popular authors
– Etc.
Given the query, one can find all matches in the network.
For a match, given the usual connectivity patterns and the neighborhood of the match
– One can compute p = the probability of generating the match
– The outlier score can then be computed as 1 - p
We hope to explore a generative way of modeling a subgraph neighborhood (like block models) in this project.
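The 1 - p scoring idea can be sketched with a much simpler generative model than a block model. The assumptions here are all illustrative: each match is summarized by the area labels of its neighbors, the "usual connectivity pattern" is a single smoothed categorical distribution over those labels, and p is taken as the geometric mean of the per-neighbor probabilities so that large neighborhoods are not penalized for size alone. The names `NEIGHBORHOODS`, `fit_label_distribution` and `outlier_score` and the toy data are hypothetical.

```python
import math
from collections import Counter

# Toy data (invented): for each matched author, the area labels of the
# authors, conferences or terms it is connected to.
NEIGHBORHOODS = {
    "author_a": ["DM", "DM", "DM", "DB"],
    "author_b": ["DM", "DM", "DB"],
    "author_c": ["DM", "DM", "DM"],
    "author_d": ["Bio", "Bio", "DM"],
}


def fit_label_distribution(neighborhoods, smoothing=1.0):
    """Estimate a smoothed categorical distribution over neighbor labels."""
    counts = Counter(
        label for labels in neighborhoods.values() for label in labels
    )
    total = sum(counts.values()) + smoothing * len(counts)
    return {label: (count + smoothing) / total for label, count in counts.items()}


def outlier_score(labels, distribution):
    """Return 1 - p, where p is the geometric-mean neighbor probability.

    Assumes a non-empty neighborhood whose labels all appear in the
    fitted distribution.
    """
    log_p = sum(math.log(distribution[label]) for label in labels)
    p = math.exp(log_p / len(labels))
    return 1.0 - p


if __name__ == "__main__":
    dist = fit_label_distribution(NEIGHBORHOODS)
    for author, labels in NEIGHBORHOODS.items():
        print(author, round(outlier_score(labels, dist), 3))
```

On this toy data `author_d`, whose neighbors are mostly outside data mining, receives the highest score. A block model would replace the single label distribution with probabilities that depend on the group membership of both endpoints.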