Patterns around Gnutella Network Nodes Sui-Yu Wang.



Introduction: A recent study shows that the topology of the Gnutella network is not purely random. This suggests that frequent patterns may exist around nodes in the network. Constructing a model of these patterns could not only further our understanding of the network but also help improve routing algorithms.

Goal: Find out whether frequent patterns exist. Verify the validity of the model. Use the model to predict patterns around nodes that are not in the training data.

Representation of the Network (1): Undirected graph G = {N, E}, with node labels N ∈ {center, depth_1, …, depth_n} and edge labels E ∈ {1, 2, …, TTL}. The depth of a node other than the center node is defined as the length of the shortest path from that node to the center.

Representation of the Network (2): Example graph with nodes A, B, C, D, E and labeled edges. G = { N(A) = (depth 2), N(B) = (center), N(C) = (depth 2), N(D) = (depth 1), N(E) = (depth 3), E(A,B) = 2, E(B,D) = 1, E(B,C) = 2, E(C,E) = 1 }. Each G is called one transaction.
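The example transaction can be written down as plain data and its depth labels recomputed. The sketch below is only an illustration: it assumes the edge labels act as path lengths, which is consistent with the depths listed in the example but is not stated in the slides. The function name `node_depths` is hypothetical.

```python
import heapq

# Edge labels from the example transaction, treated as path lengths (an assumption).
edges = {("A", "B"): 2, ("B", "D"): 1, ("B", "C"): 2, ("C", "E"): 1}


def node_depths(edges, center):
    """Return the shortest-path distance from `center` to every reachable node."""
    # Build an undirected adjacency list from the labeled edges.
    adjacency = {}
    for (u, v), weight in edges.items():
        adjacency.setdefault(u, []).append((v, weight))
        adjacency.setdefault(v, []).append((u, weight))

    # Dijkstra's algorithm with a binary heap.
    depths = {center: 0}
    queue = [(0, center)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > depths.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in adjacency.get(node, []):
            candidate = dist + weight
            if candidate < depths.get(neighbor, float("inf")):
                depths[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return depths


depths = node_depths(edges, "B")  # B is the center node in the example
```

Under that assumption, the computed depths for A, C, D and E agree with the node labels given in the example (2, 2, 1 and 3).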

Frequent Subgraph Discovery: Developed by Michihiro Kuramochi and George Karypis. Mines the patterns in a set of transactions that appear with at least a given minimum frequency. Gives the parent-child relations between subgraphs.
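To make the idea of a minimum frequency concrete, the sketch below counts only the smallest possible patterns, single labeled edges, across a set of transactions. It is not the Kuramochi-Karypis algorithm: larger patterns require candidate generation and subgraph isomorphism tests, which are omitted here. The function name, the transaction layout and the sample data are all hypothetical.

```python
from collections import Counter


def frequent_edge_patterns(transactions, min_support):
    """Return single-edge patterns found in at least `min_support` of the transactions.

    Each transaction is a pair (node_labels, edges), where node_labels maps a
    node id to its label and edges maps an (u, v) pair to an edge label.
    """
    counts = Counter()
    for node_labels, edges in transactions:
        patterns = set()
        for (u, v), edge_label in edges.items():
            # Sort the endpoint labels so the pattern is direction independent.
            first, second = sorted((node_labels[u], node_labels[v]))
            patterns.add((first, edge_label, second))
        counts.update(patterns)  # a transaction supports each pattern at most once
    threshold = min_support * len(transactions)
    return {pattern: count for pattern, count in counts.items() if count >= threshold}


# Two made-up transactions for illustration.
t1 = ({"a": "center", "b": "depth_1"}, {("a", "b"): 1})
t2 = (
    {"x": "center", "y": "depth_1", "z": "depth_2"},
    {("x", "y"): 1, ("y", "z"): 1},
)

in_all = frequent_edge_patterns([t1, t2], min_support=1.0)
in_half = frequent_edge_patterns([t1, t2], min_support=0.5)
```

With a minimum support of 1.0 only the center-to-depth_1 edge survives, because the depth_1-to-depth_2 edge appears in just one of the two transactions.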

Power Law: The frequency, f_d, of an out-degree, d, is proportional to the out-degree to the power of a constant, O: f_d ∝ d^O.
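One common way to check such a relation is to fit a straight line to the log-log plot of frequency against out-degree; the slope estimates the exponent. The sketch below uses ordinary least squares, which is an assumption about method rather than something taken from the slides. The function name and sample counts are hypothetical.

```python
import math


def fit_power_law_exponent(degree_counts):
    """Estimate the exponent by least squares on (log d, log frequency) points."""
    points = [
        (math.log(degree), math.log(frequency))
        for degree, frequency in degree_counts.items()
        if degree > 0 and frequency > 0
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx  # slope of the fitted line


# Synthetic counts that follow frequency = 1000 * d^(-2) exactly.
example_counts = {1: 1000.0, 2: 250.0, 4: 62.5}
exponent = fit_power_law_exponent(example_counts)
```

For the synthetic counts the fitted slope is -2 up to floating-point error. Real degree data are noisy, so a least-squares fit on raw counts should be read as a rough estimate only.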

Stratified Sampling: Principle of stratification: data are best partitioned so that the samples within each stratum are as similar to each other as possible. The population of nodes is partitioned into strata –Partition by size of transaction –Partition by the power law
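Partitioning by transaction size can be sketched as a simple binning step. The boundaries used below are made up for illustration and are not the ones used in the experiments. The function name `stratify_by_size` is hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def stratify_by_size(transaction_sizes, boundaries):
    """Group transaction ids into strata according to sorted size boundaries.

    Stratum k holds the sizes s with boundaries[k - 1] <= s < boundaries[k];
    stratum 0 is everything below the first boundary.
    """
    strata = defaultdict(list)
    for transaction_id, size in transaction_sizes.items():
        strata[bisect_right(boundaries, size)].append(transaction_id)
    return dict(strata)


# Hypothetical transaction sizes (number of nodes per transaction).
sizes = {"t1": 8, "t2": 14, "t3": 25, "t4": 60}
strata = stratify_by_size(sizes, boundaries=[15, 50])
```

Here t1 and t2 fall into the stratum below 15, t3 into the stratum from 15 up to 50, and t4 into the last one. Frequent patterns would then be mined separately within each stratum.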

Experiment (1): Find the frequent patterns in two sets of data collected at the same time but belonging to different connected components. The two distributions are compared by comparing the relations of the frequent subgraphs in each stratum. The maximum depth in each graph is set to 3. The TTL is 1. Data are partitioned by the size of transaction.

Table: Relative frequency of each frequent subgraph (Graph) in Set 1 and Set 2.

Experiment (1): One pattern of size 3, 2 patterns of size 4, and 2 patterns of size 5 are missing in data set 2 –A missing parent will cause a missing child. Grouping based on the power law shows a similar result. Possible reasons for the difference –Size of data –Classification error –Incomplete observation of the true distribution

Experiment (2): Two connected components, of size 591 and 524, taken at different times –Data from transactions of size less than 15 –All subgraphs match

Table: Frequent subgraphs (Graph) compared across Set 1 and Set 2.

Experiment (3): Grouping by size of transaction. TTL = 3. Depth = 3. The results shown are patterns of size 6 from transactions of size 20 to 50. Set 1 has size 269 and set 2 has size 491 –Five patterns are missing from set 1 –One pattern is missing from set 2

Table: Frequent subgraphs (Graph) compared across Set 1 and Set 2.

Prediction Model: Suppose the model has a graph G with two children G_1 and G_2. Their frequencies are f, f_1 and f_2. If a node finds that a transaction T has a subgraph isomorphic to G, the chances of finding G_1 and G_2 in T are f_1/f and f_2/f respectively.
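The estimate described above, the frequency of a child pattern divided by the frequency of its parent, can be sketched in a few lines. The function name, the pattern names and the counts below are hypothetical and chosen only to illustrate the arithmetic.

```python
def child_probabilities(parent_frequency, child_frequencies):
    """Estimate the chance of each child pattern given that the parent is present."""
    if parent_frequency <= 0:
        raise ValueError("parent_frequency must be positive")
    return {
        name: frequency / parent_frequency
        for name, frequency in child_frequencies.items()
    }


# Made-up counts: the parent pattern occurs in 40 transactions,
# its two children in 30 and 10 of them.
estimates = child_probabilities(40, {"child_1": 30, "child_2": 10})
```

With these counts the estimates are 0.75 and 0.25. Because every occurrence of a child contains its parent, a child's frequency never exceeds the parent's and each ratio stays between 0 and 1.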