CS 521 Data Mining Techniques Instructor: Abdullah Mueen LECTURE 8: TIME SERIES AND GRAPH MINING.



Definition of Time Series Motifs
1. Length of the motif
2. Support of the motif
3. Similarity of the pattern
4. Relative position of the pattern
Given a length, the motif is the most similar (least distant) pair of non-overlapping subsequences.

Problem Formulation
Find the most similar pair of non-overlapping subsequences. Treating each subsequence of length d as a point, this is the closest pair of points in a high-dimensional space.
• The optimal algorithm in two dimensions runs in Θ(n log n).
• For large dimensionality d, the best algorithm is effectively Θ(n^2 d).
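The Θ(n^2 d) baseline can be sketched as a brute-force search over all non-overlapping pairs. This is a minimal illustration, not the lecture's code: the function names are hypothetical, and plain Euclidean distance on raw values is an assumption (the slides do not say whether subsequences are normalized first).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_motif(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping
    length-m subsequences, found by checking every pair."""
    n = len(series) - m + 1            # number of subsequences
    best = (math.inf, None, None)
    for i in range(n):
        for j in range(i + m, n):      # j >= i + m keeps the pair non-overlapping
            d = euclidean(series[i:i + m], series[j:j + m])
            if d < best[0]:
                best = (d, i, j)
    return best
```

Every pair costs one length-m distance computation, which is the work the lower-bounding ideas on the following slides try to avoid.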

Lower Bound
• If P, Q and R are three points in a d-dimensional space, the triangle inequality gives
  d(P,Q) + d(Q,R) ≥ d(P,R)
  and therefore
  d(P,Q) ≥ |d(Q,R) - d(P,R)|
• A third point R provides a very inexpensive lower bound on the true distance.
• If the lower bound is at least as large as the existing best distance, skip computing d(P,Q):
  d(P,Q) ≥ |d(Q,R) - d(P,R)| ≥ BestPairDistance
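A small sketch of how this bound prunes work, assuming the distances to R are computed once up front. The function name and the returned counter are illustrative only; `math.dist` requires Python 3.8 or later.

```python
import math

def closest_pair_with_pruning(points, r):
    """Closest pair search that skips any pair whose triangle-inequality
    lower bound |d(Q,r) - d(P,r)| is already >= the best distance so far.
    Returns (best distance, (i, j), number of true distances computed)."""
    ref = [math.dist(p, r) for p in points]    # one distance to r per point
    best, pair, computed = math.inf, None, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(ref[i] - ref[j]) >= best:   # lower bound rules this pair out
                continue
            d = math.dist(points[i], points[j])
            computed += 1
            if d < best:
                best, pair = d, (i, j)
    return best, pair, computed
```

On the four points `[(0, 0), (10, 0), (0.5, 0), (10, 10)]` with `r = (0, 0)`, only two of the six pairs need a true distance computation.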

Circular Projection
• Pick a reference point r.
• Circularly project all points onto a line passing through the reference point r.
• This is equivalent to computing each point's distance from r and then sorting the points by that distance.

The Order Line
The sorted points form the order line, which starts at r (distance 0). For two points P and Q on the line, their separation |d(Q, r) - d(P, r)| is a lower bound on d(P, Q).
• For k = 1 to n-1: compare every pair having k-1 points in between.
• Do k scans of the order line, starting with the 1st to kth point.
• Skip any pair whose separation on the order line is at least BestPairDistance.

Correctness
• If we search all offsets k = 1, 2, …, n-1, then all n(n-1)/2 possible pairs are considered.
• For any offset k, if none of the k scans needs an actual distance computation, then no distance computation will be needed for the remaining offsets k+1, …, n-1. Because the points are sorted by distance from r, a pair that is farther apart on the order line has a lower bound at least as large.
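The order-line search and its early stop can be sketched as follows, assuming a single reference point and generic points in place of subsequences. The function name is hypothetical; for time series, a real implementation would also have to exclude overlapping subsequences.

```python
import math

def order_line_closest_pair(points, r):
    """Closest pair via the order line: sort points by distance to r,
    then compare pairs at offset k = 1, 2, ... along that order."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], r))
    ref = [math.dist(points[i], r) for i in order]   # sorted distances to r
    n = len(order)
    best, pair = math.inf, None
    for k in range(1, n):
        computed_any = False
        for i in range(n - k):
            j = i + k
            if ref[j] - ref[i] >= best:              # lower bound prunes the pair
                continue
            computed_any = True
            d = math.dist(points[order[i]], points[order[j]])
            if d < best:
                best, pair = d, tuple(sorted((order[i], order[j])))
        if not computed_any:                         # larger offsets cannot do better
            break
    return best, pair
```

The `break` relies on the correctness argument above: once a whole offset is pruned, every larger offset has lower bounds that are at least as large.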

Graph Similarity
• Edit distance / graph isomorphism
  ◦ Tree edit distance
• Feature extraction
  ◦ In/out degree
  ◦ Diameter
• Iterative methods
  ◦ SimRank

Diameter
The largest shortest-path distance in the graph. All-pairs shortest paths can be computed with the Floyd-Warshall algorithm:

let dist be a |V| × |V| array of minimum distances initialized to ∞ (infinity)
for each vertex v
    dist[v][v] ← 0
for each edge (u,v)
    dist[u][v] ← w(u,v)   // the weight of the edge (u,v)
for k from 1 to |V|
    for i from 1 to |V|
        for j from 1 to |V|
            if dist[i][j] > dist[i][k] + dist[k][j]
                dist[i][j] ← dist[i][k] + dist[k][j]
            end if

The diameter is then the largest finite entry of dist.
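A runnable version of the pseudocode, followed by the diameter computation, might look like the sketch below. It assumes vertices numbered 0 to n-1, directed weighted edges given as (u, v, w) triples, and no negative cycles; unreachable pairs are ignored when taking the maximum.

```python
import math

def floyd_warshall(n, edges):
    """All-pairs shortest path distances for vertices 0..n-1."""
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)        # keep the lightest parallel edge
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][j] > dist[i][k] + dist[k][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def diameter(n, edges):
    """Largest finite shortest-path distance in the graph."""
    dist = floyd_warshall(n, edges)
    return max(d for row in dist for d in row if d != math.inf)
```

For the directed cycle `[(0, 1, 1), (1, 2, 2), (2, 0, 4)]` on three vertices, the longest shortest path runs from vertex 1 to vertex 0 through vertex 2, so `diameter(3, ...)` returns 6.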

SimRank
For a node v in a graph, we denote by I(v) and O(v) the sets of in-neighbors and out-neighbors of v, respectively.
1. A solution s(∗, ∗) ∈ [0, 1] to the n^2 SimRank equations always exists and is unique.
2. Symmetric: s(a, b) = s(b, a).
3. Reflexive: s(a, a) = 1.
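The SimRank equations themselves did not survive in this transcript. In the standard formulation by Jeh and Widom, s(a, a) = 1, s(a, b) = 0 when either node has no in-neighbors, and otherwise s(a, b) is a decay constant C times the average similarity over all pairs of in-neighbors of a and b. The sketch below iterates that recurrence; the function name, the dictionary input format, the default C = 0.8 and the fixed iteration count are assumptions made for illustration.

```python
def simrank(in_neighbors, C=0.8, iterations=10):
    """Iterate the SimRank recurrence.
    in_neighbors maps each node to the list of nodes that link to it."""
    nodes = list(in_neighbors)
    # Start from s(a, a) = 1 and s(a, b) = 0 for a != b.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                elif not in_neighbors[a] or not in_neighbors[b]:
                    new[a][b] = 0.0
                else:
                    total = sum(sim[x][y]
                                for x in in_neighbors[a]
                                for y in in_neighbors[b])
                    new[a][b] = C * total / (len(in_neighbors[a]) * len(in_neighbors[b]))
        sim = new
    return sim
```

For a graph where a single node u links to both a and b, the two targets share their only in-neighbor, so s(a, b) = C.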

Tree Edit Distance


Applications
• Find the most frequent tree structure in a phylogenetic tree.
• Match a query subtree with a set of XML documents.

Ranking Nodes: PageRank
PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where
• PR(A) is the PageRank of page A,
• PR(Ti) is the PageRank of the pages Ti that link to page A,
• C(Ti) is the number of outbound links on page Ti,
• d is a damping factor that can be set between 0 and 1.

Example
With damping factor d = 0.5:
PR(A) = 0.5 + 0.5 PR(C)
PR(B) = 0.5 + 0.5 (PR(A) / 2)
PR(C) = 0.5 + 0.5 (PR(A) / 2 + PR(B))
These equations can easily be solved. We get the following PageRank values for the single pages:
PR(A) = 14/13 ≈ 1.0769
PR(B) = 10/13 ≈ 0.7692
PR(C) = 15/13 ≈ 1.1538
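The same values can also be reached by repeatedly applying the PageRank formula until the scores stop changing. The sketch below assumes d = 0.5 and the link structure implied by the three equations (A links to B and C, B links to C, C links to A); the function name and the fixed iteration count are illustrative choices.

```python
def pagerank(out_links, d=0.5, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.
    out_links maps each page to the list of pages it links to."""
    pr = {p: 1.0 for p in out_links}           # arbitrary starting scores
    for _ in range(iterations):
        new = {p: 1.0 - d for p in out_links}
        for page, targets in out_links.items():
            for t in targets:
                # each outbound link passes on an equal share of the page's score
                new[t] += d * pr[page] / len(targets)
        pr = new
    return pr

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
```

After the iterations, `ranks` agrees with 14/13, 10/13 and 15/13 to within floating-point tolerance.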

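Matlab Script
Matlab script for the example in the previous slide, with damping factor d = 0.5:

% x, y, z stand for PR(A), PR(B), PR(C)
syms x y z;
eqn1 = x == 0.5 + 0.5*z
eqn2 = y == 0.5 + 0.25*x
eqn3 = z == 0.5 + 0.25*x + 0.5*y
% convert the linear system to matrix form A*X = B and solve it
[A,B] = equationsToMatrix([eqn1, eqn2, eqn3], [x, y, z])
X = linsolve(A,B)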

HITS: Hyperlink-Induced Topic Search