Using Graph-based Metrics with Empirical Risk Minimization to Speed Up Active Learning on Networked Data
Sofus A. Macskassy, Fetch Technologies (sofmac@fetch.com)
Context: Types of learning with labeled data
- Supervised learning is given a fully labeled training set.
- Semi-supervised (or transductive) learning is given a partially labeled data set.
- Both learning methodologies seek to induce a model that predicts labels for unlabeled instances.
- Active learning seeks to help the learner induce the best model with the fewest labeled instances: it picks the next training instance that will (probably) yield the biggest boost in performance.
Motivation
- One well-known and popular active learning strategy iterates over all instances and computes the likely boost in performance if one were to label that particular instance. This is known as empirical risk minimization (ERM).
- Problem: ERM is very costly (it induces and evaluates a new classifier for each class of each unlabeled instance).
- Proposed solution: identify a small set of candidate instances on which to compute ERM.
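To make the cost concrete, here is a minimal Python sketch of the ERM selection loop. The classifier interface (train_model, predict_proba) and the particular risk estimate are illustrative assumptions, not the paper's exact formulation.

```python
def expected_risk(model, unlabeled):
    # Estimated risk: summed uncertainty (1 - max class probability) over the
    # still-unlabeled instances; one common choice, not necessarily the exact
    # risk estimate used in the paper.
    return sum(1.0 - max(model.predict_proba(x)) for x in unlabeled)

def erm_pick(train_model, labeled, unlabeled, classes):
    # Full ERM: for every unlabeled instance x and every possible class c,
    # retrain a model with (x, c) added and measure the resulting risk.
    current = train_model(labeled)
    best_x, best_risk = None, float("inf")
    for x in unlabeled:
        probs = current.predict_proba(x)              # assumed to align with `classes`
        rest = [u for u in unlabeled if u is not x]
        risk = sum(p_c * expected_risk(train_model(labeled + [(x, c)]), rest)
                   for c, p_c in zip(classes, probs))
        if risk < best_risk:
            best_x, best_risk = x, risk
    return best_x   # the instance whose labeling minimizes expected risk
```

The nested loops over instances and classes, each requiring a retrained model, are exactly what makes full ERM so slow.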
Key Observation from Prior Work
- Prior work pairing a graph-based semi-supervised learning method with active learning observed that ERM tended to pick instances at the center of clusters.
- Can we leverage this observation?
Can we improve the running time of ERM?
- Idea: keep ERM, but limit the ERM computation to the "best" candidates rather than all instances.
- How?
  - Use clustering and pick among the most central instances in each cluster? (Closest in spirit to the prior key observation.)
  - Pick from the top-K most uncertain instances? (Prior work on uncertainty sampling.)
  - Use graph-based metrics to identify central instances and pick among the most central? (Global, and possibly a more consistent metric.)
Selecting Best Candidates (1): Uncertainty labeling
- Use the current model to identify the unlabeled instances it is most uncertain of.
- Uncertainty of vertex v: computed from the current model's class-probability estimates for v (exact formula in the paper).
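The slide's formula was not preserved in this transcript; as a stand-in, here is a minimal sketch of one common uncertainty score, the margin between the two most probable classes, using the same hypothetical predict_proba interface as above.

```python
def uncertainty(model, v):
    # Margin-based uncertainty: a small gap between the two most probable
    # classes means the model is unsure about v.  This is an illustrative
    # choice; the paper defines its own uncertainty measure.
    probs = sorted(model.predict_proba(v), reverse=True)
    return 1.0 - (probs[0] - probs[1])

def most_uncertain(model, unlabeled, k=10):
    # Candidate set: the k unlabeled vertices the current model is least sure about.
    return sorted(unlabeled, key=lambda v: uncertainty(model, v), reverse=True)[:k]
```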
Selecting Best Candidates (2): Highest betweenness
- Betweenness centrality: which instance has the most information flow?
  betweenness(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st
  where σ_st = number of shortest paths between s and t, and σ_st(v) = number of those paths that go through v.
- Note: need to compute all shortest paths; this can be done efficiently in O(nE) ≈ O(n²) for sparse graphs.
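A minimal sketch of selecting the top-K most-between unlabeled vertices as ERM candidates, using networkx's implementation of Brandes' algorithm; the top-K selection and the parameter k are illustrative assumptions.

```python
import networkx as nx

def top_betweenness_candidates(G, unlabeled, k=10):
    # Brandes' algorithm computes all shortest-path counts in O(nE),
    # roughly O(n^2) on sparse graphs.
    scores = nx.betweenness_centrality(G)
    return sorted(unlabeled, key=lambda v: scores[v], reverse=True)[:k]
```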
Selecting Best Candidates (3): Highest closeness
- Closeness centrality: which instance is "closest" to all other instances?
- Note: need to compute all pairwise distances; this can be done efficiently in O(nE) ≈ O(n²) for sparse graphs.
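An analogous sketch for closeness, here computed directly from per-vertex BFS distances to emphasize the all-pairwise-distances cost; the normalization and top-K selection are illustrative choices.

```python
import networkx as nx

def closeness(G, v):
    # One BFS gives shortest-path distances from v to every reachable vertex;
    # doing this for every vertex yields all pairwise distances in O(nE) overall.
    dist = nx.single_source_shortest_path_length(G, v)
    total = sum(d for u, d in dist.items() if u != v)
    return (len(dist) - 1) / total if total > 0 else 0.0

def top_closeness_candidates(G, unlabeled, k=10):
    return sorted(unlabeled, key=lambda v: closeness(G, v), reverse=True)[:k]
```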
Selecting Best Candidates (4): Highest cluster closeness
- Central nodes in a cluster:
  - Cluster the graph.
  - Choose the most central instances in each cluster, where centrality is computed over the vertices in the cluster that v belongs to.
- Clustering details in the paper.
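A sketch of the cluster-closeness idea, with networkx's greedy modularity communities as a stand-in for the paper's own clustering (the actual clustering details are in the paper): compute closeness inside each cluster's subgraph and keep the most central unlabeled vertices.

```python
import networkx as nx
from networkx.algorithms import community

def cluster_closeness_candidates(G, unlabeled, per_cluster=3):
    # Modularity-based clustering is an assumed stand-in for the
    # clustering procedure described in the paper.
    candidates = []
    for cluster in community.greedy_modularity_communities(G):
        sub = G.subgraph(cluster)                 # closeness restricted to v's own cluster
        scores = nx.closeness_centrality(sub)
        central = sorted((v for v in cluster if v in unlabeled),
                         key=lambda v: scores[v], reverse=True)
        candidates.extend(central[:per_cluster])
    return candidates
```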
Real-world data is not "clean" (from the CoRA data)
[Figure: CoRA citation graph with all edges; node colors denote topics: probabilistic methods (yellow), theory (green), genetic algorithms (red), rule learning (blue), neural networks (pink), reinforcement learning (white), case-based (orange).]
Empirical Study: Which method is best?
- Compare strategies (uncertainty, betweenness) to:
  - Full ERM (the current optimum)
  - Random sampling (baseline)
- Metrics: accuracy and time to run.
- Methodology:
  - Initialize: randomly pick 1 instance per class.
  - Iteratively pick the next instance using each methodology and record accuracy on the remaining instances, until 100 instances have been picked.
  - Repeat 10 times and record the average accuracy.
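As a rough illustration, the evaluation loop described above might look like the following; train_model, accuracy, and the pick_next strategy interface are hypothetical placeholders.

```python
import random

def evaluate_strategy(pick_next, instances, labels, classes, budget=100, runs=10):
    # Average accuracy curve over `runs` repetitions, following the methodology above.
    curve = [0.0] * budget
    for _ in range(runs):
        # Initialize with one randomly chosen instance per class.
        labeled = [random.choice([x for x in instances if labels[x] == c]) for c in classes]
        unlabeled = [x for x in instances if x not in labeled]
        for step in range(budget):
            model = train_model([(x, labels[x]) for x in labeled])   # hypothetical learner
            curve[step] += accuracy(model, unlabeled, labels)        # accuracy on remaining instances
            x = pick_next(model, labeled, unlabeled)                 # strategy under evaluation
            labeled.append(x)
            unlabeled.remove(x)
    return [c / runs for c in curve]
```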
11 Benchmark Data Sets
- WebKB [Craven 1998] (8 data sets)
  - 4 computer science websites (sizes: )
  - Each graph had a 6-class problem and a 2-class problem.
- Industry classification [Bernstein et al. 2003] (2 data sets)
  - 2 sources (prnews, Yahoo!); sizes: 1798/218
  - Network = companies that co-occur in financial news stories.
  - 12-class problem.
- CoRA [McCallum et al. 2000] (1 data set)
  - 4240 academic papers.
  - Network = citations.
  - 6-class problem.
ERM vs. betweenness vs. uncertainty [results chart]
Combine new strategies?
- None of the new strategies worked very well by themselves.
- However, the top ERM pick was often at the top of at least one strategy's picks…
- New hybrid approach (sketched below):
  - Pick the top-K instances from each strategy (uncertainty, cluster closeness, betweenness).
  - Pick the instance with the highest ERM score among them.
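A sketch of this hybrid, reusing the candidate-selection and ERM functions sketched on the earlier slides; the value of k and the exact way the strategies are combined are illustrative assumptions.

```python
def hybrid_pick(train_model, model, G, labeled, unlabeled, classes, k=10):
    # Cheap candidate generation: top-k picks from each strategy sketched earlier.
    candidates = set(most_uncertain(model, unlabeled, k))
    candidates |= set(top_betweenness_candidates(G, unlabeled, k))
    candidates |= set(cluster_closeness_candidates(G, unlabeled, per_cluster=k))
    # Expensive ERM now runs only on this small candidate set; return the
    # candidate that ERM judges best (lowest expected risk in this sketch).
    return erm_pick(train_model, labeled, list(candidates), classes)
```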
ERM vs. hybrid [results chart]
Which strategies were used in the hybrid?

Dataset              Cluster Closeness   Uncertainty Sampling   Betweenness Centrality   Number of ties
cora                 11.00%              12.30%                 76.90%                     1
industry-pr           9.00%              36.80%                 54.40%                     2
industry-yh           1.90%              30.40%                 68.90%                    12
cornell-binary       12.60%              62.30%                 36.90%                   118
cornell-multi         3.50%              75.50%                 29.90%                    89
texas-binary          9.80%              40.10%                 59.20%                    91
texas-multi          16.30%              68.50%                 28.50%                   113
washington-binary    20.60%              72.50%                 28.00%                   211
washington-multi     25.70%              72.00%                 25.10%                   228
wisconsin-binary      8.80%              62.00%                 27.40%                   282
wisconsin-multi      24.00%              69.80%                 25.40%                   192

(cora, industry-pr, and industry-yh are the "larger" graphs.)
Conclusions
- Empirical risk minimization is a strong active learning strategy, but it is too slow.
- We have shown that we can efficiently identify a small set of candidate nodes that contains the (close to) best instance as defined by ERM.
- If the data is relational, we found that graph metrics such as clustering, closeness, and betweenness can identify a good candidate set.
- This performs comparably to full-blown ERM but runs an order of magnitude faster.
- There is potential for even greater speedups on larger graphs.
Future Work
- Using graph metrics to identify a candidate set seems to have a lot of potential, but it needs more work.
- We need a better understanding of metric behavior and how it relates to ERM. For example, why did cluster closeness not do so well in practice?
- How do we incorporate labeled information into the metrics?
- More work on network metrics is needed.
Thank you
Sofus Macskassy