1
08/22/2004 MRDM 2004 Workshop 1 Link Mining Lise Getoor University of Maryland, College Park joint work with Indrajit Bhattacharya, Qing Lu and Prithviraj Sen
2
08/22/2004 MRDM 2004 Workshop 2 Roadmap Intro to Link Mining –Link Mining Tasks –Link Mining Challenges Some Current Projects –Link-based Classification Link-based classification using a variety of link descriptions Link-based classification using labeled and unlabeled data –Link-based Clustering Entity detection Group Detection Conclusion
3
08/22/2004 MRDM 2004 Workshop 3 Link Mining Traditional machine learning and data mining approaches assume: –A random sample of homogeneous objects from a single relation Real world data sets: –Multi-relational, heterogeneous and semi-structured Link Mining –a newly emerging research area at the intersection of research in social network and link analysis, hypertext and web mining, graph mining, relational learning and inductive logic programming
4
08/22/2004 MRDM 2004 Workshop 4 Linked Data Heterogeneous, multi-relational data represented as a graph or network –Nodes are objects May have different kinds of objects Objects have attributes Objects may have labels or classes –Edges are links May have different kinds of links Links may have attributes Links may be directed, are not required to be binary
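A minimal sketch of how linked data of this kind might be held in memory. The class and field names (Node, Link, LinkedData) are illustrative assumptions, not something the talk defines. Each link stores a tuple of members, so links are not restricted to binary relations.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    node_id: str
    kind: str                                   # object type, e.g. "paper" or "author"
    attributes: dict = field(default_factory=dict)
    label: Optional[str] = None                 # class label, if known


@dataclass
class Link:
    kind: str                                   # link type, e.g. "cites"
    members: tuple                              # node ids; more than two gives a non-binary link
    directed: bool = False
    attributes: dict = field(default_factory=dict)


@dataclass
class LinkedData:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    links: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_link(self, link: Link) -> None:
        missing = [m for m in link.members if m not in self.nodes]
        if missing:
            raise KeyError(f"unknown node ids: {missing}")
        self.links.append(link)

    def links_of(self, node_id: str) -> list:
        """All links in which the given node participates."""
        return [l for l in self.links if node_id in l.members]


# Hypothetical toy data: two papers and one author.
data = LinkedData()
data.add_node(Node("p1", "paper", {"title": "Paper one"}, label="theory"))
data.add_node(Node("p2", "paper", {"title": "Paper two"}))
data.add_node(Node("a1", "author", {"name": "A. Author"}))
data.add_link(Link("cites", ("p1", "p2"), directed=True))
data.add_link(Link("wrote", ("a1", "p1", "p2")))   # a non-binary link
```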
5
08/22/2004 MRDM 2004 Workshop 5 Sample Domains web data (web) bibliographic data (cite) epidemiological data (epi) communication data (comm) customer networks (cust) collaborative filtering problems (cf) trust networks (trust) biological data (bio)
6
08/22/2004 MRDM 2004 Workshop 6 Link Mining Tasks Link-based Object Classification Object Type Prediction Link Type Prediction Predicting Link Existence Link Cardinality Estimation Object Consolidation Group Detection Subgraph Discovery Metadata Mining
7
08/22/2004 MRDM 2004 Workshop 7 Link-based Object Classification Predicting the category of an object based on its attributes and its links and attributes of linked objects web: Predict the category of a web page, based on words that occur on the page, links between pages, anchor text, html tags, etc. cite: Predict the topic of a paper, based on word occurrence, citations, co-citations epi: Predict disease type based on characteristics of the patients infected by the disease
8
08/22/2004 MRDM 2004 Workshop 8 Object Class Prediction Predicting the type of an object based on its attributes and its links and attributes of linked objects comm: Predict whether a communication contact is by email, phone call or mail. cite: Predict the venue type of a publication (conference, journal, workshop)
9
08/22/2004 MRDM 2004 Workshop 9 Link Type Classification Predicting type or purpose of link based on properties of the participating objects web: predict advertising link or navigational link; predict an advisor-advisee relationship epi: predicting whether contact is familial, co-worker or acquaintance
10
08/22/2004 MRDM 2004 Workshop 10 Predicting Link Existence Predicting whether a link exists between two objects web: predict whether there will be a link between two pages cite: predicting whether a paper will cite another paper epi: predicting who a patient’s contacts are
11
08/22/2004 MRDM 2004 Workshop 11 Link Cardinality Estimation I Predicting the number of links to an object web: predict the authoritativeness of a page based on the number of in-links; identifying hubs based on the number of out-links cite: predicting the impact of a paper based on the number of citations epi: predicting the number of people that will be infected based on the infectiousness of a disease.
12
08/22/2004 MRDM 2004 Workshop 12 Link Cardinality Estimation II Predicting the number of objects reached along a path from an object Important for estimating the number of objects that will be returned by a query web: predicting number of pages retrieved by crawling a site cite: predicting the number of citations of a particular author in a specific journal
13
08/22/2004 MRDM 2004 Workshop 13 Object Consolidation Predicting when two objects are the same, based on their attributes and their links aka: record linkage, duplicate elimination, identity uncertainty web: predict when two sites are mirrors of each other. cite: predicting when two citations are referring to the same paper. epi: predicting when two disease strains are the same bio: learning when two names refer to the same protein
14
08/22/2004 MRDM 2004 Workshop 14 Group Detection Predicting when a set of entities belong to the same group based on clustering both object attribute values and link structure web – identifying communities cite – identifying research communities
15
08/22/2004 MRDM 2004 Workshop 15 Subgraph Identification Find characteristic subgraphs Focus of graph-based data mining (Cook & Holder, Inokuchi, Washio & Motoda, Kuramochi & Karypis, Yan & Han) bio – protein structure discovery comm – legitimate vs. illegitimate groups chem – chemical substructure discovery
16
08/22/2004 MRDM 2004 Workshop 16 Metadata Mining Schema mapping, schema discovery, schema reformulation cite – matching between two bibliographic sources web – discovering schema from unstructured or semi-structured data bio – mapping between two medical ontologies
17
08/22/2004 MRDM 2004 Workshop 17 Link Mining Tasks Link-based Object Classification Object Type Prediction Link Type Prediction Predicting Link Existence Link Cardinality Estimation Object Consolidation Group Detection Subgraph Discovery Metadata Mining
18
08/22/2004 MRDM 2004 Workshop 18 Link Mining Challenges Logical vs. Statistical dependencies Feature construction Instances vs. Classes Collective Classification Collective Consolidation Effective Use of Labeled & Unlabeled Data Link Prediction Closed vs. Open World Challenges common to any link-based statistical model (Bayesian Logic Programs, Conditional Random Fields, Probabilistic Relational Models, Relational Markov Networks, Relational Probability Trees, Stochastic Logic Programming to name a few)
19
08/22/2004 MRDM 2004 Workshop 19 Logical vs. Statistical Dependence Coherently handling two types of dependence structures: –Link structure - the logical relationships between objects –Probabilistic dependence - statistical relationships between attributes Challenge: statistical models that support rich logical relationships Model search complicated by the fact that attributes can depend on arbitrarily linked attributes -- issue: how to search this huge space
20
08/22/2004 MRDM 2004 Workshop 20 Model Search [Figure: graph over nodes P, P1, P2, P3, A1 and I1, with an unknown value marked "?"]
21
08/22/2004 MRDM 2004 Workshop 21 Feature Construction In many cases, objects are linked to a set of objects. To construct a single feature from this set of objects, we may either use: –Aggregation –Selection
22
08/22/2004 MRDM 2004 Workshop 22 Aggregation [Figure: nodes P1 to P9, A1, A2, I1 and I2; the set of objects linked to each target is summarized by its mode to predict the unknown value "?"]
23
08/22/2004 MRDM 2004 Workshop 23 Selection [Figure: nodes P1, P2, P3, A1 and I1; a subset of the linked objects is selected to predict the unknown value "?"]
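The two feature-construction options can be sketched as follows. This is an illustration under stated assumptions: the paper records, the "cites" score used for selection, and the function names are hypothetical, and the slides do not say which selection criterion was used.

```python
from collections import Counter


def aggregate_mode(values):
    """Aggregation: summarize the whole linked set by its most frequent value."""
    if not values:
        return None
    counts = Counter(values)
    best = max(counts.values())
    # Break ties deterministically by taking the smallest value.
    return min(v for v, c in counts.items() if c == best)


def select_one(linked, key):
    """Selection: pick a single linked object (highest score under `key`)."""
    if not linked:
        return None
    return max(linked, key=key)


# Hypothetical set of papers linked to the object being described.
cited = [
    {"id": "P1", "topic": "theory", "cites": 3},
    {"id": "P2", "topic": "theory", "cites": 1},
    {"id": "P3", "topic": "ai", "cites": 10},
]

mode_feature = aggregate_mode([p["topic"] for p in cited])
selected_feature = select_one(cited, key=lambda p: p["cites"])["topic"]
```

Here aggregation yields the majority topic of the linked set, while selection yields the topic of the single most-cited paper, so the two features can disagree.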
24
08/22/2004 MRDM 2004 Workshop 24 Individuals vs. Classes Does model refer –explicitly to individuals –classes or generic categories of individuals On one hand, we’d like to be able to model that a connection to a particular individual may be highly predictive On the other hand, we’d like our models to generalize to new situations, with different individuals
25
08/22/2004 MRDM 2004 Workshop 25 Instance-based Dependencies [Figure: nodes A1, P3 and I1] Papers that cite P3 are likely to be [category shown graphically on the slide]
26
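08/22/2004 MRDM 2004 Workshop 26 Class-based Dependencies [Figure: nodes A1 and I1 with an unknown value "?"] Papers that cite [class shown graphically on the slide] are likely to be [class shown graphically on the slide]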
27
08/22/2004 MRDM 2004 Workshop 27 Collective classification Using a link-based statistical model for classification Inference using learned model is complicated by the fact that there is correlation between the object labels
28
08/22/2004 MRDM 2004 Workshop 28 Collective consolidation Using a link-based statistical model for object consolidation Consolidation decisions should not be made independently
29
08/22/2004 MRDM 2004 Workshop 29 Labeled & Unlabeled Data In link-based domains, unlabeled data provide three sources of information: –Helps us infer object attribute distribution –Links between unlabeled data allow us to make use of attributes of linked objects –Links between labeled data and unlabeled data (training data and test data) help us make more accurate inferences
30
08/22/2004 MRDM 2004 Workshop 30 Link Prior Probability The prior probability of any particular link is typically extraordinarily low For medium-sized data sets, we have had success with building explicit models of link existence It may be more effective to model links at higher level--required for large data sets!
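A small worked example of why the prior is so low, using the CiteSeer counts reported on the Data Sets slide of this deck (3,600 papers, 7,522 citations). The function name and the choice to count ordered pairs for directed links are assumptions made for illustration.

```python
def link_prior(num_objects: int, num_links: int, directed: bool = True) -> float:
    """Fraction of all possible object pairs that are actually linked."""
    pairs = num_objects * (num_objects - 1)   # ordered pairs, no self-links
    if not directed:
        pairs //= 2                           # unordered pairs
    return num_links / pairs


# 3600 * 3599 = 12,956,400 candidate directed links, of which 7,522 exist.
prior = link_prior(3600, 7522)
```

The result is below one in a thousand, so a model that scores every candidate pair independently faces a heavily imbalanced prediction problem.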
31
08/22/2004 MRDM 2004 Workshop 31 Closed World vs. Open World The majority of SRL approaches make a closed world assumption, which assumes that we know all the potential entities in the domain In many cases, this is unrealistic Work by Milch, Marthi, Russell on BLOG
32
08/22/2004 MRDM 2004 Workshop 32 Link Mining Summary Link Mining Tasks –Link-based Object Classification –Object Type Prediction –Link Type Prediction –Predicting Link Existence –Link Cardinality Estimation –Object Consolidation –Group Detection –Subgraph Discovery –Metadata Mining Link Mining Challenges –Logical vs. Statistical dependencies –Feature construction –Instances vs. Classes –Collective Classification –Collective Consolidation –Effective Use of Labeled & Unlabeled Data –Link Prediction –Closed vs. Open World
33
08/22/2004 MRDM 2004 Workshop 33 Roadmap Intro to Link Mining –Link Mining Tasks –Link Mining Challenges Some Current Projects Link-based Classification work with Qing Lu and Prithviraj Sen Link-based classification using a variety of link descriptions Link-based classification using labeled and unlabeled data –Link-based Clustering Entity detection Group Detection Conclusion
34
08/22/2004 MRDM 2004 Workshop 34 Object Classification Traditional Object Classification –Assume objects sampled from a single relation –Object Attributes (OA) [Figure: objects X1 to X5]
35
08/22/2004 MRDM 2004 Workshop 35 Object Classification with Linked Data Traditional Object Classification –Assume objects sampled from a single relation –Object Attributes (OA) Linked Data –Links among objects –Represented as a graph [Figure: objects X1 to X5]
36
08/22/2004 MRDM 2004 Workshop 36 Link-based Object Classification Predicting the category of an object based on its attributes and its links and attributes of linked objects Citation domain: Predict the topic of a paper, based on word occurrence, citations and co-citations
37
08/22/2004 MRDM 2004 Workshop 37 Related Work: Link-based Classification Hypertext Classification using Links –Class labels of linked objects, Soumen Chakrabarti (1998), Oh et al. (1999) –Unique document ID, Popescul et al. (2002) –Regularities, Yang et al. (2002) Use of Unlabeled Data –Co-training, Blum and Mitchell (1998) –EM algorithm, Nigam et al. (2000) –Systematic investigation of EM and Co-training, Ghani (2001) –TSVM, Joachims (1999)
38
08/22/2004 MRDM 2004 Workshop 38 Our Approach Link-based models –Integrate link features with object attributes using logistic regression –Investigate use of labeled and unlabeled data for link-based classification
39
08/22/2004 MRDM 2004 Workshop 39 Features Object Attributes –Notation: OA(X) Link Descriptions –Notation: LD(X) –Statistics computed from linked objects –Computed separately for each of: In-Links(X) Out-Links(X) Co-In(X) Co-Out(X) –Three types of Link Descriptions: Mode, Binary, Count
40
08/22/2004 MRDM 2004 Workshop 40 Link Descriptions [Figure: object X and the categories of its linked objects]
41
08/22/2004 MRDM 2004 Workshop 41 Link Descriptions [Figure build: adds In-Links(X) with its mode]
42
08/22/2004 MRDM 2004 Workshop 42 Link Descriptions [Figure build: adds Out-Links(X) with its mode]
43
08/22/2004 MRDM 2004 Workshop 43 Link Descriptions [Figure build: adds CO(X) with its mode]
44
08/22/2004 MRDM 2004 Workshop 44 Link Descriptions [Figure build: adds CI(X) with its mode]
45
08/22/2004 MRDM 2004 Workshop 45 Link Descriptions [Figure build: adds binary vectors (1,1,1), (1,1,0) and (1,0,0) to the neighborhoods]
46
08/22/2004 MRDM 2004 Workshop 46 Link Descriptions [Figure build: adds count vectors (1,2,0), (2,1,0), (2,0,0) and (3,1,1) to the neighborhoods; each of In-Links(X), Out-Links(X), CI(X) and CO(X) is now described by a mode, a binary vector and a count vector]
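A sketch of how the three link descriptions could be computed for one neighborhood. The function names and the edge-list representation are assumptions. The definitions of the co-link sets are also an assumption: here Co-In(X) is taken to be the objects that share an in-linking object with X, and Co-Out(X) the objects that link to something X also links to.

```python
def neighborhoods(edges, x):
    """Return the four neighborhoods of x from a list of (source, target) edges."""
    in_links = {s for s, t in edges if t == x}
    out_links = {t for s, t in edges if s == x}
    # Objects linked from something that also links to x.
    co_in = {t for s, t in edges if s in in_links and t != x}
    # Objects linking to something that x also links to.
    co_out = {s for s, t in edges if t in out_links and s != x}
    return {"in": in_links, "out": out_links, "co_in": co_in, "co_out": co_out}


def link_descriptions(neighbor_labels, categories):
    """Mode, binary and count descriptions of one neighborhood's category labels."""
    counts = [sum(1 for lab in neighbor_labels if lab == c) for c in categories]
    binary = [1 if n > 0 else 0 for n in counts]
    # On ties, the first category in `categories` wins.
    mode = categories[counts.index(max(counts))] if neighbor_labels else None
    return {"mode": mode, "binary": binary, "count": counts}


# Hypothetical neighborhood with three categories a, b, c.
desc = link_descriptions(["a", "a", "a", "b", "c"], ["a", "b", "c"])
hoods = neighborhoods([("A", "X"), ("A", "B"), ("X", "C"), ("D", "C")], "X")
```

The count vector carries the most information; binary drops the frequencies and mode keeps only the single most frequent category.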
47
08/22/2004 MRDM 2004 Workshop 47 Predictive Model for Classification A structured logistic regression –Compute P(c | OA(X)) and P(c | LD(X)) separately using separate logistic regression models –where OA(X) are the object attributes and LD_f(X) are the link features
48
08/22/2004 MRDM 2004 Workshop 48 Prediction Step 1: Bootstrap using object attributes only [Figure: papers P1 to P5 and the category set]
49
08/22/2004 MRDM 2004 Workshop 49 Prediction Step 2: Iteratively update the category of each object, based on the linked objects' categories [Figure: papers P1 to P5, with P4 being updated]
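The two prediction steps can be sketched as a simple iterative classification loop. This is not the talk's model: the attribute probabilities P(c | OA(X)) are passed in as given numbers rather than produced by logistic regression, the link part is replaced by a smoothed frequency of the neighbors' current labels, and combining the two by a product is an assumption made for the illustration.

```python
def iterative_classification(attr_probs, neighbors, categories, max_iters=10):
    """attr_probs: {object: {category: P(c | attributes)}}
    neighbors:  {object: [linked objects]}"""
    # Step 1: bootstrap each label from the object attributes only.
    labels = {x: max(categories, key=lambda c: attr_probs[x][c]) for x in attr_probs}

    # Step 2: repeatedly re-label each object using its neighbors' current labels.
    for _ in range(max_iters):
        changed = False
        for x in sorted(attr_probs):
            counts = {c: 1.0 for c in categories}          # add-one smoothing
            for n in neighbors.get(x, []):
                counts[labels[n]] += 1.0
            total = sum(counts.values())
            score = {c: attr_probs[x][c] * counts[c] / total for c in categories}
            new_label = max(categories, key=score.get)
            if new_label != labels[x]:
                labels[x] = new_label
                changed = True
        if not changed:                                    # labels are stable
            break
    return labels


# Hypothetical example: P3 is weakly "db" on attributes but cites two "ai" papers.
categories = ["ai", "db"]
attr_probs = {
    "P1": {"ai": 0.9, "db": 0.1},
    "P2": {"ai": 0.8, "db": 0.2},
    "P3": {"ai": 0.45, "db": 0.55},
}
neighbors = {"P1": ["P3"], "P2": ["P3"], "P3": ["P1", "P2"]}
labels = iterative_classification(attr_probs, neighbors, categories)
```

In this toy run the bootstrap step labels P3 as "db", and the first update pass flips it to "ai" because both of its neighbors carry that label.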
50
08/22/2004 MRDM 2004 Workshop 50 Data Sets
Data Set   papers  citations  categories  vocabulary
CoraI      3181    6185       7           1400
CoraII     3300    11794      10          3174
CiteSeer   3600    7522       6           3000
51
08/22/2004 MRDM 2004 Workshop 51 Experiment I
52
08/22/2004 MRDM 2004 Workshop 52 Experiment II
53
08/22/2004 MRDM 2004 Workshop 53 Experiment III Setup –20% data as test data –remaining data: 20%, 40%, 60%, 80% labeled data Link-based classification using labeled and unlabeled data –Labeled-only: learn model using only labeled data –Labeled and Unlabeled: learn model using both labeled and unlabeled data
54
08/22/2004 MRDM 2004 Workshop 54 Learning with Labeled and Unlabeled Data
55
08/22/2004 MRDM 2004 Workshop 55 Ordering Strategies
56
08/22/2004 MRDM 2004 Workshop 56 LBC: Summary Variety of ways of describing link neighborhoods –Mode, Binary, Count –In-links, Out-links, CI-links and CO-links In link-based classification, unlabeled data provide useful information: –Helps us infer object attribute distribution –Links between unlabeled data allow us to make use of attributes of linked objects –Links between labeled data and unlabeled data (training data and test data) help us make more accurate inferences Link-based Challenges addressed: –Feature construction –Collective classification –Use of labeled and unlabeled data
57
08/22/2004 MRDM 2004 Workshop 57 Roadmap Intro to Link Mining –Link Mining Tasks –Link Mining Challenges Some Current Projects –Link-based Classification Link-based classification using a variety of link descriptions Link-based classification using labeled and unlabeled data Link-based Clustering work with Indrajit Bhattacharya Entity detection Group Detection Conclusion
58
08/22/2004 MRDM 2004 Workshop 58 Deduplication and Group Detection Object Consolidation –Observations come with noise or multiple representations Multiple entries for the same person in a customer database Group Detection –Identify groups of similar entities Group authors by research interest
59
08/22/2004 MRDM 2004 Workshop 59 Terminology Entities: Alfred V Aho References: Alfred Aho; AV Aho; Aho, A. V. Links: (Alfred Aho, John Hopcroft, Jeffrey Ullman); (AV Aho, BW Kernighan, PJ Weinberger) Entity Groups: G1 (Programming Languages); G2 (Databases); G3 (Algorithms)
60
08/22/2004 MRDM 2004 Workshop 60 Deduplication and Group Detection The two problems need to be addressed together –Goldberg and Senator, KDD 95 [Figure: diagram with the labels DB, Consolidation & Link Formation, DB´, KDD Tools and Knowledge]
61
08/22/2004 MRDM 2004 Workshop 61 Related Work: Deduplication Statistics –Blocking, Newcombe –“Match/non-match”, Fellegi & Sunter –EM with match variable, Winkler AI, Machine Learning –String similarity measures, Monge & Elkan; Cohen –Object Consolidation, Tejada et al. –Learning string distances, Bilenko & Mooney –Active learning, Sarawagi et al. –Coreference resolution, McCallum & Wellner –Identity uncertainty, Pasula et al. Databases –Efficient record linkage, Hernandez & Stolfo, Monge & Elkan –Use of co-occurrence, Chaudhuri et al., Ananthakrishna et al.
62
08/22/2004 MRDM 2004 Workshop 62 Related Work: Group Detection Hypertext Mining –Eigen decomposition for ranking, Brin & Page; Kleinberg –Finding web communities, Gibson et al –“Missing link”, Cohn & Hofmann Probabilistic Link Modeling –Generative model for links, Getoor et al.; Kubica et al Text Retrieval –Spectral techniques, Ng, Jordan & Weiss; Dhillon et al; Ding et al –Probabilistic modeling with latent variables, Hofmann; Blei, Ng & Jordan; Rosen-Zvi et al
63
08/22/2004 MRDM 2004 Workshop 63 Paper Resolution Problem Example –R. Agrawal, R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB-94, 1994. –Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. In Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, September 1994. Traditionally, string similarity
64
08/22/2004 MRDM 2004 Workshop 64 Author Resolution Problem Given a set of papers, determine the set of authors First and middle names vary –Check common name transforms Problems remain –How about ‘J. Smith’ and ‘John Smith’? –Do two instances of ‘J. Smith’ refer to the same author? Use co-author relationships
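A rough sketch of a name-transform check of the kind mentioned above. The rule used here (same last name, and each given-name token either equal or an initial of the other) is an assumption for illustration, as is the "First Middle Last" ordering; fused initials such as "AV Aho" or inverted forms such as "Aho, A. V." are not handled.

```python
def name_tokens(name):
    """Lower-cased name tokens with trailing periods removed."""
    return [t.strip(".").lower() for t in name.replace(",", " ").split() if t.strip(".")]


def compatible(a, b):
    """True if two references could plausibly name the same author."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb or ta[-1] != tb[-1]:       # last names must match
        return False
    for x, y in zip(ta[:-1], tb[:-1]):             # compare given names pairwise
        same = x == y
        initial = (len(x) == 1 and y.startswith(x)) or (len(y) == 1 and x.startswith(y))
        if not (same or initial):
            return False
    return True


pairs = {
    ("J. Smith", "John Smith"): compatible("J. Smith", "John Smith"),
    ("J. Smith", "Mary Smith"): compatible("J. Smith", "Mary Smith"),
    ("Alfred V Aho", "A V Aho"): compatible("Alfred V Aho", "A V Aho"),
}
```

Compatibility only says that two references might match: 'J. Smith' is compatible with both 'John Smith' and 'Jane Smith', which is why the slide turns to co-author relationships for the actual decision.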
65
08/22/2004 MRDM 2004 Workshop 65 Author Deduplication: Example Author name variants: Alfred V Aho; A V Aho; Jeffrey D Ullman; J D Ullman; S C Johnson Papers: P1: Code generation for machines with multiregister operations; P2: The universality of database languages; P3: Optimal partial-match retrieval when fields are independently specified; P4: Code generation for expressions with common subexpressions Author references before deduplication: Aho P1; Aho P2; Aho P3; Aho P4; Ullman P1; Ullman P2; Ullman P3; Ullman P4; Johnson P1; Johnson P4 Author entities after deduplication: Aho (P1, P2, P3, P4); Ullman (P1, P2, P3, P4); Johnson (P1, P4)
66
08/22/2004 MRDM 2004 Workshop 66 Deduplication Using Links Cluster similar author references into duplicates –Problem: define appropriate distance measure Weighted combination of attribute and link distances
67
08/22/2004 MRDM 2004 Workshop 67 Link Distances for Deduplication To compare two author references, compare all their links/relations Distance between two links –How many duplicates do they share? –d(l1, l2) = 1 - |duplicates(l1, l2)| / max(|l1|, |l2|) Distance between Link Sets: Link detail distance –d(l, L) = min over l' in L of d(l, l') –d_detail(L1, L2) = avg[ avg over l in L1 of d(l, L2), avg over l in L2 of d(l, L1) ] Distance between Link Sets: Link summary distance –Group detail distance is costly because of pair-wise comparison between group sets –Maintain group summary: all unique references in the group set –d_summ(L1, L2) = d(sum1, sum2)
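The distances on this slide can be sketched as below. Assumptions: each link is represented as a frozenset of the current duplicate-cluster ids of its references, so "shared duplicates" becomes set intersection; the detail distance compares each link in one set against the other set; and the mixing weight `alpha` in the weighted combination is a hypothetical parameter name.

```python
def link_distance(l1, l2):
    """d(l1, l2) = 1 - |shared duplicates| / max(|l1|, |l2|)."""
    return 1.0 - len(l1 & l2) / max(len(l1), len(l2))


def link_to_set_distance(l, links):
    """d(l, L): distance from a link to its closest link in the set L."""
    return min(link_distance(l, other) for other in links)


def detail_distance(L1, L2):
    """Average of the two directed averages of link-to-set distances."""
    a = sum(link_to_set_distance(l, L2) for l in L1) / len(L1)
    b = sum(link_to_set_distance(l, L1) for l in L2) / len(L2)
    return (a + b) / 2.0


def summary_distance(L1, L2):
    """Compare the summaries: all unique references in each link set."""
    return link_distance(frozenset().union(*L1), frozenset().union(*L2))


def combined_distance(attr_d, link_d, alpha):
    """Weighted combination of attribute and link distances."""
    return (1.0 - alpha) * attr_d + alpha * link_d


# Hypothetical link sets of two 'Aho' references.
L1 = [frozenset({"aho", "ullman", "johnson"})]
L2 = [frozenset({"aho", "ullman"}), frozenset({"aho", "kernighan"})]
d_detail = detail_distance(L1, L2)
d_summ = summary_distance(L1, L2)
```

The summary distance needs one set comparison per pair of references, whereas the detail distance compares every link in one set with every link in the other.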
68
08/22/2004 MRDM 2004 Workshop 68 Group Detection: Example Entities: A. Aho; J. Hopcroft; J. Ullman; R. Sethi Links: (Alfred Aho, John Hopcroft, Jeffrey Ullman, Data Structures and Algorithms); (AV Aho, R Sethi, J D Ullman, Compilers: Principles, Techniques and Tools) Groups: PL; Databases; Algorithms Problem: Discover the hidden set of groups and mapping from entities to groups
69
08/22/2004 MRDM 2004 Workshop 69 Group Detection Using Links Cluster similar links into groups Alfred Aho, John Hopcroft, Jeffrey Ullman, Design and Analysis of Computer Algorithms Alfred Aho, John Hopcroft, Jeffrey Ullman, Data Structures and Algorithms AV Aho, R Sethi, J D Ullman, Compilers: Principles, Techniques and Tools Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger, The AWK Programming Language Alfred V. Aho, Jeffrey D. Ullman, Principles of Compiler Design Algorithms PL & Compilers
70
08/22/2004 MRDM 2004 Workshop 70 Link Distances for Group Detection Define cluster distance considering generation probability of observed links given model M Probabilistic Distance d_LP of two clusters –What is the change in probability of observed data if the two clusters are merged? Group summary distance d_summ –d_LP is lower for higher overlap in entity sets of two clusters –But entity sets are the link summaries –Use summary distance as approximation of d_LP
71
08/22/2004 MRDM 2004 Workshop 71 Deduplication / Group Detection Using Clustering Each cluster contains currently known duplicates / links from the same group Initialize clusters –Deduplication: using attribute distance only Greedily pick best candidate using d(c_i, c_j) –Compare different distance measures Update link sets / summaries and cluster distances and continue Select candidates for merge using threshold
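The clustering loop can be sketched as a greedy agglomerative procedure with a merge threshold. This is a simplified illustration: clusters start as singletons, the distance function is supplied by the caller, and all pairwise distances are recomputed on every pass instead of being updated incrementally as the slide describes. The numeric items and the `single_link` distance are hypothetical stand-ins for references and the link-based distances.

```python
def greedy_merge(items, distance, threshold):
    """Repeatedly merge the closest pair of clusters while it is within threshold."""
    clusters = [frozenset([i]) for i in items]
    while len(clusters) > 1:
        best = None                                   # (distance, i, j) of closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:                       # no candidate is close enough
            break
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def single_link(c1, c2):
    """Toy cluster distance: smallest absolute difference between members."""
    return min(abs(a - b) for a in c1 for b in c2)


result = greedy_merge([1, 2, 10, 11, 50], single_link, threshold=2)
```

With a threshold of 2 the toy data ends up as three clusters; raising the threshold would let the remaining clusters merge as well, which is the trade-off the threshold parameter controls.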
72
08/22/2004 MRDM 2004 Workshop 72 Evaluation: Data Generator Parameters Structural Parameters –Number of author-entities and groups –Degree of overlap among the groups Noise parameters –For deduplication data, generate noisy attributes for each entity in the links Generative Parameters –Number of links –Mean size of links
73
08/22/2004 MRDM 2004 Workshop 73 Evaluation: Metric Diversity of clusters –How many different entities does a cluster contain? –Links from how many different groups does a cluster contain? Dispersion of entities/groups –How many different clusters is an entity spread over? –How many different clusters are links from the same group spread over? Measure average dispersion and diversity
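A sketch of the two evaluation measures for the deduplication case, assuming a ground-truth mapping from each reference to its true entity is available (as it is for generated data). The function names and the plain averaging are assumptions; a perfect clustering scores 1.0 on both.

```python
def diversity(clusters, truth):
    """Average number of distinct true entities found inside a cluster."""
    return sum(len({truth[r] for r in c}) for c in clusters) / len(clusters)


def dispersion(clusters, truth):
    """Average number of clusters that one true entity is spread over."""
    spread = {}
    for idx, cluster in enumerate(clusters):
        for r in cluster:
            spread.setdefault(truth[r], set()).add(idx)
    return sum(len(s) for s in spread.values()) / len(spread)


# Hypothetical references r1..r4 and their true entities.
truth = {"r1": "aho", "r2": "aho", "r3": "ullman", "r4": "ullman"}
clusters = [{"r1", "r2"}, {"r3"}, {"r4"}]     # 'ullman' is split across two clusters

div = diversity(clusters, truth)
disp = dispersion(clusters, truth)
```

Over-merging raises diversity and under-merging raises dispersion, so the two numbers are read together.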
74
08/22/2004 MRDM 2004 Workshop 74 Results: Deduplication Algorithm Parameters: mixing weight and threshold –Superior results with link distances Detailed results in DMKD ’04 paper, Iterative Record Linkage for Cleaning and Integration, Indrajit Bhattacharya and Lise Getoor.
75
08/22/2004 MRDM 2004 Workshop 75 Results: Distance Measures Comparison of attribute distance, group summary distance and group detail distance
76
08/22/2004 MRDM 2004 Workshop 76 Deduplicating Real Data Machine learning papers from Citeseer –Citations hand-matched by Steve Lawrence et al –Author references hand-matched by Culotta & McCallum –1504 paper citations –2892 author references –1167 author entities identified Initial results –Link summary improves entity dispersion over attribute clustering –Discovered labeling errors that are hard to identify considering attributes only
77
08/22/2004 MRDM 2004 Workshop 77 Deduplicating Real data 174 | 610 | barron_a_r | A.R. Barron 175 | 610 | barron_r_l | R.L. Barron Not the same entity –Barron, A.R., Barron, R.L., 1988. Statistical learning networks: a unifying view. In: 1988 Symposium on the Interface: Statistics and Computer Science, pp. 192-203.
78
08/22/2004 MRDM 2004 Workshop 78 Deduplicating Real data 2097 | 8460 | ramakrishnan_c_r | C. R. Ramakrishnan 2098 | 8460 | ramakrishnan_i_v | I. V. Ramakrishnan Not the same entity –A Symbolic Constraint Solving Framework for Analysis of Logic Programs, C.R. Ramakrishnan, I.V. Ramakrishnan and R. Sekar, ACM Conference on Partial Evaluation and Semantics based Program Manipulation (PEPM), June 1995
79
08/22/2004 MRDM 2004 Workshop 79 Deduplicating Real data Parse Error –1734 | 7010 | minton_andrew_b_philips_steven | Andrew B. Philips Steven Minton Same entity as –1735 | 7020 | minton_s | Minton, S.
80
08/22/2004 MRDM 2004 Workshop 80 Deduplicating Real data Parse Error –2083 | 8370 | raedt_2_l_de | 2. L. De Raedt Same entity as –2085 | 8380 | raedt_l_de | L. De Raedt
81
08/22/2004 MRDM 2004 Workshop 81 Deduplication and Group Detection Summary Study of novel distance measures for clustering similar entities in linked environments Unified generative model for evaluating the related problems Link-based clustering shows superior performance over attribute clustering for both tasks on synthetic data Link-based Challenges addressed: –Collective consolidation
82
08/22/2004 MRDM 2004 Workshop 82 Roadmap Intro to Link Mining –Link Mining Tasks –Link Mining Challenges Some Current Projects –Link-based Classification work with Qing Lu and Prithviraj Sen Link-based classification using a variety of link descriptions Link-based classification using labeled and unlabeled data –Link-based Clustering Entity detection Group Detection Conclusion
83
08/22/2004 MRDM 2004 Workshop 83 Link Mining Summary Link Mining Tasks –Link-based Object Classification –Object Type Prediction –Link Type Prediction –Predicting Link Existence –Link Cardinality Estimation –Object Consolidation –Group Detection –Subgraph Discovery –Metadata Mining Link Mining Challenges –Logical vs. Statistical dependencies –Feature construction –Instances vs. Classes –Collective Classification –Collective Consolidation –Effective Use of Labeled & Unlabeled Data –Link Prediction –Closed vs. Open World
84
08/22/2004 MRDM 2004 Workshop 84 References
Deduplication and Group Detection Using Links, Indrajit Bhattacharya and Lise Getoor. 10th ACM SIGKDD Workshop on Link Analysis and Group Detection, Seattle, WA, August 2004.
Word Sense Disambiguation using Probabilistic Models, Indrajit Bhattacharya, Lise Getoor and Yoshua Bengio. 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, SP, July 2004.
Iterative Record Linkage for Cleaning and Integration, Indrajit Bhattacharya and Lise Getoor. 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Paris, FR, June 2004.
Using the Structure of Web Sites for Automatic Segmentation of Tables, Kristina Lerman, Lise Getoor, Steve Minton and Craig Knoblock. Proceedings of ACM-SIGMOD 2004 International Conference on Management of Data, Paris, FR, June 2004.
Structure Discovery using Statistical Relational Learning, Lise Getoor. Data Engineering Bulletin, vol. 26, No. 3, 2003.
Link Mining: A New Data Mining Challenge, Lise Getoor. SIGKDD Explorations, volume 5, issue 1, 2003.
Iterative Deduplication, I. Bhattacharya, L. Getoor.
Link Mining: A New Data Mining Challenge, L. Getoor. SIGKDD Explorations, volume 4, issue 2, 2003.
Link-based Classification, Q. Lu and L. Getoor. International Conference on Machine Learning, August 2003.
Labeled and Unlabeled Data for Link-based Classification, Q. Lu and L. Getoor. ICML workshop on The Continuum from Labeled to Unlabeled Data, August 2003.
Link-based Classification for Text Classification and Mining, Q. Lu and L. Getoor. IJCAI workshop on Text Mining and Link Analysis.
IJCAI 03 Workshop: Learning Statistical Models from Relational Data, SRL 2003, http://kdl.cs.umass.edu/srl2003
ICML 04 Workshop: Statistical Relational Learning and Connections to Other Fields, SRL 2004, http://www.cs.umd.edu/srl2004
Supported by: [sponsor logos not captured in the transcript]