1
A Two Tier Framework for Context-Aware Service Organization & Discovery
I2R-NUS-MSRA at TAC 2011: Entity Linking
Wei Zhang¹, Jian Su², Bin Chen², Wenting Wang², Zhiqiang Toh², Yanchuan Sim², Yunbo Cao³, Chin Yew Lin³ and Chew Lim Tan¹
¹ National University of Singapore, ² Institute for Infocomm Research, ³ Microsoft Research Asia
Text Analysis Conference, November 14-15, 2011
2
Outline
The I2R-NUS team at TAC incorporates the new technologies proposed in our recent papers (IJCAI 2011, IJCNLP 2011):
- Acronym expansion
- Semantic features
- Instance selection
Investigate three algorithms for NIL query clustering:
- Spectral Graph Partitioning (SGP)
- Hierarchical Agglomerative Clustering (HAC)
- Latent Dirichlet Allocation (LDA)
Combination system:
- Offline combination with the MSRA team's system at the KB linking step
4
Acronym Expansion - Motivation
Expanding an acronym from its context reduces the ambiguity of a name. E.g., "TSE" refers to 33 entries in Wikipedia, whereas "Tokyo Stock Exchange" is unambiguous.
5
Step 1 – Find Expansion Candidates
Identifying candidate expansions (e.g., for "ACM")
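The slide's figure for this step did not survive, so the generator below is a minimal sketch under an assumed heuristic, not the system's actual rule. A candidate starts with a capitalized word whose initial matches the acronym's first letter, contains one content word per acronym letter, and may include words from a small hypothetical `FUNCTION_WORDS` set. Only the first letter is checked, which leaves room for swapped letters and leaves the real decision to the ranker in step 2.

```python
import re

# Hypothetical set of function words allowed inside an expansion.
FUNCTION_WORDS = {"of", "for", "the", "and", "in", "on", "at"}


def candidate_expansions(acronym: str, text: str, slack: int = 2) -> list[str]:
    """Return word spans that could plausibly expand `acronym` (heuristic sketch)."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    first = acronym[0].lower()
    candidates = []
    for i, word in enumerate(words):
        # A candidate starts with a capitalized word sharing the acronym's first letter.
        if not word[0].isupper() or word[0].lower() != first:
            continue
        for j in range(i + 1, min(i + len(acronym) + slack, len(words)) + 1):
            span = words[i:j]
            if acronym in span:
                break  # an expansion cannot contain the acronym itself
            content = [w for w in span if w.lower() not in FUNCTION_WORDS]
            # One content word per acronym letter, and no trailing function word.
            if len(content) == len(acronym) and span[-1].lower() not in FUNCTION_WORDS:
                candidates.append(" ".join(span))
    return candidates


print(candidate_expansions("ACM", "The Association for Computing Machinery (ACM) hosted the event."))
# ['Association for Computing Machinery']
```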
6
Step 2 – Candidate Expansion Ranking
An SVM classifier ranks the candidates. Our SVM-based acronym expansion can link acronyms to full forms that appear in different sentences of an article. Features include:
- The number of common characters between the acronym and the leading characters of the expansion's words. This handles acronyms with swapped letters, e.g., "Communist Party of China" vs. "CCP".
- The sentence distance between the acronym and the expansion.
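The two features listed above can be sketched as follows. This is an illustration, not the system's code. The sketch assumes "common characters" means a case-insensitive multiset intersection, so order does not matter and "CCP" still matches C, P, C from "Communist Party of China". The helper names are hypothetical, and the real ranker may use further features.

```python
from collections import Counter


def common_char_count(acronym: str, expansion: str) -> int:
    """Characters shared, with multiplicity and ignoring order and case, between
    the acronym and the leading characters of the expansion's words."""
    leading = Counter(word[0].lower() for word in expansion.split())
    letters = Counter(acronym.lower())
    # Counter '&' is multiset intersection: it keeps the minimum of each count.
    return sum((letters & leading).values())


def sentence_distance(acronym_sentence: int, expansion_sentence: int) -> int:
    """Number of sentences between the acronym and the candidate expansion."""
    return abs(acronym_sentence - expansion_sentence)


def features(acronym: str, expansion: str, acronym_sentence: int, expansion_sentence: int) -> list[int]:
    # Only the two features named on the slide; the real ranker may use more.
    return [common_char_count(acronym, expansion),
            sentence_distance(acronym_sentence, expansion_sentence)]


print(features("CCP", "Communist Party of China", 4, 1))  # [3, 3]
```

Each candidate's vector would then be scored by the trained SVM, and the highest-ranked candidate would be taken as the expansion.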
8
Related Work on Context Similarity
A Wikipedia-LDA Model for Entity Linking with Batch Size Changing Instance Selection (The 5th International Joint Conference on Natural Language Processing, November 8-13, 2011, Chiang Mai, Thailand)
Term matching (Zhang et al., 2010; Zheng et al., 2010; Dredze et al., 2010)
However, term matching fails on mentions such as:
1) Michael Jordan is a leading researcher in machine learning and artificial intelligence.
2) Michael Jordan is currently a full professor at the University of California, Berkeley.
3) Michael Jordan (born February, 1963) is a former American professional basketball player.
4) Michael Jordan wins NBA MVP of 91-92 season.
No term match: apart from the name and function words, 1) and 2) share no terms although they describe the same person, and neither do 3) and 4).
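The "no term match" problem can be checked directly on the four example sentences. The sketch below builds bag-of-words vectors, drops the query name and a tiny hypothetical stopword list, and prints the cosine similarity of every pair. Every pair scores 0.0, so term matching cannot tell that 1) and 2) refer to the same person.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; any standard list would do).
STOPWORDS = {"a", "an", "and", "at", "in", "is", "of", "the"}
QUERY = {"michael", "jordan"}

SENTENCES = [
    "Michael Jordan is a leading researcher in machine learning and artificial intelligence.",
    "Michael Jordan is currently a full professor at the University of California, Berkeley.",
    "Michael Jordan (born February, 1963) is a former American professional basketball player.",
    "Michael Jordan wins NBA MVP of 91-92 season.",
]


def bag_of_words(sentence: str) -> Counter:
    # Lowercase, tokenize, and drop stopwords and the query name itself.
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return Counter(t for t in tokens if t not in STOPWORDS and t not in QUERY)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


bags = [bag_of_words(s) for s in SENTENCES]
for i in range(len(bags)):
    for j in range(i + 1, len(bags)):
        print(i + 1, j + 1, cosine(bags[i], bags[j]))  # always 0.0
```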
9
Our System - A Wikipedia-LDA Model
Mapping each context to Wikipedia topics separates the mentions:
1) Michael Jordan is a leading researcher in machine learning and artificial intelligence. (Topic: Science)
2) Michael Jordan is currently a full professor at the University of California, Berkeley. (Topic: Science)
3) Michael Jordan (born February, 1963) is a former American professional basketball player. (Topic: Basketball)
4) Michael Jordan wins NBA MVP of 91-92 season. (Topic: Basketball)
10
Wikipedia-LDA Model
[Figure: Wikipedia documents are used to estimate P(word_i | category_j); each document is then represented by P(category_i | document_j)]
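One way to obtain P(category | document) once P(word | category) is fixed is to fold the document in with EM over its mixing weights. The sketch below does this with a toy, hypothetical two-category table (`RAW_SCORES`) standing in for the distributions learned from Wikipedia. The paper's actual inference procedure may differ. Sentences 1) and 2) share no terms, yet both land on "Science", so their category vectors become nearly identical.

```python
import math
import re

# Hypothetical unnormalized word scores per Wikipedia category (toy values).
RAW_SCORES = {
    "Science": {"researcher": 3, "machine": 2, "learning": 3, "intelligence": 2,
                "professor": 3, "university": 3, "berkeley": 2},
    "Basketball": {"basketball": 4, "player": 3, "professional": 2,
                   "nba": 3, "mvp": 3, "season": 2},
}
SMOOTHING = 0.1  # pseudo-count so every vocabulary word has non-zero probability
VOCAB = sorted({w for scores in RAW_SCORES.values() for w in scores})
CATEGORIES = list(RAW_SCORES)

# P(word | category): smoothed and normalized over the shared vocabulary.
PHI = {}
for cat, scores in RAW_SCORES.items():
    total = sum(scores.get(w, 0) + SMOOTHING for w in VOCAB)
    PHI[cat] = {w: (scores.get(w, 0) + SMOOTHING) / total for w in VOCAB}


def category_distribution(text: str, iterations: int = 50) -> dict[str, float]:
    """Estimate P(category | document) by EM, holding P(word | category) fixed."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w in PHI[CATEGORIES[0]]]
    theta = {c: 1.0 / len(CATEGORIES) for c in CATEGORIES}
    if not words:
        return theta
    for _ in range(iterations):
        expected = {c: 0.0 for c in CATEGORIES}
        for w in words:
            # E-step: posterior responsibility of each category for this token.
            joint = {c: theta[c] * PHI[c][w] for c in CATEGORIES}
            z = sum(joint.values())
            for c in CATEGORIES:
                expected[c] += joint[c] / z
        # M-step: new mixing weights are the normalized expected counts.
        theta = {c: expected[c] / len(words) for c in CATEGORIES}
    return theta


def cosine(p: dict[str, float], q: dict[str, float]) -> float:
    dot = sum(p[c] * q[c] for c in CATEGORIES)
    return dot / (math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values())))
```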
13
Related Work
Vector space model:
- It is difficult to combine bag of words (BOW) with other features.
- Its performance needs to be improved.
Supervised approaches:
- Using manually annotated training instances (Dredze et al., 2010; Zheng et al., 2010)
- Using automatically generated training instances (Zhang et al., 2010)
14
Related Work
Auto-generating training instances (Zhang et al., 2010)
(News article) "Obama Campaign Drops The George W. Bush Talking Point …"
15
Related Work
From "George W. Bush" articles:
- No positive instances are generated for "George H. W. Bush", "George P. Bush" or "George Washington Bush".
- No negative instances are generated for "George W. Bush".
The resulting distribution of positive and negative training instances may differ from that of the genuinely ambiguous cases in the raw text collection, and the distribution of the unambiguous mentions may differ from that of the test data.
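The skew described above follows from how the instances are built. The sketch below is a hypothetical simplification in the spirit of Zhang et al. (2010), not their code. An unambiguous full name is replaced by its ambiguous form. The full name's entity becomes the only positive label, and every other candidate becomes a negative. Starting from a "George W. Bush" article, it can never yield a negative for "George W. Bush" or a positive for any other Bush.

```python
def auto_generate(text: str, full_name: str, ambiguous_form: str,
                  candidates: list[str]) -> list[tuple[str, str, int]]:
    """Turn one unambiguous mention into (context, entity, label) instances."""
    if full_name not in text:
        return []
    # Hide the unambiguous name behind its ambiguous surface form.
    masked = text.replace(full_name, ambiguous_form)
    # The entity the full name denotes is the only positive label...
    instances = [(masked, full_name, 1)]
    # ...and every other KB candidate for the ambiguous form is a negative.
    instances += [(masked, entity, 0) for entity in candidates if entity != full_name]
    return instances


for instance in auto_generate("Obama Campaign Drops The George W. Bush Talking Point",
                              "George W. Bush", "Bush",
                              ["George W. Bush", "George H. W. Bush", "George P. Bush"]):
    print(instance)
```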
16
The Approach in Our System
An instance selection approach:
- Select an informative, representative and diverse subset from the auto-generated data set.
- Reduce the effect of the distribution differences.
17
Instance Selection
1. Train an SVM classifier on a small initial data set.
2. Test it on the auto-generated data set.
3. Select informative, representative and diverse instances.
4. Add the selected instances to the initial data set.
[Figure: 2-D data set illustration with the SVM hyperplane]
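The selection step (step 3 above) can be sketched for a single round as follows. This assumes a linear SVM hyperplane `(w, b)` has already been trained on the current labeled set. The scoring functions are illustrative assumptions, not the paper's exact criteria. Informativeness is closeness to the hyperplane. Representativeness is mean Gaussian similarity to the rest of the pool. Diversity is enforced greedily with a hypothetical minimum distance `min_gap`.

```python
import math


def select_batch(pool, w, b, batch_size, min_gap=1.0, alpha=0.5):
    """Pick informative, representative and diverse points from `pool` (one round)."""
    norm_w = math.sqrt(sum(v * v for v in w))

    def informativeness(x):
        # Distance to the hyperplane w.x + b = 0, mapped so that closer means higher.
        dist = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w
        return 1.0 / (1.0 + dist)

    def representativeness(idx):
        # Mean Gaussian similarity to the other pool points: dense regions score high.
        sims = [math.exp(-math.dist(pool[idx], y) ** 2) for k, y in enumerate(pool) if k != idx]
        return sum(sims) / len(sims) if sims else 0.0

    scores = [alpha * informativeness(x) + (1 - alpha) * representativeness(i)
              for i, x in enumerate(pool)]
    order = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    selected = []
    for i in order:
        # Diversity: skip candidates too close to an already selected instance.
        if all(math.dist(pool[i], s) >= min_gap for s in selected):
            selected.append(pool[i])
            if len(selected) == batch_size:
                break
    return selected


pool = [(0.1, 0.0), (0.15, 0.05), (-0.1, 0.0), (5.0, 0.0), (-5.0, 0.0), (5.2, 0.1)]
print(select_batch(pool, w=(1.0, 0.0), b=0.0, batch_size=2))  # [(0.1, 0.0), (5.0, 0.0)]
```

The diversity check is what stops the batch from filling up with near-duplicates of the top-scoring point. The selected instances would then be labeled, added to the initial data set, and used to retrain the SVM.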
19
Spectral Clustering
Advantages over other clustering techniques:
- Globally optimized results
- Efficient in time and space
- Generally produces better results
- Successful in many areas, e.g., image segmentation and gene expression clustering
20
Spectral Clustering
Eigendecomposition of the graph Laplacian, $A = Q \Lambda Q^{-1}$, followed by dimensionality reduction (Luxburg, 2006)
[Figure: example clusters for George W. Bush and George H.W. Bush]
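A minimal spectral graph partitioning sketch. The slide does not say which Laplacian SGP uses or how many clusters it forms. This sketch assumes the unnormalized Laplacian L = D - W and a two-way split. It finds the Fiedler vector, the eigenvector of the second-smallest eigenvalue, by power iteration on cI - L with the constant eigenvector projected out, then splits documents by sign. A k-way version would instead take k eigenvectors and run k-means on their rows.

```python
import math


def spectral_bipartition(weights: list[list[float]], iterations: int = 200) -> list[int]:
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Unnormalized graph Laplacian L = D - W.
    lap = [[(degree[i] if i == j else 0.0) - weights[i][j] for j in range(n)] for i in range(n)]
    # Eigenvalues of L lie in [0, 2 * max degree], so M = c*I - L is positive
    # semidefinite and its top eigenvectors are L's bottom ones.
    c = 2.0 * max(degree)
    v = [i - (n - 1) / 2.0 for i in range(n)]  # non-constant starting vector
    for _ in range(iterations):
        # Power-iteration step with M = c*I - L.
        v = [c * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Remove the constant eigenvector (eigenvalue 0 of L) and renormalize,
        # so the iteration converges to the Fiedler vector instead.
        mean = sum(v) / n
        v = [x - mean for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Two-way partition by the sign of the Fiedler vector.
    return [0 if x >= 0 else 1 for x in v]


# Hypothetical document-similarity graph: two tight groups joined by one weak edge.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(spectral_bipartition(W))
```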
21
Hierarchical Agglomerative Clustering
- Convert each document into a feature vector of Wikipedia concepts, bag-of-words and named entities.
- Estimate the weight of each feature using the Query Relevance Weighting Model (Long and Shi, 2010), which has shown good performance in Web People Search. In our work, the original query name, its Wikipedia redirect names and its coreference chain mentions are all treated as occurrences of the query name in the text.
- Similarity scores: cosine similarity and overlap similarity.
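The two similarity scores can be sketched over sparse, weighted feature vectors. Two assumptions apply here. First, "overlap similarity" is taken to be the overlap coefficient |A ∩ B| / min(|A|, |B|) over the features present in each document. Second, the feature weights in the example are hypothetical stand-ins for Query Relevance Weighting Model weights, which this sketch does not reproduce. The `wiki:`, `bow:` and `ne:` prefixes simply keep the three feature types apart.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def overlap_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    # Overlap coefficient: shared features relative to the smaller feature set.
    if not a or not b:
        return 0.0
    return len(a.keys() & b.keys()) / min(len(a), len(b))


doc1 = {"wiki:Machine_learning": 0.8, "bow:professor": 0.5, "ne:Berkeley": 0.7}
doc2 = {"ne:Berkeley": 0.9, "bow:professor": 0.4}
print(cosine_similarity(doc1, doc2), overlap_similarity(doc1, doc2))
```

Overlap similarity rewards a short document whose features are all contained in a longer one, which cosine similarity penalizes through the longer vector's norm.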
22
Hierarchical Agglomerative Clustering
Documents referring to the same entity are clustered according to pairwise document similarity scores:
- Start with singletons: each document is its own cluster.
- If there are two documents $D$ and $D'$ in clusters $C_i$ and $C_j$ respectively with $\mathrm{Sim}(D, D') > \gamma$, merge $C_i$ and $C_j$ into a new cluster $C_{ij}$.
- Calculate the similarity between the new cluster $C_{ij}$ and all remaining clusters.
- $\gamma = 0.25$
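The merge rule above is single-link agglomerative clustering with a stopping threshold. Two clusters are as similar as their most similar cross-cluster document pair, and merging stops once no pair exceeds γ. The sketch below takes a precomputed similarity matrix and always merges the most similar pair first. With single link, the final clusters do not depend on merge order: they are the connected components of the graph whose edges have similarity above γ.

```python
def single_link_hac(sim: list[list[float]], gamma: float = 0.25) -> list[list[int]]:
    """Merge clusters while some cross-cluster document pair has Sim > gamma."""
    clusters = [[i] for i in range(len(sim))]  # start from singletons

    def link(ci: list[int], cj: list[int]) -> float:
        # Single link: cluster similarity is the best document pair across them.
        return max(sim[d][e] for d in ci for e in cj)

    while len(clusters) > 1:
        best, pair = gamma, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = link(clusters[i], clusters[j])
                if s > best:  # strictly greater than gamma, as on the slide
                    best, pair = s, (i, j)
        if pair is None:
            break  # no remaining pair exceeds the threshold
        i, j = pair
        # Form C_ij; its similarities are recomputed on the next pass.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


SIM = [
    [1.0, 0.6, 0.2, 0.05],
    [0.6, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.1],
    [0.05, 0.1, 0.1, 1.0],
]
print(single_link_hac(SIM))  # [[0, 1, 2], [3]]
```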
23
Latent Dirichlet Allocation (LDA)
LDA has been applied to many NLP tasks, such as summarization and text classification.
In our approach, the learned topics can represent the underlying entities of the ambiguous names.
Generative story: [figure not recoverable]
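The generative story was a figure that did not survive extraction. For reference, the standard LDA generative process is shown below, with K topics, Dirichlet priors α and β, and N_d words in document d. The slides do not state whether the submitted system used exactly this form. The last line is an assumption about how learned topics would be read as entities: each document mentioning the ambiguous name is assigned to its most probable topic.

```latex
\begin{aligned}
&\phi_k \sim \mathrm{Dirichlet}(\beta), && k = 1, \dots, K \\
&\theta_d \sim \mathrm{Dirichlet}(\alpha), && \text{for each document } d \\
&z_{d,n} \sim \mathrm{Multinomial}(\theta_d), && n = 1, \dots, N_d \\
&w_{d,n} \sim \mathrm{Multinomial}(\phi_{z_{d,n}}) \\
&\hat{e}(d) = \arg\max_{k} \theta_{d,k} && \text{(assumed entity assignment)}
\end{aligned}
```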
24
Three clustering systems combination:
- A three-class SVM classifier decides which system to trust.
- Features: the scores given by the three systems.
Combination with the MSRA team's system at the KB linking step:
- A binary SVM classifier decides which system to trust.
- Features: the scores given by the two systems.
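At prediction time, the combination step reduces to a choice of system per query. The sketch below assumes a one-vs-rest linear SVM whose per-system weights and biases have already been learned. The weights, the identity matrix `IDENTITY` and the output ids used here are hypothetical. The system with the highest decision value is trusted, and its answer is kept. In the binary (I2R-NUS vs. MSRA) case, the sign of a single decision value plays the same role.

```python
def trusted_system(scores: list[float], weights: list[list[float]], biases: list[float]) -> int:
    """Index of the system whose one-vs-rest linear decision value is highest."""
    decisions = [sum(w * s for w, s in zip(wk, scores)) + bk for wk, bk in zip(weights, biases)]
    return max(range(len(decisions)), key=decisions.__getitem__)


def combine(outputs: list[str], scores: list[float],
            weights: list[list[float]], biases: list[float]) -> str:
    # Keep the answer (e.g., a cluster or KB id) of the system the classifier trusts.
    return outputs[trusted_system(scores, weights, biases)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy weights
print(combine(["NIL001", "NIL007", "NIL003"], [0.2, 0.9, 0.5], IDENTITY, [0.0, 0.0, 0.0]))  # NIL007
```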
25
Experiments with the Three Clustering Algorithms
Algorithms    Eval 09   Eval 10   Eval 10 +
SGP           0.745     0.954     0.809
HAC           0.666     0.950     0.789
LDA           0.782     0.981     0.841
Combination   0.795     0.982     0.852
26
Submissions
Systems   Acc.    Precision   Recall   F1
Full      0.863   0.815       0.849    0.831
Partial   0.844   0.797       0.829    0.813
Highest   -       -           -        0.846
Median    -       -           -        0.716
27
Conclusion
Incorporated the new technologies proposed in our recent papers (IJCAI 2011, IJCNLP 2011):
- Acronym expansion
- Semantic features
- Instance selection
Investigated three algorithms for NIL query clustering:
- Spectral Graph Partitioning (SGP)
- Hierarchical Agglomerative Clustering (HAC)
- Latent Dirichlet Allocation (LDA)