Published by Jordan Green. Modified over 8 years ago.
1
Link Distribution in Wikipedia [0324] KwangHee Park
2
Table of contents
Introduction
Clustering using LDA
Experiment: Disease, Settlement
Demo
Considering applications
3
Introduction: why focus on links
When someone creates a new article in Wikipedia, they mostly just link to sources in other languages or to similar and related articles. After that, the article is written by others.
Assumption: the link terms in a Wikipedia article are the key terms that can represent the specific characteristics of that article.
4
Introduction: the problem we want to solve
Analyze the latent distribution of a set of target documents by clustering their link term set.
Find the tendency of the latent distribution for a specific domain by limiting the input documents to that domain.
5
Process
Terminology
Term set = all terms in the input documents
Topic = set of terms {W_i, …, W_n}
Document = set of terms {W_k, W_l, …, W_n}
Document = set of partial topics {T_n, T_k, …, T_m}, for example {Doc : 1} {T_n : 0.4, T_k : 0.3, …}
Cluster the term set
Find the latent distribution of each document
Group by domain
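The terminology above can be sketched in a few lines of Python. This is only an illustration: the link terms, the topic weights beyond 0.4 and 0.3, and the helper names build_term_set and top_topics are hypothetical and do not come from the slides.

```python
# Hypothetical toy data: each document is represented only by its link terms.
docs = {
    "Starvation": ["Malnutrition", "Famine", "Vitamin"],
    "Trenton,_New_Jersey": ["New_Jersey", "Delaware_River", "United_States"],
}


def build_term_set(documents):
    """Term set = all terms that occur in the input documents."""
    return set().union(*documents.values())


def top_topics(topic_weights, n=2):
    """Return the n most heavily weighted topics of one document."""
    return sorted(topic_weights, key=topic_weights.get, reverse=True)[:n]


# A document as a set of partial topics. Only 0.4 and 0.3 appear on the slide;
# the remaining weights are made up so that the weights sum to 1.
doc_topics = {"T_n": 0.4, "T_k": 0.3, "T_m": 0.2, "T_j": 0.1}
```

Here build_term_set(docs) gives the term set for the clustering step, and top_topics(doc_topics) picks the topics that dominate one document.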
6
LDA: the clustering technique
The LDA model consists of a fixed number of topics.
Each topic is modeled as a distribution over words.
A document under LDA is modeled as a distribution over topics.
[Figure: Term set; Topic 1, Topic 2, Topic 3, …, Topic n; Doc 1, Doc 2, Doc 3]
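The slides do not say which LDA implementation or inference method was used. As a rough sketch only, the model described above can be fitted with collapsed Gibbs sampling, one common inference method for LDA. The function name lda_gibbs and the parameters alpha, beta, iters and seed are assumptions made for this illustration.

```python
import random


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of token lists.

    Returns (theta, phi, vocab):
      theta[d][k] = estimated weight of topic k in document d
      phi[k][w]   = estimated weight of vocabulary word w in topic k
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    ndk = [[0] * num_topics for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(num_topics)]    # word counts per topic
    nk = [0] * num_topics                         # total tokens per topic
    z = []                                        # topic assignment per token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            assignments.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(assignments)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = index[w]
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    theta = [
        [(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[t][wi] + beta) / (nk[t] + V * beta) for wi in range(V)]
        for t in range(num_topics)
    ]
    return theta, phi, vocab
```

In the setting of this presentation, each token list would hold the link terms of one Wikipedia article, and each row of theta would be the latent topic distribution of that article.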
7
Experiment
Domain: Disease
#Doc: 208
#Link terms: English: 46615, Spanish: 34560, French: 31747, Chinese: 9286, Korean: 3272
Domain: Settlement
#Doc: 1328
#Link terms: English: 372483, Spanish: 227950, French: 150921, Chinese: 93227, Korean: 38089
Number of topics: 10, 20, 30, 40, 50, 75, 100, 125, 150, 175, 200, 225, 250
Demo site: http://143.248.135.30
8
Considering applications
Document classification: classify the domain of a target document by calculating the similarity between the topic distributions of documents.
Usage: template recommendation, …
Domain characteristic
[Figure: # of appearances / # of total docs, plotted against topic number, for Disease and Settlement]
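One possible reading of this classification idea, sketched in Python. The slides do not name a similarity measure or say when a topic counts as appearing in a document, so both are assumptions here: cosine similarity is used, and a topic "appears" when it is in the document's topic set. The function names are hypothetical.

```python
import math


def domain_profile(doc_topic_sets, num_topics):
    """Per topic: number of domain documents it appears in / total documents."""
    total = len(doc_topic_sets)
    if total == 0:
        raise ValueError("a domain needs at least one document")
    return [
        sum(1 for topics in doc_topic_sets if t in topics) / total
        for t in range(num_topics)
    ]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(doc_distribution, profiles):
    """Return the domain whose profile is most similar to the document."""
    return max(profiles, key=lambda name: cosine(doc_distribution, profiles[name]))
```

With one profile per domain, for example Disease and Settlement, classify returns the domain label for a target document, which could then drive a template recommendation.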
9
Template recommendation
Input articles: Starvation, Trenton,_New_Jersey
Starvation → Disease
Trenton,_New_Jersey → Settlement
10
Thanks
11
Domain characteristic
[Figure: # of appearances / # of total docs, plotted against topic number, for Disease and Settlement]