1
Incorporating Hierarchical Dirichlet Process into Tag-Topic Model — Zhang Ming, April 2013
2
Agenda
- Introduction
- Tag-Topic Model
- Tag Hierarchical Dirichlet Process
- Experiments and Evaluation
- Conclusion
3
Introduction With the rapid development of Web 2.0, the Internet has produced a large amount of resources, such as blogs, Twitter, and encyclopedias. These resources contain a wealth of information that can be applied to many fields of information processing to improve service quality, but relying solely on human professionals to process this information is far from sufficient.
4
Introduction In NLP, computer programs face several tasks that require human-level intelligence; that is, the programs must be endowed with the ability to understand language. One of the core issues is how to automatically acquire knowledge and use it effectively to achieve semantic analysis and computation.
5
Introduction Tagging has recently emerged as a popular way to organize user-generated content for Web 2.0 applications, such as blogs and bookmarks. In blogs, users can assign one or more tags to each post, and these tags usually reflect the subjects the content is concerned with. Tags can be seen as labeled meta-information about the content, and they are beneficial for knowledge mining from blogs.
6
Introduction In this paper, we extend the Tag-Topic Model (TTM) [1] by incorporating the HDP as its prior distribution. We assume that, before writing a blog post, an author already knows which aspects the content will cover, and that for each aspect he will choose a tag to describe it.
7
Agenda
- Introduction
- Tag-Topic Model
- Tag Hierarchical Dirichlet Process
- Experiments and Evaluation
- Conclusion
8
LDA Generative model
9
Tag-Topic Model Basic idea: each document is a mixture of tags; each tag can be viewed as a multinomial distribution over topics, and each topic is associated with a multinomial distribution over words.
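The generative process just described can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the sizes (V, K, T), hyperparameters, and the `generate_document` helper are all illustrative assumptions.

```python
# A minimal sketch of the Tag-Topic Model's generative process.
# All sizes and hyperparameters below are toy, illustrative values.
import numpy as np

rng = np.random.default_rng(0)

V, K, T = 50, 4, 6         # vocabulary size, number of topics, number of tags
alpha, beta = 0.5, 0.1     # Dirichlet hyperparameters

phi = rng.dirichlet(beta * np.ones(V), size=K)     # topic -> word distribution
theta = rng.dirichlet(alpha * np.ones(K), size=T)  # tag -> topic distribution

def generate_document(doc_tags, n_words):
    """Generate one document given its tag set."""
    words, zs, xs = [], [], []
    for _ in range(n_words):
        x = rng.choice(doc_tags)        # choose a tag uniformly at random
        z = rng.choice(K, p=theta[x])   # choose a topic from that tag
        w = rng.choice(V, p=phi[z])     # choose a word from that topic
        xs.append(x); zs.append(z); words.append(w)
    return words, zs, xs

words, zs, xs = generate_document(doc_tags=[1, 3], n_words=20)
```

In practice the interesting direction is the inverse: given the words and tags, infer `theta` and `phi` (e.g. with Gibbs sampling); the sketch only shows the forward model.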
10
Tag-Topic Model
11
Agenda
- Introduction
- Tag-Topic Model
- Tag Hierarchical Dirichlet Process
- Experiments and Evaluation
- Conclusion
12
THDP The THDP topic model draws upon the strengths of the two models (TTM and HDP), using the topic-based representation to model both the content of documents and the tags. In the THDP model, a group of tags, T_d, indicates the main purpose of the blog post. For each word in the document, a tag is chosen uniformly at random. Then, as in the topic model, a topic is chosen from a distribution over topics specific to that tag, and the word is generated from the chosen topic.
13
THDP
14
Given an underlying measure H on multinomial probability vectors, we select a random measure G0, which provides a countably infinite collection of multinomial probability vectors; these can be viewed as the set of all topics that can be used in a given corpus. For the l-th tag in the j-th document of the corpus we sample Gj using G0 as a base measure; this selects a specific subset of topics to be used for tag l in document j. From Gj we then generate a document by repeatedly (1) choosing a tag with equal probability from the tag set associated with the document, and (2) sampling a specific multinomial probability vector z_ji from Gj and sampling the word w_ji with probabilities z_ji. The overlap among the random measures Gj implements the sharing of topics among documents.
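The H → G0 → Gj hierarchy above can be sketched with a truncated stick-breaking construction. This is a standard approximation, not the paper's inference procedure; the truncation level `Kmax` and all hyperparameters are illustrative assumptions.

```python
# Truncated stick-breaking sketch of the HDP hierarchy: H -> G0 -> Gj.
# Kmax, V, gamma, alpha0, and eta are toy, illustrative values.
import numpy as np

rng = np.random.default_rng(1)
Kmax, V = 30, 50               # truncation level, toy vocabulary size
gamma, alpha0, eta = 1.0, 1.0, 0.1

def gem_weights(concentration, truncation, rng):
    """Truncated GEM (stick-breaking) weights summing to 1."""
    b = rng.beta(1.0, concentration, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - b[:-1])))
    w = b * remaining
    w[-1] += 1.0 - w.sum()     # absorb leftover stick mass into last atom
    return w

# Atoms drawn from H: each is a multinomial over the vocabulary (a topic).
topics = rng.dirichlet(eta * np.ones(V), size=Kmax)

# G0: global weights over the shared atoms (the corpus-wide topic set).
beta0 = gem_weights(gamma, Kmax, rng)

# Gj for one tag in one document: a DP(alpha0, G0) reweighting of the SAME
# atoms, so topics are shared across documents while each tag
# emphasises its own subset of them.
pi_j = rng.dirichlet(alpha0 * beta0)

def sample_word(pi_j, topics, rng):
    k = rng.choice(Kmax, p=pi_j)       # pick a topic index from Gj
    return rng.choice(V, p=topics[k])  # pick a word from that topic
```

Because every Gj reweights the same atoms drawn from G0, documents share a common pool of topics, which is exactly the sharing mechanism the paragraph above describes.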
15
Agenda
- Introduction
- Tag-Topic Model
- Tag Hierarchical Dirichlet Process
- Experiments and Evaluation
- Conclusion
16
Experiments and evaluation Dataset: The dataset used in the experiments is drawn from the blog corpus collected between October 2011 and December 2012, constructed by the National Language Resources Monitoring and Research Center, Network Media Branch. We filter out blog texts with no tags or fewer than 100 words, and apply preprocessing steps such as removing stop words and extremely common words, filtering out non-nominal words, and retaining only nouns and nominal phrases. The resulting dataset contains the tags and content of N = 927 blog posts, with W = 10,438 words in the vocabulary and T = 558 tags.
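The filtering steps above can be sketched as two small helpers. The stop-word list, the nominal-word test, and the corpus format are placeholder assumptions; the slide does not specify them.

```python
# A hedged sketch of the corpus filtering described above.
# STOPWORDS and is_nominal are placeholders for the real resources.
STOPWORDS = {"the", "of", "and"}   # illustrative; the actual list is not given

def keep_document(tags, tokens, min_words=100):
    """Keep only blogs that have at least one tag and >= min_words tokens."""
    return len(tags) > 0 and len(tokens) >= min_words

def clean_tokens(tokens, is_nominal):
    """Drop stop words and non-nominal words, retaining nouns/nominal phrases."""
    return [t for t in tokens if t not in STOPWORDS and is_nominal(t)]
```

In the real pipeline `is_nominal` would come from a POS tagger for the corpus language; here it is left as a caller-supplied predicate.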
17
Experiments and evaluation Perplexities of TTM and THDP for different numbers of topics.
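For reference, held-out perplexity is the standard comparison metric for topic models: the exponential of the negative average per-token log-likelihood. A minimal sketch, assuming the per-document log-likelihoods come from the trained TTM or THDP model:

```python
# Held-out perplexity: exp(-sum(log-likelihood) / total tokens).
# Lower perplexity means the model predicts held-out text better.
import numpy as np

def perplexity(doc_log_likelihoods, doc_token_counts):
    total_ll = np.sum(doc_log_likelihoods)
    total_tokens = np.sum(doc_token_counts)
    return float(np.exp(-total_ll / total_tokens))
```

For example, a 10-token document whose every token is predicted with probability 0.5 has perplexity exactly 2.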
18
Experiments and evaluation The number of topics inferred by THDP over different iterations.
19
Experiments and evaluation An illustration of 8 topics from the 114-topic solution for the dataset. Each topic is shown with the 10 words and 5 tags that have the highest probability conditioned on that topic.
20
Agenda
- Introduction
- Tag-Topic Model
- Tag Hierarchical Dirichlet Process
- Experiments and Evaluation
- Conclusion
21
Conclusion In this paper, we proposed the THDP model. The model uses the HDP as the prior distribution of the TTM, which allows it to infer the number of topics in the dataset automatically, link tags to the topics of a document, and capture the semantics of a tag in the form of a topic distribution. Example results on the dataset demonstrate the consistent and promising performance of the proposed THDP, and its computational expense is comparable to that of related topic models.
22
Thank you