Incorporating Hierarchical Dirichlet Process into Tag-Topic Model 张明 2013.4.


1 Incorporating Hierarchical Dirichlet Process into Tag-Topic Model 张明 2013.4

2 Agenda
• Introduction
• Tag-Topic Model
• Tag Hierarchical Dirichlet Process
• Experiments and evaluation
• Conclusion

3 Introduction
• With the rapid development of Web 2.0, the internet has brought a large amount of resources, such as blogs, Twitter, and encyclopedias.
• These resources contain a wealth of information that can be applied to a variety of fields in information processing to improve service quality, but relying on traditional human experts to process this information is too inefficient.

4 Introduction
• In NLP, computer programs face several tasks that require human-level intelligence; that is, the programs should be endowed with the ability to understand language.
• One core issue is how to automatically obtain knowledge and use it effectively to achieve semantic analysis and computation.

5 Introduction
• Tagging has recently emerged as a popular way to organize user-generated content for Web 2.0 applications, such as blogs and bookmarks. In blogs, users can assign one or more tags to each blog. Usually, these tags reflect the subjects the content is concerned with. Tags can be seen as labeled meta-information about the content, and they are beneficial for knowledge mining from blogs.

6 Introduction
• In this paper, we extend the Tag-Topic Model (TTM) [1] by using the HDP as its prior distribution. We assume that, before writing a blog, an author already has a clear idea of which aspects the content will cover, and that for each aspect he will choose a tag to describe it.

7 Agenda
• Introduction
• Tag-Topic Model
• Tag Hierarchical Dirichlet Process
• Experiments and evaluation
• Conclusion

8 LDA Generative model

9 Tag-Topic Model
• Basic idea: each document is associated with a mixture of tags, each tag can be viewed as a multinomial distribution over topics, and each topic is associated with a multinomial distribution over words.

10 Tag-Topic Model

11 Agenda
• Introduction
• Tag-Topic Model
• Tag Hierarchical Dirichlet Process
• Experiments and evaluation
• Conclusion

12 THDP
• The THDP topic model draws upon the strengths of the two models (TTM, HDP), using a topic-based representation to model both the content of documents and the tags. In the THDP model, a group of tags, T_d, indicates the main purpose of the blog. For each word in the document, a tag is chosen uniformly at random from T_d. Then, as in the topic model, a topic is chosen from a distribution over topics specific to that tag, and the word is generated from the chosen topic.
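The per-word generative steps described on this slide can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the authors' implementation: the number of topics is fixed (THDP infers it through the HDP prior), and the names `sample_dirichlet`, `generate_document`, the tag names, and the values of `K` and `V` are all hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(doc_tags, tag_topic, topic_word, n_words, rng):
    """Generate (tag, topic, word) triples for one document.

    doc_tags   : list of tags attached to the document (T_d)
    tag_topic  : dict mapping each tag to a distribution over topics
    topic_word : list of distributions over the vocabulary, one per topic
    """
    triples = []
    for _ in range(n_words):
        tag = rng.choice(doc_tags)  # tag chosen uniformly at random from T_d
        topic = rng.choices(range(len(topic_word)), weights=tag_topic[tag])[0]
        word = rng.choices(range(len(topic_word[topic])), weights=topic_word[topic])[0]
        triples.append((tag, topic, word))
    return triples

rng = random.Random(0)
K, V = 4, 12  # hypothetical: fixed topic count and vocabulary size
tag_topic = {t: sample_dirichlet([0.5] * K, rng) for t in ("travel", "food", "tech")}
topic_word = [sample_dirichlet([0.5] * V, rng) for _ in range(K)]
doc = generate_document(["travel", "food"], tag_topic, topic_word, n_words=20, rng=rng)
```

Each triple in `doc` records which tag and topic produced a word, which is the latent assignment an inference procedure would have to recover from the words and tags alone.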

13 THDP

14
• Given an underlying measure H on multinomial probability vectors, we select a random measure G_0, which provides a countably infinite collection of multinomial probability vectors; these can be viewed as the set of all topics that can be used in a given corpus. For the l-th tag in the j-th document in the corpus, we sample G_j using G_0 as a base measure; this selects the specific subset of topics to be used by tag l in document j. From G_j we then generate a document by repeatedly (1) choosing a tag with equal probability from the tag set associated with the document and (2) sampling a specific multinomial probability vector z_ji from G_j and sampling a word w_ji with probabilities z_ji. The overlap among the random measures G_j implements the sharing of topics among documents.
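One common way to make the two-level construction concrete is a truncated stick-breaking approximation, sketched below. It is an illustration under stated assumptions rather than the procedure used in the presentation: the concentration parameters `GAMMA` and `ALPHA0`, the truncation level `K_MAX`, the vocabulary size `V`, and the choice of a symmetric Dirichlet for H are all hypothetical.

```python
import random

def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights: break a unit stick `truncation - 1` times."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, concentration)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # the last piece takes the leftover mass
    return weights

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(1)
GAMMA, ALPHA0, K_MAX, V = 2.0, 1.0, 20, 50  # hypothetical settings

# G_0: global weights over K_MAX topics, each topic drawn from H
# (H is assumed here to be a symmetric Dirichlet over the vocabulary).
beta = stick_breaking(GAMMA, K_MAX, rng)
topics = [sample_dirichlet([0.5] * V, rng) for _ in range(K_MAX)]

def sample_group_weights(rng):
    """G_j: reweight the same K_MAX topics with pi_j ~ Dirichlet(ALPHA0 * beta)."""
    # The small floor keeps every Gamma shape parameter strictly positive.
    return sample_dirichlet([max(ALPHA0 * b, 1e-12) for b in beta], rng)

pi_a = sample_group_weights(rng)  # topic weights for one (document, tag) group
pi_b = sample_group_weights(rng)  # another group, sharing the same topics
```

Because `pi_a` and `pi_b` are distributions over the same list `topics`, the groups share topics while weighting them differently, which mirrors the overlap among the G_j described above.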

15 Agenda
• Introduction
• Tag-Topic Model
• Tag Hierarchical Dirichlet Process
• Experiments and evaluation
• Conclusion

16 Experiments and evaluation
• Dataset: The dataset used in the experiment is drawn from a blog corpus covering October 2011 to December 2012, constructed by the National Language Resources Monitoring and Research Center, Network Media Branch. Blog texts with no tags or with fewer than 100 words were filtered out. Preprocessing then removed stop words and extremely common words and filtered out non-nominal words, retaining only nouns and nominal phrases. The resulting dataset contains the tags and content of N = 927 blogs, with W = 10438 words in the vocabulary and T = 558 tags.

17 Experiments and evaluation
• Perplexity of TTM and THDP for different numbers of topics.
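The slide does not state how perplexity was computed. A minimal sketch, assuming the standard definition (the exponential of the negative mean per-word log-likelihood on held-out text), is given below; the function name `perplexity` and its input format are hypothetical.

```python
import math

def perplexity(word_log_probs):
    """Return exp(-mean log-likelihood) for a list of per-word natural-log probabilities."""
    n = len(word_log_probs)
    if n == 0:
        raise ValueError("need at least one word")
    return math.exp(-sum(word_log_probs) / n)

# Sanity check: a model that spreads probability uniformly over 8 words
# assigns log(1/8) to every word.
uniform_log_probs = [math.log(1.0 / 8)] * 40
uniform_perplexity = perplexity(uniform_log_probs)
```

Under this definition a uniform model over 8 words has perplexity 8, and lower values indicate a model that predicts held-out words better.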

18 Experiments and evaluation
• The number of topics at different iterations of THDP.

19 Experiments and evaluation
• An illustration of 8 topics from the 114-topic solution for the dataset. Each topic is shown with the 10 words and 5 tags that have the highest probability conditioned on that topic.

20 Agenda
• Introduction
• Tag-Topic Model
• Tag Hierarchical Dirichlet Process
• Experiments and evaluation
• Conclusion

21 Conclusion
• In this paper, we propose the THDP model. The model uses the HDP as the prior distribution of TTM, which lets it infer the number of topics in the dataset automatically. It links the tags to the topics of the document and captures the semantics of a tag in the form of a topic distribution. Example results on the dataset demonstrate the consistent and promising performance of the proposed THDP, and the computational expense of the proposed model is comparable to that of related topic models.

22 Thank you

