Sophia (Xueyao) Liang, CPSC 503 Final Project

Presentation transcript:

1 Sophia (Xueyao) Liang, CPSC 503 Final Project

2 Unsupervised, K=3, P( |d)
Example keyword groups: (Olympic, Vancouver); (snow, cold); (moon light, Spider-Man)

3
      W1   W2   W3   W4   …
D1    1    0    1    1
D2    …    …    …    …
D3    …    …    …    …
…     …    …    …    …
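The document-by-word layout on slide 3 (rows D1, D2, …; columns W1, W2, …) can be illustrated with a minimal sketch. It assumes binary entries (1 if the word occurs in the document), which matches the example row on the slide, but the slides do not say whether the project used binary indicators or counts. The function name `build_doc_word_matrix` and the toy documents are hypothetical.

```python
def build_doc_word_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is 1 if word j occurs in document i."""
    # Vocabulary: every distinct lowercase token, in sorted order.
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    column = {word: j for j, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for word in doc.lower().split():
            matrix[i][column[word]] = 1  # binary indicator (assumption)
    return vocab, matrix


# Toy documents made up for illustration only.
docs = ["olympic vancouver snow", "snow cold", "moon light"]
vocab, matrix = build_doc_word_matrix(docs)
for name, row in zip(["D1", "D2", "D3"], matrix):
    print(name, row)
```

A real pipeline would also remove stop words and stem tokens before building the matrix; that step is left out of this sketch.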

4
      W1   W2   W3   W4   …
D1    1    0    1    1
D2    …    …    …    …
D3    …    …    …    …
…     …    …    …    …

z_k ∈ {z_1, z_2, …, z_N}

5 Expectation: Maximization:
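The formulas on slide 5 did not survive in the transcript. As a rough illustration only, the sketch below implements the textbook PLSA EM updates, assuming that is what the slide showed (a later slide names the steps "Expectation (PLSA)" and "Maximization (PLSA)"). All names (`plsa`, `p_z_d`, `p_w_z`, the toy count matrix) are hypothetical and not taken from the project.

```python
import math
import random


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over (d, w) of n(d, w) * log sum_k P(z_k|d) P(w|z_k)."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_z_d[d][k] * p_w_z[k][w] for k in range(len(p_w_z)))
                ll += n * math.log(p)
    return ll


def plsa(counts, num_topics, iterations=50, seed=0):
    """Fit PLSA by EM; returns P(z|d), P(w|z) and the log-likelihood trace."""
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(num_topics)])
             for _ in range(num_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(num_words)])
             for _ in range(num_topics)]
    history = []
    for _ in range(iterations):
        new_w_z = [[0.0] * num_words for _ in range(num_topics)]
        new_z_d = [[0.0] * num_topics for _ in range(num_docs)]
        for d in range(num_docs):
            for w in range(num_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z_k | d, w) under the current parameters.
                posterior = normalize(
                    [p_z_d[d][k] * p_w_z[k][w] for k in range(num_topics)])
                # Accumulate expected counts for the M-step.
                for k in range(num_topics):
                    new_w_z[k][w] += n * posterior[k]
                    new_z_d[d][k] += n * posterior[k]
        # M-step: re-estimate both distributions from the expected counts.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
        history.append(log_likelihood(counts, p_z_d, p_w_z))
    return p_z_d, p_w_z, history


# Toy count matrix (4 documents x 4 words), made up for illustration.
counts = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
p_z_d, p_w_z, history = plsa(counts, num_topics=2)
print(round(history[0], 3), round(history[-1], 3))
```

Because each EM iteration cannot decrease the log-likelihood, the `history` trace is a convenient sanity check when experimenting with this kind of model.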

6
      D1   D2   D3   D4   …
D1    1    0    1    1
D2    …    …    …    …
D3    …    …    …    …
…     …    …    …    …

7
      W1   W2   W3   W4   …
D1    1    0    1    1
D2    …    …    …    …
D3    …    …    …    …
…     …    …    …    …

8

9

10

11 Efficient algorithm:
• Expectation (PLSA)
• Maximization (PLSA)
• The result of the previous steps may not end in a better value for O
Parameter inference: no closed-form solution for the expectation step
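The slide does not define O. As a hedged sketch, the code below assumes O has the form used in the published NetPLSA formulation: a weighted sum of the negative PLSA log-likelihood and a graph regularizer that penalizes linked documents with different topic mixtures. It also shows one neighbor-smoothing pass of the kind that formulation applies after the PLSA steps. The names `lam`, `gamma`, `edges`, and all function names are hypothetical, and the project's actual objective may differ.

```python
def graph_regularizer(edges, p_z_d):
    """0.5 * sum over edges of weight * squared distance of topic mixtures."""
    return 0.5 * sum(
        weight * sum((a - b) ** 2 for a, b in zip(p_z_d[u], p_z_d[v]))
        for u, v, weight in edges)


def objective(neg_log_likelihood, edges, p_z_d, lam):
    """Assumed form: O = (1 - lam) * (-log-likelihood) + lam * regularizer."""
    return (1 - lam) * neg_log_likelihood + lam * graph_regularizer(edges, p_z_d)


def smooth_topic_mixtures(edges, p_z_d, gamma):
    """Move each P(z|d) toward the weighted average of its neighbors."""
    num_topics = len(p_z_d[0])
    neighbor_sum = [[0.0] * num_topics for _ in p_z_d]
    weight_sum = [0.0] * len(p_z_d)
    for u, v, weight in edges:
        # Treat every edge as undirected (an assumption of this sketch).
        for a, b in ((u, v), (v, u)):
            weight_sum[a] += weight
            for k in range(num_topics):
                neighbor_sum[a][k] += weight * p_z_d[b][k]
    smoothed = []
    for d, row in enumerate(p_z_d):
        if weight_sum[d] == 0:
            smoothed.append(list(row))  # isolated document: unchanged
        else:
            smoothed.append([
                (1 - gamma) * row[k] + gamma * neighbor_sum[d][k] / weight_sum[d]
                for k in range(num_topics)])
    return smoothed


# Two linked documents with opposite topic mixtures (toy values).
edges = [(0, 1, 1.0)]
mixtures = [[1.0, 0.0], [0.0, 1.0]]
before = graph_regularizer(edges, mixtures)
after = graph_regularizer(edges, smooth_topic_mixtures(edges, mixtures, gamma=0.5))
print(before, after)
```

Under this assumed form, a smoothing pass lowers the regularizer but can lower the likelihood as well, which is one reading of why the slide notes that the steps may not yield a better value of O.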

12 Potential problems of the model
• Parameter inference
• Higher time complexity and slower to converge

13 Cora Data version 1.0
30000 scientific papers, with citation information
Important files:
  papers (ID-name, link, author, …)
  citations (ID-cited ID)
  classifications (link-category)
  directory: extractions (PostScript form of the papers)
• Cited paper not in the corpus
• No abstract for some PostScript files
• Too many categories
• Duplicated or isolated papers

14 Cora Data version 1.0
• Papers in category Machine Learning
About 2700 papers
1400 frequent words (stop words removed, stemmed)

Category              Papers
Theory                315
Reinforcement         217
Genetic Algorithms    418
Neural Networks       818
Probabilistic         426
Case based            298
Rule Learning         180

15

16 Figure: accuracy and recall for each category. Panels: (A) Accuracy, (B) Recall.

                    PHITS    PLSA     NetPLSA
Overall Accuracy    0.470    0.501    0.562
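For reference, overall accuracy and per-category recall like those reported on slide 16 can be computed as in the sketch below. It assumes each paper has already been assigned one predicted category. How the project mapped unsupervised topics to categories is not stated in the slides. The function names and toy labels are hypothetical.

```python
from collections import Counter


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of papers whose predicted category equals the true one."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def recall_per_category(true_labels, predicted_labels):
    """For each true category, the fraction of its papers that were recovered."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {category: hits[category] / totals[category] for category in totals}


# Toy labels made up for illustration; not results from the project.
true_labels = ["Theory", "Theory", "Neural Networks", "Neural Networks", "Rule Learning"]
predicted = ["Theory", "Neural Networks", "Neural Networks", "Neural Networks", "Theory"]
print(overall_accuracy(true_labels, predicted))
print(recall_per_category(true_labels, predicted))
```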

17
• Justified the claim that adding network structure to the model can improve topic modeling results
• Modeled the network at the scale of articles
• An inherent problem exists in the chosen framework
• The results are still far from satisfactory

18
• How to model the network structure of blog articles, especially when modeling them at the scale of articles
• Bag-of-words matrix extraction
• A better integrated model, possibly LDA-based
• Efficiency of the algorithm
• Recommendation based on topic community discovery

