Tweetool ( version) Final Report
A Twitter Recommendation System Based on Topic Modeling
Yilei Qian, Computer Science, University of Southern California
Ideas
Problem:
1. Following too many accounts on Twitter
2. Too much news every day
3. Cannot find the interesting and valuable news
4. Don't know the names of the users worth following
Need:
1. Someone to recommend who to follow
2. Someone to recommend the hottest news
Approach: use topic modeling to re-rank all the users
Traditional Method
Topic Modeling
A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic models are widely used in natural language processing.
Reference Papers:
Steyvers, M. and Griffiths, T., "Probabilistic Topic Models," Handbook of Latent Semantic Analysis
Blei, D. M., Ng, A. Y., and Jordan, M. I., "Latent Dirichlet Allocation," Journal of Machine Learning Research, 2003
Label-based LDA
Step:
1. Build the LDA model
2. Train a model instance on the training documents
3. Run LDA on all the data using the trained model instance
Problem:
1. Punctuation marks, e.g. “”,.={}() …
2. Frequent words, e.g. I, you…
3. Other noise
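The noise problems listed above (punctuation marks and frequent words) are usually handled by cleaning each tweet before it reaches the topic model. The following is a minimal sketch of such a step, not the code used in Tweetool: the class name `TweetCleaner`, the tiny stop-word list, and the choice to drop URLs and keep `#` and `@` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Tweetool): cleans one tweet before topic modeling.
class TweetCleaner {
    // Illustrative stop-word list (an assumption); a real list would be much longer.
    private static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("i", "you", "the", "a", "an", "is", "to", "and", "of"));

    static List<String> clean(String tweet) {
        String text = tweet.toLowerCase(Locale.ROOT);
        // Drop URLs, which carry little topical information.
        text = text.replaceAll("https?://\\S+", " ");
        // Replace every run of characters other than letters, digits, '#', '@' with a space.
        text = text.replaceAll("[^\\p{L}\\p{N}#@]+", " ");
        List<String> tokens = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            // Skip empty tokens and frequent words.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

For example, `TweetCleaner.clean("I love the new \"Java\" book, you know?")` keeps only the tokens `love`, `new`, `java`, `book`, `know`. Abbreviations and other noise would need extra handling that this sketch does not attempt.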
Result Generation
13-Dimension Topics
1. Art & Design
2. Book
3. Business
4. Charity
5. Entertainment
6. Family
7. Fashion
8. Food & Drink
9. Health
10. Music
11. News
12. Science & Technology
13. Sports
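One way to use the 13 topics for re-ranking is to describe each user by a 13-dimensional vector of topic weights and sort candidates by how close their vectors are to the reader's interests. The sketch below assumes cosine similarity as the closeness measure; the slides do not say which measure Tweetool uses, and the class name `TopicRanker` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Tweetool code): rank users by topic-vector similarity.
class TopicRanker {
    static final int TOPIC_COUNT = 13; // one weight per topic in the list above

    // Cosine similarity between two topic vectors; 0.0 if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        if (a.length != TOPIC_COUNT || b.length != TOPIC_COUNT) {
            throw new IllegalArgumentException("expected " + TOPIC_COUNT + " topic weights");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < TOPIC_COUNT; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns candidate names ordered from most to least similar to the interest vector.
    static List<String> rank(final double[] interest, final Map<String, double[]> candidates) {
        List<String> names = new ArrayList<String>(candidates.keySet());
        Collections.sort(names, new Comparator<String>() {
            public int compare(String x, String y) {
                return Double.compare(cosine(interest, candidates.get(y)),
                                      cosine(interest, candidates.get(x)));
            }
        });
        return names;
    }
}
```

With this layout, index 9 holds the Music weight and index 12 the Sports weight (zero-based), so a reader whose vector is concentrated on Music would see music-heavy accounts ranked first.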
Languages & Tools
Web UI: HTML + AJAX (unfinished) + CSS (unfinished) + Twitter REST API
Android UI: Java, Android 2.1 (unfinished)
Server Side: Java 1.6, Servlet 2.0, Spring 3.0, Hibernate 3.3
Twitter API: Twitter4j (300 requests per hour)
Server: Tomcat 7.08
Database: MySQL 5.5
Data Package: JSON
Development Platform: Eclipse 3.4
Total code lines: 2000(+) = 5000(+)
Subversion:
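The limit of 300 requests per hour means the crawler has to budget its Twitter calls. A minimal sketch of a client-side budget is shown below, assuming a simple fixed one-hour window; the class `HourlyRequestBudget` is hypothetical and is not part of Twitter4j or Tweetool, and the server's actual window may not line up with the local clock.

```java
// Hypothetical fixed-window request budget (not part of Twitter4j or Tweetool).
class HourlyRequestBudget {
    private static final long WINDOW_MILLIS = 60L * 60L * 1000L; // one hour

    private final int maxRequests;
    private long windowStart;
    private int used;

    HourlyRequestBudget(int maxRequests, long nowMillis) {
        this.maxRequests = maxRequests;
        this.windowStart = nowMillis;
        this.used = 0;
    }

    // Consumes one request and returns true if budget remains in the current window.
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= WINDOW_MILLIS) {
            // A full hour has passed: start a new window with a fresh budget.
            windowStart = nowMillis;
            used = 0;
        }
        if (used < maxRequests) {
            used++;
            return true;
        }
        return false;
    }

    // Milliseconds until the current window ends (0 if it has already ended).
    synchronized long millisUntilReset(long nowMillis) {
        return Math.max(0L, WINDOW_MILLIS - (nowMillis - windowStart));
    }
}
```

A crawler could create `new HourlyRequestBudget(300, System.currentTimeMillis())`, call `tryAcquire(System.currentTimeMillis())` before each API request, and sleep for `millisUntilReset(...)` when it returns false. Passing the time in as a parameter keeps the class easy to test.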
Architecture
[Diagram. Components shown: DB, Twitter fetch, LLDA, Tweetool, Hibernate DAO, Work Flow, Servlets, Mobile Device, HTML, Application Context]
Distributed Crawler & Computing
Problems (endless T_T)
1. High noise in the topic model: few words, odd marks, abbreviations
2. Unfamiliar with the Twitter API; a lot of bugs
3. Transaction problems
4. The ugly UI
5. Poor performance
6. Not enough time; many functions are unfinished
7. The Tweetool system should be reconstructed!
Environment: 7,000+ users, 220,000+ tweets
Future Work
1. Try to finish it
2. Debug
3. Build a better training file
4. Add a feedback function
5. Improve topic classification
Web UI (Design Version)
Android UI
[Mockup. Main Menu screen: a title and four function buttons. News Menu screen: a title and a news area]