COMP423 Intelligent Agents
Recommender systems
Two basic approaches, and hybrids of them:
– Collaborative filtering: based on feedback from other users who have rated a similar set of items in the past
– Content-based filtering (e.g. SmartMuseum): based on how well the content of the target item matches the user's preferred content pattern, which is learnt from the user's own past ratings and the content of the rated items
– Hybrid: combines the two
User-based collaborative filtering
Nearest-neighbor collaborative filtering:
– Calculate user similarities (Pearson's correlation)
– Define the effective neighborhood
– Compute the predicted ratings
The correlation of two users, Ken and Lee, is computed over the n items they have both rated, with ratings K(1..n) and L(1..n). Ken's rating for item m is then predicted from the ratings his neighbors gave m.
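The three steps can be sketched in Python. Since the slide's formulas are not reproduced here, the sketch assumes the standard forms: Pearson's correlation over the n co-rated items, r(K, L) = Σ(K_i - mean(K))(L_i - mean(L)) / (sqrt(Σ(K_i - mean(K))²) · sqrt(Σ(L_i - mean(L))²)), and the common mean-centred weighted prediction p(Ken, m) = mean(Ken) + Σ_J r(Ken, J)(J_m - mean(J)) / Σ_J |r(Ken, J)| over the neighbors J who rated m. The ratings, the user Ann, the item names and the neighborhood size k = 2 are all hypothetical.

```python
from math import sqrt

# Hypothetical toy ratings; an item missing from a user's dict is unrated.
ratings = {
    "Ken": {"i1": 5, "i2": 3, "i3": 4},
    "Lee": {"i1": 4, "i2": 2, "i3": 5, "m": 4},
    "Ann": {"i1": 1, "i2": 5, "i3": 2, "m": 2},
}


def pearson(a, b):
    """Pearson's correlation over the items both users have rated."""
    common = [i for i in a if i in b]
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def mean_rating(user_ratings):
    """Mean over all of a user's ratings (used for mean-centring the prediction)."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict(user, item, data, k=2):
    """Mean-centred, similarity-weighted average over the k most similar raters of item."""
    neighbors = sorted(
        ((pearson(data[user], data[v]), v) for v in data if v != user and item in data[v]),
        reverse=True,
    )[:k]  # the "effective neighborhood"
    num = sum(s * (data[v][item] - mean_rating(data[v])) for s, v in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return mean_rating(data[user]) + (num / den if den else 0.0)


print(round(pearson(ratings["Ken"], ratings["Lee"]), 3))
print(round(predict("Ken", "m", ratings), 2))
```

Here Lee correlates positively with Ken and Ann negatively; because Ann rated m below her own average, her negative weight pushes Ken's predicted rating for m up, to about 4.4.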
Item-based collaborative filtering
– Uses the same user-item rating matrix; item vectors are its columns
– Item similarity:
  – Pearson's correlation
  – Cosine similarity
  – Adjusted cosine similarity
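A minimal sketch of the three item-similarity measures on a small, hypothetical, fully filled rating matrix (rows are users, columns are items). Cosine similarity uses the raw columns; adjusted cosine subtracts each user's mean rating first, to cancel out users who rate everything high or low; Pearson's correlation subtracts each item's mean instead. Handling of missing ratings is left out for brevity, and the function names are illustrative.

```python
from math import sqrt

# Hypothetical fully filled rating matrix: rows are users, columns are items.
R = [
    [5, 3, 4],
    [4, 2, 5],
    [1, 5, 2],
]


def column(matrix, j):
    """Item vector j: the j-th column of the rating matrix."""
    return [row[j] for row in matrix]


def cosine(x, y):
    num = sum(a * b for a, b in zip(x, y))
    den = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return num / den if den else 0.0


def adjusted_cosine(matrix, i, j):
    """Cosine of two item columns after subtracting each user's (row) mean."""
    means = [sum(row) / len(row) for row in matrix]
    x = [row[i] - m for row, m in zip(matrix, means)]
    y = [row[j] - m for row, m in zip(matrix, means)]
    return cosine(x, y)


def pearson_items(matrix, i, j):
    """Pearson's correlation: subtract each item's (column) mean instead."""
    x, y = column(matrix, i), column(matrix, j)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


print(cosine(column(R, 0), column(R, 1)))
print(adjusted_cosine(R, 0, 1), pearson_items(R, 0, 1))
```

On this matrix items 0 and 1 look similar under plain cosine (about 0.70, because all ratings are positive) but dissimilar under adjusted cosine and Pearson's correlation, which both come out negative.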
Typical collaborative filtering
Memory-based collaborative filtering
– Nearest-neighbor based
– User similarity
– Item similarity
Clustering for collaborative filtering
– k-means
– HAC (hierarchical agglomerative clustering)
– Naïve Bayes clustering
– Group oriented and less personalized; this can be addressed by reducing the cluster size
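One way to sketch clustering for collaborative filtering: fill each user's missing ratings with that user's mean (a simplifying assumption), group users with k-means, and predict a missing rating as the average rating of the item among the user's cluster peers. The data, the initial centroids and the helper names are hypothetical; HAC or naive Bayes clustering could replace the k-means step.

```python
# Hypothetical ratings; None marks an unrated item.
R = [
    [5, 4, None, 1],
    [4, 5, 5, 1],
    [5, 5, 4, 2],
    [1, 2, 1, 5],
    [2, 1, 2, None],
]


def fill_with_user_mean(matrix):
    """Replace each missing rating with that user's mean rating (for clustering only)."""
    filled = []
    for row in matrix:
        known = [v for v in row if v is not None]
        mean = sum(known) / len(known)
        filled.append([mean if v is None else v for v in row])
    return filled


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, max_iter=20):
    """Plain k-means (Lloyd's algorithm) from given initial centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def predict(matrix, labels, user, item):
    """Average rating of item among the user's cluster peers who actually rated it."""
    peers = [
        matrix[u][item]
        for u in range(len(matrix))
        if u != user and labels[u] == labels[user] and matrix[u][item] is not None
    ]
    return sum(peers) / len(peers) if peers else None


filled = fill_with_user_mean(R)
labels = kmeans(filled, [filled[0], filled[3]])
print(labels, predict(R, labels, 0, 2))
```

With the two initial centroids taken from users 0 and 3, the users split into {0, 1, 2} and {3, 4}; user 0's missing rating for item 2 is predicted as 4.5, the mean of users 1 and 2. Every user in a cluster draws on the same peer group, which is the group-oriented, less personalized effect of clustering; smaller clusters reduce it.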
Content-based filtering
Content features:
– Movies: directors, actors/actresses, producers, editors, distributors, keywords, reviews, …
– Text recommendation: a set of extracted keywords
Treated as a classification problem
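Treating content-based filtering as classification, a minimal sketch trains a multinomial naive Bayes classifier (with Laplace smoothing) on the keywords of items one user has rated, labelled like or dislike, and then classifies a new item from its keywords. Naive Bayes is just one possible classifier, and the keywords, labels and function names are hypothetical.

```python
from collections import Counter
from math import log

# Hypothetical training data: keywords of items one user rated, labelled like/dislike.
train = [
    (["space", "alien", "ship"], "like"),
    (["robot", "space", "future"], "like"),
    (["romance", "wedding", "drama"], "dislike"),
    (["drama", "family", "wedding"], "dislike"),
]


def train_nb(examples):
    """Count class frequencies and per-class keyword frequencies."""
    class_counts = Counter(label for _, label in examples)
    word_counts = {c: Counter() for c in class_counts}
    for words, label in examples:
        word_counts[label].update(words)
    vocab = {w for words, _ in examples for w in words}
    return class_counts, word_counts, vocab


def classify(words, model):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c in class_counts:
        n_c = sum(word_counts[c].values())
        score = log(class_counts[c] / total)
        for w in words:
            if w in vocab:  # keywords never seen in training are ignored
                score += log((word_counts[c][w] + 1) / (n_c + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best


model = train_nb(train)
print(classify(["alien", "future"], model), classify(["wedding"], model))
```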
Hybrid
Collaborative filtering:
– Requires other users' rating data (cold-start problem)
– Can work across domains
– Non-transitive association problem: users are linked only by common items, and items only by common users
Content-based filtering:
– Requires only the target user's own rating data
– Requires the items' content data
– Does not work across domains
Hybridization strategies:
– Sequential hybridization
– Combinational hybridization
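The two hybridization strategies might be sketched as follows, assuming each recommender already produces a score in [0, 1] per item. The combinational version takes a weighted sum (weight alpha, a hypothetical parameter) and falls back to the content score when collaborative filtering has no data for an item; the sequential version lets content-based filtering pick candidates above a hypothetical threshold and collaborative filtering rank them. All scores are made up.

```python
# Hypothetical per-item scores from two recommenders, both scaled to [0, 1].
content_score = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.5}
cf_score = {"a": 0.3, "b": 0.9, "c": 0.8, "d": None}  # None: no CF data (new item)


def combinational(alpha=0.5):
    """Weighted combination of both scores; content score alone if CF has no data."""
    combined = {}
    for item, cs in content_score.items():
        cf = cf_score.get(item)
        combined[item] = cs if cf is None else alpha * cs + (1 - alpha) * cf
    return sorted(combined, key=combined.get, reverse=True)


def sequential(threshold=0.5):
    """Content-based stage filters candidates; the CF stage ranks the survivors."""
    candidates = [i for i, s in content_score.items() if s >= threshold]
    return sorted(candidates, key=lambda i: cf_score.get(i) or 0.0, reverse=True)


print(combinational(), sequential())
```

Item d, which has no collaborative data (a cold-start case), is still recommended by both strategies because of its content score.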
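Evaluation
Binary: convert ratings to positive or negative
– Precision
– Top-N precision
– Recall
– F-measure
– MAP (mean average precision): takes ranking, precision and recall into account
  – MAP is the mean of the average precision over all queries
  – Average precision is the mean of the precision values obtained each time a relevant document is retrieved: AP = (1/M) × Σ_{k=1..M} P(k), where M is the number of relevant documents and P(k) is the precision at the rank where the k-th relevant document is retrieved
  – Average precision is roughly the area under the precision-recall curve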
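A minimal sketch of the binary measures, including average precision as defined on the slide, with relevant documents that are never retrieved contributing a precision of 0. The rating threshold used to binarize, the function names and all item names are hypothetical.

```python
def relevant_items(ratings, threshold=4):
    """Binarize ratings: items rated at or above a (hypothetical) threshold are positive."""
    return {item for item, r in ratings.items() if r >= threshold}


def precision_recall_f(recommended, relevant):
    """Set-based precision, recall and F-measure (harmonic mean of the two)."""
    hits = len(set(recommended) & set(relevant))
    p = hits / len(recommended) if recommended else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def precision_at_n(ranked, relevant, n):
    """Top-N precision: fraction of the first n recommendations that are relevant."""
    return sum(1 for item in ranked[:n] if item in relevant) / n


def average_precision(ranked, relevant):
    """Mean of the precision at each rank where a relevant item is retrieved.

    Divides by M = len(relevant), so relevant items never retrieved count as 0.
    """
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP: mean of the average precision over all (ranking, relevant set) pairs."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
print(precision_recall_f(ranked, relevant))
print(average_precision(ranked, relevant))
```

For the ranking a, b, c, d, e with relevant set {a, c, f}, relevant items appear at ranks 1 and 3, so AP = (1/1 + 2/3) / 3 ≈ 0.556, while precision is 0.4, recall 2/3 and the F-measure 0.5.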
Evaluation
Considering the rating score:
– MAE (mean absolute error): the average of |predicted rating - actual rating| over all test ratings
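MAE is simple enough to sketch in a few lines; the function name and sample values are illustrative.

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


print(mae([4.5, 3, 2], [5, 3, 4]))
```

For predictions (4.5, 3, 2) against actual ratings (5, 3, 4), the MAE is (0.5 + 0 + 2) / 3 ≈ 0.83.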
Projects
New data, new task
– Online dating recommendation
– Using community data: what is popular among my peers
Combining two or more areas
– Personalised search and recommendation
  – Building a profile from click-through data
  – Query expansion based on the profile
– Knowledge-based reasoning: a model of "what I need"
– Decision support systems: heuristics, personality
New constraints
– Considering time when building the user profile
– Multiple profiles (e.g. children's books, Chinese novels)
A PhD thesis (completed in 2012)
User similarities
– Do not consider the relevance of items
– Aware of item similarity
User rating and prediction
– A number between 1 and 5
– Probability: (0.2, 0.3, 0.6, 0.1, 0.1)
Hybrid
– Sequential
– Diamond shape
Research projects
– Recommender systems combined with personalized search
  – Building a profile from click-through data
  – Query expansion based on the profile
– Two-way recommendation
  – Online dating systems
– Knowledge-based personalized recommendation
Projects
– Image recommendation
– Image search without a name
– Computational advertising
– Honors project: recommender for courses
Opinion mining
– Document level
– Sentence level
– Feature level
Bing Liu, UIC ACL
Feature-based summary (Hu and Liu, KDD-04)
Example review: "GREAT Camera., Jun 3, 2004. Reviewer: jprice174 from Atlanta, Ga. I did a lot of research last year before I bought this camera... It kinda hurt to leave behind my beloved nikon 35mm SLR, but I was going to Italy, and I needed something smaller, and digital. The pictures coming out of this camera are amazing. The 'auto' feature takes great pictures most of the time. And with digital, you're not wasting film if the picture doesn't come out. …"
Feature-based summary:
Feature1: picture
– Positive: 12
  – The pictures coming out of this camera are amazing.
  – Overall this is a good camera with a really good picture clarity.
  – …
– Negative: 2
  – The pictures come out hazy if your hands shake even for a moment during the entire process of taking a picture.
  – Focusing on a display rack about 20 feet away in a brightly lit room during day time, pictures produced by this camera were blurry and in a shade of orange.
Feature2: battery life
– …
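A much-simplified sketch of a feature-based summary: each sentence gets a polarity from counts of words in a tiny, hypothetical opinion lexicon, and is filed under every feature whose terms it mentions. This is not Hu and Liu's method, which extracts the features and opinion words automatically; the lexicon and feature terms here are hand-picked for the review sentences quoted above.

```python
# Hypothetical opinion lexicon and feature terms, hand-picked for this example.
POSITIVE = {"amazing", "great", "good"}
NEGATIVE = {"hazy", "blurry", "poor"}
FEATURES = {"picture": {"picture", "pictures"}, "battery life": {"battery"}}


def feature_summary(sentences):
    """Group opinionated sentences by feature and polarity."""
    summary = {f: {"positive": [], "negative": []} for f in FEATURES}
    for sentence in sentences:
        words = {w.strip(".,!?'\"").lower() for w in sentence.split()}
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        if score == 0:
            continue  # no net opinion: sentence is skipped
        polarity = "positive" if score > 0 else "negative"
        for feature, terms in FEATURES.items():
            if words & terms:
                summary[feature][polarity].append(sentence)
    return summary


review_sentences = [
    "The pictures coming out of this camera are amazing.",
    "Overall this is a good camera with a really good picture clarity.",
    "The pictures come out hazy if your hands shake even for a moment "
    "during the entire process of taking a picture.",
    "Focusing on a display rack about 20 feet away in a brightly lit room during "
    "day time, pictures produced by this camera were blurry and in a shade of orange.",
]

for feature, groups in feature_summary(review_sentences).items():
    print(f"{feature}: +{len(groups['positive'])} / -{len(groups['negative'])}")
```

On these four sentences the sketch files two positive and two negative sentences under picture and nothing under battery life; the counts 12 and 2 in the summary come from the full set of reviews.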
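Visual summarization & comparison
[Figure: "Summary of reviews of Digital camera 1" over the features Picture, Battery, Size, Weight and Zoom, with positive (+) and negative (-) sides; "Comparison of reviews of Digital camera 1 and Digital camera 2", with positive (+) and negative (-) sides.]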
Opinion mining and sentiment analysis
Tasks:
– Classification
– Extraction
– Summarization
Approaches:
– Supervised or unsupervised
– Corpus-based or dictionary-based
Opinion mining
– Opinion holder, object and opinion (positive/negative)
– Comparative relations, e.g. "A is cheaper than B"
– Temporal opinion mining and summarization
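Extracting a comparative relation such as "A is cheaper than B" can be sketched with a single hand-written regular expression that returns (entity A, comparative, entity B). The pattern and function name are illustrative assumptions; the pattern will miss many phrasings (e.g. "B costs more than A") and only shows the shape of the output.

```python
import re

# Hypothetical surface pattern: "<A> is|are <adj>er|more <adj>|less <adj> than <B>".
PATTERN = re.compile(
    r"(?P<a>\w[\w ]*?) (?:is|are) (?P<rel>more \w+|less \w+|\w+er) than (?P<b>\w[\w ]*)",
    re.IGNORECASE,
)


def comparative_relations(sentence):
    """Return (entity A, comparative, entity B) triples found in the sentence."""
    return [(m.group("a"), m.group("rel"), m.group("b")) for m in PATTERN.finditer(sentence)]


print(comparative_relations("A is cheaper than B"))
print(comparative_relations("Camera X is more compact than camera Y."))
```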
Projects
– Web of things (hardware and software)
– Cross-domain learning
– Personalized search
– Learning a large knowledge base
  – Cross-checking with Cyc and WordNet
– Privacy
Web data mining
– Web content mining
– Web structure mining
– Web usage mining
Two projects on security
– Intrusion detection by clustering Web log files
  – New similarity measure
– Malicious Web pages
  – Automatic detection