Personalisation and Recommendations using Drupal
Keywords:
– Personalisation
– Recommendations
– Scalable machine learning
– Predictions
– Similarity
– Data Mining
– Big Data
– Trend Spotting
– Clustering
Drupal Developer Days Barcelona – Kendra Initiative

Kendra Initiative
Mission
– Foster an Open Distributed Marketplace for Digital Media
EU funded
– P2P-Next
– SARACEN (Socially Aware, collaboRative, scAlable Coding mEdia distributioN)

Deliverables
Kendra Signpost
– Metadata interoperability, mapping and transformation
Smart Filters
– Portable preferences and filters
Kendra Social, Kendra Hub
– Social networking management tools
Standards work
– OpenSocial extension
– Social API
– See the "Abstracting Social Networking functionality in Drupal" sprint
Kendra Match
– Searching and recommendation

Components
Drupal Recommender API module
Recommender helper modules
async_command module
Apache Mahout or cloud service
Hadoop cluster (optional)

Industry Examples
Amazon
Netflix
Spotify, Pandora
Facebook, LinkedIn
OKCupid
iTunes: Genius; App Store, not so much

Machine learning
Collaborative Filtering
– AKA recommender engines
Clustering
Classification

Collaborative Filtering
Input: preference data
Output: predictions
Preference =
– w1 = signed integer representing weight of uid1-nid1 or uid1-uid2 correlation (affinity)
Prediction =
– w2 = float representing strength of uid1-nid1 or uid1-uid2 correlation
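
The input/output relationship above can be sketched in a few lines of Python. This is only an illustration, not how Mahout or the Recommender API computes predictions: the sample data, the `cosine` and `predict` helpers, and the choice of cosine similarity are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical preference data: {uid: {nid: weight}} with integer weights.
prefs = {
    1: {101: 5, 102: 3, 103: 4},
    2: {101: 4, 102: 2, 103: 5, 104: 4},
    3: {101: 1, 102: 5, 104: 2},
}

def cosine(a, b):
    """Cosine similarity computed over the items both users have a weight for."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(prefs, uid, nid):
    """Predict a float strength for uid-nid as a similarity-weighted
    average of the weights other users gave to nid."""
    num = den = 0.0
    for other, items in prefs.items():
        if other == uid or nid not in items:
            continue
        sim = cosine(prefs[uid], items)
        num += sim * items[nid]
        den += abs(sim)
    return num / den if den else None

# User 1 has no preference for node 104; the prediction is a float.
print(predict(prefs, 1, 104))
```

User 2 agrees with user 1 more closely than user 3 does, so the prediction for node 104 lands nearer to user 2's weight of 4 than to user 3's weight of 2.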

Enter Mahout
Apache Mahout is a scalable machine learning library that supports large data sets.
Launched Spring 2010
Grew from the Apache Lucene project (basis for Apache Solr)
Merged with Taste project

Use Cases
Recommendation mining
Clustering
Classification
Frequent itemset mining

Out-of-box algorithms
Recommendation
– User-based recommender
– Item-based recommender
– Slope-One recommender
– Distributed Item-Based Collaborative Filtering
– Collaborative Filtering using parallel matrix factorisation
Clustering
– Canopy Clustering
– K-Means Clustering
– Fuzzy K-Means
– Mean Shift Clustering
– Dirichlet Process Clustering
– Latent Dirichlet Allocation
– Spectral Clustering
– Minhash Clustering
Model combination
– Naive Bayes algorithm
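
To give a feel for one of the listed recommenders, here is a minimal sketch of weighted Slope One in plain Python. It is not Mahout's implementation; the function name `slope_one_predict` and the sample ratings are assumptions made for illustration.

```python
def slope_one_predict(prefs, uid, target):
    """Weighted Slope One: for every item the user has rated, take the
    average difference (target - item) across users who rated both,
    add it to the user's own rating, and weight by the number of
    co-rating users."""
    num = 0.0
    den = 0
    for item, rating in prefs[uid].items():
        if item == target:
            continue
        diffs = [r[target] - r[item]
                 for r in prefs.values()
                 if target in r and item in r]
        if not diffs:
            continue
        avg_diff = sum(diffs) / len(diffs)
        num += (avg_diff + rating) * len(diffs)
        den += len(diffs)
    return num / den if den else None

# Hypothetical ratings: {user: {item: rating}}
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}

print(slope_one_predict(ratings, "lucy", "A"))
```

For Lucy and item A, the A-B difference averages 0.5 over two users and the A-C difference is 3 over one user, so the prediction is ((0.5 + 2) × 2 + (3 + 5) × 1) / 3 = 13/3, roughly 4.33.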

Hadoop
Provides clustering capabilities
Not trivial to set up
Not yet implemented in Recommender API (issue # )

Recommender API
Drupal 7 (alpha) & 6 (beta)
Can run either on the same server as the Apache web server or on a remote server
Java helper program (was PHP)
Uses JDBC and Java Persistence API (JPA)
Drupal helper modules

Recommender API helper modules
Browsing History Recommender
OG Similar groups module
Ubercart Products Recommender
Fivestar Recommender
Points Voting Recommender
Flag Recommender

Asynchronous operation
Async_command module
– Talks to Mahout
– Typically run via cron
Results are stored directly in Drupal db
– Recommender tables
– Via JDBC

Hosting Solutions
Self-hosted: all-in-one (web server, database server, recommender server), has its pros and cons
Recommender API Cloud Service, looking for beta testers
Amazon Elastic MapReduce (EMR)

Installing Mahout
Prerequisites:
– Dedicated VM if possible
– Linux, Mac OS X Leopard or later, Windows (Cygwin)
– Java JDK 1.6
– Maven or higher (maven.apache.org)

Installing Mahout
Building
– Follow instructions
– ut.html
Use Maven to build examples

Installing Mahout
Testing: Grouplens
– On a single 2GHz server:
  100K ratings (1000 users, 1700 items): 9 minutes.
  1M ratings (6000 users, 4000 items): 12 hours.
  10M ratings (72,000 users, 10,000 items): fuggedaboutit
– Using 6 concurrent 2GHz processing units:
  100K ratings (1000 users, 1700 items): 2 minutes.
  1M ratings (6000 users, 4000 items): 2 hours.
  10M ratings (72,000 users, 10,000 items): 11 days 20 hours.
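
As a quick worked check on the figures above, the speedup and parallel efficiency from one server to six units can be computed directly. The helper name `speedup_and_efficiency` is made up for this example; the timings are the ones quoted on the slide.

```python
# Timings in minutes, taken from the figures above: (single server, six units).
# The 10M run is omitted because no single-server time is given for it.
timings = {
    "100K ratings": (9, 2),
    "1M ratings": (12 * 60, 2 * 60),
}
UNITS = 6

def speedup_and_efficiency(single, parallel, units=UNITS):
    """Speedup is single/parallel; efficiency is speedup per processing unit."""
    speedup = single / parallel
    return speedup, speedup / units

for name, (single, parallel) in timings.items():
    s, e = speedup_and_efficiency(single, parallel)
    print(f"{name}: speedup {s:.1f}x, efficiency {e:.0%}")
```

This gives 4.5x (75% efficiency) for 100K ratings and 6.0x (100% efficiency) for 1M ratings. On six units, going from 1M to 10M ratings takes the run from 2 hours to 284 hours, a 142-fold increase for ten times the data.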

Installing Recommender API
See
Configuration
– sites/all/modules/async_command/config.properties should match settings.php
Download and enable async_command
Check /admin/config/search/recommender/admin

Usage
Making recommendations
– User-user
– User-item
– Item-item
Predictions/similarity feeds back into Drupal
Blocks
Views
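
An item-item recommendation of the "people who viewed this also viewed" kind can be sketched with set overlap. The snippet below is an illustrative assumption, not the Recommender API's code: it uses the Tanimoto (Jaccard) coefficient on hypothetical browsing data, and the names `tanimoto` and `similar_items` are invented for the example.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_items(viewers, nid, top_n=3):
    """Return up to top_n (nid, score) pairs most similar to nid,
    skipping items with no viewers in common."""
    scores = [
        (other, tanimoto(viewers[nid], users))
        for other, users in viewers.items()
        if other != nid
    ]
    scores = [(other, s) for other, s in scores if s > 0]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical browsing history: {nid: set of uids who viewed it}
viewers = {
    101: {1, 2, 3},
    102: {2, 3},
    103: {3, 4},
    104: {5},
}

print(similar_items(viewers, 101))
```

A list like this is the kind of result that could be exposed in a block or a view next to the node being displayed.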

Case study: Data Mining and Recommendations in SARACEN
SARACEN: Feedback loop to measure subjective quality of the recommendations
– Limited set of data, small user base
– API provides an initial set of recommended videos
– User can then watch a recommended video
– User’s actions are incorporated into their implicit profile, which feeds back to the recommender API
– Recommender API generates new predictions based on the complete set of implicit profile metadata
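
The feedback loop above can be pictured with a toy example. Everything here is hypothetical (the catalogue, the tag-counting profile, and the `recommend` and `record_view` helpers) and is far simpler than what SARACEN does; it only shows how a viewing action changes the next round of recommendations.

```python
from collections import Counter

# Hypothetical catalogue: video id -> descriptive tags.
catalogue = {
    "v1": {"music", "live"},
    "v2": {"music", "docu"},
    "v3": {"sport"},
    "v4": {"docu", "history"},
}

def recommend(profile, watched, top_n=2):
    """Rank unwatched videos by how often their tags occur in the
    user's implicit profile; ties are broken by video id."""
    scored = [
        (sum(profile[tag] for tag in tags), vid)
        for vid, tags in catalogue.items()
        if vid not in watched
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [vid for _, vid in scored[:top_n]]

def record_view(profile, watched, vid):
    """Fold a viewing action back into the implicit profile."""
    watched.add(vid)
    profile.update(catalogue[vid])

profile, watched = Counter(), set()
first = recommend(profile, watched)   # initial set, before any activity
record_view(profile, watched, "v2")   # the user watches a recommended video
second = recommend(profile, watched)  # new predictions from the updated profile
print(first, second)
```

After the user watches "v2", the videos sharing its tags ("v1" and "v4") move to the top, while "v2" itself drops out of the list.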

SARACEN: Prototype

Recommender data sources
Explicit data
– SARACEN account data, including location and language
– Linked accounts and profiles, e.g. Facebook user profile, “likes”, connections, metadata
Implicit data
– Activity history recorded during the user’s sessions
– Searches
– Shared content
– Viewed content
– Albums (media containers)
– Content ratings

Scalability
Don’t need Hadoop if
– Number of users is orders of magnitude larger than the number of items
– Users browse anonymously most of the time
– Few users log in and need personalised recommendations
– Item churn rate is relatively low

Worth Considering
Decreased Transparency
Decreased Serendipity
Sleep deprivation

Resources: Recommender API
MAHOUT

Resources: Mahout
Mahout in Action
– ISBN
The Optimality of Naive Bayes, Harry Zhang.

Acknowledgements
Socially Aware, collaboRative, scAlable Coding mEdia distributioN (SARACEN)
– Funded within the European Union’s Seventh Framework Programme (FP7/ ) under grant agreement

Questions?
Kendra Initiative
Klokie Grossfeld
Daniel Harris

Thanks