5-star Ratings & Recommendations with Mahout
Robin Bramley, Chief Scientific Officer, Ixxus
We are a leading global provider of end-to-end custom-built content solutions.
Our Alfresco credentials
- Long-standing Platinum Alfresco partner in the US and UK
- Working with Alfresco since Alfresco v0.6
- Excellent Alfresco knowledge and highly trained, experienced staff
- Trusted to deliver some of the largest Alfresco projects in the world
- Alfresco Million $ Club (May 2012)
- Best Solution Partner (Nov 2013)
Discovering existing knowledge
How did we find answers 30 years ago? How was that information organised?
Encyclopædia (printed 1768-2010), library, bookshop
The landscape changed
"Updating dozens of books every two years now seems so pedestrian. The younger generation consumes data differently now, and we want to be there."
Jorge Cauz, Britannica, 2012
Number 6: "What do you want?"
Number 2: "We want information."
The Prisoner
Discoverability
Metadata is key: it permits discovery through multiple dimensions.
Finding stuff in Alfresco: a quick recap
Word cloud (Wordle): browse, keyword search, advanced search, faceted navigation, workflow, taxonomy, folksonomy (tags), dashlets, image browsing, association relationships, favourites, likes
Wanted to use the Anthrax "Anti-Social" single cover here, but copyright stopped play.
Audience participation exercise
Social content
Recommendations in a nutshell: Alice and Barbara
"I love my new iPhone 6."
"Me too! Alfresco on iOS is great, isn’t it?"
"If you like Alfresco, you should check out Robin’s Summit talk…"
Collaborative filtering
User-similarity recommendations in a nutshell
[Diagram: users A, B and C against items 1 to 5]
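As a rough illustration of the user-similarity idea, here is a plain JavaScript sketch (not Mahout's implementation). It scores how alike two users are with Pearson correlation over the items both have rated, keeps the most similar positively correlated users, and ranks the items those neighbours rated that the target user has not. The `ratings` shape, the function names and the sample data for users A, B and C are hypothetical.

```javascript
// User-based collaborative filtering sketch (illustrative, not Mahout's code).
// ratings: { user: { item: stars } } with stars on a 1-5 scale (hypothetical shape).

// Pearson correlation over the items both users have rated.
function pearson(a, b) {
  const common = Object.keys(a).filter((item) => item in b);
  const n = common.length;
  if (n < 2) return 0; // too little overlap to compare
  const meanA = common.reduce((sum, i) => sum + a[i], 0) / n;
  const meanB = common.reduce((sum, i) => sum + b[i], 0) / n;
  let num = 0;
  let denA = 0;
  let denB = 0;
  for (const i of common) {
    const da = a[i] - meanA;
    const db = b[i] - meanB;
    num += da * db;
    denA += da * da;
    denB += db * db;
  }
  if (denA === 0 || denB === 0) return 0; // a constant rater has no defined correlation
  return num / Math.sqrt(denA * denB);
}

// Rank items `user` has not rated by the similarity-weighted average rating
// given by the k most similar, positively correlated users.
function recommend(ratings, user, k = 2, howMany = 3) {
  const mine = ratings[user];
  const neighbours = Object.keys(ratings)
    .filter((other) => other !== user)
    .map((other) => ({ other, sim: pearson(mine, ratings[other]) }))
    .filter((candidate) => candidate.sim > 0)
    .sort((x, y) => y.sim - x.sim)
    .slice(0, k);

  const totals = {}; // item -> { weighted, simSum }
  for (const { other, sim } of neighbours) {
    for (const [item, stars] of Object.entries(ratings[other])) {
      if (item in mine) continue; // skip items the user has already rated
      if (!totals[item]) totals[item] = { weighted: 0, simSum: 0 };
      totals[item].weighted += sim * stars;
      totals[item].simSum += sim;
    }
  }
  return Object.entries(totals)
    .map(([item, t]) => ({ item, score: t.weighted / t.simSum }))
    .sort((x, y) => y.score - x.score)
    .slice(0, howMany);
}

// Hypothetical ratings for users A, B and C on items 1-5.
const ratings = {
  A: { 1: 5, 2: 3, 3: 4 },
  B: { 1: 5, 2: 2, 3: 4, 4: 5 },
  C: { 1: 1, 2: 5, 3: 2, 5: 4 },
};
console.log(recommend(ratings, "A")); // item "4", recommended via B
```

Here only B correlates positively with A, so the single suggestion is item 4, scored from B's rating. Mahout's Taste layer packages the same ingredients (a user similarity, a user neighbourhood and a recommender) as pluggable components.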
Alfresco 5-star ratings
- A 5-star rating scheme is supported by the Ratings Service
- It is not exposed in Share
- Nod to Metaversant / Jeff Potts’ 5-star Share extension
Demo time
Overview
Technical details
5 stars give us a preference level
Taste (Mahout's recommender framework)
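A minimal sketch of getting star ratings into the shape that Mahout's Taste file-based data model reads: one `userID,itemID,preference` line per rating, with numeric IDs. Alfresco identifies users by name and documents by nodeRef, so the sketch assigns sequential numbers and keeps the maps for translating recommendations back. The `{ user, nodeRef, stars }` record shape, the function name and the sample values are assumptions for illustration, not the Ratings Service's API.

```javascript
// Sketch: convert 5-star ratings into "userID,itemID,preference" lines with numeric IDs.
// The { user, nodeRef, stars } record shape is an assumption for illustration.

function toPreferenceCsv(records) {
  const users = new Map(); // user name -> numeric ID
  const items = new Map(); // nodeRef -> numeric ID
  const idFor = (map, key) => {
    if (!map.has(key)) map.set(key, map.size + 1); // IDs 1, 2, 3, ... in first-seen order
    return map.get(key);
  };

  const lines = [];
  for (const { user, nodeRef, stars } of records) {
    // Assume whole stars from 1 to 5; anything else is skipped rather than guessed at.
    if (!Number.isInteger(stars) || stars < 1 || stars > 5) continue;
    lines.push(`${idFor(users, user)},${idFor(items, nodeRef)},${stars}`);
  }
  // Keep the maps so recommended item IDs can be turned back into nodeRefs.
  return { csv: lines.join("\n"), users, items };
}

// Hypothetical ratings; the nodeRef UUIDs are placeholders.
const { csv, items } = toPreferenceCsv([
  { user: "alice", nodeRef: "workspace://SpacesStore/doc-a", stars: 5 },
  { user: "barbara", nodeRef: "workspace://SpacesStore/doc-a", stars: 4 },
  { user: "alice", nodeRef: "workspace://SpacesStore/doc-b", stars: 2 },
  { user: "barbara", nodeRef: "workspace://SpacesStore/doc-c", stars: 7 }, // out of range
]);
console.log(csv);
```

The printed lines are `1,1,5`, `2,1,4` and `1,2,2`; the out-of-range rating is dropped before any ID is assigned to its document.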
The elephant in the room
Hadoop
Hadoop was named after a stuffed toy elephant owned by the son of Doug Cutting, who started the project. It was extracted from Nutch, the web crawler that was then a Lucene sub-project, and provides a scalable batch data-processing framework using Map-Reduce on top of a distributed file system (HDFS).
The use of Hadoop is beyond the scope of this session.
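Running Hadoop is out of scope here, but the Map-Reduce model itself is small enough to sketch in memory. This JavaScript illustration (no Hadoop API involved) maps each rating line to an (item, stars) pair, groups the pairs by key as the framework's shuffle step would, then reduces each group to an average rating per item. The function names and input lines are hypothetical.

```javascript
// In-memory illustration of the Map-Reduce model; no Hadoop API is used.
// Input: hypothetical "userID,itemID,preference" lines.

// Map: one input record -> a list of [key, value] pairs (here, [itemID, stars]).
function mapRating(line) {
  const [, itemId, stars] = line.split(",");
  return [[itemId, Number(stars)]];
}

// Reduce: one key and all of its values -> one output pair (here, the mean rating).
function averageRating(itemId, starsList) {
  const total = starsList.reduce((sum, s) => sum + s, 0);
  return [itemId, total / starsList.length];
}

function mapReduce(records, mapper, reducer) {
  // Shuffle: group mapped values by key, as the framework does between the phases.
  const groups = new Map();
  for (const record of records) {
    for (const [key, value] of mapper(record)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  return [...groups].map(([key, values]) => reducer(key, values));
}

const averages = mapReduce(["1,1,5", "2,1,4", "1,2,2"], mapRating, averageRating);
console.log(averages); // item "1" averages 4.5, item "2" averages 2
```

On a cluster, the map and reduce calls run in parallel across machines and the grouping happens over the network; the division of work between the two functions is the same.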
Mahout
- Started off as a sub-project of Apache Lucene
- Portions of Mahout were* built on top of Hadoop
- The name is a Hindi word referring to an elephant driver
* The project is moving over to Apache Spark
Mahout capabilities
- Recommendations: user or item similarity
- Clustering: grouping similar documents
- Classification: reducing the manual burden of assigning categories
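Item similarity looks at the ratings matrix from the other side: each item becomes a vector of the stars users gave it, and two items are similar when the same users rated them alike. Below is a plain JavaScript sketch using cosine similarity over co-rating users; the data, the `ratings` shape and the function names are hypothetical, and Mahout provides several similarity measures of its own.

```javascript
// Item-to-item similarity sketch: each item is a vector of the stars users gave it.
// ratings: { user: { item: stars } } (hypothetical shape).

function itemVectors(ratings) {
  const vectors = {}; // item -> { user: stars }
  for (const [user, prefs] of Object.entries(ratings)) {
    for (const [item, stars] of Object.entries(prefs)) {
      if (!vectors[item]) vectors[item] = {};
      vectors[item][user] = stars;
    }
  }
  return vectors;
}

// Cosine similarity restricted to users who rated both items.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const user of Object.keys(a)) {
    if (!(user in b)) continue; // only co-rating users count
    dot += a[user] * b[user];
    normA += a[user] * a[user];
    normB += b[user] * b[user];
  }
  return dot === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Items most similar to `item`, best first.
function similarItems(ratings, item, howMany = 3) {
  const vectors = itemVectors(ratings);
  return Object.keys(vectors)
    .filter((other) => other !== item)
    .map((other) => ({ item: other, sim: cosine(vectors[item], vectors[other]) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, howMany);
}

// Hypothetical ratings: items 1 and 2 are rated identically, item 3 the opposite way.
const ratings = {
  A: { 1: 5, 2: 5, 3: 1 },
  B: { 1: 4, 2: 4, 3: 2 },
  C: { 1: 1, 2: 1, 3: 5 },
};
console.log(similarItems(ratings, "1")); // item "2" first, then item "3"
```

Because star ratings are all positive, uncentred cosine scores most pairs fairly high (item 3 still scores about 0.5 against item 1); centring each vector on its mean, as Pearson correlation does, is a common alternative.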
RDBMS data source
Back to the demo
Overview
Technical details
Sample Code
script:
{
   // extract the AVM store id and path from the URL extension
   var fullpath = url.extension.split("/");
   // an empty extension splits to [""], so also test the first segment
   if (fullpath.length == 0 || fullpath[0] == "")
   {
      status.code = 400;
      status.message = "Store id has not been provided.";
      status.redirect = true;
      break script;
   }
   var storeid = fullpath[0];
   var path = (fullpath.length == 1 ? "/" : "/" + fullpath.slice(1).join("/"));
}
Questions
Image credits
Land Rover Discovery 3: http://www.flickr.com/photos/klausnahr/2572689595/
Encyclopædia: http://www.flickr.com/photos/stewart/461099066/
Dewey Decimal: http://www.flickr.com/photos/brewbooks/4467301505/
Book store: http://www.flickr.com/photos/brewbooks/6541665609/
Anti-social sign: https://www.flickr.com/photos/ell-r-brown/6937806186