Personalisation and Recommendations using Drupal
Keywords:
– Personalisation
– Recommendations
– Scalable machine learning
– Predictions
– Similarity
– Data Mining
– Big Data
– Trend Spotting
– Clustering
Drupal Developer Days Barcelona – Kendra Initiative

Kendra Initiative
Mission
– Foster an Open Distributed Marketplace for Digital Media
EU funded
– P2P-Next
– SARACEN (Socially Aware, collaboRative, scAlable Coding mEdia distributioN)

Deliverables
Kendra Signpost
– Metadata interoperability, mapping and transformation
Smart Filters
– Portable preferences and filters
Kendra Social, Kendra Hub
– Social networking management tools
Standards work
– OpenSocial extension
– Social API
– see Abstracting Social Networking functionality in Drupal sprint
Kendra Match
– Searching and recommendation

Components
– Drupal Recommender API module
– Recommender helper modules
– async_command module
– Apache Mahout or cloud service
– Hadoop cluster (optional)

Industry Examples
– Amazon
– Netflix
– Spotify, Pandora
– Facebook, LinkedIn
– OKCupid
– iTunes: Genius; app store - not so much

Machine learning
Collaborative Filtering
– AKA recommender engines
Clustering
Classification

Collaborative Filtering
Input: preference data
Output: predictions
Preference =
– w1 = signed integer representing the weight of the uid1-nid1 or uid1-uid2 correlation (affinity)
Prediction =
– w2 = float representing the strength of the uid1-nid1 or uid1-uid2 correlation
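
To make the input/output pair above concrete, here is a minimal sketch of a user-based prediction. It is not the Recommender API or Mahout code: the `(uid, nid, weight)` row layout, the sample data and the function names are assumptions for illustration, and cosine similarity is just one possible affinity measure.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical preference rows: (uid, nid, weight), weight being an integer.
PREFERENCES = [
    (1, 101, 5), (1, 102, 3),
    (2, 101, 4), (2, 102, 2), (2, 103, 5),
    (3, 101, 1), (3, 103, 2),
]

def by_user(rows):
    """Group preference rows into {uid: {nid: weight}}."""
    prefs = defaultdict(dict)
    for uid, nid, weight in rows:
        prefs[uid][nid] = weight
    return prefs

def cosine(a, b):
    """Cosine similarity between two users' {nid: weight} maps."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[n] * b[n] for n in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def predict(prefs, uid, nid):
    """Similarity-weighted average of other users' weights for nid (a float), or None."""
    num = den = 0.0
    for other, items in prefs.items():
        if other == uid or nid not in items:
            continue
        sim = cosine(prefs[uid], items)
        num += sim * items[nid]
        den += abs(sim)
    return num / den if den else None

prefs = by_user(PREFERENCES)
score = predict(prefs, uid=1, nid=103)  # float prediction for an item user 1 has not rated
```

The integer weights go in and a float comes out, which mirrors the Preference/Prediction distinction on the slide; user 2 is more similar to user 1 than user 3 is, so the prediction leans towards user 2's weight.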

Enter Mahout
– Apache Mahout is a scalable machine learning library that supports large data sets.
– Launched Spring 2010
– Grew from the Apache Lucene project (basis for Apache Solr)
– Merged with Taste project

Use Cases
– Recommendation mining
– Clustering
– Classification
– Frequent itemset mining

Out-of-box algorithms
Recommendation
– User-based recommender
– Item-based recommender
– Slope-One recommender
– Distributed Item-Based Collaborative Filtering
– Collaborative Filtering using parallel matrix factorisation
Clustering
– Canopy Clustering
– K-Means Clustering
– Fuzzy K-Means
– Mean Shift Clustering
– Dirichlet Process Clustering
– Latent Dirichlet Allocation
– Spectral Clustering
– Minhash Clustering
Model combination
– Naive Bayes algorithm

Hadoop
– Provides clustering capabilities
– Not trivial to set up
– Not yet implemented in Recommender API (issue # )

Recommender API
– Drupal 7 (alpha) & 6 (beta)
– Can run either on same server as Apache web server or on a remote server
– Java helper program (was PHP)
– Uses JDBC and Java Persistence API (JPA)
– Drupal helper modules

Recommender API helper modules
– Browsing History Recommender
– OG Similar groups module
– Ubercart Products Recommender
– Fivestar Recommender
– Points Voting Recommender
– Flag Recommender

Asynchronous operation
Async_command module
– Talks to Mahout
– Typically run via cron
Results are stored directly in Drupal db
– Recommender tables
– Via JDBC

Hosting Solutions
– Self-hosted: all-in-one (web server, database server, recommender server) - has its pros and cons
– Recommender API Cloud Service - looking for beta testers
– Amazon Elastic MapReduce (EMR)

Installing Mahout
Prerequisites:
– Dedicated VM if possible
– Linux, Mac OS X Leopard or later, Windows (Cygwin)
– Java JDK 1.6
– Maven or higher (maven.apache.org)

Installing Mahout
Building
– Follow instructions
– ut.html
Use Maven to build examples

Installing Mahout
Testing: GroupLens
– On a single 2GHz server:
    100K ratings (1000 users, 1700 items): 9 minutes.
    1M ratings (6000 users, 4000 items): 12 hours.
    10M ratings (72,000 users, 10,000 items): fuggedaboutit
– Using 6 concurrent 2GHz processing units:
    100K ratings (1000 users, 1700 items): 2 minutes.
    1M ratings (6000 users, 4000 items): 2 hours.
    10M ratings (72,000 users, 10,000 items): 11 days 20 hours.
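
The timings above can be turned into speedup and scaling figures with a few lines of arithmetic. This sketch only uses the numbers quoted on the slide; the dictionary and function names are made up for the example.

```python
# Run times in minutes, copied from the slide.
SINGLE_SERVER = {"100K": 9, "1M": 12 * 60}
SIX_UNITS = {"100K": 2, "1M": 2 * 60, "10M": (11 * 24 + 20) * 60}
UNITS = 6

def speedup(dataset):
    """How many times faster the six-unit run is than the single-server run."""
    return SINGLE_SERVER[dataset] / SIX_UNITS[dataset]

def efficiency(dataset):
    """Speedup divided by the number of processing units (1.0 means linear scaling)."""
    return speedup(dataset) / UNITS

def growth(timings, smaller, larger):
    """Factor by which run time grows between two data set sizes."""
    return timings[larger] / timings[smaller]

for name in SINGLE_SERVER:
    print(f"{name}: speedup {speedup(name):.1f}x, efficiency {efficiency(name):.0%}")
```

On these figures the six-unit setup is 4.5 times faster for 100K ratings and 6 times faster for 1M ratings. The more striking number is the growth: each tenfold increase in ratings costs far more than ten times the run time (80x on the single server from 100K to 1M, 142x on six units from 1M to 10M).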

Installing Recommender API
See Configuration
– sites/all/modules/async_command/config.properties should match settings.php
Download and enable async_command
Check /admin/config/search/recommender/admin

Usage
Making recommendations
– User-user
– User-item
– Item-item
Predictions/similarity feeds back into Drupal
– Blocks
– Views
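
For the item-item case, a "similar items" block could be fed by something like the sketch below. This is an assumption-laden illustration rather than what the Recommender API does internally: the `(uid, nid)` view events, the overlap-based score and the function names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical browsing history: (uid, nid) pairs.
VIEWS = [
    (1, "a"), (1, "b"),
    (2, "a"), (2, "b"), (2, "c"),
    (3, "b"), (3, "c"),
    (4, "a"),
]

def item_similarities(events):
    """Score each item pair by shared viewers, normalised by the items' popularity."""
    users_by_item = defaultdict(set)
    for uid, nid in events:
        users_by_item[nid].add(uid)

    sims = defaultdict(dict)
    for x, y in combinations(sorted(users_by_item), 2):
        overlap = len(users_by_item[x] & users_by_item[y])
        if overlap:
            score = overlap / sqrt(len(users_by_item[x]) * len(users_by_item[y]))
            sims[x][y] = sims[y][x] = score
    return sims

def similar_items(sims, nid, limit=3):
    """Return up to `limit` item ids, most similar first."""
    ranked = sorted(sims.get(nid, {}).items(), key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in ranked[:limit]]

sims = item_similarities(VIEWS)
block_items = similar_items(sims, "b")
```

With this sample data, item "c" ranks above item "a" for item "b", because "c" shares the same number of viewers with "b" but has fewer viewers overall. The user-user case works the same way with the roles of users and items swapped.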

Case study: Data Mining and Recommendations in SARACEN
SARACEN: Feedback loop to measure subjective quality of the recommendations
– Limited set of data, small user base
– API provides an initial set of recommended videos
– User can then watch a recommended video
– User’s actions are incorporated into their implicit profile, which feeds back to the recommender API
– Recommender API generates new predictions based on the complete set of implicit profile metadata
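
The feedback loop above can be sketched as follows. Everything here is hypothetical: the action names and their weights, the `ImplicitProfile` class and the toy recommender are stand-ins, and the real SARACEN prototype is not shown in the slides.

```python
from collections import defaultdict

# Hypothetical signed weights per user action.
ACTION_WEIGHTS = {"view": 1, "share": 3, "rate_up": 5, "rate_down": -5}

CATALOGUE = ["v1", "v2", "v3", "v4"]

class ImplicitProfile:
    """Accumulates a signed integer weight per video from the user's actions."""

    def __init__(self):
        self.weights = defaultdict(int)

    def record(self, nid, action):
        self.weights[nid] += ACTION_WEIGHTS[action]

    def as_preferences(self, uid):
        """Export the profile as (uid, nid, weight) rows for a recommender."""
        return [(uid, nid, weight) for nid, weight in sorted(self.weights.items())]

def toy_recommender(preferences, limit=2):
    """Stand-in recommender: suggests catalogue videos the user has not touched yet."""
    seen = {nid for _, nid, _ in preferences}
    return [nid for nid in CATALOGUE if nid not in seen][:limit]

def feedback_round(profile, uid, recommended, actions, recommender):
    """Fold actions on recommended videos into the profile, then ask for new predictions."""
    for nid, action in actions:
        if nid in recommended:
            profile.record(nid, action)
    return recommender(profile.as_preferences(uid))

profile = ImplicitProfile()
first = toy_recommender(profile.as_preferences(uid=7))
second = feedback_round(profile, 7, first, [("v1", "view"), ("v1", "rate_up")], toy_recommender)
```

Each round uses the complete accumulated profile rather than only the latest action, which matches the last step of the loop described on the slide.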

SARACEN: Prototype

Recommender data sources
Explicit data
– SARACEN account data, including location and language
– Linked accounts and profiles e.g. Facebook user profile, “likes”, connections, metadata
Implicit data
– Activity history recorded during the user’s sessions
– Searches
– Shared content
– Viewed content
– Albums (media containers)
– Content ratings

Scalability
Don’t need Hadoop if
– Number of users is orders of magnitude larger than the number of items
– Users browse anonymously most of the time
– Few users log in and need personalised recommendations
– Item churn rate is relatively low

Worth Considering
– Decreased Transparency
– Decreased Serendipity
– Sleep deprivation

Resources: Recommender API MAHOUT MAHOUT

Resources: Mahout
Mahout in Action
– ISBN
The Optimality of Naive Bayes, Harry Zhang.

Acknowledgements
Socially Aware, collaboRative, scAlable Coding mEdia distributioN (SARACEN)
– Funded within the European Union’s Seventh Framework Programme (FP7/ ) under grant agreement

Questions?
– Kendra Initiative
– Klokie Grossfeld
– Daniel Harris

Thanks