Collaborative Filtering - Rajashree. Apache Mahout In 2008 as a subproject of Apache’s Lucene project Mahout absorbed the Taste open source collaborative.

Similar presentations
Recommender Systems & Collaborative Filtering

Connecting to Databases. relational databases tables and relations accessed using SQL database -specific functionality –transaction processing commit.
Clustering Categorical Data The Case of Quran Verses
© Copyright 2012 STI INNSBRUCK Apache Lucene Ioan Toma based on slides from Aaron Bannert
Jeff Howbert Introduction to Machine Learning Winter Collaborative Filtering Nearest Neighbor Approach.
COLLABORATIVE FILTERING Mustafa Cavdar Neslihan Bulut.
Recommender System with Hadoop and Spark
1 Machine Learning with Apache Hama Tommaso Teofili tommaso [at] apache [dot] org.
Recommender Systems Aalap Kohojkar Yang Liu Zhan Shi March 31, 2008.
Memory-Based Recommender Systems : A Comparative Study Aaron John Mani Srinivasan Ramani CSCI 572 PROJECT RECOMPARATOR.
Introducing Apache Mahout Scalable Machine Learning for All! Grant Ingersoll Lucid Imagination.
AWS, HADOOP AND MAHOUT – VIDEO GAME RECOMMENDER BEN GOODING UNIVERSITY OF ARKANSAS – DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING PRESENTED - APRIL 30,
Searching with Lucene Chapter 2. For discussion Information retrieval What is Lucene? Code for indexer using Lucene Pagerank algorithm.
Recommendations via Collaborative Filtering. Recommendations Relevant for movies, restaurants, hotels…. Recommendation Systems is a very hot topic in.
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
Collaborative Filtering CMSC498K Survey Paper Presented by Hyoungtae Cho.
Recommender systems Ram Akella November 26 th 2008.
Struts 2.0 an Overview ( )
Item-based Collaborative Filtering Recommendation Algorithms
Identifying and Incorporating Latencies in Distributed Data Mining Algorithms Michael Sevilla.
1 © Goharian & Grossman 2003 Introduction to Data Mining (CS 422) Fall 2010.
Apache Mahout Feb 13, 2012 Shannon Quinn Cloud Computing CS
USING HADOOP & HBASE TO BUILD CONTENT RELEVANCE & PERSONALIZATION Tools to build your big data application Ameya Kanitkar.
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
CS525: Big Data Analytics Machine Learning on Hadoop Fall 2013 Elke A. Rundensteiner 1.
Item Based Collaborative Filtering Recommendation Algorithms Badrul Sarwar, George Karypis, Joseph Konstan, John Riedl (UMN) p.s.: slides adapted from:
An intro to programming. The purpose of writing a program is to solve a problem or take advantage of an opportunity Consists of multiple steps:  Understanding.
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
Clustering-based Collaborative filtering for web page recommendation CSCE 561 project Proposal Mohammad Amir Sharif
Introduction to Apache Hadoop Zibo Wang. Introduction  What is Apache Hadoop?  Apache Hadoop is a software framework which provides open source libraries.
1 Module Objective & Outline Module Objective: After completing this Module, you will be able to, appreciate java as a programming language, write java.
Apache Mahout. Mahout Introduction Machine Learning Clustering K-means Canopy Clustering Fuzzy K-Means Conclusion.
Scalable Machine Learning CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook.
Plug-In Architecture Pattern. Problem The functionality of a system needs to be extended after the software is shipped The set of possible post-shipment.
Advanced Analytics on Hadoop Spring 2014 WPI, Mohamed Eltabakh 1.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
Click to Add Title A Systematic Framework for Sentiment Identification by Modeling User Social Effects Kunpeng Zhang Assistant Professor Department of.
 Frequent Word Combinations Mining and Indexing on HBase Hemanth Gokavarapu Santhosh Kumar Saminathan.
User Modeling and Recommender Systems: Introduction to recommender systems Adolfo Ruiz Calleja 06/09/2014.
Site Technology TOI Fest Q Celebration From Keyword-based Search to Semantic Search, How Big Data Enables That?
HEMANTH GOKAVARAPU SANTHOSH KUMAR SAMINATHAN Frequent Word Combinations Mining and Indexing on HBase.
Page 1 Cloud Study: Algorithm Team Mahout Introduction 박성찬 IDS Lab.
User Modeling and Recommender Systems: recommendation algorithms
Guided By Ms. Shikha Pachouly Assistant Professor Computer Engineering Department 2/29/2016.
Experimental Study on Item-based P-Tree Collaborative Filtering for Netflix Prize.
Company LOGO MovieMiner A collaborative filtering system for predicting Netflix user’s movie ratings [ECS289G Data Mining] Team Spelunker: Justin Becker,
Recommendation Systems ARGEDOR. Introduction Sample Data Tools Cases.
Chapter 14 – Association Rules and Collaborative Filtering © Galit Shmueli and Peter Bruce 2016 Data Mining for Business Analytics (3rd ed.) Shmueli, Bruce.
Image taken from: slideshare
Big Data is a Big Deal!.
Data Mining: Concepts and Techniques
Recommender Systems & Collaborative Filtering
Sushant Ahuja, Cassio Cristovao, Sameep Mohta
Scalable Machine Learning
ITCS-3190.
Introducing Apache Mahout
Architecture Concept Documents
Spark Presentation.
The Improvement of PaaS Platform ZENG Shu-Qing, Xu Jie-Bin 2010 First International Conference on Networking and Distributed Computing SQUARE.
Hadoop Clusters Tess Fulkerson.
Waikato Environment for Knowledge Analysis
Collaborative Filtering Nearest Neighbor Approach
HPML Conference, Lyon, Sept 2018
Overview of big data tools
Movie Recommendation System
Charles Tappert Seidenberg School of CSIS, Pace University
MAPREDUCE TYPES, FORMATS AND FEATURES
Introducing Apache Mahout
Copyright © JanBask Training. All rights reserved Get Started with Hadoop Hive HiveQL Languages.
Plug-In Architecture Pattern
Presentation transcript:

Apache Mahout: in 2008, as a subproject of Apache’s Lucene project, Mahout absorbed the Taste open-source collaborative filtering project.

Apache Mahout: a mahout is an elephant trainer/driver/keeper, hence Hadoop (the elephant) + Machine Learning = Mahout.

What is Apache Mahout?
– Machine learning for large data
– Based on Hadoop, but can also run without a Hadoop cluster
– Scalable
– Released under the Apache License
Mahout is an open-source, scalable machine learning library from Apache, written in Java.

Mahout in the Apache Software Foundation: Lucene, an information retrieval software library

Mahout in the Apache Software Foundation: Hadoop, a framework for distributed storage and processing based on MapReduce

Mahout in the Apache Software Foundation: Taste, a collaborative filtering framework

Why Mahout? Mahout provides a rich set of components from which you can build a customized recommender system using a selection of algorithms. Mahout is designed for performance, scalability and flexibility.

Why do we need a scalable framework?

General Architecture Three-tier architecture (Application, Algorithms and Shared Libraries)

General Architecture Data Storage and Shared Libraries

General Architecture Business Logic

General Architecture External Applications invoking Mahout APIs

Use cases
Mahout currently supports four main use cases:
– Recommendation: takes users' behavior and from that tries to find items users might like.
– Clustering: takes, e.g., text documents and groups them into clusters of topically related documents.
– Classification: learns from existing categorized documents what documents of a specific category look like and is able to assign unlabeled documents to the (hopefully) correct category.
– Frequent itemset mining: takes a set of item groups (terms in a query session, shopping cart content) and identifies which individual items usually appear together.

In this presentation we will focus on Recommendation

Mahout implements a collaborative filtering framework:
– Popularized by Amazon and others
– Uses historical data (ratings, clicks, and purchases) to provide recommendations
– User-based: recommend items by finding similar users. This is often harder to scale because of the dynamic nature of users.
– Item-based: calculate the similarity between items and make recommendations. Items usually don't change much, so this can often be computed offline.
– Slope-One: a very fast and simple item-based recommendation approach, applicable when users have given ratings (and not just boolean preferences).
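To make the Slope-One idea concrete, here is a minimal plain-Java sketch of weighted Slope One. It is not Mahout's implementation; the class name `SlopeOne` and the map-of-maps rating layout are assumptions for illustration. For every item j the user has rated, it averages the difference between ratings of the target item and j over the users who rated both, adds that average to the user's own rating of j, and weights each such estimate by the number of supporting users.

```java
import java.util.Map;

// Hypothetical helper (not part of Mahout): weighted Slope One prediction.
final class SlopeOne {
    private SlopeOne() { }

    /**
     * Predicts the rating of targetItem for user.
     * ratings maps userID -> (itemID -> rating). Returns NaN if no co-rated data exists.
     */
    static double predict(Map<Long, Map<Long, Double>> ratings, long user, long targetItem) {
        Map<Long, Double> userRatings = ratings.get(user);
        if (userRatings == null) {
            return Double.NaN;
        }
        double weightedSum = 0.0;
        int totalSupport = 0;
        for (Map.Entry<Long, Double> rated : userRatings.entrySet()) {
            long j = rated.getKey();
            if (j == targetItem) {
                continue;
            }
            // Average deviation r(v, target) - r(v, j) over users v who rated both items
            double diffSum = 0.0;
            int support = 0;
            for (Map<Long, Double> other : ratings.values()) {
                Double ri = other.get(targetItem);
                Double rj = other.get(j);
                if (ri != null && rj != null) {
                    diffSum += ri - rj;
                    support++;
                }
            }
            if (support > 0) {
                // Estimate from item j, weighted by how many users support the deviation
                weightedSum += (diffSum / support + rated.getValue()) * support;
                totalSupport += support;
            }
        }
        return totalSupport == 0 ? Double.NaN : weightedSum / totalSupport;
    }
}
```

For example, if user A rated item 1 as 1.0 and item 2 as 1.5, and user B rated only item 1 as 2.0, the sketch predicts 2.0 + 0.5 = 2.5 for user B on item 2.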

Collaborative Filtering: recommend people and products
– User-user: a user similar to you likes X, so you might too
– Item-item: people who bought X also bought Y
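The item-item idea ("people who bought X also bought Y") can be illustrated with simple co-occurrence counting. The sketch below is a hypothetical plain-Java helper, not a Mahout class: given one basket of items per user, it counts how many baskets containing X also contain each other item.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of Mahout): counts "people who bought X also bought Y".
final class CoOccurrence {
    private CoOccurrence() { }

    /** baskets: one set of item names per user. Returns item -> number of baskets shared with x. */
    static Map<String, Integer> alsoBought(List<Set<String>> baskets, String x) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> basket : baskets) {
            if (!basket.contains(x)) {
                continue; // only baskets of people who bought x count
            }
            for (String y : basket) {
                if (!y.equals(x)) {
                    counts.merge(y, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

With baskets {book, pen}, {book, pen, ink} and {pen, ink}, the people who bought "book" also bought "pen" twice and "ink" once.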

Example Amazon.com

Recommendation - Architecture. Basic idea: a Java/J2EE application invokes a Mahout Recommender whose DataModel is built from a set of user preferences stored in a physical data store.

Recommendation - Architecture (diagram): Physical Data, Data Model, Recommender, External Application

Recommendation in Mahout
Input: raw data (user preferences)
Output: estimated preferences
Step 1 – Map the raw data into a Mahout-compliant DataModel
Step 2 – Tune the recommender components (similarity measure, neighborhood, etc.)

Recommendation Components
Mahout implements interfaces to these key abstractions:
– DataModel: methods for mapping raw data to a Mahout-compliant form
– UserSimilarity: methods to calculate the degree of correlation between two users
– ItemSimilarity: methods to calculate the degree of correlation between two items
– UserNeighborhood: methods to define the concept of ‘neighborhood’
– Recommender: methods to implement the recommendation step itself

Components: DataModel
A DataModel is the interface for drawing information about user preferences. Which sources can it draw from?
– Database: MySQLJDBCDataModel
– External files: FileDataModel
– Generic (preferences fed directly through Java code): GenericDataModel

Components: DataModel
Basic object: Preference
– A Preference is a triple (user, item, score)
Two implementations for storing a user's preferences:
– GenericUserPreferenceArray: also stores the numerical preference values.
– BooleanUserPreferenceArray: skips numerical preference values.
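As a rough illustration of the triple and of the two storage styles, the hypothetical classes below (`Pref` and `UserPrefs` are not Mahout types) keep (user, item, score) triples and extract one user's item IDs either together with their scores, as the numerical style does, or without them, as the boolean style does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the (user, item, score) triple; not Mahout's Preference class.
final class Pref {
    final long user;
    final long item;
    final float score;

    Pref(long user, long item, float score) {
        this.user = user;
        this.item = item;
        this.score = score;
    }
}

// Hypothetical helpers mirroring the two storage styles: with and without scores.
final class UserPrefs {
    private UserPrefs() { }

    /** Item IDs the user has a preference for (the boolean style keeps only these). */
    static long[] itemsFor(List<Pref> prefs, long user) {
        long[] buffer = new long[prefs.size()];
        int n = 0;
        for (Pref p : prefs) {
            if (p.user == user) {
                buffer[n++] = p.item;
            }
        }
        return Arrays.copyOf(buffer, n);
    }

    /** Scores in the same order as itemsFor (the numerical style also keeps these). */
    static float[] scoresFor(List<Pref> prefs, long user) {
        float[] buffer = new float[prefs.size()];
        int n = 0;
        for (Pref p : prefs) {
            if (p.user == user) {
                buffer[n++] = p.score;
            }
        }
        return Arrays.copyOf(buffer, n);
    }
}
```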

Components: UserSimilarity
UserSimilarity defines a notion of similarity between two users; respectively, ItemSimilarity defines a notion of similarity between two items. Which definitions of similarity are available?
– Pearson correlation
– Spearman correlation
– Euclidean distance
– Tanimoto coefficient
– Log-likelihood similarity
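As an example of one of these measures, the sketch below computes the Pearson correlation between two users over the items both have rated. It is a hypothetical plain-Java helper, not Mahout's `PearsonCorrelationSimilarity`; returning `NaN` when fewer than two items are shared, or when one user's ratings are constant, is a choice made here because the correlation is undefined in those cases.

```java
import java.util.Map;

// Hypothetical helper (not Mahout's class): Pearson correlation over co-rated items.
final class Pearson {
    private Pearson() { }

    /** a, b: itemID -> rating for two users. */
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        // First pass: means of both users over the co-rated items
        int n = 0;
        double sumA = 0.0;
        double sumB = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) {
                sumA += e.getValue();
                sumB += rb;
                n++;
            }
        }
        if (n < 2) {
            return Double.NaN; // undefined with fewer than two co-rated items
        }
        double meanA = sumA / n;
        double meanB = sumB / n;
        // Second pass: covariance and variances of the mean-centered ratings
        double sab = 0.0;
        double saa = 0.0;
        double sbb = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) {
                double da = e.getValue() - meanA;
                double db = rb - meanB;
                sab += da * db;
                saa += da * da;
                sbb += db * db;
            }
        }
        if (saa == 0.0 || sbb == 0.0) {
            return Double.NaN; // constant ratings: correlation undefined
        }
        return sab / Math.sqrt(saa * sbb);
    }
}
```

Two users whose ratings differ by a constant offset (5, 3, 4 versus 4, 2, 3) get a correlation of 1.0.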

Example: TanimotoDistance
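The Tanimoto coefficient looks only at which items two users have expressed a preference for, not at the preference values: it is the size of the intersection of the two item sets divided by the size of their union, and the Tanimoto distance is 1 minus that coefficient. A minimal plain-Java sketch (the class `Tanimoto` is hypothetical, not part of Mahout):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not Mahout's TanimotoCoefficientSimilarity).
final class Tanimoto {
    private Tanimoto() { }

    /** |intersection| / |union| of the item sets of two users; NaN for two empty sets. */
    static double coefficient(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return Double.NaN;
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    /** Tanimoto distance = 1 - coefficient. */
    static double distance(Set<Long> a, Set<Long> b) {
        return 1.0 - coefficient(a, b);
    }
}
```

For item sets {1, 2, 3} and {2, 3, 4}, the intersection has 2 items and the union 4, so the coefficient is 0.5 and the distance is 0.5.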

Components: UserNeighborhood
UserNeighborhood defines which users count as neighbors of a given user. Which definitions of neighborhood are available?
– Nearest N users (NearestNUserNeighborhood): the N users with the highest similarity are labeled as ‘neighbors’
– Threshold (ThresholdUserNeighborhood): users whose similarity is above a threshold are labeled as ‘neighbors’
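The two neighborhood definitions can be sketched in plain Java as follows, assuming the similarity between the target user and every other user has already been computed into a map. `Neighborhoods` is a hypothetical helper, not Mahout's `NearestNUserNeighborhood` or `ThresholdUserNeighborhood`, and skipping `NaN` similarities is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helpers mirroring the two neighborhood definitions; not Mahout's classes.
final class Neighborhoods {
    private Neighborhoods() { }

    /** Nearest N: the n users with the highest similarity to the target user. */
    static List<Long> nearestN(Map<Long, Double> similarityToTarget, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarityToTarget.entrySet()) {
            if (!e.getValue().isNaN()) { // skip undefined similarities
                entries.add(e);
            }
        }
        // Highest similarity first
        entries.sort(Map.Entry.<Long, Double>comparingByValue(Comparator.reverseOrder()));
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    /** Threshold: every user whose similarity is above minSimilarity (NaN never is). */
    static List<Long> threshold(Map<Long, Double> similarityToTarget, double minSimilarity) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarityToTarget.entrySet()) {
            if (e.getValue() > minSimilarity) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

The difference in design is that nearest N always returns a fixed-size neighborhood (if enough users exist), while a threshold can return many neighbors for a popular user and none for an unusual one.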

Components: Recommender
Given a DataModel, a definition of similarity between users (items) and a definition of neighborhood, a recommender produces an estimate of relevance for each unseen item. Which recommendation algorithms are implemented?
– User-based CF
– Item-based CF
– SVD-based CF
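For the user-based case, the estimation step can be sketched as a similarity-weighted average of the neighbors' ratings for an item the target user has not seen. This is a hypothetical plain-Java helper; it does not reproduce Mahout's `GenericUserBasedRecommender` code, and dividing by the sum of absolute similarities is one common normalization chosen here.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of user-based estimation: similarity-weighted average
// of the neighbors' ratings for one item. Not Mahout's implementation.
final class UserBasedEstimate {
    private UserBasedEstimate() { }

    static double estimate(Map<Long, Map<Long, Double>> ratings,
                           Map<Long, Double> similarityToTarget,
                           List<Long> neighbors,
                           long item) {
        double weighted = 0.0;
        double totalWeight = 0.0;
        for (long v : neighbors) {
            Map<Long, Double> userRatings = ratings.get(v);
            Double sim = similarityToTarget.get(v);
            if (userRatings == null || sim == null || sim.isNaN()) {
                continue; // no data or undefined similarity for this neighbor
            }
            Double rating = userRatings.get(item);
            if (rating == null) {
                continue; // this neighbor has not rated the item
            }
            weighted += sim * rating;
            totalWeight += Math.abs(sim);
        }
        // NaN when no neighbor rated the item
        return totalWeight == 0.0 ? Double.NaN : weighted / totalWeight;
    }
}
```

With two neighbors rating item 10 as 4.0 (similarity 0.8) and 2.0 (similarity 0.2), the estimate is (0.8 × 4.0 + 0.2 × 2.0) / 1.0 = 3.6.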

Download Mahout
– At the time of these slides, the latest Mahout release was 0.9
– Available at: distribution-0.9.zip
– Extract all the libraries and include them in a new NetBeans (or Eclipse) project
Requirement: Java 1.6.x or greater. Hadoop is not mandatory!

Exercise 1: First Recommender
import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {
  private RecommenderIntro() { }

  public static void main(String[] args) throws Exception {
    // Load the user preferences from a file
    DataModel model = new FileDataModel(new File("data/mydata.dat"));
    // Similarity between users: Pearson correlation of their preference values
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    // Neighborhood: the 2 users most similar to the target user
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
    Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
    // Top-1 recommendation for user 1
    List<RecommendedItem> recommendations = recommender.recommend(1, 1);
    for (RecommendedItem recommendation : recommendations) {
      System.out.println(recommendation);
    }
  }
}

Exercise 1: First Recommender, annotated. The slides step through the same code (placed in package dnslab.recommender), highlighting three parts:
– FileDataModel: new FileDataModel(new File("data/mydata.dat")) loads the user preferences from a file.
– Neighbors: new NearestNUserNeighborhood(2, similarity, model) keeps the 2 users most similar to the target user.
– Top-1 recommendation for user 1: recommender.recommend(1, 1) returns at most one recommended item for user 1.

Output

Exercise 2: MovieLens Recommender
– Download the MovieLens dataset (100k): 100k.zip
– Run similarity calculations on a bigger dataset
Next: run the recommendation framework against a state-of-the-art dataset.

MovieLens Dataset
u.data: the full data set, 100,000 ratings by 943 users on 1,682 items. Each user has rated at least 20 movies. Users and items are numbered consecutively from 1. The data is randomly ordered. Each line is a tab-separated list: user id | item id | rating | timestamp.

Data Processing
The format is not Mahout-compliant, so keep the first three columns (user id, item id, rating) and replace the tabs with commas:
cut -f1,2,3 u.data | tr '\t' ',' > u.data1
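If a shell with `cut` and `tr` is not available (for example, on Windows), the same conversion can be done in Java. This is a minimal sketch; the class name `ConvertMovieLens` and the default file names are assumptions, and unlike `cut` it rejects lines with fewer than three fields instead of passing them through.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java equivalent of: cut -f1,2,3 u.data | tr '\t' ',' > u.data1
final class ConvertMovieLens {
    private ConvertMovieLens() { }

    /** "user TAB item TAB rating TAB timestamp" becomes "user,item,rating". */
    static String convertLine(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 tab-separated fields: " + line);
        }
        return fields[0] + "," + fields[1] + "," + fields[2];
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args.length > 0 ? args[0] : "u.data");
        Path out = Paths.get(args.length > 1 ? args[1] : "u.data1");
        List<String> converted = new ArrayList<>();
        for (String line : Files.readAllLines(in, StandardCharsets.UTF_8)) {
            if (!line.isEmpty()) {
                converted.add(convertLine(line));
            }
        }
        // Overwrites the output file, like '>' in the shell version
        Files.write(out, converted, StandardCharsets.UTF_8);
    }
}
```

Run it as `java ConvertMovieLens u.data u.data1`.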

package dnslab.recommender;

import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class ItemRecommend {
  public static void main(String[] args) {
    try {
      // u.data1: the converted user,item,rating file
      DataModel dm = new FileDataModel(new File("data/u.data1"));
      // Tanimoto coefficient between items (LogLikelihoodSimilarity(dm) is an alternative)
      ItemSimilarity sim = new TanimotoCoefficientSimilarity(dm);
      GenericItemBasedRecommender recommender = new GenericItemBasedRecommender(dm, sim);
      // For the first 10 items, print the 5 most similar items
      int processed = 0;
      for (LongPrimitiveIterator items = dm.getItemIDs(); items.hasNext() && processed < 10; processed++) {
        long itemId = items.nextLong();
        List<RecommendedItem> recommendations = recommender.mostSimilarItems(itemId, 5);
        for (RecommendedItem recommendation : recommendations) {
          // item, similar item, similarity value
          System.out.println(itemId + "," + recommendation.getItemID() + "," + recommendation.getValue());
        }
      }
    } catch (IOException e) {
      System.out.println("There was an error.");
      e.printStackTrace();
    } catch (TasteException e) {
      System.out.println("There was a Taste Exception");
      e.printStackTrace();
    }
  }
}

Output: each line lists an item, one of its most similar items, and the similarity value between them.
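To see what the Exercise 2 loop computes, here is a hypothetical plain-Java version of "most similar items" with the Tanimoto coefficient, where each item is represented by the set of users who rated it. It is a sketch for small in-memory data under that assumption, not Mahout's implementation; the class name `ItemSimilaritySketch` and the (user, item) pair input are illustrative choices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not Mahout's code): item-item Tanimoto similarity
// based on the sets of users who rated each item.
final class ItemSimilaritySketch {
    private ItemSimilaritySketch() { }

    /** (user, item) pairs -> item -> set of users who rated it. */
    static Map<Long, Set<Long>> usersByItem(long[][] userItemPairs) {
        Map<Long, Set<Long>> byItem = new HashMap<>();
        for (long[] pair : userItemPairs) {
            byItem.computeIfAbsent(pair[1], k -> new HashSet<>()).add(pair[0]);
        }
        return byItem;
    }

    /** Shared users divided by all users who rated either item. */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return union == 0 ? 0.0 : (double) common.size() / union;
    }

    /** The howMany items most similar to itemId, best first. */
    static List<Long> mostSimilarItems(Map<Long, Set<Long>> byItem, long itemId, int howMany) {
        Set<Long> target = byItem.getOrDefault(itemId, Set.of());
        Map<Long, Double> score = new HashMap<>();
        List<Long> candidates = new ArrayList<>();
        for (long other : byItem.keySet()) {
            if (other != itemId) {
                candidates.add(other);
                score.put(other, tanimoto(target, byItem.get(other)));
            }
        }
        candidates.sort((x, y) -> Double.compare(score.get(y), score.get(x)));
        return new ArrayList<>(candidates.subList(0, Math.min(howMany, candidates.size())));
    }
}
```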

Thank you