1 Iterative File-Based Item:Item Similarity Computation
● Will Holcomb – Vanderbilt University
● Project Aura Intern

2 Presentation Overview
Sun Confidential: Internal Only
1. Recommender systems overview
2. The shape of the tail
3. The role of Project Aura
4. Item:item similarity
5. Programming for Project Caroline
6. Reworking item:item in terms of tuples
7. Parallelizability and computability
8. Computation in Project Aura

3 Recommender Systems Exploiting the Long Tail
[Figure: The Theoretical Tail]

4 The Actual Tail (More or Less)
Crawl of the Last.fm top 50 artists for 11,985 users:
● 21,858 total artists
● 598,168 artist/user pairs
● 83,668,000 listens
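The slide's figures imply roughly 598,168 / 11,985 ≈ 49.9 artists per user (just under the crawl's 50-artist cap) and 83,668,000 / 598,168 ≈ 140 listens per artist/user pair. A minimal Python sketch of how such summary counts, and the per-artist totals that trace out the tail, could be computed, assuming each crawled record is a hypothetical (user, artist, listens) triple with one record per user/artist pair; the function names are illustrative, not from Project Aura:

```python
from collections import defaultdict
from typing import Iterable, NamedTuple

# Hypothetical record shape: (user, artist, listens), one record per user/artist pair.
Record = tuple[str, str, int]


class CrawlSummary(NamedTuple):
    users: int
    artists: int
    pairs: int
    listens: int


def summarize_crawl(records: Iterable[Record]) -> CrawlSummary:
    """Count distinct users and artists, user/artist pairs, and total listens."""
    users, artists = set(), set()
    pairs = listens = 0
    for user, artist, count in records:
        users.add(user)
        artists.add(artist)
        pairs += 1  # assumes no duplicate user/artist records
        listens += count
    return CrawlSummary(len(users), len(artists), pairs, listens)


def listens_per_artist(records: Iterable[Record]) -> list[tuple[str, int]]:
    """Total listens per artist, most-played first; plotting these totals
    in order traces out the head and the long tail."""
    totals: defaultdict[str, int] = defaultdict(int)
    for _user, artist, count in records:
        totals[artist] += count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```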

5 Collaborative Filtering in Project Aura
● More Aura details coming later
● Collaborative filtering is what puts the "hybrid" in Aura's hybrid recommender system
● Main concerns for filtering algorithms:
> Stability – How much can a recommendation change?
> Computability – How long does it take to find the answer?

6 Item:Item Collaborative Filtering
● Users are dimensions
● Items are vectors
● Similarity is the cosine of the angle between two item vectors (cosine similarity; cosine distance is 1 minus this value)
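With users as dimensions, each item is a sparse vector of per-user values, and the similarity of two items is their dot product divided by the product of their vector lengths. A minimal sketch, assuming items are stored as Python dicts mapping a user id to a listen count (the representation and names are illustrative):

```python
import math
from typing import Mapping

# An item vector maps user id -> listen count; each user is one dimension.
ItemVector = Mapping[str, float]


def cosine_similarity(a: ItemVector, b: ItemVector) -> float:
    """Cosine of the angle between two sparse item vectors.

    Users missing from a vector are treated as zeros, so only users
    present in both vectors contribute to the dot product.
    """
    # Walk the smaller vector when computing the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(value * b[user] for user, value in a.items() if user in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty or all-zero vector has no direction
    return dot / (norm_a * norm_b)
```

Items with no listeners in common score 0.0, and an item compared with itself scores 1.0.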

7 Project Caroline
● Designed for internet applications
● Utility-style pricing – pay for what you use
● Multiple processes distributed across multiple machines
● Shared file storage
● No shared memory

8 The Aura Datastore
● Requests funneled through the Data Store Head
● Subtrees distributed to Partition Clusters running in separate processes
> Process coordination using Jini

9 Cosine Generation Overview

10 Join Methods
● Composition
> For a single record set, perform an operation on the list of all records with a given key
> (Artist, )
● Cartesian Join
> For n input record sets, form every pairing of records with matching keys
> $(\text{Artist}_A, \text{Length}_A) \times (\text{Artist}_B, \text{Length}_B) = (\text{Artist}_A.\text{Artist}_B, \text{Length}_A \cdot \text{Length}_B)$
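The two join methods can be sketched over in-memory (key, value) records. This is an illustration, not Project Aura code: `compose` and `cartesian_join` are hypothetical names, and a file-based version would stream record files instead of Python lists. The `combine` callback builds each output record, which is how two (artist, length) records become one (artist_A.artist_B, length_A * length_B) record.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Hashable, Iterable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")
R = TypeVar("R")


def compose(records: Iterable[tuple[K, V]], op: Callable[[list[V]], R]) -> dict[K, R]:
    """Composition: apply op to the list of all values that share a key."""
    groups: dict = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: op(values) for key, values in groups.items()}


def cartesian_join(record_sets, combine):
    """Cartesian join over n >= 1 record sets: for each key present in every
    set, take every combination of one matching value per set and let
    `combine` turn that combination into a new (key, value) record."""
    grouped = []
    for records in record_sets:
        group = defaultdict(list)
        for key, value in records:
            group[key].append(value)
        grouped.append(group)
    shared = set(grouped[0]).intersection(*grouped[1:])
    for key in sorted(shared):  # sorted only to make output order deterministic
        for combo in product(*(group[key] for group in grouped)):
            yield combine(combo)
```

For example, storing every length record under one shared key and joining the length set with itself yields one record per ordered artist pair, such as `("a.b", 5.0)` for lengths 5.0 and 1.0.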

11 Partitioning
● Collect all records with matching keys in a single file
● Run in m processes for m output files
● Each process puts records in a set of shared files as determined by a common hashing scheme
● File locking is necessary to prevent concurrent access
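The partitioning step can be sketched with a deterministic hash and an advisory file lock. Assumptions: records are (key, value) pairs written as tab-separated lines, output files use a hypothetical `part-NNNNN.tsv` naming scheme, and the shared storage honours POSIX `flock()` (the slides do not say whether Project Caroline's shared file system did). CRC-32 stands in for the unspecified "common hashing scheme".

```python
import fcntl
import os
import zlib


def partition_for(key: str, m: int) -> int:
    """Stable partition index for a key.

    Python's built-in hash() of str is randomized per process, so separate
    processes would disagree; CRC-32 gives every process the same answer.
    """
    return zlib.crc32(key.encode("utf-8")) % m


def write_partitioned(records, out_dir: str, m: int) -> None:
    """Append (key, value) records to m shared files so that every record
    with a given key lands in the same file. Keys and values must not
    contain tabs or newlines."""
    buckets = [[] for _ in range(m)]
    for key, value in records:
        buckets[partition_for(key, m)].append(f"{key}\t{value}\n")
    for index, lines in enumerate(buckets):
        if not lines:
            continue
        path = os.path.join(out_dir, f"part-{index:05d}.tsv")
        with open(path, "a", encoding="utf-8") as handle:
            # Exclusive lock so concurrent processes cannot interleave lines.
            fcntl.flock(handle, fcntl.LOCK_EX)
            try:
                handle.writelines(lines)
                handle.flush()  # push buffered data out before unlocking
            finally:
                fcntl.flock(handle, fcntl.LOCK_UN)
```

Buffering each bucket first means a process takes at most m locks per call rather than one per record.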

12 Cosine Generation As Tuples
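Cosine generation can be expressed as a sequence of tuple passes built from composition and joins. A minimal in-memory sketch, assuming input records are (user, artist, count) triples with one record per user/artist pair and positive counts; each pass stands in for a file-to-file stage, and the function name is hypothetical:

```python
import math
from collections import defaultdict
from itertools import combinations


def item_item_cosines(records):
    """Cosine similarity for every artist pair that shares at least one user.

    Pass 1 (composition on artist): vector length = sqrt(sum of count^2).
    Pass 2 (join on user): for each pair of artists a user listened to,
           emit ((artist_a, artist_b), count_a * count_b) with artist_a < artist_b.
    Pass 3 (composition on the pair key): sum the products -> dot product.
    Pass 4: divide each dot product by length_a * length_b.
    """
    records = list(records)

    # Pass 1: composition keyed on artist.
    squares = defaultdict(float)
    for _user, artist, count in records:
        squares[artist] += count * count
    lengths = {artist: math.sqrt(total) for artist, total in squares.items()}

    # Pass 2: group by user, then pair up that user's artists once each.
    by_user = defaultdict(list)
    for user, artist, count in records:
        by_user[user].append((artist, count))
    products = []
    for items in by_user.values():
        for (a, count_a), (b, count_b) in combinations(sorted(items), 2):
            products.append(((a, b), count_a * count_b))

    # Pass 3: composition keyed on the artist pair.
    dots = defaultdict(float)
    for pair, value in products:
        dots[pair] += value

    # Pass 4: normalize by the product of the two vector lengths.
    return {(a, b): dot / (lengths[a] * lengths[b]) for (a, b), dot in dots.items()}
```

Artist pairs with no listener in common never appear in pass 2, so they are simply absent from the result (their cosine is 0).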

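13 Computational Complexity Optimizations
● Exploit symmetry in output files: joining file i with file j duplicates joining file j with file i, so only n(n+1)/2 of the n² file pairs need joining
● Exploit symmetry in records: cos(A, B) = cos(B, A), so only about (.5)n² record pairs need joining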
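Assuming the intended counts are n(n+1)/2 file-pair joins (pairs with i ≤ j) and roughly n²/2 record pairs (pairs with A < B), the reduced enumeration can be sketched with the standard library's `itertools`; the function names are illustrative:

```python
from itertools import combinations, combinations_with_replacement


def file_pairs(m: int) -> list[tuple[int, int]]:
    """Output-file pairs (i, j) with i <= j: joining file i with file j gives
    the same cosines as joining j with i, so only one order is kept. The
    diagonal (i == j) stays because a file must still be joined with itself."""
    return list(combinations_with_replacement(range(m), 2))


def record_pairs(records: list) -> list[tuple]:
    """Record pairs (A, B) with A before B: cos(A, B) == cos(B, A), and
    cos(A, A) is always 1, so neither the reverse pair nor the self pair is needed."""
    return list(combinations(records, 2))
```

For 4 output files this gives 10 file joins instead of 16, and 100 records give 4,950 pairs instead of 10,000.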

14 Any Questions?
● Will Holcomb
● hoenir.himinbi.org

