Recommendations. Copyright © Software Carpentry 2010. This work is licensed under the Creative Commons Attribution License.

Similar presentations
A. The Basic Principle We consider the multivariate extension of multiple linear regression – modeling the relationship between m responses Y 1,…,Y m and.

Slide deck by Dr. Greg Reese Miami University MATLAB An Introduction With Applications, 5 th Edition Dr. Amos Gilat The Ohio State University Chapter 3.
Jeff Howbert Introduction to Machine Learning Winter Collaborative Filtering Nearest Neighbor Approach.
Matrix Algebra Matrix algebra is a means of expressing large numbers of calculations made upon ordered sets of numbers. Often referred to as Linear Algebra.
Distance and Similarity Measures
The Laws of Linear Combination James H. Steiger. Goals for this Module In this module, we cover What is a linear combination? Basic definitions and terminology.
Regression Analysis Using Excel. Econometrics Econometrics is simply the statistical analysis of economic phenomena Here, we just summarize some of the.
Chapter 1 Computing Tools Data Representation, Accuracy and Precision Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction.
Lecture 4 Linear Filters and Convolution
Copyright 2004 David J. Lilja1 What Do All of These Means Mean? Indices of central tendency Sample mean Median Mode Other means Arithmetic Harmonic Geometric.
Chapter 5 Orthogonality
Calculating Spectral Coefficients for Walsh Transform using Butterflies Marek Perkowski September 21, 2005.
Recommender Systems. Collaborative Filtering Process.
Richard Fateman CS 282 Lecture 111 Determinants Lecture 11.
Learn how to make your drawings come alive…  NEW COURSE: SKETCH RECOGNITION Analysis, implementation, and comparison of sketch recognition algorithms,
Concatenation MATLAB lets you construct a new vector by concatenating other vectors: – A = [B C D... X Y Z] where the individual items in the brackets.
Basic Mathematics for Portfolio Management. Statistics Variables x, y, z Constants a, b Observations {x n, y n |n=1,…N} Mean.
Matrix Mathematics in MATLAB and Excel
Linear regression models in matrix terms. The regression function in matrix terms.
Copyright © Cengage Learning. All rights reserved. 7.6 The Inverse of a Square Matrix.
Multivariate Data and Matrix Algebra Review BMTRY 726 Spring 2012.
Separate multivariate observations
Review Guess the correlation. A.-2.0 B.-0.9 C.-0.1 D.0.1 E.0.9.
1 Chapter 2 Matrices Matrices provide an orderly way of arranging values or functions to enhance the analysis of systems in a systematic manner. Their.
Chapter 7 Matrix Mathematics Matrix Operations Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Portfolio Statistics  Portfolio Expected Return: E(r p ) = w T r  Portfolio Variance:  2 p = w T  w  Sum of portfolio weights: = w T 1 –where w is.
Chapter 10 Review: Matrix Algebra
David Corne, and Nick Taylor, Heriot-Watt University - These slides and related resources:
Linear Recurrence Relations in Music By Chris Hall.
Advanced Computer Graphics Spring 2014 K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology.
Array Math.
Matrices Square is Good! Copyright © 2014 Curt Hill.
Introduction to Regression Analysis. Two Purposes Explanation –Explain (or account for) the variance in a variable (e.g., explain why children’s test.
1 1 Slide © 2004 Thomson/South-Western Chapter 17 Multicriteria Decisions n Goal Programming n Goal Programming: Formulation and Graphical Solution and.
MATLAB for Engineers 4E, by Holly Moore. © 2014 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved. This material is protected by Copyright.
Statistics and Linear Algebra (the real thing). Vector A vector is a rectangular arrangement of number in several rows and one column. A vector is denoted.
Correlation is a statistical technique that describes the degree of relationship between two variables when you have bivariate data. A bivariate distribution.
CMPS 1371 Introduction to Computing for Engineers MATRICES.
Multivariate Statistics Matrix Algebra I W. M. van der Veld University of Amsterdam.
Mean and Standard Deviation of Discrete Random Variables.
Scientific Computing General Least Squares. Polynomial Least Squares Polynomial Least Squares: We assume that the class of functions is the class of all.
A n = c 1 a n-1 + c2an-2 + … + c d a n-d d= degree and t= the number of training data (notes) The assumption is that the notes in the piece are generated.
11/23/2015Slide 1 Using a combination of tables and plots from SPSS plus spreadsheets from Excel, we will show the linkage between correlation and linear.
The Pearson Product-Moment Correlation Coefficient.
Introduction Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Linear Algebra Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Correlation. Correlation is a measure of the strength of the relation between two or more variables. Any correlation coefficient has two parts – Valence:
Chapter 6: Analyzing and Interpreting Quantitative Data
Basics Copyright © Software Carpentry 2011 This work is licensed under the Creative Commons Attribution License See
INTRODUCTION TO MATLAB DAVID COOPER SUMMER Course Layout SundayMondayTuesdayWednesdayThursdayFridaySaturday 67 Intro 89 Scripts 1011 Work
Exceptions Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
WEIGHTED AVERAGE ALG114 Weighted Average: an average where every quantity is assigned a weight. Example: If a teacher thinks it’s more important, a final.
Quality Assessment based on Attribute Series of Software Evolution Paper Presentation for CISC 864 Lionel Marks.
Indexing Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
 1 More Mathematics: Finding Minimum. Numerical Optimization Find the minimum of If a given function is continuous and differentiable, find the root.
Copyright © Cengage Learning. All rights reserved. 8 Matrices and Determinants.
Chapter 7 Matrix Mathematics
Sets and Dictionaries Examples Copyright © Software Carpentry 2010
Regression.
Singular Value Decomposition
Collaborative Filtering Nearest Neighbor Approach
WUPS: Find the determinates by hand
Since When is it Standard to Be Deviant?
Chap. 7 Regularization for Deep Learning (7.8~7.12 )
MATLAB Programming Indexing Copyright © Software Carpentry 2011
Matrix Algebra THE INVERSE OF A MATRIX © 2012 Pearson Education, Inc.
Presentation transcript:

Matrix Programming: Recommendations
Copyright © Software Carpentry 2010. This work is licensed under the Creative Commons Attribution License. See for more information.

Matrices / Recommendations

NumPy is a numerical library for Python:
– Provides matrices and tools to manipulate them
– Plus a large library of linear algebra operations
Create a complete program

What papers should scientists read? One good answer is "what their colleagues read":
– Input: scientists' ratings of papers
– Output: what they should read
Based on an example from Segaran's Programming Collective Intelligence

What criteria can we use to recommend papers?
1. The way the paper was rated by other people.
2. How similar those raters' previous ratings are to the individual's own ratings.

Plan:
1. Process people's ratings of various papers and store them in a NumPy array
2. Introduce two similarity measures
3. Generate recommendations

Input is triples: (person, paper, score)
This is (very) sparse data, so store it in a dictionary:

raw_scores = {
    'Bhargan Basepair' : {
        'Jackson 1999' : 2.5,
        'Chen 2002'    : 3.5,
    },
    'Fan Fullerene' : {
        'Jackson 1999' : 3.0,
        ...

We should really use a DOI or some other unique ID for papers.

Turn this dictionary into a dense array (SciPy contains sparse arrays for large data sets):

                    Jackson 1999    Chen 2002    ...
Bhargan Basepair        2.5            3.5
Fan Fullerene           3.0            ...
...

def prep_data(all_scores):
    # Names of all people in alphabetical order.
    people = all_scores.keys()
    people.sort()
    # Names of all papers in alphabetical order.
    papers = set()
    for person in people:
        for title in all_scores[person].keys():
            papers.add(title)
    papers = list(papers)
    papers.sort()

def prep_data(scores):
    ...
    # Create and fill array.
    ratings = numpy.zeros((len(people), len(papers)))
    for (person_id, person) in enumerate(people):
        for (title_id, title) in enumerate(papers):
            r = scores[person].get(title, 0)
            ratings[person_id, title_id] = float(r)
    return people, papers, ratings
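The two halves of prep_data above can be sketched as one self-contained Python 3 program (in Python 3, dict.keys() can no longer be sorted in place, so this sketch uses sorted(); the sample ratings are the ones from the raw_scores slide):

```python
import numpy as np

def prep_data(all_scores):
    # Names of all people and all papers, in alphabetical order.
    people = sorted(all_scores)
    papers = sorted({title for person in people
                           for title in all_scores[person]})
    # Create and fill the dense array; missing ratings become 0.
    ratings = np.zeros((len(people), len(papers)))
    for (person_id, person) in enumerate(people):
        for (title_id, title) in enumerate(papers):
            ratings[person_id, title_id] = float(all_scores[person].get(title, 0))
    return people, papers, ratings

raw_scores = {
    'Bhargan Basepair': {'Jackson 1999': 2.5, 'Chen 2002': 3.5},
    'Fan Fullerene':    {'Jackson 1999': 3.0},
}
people, papers, ratings = prep_data(raw_scores)
# people  -> ['Bhargan Basepair', 'Fan Fullerene']
# papers  -> ['Chen 2002', 'Jackson 1999']
# ratings -> [[3.5, 2.5],
#             [0.0, 3.0]]
```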

Next step is to compare sets of ratings. There are many ways to do this; we will consider:
– Inverse sums of squares
– Pearson correlation coefficient
Remember: 0 in the matrix means "no rating".
It doesn't make sense to compare ratings unless both people have read the paper, so limit our metrics by masking the array.

Inverse sum of squares uses the distance between N-dimensional vectors. In 2D, this is:
    distance² = (x_A - x_B)² + (y_A - y_B)²
Define similarity as:
    sim(A, B) = 1 / (1 + norm(D)²)
where D is the vector of differences between the papers that both have rated.
No diffs => sim(A, B) = 1
Many/large diffs => sim(A, B) is small

def sim_distance(prefs, left_index, right_index):
    # Where do both people have preferences?
    left_has_prefs = prefs[left_index, :] > 0
    right_has_prefs = prefs[right_index, :] > 0
    mask = numpy.logical_and(left_has_prefs, right_has_prefs)
    # Not enough signal.
    if numpy.sum(mask) < EPS:
        return 0

def sim_distance(prefs, left_index, right_index):
    ...
    # Return inverse sum-of-squares similarity.
    diff = prefs[left_index, mask] - prefs[right_index, mask]
    sum_of_squares = numpy.linalg.norm(diff) ** 2
    return 1 / (1 + sum_of_squares)
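A compact, self-contained version of sim_distance (the EPS guard is replaced by an explicit count of shared ratings, and the example array is invented for illustration):

```python
import numpy as np

def sim_distance(prefs, left_index, right_index):
    # Compare only papers that both people have rated (rating > 0).
    mask = (prefs[left_index, :] > 0) & (prefs[right_index, :] > 0)
    if np.sum(mask) == 0:          # no papers in common
        return 0.0
    diff = prefs[left_index, mask] - prefs[right_index, mask]
    sum_of_squares = np.sum(diff ** 2)
    return 1.0 / (1.0 + sum_of_squares)

prefs = np.array([[2.0, 3.0, 0.0],
                  [2.0, 1.0, 4.0]])
# Shared papers: columns 0 and 1; differences [0, 2]; sum of squares 4.
print(sim_distance(prefs, 0, 1))   # 1 / (1 + 4) = 0.2
```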

What if two people rate many of the same papers, but one always rates them lower than the other? If they rank papers the same but use a different scale, we want to report that they rate papers the same way. Pearson's correlation reports the correlation between two individuals rather than the absolute difference.

Pearson's correlation score measures how well a best-fit line describes the relationship between two individuals' ratings.
[Figure: scatter plot with one axis per rater (Greg Wilson and Tommy Guy); each dot is a paper, and its X and Y values are the two raters' scores. Here the score is high because the line fits quite well.]

To calculate Pearson's correlation, we need to introduce two quantities:
The variance measures divergence from the mean (the standard deviation is its square root):
    Var(X) = E(X²) - E(X)²
The covariance measures how two variables change together:
    Cov(X, Y) = E(XY) - E(X)E(Y)

Pearson's correlation is:
    r = Cov(X, Y) / (StdDev(X) * StdDev(Y))
Use NumPy to calculate both terms.

If a and b are N×1 arrays, then numpy.cov(a, b) returns a 2×2 array:

    [ Var(a)     Cov(a, b) ]
    [ Cov(a, b)  Var(b)    ]

Use it to calculate the numerator and denominator.
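The 2×2 layout of numpy.cov can be checked directly (made-up data; note that numpy.cov uses the sample estimator, dividing by N-1, but that factor cancels when computing r):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # b = 2 * a, so perfectly correlated
vc = np.cov(a, b)
# vc[0, 0] is Var(a), vc[1, 1] is Var(b),
# vc[0, 1] == vc[1, 0] is Cov(a, b).
r = vc[0, 1] / (np.sqrt(vc[0, 0]) * np.sqrt(vc[1, 1]))
print(r)   # 1.0
```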

def sim_pearson(prefs, left_index, right_index):
    # Where do both have ratings?
    rating_left = prefs[left_index, :]
    rating_right = prefs[right_index, :]
    mask = numpy.logical_and(rating_left > 0, rating_right > 0)
    # Summing over Booleans gives the number of Trues.
    num_common = sum(mask)
    # Return zero if there are no common ratings.
    if num_common == 0:
        return 0

def sim_pearson(prefs, left_index, right_index):
    ...
    # Calculate Pearson score "r".
    varcovar = numpy.cov(rating_left[mask], rating_right[mask])
    numerator = varcovar[0, 1]
    denominator = numpy.sqrt(varcovar[0, 0]) * numpy.sqrt(varcovar[1, 1])
    if denominator < EPS:
        return 0
    r = numerator / denominator
    return r
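Put together as a runnable Python 3 sketch (EPS is assumed to be a small constant defined elsewhere in the deck; the two raters below agree on ranking but use different scales):

```python
import numpy as np

EPS = 1e-6   # assumed small-denominator guard

def sim_pearson(prefs, left_index, right_index):
    rating_left = prefs[left_index, :]
    rating_right = prefs[right_index, :]
    mask = np.logical_and(rating_left > 0, rating_right > 0)
    if np.sum(mask) == 0:
        return 0.0
    varcovar = np.cov(rating_left[mask], rating_right[mask])
    denominator = np.sqrt(varcovar[0, 0]) * np.sqrt(varcovar[1, 1])
    if denominator < EPS:
        return 0.0
    return varcovar[0, 1] / denominator

prefs = np.array([[1.0, 2.0, 3.0],     # same ranking...
                  [2.0, 4.0, 6.0]])    # ...twice the scale
print(sim_pearson(prefs, 0, 1))   # 1.0: the scale difference is ignored
```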

Now that we have the scores, we can:
1. Find people who rate papers most similarly
2. Find papers that are rated most similarly
3. Recommend papers for individuals based on the rankings of other people and their similarity with this person's previous rankings

To find individuals with the most similar ratings, apply a similarity algorithm to compare each person to every other person. Sort the results to list the most similar people first.

def top_matches(ratings, person, num, similarity):
    scores = []
    for other in range(ratings.shape[0]):
        if other != person:
            s = similarity(ratings, person, other)
            scores.append((s, other))
    scores.sort()
    scores.reverse()
    return scores[0:num]
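top_matches can be exercised with a toy ratings matrix and the inverse-sum-of-squares similarity (both reimplemented here so the sketch stands alone; the numbers are invented):

```python
import numpy as np

def sim_distance(prefs, left_index, right_index):
    mask = (prefs[left_index, :] > 0) & (prefs[right_index, :] > 0)
    if np.sum(mask) == 0:
        return 0.0
    diff = prefs[left_index, mask] - prefs[right_index, mask]
    return 1.0 / (1.0 + np.sum(diff ** 2))

def top_matches(ratings, person, num, similarity):
    scores = []
    for other in range(ratings.shape[0]):
        if other != person:
            scores.append((similarity(ratings, person, other), other))
    scores.sort()
    scores.reverse()          # most similar people first
    return scores[0:num]

ratings = np.array([[2.0, 3.0, 0.0],
                    [2.0, 3.0, 4.0],   # agrees with person 0 where they overlap
                    [5.0, 1.0, 0.0]])
print(top_matches(ratings, 0, 2, sim_distance))
# Person 1 is the best match (similarity 1.0), then person 2 (1/14).
```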

Use the same idea to compute papers that are most similar. Since both similarity functions compare rows of the data matrix, we must transpose it, and change names to refer to papers, not people.

def similar_items(paper_ids, ratings, num=10):
    result = {}
    ratings_by_paper = ratings.T
    for item in range(ratings_by_paper.shape[0]):
        temp = top_matches(ratings_by_paper, item, num, sim_distance)
        scores = []
        for (score, name) in temp:
            scores.append((score, paper_ids[name]))
        result[paper_ids[item]] = scores
    return result

Finally, suggest papers based on their ratings by people who rated other papers similarly. The recommendation score is the weighted average of paper scores, with weights assigned based on the similarity between individuals. Only recommend papers that have not been rated yet.

def recommendations(prefs, person_id, similarity):
    totals, sim_sums = {}, {}
    num_people, num_papers = prefs.shape
    for other_id in range(num_people):
        # Don't compare people to themselves.
        if other_id == person_id:
            continue
        sim = similarity(prefs, person_id, other_id)
        if sim < EPS:
            continue

def recommendations(prefs, person_id, similarity):
    ...
    for other_id in range(num_people):
        ...
        for title in range(num_papers):
            # Only score papers person hasn't seen yet.
            if prefs[person_id, title] < EPS and \
               prefs[other_id, title] > 0:
                if title in totals:
                    totals[title] += sim * prefs[other_id, title]
                else:
                    totals[title] = sim * prefs[other_id, title]

def recommendations(prefs, person_id, similarity):
    ...
    for other_id in range(num_people):
        ...
        for title in range(num_papers):
            ...
            # Sum of similarities.
            if title in sim_sums:
                sim_sums[title] += sim
            else:
                sim_sums[title] = sim

def recommendations(prefs, person_id, similarity):
    ...
    # Create the normalized list.
    rankings = []
    for title, total in totals.items():
        rankings.append((total / sim_sums[title], title))
    # Return the sorted list, best first.
    rankings.sort()
    rankings.reverse()
    return rankings
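The whole pipeline can be condensed into a standalone sketch (the dict.get idiom replaces the if/else bookkeeping from the slides, and the two-person ratings matrix is invented):

```python
import numpy as np

EPS = 1e-6   # assumed threshold, as in the slides

def sim_distance(prefs, left_index, right_index):
    mask = (prefs[left_index, :] > 0) & (prefs[right_index, :] > 0)
    if np.sum(mask) == 0:
        return 0.0
    diff = prefs[left_index, mask] - prefs[right_index, mask]
    return 1.0 / (1.0 + np.sum(diff ** 2))

def recommendations(prefs, person_id, similarity):
    totals, sim_sums = {}, {}
    num_people, num_papers = prefs.shape
    for other_id in range(num_people):
        if other_id == person_id:
            continue
        sim = similarity(prefs, person_id, other_id)
        if sim < EPS:
            continue
        for title in range(num_papers):
            # Only score papers the person hasn't rated yet.
            if prefs[person_id, title] < EPS and prefs[other_id, title] > 0:
                totals[title] = totals.get(title, 0.0) + sim * prefs[other_id, title]
                sim_sums[title] = sim_sums.get(title, 0.0) + sim
    # Normalize by total similarity and list the best papers first.
    rankings = [(total / sim_sums[title], title)
                for (title, total) in totals.items()]
    rankings.sort()
    rankings.reverse()
    return rankings

ratings = np.array([[2.0, 3.0, 0.0],    # person 0 has not read paper 2
                    [2.0, 3.0, 4.0]])   # person 1 rated it 4.0
print(recommendations(ratings, 0, sim_distance))   # [(4.0, 2)]
```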

Major points:
1. Mathematical operations on the matrix were all handled by NumPy
2. We still had to take care of data (re)formatting ourselves

November 2010. Created by Richard T. Guy. Copyright © Software Carpentry 2010. This work is licensed under the Creative Commons Attribution License. See for more information.