1
Sorting Music, Intelligently
2
Mathematical models for song recommendation engines
Algorithms by Blake Ottinger, Kevin Todd, Steven Tucker
3
1. What makes music likeable?
Music tastes are often highly subjective and at times elusive. Recommendation engines rely on analyzing the attributes of music that make one song "similar" to another.
4
Contextual clues
Categorization by genre (broad classification)
Classification by artist
Classification by year (or era, e.g. 70s, 80s, etc.)
Classification by tags: tags are manually created when songs are entered into the database, and each song is classified with multiple tags. Because tags are human-entered for each song, the accuracy of these tags is critical.
5
Database Formatted with the following structure:
Artist - Song Title, Year, Genre, Listens, (Tag1, Tag2, Tag3)
Listens is a count of current listens on Spotify. There are about 20 possible tags in the database, and we have taken care to ensure their accuracy. The songs are parsed from this file into a Python list, with each item in the list representing one song object.
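A minimal parsing sketch, assuming one song per line in the layout above; the Song class, the regular expression, and the file name songs.txt are illustrative rather than the project's actual code.

import re
from dataclasses import dataclass

@dataclass
class Song:
    artist: str
    title: str
    year: int
    genre: str
    listens: int
    tags: list

# Expected line layout: Artist - Song Title, Year, Genre, Listens, (tag1, tag2, tag3)
LINE_RE = re.compile(r"(.+?) - (.+?), (\d{4}), (.+?), (\d+), \((.*)\)")

def parse_line(line):
    match = LINE_RE.match(line.strip())
    if match is None:
        return None                      # skip malformed lines
    artist, title, year, genre, listens, tags = match.groups()
    return Song(artist, title, int(year), genre, int(listens),
                [t.strip() for t in tags.split(",") if t.strip()])

def load_songs(path="songs.txt"):        # file name is an assumption
    with open(path) as f:
        return [song for song in map(parse_line, f) if song is not None]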
6
Song Database We have songs stored in a formatted text file, available for the algorithm to choose from.
7
INFORMAL PROBLEM STATEMENT
Design a machine-learning algorithm to efficiently sort through a finite database of music stored in a .txt file, ranking songs with a similarity variable based on a comparison of user-inputted music preferences against each chosen song's data.
8
FORMAL PROBLEM STATEMENT
Given a set S of songs, distribute each item of S into a distinct list according to the song's genre, tags, and era. Given a set Q of user-preferred data, iterate through S, compute a similarity rating for each song (calculated below), and append songs with a high similarity rating to a list of suggested songs. The similarity of a suggested song is the percentage of matched data, m, over total compared data, n, multiplied by 100: similarity = (m / n) × 100. Finally, print the results, presenting the song suggestions with the highest calculated similarity to the user.
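A small worked sketch of that similarity calculation, assuming m and n are plain counts of matched and compared attributes:

def similarity(matched, compared):
    # similarity = (m / n) * 100, i.e. the percentage of compared data that matched
    if compared == 0:
        return 0.0
    return (matched / compared) * 100.0

# Example: 3 of 4 compared attributes match -> similarity of 75.0
print(similarity(3, 4))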
9
2. Ways it’s been done One problem. Multiple implementations. Cue - Steven.
10
Computers can’t listen to music…
Music tastes are human by nature and can be highly subjective, but computer scientists have found ways to algorithmically generate accurate, optimized suggestions. Algorithms rely on a numerical representation of a song's attributes; they must mathematically evaluate the data from these attributes and make educated guesses about complex tastes. As a result, different algorithms often produce varied results and may be practical for different purposes.
11
Who Else Has Done It? Pandora, Spotify, Apple Music
12
3. Our Approach Three algorithms that take different approaches to optimizing music suggestions Cue - Kevin
13
K-Nearest Neighbors Converts categorical string data into numerical data that can be manipulated. Plots the converted data onto a Cartesian graph. Determines focus points (P). Draws a radius around the focus points (K). Derives similarity by Euclidean distance.
14
Converting Data Converting data is crucial for this algorithm to retain accuracy. X-axis: Metal to Electronic. Y-axis: Rage to Cheesy.
15
Converting Data Courtesy of MusicMap.info
16
Plotting Data and Determining Focus Points
Songs are plotted by their primary genre (x-axis) and a mathematical average of their tags (y-axis). User preference is split into three focus points; each K is determined by the user's primary, secondary, and tertiary preferences.
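A hedged sketch of this conversion and focus-point step, assuming simple 0-to-1 scales for the Metal-to-Electronic and Rage-to-Cheesy axes; the scale values and label names below are placeholders for the MusicMap-based ordering on the earlier slides.

# Illustrative axis scales (the real ordering comes from MusicMap.info)
GENRE_X = {"metal": 0.0, "rock": 0.25, "pop": 0.5, "hip hop": 0.75, "electronic": 1.0}
TAG_Y = {"rage": 0.0, "dark": 0.25, "chill": 0.5, "upbeat": 0.75, "cheesy": 1.0}

def song_point(genre, tags):
    # x: position of the song's primary genre, y: average of its tag scores
    x = GENRE_X.get(genre.lower(), 0.5)
    scores = [TAG_Y[t.lower()] for t in tags if t.lower() in TAG_Y]
    y = sum(scores) / len(scores) if scores else 0.5
    return (x, y)

def focus_points(preferences):
    # preferences: the user's primary, secondary, and tertiary (genre, tag) picks
    return [(GENRE_X.get(g.lower(), 0.5), TAG_Y.get(t.lower(), 0.5))
            for g, t in preferences[:3]]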
17
Converting Data
18
Determining Similarity
Using the Euclidean distance from the focus points to candidate points (within K). Points at a closer distance have higher similarity.
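A minimal ranking sketch, assuming songs have already been converted to (x, y) points; the radius argument stands in for K and its value is illustrative.

import math

def euclidean(p, q):
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def rank_within_radius(song_points, focus, radius):
    # song_points: {title: (x, y)}. Keep only points inside the radius drawn around
    # the focus point, then sort: smaller distance = higher similarity.
    candidates = [(euclidean(point, focus), title)
                  for title, point in song_points.items()
                  if euclidean(point, focus) <= radius]
    return sorted(candidates)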
19
Results
20
Pros and Cons
Pros: Consistent. Fast with a small n value. Easy to implement with different test data.
Cons: Shallow optimization. Dependent on graph setup. Slow with a large n value.
21
Probability Sliders Creates a list of user preferences for genres, tags, and eras. Slider: each genre, tag, and era has a float in (0, 1) associated with it, representing how much the user likes that label. If a potential new song meets the requirements, the user's Sliders are updated based on that song's labels. Cue - Steven
22
User Sliders Initialization
Initial user Slider values are based on the songs the user says they like. If 2 out of 10 total songs have the tag "happy", the "happy" Slider is set to 0.20. Repeat for each era, genre, and other tag. Sliders not given a value are assigned 0.10 to increase diversity.
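A minimal initialization sketch: the 0.10 default and the 2-of-10 → 0.20 rule come from the slide, while the function shape and names are assumptions.

from collections import Counter

def init_sliders(liked_songs, all_labels, default=0.10):
    # liked_songs: one list of labels (genre, era, tags) per song the user says they like
    counts = Counter(label for labels in liked_songs for label in set(labels))
    total = len(liked_songs)
    # A label present in 2 of 10 liked songs gets 0.20; labels never seen get the default
    return {label: (counts[label] / total if counts[label] else default)
            for label in all_labels}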
23
User Sliders Visualization
Sliders are generated from the songs the user inputs. For testing, arbitrary and random values were used. In reality, the Sliders would be created automatically based on recorded data about what the user listens to.
24
Evaluating a Potential Song
S ← song Slider, U ← user Slider. For each era, genre, and tag, compare a random float (from random.random()) to U. If random < U, add that random to S; else, add random / 4 to S, giving the song a small chance to be accepted.
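A hedged sketch of that evaluation loop; the slides only describe the per-label comparison, so combining the contributions into a single Fitness value in (0, 1) by averaging is an assumption here.

import random

def song_fitness(song_labels, user_sliders, default=0.10):
    # S <- song score, U <- user Slider for each label of the candidate song
    score = 0.0
    for label in song_labels:
        u = user_sliders.get(label, default)
        r = random.random()
        if r < u:
            score += r        # draw "accepted" by the user Slider
        else:
            score += r / 4    # smaller contribution, so the song keeps a small chance
    # Averaging is an assumption, made to keep Fitness within (0, 1)
    return score / len(song_labels) if song_labels else 0.0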
25
O(g + e + t), where g ← number of genres, e ← number of eras, t ← number of tags
Algorithm: O(n), where n is the number of songs to be accepted
26
Accept / Decline If random.random() < Fitness, the song is accepted as a suitable song: increase the user Sliders by a percentage and, ideally, add the song to the playlist. Else, decline the song: for each of the song's labels, set the user Slider to 95% of its original value. Continue to evaluate the next song.
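A sketch of the accept/decline step: the 95% decay on decline is stated on the slide, while the 5% boost on acceptance is an assumed value for "increases user Sliders by a %".

import random

def accept_or_decline(song_labels, user_sliders, fitness,
                      boost=1.05, decay=0.95, default=0.10):
    if random.random() < fitness:
        # Accepted: nudge each of the song's Sliders upward (and, ideally, add the song to the playlist)
        for label in song_labels:
            user_sliders[label] = min(1.0, user_sliders.get(label, default) * boost)
        return True
    # Declined: shrink each of the song's Sliders to 95% of its original value
    for label in song_labels:
        user_sliders[label] = user_sliders.get(label, default) * decay
    return False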
27
Percentage Sliders Pros and Cons
Pros: Easy to visualize. Works well with mid-size user databases. Quick comparisons. Easy manipulation.
Cons: Selecting 1 song will not be enough. Up to random probability. Less likely to accept as you request more songs. Takes genre, era, and tags equally into account.
28
Multiple Correspondence Analysis [MCA]
Each attribute of a song is converted to a numeric value representing a comparison to the user's preferences. Higher comparison values = higher similarity. These values are combined mathematically into a single "similarity score" for the song, which represents the song's likeability. The highest scores are ranked first and are picked for recommendations to the user. Cue - Blake.
29
Measuring User Preferences
We need to know how much artists, genres, eras, and tags should weigh, respectively (i.e., how important each factor is to the user). To do this, we gather slightly redundant user preference data and aggregate it into sets A and B. MCA then measures the correlation along each column of data sets A and B, and sets the weight for each attribute according to that correlation.
30
Mathematical Correlation Analysis (cont.)
A consistency value of 10 indicates a strong correlation. Strong correlation indicates that the attribute type is more important to the user, and should be weighed more heavily.
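The slides do not spell out the correlation computation, so the sketch below is only one plausible reading: per-attribute consistency is the fraction of the user's favorite songs (set B) that agree with the stated preferences (set A), scaled to 0-10 and reused as the attribute weight.

def attribute_consistency(stated, favorites):
    # stated: set A, e.g. {"genre": "Rock", "artist": {"AJR", "Rush"}, "tags": {"upbeat", "happy"}}
    # favorites: set B, a list of dicts with the same keys, one per favorite song
    weights = {}
    for attribute, preferred in stated.items():
        matches = 0
        for song in favorites:
            value = song.get(attribute)
            if isinstance(preferred, set):
                value_set = set(value) if isinstance(value, (set, list, tuple)) else {value}
                matches += bool(preferred & value_set)
            else:
                matches += (value == preferred)
        # 10 means every favorite song agrees with the stated preference
        weights[attribute] = 10 * matches / len(favorites) if favorites else 0
    return weights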
31
Example: Measuring correlation
Data set A (primitive data) for user Sam:
Favorite Genre: Rock. Favorite Artists: AJR, Rush. Preferred Tags: Upbeat, happy. Preferred Era: 80s.
Data set B (favorite songs):
Song Title | Genre | Artist | Tags | Year
Round and Round | Rock | 3 Doors Down | Upbeat, catchy | 2011
Clocks | Pop | Coldplay | Chill, deep, hits | 2002
32
MCA - Computing final similarity
We represent each attribute numerically: it is initialized to 1 if it matches the user's preferences, and 0 otherwise. The correlation scores calculated previously determine how much genre, artist, era, and tags should weigh, respectively.
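A hedged sketch of the final scoring step, assuming the weights come from the consistency analysis above; the dictionary layout and the decade-based era match are illustrative assumptions.

def similarity_score(song, stated, weights):
    # Each attribute scores 1 if it matches the user's preferences and 0 otherwise,
    # then is multiplied by its weight from the correlation analysis.
    matches = {
        "genre": 1 if song["genre"] == stated["genre"] else 0,
        "artist": 1 if song["artist"] in stated["artist"] else 0,
        "era": 1 if (song["year"] // 10) * 10 == stated["era"] else 0,   # era as a decade, e.g. 1980
        "tags": 1 if set(song["tags"]) & set(stated["tags"]) else 0,
    }
    return sum(weights.get(attr, 1.0) * hit for attr, hit in matches.items())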
33
MCA Pros and Cons
Pros: Very deep analysis, with potential for good results. Can determine the weight that should be applied to genre, tags, artists, and era. Moderately "self-correcting" due to input data redundancy.
Cons: Can be computationally expensive. Requires redundant input data: the user has to supply a list of their favorite songs to the algorithm.
Expected efficiency: O(n * k * m), where n is the number of database songs, m is the number of users to run for, and k is the number of songs the user provides as favorites.
34
MCA: Real Example Output:
35
Results
36
4. Conclusion What can we begin to take away? Cue - Kevin
37
Future Work
Implement user feedback for all algorithms
Expand K-Nearest Neighbors: optimize the x- and y-axes
Expand Percentage Sliders: add mutations to slightly increase/decrease the chance a song is accepted; directly editable Sliders to manually change preferences
Expand MCA: try different data rankings
Cue - Kevin
38
Conclusion Different implementations of music optimization call for different approaches.
Deep analysis, continuous user tracking: MCA
Medium-level analysis: Percentage Sliders
Shallow analysis, initial user preference: K-Nearest Neighbors
39
5 Questions
1.) How do we determine whether optimization is accurate when analyzing something as subjective as music taste?
2.) How should the algorithmic approach change with different sizes of data sets?
3.) How do we determine how songs are tagged?
4.) How do we determine which data to derive from the user, and how do we best utilize that data?
5.) What would be the most effective way to implement user feedback for improvement?