Published by Morten Villadsen. Modified over 5 years ago.
1
Unsupervised Machine Learning: Clustering Assignment
2
Assignment
Investigate how k-means, an unsupervised machine learning clustering algorithm, performs in a text classification task. Change the input parameters, then compare the k-means clusters side by side with the human-annotated results.
3
Assignment: About the Data
The tool takes a set of unlabeled business articles (published by [...]) and groups them with the k-means clustering algorithm, based on term frequency. For comparison, the tool also shows the same set of articles with their human-annotated labels. Subject matter experts have classified these documents into five main topics: finance, management, marketing, public policy, and technology.
4
Link to Assignment
Click the Link to Platform button or use the following link to directly access the assignment:
5
Assignment: Select Parameters
Select the Number of Clusters for the k-means algorithm. Select the total Number of Documents. Click the Run Process button.
6
Assignment: Compare Results
Compare the machine learning clusters with the human-annotated topics (shown in greyscale). Business articles classified according to main topic by human annotators. The same set of business articles grouped into clusters using unsupervised machine learning.
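One way to make the side-by-side comparison concrete is to count how many documents of each human-annotated topic land in each cluster. The sketch below is illustrative only: the function names and the six-document toy data are assumptions, not part of the tool.

```python
from collections import Counter


def contingency_table(human_labels, cluster_ids):
    """Count documents for every (cluster id, human topic) pair."""
    if len(human_labels) != len(cluster_ids):
        raise ValueError("both sequences must describe the same documents")
    return Counter(zip(cluster_ids, human_labels))


def majority_topic_purity(human_labels, cluster_ids):
    """Fraction of documents whose topic is the most common topic in their cluster."""
    table = contingency_table(human_labels, cluster_ids)
    best_per_cluster = {}
    for (cluster, _topic), count in table.items():
        best_per_cluster[cluster] = max(best_per_cluster.get(cluster, 0), count)
    return sum(best_per_cluster.values()) / len(human_labels)


# Hypothetical toy data: six documents, human topics vs. k-means cluster ids.
human = ["finance", "finance", "marketing", "marketing", "technology", "technology"]
clusters = [0, 0, 1, 0, 2, 2]

table = contingency_table(human, clusters)
purity = majority_topic_purity(human, clusters)

for (cluster, topic), count in sorted(table.items()):
    print(f"cluster {cluster}: {count} x {topic}")
print(f"purity: {purity:.2f}")
```

A purity near 1 means each cluster is dominated by a single human topic; lower values mean the algorithm grouped the articles along some other dimension.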
7
Assignment: Rerun the Process
Note that even with the same parameters selected, the results can change when you rerun the process. This is because the k-means algorithm chooses its initial centroids randomly on each run, and that starting point ultimately affects the resulting clusters.
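The effect of random initialization can be reproduced with a minimal k-means sketch. It assumes plain Lloyd's iterations, with starting centroids sampled uniformly from the data, on a handful of made-up 2-D points. This is a simplification for illustration, not the tool's own code.

```python
import random


def kmeans(points, k, seed, iterations=20):
    """Plain Lloyd's k-means; initial centroids are sampled from the data."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical 2-D points standing in for document vectors.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0), (9.2, 0.3), (4.5, 2.5)]

# Same data and same k; only the random seed differs between runs.
runs = {seed: kmeans(points, k=3, seed=seed)[0] for seed in range(5)}
for seed, labels in runs.items():
    print(seed, labels)
```

Fixing the seed makes a run repeatable. Changing it can alter both the cluster numbering and, for less clearly separated data, the groupings themselves.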
8
Assignment: Review the Terms
The results allow you to examine the top 10 weighted words in each cluster. The words can be interesting to analyze given that the clusters typically do not exactly match the human-annotated classification of the documents.
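A list of top weighted words can be read off a cluster centroid by sorting its term weights. The sketch below assumes a centroid is simply a list of weights aligned with a vocabulary list; `top_terms`, the vocabulary, and the weights are all hypothetical.

```python
def top_terms(centroid, vocabulary, n=10):
    """Return the n highest-weighted (term, weight) pairs for one centroid."""
    ranked = sorted(zip(vocabulary, centroid), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical vocabulary and centroid weights for a single cluster.
vocabulary = ["bank", "brand", "china", "market", "software"]
centroid = [0.05, 0.40, 0.10, 0.35, 0.00]

top3 = top_terms(centroid, vocabulary, n=3)
for term, weight in top3:
    print(f"{term}: {weight:.2f}")
```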
9
Assignment: Assessment
Are you able to discern any patterns from the clusters that the k-means algorithm generates? (Hint: note geographic terms.) What have you learned from this side-by-side comparison between unsupervised machine learning clustering and human-annotated classification?
10
Background Information on Tool
To create our unsupervised machine learning text classification visualization tool, we used scikit-learn, a free software machine learning library for the Python programming language. We weighted terms with tf-idf, clustered the documents with k-means, and used k-means++ as the initialization method for centroid selection.