Published by Jesse Gaines. Modified over 9 years ago.
1
CLUSTERING FOR TAXONOMY EVOLUTION
By Anindya Das and Sneha Bankar
2
PROBLEM STATEMENT
Problem
- When no correct category exists, products are often placed in the wrong category.
- This could be an indication of taxonomy evolution.
Solution
- Cluster products based on their product descriptions.
3
TAXONOMY EVOLUTION
Camera & Photo
- Lenses
- Flashes
- Digital Cameras
  - Compact System Camera / Digital SLR Cameras
  - Point & Shoot Cameras / Digital SLR Cameras
4
TAXONOMY EVOLUTION
Camera & Photo
- Lenses
- Flashes
- Digital Cameras
  - Compact System Camera
  - Digital SLR Camera
  - Point & Shoot Cameras
5
FEATURE EXTRACTION
- Use product descriptions as features
- Brand removal
- Stemming
- Use of unigrams and bigrams
- Feature weighting based on term frequency
- Feature weighting based on TF-IDF
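The feature extraction steps on this slide could be sketched roughly as follows. This is a minimal illustration in plain Python, not the presenters' implementation: the `crude_stem` suffix stripper stands in for a real stemmer, the brand list is supplied by the caller, and the IDF form `log(N / df)` is one common variant chosen here as an assumption. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def crude_stem(token):
    # Placeholder stemmer (assumption): strips a few common English suffixes.
    # A real pipeline would likely use a proper stemming algorithm.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def extract_terms(description, brands):
    # Lowercase, tokenize, drop brand names, stem, then add bigrams.
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    tokens = [crude_stem(t) for t in tokens if t not in brands]
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf_vectors(descriptions, brands=frozenset()):
    # Term frequency per description (the first weighting on the slide).
    term_counts = [Counter(extract_terms(d, brands)) for d in descriptions]
    n_docs = len(descriptions)
    # Document frequency: in how many descriptions each term appears.
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())
    # TF-IDF weight = tf * log(N / df) (assumed variant).
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in counts.items()}
        for counts in term_counts
    ]


vectors = tfidf_vectors(
    ["Canon digital camera lens", "Nikon digital camera flash"],
    brands={"canon", "nikon"},
)
```

Terms shared by every description (here `digital` and `camera`) receive a weight of zero under this IDF variant, so only the distinguishing terms contribute to later distance computations.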
6
HIERARCHICAL AGGLOMERATIVE CLUSTERING
- Initially, each item is considered a cluster.
- The closest pair of clusters is chosen.
- Those two clusters are merged.
- Each iteration reduces the number of clusters by one.
- Merging continues until a terminating condition is satisfied:
  - Number of clusters
  - Inter-cluster distance
- UPGMA is used for measuring cluster distance.
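The merge loop described on this slide can be sketched as below. This is a naive illustration only: it assumes dense numeric feature vectors and Euclidean distance between items (the slides do not say which item-level distance was used), and the names `hac`, `target_clusters`, and `max_distance` are hypothetical. UPGMA is implemented as the average distance over all cross-cluster item pairs.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma_distance(cluster_a, cluster_b, points):
    # UPGMA: mean of the distances between every item in one cluster
    # and every item in the other.
    total = sum(euclidean(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def hac(points, target_clusters=1, max_distance=float("inf")):
    # Initially, each item is its own cluster (stored as a list of indices).
    clusters = [[i] for i in range(len(points))]
    # Terminating condition 1: the desired number of clusters is reached.
    while len(clusters) > target_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = upgma_distance(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Terminating condition 2: the closest pair is farther apart
        # than the allowed inter-cluster distance.
        if d > max_distance:
            break
        # Merge the closest pair; the cluster count drops by one.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
by_count = hac(points, target_clusters=2)
by_distance = hac(points, max_distance=5.0)
```

Both calls stop with the same two clusters on this toy data, one via the cluster-count condition and one via the distance threshold. The pairwise search makes the sketch expensive on large inputs, so it is meant for illustration rather than production-sized catalogs.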
7
DISTANCE MEASURES
8
K-MEANS
- Select K initial centroids.
- Assign data points (ASIN feature vectors) to the centroids based on distance.
- Update the mean for each centroid.
- Re-assign and update the centroids for as long as data points can still be re-assigned.
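The K-Means loop on this slide might look like the following sketch. It assumes Euclidean distance and caller-supplied initial centroids, since the slides do not specify the distance or the initialization strategy; `k_means` and `max_iterations` are hypothetical names, and the iteration cap is a safety guard added here.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    # Component-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def k_means(points, initial_centroids, max_iterations=100):
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assign each data point to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Stop once no data point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[k] = mean(members)
    return assignments, centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
assignments, centroids = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
```

In practice each point would be an ASIN feature vector rather than a 2-D coordinate, and the result can depend on which initial centroids are chosen.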
9
EXECUTION PIPELINE
1. Data Preprocessor
2. Feature Extraction Engine
3. Clustering Engine
4. Cluster Evaluation Engine
10
CLUSTER EVALUATION
- How many items in a cluster mention the most frequent features of that cluster?
- Precision = true positives / (true positives + false positives)
- Recall = true positives / (true positives + false negatives)
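The two formulas above translate directly into code, as in the sketch below. The slides do not state how true positives, false positives, and false negatives were counted, so the `items_matching_top_features` helper is only one assumed reading of the evaluation question (an item "matches" if it contains at least one of the cluster's most frequent features). All names and the example counts are hypothetical.

```python
from collections import Counter


def precision_recall(true_positives, false_positives, false_negatives):
    # Precision = TP / (TP + FP); Recall = TP / (TP + FN).
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def top_features(cluster_items, n=3):
    # cluster_items: one set of features per item in the cluster.
    counts = Counter(feature for item in cluster_items for feature in item)
    return {feature for feature, _ in counts.most_common(n)}


def items_matching_top_features(cluster_items, n=3):
    # Assumed reading: count the items that mention at least one
    # of the cluster's n most frequent features.
    top = top_features(cluster_items, n)
    return sum(1 for item in cluster_items if item & top)


# Illustrative counts only, not taken from the presentation.
precision, recall = precision_recall(
    true_positives=9, false_positives=1, false_negatives=21
)

cluster = [{"zoom", "lens"}, {"zoom"}, {"tripod"}]
matching = items_matching_top_features(cluster, n=1)
```

With these made-up counts, precision is high while recall is low, which shows how the two measures can diverge for the same cluster.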
11
RESULTS
Precision values
            HAC    K-Means
Dataset 1   95%    92%
Dataset 2   92%    96%
Dataset 3   93%    90%
Recall values for all cases lie between 20% and 30%.
12
FUTURE WORK
- Mining topics from product descriptions and using them as features
- An approach to detect outliers and merge them to form a new category
- Use of association rule mining for evaluation instead of the most frequent words
13
REFERENCES
- http://en.wikipedia.org/wiki/Hierarchical_clustering
- http://en.wikipedia.org/wiki/K-means_clustering
- Liu, Tao. "An Evaluation on Feature Selection for Text Clustering." N.p., 2003. Web.