CLUSTERING FOR TAXONOMY EVOLUTION By Anindya Das and Sneha Bankar.


1 CLUSTERING FOR TAXONOMY EVOLUTION
By Anindya Das and Sneha Bankar

2 PROBLEM STATEMENT
Problem
- Products are often placed in the wrong category because no correct category exists for them.
- Such misplacement can be an indication of taxonomy evolution.
Solution
- Cluster products based on their product descriptions.

3 TAXONOMY EVOLUTION
Camera & Photo
- Lenses
- Flashes
- Digital Cameras
  - Compact System Camera / Digital SLR Cameras
  - Point & Shoot Cameras / Digital SLR Cameras

4 TAXONOMY EVOLUTION
Camera & Photo
- Lenses
- Flashes
- Digital Cameras
  - Compact System Camera
  - Digital SLR Camera
  - Point & Shoot Cameras

5 FEATURE EXTRACTION
- Use product descriptions as features
- Brand removal
- Stemming
- Use of unigrams and bigrams
- Feature weighting based on term frequency (TF)
- Feature weighting based on TF-IDF
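
The feature-extraction steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the authors' implementation: the function names, the sample descriptions, and the brand list are hypothetical, and `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer (the slides do not say which stemmer was used). The TF-IDF variant shown is raw count times log(N / document frequency), which is one common definition among several.

```python
import math
import re
from collections import Counter


def crude_stem(token):
    # Assumption: a toy suffix stripper, standing in for a real stemmer.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_terms(description, brands):
    # Lowercase, tokenize, drop brand names, stem, then add bigrams.
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    tokens = [crude_stem(t) for t in tokens if t not in brands]
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tf_vectors(docs_terms):
    # Term-frequency weighting: raw count of each unigram/bigram per product.
    return [Counter(terms) for terms in docs_terms]


def tfidf_vectors(docs_terms):
    # TF-IDF weighting: count * log(N / number of products containing the term).
    n_docs = len(docs_terms)
    doc_freq = Counter(term for terms in docs_terms for term in set(terms))
    return [
        {t: c * math.log(n_docs / doc_freq[t]) for t, c in Counter(terms).items()}
        for terms in docs_terms
    ]


# Hypothetical sample data.
descriptions = [
    "Canon digital SLR cameras with zoom lenses",
    "Nikon compact cameras with zoom lenses",
]
brands = {"canon", "nikon"}

docs_terms = [extract_terms(d, brands) for d in descriptions]
tf = tf_vectors(docs_terms)
tfidf = tfidf_vectors(docs_terms)
```

In this toy data, terms shared by every product (such as "camera") get a TF-IDF weight of zero, while terms unique to one product (such as "slr") keep a positive weight, which is the reason TF-IDF can separate products better than raw term frequency.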

6 HIERARCHICAL AGGLOMERATIVE CLUSTERING
- Initially, each item is considered a cluster.
- The closest pair of clusters is chosen.
- Those two clusters are merged.
- Each iteration reduces the number of clusters by one.
- Merging continues until a terminating condition is satisfied:
  - Number of clusters
  - Inter-cluster distance
- UPGMA is used to measure the distance between clusters.
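
A minimal sketch of the agglomerative procedure above, assuming Euclidean distance between feature vectors (the slides do not state which distance was paired with UPGMA). The names `hac`, `upgma_distance`, `target_clusters`, and `max_distance` are hypothetical; the two parameters correspond to the two terminating conditions listed on the slide. The naive pair search is cubic in the number of items, so this is for illustration only.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma_distance(cluster_a, cluster_b, points, dist):
    # UPGMA: mean of all pairwise distances between members of the two clusters.
    total = sum(dist(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def hac(points, target_clusters=1, max_distance=math.inf, dist=euclidean):
    # Start with every item in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > max(target_clusters, 1):
        # Find the closest pair of clusters under UPGMA.
        (a, b), best = min(
            (
                ((a, b), upgma_distance(clusters[a], clusters[b], points, dist))
                for a, b in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        # Terminating condition: closest pair is farther apart than allowed.
        if best > max_distance:
            break
        # Merge the pair; each iteration removes exactly one cluster.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical 2-D points standing in for product feature vectors.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
by_count = hac(points, target_clusters=3)
by_distance = hac(points, max_distance=2.0)
```

Both stopping rules give the same grouping here: the two tight pairs are merged, and the distant point stays alone, which is the kind of leftover item that could hint at a missing category.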

7 DISTANCE MEASURES

8 K-MEANS
- Select K initial centroids
- Assign data points (ASIN feature vectors) to the nearest centroids based on distance
- Update each centroid to the mean of its assigned points
- Re-assign points and update the centroids until no data point changes its assignment
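
The K-Means loop above can be sketched as follows. This is an illustrative version under stated assumptions: initial centroids are sampled at random from the data, distance is Euclidean, and a `max_iter` cap is added as a safety limit; none of these choices is confirmed by the slides. The function and parameter names are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Select K initial centroids (assumption: sampled from the data points).
    centroids = rng.sample(points, k)
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Hypothetical 2-D points standing in for ASIN feature vectors.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
assignment, centroids = kmeans(points, k=2)
```

Unlike the agglomerative approach, K must be chosen up front, and the result can depend on the initial centroids for less cleanly separated data.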

9 EXECUTION PIPELINE
Data Preprocessor -> Feature Extraction Engine -> Clustering Engine -> Cluster Evaluation Engine

10 CLUSTER EVALUATION
How many items in a cluster mention the most frequent features of that cluster?
Precision = true positives / (true positives + false positives)
Recall = true positives / (true positives + false negatives)
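
One possible reading of this evaluation, sketched below, is an assumption rather than the authors' stated definition: an item "matches" a cluster if it contains at least one of that cluster's top-n most frequent features. True positives are matching items inside the cluster, false positives are non-matching items inside the cluster, and false negatives are matching items placed in other clusters. The function names and the sample data are hypothetical.

```python
from collections import Counter


def top_features(cluster_items, n=3):
    # Count in how many items of the cluster each feature occurs.
    counts = Counter(f for item in cluster_items for f in set(item))
    return {f for f, _ in counts.most_common(n)}


def precision_recall(clusters, cluster_id, n=3):
    top = top_features(clusters[cluster_id], n)
    # Items in the cluster that mention a top feature.
    tp = sum(1 for item in clusters[cluster_id] if top & set(item))
    # Items in the cluster that mention none of the top features.
    fp = len(clusters[cluster_id]) - tp
    # Items in other clusters that do mention a top feature.
    fn = sum(
        1
        for cid, items in clusters.items()
        if cid != cluster_id
        for item in items
        if top & set(item)
    )
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical clusters: each item is the list of features of one product.
clusters = {
    "dslr": [["dslr", "lens"], ["dslr", "zoom"], ["tripod"]],
    "compact": [["compact", "zoom"], ["compact", "dslr"]],
}
precision, recall = precision_recall(clusters, "dslr", n=1)
```

Under this reading, recall drops whenever a cluster's top features are generic terms that also appear in many other clusters.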

11 RESULTS
Precision values:

Dataset      HAC    K-Means
Dataset 1    95%    92%
Dataset 2    92%    96%
Dataset 3    93%    90%

Recall values for all cases lie between 20% and 30%.

12 FUTURE WORK
- Mine topics from product descriptions and use them as features
- Develop an approach to detect outliers and merge them to form a new category
- Use association rule mining for evaluation instead of the most frequent words

13 REFERENCES
- http://en.wikipedia.org/wiki/Hierarchical_clustering
- http://en.wikipedia.org/wiki/K-means_clustering
- Liu, Tao. "An Evaluation on Feature Selection for Text Clustering." N.p., 2003. Web.

