Published by Sabrina Singleton. Modified over 9 years ago.
1
Study of the parallel techniques for dimensionality reduction and its impact on quality of the text processing algorithms
Marcin Pietroń 1,2, Maciej Wielgosz 1,2, Michał Karwatowski 1,2, Kazimierz Wiatr 1,2
1 AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków
2 ACK Cyfronet AGH, ul. Nawojki 11, 30-950 Kraków
RUC, 17-18.09.2015, Kraków
2
Agenda
Text classification
System architecture
Metrics
Dimensionality reduction
Experiments and results
Conclusions and future work
3
Text classification
A widely applicable and common problem in Internet and big data processing
Often comes with a real-time processing requirement
Preceded by text preprocessing
Clustering is one of several techniques that support text classification
4
System architecture
Text preprocessing → Dictionary and model transformation → SVD → K-means
5
System architecture
Document corpus generation (e.g. with a crawler)
Text preprocessing (implemented with the gensim library: lemmatization, stoplist, etc.)
SVD
K-means as the clustering method (clustering documents into the chosen domains)
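The stages listed above can be sketched end to end in a few lines. This is only an illustrative sketch, not the authors' implementation: it uses the Python standard library instead of gensim, skips lemmatization and the SVD stage, and the stoplist, the sample documents, and all function names are hypothetical. The k-means centroids are seeded with the first k documents so that the result is deterministic.

```python
import math
import re
from collections import Counter

STOPLIST = {"the", "a", "of", "and", "in", "to", "is"}  # hypothetical stoplist


def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stoplist words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPLIST]


def tfidf_vectors(docs):
    """Build the dictionary and return (vocabulary, L2-normalised TF-IDF rows)."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    df = Counter(t for doc in tokenized for t in set(doc))  # document frequency
    n = len(docs)
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        row = [tf[t] * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows


def kmeans(rows, k, iterations=20):
    """Plain Lloyd's k-means; centroids start at the first k rows."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, r in enumerate(rows):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [rows[i] for i in range(len(rows)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "The team won the football match",
    "Stock market and bank profit rise",
    "Football team scores in the final match",
    "Bank shares rise in the stock market",
]
vocab, rows = tfidf_vectors(docs)
labels = kmeans(rows, k=2)
print(labels)  # [0, 1, 0, 1]
```

In the architecture described on the slide, an SVD step would sit between `tfidf_vectors` and `kmeans`, reducing each row to a small number of components before clustering.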
6
Quality metrics
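The formulas from this slide did not survive extraction. The results later in the deck report precision, recall and F-measure, so the sketch below uses their standard definitions, with F-measure taken as the harmonic mean of precision and recall. How the authors mapped clusters to domains before counting is not stated; the function name and the counts in the example are hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from raw counts.

    tp: documents correctly assigned to the domain
    fp: documents assigned to the domain that belong elsewhere
    fn: documents of the domain that were assigned elsewhere
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 8 correct, 2 wrongly included, 4 missed.
p, r, f = precision_recall_f(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.8 0.667 0.727
```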
7
Entropy
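The entropy definition on this slide was lost in extraction. A common choice for evaluating a clustering, assumed here rather than confirmed by the source, is the size-weighted average of the per-cluster entropy of the true domain labels. Lower values mean purer clusters. The logarithm base and any normalisation used by the authors are unknown, so `base` is left as a parameter, and the function name is hypothetical.

```python
import math
from collections import Counter


def clustering_entropy(cluster_labels, true_labels, base=2):
    """Size-weighted mean entropy of true labels within each cluster."""
    n = len(cluster_labels)
    clusters = {}
    for c, t in zip(cluster_labels, true_labels):
        clusters.setdefault(c, []).append(t)
    total = 0.0
    for members in clusters.values():
        counts = Counter(members)
        size = len(members)
        h = -sum((k / size) * math.log(k / size, base) for k in counts.values())
        total += (size / n) * h
    return total


# Every cluster holds a single domain: entropy is zero.
perfect = clustering_entropy([0, 0, 1, 1], ["sport", "sport", "science", "science"])
# Every cluster is an even mix of two domains: one bit per cluster.
mixed = clustering_entropy([0, 0, 1, 1], ["sport", "science", "sport", "science"])
```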
8
Dimensionality reduction
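The body of this slide is missing. The deck names two reduction techniques: SVD in the system architecture and random projection in the conclusions. As a rough illustration of the second one only, the sketch below multiplies each document vector by a random Gaussian matrix to map it to a lower-dimensional space. It is not the authors' implementation and it is not SVD; the function names, the seed, and the sample vectors are hypothetical.

```python
import math
import random


def random_projection_matrix(n_features, n_components, seed=0):
    """Gaussian matrix of shape (n_features, n_components), scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_components)
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
        for _ in range(n_features)
    ]


def project(rows, matrix):
    """Multiply each row vector by the projection matrix."""
    columns = list(zip(*matrix))
    return [
        [sum(x * m for x, m in zip(row, col)) for col in columns]
        for row in rows
    ]


# Two hypothetical 6-dimensional document vectors reduced to 3 dimensions.
rows = [
    [1.0, 0.0, 2.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 1.0, 0.0, 2.0],
]
matrix = random_projection_matrix(n_features=6, n_components=3)
reduced = project(rows, matrix)
```

Each output component is an independent dot product, which is why this kind of reduction lends itself to parallel execution.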
9
10
Results and experiments
domain      number of clusters  precision    recall       F-measure
business    3.9(0.3)            0.81(0.022)  0.56(0.077)  0.66(0.034)
culture     3(0)                0.37(0.015)  0.7(0.061)   0.48(0.024)
automotive  4.8(0.4)            0.39(0.007)  0.56(0.021)  0.45(0.01)
science     2.1(0.3)            0.39(0.014)  0.74(0.016)  0.51(0.014)
sport       4.8(0.4)            0.39(0.007)  0.56(0.021)  0.45(0.01)

employed algorithms   entropy
vsm+kmeans            0.28(0.012)
vsm+tfidf+kmeans      0.17(0.019)
vsm+tfidf+svd+kmeans  0.16(0.006)
11
Results and experiments
12
GPU implementation
reduction size  GPGPU [ms]  CPU [ms]
10              33          80
20              77          305
30              107         420
40              161         624
GPGPU: NVIDIA Tesla M2090
CPU: Intel Xeon E5645
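The timings above translate directly into speedups. The short calculation below assumes the flattened rows read as reduction size / GPGPU time / CPU time (for example, 10 / 33 ms / 80 ms); the variable names are hypothetical.

```python
# reduction size -> (GPGPU time in ms, CPU time in ms), as read from the slide
timings_ms = {
    10: (33, 80),
    20: (77, 305),
    30: (107, 420),
    40: (161, 624),
}

# Speedup is CPU time divided by GPGPU time for the same reduction size.
speedups = {size: cpu / gpu for size, (gpu, cpu) in timings_ms.items()}

for size, s in speedups.items():
    print(f"reduction size {size}: {s:.2f}x")
# reduction size 10: 2.42x
# reduction size 20: 3.96x
# reduction size 30: 3.93x
# reduction size 40: 3.88x
```

Under that reading, the GPGPU is about 2.4 times faster at the smallest reduction size and close to 4 times faster at the larger ones.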
13
Conclusions and future work
Adding processing stages (TF-IDF, SVD) before k-means lowers entropy: 0.28 for vsm+kmeans down to 0.16 for vsm+tfidf+svd+kmeans
A GPU can reduce the time of text classification
Random projection hardware implementation
K-means GPU acceleration
14
Questions