Study of the parallel techniques for dimensionality reduction and its impact on quality of the text processing algorithms Marcin Pietroń 1,2, Maciej Wielgosz.


1 Study of the parallel techniques for dimensionality reduction and its impact on quality of the text processing algorithms
Marcin Pietroń 1,2, Maciej Wielgosz 1,2, Michał Karwatowski 1,2, Kazimierz Wiatr 1,2
1 AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków
2 ACK Cyfronet AGH, ul. Nawojki 11, 30-950 Kraków
RUC, 17-18.09.2015, Kraków

2 Agenda
- Text classification
- System architecture
- Metrics
- Dimensionality reduction
- Experiments and results
- Conclusions and future work

3 Text classification
- A common and practically important problem in internet and big data processing
- Often has to be performed in real time
- Preceded by text preprocessing
- Clustering is one of several techniques that support text classification

4 System architecture
Text preprocessing -> Dictionary and model transformation -> SVD -> K-means

5 System architecture
- Document corpus generation (e.g. by a crawler)
- Text preprocessing (implemented with the gensim library: lemmatization, stop list, etc.)
- SVD
- K-means as the clustering method (clustering documents into the chosen domains)
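The stages listed on this slide can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it uses only the Python standard library instead of gensim, skips the SVD stage, and the toy documents, the `tfidf_vectors` and `kmeans` helpers, and the fixed seed indices are all hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into dense TF-IDF vectors over a shared vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, seeds, iterations=10):
    """Plain Lloyd's k-means; `seeds` are indices of the vectors used as initial centroids."""
    centroids = [list(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy, already-preprocessed corpus (hypothetical): two business and two sport documents.
docs = [
    ["market", "stock", "bank"],
    ["stock", "bank", "profit"],
    ["match", "goal", "team"],
    ["team", "goal", "league"],
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, seeds=[0, 2])
print(labels)
```

In a real pipeline the dense lists would be replaced by sparse matrices, and a dimensionality reduction step would sit between the TF-IDF weighting and the clustering.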

6 Quality metrics
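The formulas on this slide did not survive in the transcript. The results slide reports precision, recall and F-measure, so a minimal sketch of their standard definitions is given here, assuming the usual true positive / false positive / false negative counts and the balanced F-measure (harmonic mean of precision and recall). The function names and the example counts are hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(tp, fp, fn):
    """Compute (precision, recall, F-measure) from raw counts for one domain."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical counts: 8 documents correctly assigned, 2 wrongly assigned, 8 missed.
p, r, f = precision_recall_f(tp=8, fp=2, fn=8)

# Consistency check against the precision and recall reported for the
# "business" domain on the results slide (0.81 and 0.56).
business_f = f_measure(0.81, 0.56)
print(round(business_f, 2))
```

With the business values the harmonic mean rounds to the F-measure reported for that domain, which suggests the balanced definition is the one in use, although the transcript does not state it.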

7 Entropy
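The entropy formula itself is missing from the transcript. One common definition for clustering quality, assumed here rather than taken from the slides, is the size-weighted average of the class entropy inside each cluster, normalized by the logarithm of the number of classes so that 0 means pure clusters and 1 means fully mixed ones. The function name and the toy labels are hypothetical.

```python
import math
from collections import Counter


def clustering_entropy(cluster_labels, true_labels):
    """Size-weighted, normalized class entropy of a clustering (lower is better)."""
    n = len(cluster_labels)
    n_classes = len(set(true_labels))
    if n_classes < 2:
        return 0.0
    # Group the true class of every document by the cluster it was assigned to.
    clusters = {}
    for cluster, truth in zip(cluster_labels, true_labels):
        clusters.setdefault(cluster, []).append(truth)
    total = 0.0
    for members in clusters.values():
        size = len(members)
        counts = Counter(members)
        # Shannon entropy of the class distribution inside this cluster.
        h = -sum((k / size) * math.log(k / size) for k in counts.values())
        total += (size / n) * h
    return total / math.log(n_classes)


pure = clustering_entropy([0, 0, 1, 1], ["sport", "sport", "science", "science"])
mixed = clustering_entropy([0, 0, 1, 1], ["sport", "science", "sport", "science"])
print(pure, mixed)
```

Under this definition a lower value means that each cluster is dominated by a single domain, which matches the way the conclusions treat lower entropy as an improvement.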

8 Dimensionality reduction
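The body of this slide is not preserved. The deck names SVD in the architecture and random projection in the future work; as an illustration of the latter, here is a minimal sketch of Gaussian random projection using only the standard library. The helper names, the seed, and the sizes (50 input features reduced to 10) are assumptions made for the example, not values confirmed by the slides.

```python
import math
import random


def random_projection_matrix(n_features, n_components, seed=0):
    """Build an n_components x n_features matrix of scaled Gaussian entries."""
    rng = random.Random(seed)
    # Scaling by 1/sqrt(n_components) keeps expected squared lengths unchanged.
    scale = 1.0 / math.sqrt(n_components)
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
        for _ in range(n_components)
    ]


def project(vector, matrix):
    """Map a high-dimensional vector to the reduced space (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Hypothetical sizes: a 50-term document vector reduced to 10 dimensions.
matrix = random_projection_matrix(n_features=50, n_components=10)
doc_vector = [float(i % 3) for i in range(50)]
reduced = project(doc_vector, matrix)
print(len(reduced))
```

Unlike SVD, the projection matrix does not depend on the data, so the whole reduction is a single matrix product; that makes it a natural candidate for GPU or hardware acceleration.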


10 Results and experiments

domain      number of clusters  precision     recall        F-measure
business    3.9 (0.3)           0.81 (0.022)  0.56 (0.077)  0.66 (0.034)
culture     3 (0)               0.37 (0.015)  0.7 (0.061)   0.48 (0.024)
automotive  4.8 (0.4)           0.39 (0.007)  0.56 (0.021)  0.45 (0.01)
science     2.1 (0.3)           0.39 (0.014)  0.74 (0.016)  0.51 (0.014)
sport       4.8 (0.4)           0.39 (0.007)  0.56 (0.021)  0.45 (0.01)

employed algorithms   entropy
vsm+kmeans            0.28 (0.012)
vsm+tfidf+kmeans      0.17 (0.019)
vsm+tfidf+svd+kmeans  0.16 (0.006)

11 Results and experiments

12 GPU implementation

reduction size  GPGPU [ms]  CPU [ms]
10              33          80
20              77          305
30              107         420
40              161         624

GPGPU: NVIDIA Tesla M2090; CPU: Intel Xeon E5645
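The speedup implied by these timings can be worked out directly. The sketch below assumes the flattened timing row reads as reduction sizes 10, 20, 30 and 40 with GPGPU times of 33, 77, 107 and 161 ms and CPU times of 80, 305, 420 and 624 ms; the variable names are hypothetical.

```python
# reduction size -> (GPGPU time in ms, CPU time in ms), as read from the slide.
timings_ms = {
    10: (33, 80),
    20: (77, 305),
    30: (107, 420),
    40: (161, 624),
}

# Speedup is the CPU time divided by the GPGPU time for the same reduction size.
speedups = {size: cpu / gpu for size, (gpu, cpu) in timings_ms.items()}

for size, speedup in speedups.items():
    print(f"reduction size {size}: {speedup:.2f}x")
```

Under that reading the GPGPU is roughly 2.4 times faster at reduction size 10 and close to 4 times faster at the larger sizes.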

13 Conclusions and future work
- Adding processing stages (TF-IDF, then SVD) before k-means lowers entropy
- A GPU can substantially reduce text classification time
- Random projection hardware implementation
- K-means GPU acceleration

14 Questions

