Download presentation
Presentation is loading. Please wait.
Published byCharlene Andrews Modified over 9 years ago
1
Text Analytics Teradata & Sabanci University April, 2015
2
2 TEXT DATA #2 Sentiment Analysis #3 Relation between words #4 Association Rules between events #1 Topic Categorization Text analytics scope at a glance
3
3 Main Project Diagram
4
4 #1 Topic Categorization
5
5 2 Main Categories ŞikayetBilgi/İşlem 19 Subcategories Ürün/Servis Kampanya/ Paket......... Fatura ve Yükleme Topic Categorization
6
6 Dimension Reduction ◦ Data ◦ High in volume ◦ Noisy ◦ For each category, significant words found (by using Aster tf_idf function) ◦ Project the data into the space of these words: ◦ Decrease the noise ◦ Thus, differentiate the classes better SAMPLE Topic Categorization
7
7 Main Category Classification We applied Naive Bayes Text Classification
8
8 Subcategory Classification We applied Naive Bayes Text Classification
9
9 2 Main Categories 19 Subcategories Topic Categorization
10
10 #2 Sentiment Analysis
11
11 Sentiment Classification Model Whole dataset is randomly splitted as train and test sets The model is unbiasedly evaluated on the unseen test set Aster’s Random Forest Function is available to apply to our Turkish Sentiment Features.
12
12 Turkish Model
13
13 Does this call have NEGATIVE sentiment? Classify the call center messages into two: – Negative – Neutral (and Positive) Train and test sets are different for each evaluation: – The accuracy is not dependent on a specific test set
14
14 Regression Model Same dataset used for the sentiment classification Output is different; not classes, but real values These values scale the negativity of the customers
15
15 Negative Sentiment Breakdown Threshold values were chosen by using 68–95–99.7 rule in statistics ( three-sigma rule of thumb heuristic supports this). Our thresholds are set as mean+(-)2*standart deviation Used for social sciences and business
16
16 #3 Relation Extraction
17
17 Relations between Words Relation scores are calculated between words It is based on confidence values. – Score(a,b) = confidence(a,b)*confidence(b*a) Words were filtered: – We selected the words with the highest tf*idf values Words are clustered in terms of their relations Clusters are represented in a graph Cfilter (in Aster)function has been applied for this process.
18
18 Complaining Customers Graph Selected Cluster
19
19 Closer Look to the Cluster The cluster seems to indicate the customers facing internet-related problems Connection problem, or Internet package is not sufficient?
20
20 Complaint Word Cloud (Aster Lens)
21
21 Information&Operation Word Cloud
22
22 Bigram Cloud (ngram function)
23
23 #4 Association Rules between Events
24
24 Association Rule Mining Topic A with a Sentiment score Topic B with a Sentiment score Topic C Confidence(A, B --> C)
25
25 QUESTIONS
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.