Published by Aimee Bobbitt. Modified over 9 years ago.

Text Analytics for Unlocking the Potential of Big Data
Bhavani Raskutti @ Pacific Brands
1 Text analytics & big data
2 New opportunities with text analytics
3 Challenges when mining text
4 Solutions to overcome challenges
5 Wrap-up

Text Analytics & Big Data
Data used for Analytics Now / Other Data Available
–Customer Data: Demographics, Usage summary, Product Usage
–Traditional customer feedback: Surveys, Customer complaints, Inbound emails
–Transactional data: Usage records, Sales receipts, Outputs from sensors, Service assurance
–Social media data: Facebook discussions, Twitter feeds, Blogs, YouTube videos
–Product Data: Mix & usage, Access
–Device Data: GPS & locale data
–……
Linear growth / Exponential growth

New Opportunities with Text Analytics
Mine freely available social media data for:
–Understanding customer sentiment
–Identifying major customer concerns
–Tracking sentiment/issues over time
Business implications:
–Ability to act on negative sentiments quickly
–Respond to customer concerns in a timely manner
–Target initiatives appropriately by continuous tracking
–Superior market research & focus group outcomes

New Opportunities: Sentiment Analysis
Methodology:
–Score based on positive & negative sentiment words, OR
–Use supervised learning with labelled examples
Note: no sarcasm detection
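The word-scoring option can be sketched as follows. This is a minimal illustration, not the method used in the talk: the two word lists and the example posts are hypothetical, and a real lexicon would be far larger. The third post shows why sarcasm is a problem for this approach.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "love", "happy", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "hate", "poor", "broken", "rude"}

def sentiment_score(text):
    """Count of positive words minus count of negative words in one entry."""
    words = re.findall(r"[a-z]+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives

def label(text):
    """Map the score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

posts = [
    "Love the new plan, support was fast and helpful",
    "Network is slow and the app is broken",
    "Oh great, another bill error",  # sarcastic, but "great" scores it as positive
]
labels = [label(p) for p in posts]
print(labels)
```

Word counting has no notion of tone, so the sarcastic post is labelled positive. The supervised alternative needs labelled examples but can pick up such patterns if they appear in the training data.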

New Opportunities: Topic Detection
Methodology:
1. Create term frequency matrix from text sequences
2. Use unsupervised learning to create clusters
3. Create cluster descriptions
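The three steps can be sketched in plain Python. The slides do not name a clustering algorithm, so k-means is an assumption here, as are the example entries, the seed rows and the helper names. Each cluster is described by the highest-weighted terms of its centroid.

```python
import re
from collections import Counter

def tokenize(text):
    # Case-insensitive; anything that is not a letter separates words.
    return re.findall(r"[a-z]+", text.lower())

def term_frequency_matrix(entries):
    """Step 1: one row per entry, one column per term."""
    vocab = sorted({t for e in entries for t in tokenize(e)})
    rows = []
    for e in entries:
        counts = Counter(tokenize(e))
        rows.append([counts[t] for t in vocab])
    return vocab, rows

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, seed_rows, iterations=10):
    """Step 2: plain k-means, seeded with the given row indices (an assumption)."""
    centroids = [list(rows[i]) for i in seed_rows]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        for c in range(len(centroids)):
            members = [row for row, a in zip(rows, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

def describe(centroid, vocab, top=3):
    """Step 3: the highest-weighted terms of a centroid serve as its description."""
    ranked = sorted(zip(centroid, vocab), key=lambda p: (-p[0], p[1]))
    return [term for weight, term in ranked[:top] if weight > 0]

entries = [
    "Bill error on my bill",
    "Network outage again",
    "Wrong bill amount on my bill",
    "Network outage in my area",
]
vocab, rows = term_frequency_matrix(entries)
assignment, centroids = kmeans(rows, seed_rows=[0, 1])
topics = [describe(c, vocab) for c in centroids]
print(assignment, topics)
```

In this toy run the billing entries and the network entries end up in separate clusters. The billing description is led by "bill" but also picks up non-informative words such as "my", which is one of the challenges covered later in the deck.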

Challenges in Text Analytics
1. Creating term frequency matrix for machine learning
–One row for each entry
–One column for each term/feature describing the entries
Treat non-alpha as white space; case-insensitive; term = word
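A minimal sketch of the construction under the three rules above (non-alphabetic characters act as white space, case is ignored, each word is a term). The example entries and function names are hypothetical. The sketch handles ASCII letters only, which is an assumption.

```python
import re
from collections import Counter

def terms(entry):
    # Lower-case the entry, turn every non a-z character into a space, split on spaces.
    return re.sub(r"[^a-z]", " ", entry.lower()).split()

def term_frequency_matrix(entries):
    """Return (vocabulary, matrix): one row per entry, one column per term."""
    vocabulary = sorted({t for e in entries for t in terms(e)})
    matrix = []
    for e in entries:
        counts = Counter(terms(e))
        matrix.append([counts[t] for t in vocabulary])
    return vocabulary, matrix

entries = ["Single bill, please!", "My BILL is wrong; bill me again."]
vocabulary, matrix = term_frequency_matrix(entries)
print(vocabulary)
for row in matrix:
    print(row)
```

Even with two short entries the matrix has eight columns and several zeros, a small-scale view of how quickly the feature space grows.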

Challenges: 1. Term Frequency Matrix
–Presence of non-informative words
–Different forms of the same words
–Spelling errors & typos
–Synonyms
–Homonyms

Challenges: 2. Very Large Feature Space
Many different terms within a single entry
–10^4 features with just 50 to 100 entries
–Sparse entries: many zeros in the matrix
Unsupervised learning
–Hard to form cohesive clusters with sparse entries
Supervised learning
–Traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature

Solutions: 1. Term Frequency Matrix
Presence of non-informative words
–Create a list of stopwords
–Remove them from consideration
Different forms of the same words
–Use rule-based stemming to remove suffixes
Spelling errors & typos
–Use a spell-checker, OR
–Use n-grams (character sequences) as features
–5-grams for 'single bill': 'singl', 'ingle', 'ngle ', 'gle b', 'le bi', 'e bil', ' bill'
Synonyms
–Use a thesaurus (manual or statistical)
Homonyms
–Provide context by using word pairs or triplets as features
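Several of these fixes fit in a few lines each. The sketch below is illustrative only: the stopword list is a hypothetical handful of words, and the suffix rules are a crude stand-in for a real rule-based stemmer such as Porter's. The character n-gram function reproduces the 'single bill' example.

```python
import re

STOPWORDS = {"the", "a", "is", "my", "to", "and", "of"}  # hypothetical short list
SUFFIXES = ("ing", "ed", "es", "s")  # crude illustrative rules, not the Porter stemmer

def words(text):
    # Non-alphabetic characters act as white space; case is ignored.
    return re.sub(r"[^a-z]", " ", text.lower()).split()

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def stem(word):
    # Strip the first matching suffix, keeping at least three characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def char_ngrams(text, n=5):
    # Overlapping character sequences; tolerant of small spelling errors.
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(tokens, n=2):
    # Word pairs (n=2) or triplets (n=3) give context that helps with homonyms.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = remove_stopwords(words("My bills and the billing of charges"))
stems = [stem(t) for t in tokens]
grams = char_ngrams("single bill")
shared = set(grams) & set(char_ngrams("single bil"))  # entry with a typo
pairs = word_ngrams(["bank", "account", "fee"])
print(stems, grams, len(shared), pairs)
```

The misspelt entry 'single bil' still shares most of its 5-grams with 'single bill', so the two entries remain close in feature space without any spell-checking.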

Solutions: 2. Very Large Feature Space
Use feature selection to identify significant features
Features are of 3 types:
–Very frequent, low information content (e.g., stopwords)
–Infrequent, low information content (occurs once/twice in the set)
–Significant middle-frequency features
Many statistical techniques:
–Inverse document frequency weight
–Signal-noise ratio
–Average discrimination value
–…
Unsupervised learning: hard to form cohesive clusters with sparse entries
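Two of these ideas can be sketched briefly: the inverse document frequency weight, and keeping only middle-frequency features. The formula log(N / df) is one common variant of the IDF weight. The thresholds `min_df` and `max_fraction`, the example entries and the function names are assumptions chosen for illustration.

```python
import math
import re

def words(text):
    return re.sub(r"[^a-z]", " ", text.lower()).split()

def document_frequency(entries):
    """Number of entries in which each term occurs at least once."""
    df = {}
    for e in entries:
        for term in set(words(e)):
            df[term] = df.get(term, 0) + 1
    return df

def idf_weights(entries):
    # One common IDF variant: log(N / df). Terms present in every entry get weight 0.
    n = len(entries)
    return {t: math.log(n / d) for t, d in document_frequency(entries).items()}

def select_middle_frequency(entries, min_df=2, max_fraction=0.8):
    # Drop terms seen in fewer than min_df entries or in more than max_fraction of them.
    n = len(entries)
    df = document_frequency(entries)
    return sorted(t for t, d in df.items() if d >= min_df and d / n <= max_fraction)

entries = [
    "The bill is wrong",
    "The network is down",
    "The bill is late",
    "The network is slow",
    "The roaming charge is odd",
]
idf = idf_weights(entries)
selected = select_middle_frequency(entries)
print(selected, idf["the"], idf["bill"], idf["wrong"])
```

Here "the" and "is" occur in every entry and are dropped as very frequent, the words seen only once are dropped as infrequent, and "bill" and "network" remain as the middle-frequency features.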

Solutions: 2. Very Large Feature Space (Cont’d)
Use new techniques based on maximal margin separators that can handle large feature space
–Support Vector Machines
Supervised learning: traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature

Solutions: Support Vector Machines
Figure labels: Customers who churned to other providers; Customers who are loyal
Objective: To learn a separator to identify people likely to churn before they do

Solutions: Support Vector Machines
What is a good separator?
–Maximises margin between two parallel supporting hyperplanes
–Separator depends on support vectors
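To make "margin" concrete, the sketch below compares two candidate linear separators on a toy data set. It does not train an SVM. It only measures, for a given separator w·x + b = 0, the distance from the separator to the closest point, and lists the points at that distance. The data, the two separators and the function names are hypothetical.

```python
import math

def signed_distances(w, b, points, labels):
    # Distance of each point from the separator; positive means correctly classified.
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    ]

def geometric_margin(w, b, points, labels):
    # Distance from the separator to the closest point.
    return min(signed_distances(w, b, points, labels))

def closest_points(w, b, points, labels, tol=1e-9):
    # Points lying on the supporting hyperplanes of this separator.
    distances = signed_distances(w, b, points, labels)
    smallest = min(distances)
    return [x for x, d in zip(points, distances) if d - smallest <= tol]

points = [(1, 1), (2, 0), (3, 3), (4, 2)]
labels = [-1, -1, +1, +1]  # e.g. -1 = loyal, +1 = churned

margin_a = geometric_margin((1, 1), -4.0, points, labels)  # separator x + y = 4
margin_b = geometric_margin((1, 0), -2.5, points, labels)  # separator x = 2.5
print(margin_a, margin_b)
print(closest_points((1, 0), -2.5, points, labels))
```

Both separators classify every point correctly, but the first leaves a much wider gap. The gap between the two supporting hyperplanes is twice the value returned by `geometric_margin`, and only the closest points determine it.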

Solutions: Support Vector Machines
Why does maximising margins work?
–Small margin means more choice of separators & a higher risk of overfitting the data
–Large margin means less choice & a lower risk of overfitting

Solutions: 2. Very Large Feature Space (Cont’d)
Use new techniques based on maximal margin separators that can handle large feature space
Support Vector Machines
–Maximises margin between two classes
–Separator depends only on support vectors
–Separator obtained using quadratic programming
Available in some statistical packages
Supervised learning: traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature
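For reference, the quadratic programme behind a maximal margin separator is usually written as below for linearly separable data. The notation is an assumption, not taken from the slides: x_i are the feature vectors, y_i in {-1, +1} are the class labels, and w and b define the separator.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{subject to}\quad
y_i\,\bigl(w^{\top}x_i + b\bigr) \ge 1,\qquad i = 1,\dots,n
```

The supporting hyperplanes are w·x + b = +1 and w·x + b = -1, and the distance between them is 2/‖w‖, so minimising ‖w‖ maximises the margin. Only the points for which the constraint holds with equality (the support vectors) influence the solution. Packages typically solve a soft-margin variant that tolerates some misclassified points.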

Wrap-up
Text analytics creates new opportunities for businesses to understand their customers
–Understanding customer sentiment
–Identifying major customer concerns
–Tracking sentiment/issues over time
A few challenges in implementing text analytics
–Creating term frequency matrix from text sequence
–Large number of features in matrix
Many techniques to overcome these challenges
Now is the time to use text analytics to unlock the potential of big data in your business!