Text Analytics for Unlocking the Potential of Big Data
Bhavani, Pacific Brands

Presentation transcript:

1 Text Analytics for Unlocking the Potential of Big Data
Bhavani, Pacific Brands
Agenda:
1. Text analytics & big data
2. New opportunities with text analytics
3. Challenges when mining text
4. Solutions to overcome challenges
5. Wrap-up

3 Text Analytics & Big Data
Data used for analytics now (linear growth) vs. other data available (exponential growth)
Data categories shown:
– Customer data: demographics, usage summary
– Product usage
– Traditional customer feedback: surveys, customer complaints, inbound e-mails
– Transactional data: usage records, sales receipts
– Outputs from sensors
– Service assurance
– Social media data: Facebook discussions, Twitter feeds, blogs, YouTube videos
– Product data: mix & usage, access
– Device data: GPS & locale data, …

9 New Opportunities with Text Analytics
Mine freely available social media data for:
– Understanding customer sentiment
– Identifying major customer concerns
– Tracking sentiment/issues over time
Business implications:
– Ability to act on negative sentiment quickly
– Respond to customer concerns in a timely manner
– Target initiatives appropriately through continuous tracking
– Superior market research & focus group outcomes

10 Sentiment Analysis (New Opportunities)
Methodology:
– Score based on positive & negative sentiment words, OR
– Use supervised learning with labelled examples
Limitation: no sarcasm detection
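
The word-scoring option can be illustrated with a minimal sketch. The two word lists below are hypothetical stand-ins for a curated sentiment lexicon, and the function name `sentiment_score` is an assumption, not something from the slides. The third post shows why the slide flags sarcasm as a limitation.

```python
import re

# Hypothetical mini-lexicons; a real system would use a curated word list.
POSITIVE = {"good", "great", "love", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "hate", "broken", "rude"}

def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z]+", text.lower())
    positive_hits = sum(1 for w in words if w in POSITIVE)
    negative_hits = sum(1 for w in words if w in NEGATIVE)
    return positive_hits - negative_hits

posts = [
    "Love the new plan, support was fast and helpful",
    "Billing is broken and the staff were rude",
    "Oh great, another outage",  # sarcastic, but the lexicon scores it as positive
]
scores = [sentiment_score(p) for p in posts]
print(scores)
```

A score above zero is read as positive sentiment and below zero as negative; the sarcastic post is misread because only individual words are counted.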

11 Topic Detection (New Opportunities)
Methodology:
1. Create term frequency matrix from text sequences
2. Use unsupervised learning to create clusters
3. Create cluster descriptions
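
The three steps above can be sketched in plain Python. The slides do not name a clustering algorithm, so k-means with fixed seed rows is an assumption here, as are the sample texts and all function names. Cluster descriptions are taken to be the highest-weighted terms of each centroid.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tf_matrix(docs):
    """Step 1: one row per document, one column per term."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    rows = []
    for d in docs:
        counts = Counter(tokenize(d))
        rows.append([counts[t] for t in vocab])
    return vocab, rows

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, seeds, iters=10):
    """Step 2: plain k-means; 'seeds' are row indices used as initial centroids."""
    centroids = [list(rows[i]) for i in seeds]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda c: dist2(r, centroids[c]))
                  for r in rows]
        for c in range(len(centroids)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def describe(vocab, centroid, top=3):
    """Step 3: describe a cluster by its highest-weighted terms."""
    ranked = sorted(zip(vocab, centroid), key=lambda p: (-p[1], p[0]))
    return [term for term, weight in ranked[:top] if weight > 0]

docs = [
    "bill wrong bill charge",
    "charge on bill too high",
    "network outage again",
    "no network coverage outage",
]
vocab, rows = tf_matrix(docs)
labels, centroids = kmeans(rows, seeds=[0, 2])
topics = [describe(vocab, c) for c in centroids]
print(labels, topics)
```

On this toy input the billing complaints and the network complaints end up in separate clusters; real data would need the preprocessing and feature selection discussed in later slides.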

13 Challenges in Text Analytics
1. Creating a term frequency matrix for machine learning
– One row for each entry
– One column for each term/feature describing the entries
– Treat non-alphabetic characters as white space
– Case-insensitive
– Term = word
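
A minimal sketch of this construction follows, using the rules on the slide (non-alphabetic characters as white space, case-insensitive, term = word). The function name and sample entries are assumptions, and "alphabetic" is taken to mean ASCII letters only.

```python
import re
from collections import Counter

def term_frequency_matrix(entries):
    """Build (terms, matrix): one row per entry, one column per term."""
    tokenized = []
    for entry in entries:
        # Case-insensitive; every run of non-letters becomes white space.
        cleaned = re.sub(r"[^a-z]+", " ", entry.lower())
        tokenized.append(cleaned.split())
    terms = sorted({t for tokens in tokenized for t in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[t] for t in terms])
    return terms, matrix

entries = ["Single bill, please!", "My BILL is wrong; bill me again."]
terms, matrix = term_frequency_matrix(entries)
print(terms)
for row in matrix:
    print(row)
```

Even with two short entries, most cells are zero, which previews the sparsity problem raised on the following slides.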

14 1. Term Frequency Matrix (Challenges)
– Presence of non-informative words
– Different forms of the same words
– Spelling errors & typos
– Synonyms
– Homonyms

15 2. Very Large Feature Space (Challenges)
Many different terms within a single entry
– 10^4 features with just 50 to 100 entries
– Sparse entries: many zeros in the matrix
Unsupervised learning
– Hard to form cohesive clusters with sparse entries
Supervised learning
– Traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature

17 1. Term Frequency Matrix (Solutions)
Presence of non-informative words
– Create a list of stopwords
– Remove them from consideration
Different forms of the same words
– Use rule-based stemming to remove suffixes
Spelling errors & typos
– Use a spell-checker, OR
– Use n-grams (character sequences) as features
  5-grams for 'single bill': 'singl', 'ingle', 'ngle ', 'gle b', 'le bi', 'e bil', ' bill'
Synonyms
– Use a thesaurus (manual or statistical)
Homonyms
– Provide context by using word pairs or triplets as features
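
Several of these fixes can be sketched in a few lines. The stopword list and the suffix rules below are hypothetical toy values (a production system would more likely use an established stemmer such as Porter's), and the function names are assumptions. The character n-gram function reproduces the slide's 'single bill' example.

```python
# Hypothetical stopword list and suffix rules, for illustration only.
STOPWORDS = {"the", "a", "is", "on", "my"}
SUFFIXES = ("ing", "ed", "s")

def remove_stopwords(words):
    return [w for w in words if w not in STOPWORDS]

def stem(word):
    """Toy rule-based stemmer: strip the first matching suffix, keep >= 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def char_ngrams(text, n=5):
    """Character n-grams; a typo only disturbs the few n-grams that overlap it."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_pairs(words):
    """Adjacent word pairs, giving context that helps separate homonyms."""
    return list(zip(words, words[1:]))

print(remove_stopwords(["the", "bill", "is", "wrong"]))
print([stem(w) for w in ["billing", "bills", "charged", "is"]])
print(char_ngrams("single bill"))
print(word_pairs(["bank", "account", "fee"]))
```

Synonym handling is left out because it depends on an external thesaurus.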

18 2. Very Large Feature Space (Solutions)
Use feature selection to identify significant features
Features are of 3 types:
– Very frequent, low information content (e.g., stopwords)
– Infrequent, low information content (occurs once or twice in the set)
– Significant middle-frequency features
Many statistical techniques:
– Inverse document frequency weight
– Signal-to-noise ratio
– Average discrimination value
– …
Challenge addressed (unsupervised learning): hard to form cohesive clusters with sparse entries
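
One way to keep only the middle-frequency features is to filter on document frequency and weight the survivors by inverse document frequency. The sketch below assumes the common definition idf = log(N / df); the thresholds `min_df` and `max_df_ratio`, the function names, and the sample texts are illustrative assumptions rather than values from the slides.

```python
import math
from collections import Counter

def document_frequency(token_lists):
    """Number of entries in which each term occurs at least once."""
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return df

def select_features(token_lists, min_df=2, max_df_ratio=0.7):
    """Drop very rare and very frequent terms; return {term: idf} for the rest."""
    n = len(token_lists)
    df = document_frequency(token_lists)
    return {
        term: math.log(n / count)
        for term, count in df.items()
        if count >= min_df and count / n <= max_df_ratio
    }

docs = [
    "the bill is wrong",
    "the bill is late",
    "the network is down",
    "the network outage is back",
    "the bill and the network",
]
selected = select_features([d.split() for d in docs])
print(sorted(selected.items()))
```

Here 'the' and 'is' are dropped as too frequent, the words that occur once are dropped as too rare, and only 'bill' and 'network' remain.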

19 2. Very Large Feature Space (Cont'd) (Solutions)
Use new techniques based on maximal margin separators that can handle a large feature space
– Support Vector Machines
Challenge addressed (supervised learning): traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature

20 Support Vector Machines (Solutions)
[Figure: two classes of customers, those who churned to other providers and those who are loyal]
Objective: to learn a separator that identifies people likely to churn before they do

21 Support Vector Machines (Solutions)
What is a good separator?
– Maximises the margin between two parallel supporting hyperplanes
– Separator depends on support vectors
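
A rough sketch of learning such a separator on toy churn data follows. The deck states that the separator is obtained by quadratic programming; this sketch instead uses subgradient descent on the regularised hinge loss, a simpler stand-in that only approximates the maximal-margin solution. The two features, the data points, and all parameter values are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200):
    """Linear soft-margin SVM via subgradient descent on the hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 penalty: shrink w slightly on every step, favouring a wider margin.
            w = [wj - lr * lam * wj for wj in w]
            if margin < 1:
                # The point is inside the margin or misclassified: move towards it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical features: (complaints per month, years as a customer).
X = [(4.0, 1.0), (5.0, 0.5), (0.0, 6.0), (1.0, 5.0)]
y = [1, 1, -1, -1]  # +1 = churned, -1 = loyal
w, b = train_linear_svm(X, y)
preds = [predict(w, b, x) for x in X]
print(w, b, preds)
```

Only points whose margin falls below 1 ever change the weights, which mirrors the slide's point that the separator depends on the support vectors.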

22 Support Vector Machines (Solutions)
Why does maximising margins work?
– Small margin: more choice of separators, which tends to overfit the data
– Large margin: less choice of separators, which reduces overfitting

23 2. Very Large Feature Space (Cont'd) (Solutions)
Use new techniques based on maximal margin separators that can handle a large feature space
Support Vector Machines
– Maximises the margin between two classes
– Separator depends only on support vectors
– Separator obtained using quadratic programming
– Available in some statistical packages
Challenge addressed (supervised learning): traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature
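
For reference, the quadratic programme behind a maximal-margin separator is usually written as below for linearly separable data. The symbols are standard notation and do not appear in the slides: x_i is the feature vector of entry i, y_i in {+1, -1} is its label, and (w, b) define the separating hyperplane.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad\text{subject to}\quad
y_i\,\bigl(w^{\top} x_i + b\bigr) \ge 1, \qquad i = 1,\dots,n
```

The two supporting hyperplanes are w^T x + b = +1 and w^T x + b = -1, a distance 2/||w|| apart, so minimising ||w|| maximises the margin. The entries that meet their constraint with equality are the support vectors.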

24 Wrap-up
Text analytics creates new opportunities for businesses to understand their customers
– Understanding customer sentiment
– Identifying major customer concerns
– Tracking sentiment/issues over time
A few challenges in implementing text analytics
– Creating a term frequency matrix from text sequences
– Large number of features in the matrix
Many techniques exist to overcome these challenges
Now is the time to use text analytics to unlock the potential of big data in your business!