Making Sense of Large Volumes of Unstructured Responses
K. M. P. N. Jayathilaka
Department of Statistics, University of Colombo

Outline
• Introduction
• Objectives
• Implementation
• Results
• Conclusions

Introduction
• Big Data Analytics
• Topic Modeling
• Sentiment Analysis

Big Data Analytics
• In the modern world, big data analytics has become a key component of decision making and of identifying hidden patterns in data
• Its concepts and tools guide how large data sets can be analyzed

Topic Modeling
• Topic modeling uses statistical models to discover abstract "topics" in a collection of unstructured text documents
• A topic captures a key idea in the content of the documents; in LDA, each topic is a probability distribution over words
• The Latent Dirichlet Allocation (LDA) topic model is frequently used for text analytics on collections of documents
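LDA treats each document as a mixture of topics and each topic as a probability distribution over words. The short Python sketch below simulates that generative process on a toy vocabulary. It first draws a word distribution for every topic. Then, for each document, it draws topic proportions and, for each word position, picks a topic and then a word from that topic. The vocabulary, the hyperparameter values (alpha, beta) and the function names are illustrative assumptions, not settings from this study.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) of dimension k
    by normalising k independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, n_topics, n_docs, doc_len, alpha=0.5, beta=0.1, seed=0):
    """Simulate LDA's generative process (hyperparameters are illustrative)."""
    rng = random.Random(seed)
    # Each topic k is a distribution phi_k over the whole vocabulary.
    phi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(n_topics)]
    docs = []
    for _ in range(n_docs):
        # Each document gets its own topic proportions theta_d.
        theta = sample_dirichlet(alpha, n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = rng.choices(range(n_topics), weights=theta)[0]  # topic for this position
            words.append(rng.choices(vocab, weights=phi[z])[0])  # word drawn from topic z
        docs.append(words)
    return phi, docs


vocab = ["net", "neutrality", "speed", "law", "telecom", "cost"]
phi, docs = generate_corpus(vocab, n_topics=2, n_docs=3, doc_len=8)
for doc in docs:
    print(" ".join(doc))
```

Fitting LDA reverses this process. Given only the documents, inference estimates the hidden topic assignments, the topic proportions and the topic-word distributions.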

Sentiment Analysis
• Sentiment analysis is a technique for classifying the polarity of text data
• The polarity of a text can be positive, negative or neutral

Sentiment Analysis
• The lexicon-based approach is an unsupervised technique for sentiment analysis
• The lexicon-based technique takes less time to classify text and does not require a training set

Sentiment analysis techniques
  o Machine Learning Approach
      - Supervised Learning
      - Unsupervised Learning
  o Lexicon-Based Approach
      - Dictionary-Based Approach
      - Corpus-Based Approach
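The lexicon-based idea can be sketched in a few lines of Python. Each word found in the lexicon contributes its prior polarity, and the sign of the total decides the label (positive, negative or neutral). No training set is involved. The lexicon entries, the one-word negation rule and the function name are illustrative assumptions. The study itself used the MPQA and SentiWordNet resources rather than this toy lexicon.

```python
import re

# Hypothetical toy lexicon (word -> prior polarity); a real analysis would load
# a published resource such as MPQA or SentiWordNet instead.
TOY_LEXICON = {
    "good": 1, "fair": 1, "free": 1, "open": 1, "support": 1,
    "bad": -1, "unfair": -1, "block": -1, "slow": -1, "costly": -1,
}
# Simple assumed heuristic: a negator directly before a lexicon word flips its sign.
NEGATORS = {"not", "no", "never"}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Label text positive, negative or neutral by summing word polarities."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = lexicon.get(token, 0)
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


print(lexicon_polarity("Keep the Internet free and open"))   # ('positive', 2)
print(lexicon_polarity("ISPs should not block apps"))        # ('positive', 1)
print(lexicon_polarity("Zero-rating is unfair and costly"))  # ('negative', -2)
```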

Objectives
• Making sense of large collections of unstructured responses
  o To identify the nature of the data in order to decide how to analyze it
  o To handle the large volume of the data set
  o To identify the key ideas of Indian Internet users about Net Neutrality using a topic model
  o To classify the polarity of the responses using sentiment analysis

Implementation
• Data
• Data Preprocessing
• Topic Modeling
• Sentiment Analysis

Data
• The data set included about 500,000 submissions received in response to a debate on Net Neutrality in India
• There were two types of responses:
  o Separate answers to each of the twenty questions asked by the Telecom Regulatory Authority of India (TRAI)
  o General comments from Internet users in India

Examples of Data: Fragment of a Question-Based Response

Examples of Data: Example of a Comment

Data Preprocessing
• Extracting plain text from HTML files and cleaning the data
• Extracting the answers to each question
• Data formatting
  o Stop-word removal
  o Lemmatization
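The cleaning steps can be sketched with the Python standard library alone. The sketch extracts visible text from HTML and skips script and style content. It then tokenizes, removes stop words and maps tokens to lemmas. The stop-word set is a small illustrative subset, and the lemma table is a hypothetical stand-in. With NLTK, one would more likely use `stopwords.words("english")` and `WordNetLemmatizer` (both need their corpora downloaded). The per-question answer extraction is not shown, because the slides do not describe the file layout.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities such as &amp;
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())  # normalise whitespace


# Small illustrative subset; NLTK's English stop-word list is much larger.
STOP_WORDS = {"a", "an", "and", "are", "be", "by", "for", "in", "is", "it",
              "of", "on", "or", "should", "that", "the", "this", "to", "with"}
# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"users": "user", "providers": "provider", "services": "service",
          "rules": "rule", "apps": "app"}


def preprocess(html):
    """HTML -> lowercase tokens with stop words removed and lemmas applied."""
    tokens = re.findall(r"[a-z]+", html_to_text(html).lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]


page = ("<html><head><style>p {color: red}</style></head><body>"
        "<p>Internet users &amp; service providers should follow the rules.</p>"
        "</body></html>")
print(preprocess(page))
# ['internet', 'user', 'service', 'provider', 'follow', 'rule']
```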

Technology Used for Data Preprocessing and Analysis
• Python
• R
• Natural Language Toolkit (NLTK)

Topic Modeling
• Determining the number of topics
  o The number of topics was determined by examining the topic models fitted to each question
• Evaluating the generated topics
• Visualizing the topics
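The slides do not say which LDA implementation was used. As one concrete possibility, the sketch below fits LDA by collapsed Gibbs sampling in plain Python. It then lists each topic's highest-weight words, which is the form of output shown in the results table. To choose the number of topics in the way the slide describes, one could refit with several values of `n_topics` and inspect the resulting word lists. The toy documents, the hyperparameters and the function names are assumptions made for illustration.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenised documents by collapsed Gibbs sampling.
    Returns phi: one dict per topic mapping word -> estimated weight."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    ndk = [[0] * n_topics for _ in docs]               # tokens in doc d assigned to topic k
    nkw = [defaultdict(int) for _ in range(n_topics)]  # times word w assigned to topic k
    nk = [0] * n_topics                                # tokens assigned to topic k
    z = []                                             # current topic of every token

    # Start from a random topic assignment.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional: p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + v * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Point estimate of each topic's word distribution.
    return [{w: (nkw[k][w] + beta) / (nk[k] + v * beta) for w in vocab} for k in topics]


def top_words(phi_k, n=10):
    """Return the n highest-weight (word, weight) pairs of one topic."""
    return sorted(phi_k.items(), key=lambda item: item[1], reverse=True)[:n]


docs = [["speed", "net", "neutrality", "speed"], ["law", "act", "telecom", "law"],
        ["net", "neutrality", "speed", "net"], ["telecom", "law", "act", "act"]] * 5
phi = lda_gibbs(docs, n_topics=2)
for k, phi_k in enumerate(phi):
    print(f"Topic {k + 1}:", ", ".join(f"{w} ({p:.3f})" for w, p in top_words(phi_k, 3)))
```

Collapsed Gibbs sampling keeps only token counts and integrates out the topic proportions and word distributions. This is why the sketch needs no matrix library. Library implementations (for example, variational inference) are usually faster on a corpus of 500,000 submissions.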

Sentiment Analysis
• Sentiment analysis using the Multi-Perspective Question Answering (MPQA) lexicon
• Sentiment analysis using the SentiWordNet lexicon
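The two resources store polarity differently. MPQA subjectivity clues are lines of key=value fields. Each line gives a word's prior polarity and marks it as a strong or weak subjectivity clue. SentiWordNet gives every WordNet synset a positive score and a negative score, and the objectivity score is 1 minus their sum. The sketch below scores tokens both ways. Every lexicon line and score in it is a hypothetical placeholder. Averaging over all of a word's synsets is only one possible heuristic; another is to use the first sense only. With NLTK and its sentiwordnet corpus installed, real scores can be read through `sentiwordnet.senti_synsets(word)` and each synset's `pos_score()` and `neg_score()`.

```python
# Hypothetical lines in the MPQA subjectivity-clue format.
MPQA_LINES = [
    "type=strongsubj len=1 word1=unfair pos1=adj stemmed1=n priorpolarity=negative",
    "type=weaksubj len=1 word1=support pos1=verb stemmed1=n priorpolarity=positive",
    "type=weaksubj len=1 word1=free pos1=adj stemmed1=n priorpolarity=positive",
]


def parse_mpqa_line(line):
    """Split one 'key=value key=value ...' clue into (word, polarity, strength)."""
    fields = dict(item.split("=", 1) for item in line.split())
    return fields["word1"], fields["priorpolarity"], fields["type"]


MPQA = {word: pol for word, pol, _ in map(parse_mpqa_line, MPQA_LINES)}
POLARITY_VALUE = {"positive": 1, "negative": -1}  # 'neutral' and 'both' count as 0


def mpqa_score(tokens):
    return sum(POLARITY_VALUE.get(MPQA.get(t, ""), 0) for t in tokens)


# Hypothetical SentiWordNet-style entries: word -> [(PosScore, NegScore) per synset].
TOY_SWN = {
    "good": [(0.75, 0.0), (0.5, 0.0)],
    "slow": [(0.0, 0.25), (0.0, 0.5)],
    "unfair": [(0.0, 0.625)],
}


def swn_word_score(word):
    """Average PosScore - NegScore over the word's synsets (0.0 if unknown)."""
    synsets = TOY_SWN.get(word, [])
    if not synsets:
        return 0.0
    return sum(pos - neg for pos, neg in synsets) / len(synsets)


def swn_score(tokens):
    return sum(swn_word_score(t) for t in tokens)


def polarity(score):
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


tokens = "free internet support but unfair pricing".split()
print(mpqa_score(tokens), polarity(mpqa_score(tokens)))    # 1 positive
tokens2 = "good but slow".split()
print(swn_score(tokens2), polarity(swn_score(tokens2)))   # 0.25 positive
```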

Results: Topic Modeling
LDA Topic Model

Topic 1                 Topic 2                 Topic 3
Weight  Word            Weight  Word            Weight  Word
0.024   law             0.036   increase        0.009   speed
0.019   time            0.024   account         0.008   neutrality
0.016   act             0.024   lead            0.008   net
0.015   telecom         0.024   cost            0.007   ott
0.012   digital         0.013   complexity      0.006   penetration
0.012   consultation    0.013   financial       0.006   establish
0.012   application     0.013   accessible      0.006   evolving
0.011   information     0.012   degradation     0.005   high
0.010   subject         0.011   involve         0.004   made
0.010   indian          0.010   essential       0.004   early

Results: Topic Modeling
Word Cloud
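A word cloud sizes each word according to its weight. The minimal sketch below applies a simple linear scaling between hypothetical minimum and maximum font sizes to the Topic 1 weights from the LDA topic-model table. The scaling rule and the point sizes are assumptions. The `wordcloud` package's `WordCloud.generate_from_frequencies` method is a ready-made alternative, although its internal scaling may differ.

```python
# Topic 1 word weights from the LDA topic-model table in the results.
TOPIC_1 = {
    "law": 0.024, "time": 0.019, "act": 0.016, "telecom": 0.015,
    "digital": 0.012, "consultation": 0.012, "application": 0.012,
    "information": 0.011, "subject": 0.010, "indian": 0.010,
}


def font_sizes(weights, min_size=10, max_size=60):
    """Map each word's weight linearly onto [min_size, max_size] points."""
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all weights are equal
    return {w: round(min_size + (x - lo) / span * (max_size - min_size))
            for w, x in weights.items()}


for word, size in sorted(font_sizes(TOPIC_1).items(), key=lambda kv: -kv[1]):
    print(f"{word:<12} {size}pt")
```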

Results: Sentiment Analysis
• The lexicon-based approach to sentiment detection indicates that most of the responses are positive

Conclusions
• The general views of Indian Internet users on Net Neutrality were identified by carrying out analytics on this large data set
• The topics generated by the LDA model focus mainly on Internet problems and Net Neutrality
• Key issues included the regulation of OTT (over-the-top) players, Internet security and privacy, and the speed of Internet services
• The lexicon-based approach to sentiment detection indicates that most of the responses are positive
