WEB FORUM MINING BASED ON USER SATISFACTION
By: Suresh Pokharel, Information and Communications Technologies

Presentation transcript:

WEB FORUM MINING BASED ON USER SATISFACTION
By: Suresh Pokharel
Information and Communications Technologies, Asian Institute of Technology
Committee: Dr. Sumanta Guha (Chairperson), Prof. Phan Minh Dung, Assoc. Prof. Tapio J. Erke
May 2010

Agenda
- Introduction
- Problem Statement
- Objectives
- Methodology
- Implementation
- Results and Discussion
- Conclusion and Future Work
- Demonstration

Introduction
- Internet forum, or message board
- Online discussion site
- Asynchronous
- People participating in an Internet forum may cultivate social bonds, and interest groups for a topic may form from the discussions.

Introduction
Figure 1: Organization of Threads

Introduction
Table 1: An example of a thread
Title: Software for Ubuntu.

Post No. | Post | Users | Category
1 | I am really new to Linux, where can I find software for Ubuntu? | avacomputers | Question
2 | Applications>Add/remove If you are using Ubuntu Firefox, you can also use | danielrmt | Answer
3 | http:// | |
4 | something else I was reading about is a port from OzOs ( called apt:foo | devildoc5 | Answer
5 | or the easy way >.> CLICK SYSTEM > PREFERENCES > SYNAPTIC PACKAGE MANAGER there you can search all kinds of games, software! anything you like, thousands to choose from! They download and install easily onto your system | Codix121 | Answer
6 | Thanks GUys. | avacomputers | Answer

Slide callouts: Questioner, Repliers, Questioner Post

Problem Statement
- Lots of forums...
- Which forum may have a solution?
- "Oops... I don't want to test all forums..."

Objectives
- To categorize a post as a question post or an answer post.
- To classify a thread as answered or unanswered based on the questioner's satisfaction and forum features.
- To predict a solution post based on the interaction and satisfaction of the questioner.

Methodology: Framework of Study
Figure 4: Framework of Study

Methodology: Sentence Classification Example
Figure 5: Sentence Classification

Methodology: Sentence Classification
Labeled Sequential Patterns (LSPs)
- An LSP p has the form LHS → c, where LHS is a sequence <a1, ..., am>, each ai is called an "item", and c is a class label (question / non-question).
- If a sequence A has a subsequence B, then A contains B.
- An LSP p1 is contained by p2 if the sequence p1.LHS is contained by p2.LHS and p1.c = p2.c.
Example: t1 = (<...>, Q), t2 = (<...>, Q), t3 = (<...>, NQ)
1) LSP p1 = (<...>, Q) is contained in t1 and t2: sup(p1) = 2/3 = 66.7%, conf(p1) = (2/3)/(2/3) = 100%
2) LSP p2 = (<...>, Q): sup(p2) = 2/3 = 66.7%, conf(p2) = (2/3)/(3/3) = 66.7%
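The support and confidence figures on this slide can be reproduced with a short Java sketch. The item sequences did not survive in the transcript, so the database and the two patterns below are hypothetical placeholders chosen only to be consistent with the reported values (sup = 2/3 for both patterns, conf = 100% and 66.7%). The class and method names are likewise assumptions made for illustration, not the thesis implementation.

```java
import java.util.List;

/** Sketch: support and confidence of a labeled sequential pattern (LSP). */
class LspStats {

    /** An item sequence together with a class label such as "Q" or "NQ". */
    static final class Labeled {
        final List<String> items;
        final String label;

        Labeled(String label, String... items) {
            this.items = List.of(items);
            this.label = label;
        }
    }

    /** True if pattern occurs in sequence in order, not necessarily contiguously. */
    static boolean isSubsequence(List<String> pattern, List<String> sequence) {
        int matched = 0;
        for (String item : sequence) {
            if (matched < pattern.size() && item.equals(pattern.get(matched))) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    /** Fraction of tuples whose sequence contains the LHS, whatever their label. */
    static double lhsSupport(List<Labeled> db, List<String> lhs) {
        if (db.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (Labeled t : db) {
            if (isSubsequence(lhs, t.items)) {
                hits++;
            }
        }
        return (double) hits / db.size();
    }

    /** Fraction of tuples that contain the LHS and carry the pattern's label. */
    static double support(List<Labeled> db, Labeled p) {
        if (db.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (Labeled t : db) {
            if (t.label.equals(p.label) && isSubsequence(p.items, t.items)) {
                hits++;
            }
        }
        return (double) hits / db.size();
    }

    /** conf(p) = sup(p) / sup(p.LHS), or 0 when the LHS never occurs. */
    static double confidence(List<Labeled> db, Labeled p) {
        double lhs = lhsSupport(db, p.items);
        return lhs == 0.0 ? 0.0 : support(db, p) / lhs;
    }

    /** Hypothetical database (assumption): not the sequences from the slide. */
    static List<Labeled> exampleDb() {
        return List.of(
                new Labeled("Q", "a", "d", "e", "f"),
                new Labeled("Q", "a", "f", "e", "f"),
                new Labeled("NQ", "d", "a", "f"));
    }

    public static void main(String[] args) {
        List<Labeled> db = exampleDb();
        Labeled p1 = new Labeled("Q", "a", "e", "f");
        Labeled p2 = new Labeled("Q", "a", "f");
        System.out.println("sup(p1)=" + support(db, p1) + " conf(p1)=" + confidence(db, p1));
        System.out.println("sup(p2)=" + support(db, p2) + " conf(p2)=" + confidence(db, p2));
    }
}
```

With these placeholder sequences, p1 occurs only in the two question tuples, while the LHS of p2 also occurs in the non-question tuple, which is what lowers its confidence.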

Methodology: Sentence Classification
Mining LSPs
- Word length of sequence: 4
- Minimum support set at 0.01% and minimum confidence at 95%
Converting to features
- LSP
- 5W1H word
- Auxiliary verb
- Question mark
The corresponding feature is set to 1 if a sentence includes an LSP, includes a question mark, or starts with a 5W1H word or an auxiliary verb.
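The feature conversion described on this slide can be sketched as a function that maps a sentence to four binary features. This is a minimal illustration under stated assumptions: the word lists, the simple regex tokenizer, the feature order, and the reading that the auxiliary-verb test applies to the first word are all hypothetical choices. The slides name OpenNLP for tokenization and do not list the actual word sets or mined patterns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Sketch: binary question-detection features for one sentence. */
class QuestionFeatures {

    // Assumed word lists; the slides name the feature types but not their contents.
    static final Set<String> WH_WORDS =
            Set.of("who", "what", "when", "where", "why", "how");
    static final Set<String> AUX_VERBS = Set.of(
            "do", "does", "did", "is", "are", "was", "were", "can", "could", "will",
            "would", "shall", "should", "may", "might", "must", "have", "has", "had");

    /** Lower-cases the sentence and splits it on non-alphanumeric characters. */
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String t : sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** True if pattern occurs in sequence in order, not necessarily contiguously. */
    static boolean isSubsequence(List<String> pattern, List<String> sequence) {
        int matched = 0;
        for (String item : sequence) {
            if (matched < pattern.size() && item.equals(pattern.get(matched))) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    /** Returns {lsp, 5w1h, auxiliaryVerb, questionMark}, each 0 or 1. */
    static int[] extract(String sentence, List<List<String>> minedPatterns) {
        List<String> tokens = tokenize(sentence);
        String first = tokens.isEmpty() ? "" : tokens.get(0);
        boolean hasLsp = false;
        for (List<String> pattern : minedPatterns) {
            if (isSubsequence(pattern, tokens)) {
                hasLsp = true;
                break;
            }
        }
        return new int[] {
            hasLsp ? 1 : 0,
            WH_WORDS.contains(first) ? 1 : 0,
            AUX_VERBS.contains(first) ? 1 : 0,
            sentence.indexOf('?') >= 0 ? 1 : 0
        };
    }

    public static void main(String[] args) {
        // Hypothetical mined pattern, for illustration only.
        List<List<String>> patterns = List.of(List.of("where", "can", "find"));
        String sentence = "Where can I find software for Ubuntu?";
        System.out.println(Arrays.toString(extract(sentence, patterns)));
    }
}
```

The resulting vector is the kind of input an SVM classifier would take. Item sequences that mix words with POS tags would need a different tokenizer than the one assumed here.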

Methodology: Thread Classification
Figure 6: Classification of Thread

Methodology: Questioner Post Classification
Derive features:
- Satisfied phrase
- Unsatisfied phrase

Methodology: Questioner Post Classification
Derive features:
- Question present
- Presence of more posts

Methodology: Questioner Post Classification
Derive features:
- Satisfied post length
- Presence of emoticons
  - Happy emoticon: mood of satisfaction
  - Unhappy emoticon: mood of dissatisfaction

Methodology: Questioner Post Classification
Derive features:
- Original post

Methodology: Questioner Post Classification
Figure 7: Classification of Questioner Post

Methodology: Predict Solution Posts
Find solution post:
- Presence of quote
- Presence of user name

Methodology: Predict Solution Posts
- Solution posts may be between the questioner posts.
Diagram labels: QP, R, R, SP (Solution Posts)

Implementation
Dataset
- Forum: Ubuntu
- Sentence classification: datasets of 100, 200 and 300 from 3000 sentences
- Questioner post classification: 250 posts from 79 threads
- Manual evaluation: 100 threads, evaluated by two teams of 5 people each
Tools and Language
- POS tagging, tokenization, sentence detection: OpenNLP
- Classifier: Support Vector Machine (LibSVM, SMO)
- Model: the SVM classifier model is trained using LibSVM
- Language: Java

Result and Discussion: Sentence Classification Comparison
Table 3: Accuracy of Sentence Classification by using LSP+ by Class
Columns: Features, Accuracy, Recall, Precision, F-Measure
- WH word (5W1H)
- Question Mark (QM)
- Auxiliary Verb (Aux)
- 5W1H + QM + Aux
- Labeled Sequential Pattern (LSP): 0.94
- QM + Aux + LSP + 5W1H (LSP+): 0.96
All results were obtained from 10-fold cross-validation.
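For reference, the four measures used in the result tables can be computed from the counts of a binary confusion matrix, as in the sketch below. The counts in `main` are hypothetical and are not results from the thesis; the class name is also an assumption made for illustration.

```java
/** Sketch: accuracy, precision, recall and F-measure from confusion counts. */
class ClassificationMetrics {
    final int tp; // true positives
    final int fp; // false positives
    final int fn; // false negatives
    final int tn; // true negatives

    ClassificationMetrics(int tp, int fp, int fn, int tn) {
        this.tp = tp;
        this.fp = fp;
        this.fn = fn;
        this.tn = tn;
    }

    /** Share of all instances that were classified correctly. */
    double accuracy() {
        int total = tp + fp + fn + tn;
        return total == 0 ? 0.0 : (double) (tp + tn) / total;
    }

    /** Share of predicted positives that are truly positive. */
    double precision() {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Share of true positives that were found. */
    double recall() {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall. */
    double fMeasure() {
        double p = precision();
        double r = recall();
        return p + r == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Hypothetical counts, for illustration only.
        ClassificationMetrics m = new ClassificationMetrics(6, 2, 4, 8);
        System.out.println("accuracy=" + m.accuracy());
        System.out.println("precision=" + m.precision());
        System.out.println("recall=" + m.recall());
        System.out.println("f-measure=" + m.fMeasure());
    }
}
```

Under 10-fold cross-validation, one common choice is to sum the counts over the ten folds before computing the measures. The slides do not say whether this or per-fold averaging was used.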

Result and Discussion: QP Classification Comparison
Table 5: Questioner Post Classification Comparison using Different Features
Columns: Features, Precision, Recall, F-Measure
- Satisfied Words (SW): 0.78
- Unsatisfied Words (UW): 0.72
- Question (Ques)
- More Post (MP): 0.83
- Word Count (WC)
- Happy Emoticon (HE)
- Unhappy Emoticon (UE)
- Original Post (OP)
- Ques + MP: 0.84
- Ques + OP: 0.86
- MP + UE + OP: 0.88
- Ques + MP + OP: 0.85
- SW + UW + Ques
- SW + UW + Ques + MP + WC + HE + UE + OP: 0.91

Result and Discussion: Comparison with Team's Evaluation
Table 6: Comparison of System Result with Manual Evaluation for thread classification
Columns: Performance, Accuracy, Recall, Precision, F-Measure
- System with Team A
- System with Team B
- Average
Table 7: System Accuracy for Prediction of Solution Posts
Columns: Accuracy, Recall, Precision, F-Measure
- System Accuracy with Team A
- System Accuracy with Team B
- Average System Accuracy

Conclusion and Future Work: Conclusion
- Answered threads in a web forum are found by tracing user satisfaction.
- Threads and sentences are classified by deriving different features.
- The performance of the system increases when different features are combined.

Conclusion and Future Work: Future Work
- It can be used for query-based ranking of threads.
- It can be used for extracting answered sentences with better accuracy.
- The performance of the system can be increased by incorporating semantics.