Topic Identification in Forums: Evaluation Strategy. IA Seminar Discussion. Ahmad Ammari, School of Computing, University of Leeds.

Slide 2: Identify Discussion Forum Topics Service
Components shown in the slide diagram: Topic Identifier, Lucene Filtering, Hadoop Map/Reduce, Topic Weighting, Topic Sorting.
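The service diagram names Lucene filtering, a Hadoop Map/Reduce step, topic weighting and topic sorting. As a rough, single-process sketch of the term-frequency version, the Python code here substitutes a regular-expression tokenizer and a small hypothetical stop-word list for Lucene's analyzers, and simulates the map and reduce phases with plain functions rather than Hadoop jobs. Weighting each term by its relative frequency is an assumption; the slides do not specify the weighting scheme, and all function names are illustrative.

```python
import re
from collections import Counter
from itertools import chain

# Hypothetical stop-word list standing in for Lucene's analyzer-based filtering.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "for", "on", "with", "this", "that"}


def filter_terms(post: str) -> list[str]:
    """Lower-case, split into alphabetic tokens, drop stop words and tokens of 2 letters or fewer."""
    tokens = re.findall(r"[a-z]+", post.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def map_phase(post: str) -> list[tuple[str, int]]:
    """Map step: emit (term, 1) for every filtered term in one post."""
    return [(term, 1) for term in filter_terms(post)]


def reduce_phase(pairs) -> Counter:
    """Reduce step: sum the counts emitted for each term."""
    totals = Counter()
    for term, count in pairs:
        totals[term] += count
    return totals


def identify_topics(posts: list[str], top_n: int = 5) -> list[tuple[str, float]]:
    """Weight terms by relative frequency and return the top_n, heaviest first."""
    counts = reduce_phase(chain.from_iterable(map_phase(p) for p in posts))
    total = sum(counts.values())
    if total == 0:
        return []
    weighted = [(term, c / total) for term, c in counts.items()]
    # Sort by descending weight, then alphabetically so ties are deterministic.
    weighted.sort(key=lambda tw: (-tw[1], tw[0]))
    return weighted[:top_n]
```

Calling `identify_topics(posts, top_n=10)` on a list of post bodies returns (term, weight) pairs with the heaviest term first. In a distributed deployment, the map and reduce functions could instead run as Hadoop tasks over the forum archive.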

Slide 3: Selected Discussion Forums

Slide 4: Envisaged Topic Clouds View in Dicode Forums
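A topic cloud usually conveys each topic's weight through its font size. The slides do not say how the Dicode view renders the cloud, so this sketch assumes a simple linear mapping from weight to a pixel size range and emits plain HTML spans; the function names, the 12 to 36 px range and the HTML output are all hypothetical choices for illustration.

```python
import html


def cloud_font_sizes(topics, min_px=12, max_px=36):
    """Map each (term, weight) pair linearly onto a font size in [min_px, max_px]."""
    if not topics:
        return {}
    weights = [w for _, w in topics]
    lo, hi = min(weights), max(weights)
    sizes = {}
    for term, w in topics:
        # If every weight is equal, fall back to the middle of the size range.
        scale = 0.5 if hi == lo else (w - lo) / (hi - lo)
        sizes[term] = round(min_px + scale * (max_px - min_px))
    return sizes


def render_cloud_html(sizes):
    """Emit one <span> per term, alphabetically, with its computed font size."""
    spans = (
        f'<span style="font-size:{px}px">{html.escape(term)}</span>'
        for term, px in sorted(sizes.items())
    )
    return " ".join(spans)
```

Linear scaling keeps the sketch simple; a logarithmic mapping is a common alternative when a few topics dominate the weights.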

Slide 5: Variations of the Original Approach
We aim to implement the service in several variations (approaches/algorithms) to improve the identified topics. The variations include:
- Topics identified from term frequency (this is the current version of the service)
- A different discussion clustering algorithm (K-Medoids)
- Dimension reduction (SVD) before clustering
- Topic modelling with Latent Dirichlet Allocation (LDA)
- Semantic annotation of discussions
- Semantic augmentation of the identified topics
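Two of the listed variations, SVD before clustering and K-Medoids clustering, can be combined into one pipeline. This sketch uses NumPy's `numpy.linalg.svd` to project a document-term matrix (one row per discussion) onto its top-k singular directions, in the spirit of latent semantic indexing, and then clusters the projected discussions with a simple alternating K-Medoids: assign each point to its nearest medoid, then re-pick each cluster's medoid. This is not the full PAM swap algorithm. The Euclidean distance, the farthest-first initialisation and all function names are assumptions made for illustration.

```python
import numpy as np


def reduce_with_svd(X: np.ndarray, k: int) -> np.ndarray:
    """Project each row (discussion) of the document-term matrix X onto the top-k singular directions."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # U_k * Sigma_k equals X @ V_k: the rank-k coordinates of every document.
    return U[:, :k] * s[:k]


def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def init_medoids(dist: np.ndarray, k: int) -> np.ndarray:
    """Farthest-first start: the most central point, then points far from those already chosen."""
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    return np.array(medoids)


def k_medoids(points: np.ndarray, k: int, max_iter: int = 100):
    """Alternating K-Medoids; returns (medoid row indices, cluster label per row)."""
    dist = pairwise_distances(points)
    medoids = init_medoids(dist, k)
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the previous medoid for an empty cluster
            # New medoid: the member with the smallest total distance to its cluster.
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels
```

Because medoids are actual discussions rather than averaged centroids, each cluster can be represented by a real thread. With raw term counts, applying TF-IDF weighting before the SVD is a common refinement, and k should not exceed the number of distinct projected points.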

Slide 6: Evaluation Strategy (Discussion)
We aim to set up an evaluation strategy to assess the business value of the service. To do so, we want to:
1. Broaden the discussions (we may need to change the forum).
2. Run all variations of the service on the broadened discussions and derive the topic clouds for each variation.
3. Determine the evaluation criteria to test and measure.
4. Design a set of specific tasks/problems for users to solve using the forum discussions and the derived topic clouds.
5. Split the users randomly into groups.
6. Give each group the discussion forum, the topic clouds derived by one variation of the service, and the tasks.
7. Compare the groups on the evaluation criteria.
Your feedback on the following would be invaluable:
- Which evaluation criteria to test (please give examples)
- For each criterion, which tasks/problems to give users (please give examples)
- How this evaluation strategy could be further improved
- Any other recommended evaluation strategies
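The random assignment of users to groups, one group per service variation, and the per-group comparison can be sketched as follows. The slides leave the evaluation criteria open, so the criterion names used with this code (for example "task_time" or "accuracy") are hypothetical placeholders, as are the function names. Shuffling and then dealing users round-robin keeps the group sizes within one of each other.

```python
import random
from statistics import mean


def split_users(users, variations, seed=None):
    """Shuffle users and deal them round-robin into one group per service variation."""
    variations = list(variations)
    shuffled = list(users)
    random.Random(seed).shuffle(shuffled)
    groups = {v: [] for v in variations}
    for i, user in enumerate(shuffled):
        groups[variations[i % len(variations)]].append(user)
    return groups


def compare_groups(groups, scores, criteria):
    """Average each criterion's per-user score within each group.

    scores maps user -> {criterion: numeric score}; missing scores are skipped,
    and a criterion with no scores in a group is reported as None.
    """
    summary = {}
    for variation, members in groups.items():
        summary[variation] = {}
        for criterion in criteria:
            values = [scores[u][criterion] for u in members
                      if u in scores and criterion in scores[u]]
            summary[variation][criterion] = mean(values) if values else None
    return summary
```

Group means alone do not show whether a difference between variations is meaningful; a statistical test suited to the chosen criteria would still be needed before drawing conclusions.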