User-centered Information System Evaluation Tutorial
Goal
- Understand how to conduct a user-centered evaluation of an information system
- Use the performance of different user-generated queries to measure the effectiveness of different information retrieval systems
- Evaluate systems using precision- and recall-based measures
Example Scenario
Finding experts is important, especially when you want to find potential collaborators, ask for consultancy, or look for potential reviewers.
Aim: compare two expert-finding systems
- Pair 1: http://vivo.cornell.edu vs. http://experts.umich.edu
- Pair 2: http://vivosearch.org vs. http://www.arnetminer.org
- Pair 3: Google vs. Yahoo
Goal: decide which one is the better expert-finding system based on the following criteria:
- ease of use, user-friendliness
- quality of search
Quantitative data analysis
How: Approach (work in groups of 2)
- Demographic questionnaire: about your user, e.g., whether he/she is familiar with this system
- Pre-evaluation questionnaire: whether the user is familiar with the topic
- Evaluation
  - Ease of use, user-friendliness: develop a survey/questionnaire with 3-5 five-point-scale questions and 1-2 open-ended questions (e.g., easy to find the help file, easy to know where you are), focusing on the design of the system; a sketch for aggregating the responses follows after this list
  - Quality of search (2 queries)
    - Pair 1: Q1: find experts working on “stem cell”; Q2: create another query
    - Pair 2: Q1: find experts good at “data mining”; Q2: create another query
    - Pair 3: Q1: create your own query; Q2: create another query
- Post-evaluation questionnaire: usability, e.g., whether the system is easy to search, whether it takes too long to get the results, whether the results are ranked, and whether each ranked URL has a short summary
- Exit questionnaire: preference
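The sketch below is one hedged way to do the quantitative part of the questionnaire analysis: it averages each five-point-scale question's scores across participants. The question labels and scores are hypothetical placeholders, not data from this tutorial.

```python
# Minimal sketch (hypothetical data): average 5-point-scale responses per question.
from statistics import mean

responses = {
    "easy to find the help file": [4, 5, 3],  # one score (1-5) per participant
    "easy to know where you are": [2, 3, 3],
    "overall design is clear":    [5, 4, 4],
}

for question, scores in responses.items():
    print(f"{question}: mean = {mean(scores):.2f} (n = {len(scores)})")
```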
Measuring quality of search
For each query and each system, record the top 5 results:

Query: Q1: ??????? (System: ___)

Rank | Page title | Relevance Judgment (0/1)
1    |            |
2    |            |
3    |            |
4    |            |
5    |            |

Repeat the same table for Query 2.
User judgment of relevance
Fill in the following table with the title and a short summary of the top 5 results. Show the results to your partner and let your partner decide the relevance (0 = irrelevant, 1 = relevant); a sketch for randomizing the order follows below.

Rank | Page title + short summary (random order) | Relevance Judgment (0/1)
1    |                                           |
2    |                                           |
3    |                                           |
4    |                                           |
5    |                                           |
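To keep the partner's judgments unbiased by each system's ranking, the results can be shown in random order. A minimal sketch, assuming the top-5 titles and summaries are stored as strings (the entries below are placeholders):

```python
# Minimal sketch: shuffle the top-5 results before showing them to the partner,
# so the judge cannot infer each result's original rank. Titles are placeholders.
import random

top5 = [
    "Title A + short summary",
    "Title B + short summary",
    "Title C + short summary",
    "Title D + short summary",
    "Title E + short summary",
]

shuffled = random.sample(top5, k=len(top5))  # random order; original list unchanged
for item in shuffled:
    print(item)
```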
Calculate Precision@topN
Precision@k is the fraction of relevant results among the top k:

Precision@k = (number of relevant results in top k) / k

Worked example, with judgments 1, 0, 1, 0, 1 for ranks 1-5:

Rank | Relevance Judgment (0/1)
1    | 1
2    | 0
3    | 1
4    | 0
5    | 1

Precision@top3 = (1 + 0 + 1) / 3 = 0.67
Precision@top5 = (1 + 0 + 1 + 0 + 1) / 5 = 0.6

Record the metrics for each query and system:

Query | System ID | Evaluation Metric
Q1    | System 1  | Precision@3 = ___  Precision@5 = ___
Q1    | System 2  | Precision@3 = ___  Precision@5 = ___
Q2    | System 1  | Precision@3 = ___  Precision@5 = ___
Q2    | System 2  | Precision@3 = ___  Precision@5 = ___

Then average over queries to get a per-system score, e.g.:

Precision@top3_S1 = (Precision@top3_Q1 + Precision@top3_Q2) / 2

A code sketch for these computations follows below.
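The sketch below computes Precision@k from a list of 0/1 relevance judgments and averages it across queries for one system. It reproduces the worked example above; the judgments for Q2 are hypothetical placeholders.

```python
# Minimal sketch: Precision@k from 0/1 relevance judgments (in rank order),
# then averaged across queries for one system.
def precision_at_k(judgments, k):
    """Fraction of relevant results among the top k."""
    return sum(judgments[:k]) / k

q1 = [1, 0, 1, 0, 1]  # worked example from the table above
q2 = [1, 1, 0, 0, 0]  # hypothetical judgments for the second query

print(precision_at_k(q1, 3))  # 0.666... -> reported as 0.67
print(precision_at_k(q1, 5))  # 0.6

# Per-system average across queries, matching Precision@top3_S1 above:
p_at_top3_s1 = (precision_at_k(q1, 3) + precision_at_k(q2, 3)) / 2
print(f"Precision@top3_S1 = {p_at_top3_s1:.2f}")
```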
Compare your results with your partner's
- Will your results be consistent with your partner's? Compare Precision@top3 and Precision@top5.
- If your results are consistent, you can draw a conclusion about which system performs better.
- If your results are not consistent, what conclusion can you draw, and why? Discuss the limitations and potential reasons. One simple consistency check is sketched below.
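One simple, hedged way to quantify consistency is the fraction of top-5 results on which the two partners' 0/1 judgments agree; both judgment lists below are hypothetical.

```python
# Minimal sketch: percent agreement between two judges' 0/1 relevance
# judgments on the same top-5 results. Both lists are hypothetical.
judge_a = [1, 0, 1, 0, 1]
judge_b = [1, 1, 1, 0, 0]

agree = sum(a == b for a, b in zip(judge_a, judge_b))
print(f"Agreement: {agree}/{len(judge_a)} = {agree / len(judge_a):.2f}")
```

Note that simple percent agreement does not correct for chance agreement; measures such as Cohen's kappa do, and could be used in a more rigorous evaluation.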
Evaluation Report
Write a short report (1-2 pages) about this evaluation, including:
- the evaluation outputs
- quantitative analysis
- qualitative analysis
- final conclusions
- limitations
- lessons learned: if you wanted to run this as a real evaluation, what would you improve?