Proposing a Scientific Paper Retrieval and Recommender Framework


1 Proposing a Scientific Paper Retrieval and Recommender Framework
Aravind Sesagiri Raamkumar, Schubert Foo & Natalie Pang
Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore
Presentation for ICADL’16, December 7th 2016

2 Background
Information Retrieval (IR) and Recommender Systems (RS) techniques have been used to find information objects for:
- Scholarly Communication Lifecycle tasks
- Literature Review (LR) search tasks
Examples of such tasks include:
- Building a reading list of research papers
- Recommending similar papers based on seed papers
- Recommending papers based on query logs
- Serendipitous discovery of interesting papers
- Recommending publication venues for manuscripts
- Recommending papers based on citation context
- Recommending co-authors for papers
- And a few more…

3 Background
Issues
- Proposed techniques and applications are piecemeal approaches
- Wide variety of algorithms and data fields used in prior studies
What was done?
- A prototype system, Rec4LRW, was built for recommending papers for three tasks:
  - Building a reading list of research papers
  - Finding similar papers based on a set of papers
  - Shortlisting papers from the final reading list for inclusion in a manuscript
- Task recommendation techniques were conceptualized on top of an identified set of base features

4 Rec4LRW System – Task 1

5 Rec4LRW System – Task 2

6 Rec4LRW System – Task 3

7 Rec4LRW System Evaluation
- An offline evaluation experiment and a user evaluation study were conducted to evaluate the Rec4LRW system
- An ACM DL extract of papers published between 1951 and 2011, with 103,739 articles, was used as the corpus for the system
- Postgraduate research students, research staff and academic staff were recruited for the user evaluation study
- Main entry criterion: a participant should have authored at least one research paper
- Participants evaluated the task recommendations and the overall Rec4LRW system from a list of 43 topics
- Online questionnaires were provided at the end of each task

8 Sample questionnaire

9 User Study Participants
Demographic Variable | Number of Participants
Position
  Student | 62 (47%)
  Staff | 70 (53%)
Experience Level [Self-Reported]
  Beginner | 15 (11.4%)
  Intermediate | 61 (46.2%)
  Advanced | 34 (25.8%)
  Expert | 22 (16.7%)
Discipline Category
  Engineering & Technology | 87 (65.9%)
  Social Sciences | 42 (31.8%)
  Life Sciences & Medicine | 3 (2.3%)
Discipline
  Computer Science & Information Systems | 51 (38.6%)
  Library and Information Studies | 30 (22.7%)
  Electrical & Electronic Engineering |
  Communication & Media Studies | 8 (6.1%)
  Mechanical, Aeronautical & Manufacturing Engineering | 5 (3.8%)
  Biological Sciences | 2 (1.5%)
  Statistics & Operational Research | 1 (0.8%)
  Education |
  Politics & International Studies |
  Economics & Econometrics |
  Civil & Structural Engineering |
  Psychology |

10 Data Analysis Procedures
Quantitative Data
- Ascertain the agreement percentages of the evaluation measures
- Logistic regression, t-test and correlation tests
Qualitative Data
- Identify the top preferred and critical aspects of the tasks and the overall system
- Feedback responses were coded by a single coder using an inductive approach
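As a rough illustration of the "agreement percentage" idea, the sketch below counts the share of questionnaire responses that fall in the agreeing range. It assumes a 5-point Likert scale where 4 and 5 count as agreement; the slides do not state the scale or the cut-off, and the function name `agreement_percentage` and the sample answers are hypothetical.

```python
from typing import Sequence


def agreement_percentage(responses: Sequence[int], agree_from: int = 4) -> float:
    """Return the percentage of responses at or above `agree_from`.

    Assumption: responses are integers on a 5-point Likert scale and
    values of 4 or 5 count as agreement. The study's actual scale and
    cut-off are not given in the slides.
    """
    if not responses:
        raise ValueError("no responses to summarise")
    agreed = sum(1 for r in responses if r >= agree_from)
    return 100.0 * agreed / len(responses)


# Hypothetical answers to one questionnaire item (not data from the study).
sample = [5, 4, 3, 4, 2, 5, 4, 1]
print(f"{agreement_percentage(sample):.1f}% agreement")
```

Computing one such percentage per evaluation measure and per task would give a table that can be compared across tasks.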

11 Emergent Themes and a Framework
- Certain dominant themes were apparent from the qualitative feedback
- These themes were consolidated into a single framework: Scientific Paper Retrieval and Recommender Framework (SPRRF)
Why do we need a framework?
- Most RS and IR studies are single-dimensional, i.e. algorithmic
- Need to consider the overall context towards providing a meaningful experience
- Framework generation based on empirical data
- Guide the next round of evaluation of the Rec4LRW system

12 Themes (1-2)
Theme 1: Distinct User Groups
- Users who want more control
  - Participants required control features in the UI and gave preferences on the algorithms' logic
  - “..Maybe a side window with categories like high reach, survey etc could be put up and upon clicking it, more papers in that category could be loaded.”
- Users who tend to trust the system and its output
  - Participants were largely satisfied with the overall system
  - “The idea of providing this system is quite* good. Such a system if developed and prepared well, can help and speed up the process of literature survey by helping to find better papers…”
Theme 2: Information Cues
- Four cue labels used in the system: Recent, Popular, High Reach, Survey/Review
- Cues positively impacted participants’ perceptions of the system
- “I like the highlighted recommendations - for e.g. Popular, Recent etc. which greatly helps in distinguishing various references and catches the eye !”

13 Themes (3-4)
Theme 3: Forced Serendipity vs Natural Serendipity
- Prior studies have focused mainly on modelling serendipity
- The ‘View Papers in the Parent Cluster’ feature helped participants notice papers which they had not read earlier
- “The view papers in the parent cluster function is very helpful to get a full picture of research field.”
- “The user can view many papers in the parent cluster in addition to the shortlisted papers. Thus the user need not spend much time on finding related papers.”
Theme 4: Learning Algorithms vs Fixed Algorithms
- Some participants in the study suggested heuristics to identify papers for Tasks 1 and 2
- These users expect a list of appropriate algorithms to be presented in the system
- “..Take a high impact paper (based on citation and may be exact keyword matching), then go through its own references to understand more about the research conducted. This is because, a good work generally cites other prominent works in the field…”
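The participant's heuristic quoted under Theme 4 can be sketched roughly as follows: pick the most-cited paper whose title matches a keyword, then walk its reference list. This is an illustration of the participant's suggestion only, not of any Rec4LRW algorithm; the `Paper` class, the `follow_references` function, the title-only keyword match and the toy corpus are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Paper:
    paper_id: str
    title: str
    citation_count: int
    references: List[str] = field(default_factory=list)  # ids of cited papers


def follow_references(corpus: Dict[str, Paper], keyword: str, top_n: int = 5) -> List[Paper]:
    """Pick the most-cited paper matching `keyword`, then return the papers
    it cites, most-cited first. Keyword matching on the title is a
    simplifying assumption."""
    matches = [p for p in corpus.values() if keyword.lower() in p.title.lower()]
    if not matches:
        return []
    seed = max(matches, key=lambda p: p.citation_count)
    # References that fall outside the corpus are skipped.
    cited = [corpus[ref] for ref in seed.references if ref in corpus]
    cited.sort(key=lambda p: p.citation_count, reverse=True)
    return cited[:top_n]


# Toy corpus with made-up papers and citation counts.
corpus = {
    "a": Paper("a", "Recommender systems survey", 120, ["b", "c", "x"]),
    "b": Paper("b", "Collaborative filtering", 300),
    "c": Paper("c", "Content-based retrieval", 80),
    "d": Paper("d", "Recommender evaluation", 40, ["c"]),
}
for paper in follow_references(corpus, "recommender"):
    print(paper.title, paper.citation_count)
```

A system that exposes several such heuristics side by side would match the expectation that users can choose among a list of algorithms.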

16 Themes (5-6)
Theme 5: Inclusion of Control Features in User-Interface
- Many participants felt handicapped by the absence of control features in the Rec4LRW system
- Expected control features were sort options, topical facets and advanced search features
- “Really good for the initial review. It would be nice to see additional filters to focus on a specific topic”
- “More recent papers shall be included, and it is better if the user can sort the recommended paper by sequence such as sort times, date, relevance...”
Theme 6: Inclusion of Bibliometric Data
- Participants explicitly stated the need for metrics such as impact factor and h-index in the UI
- The main challenge is the computing overhead for calculating the new metrics
- “Categorizing the papers based on popularity, journal impact factor, and etc”
- “…In case that an item in the recommendation list is a journal paper, can we also know its impact factor and which databases indexes it?”
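The control features that participants asked for under Theme 5 (sort options and topical facets) could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the `Recommendation` fields, the `SORT_KEYS` options and the `refine` function are hypothetical and do not describe the Rec4LRW interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Recommendation:
    title: str
    year: int
    citation_count: int
    relevance: float
    topic: str  # a single topical facet, for simplicity


# Hypothetical sort options a user could pick in the UI.
SORT_KEYS: Dict[str, Callable[[Recommendation], float]] = {
    "date": lambda r: r.year,
    "citations": lambda r: r.citation_count,
    "relevance": lambda r: r.relevance,
}


def refine(recs: List[Recommendation], sort_by: str = "relevance",
           topic: Optional[str] = None) -> List[Recommendation]:
    """Filter by an optional topical facet, then sort in descending order."""
    if sort_by not in SORT_KEYS:
        raise ValueError(f"unknown sort option: {sort_by}")
    kept = [r for r in recs if topic is None or r.topic == topic]
    return sorted(kept, key=SORT_KEYS[sort_by], reverse=True)


# Made-up recommendation list.
recs = [
    Recommendation("A", 2009, 40, 0.9, "retrieval"),
    Recommendation("B", 2011, 10, 0.7, "recommender"),
    Recommendation("C", 2005, 95, 0.8, "recommender"),
]
print([r.title for r in refine(recs, sort_by="date")])
print([r.title for r in refine(recs, sort_by="citations", topic="recommender")])
```

Keeping the sort options in one dictionary means a new option, such as a bibliometric score, only needs one extra entry.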

17 Themes (7-8)
Theme 7: Diversification of Corpus
- The evaluation of algorithms has been restricted to datasets from certain disciplines such as computer science in prior studies
- Future studies should include papers from “far-apart” disciplines for the evaluation
- “…Due to limitation of data sets (as only ACM papers) search result is not of decent quality.”
- “But in general the main drawback is that "the papers in the corpus/dataset are from an extract of papers from ACM DL". As I work at the intersection of information systems and business many relevant papers are not included in the list.”
Theme 8: Task Interconnectivity
- Participants appreciated the utility of the ‘seed basket’ and ‘reading list’ towards management of the papers across the three tasks
- “I like the idea of giving recommendations based on a seed group of articles, but there needs to be more facets to select from, there needs to be greater selection of seeding articles as well in terms of those facets.”
- “The whole idea seems good for me, especially making seed of 5+ for expanding the bunch.”

18 The Framework
SPRRF
Feature | Skill-Reliant User | System-Reliant User
UI Customization
- Sort options
- Topical facets
- Advanced search options
Algorithmic Customization
- Setting the recommendations count
- Selecting the retrieval algorithm
- Submitting external papers
User Personalization
- Paper collections
- Favourites specification
- Paper anchors
- Relevance feedback

19 Future Work
- SPRRF to be used in the second round of Rec4LRW evaluation studies
- SPRRF components to be statistically validated through hypotheses
- Expand the scope of SPRRF to other information objects in the Scholarly Communication Lifecycle

20 Get access to Rec4LRW…
Use the link http://goo.gl/XgynzY or scan the QR code below

21 Thank you

