Proposing a Scientific Paper Retrieval and Recommender Framework


Aravind Sesagiri Raamkumar, Schubert Foo & Natalie Pang
Wee Kim Wee School of Communication and Information
Nanyang Technological University, Singapore
Presentation for ICADL’16, December 7th 2016

BACKGROUND
Information Retrieval (IR) and Recommender Systems (RS) techniques have been used to find information objects for:
- Scholarly Communication Lifecycle tasks
- Literature Review (LR) search tasks
Examples of such tasks include:
- Building a reading list of research papers
- Recommending similar papers based on seed papers
- Recommending papers based on query logs
- Serendipitous discovery of interesting papers
- Recommending publication venues for manuscripts
- Recommending papers based on citation context
- Recommending co-authors for papers
- And a few more…

BACKGROUND
Issues
- Proposed techniques and applications are piecemeal approaches
- A wide variety of algorithms and data fields has been used in prior studies
What was done?
- A prototype system, Rec4LRW, was built for recommending papers for three tasks:
  - Building a reading list of research papers
  - Finding similar papers based on a set of papers
  - Shortlisting papers from the final reading list for inclusion in a manuscript
- Task recommendation techniques were conceptualized on top of an identified set of base features

Rec4LRW System – Task 1

Rec4LRW System – Task 2

Rec4LRW System – Task 3

Rec4LRW System Evaluation
- An offline evaluation experiment and a user evaluation study were conducted to evaluate the Rec4LRW system
- An ACM DL extract of papers published between 1951 and 2011, with 103,739 articles, was used as the corpus for the system
- Postgraduate research students, research staff and academic staff were recruited for the user evaluation study
- Main entry criterion: the participant should have authored at least one research paper
- Participants evaluated the task recommendations and the overall Rec4LRW system using topics from a list of 43 topics
- Online questionnaires were provided at the end of each task

Sample questionnaire

User Study Participants

Demographic Variable                                    Number of Participants
Position
  Student                                               62 (47%)
  Staff                                                 70 (53%)
Experience Level [Self-Reported]
  Beginner                                              15 (11.4%)
  Intermediate                                          61 (46.2%)
  Advanced                                              34 (25.8%)
  Expert                                                22 (16.7%)
Discipline Category
  Engineering & Technology                              87 (65.9%)
  Social Sciences                                       42 (31.8%)
  Life Sciences & Medicine                              3 (2.3%)
Discipline
  Computer Science & Information Systems                51 (38.6%)
  Library and Information Studies                       30 (22.7%)
  Electrical & Electronic Engineering
  Communication & Media Studies                         8 (6.1%)
  Mechanical, Aeronautical & Manufacturing Engineering  5 (3.8%)
  Biological Sciences                                   2 (1.5%)
  Statistics & Operational Research                     1 (0.8%)
  Education
  Politics & International Studies
  Economics & Econometrics
  Civil & Structural Engineering
  Psychology
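The percentages in the participant table can be checked with a short calculation. The sketch below assumes the total of 132 participants, which is inferred here from Student (62) plus Staff (70); the helper name `share` is purely illustrative.

```python
# Counts copied from the participant table. The total (132) is an inference
# from Student + Staff, not a figure stated on the slide.
counts = {
    "Student": 62, "Staff": 70,
    "Beginner": 15, "Intermediate": 61, "Advanced": 34, "Expert": 22,
}
TOTAL = counts["Student"] + counts["Staff"]


def share(n, total=TOTAL):
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


for label, n in counts.items():
    print(f"{label}: {n} ({share(n)}%)")
```

The four experience levels also sum to the same total, which supports the inferred denominator.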

DATA ANALYSIS PROCEDURES
Quantitative Data
- Ascertain the agreement percentages of the evaluation measures
- Logistic regression, t-test and correlation tests
Qualitative Data
- Identify the top preferred and critical aspects of the tasks and the overall system
- Feedback responses were coded by a single coder using an inductive approach
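As an illustration of what an agreement percentage could look like, here is a minimal sketch. It assumes a 5-point Likert scale where responses of 4 or 5 count as agreement; the slides do not state the scale or the cut-off, so both are assumptions, and the sample responses are made up.

```python
def agreement_percentage(responses, agree_from=4):
    """Percentage of Likert responses at or above `agree_from`.

    `agree_from=4` is an assumed cut-off (agree / strongly agree on a
    5-point scale); it is not taken from the presentation.
    """
    if not responses:
        raise ValueError("no responses")
    agreed = sum(1 for r in responses if r >= agree_from)
    return 100 * agreed / len(responses)


# Made-up responses for one hypothetical evaluation measure (not study data).
sample = [5, 4, 3, 4, 2, 5, 4, 1]
print(agreement_percentage(sample))  # 5 of 8 responses agree -> 62.5
```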

Emergent Themes and a Framework
- Certain dominant themes were apparent from the qualitative feedback
- These themes were consolidated into a single framework: the Scientific Paper Retrieval and Recommender Framework (SPRRF)
Why do we need a framework?
- Most RS and IR studies are single-dimensional, i.e. algorithmic
- The overall context needs to be considered to provide a meaningful experience
- The framework is generated from empirical data
- It will guide the next round of evaluation of the Rec4LRW system

Themes (1-2)
Theme 1: Distinct User Groups
- Users who want more control
  - Participants required control features in the UI and gave preferences on the algorithm logic
  - “..Maybe a side window with categories like high reach, survey etc could be put up and upon clicking it, more papers in that category could be loaded.”
- Users who tend to trust the system and its output
  - Participants were largely satisfied with the overall system
  - “The idea of providing this system is quite* good. Such a system if developed and prepared well, can help and speed up the process of literature survey by helping to find better papers…”
Theme 2: Information Cues
- Four cue labels were used in the system: Recent, Popular, High Reach, Survey/Review
- Cues positively impacted participants’ perceptions of the system
- “I like the highlighted recommendations - for e.g. Popular, Recent etc. which greatly helps in distinguishing various references and catches the eye !”
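To make the idea of information cues concrete, the following sketch attaches the four cue labels to a paper record. Only the label names come from the slides; the `Paper` fields, the `reach` measure and every threshold are hypothetical and do not describe how Rec4LRW actually assigns cues.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int
    citations: int
    reach: int        # hypothetical measure, e.g. number of distinct citing venues
    is_survey: bool


def cue_labels(p, current_year=2016, recent_years=5, popular_min=50, reach_min=20):
    """Return the cue labels that apply to a paper (all thresholds are assumed)."""
    labels = []
    if current_year - p.year <= recent_years:
        labels.append("Recent")
    if p.citations >= popular_min:
        labels.append("Popular")
    if p.reach >= reach_min:
        labels.append("High Reach")
    if p.is_survey:
        labels.append("Survey/Review")
    return labels


example = Paper("A hypothetical survey paper", 2011, 60, 5, True)
print(cue_labels(example))
```

Keeping the thresholds as parameters would also let a user interface expose them, which fits the request for more control voiced under Theme 1.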

Themes (3-4)
Theme 3: Forced Serendipity vs Natural Serendipity
- Prior studies have focused mainly on modelling serendipity
- The ‘View Papers in the Parent Cluster’ feature helped participants notice papers which they had not read earlier
- “The view papers in the parent cluster function is very helpful to get a full picture of research field.”
- “The user can view many papers in the parent cluster in addition to the shortlisted papers. Thus the user need not spend much time on finding related papers.”
Theme 4: Learning Algorithms vs Fixed Algorithms
- Some participants in the study suggested heuristics to identify papers for tasks 1 and 2
- These users expect a list of appropriate algorithms to be presented in the system
- “..Take a high impact paper (based on citation and may be exact keyword matching), then go through its own references to understand more about the research conducted. This is because, a good work generally cites other prominent works in the field…”
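The heuristic in the participant quote under Theme 4 (pick a high-impact paper by keyword and citations, then walk its references) could be sketched as follows. This is one possible reading of the quote, not an algorithm from Rec4LRW; the function name, the dictionary layout and the sample data are all hypothetical.

```python
def expand_from_seed(papers, references, keyword):
    """Pick the most-cited paper whose title contains `keyword`, then list
    the papers it references, most-cited first.

    papers:     dict of paper id -> {"title": str, "citations": int}
    references: dict of paper id -> list of referenced paper ids
    """
    matches = [pid for pid, p in papers.items()
               if keyword.lower() in p["title"].lower()]
    if not matches:
        return []
    seed = max(matches, key=lambda pid: papers[pid]["citations"])
    # Keep only references that exist in the corpus.
    cited = [r for r in references.get(seed, []) if r in papers]
    cited.sort(key=lambda pid: papers[pid]["citations"], reverse=True)
    return [seed] + cited


# Made-up corpus for illustration only.
papers = {
    "a": {"title": "Deep recommender survey", "citations": 100},
    "b": {"title": "Recommender basics", "citations": 10},
    "c": {"title": "Graph methods", "citations": 50},
    "d": {"title": "Old classic", "citations": 300},
}
references = {"a": ["c", "d", "missing"], "b": ["a"]}
print(expand_from_seed(papers, references, "recommender"))
```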

Themes (5-6)
Theme 5: Inclusion of Control Features in User-Interface
- Many participants felt handicapped by the absence of control features in the Rec4LRW system
- Expected control features were sort options, topical facets and advanced search features
- “Really good for the initial review. It would be nice to see additional filters to focus on a specific topic”
- “More recent papers shall be included, and it is better if the user can sort the recommended paper by sequence such as sort times, date, relevance...”
Theme 6: Inclusion of Bibliometric Data
- Participants explicitly stated the need for metrics such as impact factor and h-index in the UI
- The main challenge is the computing overhead for calculating the new metrics
- “Categorizing the papers based on popularity, journal impact factor, and etc”
- “…In case that an item in the recommendation list is a journal paper, can we also know its impact factor and which databases indexes it?”
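Two of the requested features are easy to illustrate. The sketch below computes an h-index using its standard definition (the largest h such that h papers have at least h citations each) and shows a simple sort option over a recommendation list. The function names and the record layout are hypothetical and are not part of Rec4LRW.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def sort_recommendations(papers, by="year", descending=True):
    """Sort a list of paper dicts by one field (a stand-in for a UI sort option)."""
    return sorted(papers, key=lambda p: p[by], reverse=descending)


print(h_index([10, 8, 5, 4, 3]))  # four papers have at least 4 citations -> 4

# Made-up recommendation list for illustration only.
recs = [{"title": "P1", "year": 2005, "citations": 12},
        {"title": "P2", "year": 2011, "citations": 3}]
print([p["title"] for p in sort_recommendations(recs, by="year")])
```

Sorting the citation counts dominates the cost for a single author; the overhead mentioned on the slide would come from recomputing such metrics across a whole corpus.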

Themes (7-8)
Theme 7: Diversification of Corpus
- In prior studies, the evaluation of algorithms has been restricted to datasets from certain disciplines such as computer science
- Future studies should include papers from “far-apart” disciplines for the evaluation
- “…Due to limitation of data sets (as only ACM papers) search result is not of decent quality.”
- “But in general the main drawback is that "the papers in the corpus/dataset are from an extract of papers from ACM DL". As I work at the intersection of information systems and business many relevant papers are not included in the list.”
Theme 8: Task Interconnectivity
- Participants appreciated the utility of the ‘seed basket’ and ‘reading list’ for managing papers across the three tasks
- “I like the idea of giving recommendations based on a seed group of articles, but there needs to be more facets to select from, there needs to be greater selection of seeding articles as well in terms of those facets.”
- “The whole idea seems good for me, especially making seed of 5+ for expanding the bunch.”

The Framework (SPRRF)
Columns: Feature | Skill-Reliant User | System-Reliant User
UI Customization
- Sort options
- Topical facets
- Advanced search options
Algorithmic Customization
- Setting the recommendations count
- Selecting the retrieval algorithm
- Submitting external papers
User Personalization
- Paper collections
- Favourites specification
- Paper anchors
- Relevance feedback

Future Work
- SPRRF is to be used in the second round of Rec4LRW evaluation studies
- SPRRF components are to be statistically validated through hypotheses
- The scope of SPRRF is to be expanded to other information objects in the Scholarly Communication Lifecycle

Get access to Rec4LRW…
Use the link http://goo.gl/XgynzY or scan the QR code below

Thank you