Research Paper Recommender System Monica Dăgădiţă
Outline. Article recommender systems; Why Scienstein?; Citation analysis methods; Text mining; Document rating; User interface; Conclusions
Article recommender systems. Purpose: find relevant articles. Methods used: content-based filtering, collaborative filtering. Key elements of an article: citations, author, content.
Why Scienstein? Developed by PhD students Béla Gipp and Jöran Beel. Appeared as an alternative to academic search engines. Improves on simple keyword-based search with: citation analysis (Distance Similarity Index (DSI), In-text Impact Factor (ItIF)); author analysis; source analysis; implicit/explicit ratings.
Citation analysis methods. Problems: homographs, the Matthew effect, self-citations, citation circles, ceremonial citations. Scienstein's approach: 4 citation analysis methods.
Citation analysis methods (2). Cited by: papers that cite the input document (A & B). Reference list: papers referenced in the input document (C & D). Bibliographic coupling: papers that cite the same article(s) as the input document (BibCo). Co-citation: papers cited in the same document as the input document (CoCit).
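The four relations above can be sketched on a toy citation graph. This is only an illustration: the `references` dictionary, the paper labels in it, and the function names are hypothetical, and nothing here is taken from Scienstein's actual implementation.

```python
# Hypothetical toy graph: each paper maps to the set of papers it references.
references = {
    "input": {"C", "D"},
    "A": {"input", "CoCit"},
    "B": {"input"},
    "BibCo": {"C", "Y"},
}

def cited_by(paper, refs):
    """Papers whose reference list contains `paper`."""
    return {p for p, cited in refs.items() if paper in cited}

def reference_list(paper, refs):
    """Papers referenced by `paper`."""
    return set(refs.get(paper, set()))

def bibliographic_coupling(paper, refs):
    """Other papers that share at least one reference with `paper`."""
    own = refs.get(paper, set())
    return {p for p, cited in refs.items() if p != paper and own & cited}

def co_citation(paper, refs):
    """Papers that appear together with `paper` in some reference list."""
    together = set()
    for cited in refs.values():
        if paper in cited:
            together |= cited - {paper}
    return together

print(cited_by("input", references))
print(reference_list("input", references))
print(bibliographic_coupling("input", references))
print(co_citation("input", references))
```

In this toy graph, A and B cite the input document, C and D are on its reference list, BibCo shares the reference C with it, and CoCit is cited alongside it by A.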
Citation analysis methods (3). In-text citation frequency analysis (ICFA): the frequency with which a research paper is cited within a document. In-text Impact Factor (ItIF): the higher the ItIF, the more closely related the input document is to the cited document.
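A minimal sketch of in-text citation frequency counting follows. The slides do not give the ItIF formula, so the sketch only counts raw occurrences and ranks references by that count. The numeric bracket marker format (`[1]`, `[1, 2]`), the sample text, and the function name are assumptions made for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: citations appear as numeric markers such as [1] or [1, 2].
CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")

def in_text_citation_counts(text):
    """Count how often each reference number is cited in the text."""
    counts = Counter()
    for match in CITATION.finditer(text):
        for ref_id in match.group(1).split(","):
            counts[int(ref_id)] += 1
    return counts

# Hypothetical sample text.
text = (
    "Prior work [1] introduced the idea. We extend [1, 2] and "
    "compare with [3]. As in [1], we use citation data."
)
counts = in_text_citation_counts(text)

# References cited more often are treated as more closely related.
ranked = [ref for ref, _ in counts.most_common()]
print(counts, ranked)
```

Any normalisation, for example dividing by the total number of in-text citations, would be a further assumption beyond what the slide states.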
Citation analysis methods (4). In-text citation distance analysis (ICDA): the distance between references within a text indicates the degree of their similarity. Distance Similarity Index (DSI): calculates the similarity of two documents based on the citation distance.
Occurrence    Value
Sentence      1
Paragraph     1/2
Section       1/4
Chapter       1/8
Other         1/16
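The occurrence values above can be turned into a small sketch. Two things are assumptions here, not Scienstein's documented method: each citation position is a (chapter, section, paragraph, sentence) tuple, and the values for all pairs of occurrences are combined by taking the maximum. The slide does not say how they are aggregated.

```python
from fractions import Fraction
from itertools import product

# Occurrence values from the slide, keyed by how many leading position
# components (chapter, section, paragraph, sentence) two citations share.
VALUE_BY_SHARED_LEVELS = {
    4: Fraction(1),      # same sentence
    3: Fraction(1, 2),   # same paragraph
    2: Fraction(1, 4),   # same section
    1: Fraction(1, 8),   # same chapter
    0: Fraction(1, 16),  # other
}

def occurrence_value(pos_a, pos_b):
    """Value for one pair of citation positions, each a 4-tuple."""
    shared = 0
    for a, b in zip(pos_a, pos_b):
        if a != b:
            break
        shared += 1
    return VALUE_BY_SHARED_LEVELS[shared]

def distance_similarity(positions_a, positions_b):
    """ASSUMPTION: combine all occurrence pairs by taking the maximum."""
    return max(
        (occurrence_value(a, b) for a, b in product(positions_a, positions_b)),
        default=Fraction(0),
    )

# Hypothetical positions of two references inside one citing document.
ref_x = [(1, 1, 1, 1), (2, 1, 1, 1)]
ref_y = [(1, 1, 1, 2)]
print(distance_similarity(ref_x, ref_y))
```

Here the closest pair of citations sits in the same paragraph but in different sentences, so the sketch yields 1/2.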
Text mining. Existing techniques. Additional features: classification based on details given in the acknowledgements section; collaborative annotations and classifications; creating new categories, e.g. classifying publications about archaeological sites according to their geographic location -> Google Maps extension.
Document rating. Explicit ratings: improve a user's own recommendation accuracy; problem: a large number of ratings is needed. Implicit ratings: time spent with the mouse over a paragraph, time spent reading an article, printed articles.
User interface
Conclusions. Scienstein: the first hybrid recommender system for research papers. Known methods: keyword analysis, ratings. New methods: In-text Impact Factor (ItIF), Distance Similarity Index (DSI). Hybrid system (content-based and collaborative filtering) => more powerful tool.
Questions?