A Measure of Similarity Between Pairs of Papers
Susan Biancani, Stanford University School of Education



Introduction
Long-term goal:
  Understand changes in scholarly ideas over time
  Develop a person-person similarity measure, to reflect similarity in bodies of work
Short-term goal:
  Develop a measure of paper-paper similarity
  9 features, including metadata and content
  Train on 120 papers, rated by experts on a 1-7 scale

Data
66,000 papers written by professors at Stanford, from the ISI database
Features for each pair of papers:
  Cosine similarity of abstract tf-idf vectors
  Cosine similarity of title tf-idf vectors
  Cosine similarity of LDA vectors (3 versions)
  Count of common references
  Count of journals referenced in common
  Count of authors referenced in common
  Dummy indicating whether the two papers were published in the same journal
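The pairwise features listed above can be sketched in a few lines of Python. This is a minimal illustration and not the author's pipeline: the paper records, the `refs` tuples of (author, journal, reference id), the whitespace tokenizer, and the smoothed idf formula are all assumptions made for the example. The title feature would be computed the same way as the abstract feature, and the three LDA features are omitted because they need a fitted topic model.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Return one sparse tf-idf vector (dict: term -> weight) per text."""
    tokenized = [t.lower().split() for t in texts]
    n = len(tokenized)
    # Document frequency: in how many texts each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf (an assumed variant), so no weight is zero or undefined.
        vectors.append({term: count * (math.log((1 + n) / (1 + df[term])) + 1.0)
                        for term, count in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def pair_features(a, b, vec_a, vec_b):
    """Metadata and content features for one pair of papers."""
    refs_a, refs_b = set(a["refs"]), set(b["refs"])
    return {
        "tfidfAbstract": cosine(vec_a, vec_b),
        "cites": len(refs_a & refs_b),
        "citeJournals": len({r[1] for r in refs_a} & {r[1] for r in refs_b}),
        "citeAuthors": len({r[0] for r in refs_a} & {r[0] for r in refs_b}),
        "sameJournal": int(a["journal"] == b["journal"]),
    }

# Toy records (hypothetical): each reference is (author, journal, reference id).
PAPERS = [
    {"abstract": "social networks shape scholarly collaboration",
     "journal": "journal_x",
     "refs": [("author_a", "journal_x", "ref1"), ("author_b", "journal_y", "ref2")]},
    {"abstract": "scholarly collaboration in social networks",
     "journal": "journal_x",
     "refs": [("author_a", "journal_x", "ref1"), ("author_b", "journal_x", "ref3")]},
    {"abstract": "protein folding dynamics simulation",
     "journal": "journal_z",
     "refs": [("author_c", "journal_z", "ref4")]},
]
ABSTRACT_VECS = tfidf_vectors([p["abstract"] for p in PAPERS])

if __name__ == "__main__":
    print(pair_features(PAPERS[0], PAPERS[1], ABSTRACT_VECS[0], ABSTRACT_VECS[1]))
```

Sets are used for the reference counts so that a reference cited twice in one paper is counted once; whether the original counts were deduplicated this way is not stated in the slides.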

Gold Standard Data
31 papers from 8 professors in Sociology
44 papers from 7 professors in Biology
45 papers from 7 professors in CS
Rating Scale:
  Rating  Meaning               Count in Training Corpus
  1       Same paper             120
  2       Highly related         134
  3       Same subfield          394
  4       Related subfields      389
  5       Same discipline       1661
  6       Related disciplines    174
  7       Completely unrelated  4385

Training & Validation
Regression model:
  $\text{rating} = \beta_1\,\text{tfidfAbstract} + \beta_2\,\text{tfidfTitle} + \beta_3\,\text{lda50} + \beta_4\,\text{lda100} + \beta_5\,\text{lda200} + \beta_6\,\text{cites} + \beta_7\,\text{citeJournals} + \beta_8\,\text{citeAuthors} + \beta_9\,\text{sameJournal}$
Ordinal Logistic Regression to learn optimal weights for features
Ten-fold cross validation (comparing predicted rating to actual)
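To make the validation step concrete, here is a hedged sketch of how an ordinal logistic (proportional-odds) model turns the weighted feature sum into a predicted rating, and how ten-fold cross validation scores it. It assumes one common parameterization, P(rating <= k) = sigmoid(theta_k - beta . x); the slides do not say which form was used. Fitting is left out: `fit` is a hypothetical callback that returns weights and cutpoints, and the numbers in the demo are invented for illustration only.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rating_probabilities(x, beta, cutpoints):
    """Proportional-odds model: P(rating <= k) = sigmoid(cutpoints[k-1] - beta . x).

    Returns probabilities for ratings 1 .. len(cutpoints) + 1.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    cumulative = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    # Differences of consecutive cumulative probabilities give per-rating mass.
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                              for k in range(1, len(cumulative))]

def predict_rating(x, beta, cutpoints):
    """Most probable rating (1-based) for feature vector x."""
    probs = rating_probabilities(x, beta, cutpoints)
    return 1 + max(range(len(probs)), key=probs.__getitem__)

def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_accuracy(X, y, fit, k=10):
    """Train on k-1 folds, compare predicted to actual rating on the held-out fold."""
    correct = 0
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        beta, cutpoints = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict_rating(X[i], beta, cutpoints) == y[i]
                       for i in held_out)
    return correct / len(X)

# Illustrative values only: one similarity feature and six ascending cutpoints
# for a 7-point scale. A negative weight means higher similarity lowers the rating.
DEMO_BETA = [-8.0]
DEMO_CUTPOINTS = [-6.0, -5.0, -4.0, -3.0, -2.0, -1.0]

def demo_fit(X_train, y_train):
    """Stand-in for a real estimator: ignores the data, returns fixed parameters."""
    return DEMO_BETA, DEMO_CUTPOINTS

if __name__ == "__main__":
    X = [[1.0]] * 10 + [[0.0]] * 10
    y = [1] * 10 + [7] * 10
    print(cross_validated_accuracy(X, y, demo_fit))
```

In practice `demo_fit` would be replaced by a maximum-likelihood fit of the weights and cutpoints on the training folds; the cross-validation loop itself would not change.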

Results 1
Columns: Model, Accuracy (6 classes), Accuracy (5 classes), Accuracy (collapsed)
(accuracy values not preserved in this transcript)
Models:
  rating = tfidfAbstract
  rating = tfidfAbstract + tfidfTitle
  rating = lda
  rating = lda
  rating = lda50 + lda
  rating = lda50 + lda100 + lda
  rating = sameJournal
  rating = cites
  rating = cites + citeJournal
  rating = cites + citeJournal + citeAuthor
  rating = tfidfAbstract + tfidfTitle + lda50 + lda100 + lda
  rating = tfidfAbstract + tfidfTitle + lda50 + lda100 + lda200 + cites + citeJournals
  rating = tfidfAbstract + tfidfTitle + lda50 + lda100 + lda200 + cites + citeJournals + citeAuthors
  rating = tfidfAbstract + tfidfTitle + lda50 + lda100 + lda200 + cites + citeJournals + citeAuthors + sameJournal

Results 2
Columns: Model, Accuracy (all classes), Accuracy (collapsed)
(accuracy values not preserved in this transcript)
SOC ONLY:
  rating = tfidfAbstract + tfidfTitle
  rating = lda50 + lda100 + lda
  rating = cites + citeJournal
  rating = tfidfAbstract + tfidfTitle + lda50 + lda100 + lda200 + cites + citeJournals + citeAuthors + sameJournal
BIO ONLY:
  rating = tfidfAbstract + tfidfTitle
  rating = lda50 + lda100 + lda
  rating = cites + citeJournal
  rating = tfidfAbstract + tfidfTitle + lda50 + lda100 + lda200 + cites + citeJournals + citeAuthors + sameJournal
CS ONLY:
  rating = tfidfAbstract + tfidfTitle
  rating = lda50 + lda100 + lda
  rating = cites + citeJournal
  rating = tfidfAbstract + tfidfTitle + lda50 + lda100 + lda200 + cites + citeJournals + citeAuthors + sameJournal

Future Directions
Improve ratings set:
  Add more disciplines
  Confirm ratings with more experts
Develop a person-person distance measure, treating each person as the cluster of their papers
Apply this measure to the study of paradigm shifts / scientific-intellectual movements
Explore the role of organizational structure in these movements