Who Should I Cite? Learning Literature Search Models from Citation Behavior (CIKM'10)
Advisor: Jia-Ling Koh
Speaker: Sheng-Hong Chung
Outline
Introduction
Features: similar terms, cited information, recency, cited using similar terms, similar topics, social habits
Experiment
Conclusion
Introduction
Motivation: a researcher with a project idea finds it hard to determine what prior research has been done and which prior work is most relevant, and the results from a general search engine are not good enough.
Solution: add more features to the retrieval model.
Framework of the system
Input: a query (topic + abstract) goes into the system; output: a ranked reference list (d1, d2, d3, ...).
Pattern from article (figures omitted: examples of citation patterns extracted from articles)
Example corpus (used throughout the talk):
D1: authors a1, a2; venue CIKM; terms {t1:3, t2:2, t3:1}; total 6
D2: authors a1, a3; terms {t1:2, t2:2}; total 4
D3: authors a1, a2, a3; terms {t1:3, t3:1}; total 4
D4: authors a2, a3; venue SIGIR; terms {t2:4, t4:2}; total 6
D5: authors a4; terms {t4:3, t5:2}; total 5
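To keep the running example concrete, here is a minimal sketch of the corpus as Python data. The names CORPUS and CITED_BY are illustrative, not from the paper; the years and cite sets are taken from the later slides.

    # The five example documents with their metadata (venue/year where given).
    CORPUS = {
        "D1": {"authors": ["a1", "a2"], "venue": "CIKM", "year": 2007,
               "terms": {"t1": 3, "t2": 2, "t3": 1}},
        "D2": {"authors": ["a1", "a3"], "venue": None, "year": 2008,
               "terms": {"t1": 2, "t2": 2}},
        "D3": {"authors": ["a1", "a2", "a3"], "venue": None, "year": 1988,
               "terms": {"t1": 3, "t3": 1}},
        "D4": {"authors": ["a2", "a3"], "venue": "SIGIR", "year": 2003,
               "terms": {"t2": 4, "t4": 2}},
        "D5": {"authors": ["a4"], "venue": None, "year": 2002,
               "terms": {"t4": 3, "t5": 2}},
    }

    # cite(D): the set of documents citing D (introduced on the next slides).
    CITED_BY = {
        "D1": {"D2", "D3", "D5"},
        "D2": {"D1", "D3", "D4", "D5"},
        "D3": {"D1", "D4"},
        "D4": {"D1", "D3"},
        "D5": {"D2", "D4"},
    }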
Similar terms
The query's topic and abstract terms are matched against each document's terms with a TF-IDF score: score_terms(q, D1), score_terms(q, D2), score_terms(q, D3), ...
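A minimal sketch of the term score, assuming raw term frequencies and the smoothed idf form the later slides use, 1 + log(N/(df+1)); the paper's exact weighting may differ.

    import math

    def idf(term, corpus):
        """Smoothed inverse document frequency: 1 + log(N / (df + 1))."""
        df = sum(1 for meta in corpus.values() if term in meta["terms"])
        return 1 + math.log(len(corpus) / (df + 1))

    def score_terms(query_terms, doc_terms, corpus):
        """TF-IDF match of the query's terms against one document's term counts."""
        return sum(doc_terms.get(t, 0) * idf(t, corpus) for t in query_terms)

    # score_terms(q, D1) for a query mentioning t1 and t2:
    print(score_terms(["t1", "t2"], CORPUS["D1"]["terms"], CORPUS))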
Cited information
From the corpus, cite(D) is the set of documents citing D:
cite(D1) = {D2, D3, D5}, cite(D2) = {D1, D3, D4, D5}, cite(D3) = {D1, D4}, cite(D4) = {D1, D3}, cite(D5) = {D2, D4}
score_citation-count(q, D) = |cite(D)|: |cite(D1)| = 3, |cite(D2)| = 4, |cite(D3)| = |cite(D4)| = |cite(D5)| = 2.
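The citation-count feature then reads directly off the cite sets (a sketch over the CITED_BY dict defined earlier):

    def score_citation_count(doc_id):
        """|cite(doc_id)|: the number of documents citing doc_id."""
        return len(CITED_BY[doc_id])

    assert score_citation_count("D1") == 3 and score_citation_count("D2") == 4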
(figure: the example corpus drawn as a citation graph over documents D1 to D5, used for PageRank on the next slide)
PageRank over the citation graph (damping factor 0.85, N = 5 documents):
score_pagerank(q, D1) = 0.15/5 + 0.85 * (score_pagerank(q, D2)/4 + score_pagerank(q, D3)/2 + score_pagerank(q, D5)/2)
Each contribution from a document Dj in cite(D1) is divided by |cite(Dj)| (here |cite(D2)| = 4, |cite(D3)| = |cite(D5)| = 2).
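A power-iteration sketch that reproduces the slide's update, including its choice of dividing each neighbour Dj's score by |cite(Dj)|:

    def pagerank(cited_by, damping=0.85, iterations=50):
        """Iterate PR(d) = (1-damping)/N + damping * sum(PR(j)/|cite(j)|)
        over the documents j in cite(d), as on the slide."""
        n = len(cited_by)
        pr = {d: 1.0 / n for d in cited_by}
        for _ in range(iterations):
            pr = {d: (1 - damping) / n
                     + damping * sum(pr[j] / len(cited_by[j]) for j in cited_by[d])
                  for d in cited_by}
        return pr

    print(pagerank(CITED_BY)["D1"])  # score_pagerank(q, D1)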
score_venue-citation-count(q, D1): the total citations received by the articles published in D1's venue (CIKM); on the slide, 4 + 2 = 6.
score_author-citation-count(q, D1): for each author of D1, sum the citations to all of that author's papers, then take the maximum:
a1: {D1, D2, D3} cited 3 + 4 + 2 = 9; a2: {D1, D3, D4} cited 3 + 2 + 2 = 7; score = max{9, 7} = 9.
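A sketch of the author-citation-count feature over the example corpus:

    def score_author_citation_count(doc_id):
        """Max over doc_id's authors of the total citations to that author's papers."""
        totals = []
        for author in CORPUS[doc_id]["authors"]:
            papers = [d for d, meta in CORPUS.items() if author in meta["authors"]]
            totals.append(sum(len(CITED_BY[p]) for p in papers))
        return max(totals)

    # a1: D1,D2,D3 cited 3+4+2 = 9; a2: D1,D3,D4 cited 3+2+2 = 7; max = 9.
    assert score_author_citation_count("D1") == 9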
score_author-h-index: an author has h-index h if they have published h papers, each of which has been cited at least h times.
In the example, a1's papers D1, D2, D3 are cited 3, 4, 2 times, so h(a1) = 2; a2's papers D1, D3, D4 are cited 3, 2, 2 times, so h(a2) = 2.
score_author-h-index(q, D1) = max{2, 2} = 2.
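A sketch of the h-index computation matching the definition above:

    def h_index(citation_counts):
        """Largest h such that h papers each have at least h citations."""
        counts = sorted(citation_counts, reverse=True)
        h = 0
        for rank, c in enumerate(counts, start=1):
            if c >= rank:
                h = rank
        return h

    # a1's papers are cited 3, 4, 2 times -> h = 2: the two most-cited papers
    # have >= 2 citations each, but there are not three papers with >= 3.
    assert h_index([3, 4, 2]) == 2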
Recency
Publication years: D1: 2007, D2: 2008, D3: 1988, D4: 2003, D5: 2002; the query is issued in 2009.
score_age(q, D) = -(query year - publication year):
score_age(q, D1) = -(2009 - 2007) = -2; score_age(q, D3) = -(2009 - 1988) = -21.
Cited using similar terms
Combine TF-IDF with the citation information: concatenate the terms of the documents that cite D1 (cite(D1) = {D2, D3, D5}) into a pseudo-document and score the query against it:
score_term-citing(q, D1) = score_terms(q, concat(terms(D2), terms(D3), terms(D5)))
To avoid noise and keep only informative words, pointwise mutual information (PMI) measures how strongly a term is associated with citing a document:
pmi(t1, D1) = log( p_term,citing(t1, D1) / (p_term(t1) * p_citing(D1)) )
p_term(t1) = (3 + 2 + 3) / (6 + 4 + 4 + 6 + 5) = 8/25 (occurrences of t1 over all corpus terms)
p_citing(D1) = 3 / (3 + 4 + 2 + 2 + 2) = 3/13 (citations to D1 over all citations)
p_term,citing(t1, D1) = (2 + 3) / (13 + 21 + 12 + 10 + 10) = 5/66 (occurrences of t1 in the documents citing D1, over the total size of every document's citing pseudo-document)
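A sketch that recomputes the three probabilities from the example corpus and reproduces the slide's numbers (8/25, 3/13, 5/66):

    import math

    def pmi(term, doc_id):
        """pmi(t, D) = log(p_term,citing(t, D) / (p_term(t) * p_citing(D)))."""
        total_terms = sum(sum(m["terms"].values()) for m in CORPUS.values())
        p_term = sum(m["terms"].get(term, 0) for m in CORPUS.values()) / total_terms

        total_cites = sum(len(s) for s in CITED_BY.values())
        p_citing = len(CITED_BY[doc_id]) / total_cites

        # Mass of `term` in the documents citing doc_id, normalised by the
        # total size of every document's citing pseudo-document.
        denom = sum(sum(sum(CORPUS[c]["terms"].values()) for c in CITED_BY[d])
                    for d in CITED_BY)
        num = sum(CORPUS[c]["terms"].get(term, 0) for c in CITED_BY[doc_id])
        return math.log((num / denom) / (p_term * p_citing))

    print(pmi("t1", "D1"))  # log((5/66) / ((8/25) * (3/13)))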
Cited using similar terms (PMI variant)
Purpose: keep only the valuable terms from the citing documents. Build the pseudo-document, called text_pmi, from the words with high PMI scores, then compute TF-IDF against it. Only documents cited at least 5 times and terms occurring at least 2 times are considered.
Similar topics
Each document and query is represented as a distribution over topics (three topics in this example):
topics(d) = {p_topics(1, d), p_topics(2, d), p_topics(3, d)}
topics(D1) = {1/4, 1/4, 1/2}, topics(q) = {1/3, 1/3, 1/3}
score_topics(q, D1) = cos(topics(q), topics(D1))
score_topic-entropy(q, D1) = -(1/4 * log 1/4 + 1/4 * log 1/4 + 1/2 * log 1/2)
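A sketch of the two topic scores on the slide's three-topic example:

    import math

    def score_topics(q_topics, d_topics):
        """Cosine similarity between query and document topic distributions."""
        dot = sum(a * b for a, b in zip(q_topics, d_topics))
        nq = math.sqrt(sum(a * a for a in q_topics))
        nd = math.sqrt(sum(b * b for b in d_topics))
        return dot / (nq * nd)

    def score_topic_entropy(d_topics):
        """Entropy of the topic distribution; lower means a more focused document."""
        return -sum(p * math.log(p) for p in d_topics if p > 0)

    print(score_topics([1/3, 1/3, 1/3], [1/4, 1/4, 1/2]))
    print(score_topic_entropy([1/4, 1/4, 1/2]))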
Similar topics
score_topics: compress the corpus into roughly 100 topics so that each document gets a compact, accurate concept representation, then calculate the similarity scores.
score_topic-citing: the same purpose, but using the citation information (the topics of the documents linked by citation).
Similar topics
score_topic-entropy: prefers documents that focus on one concept; calculate the entropy of the document's topic distribution (lower entropy means a more focused document).
Social habits
Authors like to cite papers written by themselves. Authors are scored like terms: with query authors a1, a2,
score_authors(q, D1) = score_terms(authors(q), authors(D1)) = tf(a1)*idf(a1) + tf(a2)*idf(a2) = 1*(1 + log(5/(3+1))) + 1*(1 + log(5/(2+1)))
Social habits
Authors also tend to cite documents they have cited before. Score the query's authors against the concatenated author lists of the documents citing D1 (cite(D1) = {D2, D3, D5}):
score_author-cited-article(q, D1) = score_terms(authors(q), concat(authors(D2), authors(D3), authors(D5))) = 2*(1 + log(5/(2+1))) + 1*(1 + log(5/(1+1)))
Social habits
score_author-cited-article(q, D1) does not care who D1's authors are: it only asks whether the query's authors previously cited the article D1 itself, via cite(D1) = {D2, D3, D5}.
score_author-cited-author(q, D1) does care: it collects the documents written by D1's authors (D1, D2, D3, D4) together with cite(D1), cite(D2), cite(D3), cite(D4), and asks whether the query's authors previously cited those authors (pattern: q cites D', D' cites D).
Social habits
score_author-cited-venue(q, D1): fix D1's venue instead of its authors: collect the documents published in that venue (D1, D2, D3 on the slide) together with cite(D1), cite(D2), cite(D3), and ask whether the query's authors previously cited that venue.
Social habits
score_authors-coauthored(q, D1): the same pattern through co-authorship: collect D1, D2, D3, D4 and cite(D1)..cite(D4), and relate the query's authors ({a1, a2}) to the citing documents' authors ({a3, a4}) through papers they have written together.
Weighted sum
All features are combined by a weighted sum; the feature values are first rescaled (their scales compressed) so that they are comparable.
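The combination itself is just a dot product; a minimal sketch (the exact rescaling step is not specified on the slide, so it is omitted here):

    def score(features, weights):
        """Final relevance score: weighted sum over the 19 feature values."""
        return sum(w * f for w, f in zip(weights, features))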
Experiment
Evaluation metrics: precision, average precision, mean average precision (MAP).
Precision / Recall
Precision = relevant documents retrieved / total retrieved documents
Recall = relevant documents retrieved / total relevant documents
Example: a dataset of 10,000 documents, 500 of which are related to dining food. A user queries "dining food"; the system retrieves 4,000 documents, 400 of which are relevant.
Precision = 400/4000, Recall = 400/500.
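The example's arithmetic, spelled out:

    retrieved, total_relevant = 4000, 500
    relevant_retrieved = 400
    precision = relevant_retrieved / retrieved        # 400/4000 = 0.10
    recall = relevant_retrieved / total_relevant      # 400/500  = 0.80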
Each query Q1..Q16 is run through the system; the retrieved documents yield an average precision (AP) per query, and MAP is their mean:
MAP = (AP(Q1) + AP(Q2) + ... + AP(Q16)) / 16
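A sketch of average precision and MAP as used here; averaging AP over the full relevant set is a common convention, and the paper may differ in detail:

    def average_precision(ranked, relevant):
        """Average of precision@k at each rank k where a relevant document appears."""
        hits, precisions = 0, []
        for k, doc in enumerate(ranked, start=1):
            if doc in relevant:
                hits += 1
                precisions.append(hits / k)
        return sum(precisions) / len(relevant) if relevant else 0.0

    def mean_average_precision(runs):
        """runs: list of (ranked results, set of relevant documents), one per query."""
        return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

    # e.g. the system returns D1, D2, D3, D4 and the true reference list is {D1, D5, D3}:
    print(average_precision(["D1", "D2", "D3", "D4"], {"D1", "D5", "D3"}))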
Experiment: Who Should I Cite?
Each test article's topic + abstract is used as the query, and its actual reference list (e.g., D1, D5, D3) is the gold standard; the system's retrieved references (e.g., D1, D2, D3, D4) are scored against it with MAP.
Each feature is first evaluated alone: in score(q, d) = w1*f1 + w2*f2 + ... + w19*f19, set w1 = 1 and w2..w19 = 0, so score(q, d) = w1*f1. The model with these weights retrieves documents on the development data (e.g., D1, D5, D3, ...), and MAP is calculated.
Iterative training procedure:
Start with data = [] and put q into the model m (w1 = 1, all other weights 0) to get the retrieved documents d'.
For each training query with reference list S: if d' is in S, label = 1; else label = -1.
Update the data, use a classifier to adjust the weights, and iterate.
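A perceptron-style sketch of this loop; the retrieve and features functions and the exact update rule are assumptions, since the slide does not name the classifier:

    def train(queries, retrieve, features, n_features=19, epochs=10, lr=0.1):
        """queries: list of (q, reference_set); retrieve(q, weights) returns a
        ranked candidate list under the current weights; features(q, d) returns
        the 19 feature values for a query-document pair."""
        weights = [0.0] * n_features
        weights[0] = 1.0                       # start from the terms-only model
        for _ in range(epochs):
            for q, references in queries:
                for d in retrieve(q, weights):
                    label = 1 if d in references else -1
                    f = features(q, d)
                    s = sum(w * x for w, x in zip(weights, f))
                    if label * s <= 0:         # misclassified: nudge the weights
                        weights = [w + lr * label * x for w, x in zip(weights, f)]
        return weights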
Experiment
Baseline features from related work: terms, citation-count, pagerank, age, term-citing, authors.
Conclusion
The paper presents a model for scientific article retrieval that introduces new features and a new learning algorithm, and shows that the feature weights can be learned efficiently by an iterative procedure.