1 CiteGraph: A Citation Network System for MEDLINE Articles and Analysis
Qing Zhang (1,2), Hong Yu (1,3)
(1) University of Massachusetts Medical School, Worcester, MA, USA
(2) University of Wisconsin-Milwaukee, Milwaukee, WI, USA
(3) VA Central Massachusetts, Leeds, MA, USA

2 Outline
- Introduction
- Background
- Methods
- Evaluation
- Analysis
CiteGraph, MedInfo 2013

3 Introduction
- Citation networks are important for:
  - information retrieval
  - journal impact factor and h-index
- Co-authorship networks are also important
- Few citation networks are available for research
- We built CiteGraph

4 Background
- Citation network analysis
  - Power-law distribution in citation networks
  - Article ranking: HITS and PageRank
  - Community structure of physics fields
  - A citation network tool for a given legal issue, built on a legal-document citation network
- Co-authorship network analysis
  - Research collaboration patterns
  - Author authority: Erdős number
- Literature search
  - CiteSeerX, Google Scholar
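An author's Erdős number is the collaboration distance to Paul Erdős: the length of the shortest co-authorship path between the two. On an unweighted co-authorship graph that distance is a breadth-first search. The sketch below is a generic illustration, not the method of any work cited on this slide; the function name `collaboration_distances` and the input format (one author list per article) are hypothetical.

```python
from collections import defaultdict, deque


def collaboration_distances(papers, source):
    """Breadth-first search over a co-authorship graph (hypothetical helper).

    papers: iterable of author lists, one list per article.
    Returns {author: number of co-authorship hops from source}; authors not
    connected to source are absent (their distance is undefined/infinite).
    """
    graph = defaultdict(set)
    for authors in papers:
        for a in authors:
            graph[a].update(x for x in authors if x != a)

    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:  # first visit is the shortest path in BFS
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


print(collaboration_distances([["Erdos", "A"], ["A", "B"], ["C"]], "Erdos"))
```

Here author "C" never co-authored with anyone connected to "Erdos", so it has no Erdős number.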

5 The CiteGraph Data

6 Citation Network Example

7 Challenges
The same article may be cited in inconsistent formats:
(1) Yu, H and Lee M. 2006. Accessing Bioscience Images from Abstract Sentences. Bioinformatics. Vol 22 No. 14, pages e547–e556.
(2) Hong Yu and Minsuk Lee. Accessing Bioscience Images from Abstract Sentences. Bioinformatics. Vol 22 No. 14, pages e547–e556. 2006.
(3) Yu H, Lee H. 2006. Accessing Bioscience Images from Abstract Sentences. Bioinformatics: 22 (14), e547–e556.

8 Methods
- Mapping between articles
- Mapping articles to PubMed IDs (PMIDs)
- Author name disambiguation

9 Methods
Two entities (for example, a citation and an article) are considered matched if at least two of the following three criteria hold:
- Title matching
  - the set of tokens in one title is a subset of the tokens in the other, or
  - the number of tokens common to both titles is more than 80% of the size of the larger of the two titles.
- Author list matching
  - the two lists of surnames have a one-to-one mapping
  - the surnames in one entity (the citation) are fully contained in the surname set of the other (the article).
- Journal name matching
  - stop words such as "of" are removed
  - if the number of common initials in the two journal titles is greater than 80% of the tokens in the longer journal name, the journals are considered equivalent.
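The three matching rules and the two-of-three vote can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the tokenizer, the stop-word list (the slide names only "of"), treating titles as token sets, reading "one-to-one mapping" as identical surname multisets, and counting "common initials" as a multiset intersection of word initials are all assumptions. The function names are hypothetical; only the 80% thresholds come from the slide.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slide only gives "of" as an example.
STOP_WORDS = {"of", "the", "and", "in", "for", "on"}


def tokens(text):
    """Lower-case alphanumeric tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def title_match(t1, t2, threshold=0.8):
    a, b = set(tokens(t1)), set(tokens(t2))
    if not a or not b:
        return False
    if a <= b or b <= a:  # one title's tokens are a subset of the other's
        return True
    # Shared tokens must exceed 80% of the larger title's token count.
    return len(a & b) > threshold * max(len(a), len(b))


def author_match(citation_surnames, article_surnames):
    cit = [s.lower() for s in citation_surnames]
    art = [s.lower() for s in article_surnames]
    if not cit or not art:
        return False
    if Counter(cit) == Counter(art):  # one-to-one mapping (assumed reading)
        return True
    return set(cit) <= set(art)  # citation surnames contained in article's


def journal_match(j1, j2, threshold=0.8):
    w1 = [w for w in tokens(j1) if w not in STOP_WORDS]
    w2 = [w for w in tokens(j2) if w not in STOP_WORDS]
    if not w1 or not w2:
        return False
    # Common initials as a multiset intersection (assumed reading).
    common = Counter(w[0] for w in w1) & Counter(w[0] for w in w2)
    return sum(common.values()) > threshold * max(len(w1), len(w2))


def entities_match(citation, article):
    """Matched when at least two of the three criteria agree."""
    votes = [
        title_match(citation["title"], article["title"]),
        author_match(citation["authors"], article["authors"]),
        journal_match(citation["journal"], article["journal"]),
    ]
    return sum(votes) >= 2


print(journal_match("J Biol Chem", "Journal of Biological Chemistry"))  # True
```

The vote makes a single noisy field tolerable: a citation with a garbled journal name still matches if its title and authors agree.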

10 Evaluation Results
Task              Precision  Recall  F1    Inter-Annotator Agreement (Kappa)
Citation mapping  1          0.96    0.98  1
PMID mapping      0.99                     1
Seven annotators were invited to annotate the citation-mapping and PMID-mapping results; each annotator was presented with 20 matching results per task.
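The F1 reported for citation mapping follows from its precision (1) and recall (0.96), since F1 is their harmonic mean, 2PR / (P + R). A minimal check, with a hypothetical helper name:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Citation mapping: precision 1, recall 0.96
print(round(f1_score(1.0, 0.96), 2))  # 0.98
```

Unrounded, the value is 1.92 / 1.96 ≈ 0.9796, which rounds to the 0.98 shown in the results.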

11 The CiteGraph Statistics
- 1.65 million articles
- 6.35 million citations
- 1.37 million authors

12 The CiteGraph Statistics
Log-log fit: log y = 1.06 - 2.45 log x (p < 0.05, t-test)
Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. 2001 Dec;25(4):402-8.
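The fitted line log y = 1.06 - 2.45 log x is a straight line in log-log space, i.e. a power law y = 10^1.06 · x^-2.45. The transcript does not say exactly what x and y count (presumably a citation count and how many articles have it), nor the log base; base 10 is assumed here. The sketch below does an ordinary least-squares fit on log-transformed data and recovers the slide's coefficients from synthetic points generated on that curve. It is not the authors' code, and `fit_power_law` is a hypothetical name.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log10(y) = a + b * log10(x); returns (a, b)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx            # slope = power-law exponent
    a = mean_y - b * mean_x  # intercept
    return a, b


# Synthetic points lying exactly on the slide's fitted curve.
xs = [1, 2, 5, 10, 20, 50, 100]
ys = [10 ** 1.06 * x ** -2.45 for x in xs]
a, b = fit_power_law(xs, ys)
print(round(a, 2), round(b, 2))  # 1.06 -2.45
```

Least squares on log-log data is simple but known to give biased exponent estimates on real, noisy degree distributions; maximum-likelihood estimation is usually preferred when the exponent itself matters.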

13 The CiteGraph Statistics
- Largest connected component: 1.27 million authors (92.7%)
- Second-largest connected component: 35 authors
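Connected-component sizes in a co-authorship graph can be computed with a single union-find (disjoint-set) pass over the articles' author lists. The transcript does not say which algorithm CiteGraph used, so this is a generic sketch; `DisjointSet` and `component_sizes` are hypothetical names, and authors are assumed to be already disambiguated into unique identifiers.

```python
from collections import Counter


class DisjointSet:
    """Union-find with path compression over hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # unseen items start as singletons
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def component_sizes(papers):
    """Component sizes, largest first; papers is an iterable of author lists."""
    ds = DisjointSet()
    for authors in papers:
        for author in authors:
            ds.find(author)  # keeps single-author articles as their own nodes
        # Linking consecutive authors connects everyone on the same article.
        for a, b in zip(authors, authors[1:]):
            ds.union(a, b)
    roots = Counter(ds.find(author) for author in list(ds.parent))
    return sorted(roots.values(), reverse=True)


sizes = component_sizes([["A", "B"], ["B", "C"], ["D", "E"], ["F"]])
print(sizes)                  # [3, 2, 1]
print(sizes[0] / sum(sizes))  # 0.5
```

The share of authors in the largest component is `sizes[0] / sum(sizes)`; with the slide's figures, 1.27 M / 1.37 M ≈ 0.927, matching the reported 92.7%.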

14 The CiteGraph Statistics
Co-authorship year spans range from 1 to 35 years, and 83.7% of author pairs appear together on only one article.
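A co-authorship year span and the "appear only once" share can both be computed per author pair from (year, author list) records. This sketch assumes the span is counted inclusively (a pair that co-authored within a single year has span 1), which fits the slide's minimum of 1; the input format and the function name `pair_statistics` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pair_statistics(papers):
    """papers: iterable of (year, author_list) records.

    Returns (spans, fraction_once): spans maps each author pair to its
    inclusive year span; fraction_once is the share of pairs that
    co-occur on exactly one article.
    """
    years = defaultdict(list)
    for year, authors in papers:
        # Sorted, de-duplicated authors give each pair one canonical key.
        for pair in combinations(sorted(set(authors)), 2):
            years[pair].append(year)
    spans = {pair: max(ys) - min(ys) + 1 for pair, ys in years.items()}
    once = sum(1 for ys in years.values() if len(ys) == 1)
    fraction_once = once / len(years) if years else 0.0
    return spans, fraction_once


spans, frac = pair_statistics(
    [(2001, ["A", "B"]), (2005, ["A", "B", "C"]), (2010, ["C", "D"])]
)
print(spans[("A", "B")], frac)  # 5 0.75
```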

15 The CiteGraph Statistics
Measure                   Mean, Median, Std, Max, Min
# of co-authors           116146710
Co-authorship year span   1.52111.576351
* The largest connected component is excluded when calculating the statistics in this table; it contains 1.27 million authors (92.7% of all authors).
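Summary statistics of the kind in this table (mean, median, standard deviation, maximum, minimum) can be produced with Python's `statistics` module. The sketch assumes "# of co-authors" means the number of distinct co-authors per author, and uses the population standard deviation; the slides state neither. The optional `exclude` set mirrors the footnote by skipping authors from the largest component. The function name is hypothetical.

```python
import statistics
from collections import defaultdict


def coauthor_count_summary(papers, exclude=frozenset()):
    """Summarize distinct co-authors per author (hypothetical helper).

    papers: iterable of author lists; authors in `exclude` are skipped.
    """
    coauthors = defaultdict(set)
    for authors in papers:
        for a in authors:
            coauthors[a].update(x for x in authors if x != a)
    counts = [len(s) for a, s in coauthors.items() if a not in exclude]
    if not counts:
        raise ValueError("no authors left to summarize")
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "std": statistics.pstdev(counts),  # population std (assumption)
        "max": max(counts),
        "min": min(counts),
    }


print(coauthor_count_summary([["A", "B"], ["B", "C"], ["D"]]))
```

In this toy input, author "D" wrote a single-author article and therefore has zero co-authors.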

16 Trends

17 Conclusion
- We created citation and co-authorship networks from the biomedical full-text literature
- Our networks are highly accurate and large-scale, and they can benefit the biomedical text-mining community through:
  - article ranking
  - research collaboration recommendation
  - social network analysis
- The network database is available for download on request

18 Acknowledgement
- National Institutes of Health grant 1R01GM095476 to Hong Yu
- A start-up fund from the University of Massachusetts Medical School to Hong Yu
- National Center for Advancing Translational Sciences of the National Institutes of Health, award number UL1TR000161

