Systematic Review Assistant De-duplication Module (SRA-DM)
Elaine Beller, on behalf of Matt Carter, John Rathbone, Paul Glasziou
Centre for Research in Evidence-Based Practice, Bond University, Australia

The problem
- Search multiple databases for trials for specific reviews
- Maintain trials citation registers (e.g. CRS)
- Same citation appears in multiple databases
  - journal overlap between MEDLINE and EMBASE: 10% to 75%
- Citation management software offers only very simple duplicate detection
- Manual detection of duplicates is needed; otherwise reviewers screen the same citation more than once

The solution
- De-duplication tool within the Systematic Review Assistant package (SRA-DM)

Aims
- Reduce time taken to produce systematic reviews
- Reduce time spent maintaining trials citation registers

Methods
- Heuristic-based approach
- Iterative development of algorithm (4 iterations)
- Benchmark dataset of n=1988 citations
- Validation datasets
  - cytology screening tests (n=1856)
  - stroke (n=1292)
  - haematology (n=1415)
- Sensitivity and specificity measured against manual de-duplication

PICO
- P – search results from typical systematic reviews
- I – index test: SRA-DM
- C – comparator test: EndNote
- O – manual (human) de-duplication
  - outcome measures: sensitivity & specificity

Sensitivity and Specificity
Specificity
- correct classification of a record as unique
- important for systematic reviews: don't want to miss any included studies
- false positive detection of duplicates is bad, particularly if the process is fully automated
Sensitivity
- correct classification of a record as duplicate
- higher sensitivity reduces manual detection or subsequent duplicate screening by reviewers

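The two definitions above can be turned into a small calculation. The sketch below is illustrative only and is not taken from SRA-DM: the function name `dedup_accuracy` and its arguments are hypothetical, and it assumes each record has an ID, that manual de-duplication supplies the reference set of duplicate IDs, and that "duplicate" is treated as the positive class.

```python
def dedup_accuracy(all_ids, reference_dups, tool_dups):
    """Compare a tool's duplicate flags with a manual reference standard.

    all_ids        -- every record ID in the merged library
    reference_dups -- IDs marked as duplicates by manual de-duplication
    tool_dups      -- IDs marked as duplicates by the tool under test
    (All names here are hypothetical, chosen for illustration.)
    """
    all_ids = set(all_ids)
    reference_dups = set(reference_dups)
    tool_dups = set(tool_dups)
    reference_unique = all_ids - reference_dups

    tp = len(tool_dups & reference_dups)    # duplicates correctly flagged
    fn = len(reference_dups - tool_dups)    # missed duplicates
    fp = len(tool_dups & reference_unique)  # unique records wrongly flagged
    tn = len(reference_unique - tool_dups)  # unique records correctly kept

    # Sensitivity: share of true duplicates that were flagged as duplicates.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of unique records that were kept as unique.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity}


# Made-up example: 10 records, 4 true duplicates, tool flags 4 (one wrongly).
example = dedup_accuracy(range(1, 11), {1, 2, 3, 4}, {1, 2, 3, 5})
print(f"sensitivity={example['sensitivity']:.2f}")  # sensitivity=0.75
print(f"specificity={example['specificity']:.2f}")  # specificity=0.83
```

In this made-up example the one false positive (record 5) is the costly error: a unique record would be dropped before screening.
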
Results - What caused duplicates?
- Variations in author name
  - First name / last name sequence
  - Initials / full name
  - Missing author names
- Abbreviated / full journal names
- Text accents (e.g. French/German/Spanish)
- 'The' used / not used in journal and article titles
- Recording of page numbers (e.g. 590-595 vs 590-5)

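Several of the causes listed above can be neutralised by normalising fields before comparing records. The following is a minimal sketch of that idea, not the SRA-DM algorithm: every function name, the record field names (`author`, `title`, `journal`, `pages`) and the matching rule are assumptions made for illustration. It does not handle abbreviated journal names or missing authors, which would need extra logic such as a lookup table.

```python
import re
import unicodedata


def strip_accents(text):
    """Remove combining accent marks, e.g. 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_title(title):
    """Lower-case, drop accents and punctuation, and drop the word 'the'."""
    text = strip_accents(title).lower()
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(word for word in text.split() if word != "the")


def normalize_pages(pages):
    """Expand an abbreviated end page: '590-5' becomes '590-595'."""
    match = re.fullmatch(r"\s*(\d+)\s*[-\u2013]\s*(\d+)\s*", pages)
    if not match:
        return pages.strip()
    start, end = match.groups()
    if len(end) < len(start):
        # Borrow the missing leading digits from the start page.
        end = start[: len(start) - len(end)] + end
    return f"{start}-{end}"


def author_key(name):
    """Reduce a name to 'surname initial' regardless of order or form."""
    text = strip_accents(name).replace(".", " ").strip()
    if not text:
        return ""
    if "," in text:
        # 'Surname, Given' form.
        surname, _, given = text.partition(",")
    else:
        tokens = text.split()
        if len(tokens) > 1 and tokens[-1].isupper() and len(tokens[-1]) <= 3:
            # 'Surname AB' form: trailing token looks like initials.
            surname, given = " ".join(tokens[:-1]), tokens[-1]
        else:
            # 'Given Surname' form (or a single-token name).
            surname, given = tokens[-1], " ".join(tokens[:-1])
    initial = given.strip()[:1].lower()
    return f"{surname.strip().lower()} {initial}".strip()


def citation_key(record):
    """Build a comparison key from a record dict (field names are assumed)."""
    return (
        author_key(record.get("author", "")),
        normalize_title(record.get("title", "")),
        normalize_title(record.get("journal", "")),
        normalize_pages(record.get("pages", "")),
    )


# Two made-up records that differ only in the ways listed on the slide.
a = {"author": "Glasziou, Paul", "title": "The Réview of Screening",
     "journal": "The Lancet", "pages": "590-595"}
b = {"author": "Glasziou P", "title": "Review of screening",
     "journal": "Lancet", "pages": "590-5"}
print(citation_key(a) == citation_key(b))  # True
```

Exact matching on a normalised key is deliberately conservative: it favours few false positives at the cost of missing duplicates whose fields differ in ways the normalisation does not cover.
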
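Results – SRA-DM vs EndNote
[Figure: bar charts of Sensitivity and Specificity for SRA-DM and EndNote. Numbers on bars indicate the number of records incorrectly identified as duplicates; numbers over bars indicate the number of missed duplicates. Bar labels in extraction order: 2, 1, 2, 4, 139, 81, 38, 125, 134, 391, 518, 87.]
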
Where does SRA-DM fit?
Now
- aids manual de-duplication of citations from multiple databases
- presents user with de-duplicated set for screening
Future
- integration of algorithm in other tools that automatically search and snowball from citations
- present automated screener software with de-duplicated set

Summary
- De-duplication tool for merged libraries of citations
- Good sensitivity, high specificity
  - finds most duplicates
  - algorithm optimised for very low false positives
- Common library and software formats supported
  - XML
  - RIS
  - CSV
- Open source, free to use
- Algorithm can be embedded in other tools