The Claim Framework
Catherine Blake
School of Library and Information Science
University of Illinois at Urbana-Champaign
clblake@illinois.edu
Shift from Retrieval to Synthesis

Motivation
Relentless increase in electronically available text
- Life sciences: 17 millionth entry added in April 2007; 5,200 journals indexed; 12,000 new articles each week
- Chemistry: more than 110,000 articles in one year alone

Consequences
- Hundreds of thousands of relevant articles
- Implicit connections between literature go unnoticed
The Claim Framework
- Scientists use a shared sublanguage to express claims made in an empirical study
- The Claim Framework captures the key characteristics of the claim sublanguage
- Text mining can be used to populate the Claim Framework automatically
- An automated system will identify all and only the claims that have been identified manually
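One way to make the "all and only" goal measurable is to compare the system's claims with the manually identified ones. The sketch below is an illustration only: the slide names no metric, so reading "all" as recall and "only" as precision is an assumption, and the function name and the sentence IDs are hypothetical.

```python
def precision_recall(predicted, gold):
    """Compare system output with manual annotations.

    precision: share of predicted claims that annotators also marked ("only")
    recall:    share of annotated claims that the system found ("all")
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sentence IDs, not data from the study.
system_claims = {"s1", "s2", "s3"}
manual_claims = {"s1", "s2", "s4", "s5"}
precision, recall = precision_recall(system_claims, manual_claims)
```

A system that meets the goal exactly would score 1.0 on both measures.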
Claim Definition
“To assert in the face of possible contradiction”

Example sentence reporting a claim:
“This study showed that Tamoxifen reduces the breast cancer risk”

Explicit claim in the Claim Framework:
Tamoxifen (agent) reduces (change) [breast cancer risk] (object)
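The explicit claim above could be held in a small record type. This is a minimal sketch, not the representation used by the author's system: the facet names (agent, change, object, change direction) and the category names come from the slides, while the class name, field names, and the "decrease" direction value are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """One claim, with one field per facet named on the slides."""
    agent: str                               # what brings about the change
    change: str                              # the change verb as written
    obj: str                                 # what is changed
    change_direction: Optional[str] = None   # hypothetical encoding of ChangeDir
    category: str = "explicit"               # explicit, implicit, observation, ...

    def as_text(self) -> str:
        """Render the claim in agent-change-object order."""
        return f"{self.agent} {self.change} {self.obj}"


claim = Claim(
    agent="Tamoxifen",
    change="reduces",
    obj="breast cancer risk",
    change_direction="decrease",  # assumed label, not from the slides
)
print(claim.as_text())
```

Keeping the facets as separate fields means each one can be compared on its own, which is how agreement between annotators is reported per facet.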
Distribution of Claim Categories

Category       Total       %    Pilot       %     Main       %
Explicit        2489   77.11      332   83.42     2157   76.63
Implicit          87    2.70        3    0.75       84    2.98
Observation      298    9.23       24    6.03      274    9.73
Correlation      174    5.39       12    3.02      162    5.75
Comparison       165    5.11       27    6.85      138    4.90
Total           3228  100.00      398             2830
Inter-Annotator Agreement

Information Facet     Kappa   Agreement
Agent                 0.71    substantial
Object                0.77    substantial
Change                0.57    moderate
Change+ChangeDir      0.88    almost perfect
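As a reminder of how a kappa value and its verbal label are obtained, the sketch below computes Cohen's kappa for two annotators and maps the result to the Landis and Koch scale, whose bands match the labels on this slide. The slide does not say which kappa variant was used, so Cohen's kappa is an assumption; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:  # both annotators always used the same single label
        return 1.0
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Verbal label for a kappa value on the Landis and Koch scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Toy labels for four items, not data from the study.
annotator_1 = ["up", "up", "down", "down"]
annotator_2 = ["up", "up", "down", "up"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2), landis_koch_label(kappa))
```

Applied to the reported values, `landis_koch_label` returns "substantial" for 0.71 and 0.77, "moderate" for 0.57, and "almost perfect" for 0.88.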
Location of Claims

Section         Sentences     Total        % of       % of
                with claim    sentences    section    claims
Abstract                98          309      31.72      7.84
Introduction           357          979      36.47     28.56
Method                   6         1121       0.54      0.48
Result                 293         1829      16.02     23.44
Discussion             539         1406      38.34     43.12
Total                 1250         5535      22.58    100.00
Interested? Send me an email: clblake@illinois.edu

For more details on the Claim Framework and an automated approach to populating explicit claims, see:
Blake, C. (2010). Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles. Journal of Biomedical Informatics, 43(2), 173-189.