Published by Jack Carr. Modified over 9 years ago.
1
Beyond Genes, Proteins, and Abstracts: A Framework to Capture Scientific Claims
Catherine Blake
School of Information and Library Science
University of North Carolina at Chapel Hill
http://www.ils.unc.edu/~cablake
cablake@email.unc.edu
2
Motivation
Relentless increase in electronically available text
– Life Sciences
  The NLM added the 17 millionth entry to PubMed in April 2007
  5,200 journals indexed
  12,000 new articles each week!
– Chemistry: more than 110,000 articles in 1 year alone
Consequences:
– Hundreds of thousands of relevant articles
– Implicit connections between literature go unnoticed
Shift from Retrieval to Synthesis
3
Entity Extraction
Newspaper genre
– People, places, and organizations
– Message Understanding Conference (MUC)
Biomedical genre
– Genes and proteins
– Diseases and treatments
– Chemical compounds
– Challenges: BioCreative, GENIA, JNLPBA
4
Relationship Extraction
Newspaper genre
– Person moving from one company to another
Biomedical genre
– Genes and proteins, e.g. binds, inhibits
– ARBITER (Rindflesch, Rajan, & Hunter, 2000)
– Geneways (Rzhetsky et al., 2004)
– relEx (Fundel, Kuffner, & Zimmer, 2007)
– GENIA www-tsujii.is.s.u-tokyo.ac.jp/GENIA
5
Causal Relationships
Newspaper genre
– Causal relationships (Khoo, Chan, & Niu, 1998)
Biomedical genre
– Causes and treats (Price & Delcambre, 2005)
– Causal knowledge (Khoo, Chan, & Niu, 2000)
Universal Grammar
– Causatives (Comrie, 1974, 1981)
– Action verbs (Thomson, 1987)
6
Claim Definition
“To assert in the face of possible contradiction”
Example sentence reporting a claim
– “This study showed that Tamoxifen reduces the breast cancer risk”
Example Claim Framework
– Tamoxifen (agent)
– reduces (change)
– [breast cancer risk] (object)
7
Goals
Create a Framework that reflects how claims are made in the biomedical literature
The Framework should
– generalize beyond biomedicine
– differentiate between different levels of confidence in the claim
– consider claims made in the full text
Populate the Framework automatically
8
The Claim Framework
Information facets
– concepts
– change
– basis of the claim
Each information facet may have
– modifiers
– directionality
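The facets listed above can be sketched as a small data structure. This is a minimal illustration only, not the author's implementation: the class names `Facet` and `Claim`, the field names, and the helper `facets_present` are all assumptions introduced here. The example instance uses the Tamoxifen sentence from the Claim Definition slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Facet:
    """One information facet: its text plus optional modifiers and direction.

    Hypothetical structure; the slides only say a facet "may have"
    modifiers and directionality.
    """
    text: str
    modifiers: List[str] = field(default_factory=list)
    direction: Optional[str] = None


@dataclass
class Claim:
    """A claim assembled from facets. Every facet is optional in this sketch."""
    agent: Optional[Facet] = None   # concept A
    obj: Optional[Facet] = None     # concept B
    change: Optional[Facet] = None  # nature of the change
    basis: Optional[Facet] = None   # basis of the claim

    def facets_present(self) -> List[str]:
        """Return the names of the facets that are filled in."""
        names = ["agent", "obj", "change", "basis"]
        return [name for name in names if getattr(self, name) is not None]


# "This study showed that Tamoxifen reduces the breast cancer risk"
tamoxifen = Claim(
    agent=Facet("Tamoxifen"),
    change=Facet("reduces"),
    obj=Facet("breast cancer risk"),
)
print(tamoxifen.facets_present())
```

Keeping every facet optional lets the same structure hold all five claim categories; which facets a category requires would be checked separately.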
9
The Claim Framework
Columns: Category; Concept A; Concept B; Nature of change; Claim Basis
1. Explicit Claim: Agent; Object; Required; Optional
2. Implicit Claim: Agent; Object; Optional
3. Correlation: Required; Optional
4. Comparison: Required
5. Observation: N/A; Required; Optional
10
Explicit Claims
Indeed, glycine prevented Wy-14643-stimulated superoxide production by Kupffer cells.
Claim 1
– glycine (agent)
– prevented (change)
– [Wy-14643-stimulated superoxide production] (object)
Claim 2
– [Kupffer cells] (agent)
– produces (change)
– [Wy-14643-stimulated superoxide] (object)
11
Implicit Claims
In liver the number of peroxisomes increases from about 500-600/cell to > 5000/cell after exposure to peroxisome proliferators.
Claim 1
– [Peroxisome proliferators] (agent)
– increases (changeDirection)
– Peroxisomes (object)
– [In the liver] (agentModifier)
– [number] (agentModifier)
12
Correlations
A weak but statistically significant correlation was observed between the plasma nm23-H1 level and the WBC count (Figure 1, n=102, r=0.437, P<0.0001)
– [plasma nm23-H1 level] (agent)
– [WBC count] (object)
– correlation (change)
– [statistically significant] (changeModifier)
13
Comparisons
The plasma concentration of nm23-H1 was higher in patients with AML than in normal controls (P =.0001)
Claim 1
– [plasma concentration of nm23-H1] (basis of claim)
– [Patients with AML] (agent)
– higher (changeDirection)
– [normal controls] (object)
14
Observations
However, the plasma nm21-H1 protein level was increased in SML-M3 patients (P=.0002)
Claim 1
– [nm21-H1 protein level] (object)
– Increased (changeDirection)
– [SML-M3 patients] (objectModifier)
15
Working Hypothesis 1
The Claim Framework reflects how a scientist communicates her findings
– Full text documents randomly selected from biomedical literature will report findings using constructs within the Claim Framework
– Human annotators will agree on facets within the Claim Framework
– The Claim Framework will generalize to a variety of scientific literatures
16
Working Hypothesis 2
Facets within the Claim Framework can be populated automatically
– The system will detect all claims identified by the human annotators (i.e. recall)
– The system will only identify claims that were identified by the human annotators (i.e. precision)
– The system design will generalize to new literatures by avoiding domain specific constructs
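The recall and precision targets named above can be illustrated with a short sketch. It assumes claims are compared as exact (agent, change, object) tuples, which is a simplification chosen here and not something the slides specify; the function name `precision_recall` and the second system claim are hypothetical.

```python
def precision_recall(system_claims, annotator_claims):
    """Compare system output against human annotations using exact matching.

    Precision: share of system claims that the annotators also identified.
    Recall: share of annotator claims that the system detected.
    """
    system, gold = set(system_claims), set(annotator_claims)
    matched = system & gold
    precision = len(matched) / len(system) if system else 0.0
    recall = len(matched) / len(gold) if gold else 0.0
    return precision, recall


# Claims as (agent, change, object) tuples taken from the example slides.
gold = {
    ("Tamoxifen", "reduces", "breast cancer risk"),
    ("glycine", "prevented", "Wy-14643-stimulated superoxide production"),
}
# Hypothetical system output: one correct claim and one spurious claim.
system = {
    ("Tamoxifen", "reduces", "breast cancer risk"),
    ("Kupffer cells", "produces", "glycine"),
}

p, r = precision_recall(system, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```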
17
Validating the Claim Framework
Draft Claim Framework given to two annotators
Pilot Study: Identify every claim
– Include claims that don’t conform to the framework
– Don’t consider how this will be automated
18
Validating the Claim Framework
Main study
– 25 articles
Verification
– Random set of sentences annotated twice
– Feedback provided daily
19
Results
All documents
– Total number of sentences: 5535
– Sentences with >=1 claim: 1250 (22.6%)
– Total number of claims: 3228
– Average claims per sentence: 2.51
– Claims that did not fit in the Framework: 31
Per document
– Average number of sentences: 191
– Average number of sentences with >=1 claim: 43
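As a quick arithmetic check, two of the shares implied by these counts can be recomputed directly. The variable names are illustrative; the counts are the ones reported on the slide.

```python
# Counts reported on the Results slide.
total_sentences = 5535
sentences_with_claim = 1250
total_claims = 3228
claims_outside_framework = 31

# Share of sentences that report at least one claim.
share_with_claim = 100 * sentences_with_claim / total_sentences

# Share of claims that the Claim Framework could capture.
share_captured = 100 * (total_claims - claims_outside_framework) / total_claims

print(f"Sentences with >=1 claim: {share_with_claim:.1f}%")
print(f"Claims captured by the Framework: {share_captured:.1f}%")
```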
20
Distribution of Claim Categories
Category     Total  (%)    Pilot  (%)    Main  (%)
Explicit     2489   77.11  332    83.42  2157  76.63
Implicit     87     2.70   3      0.75   84    2.98
Observation  298    9.23   24     6.03   274   9.73
Correlation  174    5.39   12     3.02   162   5.75
Comparison   165    5.11   27     6.85   138   4.9
Total        3228   100    398    100    2830  100
21
All Documents
Annotation        Total  (%)    Words  (Avg)
Agent             2894   89.65  5221   1.80
Agent Direction   285    8.83   291    1.02
Agent Modifier    1246   38.60  4448   3.57
Object            3197   99.04  6849   2.14
Object Direction  271    8.40   283    1.04
Object Modifier   1561   48.36  5383   3.44
Change            1897   58.77  1953   1.03
Change Direction  1337   41.42  1358   1.02
Change Modifier   1147   35.53  1618   1.41
Claim Basis       165    5.11   394    2.39
Claim Basis Dir.  42     1.30   43     1.02
Claim Basis Mod.  86     2.66   266    3.09
Total             3228          28107  8.70
22
Inter Annotator Agreement
Information Facet  Kappa  Agreement
Agent              0.71   substantial
Object             0.77   substantial
Change             0.57   moderate
Change+ChangeDir   0.88   almost perfect
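The agreement labels in this table match the commonly used Landis and Koch bands for kappa. The sketch below assumes Cohen's kappa for two annotators, which the slides do not state explicitly; the function names and the toy label sequences are hypothetical, and only the four kappa values come from the table.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch agreement label."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Toy annotations (hypothetical), one facet label per annotated span.
annotator_1 = ["agent", "agent", "object", "object"]
annotator_2 = ["agent", "object", "object", "object"]
toy_kappa = cohens_kappa(annotator_1, annotator_2)
print(f"toy kappa = {toy_kappa:.2f} ({landis_koch(toy_kappa)})")

# Kappa values reported in the table above.
for facet, kappa in [("Agent", 0.71), ("Object", 0.77), ("Change", 0.57), ("Change+ChangeDir", 0.88)]:
    print(facet, kappa, landis_koch(kappa))
```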
23
Location of Claims
Section       Sentences with claim  Total sentences  % section  % claim
Abstract      98                    309              31.72      7.84
Introduction  357                   979              36.47      28.56
Method        6                     1121             0.54       0.48
Result        293                   1829             16.02      23.44
Discussion    539                   1406             38.34      43.12
Total         1250                  5535             22.58      100.00
24
Findings thus far
99% of the claims made in these articles could be captured in the Claim Framework
22% of sentences report at least 1 claim
77% of the claims identified were explicit
8% of claims are made in the abstract
Agreement
– substantial for agents and objects
– almost perfect for change and change direction
25
Acknowledgements
– This project was supported in part by
  – Renaissance Computing Institute (RENCI) Faculty Fellowship Program
  – NSF Center for Environmentally Responsible Solvents and Processes (CERSP CHE-9876674)
– This project used resources provided by
  – the OSG, which is supported by the NSF & the U.S. Department of Energy's Office of Science
The speaker thanks
  Nassib Nassar and Mats Rynge (RENCI)
  Amol Bapat and Ryan Jones (SILS)
26
Questions and Comments Welcome
Catherine Blake
cablake@email.unc.edu
http://www.ils.unc.edu/~cablake
School of Information and Library Science
University of North Carolina at Chapel Hill