Published by Elijah Stephens. Modified over 9 years ago.
1
Aiding Biomedical Researchers with Tools to Assist Discovery
Neil R. Smalheiser
May 18, 2006
2
Don Swanson: Undiscovered Public Knowledge
“A affects B” and, separately, “B affects C”: does A affect C?
The pieces are all public, but they need to be put together to see a pattern.
3
One node (open) search:
Formulate a problem (literature A).
Find a different literature C containing complementary information.
Focus on implicit links between A and C.
But… most scientists already have more hypotheses and leads than they can handle!
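The open search can be illustrated with a toy sketch. This is not Arrowsmith's implementation: it assumes single-word title terms, a tiny hypothetical stopword set, and invented titles, and it scores each candidate C-term by how many distinct B-terms (terms from A's titles) it shares a title with, while never appearing in A's titles itself.

```python
import re

# Hypothetical, tiny stopword set for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for", "with", "by"}


def title_terms(title: str) -> set[str]:
    """Lowercase single-word terms of a title, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOPWORDS}


def open_search(a_titles: list[str], other_titles: list[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Rank candidate C-terms by the number of distinct B-terms they co-occur with."""
    b_terms = set().union(*(title_terms(t) for t in a_titles))  # B-terms come from literature A
    links: dict[str, set[str]] = {}  # candidate C-term -> B-terms it shares a title with
    for title in other_titles:
        terms = title_terms(title)
        shared = terms & b_terms
        if not shared:
            continue
        # Terms absent from A's titles are linked to A only implicitly, through B.
        for c in terms - b_terms:
            links.setdefault(c, set()).update(shared)
    ranked = sorted(((c, len(bs)) for c, bs in links.items()), key=lambda x: (-x[1], x[0]))
    return ranked[:top_n]


a_titles = ["Migraine and serotonin", "Migraine with spreading depression"]
other_titles = [
    "Magnesium deficiency and serotonin",
    "Magnesium blocks spreading depression",
    "Calcium and serotonin",
]
print(open_search(a_titles, other_titles))
# → [('magnesium', 3), ('blocks', 2), ('calcium', 1), ('deficiency', 1)]
```

Even in this toy run a generic word such as "blocks" ranks second, which is why a practical system needs ways to filter and rank terms.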
4
The Two Node (Closed) Search
The link between A and C is either known (often newly discovered) or hypothesized.
Examine title terms B in common between A and C as possibly pointing to meaningful links.
A and C don’t have to be disjoint!
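A minimal sketch of building the B-list for a two node search, assuming each literature is given as a list of titles and that B-terms are single words. Arrowsmith's actual term extraction is not described on these slides; `title_terms`, `STOPWORDS`, and `b_list` are hypothetical helpers, and the titles are invented.

```python
import re
from collections import Counter

# Hypothetical stopword set; a real stoplist would be far larger.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for", "with", "by"}


def title_terms(title: str) -> set[str]:
    """Lowercase single-word terms of a title, minus stopwords (counted once per title)."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOPWORDS}


def b_list(a_titles: list[str], c_titles: list[str]) -> list[tuple[str, int, int]]:
    """Title terms shared by A and C, with the number of titles containing each in A and in C."""
    a_counts = Counter(t for title in a_titles for t in title_terms(title))
    c_counts = Counter(t for title in c_titles for t in title_terms(title))
    shared = a_counts.keys() & c_counts.keys()
    # Most frequent overall first; ties broken alphabetically.
    return sorted(((t, a_counts[t], c_counts[t]) for t in shared), key=lambda x: (-(x[1] + x[2]), x[0]))


a_titles = ["Migraine and serotonin", "Serotonin release in migraine", "Spreading depression in migraine"]
c_titles = ["Magnesium and serotonin", "Magnesium and spreading depression", "Dietary magnesium"]
print(b_list(a_titles, c_titles))
# → [('serotonin', 2, 1), ('depression', 1, 1), ('spreading', 1, 1)]
```

Because A and C need not be disjoint, a title that belongs to both literatures simply contributes to both counts.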
5
The Arrowsmith Project
Human Brain Project, NLM, NIMH
Public web interfaces for one and two node searches
Developing the system further in collaboration with neuroscience field testers
6
http://arrowsmith.psych.uic.edu
7
Lessons from Field Testers
Testers used the Arrowsmith two node search for many daily information needs:
finding, assessing, or prioritizing hypotheses;
identifying items studied in common by two literatures;
browsing an unfamiliar literature C for the subset most likely to be relevant to a familiar literature A.
Arrowsmith works as an extension of PubMed searches.
8
Lessons for the “Back End”
Two node searches need to be fast (seconds, not minutes).
The B-list needs to be assessed quickly (seconds or minutes, not hours).
There is no need to be comprehensive.
There is no need to find only “novel” links.
9
Filtering and Ranking B-terms
Features permitting users to filter and rank B-terms: semantic categories, frequency, recency, MeSH, characteristic-ness, coherence, and stoplist.
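Three of these features (frequency, recency, stoplist membership) can be computed from the retrieved titles alone; semantic categories, MeSH, characteristic-ness, and coherence need external vocabularies or corpus statistics and are not shown. A sketch under stated assumptions: the `BTerm` record, the thresholds `min_freq` and `max_age`, and the example data are all hypothetical. Stoplist removal is optional here, since the slides report that some relevant B-terms were stoplisted.

```python
from dataclasses import dataclass


@dataclass
class BTerm:
    term: str
    years: list[int]  # publication years of the A or C titles containing the term


def features(b: BTerm, stoplist: set[str], current_year: int) -> dict:
    """Three simple features that need no external vocabulary."""
    return {
        "frequency": len(b.years),
        "years_since_last": current_year - max(b.years),  # smaller means more recent
        "on_stoplist": b.term.lower() in stoplist,
    }


def filter_b_terms(bterms, stoplist, current_year, min_freq=2, max_age=10, drop_stoplisted=True):
    """Keep B-terms that are frequent and recent enough; optionally drop stoplisted ones."""
    kept = []
    for b in bterms:
        f = features(b, stoplist, current_year)
        if f["frequency"] < min_freq or f["years_since_last"] > max_age:
            continue
        if drop_stoplisted and f["on_stoplist"]:
            continue
        kept.append(b.term)
    return kept


bterms = [
    BTerm("serotonin", [1980, 1984, 1986]),
    BTerm("spreading", [1975]),
    BTerm("study", [1970, 1985, 1986]),
    BTerm("platelet", [1960, 1962]),
]
print(filter_b_terms(bterms, {"study"}, 1987))  # → ['serotonin']
print(filter_b_terms(bterms, {"study"}, 1987, drop_stoplisted=False))  # → ['serotonin', 'study']
```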
10
A quantitative model for filtering and ranking B-terms
Even though each search is different, and each person has their own idea of “relevance”, we can identify features that are associated with chosen B-terms.
We chose 5 gold-standard searches, with user-chosen positive and *negative B-terms.
We combined all 7 features into a single logistic regression model (optimal weighting of each feature; 1 score for each B-term; the score varies for each two node search).
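The slide does not say how the logistic regression was fit or how each feature was encoded. The sketch below is a stand-in under explicit assumptions: each of the seven features is scaled to [0, 1] (a hypothetical encoding), the weights are fit by plain batch gradient descent on the log-loss, and the six training rows are invented toy data (1 = a user-chosen B-term, 0 = a negative one).

```python
import math

# Hypothetical encoding of the slide's 7 features, each scaled to [0, 1].
FEATURES = ["coherence", "characteristic", "semantic_match", "frequency", "recency", "mesh", "on_stoplist"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(x: list[float], w: list[float], b: float) -> float:
    """Predicted probability that a B-term with feature vector x is judged relevant."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = score(xi, w, b) - yi  # derivative of log-loss w.r.t. the linear term
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented toy data, one row per B-term, columns in FEATURES order.
X = [
    [0.9, 0.8, 1, 0.6, 0.7, 1, 0],
    [0.8, 0.6, 1, 0.4, 0.9, 1, 0],
    [0.7, 0.7, 0, 0.5, 0.5, 0, 1],
    [0.1, 0.2, 0, 0.9, 0.3, 0, 1],
    [0.2, 0.1, 1, 0.2, 0.2, 0, 0],
    [0.1, 0.3, 0, 0.3, 0.6, 1, 0],
]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
for name, weight in zip(FEATURES, w):
    print(f"{name:15s} {weight:+.2f}")
```

With real gold standards the rows would be pooled across searches. Because features such as frequency are computed within a search, the same term can receive a different score in each two node search, as the slide notes.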
11
ID 1
A-literature query: retinal detachment[ti] (n = 5122)
C-literature query: aortic aneurysm[ti] (n = 5687)
Raw B-terms: n = 2294
Relevant B-terms sought: (a) diseases or syndromes in which both features have been described (n = 30); (b) surgical procedures used for diagnosis or treatment of both (n = 26)
ID 2
A-literature query: mglur5[ti] OR (metabotropic glutamate receptor[ti] OR metabotropic glutamate receptors[ti]) (n = 2032)
C-literature query: Lewy body[ti] OR Lewy bodies[ti] (n = 1141)
Raw B-terms: n = 820
Relevant B-terms sought: (a) signaling molecules that directly or indirectly modulate or are modulated by mGluR5 and that either modulate Lewy bodies or are altered in diseases that have Lewy bodies (n = 19); (b) specific brain regions studied in both (n = 42)
ID 3
A-literature query: "magnesium"[MeSH Terms] AND magnesium[ti] AND ("1900"[PDAT] : "1987/12/31"[PDAT]) (n = 6238)
C-literature query: ("migraine disorders"[MeSH] AND migraine[ti]) AND ("1900"[PDAT] : "1987/12/31"[PDAT]) (n = 3205)
Raw B-terms: n = 1879
Relevant B-terms sought: terms described as relevant in the JASIST paper (ref. 23, in Appendix), excluding two judged too general to be useful (reactivity and spreading) (n = 41)
ID 4
A-literature query: beta-amyloid precursor protein[ti] OR amyloid precursor protein[ti] OR APP[ti] AND ("amyloid"[MeSH Terms] OR amyloid[Text Word]) (n = 2118)
C-literature query: reelin[All Fields] (n = 493)
Raw B-terms: n = 1003
Relevant B-terms sought: genes or proteins shared in Reelin and APP (amyloid precursor protein) signal transduction pathways (n = 54)
ID 5
A-literature query: ("nitric oxide"[MeSH Terms] OR nitric oxide[ti]) AND (("mitochondria"[MeSH Terms] OR mitochondria[ti]) OR mitochondrial[ti]) (n = 786)
C-literature query: (psd[ti] OR psd93[ti] OR psd95[ti] OR psds[ti]) OR "postsynaptic density"[ti] OR "postsynaptic densities"[ti] (n = 545)
Raw B-terms: n = 584
Relevant B-terms sought: physiological or pathological responses that link the action of nitric oxide on mitochondria and the normal function of post-synaptic densities (n = 51)
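The A and C literatures above are defined by PubMed queries. The slides do not describe how Arrowsmith retrieves them; as one illustrative assumption, the matching PMIDs could be requested from NCBI's E-utilities `esearch` endpoint. The sketch only builds the request URL (no network call), and the `retmax` default is a hypothetical choice large enough for the literatures listed. Fetching the titles themselves (for example via the `efetch` endpoint) is a separate step not shown.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint; returns the PMIDs matching a PubMed query.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query: str, retmax: int = 10000) -> str:
    """Build an esearch URL for a PubMed query (no request is sent here)."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": query, "retmax": retmax})


a_url = esearch_url("retinal detachment[ti]")
c_url = esearch_url("aortic aneurysm[ti]")
print(a_url)
# → https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=retinal+detachment%5Bti%5D&retmax=10000
```

Counts returned today will differ from the 2006 figures above, since PubMed has grown.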
12
Some Findings of the Model
Coherence was the most important feature in identifying relevant B-terms.
Characteristic value, semantic category mapping, frequency, and recency all contributed significantly as well.
More than 5% of the B-terms marked relevant in the gold-standard searches were terms on the 1,400-word stoplist (e.g., Down Syndrome).
16
Implications
We can now rank all B-terms rigorously and automatically, in order of the probability that they will be found relevant by SOME user.
We can now predict the NUMBER of relevant B-terms in any given search.
The model can be applied to B-terms arising within abstracts.
We now have a global measure of the OVERALL implicit information linking two (topical, disjoint) literatures.
The model can be applied to one node searches too!
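Once each B-term has a predicted probability (for example from the logistic regression model described above), the first two implications reduce to simple operations. A minimal sketch with invented probabilities: ranking is a sort, and one natural estimate of the number of relevant B-terms is the sum of the probabilities (linearity of expectation). The slide does not state that the authors used exactly this estimator; `rank_b_terms` and `expected_relevant_count` are hypothetical names.

```python
def rank_b_terms(probs: dict[str, float]) -> list[tuple[str, float]]:
    """Order B-terms by predicted probability of being judged relevant, highest first."""
    return sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))


def expected_relevant_count(probs: dict[str, float]) -> float:
    """Expected number of relevant B-terms: the sum of the per-term probabilities."""
    return sum(probs.values())


# Invented model outputs for one two node search.
probs = {"serotonin": 0.9, "spreading depression": 0.7, "calcium": 0.4, "study": 0.05}
print(rank_b_terms(probs)[0])  # → ('serotonin', 0.9)
print(f"{expected_relevant_count(probs):.2f}")  # → 2.05
```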
17
Conclusion
The two node search can now be conducted and analyzed in a matter of minutes, not hours or days.
It can be used by the general scientific public for a variety of information needs, including but NOT restricted to searching for and assessing hypotheses.
18
Thanks to: Vetle Torvik, Don Swanson, Wei Zhou, Maryann Martone & Guy Perkins, Ramin Homayouni, Bob Bilder & Don Kalar