Where Do You Go for Biomedical Funding?
Yi Liu, Ahmet Altay
Background
Problem
o In biomedical research there are many sources of federal funding.
o How do we choose the right institution to fund a given research idea?
Data
o Biomedical grant summaries from 20 institutions, covering 1972 to 2009
Pre-Processing
Clean up the texts by removing mark-up, meta words, and duplicates
Remove institutions with fewer than 5000 grants
Bag-of-words approach with a pre-determined dictionary
o Removed 319 stop words from the text
o Used Porter stemming to further collapse the text
o Dictionary of 83485 entries with 120636 distinct spellings
Use mgrep to annotate our data with dictionary words
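The bag-of-words step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' pipeline: the stop-word list and the spelling table (`STOP_WORDS`, `SPELLINGS`) are tiny hypothetical stand-ins, and the table replaces both Porter stemming and mgrep annotation by mapping each spelling directly to one dictionary entry. Unlike mgrep, it matches single tokens only.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slides removed 319 stop words.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "is", "for", "with", "on"}

# Hypothetical spelling table: several spellings collapse onto one entry,
# the way 120636 spellings collapse onto 83485 dictionary entries.
# This stands in for stemming; it is not the Porter algorithm.
SPELLINGS = {
    "tumor": "tumor", "tumour": "tumor", "tumors": "tumor",
    "protein": "protein", "proteins": "protein",
    "kinase": "kinase", "kinases": "kinase",
}

def bag_of_words(abstract: str) -> Counter:
    """Tokenize, drop stop words, and count dictionary entries only."""
    tokens = re.findall(r"[a-z]+", abstract.lower())
    counts = Counter()
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        entry = SPELLINGS.get(tok)
        if entry is not None:  # words outside the dictionary are ignored
            counts[entry] += 1
    return counts
```

For example, `bag_of_words("Tumour proteins and tumor kinases in the cell")` counts `tumor` twice and `protein` and `kinase` once each; "cell" is dropped because it is not in the toy dictionary. Restricting counts to a fixed dictionary is what keeps the later TF-IDF matrix to one row per dictionary entry.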
Histogram of Stems per Abstract
Processing
Generate a TF-IDF matrix from the dictionary and the abstracts
The TF-IDF matrix is huge (83435 by 561769)
Reduce the TF-IDF matrix for computational efficiency
o Remove dictionary terms and abstracts with zero counts
o Use SVD to represent the data in a smaller subspace of the original matrix
o Singular values decrease quickly, so we used the first 100 eigenvectors without losing much precision.
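A minimal NumPy sketch of these processing steps, assuming raw counts as term frequency and log(N/df) as inverse document frequency; the slides do not say which TF-IDF variant was used. It keeps terms as rows and abstracts as columns, and projects each abstract onto the first k left singular vectors. The function names are illustrative, not from the original work.

```python
import numpy as np

def drop_empty(counts: np.ndarray) -> np.ndarray:
    """Remove all-zero term rows, then all-zero abstract columns."""
    counts = counts[np.any(counts != 0, axis=1)]
    return counts[:, np.any(counts != 0, axis=0)]

def tfidf_matrix(counts: np.ndarray) -> np.ndarray:
    """counts: terms x abstracts raw count matrix.

    Assumed weighting: tf = raw count, idf = log(N / df).
    """
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)         # abstracts containing each term
    idf = np.log(n_docs / np.maximum(df, 1))      # guard against df == 0
    return counts * idf[:, None]

def reduce_rank(x: np.ndarray, k: int) -> np.ndarray:
    """Project each abstract (column) onto the top-k left singular vectors.

    The result has shape (k, n_abstracts) and equals diag(s[:k]) @ vt[:k].
    """
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    k = min(k, len(s))
    return u[:, :k].T @ x
```

Usage on a toy matrix: `reduce_rank(tfidf_matrix(drop_empty(counts)), k=100)`. Note that a term occurring in every abstract gets idf = 0 under this weighting. A dense `np.linalg.svd` is only practical for small matrices; at 83435 by 561769 one would need a sparse matrix and a truncated solver such as `scipy.sparse.linalg.svds`.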
Distribution of Singular Values
Effect of Using the Eigen Subspace
Tested performance on a smaller data set (400).
Performance with the raw TF-IDF matrix is similar to that in the eigen subspace.
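One way to run such a comparison is to check how often the nearest neighbours of each abstract agree between the raw TF-IDF columns and their reduced projection. The sketch below is a hypothetical illustration: the slides do not say how performance was measured, so the neighbour-overlap metric and the names `cosine_knn` and `neighbour_overlap` are assumptions. It also assumes no all-zero columns, which the zero-count removal step guarantees.

```python
import numpy as np

def cosine_knn(docs: np.ndarray, k: int) -> list[set[int]]:
    """docs: features x abstracts. Return each abstract's k nearest
    neighbours by cosine similarity, excluding the abstract itself."""
    unit = docs / np.linalg.norm(docs, axis=0, keepdims=True)
    sim = unit.T @ unit
    np.fill_diagonal(sim, -np.inf)            # never match an abstract to itself
    order = np.argsort(-sim, axis=1)[:, :k]   # most similar first
    return [set(row.tolist()) for row in order]

def neighbour_overlap(x: np.ndarray, rank: int, k: int) -> float:
    """Average fraction of kNN sets shared between the raw columns of x
    and their projection onto the top-`rank` left singular vectors."""
    u, _, _ = np.linalg.svd(x, full_matrices=False)
    reduced = u[:, :rank].T @ x
    raw_nn = cosine_knn(x, k)
    red_nn = cosine_knn(reduced, k)
    return float(np.mean([len(a & b) / k for a, b in zip(raw_nn, red_nn)]))
```

When `rank` equals the rank of `x`, the projection preserves all inner products, so the overlap is 1.0; lowering `rank` shows how much neighbour structure the smaller subspace loses.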
Evaluation
For a given test abstract, we used kNN search to find the 100 closest abstracts.
Used a custom scoring algorithm to pick the grantor that best represents the 100 nearest neighbors found.
Tested the entire data set using leave-one-out cross-validation.
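The evaluation loop can be sketched as follows. The slides' custom scoring algorithm is not given, so this sketch substitutes a similarity-weighted vote as a hypothetical stand-in; the function name `loo_accuracy` is likewise illustrative. Each abstract is held out in turn by excluding it from its own neighbour list, which is what makes this leave-one-out.

```python
import numpy as np
from collections import defaultdict

def loo_accuracy(docs: np.ndarray, grantors: list[str], k: int = 100) -> float:
    """docs: features x abstracts; grantors[j] is the true grantor of column j.

    For each held-out abstract, find its k nearest neighbours (cosine
    similarity) among the other abstracts, score each grantor by summed
    similarity (hypothetical weighting), and compare with the truth.
    """
    unit = docs / np.linalg.norm(docs, axis=0, keepdims=True)
    sim = unit.T @ unit
    np.fill_diagonal(sim, -np.inf)        # hold out the test abstract itself
    k = min(k, len(grantors) - 1)
    correct = 0
    for i in range(len(grantors)):
        nn = np.argpartition(-sim[i], k - 1)[:k]   # k most similar, unordered
        scores = defaultdict(float)
        for j in nn:
            scores[grantors[j]] += sim[i, j]
        predicted = max(scores, key=scores.get)
        correct += predicted == grantors[i]
    return correct / len(grantors)
```

Building the full similarity matrix is only feasible for small samples; with hundreds of thousands of abstracts one would compute similarities one test abstract at a time or use an approximate nearest-neighbour index.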
Results (1)
Results (2)