Who’s Sharing with Who? Acknowledgements-driven identification of resources
  David Eichmann
  School of Library and Information Science & Information Science Track, Iowa Graduate Program in Informatics
Motivation: Public information regarding collaboration networks is partial and post-hoc (grants and publications). Research profiling systems (e.g., VIVO) primarily feed on these data. Institutional grant tracking systems carry data on attempts at collaboration, but are not open.
Goals: Extend the model to include informal interactions. Explore the degree to which sharing of data, resources, etc. can be identified from the full text of papers.
Melissa’s LinkedIn Map
Holly Falk-Krzesinski’s LinkedIn Map
Ferrets in CTSAsearch
PubMed Central Open Access: 886,172 papers (as of Thursday); 423,764 with acknowledgements; 994,931 sentences; 4,329,972 parses.
The Simple Cases
  PMCID:
  SeqNum: 2
  SentNum: 6
  Sentence: EK analysed the data.
  POS: [EK/NNP, analysed/VBD, the/DT, data/NNS, ./.]
  Parse: [S [NP EK/NNP ] [VP analysed/VBD [NP the/DT data/NNS ] ] ./. ]
And the Not So Simple…
  PMCID:
  Sentence: We thank Sheila Harvey, Clinical Trials Unit Manager at ICNARC, and Ruth Canter, Trials Administrator at ICNARC, for their assistance in chasing completed surveys; Dr Kevin Gunning for early advice and project development; Drs Neill K. J. Adhikari and Gordon D. Rubenfeld for feedback and discussion of analysis plan; Dr Chris AKY Chong for his valuable comments on the initial draft of this manuscript; and our Responders: Addenbrooke’s Hospital ( Dr Kevin Gunning ), Airedale General Hospital ( Dr John Scriven ), Alexandra Hospital ( Dr Tracey Leach ), Arrowe Park Hospital ( Dr Lawrence Wilson ), Barnet Hospital ( Dr AH Wolff ), …
  (8,245-character-long sentence)
Syntax Fragment Frequency (SFF) Approach: Walk the syntax trees and, for every interior node (basically, every phrase), generate a syntax fragment of depth 2.
  [S [NP EK/NNP ] [VP analysed/VBD [NP the/DT data/NNS ] ] ./. ]
  [S [NP NNP ] [VP VBD [NP DT NNS ] ] . ]
  [NP EK/NNP ]
  [VP VBD [NP DT NNS ] ]
  [NP DT NNS ]
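The fragment walk described on this slide can be sketched in Python as follows. This is a minimal illustration, not the implementation behind the slides: the function names `parse`, `render`, and `fragments` are hypothetical, and the truncation rule is an assumption inferred from the patterns on the later frequency slides (leaves are reduced to their POS tag, and phrases below the depth limit collapse to a bare `[LABEL ]`). Under that assumption the sentence-level fragment shows the object phrase only as `[NP ]`.

```python
import re

def parse(text):
    """Parse a bracketed tree such as '[S [NP EK/NNP ] ... ]' into nested tuples.

    Phrases become (label, [children]); leaves become (pos_tag, word).
    """
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip the opening "["
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())
            else:
                word, tag = tokens[pos].rsplit("/", 1)
                children.append((tag, word))
                pos += 1
        pos += 1                      # skip the closing "]"
        return (label, children)

    return node()

def is_leaf(n):
    return isinstance(n[1], str)

def render(n, depth):
    """Render a node as a fragment, expanding at most `depth` levels of phrases."""
    if is_leaf(n):
        return n[0]                   # keep only the POS tag
    if depth == 0 or not n[1]:
        return f"[{n[0]} ]"           # assumed truncation: bare phrase label
    inner = " ".join(render(c, depth - 1) for c in n[1])
    return f"[{n[0]} {inner} ]"

def fragments(n, depth=2):
    """Return one fragment per interior node, in pre-order."""
    if is_leaf(n):
        return []
    out = [render(n, depth)]
    for child in n[1]:
        out.extend(fragments(child, depth))
    return out

tree = parse("[S [NP EK/NNP ] [VP analysed/VBD [NP the/DT data/NNS ] ] ./. ]")
for frag in fragments(tree):
    print(frag)
```

Raising `depth` to 3 would keep the inner `[NP DT NNS ]` inside the sentence-level fragment; which depth convention the original system used is not stated here.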
SFF Approach, cont. Frequency distribution: Fragments / DocumentFrequency
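One plausible reading of this slide is that each fragment is counted both by total occurrences and by the number of documents it appears in. The sketch below shows that bookkeeping with `collections.Counter`; the function name `fragment_frequencies`, the document ids, and the sample fragments are all made up for illustration.

```python
from collections import Counter

def fragment_frequencies(docs):
    """docs maps a document id to the list of fragments found in that document."""
    total = Counter()      # occurrences across the whole corpus
    doc_freq = Counter()   # number of documents containing the fragment
    for frags in docs.values():
        total.update(frags)
        doc_freq.update(set(frags))   # count each fragment once per document
    return total, doc_freq

# Hypothetical two-document corpus.
docs = {
    "doc-1": ["[NP DT NNS ]", "[NP NNP ]", "[NP DT NNS ]"],
    "doc-2": ["[NP NNP ]", "[ADVP RB ]"],
}
total, doc_freq = fragment_frequencies(docs)
for frag, n in total.most_common():
    print(n, doc_freq[frag], frag)
```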
SFF Approach, cont. Prior to fragmentation, annotate nodes with entity classes. This annotation is domain-specific and run-time extensible.
  [S [NP EK/NNP ] [VP analysed/VBD [NP the/DT data/NNS ] ] ./. ]
  [S [NP:Author NNP:Author ] [VP VBD [NP:Resource ] ] . ]
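The annotation step can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the `Node` class, the word-to-class `lexicon`, and the rule that a noun phrase inherits the class of its first annotated child. The slides do not say how entity classes are actually assigned.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str                         # phrase label or POS tag
    word: Optional[str] = None         # set only on leaves
    children: List["Node"] = field(default_factory=list)
    entity: Optional[str] = None       # entity class, e.g. "Author"

def annotate(node, lexicon):
    """Tag leaves via the lexicon, then let each NP inherit a child's class."""
    if node.word is not None:
        node.entity = lexicon.get(node.word)
        return
    for child in node.children:
        annotate(child, lexicon)
    if node.label == "NP":
        node.entity = next((c.entity for c in node.children if c.entity), None)

def render(node, depth=2):
    """Render a fragment; phrases below the depth limit collapse to a bare label."""
    name = f"{node.label}:{node.entity}" if node.entity else node.label
    if node.word is not None:
        return name
    if depth == 0 or not node.children:
        return f"[{name} ]"
    inner = " ".join(render(c, depth - 1) for c in node.children)
    return f"[{name} {inner} ]"

# Hypothetical lexicon; a real one would be domain-specific and could be
# extended at run time, as the slide notes.
lexicon = {"EK": "Author", "data": "Resource"}

tree = Node("S", children=[
    Node("NP", children=[Node("NNP", "EK")]),
    Node("VP", children=[
        Node("VBD", "analysed"),
        Node("NP", children=[Node("DT", "the"), Node("NNS", "data")]),
    ]),
    Node(".", "."),
])
annotate(tree, lexicon)
print(render(tree))
```

Because annotation happens before fragmentation, the entity class survives even where the phrase itself is truncated, which is why the object appears as `[NP:Resource ]`.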
Frequency Distribution of Fragments
  Total distinct patterns: 4,090,978
  1,768,966  [NP:Project DT NN:Project ]
  1,074,603  [NP NN ]
    725,626  [NP:Author NN:Author ]
    657,897  [NP:Author PRP ]
    654,904  [NP:Place NNP:Place ]
    654,565  [ADVP RB ]
    644,590  [NP:Person NNP NN ]
Filtering for Top Nodes (Sentences)
  Total distinct patterns: 523,602 (87% reduction)
  600,618  [S [VP TO [VP ] ] ]
  452,753  [S [NP:Project DT NN:Project ] [VP VBD [VP ] ] ]
  169,990  [S [NP:Project DT NN:Project ] [VP VBD [VP ] ] . ]
  115,543  [S [VP VBG [NP ] ] ]
   79,036  [S [NP:Author NN:Author ] [VP NN [NP ] ] ]
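A minimal Python sketch of this filter, assuming patterns are stored as strings keyed to their counts. The helper names `top_node_patterns` and `percent_reduction` are hypothetical; the sample counts and the two pattern totals are taken from the slides.

```python
from collections import Counter

def top_node_patterns(counts, root="S"):
    """Keep only patterns whose outermost label is `root`."""
    prefix = f"[{root} "
    return Counter({p: n for p, n in counts.items() if p.startswith(prefix)})

def percent_reduction(before, after):
    """Percentage of distinct patterns removed, rounded to a whole number."""
    return round(100 * (1 - after / before))

# A few patterns and counts from the two frequency slides.
counts = Counter({
    "[NP:Project DT NN:Project ]": 1_768_966,
    "[NP NN ]": 1_074_603,
    "[S [VP TO [VP ] ] ]": 600_618,
    "[S [NP:Project DT NN:Project ] [VP VBD [VP ] ] ]": 452_753,
})
sentences = top_node_patterns(counts)
for pattern, n in sentences.most_common():
    print(n, pattern)

# Distinct patterns before and after the filter, as reported on the slides.
print(percent_reduction(4_090_978, 523_602))
```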
Filtering for Co-mentions of Authors and Persons
  Total distinct patterns: 7,870 (98% reduction)
  26,703  [S [NP:Author NN:Author ] [VP NN [NP:Person ] [PP ] ] . ]
  20,395  [S [NP:Author NN:Author ] [VP NN [NP:Person ] [PP ] ] ]
  16,588  [S [NP:Author PRP ] [VP VBP [NP:Person ] [PP ] ] . ]
  16,034  [S [NP:Author NN:Author ] [VP NN [NP:Person ] [PP ] [PP ] ] . ]
   9,149  [S [NP:Author PRP ] [VP VBP [NP:Person ] [PP ] [PP ] ] . ]
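The co-mention filter can be approximated with a string test on each pattern: keep it only if it carries both an `Author` and a `Person` annotation. This is a sketch under that assumption; `mentions` and `co_mentions` are hypothetical names, and the sample patterns are drawn from the slides.

```python
import re

def mentions(pattern, entity):
    """True if any node in the pattern carries the given entity class."""
    return re.search(rf":{re.escape(entity)}\b", pattern) is not None

def co_mentions(patterns, first="Author", second="Person"):
    """Keep patterns that mention both entity classes."""
    return [p for p in patterns if mentions(p, first) and mentions(p, second)]

patterns = [
    "[S [NP:Author NN:Author ] [VP NN [NP:Person ] [PP ] ] . ]",
    "[S [NP:Author PRP ] [VP VBP [NP:Person ] [PP ] ] . ]",
    "[S [NP:Project DT NN:Project ] [VP VBD [VP ] ] ]",
    "[S [NP:Author NN:Author ] [VP NN [NP ] ] ]",
]
for p in co_mentions(patterns):
    print(p)
```

Only the first two patterns survive: the third mentions neither class, and the fourth has an author but no person.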
Extract Entities/Relationships with Syntactic Queries
  [S [NP:Author NN:Author ] [VP NN [NP:Person ] [PP ], [PP ] ] ]
  S <1NP:Author <2[VP <1/thank/ <2(NP) <3(PP) ]
    For the sentence having this pattern, match the object noun phrase and the next prepositional phrase
  NP <#2 <1(NNP) <2(NNP)
    For the noun phrase, extract two proper nouns
  PP <#2 <1DT <2(NP)
    For the prepositional phrase, match the noun phrase
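The three queries can be read as a chain: match the sentence, then pull two proper nouns from the object noun phrase, then pull the noun phrase out of the prepositional phrase. The Python sketch below hand-codes that chain rather than implementing the query language itself. The `Node` class, the `extract_thanks` function, and the sample sentence are invented for illustration, and the check on the first child of the PP is omitted.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    entity: Optional[str] = None

def words(node):
    """Collect the leaf words under a node, left to right."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in words(child)]

def extract_thanks(s):
    """Return first name, last name, and PP text, or None if nothing matches."""
    # Sentence: child 1 is NP:Author, child 2 is a VP.
    if s.label != "S" or len(s.children) < 2:
        return None
    subject, vp = s.children[0], s.children[1]
    if (subject.label, subject.entity) != ("NP", "Author") or vp.label != "VP":
        return None
    # VP: child 1 matches /thank/, child 2 is an NP, child 3 is a PP.
    if len(vp.children) < 3:
        return None
    verb, obj, pp = vp.children[:3]
    if verb.word is None or not re.search("thank", verb.word):
        return None
    if obj.label != "NP" or pp.label != "PP":
        return None
    # NP: exactly two children, both proper nouns.
    names = [c.word for c in obj.children if c.label == "NNP"]
    if len(obj.children) != 2 or len(names) != 2:
        return None
    # PP: exactly two children, the second a noun phrase.
    if len(pp.children) != 2 or pp.children[1].label != "NP":
        return None
    return {
        "first_name": names[0],
        "last_name": names[1],
        "pp": " ".join(words(pp.children[1])),
    }

# Invented sentence: "We thank Jane Doe for helpful discussions."
sentence = Node("S", children=[
    Node("NP", entity="Author", children=[Node("PRP", "We")]),
    Node("VP", children=[
        Node("VBP", "thank"),
        Node("NP", entity="Person",
             children=[Node("NNP", "Jane"), Node("NNP", "Doe")]),
        Node("PP", children=[
            Node("IN", "for"),
            Node("NP", children=[Node("JJ", "helpful"), Node("NNS", "discussions")]),
        ]),
    ]),
    Node(".", "."),
])
print(extract_thanks(sentence))
```

A declarative query language keeps such patterns short and lets new ones be added without new code, which matters when many extraction patterns must be defined.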
Person Results Snippet
  ID  Title  First Name  Middle Name  Last Name
  76         Hans                     Matrin
  77         Jeff                     Vieira
  78         P.                       ZAMORE
  79  Prof.  Eric                     Schon
  80         Carlos                   Lois
  81         Andrea                   Möll
  82         Elena                    Govorkova
  83         K.          M.           Pollard
  84  Dr.    Michael                  Berton
Relationships for Person 77
  PMCID  Category       PP
         Support        the kind gift of rKSHV
         Support        the kind gift of rKSHV.219 and for helpful discussions
         Collaboration  helpful discussions
Relationships for Person 79
  PMCID  Category       PP
         Resource       the rabbit polyclonal antibody
         Resource       the ECFP and EYFP plasmids
         Collaboration  his helpful advice and discussions
Category Frequencies
  Category               Count
  Collaboration          47,052
  (blank)                46,327
  Technique              33,598
  Resource                8,894
  Support                 6,836
  Event                   3,744
  Project                   854
  Place Name                229
  Publication Component     210
  Place                     186
  Organization               93
Next Steps: Continue slogging through extraction pattern definition. Define patterns for funding declarations, chairs, fellowships, etc. Merge data into CTSAsearch visualizations. Align the current category scheme with Melissa Haendel’s current draft ontology for the CASRAI taxonomy, and then merge with VIVO-ISF.
Questions?