Biomedical innovation at the laboratory, clinical and commercial interface. Mapping research grants, publications and patents in the field of microarrays
Andrei Mogoutov, Alberto Cambrosio, Peter Keating & Philippe Mustar
6th Biennial International Triple Helix Conference on University-Industry-Government Links, Singapore, May 16-18, 2007
Main goals of this paper:
To analyze biomedical innovation by triangulating three sources of information: publications, patents and research projects (see Verganti et al.)
In particular: to develop a methodology for linking publication, patent and project databases by using emergent (rather than pre-established) categories
Methods:
Heterogeneous network analysis (ReseauLu X2)
Text mining (SPSS LexiQuest Mine)
Case study: Microarrays
A DNA microarray (a.k.a. biochip, DNA chip, gene array, etc.) is a collection of microscopic DNA spots, commonly representing single genes, arrayed on a solid surface by covalent attachment to chemically suitable matrices.
Compared to previous molecular genetic approaches, a microarray experiment involves the simultaneous analysis of many hundreds or thousands of genes, as opposed to single ones.
Microarrays have become a key technology of the (post)genomic era.
Annual compounded growth rate of the microarray market between : 63%
Databases
Publications: PubMed (robust keyword system; biomedical); Web of Science (addresses and citations; general S&T); [PubMed/WoS intersection]
Research projects: CRISP (NIH-financed projects; biomedical); [NSF]
Patents: Derwent Innovations Index; [USPTO]/[EUPTO]
1. Characterizing the field of microarrays
Publications (PubMed)
Publications (Web of Science)
Publications: most cited authors
Mapping: ReseauLu X2
Co-authorship network (most cited authors)
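A co-authorship network of this kind can be derived from the author lists of publication records. The following is a minimal sketch, not the ReseauLu procedure: it assumes each record has already been reduced to a list of author names, and the names and the `coauthorship_edges` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

def coauthorship_edges(papers):
    """Count how often each pair of authors appears on the same paper."""
    edges = Counter()
    for authors in papers:
        # Deduplicate and sort so each pair has one canonical orientation.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical author lists, one per publication record.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author C", "Author D"],
]
edges = coauthorship_edges(papers)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

The edge weights (number of shared papers) are what a mapping tool would then use to position and link the nodes.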
Collaborative institutional network
Institutional network (4 nearest nodes). Legend: biotech company, regulatory agency, hospital, university
Journal inter-citation network (5 nearest nodes). Highlighted: cancer cluster
Patents (Derwent)
CRISP Projects
2. Database bridges
2a. Via authors and pre-established (institutional) categories
CRISP projects by Institute
CRISP projects vs. publications (link via authors; categories by Institutes)
CRISP projects vs. citations (link via authors; categories by Institutes)
CRISP projects vs. patents (link via authors; categories by Institutes)
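Linking two databases via authors amounts to a join on a normalized name key. The sketch below is one possible way to do it, under stated assumptions: names are written "Surname, Given", matching uses surname plus first initial (which will produce some false matches for common names), and the records, field names and institute labels are made-up sample data rather than actual CRISP or PubMed fields.

```python
from collections import defaultdict

def name_key(name):
    """Reduce 'Surname, Given Names' to a crude 'surname g' matching key."""
    surname, _, given = name.partition(",")
    initial = given.strip()[:1]
    return f"{surname.strip().lower()} {initial.lower()}".strip()

def publications_by_institute(projects, publications):
    # Map each investigator key to the institutes funding that person.
    institutes_by_author = defaultdict(set)
    for project in projects:
        key = name_key(project["investigator"])
        institutes_by_author[key].add(project["institute"])

    # Credit each publication once to every institute reached via its authors.
    counts = defaultdict(int)
    for pub in publications:
        reached = set()
        for author in pub["authors"]:
            reached |= institutes_by_author.get(name_key(author), set())
        for institute in reached:
            counts[institute] += 1
    return dict(counts)

# Hypothetical sample records (not taken from the databases).
projects = [
    {"investigator": "Doe, Jane", "institute": "NCI"},
    {"investigator": "Roe, Richard", "institute": "NHGRI"},
]
publications = [
    {"authors": ["Doe, J", "Poe, E"]},
    {"authors": ["Doe, Jane", "Roe, R"]},
    {"authors": ["Poe, E"]},
]
counts = publications_by_institute(projects, publications)
print(counts)
```

The same join can be repeated with citation counts or patent inventors in place of publication authors.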
2. Database bridges
2b. Via content (emergent categories)
Text mining: SPSS LexiQuest Mine and Text Mining Builder (concept extraction; dictionary interface)
Methodology for generating emergent categories
The chosen database is text-mined (NLP software) to extract the relevant concepts (composite terms and uniterms); in the present case, WoS was chosen over CRISP because it includes biomedical and non-biomedical domains.
The most relevant (specific) concepts are selected by using a chi-square (ChiSq) filter.
After building a co-occurrence map (nearest nodes), clusters corresponding to sub-domains are identified by a modified fuzzy K-means clustering algorithm.
The list of concepts defining each sub-domain is used to analyze the other databases.
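The slides do not say what the ChiSq filter compares. A common reading, assumed here, is that a concept counts as specific when its document frequency in the target corpus differs significantly from its frequency in a background corpus. The sketch below uses the Pearson chi-square statistic for a 2x2 table; the function names, the toy documents and the 3.84 threshold (the 5% critical value for one degree of freedom) are illustrative choices, not the authors' settings.

```python
def chi_square(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def specific_terms(target_docs, background_docs, threshold=3.84):
    """Keep terms over-represented in target_docs relative to background_docs."""
    n_t, n_b = len(target_docs), len(background_docs)
    vocab = set().union(*target_docs)
    kept = {}
    for term in vocab:
        a = sum(term in doc for doc in target_docs)      # target docs containing the term
        c = sum(term in doc for doc in background_docs)  # background docs containing the term
        score = chi_square(a, n_t - a, c, n_b - c)
        if score >= threshold and a / n_t > c / n_b:
            kept[term] = score
    return kept

# Hypothetical documents, each reduced to its set of extracted concepts.
target = [{"snp", "genotyping"}, {"snp", "gene"}, {"snp", "gene"},
          {"snp", "genotyping", "gene"}]
background = [{"gene"}, {"gene", "protein"}, {"gene"}, {"protein"}]
kept = specific_terms(target, background)
print(kept)
```

In this toy data "gene" is equally frequent in both corpora and is dropped, while "snp" appears only in the target corpus and is kept.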
Emergent sub-domains
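The sub-domains come from a modified fuzzy K-means algorithm whose modifications are not described in the slides. As a stand-in, the sketch below implements plain fuzzy c-means, in which every item receives a degree of membership in every cluster rather than a single label. It assumes that concepts have already been turned into coordinate vectors (for example co-occurrence profiles); the points, starting centers and the fuzzifier `m=2.0` are hypothetical.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Standard fuzzy c-means on coordinate tuples.

    Returns (centers, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j (each row sums to 1).
    """
    centers = [tuple(c) for c in centers]
    memberships = []
    for _ in range(iterations):
        memberships = []
        for p in points:
            dists = [sum((x - y) ** 2 for x, y in zip(p, c)) ** 0.5 for c in centers]
            if 0.0 in dists:
                # Point coincides with a center: full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                row = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(row)
            memberships.append([v / total for v in row])
        # Move each center to the membership-weighted mean of all points.
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            wsum = sum(weights)
            new_centers.append(tuple(
                sum(w * p[k] for w, p in zip(weights, points)) / wsum
                for k in range(len(points[0]))
            ))
        centers = new_centers
    return centers, memberships

# Two well-separated groups of hypothetical concept coordinates.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
for p, row in zip(points, memberships):
    print(p, [round(u, 3) for u in row])
```

Because memberships are graded, a concept lying between two groups can contribute to both sub-domains, which a hard K-means assignment would not allow.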
Publications (WoS) by sub-domains
CRISP projects by sub-domains
Patents by sub-domains
% of sub-domains in projects, patents and publications
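Once each sub-domain is defined by a list of concepts, the same lists can be applied to any database to obtain comparable percentages. A minimal sketch, assuming simple case-insensitive substring matching on titles or abstracts: the concept lists and records below are invented for illustration, and only the sub-domain names SNPs and Bioinformatics come from the slides.

```python
from collections import Counter

def subdomain_shares(documents, subdomain_concepts):
    """Percentage of documents mentioning at least one concept of each sub-domain."""
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        for subdomain, concepts in subdomain_concepts.items():
            if any(concept in lowered for concept in concepts):
                counts[subdomain] += 1
    total = len(documents)
    if total == 0:
        return {}
    return {s: 100.0 * counts[s] / total for s in subdomain_concepts}

# Hypothetical concept lists and records.
concepts = {
    "SNPs": ["single nucleotide polymorphism", "genotyping"],
    "Bioinformatics": ["algorithm", "clustering"],
}
records = [
    "Genotyping of single nucleotide polymorphism panels",
    "A clustering algorithm for expression data",
    "High-throughput genotyping platform",
    "Tumour classification in breast cancer",
]
shares = subdomain_shares(records, concepts)
print(shares)
```

Running the function separately on project, patent and publication records gives one column of percentages per database. A record can match several sub-domains, so the shares need not sum to 100.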
SNPs
Bioinformatics
Acknowledgments
Research for this paper was supported by grants from CIHR, FQRSC and SSHRC.