1
SciTech Strategies, Inc.: Better Maps, Better Decisions
Science Mapping and Applications: Choices and Trade-offs
Kevin W. Boyack, SciTech Strategies
Standards Workshop, August 11, 2011
2
Agenda
Background
Applications
Choices and Trade-offs
SciTech Choices
SciTech Mapping Process
Applications Enabled by SciTech Process
Additional Findings Related to Choices
Summary
3
Science Mapping
Most common definition: visualizing or creating a visual map of scientific information
» Documents, journals, authors, words …
This requires
» Data acquisition and cleaning
» Defining elements and similarity
» Partitioning and/or layout of the data
And then one creates the visual
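To make the "defining elements and similarity" step concrete, here is a minimal, hypothetical Python sketch: cosine similarity between two documents treated as bags of terms. The same idea applies to shared references or any other element choice; the function name and sample strings are illustrative, not part of the original slides.

```python
# Minimal sketch: cosine similarity between two documents' term-count vectors.
# In a real mapping pipeline the "terms" could instead be cited references.
from collections import Counter
from math import sqrt

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents represented as bags of terms."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(cosine_similarity("mapping the structure of science",
                        "maps of science structure"))
```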
4
Why Do We Map Science?
Basic understanding
» Structure and dynamics of science
    High level, all of science
    Discipline, specialty
    Researchers and groups
Search and retrieval
» “More like this”, “related articles”
Metrics
» Research
» Evaluation
Policy / Decision Making
5
Why Do We Map Science?
Academic View and World View (the same purposes appear in both):
Structure and dynamics of science
» High level, all of science
» Discipline, specialty
» Researchers and groups
Search and retrieval
» “More like this”, “related articles”
Metrics
» Research
» Evaluation
Policy
6
Map of Science & Technology Indicators
7
Choices
Data source
Unit of analysis
» Document, Journal, Author, Word (phrase)
Sample size
» Specialty, Discipline, All of science, Single year, Multiple years
Similarity approach
» Citation, Text, Thresholds
Partitioning and/or layout
» MDS, Kamada-Kawai (K-K), Fruchterman-Reingold (F-R), DrL (OpenOrd), Blondel, …
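As an illustration of the partitioning/layout choices listed above, the sketch below uses python-igraph (assumed to be installed) to run Blondel (multilevel/Louvain) clustering and a DrL (OpenOrd) layout on a toy random graph. In practice the graph would be the thresholded document-similarity network; the graph size and all values here are placeholders.

```python
# Illustrative sketch of the partitioning and layout step with python-igraph.
import igraph as ig

# Toy network; in practice, nodes are documents and edges are top-n similarity pairs.
g = ig.Graph.Erdos_Renyi(n=300, m=1200)

clusters = g.community_multilevel()   # Blondel et al. (Louvain) partitioning
layout = g.layout_drl()               # DrL / OpenOrd layout, designed for large graphs

print(len(clusters), "clusters found")
print(layout[0])                      # (x, y) coordinates of the first node
```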
8
Trade-offs

Data source
  Trade-offs: Free vs. Costly; Single vs. Multiple sources; Content vs. Context
  Comments: Citation data isn't free; De-duplication isn't trivial; Coverage comes at a cost

Unit of analysis
  Trade-offs: Breadth vs. Detail
  Comments: Base on research question; Don't base on data available

Sample size
  Trade-offs: Specialty vs. All of science; Content vs. Context; Single year vs. Multiple years
  Comments: Mapping ALL is costly; Specialty maps lack context; Stability or instability?

Similarity approach
  Trade-offs: Simple vs. Complex; Citation vs. Text vs. Hybrid; Threshold vs. Accuracy
  Comments: Comp cost: Index < Vector; Hybrid costly, but likely best; How much is really needed?

Partitioning / Layout
  Trade-offs: Simple vs. Complex; Accuracy vs. Useability
  Comments: Simple often size limited; Is intuition satisfied? Are distributions reasonable? Useful levels of aggregation?
9
Mapping Choices
Are often made based on what is available
» Data, algorithms, expertise
When they should be made based on
» The research question or application
» Balancing of the applicable trade-offs
If the application is EVALUATION
» Special care must be taken
» Partition accuracy is critical
What is a discipline?
10
Our Choices

Data source
  Trade-offs: Free vs. Costly; Single vs. Multiple sources; Content vs. Context
  Comments: Citation data is essential and worth paying for; Future: full text?

Unit of analysis
  Trade-offs: Breadth vs. Detail
  Comments: Documents

Sample size
  Trade-offs: Specialty vs. All of science; Content vs. Context; Single year vs. Multiple years
  Comments: Mapping ALL of SCIENCE; Context is very useful; Link single years; Show instability

Similarity approach
  Trade-offs: Simple vs. Complex; Citation vs. Text vs. Hybrid; Threshold vs. Accuracy
  Comments: Co-citation analysis; Future: hybrid likely; Top-n only, rest is noise

Partitioning / Layout
  Trade-offs: Simple vs. Complex; Accuracy vs. Useability
  Comments: Complex (DrL x 10), robust; Small clusters

We are continually testing, refining, and improving our processes
11
Standards?

Data source
  Trade-offs: Free vs. Costly; Single vs. Multiple sources; Content vs. Context
  Standards?: Bibliographic data; Cleaning to different levels; Use what you can get!

Unit of analysis
  Trade-offs: Breadth vs. Detail
  Standards?: Articles, journals, authors

Sample size
  Trade-offs: Specialty vs. All of science; Content vs. Context; Single year vs. Multiple years

Similarity approach
  Trade-offs: Simple vs. Complex; Citation vs. Text vs. Hybrid; Threshold vs. Accuracy

Partitioning / Layout
  Trade-offs: Simple vs. Complex; Accuracy vs. Useability

Anecdotal validation in most cases
12
Possible Standards
Units
Data sources
Data cleanliness
Workflows
Validation
Datasets
Products (e.g., reference maps?)
…
13
Our Process
Generate annual models, 2000-2008:
» Select a single year of source data
» Apply thresholds to select references
» Generate relatedness matrix (K50/cosine)
» Filter matrix to top-n values per reference
» Cluster references (the intellectual base)
» Assign current papers (the research front), giving the research communities
Link annual models:
» Link communities from adjacent years using overlaps in their intellectual bases (threads)
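A minimal sketch of the relatedness and top-n filtering steps, using a plain cosine index on co-citation counts; the K50 normalization used in this process is omitted for brevity, and the tiny paper set and all names are illustrative only.

```python
# Sketch: co-citation relatedness between references, then keep the top-n
# neighbors per reference. `papers` maps a current paper to its cited references.
from collections import defaultdict
from itertools import combinations
from math import sqrt

papers = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r1", "r2"},
    "p3": {"r2", "r3", "r4"},
}

cocite = defaultdict(int)   # how often two references are cited together
cited = defaultdict(int)    # how often each reference is cited
for refs in papers.values():
    for r in refs:
        cited[r] += 1
    for a, b in combinations(sorted(refs), 2):
        cocite[(a, b)] += 1

# Cosine-normalized relatedness, then keep only the top-n values per reference.
TOP_N = 5
relatedness = defaultdict(dict)
for (a, b), c in cocite.items():
    sim = c / sqrt(cited[a] * cited[b])
    relatedness[a][b] = sim
    relatedness[b][a] = sim

top_n = {r: sorted(nbrs.items(), key=lambda kv: -kv[1])[:TOP_N]
         for r, nbrs in relatedness.items()}
print(top_n["r2"])
```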
14
Process-Specific Applications
Highly granular model of science gives detailed structure
15
Process-Specific Applications
Detailed structure enables metrics for multidisciplinary sets of topics
We call these sets of topics competencies
16
Process-Specific Applications
Research leadership (based on competencies) rather than research evaluation (based on rankings)
[Example institutions shown: UCSD, TAMU, UCSF, MIT, Stellenbosch, Delhi]
17
Process-Specific Applications
Mapping all of science shows context for target literature
18
Process-Specific Applications
Linking annual models shows detailed evolution and instability
Consider a topic in a given year. It can be linked (or not) in time in the following ways:
[Diagram axes: prior year linkage, following year linkage]
19
Process-Specific Applications
Linkage types correlate with various properties (e.g., textual coherence, prolific authors)
These can be used as predictors of future linkage
» Identification of emerging topics at the micro-level
[Linkage-type shares by prior/following year linkage: Isolate 34%, Death 11%, Birth 14%, Continuation 41% (including Stasis)]
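A hedged sketch of how the four linkage types could be assigned from a topic's overlap with the prior-year and following-year models. The rule is inferred from the category names; the threshold and overlap measure are placeholders, not the exact definitions used here.

```python
# Sketch: classify a topic's temporal linkage from its overlap with the
# prior and following year's communities. Overlap values are assumed to be
# precomputed (e.g., shared intellectual-base references).
def linkage_type(prior_overlap: float, following_overlap: float,
                 threshold: float = 0.0) -> str:
    has_prior = prior_overlap > threshold
    has_next = following_overlap > threshold
    if has_prior and has_next:
        return "continuation"
    if has_prior:
        return "death"     # linked backward only: the topic does not persist
    if has_next:
        return "birth"     # linked forward only: a newly appearing topic
    return "isolate"

print(linkage_type(0.4, 0.3))   # continuation
print(linkage_type(0.0, 0.2))   # birth
```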
20
Process-Specific Applications
Superstar authors (top 1%) are predictive
» Superstars avoid isolates
» Superstars move out of topics before they die
» Superstars are heavily involved in splits and merges (the non-stasis parts of continuation)
21
Accuracy of Similarity Approaches
Identified a large data corpus (2.15M articles)
» Textual information (abstracts, MeSH) from MEDLINE
» Citation information from Scopus
Compared 13 approaches: 3 citation, 9 text, 1 hybrid
For each approach:
» Calculate article-article similarities
» Filter similarities down to a file of reasonable size (20M similarity pairs for the 2.15M articles)
» Cluster the articles
» Measure accuracy of the cluster solution using multiple metrics
    Coherence (Jensen-Shannon divergence)
    Grant-to-article linkage concentration
Analyze results
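The two accuracy metrics named above can be sketched as follows; the exact definitions used in the study (and in the cited PLoS ONE paper) may differ, so this shows only the general idea: textual coherence as the Jensen-Shannon divergence between a document's word distribution and its cluster's aggregate distribution (lower divergence = more coherent), and a Herfindahl index measuring how concentrated one grant's linked articles are across clusters (higher = more concentrated). All data below are toy examples.

```python
# Sketch of two cluster-accuracy metrics: JS divergence and a Herfindahl index.
from collections import Counter
from math import log2

def _kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(counts_a: Counter, counts_b: Counter) -> float:
    vocab = sorted(set(counts_a) | set(counts_b))
    p = [counts_a[w] / sum(counts_a.values()) for w in vocab]
    q = [counts_b[w] / sum(counts_b.values()) for w in vocab]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

def herfindahl(cluster_ids: list) -> float:
    counts = Counter(cluster_ids)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())

doc = Counter("protein folding simulation".split())
cluster = Counter("protein structure folding dynamics simulation".split())
print(js_divergence(doc, cluster))      # divergence of one document from its cluster
print(herfindahl(["c1", "c1", "c2"]))   # one grant's articles spread over clusters
```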
22
Results

Method            Comp Cost   % Coverage   Coherence (combined)   Herfindahl   Pr80 (all grants)
Co-word MeSH      Medium      95.77%       0.0758                 0.1631       0.2216
LSA MeSH          Very high   98.22%       0.0479                 0.1124       0.2127
SOM MeSH          Very high   99.97%       0.0409                 0.1106       0.2203
bm25 MeSH         Medium      93.39%       0.0759                 0.1570       0.2167
Co-word TA        High        83.41%       0.0685                 0.1299       0.1571
LSA TA            Very high   90.92%       0.0715                 0.1255       0.2003
Topics TA         High        94.40%       0.0875                 0.1584       0.2379
bm25 TA           High        93.91%       0.0979                 0.2393       0.2578
pmra              Low         94.23%       0.1055                 0.2410       0.2637
Bib coupling      Low         96.62%       0.1001                 0.2849       0.2706
Co-citation       Low         98.37%       0.0947                 0.2378       0.2621
Direct citation   Low         92.68%       0.0702                 0.2037       0.2480
Hybrid            Medium      96.83%       0.1014                 0.2893       0.2752

(TA = title/abstract text)
23
Summary
Science mapping involves a great number of choices
» There are many trade-offs to consider
» Choices constrain or enable analysis types
SciTech has made a set of mapping choices, and developed a methodology that enables
» Development of a detailed, multi-year model of science for structure and evolution
» Metrics on competencies (sets of topics) that are important at the institutional level
» Context to be mapped along with any target literature
» Prediction of continuance
» Identification of emerging topics
We continue to explore accuracy and usefulness of maps
24
Relevant Literature
Boyack, K. W., Newman, D., Duhon, R. J., Klavans, R., Patek, M., Biberstine, J. R., Schijvenaars, B., Skupin, A., Ma, N., & Börner, K. (2011). Clustering more than two million biomedical publications: Comparing the accuracies of nine text-based similarity approaches. PLoS ONE, 6(3), e18029.
Klavans, R., & Boyack, K. W. (2011). Using global mapping to create more accurate document-level maps of research fields. Journal of the American Society for Information Science and Technology, 62(1), 1-18.
Boyack, K. W., & Klavans, R. (2010). Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? Journal of the American Society for Information Science and Technology, 61(12), 2389-2404.
Klavans, R., & Boyack, K. W. (2010). Toward an objective, reliable, and accurate method for measuring research leadership. Scientometrics, 82(3), 539-553.
Klavans, R., & Boyack, K. W. (2009). Toward a consensus map of science. Journal of the American Society for Information Science and Technology, 60(3), 455-476.
Boyack, K. W. (2009). Using detailed maps of science to identify potential collaborations. Scientometrics, 79(1), 27-44.
Klavans, R., & Boyack, K. W. (2006). Quantitative evaluation of large maps of science. Scientometrics, 68(3), 475-499.
Klavans, R., & Boyack, K. W. (2006). Identifying a better measure of relatedness for mapping science. Journal of the American Society for Information Science and Technology, 57(2), 251-263.
Boyack, K. W., Klavans, R., & Börner, K. (2005). Mapping the backbone of science. Scientometrics, 64(3), 351-374.
Boyack, K. W. (2004). Mapping knowledge domains: Characterizing PNAS. Proceedings of the National Academy of Sciences, 101, 5192-5199.
Börner, K., Chen, C., & Boyack, K. W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology, 37, 179-255.
25
Thank you!