Published by Raymond Sullivan. Modified over 9 years ago.
1
Co-Cited Author Maps as Interfaces to Digital Libraries: Kohonen and PFNet Displays for the Humanities
Howard D. White, Jan Buzydlowski, Xia Lin
College of Information Science and Technology, Drexel University, Philadelphia, PA
2
Co-Citation Analysis
Co-citation is the mentioning of any two earlier documents in the bibliographic references of a later third document. The count of mentions may grow over time as new writings appear. Thus, co-citation counts can reflect citers’ changing perceptions of documents as more or less strongly related. Documents shown to be related by their co-citation counts can be mapped as proximate in intellectual space.
[Figure: diagram with nodes labeled Doc 1, Doc 2, Doc 3]
3
Co-Citation Analysis
Lin, Xia. 1997. Map Displays for Information Retrieval. Journal of the American Society for Information Science 48: 40-54.
Chen, Chaomei. 1998. Bridging the Gap: The Use of Pathfinder Networks in Visual Navigation. Journal of Visual Languages and Computing 9: 267-286.
• Document co-citation counts the number of times two papers are cited together.
• Author co-citation counts the number of times two authors, e.g., Lin and Chen, are cited together.
• Journal co-citation counts the number of times two journals are cited together.
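The counting rule above can be sketched in a few lines. This is a minimal illustration, not the system's code: the function name `cocitation_counts` and the sample records are hypothetical, and each citing record is assumed to be a list of the authors (or papers, or journals) that one later document cites.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_records):
    """Count how many citing records mention each unordered pair of cited items."""
    counts = Counter()
    for cited in citing_records:
        # A pair is counted once per citing record, however often it recurs there.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical data: each inner list is what one later paper cites.
records = [
    ["Lin", "Chen", "Kohonen"],
    ["Lin", "Chen"],
    ["Chen", "Kohonen"],
]
counts = cocitation_counts(records)
print(counts[("Chen", "Lin")])  # 2: two of the three records cite both
```

Pairs are stored with their names in alphabetical order, so a lookup should use `("Chen", "Lin")` rather than `("Lin", "Chen")`.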
4
Co-Citation Analysis
• Data on co-citation are readily obtainable from databases of the Institute for Scientific Information (ISI) in Philadelphia, PA:
  - Scisearch (Science Citation Index)
  - Social Scisearch (Social Sciences Citation Index)
  - Arts & Humanities Search (Arts & Humanities Citation Index)
• These databases are searchable online through, e.g., the Dialog Corporation.
5
Author Co-Citation Analysis (ACA)
• Detects patterns in the frequency with which any works by any two authors are jointly cited in later works.
• Only recurrent co-citation is significant: the more times authors are cited together, the more strongly related they are in the eyes of citers.
6
Author Co-Citation Analysis
• If Ben Shneiderman and Shakespeare are cited together in one article, it probably means little.
• If Ben Shneiderman and Stuart Card are cited together in 205 articles,* it means a lot: their names have jointly come to symbolize something like “interactive interfaces for digital libraries.” Possibly no subject heading captures this concept.
• In a cited-author (CA) search on Dialog, SELECT CA=SHNEIDERMAN B AND CA=CARD SK would retrieve the 205 citing articles.
*Actual count, 7/10/00
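The Boolean AND in the cited-author search above amounts to intersecting two sets of citing articles. The sketch below shows that idea only; it is not how Dialog is implemented, and the article IDs, the extra author names, and the helper names `build_cited_author_index` and `citing_both` are made up for illustration.

```python
from collections import defaultdict

def build_cited_author_index(articles):
    """Map each cited author to the set of article IDs that cite that author."""
    index = defaultdict(set)
    for article_id, cited_authors in articles.items():
        for author in cited_authors:
            index[author].add(article_id)
    return index

def citing_both(index, author_a, author_b):
    """Articles citing author_a AND author_b (a set intersection)."""
    return index.get(author_a, set()) & index.get(author_b, set())

# Hypothetical citing articles and the authors each one cites.
articles = {
    "A1": ["SHNEIDERMAN B", "CARD SK"],
    "A2": ["SHNEIDERMAN B", "SHAKESPEARE W"],
    "A3": ["CARD SK", "SHNEIDERMAN B", "MACKINLAY JD"],
}
index = build_cited_author_index(articles)
both = citing_both(index, "SHNEIDERMAN B", "CARD SK")
print(sorted(both))  # ['A1', 'A3']
```

The size of the intersection is the co-citation count for the pair, which is how a count such as 205 would arise from a full database.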
7
Underlying Database and Software
• ISI gave our college 10 years’ worth of data from the Arts & Humanities Citation Index (AHCI 1988-1997) as a research grant. The data set has 1.26 million bibliographic records on articles and other items from humanities journals.
• For retrievals from AHCI, we bought BRS Search, an industrial-strength engine, from Dataware, Inc.
• Buzydlowski and Lin have written several special programs in Java and C to implement our system on top of the BRS Search software.
8
Our Project
• Produces co-cited author maps in real time (a few seconds) on a Web site.
• Low cognitive load: User merely has to enter name of a single author of interest as a “seed.” E.g., Dickinson-E for Emily Dickinson.
• System responds with the top authors co-cited with that seed: about 25 names ranked by frequency of co-occurrence.
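The seed-to-ranked-list step described above might look roughly like the following. It is a sketch under stated assumptions, not the project's Java/C code: `top_cocited` is a hypothetical name, the citing records are invented, and the other author keys merely imitate the Dickinson-E format.

```python
from collections import Counter

def top_cocited(citing_records, seed, n=25):
    """Rank the authors most often cited together with the seed author."""
    counts = Counter()
    for cited in citing_records:
        authors = set(cited)
        if seed in authors:
            # Every other author in this record is co-cited with the seed once.
            counts.update(authors - {seed})
    # Highest count first; ties broken alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

# Hypothetical citing records (authors cited by four later articles).
records = [
    ["Dickinson-E", "Whitman-W", "Emerson-RW"],
    ["Dickinson-E", "Whitman-W"],
    ["Whitman-W", "Emerson-RW"],
    ["Dickinson-E", "Plath-S"],
]
ranking = top_cocited(records, "Dickinson-E", n=3)
print(ranking)  # [('Whitman-W', 2), ('Emerson-RW', 1), ('Plath-S', 1)]
```

The third record does not cite the seed, so it contributes nothing to the ranking even though it pairs two of the listed authors.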
9
Quick Visualizations of a Database
• User can choose to display the top 25 as either a Kohonen feature map (SOM, self-organizing map) or a Pathfinder network map (PFNET).
• User can use either map as
  - An aid to retrieving articles from AHCI 1988-97 that cite authors in various combinations. Combinations are made through drag-and-drop.
  - Reproducible artwork in a new study, such as a review of a literature or a commentary on the author used as “seed.”
14
Maps in the Humanities
• We are able to produce maps of authors in the humanities with high face validity.
  - Can build maps around great names in literature, philosophy, history, religion, the fine arts. E.g., Dante, Picasso, D. H. Lawrence, Martin Luther, Edward Gibbon, Emily Dickinson, Plato, Vladimir Nabokov.
  - Can also build maps around noted scholars, critics, or commentators. E.g., Simon Schama, Garry Wills, Elaine Showalter, Camille Paglia, Derek de Solla Price.
  - System will work with authors in other ISI databases in the natural and social sciences. Also with other kinds of co-occurring terms: journal names, descriptors, etc.
15
Advantages of Maps
• Ranked list of top 25 co-cited authors often contains names not previously known to user.
• Both Kohonen maps and PFNETs show interconnections of the 25 authors not apparent in the one-dimensional ranking of a simple list.
16
Interpretation of Maps
• Kohonen maps show high co-citation counts of authors by placing them closer in space.
• PFNETs show highest co-citation counts of authors directly, as links between nodes bearing authors’ names. The counts themselves can be made to appear above the links.
17
Kohonen Feature Maps
• Are a variety of neural network.
• Are produced by an algorithm for unsupervised computer learning in which data points “compete” for the position on the output grid that best represents their numeric weights (co-citation counts) relative to all other points.
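A toy self-organizing map makes the competition concrete. The sketch below follows the usual textbook formulation, in which grid nodes compete to be the best match for each input and the winner's neighbours are pulled toward that input; it is not the project's C code. The grid size, learning-rate schedule, neighbourhood function, and the four co-citation profiles (authors A to D) are all assumptions chosen for illustration.

```python
import math
import random

def best_matching_unit(weights, v):
    """Grid node whose weight vector is closest (squared Euclidean) to v."""
    return min(weights,
               key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], v)))

def train_som(vectors, rows=3, cols=3, epochs=200, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) to a weight vector."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One randomly initialised weight vector per grid node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = 0.5 * (1.0 - frac)                              # learning rate decays
        radius = 0.5 + (max(rows, cols) / 2) * (1.0 - frac)  # neighbourhood shrinks
        for v in rng.sample(vectors, len(vectors)):          # shuffled pass
            win = best_matching_unit(weights, v)
            for node, w in weights.items():
                d2 = (node[0] - win[0]) ** 2 + (node[1] - win[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))        # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights

# Hypothetical profiles: each author's co-citation counts with four reference
# authors, scaled to the range 0..1.
profiles = {
    "A": [1.0, 0.9, 0.1, 0.0],
    "B": [0.9, 1.0, 0.0, 0.1],
    "C": [0.1, 0.0, 1.0, 0.9],
    "D": [0.0, 0.1, 0.9, 1.0],
}
som = train_som(list(profiles.values()))
positions = {name: best_matching_unit(som, vec) for name, vec in profiles.items()}
print(positions)  # grid cell assigned to each author
```

Authors with similar profiles (A and B, or C and D) would be expected to land in the same or neighbouring cells, which is the proximity effect a Kohonen map displays.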
18
PFNETs
• Are algorithmically connected graphs based on finding the “minimum-cost” path between any two nodes.
• In ACA, this is generally the highest single co-citation count between author pairs (all pairs are examined).
• Results in useful simplification of graph.
• Use spring embedder algorithm to produce layout.
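One common way to realise the pruning described above is the PFNET(r = ∞, q = n − 1) variant, in which the cost of a path is its single weakest link. The slides do not say which parameters the system uses, so treat that choice as an assumption, along with the transform distance = 1 / count, the function name `pathfinder_links`, and the sample counts. The layout step (spring embedder) is not shown.

```python
import math

def pathfinder_links(counts, authors):
    """Keep a link only if no indirect path has a smaller worst-step distance."""
    n = len(authors)
    idx = {a: i for i, a in enumerate(authors)}
    dist = [[math.inf] * n for _ in range(n)]
    cnt = [[0] * n for _ in range(n)]
    for (a, b), c in counts.items():
        if c > 0:
            i, j = idx[a], idx[b]
            dist[i][j] = dist[j][i] = 1.0 / c   # assumed: high count = short distance
            cnt[i][j] = cnt[j][i] = c
    # Floyd-Warshall variant: path cost is the largest single step (r = infinity).
    best = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = max(best[i][k], best[k][j])
                if via < best[i][j]:
                    best[i][j] = via
    links = []
    for i in range(n):
        for j in range(i + 1, n):
            # A direct link survives only if no indirect path beats it.
            if dist[i][j] < math.inf and dist[i][j] <= best[i][j]:
                links.append((authors[i], authors[j], cnt[i][j]))
    return links

# Hypothetical co-citation counts among four authors.
authors = ["A", "B", "C", "D"]
counts = {("A", "B"): 10, ("B", "C"): 8, ("A", "C"): 3, ("C", "D"): 5, ("A", "D"): 1}
links = pathfinder_links(counts, authors)
print(links)  # [('A', 'B', 10), ('B', 'C', 8), ('C', 'D', 5)]
```

Here the weak A-C and A-D links are dropped because stronger chains (A-B-C and A-B-C-D) already connect those authors, which is the simplification the slide refers to.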
19
PFNETs
• Make sense as pictures of relations in databases!
• Independent observers have found them highly intelligible:
  - Xia Lin on Chinese philosophers
  - Kate McCain on historians of science & technology
  - Howard White on various literary figures and artists
• Buzydlowski’s research will test interpretability of PFNETs and Kohonen maps as interfaces for domain experts and naïve users.
20
Interface Design Considerations
• Link interface to valuable digital libraries (ISI citation databases and the journal literatures they lead to).
• Focus on intellectual content: meaningful words, meaningfully presented.
• Stress quick and flexible presentations over long-term displays.
25
Evidence We’re on the Right Track
• US Patent 6,038,574: “Method and Apparatus for Clustering a Collection of Linked Documents Using Co-Citation Analysis”
• Filed: March 18, 1998
• Awarded: March 14, 2000
• Inventors: James E. Pitkow, Peter L. Pirolli, Jock D. Mackinlay, Stuart K. Card, all of Xerox PARC
27
PFNET of authors co-cited with F. Schleiermacher in AHCI, 1988-1997 (Biblical and literary hermeneutics)
28
AuthorLink System Structure
[Diagram; labels in source order: …….. Procedures; Web Interface; Java Applet; Web Server; Application Server; Java Servlets; Kohonen Mapping Procedures in C; BRS Search Engine / ISI Data; PFNET Mapping Procedures in C; cgi]