Term Co-occurrence Analysis as an Interface to Digital Libraries Jan W. Buzydlowski Howard D. White Xia Lin College of Information Science and Technology.


1 Term Co-occurrence Analysis as an Interface to Digital Libraries Jan W. Buzydlowski Howard D. White Xia Lin College of Information Science and Technology Drexel University, Philadelphia, Pennsylvania, USA

2 Digital Library Research
First Wave
– How to store it
Next Wave
– How to retrieve it (IR)
Text Mining
Visual Information Retrieval Interface (VIRI)
Term Co-occurrence Analysis (TCA)
– Co-occurrence vs. lexical associations
– Maps vs. lists

3 Term Definition
Unit of Analysis
– Words
– Documents
– Authors
– Journals
Section of Focus
– Abstract/Text
– Title
– Bibliography
– Keywords

4 Example
Words in Title
– Term
– Co-occurrence
– Analysis
– Interface
– Digital
– Library
Authors in Bibliography
– Salton-G
– Chen-C
– White-HD
– Ding-Y
– Cleveland-W
– McCain-K
– Lin-X
– Schvaneveldt-R
– Kamada-T
– Fruchterman-T

5 Term Co-occurrence Methodology
User determines which terms are of interest
– Via a seed term
– From a pre-defined list
The system returns the pair-wise co-occurrence counts of the terms over the collection of records

6 Example
Unit: Author; Section: Bibliography
User Supplied List: Plato, Aristotle, Smith, Brown
For a given data set (N = 4 unique terms)
– Article 1: Plato, Aristotle, Smith, …
– Article 2: Plato, Smith, …
– Article 3: Plato, Aristotle, Smith, Brown, …
The following co-citations (C(4,2) = 6) are found

COMBINATION            COUNT   ARTICLES
Plato and Smith        3       1, 2, 3
Plato and Aristotle    2       1, 3
Plato and Brown        1       3
Aristotle and Smith    2       1, 3
Aristotle and Brown    1       3
Smith and Brown        1       3
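The counting in the slide 6 example can be reproduced with a short Python sketch. It is illustrative only, not the authors' implementation: the function name `cooccurrence_counts` and the dictionary layout are assumptions, and each article is reduced to the user-supplied authors it cites (the elided "…" entries are ignored).

```python
from collections import Counter
from itertools import combinations

# Toy records from slide 6: the user-supplied authors cited by each article.
articles = {
    1: {"Plato", "Aristotle", "Smith"},
    2: {"Plato", "Smith"},
    3: {"Plato", "Aristotle", "Smith", "Brown"},
}
terms = ["Plato", "Aristotle", "Smith", "Brown"]


def cooccurrence_counts(records, terms):
    """Count, for every unordered pair of terms, the records that contain both.

    `records` maps a record id to the set of terms found in that record.
    Hypothetical helper; the name and signature are not from the slides.
    """
    wanted = set(terms)
    # Start every pair at zero so that pairs that never co-occur still appear.
    counts = Counter({frozenset(pair): 0 for pair in combinations(terms, 2)})
    for cited in records.values():
        present = sorted(wanted & set(cited))
        for pair in combinations(present, 2):
            counts[frozenset(pair)] += 1
    return counts


counts = cooccurrence_counts(articles, terms)
# Print pairs from most to least frequent, ties in alphabetical order.
for pair, n in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(" and ".join(sorted(pair)), n)
```

Keying each pair by a `frozenset` makes the count independent of the order in which the two terms are named, which matches the unordered combinations C(4,2) on the slide.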

7 Term Co-occurrence Significance
The frequent co-occurrence of term pairs within a set of documents indicates a strong association between those terms, whereas an infrequent co-occurrence indicates the opposite
– The association you would expect is borne out by the frequency
– The frequency you compute suggests a level of association

Pain and Management     Pain and Obtainment
Plato and Aristotle     Plato and Cher
Science and Nature      Science and National Tattler
A and B                 C and D

8 Term Co-occurrence Uses
Allows a user to get a “foothold” with just one term
– One seed term returns many other related terms
Allows a user to get an “overview” with user-supplied/system-supplied terms
– Co-occurrence counts with visualization

9 Seeding
User types in
– One term, e.g., Plato
– Boolean expression, e.g., Plato AND Brown
System supplies top n terms, in ranked order of frequency of co-occurrence with the initial term

10 Example
For Plato seed: ARISTOTLE, PLUTARCH, CICERO, HOMER, BIBLE, EURIPIDES, ARISTOPHANES, XENOPHON, AUGUSTINE, HERODOTUS, KANT-I, AESCHYLUS, SOPHOCLES, THUCYDIDES, OVID, HESIOD, DIOGENES-LAERTI, HEIDEGGER-M, DERRIDA-J, PINDAR, NIETZSCHE-F, HEGEL-GWF, VERGIL, AQUINAS-T
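Seeding as described on slide 9 can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the three toy articles from slide 6 rather than the real citation data behind the Plato list, it models a Boolean seed only as a conjunction (AND) of terms, and the function name `top_related` and the alphabetical tie-breaking rule are hypothetical.

```python
from collections import Counter

# Toy records from slide 6 (not the data behind the Plato example).
articles = [
    {"Plato", "Aristotle", "Smith"},
    {"Plato", "Smith"},
    {"Plato", "Aristotle", "Smith", "Brown"},
]


def top_related(records, seeds, n):
    """Return the n terms that co-occur most often with all seed terms.

    `seeds` is an iterable of terms; a record matches only if it contains
    every seed, which models an expression such as "Plato AND Brown".
    Ties are broken alphabetically (an assumption, not from the slides).
    """
    seeds = set(seeds)
    counts = Counter()
    for cited in records:
        if seeds <= cited:                 # record contains every seed term
            counts.update(cited - seeds)   # count the other terms in it
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


print(top_related(articles, ["Plato"], 2))
print(top_related(articles, ["Plato", "Brown"], 5))
```

A single seed is passed as a one-element list so that one-term and Boolean seeds share the same code path.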

11 Need for Visualization
Given a list of user- or system-supplied terms
– Find the frequency of co-occurrence of each pair-wise combination of terms
   Plato AND Aristotle = 1,920
   Plato AND Plutarch = 380, …
– Too many numbers to take in at once
   C(25, 2) = (25 * 24) / 2 = 300 pairs
Three major visualization techniques
– Multidimensional Scaling (MDS)
– Self-Organizing (Kohonen) Maps (SOMs)
– PathFinder Networks (PFNETs)
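The pair arithmetic on slide 11 is easy to check, and the same counts can be arranged as a symmetric term-by-term matrix, the kind of structure that techniques such as MDS typically take as input (usually after conversion to a similarity or distance measure). The sketch below is an illustration only: the helper names `pair_count` and `to_matrix` are hypothetical, and the counts are the toy values from slide 6, not the real Plato data.

```python
from math import comb


def pair_count(n_terms):
    """Number of unordered term pairs, C(n, 2) = n * (n - 1) / 2."""
    return comb(n_terms, 2)


def to_matrix(terms, pair_counts):
    """Arrange pair counts as a symmetric matrix with a zero diagonal.

    `pair_counts` maps (term_a, term_b) tuples to co-occurrence counts.
    """
    index = {term: i for i, term in enumerate(terms)}
    matrix = [[0] * len(terms) for _ in terms]
    for (a, b), n in pair_counts.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = n    # co-occurrence is symmetric
    return matrix


# Counts taken from the table on slide 6.
terms = ["Plato", "Aristotle", "Smith", "Brown"]
pair_counts = {
    ("Plato", "Smith"): 3,
    ("Plato", "Aristotle"): 2,
    ("Plato", "Brown"): 1,
    ("Aristotle", "Smith"): 2,
    ("Aristotle", "Brown"): 1,
    ("Smith", "Brown"): 1,
}

print(pair_count(25))    # seed term plus 24 related terms
for row in to_matrix(terms, pair_counts):
    print(row)
```

Because the number of pairs grows quadratically with the number of terms, even a modest list of 25 terms yields hundreds of counts, which is the motivation the slide gives for mapping them visually.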

12 White’s MDS map of 15 co-cited classificationists, ca. 1990
Authors shown on the map: RR Sokal, PHA Sneath, JC Gower, JH Ward, JD Carroll, JB Kruskal, VE McGee, RN Shepard, JA Hartigan, HA Skinner, SC Johnson, M Wish, P Arabie, RK Blashfield, PE Green

14 White’s PFNet of co-cited authors in Biblical and literary hermeneutics, 1988-1997

15 Our System
Three-tiered
– User interface
– Server
– Database
Real-time and interactive
Significant data sources
– ISI AHCI
– MedLine
Live interface for retrieval

17 User Interface - Seed

18 User Interface – SOM

19 Interface - PFNET

20 Interface - Visual Information Retrieval Interface (VIRI)

21 User Interface IV

22 Database Interface
API
– String[] findRel(String, int)
– int[] findOcc(String[])
Implemented on:
– BRS API via a wrapper
– Oracle API via JDBC
– Noah specialized co-occurrence database API via JNI
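The value of a two-method API is that the interface layer can stay the same while the backend changes. The sketch below renders that idea in Python rather than Java, and almost everything in it is an assumption: the slide gives only the two signatures, so the parameter meanings (a seed term and a result limit for `findRel`, a term list for `findOcc`), the ordering of the returned counts, and the in-memory backend are all hypothetical. It is not the BRS, Oracle, or Noah implementation.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations
from typing import Protocol, Sequence


class CooccurrenceBackend(Protocol):
    """Hypothetical Python rendering of the two signatures on slide 22."""

    def find_rel(self, term: str, n: int) -> list[str]: ...

    def find_occ(self, terms: Sequence[str]) -> list[int]: ...


class InMemoryBackend:
    """Stand-in backend over a list of term sets, for illustration only."""

    def __init__(self, records):
        self.records = [set(r) for r in records]

    def find_rel(self, term, n):
        # Assumed meaning: the n terms co-occurring most often with `term`.
        counts = Counter()
        for rec in self.records:
            if term in rec:
                counts.update(rec - {term})
        return sorted(counts, key=lambda t: (-counts[t], t))[:n]

    def find_occ(self, terms):
        # Assumed ordering: one count per pair, in itertools.combinations order.
        return [
            sum(1 for rec in self.records if a in rec and b in rec)
            for a, b in combinations(terms, 2)
        ]


def seed_and_count(backend: CooccurrenceBackend, seed: str, n: int):
    """Client code that depends only on the interface, not on the backend."""
    terms = [seed] + backend.find_rel(seed, n)
    return terms, backend.find_occ(terms)


backend = InMemoryBackend([
    {"Plato", "Aristotle", "Smith"},
    {"Plato", "Smith"},
    {"Plato", "Aristotle", "Smith", "Brown"},
])
print(seed_and_count(backend, "Plato", 3))
```

Any other backend that offers the same two methods could be passed to `seed_and_count` unchanged, which is the design property a wrapper, JDBC, or JNI layer would preserve.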

23 Future Plans
User Study
– Preference: type of map, etc.
– Cognitive map: how well does the map match experts’ mental models?
Larger datasets
Additional data sources


