Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY.


1 Visualization of Relational Text Information for Biomedical Knowledge Discovery James W. Cooper IBM T J Watson Research Center Hawthorne, NY

2 Overview Prior work Java-based text mining Computation of unnamed relations Graphical display of relations Text

3 Relations between terms Noun phrase co-occurrence statistics [Roark, Charniak] Choose seed words and look for terms near them. [Brin] [Gravano, Agichtein] –Repeat Biomedical domain –Blaschke used dictionary of common verbs –Pustejovsky found inhibit relations Stevens, Palakal, Mostafa –Detected abstract-wide co-occurrence using dictionary of genes and useful verbs.

4 Graphical Displays BioLayout – protein similarity ProtInAct – interactive system using yFiles Zhang – interactive 3D system Jenssen – gene network Leroy – GeneScene

5 BioLayout –Enright and Ouzounis Spheres represent proteins and lines represent protein similarities. Five related protein families and their corresponding relationships.

6 ProInAct – Spencer and Bennett Proteins clustered by functional interaction

7 Zhang – Protein interaction mapping

8 Jenssen – A literature network Lines connect genes that have co-occurred in 1 or more papers.

9 Leroy – GeneScene

10 What would we like to do? Find scientifically meaningful connections between important terms. –Such as Swanson’s Raynaud’s disease – fish oil connection. Allow exploration of relations by user. Filter the relations by ontology or term types. Perform path analysis. Let the user vary the graphical display.
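The "path analysis" goal above can be illustrated with a breadth-first search over the relation graph. This is only a sketch of one possible approach, not the method used in the talk: the class `RelationPaths`, its methods, and the sample terms are all hypothetical, and relations are treated as unweighted and undirected.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: shortest chain of relations between two terms. */
final class RelationPaths {
    // term -> terms it is directly related to (relations assumed undirected)
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    void addRelation(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    /** Returns the terms on a shortest path, or an empty list if none exists. */
    List<String> shortestPath(String from, String to) {
        if (!adjacency.containsKey(from) || !adjacency.containsKey(to)) {
            return Collections.emptyList();
        }
        Map<String, String> parent = new HashMap<>(); // visited term -> predecessor
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk the predecessor links back to the start term.
                LinkedList<String> path = new LinkedList<>();
                for (String t = to; t != null; t = parent.get(t)) {
                    path.addFirst(t);
                }
                return path;
            }
            for (String next : adjacency.get(current)) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        RelationPaths graph = new RelationPaths();
        // Illustrative relations only, not data from the talk.
        graph.addRelation("Raynaud's disease", "blood viscosity");
        graph.addRelation("blood viscosity", "fish oil");
        System.out.println(graph.shortestPath("Raynaud's disease", "fish oil"));
    }
}
```

A path with one intermediate term, as in the sample data, is the kind of indirect connection the slide refers to: the two end terms never need to co-occur in the same document.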

11 Data we analyzed Two sets of patent data –584 patents on Viagra and phosphodiesterase inhibitors. –1514 patents on quinolones (like Cipro) Recognized major technical terms in each patent. Filtered organic chemical nomenclature.

12 The Talent text mining system Text Analysis and Language Engineering Tools –Finds multiword noun phrases –Does shallow parse –Can extract NPs and VGs As well as all other sentence parts

13 The JTalent Library Java class library with JNI interface –To Talent DLL Creates database load files of terms –Paragraph –Sentence –Offset –Term type (NP, VG)

14 TalentShow Demo

15 The KSS Library Java class library of functions for –Accessing a database (DB2, Access) –Manipulating a search engine –Manipulating tables of information created by JTalent.

16 Database Tables Documents –Title, author, URL, ID TermDocs –Term –Paragraph –Sentence –Offset –Type Dictionary of terms, types and IDs –Such as MeSH

17 Computing term information Compute unique terms from TermDocs Compute frequency Compute salience –Based on frequency –Number of docs they appear in more than once
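The two ingredients of salience named on this slide can be computed from TermDocs-style rows roughly as follows. This is a minimal sketch: the class `TermStats`, the simplified `Occurrence` row, and in particular the way `salience` combines the two counts are assumptions, since the transcript does not give the actual formula.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: per-term statistics from TermDocs-style rows. */
final class TermStats {
    /** One occurrence of a term in a document (a simplified TermDocs row). */
    static final class Occurrence {
        final String term;
        final int docId;
        Occurrence(String term, int docId) {
            this.term = term;
            this.docId = docId;
        }
    }

    /** Total number of occurrences of each (lowercased) term. */
    static Map<String, Integer> frequency(List<Occurrence> rows) {
        Map<String, Integer> freq = new HashMap<>();
        for (Occurrence o : rows) {
            freq.merge(o.term.toLowerCase(), 1, Integer::sum);
        }
        return freq;
    }

    /** Number of documents in which each term occurs more than once. */
    static Map<String, Integer> repeatDocs(List<Occurrence> rows) {
        Map<String, Map<Integer, Integer>> perDoc = new HashMap<>();
        for (Occurrence o : rows) {
            perDoc.computeIfAbsent(o.term.toLowerCase(), k -> new HashMap<>())
                  .merge(o.docId, 1, Integer::sum);
        }
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> e : perDoc.entrySet()) {
            int docs = 0;
            for (int count : e.getValue().values()) {
                if (count > 1) {
                    docs++;
                }
            }
            result.put(e.getKey(), docs);
        }
        return result;
    }

    /** Placeholder combination; an assumption, not the formula from the talk. */
    static double salience(int frequency, int repeatDocs) {
        return repeatDocs * Math.log(1.0 + frequency);
    }

    public static void main(String[] args) {
        // Illustrative rows only.
        List<Occurrence> rows = Arrays.asList(
            new Occurrence("quinolone", 1),
            new Occurrence("quinolone", 1),
            new Occurrence("quinolone", 2));
        int f = frequency(rows).get("quinolone");
        int d = repeatDocs(rows).get("quinolone");
        System.out.println(f + " occurrences, " + d + " repeat docs, salience " + salience(f, d));
    }
}
```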

18 Compute term relations Named relations based on abbreviation expansions. Unnamed relations based on proximity, with weight based on how frequently they occur near each other. Mutual information weight:
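The formula that followed "Mutual information weight:" did not survive in this transcript. As a stand-in, the sketch below uses the standard pointwise mutual information, log2 of P(a,b) / (P(a) P(b)), estimated from co-occurrence counts. Whether the talk used exactly this form is an assumption, and the class and parameter names are hypothetical.

```java
/** Hypothetical sketch: pointwise mutual information from co-occurrence counts. */
final class MutualInformation {
    /**
     * @param pairCount number of contexts (e.g. sentences) containing both terms
     * @param countA    number of contexts containing term A
     * @param countB    number of contexts containing term B
     * @param total     total number of contexts
     * @return pointwise mutual information in bits
     */
    static double pmi(long pairCount, long countA, long countB, long total) {
        if (pairCount <= 0 || countA <= 0 || countB <= 0 || total <= 0) {
            throw new IllegalArgumentException("all counts must be positive");
        }
        double pAB = (double) pairCount / total;
        double pA = (double) countA / total;
        double pB = (double) countB / total;
        // Divide by ln(2) to convert the natural logarithm to base 2.
        return Math.log(pAB / (pA * pB)) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Illustrative counts only.
        System.out.println(pmi(50, 100, 100, 1000));
    }
}
```

Under this definition a weight near zero means the two terms appear together about as often as chance predicts, and a larger positive weight means they appear together more often than chance.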

19 Tuning Computed relations Select only terms above a salience threshold. Only relations in which one or both are members of an ontology. Store relations in a database table for rapid access: Term | weight | term
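The tuning step could look something like the following filter over Term | weight | term rows. It is a sketch under stated assumptions: the names `RelationFilter`, `Relation` and `tune` are hypothetical, and requiring both terms to exceed the salience threshold is one reading of the slide, not a confirmed detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: keep only salient relations that touch an ontology. */
final class RelationFilter {
    /** One row of the relations table: term | weight | term. */
    static final class Relation {
        final String term1;
        final double weight;
        final String term2;
        Relation(String term1, double weight, String term2) {
            this.term1 = term1;
            this.weight = weight;
            this.term2 = term2;
        }
    }

    static List<Relation> tune(List<Relation> relations,
                               Map<String, Double> salience,
                               double threshold,
                               Set<String> ontology) {
        List<Relation> kept = new ArrayList<>();
        for (Relation r : relations) {
            // Assumption: both terms must be above the salience threshold.
            boolean salient = salience.getOrDefault(r.term1, 0.0) > threshold
                    && salience.getOrDefault(r.term2, 0.0) > threshold;
            // "One or both" terms must be members of the ontology.
            boolean inOntology = ontology.contains(r.term1) || ontology.contains(r.term2);
            if (salient && inOntology) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, Double> salience = new HashMap<>();
        salience.put("term a", 3.0);
        salience.put("term b", 2.5);
        Set<String> ontology = new HashSet<>(Arrays.asList("term a"));
        List<Relation> all = Arrays.asList(new Relation("term a", 1.7, "term b"));
        System.out.println(tune(all, salience, 1.0, ontology).size() + " relation(s) kept");
    }
}
```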

20 Original System Visual client SOAP server –Queries database to get relations –Round trip for each new query Instead, we export the data for the user to visualize as they wish.

21 Exporting relations Save relations and ontology information in an XML file. [XML sample lost in extraction; surviving fragments: 78, MeSH, 34, </doc] –This XML file is a portable version of the computed relations that we can then use with any number of viewers.

22 A Graphical Relations Viewer Creates a Java Relations object for each relation it reads from the XML file. Inserts them into a Trie structure based on the lowercased first term. –If there is already a Relation at that point, it adds the new one to a Vector for that term. Creates an alphabetical list of all terms in a 2nd Trie.

23 Using the Viewer When you enter part of a term, it shows all terms starting with that fragment in the left list box. When you click on a term, it shows all its relations in the right list box.
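The viewer's data structure and its two lookups can be sketched as below. This is an illustration, not the original viewer code: the class `RelationTrie` and its method names are hypothetical, an `ArrayList` stands in for the `Vector` mentioned in the talk, and a single trie with sorted children serves both the prefix listing and the relation lookup, where the talk describes a second trie for the alphabetical list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: relations indexed by lowercased first term. */
final class RelationTrie {
    static final class Relation {
        final String term1;
        final double weight;
        final String term2;
        Relation(String term1, double weight, String term2) {
            this.term1 = term1;
            this.weight = weight;
            this.term2 = term2;
        }
    }

    private static final class Node {
        // TreeMap keeps children sorted, so traversal yields alphabetical order.
        final Map<Character, Node> children = new TreeMap<>();
        final List<Relation> relations = new ArrayList<>();
    }

    private final Node root = new Node();

    /** Adds a relation under its lowercased first term. */
    void insert(Relation r) {
        Node node = root;
        for (char c : r.term1.toLowerCase().toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.relations.add(r); // several relations may share one first term
    }

    /** Left list box: all stored terms that start with the typed fragment. */
    List<String> termsStartingWith(String fragment) {
        List<String> terms = new ArrayList<>();
        Node node = find(fragment);
        if (node != null) {
            collect(node, new StringBuilder(fragment.toLowerCase()), terms);
        }
        return terms;
    }

    /** Right list box: all relations stored for the clicked term. */
    List<Relation> relationsOf(String term) {
        Node node = find(term);
        if (node == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(node.relations);
    }

    private Node find(String key) {
        Node node = root;
        for (char c : key.toLowerCase().toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    private static void collect(Node node, StringBuilder prefix, List<String> out) {
        if (!node.relations.isEmpty()) {
            out.add(prefix.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            prefix.append(e.getKey());
            collect(e.getValue(), prefix, out);
            prefix.deleteCharAt(prefix.length() - 1);
        }
    }

    public static void main(String[] args) {
        RelationTrie trie = new RelationTrie();
        // Illustrative relations only.
        trie.insert(new Relation("Cipro", 1.2, "quinolone"));
        trie.insert(new Relation("ciprofloxacin", 0.8, "quinolone"));
        System.out.println(trie.termsStartingWith("cip"));
    }
}
```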

24 Lexical Navigation Displays relations between terms graphically and allows you to explore them without formulating a specific query.

25 Possible enhancements Show only terms belonging to an ontology. Show only higher IQ terms. Show the documents the relations occur in. Show the ontology reference. Show computed paths. Show more kinds of named relations. –Inhibits, expresses

26 Evaluations of Information Visualization Few, if any, graphical displays have been evaluated thus far for effectiveness. Usability studies are hard to construct and carry out. Intuition seems to show –that exploration may result in discoveries. –Relations more than one step apart seem best displayed graphically. Remains to be shown that such visualizations are actually useful.

27 Differences in Intent Displays may represent information your system has discovered. –Gene – protein relations Or they may represent data from which the user may discover new information. –New 2nd- or 3rd-order relationships These are rather different applications of visualization technology.

28 Summary Java-based text mining system Database of terms and positions Computation of relations Export as XML Graphical relations viewer The value of such visual interfaces has not yet been established.

29 Acknowledgements Bhavani Iyer – XML export Eric Brown – DictMatcher hash code Daniel Tunkelang – graphical layout Bob Mack – paper suggestions

