Visualization of Relational Text Information for Biomedical Knowledge Discovery
James W. Cooper
IBM T. J. Watson Research Center, Hawthorne, NY

Overview
- Prior work
- Java-based text mining
- Computation of unnamed relations
- Graphical display of relations
- Text

Relations between terms
- Noun phrase co-occurrence statistics [Roark, Charniak]
- Choose seed words and look for terms near them [Brin] [Gravano, Agichtein]
  - Repeat
- Biomedical domain
  - Blaschke used a dictionary of common verbs
  - Pustejovsky found "inhibit" relations
- Stevens, Palakal, Mostafa
  - Detected abstract-wide co-occurrence using a dictionary of genes and useful verbs

Graphical Displays
- BioLayout: protein similarity
- ProtInAct: interactive system using yFiles
- Zhang: interactive 3D system
- Jenssen: gene network
- Leroy: GeneScene

BioLayout (Enright and Ouzounis)
Spheres represent proteins and lines represent protein similarities. The display shows five related protein families and their corresponding relationships.

ProInAct (Spencer and Bennett)
Proteins clustered by functional interaction.

Zhang: protein interaction mapping

Jenssen: a literature network
Lines connect genes that have co-occurred in one or more papers.

Leroy: GeneScene

What would we like to do?
- Find scientifically meaningful connections between important terms
  - Such as Swanson's connection between Raynaud's disease and fish oil
- Allow the user to explore relations
- Filter the relations by ontology or term type
- Perform path analysis
- Let the user vary the graphical display

Data we analyzed
- Two sets of patent data
  - 584 patents on Viagra and phosphodiesterase inhibitors
  - 1514 patents on quinolones (such as Cipro)
- Recognized the major technical terms in each patent
- Filtered organic chemical nomenclature

The Talent text mining system
- Text Analysis and Language Engineering Tools
  - Finds multiword noun phrases
  - Performs a shallow parse
  - Can extract noun phrases (NPs) and verb groups (VGs), as well as all other sentence parts

The JTalent Library
- Java class library with a JNI interface to the Talent DLL
- Creates database load files of terms, with
  - Paragraph
  - Sentence
  - Offset
  - Term type (NP, VG)
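The load-file rows described above can be pictured as a small Java record. This is a minimal sketch, assuming a tab-delimited layout and an added document ID column; the record name `TermOccurrence`, the field order, and the delimiter are hypothetical, since the slide shows neither the real file format nor the JNI calls into Talent.

```java
import java.util.List;

/**
 * One term occurrence as a row of a database load file.
 * Hypothetical layout: the field order and tab delimiter are assumptions,
 * not the actual JTalent format.
 */
record TermOccurrence(String docId, int paragraph, int sentence, int offset,
                      String term, String type) {

    /** Renders this occurrence as one tab-delimited row. */
    String toLoadLine() {
        // A tab or newline inside a term would break the row, so map them to spaces.
        String safeTerm = term.replace('\t', ' ').replace('\n', ' ');
        return String.join("\t", docId, String.valueOf(paragraph),
                String.valueOf(sentence), String.valueOf(offset), safeTerm, type);
    }

    /** Builds the full load-file contents, one row per line. */
    static String toLoadFile(List<TermOccurrence> rows) {
        StringBuilder sb = new StringBuilder();
        for (TermOccurrence row : rows) {
            sb.append(row.toLoadLine()).append('\n');
        }
        return sb.toString();
    }
}
```

A delimited file like this can be handed to a database's bulk-load utility, which is usually much faster than inserting rows one at a time.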

TalentShow Demo

The KSS Library
- Java class library of functions for
  - Accessing a database (DB2, Access)
  - Manipulating a search engine
  - Manipulating tables of information created by JTalent

Database Tables
- Documents
  - Title, author, URL, ID
- TermDocs
  - Term, paragraph, sentence, offset, type
- Dictionary of terms, types and IDs
  - Such as MeSH

Computing term information
- Compute unique terms from TermDocs
- Compute frequency
- Compute salience, based on
  - Frequency
  - Number of documents in which the term appears more than once
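The slide names the inputs to salience but not how they are combined. The sketch below, assuming TermDocs rows are available as `(docId, term)` pairs, computes the unique terms, each term's total frequency, and the number of documents in which it appears more than once. The class `TermStats` is hypothetical, and its `salience()` product is a placeholder, not the system's actual formula.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Per-term statistics derived from TermDocs rows (hypothetical class). */
final class TermStats {
    final String term;
    int frequency;   // total occurrences across the collection
    int repeatDocs;  // documents in which the term occurs more than once

    TermStats(String term) { this.term = term; }

    /** Hypothetical salience: frequency times the number of repeat documents. */
    double salience() { return (double) frequency * repeatDocs; }

    /**
     * Computes statistics from (docId, term) pairs, one pair per TermDocs row.
     * The result is keyed by term in sorted order, i.e. the list of unique terms.
     */
    static Map<String, TermStats> compute(List<String[]> termDocs) {
        // Count occurrences of each term within each document.
        Map<String, Map<String, Integer>> perDoc = new HashMap<>();
        for (String[] row : termDocs) {
            String doc = row[0];
            String term = row[1];
            perDoc.computeIfAbsent(term, k -> new HashMap<>()).merge(doc, 1, Integer::sum);
        }
        Map<String, TermStats> stats = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : perDoc.entrySet()) {
            TermStats s = new TermStats(e.getKey());
            for (int count : e.getValue().values()) {
                s.frequency += count;
                if (count > 1) s.repeatDocs++;
            }
            stats.put(s.term, s);
        }
        return stats;
    }
}
```

Counting repeat documents rewards terms that a document discusses at length rather than mentions once in passing.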

Compute term relations
- Named relations based on abbreviation expansions
- Unnamed relations based on proximity, weighted by how often the two terms occur near each other
- Mutual information weight:
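The mutual information formula itself did not survive in this transcript. As an illustration only, the sketch below uses standard pointwise mutual information, log2(N * c(x,y) / (c(x) * c(y))), over sentence co-occurrence. Treating "near each other" as "in the same sentence", the base-2 logarithm, and the names `TermPair` and `RelationWeights` are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** An unordered term pair, stored with the terms in alphabetical order (hypothetical). */
record TermPair(String first, String second) {}

final class RelationWeights {
    /**
     * Pointwise mutual information log2(N * c(x,y) / (c(x) * c(y))), where N is the
     * number of sentences, c(x) counts sentences containing x, and c(x,y) counts
     * sentences containing both. Same-sentence co-occurrence is an assumed proximity window.
     */
    static Map<TermPair, Double> pmi(List<List<String>> sentences) {
        int n = sentences.size();
        Map<String, Integer> termCounts = new HashMap<>();
        Map<TermPair, Integer> pairCounts = new HashMap<>();
        for (List<String> sentence : sentences) {
            // Deduplicate and sort so each term and each pair counts once per sentence.
            List<String> terms = new ArrayList<>(new TreeSet<>(sentence));
            for (String t : terms) termCounts.merge(t, 1, Integer::sum);
            for (int i = 0; i < terms.size(); i++) {
                for (int j = i + 1; j < terms.size(); j++) {
                    pairCounts.merge(new TermPair(terms.get(i), terms.get(j)), 1, Integer::sum);
                }
            }
        }
        Map<TermPair, Double> weights = new HashMap<>();
        for (Map.Entry<TermPair, Integer> e : pairCounts.entrySet()) {
            TermPair p = e.getKey();
            int cx = termCounts.get(p.first());
            int cy = termCounts.get(p.second());
            double ratio = (double) n * e.getValue() / ((double) cx * cy);
            weights.put(p, Math.log(ratio) / Math.log(2));
        }
        return weights;
    }
}
```

A positive weight means the pair co-occurs more often than chance would predict; PMI is known to overrate rare pairs, which is one reason to pair it with a salience threshold.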

Tuning Computed relations
- Select only terms above a salience threshold
- Keep only relations in which one or both terms are members of an ontology
- Store relations in a database table for rapid access: Term | weight | term
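The two tuning rules can be expressed as one filter. A minimal sketch, assuming salience scores are already computed and the ontology (for example MeSH) is loaded as a set of terms. It reads "above a salience threshold" as applying to both terms of a relation, which is an interpretation; `Relation`, `RelationFilter`, and `tune` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** A computed relation between two terms (hypothetical representation). */
record Relation(String term1, double weight, String term2) {
    /** Formats the relation as a "term | weight | term" row. */
    String toRow() { return term1 + " | " + weight + " | " + term2; }
}

final class RelationFilter {
    /** Keeps relations whose terms are both salient and at least one is in the ontology. */
    static List<Relation> tune(List<Relation> relations, Map<String, Double> salience,
                               double threshold, Set<String> ontology) {
        List<Relation> kept = new ArrayList<>();
        for (Relation r : relations) {
            // Terms missing from the salience map are treated as non-salient.
            boolean salient = salience.getOrDefault(r.term1(), 0.0) > threshold
                    && salience.getOrDefault(r.term2(), 0.0) > threshold;
            boolean inOntology = ontology.contains(r.term1()) || ontology.contains(r.term2());
            if (salient && inOntology) kept.add(r);
        }
        return kept;
    }
}
```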

Original System
- Visual client
- SOAP server
  - Queries the database to get relations
  - Round trip for each new query
- Instead, we export the data for users to visualize as they wish

Exporting relations
- Save relations and ontology information in an XML file
  [XML excerpt not preserved; it included MeSH ontology entries]
- This XML file is a portable version of the computed relations that we can then use with any number of viewers
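The original XML example is not recoverable, so the element and attribute names below (`relations`, `relation`, `weight`, `term`, `ontology`) are hypothetical, as are the `Term`, `Relation`, and `RelationXml` types. The sketch shows one way to write relations, with each term's ontology membership, to a self-describing file that any viewer could parse, escaping XML special characters by hand.

```java
import java.util.List;

/** A term plus the ontology it belongs to, or null if it is in none (hypothetical). */
record Term(String text, String ontology) {}

/** A weighted, unnamed relation between two terms (hypothetical). */
record Relation(Term first, double weight, Term second) {}

final class RelationXml {
    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    private static void appendTerm(StringBuilder xml, Term t) {
        xml.append("    <term");
        if (t.ontology() != null) {
            xml.append(" ontology=\"").append(escape(t.ontology())).append('"');
        }
        xml.append('>').append(escape(t.text())).append("</term>\n");
    }

    /** Serializes relations to a small, viewer-independent XML document. */
    static String export(List<Relation> relations) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<relations>\n");
        for (Relation r : relations) {
            xml.append("  <relation weight=\"").append(r.weight()).append("\">\n");
            appendTerm(xml, r.first());
            appendTerm(xml, r.second());
            xml.append("  </relation>\n");
        }
        return xml.append("</relations>\n").toString();
    }
}
```

A production exporter would more likely use a streaming writer such as `javax.xml.stream.XMLStreamWriter`; a `StringBuilder` keeps the sketch short and its output predictable.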

A Graphical Relations Viewer
- Creates a Java Relation object for each relation it reads from the XML file
- Inserts each one into a trie keyed on the lower-cased first term
  - If a Relation is already stored at that node, the new one is added to a Vector for that term
- Creates an alphabetical list of all terms in a second trie
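A sketch of the viewer's first data structure, assuming a `Relation` record with two terms and a weight (hypothetical fields, as are the class and method names). Relations are indexed in a trie by the lower-cased first term, and all relations sharing a first term accumulate in one list at that node; an `ArrayList` stands in for the `Vector` mentioned on the slide.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** A relation as read from the XML export (hypothetical fields). */
record Relation(String term1, double weight, String term2) {}

/** Trie keyed by the lower-cased first term of each relation (hypothetical class). */
final class RelationTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        List<Relation> relations;  // null until a relation's first term ends here
    }

    private final Node root = new Node();

    void insert(Relation r) {
        Node node = root;
        for (char c : r.term1().toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        // Further relations with the same first term accumulate in one list.
        if (node.relations == null) node.relations = new ArrayList<>();
        node.relations.add(r);
    }

    /** Returns every relation whose first term matches, ignoring case. */
    List<Relation> relationsFor(String term) {
        Node node = root;
        for (char c : term.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.get(c);
            if (node == null) return Collections.emptyList();
        }
        if (node.relations == null) return Collections.emptyList();
        return Collections.unmodifiableList(node.relations);
    }
}
```

Looking up a term is then a single walk down the trie, which is what the viewer needs when the user clicks a term.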

Using the Viewer
- When you enter part of a term, the left list box shows all terms starting with that fragment
- When you click a term, the right list box shows all its relations
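The fragment lookup maps naturally onto the second, alphabetical trie. In this sketch (the class `TermTrie` and its methods are hypothetical), children are kept in a `TreeMap`, so a depth-first walk below the fragment's node returns the matching terms in alphabetical order; keys are lower-cased while each term's original spelling is kept for display.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

/** Alphabetical trie of all terms, supporting prefix listing (hypothetical class). */
final class TermTrie {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        String term;  // original spelling of a term ending at this node, if any
    }

    private final Node root = new Node();

    void add(String term) {
        Node node = root;
        for (char c : term.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.term = term;
    }

    /** Lists every stored term that starts with the fragment, in alphabetical order. */
    List<String> startingWith(String fragment) {
        Node node = root;
        for (char c : fragment.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.get(c);
            if (node == null) return List.of();
        }
        List<String> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    // Depth-first walk; TreeMap iteration visits children in character order,
    // and a node's own term is emitted before any longer terms below it.
    private static void collect(Node node, List<String> out) {
        if (node.term != null) out.add(node.term);
        for (Node child : node.children.values()) collect(child, out);
    }
}
```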

Lexical Navigation
Displays relations between terms graphically and allows you to explore them without formulating a specific query.

Possible enhancements
- Show only terms belonging to an ontology
- Show only higher-IQ terms
- Show the documents the relations occur in
- Show the ontology reference
- Show computed paths
- Show more kinds of named relations
  - Inhibits, expresses
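"Show computed paths" is not specified further on the slide. One plausible reading, sketched here as an assumption, is the shortest chain of relations linking two terms, found by breadth-first search over an undirected graph of the computed relations; `RelationPaths`, `graph`, and `shortestPath` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

final class RelationPaths {
    /** Builds undirected adjacency lists from (term, term) pairs. */
    static Map<String, List<String>> graph(List<String[]> relations) {
        Map<String, List<String>> adj = new HashMap<>();
        for (String[] r : relations) {
            adj.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
            adj.computeIfAbsent(r[1], k -> new ArrayList<>()).add(r[0]);
        }
        return adj;
    }

    /** Breadth-first search; returns the shortest term path, or an empty list if none. */
    static List<String> shortestPath(Map<String, List<String>> adj, String from, String to) {
        Map<String, String> parent = new HashMap<>();  // also serves as the visited set
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, from);
        queue.add(from);
        while (!queue.isEmpty()) {
            String term = queue.poll();
            if (term.equals(to)) {
                // Walk parent links back from the target to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String t = to; !t.equals(from); t = parent.get(t)) path.addFirst(t);
                path.addFirst(from);
                return path;
            }
            for (String next : adj.getOrDefault(term, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, term);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }
}
```

Relation weights are ignored here; a weighted variant could use Dijkstra's algorithm instead.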

Evaluations of Information Visualization
- Few, if any, graphical displays have been evaluated for effectiveness so far
- Usability studies are hard to construct and carry out
- Intuition suggests
  - that exploration may lead to discoveries
  - that relations more than one step apart are best displayed graphically
- It remains to be shown that such visualizations are actually useful

Differences in Intent
- Displays may represent information your system has discovered
  - Gene-protein relations
- Or they may represent data from which the user may discover new information
  - New 2nd- or 3rd-order relationships
- These are rather different applications of visualization technology
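A second-order relationship in the Swanson sense links terms A and C through an intermediate term B when A and C are never related directly. A minimal sketch, assuming a symmetric adjacency map of computed relations; `SecondOrder`, `link`, and `candidates` are hypothetical names, and ranking the candidates is left out.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class SecondOrder {
    /** Records an undirected relation between two terms. */
    static void link(Map<String, Set<String>> adj, String a, String b) {
        adj.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /**
     * Terms related to start only through an intermediate term, each mapped to the
     * intermediates that link them (A-B-C candidates with no direct A-C relation).
     */
    static Map<String, Set<String>> candidates(Map<String, Set<String>> adj, String start) {
        Set<String> direct = adj.getOrDefault(start, Set.of());
        Map<String, Set<String>> result = new TreeMap<>();
        for (String b : direct) {
            for (String c : adj.getOrDefault(b, Set.of())) {
                // Skip the start term itself and anything already directly related.
                if (!c.equals(start) && !direct.contains(c)) {
                    result.computeIfAbsent(c, k -> new TreeSet<>()).add(b);
                }
            }
        }
        return result;
    }
}
```

Third-order candidates follow by repeating the expansion one more step, at the cost of a much larger and noisier candidate set.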

Summary
- Java-based text mining system
- Database of terms and positions
- Computation of relations
- Export as XML
- Graphical relations viewer
- The value of such visual interfaces has not yet been established

Acknowledgements
- Bhavani Iyer: XML export
- Eric Brown: DictMatcher hash code
- Daniel Tunkelang: graphical layout
- Bob Mack: paper suggestions