BioKnOT: Biological Knowledge through Ontology and TFIDF
By: James Costello
Advisor: Mehmet Dalkilic

June 11, 2004, Bioinformatics Capstone Project, Costello

Outline
- Motivation and Goals
- Background
- Program Architecture
- Populating the Article Database
- Developing an Article Scoring Model
- BioKnOT Demonstration
- Summary and Future Work

Motivation and Goals
Motivation: current online text searching methods are not good enough for highly specific research, where importance, timeliness, and relevance all matter.
Goal of project: create an online text retrieval system that lets users construct their own set of highly specific, timely, and important research articles, custom fit to each user's needs.

Standard Search Model
- $D$ = set of documents
- $D'$ = set of documents that meet some search criteria, $D' \subseteq D$
- $D' = \{d_1, d_2, \dots, d_k\}$, where $d_i$ is an individual document and we hope $d_i$ is more interesting than $d_{i+1}$
- $|D'|$ = a huge number of documents: for a filtered PubMed search for "apoptosis", $|D'|$ is 65,832 articles

BioKnOT Search Model
- $D$ = set of documents
- $D'$ = set of documents that meet the initial search criteria, $D' \subseteq D$
- $D'_t$ = set of documents that pass the filter, $D'_t \subseteq D'$
- $D'_{tu}$ = set of documents ranked on semantic content from user input, $D'_{tu} \subseteq D'_t$
- $D'_{tu} = \{d_1, d_2, \dots, d_k\}$, where $d_i$ is an individual document and $d_i$ is more interesting than $d_{i+1}$
- $|D'_{tu}|$ is very small, and the set is very specific
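Read as code, the BioKnOT model is a chain of filters followed by a ranking. The Python sketch below is illustrative only: `boolean_match`, `passes_filter` and `semantic_score` are hypothetical stand-ins for the Boolean search, the filter page and the semantic scoring, and `k` is an assumed cutoff on how many ranked documents to keep.

```python
from typing import Callable, Iterable


def bioknot_search(
    documents: Iterable[str],
    boolean_match: Callable[[str], bool],   # hypothetical: initial Boolean search
    passes_filter: Callable[[str], bool],   # hypothetical: filter page
    semantic_score: Callable[[str], float], # hypothetical: score from user input
    k: int,
) -> list[str]:
    """Narrow D -> D' -> D'_t -> D'_tu and return at most k ranked documents."""
    d_prime = [d for d in documents if boolean_match(d)]   # D' is a subset of D
    d_prime_t = [d for d in d_prime if passes_filter(d)]   # D'_t is a subset of D'
    # D'_tu: order so that d_i scores at least as high as d_{i+1}
    ranked = sorted(d_prime_t, key=semantic_score, reverse=True)
    return ranked[:k]
```

Each stage can only remove documents, so the final list is always drawn from the documents that passed every earlier stage.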

Program Architecture (pages connected by hyperlinks)
- Initial Search Page: Boolean search on a term (e.g., "apoptosis")
- Filter Page: "Filter Your Search"
- User Input Page: the user's sentences, sent with "Submit Description"
- Results Page: "Refine Your Search"; a ranked list (1. Article Title …, 2. …), each entry with "View Word Graph" and "See All Data" links
- From the results: the actual online article, all stored data on the article (title, author(s), …), and an illustration of word relationships in the article
- Word Weighting Page: "Add Word Weights", rating each term on a scale from Bad to Good

Populating the Article Database
Data we need:
- Author(s)
- Article title
- Abstract
- Journal title
- Date and year of publication
- Count of how many times the article was cited
- URL of the online full-text article or of the PubMed search results
- Some type of accession number
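A record holding the fields listed above could look like the following Python sketch. The class name `ArticleRecord` and its field names are hypothetical; the slide lists the data, not a schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArticleRecord:
    """One row of the article database (hypothetical field names)."""
    accession: str                                    # some type of accession number
    title: str
    authors: list[str] = field(default_factory=list)
    abstract: str = ""
    journal: str = ""
    pub_date: Optional[str] = None                    # date of publication
    pub_year: Optional[int] = None                    # year of publication
    times_cited: int = 0                              # citation count
    url: Optional[str] = None                         # online full text or PubMed URL

    def has_url(self) -> bool:
        """True once the web bot has found a URL for this article."""
        return bool(self.url)
```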

Resources Used in Populating the Database
- Institute for Scientific Information (ISI) Web of Science
- EndNote 7
- PubMed

Steps Taken to Populate the Article Database
1. Article information is exported from ISI's Web of Science search interface into EndNote 7.
2. EndNote 7 exports the records as XML, which is parsed into the article database (more than 2,000 articles).
3. A web bot searches the PubMed search interface for URL information using the article title and author(s).
4. After the PubMed abstract is found (PubMed article abstract interface), the web bot searches for the online article URL.
5. Either the PubMed URL or the online article URL is inserted into the database.
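One way to express the web bot's lookup step is to turn an article's title and author into a PubMed query. This sketch assumes NCBI's E-utilities ESearch endpoint and PubMed's `[Title]` and `[Author]` field tags. The slide does not say how the 2004 bot queried PubMed, so the endpoint choice and the helper name `pubmed_query_url` are assumptions. The function only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities ESearch, not necessarily what the original bot used.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query_url(title: str, first_author: str) -> str:
    """Build a PubMed ESearch URL that looks up an article by title and author."""
    term = f"{title}[Title] AND {first_author}[Author]"
    # urlencode escapes spaces as '+' and brackets as %5B / %5D
    return ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": term})
```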

Initial Search
- Boolean search
- Searches all articles in the database that have a URL
- Searches each article's title and abstract
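A minimal Python sketch of this initial search follows. It assumes that each article is a dict with hypothetical `title`, `abstract` and `url` keys. It also assumes that a query is written as an AND of OR-groups, a representation chosen here for illustration and not taken from BioKnOT.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set of a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def boolean_search(articles: list[dict], query: list[list[str]]) -> list[dict]:
    """Return articles that have a URL and whose title + abstract match the query.

    The query is an AND of OR-groups (assumed format):
    [["apoptosis"], ["cell", "tumor"]] means apoptosis AND (cell OR tumor).
    """
    hits = []
    for art in articles:
        if not art.get("url"):
            continue  # only articles with a URL are searched
        words = tokenize(art.get("title", "") + " " + art.get("abstract", ""))
        if all(any(t.lower() in words for t in group) for group in query):
            hits.append(art)
    return hits
```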

Filter Page
TFIDF (LUCAS Web Service)
TFIDF calculations:
- TF = number of occurrences of a term in a document
- IDF = log of the total number of documents divided by the number of documents that contain the term

$$tf_{i,d} = \frac{|d_i|}{\left|\sum_{m=1}^{k} d_m\right|}$$
$$idf_{i,D} = \log_2 \frac{|D|}{|\{d \in D : i \in d\}|}$$
$$tfidf_{i,d} = (1 + tf_{i,d})\, idf_{i,D} \quad \text{if } tf_{i,d} \ge 1$$
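The TF, IDF and TFIDF definitions above can be transcribed into a short Python sketch. This version takes TF as the raw occurrence count, following the slide's text definition, so the condition tf ≥ 1 simply means the term occurs in the document. The function name `tfidf` and the list-of-tokens input are assumptions.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term of each tokenized document.

    tf    = raw count of the term in the document
    idf   = log2(|D| / number of documents containing the term)
    tfidf = (1 + tf) * idf when tf >= 1; absent terms are simply omitted (weight 0)
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # only terms with tf >= 1 appear here
        weights.append({
            term: (1 + tf) * math.log2(n_docs / doc_freq[term])
            for term, tf in counts.items()
        })
    return weights
```

A term that occurs in every document gets idf = log2(1) = 0, so it carries no weight however often it appears.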

Term Relationship Measurements
- Intra-sentence distance: sentence structure taken into account
- Inter-sentence distance: sentence structure ignored
Examples:
- "... and is not present in the mitochondria. Permeability is another..." (the terms fall in different sentences)
- "... mitochondrial permeability is an important aspect of apoptosis..." (the terms fall in the same sentence)
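The difference between the two measurements can be sketched in Python. Both functions return the smallest word distance between two terms. `inter_sentence_distance` treats the text as one stream of words, and `intra_sentence_distance` only counts pairs that share a sentence. The function names, splitting sentences on `.`, `!` and `?`, and taking the minimum distance are all simplifying assumptions.

```python
import re
from typing import Optional


def _words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def _min_distance(words: list[str], a: str, b: str) -> Optional[int]:
    """Smallest positional gap between any occurrence of a and any of b."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    dists = [abs(i - j) for i in pos_a for j in pos_b]
    return min(dists) if dists else None


def inter_sentence_distance(text: str, a: str, b: str) -> Optional[int]:
    """Sentence structure ignored: the whole text is one word stream."""
    return _min_distance(_words(text), a, b)


def intra_sentence_distance(text: str, a: str, b: str) -> Optional[int]:
    """Sentence structure taken into account: both words must share a sentence."""
    best = None
    for sentence in re.split(r"[.!?]+", text):
        d = _min_distance(_words(sentence), a, b)
        if d is not None and (best is None or d < best):
            best = d
    return best
```

For a "…cell. Death…" text, the inter-sentence distance is 1 while no intra-sentence distance exists.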

Inter-sentence vs. Intra-sentence Distance
Searching for the relationship "cell death":
- Doc A: "…cell…"
- Doc B: "…cell death…"
- Doc C: "…cell. Death…"
- Doc D: "…death…"
- Doc E: "…cell death…"
[Figure: the documents grouped into the initial search set of documents, the documents used to construct the random model, and the documents that are scored and returned to the user.]

Visual Representation of Term Relationships
[Figure: Graph M, an example term relationship graph specified by the user; Graph N, an example term relationship graph taken from an article's abstract.]

Scoring an Article
- $M$ = user-defined term relationships
- $N$ = term relationships in the abstract of an individual article
- $S$ = scoring matrix
- $P$ = presence or absence of a term relationship from $M$ in $N$
- $f$ = sigmoidal term relationship function

$$\text{Abstract Score} = \sum_{i,j} P_{M,N}(i,j) \times S_{i,j} \times f_{M_{i,j}}(N_{i,j})$$
$$P_{M,N}(i,j) = \begin{cases} 1 & \text{if } M_{i,j} \times N_{i,j} \neq 0 \\ -1 & \text{otherwise} \end{cases}$$
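The abstract score can be sketched as a sum over the user's relationships. This Python version stores M, N and S as dicts keyed by (term_i, term_j) and takes f as a callable `f(m, n)`; both choices are assumptions for illustration. It also assumes the sum runs only over relationships the user defined (M_{i,j} ≠ 0), reading P as the presence or absence of a relationship from M in N.

```python
from typing import Callable

Matrix = dict[tuple[str, str], float]  # nonzero entries keyed by (term_i, term_j)


def abstract_score(M: Matrix, N: Matrix, S: Matrix,
                   f: Callable[[float, float], float]) -> float:
    """Sum P_{M,N}(i,j) * S_{i,j} * f_{M_{i,j}}(N_{i,j}) over the user's relationships.

    f(m, n) is the membership function parameterized by m = M_{i,j},
    evaluated at n = N_{i,j} (0 when the relationship is absent from N).
    """
    score = 0.0
    for (i, j), m in M.items():
        if m == 0:
            continue  # assumption: only user-defined relationships contribute
        n = N.get((i, j), 0.0)
        p = 1 if m * n != 0 else -1  # present in the abstract: +1, absent: -1
        score += p * S.get((i, j), 0.0) * f(m, n)
    return score
```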

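Sigmoidal Scoring Function
$f_{M_{i,j}}(N_{i,j})$ converts the term distance $x = N_{i,j}$ into a percentage of term membership between 0 and 1. It is defined piecewise over breakpoints $\alpha < \beta < \gamma$, with separate cases for $x \le \alpha$, $\alpha < x \le \beta$, $\beta < x \le \gamma$ and $x > \gamma$. The middle cases are built from ratios such as $\frac{x - \alpha}{\beta - \alpha}$, the curve passes through $\tfrac{1}{2}$, and $f = 0$ for $x > \gamma$.
[Figure: % term membership (0, ½, 1) plotted against term distance, with α, β and γ marked on the distance axis.]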
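The Python function below is a hypothetical piecewise-linear stand-in with that shape, assuming membership falls as distance grows: 1 up to α, ½ at β, and 0 beyond γ. It illustrates the idea and should not be taken as BioKnOT's exact formula. The name `membership` is also an assumption.

```python
def membership(x: float, alpha: float, beta: float, gamma: float) -> float:
    """Assumed sigmoid-like membership in [0, 1] for term distance x.

    1 up to alpha, falls linearly to 1/2 at beta, then to 0 at gamma,
    and stays 0 beyond gamma. Requires alpha < beta < gamma.
    """
    if not alpha < beta < gamma:
        raise ValueError("need alpha < beta < gamma")
    if x <= alpha:
        return 1.0
    if x <= beta:
        return 1.0 - 0.5 * (x - alpha) / (beta - alpha)
    if x <= gamma:
        return 0.5 - 0.5 * (x - beta) / (gamma - beta)
    return 0.0
```

The two linear pieces meet at ½ when x = β, so the curve has no jumps.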

Scoring Matrix (Random Model)
The scoring matrix is derived from the TFIDF terms defined by the user and from the abstracts of all the articles returned by the initial term search. User-defined term relationships are located in all of these abstracts, and a log-odds (LOD) score is taken. $P(t_j \mid t_i, \Delta)$ is found by first locating a word $t_i$ that the user has defined and then opening a 5-word reading frame, $\Delta$, following $t_i$. A second user-defined word, $t_j$, must occur within $\Delta$.

$$\text{LOD Score}(t_i, t_j) = \log_2 \frac{P(t_j \mid t_i, \Delta)}{P(t_i) \times P(t_j)}$$

Steps to Derive the Scoring Matrix
1. Determine the important terms: cell, death, human.
2. Look for relationships between those words in the search space: cell→death, cell→human, death→cell, death→human, human→cell, human→death.
3. Search space (an abstract of 20 words): "The effects … cell in a human … in cancer." Once an important term is found, a 5-word reading frame is opened. If a relationship is found within the reading frame, the distance between the words is taken: cell→human = 3.
4. If multiple occurrences of the same relationship are found in the search space, the average distance is taken.
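The reading-frame step can be sketched in Python as follows. For every occurrence of an important term, the next five words are scanned, and each other important term found there records a forward distance. Distances for the same ordered pair are then averaged. The function name, the lowercase word tokenizer, and counting every pair inside each frame are assumptions.

```python
import re
from collections import defaultdict
from statistics import mean


def relationship_distances(abstract: str, terms: set[str],
                           frame: int = 5) -> dict[tuple[str, str], float]:
    """Average forward distance for each ordered pair t_i -> t_j of important terms."""
    words = re.findall(r"[a-z0-9]+", abstract.lower())
    found = defaultdict(list)
    for pos, w in enumerate(words):
        if w not in terms:
            continue
        # open a reading frame over the next `frame` words after t_i
        for offset in range(1, frame + 1):
            if pos + offset >= len(words):
                break
            nxt = words[pos + offset]
            if nxt in terms and nxt != w:
                found[(w, nxt)].append(offset)
    # multiple occurrences of the same relationship are averaged
    return {pair: mean(ds) for pair, ds in found.items()}


# "cell in a human": human is 3 words after cell, as on the slide
print(relationship_distances("cell in a human", {"cell", "death", "human"}))
```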

Steps to Derive the Scoring Matrix (continued)
Lastly, these relationships, along with the individual word probabilities, are scored and structured into a matrix:
- P(cell→human) = .167
- P(cell) = .03
- P(human) = .06
- LOD(cell→human) = 1.97 (a base-10 logarithm gives this value; the base-2 logarithm in the LOD formula gives about 6.54)
- Continue for all relationships.
[Figure: example scoring matrix with rows and columns for apoptosis, human and cell.]
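The LOD formula can be checked against the worked example with a short Python sketch that evaluates it under both logarithm bases. The name `lod_score` and its `base` parameter are illustrative, not part of BioKnOT.

```python
import math


def lod_score(p_j_given_i: float, p_i: float, p_j: float, base: float = 2.0) -> float:
    """LOD(t_i, t_j) = log_base( P(t_j | t_i, Δ) / (P(t_i) * P(t_j)) )."""
    return math.log(p_j_given_i / (p_i * p_j), base)


# Worked example: P(cell->human) = .167, P(cell) = .03, P(human) = .06
print(round(lod_score(0.167, 0.03, 0.06, base=10), 2))  # 1.97
print(round(lod_score(0.167, 0.03, 0.06, base=2), 2))   # 6.54
```

A positive LOD means the pair co-occurs within the reading frame more often than the individual word probabilities alone would suggest.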

Adding User Weights to the Term Matrix
- The user is asked to enter weights for each word relationship found within the user's expansion statement.
- Weights range over [0, 2].
- The weight for term i to term j is denoted $r_{i,j}$.
- The weights are multiplied into the matrix values to add the user's input to the random model.

Applying the User's Word Weights
User's word weight submissions:
- cell → death: 2.0
- death → cell: 1.0
- protein → cell: 0.5
- protein → death: 1.5

Final score after weighting:
$$S^{\text{final}}_{i,j} = \begin{cases} r_{i,j} \times S_{i,j} & \text{if } S_{i,j} > 0 \\ S_{i,j} \times \frac{1}{r_{i,j}} & \text{if } S_{i,j} < 0 \\ 0 & \text{if } S_{i,j} = 0 \end{cases}$$
[Figure: scoring matrix $S_{i,j}$ over cell, death and protein, before and after the user's word weights are applied.]
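The weighting rule is small enough to sketch directly. In this Python version, `apply_user_weight` and `weight_matrix` are illustrative names. Two choices are assumptions that the slide does not spell out: unweighted relationships keep a neutral weight of 1.0, and r = 0 is rejected for negative scores, where S/r is undefined.

```python
def apply_user_weight(s: float, r: float) -> float:
    """Final S_{i,j}: r*S if S > 0, S/r if S < 0, 0 if S == 0, with r in [0, 2]."""
    if not 0 <= r <= 2:
        raise ValueError("weight r must lie in [0, 2]")
    if s > 0:
        return r * s
    if s < 0:
        if r == 0:
            raise ValueError("S / r is undefined for r = 0")
        return s / r
    return 0.0


def weight_matrix(S: dict[tuple[str, str], float],
                  weights: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Apply the user's weights; unweighted pairs use r = 1.0 (assumption)."""
    return {pair: apply_user_weight(s, weights.get(pair, 1.0)) for pair, s in S.items()}
```

Under this rule, a weight above 1 always favors a relationship: positive scores grow, and negative scores shrink toward 0. A weight below 1 always penalizes it.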

Comparing Term Relationship Graphs
To compare the word graphs, an adjacency matrix is created for each graph; this is where the values $M_{i,j}$ and $N_{i,j}$ are taken from.
- Matrix M (rows and columns: apoptosis, tumor): the apoptosis → tumor entry is 5.00; all other entries are 0.
- Matrix N (rows and columns: fas, induce): the fas → induce entry is 3.00; all other entries are 0.
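Here is a minimal Python sketch of turning a term relationship graph into an adjacency matrix. The edge-dict input and the function name `adjacency_matrix` are assumptions. Terms are ordered by first appearance, and missing edges become 0, matching the zeros in matrices M and N.

```python
def adjacency_matrix(
    edges: dict[tuple[str, str], float],
) -> tuple[list[str], list[list[float]]]:
    """Turn a graph {(t_i, t_j): value} into (terms, square matrix).

    Row = source term t_i, column = target term t_j; absent edges are 0.
    """
    terms: list[str] = []
    for i, j in edges:
        for t in (i, j):
            if t not in terms:
                terms.append(t)
    index = {t: k for k, t in enumerate(terms)}
    matrix = [[0.0] * len(terms) for _ in terms]
    for (i, j), value in edges.items():
        matrix[index[i]][index[j]] = value
    return terms, matrix
```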

Results and Refinement
- Support score: citation frequency, i.e., the citation count supplied by ISI's Web of Science divided by the number of years between the publication date and now.
- Semantic score: $\sum_{i,j} P_{M,N}(i,j) \times S_{i,j} \times f_{M_{i,j}}(N_{i,j})$
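The support score is a simple ratio; the Python sketch below computes it. The name `citation_frequency` is illustrative. The fallback for articles published in the current year, where the slide's ratio would divide by zero, is an assumption: such articles are treated as one year old.

```python
from datetime import date
from typing import Optional


def citation_frequency(times_cited: int, pub_year: int,
                       current_year: Optional[int] = None) -> float:
    """Support score: citation count divided by years since publication."""
    year = current_year if current_year is not None else date.today().year
    years = year - pub_year
    if years <= 0:
        years = 1  # assumption: avoid dividing by zero for this year's articles
    return times_cited / years
```

Dividing by age keeps older articles from outranking recent ones purely because they have had more time to collect citations.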

Software Demonstration: BioKnOT and PubMed

Summary
- BioKnOT offers a new and effective way to search research articles.
- BioKnOT offers many features that help the user decide which factors are important in retrieving articles.
- Currently under submission to the SIGIR Bioinformatics workshop.

Future Work
- Add more sophisticated support through citation frequency
- Increase the efficiency of the scoring method
- Usability analysis
- Incorporate BioKnOT into CATPA
- Develop a bioinformatics knowledge base locally using BioKnOT

Acknowledgments
- Professor Mehmet Dalkilic
- Professor Javed Mostafa
- Professor Sun Kim