Biomarker Automated Retrieval Tool. Ronny Chan, Kim Ngo. Earth Science Data Systems Dept.
Bioinformatics Relationship: Science produces massive amounts of data. Data needs to be analyzed, stored, and retrieved. This is data mining. We want to apply computer science to improve this process.
Motivation: Problems with conventional data mining: time consuming; accuracy not defined (subjective); no objective scientific information retrieval tool. Where are the biomarkers?
Cancer Biomarkers: An indicator of cancerous growth.
Proposed Solution: Create a program (B.A.R.T.) that allows people to quickly scan literature for the most relevant keywords/biomarkers. Biomarkers shown: HER-2, HPEBP4, EP-CAM, ERBB2, BAG-1.
Significance: What is the need for the project? More efficient research; save time. (Figure labels: conventional; enhanced; B.A.R.T.)
Goals: Make biomarker/keyword searches more efficient; learn Java; learn SQL.
Approach: Write a program; read in articles; use part of the Vector Space Model algorithm to rank terms; output relevant terms in statistical rankings. (Figure labels: "they" vs. BRCA1.)
Vector Space Model: An information retrieval model introduced by Gerard Salton in the 1960s. Widely used in search engines.
Algorithm for B.A.R.T. (diagram components): Keywords Input; PubMed Query Agent; Data Store; Data Retrieval and Output; Content Analyzer; Keyword Parser; Content Ranker.
Results (figure labels): DCIS, CU-TP3982, ERBB2, HER-2, HPEBP4, BAG-1, EP-CAM, 99M.
Lessons & Difficulties: Deciding on algorithm choice (ease of implementation and effectiveness). Limited knowledge and experience (Java, SQL). Initial implementation is slow. 5 articles = 160 sec. Update: August 18, 2004. 100 articles = 8^19 years. 20 articles = 1904 sec. 100 articles = 8^38 years.
Future Work: Apply different term weight functions to make results more robust; optimize the program for speed.
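One way to read "different term weight functions" is to swap raw tf and plain idf for other common variants. The sketch below is illustrative only and is not taken from B.A.R.T.: it shows a log-scaled term frequency and a smoothed idf alongside the plain versions. All function names here are hypothetical.

```python
import math


def raw_tf(count):
    """Raw term frequency: the count itself."""
    return float(count)


def log_tf(count):
    """Log-scaled term frequency: 1 + log10(count), or 0 when the term is absent."""
    return 1.0 + math.log10(count) if count > 0 else 0.0


def plain_idf(n_docs, df):
    """Plain idf: log10(N / df)."""
    return math.log10(n_docs / df)


def smoothed_idf(n_docs, df):
    """Smoothed idf: log10(1 + N / df), which stays positive even when df == N."""
    return math.log10(1.0 + n_docs / df)


def weight(count, n_docs, df, tf_fn=raw_tf, idf_fn=plain_idf):
    """Combine one tf function and one idf function into a single term weight."""
    return tf_fn(count) * idf_fn(n_docs, df)
```

For example, weight(2, 3, 1) is the plain tf * idf for a term that appears twice in one of three documents, and passing tf_fn=log_tf dampens the effect of the repeated occurrence.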
Citations: 1. SpaceImplementation-6per.PDF /cs419/ rank.pdf /Lectures/04-BooleanVectorSpaceB.pdf 5. Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin. Pharmacol. Ther. 69(3), (2001).
Acknowledgements: Earth Science Data System, JPL; Tina Xiao; Paul Ramirez; Chris Mattmann; Roshanak Roshandel; Sean Hardman; all SoCalBSI colleagues; National Institutes of Health (NIH); National Science Foundation (NSF); Southern California Bioinformatics Summer Institute (SoCalBSI); SoCalBSI professors; Jacqueline Heras.
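VSM Example
Q:  malignant breast cancer
D1: detection of malignant level in the cell
D2: sighting of breast stage in the breast cancer
D3: detection of malignant stage in the cancer

Term counts per document, with each term's IDF in parentheses:
doc  the   stage    level    sighting  cell     malignant  in    of    breast   detection  cancer
D1   1(0)  0        1(.477)  0         1(.477)  1(.176)    1(0)  1(0)  0        1(.176)    0
D2   1(0)  1(.176)  0        1(.477)   0        0          1(0)  1(0)  2(.477)  0          1(.176)
D3   1(0)  1(.176)  0        0         0        1(.176)    1(0)  1(0)  0        1(.176)    1(.176)
Q    0     0        0        0         0        1(.176)    0     0     1(.477)  0          1(.176)

ID  TERM       DF  IDF
1   the        3   0
2   stage      2   .176
3   level      1   .477
4   sighting   1   .477
5   cell       1   .477
6   malignant  2   .176
7   in         3   0
8   of         3   0
9   breast     1   .477
10  detection  2   .176
11  cancer     2   .176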
Example Continued: Each keyword is weighted by tf * idf.
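The numbers in the VSM example can be reproduced with a short script. This is a minimal sketch, not B.A.R.T.'s actual code: it assumes whitespace tokenization and idf = log10(N / df) with N = 3 documents, which is consistent with the .176 and .477 values in the example. The function names are hypothetical.

```python
import math
from collections import Counter

# Documents and query from the VSM example.
docs = {
    "D1": "detection of malignant level in the cell",
    "D2": "sighting of breast stage in the breast cancer",
    "D3": "detection of malignant stage in the cancer",
}
query = "malignant breast cancer"


def term_counts(text):
    """Term frequency (tf): how often each whitespace-separated word occurs."""
    return Counter(text.lower().split())


def inverse_document_frequency(documents):
    """idf = log10(N / df), where df is the number of documents containing the term."""
    n_docs = len(documents)
    df = Counter()
    for text in documents.values():
        df.update(set(text.lower().split()))  # count each term once per document
    return {term: math.log10(n_docs / count) for term, count in df.items()}


def tf_idf(text, idf):
    """Weight each term in `text` by tf * idf; terms unseen in the corpus get 0."""
    return {term: tf * idf.get(term, 0.0) for term, tf in term_counts(text).items()}


idf = inverse_document_frequency(docs)
weights = {name: tf_idf(text, idf) for name, text in docs.items()}
query_weights = tf_idf(query, idf)

# Print each document's terms from highest to lowest weight.
for name, w in weights.items():
    ranked = sorted(w.items(), key=lambda item: item[1], reverse=True)
    print(name, [(term, round(score, 3)) for term, score in ranked])
```

Under these assumptions "breast" in D2 scores 2 × .477 = .954, while "the", "in", and "of" score 0 because they occur in all three documents.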