Using a Literature-based NMF Model for Discovering Gene Functional Relationships

Elina Tjioe,* Michael Berry,§ Ramin Homayouni,‡ and Kevin HeinrichΘ
* Genome Science and Technology Graduate School and § Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville; ‡ Bioinformatics Program, Department of Biology, University of Memphis, TN; Θ Computable Genomix, LLC

ABSTRACT
The rapid growth of the biomedical literature and of genomic information presents a major challenge for determining the functional relationships among genes. Several bioinformatics tools have been developed to extract and identify gene relationships from various biological databases. In this study, we develop a Web-based bioinformatics tool called Feature Annotation Using Nonnegative matrix factorization (FAUN) to facilitate both the discovery and the classification of functional relationships among genes. Both the computational complexity and the parameterization of nonnegative matrix factorization (NMF) for processing gene sets are discussed. FAUN is first tested on a small, manually constructed 50-gene collection that we, as well as others, have previously used. We then apply FAUN to analyze several microarray-derived gene sets obtained from studies of the developing cerebellum in normal and mutant mice. FAUN provides utilities for collaborative knowledge discovery and identification of new gene relationships from text streams and repositories (e.g., MEDLINE). It is particularly useful for the validation and analysis of gene associations suggested by microarray experimentation.
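The factorization step named in the abstract can be illustrated with a small sketch. The poster does not say which NMF algorithm FAUN uses, so the multiplicative update rule below is an assumption chosen for brevity, and the term list, gene names, and matrix values are hypothetical toy data rather than anything from the study.

```python
import numpy as np

def nmf(A, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix A (terms x genes) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    This algorithm choice is an assumption; the poster does not specify one.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, rank))   # term x feature matrix
    H = rng.random((rank, n))   # feature x gene matrix
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relative_error(A, W, H):
    """Frobenius-norm reconstruction error relative to the norm of A."""
    return np.linalg.norm(A - W @ H) / np.linalg.norm(A)

# Hypothetical term-by-gene-document matrix (rows: terms, columns: genes).
terms = ["amyloid", "plaque", "tumor", "apoptosis", "neuron", "axon"]
genes = ["geneA", "geneB", "geneC", "geneD"]
A = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 3],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
], dtype=float)

W, H = nmf(A, rank=2)
rel_error = relative_error(A, W, H)
print(f"relative reconstruction error: {rel_error:.3f}")
```

The rank passed to `nmf` sets the number of features, which is the "resolution" parameter the poster proposes to tune; a larger rank gives more, finer-grained features at higher computational cost.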
FUTURE WORK
- Capture annotation of features across different resolutions (ranks of nonnegative matrix factorization).
- Improve the FAUN model by optimizing the NMF rank; consider use of smoothness constraints.
- Build a FAUN-based classifier with annotated features for the real-time classification of new gene document streams.

GENE DOCUMENT TEST SET
[Workflow diagram labels: Gene List; PMID Citations in Entrez Gene; PubMed Titles & Abstracts; Gene Documents (Gene A, Gene B, Gene C); Gene Document Collection; General Text Parser (GTP); Term x Gene Doc Matrix; Nonnegative Matrix Factorization (NMF); Term x Feature (W) Matrix; Feature x Gene (H) Matrix; Dominant Terms; Feature Annotation; Dominant Genes; Gene-to-Gene Correlation; New Gene Document; Feature Classifier; Dominant Features]

FAUN site:

REFERENCES
1. Heinrich, K.E., Berry, M.W., Homayouni, R. (2008) Gene Tree Labeling Using Nonnegative Matrix Factorization on Biomedical Literature. Journal of Computational Intelligence and Neuroscience, to appear.
2. Berry, M.W., Browne, M., Langville, A.N., Pauca, V.P., Plemmons, R.J. (2007) Algorithms and Applications for Approximate Nonnegative Matrix Factorization. Computational Statistics & Data Analysis 52(1).
3. Shahnaz, F., Berry, M.W., Pauca, V.P., Plemmons, R.J. (2006) Document Clustering Using Nonnegative Matrix Factorization. Information Processing & Management 42(2).
4. Homayouni, R., Heinrich, K., Wei, L., Berry, M.W. (2005) Gene Clustering by Latent Semantic Indexing of MEDLINE Abstracts. Bioinformatics 21(1).

[Fig. 3 diagram labels as extracted: Development, Alzheimer, Cancer; 5; F: 7, 13, 16, ; F: 1, 3, 4, 5, 8, 9, 12, 14, 15 17, 18, ; F: 2, 6, 10, 11]
Fig. 3. Classification of genes and features. Classification of genes in the 50-test-gene document collection based on manual examination of the biomedical literature. Classification of annotated features based on manual examination of the dominant terms in each feature.
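The post-factorization stages in the workflow (dominant terms, dominant genes, gene-to-gene correlation) can be sketched as follows. This is only an illustration: the W and H values, term list, and gene names are hypothetical, and using Pearson correlation between gene columns of H is an assumption, since the poster does not state how FAUN computes gene-to-gene correlation.

```python
import numpy as np

terms = ["amyloid", "plaque", "tumor", "apoptosis", "neuron", "axon"]
genes = ["geneA", "geneB", "geneC", "geneD"]

# Hypothetical factors standing in for NMF output.
W = np.array([           # term x feature
    [0.9, 0.0, 0.1],
    [0.7, 0.1, 0.0],
    [0.0, 0.8, 0.1],
    [0.1, 0.6, 0.0],
    [0.2, 0.0, 0.9],
    [0.0, 0.1, 0.5],
])
H = np.array([           # feature x gene
    [1.0, 0.8, 0.0, 0.1],
    [0.1, 0.0, 0.9, 0.2],
    [0.0, 0.3, 0.1, 1.0],
])

def dominant_terms(W, terms, feature, k=2):
    """Return the k terms with the largest weight in one feature (column of W)."""
    order = np.argsort(-W[:, feature])[:k]
    return [terms[i] for i in order]

def dominant_genes(H, genes, feature, k=2):
    """Return the k genes with the largest weight in one feature (row of H)."""
    order = np.argsort(-H[feature])[:k]
    return [genes[j] for j in order]

# Each gene is described by its column of H; correlate those feature profiles.
# np.corrcoef treats rows as variables, hence the transpose.
gene_corr = np.corrcoef(H.T)

for f in range(W.shape[1]):
    print(f, dominant_terms(W, terms, f), dominant_genes(H, genes, f))
```

A curator would read the dominant terms of each feature to assign it a label, which is the manual annotation step described in the Fig. 3 caption.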
DISCUSSION
For a preliminary assessment of FAUN feature classification, each gene in the 50-test-gene collection was classified based either on its most dominant annotated feature or on a feature weight threshold. The FAUN classification using the strongest feature (per gene) yielded 90% accuracy. A FAUN-based analysis of a new cerebellum gene set has revealed new knowledge: the gene set contains a large component of transcription factors.

Fig. 1. FAUN screenshot showing use of dominant terms across genes highly associated with the user-selected feature.
Fig. 2. FAUN screenshot of gene-to-gene correlation and feature strength.

Table 1. The number of genes that are classified correctly using the strongest feature (SF) and feature weight thresholds 3, 5, 6, 2, and 1 for T3, T5, T6, T2, and T1, respectively. The first column (Actual) shows the number of genes in each class based on manual examination of the biomedical literature.
              Actual  SF  T3  T5  T6  T2  T1
Alzheimer
Cancer
Development
Alz & Dev
Can & Dev
Total genes
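The two classification rules described in the discussion, strongest feature and feature weight threshold, can be sketched as below. All values are hypothetical: the H matrix, the feature-to-class annotation, and the "actual" labels are toy data and do not reproduce the poster's 50-gene results. The threshold rule shown (a gene receives every class whose feature weight meets the threshold) is one plausible reading, not a confirmed description of FAUN.

```python
import numpy as np

# Hypothetical class label for each annotated feature (several features may share a class).
feature_labels = ["Alzheimer", "Cancer", "Development", "Cancer"]
genes = ["g1", "g2", "g3", "g4", "g5"]

# Hypothetical feature x gene weights.
H = np.array([
    [2.0, 0.1, 0.0, 1.2, 0.3],
    [0.0, 1.5, 0.2, 0.0, 0.4],
    [0.1, 0.0, 1.8, 1.0, 0.5],
    [0.0, 0.3, 0.0, 0.1, 0.2],
])

# Hypothetical manual classification of each gene.
actual = ["Alzheimer", "Cancer", "Development", "Alzheimer", "Cancer"]

def classify_strongest(H, feature_labels):
    """Assign each gene the class of its highest-weight feature."""
    return [feature_labels[i] for i in np.argmax(H, axis=0)]

def classify_threshold(H, feature_labels, threshold):
    """Assign each gene the set of classes whose feature weight meets the threshold."""
    result = []
    for j in range(H.shape[1]):
        result.append({feature_labels[i]
                       for i in range(H.shape[0]) if H[i, j] >= threshold})
    return result

predicted = classify_strongest(H, feature_labels)
accuracy = float(np.mean([p == a for p, a in zip(predicted, actual)]))
multi = classify_threshold(H, feature_labels, threshold=1.0)

print(predicted, accuracy)
print(multi)
```

Under the threshold rule a gene can fall into more than one class (compare the "Alz & Dev" and "Can & Dev" rows of Table 1) or into none, whereas the strongest-feature rule always yields exactly one class per gene.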