A literature network of human genes for high-throughput analysis of gene expression. Speaker: Shih-Te Yang. Advisor: Ueng-Cheng Yang

Presentation transcript:

A literature network of human genes for high-throughput analysis of gene expression. Speaker: Shih-Te Yang. Advisor: Ueng-Cheng Yang. The Institute of Biochemistry, NYMU; Bioinformatics program and core lab. Tor-Kristian Jenssen, Astrid Laegreid, Jan Komorowski & Eivind Hovig, Nature Genetics, Volume 28, May 2001

Goals for systems biology? Cell, 100(1):57–70. Review, PNAS, Vol. 95,

How to Find Biologically Significant Events Using Microarray Tech? Fitting to current knowledge Sifting out variations

Mapping Gene Expression Data to KEGG Pathways

Linking Molecular Information to Phenotypes Can Provide Insights to Biological Processes Pathways: metabolic, signal transduction, etc. Phenotype: angiogenesis, metastasis

Information Hidden in Literature
• Molecular functions
• Protein-protein interactions
• Protein-DNA (RNA) interactions
• Phenotypic information
• Physiological and pathological processes (e.g. angiogenesis, tumor metastasis)
• Drug and chemical response

No Efficient Way to Find Genes Related to Angiogenesis

Strategies of Literature Mining
• Keyword indexing (a gene): protein annotation
• Semantics (genes): protein binding and interaction
• Keyword co-occurrence (terms and genes): biomedical terms vs. genes -> biological processes

Medicine and Related Subjects from MeSH Classified by NLM

Gene Ontology (GO) Can Provide Links between Biological Processes and Genes

Approach to construct the literature network (part one). Step One: gene-to-term links, where a gene and a term are co-associated with a common set of articles. [Diagram labels: Articles; Gene; Term annotation; Index; MeSH; Gene Ontology TM]
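A minimal sketch of the gene-to-term step, assuming each gene and each term has already been indexed to a set of article IDs. The function name `gene_term_links`, the toy indexes, and the use of a raw shared-article count as the link weight are illustrative assumptions, not details taken from the slides.

```python
def gene_term_links(gene_articles, term_articles):
    """Link a gene to a term when both are indexed to at least one common article.

    Returns {(gene, term): number of shared articles}.
    """
    links = {}
    for gene, g_arts in gene_articles.items():
        for term, t_arts in term_articles.items():
            shared = g_arts & t_arts  # articles annotated with both
            if shared:
                links[(gene, term)] = len(shared)
    return links


# Hypothetical toy indexes: symbol or term -> set of article IDs
toy_gene_index = {"VEGF": {"a1", "a2"}, "TP53": {"a3"}}
toy_term_index = {"Angiogenesis": {"a1", "a2", "a4"}, "Metastasis": {"a2", "a3"}}

toy_links = gene_term_links(toy_gene_index, toy_term_index)
for (gene, term), weight in sorted(toy_links.items()):
    print(gene, term, weight)
```

The nested loop is quadratic in genes times terms; an inverted article-to-term index would scale better, but the simple form keeps the idea visible.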

Approach to construct the literature network (part two). Step Two: gene-to-gene co-citation (co-mentioned, co-occurrence). [Diagram labels: Articles; Gene A; Gene B; Index; Biological relation; Global approach; Network; Extension and Expansion]
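The co-citation step can be sketched as counting, for every pair of genes, how many articles mention both. This is an illustrative sketch only: `cocitation_network`, the article IDs, and the gene lists are hypothetical, and the slides do not say how edges are weighted.

```python
from collections import Counter
from itertools import combinations


def cocitation_network(article_genes):
    """Count, for each unordered gene pair, the articles that mention both genes."""
    edges = Counter()
    for genes in article_genes.values():
        # sorted(set(...)) gives each pair once per article, in a canonical order
        for a, b in combinations(sorted(set(genes)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy index: article ID -> gene symbols found in that article
toy_articles = {
    "pmid1": ["VEGF", "FLT1", "KDR"],
    "pmid2": ["VEGF", "KDR"],
    "pmid3": ["TP53"],
}

network = cocitation_network(toy_articles)
for pair, weight in sorted(network.items()):
    print(pair, weight)
```

An article that mentions a single gene contributes no edge, which is why `TP53` does not appear in the toy network.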

Linking gene-gene, gene-term, and term-term relations. [Diagram: Term 1 (Angiogenesis) and Term 2 (Metastasis) connected to Gene 1, Gene 2, Gene 3, Gene 4 and Gene 5]

Research design, step by step: mapping/matching symbols to genes; filtering procedure; gene-article index; term-article index (MeSH, Gene Ontology TM); gene-gene network; gene-term network; PubGene database; gene network browser (Internet). PubGene TM Gene Database and Tools

Automated indexing of named human genes. Gene nomenclature database (13712): HUGO (9722), LocusLink (2729), GENATLAS (1239), GDB (358). Primary symbol; gene name; alternative symbol (142)
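One way to picture dictionary-based indexing is a lookup from every primary or alternative symbol to its primary symbol, applied to the tokens of an article text. This is a hedged sketch: the nomenclature entries, the function names, and the exact case-sensitive token matching are assumptions for illustration, not the matching and filtering rules the authors used.

```python
import re


def build_lookup(nomenclature):
    """Map every primary or alternative symbol to its primary symbol."""
    lookup = {}
    for primary, aliases in nomenclature.items():
        for symbol in [primary, *aliases]:
            lookup[symbol] = primary
    return lookup


def find_genes(text, lookup):
    """Return the primary symbols whose names occur as whole tokens in the text."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    return {lookup[t] for t in tokens if t in lookup}


# Hypothetical nomenclature: primary symbol -> alternative symbols
toy_nomenclature = {"VEGFA": ["VEGF", "VPF"], "TP53": ["P53"]}
toy_lookup = build_lookup(toy_nomenclature)

hits_exact = find_genes("VPF binds TP53 in vitro", toy_lookup)
hits_case = find_genes("VEGF and p53 expression in tumors", toy_lookup)
print(hits_exact, hits_case)
```

Because matching here is case-sensitive, the lowercase `p53` in the second text is missed, which illustrates how case variation in synonyms can leave gene pairs under-represented.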

Contribution to the gene-to-article index over time: the total number of gene occurrences. MEDLINE records from before 1975 do not contain abstracts. More articles from 1999 and 2000 were expected to be included in MEDLINE.

Distribution of genes with respect to the number of articles found to be relevant. Distribution of genes with respect to the number of gene neighbors. The histograms show ‘smoothed’ values. The distribution of genes by article ref. is almost exponentially decreasing. Genes tended to be mentioned in triplets almost as much as for the ref.

Types of gene relationships found in PubGene
• To examine over-represented or incorrectly assigned relationships (40%) (29%)
• Symbols belonging to more than one gene
• Very general symbols coinciding with general acronyms
• Very short gene names

Comparison of PubGene with manually curated databases
• To examine the under-represented gene pairs (51%) (45%)
• DIP -> C(171,2); OMIM -> C(6404,2)? 8643?
• DIP: “Number of actual links” -> “Number of genes”; OMIM: “Number of genes” -> “Number of actual links”
• “Number of actual links” -> “PubGene” -> “Number of actual links found in PubGene”
• “Number of possible links” -> “PubGene” -> “Number of all links found in PubGene”
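The notation C(171,2) on this slide presumably counts the unordered gene pairs possible among 171 genes, and the comparison asks what fraction of curated pairs also appears in PubGene. The sketch below works through both quantities; the function names and the toy pair lists are hypothetical, and the percentages on the slide are not recomputed here.

```python
from math import comb


def possible_links(n_genes):
    """Number of unordered gene pairs among n_genes genes, i.e. C(n, 2)."""
    return comb(n_genes, 2)


def fraction_found(curated_pairs, network_pairs):
    """Fraction of curated gene pairs that also occur in the literature network."""
    curated = {frozenset(p) for p in curated_pairs}  # pair order is ignored
    network = {frozenset(p) for p in network_pairs}
    return len(curated & network) / len(curated)


print(possible_links(171))  # possible pairs among 171 genes

# Hypothetical toy data, not taken from DIP, OMIM or PubGene
toy_curated = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
toy_network = [("B", "A"), ("C", "D"), ("A", "E")]
toy_fraction = fraction_found(toy_curated, toy_network)
print(toy_fraction)
```

Storing each pair as a `frozenset` makes ("A", "B") and ("B", "A") count as the same link.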

Reasons for under-representation of DIP-derived gene pairs: (a) insufficient synonym lists; (b) synonym case variation; (c) complex gene families with immature or complex naming conventions

Summary of the verification against DIP and OMIM: the numbers of interactions in DIP and OMIM that are contained in PubGene show that PubGene captures a substantial amount of the existing biological information on protein-protein interactions and on gene mapping and disease.

Linking relations to expression profiles (microarray, proteomics, etc.): time series, expression levels, patterns, etc. [Diagram: Term 1 (Angiogenesis) and Term 2 (Metastasis) connected to Gene 1, Gene 2, Gene 3, Gene 4 and Gene 5]

Verify the applicability of the tools by analyzing two publicly available microarray data sets
• Discrimination analysis: literature associations highlight background knowledge for signature genes in patient sample data.
• Kinetic & mechanism study: detection of complex co-regulatory patterns between biologically related genes.

The “signature gene cluster” from unsupervised hierarchical clustering analysis (Nature. 403, ) Cell type Biological process

To explore the correlation between unsupervised clustering and the supervised PubGene approach (Nature. 403, ). 4062 clones -> 1032 symbols (PubGene) -> 50 (up/down regulated). (7+14)/50 = 42%. 6% -> (1302, 50) -> B-cell signature. 42/6 = 7 times higher than expected at random
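The arithmetic on this slide can be reproduced in a few lines. The numbers are copied from the slide; the variable names are illustrative, and what the two addends 7 and 14 stand for is not stated on the slide, so they are left uninterpreted.

```python
# Counts copied from the slide; the meaning of the addends 7 and 14 is not given there.
signature_hits = 7 + 14
selected_genes = 50  # up/down-regulated genes kept for the comparison

observed_fraction = signature_hits / selected_genes  # 21 / 50
random_fraction = 0.06  # the 6% random expectation quoted on the slide

fold_enrichment = observed_fraction / random_fraction
print(f"{observed_fraction:.0%} observed, about {fold_enrichment:.0f}x the random rate")
```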

Network of the genes in the GC-B signature
• GC-B signature -> 25 genes -> only 20 genes map to the network + the most important neighbors
• Underlying biological relationship between these genes
• Link signature genes to disease MeSH terms -> Fragile X, Angelman syndrome, lymphoma, leukaemia, …
• Link signature genes to Gene Ontology -> transcriptional regulator
• Translocation in lymphomas
• Immunoglobulin recombination

To visualize complex co-regulatory patterns of gene expression and simultaneously highlight biological relationships (from Science. 283, 83-87). 8613 clones -> 517 clones -> 340 genes + 1-hour expression level -> superimposed onto a sub-network of PubGene. [Figure labels: 1 hour; 8 hour; Transcription factors; Angiogenesis]

Rapid profiling of genes through the distribution of MeSH terms (from Science. 283, 83-87). MeSH indexing: the identification of strong associations between genes and biological processes. Linking the literature network to MeSH terms: ‘angiogenesis’ -> 10/12 (highest fraction). [Figure labels: 6 hour; 1 hour; MeSH index]
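Profiling a gene set by its MeSH term distribution can be sketched as computing, for each term, the fraction of genes in the set associated with it and reporting the term with the highest fraction (the slide quotes 10/12 for ‘angiogenesis’). The function name and the toy annotations below are hypothetical and do not reproduce the slide's data.

```python
from collections import Counter


def term_fractions(genes, gene_terms):
    """For each term, the fraction of the given genes associated with that term."""
    counts = Counter(
        term for gene in genes for term in set(gene_terms.get(gene, ()))
    )
    return {term: n / len(genes) for term, n in counts.items()}


# Hypothetical gene -> MeSH term annotations
toy_gene_terms = {
    "G1": ["Angiogenesis", "Cell Division"],
    "G2": ["Angiogenesis"],
    "G3": ["Apoptosis"],
    "G4": ["Angiogenesis"],
}

fractions = term_fractions(["G1", "G2", "G3", "G4"], toy_gene_terms)
top_term = max(fractions, key=fractions.get)
print(top_term, fractions[top_term])
```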

Summary
• With its indexing strategy (gene-gene and gene-term co-citation), rich and varied information content, and analytical flexibility, PubGene can incorporate more of the available biological knowledge for high-throughput gene expression analysis than any other analytical tool available.
• A web-based solution with multiple-query support offers end users literature information for microarray data from a global and systematic view.