A literature network of human genes for high-throughput analysis of gene expression. Speaker: Shih-Te Yang. Advisor: Ueng-Cheng Yang.


1 A literature network of human genes for high-throughput analysis of gene expression. Speaker: Shih-Te Yang. Advisor: Ueng-Cheng Yang. The Institute of Biochemistry, NYMU; Bioinformatics Program and Core Lab. Paper: Tor-Kristian Jenssen, Astrid Laegreid, Jan Komorowski & Eivind Hovig, Nature Genetics, Vol. 28, May 2001.

2 Goals for systems biology? (Cell 100(1):57–70, review, 2000; PNAS 95:14863–14868)

3 How to Find Biologically Significant Events Using Microarray Technology? • Fitting results to current knowledge • Sifting out variations

4 Mapping Gene Expression Data to KEGG Pathways

5 Linking Molecular Information to Phenotypes Can Provide Insights into Biological Processes. Pathways: metabolic, signal transduction, etc. Phenotypes: angiogenesis, metastasis.

6 Information Hidden in the Literature • Molecular functions • Protein-protein interactions • Protein-DNA (RNA) interactions • Phenotypic information • Physiological and pathological processes (e.g., angiogenesis, tumor metastasis) • Drug and chemical responses

7 No Efficient Way to Find Genes Related to Angiogenesis http://www3.ncbi.nlm.nih.gov/htbin-post/Entrez/query?db=0&form=1&term=angiogenesis

8 Strategies of Literature Mining • Keyword indexing (a single gene): protein annotation • Semantics (multiple genes): protein binding and interaction • Keyword co-occurrence (terms and genes): biomedical terms vs. genes → biological processes

9 Medicine and Related Subjects from MeSH, as Classified by the NLM: http://wwwcf.nlm.nih.gov/class/schedule.html

10 Gene Ontology (GO) Can Provide Links between Biological Processes and Genes

11 Approach to constructing the literature network (part one). Step one: genes and terms are co-associated through a common set of articles. [Figure: articles are indexed to genes (gene annotation) and to terms (MeSH, Gene Ontology™)]
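
Step one can be sketched as an inverted-index join: every gene and every term points to the set of articles it is indexed with, and a gene-term link is counted once per shared article. This is a minimal sketch under stated assumptions, not PubGene's actual code: the function name `gene_term_links`, the `min_shared` threshold and the toy gene/term/article IDs are all hypothetical.

```python
from collections import defaultdict

def gene_term_links(gene_index, term_index, min_shared=1):
    """Hypothetical sketch: link genes to terms that share indexed articles.

    gene_index / term_index map a gene symbol (or a MeSH/GO term) to the set
    of article IDs it is indexed with. The link weight is the number of
    shared articles; links below `min_shared` (an assumed threshold) are dropped.
    """
    # Invert the term index so each article lists the terms it carries.
    terms_by_article = defaultdict(set)
    for term, articles in term_index.items():
        for article in articles:
            terms_by_article[article].add(term)

    # Count, for each (gene, term) pair, the articles they have in common.
    links = defaultdict(int)
    for gene, articles in gene_index.items():
        for article in articles:
            for term in terms_by_article.get(article, ()):
                links[(gene, term)] += 1
    return {pair: n for pair, n in links.items() if n >= min_shared}
```

With toy data `{"VEGFA": {1, 2, 3}, "TP53": {3, 4}}` for genes and `{"Angiogenesis": {1, 2}, "Apoptosis": {4}}` for terms, this yields a weight of 2 for VEGFA-Angiogenesis and 1 for TP53-Apoptosis.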

12 Approach to constructing the literature network (part two). Step two: gene-to-gene co-citation (genes co-mentioned, or co-occurring, in the same articles). [Figure: articles indexed to both Gene A and Gene B suggest a biological relation; global approach: network extension and expansion]
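
Gene-to-gene co-citation can be sketched the same way: group genes by article, then count each unordered gene pair once per article in which both appear. The sketch below assumes the link weight is simply the co-citation count; the function name `cocitation_network` and the `min_articles` cut-off are illustrative assumptions, not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cocitation_network(gene_index, min_articles=1):
    """Hypothetical sketch: build a weighted gene-gene co-citation network.

    gene_index maps a gene symbol to the set of article IDs mentioning it.
    Each unordered pair (sorted alphabetically) is weighted by the number of
    articles that mention both genes.
    """
    # Invert the index: article ID -> genes mentioned in that article.
    genes_by_article = defaultdict(set)
    for gene, articles in gene_index.items():
        for article in articles:
            genes_by_article[article].add(gene)

    edges = Counter()
    for genes in genes_by_article.values():
        # Every pair of genes in the same article is one co-citation.
        for a, b in combinations(sorted(genes), 2):
            edges[(a, b)] += 1
    return {pair: n for pair, n in edges.items() if n >= min_articles}
```

Raising `min_articles` trades recall for precision: a single shared article is weak evidence, while repeated co-citation is more likely to reflect a real relationship.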

13 Linking gene-gene, gene-term, and term-term relations [Figure: Term 1 (Angiogenesis) and Term 2 (Metastasis) linked to Genes 1–5]

14 Research design, step by step: mapping/matching symbols to genes → filtering procedure → gene-article index and term-article index (MeSH, Gene Ontology™) → gene-gene network and gene-term network → PubGene database → gene network browser on the Internet. PubGene™ Gene Database and Tools: http://www.pubgene.org/

15 Automated indexing of named human genes. Gene nomenclature database (13,712 genes), compiled from HUGO (9,722), LocusLink (2,729), GENATLAS (1,239) and GDB (358). [Figure: counts by primary symbol, gene name and alternative symbol; values shown: 63, 352, 14,048, 13,570 (142)]
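
Dictionary-based indexing of this kind can be sketched as one regular expression built from every synonym (primary symbol, gene name, alternative symbol), matched only at word boundaries so that a symbol is not found inside a longer token. This is a minimal sketch, assuming case-sensitive matching and longest-synonym-first alternation; `build_matcher`, `index_abstract` and the gene IDs `gene1`/`gene2` are hypothetical, and PubGene's real matching rules may differ.

```python
import re

def build_matcher(nomenclature):
    """Hypothetical sketch. nomenclature: gene ID -> list of synonyms."""
    synonym_to_genes = {}
    for gene_id, synonyms in nomenclature.items():
        for synonym in synonyms:
            # A synonym shared by several genes maps to all of them.
            synonym_to_genes.setdefault(synonym, set()).add(gene_id)
    # Longest synonyms first, so a full gene name wins over a shorter symbol.
    alternatives = sorted(synonym_to_genes, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(map(re.escape, alternatives)) + r")(?!\w)"
    )
    return pattern, synonym_to_genes

def index_abstract(text, pattern, synonym_to_genes):
    """Return the set of gene IDs whose synonyms occur in `text`."""
    genes = set()
    for match in pattern.finditer(text):
        genes |= synonym_to_genes[match.group(1)]
    return genes
```

For example, with `{"gene1": ["TP53", "p53", "tumor protein p53"], "gene2": ["MAPK1", "ERK2"]}`, the text "ERK2 phosphorylates p53 in vitro." is indexed to both genes, while "p53x" matches nothing. Case-sensitive matching is one reason synonym case variation can cause missed pairs.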

16 Contribution to the gene-to-article index over time. The plot shows the total number of gene occurrences. MEDLINE records from before 1975 do not contain abstracts, and more articles from 1999 and 2000 were expected to be included in MEDLINE.

17 Distribution of genes by the number of articles found to be relevant, and by the number of gene neighbors. The histograms show smoothed values. The number of genes decreases almost exponentially with the number of referring articles, and the distribution by number of gene neighbors follows almost the same trend.
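
Both histograms on this slide are frequency counts over the network: how many genes have k articles, and how many genes have k neighbors. A minimal sketch, assuming the gene-article index and co-citation edges are held as plain dictionaries keyed as in the sketches for the earlier steps; the function names are hypothetical, and no smoothing is applied.

```python
from collections import Counter

def article_distribution(gene_index):
    """Hypothetical sketch: number of articles -> number of genes with that count."""
    return Counter(len(articles) for articles in gene_index.values())

def neighbor_distribution(edges):
    """Hypothetical sketch: number of neighbors -> number of genes with that degree.

    `edges` is an iterable of (gene_a, gene_b) pairs (e.g. the keys of a
    co-citation edge dict); each pair adds one neighbor to both genes.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())
```

Plotting either counter on a log-scaled y-axis is a quick way to check whether the decrease is roughly exponential (an approximately straight line).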

18 Types of gene relationships found in PubGene. Examining over-represented or incorrectly assigned relationships (figure values: 40%, 29%). Sources of error: • Symbols belonging to more than one gene • Very general symbols that coincide with common acronyms • Very short gene names
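
The three error sources on this slide suggest simple synonym filters that could run before indexing. The sketch below is an illustration under explicit assumptions: the `common_acronyms` list, the `min_length=3` cut-off, the check order and the function name `filter_synonyms` are all hypothetical, not PubGene's published filtering rules.

```python
def filter_synonyms(synonym_to_genes, common_acronyms, min_length=3):
    """Hypothetical sketch: drop synonyms likely to cause false gene links.

    synonym_to_genes: synonym -> set of gene IDs it names.
    common_acronyms: assumed set of upper-case general acronyms/words.
    Returns (kept, dropped), where dropped maps each synonym to a reason.
    """
    kept, dropped = {}, {}
    for synonym, genes in synonym_to_genes.items():
        if len(genes) > 1:
            dropped[synonym] = "ambiguous"        # belongs to several genes
        elif synonym.upper() in common_acronyms:
            dropped[synonym] = "general acronym"  # e.g. collides with a common word
        elif len(synonym) < min_length:
            dropped[synonym] = "too short"        # very short names match too often
        else:
            kept[synonym] = genes
    return kept, dropped
```

Dropping a synonym removes false positives but also loses every true mention made only through that synonym, so such filters shift errors from over-representation toward under-representation.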

19 Comparison of PubGene with manually curated databases, to examine under-represented gene pairs (figure values: 51%, 45%). DIP: number of actual links → number of genes, giving C(171,2) possible pairs. OMIM: number of genes → number of actual links, giving C(6404,2) possible pairs (8643?). The actual links are looked up in PubGene to give the number of actual links found in PubGene; the possible links are looked up in PubGene to give the number of all links found in PubGene.
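
The comparison contrasts two ratios over the same gene set: the share of curated pairs that PubGene also links, and the share of all C(n,2) possible pairs that PubGene links. A large gap between the two indicates that co-citation is strongly enriched for known relationships. This is a minimal sketch with toy data; `coverage` is a hypothetical helper, and pairs are treated as unordered.

```python
from math import comb

def coverage(curated_pairs, pubgene_pairs, genes):
    """Hypothetical sketch: compare PubGene links with a curated database.

    Returns (recall, background):
      recall     = curated pairs also linked in PubGene / curated pairs
      background = PubGene links among `genes` / C(len(genes), 2)
    """
    def normalize(pairs):
        # Unordered pairs restricted to the gene set under comparison.
        return {frozenset(p) for p in pairs if len(set(p)) == 2 and set(p) <= genes}

    curated = normalize(curated_pairs)
    found = normalize(pubgene_pairs)
    recall = len(curated & found) / len(curated)
    background = len(found) / comb(len(genes), 2)
    return recall, background
```

For the DIP gene set on this slide the denominator of the background ratio is C(171,2) = 171 × 170 / 2 = 14,535 possible pairs.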

20 Reasons for under-representation of DIP-derived gene pairs: (a) insufficient synonym lists (b) case variation in synonyms (c) complex gene families with immature or complex naming conventions

21 Summary of the DIP and OMIM verification. The numbers of DIP and OMIM interactions contained in PubGene show that PubGene captures a substantial amount of the existing biological information on protein-protein interactions and on gene mapping and disease.

22 Linking relations to expression profiles (microarray, proteomics, etc.) [Figure: Term 1 (Angiogenesis) and Term 2 (Metastasis) linked to Genes 1–5, overlaid with time series, expression levels, patterns, etc.]

23 Verifying the applicability of the tools by analyzing two publicly available microarray data sets • Discrimination analysis: literature associations highlight background knowledge for signature genes in patient sample data. • Kinetic and mechanism study: detection of complex co-regulatory patterns between biologically related genes.

24 The "signature gene clusters" from unsupervised hierarchical clustering analysis (Nature 403, 503–511), grouped by cell type and biological process

25 Exploring the correlation between unsupervised clustering and the supervised PubGene approach (Nature 403, 503–511). 4,062 clones → 1,032 symbols in PubGene → 50 up/down-regulated genes. B-cell signature: (7 + 14)/50 = 42%, versus 6% at random (1302, 50), i.e. 42/6 = 7-fold higher than random.
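
The enrichment figure on this slide is a ratio of two proportions: the observed share of the 50 selected genes that fall into the B-cell signature, divided by the share expected at random (6%, as stated on the slide). Written out:

```latex
\text{observed fraction} = \frac{7 + 14}{50} = \frac{21}{50} = 0.42 = 42\%,
\qquad
\text{fold enrichment} = \frac{42\%}{6\%} = 7
```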

26 Network of the genes in the GC-B signature. GC-B signature: 25 genes → only 20 map to the network; these are shown with their most important neighbors, revealing the underlying biological relationships between the genes. • Signature genes linked to disease MeSH terms: Fragile X, Angelman syndrome, lymphoma, leukaemia, … • Signature genes linked to Gene Ontology: transcriptional regulator • Translocation in lymphomas • Immunoglobulin recombination

27 Visualizing complex co-regulatory patterns of gene expression while highlighting biological relationships (from Science 283, 83–87). 8,613 clones → 517 clones → 340 genes; their 1-hour expression levels are superimposed onto a sub-network of PubGene. [Figure panels: 1 hour, 8 hour; transcription factors; angiogenesis]
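
Superimposing expression data on the literature network amounts to selecting the responsive genes, keeping only the co-citation edges among them, and annotating each node with its expression value. A minimal sketch, assuming log2 expression ratios and a hypothetical |log2 ratio| ≥ 1 cut-off; the function name `expression_subnetwork` and the toy values are illustrative, not from the paper.

```python
def expression_subnetwork(edges, expression, min_abs_log2=1.0):
    """Hypothetical sketch: restrict a co-citation network to responsive genes.

    edges: (gene_a, gene_b) -> co-citation weight.
    expression: gene -> log2 expression ratio at one time point (e.g. 1 hour).
    Returns (nodes, sub_edges); nodes maps gene -> (direction, log2 ratio).
    """
    responsive = {g for g, v in expression.items() if abs(v) >= min_abs_log2}
    # Keep only edges whose two endpoints both passed the expression filter.
    sub_edges = {pair: w for pair, w in edges.items() if set(pair) <= responsive}
    nodes = {
        gene: ("up" if expression[gene] > 0 else "down", expression[gene])
        for pair in sub_edges
        for gene in pair
    }
    return nodes, sub_edges
```

The node annotations can then drive colour or size in a network viewer, so co-regulated genes that are also related in the literature stand out as connected, similarly coloured clusters.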

28 Rapid profiling of genes through the distribution of MeSH terms (from Science 283, 83–87). MeSH indexing identifies strong associations between genes and biological processes. Linking the literature network to the MeSH term 'angiogenesis': 10/12 (the highest fraction). [Figure panels: 1 hour, 6 hour; MeSH index]
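
Profiling a gene cluster by its MeSH terms can be sketched as counting, for each term, how many genes in the cluster are linked to it and ranking terms by that fraction (the slide's 10/12 for 'angiogenesis' is such a fraction). This sketch assumes gene-term links are already available as sets per gene; `term_profile` and the toy data are hypothetical.

```python
from collections import Counter

def term_profile(genes, gene_terms):
    """Hypothetical sketch: rank MeSH terms by the fraction of cluster genes linked to them.

    genes: list of genes in the cluster.
    gene_terms: gene -> set of linked MeSH terms (sets, so no double counting).
    Returns a list of (term, count, fraction), highest fraction first.
    """
    counts = Counter(term for gene in genes for term in gene_terms.get(gene, ()))
    n = len(genes)
    return sorted(
        ((term, count, count / n) for term, count in counts.items()),
        key=lambda row: (-row[2], row[0]),  # by fraction, then alphabetically
    )
```

For a four-gene toy cluster in which three genes link to Angiogenesis and one to Apoptosis, the profile is `[("Angiogenesis", 3, 0.75), ("Apoptosis", 1, 0.25)]`.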

29 Summary • With its indexing strategy (gene-gene and gene-term co-citation), rich and varied information content, and analytical flexibility, PubGene can incorporate more of the available biological knowledge into high-throughput gene expression analysis than any other analytical tool available. • A web-based solution supporting multiple queries can give end users a global, systematic view of the literature information relevant to their microarray data.

