Mining the Yeast Genome
Alvis Brazma and Alan Robinson
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

ABSTRACT
The first genome-scale data on gene expression have recently started to become available, in addition to complete genome sequence data and annotations. For instance, DeRisi et al. (Science, Vol. 278, 1997) measured the relative changes in the expression levels of almost all yeast genes during the diauxic shift, at seven time points at 2-hour intervals. The amount of such data will increase rapidly. This challenges researchers to find ways of transforming the data into knowledge, while opening new possibilities for purely in silico studies of various aspects of genome functioning. We have used the publicly available data on the diauxic shift to study some aspects of yeast metabolism and gene regulation. A shorter-term goal is to explore ways to relate gene expression profiles during the diauxic shift to specific functional classes or specific regulation mechanisms.

To pursue the stated goals we used several approaches in parallel: visualisation, to look for correlations between gene functional classes and their expression levels at different time points; and decision trees, to find rules predicting gene functional classes from their expression levels at the various time points. We used Decisionhouse, a general-purpose data mining and visualisation tool developed by Quadstone Ltd.

Gene expression profiles
Expression profiles of 250 randomly selected genes from over [...]. The horizontal axis depicts the time points of the measurements. The 250 ORF names are given along the axis perpendicular to the plane, while the height and the colour of the glyphs depict the base-2 logarithm of the expression rate change.
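The expression profiles are plotted as base-2 logarithms of the expression change. A minimal sketch of that transformation in Python, assuming each gene's profile is stored as a list of expression ratios, one per time point; the function name and the numbers are hypothetical and are not taken from the DeRisi et al. data:

```python
import math


def log2_profile(ratios):
    """Return the base-2 logarithm of each expression ratio in a profile."""
    return [math.log2(r) for r in ratios]


# Hypothetical ratios for one gene at the seven time points (illustrative only).
example_ratios = [1.0, 1.0, 2.0, 2.0, 4.0, 8.0, 0.5]
example_profile = log2_profile(example_ratios)
print(example_profile)
```

On this scale an unchanged gene sits at 0, and a doubling and a halving have the same magnitude with opposite signs, which is why induced and repressed genes appear symmetric in such plots.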
Discussion

Decision tree for “respiration” genes
The decision tree for discriminating the respiration genes from the other genes with an assigned function. The top node contains all 3347 such genes, 64 of which are respiration genes. The tree provides a rule that selects 88 genes, 22 of which are respiration genes; that is, it recovers more than one third of the respiration genes with 25% accuracy. Applied to the 2731 as yet unclassified genes, the rule selects 61 genes. A naïve prediction based on this is that each of these 61 genes has a respiration function with 25% probability.

In conclusion, although the gene expression data that we used are only the first such data publicly available on a genomic scale, purely in silico studies have already revealed new facts about the genome. This should encourage the belief that, as more high-quality gene expression data become available, in silico discoveries regarding gene regulation will become a reality. To facilitate this process, a public gene expression database should be established. Such a database would not only help in developing gene expression data analysis tools and methods, but would also allow one to compare data obtained by different technologies, to evaluate their reliability, and to establish "gold" standards for gene expression measurements. We would like to encourage the community to support an initiative to establish such a database.

Average expression
Average expression level for genes from various energy subclasses.

Total distribution
The distribution of gene counts for different expression levels at time points 1-7. The height of the glyphs represents the number of genes in the respective bins. The glyphs are coloured according to the expression levels of the respective genes at time point 7.

Average expression at the 7 time points. Note the drop in the expression levels at time points 4 and 5.

Expression profiles of all the genes from the 5 largest energy subclasses, with the rest of the energy genes merged into one subclass. The vertical axis depicts the energy subclasses: tricarboxylic-acid pathway, respiration, reserves, others, glycolysis, and fermentation. The horizontal axis shows the time points, and the axis perpendicular to the plane shows the individual genes. The size and the colour of the glyphs depict the ratio of the increase or decrease in gene expression. Note that, as expected, all the respiration genes increase their expression level, while most of the fermentation genes decrease theirs. Contrary to expectation, several “fermentation” genes increase their expression level at the last time point. All these genes have been annotated on the basis of sequence similarity.
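The figures quoted for the respiration decision tree can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the Discussion (3347 genes with assigned function, 64 respiration genes, 88 genes selected by the rule, 22 of them respiration genes, 61 unclassified genes selected). The function and variable names are hypothetical, and the baseline and enrichment values are derived here for comparison; they are not stated in the poster.

```python
def rule_stats(total, positives, selected, true_positives):
    """Summarise a classification rule from its raw counts."""
    precision = true_positives / selected   # share of selected genes that are respiration genes
    recall = true_positives / positives     # share of respiration genes the rule recovers
    baseline = positives / total            # share of respiration genes among all classified genes
    return precision, recall, baseline


# Counts quoted in the Discussion.
precision, recall, baseline = rule_stats(
    total=3347, positives=64, selected=88, true_positives=22
)

# Naive assumption from the text: the same precision holds for the
# 61 unclassified genes selected by the rule.
expected_respiration = 61 * precision

print(f"precision: {precision:.2%}")
print(f"recall:    {recall:.2%}")
print(f"baseline:  {baseline:.2%}")
print(f"enrichment over baseline: {precision / baseline:.1f}x")
print(f"expected respiration genes among the 61: {expected_respiration:.2f}")
```

The 25% figure is therefore the fraction of the 88 selected genes that are respiration genes, and "more than one third" is the fraction of the 64 respiration genes that the rule recovers.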