CAVEAT 1 MICROARRAY EXPERIMENTS ARE EXPENSIVE AND COMPLICATED. MICROARRAY EXPERIMENTS ARE THE STARTING POINT FOR RESEARCH. MICROARRAY EXPERIMENTS CANNOT.

Microarray statistical validation and functional annotation
Presentation transcript:

CAVEAT 1 MICROARRAY EXPERIMENTS ARE EXPENSIVE AND COMPLICATED. MICROARRAY EXPERIMENTS ARE THE STARTING POINT FOR RESEARCH. MICROARRAY EXPERIMENTS CANNOT BE THE FINAL GOAL OF A PROJECT.

CAVEAT 2 LISTS OF GENES DON’T GIVE BIOLOGICAL ANSWERS. STATISTICS CAN BE COMPLETELY DETACHED FROM BIOLOGY. THE AMOUNT OF RESULTS IS ALWAYS BIGGER THAN OUR IMAGINATION.

CAVEAT 3 WITH MICROARRAYS WE OBSERVE ONLY THE TRANSCRIPTOME. WE CAN ONLY BUILD HYPOTHESES ABOUT THE GENOME AND THE PROTEOME.

CAREFUL AND EXTENSIVE ANNOTATION OF THE RESULTS IS NEEDED.

Dai M, et al Nucleic Acids Res Nov 10;33(20):e175. PMID:

THE PROBLEM OF ANNOTATION
WHO: WHO ARE THEY?
WHAT: WHAT DO THEY DO?
WHERE: WHERE ARE THEY AND WHERE DO THEY WORK?
WHEN: WHEN DO THEY WORK?
HOW: HOW DO THEY WORK?

WHO: WE NEED TO GET ALL POSSIBLE INFORMATION ON THE GENES WE GET FROM MICROARRAYS. AVAILABLE TOOLS: Gene (formerly LocusLink), OMIM, PubMed.

WHAT: THE FUNCTION OF MANY GENES IS ALREADY KNOWN. AVAILABLE TOOLS: KEGG, Gene Ontology (Biological Process, Molecular Function), OMIM, PubMed.

WHERE (ON THE GENOME): LOCATING THE GENES ON THE GENOME IS VERY IMPORTANT IN MANY SITUATIONS (e.g. a portion of a chromosome is strongly affected under a certain clinical condition; genes close to each other can be regulated by the same mechanisms). AVAILABLE TOOLS: NCBI Genome, Ensembl.
WHERE (IN THE CELL): WHERE IN THE CELL DO THE PRODUCTS OF THE GENES OPERATE? AVAILABLE TOOLS: KEGG, Gene Ontology (Cellular Component), PubMed.

WHEN: UNDER WHICH CONDITIONS DOES THE EXPRESSION OF A GIVEN GENE CHANGE? AVAILABLE TOOLS: PubMed, GEO.

HOW: HOW DO GENES WORK? AVAILABLE TOOLS: PubMed, OMIM, Gene, Gene Ontology.

THE SOCIAL LIFE OF THE GENES
DIFFERENT SOCIAL DIMENSIONS:
DNA LEVEL (GENOMIC POSITION)
RNA LEVEL (RNA PROCESSING)
PROTEIN LEVEL (INTERACTION OF PROTEINS)

Diverse Biological Roles
Consider a population of genes representing a diverse set of biological roles or themes (shown as different colors in the original slide figure).

Many algorithms can be applied to expression data to partition genes based on expression profiles over multiple conditions. Many of these techniques work solely on expression data and disregard biological information.

Consider a particular cluster. What are some of the predominant biological themes represented in the cluster, and how should significance be assigned to a discovered biological theme?

Example: Population size: 40 genes. Cluster size: 12 genes. 10 genes (shown in green in the original slide) have a common biological theme, and 8 of them occur within the cluster.

Consider the Outcome
The frequency of the theme in the population is 10/40 = 25%. The frequency of the theme within the cluster is 8/12 = 67%. In addition, 80% of the genes related to the theme in the population (8 of 10) ended up within the relatively small cluster.
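The three percentages can be reproduced with a few lines of Python. This is only a sketch: the function name `theme_frequencies` and its argument names are illustrative and do not come from the slides; the counts are those of the example.

```python
from fractions import Fraction

def theme_frequencies(population_size, cluster_size, theme_size, theme_in_cluster):
    """Return (theme frequency in population, theme frequency in cluster,
    fraction of the theme's genes that landed in the cluster)."""
    freq_population = Fraction(theme_size, population_size)
    freq_cluster = Fraction(theme_in_cluster, cluster_size)
    captured = Fraction(theme_in_cluster, theme_size)
    return freq_population, freq_cluster, captured

# Counts from the example: 40 genes, cluster of 12, theme of 10, overlap of 8.
freq_pop, freq_clu, captured = theme_frequencies(40, 12, 10, 8)
print(f"population: {float(freq_pop):.0%}")
print(f"cluster:    {float(freq_clu):.0%}")
print(f"captured:   {float(captured):.0%}")
```

Exact fractions are used so that 8/12 is kept as 2/3 until it is formatted for display.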

Contingency Matrix
A 2x2 contingency matrix is typically used to capture the relationship between cluster membership and membership in a biological theme.

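Contingency Matrix (counts follow from the example: 40 genes, cluster of 12, theme of 10, 8 in both)

                 Theme: in   Theme: out   Total
Cluster: in          8           4          12
Cluster: out         2          26          28
Total               10          30          40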
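As a sketch of how the four cells follow from the example counts, the matrix can be derived from the population size, cluster size, theme size, and overlap. The helper `contingency_matrix` and the row/column orientation (rows = cluster in/out, columns = theme in/out) are assumptions made for illustration.

```python
def contingency_matrix(population_size, cluster_size, theme_size, theme_in_cluster):
    """Build the 2x2 table [[a, b], [c, d]].

    Rows: in cluster / outside cluster. Columns: in theme / not in theme.
    """
    a = theme_in_cluster                    # in cluster, in theme
    b = cluster_size - theme_in_cluster     # in cluster, not in theme
    c = theme_size - theme_in_cluster       # outside cluster, in theme
    d = population_size - cluster_size - c  # outside cluster, not in theme
    if min(a, b, c, d) < 0:
        raise ValueError("counts are inconsistent with each other")
    return [[a, b], [c, d]]

# Counts from the example: 40 genes, cluster of 12, theme of 10, overlap of 8.
matrix = contingency_matrix(40, 12, 10, 8)
print(matrix)
```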

Assigning Significance to the Findings
Fisher’s Exact Test lets us determine whether there is a non-random association between the two variables: expression-based cluster membership and membership in a particular biological theme. For the 2x2 contingency matrix of the example (8 of the 10 theme genes inside the 12-gene cluster, out of 40 genes), p ≈ .0002.

Hypergeometric Distribution

    a      b    |  a+b
    c      d    |  c+d
   -------------
   a+c    b+d   |   n

The probability of any particular matrix occurring by random selection, given no association between the two variables, is given by the hypergeometric rule: p = (a+b)! (c+d)! (a+c)! (b+d)! / (n! a! b! c! d!), where n = a+b+c+d.
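A minimal sketch of the hypergeometric rule for a single matrix, written with binomial coefficients (which is algebraically the same as the factorial form). The function name `matrix_probability` is illustrative, and the cell values 8, 4, 2, 26 are the ones implied by the 40-gene example.

```python
from math import comb

def matrix_probability(a, b, c, d):
    """Probability of one specific 2x2 table with fixed row and column totals,
    assuming no association between the two variables."""
    n = a + b + c + d
    # Choose a of the (a+b) first-row items and c of the (c+d) second-row items
    # to fill the first column, out of all ways to pick (a+c) items from n.
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

# Probability of observing exactly 8 theme genes in the 12-gene cluster.
p_exact = matrix_probability(8, 4, 2, 26)
print(f"{p_exact:.3e}")
```

Using `math.comb` keeps the arithmetic in exact integers until the final division, which avoids overflow from large factorials.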

Probability Computation
For our matrix (8 theme genes among the 12 cluster genes), we are interested not only in the probability of getting exactly 8 annotation hits in the cluster but in the probability of having 8 or more hits. In this case the probabilities of each of the possible matrices (8, 9, and 10 hits) are summed: approximately 2.2 x 10^-4 + 7.3 x 10^-6 + 7.8 x 10^-8 ≈ .0002.
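The summation over "8 or more hits" can be sketched as a one-sided tail of the hypergeometric distribution. The function name `enrichment_p_value` and its parameters are hypothetical; this is an illustration under the example's counts, not the implementation used by any particular tool. If SciPy is available, `scipy.stats.fisher_exact` with `alternative="greater"` should give the corresponding one-sided test on the 2x2 table.

```python
from math import comb

def enrichment_p_value(population_size, cluster_size, theme_size, theme_in_cluster):
    """One-sided p-value: probability of seeing theme_in_cluster or more
    theme genes in a randomly drawn cluster of the given size."""
    max_hits = min(cluster_size, theme_size)
    total = comb(population_size, cluster_size)
    tail = sum(
        comb(theme_size, k) * comb(population_size - theme_size, cluster_size - k)
        for k in range(theme_in_cluster, max_hits + 1)
    )
    return tail / total

# Example counts: 40 genes, cluster of 12, theme of 10, 8 theme genes in the cluster.
p_value = enrichment_p_value(40, 12, 10, 8)
print(f"p = {p_value:.4f}")
```

Summing integer counts first and dividing once at the end keeps the result free of accumulated floating-point error.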