

Gene Expression and Networks

2 Microarray Analysis
Supervised methods
- Analysis of variance
- Discriminant analysis
- Support Vector Machine (SVM)
Unsupervised methods
- Partition methods: K-means, SOM (Self-Organizing Maps)
- Hierarchical clustering

3 Support Vector Machine (SVM)
As applied to gene expression data, an SVM would begin with a set of genes that have a common function, for example, genes coding for components of the proteasome. In addition, a separate set of genes that are known not to be members of the functional class is specified. These two sets of genes are combined to form a training set of positive and negative examples. Using this training set, an SVM would learn to discriminate between the members and non-members of a given functional class based on expression data. Having learned the expression features of the class, the SVM could recognize new genes as members or as non-members of the class based on their expression data.
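The training idea on this slide can be sketched with a tiny linear soft-margin SVM fitted by subgradient descent on the hinge loss. This is an illustrative stand-in rather than the procedure of any particular SVM package, and the expression profiles, the learning rate `lr`, and the regularisation strength `lam` are all hypothetical values chosen for the example.

```python
# Minimal linear soft-margin SVM trained by subgradient descent on the hinge loss.
# All expression profiles below are hypothetical values invented for illustration.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(profiles, labels, lr=0.01, lam=0.01, epochs=100):
    """Return (w, b) for the decision function sign(w.x + b)."""
    w = [0.0] * len(profiles[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(profiles, labels):
            # Regularisation step: shrink the weights slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge-loss step: only examples inside the margin move the boundary.
            if y * (dot(w, x) + b) < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, profile):
    """+1 = predicted member of the functional class, -1 = predicted non-member."""
    return 1 if dot(w, profile) + b >= 0 else -1

# Training set: +1 = known members of the class, -1 = known non-members.
train_profiles = [
    [2.0, 1.8, 2.2],
    [-2.1, -1.9, -2.0],
    [1.9, 2.1, 2.0],
    [-1.8, -2.2, -2.0],
]
train_labels = [1, -1, 1, -1]

w, b = train_linear_svm(train_profiles, train_labels)

# Classify genes that were not part of the training set.
new_member_like = predict(w, b, [1.5, 1.7, 1.6])
new_nonmember_like = predict(w, b, [-1.6, -1.4, -1.5])
```

In practice a library implementation with kernel support would normally be used; the sketch only shows how labelled positive and negative profiles lead to a decision rule for new genes.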

4 How do SVMs work?
Knowing the label of each example, the SVM tries to separate all training examples correctly while maximizing the margin, i.e. the distance between the separating hyperplane and the nearest points of each class. If this is not possible in the input space, it searches for a separating hyperplane in a higher-dimensional space (the kernel approach).
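A small sketch of why moving to a higher-dimensional space helps, assuming a hypothetical one-dimensional data set and a hand-picked feature map `phi(x) = (x, x^2)`. The points cannot be split by any single cut-off on the line, but a hyperplane separates them after the mapping. The `kernel` function computes the same dot product without building the mapped vectors.

```python
# Hypothetical 1-D example: the +1 class lies on both sides of the -1 class.

def phi(x):
    """Explicit feature map from the 1-D input space to a 2-D feature space."""
    return (x, x * x)

def kernel(x, z):
    """Equals phi(x) . phi(z), computed directly from the inputs."""
    return x * z + (x * x) * (z * z)

def separable_by_threshold(xs, ys):
    """True if a single cut-off on the 1-D line splits the two classes."""
    cuts = sorted(xs)
    candidates = [cuts[0] - 1.0] + [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
    for t in candidates:
        for sign in (1, -1):
            if all((1 if sign * (x - t) > 0 else -1) == y for x, y in zip(xs, ys)):
                return True
    return False

xs = [-2.0, -0.5, 0.5, 2.0]
ys = [1, -1, -1, 1]

linear_in_input_space = separable_by_threshold(xs, ys)

# In feature space, the hyperplane w . phi(x) + b = 0 with w = (0, 1), b = -2
# (i.e. x^2 = 2) puts each class on its own side.
w, b = (0.0, 1.0), -2.0
linear_in_feature_space = all(
    (1 if w[0] * phi(x)[0] + w[1] * phi(x)[1] + b > 0 else -1) == y
    for x, y in zip(xs, ys)
)
```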

5 Clustering
Grouping genes together according to their expression profiles.
Hierarchical clustering: generate a tree
– Each gene is a leaf on the tree
– Distances reflect similarity of expression
– Internal nodes represent functional groups
– Similar approach to phylogenetic trees
k-means clustering: generate k groups
– Number k is chosen in advance
– Each group represents similar expression
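The k-means half of this slide can be sketched as follows. It is a minimal version assuming squared Euclidean distance between profiles and a deterministic start from the first k profiles (real implementations usually randomise the start). The four expression profiles are hypothetical.

```python
# Minimal k-means on expression profiles (hypothetical data).

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(profiles, k, max_iter=100):
    """Return (labels, centroids); k is chosen in advance by the caller."""
    centroids = [list(p) for p in profiles[:k]]  # assumption: simple deterministic init
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

profiles = [
    [1.0, 2.0, 3.0],   # rising over the time course
    [3.0, 2.0, 1.0],   # falling over the time course
    [1.1, 2.1, 2.9],   # rising
    [2.9, 1.9, 1.2],   # falling
]
labels, centroids = kmeans(profiles, k=2)
```

Each resulting group collects genes with similar expression; the choice of k and of the distance measure both affect the grouping.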

6 Hierarchical Clustering Example
Five separate clusters are indicated by colored bars and by identical coloring of the corresponding region of the dendrogram. The sequence-verified named genes in these clusters contain multiple genes involved in (A) cholesterol biosynthesis, (B) the cell cycle, (C) the immediate-early response, (D) signaling and angiogenesis, and (E) wound healing and tissue remodeling. These clusters also contain named genes not involved in these processes and numerous uncharacterized genes.

7 Expression Correlation
Causes of similar expression between genes
– One gene controls the other in a pathway
– Both genes are controlled by another
– Both genes relate to same time in cell cycle
– Both genes have similar function
Clusters can help identify regulatory motifs
– Search for motifs in upstream promoter regions of all the genes in a cluster
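One common way to quantify "similar expression" is the Pearson correlation between two expression profiles. The sketch below assumes that measure and a hypothetical cut-off of 0.9; the gene names and values are invented for illustration. A high correlation says nothing about which of the causes listed above is responsible.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlated_pairs(expression, threshold=0.9):
    """Gene pairs whose profiles correlate at or above the threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(expression), 2)
        if pearson(expression[g1], expression[g2]) >= threshold
    ]

# Hypothetical expression levels over four time points.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 7.8],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}
pairs = correlated_pairs(expression)
```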

8 Probe Selection
Probe on DNA chip is shorter than target
– Choice of which section to hybridize
Select a region which is unstructured
– RNA folding, DNA stem-and-loop
Choose region which is target-specific
– Avoid cross-hybridization with other DNA
Avoid regions containing variation
– Minimize presence of SNP sites

9 Probe Design
Two main factors to optimize
Sensitivity
– Strength of interaction with target sequence
– Requires knowledge of target only
Specificity
– Weakness of interaction with other sequences
– Requires knowledge of ‘background’

10 Measuring Sensitivity
Basic measure: best gapless alignment of entire probe against part of target sequence
Target: AGTGCAAGTCCGATATGCCGTAATGCTATCA
Probe: CTACACGA
[Figure: the probe placed at several offsets along the target, each scored as mismatches plus matches: -2+6=+4, -7+1=-6, -6+2=-4, -8]
Better: +3 for C–G, +2 for A–T, etc.
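The basic measure can be sketched by sliding the probe along the target and scoring each offset. The sketch assumes that a "match" means a complementary base pair compared position by position, which reproduces the +4 on the slide (two mismatches and six matches against the target stretch GATATGCC). The mismatch penalty of -1 in the weighted variant is an assumption, since the slide only gives the match weights.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def pair_score(probe_base, target_base, weights=None):
    """+1 (or a weighted bonus) for a complementary pair, -1 for a mismatch."""
    if COMPLEMENT[probe_base] == target_base:
        if weights is None:
            return 1
        return weights[frozenset((probe_base, target_base))]
    return -1  # assumption: the same penalty is used in the weighted scheme

def gapless_scores(probe, target, weights=None):
    """Score of the whole probe at every offset along the target, without gaps."""
    n = len(probe)
    return [
        sum(pair_score(p, t, weights) for p, t in zip(probe, target[off:off + n]))
        for off in range(len(target) - n + 1)
    ]

def sensitivity(probe, target, weights=None):
    """Best gapless score and the offset where it occurs."""
    scores = gapless_scores(probe, target, weights)
    best = max(scores)
    return best, scores.index(best)

TARGET = "AGTGCAAGTCCGATATGCCGTAATGCTATCA"
PROBE = "CTACACGA"
# Weighted scheme from the slide: +3 for C-G pairs, +2 for A-T pairs.
WEIGHTS = {frozenset("CG"): 3, frozenset("AT"): 2}

simple_scores = gapless_scores(PROBE, TARGET)
weighted_scores = gapless_scores(PROBE, TARGET, WEIGHTS)
best_score, best_offset = sensitivity(PROBE, TARGET)
```

The weighted scheme reflects that C–G pairs bind more strongly than A–T pairs, so two placements with the same number of matches can differ in score.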

11 Measuring Specificity
Calculate sensitivity scores
– For target and all background sequences
Convert to hybridization probabilities
– Based on binding energy, thermodynamics
Calculate expected hybridizations
– Gene abundance × hybridization probability
Calculate proportion of good hybridizations
– Target hybridizations ÷ total hybridizations
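The four steps can be strung together as below. The slide does not give the formula that turns a score into a hybridization probability, so the logistic mapping in `hybridization_probability` and its `temperature` parameter are placeholders, not a thermodynamic model. The abundances and scores are hypothetical.

```python
import math

def hybridization_probability(score, temperature=1.0):
    """Placeholder score-to-probability mapping (assumption: logistic curve)."""
    return 1.0 / (1.0 + math.exp(-score / temperature))

def expected_hybridizations(abundance, score):
    """Gene abundance x hybridization probability."""
    return abundance * hybridization_probability(score)

def specificity(target, background):
    """Target hybridizations / total hybridizations.

    `target` is an (abundance, score) pair; `background` is a list of such pairs.
    """
    good = expected_hybridizations(*target)
    total = good + sum(expected_hybridizations(a, s) for a, s in background)
    return good / total

# Hypothetical probe: strong binding to its target, weak binding to two
# background sequences, one of which is ten times more abundant.
target = (1.0, 4)
background = [(1.0, -4), (10.0, -6)]
probe_specificity = specificity(target, background)
```

Abundance matters as much as binding strength here: a weakly binding but very abundant background sequence can still account for a sizeable share of the signal.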

12 Sources of Inaccuracy
Some sequences bind better than others
– Cross-hybridization, A–T versus G–C
Scanning of microarray images
– Scratches, smears, cell spillage
Effects of experimental conditions
– Point in cell cycle, temperature, density

13 Gene Expression Databases and Resources on the Web
– GEO (Gene Expression Omnibus)
– List of gene expression web resources
– Another list with literature references
– Cancer Genome Anatomy Project
– Stanford Microarray Database

14 Functional Genomics
The task is to define the function of a gene (or its protein) in the life processes of the organism, where function refers to the role it plays in a larger context.

15 Levels of Function
Gene function
– Gene → mRNA → protein → reaction
Pathways
– Gene → protein → gene → protein
Networks
– Interaction between multiple pathways
Organism
– End result of many networks

16 Cellular Processes
The cell is a dynamic entity
– Grows, divides, responds to environmental changes
Cellular processes are composed of molecular interactions
Example: yeast cell cycle

17 Representing Genetic Networks
Entity: gene, protein, ligand
Relationship: enhances, represses, becomes
Enabler: energy source, catalyst
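One possible data structure for this representation is a directed graph whose edges carry the relationship and an optional enabler. The class and field names below are hypothetical, as are the example entities `gene_x`, `protein_x` and `gene_y`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

ENTITY_KINDS = {"gene", "protein", "ligand"}
RELATIONSHIPS = {"enhances", "represses", "becomes"}

@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # one of ENTITY_KINDS

@dataclass(frozen=True)
class Interaction:
    source: Entity
    relationship: str              # one of RELATIONSHIPS
    target: Entity
    enabler: Optional[str] = None  # e.g. an energy source or a catalyst

class GeneticNetwork:
    """Directed graph: entities are nodes, interactions are labelled edges."""

    def __init__(self):
        self._outgoing = defaultdict(list)

    def add(self, interaction):
        if interaction.relationship not in RELATIONSHIPS:
            raise ValueError(f"unknown relationship: {interaction.relationship}")
        for entity in (interaction.source, interaction.target):
            if entity.kind not in ENTITY_KINDS:
                raise ValueError(f"unknown entity kind: {entity.kind}")
        self._outgoing[interaction.source].append(interaction)

    def targets(self, source, relationship):
        """Entities that `source` acts on through the given relationship."""
        return [
            i.target
            for i in self._outgoing.get(source, [])
            if i.relationship == relationship
        ]

# Hypothetical three-node example.
gene_x = Entity("gene_x", "gene")
protein_x = Entity("protein_x", "protein")
gene_y = Entity("gene_y", "gene")

network = GeneticNetwork()
network.add(Interaction(gene_x, "becomes", protein_x, enabler="energy source"))
network.add(Interaction(protein_x, "represses", gene_y))

repressed_by_protein_x = network.targets(protein_x, "represses")
```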

18 Metabolic Network

19 Regulatory Network

20 A network of interactions can be built for all proteins in an organism
Example: a large network of 8184 interactions among 4140 S. cerevisiae proteins
[Figure: interaction data as a table of protein pairs (columns P1, P2); proteins shown include Gal4, Gal80, Ste12, Dig2, Swi4, Swi6, …]

21 Learning Networks (1)
Measure direct interactions
– DNA footprinting
– One-hybrid, two-hybrid experiments
– Accurate but low throughput

22 Learning Networks (2)
Expression levels with microarrays
– Examine expression correlations
– Problem: multiple interpretations
– High throughput but only suggestive

23 Learning Networks (3)
Literature mining
– Scan existing scientific literature
– Problems: no standard sentence structure, diverse nomenclature, limited historically
– Shows promise but many false positives
Protein microarrays
– Same as DNA microarrays but for proteins
– Huge potential but not ready yet

24 Other Resources
– BioCyc
– Biomolecular Interaction Network Database
– ‘What is There’ Interaction Database
– Gene Ontology Consortium