

Gene Expression and Networks

2 Microarray Analysis
Unsupervised methods
-Partition methods: K-means, SOM (Self-Organizing Maps)
-Hierarchical clustering
Supervised methods
-Analysis of variance
-Discriminant analysis
-Support Vector Machine (SVM)

3 Clustering
Grouping genes together according to their expression profiles.
Hierarchical clustering: generate a tree
–Each gene is a leaf on the tree
–Distances reflect similarity of expression
–Internal nodes represent functional groups
–Similar approach to phylogenetic trees
k-means clustering: generate k groups
–Number k is chosen in advance
–Each group represents similar expression
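The k-means idea above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the gene names and expression values are made up, Euclidean distance is an assumed similarity measure, and the initial centroids are passed in explicitly so the result is deterministic.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_profile(profiles):
    """Component-wise mean of a list of profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def k_means(profiles, initial_centroids, max_iter=100):
    """Assign each profile to its nearest centroid, then recompute centroids,
    repeating until the assignment stops changing."""
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(len(centroids)):
            members = [p for p, a in zip(profiles, assignment) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean_profile(members)
    return assignment, centroids

# Hypothetical expression profiles (4 conditions per gene), for illustration only.
genes = {
    "geneA": [2.0, 2.1, 0.1, 0.0],
    "geneB": [1.9, 2.2, 0.0, 0.2],
    "geneC": [0.1, 0.0, 2.0, 2.1],
    "geneD": [0.0, 0.2, 1.8, 2.2],
}
names = list(genes)
profiles = [genes[n] for n in names]

# k = 2 is chosen in advance; the first and third profiles seed the centroids.
labels, centroids = k_means(profiles, initial_centroids=[profiles[0], profiles[2]])
clusters = dict(zip(names, labels))
print(clusters)
```

In practice the initial centroids are usually chosen at random and the run is repeated several times, since k-means can converge to different groupings from different starting points.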

4 Hierarchical Clustering Example
Five separate clusters are indicated by colored bars and by identical coloring of the corresponding region of the dendrogram. The sequence-verified named genes in these clusters contain multiple genes involved in (A) cholesterol biosynthesis, (B) the cell cycle, (C) the immediate-early response, (D) signaling and angiogenesis, and (E) wound healing and tissue remodeling. These clusters also contain named genes not involved in these processes and numerous uncharacterized genes.
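A dendrogram of this kind is typically built bottom-up, by repeatedly merging the two closest clusters. The sketch below assumes Euclidean distance and average linkage (the slides do not say which distance or linkage was used) and runs on made-up profiles; the tree is returned as nested tuples with genes as leaves.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    dists = [
        euclidean(profiles[g1], profiles[g2])
        for g1 in c1["members"]
        for g2 in c2["members"]
    ]
    return sum(dists) / len(dists)

def hierarchical_clustering(profiles):
    """Agglomerative clustering: start with one cluster per gene and merge
    the closest pair until a single tree remains."""
    clusters = [{"tree": name, "members": [name]} for name in profiles]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], profiles),
        )
        merged = {
            "tree": (clusters[i]["tree"], clusters[j]["tree"]),
            "members": clusters[i]["members"] + clusters[j]["members"],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]["tree"]

# Hypothetical expression profiles, for illustration only.
profiles = {
    "geneA": [2.0, 2.1, 0.1, 0.0],
    "geneB": [1.9, 2.2, 0.0, 0.2],
    "geneC": [0.1, 0.0, 2.0, 2.1],
    "geneD": [0.0, 0.2, 1.8, 2.2],
}
tree = hierarchical_clustering(profiles)
print(tree)
```

Each internal tuple corresponds to an internal node of the dendrogram; cutting the tree at a chosen depth yields clusters such as the five highlighted in the example.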

5 Expression Correlation
Similar expression between genes can have several explanations
–One gene controls the other in a pathway
–Both genes are controlled by another
–Both genes are required at the same time in the cell cycle
–Both genes have similar function
Clusters can help identify regulatory motifs
–Search for motifs in upstream promoter regions of all the genes in a cluster
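One common way to quantify "similar expression" is the Pearson correlation between two profiles. The slides do not name a specific measure, so this is an assumed choice, and the gene names and values below are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def most_correlated(query, profiles):
    """Rank all other genes by their correlation with the query gene."""
    scores = [
        (name, pearson(profiles[query], profile))
        for name, profile in profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical expression profiles over four conditions.
profiles = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.0, 6.0, 8.0],   # rises with gene1
    "gene3": [4.0, 3.0, 2.0, 1.0],   # falls as gene1 rises
}
ranking = most_correlated("gene1", profiles)
print(ranking)
```

A high correlation does not say which of the explanations listed above applies; it only flags the pair for follow-up, for example a motif search in the promoters of the correlated genes.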

6 Support Vector Machine (SVM)
As applied to gene expression data, an SVM would begin with a set of genes that have a common function, for example, genes coding for components of the proteasome (positive set). In addition, a separate set of genes that are known not to be members of the functional class (negative set) is specified. Using this training set, an SVM would learn to discriminate between the members and non-members of a given functional class based on expression data. Having learned the expression features of the class, the SVM could recognize new genes as members or as non-members of the class based on their expression data.
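The train-then-classify workflow described above can be sketched with a very small linear SVM. This is a simplified illustration under several assumptions: the expression values are invented, the hyperplane passes through the origin (no bias term), and training uses plain sub-gradient descent on the regularized hinge loss rather than the solver of any particular SVM library.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(examples, labels, lam=0.01, lr=0.1, epochs=50):
    """Sub-gradient descent on (lam/2)*||w||^2 + hinge loss, without a bias term."""
    w = [0.0] * len(examples[0])
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            if y * dot(w, x) < 1:
                # Example is inside the margin: move w toward classifying it.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
            else:
                # Example is safely classified: only apply regularization.
                w = [wi - lr * lam * wi for wi in w]
    return w

def predict(w, x):
    """Return +1 (member of the functional class) or -1 (non-member)."""
    return 1 if dot(w, x) >= 0 else -1

# Hypothetical expression profiles (3 conditions); values are made up.
positive_set = [[2.0, 1.8, 2.2], [1.5, 2.2, 1.9], [2.5, 2.0, 1.7]]
negative_set = [[-1.0, -1.2, -0.9], [-1.5, -0.8, -1.1], [-2.0, -1.0, -1.4]]

examples = positive_set + negative_set
labels = [1] * len(positive_set) + [-1] * len(negative_set)
w = train_linear_svm(examples, labels)

# Classify genes of unknown function from their expression data.
print(predict(w, [1.8, 1.9, 2.0]))
print(predict(w, [-1.2, -1.1, -1.0]))
```

Real expression data sets have many more conditions and are rarely this cleanly separated, which is where soft margins and kernels become relevant.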

7 How do SVMs work?
Knowing the label of each example, the SVM tries to separate all training examples correctly while maximizing the distance between the points of each class. If this is not possible in the input space, it searches for a separating hyperplane in a higher-dimensional space (using a kernel).
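A toy example can show why moving to a higher-dimensional space helps. This is not an SVM trainer: the data, the feature map `phi`, and the hand-picked hyperplane are all assumptions chosen for illustration. One-dimensional points whose class depends on distance from zero cannot be split by a single threshold, but become linearly separable after mapping x to (x, x^2).

```python
def phi(x):
    """Hypothetical feature map from 1-D input space to 2-D feature space."""
    return (x, x * x)

def kernel(x, z):
    """Kernel equal to the dot product phi(x) . phi(z), computed without
    building the feature vectors explicitly."""
    return x * z + (x * x) * (z * z)

def separable_by_threshold(xs, ys):
    """True if a single threshold on x puts each class on its own side."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)

def classify_in_feature_space(x, w=(0.0, 1.0), b=-1.0):
    """Hand-picked hyperplane in feature space: x^2 - 1 = 0."""
    f = phi(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else -1

# Toy data: class +1 lies far from zero, class -1 lies near zero.
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
ys = [1, 1, -1, -1, -1, 1, 1]

print(separable_by_threshold(xs, ys))
predictions = [classify_in_feature_space(x) for x in xs]
print(predictions == ys)
```

The `kernel` function illustrates the usual shortcut: an SVM only needs dot products between examples, so it can work in the higher-dimensional space without ever computing `phi` directly.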

8 Probe Selection
Probe on DNA chip is shorter than target
–Choice of which section to hybridize
Select a region which is unstructured
–RNA folding, DNA stem-and-loop
Choose region which is target-specific
–Avoid cross-hybridization with other DNA
Avoid regions containing variation
–Minimize presence of SNP sites
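Two of the criteria above, avoiding SNP sites and avoiding cross-hybridization, can be sketched as a simple filter over candidate windows of the target. The sequences, SNP position, and function name are hypothetical, and the cross-hybridization test is deliberately crude (an exact substring match against background sequences on the same strand). Secondary-structure screening is not attempted here.

```python
def candidate_probes(target, probe_len, snp_positions, background):
    """Return (start, window) for every window of the target that contains
    no known SNP position and does not occur verbatim in the background."""
    probes = []
    for start in range(len(target) - probe_len + 1):
        window = target[start:start + probe_len]
        covers_snp = any(start <= pos < start + probe_len for pos in snp_positions)
        cross_hybridizes = any(window in seq for seq in background)
        if not covers_snp and not cross_hybridizes:
            probes.append((start, window))
    return probes

# Hypothetical target, SNP position (0-based), and background sequence.
target = "ACGTTGCAAGCT"
snp_positions = {2}
background = ["TTTGCAAGTT"]

probes = candidate_probes(target, probe_len=6,
                          snp_positions=snp_positions, background=background)
print(probes)
```

A realistic pipeline would replace the substring test with an alignment-based search and add a folding prediction step for the structure criterion.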

9 Probe Design
Two main factors to optimize
Sensitivity
–Strength of interaction with target sequence
–Requires knowledge of target only
Specificity
–Weakness of interaction with other sequences
–Requires knowledge of ‘background’

10 Sensitivity
Basic measure: best gapless alignment of entire probe against part of target sequence
Target: AGTGCAAGTCCGATATGCCGTAATGCTATCA
Probe: CTACACGA
Scores at different probe positions: -2+6=+4, -7+1=-6, -6+2=-4, -8
Better: +3 for C–G, +2 for A–T, etc.
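The gapless alignment measure can be sketched as a sliding-window score. Several details are assumptions, since the slide does not spell them out: a "match" is taken to mean a complementary base pair at the same position, the basic scheme scores +1 per pair and -1 per mismatch, and the weighted scheme keeps a -1 mismatch penalty alongside the +3 (C–G) and +2 (A–T) rewards.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def unit_score(p, t):
    """+1 if probe base p pairs with target base t, otherwise -1."""
    return 1 if COMPLEMENT[p] == t else -1

def weighted_score(p, t):
    """+3 for a C-G pair, +2 for an A-T pair, -1 for a mismatch (assumed penalty)."""
    if COMPLEMENT[p] != t:
        return -1
    return 3 if p in "CG" else 2

def best_gapless_alignment(probe, target, pair_score=unit_score):
    """Slide the whole probe along the target without gaps and return
    (best_score, offset) for the highest-scoring placement."""
    best = None
    for offset in range(len(target) - len(probe) + 1):
        # zip stops at the end of the probe, so only the aligned window is scored.
        score = sum(pair_score(p, t) for p, t in zip(probe, target[offset:]))
        if best is None or score > best[0]:
            best = (score, offset)
    return best

# Small checkable example: probe "GC" pairs perfectly with "CG" at offset 1.
print(best_gapless_alignment("GC", "ACGT"))
print(best_gapless_alignment("GC", "ACGT", pair_score=weighted_score))

# The sequences from the slide can be scored the same way.
slide_target = "AGTGCAAGTCCGATATGCCGTAATGCTATCA"
slide_probe = "CTACACGA"
print(best_gapless_alignment(slide_probe, slide_target))
```

The weighted scheme reflects that C–G pairs bind more strongly than A–T pairs, so two probes with the same number of matches can differ in sensitivity.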

11 Selectivity
E-value: can be calculated by running a BLAST search of the probe against the genome studied in the specific experiment.

12 Sources of Inaccuracy
Some sequences bind better than others
–Cross-hybridization, A–T versus G–C
Scanning of microarray images
–Scratches, smears, cell spillage
Effects of experimental conditions
–Point in cell cycle, temperature, density

13 Gene Expression Databases and Resources on the Web
GEO (Gene Expression Omnibus)
List of gene expression web resources
Another list with literature references
Cancer Genome Anatomy Project
Stanford Microarray Database

14 Functional Genomics
The task is to define the function of a gene (or its protein) in the life processes of the organism, where function refers to the role it plays in a larger context.

15 GO (Gene Ontology)
The GO project aims to develop three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated
-molecular functions (F)
-biological processes (P)
-cellular components (C)
An ontology is a description of the concepts and relationships that can exist for an agent or a community of agents.

16 GO Annotations: RIM11
GO evidence and references
Molecular Function
-glycogen synthase kinase 3 activity (ISS)
-protein serine/threonine kinase activity (IDA)
Biological Process
-protein amino acid phosphorylation (IGI, ISS)
-proteolysis (IGI)
-response to stress (IGI, IMP)
-sporulation (sensu Fungi) (IMP)
Cellular Component
-cytoplasm (IDA)
Extracted from SGD (Saccharomyces Genome Database)

17 Cellular Processes
The cell is a dynamic entity
–Grows, divides, responds to environmental changes
Cellular processes are composed of molecular interactions
Yeast cell cycle

18 Different cellular processes can be represented as graphs
-Genetic networks
-Metabolic pathways
-Regulatory networks
-Protein-protein interaction networks

19 Representing Genetic Networks
Entity: gene, protein, ligand
Relationship: enhances, represses, becomes
Enabler: energy source, catalyst
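One possible way to hold this representation in code is a small set of record types, with entities as nodes, relationships as labeled directed edges, and enablers attached to the relationship they support. The class names, field names, and the example genes below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # "gene", "protein", or "ligand"

@dataclass(frozen=True)
class Relationship:
    source: Entity
    target: Entity
    kind: str             # "enhances", "represses", or "becomes"
    enablers: tuple = ()  # e.g. ("energy source",) or ("catalyst",)

class GeneticNetwork:
    """Directed graph whose edges are labeled relationships between entities."""

    def __init__(self):
        self.relationships = []

    def add(self, source, target, kind, enablers=()):
        self.relationships.append(Relationship(source, target, kind, tuple(enablers)))

    def targets_of(self, entity, kind=None):
        """Entities that `entity` acts on, optionally filtered by relationship kind."""
        return [
            r.target
            for r in self.relationships
            if r.source == entity and (kind is None or r.kind == kind)
        ]

# Hypothetical example: geneX is expressed as proteinX, which represses geneY.
gene_x = Entity("geneX", "gene")
protein_x = Entity("proteinX", "protein")
gene_y = Entity("geneY", "gene")

net = GeneticNetwork()
net.add(gene_x, protein_x, "becomes", enablers=["energy source"])
net.add(protein_x, gene_y, "represses")

print(net.targets_of(gene_x))
print(net.targets_of(protein_x, kind="represses"))
```

Keeping the relationship kind on the edge, rather than in separate graphs, makes it easy to ask mixed questions such as "what does this protein repress?" from a single structure.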

20 Metabolic pathways

21 Regulatory Network

22 Network Motifs
Connected patterns of interactions that recur in the integrated cellular network statistically significantly more often than expected at random
Analysis of transcription regulation networks

23 Analysis of transcription regulation networks
Motif types (Shen-Orr S. et al., 2002)
-Feed-forward loop
-Single input module
-Dense regulons
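As an illustration of what searching for one of these motifs involves, the sketch below enumerates feed-forward loops in a directed regulatory graph: a regulator X that controls Y, where X and Y both control Z. The edge list is hypothetical, and the code only counts occurrences. Deciding whether a motif is over-represented additionally requires comparing the count with randomized networks, which is not shown.

```python
def find_feed_forward_loops(edges):
    """Return all (x, y, z) with edges x->y, y->z and x->z, for distinct nodes."""
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)

    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                if z in x_targets and len({x, y, z}) == 3:
                    loops.append((x, y, z))
    return sorted(loops)

# Hypothetical regulatory edges (regulator, target).
edges = [
    ("X", "Y"),
    ("Y", "Z"),
    ("X", "Z"),
    ("X", "W"),
]
loops = find_feed_forward_loops(edges)
print(loops)
```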

24 A large network of 8184 interactions among 4140 S. cerevisiae proteins
A network of interactions can be built for all proteins in an organism
Interaction data (protein pairs P1, P2): Gal4, Gal80; Ste12, Dig2; Swi4, Swi6; …

25 High-throughput biological data is required for generating networks
Measure direct interactions
–DNA footprinting
–One-hybrid, two-hybrid experiments
–Accurate but low throughput

26 Networks generated from microarray data are less accurate
Expression levels with microarrays
–Examine expression correlations
–Problem: multiple interpretations
–High throughput but only suggestive
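A co-expression network of this kind can be sketched by linking every pair of genes whose profiles are strongly correlated. The use of Pearson correlation, the 0.9 threshold, and the profiles themselves are assumptions made for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(profiles, threshold=0.9):
    """Link two genes when the absolute correlation of their profiles
    reaches the threshold; the sign of r is kept on the edge."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges

# Hypothetical expression profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 1.0, 3.0],
}
edges = coexpression_edges(profiles)
for edge in edges:
    print(edge)
```

The resulting edges are undirected and carry no mechanism: a link may reflect direct regulation, a shared regulator, or simply similar timing, which is why such networks are only suggestive.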

27 Other Resources
BioCyc
Biomolecular Interaction Network Database
‘What is There’ Interaction Database
Gene Ontology Consortium