Analysis of Gene Networks and Signaling Pathways Based on Gene Expression and Proteome Data
Marek Kimmel
Rice University, Houston, TX, USA

Outline
Basics: gene expression vs. protein abundance.
Perceptron analysis of gene networks.
Proteomic analysis of FGF-2 signaling in breast cancer.

Now that we have the sequence of the Human Genome – What Next?

Clinical Sciences, Basic Sciences, Molecular Medicine, Structural Biology, Genomics, Proteomics, Bioinformatics

BCM-HGSC: genes make up only 3% of the genome (30,000 genes).

Measuring Gene Expression: Oligonucleotide Gene Microarrays (Affymetrix GeneChips™)
A Probe Pair consists of a Perfect Match (PM) and a Mismatch (MM) probe cell.
There are typically 20 Probe Pairs in a Probe Set.
A Probe Set usually corresponds to a single gene.
The Affymetrix 95A human GeneChip contains 12,626 Probe Sets.
Thus, there are roughly 500,000 Probe Cells on a GeneChip (12,626 × 20 × 2 = 505,040 at 20 pairs per set).
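
The probe-cell count on this slide is simple arithmetic, sketched below. The function and constant names are illustrative, and the calculation assumes every probe set has exactly 20 pairs, which the slide only describes as typical.

```python
PROBE_SETS = 12_626     # probe sets on the chip, as stated on the slide
PAIRS_PER_SET = 20      # "typically 20" probe pairs per probe set
CELLS_PER_PAIR = 2      # one PM cell plus one MM cell


def probe_cells(probe_sets, pairs_per_set=PAIRS_PER_SET, cells_per_pair=CELLS_PER_PAIR):
    """Number of probe cells if every probe set has the same number of pairs."""
    return probe_sets * pairs_per_set * cells_per_pair


total_cells = probe_cells(PROBE_SETS)
print(total_cells)  # 505040
```

Chips on which some probe sets have fewer pairs would give a somewhat smaller total.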

Oligonucleotide Gene Microarrays (Affymetrix GeneChips™)
Each probe is 25 nucleotides long.

mRNA Preparation GAATTCAGTAACCCAGGCATTATTTTATCCTCAAGTCTTAGGTTGGTTGGAGAAAGATAACAAAAAGAAACATGA TTGTGCAGAAACAGACAAACCTTTTTGGAAAGCATTTGAAAATGGCATTCCCCCTCCACAGTGTGTTCACAGTGT GGGCAAATTCACTGCTCTGTCGTACTTTCTGAAAATGAAGAACTGTTACACCAAGGTGAATTATTTATAAATTAT GTACTTGCCCAGAAGCGAACAGACTTTTACTATCATAAGAACCCTTCCTTGGTGTGCTCTTTATCTACAGAATCC AAGACCTTTCAAGAAAGGTCTTGGATTCTTTTCTTCAGGACACTAGGACATAAAGCCACCTTTTTATGATTTGTT GAAATTTCTCACTCCATCCCTTTTGCTGATGATCATGGGTCCTCAGAGGTCAGACTTGGTGTCCTTGGATAAAGA GCATGAAGCAACAGTGGCTGAACCAGAGTTGGAACCCAGATGCTCTTTCCACTAAGCATACAACTTTCCATTAGA TAACACCTCCCTCCCACCCCAACCAAGCAGCTCCAGTGCACCACTTTCTGGAGCATAAACATACCTTAACTTTAC AACTTGAGTGGCCTTGAATACTGTTCCTATCTGGAATGTGCTGTTCTCTT 5’ 3’ DNA Sequence for IL-8 GAATTCAGTAACCCAGGCATTATTT|TATCCTCAAGTCTTAGGTTGGTTGG|AGAAAGATAACAAAAAGAAACATGA| TTGTGCAGAAACAGACAAACCTTTT|TGGAAAGCATTTGAAAATGGCATTC|CCCCTCCACAGTGTGTTCACAGTGT| GGGCAAATTCACTGCTCTGTCGTAC|TTTCTGAAAATGAAGAACTGTTACA|CCAAGGTGAATTATTTATAAATTAT| GTACTTGCCCAGAAGCGAACAGACT|TTTACTATCATAAGAACCCTTCCTT|GGTGTGCTCTTTATCTACAGAATCC| AAGACCTTTCAAGAAAGGTCTTGGA|TTCTTTTCTTCAGGACACTAGGACA|TAAAGCCACCTTTTTATGATTTGTT| GAAATTTCTCACTCCATCCCTTTTG|CTGATGATCATGGGTCCTCAGAGGT|CAGACTTGGTGTCCTTGGATAAAGA| GCATGAAGCAACAGTGGCTGAACCA|GAGTTGGAACCCAGATGCTCTTTCC|ACTAAGCATACAACTTTCCATTAGA| TAACACCTCCCTCCCACCCCAACCA|AGCAGCTCCAGTGCACCACTTTCTG|GAGCATAAACATACCTTAACTTTAC| AACTTGAGTGGCCTTGAATACTGTT|CCTATCTGGAATGTGCTGTTCTCTT 5’ 3’ Chop into short pieces suitable for hybridizing to 25mers on GeneChip Attach chromophore, then inject onto the GeneChip
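
The chopping step shown on this slide can be sketched as splitting a sequence into consecutive 25-base pieces. This is only an idealized illustration: the helper name `chop` is hypothetical, only the first 75 bases of the IL-8 sequence from the slide are used, and real fragmentation does not cut at fixed positions.

```python
def chop(sequence, size=25):
    """Split a sequence into consecutive, non-overlapping pieces of `size` bases.

    The last piece may be shorter when the length is not a multiple of `size`.
    """
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


# First 75 bases of the IL-8 sequence shown on the slide.
prefix = (
    "GAATTCAGTAACCCAGGCATTATTT"
    "TATCCTCAAGTCTTAGGTTGGTTGG"
    "AGAAAGATAACAAAAAGAAACATGA"
)

pieces = chop(prefix)
print("|".join(pieces))
```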

Affymetrix Hybridization
[Figure: a column of Perfect Match (PM) probe cells carrying the 25-mer AGTCGGATTAAGCGCTATACGGTTC next to a column of Mismatch (MM) probe cells in which the central base is substituted (e.g. AGTCGGATTAAGAGCTATACGGTTC). The labeled target strand TCAGCCTAATTCGCGATATGCCAAG forms a duplex with the complementary PM probe ("Match") but not with the MM probe ("Mismatch!", marked X).]
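
The match/mismatch idea can be sketched in a few lines. The function names are hypothetical. The mismatch probe here is built by swapping the central (13th) base for its complement, which is the usual Affymetrix convention. The slide also shows other substitutions at that position. Strand orientation is ignored and the sequences are compared left to right, as they are drawn on the slide.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def paired_strand(probe):
    """Base-by-base Watson-Crick partner, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in probe)


def mismatch_probe(pm_probe):
    """Copy of the PM probe with its central base replaced by the complement."""
    mid = len(pm_probe) // 2
    return pm_probe[:mid] + COMPLEMENT[pm_probe[mid]] + pm_probe[mid + 1:]


def count_mispairs(probe, target):
    """Number of positions where probe and target are not complementary."""
    return sum(COMPLEMENT[p] != t for p, t in zip(probe, target))


pm = "AGTCGGATTAAGCGCTATACGGTTC"      # PM probe from the slide
target = "TCAGCCTAATTCGCGATATGCCAAG"  # labeled target from the slide
mm = mismatch_probe(pm)

print(count_mispairs(pm, target))  # 0
print(count_mispairs(mm, target))  # 1
```

A single central mispair is enough to destabilize a 25-base duplex noticeably, which is why the MM cell serves as a measure of non-specific binding.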

Probe Cell Intensities
Average Difference = $\sum (\mathrm{PM} - \mathrm{MM}) \,/\, (\text{Pairs in Average})$
[Figure: probe cell intensities; value shown: 1,662]
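
A minimal sketch of the Average Difference calculation follows. The intensity values are made up for illustration, and the sketch assumes every probe pair is included in the average, although "Pairs in Average" suggests that some pairs may be excluded in practice.

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs of one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equally long, non-empty PM and MM lists")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities for a probe set with four pairs.
pm_intensities = [2100.0, 1850.0, 2400.0, 1990.0]
mm_intensities = [400.0, 350.0, 620.0, 410.0]

print(average_difference(pm_intensities, mm_intensities))  # 1640.0
```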

Measuring Gene Expression: “Spotted DNA Microarrays”
Each spot is the cDNA for a specific gene.
RNA from the experimental sample is labeled with Cy5 red fluorescent dye.
RNA from the reference sample is labeled with Cy3 green fluorescent dye.
Fluorescence intensity ratios (Cy5/Cy3) are measured.
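
Cy5/Cy3 ratios are commonly analyzed on a log2 scale, so that a two-fold increase and a two-fold decrease are symmetric around zero. The sketch below assumes background-corrected, strictly positive intensities. The function name and the example values are illustrative and do not come from the slides.

```python
import math


def log2_ratio(cy5, cy3):
    """log2 of the experimental (Cy5) over reference (Cy3) intensity."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy5 / cy3)


print(log2_ratio(4000.0, 1000.0))  # 2.0, four-fold up in the experimental sample
print(log2_ratio(500.0, 2000.0))   # -2.0, four-fold down
```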

Where Do We Get the Data? (cDNA Gene Microarray)
Disease, pathogens, drugs, etc.
mRNA expressed in response to stimulus.
mRNA collected and hybridized onto microarray.
Microarray analyzed for spot intensities.
Gene co-expression patterns.

Method
Get mRNA samples from multiple conditions.
Hybridize to DNA microarrays.
Measure intensities.
Cluster.
Analyze results.
Design new experiment.
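
The slides do not say which clustering method is used. As a stand-in, the sketch below shows only the similarity step that many clustering methods build on: Pearson correlation between expression profiles, with pairs above a threshold reported as co-expressed. This is thresholded correlation, not a full clustering algorithm, and the gene names, profiles and threshold are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("constant profile has no defined correlation")
    return cov / (sd_x * sd_y)


def coexpressed_pairs(profiles, threshold=0.9):
    """Gene pairs whose profiles correlate at or above the threshold."""
    names = sorted(profiles)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if pearson(profiles[a], profiles[b]) >= threshold
    ]


# Hypothetical log-ratios across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

print(coexpressed_pairs(profiles))  # [('geneA', 'geneB')]
```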

Discrimination between samples
Green is “down”.
Red is “up”.
We can differentiate clearly between tumor and normal tissue.
Can we find differences between progressing and non-progressing tumors?

Problematic quality of data
Note the large dynamic range, the very large number of data points, and the limited information content.

Proteomics
Is to protein expression what genomics is to gene expression.
Due to variations such as post-translational modifications, there are many more proteins than genes.

Proteomics
Holds new promise for the future understanding of complex biological systems.
Post-translational modifications include:
  – Phosphorylation
  – Glycosylation
  – Oxidation
Many challenges remain, e.g. isolating, identifying, characterizing, and quantifying small amounts of a very large number of varieties of proteins.
Currently, we primarily use 2D gels and mass spectrometry.

Protein Separation Using 2D Gel Electrophoresis
Protein analysis uses a diseased or treated sample and a control sample. 2D gel electrophoresis is performed for each sample to separate proteins based on their molecular weight and charge.
Black marks on the gel images indicate a protein or cluster of proteins and are referred to as "features."
The x-axis is the isoelectric point (pI), the pH at which a protein carries no net charge, while the y-axis is molecular weight (Mw), or size.

Protein Separation

Protein Analysis
Gels are fixed and stained with a fluorescent dye, then scanned.
Expression levels are measured based on the size of each feature on the gel.
Provides information about those proteins which are up- and down-regulated, including how their abundance changed.

Protein Analysis

Protein Characterization
Proteins are excised from the gel and treated with a succession of enzymes that cut amino acid chains into short polypeptides about 5-10 amino acids in length.
The polypeptide fragments for each protein are then separated by capillary electrophoresis and analyzed using rapid-throughput mass spectrometry.
At this point, we know the amino acid sequence of the polypeptide fragments, their mass, as well as post-translational modifications that occurred, such as glycosylation and phosphorylation.
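
The enzymatic cutting step can be illustrated with an in-silico digest. The slide speaks only of "a succession of enzymes". The sketch assumes trypsin, which the Figure 2 caption later in the deck mentions, and uses the common rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function name and the example sequence are hypothetical.

```python
def tryptic_digest(protein):
    """Cut after every K or R that is not followed by P; return the peptides."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in ("K", "R") and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal piece without K/R at its end
    return peptides


# Made-up sequence in single-letter amino acid code.
print(tryptic_digest("MKWVTFISLLRPSAYSRGVFK"))
# ['MK', 'WVTFISLLRPSAYSR', 'GVFK']
```

The R followed by P in the example is deliberately left uncut, which shows why peptide lengths vary around the 5-10 residues quoted on the slide.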

Protein Characterization

Systems Biology
Consolidates genomics and proteomics differential expression data into a systematic description of pathways.
  – Signaling pathways.
  – Inflammatory response pathways.
  – Metabolic pathways.
  – Etc.
Potential for understanding the interrelationships between genes, proteins, and disease and identifying potential therapeutic targets.

Gene Expression vs. Protein Abundance
What exactly are we measuring?
What is the relationship between
  – “level of gene expression” and
  – “abundance of proteins”?

Dogma of Molecular Biology

Balance equations In the steady state, for a given gene i

Complicating Factors
For any gene, product (protein) abundance is not necessarily proportional to the relative expression level, even under “steady state”.
Products do not follow first-order elimination kinetics. Instead they enter into complicated interactions with each other and with external factors.
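
The balance equations themselves did not survive in this transcript. As a hedged illustration only, the following is the textbook first-order model that the caveats above argue against. All symbols ($m_i$ for mRNA level, $p_i$ for protein abundance, and the rate constants $\alpha_i, \beta_i, \gamma_i, \delta_i$) are assumptions introduced here, not the author's notation.

```latex
\begin{align}
  \frac{dm_i}{dt} &= \alpha_i - \beta_i m_i,
    && \text{(transcription minus mRNA decay)} \\
  \frac{dp_i}{dt} &= \gamma_i m_i - \delta_i p_i,
    && \text{(translation minus first-order elimination)} \\
  p_i^{*} &= \frac{\gamma_i}{\delta_i}\, m_i^{*}
           = \frac{\gamma_i \alpha_i}{\delta_i \beta_i}.
    && \text{(steady state)}
\end{align}
```

Under these assumptions, protein abundance is proportional to the mRNA level of the same gene, but with a gene-specific factor $\gamma_i / \delta_i$, so equal expression levels of two genes need not mean equal protein abundance. If elimination is not first-order, for example because products interact with each other, even this per-gene proportionality is lost.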

Application: Identification of Gene Networks
General ideas:
Level of expression of a gene affects levels of expression of other genes.
Only three levels possible:
  – Normal (0)
  – Over-expression (1)
  – Under-expression (-1)
Data: Arrays of perturbed expression levels in a set of genes.
Model: Perceptron (simplest neural net).

Reference: Kim et al. (2000) “General nonlinear framework for the analysis of gene interaction via multivariate expression arrays”, Journal of Biomedical Optics 5, 411–424.

Data table
Perceptron function:
  – g(.) is sigmoidal,
  – X’s and Y quantized to 3 levels
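
The perceptron formula itself is missing from this transcript. A minimal sketch consistent with the two bullet points (a sigmoidal g and three-level inputs and output) might look as follows. The use of tanh as the sigmoid, the cut-off of 0.5 for quantization, and the weights are assumptions made for illustration and are not taken from Kim et al.

```python
import math


def quantize(value, cut=0.5):
    """Map a continuous value to -1 (under), 0 (normal) or 1 (over)."""
    if value > cut:
        return 1
    if value < -cut:
        return -1
    return 0


def perceptron(x, weights, bias=0.0):
    """Ternary prediction of a target gene from ternary predictor genes."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return quantize(math.tanh(activation))  # tanh plays the role of g(.)


weights = [1.0, -1.0]  # hypothetical: gene 1 activates, gene 2 represses

print(perceptron([1, -1], weights))   # 1
print(perceptron([0, 0], weights))    # 0
print(perceptron([-1, 1], weights))   # -1
```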

Training: Estimating the coefficients a so that the coefficient of determination is maximized.
Of all possible dependencies, only those with a coefficient of determination above a threshold are retained.
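
One way to read the coefficient of determination is as the relative reduction in prediction error gained by using the predictor genes, compared with the best guess made without them. The sketch below follows that reading. The misclassification rate as error measure and the most frequent level as baseline are assumptions chosen for simplicity and may differ from the exact definition in Kim et al.

```python
from collections import Counter


def error_rate(predicted, observed):
    """Fraction of samples where the predicted level differs from the observed one."""
    return sum(p != o for p, o in zip(predicted, observed)) / len(observed)


def coefficient_of_determination(predicted, observed):
    """Relative error reduction versus always guessing the most frequent level."""
    baseline_value = Counter(observed).most_common(1)[0][0]
    baseline_error = error_rate([baseline_value] * len(observed), observed)
    if baseline_error == 0:
        return 0.0  # target never changes, so there is nothing to explain
    return (baseline_error - error_rate(predicted, observed)) / baseline_error


# Hypothetical ternary levels of a target gene over eight arrays.
observed = [0, 0, 0, 1, 1, -1, 0, 1]
predicted = [0, 0, 0, 1, 1, -1, 0, 0]  # output of some trained perceptron

print(coefficient_of_determination(predicted, observed))  # 0.75
```

A dependency would then be retained only if this value exceeds a chosen threshold.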

Application: FGF-2 Signaling Pathways and Breast Cancer
General ideas:
Use 2-D protein gels and mass spectrometry to measure abundance changes of proteins in cancer cells, relative to normal cells.
Use perturbed systems to draw conclusions on some specific signaling pathways.
Example: Signaling pathways of one of the fibroblast growth factors (FGF-2) in breast cancer.

Reference: Hondermarck et al. (2001) “Proteomics of breast cancer for marker discovery and signal pathway profiling”, Proteomics 1, 1216–1232.

Figure 2. Silver-stained 2-DE profile of MCF-7 breast cancer cells. The major proteins were determined by MALDI-TOF and MS/MS after trypsin digestion.

Figure 3. MALDI-TOF and MS/MS spectra obtained for HSP70. (A) MALDI-TOF and (B) MS/MS analysis of peak m/z was performed. The letters labeling the peaks are the single-letter code for the amino acids identified by MS/MS. Database searching allowed the identification of HSP70.

Figure 5. 2-D patterns showing the down-regulation of sigma (indicated by an arrow) in seven representative breast tumor samples (C–I).

Design of experiments
Previously depicted: “abundance proteomics”, which gives no clues as to how things work.
“Functional proteomics”:
  – Use perturbations of the hypothetical causal factor.
  – Measure not simply abundance but characteristics indicating, e.g.,
    – Synthesis rates
    – Activation

Figure 7. Changes of protein synthesis induced by FGF-2 stimulation in MCF-7 breast cancer cells. 35S-labeled proteins from unstimulated (A, C) or stimulated (B, D) MCF-7 cells were separated by 2-DE and the 2-D gels were subjected to autoradiography.

Credits
Bruce Luxon (UTMB, Galveston, TX)
George Weinstock (BCM, Houston, TX)
Guy de Maupassant [“three major virtues of a French writer: clarity, clarity, and clarity”]