Bio-Trac 40 (Protein Bioinformatics) October 9, 2008

Presentation transcript:

Functional Interpretation of Large-scale Omics Data through Pathway and Network Analysis Bio-Trac 40 (Protein Bioinformatics) October 9, 2008 Zhang-Zhi Hu, M.D. Research Associate Professor Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology Georgetown University Medical Center

Overview. Introduction: What are large-scale omics data? What do they tell you? How to interpret? Approaches: omics data integration; resources (databases and tools). Case studies. Systems biology: top-down, bottom-up; pathway, network modeling.

Bioinformatics focus is changing… Individual molecules DNA, RNA, proteins Sequence, structure, function Evolutionary analysis Population of molecules Genome, proteome and other “-omes” Interactions, complexes Pathways, processes High level organizations Genomics, Proteomics

From One Gene: multiple genetic variants, multiple transcripts, multiple protein products… and PTMs… This is at the single-gene level. Figure 1. Human CYP19 genetic polymorphisms. Schematic representation of the CYP19 gene structure. Arrows, locations of polymorphisms. Orange rectangles, open reading frame; light blue rectangles, UTRs. Red arrows, frequencies of >10%; dark blue arrows, frequencies from 1% to 10%; black arrows, polymorphisms with frequencies of <1%. AA, African-American subjects; CA, Caucasian-American subjects; HCA, Han Chinese-American subjects; MA, Mexican-American subjects. I/D, insertion/deletion event. The GT and TCT I/D polymorphisms and the variable number of tandem repeat (TTTA)n polymorphism, as well as amino acid changes resulting from nonsynonymous cSNPs, are also indicated. There were large ethnic variations in both allele frequencies and types, with 69 polymorphisms in African-American DNA, 37 in DNA samples from Caucasian-American subjects, 30 in Han Chinese-American subjects, and 44 in DNA from Mexican-American subjects.

To Global Knowledge: The “-ome” and “-omics” Genome Transcriptome Proteome Metabolome Other “-omes”: ORFeome Promoterome Interactome Receptome Phenome more… Functional proteomics mapping of a human signaling pathway. PMID: 15231748 Functional Proteomics: http://sysbio.harvard.edu/csb/macbeath/research/func_prot.html High throughput protein production for functional proteomics: http://cardiogenomics.med.harvard.edu/publication-detail?publication_id=1184

Gastric Cancer: Global analysis, Genes, Potential Gene Markers. Potential gene markers: SPARC, COL3A1, SULF1, YARS, ABCA5, THY1, SIDT2. Corresponding to ECM cluster (Chen et al., 2003; Qiu et al., 2007)

Identification of novel MAP kinase pathway signaling targets (PMA/TPA → K562 cells → MAPK pathway → targets). ~3500 spots; ~91 spot changes reproducible. Digest of U-24. Figure 1. Detection of PMA-Responsive Proteins in K562 Cells. Whole-cell extracts (200 µg) of non-treated proliferating K562 cells (A) or cells treated for 240 min with 10 nM PMA (B) were separated by 2D PAGE, and protein changes (indicated by red arrows) were detected using gel analysis software as described in Experimental Procedures. Arrows show proteins that (A) decrease in intensity following PMA treatment or (B) appear de novo or increase in intensity over the time course of treatment. Experiments were also run using extracts of cells treated for 20, 60, and 120 min and 24 hr (data not shown). Figure 4. Summary of Proteome Responses to PMA and MKK/ERK Signaling. Shown are protein spots that decrease (Control) or increase (PMA 240 min) in response to PMA treatment. Responses blocked by pretreatment with 20 µM U0126 or induced upon expression of CA-MKK1/2 are shown in their respective boxes. Overlap between these two boxes highlights proteins identified as MAPK pathway targets. Brackets surround proteins found to be identical by mass spectrometry. Arrows denote instances in which the spot pattern of a particular protein is shifted to decreased pI, indicating posttranslational modification. Asterisks indicate that unknowns U-26 through U-32 represent the same protein, although observed changes are not solely due to activation of the MAPK pathway. U-52 is an intermediately modified form of U-93 that increased by only 20% and thus was not scored as a spot change. Twenty-five targets of this signaling pathway were identified, of which only five were previously characterized as MKK/ERK effectors. The remaining targets suggest novel roles for this signaling cascade in cellular processes of nuclear transport, nucleotide excision repair, nucleosome assembly, membrane trafficking, and cytoskeletal regulation. Mol Cell. 6:1343-54, 2000. PMID: 11163208

Drosophila Embryo Interaction Map. Using Y2H technology, 102 bait proteins homologous to human cancer genes were screened; 2300 interactions were detected, 710 of high confidence. The proteins in the map that bear an RA (Ras Association) or RBD (Raf-like Ras-binding) domain define a discrete subnetwork around Ras-like GTPases (colored in yellow). The exploration of the present map leads to numerous biological hypotheses and expands our knowledge of regulatory protein networks important in human cancer, as shown by the biological analysis of a particularly interesting network surrounding the Ras oncogene. Protein interaction mapping: a Drosophila case study. The Drosophila (fruit fly) model system has been instrumental in our current understanding of human biology, development, and diseases. Here, we used a high-throughput yeast two-hybrid (Y2H)-based technology to screen 102 bait proteins from Drosophila melanogaster, most of them orthologous to human cancer-related and/or signaling proteins, against high-complexity fly cDNA libraries. More than 2300 protein-protein interactions (PPI) were identified, of which 710 are of high confidence. The computation of a reliability score for each protein-protein interaction and the systematic identification of the interacting domain combined with a prediction of structural/functional motifs allow the elaboration of known complexes and the identification of new ones. The full data set can be visualized using a graphical Web interface, the PIMRider (http://pim.hybrigenics.com), and is also accessible in the PSI standard Molecular Interaction data format. Our fly Protein Interaction Map (PIM) is surprisingly different from the one recently proposed by Giot et al., with little overlap between the two data sets. Analysis of the differences in data sets and methods suggests alternative strategies to enhance the accuracy and comprehensiveness of the post-genomic generation of broad-scale protein interaction maps. Genome Res. 15:376-84, 2005. PMID: 15710747

Strategy for Functional Analyses of Omics Data. Omics data (microarray, 2D, IP, MS, etc.) → protein mapping → data integration with bioinformatics databases (gene, protein, PPI, pathway, PTM, etc.) and literature (MEDLINE, via text mining) → functional annotation → functional analysis: GO profiling (molecular function, biological process, cellular component; ~50% GO annotations), biological pathways (e.g. KEGG, Reactome, PID, BioCarta; <10% pathway annotations), molecular networks (e.g. interaction, association) → biological insights: pathway, network, biomarker discovery

Methods for Functional Analysis Omics data integration Functional profiling Pathway analysis Resources/knowledgebases Molecular databases Omics data repositories Bioinformatics tools Open source: DAVID, FatiGO, iProXpress Commercial: Ingenuity, GeneGO Literature Text mining
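
Functional profiling of a gene or protein list is often framed as an over-representation question: does a category (a GO term or a pathway) appear in the list more often than expected from the background? The following is a minimal Python sketch of a one-sided hypergeometric test. The function name and all counts are illustrative assumptions; it is not presented as the exact statistic used by any of the tools named on this slide.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_category, n_list, n_hits):
    """Probability of seeing at least n_hits category members in a list of
    n_list items drawn without replacement from n_background items, of which
    n_category belong to the category."""
    max_hits = min(n_category, n_list)
    total = comb(n_background, n_list)
    tail = sum(
        comb(n_category, k) * comb(n_background - n_category, n_list - k)
        for k in range(n_hits, max_hits + 1)
    )
    return tail / total

if __name__ == "__main__":
    # Hypothetical counts: 20000 background genes, 200 in the pathway,
    # 100 genes in the experimental list, 8 of them in the pathway.
    print(hypergeom_enrichment_p(20000, 200, 100, 8))
```

A small p-value suggests the category is over-represented in the list; when many categories are tested, a multiple-testing correction would normally be applied on top of this.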

Principles of multi-omics data integration for Systems Biology Protein-Centric –Omics Analysis Transcriptomics mRNA microarray dbEST coding EST Protein precursor Signaling Pathways Splicing forms Functional Profiling and Analysis Biological Processes Function Sites Metabolic Pathways Enzyme1 Enzyme2 Protease/ Peptidase iProXpress Proteomics Protein Peptide Peptidomics Natural peptides DNA methylation profiling: coding genes Epigenomics dbSNP/ HapMap: NS-SNP Genomics Metabolomics Metabolites: HMDB

ID Mapping Batch gene/protein retrieval and profiling Enter ID, gi # Information matrix Functional profiling http://pir.georgetown.edu/pirwww/search/idmapping.shtml
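
The batch ID mapping step can be sketched as a lookup against a mapping table. The sketch below is a simplified, offline illustration in Python: the function name `map_ids` and the small in-memory table are assumptions made for the example, and it does not call or reproduce the PIR ID mapping service.

```python
def map_ids(query_ids, mapping):
    """Map each query ID to a target ID; collect IDs with no mapping."""
    mapped = {}
    unmapped = []
    for qid in query_ids:
        key = qid.strip()          # tolerate stray whitespace in pasted lists
        if key in mapping:
            mapped[key] = mapping[key]
        else:
            unmapped.append(key)
    return mapped, unmapped

# Illustrative table only: gene symbol -> UniProtKB accession.
EXAMPLE_MAPPING = {"TP53": "P04637", "BRCA1": "P38398", "RRM2": "P31350"}

if __name__ == "__main__":
    mapped, unmapped = map_ids(["TP53", "RRM2", "XYZ1"], EXAMPLE_MAPPING)
    print(mapped)
    print(unmapped)
```

Keeping the unmapped IDs separate makes it easy to see how much of an omics data set is lost before functional profiling begins.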

Well annotated entry: human p53 (P53_HUMAN), 21 years! Protein annotations: References (RX line), Comments (CC line), Features (FT line), Cross References (DR line), GO

What molecular function? What biological process? What cellular component? dir.niehs.nih.gov/microarray/datamining/

Biological Pathways and Networks Signaling pathways Metabolic pathways Organelle biogenesis Molecular networks

Pathways Human metabolic maps Global gene expression in skeletal muscle from gastric bypass patients before surgery and 1 year afterward. General trend after surgery: up-regulated anaerobic metabolism; down-regulated oxidative phosphorylation green, down-regulated genes red, up-regulated genes white, no data available Proc Natl Acad Sci U S A. 2007 Feb 6;104(6):1777-82 http://www.pnas.org/cgi/data/0610772104/DC1/30

Databases of Protein Functions Metabolic Pathways KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways EcoCyc: Encyclopedia of E. coli Genes and Metabolism MetaCyc: Metabolic Encyclopedia (Metabolic Pathways) Inter-Molecular Interactions and Regulatory Pathways IntAct: Protein interaction data from literature and user submission BIND: Descriptions of interactions, molecular complexes and pathways DIP: Catalogs experimentally determined interactions between proteins Reactome - A curated knowledgebase of biological pathways Pathway Interaction Database (PID) BioCarta: Biological pathways of human and mouse Pathway Commons GO and GO annotation projects MetaCyc is a metabolic-pathway database. The database describes pathways, reactions, and enzymes of a variety of organisms, with a microbial focus.

Gene Ontology (GO): Molecular Function, Biological Process, Cellular Component (http://www.geneontology.org/)

GO Slim http://www.geneontology.org/GO.slims.shtml
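
GO profiling with a slim amounts to counting how many proteins fall into each high-level category. A minimal Python sketch follows; the protein identifiers and category assignments are hypothetical placeholders, and the function name `go_profile` is an assumption for illustration.

```python
from collections import Counter

def go_profile(annotations):
    """Count proteins per GO category; each protein counts once per category."""
    counts = Counter()
    for protein, categories in annotations.items():
        for category in set(categories):   # ignore duplicate annotations
            counts[category] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical annotations: protein -> GO slim categories.
    example = {
        "protein_1": ["transcription", "cell communication"],
        "protein_2": ["transcription"],
        "protein_3": ["apoptosis", "transcription"],
    }
    for category, n in go_profile(example).most_common():
        print(category, n)
```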

Biological Pathway Resource Collection http://www.pathguide.org/ Protein-protein interactions Metabolic pathways Signaling pathways Pathway diagrams Transcription factors / gene regulatory networks Protein-compound interactions Genetic interaction networks

http://www.pathwaycommons.org/pc/home.do

KEGG Metabolic & Regulatory Pathways KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/pathway.html)

BioCarta Cellular Pathways (http://www.biocarta.com/index.asp) Transforming Growth Factor (TGF) beta signaling [Homo sapiens]

Transforming Growth Factor (TGF) beta signaling [Homo sapiens] (http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&) Reactome: events and objects (including modified forms and complex) Event ->REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens] Object -> REACT_7364.1: Phospho-R-SMAD [cytosol] Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens] Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol] Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm] ……

PID Transforming Growth Factor beta signaling

Transforming Growth Factor (TGF) beta signaling Reactome PID ~26 proteins in PID are not defined in Reactome, while only 2 in Reactome not defined in PID
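
A comparison of this kind reduces to set operations on the protein lists of the two pathway representations. The Python sketch below shows the idea; the member lists are made-up placeholders and do not reproduce the actual PID or Reactome content, and `compare_membership` is a name chosen for this example.

```python
def compare_membership(db_a, db_b):
    """Return members shared by both lists and members unique to each."""
    a, b = set(db_a), set(db_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}

if __name__ == "__main__":
    # Placeholder identifiers, assumed to be mapped to a common ID space first.
    pathway_in_db_a = ["P1", "P2", "P3", "P4"]
    pathway_in_db_b = ["P3", "P4", "P5"]
    result = compare_membership(pathway_in_db_a, pathway_in_db_b)
    for label, members in result.items():
        print(label, sorted(members))
```

The comparison is only meaningful after both lists use the same identifier type, which is why ID mapping comes first in the overall strategy.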

TGF-beta signaling: comparison between PID and Reactome. [Pathway diagram. Components shown include Furin, TGF-b, LAP, TGF-beta receptors I and II, Smad 2, Smad 4, Smad 7, STRAP, MEKK1, ERK1/2, Shc, XIAP, CaM, TAK1, Ski, the p38 MAPK pathway, MAPKKK and the JNK cascade, with growth signals, stress signals and Ca2+ as inputs; compartments: cytoplasm and nucleus; outcomes: degradation, DNA binding and transcription regulation. PRO identifiers shown: PRO:000000616, PRO:000000523, PRO:000000410, PRO:000000650.] Legend: Phosphorylation (P) at Serine (S), Threonine (T) and Tyrosine (Y); Ubiquitination (U) at Lysine (K). Markers in the diagram distinguish components common to both Reactome & PID from those only reported in Reactome; all others are in PID. Not all components in the pathway from both databases are listed.

GEO: a gene expression/molecular abundance repository http://www.ncbi.nlm.nih.gov/geo/ IntAct: open source database system and analysis tools for protein interaction data http://www.ebi.ac.uk/intact/ PRIDE: centralized, standards compliant, public data repository for proteomics data http://www.ebi.ac.uk/pride/

Analysis Tools. iProXpress: http://pir.georgetown.edu/iproxpress/ DAVID: http://david.abcc.ncifcrf.gov/ Babelomics - FatiGO: http://babelomics.bioinfo.cipf.es/ Commercial: Ingenuity: http://www.ingenuity.com/ GeneGO: http://www.genego.com/ Visual tools: Cytoscape: http://www.cytoscape.org/ CellDesigner: http://www.celldesigner.org/

iProXpress: Integrative analysis of proteomic and gene expression data MS spectrum Peptide ident. Protein ident. http://pir.georgetown.edu/iproxpress/ Information Function Pathway Family Categorize Statistics Association Knowledge

iProXpress – Pathway Profiling Organelle proteome data sets ER Mit Mit Protein information matrix: extensive annotations including protein name, family classification, function, protein-protein interaction, pathway… Functional profiling: iterative categorization, sorting, cross-dataset comparison, coupled with manual examination. ER KEGG pathway

iProXpress Analysis Interface: cross-data groups comparative profiling [screenshot with numbered callouts 1 to 8]

http://david.abcc.ncifcrf.gov/

A Literature-Derived Network for Yeast. All MEDLINE abstracts processed using statistical co-occurrence and NLP methods. Edge types: functional association from co-occurrence (grey shades); physical interaction (green); regulation of expression (red); phosphorylation (dark blue); dephosphorylation (light blue). Inference: Ssn3 -> Hsp104 (b) and Ume6 -> Ino2 & Erg9 (c) expression. Figure caption: a | A yeast protein network was derived by applying information-extraction approaches to all abstracts that are stored in Medline, using both a statistical co-occurrence method and a natural-language-processing (NLP)-based one. Functional associations that were derived from co-occurrence are shown in shades of grey according to the level of confidence that was achieved. The NLP method extracts four types of relationship: stable physical interactions (green), regulation of expression (red), phosphorylation (dark blue) and dephosphorylation (light blue). The proteins (circles) are coloured according to their functional annotation: (co-)regulators of expression (red), kinases and cyclins (dark blue), phosphatases (light blue) and other proteins (grey). A version of this figure that includes all protein names is available in the supplementary information S1 (figure). b,c | Examples of unpublished relationships that can be inferred from the network. From the network we can infer that Ssn3 probably influences Hsp104 expression through phosphorylation of Msn2 (b). In addition, Ume6 probably regulates Erg9 expression and Rim11 is predicted to regulate the expression of both Ino2 and Erg9 (c). None of these hypotheses has been tested experimentally. Jensen et al., 2006
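
The statistical co-occurrence idea can be illustrated with a toy counter: two proteins are linked each time their names appear in the same abstract. The Python sketch below is a deliberately simplified assumption of how such counting might look; the sentences are invented for the example and are not Medline abstracts, and real systems add name normalization and a significance score.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(abstracts, names):
    """Count, for every pair of names, the abstracts mentioning both."""
    counts = Counter()
    for text in abstracts:
        cleaned = text.lower().replace(",", " ").replace(".", " ")
        tokens = set(cleaned.split())
        found = sorted(n for n in names if n.lower() in tokens)
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts

if __name__ == "__main__":
    # Invented toy sentences standing in for abstracts.
    toy_abstracts = [
        "Ssn3 phosphorylates Msn2.",
        "Msn2 regulates Hsp104 expression.",
        "Ssn3 and Msn2 interact.",
    ]
    counts = cooccurrence_counts(toy_abstracts, ["Ssn3", "Msn2", "Hsp104"])
    for pair, n in counts.items():
        print(pair, n)
```

In this toy input Ssn3 and Hsp104 never co-occur, yet both co-occur with Msn2; chains of this kind are what allow indirect relationships to be proposed from a literature network.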

Case Studies. Pathway studies: analysis of proteomics and gene expression data from cancer research. I. Estrogen Signaling Pathways (estrogen-induced apoptosis): Breast cancer cells (+E2) → IP (AIB1, pY) → 1D-gel → MS/MS. II. Purine Metabolic Pathways (radiation-induced DNA repair): Human fibroblast (AT patient) + irradiation → 2D-gel → MS → DNA microarray. III. Melanosome Biogenesis (comparative organelle proteomic profiling): Melanoma cell → isolation of stage-specific melanosomes → MS

Integrated Bioinformatics I. Estrogen Signaling Pathways (estrogen-induced apoptosis) E2 MCF-7 MCF-7/5C Estrogen deprived condition Apoptosis Breast cancer cells Signaling pathway: early events? AIB1 Growth Mimicking clinical condition: 2nd phase anti-estrogen drug resistance 200nM for 2h pY-IP AIB1-IP MS proteomics Hu ZZ, et al. (2008) US HUPO Expression Profiling, Pathway/Network Mapping Integrated Bioinformatics

Proteins only in E2 treated MCF-7/5C cells from both pY-IP and AIB1-IP GO profiling (biological process) Transcription Cell communication Chromosome remodeling & co-repression, cell cycle inhibition, apoptosis

Pathway Mapping: G(o) alpha-2 subunit (pY/5C +E2); RAP1GAP (AIB1/5C +E2)

Hypothesized E2-induced Apoptosis Pathways. [Pathway diagram. Components: E2, GPR30, ERa, Gas, GNAO2, Rap1GAP, Rap1a, MEK, ERK, AIB1, CDK1, BAD, TLE3, RUNX3, Sirt3, CIP29; compartments: cytoplasm and nucleus; outcomes: cell growth, apoptosis. Proteins are marked by source: pY-IP or AIB1-IP.] Protein functions: GNAO2: G(o) alpha-2, GPCR signaling. Rap1GAP: growth inhibition/apoptosis. CDK1: BAD-mediated apoptosis. Sirt3: histone modification, apoptosis. TLE3: co-repression, apoptosis. CIP29: cell cycle arrest/apoptosis.

Text mining for protein-protein interaction (PPI) information

Integrated Bioinformatics II. Purine Metabolic Pathways (radiation-induced DNA repair) Ionizing Radiation AT5BIVA ATCL8 ATM introduced AT patient fibroblast ATM-mutated ATM-wild type ATM Sensitive to IR damage Resistant to IR damage Proteins differentially expressed (1093) 2D-gel/MS mRNAs differentially expressed (231) DNA Microarray (13 proteins/genes) Intersections Expression Profiling, Pathway/Network Mapping Integrated Bioinformatics Hu ZZ, et al. (2008) J Prot. Bioinfo.

KEGG pathway profiles

(RRM2)

Purine metabolic pathway: Ribonucleoside diphosphate reductase subunit M2 (RRM2), EC 1.17.4.1. ADP → dADP; GDP → dGDP; ATP X dATP; GTP X dGTP. DNA synthesis, DNA repair

Functional Association Networks RRM2 p53 BRCA1 HDAC1 RRM2 connected to other major DNA repair and cell cycle proteins, such as p53, BRCA1, HDAC1.
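
A functional association network like this one can be represented as an undirected graph, where the neighbors of a protein are its associated partners. The Python sketch below encodes only the three RRM2 associations named on the slide; the helper name `build_network` is an assumption for illustration.

```python
from collections import defaultdict

def build_network(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

# Associations stated on the slide: RRM2 with p53, BRCA1 and HDAC1.
RRM2_EDGES = [("RRM2", "p53"), ("RRM2", "BRCA1"), ("RRM2", "HDAC1")]

if __name__ == "__main__":
    network = build_network(RRM2_EDGES)
    print(sorted(network["RRM2"]))
    print(len(network["RRM2"]))  # degree of RRM2 in this small graph
```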

RRM2 in radiation-induced ATM-p53-mediated DNA repair pathway BRCA1 p53 RRM2 RRM1 DNA repair HDAC1 RR complex

III. Organelle Proteomes. Comparative organelle proteome profiling makes it possible to propose key proteins potentially involved in the regulation of organelle biogenesis. Schematic drawing of the melanosome biogenesis pathway and key proteins involved in each stage. Chi A, et al. (2006) J. Prot. Res.

Towards Systems Biology (Nature 422:193, 2003). Genomics, Transcriptomics, Proteomics, Metabolomics, Bibliomics (literature mining), other "-omics", Bioinformatics. Integrated knowledge and tools are needed for Systems Biology research

What is Systems Biology? ‘Systems biology defines and analyses the interrelationships of all of the elements in a functioning system in order to understand how the system works.’ -- Leroy Hood. How an organism works from an overall perspective. Interactions of parts of biological systems: how molecules work together to serve a regulatory function in cells or between cells; how cells work to make organs; how organs work to make a person. Systems biology is the converse of reductionist biology.

Reductionist vs. Systems Biology The driving force in 20th century biology has been reductionism: From the population to the individual From the individual to the cell From the cell to the biomolecule From the biomolecule to the genome From the genome to the genome sequence With the publication of genome sequences, reductionist biology has reached its endpoint The driving force for 21st century biology will be integration: Integrating the activity of genes and regulators into regulatory networks Integrating the interactions of amino acids into protein folding predictions Integrating the interactions of metabolites into metabolic networks Integrating the interactions of cells into organisms Integrating the interactions of individuals into ecosystems

Universal Organizing Principles. Pyramid levels: Level 1, omics data and information; Level 2, regulatory motif, pathway; Level 3, functional modules; Level 4, large-scale organization. From the particular to the universal. The bottom of the pyramid shows the traditional representation of the cell's functional organization: genome, transcriptome, proteome, and metabolome (level 1). There is remarkable integration of the various layers both at the regulatory and the structural level. Insights into the logic of cellular organization can be achieved when we view the cell as a complex network in which the components are connected by functional links. At the lowest level, these components form genetic-regulatory motifs or metabolic pathways (level 2), which in turn are the building blocks of functional modules (level 3). These modules are nested, generating a scale-free hierarchical architecture (level 4). Although the individual components are unique to a given organism, the topologic properties of cellular networks share surprising similarities with those of natural and social networks. This suggests that universal organizing principles apply to all networks, from the cell to the World Wide Web. PMID: 12399572

Approaches: top-down or bottom-up. Three types of models. Top-down: systemic-data driven, to discover or refine pre-existing models that describe the measured data (more on regulatory models). Emerges as the dominant method due to “-omics”. Bottom-up: starts with the molecular properties to construct models to predict systemic properties, followed by validation and model refinement (more on kinetic models) (Silicon cell program: http://www.siliconcell.net/). Bruggeman FJ, Westerhoff HV. Trends Microbiol. 2007 15:45-50. Figure caption: Diagram portraying the top-down and bottom-up approach to systems biology. The molecular properties, deriving from experiments carried out in the molecular biosciences and from bioinformatics, lie at the basis of the construction of various network descriptions (models). Three types of models that are often applied in systems biology are shown. Bottom-up systems biology starts with the molecular properties to construct models to predict systemic properties followed by experimental validation and model refinement. By contrast, top-down systems biology is systemic-data driven. It starts with experimental data to discover or refine pre-existing models that describe the measured data successfully. In this way, previously unidentified interactions, mechanisms and molecules can be identified. Contemporary bottom-up systems biology often considers kinetic models whereas top-down systems biology predominantly focuses on regulatory models to analyze data. Molecular species such as enzymes, transcription factors or metabolites are shown as colored shapes. Reactions are displayed as full arrows. Dashed arrows depict regulatory influences (e.g. inhibitory allosteric feedback interactions). Table footnotes: (a) Abbreviations: MCA, metabolic control analysis; +, incorporated in model; −, not incorporated in model. (b) Reaction stoichiometry (e.g. hexokinase: glucose + ATP ↔ glucose-6P + ADP). (c) Effectors of reactions (e.g. AMP is an activator of phosphofructokinase). (d) Reaction mechanism. For example, for hexokinase: (e) With flux balance analysis, an optimal flux distribution can be calculated provided that an optimality criterion is supplied [67]. (f) By application of flux balance analysis (the effector interactions are now redundant information). (g) The numerical values of the fluxes can be calculated without use of optimality criteria. (h) Qualitative distribution of control can be determined for particular systems [71]. (i) Quantitative distribution of control can be calculated.

Top-down. Yeast two-hybrid; combination of techniques (Y2H, protein arrays); integration of other types of information (expression, localization or genetic studies) to obtain dynamic, biologically relevant interaction subnetworks. Curr Opin Chem Biol. 2006 Dec;10(6):551-8. Figure 2. PPI maps serve as valuable frameworks for a molecular network representation of dynamic cellular processes. (a) Large PPI networks can be determined using genome scale HTP techniques such as the Y2H approach (proteins blue, links black). (b) The combination of different techniques (e.g. Y2H, LUMIER, protein arrays or affinity purification/MS approaches) for the identification of PPIs helps to draw a comprehensive picture of the human interactome (additional proteins and links, grey). These highly connected maps are static and do not account for spatial and temporal aspects in a cell. (c) Integration of other types of information (colored bars) (e.g. from expression, localization or genetic studies) with PPI data permits the identification of dynamic biologically relevant interaction subnetworks. Some proteins (red) are involved in several of the networks, changing their interaction patterns over time.

EGFR-GAB1-ERK/Akt network
Bottom-up: the EGFR signaling network model is constructed from the reaction stoichiometry and kinetic constants.
Figure: Flow chart representation of the EGFR-GAB1-ERK/Akt network. The reaction stoichiometry and kinetic constants of the EGFR network model are given in supplemental Tables S1–S3.
J Biol Chem. 2006 281:19925-38. PMID: 16687399. Supplementary data: http://www.jbc.org/cgi/data/M600482200/DC1/1
The model allows predictions of temporal patterns of cellular responses to EGF under diverse perturbations (e.g., EGF doses):
The dynamics of GAB1 tyrosine phosphorylation are controlled by positive GAB1-PI3K and negative MAPK-GAB1 feedbacks.
The essential function of GAB1 is to enhance PI3K/Akt activation and extend the duration of Ras/MAPK signaling.
GAB1 plays a critical role in cell proliferation and tumorigenesis by amplifying positive interactions between survival and mitogenic pathways.

Gene regulatory networks (GRNs)
WIRED: Systems biology looks at the connections between components in cells.
1.2. Basics of GRNs
Britten and Davidson [4] posited the concept that “A given state of differentiation tends to require the integrated activation of a very large number of noncontiguous genes”. This theory of gene regulation stands as a basis for current work that seeks to delineate the molecular mechanisms that govern cellular development and differentiation. Important to the progress made in this field has been the development of a multitude of experimental methods, such as gene knock-outs, reporter molecules, microarrays, proteomics, and computational methods, that are needed to develop models of gene networks. One of the goals of systems biology and computational modeling is to understand the coordination of the expression of interrelated genes across time and under various physiological conditions.
Fig. 1. Representation of basic subunits of gene regulatory networks (GRNs). Genes 1 and 2 code for regulatory factors, and Genes 3–5 represent structural genes. The regulatory region of Gene 1 receives input from sources A and B, and, in this case, both A and B are required to activate the gene. The product of this activation plus input from another source, C, permits activation of Gene 2 and transcription of the corresponding transcription factor. Transcription factor 2 activates Genes 4 and 5 but inhibits the activation of Gene 3, a gene additionally regulated by other transcription factors not shown. Downward arrows represent activation, and horizontal bars represent repression. Adapted with permission from Bolouri and Davidson [7].
Fig. 3. Interrelationships among the components of the proposed GRN for micromere specification in sea urchin. See text for details. Early m signal: an early signal in micromere specification; repressor: the ubiquitous repressor that is repressed by Pmar1. Hnf, Delta, Dri, Tbr, and Ets represent other genes involved in the network. Downward arrows represent activation, horizontal bars represent repression, and dotted lines indicate repressed pathways. The diamond represents the early veg2 signal and the square represents the secondary veg2. Adapted with permission from Oliveri et al. [1].
Fig. 4. Depiction of some of the essential elements of the role of Dorsal in establishing dorsoventral polarity in Drosophila embryonic development. (A) High levels of dorsal activate twi and snail with consequent downstream events. (B) Low levels of dorsal lead to downstream events that include inhibition of an inhibitor (snail), permitting the activation of rho, sim, and sog. See text for details. Dotted lines represent inactive pathways; downward arrows represent gene activation; horizontal bars represent repression. Adapted with permission from Stathopoulos and Levine [2].
Essential elements of the role of Dorsal in establishing dorsoventral polarity in Drosophila embryonic development
Reprod Toxicol. 19:281-90, 2005
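
The wiring described in the Fig. 1 caption can be written down as a small Boolean sketch. Treating each gene as simply on or off is a simplifying assumption, as is the `other_activators_of_gene3` flag, which stands in for the "other transcription factors not shown". The function name and the flag are hypothetical.

```python
def grn_state(a, b, c, other_activators_of_gene3=True):
    """Evaluate the Fig. 1 wiring as Boolean logic (on/off is an assumption)."""
    gene1 = a and b        # Gene 1 needs both inputs A and B
    gene2 = gene1 and c    # Gene 1 product plus input C activates Gene 2
    # Transcription factor 2 represses Gene 3; its other regulators are
    # collapsed into a single hypothetical flag.
    gene3 = other_activators_of_gene3 and not gene2
    gene4 = gene2          # activated by transcription factor 2
    gene5 = gene2          # activated by transcription factor 2
    return {"gene1": gene1, "gene2": gene2, "gene3": gene3,
            "gene4": gene4, "gene5": gene5}

all_inputs_on = grn_state(True, True, True)
input_b_missing = grn_state(True, False, True)
print(all_inputs_on)
print(input_b_missing)
```

Removing input B switches off Genes 1, 2, 4 and 5 and releases Gene 3 from repression, which is the kind of qualitative prediction a regulatory model is meant to give.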

Modeling of the main modules of cell-cycle progression
Three functional units:
Start function: onset of S-phase
Cyclin cascades (C1, C2, C3)
End function: onset of mitosis to cell division
Figure 2. Modeling the sequential interconnections between the main modules of cell-cycle progression. The cell-cycle blueprint is given by three functional units: a Start function that allows the onset of the S-phase when a critical protein level has been reached, that is, when the Far1-Cln3 threshold is reached; a cascade of three cyclin subsystems (indicated as C1, C2, C3); and an End function that comprises the events from the onset of mitosis to cell division. Cyclin and Cki act as the positive and negative components of the threshold mechanism, respectively (PMID: 15457529).
Chembiochem 5:1322-33, 2004
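
The threshold logic of the Start function can be illustrated with a toy loop. This sketch assumes that the positive component (Cln3) scales with cell size while the negative component (Far1, a Cki) stays constant, and that Start fires as soon as Cln3 exceeds Far1. Those assumptions, the parameter values, and the function name are illustrative only and are not taken from the cited model.

```python
def time_to_start(cln3_per_size=1.0, far1=1.5, size0=1.0,
                  growth_rate=0.01, dt=1.0, max_steps=10_000):
    """Return the first time at which Cln3 exceeds the Far1 threshold.

    Assumptions (hypothetical): Cln3 is proportional to cell size,
    Far1 is constant, and the cell grows exponentially.
    """
    size = size0
    for step in range(max_steps):
        cln3 = cln3_per_size * size       # positive component (cyclin)
        if cln3 > far1:                   # negative component (Cki) overcome
            return step * dt              # Start: onset of S-phase
        size *= 1 + growth_rate * dt      # one growth step
    return None                           # threshold never reached

t_default = time_to_start()
t_high_threshold = time_to_start(far1=2.0)
print(t_default, t_high_threshold)
```

Raising the Far1 level delays Start, so in this sketch the cell has to grow larger before it enters S-phase.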

Challenges to Systems Biology
A complete characterization of an organism (molecular constituents → interactions → cell function)
Spatial-temporal molecular characterization of a cell
A thorough systems analysis of the “molecular response” of a cell to external/internal perturbations
Information must be integrated into mathematical models to enable knowledge testing by formulating hypotheses and discovering new biological mechanisms…

Cellular Maps? signaling, metabolism, gene regulation …