1
1. Lecture WS 2004/05, Bioinformatics III
Bioinformatics III: “Systems biology”, “Integrative cell biology”
The course will address two areas:
- 25% genomics: single-protein phylogenies versus genome rearrangement, comparative genomics
- 75% integrated view of cellular networks
2
Content
Week 1: scale-free networks in biology
Week 2: transcription, regulatory networks
Week 3: protein complexes (Cellzome, Aloy et al. 2004)
Week 4: protein networks: experimental data (Y2H, MS), computational data (Rosetta)
Week 5: protein networks: graphical layout (force minimization)
Week 6: protein networks: quality check (Bayesian analysis)
Week 7: protein networks: modularity?
Week 8: phylogeny
Week 9: genome rearrangement (breakpoint analysis)
Week 10+11: metabolic networks: metabolic flux analysis, extreme pathways, elementary modes, ¹³C method
Week 12: mathematical modelling of signal transduction networks
Week 13: integration of protein networks with metabolic pathways
Week 14: exam
3
Literature
Lecture slides will be available 1-2 days prior to each lecture.
Suggested reading: links will be put up on the course website http://gepard.bioinformatik.uni-saarland.de/teaching...
4
Assignments
12 weekly assignments are planned. Homework assignments are handed out in the Thursday lectures and are available on the course website on the same day.
Solutions must be returned by 14.00 on Thursday of the following week to Tihamer Geyer in room 1.09, Geb. 17.1, first floor, or handed in before (!) the lecture, which starts at 14.15. Two students may submit one joint solution. Solutions may also be submitted by e-mail as a single printable PDF file to tihamer.geyer@bioinformatik.uni-saarland.de.
Tutorial: participation is recommended but not mandatory. Tue 11-13. Homework submitted on Thursday will be discussed on the following Tuesday.
In case of illness please send an e-mail to kerstin.gronow-p@bioinformatik.uni-saarland.de and provide a medical certificate.
5
Schein = successful written exam
Successful participation in the lecture course (“Schein”) will be certified upon successful completion of the written exam in February 2005. The exam is open to students who have received at least 50% of the credit points for the 12 assignments. Unless announced otherwise on the course website at least 3 weeks before the exam, the exam will be based on all material covered in the lectures and in the assignments.
In case of illness please send an e-mail to kerstin.gronow-p@bioinformatik.uni-saarland.de and provide a medical certificate.
A “second and final chance” exam will be offered in April 2005.
6
Tutor
Dr. Tihamer Geyer (assignments)
Geb. 17.1, room 1.09
tihamer.geyer@bioinformatik.uni-saarland.de
7
Systems biology
Biological research in the 1900s followed a reductionist approach: detect an unusual phenotype, isolate/purify one protein or gene, determine its function.
However, it is increasingly clear that a discrete biological function can only rarely be attributed to an individual molecule.
This leads to a new task: understanding the structure and dynamics of the complex intracellular web of interactions that contribute to the structure and function of a living cell.
8
Systems biology
The development of high-throughput data-collection techniques, e.g. microarrays, protein chips and yeast two-hybrid screens, makes it possible to interrogate all cell components simultaneously at any given time.
There exist various types of interaction webs/networks:
- protein-protein interaction network
- metabolic network
- signalling network
- transcription/regulatory network
- ...
These networks are not independent but form a “network of networks”.
9
DOE initiative: Genomes to Life
A coordinated effort. Slides borrowed from a talk by Marvin Frazier, Life Sciences Division, U.S. Dept. of Energy.
10
Facility I: Production and Characterization of Proteins
Estimating Microbial Genome Capability
Computational Analysis
- Genome analysis of genes, proteins, and operons
- Metabolic pathways analysis from reference data
- Protein machines estimate from PM reference data
Knowledge Captured
- Initial annotation of genome
- Initial perceptions of pathways and processes
- Recognized machines, function, and homology
- Novel proteins/machines (including prioritization)
- Production conditions and experience
11
Facility II: Whole Proteome Analysis
Modeling Proteome Expression, Regulation, and Pathways
Analysis and Modeling
- Mass spectrometry expression analysis
- Metabolic and regulatory pathway/network analysis and modeling
Knowledge Captured
- Expression data and conditions
- Novel pathways and processes
- Functional inferences about novel proteins/machines
- Genome super annotation: regulation, function, and processes (deep knowledge about cellular subsystems)
12
Facility III: Characterization and Imaging of Molecular Machines
Exploring Molecular Machine Geometry and Dynamics
Computational Analysis, Modeling and Simulation
- Image analysis/cryoelectron microscopy
- Protein interaction analysis/mass spec
- Machine geometry and docking modeling
- Machine biophysical dynamic simulation
Knowledge Captured
- Machine composition, organization, geometry, assembly and disassembly
- Component docking and dynamic simulations of machines
13
Facility IV: Analysis and Modeling of Cellular Systems
Simulating Cell and Community Dynamics
Analysis, Modeling and Simulation
- Couple knowledge of pathways, networks, and machines to generate an understanding of cellular and multi-cellular systems
- Metabolism, regulation, and machine simulation
- Cell and multicell modeling and flux visualization
Knowledge Captured
- Cell and community measurement data sets
- Protein machine assembly time-course data sets
- Dynamic models and simulations of cell processes
14
“Genomes To Life” Computing Roadmap
[Figure: roadmap relating biological complexity to computing and information infrastructure capabilities, with current U.S. computing marked. Labels in the figure: comparative genomics; genome-scale protein threading; constrained rigid docking; constraint-based flexible docking; protein machine interactions; molecular machine classical simulation; cell, pathway, and network simulation; community metabolic, regulatory, signaling simulations; molecule-based cell simulation.]
15
First breakthrough: scale-free metabolic networks
(d) The degree distribution, P(k), of the metabolic network illustrates its scale-free topology.
(e) The scaling of the clustering coefficient C(k) with the degree k illustrates the hierarchical architecture of metabolism. (The data shown in d and e represent an average over 43 organisms.)
(f) The flux distribution in the central metabolism of Escherichia coli follows a power law, which indicates that most reactions have a small metabolic flux, whereas a few reactions with high fluxes carry most of the metabolic activity.
On all three plots both axes are logarithmic; a straight line on such a log–log plot indicates power-law scaling.
Abbreviations: CTP, cytidine triphosphate; GLC, aldo-hexose glucose; UDP, uridine diphosphate; UMP, uridine monophosphate; UTP, uridine triphosphate.
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
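The statement that a power law appears as a straight line on a log–log plot can be checked numerically. The following is a minimal sketch, not the method used in the cited paper: it assumes a plain least-squares fit of log P against log k (a crude estimator for real data), and the sample values and the function name `loglog_slope` are illustrative only.

```python
import math

def loglog_slope(ks, ps):
    """Least-squares slope of log(p) versus log(k).

    For an exact power law p = c * k**(-gamma) the slope equals -gamma.
    """
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in ps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic, noise-free data with an assumed exponent of 2.2 (illustrative only).
ks = [1, 2, 4, 8, 16, 32]
ps = [k ** -2.2 for k in ks]
slope = loglog_slope(ks, ps)  # close to -2.2 for this synthetic input
```

With measured degree distributions the tail is noisy, so a fit like this should be read as a rough indication of the exponent rather than a precise estimate.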
16
Second breakthrough: Yeast protein interaction network: first example of a scale-free network
A map of protein–protein interactions in Saccharomyces cerevisiae, which is based on early yeast two-hybrid measurements, illustrates that a few highly connected nodes (which are also known as hubs) hold the network together. The largest cluster, which contains 78% of all proteins, is shown. The colour of a node indicates the phenotypic effect of removing the corresponding protein (red = lethal, green = non-lethal, orange = slow growth, yellow = unknown).
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)
17
Characterising metabolic networks
To study the network characteristics of metabolism, a graph-theoretic description needs to be established.
(a) illustrates the graph-theoretic description for a simple pathway (catalysed by Mg²⁺-dependent enzymes).
(b) In the most abstract approach all interacting metabolites are considered equally. The links between nodes represent reactions that convert one substrate into another.
(c) For many biological applications it is useful to ignore co-factors, such as the high-energy-phosphate donor ATP, which results in a second type of mapping that connects only the main source metabolites to the main products.
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)
18
Degree
The most elementary characteristic of a node is its degree (or connectivity), k, which tells us how many links the node has to other nodes.
(a) In the undirected network, node A has k = 5.
(b) In networks in which each link has a selected direction there is an incoming degree, $k_{\mathrm{in}}$, which denotes the number of links that point to a node, and an outgoing degree, $k_{\mathrm{out}}$, which denotes the number of links that start from it. E.g., node A in b has $k_{\mathrm{in}} = 4$ and $k_{\mathrm{out}} = 1$.
An undirected network with N nodes and L links is characterized by an average degree $\langle k \rangle = 2L/N$ (where $\langle \cdot \rangle$ denotes the average).
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
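The degree definitions above can be sketched in a few lines. This is a minimal illustration on hypothetical edge lists (they are not the graphs from the slide's figure, although node A is given the same degrees); the function names are assumptions made for this example.

```python
from collections import Counter

def degrees(edges):
    """Return {node: k} for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def in_out_degrees(arcs):
    """Return ({node: k_in}, {node: k_out}) for directed links u -> v."""
    k_in, k_out = Counter(), Counter()
    for u, v in arcs:
        k_out[u] += 1
        k_in[v] += 1
    return dict(k_in), dict(k_out)

def average_degree(edges):
    """<k> = 2L/N, counting only nodes that appear in at least one edge."""
    nodes = {node for edge in edges for node in edge}
    return 2 * len(edges) / len(nodes)

# Hypothetical undirected graph in which node A has k = 5.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F"), ("B", "C")]
# Hypothetical directed graph in which node A has k_in = 4 and k_out = 1.
arcs = [("B", "A"), ("C", "A"), ("D", "A"), ("E", "A"), ("A", "F")]

k = degrees(edges)
k_in, k_out = in_out_degrees(arcs)
mean_k = average_degree(edges)  # 2 * 6 links / 6 nodes
```

Each link contributes to the degree of two nodes, which is where the factor 2 in 2L/N comes from.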
19
Degree distribution
The degree distribution, P(k), gives the probability that a selected node has exactly k links. P(k) is obtained by counting the number of nodes N(k) with k = 1, 2, ... links and dividing by the total number of nodes N.
The degree distribution allows us to distinguish between different classes of networks. For example, a peaked degree distribution, as seen in a random network, indicates that the system has a characteristic degree and that there are no highly connected nodes (which are also known as hubs). By contrast, a power-law degree distribution indicates that a few hubs hold together numerous small nodes.
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
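The counting recipe P(k) = N(k)/N translates directly into code. The sketch below uses a hypothetical table of node degrees; the function name `degree_distribution` is an assumption for this example.

```python
from collections import Counter

def degree_distribution(degree_by_node):
    """Return {k: P(k)} with P(k) = N(k) / N."""
    n_nodes = len(degree_by_node)
    counts = Counter(degree_by_node.values())  # N(k) for every observed k
    return {k: counts[k] / n_nodes for k in sorted(counts)}

# Hypothetical degrees: one hub (A) and several low-degree nodes.
degree_by_node = {"A": 5, "B": 2, "C": 2, "D": 1, "E": 1, "F": 1}
p = degree_distribution(degree_by_node)
# p[1] is 3/6, p[2] is 2/6 and p[5] is 1/6; the values sum to 1.
```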
20
Random networks
(Aa) The Erdős–Rényi (ER) model of a random network starts with N nodes and connects each pair of nodes with probability p, which creates a graph with approximately $pN(N-1)/2$ randomly placed links.
(Ab) The node degrees follow a Poisson distribution, where most nodes have approximately the same number of links (close to the average degree $\langle k \rangle$). The tail (high-k region) of the degree distribution P(k) decreases exponentially, which indicates that nodes that significantly deviate from the average are extremely rare.
(Ac) The clustering coefficient is independent of a node's degree, so C(k) appears as a horizontal line if plotted as a function of k. The mean path length is proportional to the logarithm of the network size, $\ell \sim \log N$, which indicates that the network is characterized by the small-world property.
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)
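The ER construction can be sketched as follows. This is a minimal illustration using only the standard library; the parameter values and the function name `erdos_renyi` are assumptions chosen for the example.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Connect each of the n*(n-1)/2 node pairs independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

n, p = 200, 0.05                       # illustrative values
edges = erdos_renyi(n, p, seed=1)
expected_links = p * n * (n - 1) / 2   # the slide's estimate p N (N-1) / 2
mean_degree = 2 * len(edges) / n       # should be close to p * (n - 1)
```

Because every pair is decided independently, the number of links fluctuates around `expected_links` from run to run, and no node is systematically favoured, which is why hubs are absent.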
21
Origin of scale-free topology and hubs in biological networks
The origin of the scale-free topology in complex networks can be reduced to two basic mechanisms: growth and preferential attachment. Growth means that the network emerges through the successive addition of new nodes, such as the new red node that is added to the network shown in part a. Preferential attachment means that new nodes prefer to link to more connected nodes. For example, the probability that the red node will connect to node 1 is twice the probability that it will connect to node 2, as the degree of node 1 ($k_1 = 4$) is twice the degree of node 2 ($k_2 = 2$).
Growth and preferential attachment generate hubs through a 'rich-gets-richer' mechanism: the more connected a node is, the more likely it is that new nodes will link to it, which allows the highly connected nodes to acquire new links faster than their less connected peers.
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)
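The "twice as likely" example follows from making the attachment probability proportional to the degree. A minimal sketch, assuming a hypothetical four-node network in which nodes 1 and 2 have the degrees quoted above (the degrees of nodes 3 and 4 are made up):

```python
from fractions import Fraction

def attachment_probabilities(degree_by_node):
    """Probability that a new node links to each existing node: k_i / sum_j k_j."""
    total = sum(degree_by_node.values())
    return {node: Fraction(k, total) for node, k in degree_by_node.items()}

# Degrees of nodes 1 and 2 as in the text; nodes 3 and 4 are hypothetical.
degree_by_node = {1: 4, 2: 2, 3: 1, 4: 1}
probs = attachment_probabilities(degree_by_node)
ratio = probs[1] / probs[2]  # equals k_1 / k_2 regardless of the other nodes
```

The ratio between two nodes depends only on their own degrees, because the normalising sum cancels.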
22
Scale-free networks
Scale-free networks are characterized by a power-law degree distribution: the probability that a node has k links follows $P(k) \sim k^{-\gamma}$, where $\gamma$ is the degree exponent. The probability that a node is highly connected is statistically more significant than in a random graph, and the network's properties are often determined by a relatively small number of highly connected nodes (“hubs”, see blue nodes in Ba).
In the Barabási–Albert model of a scale-free network, at each time point a node with M links is added to the network; it connects to an already existing node I with probability $\Pi_I = k_I / \sum_J k_J$, where $k_I$ is the degree of node I and J is the index that runs over the network nodes in the sum. The network that is generated by this growth process has a power-law degree distribution with $\gamma = 3$.
(Bb) Such distributions are seen as a straight line on a log–log plot.
(Bc) The network that is created by the Barabási–Albert model does not have an inherent modularity, so C(k) is independent of k.
Scale-free networks with degree exponents $2 < \gamma < 3$, a range that is observed in most biological and non-biological networks, are ultra-small, with the average path length following $\ell \sim \log \log N$, which is significantly shorter than the $\log N$ that characterizes random small-world networks.
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
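The growth process of the Barabási–Albert model can be sketched as below. This is a simplified illustration, not a reference implementation: the seed graph (a complete graph on m + 1 nodes) is an assumption, and rejecting repeated targets so that each new node gets m distinct neighbours deviates slightly from strictly degree-proportional sampling. The function name `barabasi_albert` is chosen for this example.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Grow a network to n nodes; each new node brings m links that attach
    preferentially to existing nodes of high degree."""
    rng = random.Random(seed)
    # Assumed seed graph: complete graph on m + 1 nodes, so every node has k > 0.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears in `stubs` once per link end, so a uniform draw from
    # `stubs` selects node I with probability k_I / sum_J k_J.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # redraw until m distinct targets
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new, target))
            stubs.extend((new, target))
    return edges

edges = barabasi_albert(1000, 2, seed=0)
degree = Counter(node for edge in edges for node in edge)
largest_hub = max(degree.values())       # far above the average degree of about 4
```

Early nodes have more time to collect links and each link makes further links more likely, which is the rich-gets-richer effect that produces hubs.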
23
Network measures
Scale-free networks and the degree exponent
Most biological networks are scale-free, which means that their degree distribution approximates a power law, $P(k) \sim k^{-\gamma}$, where $\gamma$ is the degree exponent and ~ indicates 'proportional to'. The value of $\gamma$ determines many properties of the system. The smaller the value of $\gamma$, the more important the role of the hubs is in the network. Whereas for $\gamma > 3$ the hubs are not relevant, for $2 < \gamma < 3$ there is a hierarchy of hubs, with the most connected hub being in contact with a small fraction of all nodes, and for $\gamma = 2$ a hub-and-spoke network emerges, with the largest hub being in contact with a large fraction of all nodes.
In general, the unusual properties of scale-free networks are valid only for $\gamma < 3$, when the dispersion of the P(k) distribution, defined as $\sigma^2 = \langle k^2 \rangle - \langle k \rangle^2$, increases with the number of nodes (that is, $\sigma$ diverges), resulting in a series of unexpected features, such as a high degree of robustness against accidental node failures. For $\gamma > 3$, however, most unusual features are absent, and in many respects the scale-free network behaves like a random one.
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
24
Shortest path and mean path length
Distance in networks is measured with the path length, which tells us how many links we need to pass through to travel between two nodes. As there are many alternative paths between two nodes, the shortest path (the path with the smallest number of links between the selected nodes) has a special role.
In directed networks, the distance $\ell_{AB}$ from node A to node B is often different from the distance $\ell_{BA}$ from B to A. E.g. in b, $\ell_{BA} = 1$, whereas $\ell_{AB} = 3$. Often there is no path at all between two nodes in a given direction: as shown in b, although there is a path from C to A, there is no path from A to C.
The mean path length, $\langle \ell \rangle$, represents the average over the shortest paths between all pairs of nodes and offers a measure of a network's overall navigability.
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
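Shortest paths in an unweighted network can be found by breadth-first search. The sketch below uses a hypothetical directed graph (not the one in the slide's figure) that was chosen to reproduce the distances quoted above; averaging only over reachable ordered pairs is an assumption made here, since the text does not say how unreachable pairs are treated.

```python
from collections import deque

def shortest_path_length(adj, source, target):
    """Breadth-first search; returns the number of links on a shortest
    directed path from source to target, or None if target is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

def mean_path_length(adj):
    """Average shortest-path length over all reachable ordered node pairs
    (assumes at least one such pair exists)."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    lengths = []
    for s in nodes:
        for t in nodes:
            if s != t:
                d = shortest_path_length(adj, s, t)
                if d is not None:
                    lengths.append(d)
    return sum(lengths) / len(lengths)

# Hypothetical directed graph: B -> A is direct, A reaches B only via D and E.
adj = {"B": ["A"], "A": ["D"], "D": ["E"], "E": ["B"], "C": ["A"]}
l_ba = shortest_path_length(adj, "B", "A")   # one link
l_ab = shortest_path_length(adj, "A", "B")   # A -> D -> E -> B
l_ac = shortest_path_length(adj, "A", "C")   # None: nothing points to C
mean_l = mean_path_length(adj)
```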
25
Clustering coefficient
In many networks, if node A is connected to B, and B is connected to C, then it is highly probable that A also has a direct link to C. This phenomenon can be quantified using the clustering coefficient $C_I = 2n_I / (k_I (k_I - 1))$, where $n_I$ is the number of links connecting the $k_I$ neighbours of node I to each other. In other words, $n_I$ gives the number of 'triangles' that go through node I, whereas $k_I (k_I - 1)/2$ is the total number of triangles that could pass through node I, should all of node I's neighbours be connected to each other. For example, only one pair of node A's five neighbours in a are linked together (B and C), which gives $n_A = 1$ and $C_A = 2/20$. By contrast, none of node F's neighbours link to each other, giving $C_F = 0$.
The average clustering coefficient, $\langle C \rangle$, characterizes the overall tendency of nodes to form clusters or groups. An important measure of the network's structure is the function C(k), which is defined as the average clustering coefficient of all nodes with k links. For many real networks $C(k) \sim k^{-1}$, which is an indication of a network's hierarchical character.
The average degree, average path length and average clustering coefficient depend on the number of nodes and links (N and L) in the network. By contrast, the P(k) and C(k) functions are independent of the network's size and therefore capture a network's generic features, which allows them to be used to classify various networks.
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
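The clustering coefficient can be computed by counting links among a node's neighbours. The sketch below uses a hypothetical undirected graph that mirrors the numbers in the example (A has five neighbours, of which only B and C are linked; F's neighbours are not linked to each other); nodes G and H and the function names are made up for this illustration.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """C = 2n / (k (k - 1)), where n counts links among the node's k neighbours.
    Nodes with fewer than two neighbours are assigned C = 0 here by convention."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    n_links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * n_links / (k * (k - 1))

def average_clustering(adj):
    """<C>: mean of the clustering coefficient over all nodes."""
    return sum(clustering_coefficient(adj, node) for node in adj) / len(adj)

# Hypothetical undirected graph stored as symmetric neighbour sets.
adj = {
    "A": {"B", "C", "D", "E", "F"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"A"},
    "F": {"A", "G", "H"},
    "G": {"F"},
    "H": {"F"},
}
c_a = clustering_coefficient(adj, "A")  # 2 * 1 / (5 * 4)
c_f = clustering_coefficient(adj, "F")  # no links among A, G, H
c_b = clustering_coefficient(adj, "B")  # B's two neighbours A and C are linked
```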
26
Hierarchical networks
To account for the coexistence of modularity, local clustering and scale-free topology in many real systems, it has to be assumed that clusters combine in an iterative manner, generating a hierarchical network.
The starting point of this construction is a small cluster of 4 densely linked nodes (the 4 central nodes in Ca). Next, 3 replicas of this module are generated and the 3 external nodes of the replicated clusters are connected to the central node of the old cluster, which produces a large 16-node module. 3 replicas of this 16-node module are then generated and the 16 peripheral nodes are connected to the central node of the old module, which produces a new module of 64 nodes.
The hierarchical network model seamlessly integrates a scale-free topology with an inherent modular structure by generating a network that has a power-law degree distribution with degree exponent $\gamma = 1 + \ln 4 / \ln 3 \approx 2.26$ (Cb) and a large, system-size-independent average clustering coefficient $\langle C \rangle \approx 0.6$. The most important signature of hierarchical modularity is the scaling of the clustering coefficient, which follows $C(k) \sim k^{-1}$, a straight line of slope $-1$ on a log–log plot (Cc).
A hierarchical architecture implies that sparsely connected nodes are part of highly clustered areas, with communication between the different highly clustered neighbourhoods being maintained by a few hubs (Ca).
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)
27
First breakthrough: scale-free metabolic networks
(d) The degree distribution, P(k), of the metabolic network illustrates its scale-free topology.
(e) The scaling of the clustering coefficient C(k) with the degree k illustrates the hierarchical architecture of metabolism. (The data shown in d and e represent an average over 43 organisms.)
(f) The flux distribution in the central metabolism of Escherichia coli follows a power law, which indicates that most reactions have a small metabolic flux, whereas a few reactions with high fluxes carry most of the metabolic activity.
On all three plots both axes are logarithmic; a straight line on such a log–log plot indicates power-law scaling.
Abbreviations: CTP, cytidine triphosphate; GLC, aldo-hexose glucose; UDP, uridine diphosphate; UMP, uridine monophosphate; UTP, uridine triphosphate.
Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
28
Second breakthrough: Yeast protein interaction network: first example of a scale-free network
A map of protein–protein interactions in Saccharomyces cerevisiae, which is based on early yeast two-hybrid measurements, illustrates that a few highly connected nodes (which are also known as hubs) hold the network together. The largest cluster, which contains 78% of all proteins, is shown. The colour of a node indicates the phenotypic effect of removing the corresponding protein (red = lethal, green = non-lethal, orange = slow growth, yellow = unknown).
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)
29
Summary
Many cellular networks show properties of scale-free networks:
- protein-protein interaction networks
- metabolic networks
- genetic regulatory networks (where nodes are individual genes and links are derived from expression correlation, e.g. from microarray data)
- protein domain networks
However, not all cellular networks are scale-free. E.g. the transcription regulatory networks of S. cerevisiae and E. coli show mixed scale-free and exponential characteristics.
Next lecture:
- mathematical properties of networks
- origin of scale-free topology
- topological robustness
Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)