CSE891-002 Selected Topics in Bioinformatics Jin Chen 232 Plant Biology Bld. 2011 Spring
About me… Jin Chen, Assistant Professor in CSE and PRL from 2009 Office: 232 Plant Biology Lab. Tel: (517) 355-5015. Email: jinchen@msu.edu
Outline Course Description Introduction to Computational Network Biology
Course Description Course objectives: study interesting computational network biology problems and their algorithms, with a focus on the principles used to design those algorithms. (3 credits) Instructor: Jin Chen, Office: 232 Plant Biology Bld. Email: jinchen@msu.edu Office hours: Thursday 2PM-3PM. If you cannot attend office hours, email me about scheduling a different time. Web page: http://www.msu.edu/~jinchen/cse891a
Course Description Course work: One 80 minutes lecture, and 80 minutes of discussion & student presentations each week Grading policies: The course will be graded on attendance (10%), participation (20%), and presentation (70%). No Final Exam
Course Description Prerequisites: Graduate students in science or engineering. Note: an override is necessary for non-CSE graduate students; please send your PID & NetID to me. No prior knowledge of biology is required. Computationally inclined biology graduate students are encouraged to take the class as well.
Suggested books A.-L. Barabási, Linked: The new science of networks U. Alon, An Introduction to Systems Biology B. Palsson. Systems Biology: Properties of Reconstructed Networks K. Kaneko, Life: An Introduction to Complex Systems Biology
Course Description Graph model Graph clustering subgraph mining Protein-protein interaction network Gene regulatory network Metabolic network Integrative study Network Biology Graph Mining
Date Title Topics 1/11/11 Course Organization. Introduction. Introduction to Computational Network Biology 1/13/11 Protein-Protein Interaction networks I PPI network construction and false positive detection 1/18/11 Student Presentation & Discussion 1 1/20/11 Protein-Protein Interaction networks II Topological analysis in PPI networks. Network motif. 1/25/11 Student Presentation & Discussion 2 1/27/11 Protein-Protein Interaction networks III Applications of PPI network (protein function prediction, network comparison) 2/1/11 Student Presentation & Discussion 3 2/3/11 Gene Correlation Networks I Gene co-expression study 2/8/11 Student Presentation & Discussion 4 2/10/11 Gene Correlation Networks II Gene co-regulation study 2/15/11 Student Presentation & Discussion 5 2/17/11 Gene Transcriptional Regulation Networks I cis-elements and gene co-regulation 2/22/11 Student Presentation & Discussion 6 2/24/11 Gene Transcriptional Regulation Networks II Bayesian network for GRN construction 3/1/11 Student Presentation & Discussion 7 3/3/10 Gene Transcriptional Regulation Networks III ChIP-seq and its applications in GRN construciton 3/7-11 Spring Break 3/15/11 Student Presentation & Discussion 8 3/17/11 Gene Transcriptional Regulation Networks IV GRN topological study 3/22/11 Student Presentation & Discussion 9 3/24/11 Metabolic Networks I Flux balance analysis and metabolic control analysis 3/29/11 Student Presentation & Discussion 10 3/31/11 Metabolic Networks II Integrative study: r-FBA model 4/5/11 Student Presentation & Discussion 11 4/7/11 Graph Mining I Graph models 4/12/11 Student Presentation & Discussion 12 4/14/11 Graph Mining II Graph clustering and partitioning 4/19/11 Student Presentation & Discussion 13 4/21/11 Graph Mining III Frequent subgraph mining 4/26/11 Student Presentation & Discussion 14
Paper list Chua et al. Exploiting indirect neighbours and topological weight to predict protein function from protein–protein interactions. Bioinformatics (2006) 22 (13): 1623-1630. Kashani et al. Kavosh: a new algorithm for finding network motifs. BMC Bioinformatics 2009, 10:318 Deng et al. Prediction of Protein Function Using Protein–Protein Interaction Data. Journal of Computational Biology. December 2003, 10(6): 947-960. Hu et al. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics. Vol. 21 Suppl. 1 pp. i213–i221. 2005 Xu et al. Mining Shifting-and-Scaling Co-Regulation Patterns on Gene Expression Profiles. ICDE 2006 Xu et al, Discovering cis-Regulatory RNAs in Shewanella Genomes by Support Vector Machines. PLoS Computational Biology. 5(4) 2009 Huang et al. Large-scale regulatory network analysis from microarray data: modified Bayesian network learning and association rule mining. Decision Support Systems. 43. 1207–1225. 2007 Honkela et al. Model-based method for transcription factor target identification with limited data. PNAS vol 107(17) pp. 7793–7798. 2009 Vermeirssen et al. Transcription factor modularity in a Gene-Centered C. elegans Protein-DNA interaction network. Genome Research 17, 061-1071. 2007 Covert et al, Transcriptional Regulation in Constraints-Based Metabolic Models of Escherichia coli, Journal of Biological Chemistry, 277(31): pp. 28058-28064. 2002 Herrgard et al. Integrated analysis of regulatory and metabolic networks reveals novel regulatory mechanisms in Saccharomyces cerevisiae. Genome Research. 16:627–635. 2006 Barabási et al. Network Biology: Understanding the Cell's Functional Organization. Nature Reviews Genetics 5, 101-113. 2004 Dongen. A cluster algorithm for graphs. Technical Report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000 Huan et al. Mining Family Specific Residue Packing Patterns from Protein Structure Graphs, RECOMB, pp. 308-315, 2004
Course Description Select at least one paper for presentation from the paper list. Email me which paper you will present by next Mon (1/17/2011) Each presentation is 45 min, including 15 min Q&A, followed with a discussion Your grade will be largely determined by the presentation (70%) Presentation starts from next Tue (1/18/2011)
Important Days: Class Begins 1/10/2011 Open adds end 1/14/2011 Last day to drop with refund 2/3/2011 Last day to drop with no grade reported 3/2/2011 Class Ends 5/6/2011
Introduction to Computational Network Biology Network biology belongs to systems biology, which belongs to genomics Interested in the relations between entities rather than the entities themselves The figure shows the resulting module in a 3 dimensional plot for an example data set on diffuse large B-cell lymphomas, used in the study of Dittrich et al. (2008). Differential expression is depicted by node colouring (red: upregulated in tumor subclass I (GBC), green: upregulated in tumor subclass II (ACB)). Known disease-relevant modules (shaded) (Rosenwald et al., 2002) are captured and extended by the integrated network approach. http://bionet.bioapps.biozentrum.uni-wuerzburg.de/
Network’s everywhere Internet, social network, anti-terrorism network Biological networks Protein-protein interaction (PPI) network protein-DNA interaction network gene correlation network gene regulatory network metabolic network signaling network… Network is a tool for under standing complex systems Network models explains network properties and support network behavior study Network measures provide quantitative analysis for complex systems Robustness
Definition of network (graph) Self-loop Multi-set of edges Edge G(V,E) Node (vertex) Simple graph: does not have loops (self-edges) and does not have multi-edges.
Definition of network (graph) Directed graph vs. Undirected graph Labeled graph vs. Unlabeled graph Symmetric graph vs. Asymmetric graph
Webpage layout Pages on a web site and the hyperlinks between them The different shades denote the optimal division into communities found by the shortest-path version of our algorithm. M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 2004
Adopted from R Albert’s slides
Biological networks Many non-identical elements; diverse interactions
Yeast Protein-Protein Interaction network Map of protein-protein interactions. The colour of a node signifies the phenotypic effect of removing the corresponding protein (red, lethal; green, non-lethal; orange, slow growth; yellow, unknown). Hawoong Jeong
Gene regulation network of sea urchin Eric Davidson
Abhishek Murarka Metabolic flux analysis of E. coli Metabolic flux analysis of wild-type Escherichia coli and mutants deficient in pyruvate-dissimilating enzymes during the fermentative metabolism of glucuronate Abhishek Murarka
Why study networks? Complex systems cannot be described in a reductionist view Behavior study of complex systems starts with understanding the network topology Network - related questions: How do we reconstruct a network? How can we quantitatively describe large networks? How did networks get to be the way they are?
Simple measures Node Degree: the number of edges connected to the node In-degree & Out-degree Total in-degree == total out-degree Average Degree: the average of node degrees for all the nodes in the network, denoted as: Degree distribution: the degree distribution P(k) gives the fraction of nodes that have k edges Edge weight: the weight of the edges where N is the number of nodes in the network, ki is the node degree of node i
Simple measures Shortest path: to find a path between two nodes such that the sum of the weights of its constituent edges is minimized Graph diameter: the longest shortest path between any pair of nodes in the graph. Connected graph: any two vertices can be joined by a path Bridge: if we erase the edge, the graph becomes disconnected
Simple measures Betweenness centrality: for all node pairs (i, j), find all the shortest paths between nodes i and j, denoted as C(i,j), and determine how many of these pass through node k, denoted as Ck(i,j). Betweenness centrality of node k is Calculating the betweenness involves calculating the shortest paths between all pairs of vertices on a graph. O(V2logV + VE) for sparse graph with Johnson’s algorithm. L. C. Freeman, Sociometry 40, 35 (1977); P. E. Black, Dictionary of Algorithms and Data Structures (2004)
Simple measures Clustering coefficient: a measure of degree to which nodes in a graph tend to cluster together. It is based on triplets of nodes. Neighborhood N for a vertex vi is defined as its immediately connected neighbors as The local clustering coefficient Ci for a vertex vi is then given by the proportion of links between the vertices within its neighborhood divided by the number of links that could possibly exist between them: L. C. Freeman, Sociometry 40, 35 (1977)
Complex measures Frequent subgraph mining Graph comparison & classification Graph isomorphic testing
Useful software Visualization & Topological Analysis Cytoscape (www.cytoscape.org) Pajek (vlado.fmf.uni-lj.si/pub/networks/pajek) Graph related programming LEDA (www.algorithmic-solutions.com) Nauty (www.cs.sunysb.edu/~algorith/implement/nauty/implement.shtml)
1960 1999 2002
Real networks are much more complex Transcription regulatory networks of Yeast and E. coli show an interesting example of mixed characteristics how many genes a TF interacts with how many TFs interact with a given gene - scale-free - exponential
Modularity and network motif Cellular function are likely to be carried out in a highly modular manner Modular -- a group of genes/proteins that work together to achieve distinct functions Biology is full of examples of modularity
Remaining challenges Discovery of network motifs is closely related to the generation of random networks Structure of network motifs does not necessary determine function Relation between higher-level organizational, functional states and networks has not yet been studied Voigt, W. et al. Genetics 2005 Ingram P.J.et al. BMC Genomics 2006 Eric Werner. Nature 2007
Next class PPI network construction False-positive detection