Threshold selection in gene co-expression networks using spectral graph theory techniques. Andy D Perkins*, Michael A Langston. BMC Bioinformatics
Outline Introduction ― How to construct a gene co-expression network? ― Steps and our criterion Methods Results & Analysis
Introduction In gene co-expression networks, nodes represent gene transcripts. Two genes are connected by an edge if their expression values are highly correlated. The definition of “high” correlation is somewhat tricky ― One can use statistical significance… ― But we propose a criterion for picking the threshold parameter based on spectral graph theory.
Methods Microarray data sets ― Homo sapiens ― Saccharomyces cerevisiae (baker’s yeast)
Methods Network construction ― Construct a complete graph ― Compute the Pearson correlation coefficient between each pair of nodes. ― Apply a high-pass filter, keeping only edges whose correlation passes a threshold between 0.70 and 0.95. Network representation ― Laplacian of the graph G
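The construction steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that the filter is applied to the absolute value of the correlation and that the Laplacian is the unnormalized form L = D - A, neither of which is stated on the slide. The function name `build_network` and the toy expression matrix are hypothetical.

```python
import numpy as np

def build_network(expr, threshold=0.80):
    """Build a thresholded co-expression graph from a genes x samples array.

    Assumptions (not stated on the slide): the filter uses |r|, and the
    Laplacian is the unnormalized L = D - A.
    """
    corr = np.corrcoef(expr)                      # Pearson r between every pair of rows (genes)
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                    # no self-loops
    lap = np.diag(adj.sum(axis=1)) - adj          # degree matrix minus adjacency matrix
    return adj, lap

# Toy data: three genes that follow a shared profile, plus three unrelated genes.
rng = np.random.default_rng(0)
base = rng.normal(size=(1, 20))
expr = np.vstack([base + 0.1 * rng.normal(size=(3, 20)),
                  rng.normal(size=(3, 20))])
adj, lap = build_network(expr, threshold=0.80)
```

Raising `threshold` toward 0.95 removes edges and makes the graph sparser, which is the knob the rest of the method tunes.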
Methods Eigenvalue and eigenvector computation ― Aim to solve the eigenvalue problem defined above, ― resulting in eigenvalues and associated eigenvectors. ― The eigenvector associated with λ1 was extracted and sorted in increasing order.
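A small sketch of this step, assuming a symmetric, unnormalized Laplacian of a connected graph, so that λ1 is the second-smallest eigenvalue. The helper name `fiedler_ordering` is hypothetical, and the path graph is only a toy input.

```python
import numpy as np

def fiedler_ordering(lap):
    """Return lambda_1, the sorted entries of its eigenvector, and the node order.

    Assumes lap is the symmetric Laplacian of a connected graph, so the
    eigenvalue at index 0 is zero and index 1 holds lambda_1.
    """
    vals, vecs = np.linalg.eigh(lap)   # eigh returns eigenvalues in ascending order
    v1 = vecs[:, 1]
    order = np.argsort(v1)             # node indices sorted by eigenvector entry
    return vals[1], v1[order], order

# Toy input: a path graph on four nodes, 0 - 1 - 2 - 3.
adj = np.zeros((4, 4))
for i in range(3):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
lap = np.diag(adj.sum(axis=1)) - adj
lam1, sorted_v1, order = fiedler_ordering(lap)
```

The sign of an eigenvector is arbitrary, so the ordering may come out reversed; only the relative arrangement of the nodes matters for the later steps.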
Example ― Result: λ1 = 0.7216, v1 = [eigenvector values shown in figure]
Methods Cluster detection ― Using a sliding window technique ― A difference is significant if it exceeds m + s/2 (m: median; s: standard deviation) ― Clusters with fewer than 10 nodes are discarded
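One simple reading of the cluster detection rule is sketched below. The slide does not give the window width or say which differences m and s are computed over, so this sketch assumes the narrowest window (two adjacent entries of the sorted eigenvector) and takes the median and standard deviation over all adjacent gaps. The function name `detect_clusters` and the toy values are hypothetical.

```python
import numpy as np

def detect_clusters(sorted_v1, order, min_size=10):
    """Split sorted eigenvector entries into clusters at unusually large gaps.

    Assumption: a gap between adjacent sorted entries is 'significant' when
    it exceeds median + std/2 of all adjacent gaps.
    """
    gaps = np.diff(sorted_v1)
    cutoff = np.median(gaps) + np.std(gaps) / 2.0
    breaks = np.flatnonzero(gaps > cutoff) + 1     # positions where a new cluster starts
    groups = np.split(order, breaks)
    return [g for g in groups if len(g) >= min_size]   # discard small clusters

# Toy values: two plateaus of 12 nodes each and a small plateau of 3 nodes.
sorted_v1 = np.concatenate([np.linspace(-0.21, -0.19, 12),
                            np.linspace(0.19, 0.21, 12),
                            np.linspace(0.89, 0.91, 3)])
order = np.arange(sorted_v1.size)
clusters = detect_clusters(sorted_v1, order)
```

In this toy case the two large plateaus are kept and the three-node group is dropped by the minimum-size rule.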
Methods Paraclique extraction [17.] ― The base maximum clique size is 3.
Methods Functional comparisons ― Analyze some of the resulting paracliques in yeast and human, respectively. ― Use the Saccharomyces Genome Database GO Slim Viewer and Ingenuity Pathways Analysis software.
Results and discussion Nearly-disconnected components [10.] ― Result: λ1 ― The ability to find the nearly-disconnected pieces allows us to identify those nodes sharing a well-connected, or dense, cluster.
Results and discussion Spectral properties & Algebraic connectivity ― The multiplicity of the zero eigenvalue is equal to the number of connected components in the graph. ― When analyzing only the spectrum of the largest component, the smallest nonzero eigenvalue (λ1) is the algebraic connectivity.
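Both spectral properties can be checked numerically on a small graph. The sketch below assumes the unnormalized Laplacian L = D - A; the helper names and the five-node toy graph (a triangle plus a separate edge) are hypothetical.

```python
import numpy as np

def laplacian(adj):
    """Unnormalized Laplacian L = D - A (an assumption of this sketch)."""
    return np.diag(adj.sum(axis=1)) - adj

def count_components(adj, tol=1e-8):
    """Count connected components as the multiplicity of the zero eigenvalue."""
    vals = np.linalg.eigvalsh(laplacian(adj))
    return int(np.sum(np.abs(vals) < tol))

def algebraic_connectivity(adj):
    """Second-smallest Laplacian eigenvalue; adj must be one connected component."""
    return np.linalg.eigvalsh(laplacian(adj))[1]

# Toy graph: a triangle (nodes 0, 1, 2) and a disjoint edge (nodes 3, 4).
adj = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0

n_components = count_components(adj)       # two zero eigenvalues, two components
lam1 = algebraic_connectivity(adj[:3, :3]) # spectrum of the largest component only
```

For the triangle, the Laplacian spectrum is 0, 3, 3, so `lam1` is 3; a sparser or more weakly joined component would give a value closer to zero.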
Results and discussion Spectral properties & Algebraic connectivity [Figure: algebraic connectivity; panels: yeast, human]
Results and discussion Spectral clustering: potential threshold ― Resulting in a likely nearly-disconnected component
Results and discussion Comparison with other results ― Traditional methods
Results and discussion Comparison with other results ― Previous studies (1) [19.] ― Based on a random matrix theory (RMT) approach to determine the correlation threshold ― result = 0.77, corresponds approximately to
Results and discussion Comparison with other results ― Previous studies (2) [14.] ― We select g = 3 to enumerate paracliques.
Results and discussion Functional comparisons: SGD & IPA ― yeast t = 0.78: Three largest paracliques size 21, 17, of the 21 genes had unknown molecular function; t = 0.55: Three largest paracliques size 93, 53, 37. Many more of these genes have unknown molecular function (40, 13, 17). Largely the same categories appeared within the three largest paracliques in both groups.
Results and discussion Functional comparisons: SGD & IPA ― human t= 1st paraclique related to cellular organization, gene expression, genetic disorder, drug metabolism, and cell signaling; 2nd protein synthesis; these were related to reproductive systems development and disease, respectively. t = 0.65: The networks seem to be annotated with a larger range of functions, e.g., the 2nd matched 13 networks ranging from cellular assembly and organization, genetic disorder, to inflammatory disease, and many others.
Conclusion We presented a systematic threshold selection method that makes use of spectral graph theory. The results are in agreement with previous studies. At higher thresholds ― Fewer genes fail to be categorized based upon the gene ontology. ― Fewer networks were identified as being enriched in the paracliques, making interpretation of the results easier.
References
[10.] Ding CHQ, He X and Zha H: A spectral method to separate disconnected and nearly-disconnected web graph components. Proceedings of the Seventh ACM International Conference on Knowledge Discovery and Data Mining: 26–29 August 2001; San Francisco
[14.] Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D and Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 1998, 9
[17.] Chesler EJ and Langston MA: Combinatorial genetic regulatory network analysis tools for high throughput transcriptomic data. RECOMB Satellite Workshop on Systems Biology and Regulatory Genomics: 2–4 December 2005; San Diego