Monday, June 15, 2015
From high-throughput data to network biology: gain in statistical power and biological relevance
Andrey Alexeyenko
Stockholm Bioinformatics Centre
Why Most Published Research Findings Are False (Ioannidis, PLoS Med 2005, 2(8):e124)
“Positive facts”: the discoveries we are after, e.g. genomic associations, differentially expressed genes, “phenotype-disease” relations, etc.
Statistical model: assumes no positive facts and allows a fixed rate of Type I error.
Biological reality: negative facts are the vast majority; positive facts are yet to be discovered.
Diagram labels: Negative facts (true negatives, false positives); Positive facts (true positives).
A network is just a graph! The fact that I can draw a network does not yet make it a biological reality!
Conversion “data pieces → confidence” in a Bayesian framework
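One common way to turn a piece of data into a confidence value in a Bayesian framework is a likelihood ratio estimated against gold-standard positive and negative examples. The sketch below is only an illustration under stated assumptions: the function names, the prior, the add-one smoothing, and all counts are hypothetical, and the evidence types are treated as independent (naive Bayes), which real data sets often violate.

```python
import math

def log_likelihood_ratio(pos_in_bin, pos_total, neg_in_bin, neg_total, pseudocount=1.0):
    """log[P(evidence | true link) / P(evidence | no link)] for one evidence bin.

    Counts come from gold-standard positive and negative sets (hypothetical here).
    The pseudocount avoids division by zero for empty bins.
    """
    p_pos = (pos_in_bin + pseudocount) / (pos_total + 2 * pseudocount)
    p_neg = (neg_in_bin + pseudocount) / (neg_total + 2 * pseudocount)
    return math.log(p_pos / p_neg)

def posterior_probability(prior, llrs):
    """Combine a prior with log-likelihood ratios, assuming independent evidence."""
    log_odds = math.log(prior / (1.0 - prior)) + sum(llrs)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical example: two data sets supporting the same gene-gene link.
llr_coexpression = log_likelihood_ratio(40, 100, 50, 1000)
llr_interaction = log_likelihood_ratio(30, 100, 10, 1000)
confidence = posterior_probability(prior=0.01, llrs=[llr_coexpression, llr_interaction])
print(f"posterior confidence: {confidence:.3f}")
```

Working in log space keeps the combination additive, so each new data set simply contributes one more term to the sum.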
Enrichment of functional groups
Enrichment analysis in networks turns out to be more powerful than on gene lists.
Enrichment of functional groups
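The contrast between list-based and network-based enrichment can be sketched as follows. This is a toy illustration, not the method used in the talk: the gene-list test is a standard hypergeometric tail, while the network version just compares the observed number of links between a gene list and a functional group with a degree-based expectation, assuming the two sets are disjoint. All gene names and edges are hypothetical, and assessing significance of the link count (e.g. by network randomization) is left out.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_group, n_list, n_overlap):
    """P(overlap >= n_overlap) when n_list genes are drawn at random from the universe."""
    total = comb(n_universe, n_list)
    upper = min(n_group, n_list)
    tail = sum(comb(n_group, k) * comb(n_universe - n_group, n_list - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

def network_links(edges, gene_list, group):
    """Observed and expected number of links between a gene list and a functional group.

    The expectation uses node degrees only: deg(list) * deg(group) / (2 * n_edges).
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    observed = sum(1 for a, b in edges
                   if (a in gene_list and b in group) or (b in gene_list and a in group))
    deg_list = sum(degree.get(g, 0) for g in gene_list)
    deg_group = sum(degree.get(g, 0) for g in group)
    expected = deg_list * deg_group / (2 * len(edges))
    return observed, expected

# Hypothetical toy network: the list and the group share no genes,
# so a list-based test sees nothing, but they are linked in the network.
edges = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "d"), ("d", "e"), ("e", "f")]
gene_list, group = {"a", "b"}, {"x", "y"}

print(hypergeom_enrichment_p(n_universe=8, n_group=2, n_list=2, n_overlap=0))
print(network_links(edges, gene_list, group))
```

In this toy case the overlap is empty, so the list-based p-value carries no signal, while the link count exceeds its degree-based expectation.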
Partial correlations
Panel values: r PLC = 0.88; r PLC = 0.95; r PLC = 0.76
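As a reminder of what a partial correlation does, here is a minimal sketch of the first-order case: the correlation between x and y after removing the linear effect of a third variable z. The data are hypothetical, the function names are illustrative, and the sketch does not reproduce the values shown on the slide.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical profiles: x and y both follow z plus unrelated noise.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5, 7.5, 7.5]
y = [1.5, 2.5, 2.5, 3.5, 5.5, 6.5, 6.5, 7.5]

print(f"direct correlation:  {pearson(x, y):.3f}")
print(f"partial correlation: {partial_correlation(x, y, z):.3f}")
```

Here the direct correlation is high only because both profiles track z; once z is controlled for, the remaining association is close to zero, which is the kind of edge one would want to treat as spurious.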
Benjamini-Hochberg correction
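For reference, the Benjamini-Hochberg procedure can be sketched in a few lines. The function name and the example p-values are hypothetical; the logic is the standard step-up adjustment, in which each sorted p-value is scaled by m / rank and monotonicity is enforced from the largest p-value downwards.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase
    # as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from four tests.
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
significant = [q <= 0.1 for q in q_values]
print(q_values, significant)
```

Tests whose adjusted value falls at or below the chosen false discovery rate (0.1 in this example) are reported as discoveries.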
Quantitative modeling of multi-component system with mutually dependent elements
Why is going “list → network” an advancement?
Functional context
“Anchoring”, i.e. interdependence
Biological interpretability
Statistical features
Data integration
Many of those can be applied to the lists as well, but mind the flexibility!
Ways to augment confidence
Trivial: 1) increase power; 2) decrease false prediction rate
Data integration (evaluate prior to integration!)
Consider biological context
Remove spurious edges
Generalize to a higher level of organization
Ways to evaluate confidence
Supervised learning
Balance comprehensiveness and complexity (so-called information criteria)
Benjamini-Hochberg
Show it to a biologist
Go out to the real world and test
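The trade-off between comprehensiveness and complexity can be illustrated with the two most widely used information criteria, AIC and BIC. The formulas are standard; the log-likelihoods, parameter counts, and sample size below are hypothetical numbers chosen only to show the comparison.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits: the complex model explains the data slightly better
# but pays for seven extra parameters.
simple = {"log_likelihood": -120.0, "n_params": 3}
complex_model = {"log_likelihood": -118.0, "n_params": 10}
n_obs = 100

for name, fit in (("simple", simple), ("complex", complex_model)):
    print(name,
          aic(fit["log_likelihood"], fit["n_params"]),
          round(bic(fit["log_likelihood"], fit["n_params"], n_obs), 1))
```

With these numbers both criteria prefer the simpler model, because the small gain in likelihood does not cover the penalty for the added parameters.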
Ways to employ confidence
Initialize network
Add node and edge attributes to the network
Filter network elements for higher relevance
Build more complex models accounting for confidence
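Storing confidence as an edge attribute and filtering on it might look like the sketch below. The data structure, gene names, scores, and threshold are all hypothetical; a plain dictionary stands in for whatever network library is actually used.

```python
def filter_edges(edges, min_confidence):
    """Keep only edges whose confidence is at or above the threshold."""
    return {pair: attrs for pair, attrs in edges.items()
            if attrs["confidence"] >= min_confidence}

# Hypothetical network: each edge carries a confidence score and its evidence type.
edges = {
    ("geneA", "geneB"): {"confidence": 0.92, "evidence": "co-expression"},
    ("geneA", "geneC"): {"confidence": 0.35, "evidence": "text mining"},
    ("geneB", "geneD"): {"confidence": 0.71, "evidence": "protein interaction"},
}

high_confidence = filter_edges(edges, min_confidence=0.5)
for (source, target), attrs in sorted(high_confidence.items()):
    print(source, target, attrs["confidence"], attrs["evidence"])
```

Keeping the score on the edge rather than discarding low-confidence links up front lets the same network be re-filtered at different thresholds or fed into models that weight edges by confidence.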