Identification of optimal parameter ranges in building and assessing correlation networks built from gene expression. Qianran Li. Supervisor: Dr. Kathryn Cooper.


1 Identification of optimal parameter ranges in building and assessing correlation networks built from gene expression
Qianran Li
Supervisor: Dr. Kathryn Cooper
Student Research and Creative Activity Fair, March 2nd, 2018

2 Background: Gene Expression
DNA is regulated by transcription factors.
[Diagram labels: Transcription Factor (Regulator), DNA]

3 Background: Gene Expression
The transcription factor tells the genes when to make a protein.
[Diagram labels: Transcription Factor (Regulator), Genes]

4 Background: Gene Expression
This process is called gene expression.
[Diagram labels: Transcription Factor (Regulator), Proteins]

5 Problem
Large volumes of gene expression data are available.
Goal of gene expression analysis: identify genes that cause disease, aging, ...
Some methods are available (GSEA).
But sometimes these methods return no results, or too many to study.
It is hard to identify the functional differences of proteins among tissues.

6 Background: Correlation Network
A correlation network is built from gene expression data and is used to examine gene co-expression.
Image source: Newman, MEJ. (2002). Assortative Mixing in Networks. Phys. Rev. Lett., 89(20):
Dempsey, K., Bonasera, S., Bastola, D., & Ali, H. (2011, January). A novel correlation networks approach for the identification of gene targets. In System Sciences (HICSS), th Hawaii International Conference on (pp. 1-8). IEEE.

7 Motivation
Correlation networks have structural characteristics:
Assortativity: gene connectedness
Clustering coefficient: does a group form a cluster of edges?
Image source: Newman, MEJ. (2002). Assortative Mixing in Networks. Phys. Rev. Lett., 89(20):

8 Research Methodology
Get data from the Gene Expression Omnibus (GEO).
Build the networks based on Pearson correlation coefficients.
Develop a structural analysis pipeline using the igraph package in R.
Identify possible structural measurement ranges.

9 Data Collection
Gene expression networks built from Mus musculus
5 experiments (experiment = series)
35 networks

Series | Networks | Tissues | Samples per network
GSE38531 | 7 | blood | 5
GSE41127 | 8 | T-cell |
GSE38754 | | heart, lung, kidney |
GSE24637 | | liver, pancreas, muscle, adipose |
GSE27567 | 4 | breast tumor | 10

10 Network Creation
Pearson correlation coefficient
Filtered to only the highest correlations: |r| from 0.7 to 1.0
Only correlations passing P-value < (Student's t-test)
Dempsey, K., Thapa, I., Bastola, D., et al. (2011). Identifying modular function via edge annotation in gene correlation networks using Gene Ontology search. In Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on. IEEE.
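The network creation step can be illustrated with a minimal sketch. It is written in plain Python rather than the R/igraph pipeline named in the talk, and it is not the author's code. The function names, the `min_abs_r` parameter, and the toy expression values are all hypothetical. The P-value cutoff is not given in the transcript, so the sketch only reports the Student's t statistic for each retained edge; turning that into a P-value would need a t-distribution routine.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile is treated as "no edge"
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """Student's t statistic for testing r != 0 with n samples (n - 2 d.o.f.)."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def build_network(expression, min_abs_r=0.7):
    """Return edges (gene_a, gene_b, r, t) for gene pairs with |r| >= min_abs_r."""
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        x, y = expression[gene_a], expression[gene_b]
        r = pearson(x, y)
        if abs(r) >= min_abs_r:
            edges.append((gene_a, gene_b, r, t_statistic(r, len(x))))
    return edges


# Hypothetical toy data: three genes measured in five samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8],  # rises with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # only weakly related to the others
}
toy_edges = build_network(toy_expression)
```

With this toy input only the geneA/geneB pair passes the 0.7 threshold. Comparing all pairs is quadratic in the number of genes, so networks of the size reported in the talk would need a vectorized or parallel implementation.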

11 Results: Assortativity Coefficient
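The assortativity coefficient measures the degree of co-expression between the hub genes in the network. In an assortative network, hubs tend to connect to other hubs; in a disassortative network, hubs tend to connect to low-degree nodes.
[Diagram labels: Hub, Assortative, Disassortative]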
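As a rough illustration of the measure, the sketch below computes degree assortativity as the Pearson correlation between the degrees at the two ends of every edge, following Newman (2002). It is plain Python, not the igraph call used in the talk, and it assumes a simple undirected graph given as an edge list with each edge listed once and no self-loops. The function name and the toy graphs are hypothetical.

```python
def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Count every undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    if not xs:
        return float("nan")  # no edges: coefficient undefined
    mean = sum(xs) / len(xs)  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return float("nan")  # all nodes have the same degree: coefficient undefined
    return cov / var


# Hypothetical toy graphs.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]  # one hub linked only to leaves
path = [(0, 1), (1, 2), (2, 3)]
```

The star graph is the extreme disassortative case (coefficient -1), since its single hub connects only to degree-1 nodes.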

12 [Figure panels: GSE38531, GSE41127, GSE38754, GSE24637, GSE27567]

13 Results: Clustering coefficient
A cluster is a group of nodes that are tightly interconnected. The clustering coefficient is the degree to which nodes tend to cluster together.
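The slide does not say which variant of the clustering coefficient was used. As one possible reading, the sketch below computes the global clustering coefficient (transitivity): the fraction of connected triples a - center - b whose end nodes are also linked. It is plain Python under that assumption, not the author's R code, and the function name and toy graphs are hypothetical.

```python
from itertools import combinations


def transitivity(edges):
    """Global clustering coefficient: closed triples / all connected triples."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    triples = 0  # paths a - center - b
    closed = 0   # triples whose end nodes a and b are also linked
    for center, nbrs in neighbors.items():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in neighbors[a]:
                closed += 1
    # Each triangle is counted once per corner, i.e. three times in total.
    return closed / triples if triples else float("nan")


# Hypothetical toy graphs.
triangle = [(0, 1), (1, 2), (2, 0)]
triangle_with_tail = triangle + [(0, 3)]
star = [(0, 1), (0, 2), (0, 3)]
```

A triangle scores 1, a star scores 0, and a triangle with one extra pendant node scores 3/5, because two of the five triples centered on the shared node are left open.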

14 [Figure panels: GSE38531, GSE41127, GSE38754, GSE24637, GSE27567]

15 Results: Range Identification
All | Mean | Median | Min | Max | # Networks
Nodes | 37,468.6 | 45,096 | 37,299 | 45,101 | 35
Edges | 6,101,447 | 2,765,449 | 257,877 | 74,542,976 |
Assortativity | 0.58 | 0.59 | -0.36 | 0.95 |
Clustering Coefficient | 0.56 | 0.62 | 0.30 | 0.79 |
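A range summary like the one in this table can be produced by collecting one value per network and reducing it to mean, median, minimum, and maximum. The sketch below shows that reduction in plain Python; the `summarize` helper is hypothetical and the input values are made up for illustration, not taken from the study.

```python
import statistics


def summarize(values):
    """Reduce one structural measure, taken across networks, to its range summary."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "n_networks": len(values),
    }


# Made-up per-network values, for illustration only.
toy_measure = [0.2, 0.4, 0.5, 0.9]
toy_summary = summarize(toy_measure)
```

Reporting the median alongside the mean matters here because a few very dense networks can pull the mean far from the typical value, as the edge counts in the table suggest.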

16 Results: Range Identification

17 Conclusion
Built 35 networks from 5 experiments for “meta” analysis.
Clustering coefficient varies across experiments, with significant between-series variance.
No other measures vary consistently across experiments.
Next steps:
Add other parameter analyses, such as degree distribution and hub node analysis.
Control the number of samples per network.
Add more data sets.
Further testing is needed, but if the result holds, it suggests that this model is ready for modeling gene expression data, and that the cell does not pick its functions at random.

18 Acknowledgement
Kate Cooper
GEO database
igraph package in R
Bioinformatics lab members

19 Questions?

