Identification of optimal parameter ranges in building and assessing correlation networks built from gene expression
Qianran Li
Supervisor: Dr. Kathryn Cooper
Student Research and Creative Activity Fair, March 2nd, 2018

Background: Gene Expression
Transcription factors (regulators) bind DNA and regulate which genes are transcribed.
(Figure: a transcription factor (regulator) bound to DNA)

Background: Gene Expression
The transcription factor tells the genes when to make a protein.
(Figure: a transcription factor (regulator) acting on genes)

Background: Gene Expression
This process, in which a gene is used to make a protein, is called gene expression.
(Figure: a transcription factor (regulator) and the resulting proteins)

Problem
Large volumes of gene expression data are available.
Goal of gene expression analysis: identify genes that cause disease, aging, etc.
Some methods are available, such as Gene Set Enrichment Analysis (GSEA).
However, these methods sometimes return no results, or too many to study.
It is hard to identify the functional differences of proteins among tissues.

Background: Correlation Network
A correlation network is built from gene expression data and is used to examine gene co-expression: each node is a gene, and an edge joins two genes whose expression profiles are strongly correlated across samples.
Image source: Newman, M. E. J. (2002). Assortative mixing in networks. Phys. Rev. Lett., 89(20), 208701.
Dempsey, K., Bonasera, S., Bastola, D., & Ali, H. (2011). A novel correlation networks approach for the identification of gene targets. In 2011 44th Hawaii International Conference on System Sciences (HICSS) (pp. 1-8). IEEE.

Motivation
Correlation networks have structural characteristics, such as:
Assortativity: gene connectedness (do well-connected genes link to each other?)
Clustering coefficient: does a group of genes form a dense cluster of edges?
Image source: Newman, M. E. J. (2002). Assortative mixing in networks. Phys. Rev. Lett., 89(20), 208701.

Research Methodology
Get data from the Gene Expression Omnibus (GEO).
Build each network from the Pearson correlation coefficients between gene expression profiles.
Develop a structural analysis pipeline using the igraph package in R.
Identify possible ranges for the structural measurements.
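For reference, the standard Pearson correlation coefficient between genes i and j is shown below. The notation is assumed here, not taken from the slides: x_{is} is the expression of gene i in sample s, n is the number of samples in the network, and \bar{x}_i is the mean expression of gene i.

```latex
r_{ij} = \frac{\sum_{s=1}^{n} (x_{is} - \bar{x}_i)(x_{js} - \bar{x}_j)}
              {\sqrt{\sum_{s=1}^{n} (x_{is} - \bar{x}_i)^2}\,\sqrt{\sum_{s=1}^{n} (x_{js} - \bar{x}_j)^2}}
```

r_{ij} ranges from -1 (perfectly anti-correlated) to 1 (perfectly correlated); a gene pair becomes an edge only when |r_{ij}| is large.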

Data Collection
Gene expression networks built from Mus musculus (mouse) data
5 experiments (experiment = GEO series), 35 networks

Series     Networks  Tissues                            Samples per network
GSE38531   7         Blood                              5
GSE41127   8         T-cell
GSE38754             Heart, lung, kidney
GSE24637             Liver, pancreas, muscle, adipose
GSE27567   4         Breast tumor                       10

Network Creation
Correlation measure: Pearson correlation coefficient.
Filtered to only the strongest correlations: 0.7 ≤ |r| ≤ 1.0.
Only correlations with p-value < 0.0005 (Student's t-test) are kept.
Dempsey, K., Thapa, I., Bastola, D., et al. (2011). Identifying modular function via edge annotation in gene correlation networks using Gene Ontology search. In 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW) (pp. 255-261). IEEE.
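A minimal Python sketch of this edge filter, assuming each gene's expression profile is a list of values over the same n samples of one series. The study's pipeline used R; this sketch is not it. The two-sided t-test p-value for H0: ρ = 0 uses t = r·√((n-2)/(1-r²)) with n-2 degrees of freedom, and is computed through the regularized incomplete beta function, because that two-sided tail equals I_{1-r²}((n-2)/2, 1/2). The helper names (pearson_r, pearson_p_value, build_correlation_network) are hypothetical, and the continued-fraction routine follows the classic Numerical Recipes approach rather than any library used in the study.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: r is undefined, treated here as "no edge"
    r = sxy / math.sqrt(sxx * syy)
    return max(-1.0, min(1.0, r))  # clamp floating-point overshoot


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the m-th step of the fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def pearson_p_value(r, n):
    """Two-sided Student's t-test p-value for H0: rho = 0 with n samples.

    t = r * sqrt((n - 2) / (1 - r**2)) has n - 2 degrees of freedom, and its
    two-sided tail probability equals I_{1 - r**2}((n - 2) / 2, 1 / 2).
    """
    if n < 3:
        raise ValueError("need at least 3 samples")
    return reg_inc_beta((n - 2) / 2.0, 0.5, 1.0 - min(r * r, 1.0))


def build_correlation_network(expr, r_min=0.7, p_max=0.0005):
    """Edges (gene_a, gene_b, r) with |r| >= r_min and p-value < p_max.

    expr: dict mapping gene ID -> expression values over the same samples.
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        x, y = expr[g1], expr[g2]
        r = pearson_r(x, y)
        if abs(r) >= r_min and pearson_p_value(r, len(x)) < p_max:
            edges.append((g1, g2, r))
    return edges
```

In R, cor.test(x, y, method = "pearson") reports the same t-test; the pure-Python version only keeps the sketch dependency-free. Testing every gene pair is quadratic, so networks with tens of thousands of genes would call for a vectorized correlation matrix in practice.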

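Results: Assortativity Coefficient
The assortativity coefficient measures whether genes tend to be co-expressed with genes of similar connectedness (degree). It ranges from -1 to 1: in an assortative network (positive values) hub genes connect to other hub genes, while in a disassortative network (negative values) hubs connect mainly to poorly connected genes.
(Figure: assortative and disassortative example networks, with hub nodes labeled)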
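One way to compute this measure is Newman's (2002) degree assortativity: the Pearson correlation between the degrees at the two ends of each edge. The sketch below uses only the Python standard library and assumes an undirected edge list without self-loops or duplicate edges; degree_assortativity is a hypothetical helper name. Newman's formula is written with remaining degree (degree minus one), but subtracting a constant does not change a correlation, so plain degrees are used.

```python
from collections import defaultdict


def degree_assortativity(edges):
    """Newman's degree assortativity coefficient of an undirected graph.

    edges: list of (u, v) pairs, assumed to contain no self-loops or
    duplicates. Returns r in [-1, 1], or None when r is undefined (no edges,
    or all edge endpoints have the same degree, e.g. a regular graph).
    """
    edges = list(edges)
    if not edges:
        return None
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)
    s_prod = s_mean = s_sq = 0.0
    for u, v in edges:
        j, k = degree[u], degree[v]
        s_prod += j * k                # sum of j_i * k_i over edges
        s_mean += 0.5 * (j + k)        # sum of (j_i + k_i) / 2
        s_sq += 0.5 * (j * j + k * k)  # sum of (j_i^2 + k_i^2) / 2
    mean_sq = (s_mean / m) ** 2
    denom = s_sq / m - mean_sq
    if denom == 0:
        return None
    return (s_prod / m - mean_sq) / denom


if __name__ == "__main__":
    star = [("hub", "g1"), ("hub", "g2"), ("hub", "g3")]
    print(degree_assortativity(star))  # -1.0: every edge joins the hub to a leaf
```

A triangle plus a separate edge gives r = 1, because every edge joins nodes of equal degree. The igraph function assortativity_degree() in R should report the same quantity for such graphs.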

(Figure: assortativity results for each series: GSE38531, GSE41127, GSE38754, GSE24637, GSE27567)

Results: Clustering Coefficient
A cluster is a group of nodes that are tightly interconnected. The clustering coefficient measures the degree to which nodes tend to cluster together, that is, how often two neighbors of a node are also connected to each other.
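The slides do not say which clustering coefficient was used, so this minimal Python sketch computes two common definitions: global transitivity (three times the number of triangles divided by the number of connected triples, the definition behind R igraph's transitivity(g, type = "global")) and the average local clustering coefficient. clustering_coefficients is a hypothetical helper name, and skipping nodes with fewer than two neighbors when averaging is an assumed convention; other tools count them as zero.

```python
from collections import defaultdict
from itertools import combinations


def clustering_coefficients(edges):
    """Return (global transitivity, average local clustering coefficient).

    Global: closed connected triples / all connected triples, which equals
    3 * (number of triangles) / (number of connected triples).
    Local C_i: linked pairs among i's neighbours / all pairs of neighbours.
    Nodes with fewer than two neighbours are left out of the average.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    closed = triples = 0
    local = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        pairs = k * (k - 1) // 2  # connected triples centred on this node
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        closed += links
        triples += pairs
        local.append(links / pairs)
    global_c = closed / triples if triples else 0.0
    avg_local = sum(local) / len(local) if local else 0.0
    return global_c, avg_local


if __name__ == "__main__":
    # A triangle (a, b, c) with one extra gene d attached to a.
    graph = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
    print(clustering_coefficients(graph))
```

For that example the global value is 3/5 = 0.6, while the local average is 7/9: gene a has one linked pair among its three neighbor pairs (1/3), and b and c each score 1. The two definitions can therefore rank networks differently, which matters when comparing ranges across series.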

(Figure: clustering coefficient results for each series: GSE38531, GSE41127, GSE38754, GSE24637, GSE27567)

Results: Range Identification
All networks (# networks = 35)

Measure                  Mean        Median      Min       Max
Nodes                    37,468.6    45,096      37,299    45,101
Edges                    6,101,447   2,765,449   257,877   74,542,976
Assortativity            0.58        0.59        -0.36     0.95
Clustering coefficient   0.56        0.62        0.30      0.79
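Range identification reduces each structural measure to summary statistics across networks. A minimal sketch with Python's statistics module follows; the record format (series ID, value), the helper names summarize and summarize_by_series, and the example values are all hypothetical and are not data from the study. Grouping by series gives the per-series view needed for the between-series comparison.

```python
import statistics
from collections import defaultdict


def summarize(values):
    """Mean, median, min and max of one structural measure across networks."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "n_networks": len(values),
    }


def summarize_by_series(records):
    """records: iterable of (series_id, value) pairs, one pair per network."""
    groups = defaultdict(list)
    for series, value in records:
        groups[series].append(value)
    return {series: summarize(vals) for series, vals in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    records = [("series_1", 0.40), ("series_1", 0.50), ("series_2", 0.70)]
    print(summarize([value for _, value in records]))
    for series, stats in summarize_by_series(records).items():
        print(series, stats)
```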

Results: Range Identification

Conclusion
Built 35 networks from 5 experiments for a “meta” analysis.
The clustering coefficient varies significantly between series.
No other measure varies consistently across experiments.
Next steps
Add analysis of other parameters, such as the degree distribution and hub nodes.
Control the number of samples per network.
Add more data sets.
Further testing is needed, but if the result holds, this model is ready to model gene expression data, which would suggest that the cell does not pick its functions at random.

Acknowledgements
Kate Cooper
GEO database
igraph package in R
Bioinformatics lab members

Questions?