RedundancyMiner: A novel method of clustering in genomic studies. Barry Zeeberg, NCI; Hongfang Liu, NCI and GU.

Gene Ontology (GO): AmiGO browser. Hierarchical organization of categories and mapped genes.

High-Throughput GoMiner (HTGM)

Typical HTGM result: clustered image map (CIM)

Redundancy problem: Because of the hierarchical structure of GO, parent and child categories may contain partially redundant gene mappings. This can “inflate” the number of categories in the CIM and thus obscure its core information content. The redundancy itself can be studied to reveal fine-detail, nuanced associations among category clusters.

RedundancyMiner (RM) is an attempt to solve that problem. Remove the redundancy from the CIM (redundancy can inflate the CIM by, e.g., 3-fold). Place the redundancy into a META CIM, where it can be studied as nuanced themes of association among groups of GO categories.

RM paradigm: The similarity metric is a probabilistic value based on the number of genes mapped in common to two GO categories. Groups in the META CIM follow a “complete linkage” criterion for a selected p-value threshold.
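
The slide does not say which probability is used. As a minimal sketch only, assuming the metric is an upper-tail hypergeometric p-value for the overlap between two categories (the function names and the `total_genes` parameter are hypothetical, not taken from RM):

```python
from math import comb


def overlap_p_value(total_genes, size_a, size_b, shared):
    """Probability of at least `shared` common genes by chance.

    Assumption: category B is treated as a random draw of `size_b` genes
    from `total_genes`, of which `size_a` belong to category A.
    """
    max_shared = min(size_a, size_b)
    denominator = comb(total_genes, size_b)
    # Sum the hypergeometric probabilities for k = shared .. max_shared.
    tail = sum(
        comb(size_a, k) * comb(total_genes - size_a, size_b - k)
        for k in range(shared, max_shared + 1)
    )
    return tail / denominator


def category_similarity(genes_a, genes_b, total_genes):
    """Convenience wrapper taking two sets of gene identifiers."""
    return overlap_p_value(
        total_genes, len(genes_a), len(genes_b), len(genes_a & genes_b)
    )
```

Under this assumption a smaller value means the two categories share more genes than chance would suggest, so a pair would count as redundant when the value falls at or below the selected threshold.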

RM overcomes two problems of traditional hierarchical clustering: (1) all objects are put into one cluster or another, even if an object truly is an outlier; (2) each object can appear in only one cluster, even though it may be related to several clusters.
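
One way to picture grouping with both properties is to link every pair of categories whose p-value passes the threshold and then report the maximal fully connected groups. The sketch below uses the Bron-Kerbosch maximal-clique search as a stand-in; it is an illustration of the "complete linkage at a threshold" idea, not RM's actual algorithm, and `complete_linkage_groups` and its arguments are hypothetical names.

```python
from itertools import combinations


def complete_linkage_groups(items, p_value, threshold):
    """Return groups in which every pair has p_value <= threshold.

    Items with no qualifying partner appear in no group (outliers),
    and an item may appear in several groups.
    """
    neighbors = {item: set() for item in items}
    for a, b in combinations(items, 2):
        if p_value(a, b) <= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    groups = []

    def expand(current, candidates, excluded):
        # A group is maximal when nothing can extend it.
        if not candidates and not excluded:
            if len(current) > 1:
                groups.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & neighbors[v], excluded & neighbors[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(items), set())
    return sorted(groups)


# Toy example with made-up p-values: C belongs to two groups, E to none.
toy = {
    frozenset("AB"): 0.001, frozenset("AC"): 0.002,
    frozenset("BC"): 0.001, frozenset("CD"): 0.003,
}
toy_groups = complete_linkage_groups(
    "ABCDE", lambda a, b: toy.get(frozenset((a, b)), 1.0), 0.01
)
```

In the toy example the groups are A-B-C and C-D: D is linked to C but not to A or B, so it cannot join the first group, and E is left out entirely.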

CIM after RM

META CIM

Additional example: gene expression in NCI-60 cell lines. NCI-60 is a set of 60 well-studied cancer cell lines, composed of around 5 or 6 lines each from around 8 or 9 different cancer types.

Problem: The full CIM of 60 cell lines x 20,000 gene expression values is too dense to allow meaningful viewing. Solution: select a sub-portion of the CIM based on RM analysis.

NCI-60 META CIM based on correlation threshold = 0.20

Sub-CIM of highest correlating genes from group 33. Gene expression values are adjusted z-scores. Red = positive z-score; green = negative z-score.
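
The slides do not describe the adjustment applied to the z-scores. As a rough sketch, assuming a plain per-gene z-score across cell lines (the function names and the treatment of an exact zero are illustrative choices, not from the presentation), the color coding could be computed like this:

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one gene's expression values across cell lines."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation
    if spread == 0:
        # A gene with constant expression carries no contrast.
        return [0.0 for _ in values]
    return [(v - center) / spread for v in values]


def cim_color(z):
    """Map a z-score to the color scheme described on the slide."""
    if z > 0:
        return "red"
    if z < 0:
        return "green"
    return "black"  # assumption: the slide does not say how zero is drawn


example_row = z_scores([1.0, 2.0, 3.0])
example_colors = [cim_color(z) for z in example_row]
```

Standardizing each gene separately puts genes with very different absolute expression levels on a common scale, which is what makes a shared red/green color map readable.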

Sub-CIM of highest correlating genes from group 32

Conclusions: RM can remove redundancy from the primary CIM. RM can display the nuanced themes of redundancy structure in the META CIM. The META CIM can be used as the basis of further investigation.