
Sai Moturu

Introduction
– Current approaches to microarray data analysis
  – Experimental data are analyzed first; biological information is incorporated only afterwards, in an a posteriori step, to make inferences
– Integrative analysis technique in this paper
  – Integrates gene annotation with expression data to discover intrinsic associations between the two data sources, based on co-occurrence patterns

Methods and Data
– Association rules discovery
– Gene expression data
– Gene annotation: Gene Ontology categories, metabolic pathways and transcriptional regulators
– Applied to two previously studied experiments

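Association Rules Discovery
– A rule has the form Antecedent -> Consequent, written X -> Y
– Measures of quality
  – Support: P(X ∪ Y), the fraction of transactions that contain every item in X and in Y
  – Confidence: P(Y|X) = P(X ∪ Y)/P(X)
  – Improvement: Confidence/P(Consequent) = P(X ∪ Y)/(P(X)*P(Y))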
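As a rough illustration of the three quality measures, the sketch below computes them over a toy set of transactions, one per gene. The item labels ("pathway:TCA", "t7:over") and the function names are hypothetical and chosen only for this example; the slides do not prescribe any data layout.

```python
from typing import Dict, FrozenSet, List


def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_measures(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Dict[str, float]:
    """Support, confidence and improvement of the rule antecedent -> consequent."""
    s_xy = support(transactions, antecedent | consequent)
    s_x = support(transactions, antecedent)
    s_y = support(transactions, consequent)
    confidence = s_xy / s_x if s_x else 0.0          # P(Y|X)
    improvement = confidence / s_y if s_y else 0.0   # confidence / P(Y)
    return {"support": s_xy, "confidence": confidence, "improvement": improvement}


# Toy data (hypothetical): each frozenset holds the items of one gene.
genes = [
    frozenset({"pathway:TCA", "t7:over"}),
    frozenset({"pathway:TCA", "t7:over"}),
    frozenset({"pathway:TCA"}),
    frozenset({"t7:over"}),
    frozenset({"pathway:glycolysis"}),
]

m = rule_measures(genes, frozenset({"pathway:TCA"}), frozenset({"t7:over"}))
print(m)
```

An improvement above 1 means the consequent is seen more often among genes matching the antecedent than among genes overall.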

Association Rules Discovery
– Itemsets
  – Each gene, together with the set of experiments in which that gene is over- or underexpressed
  – Gene characteristics
– Constraint
  – The antecedent must be a gene annotation
– Expression thresholds
  – Genes with log expression values > 1 are overexpressed and those with values < -1 are underexpressed (a two-fold change)
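One way to picture the itemset construction is sketched below: each gene becomes a transaction whose items are its annotations plus one item per experiment in which it crosses the expression threshold. The gene names, condition labels and the ":over"/":under" suffixes are hypothetical, and the sketch assumes base-2 log ratios, so that a threshold of 1 corresponds to the two-fold change mentioned on the slide.

```python
from typing import Dict, FrozenSet, Iterable


def gene_items(
    log_ratios: Dict[str, float],
    annotations: Iterable[str],
    threshold: float = 1.0,
) -> FrozenSet[str]:
    """Build the item set for one gene.

    log_ratios maps an experiment label to the gene's log expression ratio
    (assumed base 2); annotations are the gene's characteristics.
    """
    items = set(annotations)
    for condition, value in log_ratios.items():
        if value > threshold:
            items.add(f"{condition}:over")
        elif value < -threshold:
            items.add(f"{condition}:under")
        # Values within [-threshold, threshold] contribute no item.
    return frozenset(items)


# Hypothetical expression values and annotations for two genes.
expression = {
    "geneA": {"t1": 0.2, "t6": 1.4, "t7": 2.1},
    "geneB": {"t1": -0.1, "t6": -1.3, "t7": -2.0},
}
annotation = {
    "geneA": ["pathway:TCA cycle"],
    "geneB": ["pathway:ribosome"],
}

transactions = {
    gene: gene_items(expression[gene], annotation.get(gene, []))
    for gene in expression
}
print(transactions)
```

Keeping annotation items and expression items distinguishable by their labels makes it straightforward to enforce the constraint that the antecedent contains only annotations.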

Mining Association Rules
– The association rules of interest have low support values and high confidence values
– A variant of the Apriori algorithm is used; it has previously proved useful for mining biologically significant patterns with low support and high confidence
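The slides do not describe the Apriori variant itself, so the following is only a minimal sketch of the textbook level-wise Apriori search, combined with the constraint that the antecedent holds annotation items and the consequent holds expression items. The function names, the ":over"/":under" item suffixes and the toy data are assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: List[Itemset], min_count: int) -> Dict[Itemset, int]:
    """Textbook level-wise search: returns every itemset seen in >= min_count transactions."""
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


def is_annotation(item: str) -> bool:
    # Assumption: expression items carry an ":over" or ":under" suffix.
    return not item.endswith((":over", ":under"))


def constrained_rules(
    transactions: List[Itemset], min_count: int, min_conf: float, min_improvement: float
) -> List[Tuple[Itemset, Itemset, int, float, float]]:
    """Rules annotation items -> expression items that pass all three thresholds."""
    n = len(transactions)
    freq = frequent_itemsets(transactions, min_count)
    rules = []
    for itemset, count in freq.items():
        antecedent = frozenset(i for i in itemset if is_annotation(i))
        consequent = itemset - antecedent
        if not antecedent or not consequent:
            continue
        confidence = count / freq[antecedent]
        improvement = confidence / (freq[consequent] / n)
        if confidence >= min_conf and improvement >= min_improvement:
            rules.append((antecedent, consequent, count, confidence, improvement))
    return rules


# Hypothetical toy transactions, one per gene.
toy = (
    [frozenset({"pathway:TCA", "t7:over"})] * 3
    + [frozenset({"pathway:TCA"})]
    + [frozenset({"pathway:ribosome", "t7:under"})] * 2
    + [frozenset({"t7:over"})]
)
rules = constrained_rules(toy, min_count=2, min_conf=0.7, min_improvement=1.0)
for rule in rules:
    print(rule)
```

Expressing minimum support as an absolute gene count, as the result slides do, keeps low-support rules reachable even when only a handful of genes share an annotation.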

Filtering
– A major drawback of association rules is the huge number of rules generated
– Many of those rules are also redundant
– Two filters address this
  – Redundant filter
  – Single antecedent filter
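The slides name the two filters without defining them, so the sketch below rests on assumed definitions that may differ from those in the paper. Here the single antecedent filter keeps only rules whose antecedent has exactly one item, and the redundant filter drops a rule when a simpler rule (same consequent, antecedent a proper subset) has confidence at least as high. The Rule type and the item labels are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


def single_antecedent_filter(rules: List[Rule]) -> List[Rule]:
    """Assumed definition: keep rules whose antecedent is a single item."""
    return [r for r in rules if len(r.antecedent) == 1]


def redundant_filter(rules: List[Rule]) -> List[Rule]:
    """Assumed definition: drop a rule if a simpler rule with the same
    consequent reaches at least the same confidence."""
    kept = []
    for r in rules:
        dominated = any(
            o.consequent == r.consequent
            and o.antecedent < r.antecedent      # proper subset
            and o.confidence >= r.confidence
            for o in rules
        )
        if not dominated:
            kept.append(r)
    return kept


# Hypothetical rules for illustration.
y = frozenset({"t7:over"})
example_rules = [
    Rule(frozenset({"pathway:TCA"}), y, 0.9),
    Rule(frozenset({"pathway:TCA", "regulator:R1"}), y, 0.9),   # adds nothing
    Rule(frozenset({"pathway:TCA", "regulator:R2"}), y, 1.0),   # more specific, higher confidence
]

non_redundant = redundant_filter(example_rules)
single = single_antecedent_filter(example_rules)
print(non_redundant)
print(single)
```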

Diauxic shift dataset
– Gene expression changes accompanying the metabolic shift from fermentation to respiration in fermenting yeast cells
– Expression levels recorded at 7 time points
– External information
  – Metabolic pathways
  – Transcriptional regulators

Results
– Association rules among metabolic pathways and expression patterns
  – 1126 of over 6000 genes were annotated with at least one pathway
  – Association rules with a minimum support of 5, a minimum confidence of 40% and a minimum improvement of 1
  – Redundant and single antecedent filters applied
  – 21 association rules

Results
– Association rules among transcriptional regulators and expression patterns
  – 3490 genes were annotated with at least one regulator
  – Association rules with a minimum support of 5, a minimum confidence of 80% and a minimum improvement of 1
  – Redundant filter applied
  – 28 association rules

Results
– Association rules among transcriptional regulators, metabolic pathways and expression patterns
  – 3882 genes
  – Association rules with a minimum support of 5, a minimum confidence of 80% and a minimum improvement of 1
  – Redundant filter applied
  – 37 association rules

Results

Serum stimulation dataset
– Gene expression program of human fibroblasts after serum exposure
– External information
  – Gene Ontology terms

Results
– Association rules among biological process annotation and expression patterns
  – 4092 genes of over 8000
  – Support of 4, minimum confidence of 10% and minimum improvement of 1
  – Single antecedent and redundant filters applied
  – 12 associations

Results
– Association rules among terms from all GO categories
  – 4630 genes of over 8000
  – Support of 4, minimum confidence of 10% and minimum improvement of 1
  – Redundant filter applied
  – 31 associations

Results

Conclusions
– Some of the biological implications matched those found experimentally
– The others could be explored further
– Integrative data analysis is very useful for making meaningful discoveries from gene expression data