Storytelling and Clustering for Cellular Signaling Pathways M. Shahriar Hossain, Monika Akbar, Nicholas F. Polys Department of Computer Science, Virginia Tech, Blacksburg, VA

Presentation transcript:

Storytelling and Clustering for Cellular Signaling Pathways M. Shahriar Hossain, Monika Akbar, Nicholas F. Polys Department of Computer Science, Virginia Tech, Blacksburg, VA

2 Objective
STKE Dataset: cell interactions through chemical signals
Discover relationships between the pathways
Graph structure: subgraph discovery problem
Pathway relationships: clustering, storytelling

3 Myocyte Adrenergic Pathway (CMP_9043)

4 Dataset properties

5 Design Pipeline
[Diagram boxes: STKE Dataset, Preprocessor, Pathway Graphs, Frequent Subgraph Discovery, Frequent Subgraphs, Clustering, NN, Storytelling]

6 Subsequent Candidate Generation
Apriori: incremental approach [17]
FSG [2]: generate a (k+1)-edge candidate subgraph by combining two k-edge subgraphs that share a common core subgraph of (k-1) edges.
The cost of comparing subgraphs (and core subgraphs) is reduced by using the hash code of each subgraph object.
[Figure: two subgraphs over nodes l, m, n, o, p and m, n, o, p, q are combined into a candidate over l, m, n, o, p, q]
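The join step described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes node labels are unique, so a subgraph is fully identified by its set of directed edges and a plain set intersection finds the common core. The names `join_candidates` and `nodes_of` are hypothetical. Python's built-in hashing of `frozenset` stands in for the per-subgraph hash code mentioned above.

```python
from itertools import combinations

def nodes_of(subgraph):
    """Collect the node labels used by a subgraph (a frozenset of edges)."""
    return {n for e in subgraph for n in e}

def join_candidates(k_subgraphs):
    """Join pairs of k-edge subgraphs that share a (k-1)-edge core.

    Assumption: node labels are unique, so edge-set equality replaces
    a general subgraph isomorphism test.
    """
    candidates = set()  # hashing removes duplicate candidates
    for a, b in combinations(k_subgraphs, 2):
        shares_core = len(a & b) == len(a) - 1
        touches = bool(nodes_of(a) & nodes_of(b))  # keeps 1-edge joins connected
        if len(a) == len(b) and shares_core and touches:
            candidates.add(a | b)  # the union has k+1 edges
    return candidates

# Toy 4-edge subgraphs using the node labels from the slide figure.
s1 = frozenset({("l", "m"), ("m", "n"), ("n", "o"), ("o", "p")})
s2 = frozenset({("m", "n"), ("n", "o"), ("o", "p"), ("p", "q")})
s3 = frozenset({("m", "n"), ("n", "o"), ("o", "t"), ("t", "z")})
candidates = join_candidates([s1, s2, s3])
```

Only `s1` and `s2` share a 3-edge core, so they are the only pair joined. Every pair still has to be compared, which is the cost that the following slides count.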

7 Subsequent Candidate Generation
Instance: number of 5-edge subgraphs: 21; core subgraph comparisons for s1: 20
[Figure: s1 is compared with the other 5-edge subgraphs over nodes such as l, m, n, o, p, q, r, s, t, z; the label "Not generated" appears in the figure]

8 Master Pathway Graph (MPG)
Total unique nodes: 1205
Total relations: 1376
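One way to picture the MPG is as the union of all pathway graphs. The sketch below assumes exactly that, with each pathway given as a list of (source, target) relations. The function name `build_mpg` and the toy pathway IDs are hypothetical, and the slides do not confirm this construction.

```python
def build_mpg(pathways):
    """Merge pathway graphs into one master graph.

    pathways: dict mapping a pathway id to an iterable of
    (source, target) relations. Returns the set of unique nodes and a
    dict mapping each relation to the pathways that contain it.
    """
    edge_to_pathways = {}
    for pid, edges in pathways.items():
        for e in edges:
            edge_to_pathways.setdefault(e, set()).add(pid)
    nodes = {n for e in edge_to_pathways for n in e}
    return nodes, edge_to_pathways

# Toy input: two pathways that share the relation n -> o.
toy = {
    "P1": [("m", "n"), ("n", "o")],
    "P2": [("n", "o"), ("o", "p")],
}
nodes, mpg = build_mpg(toy)
```

Keeping the pathway set per relation makes it cheap to count how many pathways support a given edge.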

9 SEG - Subgraph Extension Generation
Neighborhood extension. Neighborhood list: {q, r, s}
No comparison is required: the subgraph is extended from physical evidence.
[Figure: the subgraph over l, m, n, o, p is extended to each of the neighbors q, r, and s in turn]
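A minimal sketch of neighborhood extension, under the same assumption of uniquely labeled nodes: a k-edge subgraph grows only along edges that actually exist in the master graph and touch one of its nodes, so no pairwise comparison of subgraphs is needed. The name `extend_subgraph` and the toy edges are hypothetical, and support counting is left out.

```python
def extend_subgraph(subgraph, mpg_edges):
    """Return every (k+1)-edge extension of a connected k-edge subgraph.

    subgraph: frozenset of (source, target) edges.
    mpg_edges: iterable of all edges in the master graph.
    """
    nodes = {n for e in subgraph for n in e}
    extensions = set()
    for e in mpg_edges:
        # Only edges adjacent to the subgraph are considered.
        if e not in subgraph and (e[0] in nodes or e[1] in nodes):
            extensions.add(subgraph | {e})
    return extensions

# Toy master graph: the neighborhood of the subgraph is {q, r, s}.
base = frozenset({("l", "m"), ("m", "n"), ("n", "o"), ("o", "p")})
mpg_edges = list(base) + [("p", "q"), ("o", "r"), ("n", "s"), ("x", "y")]
extensions = extend_subgraph(base, mpg_edges)
```

The edge x -> y is never tried because it does not touch the subgraph, which is where the saving in attempts would come from.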

10 Design Pipeline
[Diagram boxes: STKE Dataset, Preprocessor, Pathway Graphs, Frequent Subgraph Discovery, Frequent Subgraphs, Clustering, NN, Storytelling]

11 Subgraph Discovery
min_sup = 2%
k | # of subgraphs generated | Time (sec.)
1 | 1,376 | Existing
[Rows for k >= 2 are truncated in this transcript]
What is so novel about pruning edges?

12 ‘Importance Factor’ of a subgraph: sfipf
Subgraph frequency (sf), inverse pathway frequency (ipf)
Defined for the i-th subgraph and the j-th pathway
[Formulas not preserved in this transcript]
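The formulas for sfipf are missing from this transcript. Reading the name by analogy with tf-idf, one plausible form is sf(i, j) = (occurrences of subgraph i in pathway j) / (all subgraph occurrences in pathway j), ipf(i) = log(N / number of pathways containing subgraph i), and sfipf(i, j) = sf(i, j) * ipf(i). The sketch below implements that assumed form only; it may differ from the authors' definition, and the function and variable names are hypothetical.

```python
import math

def sfipf(occurrences):
    """Score each (subgraph, pathway) pair with a tf-idf style weight.

    occurrences: dict pathway_id -> dict subgraph_id -> count.
    Assumed formula (by analogy with tf-idf, not taken from the slides):
    sf = count / total count in the pathway, ipf = log(N / pathways
    containing the subgraph), score = sf * ipf.
    """
    n_pathways = len(occurrences)
    containing = {}
    for counts in occurrences.values():
        for s, c in counts.items():
            if c > 0:
                containing[s] = containing.get(s, 0) + 1
    scores = {}
    for p, counts in occurrences.items():
        total = sum(counts.values())
        for s, c in counts.items():
            if c > 0:
                sf = c / total
                ipf = math.log(n_pathways / containing[s])
                scores[(s, p)] = sf * ipf
    return scores

# Toy data: g1 occurs in every pathway, g2 only in P1.
occ = {"P1": {"g1": 1, "g2": 1}, "P2": {"g1": 1}}
scores = sfipf(occ)
```

Under this form a subgraph that appears in every pathway gets weight zero, so it cannot help to tell pathways apart.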

13 Dataset Properties (sfipf)
Number of edges in MPG = 1376
Total pathways = 50

14 Subgraph Discovery

15 Subgraph Discovery

16 Subgraph Discovery
Table columns: k | Number of Subgraphs | Time Saved (%) | Attempts Saved (%)
Overall attempts saved = 89.52%
Overall time saved = 99.39%

17 Subgraph Discovery

18 Clustering
Hierarchical Agglomerative Clustering (HAC)
k-means
Unsupervised measure of clusters’ validity: Average Silhouette Coefficient (ASC) [19]
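The Average Silhouette Coefficient can be computed directly from its standard definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the silhouette is (b - a) / max(a, b). The sketch below assumes Euclidean distance over feature vectors (for pathways these could be sfipf vectors) and at least two clusters. The function name is hypothetical, and singleton clusters are scored as 0.

```python
import math

def average_silhouette(points, labels):
    """Average silhouette coefficient of a clustering (needs >= 2 clusters)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Toy data: two well separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = average_silhouette(points, [0, 0, 1, 1])
bad = average_silhouette(points, [0, 1, 0, 1])
```

Values near 1 indicate compact, well separated clusters; negative values suggest that points sit closer to another cluster than to their own. This allows HAC and k-means results to be compared without ground-truth labels.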

19 Clustering

20 Clustering

21 Design Pipeline
[Diagram boxes: STKE Dataset, Preprocessor, Pathway Graphs, Frequent Subgraph Discovery, Frequent Subgraphs, Clustering, NN, Storytelling]

22 Pathway Relations (StoryTelling)
Bidirectional search
Cover tree for NN
[Figure: a chain from start S through p1, p2, p3, ... to p7, p8, p9 and target T]
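A rough sketch of how a story between two pathways might be found: connect each pathway to its nearest neighbors, then search from the start and the target at the same time until the two frontiers meet. This is an illustration under stated assumptions, not the authors' algorithm: a brute-force neighbor search replaces the cover tree, breadth-first expansion is used on both sides, and the names `knn_graph`, `story`, and `_stitch` are hypothetical.

```python
import math
from collections import deque

def knn_graph(vectors, k):
    """Brute-force k-nearest-neighbor graph, made symmetric."""
    graph = {name: set() for name in vectors}
    for name, v in vectors.items():
        others = sorted((n for n in vectors if n != name),
                        key=lambda n: math.dist(v, vectors[n]))
        for n in others[:k]:
            graph[name].add(n)
            graph[n].add(name)
    return graph

def _stitch(parents, meet):
    """Join the two half-paths that meet at node `meet`."""
    path, node = [], meet
    while node is not None:
        path.append(node)
        node = parents[0][node]
    path.reverse()
    node = parents[1][meet]
    while node is not None:
        path.append(node)
        node = parents[1][node]
    return path

def story(graph, start, goal):
    """Bidirectional breadth-first search; returns a chain or None."""
    if start == goal:
        return [start]
    parents = ({start: None}, {goal: None})
    queues = (deque([start]), deque([goal]))
    while queues[0] and queues[1]:
        for side in (0, 1):
            for _ in range(len(queues[side])):  # expand one full level
                node = queues[side].popleft()
                for nb in sorted(graph[node]):
                    if nb in parents[side]:
                        continue
                    parents[side][nb] = node
                    if nb in parents[1 - side]:
                        return _stitch(parents, nb)
                    queues[side].append(nb)
    return None

# Toy feature vectors laid out along a line.
vectors = {"S": (0, 0), "p1": (1, 0), "p2": (2, 0), "p3": (3, 0), "T": (4, 0)}
chain = story(knn_graph(vectors, 1), "S", "T")
```

Each consecutive pair in the returned chain is a nearest-neighbor link, so the story moves from S to T through pathways that resemble each other.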

23 Day-to-day life example
From: Roman Holiday
To: Terminator 3
[Images: from Roman Holiday; from Terminator 3]

24 Examples in STKE

25 Pathway Relations (StoryTelling)

26 Pathway Relations (StoryTelling)

27 Pathway Relations (StoryTelling)

28 Future Directions
Compare our SEG graph methods with text-based clustering and storytelling
Examine the costs and benefits of combining text and graph mining techniques

29 References
[1] Science Signaling, The Signal Transduction Knowledge Environment (STKE), "The Database of Cell Signaling".
[2] Kuramochi, M., and Karypis, G., "An efficient algorithm for discovering frequent subgraphs", IEEE Transactions on KDE, Vol. 16(9), September 2004.
[3] Breslin, T., Krogh, M., Peterson, C., and Troein, C., "Signal transduction pathway profiling of individual tumor samples", BMC Bioinformatics, June 29.
[4] Kumar, D., Ramakrishnan, N., Helm, R. F., and Potts, M., "Algorithms for Storytelling", IEEE Transactions on KDE, Vol. 20(6), June 2008.
[5] Ratprasartporn, N., Cakmak, A., and Ozsoyoglu, G., "On Data and Visualization Models for Signaling Pathways", 18th SSDBM, 2006.
[6] Xu, X., and Yu, Y., "Modeling and Verifying WNT Signaling Pathway", 3rd Intl. Conf. on ICNC, 2007, Vol. 2.
[7] Schreiber, F., "Comparison of metabolic pathways using constraint graph drawing", 1st Asia-Pacific bioinformatics Conf. on Bioinfo., Australia, Vol. 19, 2003.
[8] Abello, J., van Ham, F., and Krishnan, N., "ASKGraphView: A Large Scale Graph Visualization System", IEEE Transactions on Visualization and Computer Graphics, Vol. 12(5), 2006.
[9] Miyake, S., Tohsato, A., Takenaka, Y., and Matsuda, H., "A clustering method for comparative analysis between genomes and pathways", 8th Intl. Conf. on Database Systems for Advanced Applications, March 2003.

30 References
[10] Yan, X., and Han, J., "gSpan: graph-based substructure pattern mining", IEEE ICDM, 2002.
[11] Moti, C., and Ehud, G., "Diagonally Subgraphs Pattern Mining", 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, 2004.
[12] Ketkar, N., Holder, L., Cook, D., Shah, R., and Coble, J., "Subdue: Compression-based Frequent Pattern Discovery in Graph Data", ACM KDD Workshop on Open-Source Data Mining, August 2005.
[13] Zhang, T., Ramakrishnan, R., and Livny, M., "BIRCH: An Efficient Data Clustering Method for Very Large Databases", ACM SIGMOD Intl. Conf. on Management of Data, Canada, 1996.
[14] Wagstaff, K., Cardie, C., Rogers, S., and Schroedl, S., "Constrained K-means Clustering with Background Knowledge", ICML 2001.
[15] Lin, F., and Hsueh, C. M., "Knowledge map creation and maintenance for virtual communities of practice", Intl. Journal of Information Processing and Management, ACM, Vol. 42(2), 2006.
[16] Beygelzimer, A., Kakade, S., and Langford, J., "Cover trees for nearest neighbor", ICML 2006.
[17] Agrawal, R., and Srikant, R., "Fast Algorithms for Mining Association Rules", Intl. Conf. on Very Large Data Bases, Santiago, Chile, September 1994.
[18] Agrawal, R., Mehta, M., Shafer, J., Srikant, R., Arning, A., and Bollinger, T., "The Quest Data Mining System", KDD'96, USA, 1996.
[19] Tan, P. N., Steinbach, M., and Kumar, V., "Introduction to Data Mining", Addison-Wesley, April 2005.

31 Thank You