Journal Club Meeting Sept 13, 2010 Tejaswini Narayanan.

- Gene expression profiling provides a wealth of information that can help unravel the complexity of cancer.
- Selecting the most informative genes from a large amount of noise has become important for cancer classification.
- Wang and Gotoh [WnG] proposed a novel, robust soft-computing method rooted in Variable Precision Rough Sets [VPRS], introducing an α depended degree.
- The method is simple, efficient and straightforward: it classifies cancer accurately using single genes or gene pairs, and WnG subsequently used it to infer a direct gene regulatory network.

U is the universe of discourse (the set of samples) and R is an equivalence relation. The degree of dependency of a set of attributes Q on another set of attributes P is denoted by $\gamma_P(Q)$ and defined as

$$\gamma_P(Q) = \frac{|\mathrm{POS}_P(Q)|}{|U|}, \qquad \mathrm{POS}_P(Q) = \bigcup_{X \in U/R(Q)} \underline{P}(X), \qquad \underline{P}(X) = \bigcup \{ Y \in U/R(P) : Y \subseteq X \},$$

where $|\mathrm{POS}_P(Q)|$ is the size of the union of the lower approximations, with respect to P, of each equivalence class in U/R(Q), and |U| is the size of U.
With Q = the decision attributes D and P = a subset of the condition attributes, $\gamma_P(D)$ is the depended degree: the degree to which P can discriminate between the distinct classes of D, i.e., its classification power. The greater $\gamma_P(D)$, the stronger the classification ability of P; this is the basis for selecting informative genes.
The canonical depended degree, however, has an excessively rigid definition. This makes discriminative features difficult to detect and leads to high computational expense, uncertain predictive performance and non-unique results.
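The depended degree can be computed directly from a discretized decision table. The Python sketch below is illustrative only: the table layout (one dict per sample), the gene names `g1`/`g2` and the helper names are hypothetical, and it implements the textbook rough-set dependency rather than WnG's own code.

```python
from collections import defaultdict

def partition(rows, attrs):
    """U/R(attrs): group sample indices that share the same values on attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def depended_degree(rows, P, D):
    """Canonical dependency gamma_P(D) = |POS_P(D)| / |U|."""
    p_classes = partition(rows, P)
    positive = set()
    for X in partition(rows, D):
        # Lower approximation of X: union of P-classes entirely contained in X
        for Y in p_classes:
            if Y <= X:
                positive |= Y
    return len(positive) / len(rows)

# Hypothetical discretized table: 10 samples, two genes, one class label
g1 = ["UR"] * 5 + ["DR"] * 5
g2 = ["UR", "DR"] * 5
labels = ["tumor"] * 4 + ["normal"] * 6
table = [{"g1": a, "g2": b, "class": c} for a, b, c in zip(g1, g2, labels)]

print(depended_degree(table, ["g1"], ["class"]))        # 0.5
print(depended_degree(table, ["g1", "g2"], ["class"]))  # 0.7
```

Adding `g2` to `g1` raises the dependency from 0.5 to 0.7 in this toy table, because the finer partition puts more samples into equivalence classes that are pure with respect to the class label.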

Method proposed:
1. Filter out redundant information and retain the critical information (i.e., the signal).
2. Build decision rules from the core information and classify the whole dataset.
3. To extract hidden meaningful rules, some rigid definitions must sometimes be relaxed; hence the flexible α depended degree under soft-computing considerations.
- This allows some single genes (or gene pairs) to have strong class-discriminatory power.
- Interestingly, it also enables inference of networks and modules.
- All the gene selection, classification and network construction steps in this method correlate well with biologically meaningful decision rules, such as:
  - tumor vs. normal cells,
  - up- vs. down-regulation, and
  - positive vs. negative regulation.

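Solution: WnG introduced the α depended degree, a generalization of the depended degree, in their VPRS model. Given P and D, the α depended degree is

$$\gamma_P(D,\alpha) = \frac{|\mathrm{POS}_P(D,\alpha)|}{|U|}, \qquad \mathrm{POS}_P(D,\alpha) = \bigcup_{X \in U/R(D)} \underline{P}_\alpha(X), \qquad \underline{P}_\alpha(X) = \bigcup \left\{ Y \in U/R(P) : \frac{|Y \cap X|}{|Y|} \ge \alpha \right\},$$

where |·| denotes the size of a set, U/R(·) is the set of equivalence classes induced by the equivalence relation R(·), and 0 ≤ α ≤ 1. For α = 1 the condition reduces to Y ⊆ X, which recovers the canonical depended degree. To select genes with high class discrimination, the lower limit of α was set to 0.7.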
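A minimal sketch of the α depended degree, assuming the definition above: a P-class Y joins the α-lower approximation of X when at least a fraction α of Y lies in X, so α = 1 gives the canonical depended degree. The table, gene names and the `select_genes` helper are hypothetical; the selection rule (keep single genes whose α depended degree equals 1) mirrors the gene-selection criterion described in these slides.

```python
from collections import defaultdict

def partition(rows, attrs):
    """U/R(attrs): group sample indices that share the same values on attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def alpha_depended_degree(rows, P, D, alpha):
    """VPRS dependency gamma_P(D, alpha) = |POS_P(D, alpha)| / |U|."""
    p_classes = partition(rows, P)
    positive = set()
    for X in partition(rows, D):
        for Y in p_classes:
            # alpha-lower approximation: Y counts if >= alpha of its samples lie in X
            if len(Y & X) / len(Y) >= alpha:
                positive |= Y
    return len(positive) / len(rows)

def select_genes(rows, genes, alpha, decision="class"):
    """Hypothetical helper: keep single genes with gamma_{gene}(D, alpha) == 1."""
    return [g for g in genes
            if alpha_depended_degree(rows, [g], [decision], alpha) == 1.0]

# Hypothetical discretized table: 10 samples, two genes, one class label
g1 = ["UR"] * 5 + ["DR"] * 5
g2 = ["UR", "DR"] * 5
labels = ["tumor"] * 4 + ["normal"] * 6
table = [{"g1": a, "g2": b, "class": c} for a, b, c in zip(g1, g2, labels)]

print(alpha_depended_degree(table, ["g1"], ["class"], 1.0))  # 0.5 (canonical)
print(alpha_depended_degree(table, ["g1"], ["class"], 0.7))  # 1.0
print(select_genes(table, ["g1", "g2"], 0.7))                 # ['g1']
```

Relaxing α from 1.0 to 0.7 tolerates the single normal sample among the five up-regulated ones, so `g1` becomes a perfectly discriminating gene in this toy table, while `g2` stays uninformative at any α.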

Decision rule: a decision rule has the form "A ⇒ B", meaning "if A, then B", where A refers to condition attributes and B to decision attributes. The confidence of a decision rule A ⇒ B is defined as

$$\mathrm{confidence}(A \Rightarrow B) = \frac{\mathrm{support}(A \wedge B)}{\mathrm{support}(A)},$$

where support(A) is the proportion of samples satisfying A and support(A ∧ B) is the proportion satisfying A and B simultaneously. Confidence indicates the reliability of the rule. For each chosen α value, only the genes with $\gamma_P(D,\alpha) = 1$ were selected to build decision rules, and sufficient reliability was ensured by setting a high threshold for α.
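The confidence formula can be checked on a toy table. This Python sketch assumes rules whose antecedent and consequent are simple attribute = value conditions on different attributes; the table and the `single_gene_rules` helper (which labels each gene value with its majority class) are hypothetical illustrations, not WnG's rule-generation procedure.

```python
from collections import Counter

def support(rows, cond):
    """Fraction of samples satisfying every attribute = value pair in cond."""
    return sum(all(r[k] == v for k, v in cond.items()) for r in rows) / len(rows)

def confidence(rows, antecedent, consequent):
    """confidence(A => B) = support(A and B) / support(A).
    Assumes A and B constrain different attributes."""
    s_a = support(rows, antecedent)
    if s_a == 0:
        return 0.0
    return support(rows, {**antecedent, **consequent}) / s_a

def single_gene_rules(rows, gene, decision="class"):
    """Hypothetical helper: one rule 'gene = value => decision = majority label' per value."""
    rules = []
    for value in sorted({r[gene] for r in rows}):
        label = Counter(r[decision] for r in rows if r[gene] == value).most_common(1)[0][0]
        rules.append((value, label, confidence(rows, {gene: value}, {decision: label})))
    return rules

# Hypothetical discretized table: 10 samples, one gene, one class label
g1 = ["UR"] * 5 + ["DR"] * 5
labels = ["tumor"] * 4 + ["normal"] * 6
table = [{"g1": a, "class": c} for a, c in zip(g1, labels)]

for value, label, conf in single_gene_rules(table, "g1"):
    print(f"if g1 = {value} then class = {label} (confidence {conf:.2f})")
```

Here the DR rule has confidence 1.00, while the UR rule reaches only 0.80 because one of the five up-regulated samples is labeled normal.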

Dataset: a SAGE breast cancer dataset with ~2.7 million tags and 27 samples, each described as a lymph-node-positive [LN(+)] or lymph-node-negative [LN(-)] primary breast tumor.

Results…  WnG have identified 7 highly discriminative (hub) genes.  All identified genes have high classification accuracy (under α = 0.8)  These seven hub genes are very interesting and informative for their biological relevance.  Example: It is well known that the role of the ATF2/AP1 complex and its network is at the hub of tumorigenesis.  This has been reflected by a high classification accuracy of 88.89%.

Inference of the Gene Regulatory Network
1. A few highly class-discriminative hub genes are expected to greatly enhance the authenticity and confidence of computed gene interaction networks.
2. WnG investigated the gene regulatory network as follows:
   o One gene (instead of the class) is used as the decision attribute.
   o If "GENE-I" is substituted for "Class label" in a decision table, GENE-I is regarded as the decision attribute with two distinct values, up-regulation (UR) and down-regulation (DR), and a new derivative table is obtained.
   o This derivative table is discretized to obtain another derived table.
   o Decision rules: if GENE-I is DR, then GENE-II is DR; if GENE-I is UR, then GENE-II is UR.
   o These rules do not necessarily hold in reverse. Therefore, a directed regulatory relation from GENE-I to GENE-II, a positive one, is established.
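The rule-based reading of a directed, signed edge can be sketched as follows. This is a minimal illustration under stated assumptions, not WnG's procedure: the discretized profiles for genes `a`, `b`, `c` are hypothetical, each rule is read as regulator state ⇒ target state as worded on the slide, and the threshold `min_conf = 0.8` is an arbitrary choice. An edge is called positive when DR ⇒ DR and UR ⇒ UR both reach the threshold, and negative when DR ⇒ UR and UR ⇒ DR do.

```python
from itertools import permutations

def rule_confidence(reg, tgt, reg_value, tgt_value):
    """confidence(regulator = reg_value => target = tgt_value) over paired samples."""
    hits = [t for r, t in zip(reg, tgt) if r == reg_value]
    if not hits:
        return 0.0
    return sum(t == tgt_value for t in hits) / len(hits)

def infer_edges(profiles, min_conf=0.8):
    """Return (regulator, target, sign) for every ordered gene pair passing the rules."""
    edges = []
    for g1, g2 in permutations(profiles, 2):
        x, y = profiles[g1], profiles[g2]
        pos = min(rule_confidence(x, y, "DR", "DR"), rule_confidence(x, y, "UR", "UR"))
        neg = min(rule_confidence(x, y, "DR", "UR"), rule_confidence(x, y, "UR", "DR"))
        if pos >= min_conf:
            edges.append((g1, g2, "+"))
        elif neg >= min_conf:
            edges.append((g1, g2, "-"))
    return edges

# Hypothetical UR/DR profiles over 10 samples
profiles = {
    "a": ["UR", "UR"] + ["DR"] * 8,
    "b": ["UR", "UR", "UR"] + ["DR"] * 7,
    "c": ["DR", "DR"] + ["UR"] * 8,
}
print(infer_edges(profiles))
```

In this toy data `a` → `b` is accepted as positive (confidences 1.0 and 0.875), but `b` → `a` is rejected because UR ⇒ UR holds for only 2 of 3 samples, which illustrates why the rules need not hold in reverse and the inferred relation is directed.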

Modularity of networks:
1. They used the Cytoscape plugin MCODE [19] to analyze the constructed network.
2. They detected two significant modules, one of which forms a feed-forward loop.
3. They conclude that the co-regulation of multiple activators could be at least partly responsible for the occurrence of tumors.
Some more observations:
1. Colon cancer dataset: they identified 18 discriminative hub genes for cancer.
2. Of these, some (e.g., DES and ACTA2) are DR genes in a tumor, while 8 others (e.g., IL8, HSPD1, SRPK1) are UR genes in a tumor.
3. The UR genes are regulated by more genes than the DR ones, while the DR genes regulate more genes than the UR ones.
4. Tumor suppressors inhibit tumor activators and activate as many other tumor suppressors as possible, whereas tumor activators activate other tumor activators and inhibit as few tumor suppressors as possible.
This method is a new option for cancer classification and direct gene regulatory network inference.
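Two of the network-level checks on this slide, spotting a feed-forward loop and comparing how many regulators and targets UR and DR genes have, reduce to simple operations on a directed edge list. The sketch below is a hypothetical illustration on made-up genes `X`, `Y`, `Z`, `W`; it is not MCODE (which searches for densely connected clusters), and its output is not the paper's result.

```python
from collections import defaultdict
from itertools import permutations

def feed_forward_loops(edges):
    """Return sorted (x, y, z) triples with edges x->y, y->z and x->z."""
    succ = defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
    nodes = {n for edge in edges for n in edge}
    return sorted((x, y, z) for x, y, z in permutations(nodes, 3)
                  if y in succ[x] and z in succ[y] and z in succ[x])

def mean_degrees(edges, genes):
    """Mean (in-degree, out-degree) over a group of genes."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    n = len(genes)
    return (sum(indeg[g] for g in genes) / n, sum(outdeg[g] for g in genes) / n)

# Hypothetical directed network: X and Y treated as DR genes, Z and W as UR genes
edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
print(feed_forward_loops(edges))        # [('X', 'Y', 'Z')]
print(mean_degrees(edges, ["X", "Y"]))  # (0.5, 1.5) for the DR group
print(mean_degrees(edges, ["Z", "W"]))  # (1.5, 0.5) for the UR group
```

The toy network was deliberately chosen to echo the slide's observation (UR genes have more regulators, DR genes more targets); with real data the same comparison would be run on the inferred network.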

 User-friendly,  Simple  Biologically interpretable  Cost-effective in a clinical setting with single genes or gene pairs.  Relatively easy to understand and follow  Availability of programming codes with either open access or GNU general public license (GPL).