The Painter’s Feature Selection for Gene Expression Data. Lyon, 23-26 August 2007. Daniele Apiletti, Elena Baralis, Giulia Bruno, Alessandro Fiori.


Presentation transcript:

The Painter’s Feature Selection for Gene Expression Data. Lyon, 23-26 August 2007. Daniele Apiletti, Elena Baralis, Giulia Bruno, Alessandro Fiori

2 Introduction
Feature selection:
- identifies a minimum set of relevant features
- is applied before a learning algorithm
- reduces computation costs
- speeds up the learning process
- increases model interpretability
- improves classification accuracy

3 Framework
[Diagram: the Painter block (feature selection with a filter, producing a gene rank) is applied to labeled microarray data; model building with the class labels yields a classification model, which is then applied to unlabeled microarray data]
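Read as a standard filter-style pipeline, the framework ranks genes on labeled microarray data, keeps the top-ranked ones, trains a classifier, and applies the resulting model to unlabeled samples. Below is a minimal sketch of that flow in scikit-learn; the function name and the ANOVA ranker are illustrative assumptions standing in for the Painter score, not the authors' code.

```python
# Sketch of the filter-style pipeline from the framework slide:
# rank genes on labeled data, keep the top k, train a classifier,
# then apply the model to unlabeled microarray samples.
# SelectKBest/f_classif is a generic stand-in for the Painter ranking.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def build_filter_pipeline(k=200):
    return make_pipeline(
        SelectKBest(score_func=f_classif, k=k),           # gene rank + filter (placeholder)
        LinearSVC(multi_class="crammer_singer", C=100),   # model building
    )

# model = build_filter_pipeline().fit(X_labeled, y_labels)   # labeled microarray data
# predictions = model.predict(X_unlabeled)                   # unlabeled microarray data
```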

4 Feature selection
From the Painter's Algorithm, used in computer graphics to paint shadows
Overlap score to measure:
- common expression intervals in different classes
- gaps between expression intervals among classes
Bonus to genes with the largest gaps and few overlapping classes
[Figures: gene expression value intervals for class1-class4 showing a gap, and interval widths w1-w5 with total width wtot]

5 Example
No overlapping: overlap score = ((1 × 6) + (1 × 4)) / 11 = 10/11
1 overlapping: overlap score = ((1 × 1) + (2 × 4) + (1 × 1)) / 6 = 10/6
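Both results can be reproduced by a simple interval sweep: split the overall expression range at every class-interval endpoint, weight each elementary sub-interval by the number of classes covering it, and normalise by the total range width. The sketch below is one plausible reading of the slide's arithmetic rather than the authors' implementation; the example intervals are assumptions chosen only so that they yield 10/11 and 10/6.

```python
# Interval-sweep sketch of the overlap score: each elementary sub-interval
# contributes (number of classes covering it) * (its width); the sum is
# normalised by the width of the overall expression range.
def overlap_score(class_intervals):
    """class_intervals: one (min_expr, max_expr) pair per class."""
    points = sorted({p for interval in class_intervals for p in interval})
    total_width = points[-1] - points[0]
    weighted = 0.0
    for left, right in zip(points, points[1:]):
        covering = sum(1 for lo, hi in class_intervals if lo <= left and right <= hi)
        weighted += covering * (right - left)
    return weighted / total_width

# Two disjoint class intervals separated by a gap of width 1 -> 10/11
print(overlap_score([(0, 6), (7, 11)]))   # 0.909...
# Two class intervals overlapping over a width of 4 -> 10/6
print(overlap_score([(0, 5), (1, 6)]))    # 1.666...
```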

6 Filter
Reasons: noisy data, presence of outliers
[Figure: gene expression values with outliers]
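The slide does not detail the filter itself; one simple density-style safeguard against outliers is to trim each class's expression values to a central percentile range before building its interval. The percentile bounds below are purely illustrative assumptions, not the paper's filter.

```python
# Illustrative outlier trimming: build each class interval from central
# percentiles of its expression values instead of the raw min/max, so a
# few extreme measurements do not stretch the interval.
import numpy as np

def trimmed_interval(values, lower_pct=5, upper_pct=95):
    lo = np.percentile(values, lower_pct)
    hi = np.percentile(values, upper_pct)
    return float(lo), float(hi)
```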

7 Weighted interval
[Figure: weighted gene expression interval]

8 Experimental design
10-fold cross-validation
SVM: Crammer and Singer (CS) formulation, degree 1, cost 100
Methods compared:
- analysis of variance (ANOVA)
- signal-to-noise ratio, one-versus-one (OVO)
- signal-to-noise ratio, one-versus-rest (OVR)
- ratio of between-categories to within-categories sum of squares (BW)
[Table: datasets Tumors9, Brain1, Brain2 with number of patients, genes, and classes]
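With the stated settings (10-fold cross-validation, Crammer and Singer multi-class SVM, degree 1, cost 100), the evaluation loop can be sketched in scikit-learn as follows; the function and variable names are assumptions, not the authors' setup.

```python
# Sketch of the evaluation protocol: 10-fold cross-validation of a
# Crammer-Singer linear SVM (cost C = 100) on the selected genes.
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def evaluate(X_selected, y):
    """X_selected: samples x selected genes (e.g. 200), y: class labels."""
    clf = LinearSVC(multi_class="crammer_singer", C=100)
    scores = cross_val_score(clf, X_selected, y, cv=10)     # accuracy per fold
    return scores.mean() * 100, scores.std() * 100          # mean and std in percent
```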

9 Experimental results (1)
Classification accuracy (%) with 200 genes selected, mean ± standard deviation (… marks values not preserved in the transcript):

Dataset    Painter     OVO   BW    ANOVA       OVR
Brain1     86.7 ± …    …     …     …           … ± 7.0
Brain2     70.7 ± …    …     …     … ± 22.4    61.8 ± 15.7
Tumors9    71.0 ± …    …     …     …           … ± 20.2
Average    76.1 ± …    …     …     …           … ± 14.3

10 Experimental results (2)
[Figure: accuracy trend on the Brain2 dataset]

11 Conclusion
New method:
- multi-class approach based on a new criterion of gene relevance
- self-adaptation to the dataset distribution
- density-based filter smoothing the effect of outliers
Results:
- robustness of the algorithm
- can be applied to any dataset with continuous-valued features
Future work:
- investigation of feature groups with the same discriminating power
- comparison with more feature selection techniques on other datasets

12 Thanks for your attention!