Making the Most of Small Sample High Dimensional Micro-Array Data
Allan Tucker, Veronica Vinciotti, Xiaohui Liu; Brunel University
Paul Kellam; Windeyer Institute

Micro-Array Data
- High dimensional
- Small number of samples
- Need to identify predictive genes, e.g. for classification
- Rate confidence in genes based upon their predictive ability (classification)

Identifying Predictive Genes
- We use a Naïve Bayes classifier: well established, minimises the number of parameters
- Feature selection using simulated annealing (SA)
- Repeated 10 times
- Apply cross-validation
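The pipeline on this slide (Naïve Bayes classifier, SA feature selection, cross-validation) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It models continuous expression values with a Gaussian Naïve Bayes, which is an assumption because the slides do not say how values are modelled. It scores each gene subset by k-fold cross-validated accuracy. The function names (`fit_nb`, `predict_nb`, `cv_accuracy`, `sa_select`), the add/drop/swap move set, the geometric cooling schedule and every parameter value are hypothetical.

```python
import math
import random


def fit_nb(X, y, genes):
    """Fit a Gaussian Naive Bayes model using only the listed gene indices."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        stats = []
        for g in genes:
            vals = [r[g] for r in rows]
            mu = sum(vals) / len(vals)
            # Small variance floor so a single-sample class cannot divide by zero.
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6
            stats.append((mu, var))
        model[c] = (prior, stats)
    return model


def predict_nb(model, x, genes):
    """Return the class with the highest log-posterior for sample x."""
    def log_post(c):
        prior, stats = model[c]
        return math.log(prior) - sum(
            0.5 * math.log(2 * math.pi * var) + (x[g] - mu) ** 2 / (2 * var)
            for g, (mu, var) in zip(genes, stats)
        )
    return max(model, key=log_post)


def cv_accuracy(X, y, genes, folds, rng):
    """k-fold cross-validated accuracy; fold membership is reshuffled on every call."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    correct = 0
    for k in range(folds):
        test = idx[k::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_nb([X[i] for i in train], [y[i] for i in train], genes)
        correct += sum(predict_nb(model, X[i], genes) == y[i] for i in test)
    return correct / len(X)


def sa_select(X, y, max_genes, rng, iters=200, t0=0.1, cooling=0.98, folds=3):
    """Simulated annealing over gene subsets containing 1..max_genes genes."""
    n = len(X[0])
    current = {rng.randrange(n)}
    cur_score = cv_accuracy(X, y, sorted(current), folds, rng)
    best, best_score = set(current), cur_score
    t = t0
    for _ in range(iters):
        cand = set(current)
        g = rng.randrange(n)
        if g in cand:
            if len(cand) > 1:
                cand.discard(g)                     # move: drop a gene
        elif len(cand) < max_genes:
            cand.add(g)                             # move: add a gene
        else:
            cand.discard(rng.choice(sorted(cand)))  # move: swap at the size cap
            cand.add(g)
        score = cv_accuracy(X, y, sorted(cand), folds, rng)
        # Metropolis rule: always accept improvements, sometimes accept worse subsets.
        if score >= cur_score or rng.random() < math.exp((score - cur_score) / t):
            current, cur_score = cand, score
            if score > best_score:
                best, best_score = set(cand), score
        t *= cooling
    return sorted(best), best_score


if __name__ == "__main__":
    rng = random.Random(0)
    y = [i % 2 for i in range(20)]
    # Hypothetical synthetic data: 20 samples x 50 genes; only genes 3 and 7 depend on the class.
    X = [[rng.gauss(0.0, 1.0) + (2.0 * label if g in (3, 7) else 0.0)
          for g in range(50)] for label in y]
    genes, acc = sa_select(X, y, max_genes=5, rng=rng)
    print(genes, acc)
```

Because every evaluation reshuffles the folds, the objective is noisy. Rerunning the search with different seeds can therefore return different subsets, and this variability is what a selection-frequency confidence score builds on.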

Identifying Predictive Genes
- Identify genes robustly:
  - data perturbed during cross-validation (CV)
  - repeats of the stochastic SA search
- Assign confidence based upon the frequency with which each gene is selected
- Limit the maximum number of links
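The frequency-based confidence rating can be sketched as follows. This is a minimal sketch that assumes each repeated, perturbed SA run returns a collection of selected gene identifiers. A gene's confidence is then the fraction of runs in which it was chosen. The function names, the gene labels and the 0.8 reporting threshold are hypothetical, since the slides do not state how confidence is thresholded.

```python
from collections import Counter


def selection_confidence(selected_sets):
    """Confidence of each gene = fraction of runs in which it was selected."""
    runs = len(selected_sets)
    # set() so a gene listed twice in one run is only counted once for that run.
    counts = Counter(g for genes in selected_sets for g in set(genes))
    # Order by descending frequency, ties broken by gene identifier.
    return {g: counts[g] / runs for g in sorted(counts, key=lambda g: (-counts[g], g))}


def high_confidence(conf, threshold=0.8):
    """Genes whose selection frequency reaches the (hypothetical) threshold."""
    return [g for g, c in conf.items() if c >= threshold]


if __name__ == "__main__":
    # Hypothetical output of five repeated SA runs.
    runs = [["g3", "g7"], ["g3", "g7", "g12"], ["g3"], ["g7", "g3"], ["g3", "g40"]]
    conf = selection_confidence(runs)  # g3: 5/5, g7: 3/5, g12: 1/5, g40: 1/5
    print(high_confidence(conf))       # ['g3']
```

Genes that are chosen only under particular fold splits or search trajectories receive low scores. Consistently useful genes approach a score of 1.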

Effect of Model Complexity

Classification Accuracy
- Generally RSN performs best
- SA global search better than local
- Anomaly with B-Cell?
- Synthetic data supports global over local

Confidence Scores
- Relatively small number of genes identified with high confidence
- Consistency between runs

Identified Genes

Conclusions
- When micro-array data has only a small number of samples:
  - simple models with few parameters perform best
  - global search for parameters is better
- The proposed RSN successfully identifies genes of interest, paving the way for further biological analysis
- Need to explore different parameters
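The point about simple models with few parameters can be made concrete with a parameter count. For this illustration, assume each of n genes is discretised into k states and there are c classes; the slides do not say how expression values were modelled. Naïve Bayes factorises the posterior so that its free parameters grow linearly in n. A full class-conditional joint table, by contrast, grows exponentially in n.

```latex
\begin{aligned}
P(C \mid g_1,\dots,g_n) &\propto P(C)\prod_{i=1}^{n} P(g_i \mid C)\\
\#\theta_{\text{NB}} &= (c-1) + n\,c\,(k-1)\\
\#\theta_{\text{joint}} &= (c-1) + c\,(k^{n}-1)\\
c=2,\ k=3,\ n=10:\quad \#\theta_{\text{NB}} &= 1 + 10\cdot 2\cdot 2 = 41,\qquad
\#\theta_{\text{joint}} = 1 + 2\,(3^{10}-1) = 118\,097
\end{aligned}
```

With only a handful of samples, the 41 Naïve Bayes parameters can be estimated with some stability. The more than one hundred thousand parameters of the full joint table cannot.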