Random Sets Approach and its Applications. Basic iterative feature selection and modifications; tests for independence & trimmings (similar to the HITON algorithm).


1 Random Sets Approach and its Applications. Vladimir Nikulin, Suncorp, Australia. Outline: Introduction: input data, objectives and main assumptions. Random sets approach. Basic iterative feature selection and modifications. Tests for independence & trimmings (similar to the HITON algorithm). Experimental results with some comments. Concluding remarks.

2 Introduction. Training data: {(x_j, y_j), j = 1, ..., n}, where y_j is a binary label and x_j is a vector of m features. In a practical situation the label y may be hidden, and the task is to estimate it using the vector of features. The area under the receiver operating curve (AUC) will be used as the evaluation and optimisation criterion.
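For reference, a minimal sketch of computing the AUC criterion. The original experiments used MATLAB-CLOP, C and R; scikit-learn and the toy arrays below are assumptions added for illustration.

```python
# Illustrative only: AUC, the evaluation/optimisation criterion used throughout.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # binary labels
scores = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.3, 0.8, 0.1])    # model outputs
print(roc_auc_score(y_true, scores))   # AUC in [0, 1]; 0.5 corresponds to random guessing
```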

3 Causal relations. [Figure: causal graph linking features X1-X9 to the target Y.] Main assumption: direct features have a stronger influence on the target variable and, therefore, are more likely to be selected by the FS-algorithms. Manipulations are actions or experiments performed by an external agent on a system, whose effect disrupts the natural functioning of the system. By definition, direct features cannot be manipulated.

4 Basic iterative FS-algorithm
1: Input: Y – target variable; S – set of features.
2: Select evaluation criterion D and algorithm g.
3: Set Z = [].
4: Select feature f according to D and g.
5: Transfer feature f from S to Z.
6: Stop if there is no improvement; alternatively, goto Step 4.
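A minimal sketch of this greedy loop, assuming the criterion D is cross-validated AUC and the algorithm g is logistic regression; both are stand-ins, not the base models actually used in the experiments.

```python
# Sketch of the basic iterative (forward) feature-selection loop.
# D = cross-validated AUC and g = logistic regression are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def bifs(X, y, eps=1e-4):
    S = list(range(X.shape[1]))          # candidate features
    Z = []                               # selected features
    best = 0.5                           # AUC of a random classifier
    while S:
        scores = {}
        for f in S:
            cv = cross_val_score(LogisticRegression(max_iter=1000),
                                 X[:, Z + [f]], y, cv=5, scoring="roc_auc")
            scores[f] = cv.mean()
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best + eps:  # Step 6: stop if there is no improvement
            break
        best = scores[f_best]
        Z.append(f_best)                  # Step 5: transfer f from S to Z
        S.remove(f_best)
    return Z, best
```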

5 BIFS: behaviour of the target function. [Figure: four panels, one per dataset: CINA, LUCAP, MARTI, REGED.]

6 RS-algorithm
1: Evaluate a long sequence of RS using CV.
2: Sort results in increasing order.
3: Select block B of top/worst performing sets of features.
4: Compute for any feature the number of repeats in B.
5: Select a range of repeats for detailed investigation.
6: Apply some trimming (tests for independence).
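A hedged sketch of the random-sets idea above: score many random feature subsets by CV, sort the results, and count how often each feature appears in the top block B. The base model, CV settings and block size are illustrative assumptions.

```python
# Sketch of the RS-algorithm: repeat counts of features in the best block B.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def random_sets(X, y, n_sets=1000, set_size=40, block_frac=0.1, seed=None):
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    results = []
    for _ in range(n_sets):                                  # 1: long sequence of random sets
        subset = rng.choice(m, size=min(set_size, m), replace=False)
        auc = cross_val_score(LogisticRegression(max_iter=1000),
                              X[:, subset], y, cv=3, scoring="roc_auc").mean()
        results.append((auc, subset))
    results.sort(key=lambda r: r[0])                         # 2: sort in increasing order
    top = max(1, int(block_frac * n_sets))
    block = results[-top:]                                   # 3: block B of top-performing sets
    repeats = np.zeros(m, dtype=int)
    for _, subset in block:                                  # 4: number of repeats per feature
        repeats[subset] += 1
    return repeats                                           # 5-6: inspect and trim downstream
```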

7 RS(10000, 40), MARTI case 10%

8 Test for independence (or trimming)
1: Input: Z – subset of features; Δ – threshold parameter.
2: Compute the test statistic α for a candidate feature f.
3: Z := Z \ f and goto Step 2 if α < Δ; otherwise stop the procedure.
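A heavily hedged sketch of this trimming loop. The slide's formula for α is not reproduced in the transcript, so here α is replaced by an assumed stand-in: the drop in cross-validated AUC when feature f is removed from Z.

```python
# Hedged sketch of the trimming procedure; alpha is NOT the original statistic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def trim(X, y, Z, delta=0.001):
    Z = list(Z)
    def auc(cols):
        return cross_val_score(LogisticRegression(max_iter=1000),
                               X[:, cols], y, cv=5, scoring="roc_auc").mean()
    changed = True
    while changed and len(Z) > 1:
        changed = False
        base = auc(Z)
        for f in list(Z):
            alpha = base - auc([c for c in Z if c != f])   # assumed contribution of f
            if alpha < delta:        # f appears redundant: Z := Z \ {f}, goto Step 2
                Z.remove(f)
                changed = True
                break
    return Z
```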

9 Base models and software

Data    # Train (positive)   # Test   Dimension   Method               Software
LUCAS   2000 (1443)                               neural+gentleboost   MATLAB-CLOP
LUCAP   2000 (1443)                               neural+gentleboost   MATLAB-CLOP
REGED   500 (59)                                  SVM-RBF              C
SIDO    (452)                                     binaryRF             C
CINA    (3939)                                    adaBoost             R
MARTI   500 (59)                                  svc+standardize      MATLAB-CLOP
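For readers who want to reproduce something similar, these are rough scikit-learn counterparts of the base models listed above. The originals ran in MATLAB-CLOP, C and R, and the hyper-parameter mapping (notably for RF(1000, 70, 10), shown on a later slide) is a guess.

```python
# Approximate scikit-learn equivalents of the base models; mappings are assumptions.
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

models = {
    "REGED": SVC(kernel="rbf", probability=True),                       # SVM-RBF
    "SIDO":  RandomForestClassifier(n_estimators=1000),                 # binaryRF / RF(1000, 70, 10);
                                                                        # mapping of 70 and 10 uncertain
    "CINA":  AdaBoostClassifier(n_estimators=100),                      # adaBoost
    "MARTI": make_pipeline(StandardScaler(), SVC(probability=True)),    # svc+standardize
    # LUCAS/LUCAP used neural+gentleboost in MATLAB-CLOP; no direct counterpart is given here.
}
```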

10 Final results (first 4 lines)

Data    Submission               CASE0   CASE1   CASE2   Mean   Rank
REGED   vn
SIDO    vn
CINA    vn14a
MARTI   vn
LUCAS   vn (validation)
LUCAP   vn10b+vn (validation)
CINA    vn (all features)
CINA    vn (CE)

11 Some particular results

Data     Submission   # features   Fscore   TrainAUC   TestAUC
REGED1   vn
REGED1   vn11d
REGED1   vn
REGED1   vn
MARTI1   vn12c
MARTI1   vn
MARTI1   vn
MARTI1   vn
SIDO0    vn
SIDO0    vn9a
SIDO0    vn
SIDO0    vn

12 Behaviour of linear filtering coefficients, MARTI-set

13 CINA-set: AdaBoost, plot of one solution against another

14 SIDO, RF(1000, 70, 10)

15 Some comments. In practical applications we are dealing not with pure probability distributions but with mixtures of distributions, which reflect trends and patterns that change over time. Accordingly, it appears more natural to form the training set as an unlabeled mixture of subsets derived from different (manipulated) distributions, for example REGED1, REGED2, ..., REGED9. As the distribution for the test set we can select any "pure" distribution. Proper validation is particularly important when the training and test sets have different distributions. It would therefore be good to apply the traditional strategy: randomly split the available test set into two parts, 50/50, where one part is used for validation and the other for testing.
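A small sketch of the 50/50 split suggested above, assuming the unlabeled test features and their (later-revealed) labels are available as arrays; the synthetic data is a placeholder.

```python
# Split the available test set 50/50 into a validation half and a final-testing half.
import numpy as np
from sklearn.model_selection import train_test_split

X_test = np.random.rand(200, 10)               # placeholder for a real test set
y_test = np.random.randint(0, 2, size=200)     # placeholder labels

X_valid, X_final, y_valid, y_final = train_test_split(
    X_test, y_test, test_size=0.5, random_state=0)
```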

16 Concluding remarks. The random sets approach is heuristic in nature and was inspired by the growing speed of computation. It is a general method, and there are many ways to develop it further. Performance of the model depends on the particular data: we certainly cannot expect one method to produce good solutions for all problems. It was probably necessary to apply a more aggressive FS-strategy in the Causal Discovery competition. Our results against all unmanipulated and all validation sets are in line with the top results.