Gene Selection for Microarray-based Cancer Classification Using Genetic Algorithm 이 정문 2003/04/01 BI Lab.

Introduction
Microarrays can be used for cancer classification based on gene expression. Selecting informative genes for sample discrimination can improve cancer classification. I use a genetic algorithm (GA) and k-nearest neighbor (kNN) to find informative genes in multi-class microarray cancer data.

Gene Expression Data

Genes  sample1  sample2  sample3  sample4  sample5  ...
1        0.46     0.30     0.80     1.51     0.90   ...
2       -0.10     0.49     0.24     0.06     0.46   ...
3        0.15     0.74     0.04     0.10     0.20   ...
4       -0.45    -1.03    -0.79    -0.56    -0.32   ...
5       -0.06     1.06     1.35     1.09    -1.09   ...

Entry (i, j): gene expression level of gene i in mRNA sample j.
Tens or hundreds of samples vs. thousands of genes => need to select informative genes.
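As a minimal sketch of how such a matrix might be held in code, the excerpt from the slide can be stored as a list of gene rows, and a chosen subset of genes turned into one feature vector per sample. The names `EXPR` and `sample_profiles` are hypothetical and only serve the illustration.

```python
# Rows are genes, columns are mRNA samples (values copied from the slide's excerpt).
EXPR = [
    [0.46, 0.30, 0.80, 1.51, 0.90],       # gene 1
    [-0.10, 0.49, 0.24, 0.06, 0.46],      # gene 2
    [0.15, 0.74, 0.04, 0.10, 0.20],       # gene 3
    [-0.45, -1.03, -0.79, -0.56, -0.32],  # gene 4
    [-0.06, 1.06, 1.35, 1.09, -1.09],     # gene 5
]


def sample_profiles(expr, gene_ids):
    """Return one feature vector per sample, restricted to the chosen genes.

    gene_ids are 1-based to match the row labels on the slide (an assumption
    made for readability, not something the slide prescribes).
    """
    rows = [expr[g - 1] for g in gene_ids]
    # Transpose: each column of the selected rows becomes one sample's vector.
    return [list(column) for column in zip(*rows)]


# Feature vectors for all five samples using only genes 1 and 4.
profiles = sample_profiles(EXPR, [1, 4])
```

Selecting genes therefore amounts to choosing which rows feed the classifier; the samples themselves are unchanged.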

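Rank-based selection methods
For each gene, compute a score:
- Signal-to-noise = (μ1 - μ2) / (σ1 + σ2), where μ and σ are the mean and standard deviation of the gene's expression in classes 1 and 2
- BSS/WSS = ratio of the between-class sum of squares to the within-class sum of squares
These methods are good at identifying genes that are strongly correlated with the target phenotype class distinction, but they ignore the interaction between genes.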
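The two per-gene scores can be sketched as follows. This assumes the usual definitions: the sample standard deviation for the signal-to-noise ratio, and for BSS/WSS the between-class sum of squares (class size times squared distance of the class mean from the overall mean) divided by the within-class sum of squares. The slide does not spell out the BSS/WSS formula, so that part is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(values, labels):
    """(mu1 - mu2) / (sigma1 + sigma2) for one gene in a two-class problem."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("signal-to-noise is defined here for exactly two classes")
    a = [v for v, y in zip(values, labels) if y == classes[0]]
    b = [v for v, y in zip(values, labels) if y == classes[1]]
    # stdev is the sample standard deviation (assumed; the slide does not say).
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def bss_wss(values, labels):
    """Between-class over within-class sum of squares for one gene."""
    overall = mean(values)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    bss = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values())
    wss = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return bss / wss


def rank_genes(expr, labels, score):
    """Return gene indices (0-based) sorted by decreasing absolute score."""
    scores = [abs(score(row, labels)) for row in expr]
    return sorted(range(len(expr)), key=lambda g: scores[g], reverse=True)
```

Each gene is scored in isolation, which is exactly why these rankings cannot see interactions between genes.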

GA/kNN method (Leping Li, 2001)
1. Initial chromosomes consisting of d genes (in this case d = 5)
   G1 G35 G7 G21 G3 G32 G5 G1 G21 G10 G6 G3
2. For each chromosome, assign fitness (the number of samples correctly classified by kNN)
   G1 G35 G7 G21 G3
   G32 G5 G23 G10 G6
3. Replacement / Selection
   G1 G21 G10 G6 G3
4. Mutation
   G1 G23 G10 G6 G3
5. Is termination criterion met?
   no: continue with the next generation
   yes: save the chromosome
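The loop above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published GA/kNN implementation: fitness uses leave-one-out kNN with a majority vote, the best chromosome is carried over unchanged, the rest are drawn with fitness-proportional selection, mutation swaps a single gene, and no crossover is used. All function names, parameters, and the toy data are hypothetical.

```python
import random
from collections import Counter


def knn_fitness(expr, labels, chromosome, k=3):
    """Number of samples correctly classified by leave-one-out kNN,
    using only the genes (row indices of expr) in the chromosome."""
    n = len(labels)
    vecs = [[expr[g][j] for g in chromosome] for j in range(n)]
    correct = 0
    for j in range(n):
        # Squared Euclidean distance to every other sample.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(vecs[j], vecs[i])), i)
            for i in range(n) if i != j
        )
        votes = Counter(labels[i] for _, i in dists[:k])
        if votes.most_common(1)[0][0] == labels[j]:
            correct += 1
    return correct


def mutate(chromosome, n_genes, rng):
    """Replace one randomly chosen gene with a gene not already present."""
    child = list(chromosome)
    pos = rng.randrange(len(child))
    child[pos] = rng.choice([g for g in range(n_genes) if g not in child])
    return child


def ga_knn(expr, labels, d=5, pop_size=20, max_gen=50, target=None, k=3, seed=0):
    """Return (chromosome, fitness) once the target fitness is reached
    or max_gen generations have been evaluated."""
    if max_gen < 1:
        raise ValueError("max_gen must be at least 1")
    rng = random.Random(seed)
    n_genes = len(expr)
    target = len(labels) if target is None else target
    pop = [rng.sample(range(n_genes), d) for _ in range(pop_size)]
    for _ in range(max_gen):
        scored = sorted(((knn_fitness(expr, labels, c, k), c) for c in pop),
                        key=lambda t: t[0], reverse=True)
        best_fit, best = scored[0]
        if best_fit >= target:          # termination criterion met
            break
        # Replacement: keep the best chromosome; selection: sample the rest
        # in proportion to fitness (+1 so zero-fitness chromosomes can be drawn).
        parents = rng.choices([c for _, c in scored],
                              weights=[f + 1 for f, _ in scored],
                              k=pop_size - 1)
        pop = [best] + [mutate(p, n_genes, rng) for p in parents]
    return best, best_fit


# Toy data: 4 genes x 6 samples; only gene 0 separates the two classes.
TOY_EXPR = [
    [0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
    [1.0, 3.0, 2.0, 1.5, 2.5, 3.5],
    [0.3, 0.1, 0.2, 0.2, 0.3, 0.1],
    [2.0, 1.0, 3.0, 3.0, 1.0, 2.0],
]
TOY_LABELS = ["A", "A", "A", "B", "B", "B"]
chromosome, fitness = ga_knn(TOY_EXPR, TOY_LABELS, d=2, pop_size=6, max_gen=5, k=1)
```

Because fitness is evaluated on a whole chromosome, genes are judged jointly rather than one at a time, which is the contrast with rank-based scores.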

Datasets
GCM (Ramaswamy et al., 2001)
- 14 classes
- 190 samples (144 training set + 46 test set)
- 16,063 genes
NCI60 (Ross et al., 2000)
- 9 classes
- 60 cancer cell lines
- 9,703 genes

Issues
- Choice of termination criterion
- Computationally intensive
- One-vs-all classification: build n classifiers, one for each of the n classes
- Whether to use crossover
- Lamarckian GA (?)
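For the one-vs-all item, a minimal sketch of the label handling might look like this: each of the n classes gets its own binary labelling, and at prediction time the class whose classifier gives the highest score is chosen. The helper names and the idea of a per-class score are assumptions made for illustration.

```python
def one_vs_all_labels(labels):
    """Map each class to a binary labelling: 1 for that class, 0 for the rest."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def pick_class(scores):
    """Given one score per class (higher means more likely), return the winner."""
    return max(scores, key=scores.get)


binary = one_vs_all_labels(["breast", "lung", "breast", "colon"])
winner = pick_class({"breast": 0.2, "colon": 0.1, "lung": 0.9})
```

With n classes this multiplies the training cost by n, which compounds the computational load already listed among the issues.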