1
Selecting Informative Genes with Parallel Genetic Algorithms
Deodatta Bhoite, Prashant Jain
2
Terminology: genes; DNA and mRNA; gene expression; microarrays
3
Microarray output
4
Gene Selection A large number of irrelevant genes introduces “biological noise”; analysis of the results can be simplified by selecting only the relevant genes for study. There are two categories of gene selection: –Filter approach –Wrapper approach
5
Gene Selection
6
Classifier What is a classifier used for? It maps a sample to a class label in {0, 1, ?}, where “?” marks an uncertain prediction. Golub-Slonim classifier: a positive output value means class 1, a negative value means class 2 (a sketch follows).
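As a minimal Python sketch of a Golub-Slonim-style weighted-vote classifier: each gene is weighted by its signal-to-noise correlation and votes relative to the midpoint between the class means. The uncertainty margin of 0.3 is an illustrative assumption, not a value from the slides.

    import numpy as np

    def train_gs(X1, X2):
        # Per-gene statistics for each class (rows = samples, columns = genes).
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        s1, s2 = X1.std(axis=0), X2.std(axis=0)
        weight = (m1 - m2) / (s1 + s2)    # GS signal-to-noise correlation
        boundary = (m1 + m2) / 2.0        # midpoint between the class means
        return weight, boundary

    def classify_gs(x, weight, boundary, margin=0.3):
        # Each gene casts a weighted vote; the sign of the total decides the class.
        votes = weight * (x - boundary)
        win, lose = votes[votes > 0].sum(), -votes[votes < 0].sum()
        strength = (win - lose) / (win + lose)   # prediction strength
        if abs(strength) < margin:
            return "?"                           # weak majority: too close to call
        return 1 if strength > 0 else 2

A weak overall majority yields “?” rather than forcing a guess, matching the {0, 1, ?} mapping above.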
7
Ranking based gene selection methods GS-correlation: the genes with the most positive and most negative correlation values are selected. This method tends not to select genes whose class values have large standard deviations with respect to the training data, even though some of those genes may be the most relevant and informative.
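Assuming GS-correlation is the Golub signal-to-noise statistic (mean1 - mean2) / (std1 + std2), the same score used as the classifier weight above, a minimal sketch of this ranking-based selection:

    import numpy as np

    def gs_select(X1, X2, k):
        # Signal-to-noise score per gene: (mu1 - mu2) / (sigma1 + sigma2).
        score = (X1.mean(axis=0) - X2.mean(axis=0)) / (X1.std(axis=0) + X2.std(axis=0))
        order = np.argsort(score)                       # ascending: most negative first
        return np.concatenate([order[:k], order[-k:]])  # k most negative + k most positive genes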
8
Ranking with disorder This method does not use the actual expression levels, only their ranks. N_I represents the set of indices of the samples that belong to class I, and h(x) is the indicator function.
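The slide's formula is not reproduced in this transcript, so the following sketch is only one plausible illustration of a rank-based disorder score: it counts, for a single gene, how many (class-1, class-2) sample pairs appear out of order when the samples are sorted by that gene's expression. Only the ranks matter, never the expression values themselves.

    def disorder_score(expr, labels):
        # Sort sample indices by this gene's expression; only ranks are used.
        order = sorted(range(len(expr)), key=lambda i: expr[i])
        disorder, ones_seen = 0, 0
        for i in order:
            if labels[i] == 1:
                ones_seen += 1
            else:
                disorder += ones_seen   # a class-2 sample ranked above a class-1 sample
        return disorder                 # 0 = the gene orders the classes perfectly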
9
Need for subset ranking Individual ranking may not always select the most informative genes: by relying solely on per-gene scores, it ignores the relationships between genes. We therefore need to explore subsets of genes to find an optimal subset for classification.
10
Genetic Algorithm What is a genetic algorithm? –“Genetic Algorithms are defined as global optimization procedures that use an analogy of genetic evolution of biological organisms.” –In essence, a genetic algorithm searches for the best solution to a problem by following an evolutionary process.
11
Basic Genetic Algorithm
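The diagram on this slide is not reproduced in the transcript; as a rough illustration, the following sketch shows the generic evaluate-select-crossover-mutate loop such a diagram typically depicts. The truncation selection, elitism of two, and one-point crossover are illustrative assumptions, not details taken from the slides.

    import random

    def basic_ga(pop_size, n_bits, fitness, generations, p_cross=0.6, p_mut=0.001):
        # Random initial population of fixed-length bit strings.
        pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(generations):
            ranked = sorted(pop, key=fitness, reverse=True)
            next_pop = [ranked[0][:], ranked[1][:]]      # elitism: carry over the two best
            while len(next_pop) < pop_size:
                a, b = random.sample(ranked[:pop_size // 2], 2)   # pick two fit parents
                child = a[:]
                if random.random() < p_cross:            # one-point crossover
                    cut = random.randrange(1, n_bits)
                    child = a[:cut] + b[cut:]
                child = [bit ^ (random.random() < p_mut) for bit in child]  # bit-flip mutation
                next_pop.append(child)
            pop = next_pop
        return max(pop, key=fitness)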
12
Parallel Genetic Algorithm For large population sizes, a sequential GA becomes computationally infeasible; hence the use of parallel genetic algorithms.
13
Parallel Genetic Algorithm
14
Model and Encoding Island model: each processor runs a GA on a subset of the population, with periodic migration of individuals between islands. Fixed-length binary string encoding: bit i is 1 if gene i is included in the subset and 0 otherwise (see the sketch below).
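A minimal sketch of these two ideas. The ring topology and replace-the-worst migration policy are assumptions for illustration; the slides say only that migration is periodic.

    import random

    def decode(bits, gene_names):
        # Fixed-length binary encoding: gene i is in the subset iff bits[i] == 1.
        return [name for name, bit in zip(gene_names, bits) if bit == 1]

    def migrate(islands, fitness, p_migrate=0.002):
        # Ring migration: each island may send a copy of its best individual
        # to the next island, replacing that island's worst individual.
        for i, island in enumerate(islands):
            if random.random() < p_migrate:
                target = islands[(i + 1) % len(islands)]
                worst = min(range(len(target)), key=lambda j: fitness(target[j]))
                target[worst] = max(island, key=fitness)[:]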
15
Fitness Evaluation Two different criteria: –Classification accuracy –Size of the subset The two are combined as fitness(x) = w1 * accuracy(x) + w2 * (1 - dimensionality(x)), where –accuracy(x) = the test accuracy of the classifier built with the gene subset represented by x –dimensionality(x) ∈ [0, 1] = the size of the subset, normalized as the fraction of genes selected
16
Fitness Evaluation –w1 = the weight assigned to accuracy –w2 = the weight assigned to dimensionality A subset with high classification accuracy and low dimensionality receives a high fitness; a sketch of this function follows.
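As a concrete illustration of the weighted-sum fitness above. The weight values and the evaluate_classifier helper are hypothetical; the slides specify neither.

    def fitness(bits, w1=0.75, w2=0.25):
        # w1, w2 are illustrative weights; the slides do not give the actual values.
        subset = [i for i, bit in enumerate(bits) if bit]
        acc = evaluate_classifier(subset)   # hypothetical helper: test accuracy of
                                            # the classifier built on this subset
        dim = len(subset) / len(bits)       # fraction of genes selected, in [0, 1]
        return w1 * acc + w2 * (1.0 - dim)  # high accuracy + low dimension -> high fitness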
17
Data Sets Used
18
Test Parameters The tests were run on two processors. The GA parameters on each processor were set as follows: –Population size: 1000 –Trials: 400,000 –Crossover probability: 0.6 –Mutation probability: 0.001
19
Test Parameters –Selection Strategy: Elitist –Migration Probability: 0.002 Crossover probability of average level to get different subpopulation with good traits of the parents. Mutation Probability low to avoid randomness of selection. Selection Strategy is Elitist which ensures that the best individuals are kept and hence leads to more accurate subsets of genes.
20
Results
21
Leukemia data set: –Subset with 29 genes found –Classifies 36/38 training instances correctly –Classifies 30/34 test instances correctly Colon data set: –Subset with 30 genes found –92% accuracy on the training data set
22
Results Comparison These results are better than those of other algorithms such as G-S and NB, whose accuracies are below 90% and whose selected gene counts vary from 10 to 500.
23
Average Performance Graphs
24
Conclusion The method does well at finding smaller gene subsets with better accuracy. The fitness function needs to be more sophisticated than the simple weighted sum used here in order to guarantee a compact final subset every time.
25
Questions Thank You.