Parallel chi-square test


1 Parallel chi-square test
Usman Roshan

2 Chi-square test
The chi-square test is a popular feature selection method when we have categorical data and classification labels (as opposed to regression targets).
In a feature selection context we apply the chi-square test to each feature and rank the features by their chi-square values (or p-values).
A parallel solution calculates chi-square for all features at the same time, as opposed to one feature at a time when done serially.

3 Chi-square test
Contingency table

            Feature=A                    Feature=B
Label=0     Observed=c1, Expected=X1     Observed=c2, Expected=X2
Label=1     Observed=c3, Expected=X3     Observed=c4, Expected=X4

We have two random variables:
Label (L): 0 or 1
Feature (F): categorical
Null hypothesis: the two variables are independent of each other (unrelated).
Under independence, P(L,F) = P(L)P(F)
P(L=0) = (c1+c2)/n
P(F=A) = (c1+c3)/n
where n = c1+c2+c3+c4 is the total number of samples.
Expected values: E(X1) = P(L=0)P(F=A)n
We can calculate the chi-square statistic for a given feature and its p-value, which is the probability of seeing a statistic at least this large if the feature were independent of the label. Features with very small p-values deviate significantly from the independence assumption and are therefore considered important.
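The slide does not write out the statistic itself. As a sketch, the standard Pearson form in the table's notation is given below, followed by a worked example with made-up counts (c1=30, c2=10, c3=20, c4=40, so n=100); the numbers are purely illustrative and not from the presentation.

```latex
% Pearson chi-square statistic over the four cells of the 2x2 table
\chi^2 = \sum_{i=1}^{4} \frac{(c_i - X_i)^2}{X_i}

% Illustrative counts (assumed): c_1=30,\ c_2=10,\ c_3=20,\ c_4=40,\ n=100
% P(L=0) = 40/100 = 0.4, \quad P(F=A) = 50/100 = 0.5
X_1 = 0.4 \cdot 0.5 \cdot 100 = 20, \quad
X_2 = 0.4 \cdot 0.5 \cdot 100 = 20, \quad
X_3 = 0.6 \cdot 0.5 \cdot 100 = 30, \quad
X_4 = 0.6 \cdot 0.5 \cdot 100 = 30

\chi^2 = \frac{(30-20)^2}{20} + \frac{(10-20)^2}{20}
       + \frac{(20-30)^2}{30} + \frac{(40-30)^2}{30}
       = 5 + 5 + 3.33 + 3.33 \approx 16.67
```

A 2x2 table has one degree of freedom, and 16.67 exceeds the 0.001 critical value of 10.83, so a feature with these counts would rank as strongly associated with the label.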

4 Parallel GPU implementation of chi-square test in CUDA
The key here is to organize the data to enable coalesced memory access.
We define a kernel function that computes the chi-square value for a given feature.
The CUDA architecture automatically distributes the kernel across different GPU cores, so that many features are processed simultaneously.
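The presentation does not include the kernel, so the following is only a minimal sketch of the idea, under several assumptions: every feature is binary (0 for A, 1 for B), the data matrix is stored one sample per row so that neighbouring threads read neighbouring bytes, and the names chi2_kernel, data, labels and the toy input are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per feature j. data is n x m with one sample per row, so for a
// fixed sample i, threads j, j+1, ... read adjacent bytes (coalesced access).
// Assumption: feature values and labels are coded as 0 or 1.
__global__ void chi2_kernel(const unsigned char* data,
                            const unsigned char* labels,
                            int n, int m, float* chi2)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= m) return;

    // Observed counts, numbered as in the contingency table.
    int c1 = 0, c2 = 0, c3 = 0, c4 = 0;
    for (int i = 0; i < n; ++i) {
        int f = data[(size_t)i * m + j];
        int l = labels[i];
        c1 += (l == 0 && f == 0);
        c2 += (l == 0 && f == 1);
        c3 += (l == 1 && f == 0);
        c4 += (l == 1 && f == 1);
    }

    // Expected counts under independence: row total * column total / n.
    float r0 = c1 + c2, r1 = c3 + c4;   // label totals
    float a  = c1 + c3, b  = c2 + c4;   // feature totals
    float obs[4] = {(float)c1, (float)c2, (float)c3, (float)c4};
    float ex[4]  = {r0 * a / n, r0 * b / n, r1 * a / n, r1 * b / n};

    float s = 0.0f;
    for (int k = 0; k < 4; ++k) {
        if (ex[k] > 0.0f) {             // skip empty cells to avoid 0/0
            float d = obs[k] - ex[k];
            s += d * d / ex[k];
        }
    }
    chi2[j] = s;
}

int main()
{
    // Toy input (made up): 6 samples, 2 features. Feature 0 copies the label.
    const int n = 6, m = 2;
    std::vector<unsigned char> data   = {0,0, 0,1, 0,0, 1,1, 1,0, 1,1};
    std::vector<unsigned char> labels = {0, 0, 0, 1, 1, 1};
    std::vector<float> chi2(m);

    unsigned char *d_data, *d_labels;
    float* d_chi2;
    cudaMalloc(&d_data, data.size());
    cudaMalloc(&d_labels, labels.size());
    cudaMalloc(&d_chi2, m * sizeof(float));
    cudaMemcpy(d_data, data.data(), data.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(d_labels, labels.data(), labels.size(), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (m + threads - 1) / threads;   // enough threads to cover m
    chi2_kernel<<<blocks, threads>>>(d_data, d_labels, n, m, d_chi2);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::printf("kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(chi2.data(), d_chi2, m * sizeof(float), cudaMemcpyDeviceToHost);
    for (int j = 0; j < m; ++j)
        std::printf("feature %d: chi2 = %f\n", j, chi2[j]);

    cudaFree(d_data);
    cudaFree(d_labels);
    cudaFree(d_chi2);
    return 0;
}
```

Compiled with nvcc, this prints one chi-square value per feature; in the toy input feature 0 should score higher than feature 1 because it matches the label exactly. Storing one feature per row instead would make neighbouring threads read addresses m bytes apart, which is the uncoalesced pattern the slide warns against.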

