Zoom: 30 x 200 matrix
Matrix for red: R (red), G (green), B (blue)
Frequency for red: 0 0 1
30 x 200 Matrix
CUT
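For an n x m matrix x, calculate the distance distribution into array dist(100):
  dist = zeros(1, 100);
  for i = 1:n
    for j = 1:m
      if x(i,j) == 1
        for i1 = (i+1):n
          k = abs(i-i1);
          for j1 = (j+1):m
            d = k + abs(j-j1);                 % city-block distance
            if x(i1,j1) == 1 && d <= 100       % skip distances beyond dist(100)
              dist(d) = dist(d) + 1;
            end
          end
        end
      end
    end
  end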
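The loop above can be mirrored in Python to check counts on small matrices. This is a sketch that keeps the same restrictions as the loop (only pairs with i1 > i and j1 > j are counted, and distances above 100 are dropped); the function name `distance_distribution` is hypothetical. `dist[d - 1]` holds the count for distance d, matching the 1-based `dist(d)`.

```python
def distance_distribution(x, nbins=100):
    """Count city-block distances between pairs of set pixels in matrix x.

    Mirrors the loop above: only pairs (i, j), (i1, j1) with i1 > i and
    j1 > j are counted; distances larger than nbins are ignored.
    """
    n, m = len(x), len(x[0])
    dist = [0] * nbins
    for i in range(n):
        for j in range(m):
            if x[i][j] != 1:
                continue
            for i1 in range(i + 1, n):
                for j1 in range(j + 1, m):
                    if x[i1][j1] == 1:
                        d = (i1 - i) + (j1 - j)  # city-block distance, always >= 2
                        if d <= nbins:
                            dist[d - 1] += 1     # 0-based slot for distance d
    return dist


# Usage: a 3 x 3 block of ones gives 4 pairs at distance 2, 4 at 3, 1 at 4.
print(distance_distribution([[1] * 3 for _ in range(3)])[:4])
```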
Dataset
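Consider only two letters for now:
  data bb;
    set aa;
    if ccc in ("1","O");       /* keep only the two characters */
  run;
  proc princomp out=pred;      /* PCA on the distance distribution */
    var x25-x66;
  run;
  proc gplot;                  /* first two principal components, marked by character */
    plot prin1*prin2=ccc;
  run;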
[Figures: prin1 vs. prin2 for the pairs 1 and O, 1 and A, 1 and I]
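For readers without SAS, the PROC PRINCOMP step can be approximated as follows. This is a sketch, not SAS's algorithm: it standardizes the columns (PROC PRINCOMP analyzes the correlation matrix unless the COV option is given), finds the leading eigenvectors by power iteration with deflation, and returns scores analogous to PRIN1 and PRIN2 (possibly with flipped signs). The function names are hypothetical, and power iteration is a swapped-in technique chosen only to stay dependency-free.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit sample variance (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    sds = []
    for k in range(p):
        var = sum((r[k] - means[k]) ** 2 for r in rows) / (n - 1)
        sds.append(math.sqrt(var) if var > 0 else 1.0)  # constant column stays all zeros
    return [[(r[k] - means[k]) / sds[k] for k in range(p)] for r in rows]


def principal_components(rows, ncomp=2, iters=500):
    """Top eigenpairs of the correlation matrix and the component scores."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    c = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]
    comps = []
    for _ in range(ncomp):
        v = [float(k + 1) for k in range(p)]  # asymmetric start vector
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [sum(c[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm < 1e-12:  # remaining variance is (numerically) zero
                break
            v = [t / norm for t in w]
        lam = sum(v[a] * c[a][b] * v[b] for a in range(p) for b in range(p))
        comps.append((lam, v))
        # Deflate: subtract lam * v v^T so the next pass finds the next component.
        c = [[c[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    # Scores: standardized data projected on each eigenvector (like PRIN1, PRIN2).
    scores = [[sum(r[k] * v[k] for k in range(p)) for _, v in comps] for r in z]
    return comps, scores
```

Plotting the first score column against the second for two characters gives the kind of scatter shown in the figures.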
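Logistic regression on the distance distribution (x25-x66):
  proc logistic data=bb;       /* assumes bb was rebuilt with ccc in ("1","A") */
    model ccc=x25-x66;
    output out=pred1 p=p;      /* p = Pr(first ordered level of ccc), here "1" */
  run;
  data pred2;
    set pred1;
    if p>0.5 then a="1"; else a="A";
  run;
  proc freq;                   /* uses the last data set created, pred2 */
    table ccc*a;
  run;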
[Tables: true vs. predicted class (logistic model), using the distance distribution]
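The `data pred2` / `proc freq` step turns predicted probabilities into labels and cross-tabulates them against the truth. A minimal Python sketch of the same logic, assuming each probability is Pr(character = "1"); the helper names are hypothetical:

```python
from collections import Counter


def classify(probs, first="1", second="A", cutoff=0.5):
    """Mimic `if p>0.5 then a="1"; else a="A";` where p = Pr(first)."""
    return [first if p > cutoff else second for p in probs]


def crosstab(true_labels, predicted):
    """Counts of (true, predicted) pairs, like `table ccc*a;`."""
    return Counter(zip(true_labels, predicted))


def accuracy(table):
    """Share of observations on the diagonal (true == predicted)."""
    total = sum(table.values())
    return sum(c for (t, p), c in table.items() if t == p) / total


# Usage with made-up probabilities for four observations.
tab = crosstab(["1", "1", "A", "A"], classify([0.9, 0.4, 0.2, 0.6]))
print(tab, accuracy(tab))
```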
More statistics from the n x m matrix:
- Block sums (4 sums): x1-x4
- Column sums (first 20 sums used): x5-x24
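The slide does not say how the matrix is split into four blocks. The sketch below assumes the blocks are the 2 x 2 quadrants and pads with zeros when a character has fewer than 20 columns; both choices are assumptions, and the function names are hypothetical.

```python
def block_sums(x):
    """Sums over four blocks, ASSUMING the blocks are the 2 x 2 quadrants."""
    n, m = len(x), len(x[0])
    h, w = n // 2, m // 2

    def total(r0, r1, c0, c1):
        return sum(x[i][j] for i in range(r0, r1) for j in range(c0, c1))

    return [total(0, h, 0, w), total(0, h, w, m),
            total(h, n, 0, w), total(h, n, w, m)]


def column_sums(x, keep=20):
    """First `keep` column sums; zero padding for narrow matrices is an assumption."""
    sums = [sum(col) for col in zip(*x)]
    return (sums + [0] * keep)[:keep]


def sum_features(x):
    """x1-x4 (block sums) followed by x5-x24 (column sums)."""
    return block_sums(x) + column_sums(x)


print(sum_features([[1, 0, 0], [0, 1, 1], [1, 1, 0]])[:7])
```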
Using sums and distance:
  %macro testtwo(char1, char2);
    data bb;
      set aa;
      if ccc in ("&char1","&char2");
    run;
    proc princomp out=pred;
      var x&start-x66;
    run;
    proc gplot;
      plot prin1*prin2=ccc;
    run;
    proc logistic data=bb;
      model ccc(event="&char1")=x&start-x66;   /* p = Pr(ccc = char1) in any sort order */
      output out=pred1 p=p;
    run;
    data pred2;
      set pred1;
      if p>0.5 then a="&char1"; else a="&char2";
    run;
    proc freq;
      table ccc*a;
    run;
  %mend;
  %let start=1;    /* 1: sums and distances (x1-x66); 25: distances only */
  %testtwo(1,I);
[Tables: true vs. predicted class using the distance distribution only, and using both sums and distances]
More than two letters
26 letters + 10 digits = 36 categories. Two stages:
- Stage 1: nominal responses, fitted with a baseline-category logit model.
- Stage 2: if the predicted value belongs to a confusable group such as 1 or I, use a two-level (binary) logistic regression to classify it further.
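In a baseline-category logit model with J categories, each non-baseline category j has log(pi_j / pi_J) = alpha_j + beta_j'x, so pi_j = exp(eta_j) / (1 + sum_k exp(eta_k)) and the baseline gets 1 / (1 + sum_k exp(eta_k)). In SAS this model is usually fit with PROC LOGISTIC and the LINK=GLOGIT option. The sketch below only turns already-computed linear predictors into probabilities and a stage-1 prediction; the function names and the convention that the last label is the baseline are assumptions.

```python
import math


def baseline_category_probs(etas):
    """Probabilities from J - 1 log-odds versus the baseline (last) category."""
    top = max([0.0, *etas])           # shift exponents for numerical stability
    num = [math.exp(e - top) for e in etas]
    base = math.exp(-top)             # baseline term exp(0), shifted the same way
    denom = base + sum(num)
    return [e / denom for e in num] + [base / denom]


def stage1_predict(labels, etas):
    """Label with the largest probability; labels[-1] is the baseline category."""
    probs = baseline_category_probs(etas)
    return labels[max(range(len(probs)), key=probs.__getitem__)]


# With 36 categories, etas would hold 35 linear predictors.
print(baseline_category_probs([0.0, 0.0, 0.0]))
```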
Stage 2: groups chosen from the error transition probabilities:
- 1 and I
- 4, A, and V
- D, H, M, W
- S -> 8
- Q -> O
- 6, 9, and 0
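The routing between the two stages can be sketched as below. The groups come from the list above, with "S -> 8" and "Q -> O" treated as two-member groups (an interpretation). The `stage2_models` mapping and its callables are hypothetical placeholders for the within-group classifiers; groups with three or four members need more than a single two-level model.

```python
# Groups of easily confused characters, taken from the stage-2 list.
GROUPS = [{"1", "I"}, {"4", "A", "V"}, {"D", "H", "M", "W"},
          {"S", "8"}, {"Q", "O"}, {"6", "9", "0"}]


def two_stage(stage1_label, features, stage2_models):
    """Refine a stage-1 prediction with a within-group classifier, if one exists.

    stage2_models maps frozenset(group) -> callable(features) -> label
    (hypothetical); labels outside every group pass through unchanged.
    """
    for group in GROUPS:
        if stage1_label in group:
            model = stage2_models.get(frozenset(group))
            if model is not None:
                return model(features)
    return stage1_label


# Usage with a dummy stage-2 model that always answers "I".
models = {frozenset({"1", "I"}): lambda f: "I"}
print(two_stage("1", None, models), two_stage("Z", None, models))
```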
  proc DISCRIM data=aa out=discout method=normal outstat=distat;
    class ccc;          /* all 36 characters */
    var x1-x66;         /* sums and distance distribution */
  run;
  proc freq data=discout;
    table ccc*_into_ / nofreq nocol norow;   /* _into_ = predicted class */
  run;
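With NOFREQ, NOCOL, and NOROW, the table prints each cell as a percentage of the total. The error transition probabilities used to form the stage-2 groups are row-conditional, P(predicted = b | true = a). A minimal sketch of that computation from (true, predicted) pairs such as ccc and _into_; the function names are hypothetical:

```python
from collections import Counter, defaultdict


def transition_probs(true_labels, predicted):
    """Row-normalised confusion table: P(predicted = b | true = a)."""
    counts = defaultdict(Counter)
    for t, p in zip(true_labels, predicted):
        counts[t][p] += 1
    return {t: {p: c / sum(row.values()) for p, c in row.items()}
            for t, row in counts.items()}


def top_confusions(probs, k=5):
    """The k largest off-diagonal transition probabilities as (prob, true, pred)."""
    errs = [(pr, t, p) for t, row in probs.items() for p, pr in row.items() if p != t]
    return sorted(errs, reverse=True)[:k]


probs = transition_probs(["S", "S", "S", "8", "Q", "Q"], ["8", "S", "8", "8", "O", "Q"])
print(top_confusions(probs, 2))
```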