1
SAS Tip: split data before proc transpose
Bill Howells, MS Washington Univ School of Medicine, St Louis
2
Conflicts of Interest
None. No financial interest . . . duh, I’m an analyst!
3
Genomic data in long format
Genomic data: IDs in columns, genotypes in rows. There are 70 million known SNPs, with 500K to 1M in a dataset from a microarray chip.
But all our data processing relies on row operations, e.g. subsetting subject IDs, e.g. for consented subjects in medical research (HIPAA).
What happens when you run out of memory with proc transpose?
4
[Diagram: IDs and DATA in one block] Out of memory!
5
[Diagram] Split rows: the DATA block is divided into chunks.
6
[Diagram] Transpose each chunk.
7
[Diagram] Subset rows.
8
[Diagram] Transpose IDs back to columns within chunk.
9
[Diagram] Append rows.
10
X command: Linux tail | split
%macro chunkit( chunksize=5000 );
  x "tail -n +2 &dir/&indata | split -l &chunksize -d -a 3 - &dir/body";
  x "ls -l &dir/body???";
%mend chunkit;
%chunkit( chunksize=3000 );
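The pipeline inside `chunkit` can be tried on its own in a shell. The sketch below is a minimal demonstration on made-up data (the file name `indata.txt`, the 7 SNP rows, and the chunk size of 3 are all hypothetical); it assumes GNU `split`, whose `-d` and `-a 3` options give numeric three-digit suffixes.

```shell
# Hypothetical demo data: 1 header line plus 7 body lines.
dir=$(mktemp -d)
{
  echo "snp id1 id2"
  for i in 1 2 3 4 5 6 7; do echo "rs$i 0 1"; done
} > "$dir/indata.txt"

# Skip the header (start at line 2), then write 3-line chunks
# named body000, body001, ... (-d: numeric suffix, -a 3: three digits).
tail -n +2 "$dir/indata.txt" | split -l 3 -d -a 3 - "$dir/body"

nchunks=$(ls "$dir"/body??? | wc -l)
last_lines=$(wc -l < "$dir/body002")
echo "chunks: $nchunks, lines in last chunk: $last_lines"
```

Seven body lines in chunks of three give `body000` and `body001` with three lines each and `body002` with the remaining one; the header is not in any chunk.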
11
X command: Linux cat header
%do i=0 %to &nfiles ;
  *** append the header to each body ***;
  x "cat &dir/head1_nofids.txt &dir/body&z3i > &dir/headbody&z3i";
%end;
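The slide does not show how `&z3i` is built; the file names suggest it is the loop index zero-padded to three digits, which is an assumption here. Under that assumption, the same loop can be sketched directly in shell, with hypothetical file contents:

```shell
# Hypothetical inputs: one header file and two body chunks.
dir=$(mktemp -d)
echo "snp id1 id2" > "$dir/head1.txt"
printf 'rs1 0 1\nrs2 1 2\n' > "$dir/body000"
printf 'rs3 2 0\n' > "$dir/body001"
nfiles=1   # index of the last chunk (loop runs 0..nfiles)

i=0
while [ "$i" -le "$nfiles" ]; do
  z3i=$(printf '%03d' "$i")   # assumed padding: 0 -> 000, 1 -> 001
  # Header first, then the chunk, so every file is readable on its own.
  cat "$dir/head1.txt" "$dir/body$z3i" > "$dir/headbody$z3i"
  i=$((i + 1))
done
```

Each `headbody` file now starts with the header line, so it can be read as a complete small table.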
12
SAS proc transpose each chunk
%do i=0 %to &nfiles ;
  *** read text file into SAS (not shown) ***;
  proc transpose data=DOSETEST_NOFIDS out=DOSETEST_NOFIDS_TR name=SAGE_ID ;
    var &listvar;
    id snp;
  run; quit;
  *** write trans SAS to text file (not shown) ***;
%end;
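To see what the transpose and the row subset do to a single chunk, here is a small stand-in written with `awk` rather than proc transpose. It is not the method from the slides, only an illustration; the chunk contents and the kept subject `id2` are hypothetical.

```shell
# Hypothetical chunk: header row of IDs, one row per SNP.
dir=$(mktemp -d)
printf 'snp id1 id2\nrs1 0 1\nrs2 2 0\n' > "$dir/headbody000"

# Swap rows and columns: field i of line j becomes field j of line i.
awk '{ for (i = 1; i <= NF; i++) cell[i, NR] = $i; ncol = NF }
     END {
       for (i = 1; i <= ncol; i++) {
         line = cell[i, 1]
         for (j = 2; j <= NR; j++) line = line " " cell[i, j]
         print line
       }
     }' "$dir/headbody000" > "$dir/trans000"

# IDs are now rows, so subsetting subjects is a plain row filter:
# keep the header row plus one (hypothetical) consented subject.
awk 'NR == 1 || $1 == "id2"' "$dir/trans000" > "$dir/subset000"
cat "$dir/subset000"
```

Because each chunk holds only a few thousand SNPs, the transposed table stays small enough to hold in memory, which is the point of splitting first.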
13
X command: Linux cat, rm
************************************************;
* cat the body files into one master body
************************************************;
x "cat &dir/bodysubset??? > &dir/bodymaster";
************************************************;
* clean up temp text files
************************************************;
x "rm &dir/bodysubset???";
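The final append relies on the shell expanding `bodysubset???` in sorted order, so the zero-padded suffixes keep the chunks in their original sequence. A minimal sketch with hypothetical chunk contents:

```shell
# Hypothetical processed chunks, one line each.
dir=$(mktemp -d)
printf 'rs1 0 1\n' > "$dir/bodysubset000"
printf 'rs2 1 2\n' > "$dir/bodysubset001"
printf 'rs3 2 0\n' > "$dir/bodysubset002"

# The glob expands in sorted order: 000, 001, 002.
cat "$dir"/bodysubset??? > "$dir/bodymaster"

# Remove the temporary chunks; bodymaster does not match the pattern.
rm "$dir"/bodysubset???

remaining=$(ls "$dir" | wc -l)
cat "$dir/bodymaster"
```

After the clean-up only `bodymaster` is left in the directory, holding the three rows in chunk order.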