Published by Colleen Barber. Modified over 9 years ago.
1
Converting Large NCBI Databases into SAS
Rosa SJ Lin
Division of Statistical Genomics
Washington University in Saint Louis
June 30, 2008
2
NCBI (http://www.ncbi.nlm.nih.gov)
Contains a large number of databases
Most important are:
- GenBank
- PubMed
- RefSeq
- Online Mendelian Inheritance in Man (OMIM)
- dbSNP
3
dbSNP Database
4
NCBI dbSNP
Contains information about SNPs
Submitted data is given an ss number (e.g. ss52079780)
If the data meets the criteria, a reference SNP is created, which has an rs number (e.g. rs530)
5
dbSNP Data (1)
Each record spans a varying number of lines, and each line has a varying length
6
dbSNP Data (2)
7
dbSNP Data (3)
8
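Various uses of the SCAN and INDEX functions to assist in reading data (1)
data ncbisnp;
  length rs $12;
  infile din firstobs=1 missover pad;
  input snpline $132.;   /* read each raw line as a single string */
  /* lines containing "updated" carry the rs number in the first |-delimited field */
  if index(snpline,"updated")>0 then do;
    rs=compress(scan(snpline,1,"|"));
    output;
  end;
run;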
9
Various uses of the SCAN and INDEX functions to assist in reading data (2)
  /* allele line: keep the text after "alleles=" in the second field */
  if index(snpline,"alleles=")>0 then do;
    alleles=substr(compress(scan(snpline,2,"|")),9);
    output;
  end;
  /* reference-assembly line: chromosome from the third field, position from the fourth */
  if index(snpline,"assembly=reference")>0 then do;
    chrom=input(substr(compress(scan(snpline,3,"|")),5),8.);
    posc=compress(scan(snpline,4,"|"));
    output;
  end;
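To see what each function contributes, here is a minimal sketch applied to a single string. The sample line is made up for illustration and only assumes the "|"-delimited layout implied by the code above; it is not real dbSNP data.

```sas
data _null_;
  length snpline $132 field $40 alleles $20;
  /* Hypothetical line, invented for this example */
  snpline = "SNP | alleles='A/G' | het=0.5";
  /* INDEX returns the position of the substring, or 0 if it is absent */
  if index(snpline, "alleles=") > 0 then do;
    field   = scan(snpline, 2, "|");   /* second |-delimited field, blanks included */
    field   = compress(field);         /* remove all blanks */
    alleles = substr(field, 9);        /* skip the 8 characters of "alleles=" */
    put field= alleles=;
  end;
run;
```

Note that the surrounding single quotes stay in the value, because SUBSTR starts right after the equals sign.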
10
Use the RETAIN statement
Causes a variable to keep its value from one iteration of the DATA step to the next
retain markname rs alleles;
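The sketch below shows one way RETAIN can combine the pieces into one observation per SNP. It is an illustration under assumptions, not the presenter's full program: the input records are hypothetical, the dataset name snps and the choice to output only on the reference-assembly line are mine, and markname is left out because its definition is not shown on the slides.

```sas
data snps (keep=rs alleles chrom posc);
  length snpline $132 rs $12 alleles $20 posc $30;
  retain rs alleles;                 /* carry values across lines of a record */
  infile datalines truncover;
  input snpline $132.;
  if index(snpline, "updated") > 0 then do;
    rs = compress(scan(snpline, 1, "|"));
    alleles = " ";                   /* new record: clear the retained value */
  end;
  else if index(snpline, "alleles=") > 0 then
    alleles = substr(compress(scan(snpline, 2, "|")), 9);
  else if index(snpline, "assembly=reference") > 0 then do;
    chrom = input(substr(compress(scan(snpline, 3, "|")), 5), 8.);
    posc  = compress(scan(snpline, 4, "|"));
    output;                          /* one row once rs, alleles and position are known */
  end;
  /* Hypothetical records, invented for this example */
  datalines;
rs1 | human | snp | updated 2008-01-01
SNP | alleles='A/G' | het=0.5
CTG | assembly=reference | chr=1 | chr-pos=1000
rs2 | human | snp | updated 2008-01-02
SNP | alleles='C/T' | het=0.3
CTG | assembly=reference | chr=2 | chr-pos=2000
;
run;
```

Without the RETAIN statement, rs and alleles would be reset to missing at the start of each iteration, so the row written on the assembly line would have lost them.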
11
dbSNP Data (4)
12
Output SAS Dataset
13
Readings:
- Kim L Kolbe et al., SUGI 22: “Advanced Techniques for Reading Difficult and Unusual Flat Files”.
- Clinton S Rickards, SUGI 24: “Reading External Files Using SAS® Software”.