Converting Large NCBI Databases into SAS
Rosa SJ Lin
Division of Statistical Genomics, Washington University in Saint Louis
June 30, 2008



NCBI
- Contains a large number of databases
- The most important are:
  - GenBank
  - PubMed
  - RefSeq
  - Online Mendelian Inheritance in Man (OMIM)
  - dbSNP

dbSNP Database

NCBI dbSNP
- Contains information about SNPs
- Submitted data is given an ss number (e.g. ss )
- If the data meets the criteria, a reference SNP is created, which has an rs number (e.g. rs530)

dbSNP Data (1)
- Each record spans a varying number of lines, and each line has a varying length

dbSNP Data (2)

dbSNP Data (3)

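Using the SCAN and INDEX functions to assist in reading data (1)

    data ncbisnp;
        length rs $12;
        /* din is a fileref pointing at the dbSNP flat file */
        infile din firstobs=1 missover pad;
        /* read the whole line into one character variable */
        input snpline $132.;
        /* the line containing "updated" carries the rs number in field 1 */
        if index(snpline,"updated")>0 then do;
            rs=compress(scan(snpline,1,"|"));
            output;
        end;
    run;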
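A minimal sketch of how INDEX, SCAN and COMPRESS cooperate on one line. The two sample lines are hypothetical stand-ins for dbSNP flat-file lines (the real layout may differ), and the dataset name scan_demo and variable field2 are made up for illustration.

```sas
/* Sketch only: the sample lines below are invented, not real dbSNP data. */
data scan_demo;
    length snpline $80 rs $12 field2 $20;
    infile datalines truncover;
    input snpline $80.;
    /* INDEX returns the position of the substring, or 0 if it is absent */
    if index(snpline, "updated") > 0 then do;
        /* SCAN picks the n-th "|"-delimited field; COMPRESS strips blanks */
        rs     = compress(scan(snpline, 1, "|"));
        field2 = compress(scan(snpline, 2, "|"));
        output;
    end;
    datalines;
rs530 | human | 9606 | snp | updated 2008-02-06
ss532 | HANDLE | orient=+
;
run;

proc print data=scan_demo;
run;
```

Only the first sample line contains "updated", so only that line should reach the OUTPUT statement; the second line is read and then discarded.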

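Using the SCAN and INDEX functions to assist in reading data (2)

    /* fragment: these statements go inside the DATA step */
    if index(snpline,"alleles=")>0 then do;
        /* skip the 8 characters of "alleles=" in field 2 */
        alleles=substr(compress(scan(snpline,2,"|")),9);
        output;
    end;
    if index(snpline,"assembly=reference")>0 then do;
        /* skip the 4 characters of "chr=" in field 3, then convert to numeric */
        chrom=input(substr(compress(scan(snpline,3,"|")),5),8.);
        posc=compress(scan(snpline,4,"|"));
        output;
    end;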
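The SUBSTR start positions depend on the length of the keyword in front of the value: "alleles=" is 8 characters, so the value starts at position 9, and "chr=" is 4 characters, so the value starts at position 5. A small sketch that writes the results to the log, using hypothetical line fragments and a made-up helper variable named field:

```sas
/* Sketch only: the quoted strings are invented examples, not real dbSNP data. */
data _null_;
    length field $40 alleles $20;
    /* field 2 of the line, blanks removed */
    field   = compress(scan("SNP | alleles='A/G' | het=0.5", 2, "|"));
    alleles = substr(field, 9);             /* drop the 8 characters of "alleles=" */
    /* field 3 of the line, blanks removed */
    field   = compress(scan("CTG | assembly=reference | chr=1 | chr-pos=12345", 3, "|"));
    chrom   = input(substr(field, 5), 8.);  /* drop "chr=", then convert to numeric */
    put alleles= chrom=;
run;
```

Because chrom is read with a numeric informat, a non-numeric chromosome label such as X would come back as a missing value with a note in the log; storing chrom as character is one way around that, if it matters for the analysis.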

Use the RETAIN statement
- It causes a variable to keep its value from one iteration of the DATA step to the next.

    retain markname rs alleles;
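A sketch of why RETAIN matters here: the rs number, the alleles and the chromosome sit on different lines, so values read on earlier lines have to survive until the record is complete. The sample lines are hypothetical, and writing a single observation per SNP at the "assembly=reference" line is an assumption made for this sketch, not necessarily how the original program writes its output.

```sas
/* Sketch only: invented three-line records standing in for dbSNP entries. */
data retain_demo;
    length snpline $80 rs $12 alleles $20;
    retain rs alleles;               /* keep values across DATA step iterations */
    infile datalines truncover;
    input snpline $80.;
    if index(snpline, "updated") > 0 then do;
        rs      = compress(scan(snpline, 1, "|"));
        alleles = " ";               /* reset so the previous SNP's alleles do not carry over */
    end;
    if index(snpline, "alleles=") > 0 then
        alleles = substr(compress(scan(snpline, 2, "|")), 9);
    if index(snpline, "assembly=reference") > 0 then do;
        chrom = input(substr(compress(scan(snpline, 3, "|")), 5), 8.);
        output;                      /* one observation per SNP, written at its position line */
    end;
    keep rs alleles chrom;
    datalines;
rs530 | human | 9606 | snp | updated 2008-02-06
SNP | alleles='A/G' | het=0.5
CTG | assembly=reference | chr=1 | chr-pos=12345
rs531 | human | 9606 | snp | updated 2008-02-06
SNP | alleles='C/T' | het=0.3
CTG | assembly=reference | chr=2 | chr-pos=67890
;
run;

proc print data=retain_demo;
run;
```

Without the RETAIN statement, rs and alleles would be reset to missing at the start of every iteration, so the observation written at the position line would carry only chrom.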

dbSNP Data (4)

Output SAS Dataset

Readings:
- Kim L Kolbe et al., SUGI 22: “Advanced Techniques for Reading Difficult and Unusual Flat Files”.
- Clinton S Rickards, SUGI 24: “Reading External Files Using SAS® Software”.