Essential BioPython
Retrieving Sequences from the Web
MARC: Developing Bioinformatics Programs
Alex Ropelewski, PSC-NRBSC
Bienvenido Vélez, UPR Mayaguez
The Entrez eFetch Function
Fetching a single GenBank Sequence from the Network

    from Bio import SeqIO
    from Bio import Entrez

    # Please use your REAL email address below:
    Entrez.email = "youremail@yourdomain.edu"

    # Download the record in GenBank format and parse it into a SeqRecord
    handle = Entrez.efetch(db="nucleotide", rettype="gb", id="NM_000518")
    sr = SeqIO.read(handle, "genbank")
    print(sr.id)
    print(sr.seq)
    handle.close()
Main eFetch Function Parameters

    Name       Req’d  Default      Description with Options
    db         Y      N/A          Database to search** (e.g. nucleotide, protein, structure)
    id         Y      N/A          Single ID or comma-separated list of unique IDs**
    retmode    N      db specific  Data format for records returned* (text, xml or asn.1)
    rettype    N      db specific  Data layout for records returned* (e.g. fasta, gp)
    retstart   N                   Sequential index of first record to be retrieved
    retmax     N      10,000       Max number of records to retrieve
    seq_start  N      1            First sequence base to retrieve
    seq_stop   N      length       Last sequence base to retrieve

*See http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report=objectonly for a complete list of available retmode/rettype combinations.
**See http://www.ncbi.nlm.nih.gov/books/NBK25497/table/chapter2.chapter2_table1/?report=objectonly for a complete list of available Entrez databases and their corresponding unique identifiers.
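
The parameters listed above end up as query-string fields of an E-utilities request. As a rough sketch of that mapping, the hypothetical helper build_efetch_url below (not part of Biopython) assembles such a URL with the standard library only. It assumes the public efetch endpoint at eutils.ncbi.nlm.nih.gov and never contacts the network. Entrez.efetch accepts the same parameter names as keyword arguments and builds the request for you.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed base URL of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, ids, **options):
    """Hypothetical helper: build an efetch URL for a database and ID(s).

    ids may be one ID string or a list of IDs. Extra keyword options
    (rettype, retmode, retmax, seq_start, ...) are passed through as-is.
    """
    if isinstance(ids, str):
        ids = [ids]
    params = {"db": db, "id": ",".join(ids)}
    # Leave out options set to None so the server-side defaults apply.
    params.update({k: v for k, v in options.items() if v is not None})
    return EFETCH_URL + "?" + urlencode(params)


url = build_efetch_url(
    "nucleotide",
    ["NM_000518", "AJ131351"],
    rettype="fasta",
    retmode="text",
    seq_start=1,
    seq_stop=60,
)
print(url)
# Decode the query string again to inspect the individual parameters.
print(parse_qs(urlparse(url).query))
```

With Biopython, the equivalent call would be along the lines of Entrez.efetch(db="nucleotide", id="NM_000518,AJ131351", rettype="fasta", retmode="text", seq_start=1, seq_stop=60).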
The Entrez eFetch Function
Fetch a single GenBank Sequence from the Network and Save

    from Bio import SeqIO
    from Bio import Entrez

    Entrez.email = "youremail@yourdomain.edu"

    # Download and parse the record
    InHandle = Entrez.efetch(db="nucleotide", rettype="gb", id="NM_000518")
    sr = SeqIO.read(InHandle, "genbank")

    # Write the record back out in GenBank format
    OutHandle = open("NM_000518.gb", "w")
    SeqIO.write(sr, OutHandle, "genbank")
    InHandle.close()
    OutHandle.close()
The Entrez eFetch Function
Fetch a single GenBank Sequence from the Network and Save as Fasta

    from Bio import SeqIO
    from Bio import Entrez

    Entrez.email = "youremail@yourdomain.edu"

    # Download and parse the record in GenBank format
    InHandle = Entrez.efetch(db="nucleotide", rettype="gb", id="NM_000518")
    sr = SeqIO.read(InHandle, "genbank")

    # Only the output format changes: write the same record as FASTA
    OutHandle = open("NM_000518.fasta", "w")
    SeqIO.write(sr, OutHandle, "fasta")
    InHandle.close()
    OutHandle.close()
The Entrez eFetch Function
Fetch multiple GenBank Sequences from the Network and Save as Fasta

    from Bio import SeqIO
    from Bio import Entrez

    # Comma-separated list of accession IDs
    accessions = "NM_000518,AJ131351"
    Entrez.email = "youremail@yourdomain.edu"

    # SeqIO.parse (not SeqIO.read) handles input with more than one record
    InHandle = Entrez.efetch(db="nucleotide", rettype="gb", id=accessions)
    seqRecords = SeqIO.parse(InHandle, "genbank")

    OutHandle = open("myseqs.fasta", "w")
    SeqIO.write(seqRecords, OutHandle, "fasta")
    InHandle.close()
    OutHandle.close()
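
When the accession list grows, a single comma-separated id string can become unwieldy. One possible approach, sketched below, is to split the list into fixed-size batches and build one id string per batch. The helper batch_ids, the batch size of 2, and the third ID are hypothetical and serve only as illustration; the Biopython call appears in a comment so the sketch runs without network access.

```python
def batch_ids(accessions, batch_size):
    """Yield comma-separated ID strings holding at most batch_size IDs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(accessions), batch_size):
        yield ",".join(accessions[start:start + batch_size])


# The first two IDs come from the slide; the third is a placeholder.
accessions = ["NM_000518", "AJ131351", "EXAMPLE_ID_3"]
id_strings = list(batch_ids(accessions, batch_size=2))
print(id_strings)

# Each string could then be used for one request, for example:
#   InHandle = Entrez.efetch(db="nucleotide", rettype="gb", id=id_string)
```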
The ExPASy and SwissProt Packages
Fetch SwissProt Sequences from ExPASy and Save as Fasta

    >>> from Bio import ExPASy
    >>> from Bio import SeqIO
    >>> accessions = ["O23729", "O23730", "O23731"]
    >>> records = []
    >>> for accession in accessions:
    ...     handle = ExPASy.get_sprot_raw(accession)
    ...     record = SeqIO.read(handle, "swiss")
    ...     records.append(record)
    ...
    >>> outHandle = open("mysequences.fasta", "w")
    >>> SeqIO.write(records, outHandle, "fasta")
    >>> outHandle.close()
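
The loop above stops at the first accession that fails to download or parse. A possible variation, sketched here, records the failures and carries on with the remaining accessions. fetch_records is a hypothetical helper, and fetch_one stands in for whatever retrieves a single record (for example, a function wrapping ExPASy.get_sprot_raw and SeqIO.read). The stand-in fetcher fake_fetch is an assumption that exists only so the sketch runs without network access.

```python
def fetch_records(accessions, fetch_one):
    """Call fetch_one for each accession; return (records, failures)."""
    records = []
    failures = {}
    for accession in accessions:
        try:
            records.append(fetch_one(accession))
        except Exception as error:  # broad on purpose: network and parse errors differ
            failures[accession] = str(error)
    return records, failures


def fake_fetch(accession):
    """Stand-in fetcher (assumption): pretends one accession is unavailable."""
    if accession == "O23730":
        raise ValueError("record not available")
    return {"id": accession}


records, failures = fetch_records(["O23729", "O23730", "O23731"], fake_fetch)
print(len(records), "fetched; failed:", sorted(failures))
```

Only the records that were fetched successfully would then be passed to SeqIO.write, and the failures dictionary can be reported or retried later.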