Published by Linette Palmer. Modified over 8 years ago.
1
E-utilities: Short course
2
The Entrez Query System at NCBI
3
Entrez Functions
- Search one or all of 31 databases.
- Generate brief “document summaries” for a list of records.
- Link from one list of records to another.
- Perform boolean operations on lists of records.
- Format records for display and download.
4
Entrez Transactions
- Each record in an Entrez database is assigned an integer called a UID, or “unique identifier”.
- Entrez transactions are performed on lists of UIDs.
- Transactions include boolean operations and the tracking of links within and between database records.
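Because transactions operate on lists of integer UIDs, the boolean operations can be pictured as ordinary set operations. The sketch below is only a conceptual illustration: the UID values and the `combine` helper are hypothetical, and nothing here is meant to describe how Entrez implements these operations internally.

```python
# Hypothetical UID lists, e.g. the results of two separate searches.
search_a = [101, 102, 103, 104]
search_b = [103, 104, 105]


def combine(a, b, op):
    """Combine two UID lists with a boolean operator; returns a sorted list."""
    set_a, set_b = set(a), set(b)
    if op == "AND":
        result = set_a & set_b   # UIDs present in both lists
    elif op == "OR":
        result = set_a | set_b   # UIDs present in either list
    elif op == "NOT":
        result = set_a - set_b   # UIDs in the first list but not the second
    else:
        raise ValueError(f"unsupported operator: {op}")
    return sorted(result)


print(combine(search_a, search_b, "AND"))  # [103, 104]
print(combine(search_a, search_b, "OR"))   # [101, 102, 103, 104, 105]
print(combine(search_a, search_b, "NOT"))  # [101, 102]
```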
5
Entrez Database Queries
- Entrez supports text searches with field restrictions, boolean operators (sometimes implicit), and term grouping.
- Field restrictions vary among the databases.
- Unfielded terms are term-mapped: Entrez translates them automatically into fielded queries.
- Explicitly fielded searches are not term-mapped.
- Quoted phrases are searched as a unit.
6
Term: cold
PubMed: "chronic obstructive pulmonary disease"[Text Word] OR "pulmonary disease, chronic obstructive"[MeSH Terms] OR ("common cold"[TIAB] NOT Medline[SB]) OR "common cold"[MeSH Terms] OR "cold"[MeSH Terms] OR cold[Text Word]
PMC: "pulmonary disease, chronic obstructive"[MeSH Terms] OR "common cold"[MeSH Terms] OR "cold"[MeSH Terms] OR cold[Text Word]
Nucleotide: cold[All Fields]
Taxonomy: cold[All Names]
7
Term: mouse
PubMed: ("mice"[TIAB] NOT Medline[SB]) OR "mice"[MeSH Terms] OR mouse[Text Word]
PMC: "mice"[MeSH Terms] OR mouse[Text Word]
Nucleotide: "Mus musculus"[Organism] OR mouse[All Fields]
Taxonomy: mouse[All Names]
Genome: "Mus musculus"[Organism] OR mouse[All Fields]
10
Viewing Indexed Terms on the Web: the Preview/Index Tab
11
Patterns are Recognized
- miller baker: miller[All Fields] AND baker[All Fields]
- miller j baker m: miller j[Author] AND baker m[Author]
- AF123456, P12243, 555: direct retrieval of record
PubMed, PMC, Nucleotide, Protein, Structure and others
All Databases
12
Search History
- A separate search history is maintained for each database.
- Previous searches can be recalled and combined using a query key and a cookie called a “WebEnv”.
- Available on the Web under the History tab.
13
DocSums
- Brief summaries of database records are generated quickly on frontend servers.
- Full records are retrieved from backend machines.
14
Eutilities
- A set of eight server-side programs.
- They support a uniform URL syntax.
- They translate a standard set of URL-encoded input parameters into the values required by the programs that make up the Entrez system.
15
Entrez Functions and EUtils
- Searches: esearch.fcgi
- DocSums: esummary.fcgi
- Links: elink.fcgi
- Uploads: epost.fcgi
- Downloads: efetch.fcgi
- Global Query: egquery.fcgi
- Spelling: espell.fcgi
- Information: einfo.fcgi
16
The Base URL
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/
followed by one of: esearch.fcgi? egquery.fcgi? esummary.fcgi? efetch.fcgi? einfo.fcgi? elink.fcgi? epost.fcgi?
General pattern: eutil.fcgi?
17
URL Parameters
BASE/esearch.fcgi?db=nucleotide&term=mouse[orgn]
Parameters are separated by & symbols:
- db = nucleotide
- term = mouse[orgn]
We need to know the following:
1. What parameters are available
2. What values they accept
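As a minimal sketch of the URL syntax described on this slide, the snippet below assembles an E-utility URL from the base URL, a program name, and a set of parameters. The helper name `eutils_url` is hypothetical. The base URL is copied from the slides as-is; NCBI has since moved the service to HTTPS, so a live client would most likely need `https://` instead. No request is sent.

```python
from urllib.parse import urlencode

# Base URL exactly as given in this course (assumption: unchanged host and path).
BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(program, **params):
    """Build BASE + program.fcgi? + URL-encoded parameters joined by '&'."""
    return f"{BASE}{program}.fcgi?{urlencode(params)}"


url = eutils_url("esearch", db="nucleotide", term="mouse[orgn]")
print(url)
# http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=mouse%5Borgn%5D
```

The square brackets in `mouse[orgn]` come out percent-encoded (`%5B`, `%5D`), which is the standard URL encoding of the same query.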
18
A Docsum via esummary.fcgi and via the Web
19
A Simple Eutilities Pipeline
22
An Esearch Followed by Multiple Rounds of Efetch
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?usehistory=y&db=gene&term=mammalia[orgn]
Elapsed time: 0 seconds
0%, 0 records of 161815 retrieved.
Tue Jan 25 20:46:32 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=0&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 40 seconds
0.3%, 500 records of 161815 retrieved.
Tue Jan 25 20:47:09 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=500&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 79 seconds
0.61%, 1000 records of 161815 retrieved.
Tue Jan 25 20:47:48 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=1000&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 118 seconds
0.92%, 1500 records of 161815 retrieved.
Tue Jan 25 20:48:27 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=1500&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 158 seconds
1.23%, 2000 records of 161815 retrieved.
Tue Jan 25 20:49:07 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=2000&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 204 seconds
1.54%, 2500 records of 161815 retrieved.
Tue Jan 25 20:49:53 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=2500&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
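The log on this slide pages through one stored result set by stepping `retstart` in increments of `retmax`. A rough sketch of that loop follows; it only generates the EFetch URLs and does not contact NCBI. The function name `efetch_batches` and the placeholder `EXAMPLE_WEBENV` are hypothetical, and the `rettype=native` and `retmode=xml` values are simply copied from the 2005 log, so they may not match what the service accepts today.

```python
from urllib.parse import urlencode

# Base URL as given in this course.
BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_batches(db, web_env, query_key, total, batch_size=500):
    """Yield one EFetch URL per batch of a result set held on the History Server."""
    for retstart in range(0, total, batch_size):
        params = {
            "db": db,
            "WebEnv": web_env,        # cookie returned by the earlier ESearch
            "query_key": query_key,   # which stored result set to read
            "retstart": retstart,     # index of the first record in this batch
            "retmax": batch_size,     # number of records per batch
            "rettype": "native",      # copied from the log; database dependent
            "retmode": "xml",
        }
        yield BASE + "efetch.fcgi?" + urlencode(params)


urls = list(efetch_batches("gene", "EXAMPLE_WEBENV", 1, total=161815))
print(len(urls))  # 324
print(urls[1])
```

With 161815 records and batches of 500, the loop produces 324 calls, the last one starting at record 161500.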
23
A Download of 161815 Mammalian Entrez Gene Records
[Figure: plot with axes labeled “Efetch calls” and “SECONDS”]
24
EFetch
Retrieves formatted data records matching a set of UIDs.
Why use it?
- To download data records
INPUT
- db: Entrez database to search
- id: set of UIDs
OUTPUT
- Varied: formatted data records
Example: BASE/efetch.fcgi?db=nucleotide&id=49619226,49615287
25
Databases that Support EFetch
- Literature: PubMed, Journals, PubMed Central, OMIM
- Sequences: Nucleotide, Protein, Genome, Popset, SNP
- Other: Gene, Taxonomy, PC Substance, PC Compound
Unique queuing interface!
26
EFetch Formatting Parameters
- rettype: determines the type of data record returned (flat file, FASTA, EST, accession, etc.)
- retmode: determines the format (mode) of data record returned (text, HTML, XML)
Be warned!
- These settings are very dependent on the database.
- These settings interact with one another.
- Not all possible combinations are supported.
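To make the two formatting parameters concrete, here is a small sketch that adds `rettype` and `retmode` to an EFetch URL only when they are supplied. The helper `efetch_url` is hypothetical. The FASTA-as-text pairing for the nucleotide database is used as an illustration of one combination; since supported combinations depend on the database, it should be checked against the current documentation before relying on it.

```python
from urllib.parse import urlencode

# Base URL as given in this course.
BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(db, ids, rettype=None, retmode=None):
    """Build an EFetch URL; rettype and retmode are added only if given."""
    params = {"db": db, "id": ",".join(str(i) for i in ids)}
    if rettype:
        params["rettype"] = rettype   # type of record, e.g. "fasta"
    if retmode:
        params["retmode"] = retmode   # output mode, e.g. "text" or "xml"
    # safe="," keeps the comma-separated UID list readable in the URL.
    return BASE + "efetch.fcgi?" + urlencode(params, safe=",")


print(efetch_url("nucleotide", [49619226, 49615287], rettype="fasta", retmode="text"))
# http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=49619226,49615287&rettype=fasta&retmode=text
```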
27
The Entrez History Server
- Stores UID lists resulting from previous searches.
- UID sets are placed there by ESearch, EPost, and ELink.
The History Server represents the location of stored UID sets with two parameters:
- WebEnv: a string specifying a cookie assigned by the History Server
- query_key: an integer equivalent to the History number on the web
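Since `query_key` plays the role of the History number, stored sets can be referred to as `#1`, `#2`, and so on inside a new ESearch term, provided the same `WebEnv` is passed along. The sketch below only builds such a URL; `combine_history_url` and `EXAMPLE_WEBENV` are hypothetical names, and the `pubmed` database is just an example value.

```python
from urllib.parse import urlencode

# Base URL as given in this course.
BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def combine_history_url(db, web_env, query_keys, operator="AND"):
    """Build an ESearch URL that combines stored result sets by query_key."""
    # "#1 AND #2" refers to the sets stored under query_key 1 and 2.
    term = f" {operator} ".join(f"#{key}" for key in query_keys)
    params = {"db": db, "term": term, "WebEnv": web_env, "usehistory": "y"}
    return BASE + "esearch.fcgi?" + urlencode(params)


print(combine_history_url("pubmed", "EXAMPLE_WEBENV", [1, 2]))
# http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%231+AND+%232&WebEnv=EXAMPLE_WEBENV&usehistory=y
```

The `#` character must be percent-encoded (`%23`) in a URL, which `urlencode` takes care of.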
28
EPost
Stores a list of UIDs on the History Server.
Why use it?
- To upload a large file or set of UIDs
INPUT
- db: Entrez database containing the UIDs
- id: list of UIDs
- WebEnv: pre-existing WebEnv to use
OUTPUT
- XML containing WebEnv and query_key
Example: BASE/epost.fcgi?db=nucleotide&id=49619226,49615287
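For a large set of UIDs the parameters are better sent in the body of an HTTP POST than in the URL. The following is a minimal sketch under that assumption: `epost_body` builds the encoded body, and `epost` shows how it might be sent with the standard library. Both function names are hypothetical, and `epost` is defined but not called here because it needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as given in this course.
BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def epost_body(db, ids, web_env=None):
    """Encode the EPost parameters as bytes suitable for an HTTP POST body."""
    params = {"db": db, "id": ",".join(str(i) for i in ids)}
    if web_env:
        params["WebEnv"] = web_env  # append to a pre-existing WebEnv
    return urlencode(params, safe=",").encode("ascii")


def epost(db, ids, web_env=None):
    """Send the upload as a POST and return the XML reply (not called here)."""
    # Passing data= makes urlopen issue a POST instead of a GET.
    with urlopen(BASE + "epost.fcgi", data=epost_body(db, ids, web_env)) as reply:
        return reply.read().decode("utf-8")


print(epost_body("nucleotide", [49619226, 49615287]))
# b'db=nucleotide&id=49619226,49615287'
```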
29
Using ESearch to Post Results
db=nucleotide&term=mouse[orgn]&usehistory=y
Returns: WebEnv, query_key
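The two History values come back inside the ESearch XML reply. The sketch below pulls them out with the standard-library XML parser. The sample document is hand-written for illustration (the UIDs and `EXAMPLE_WEBENV` are made up), and `parse_esearch` is a hypothetical helper; a real reply contains additional elements that are ignored here.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of an ESearch reply (illustrative values only).
SAMPLE_XML = """<eSearchResult>
  <Count>3</Count>
  <RetMax>3</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
  <IdList><Id>111</Id><Id>222</Id><Id>333</Id></IdList>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Extract the hit count, History values, and returned UIDs from ESearch XML."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count")),
        "query_key": root.findtext("QueryKey"),  # present when usehistory=y
        "web_env": root.findtext("WebEnv"),      # present when usehistory=y
        "ids": [node.text for node in root.findall("IdList/Id")],
    }


result = parse_esearch(SAMPLE_XML)
print(result["web_env"], result["query_key"], result["count"])
# EXAMPLE_WEBENV 1 3
```

The `web_env` and `query_key` values can then be passed to ESummary, EFetch, or ELink in place of an explicit UID list.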
30
Accessing the History
- UID sets are placed on the Entrez History Server by EPost, ESearch (usehistory=y), and ELink (cmd=neighbor_history).
- ESearch, ESummary, ELink, and EFetch can then read those sets using WebEnv and query_key.
31
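The Big Picture
[Diagram: an Entrez query enters through ESearch and a UID list enters through EPost; ESearch (usehistory=y) and ELink (cmd=neighbor_history) can store their results on the Entrez History Server; ESummary, EFetch, and ELink accept either a UID list or a WebEnv and query_key.]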
32
ELink
Retrieves UIDs in database B linked to a set of UIDs in database A.
Why use it?
- To find related data in another database
- To find neighbors within a database
INPUT
- dbfrom: Entrez database to link from
- db: Entrez database(s) to link to; can be a list!
- id: list of UIDs
- cmd: ELink command mode (default = neighbor)
OUTPUT
- XML: set(s) of linked UIDs
Example: BASE/elink.fcgi?dbfrom=nucleotide&db=protein&id=49619226
33
Computational Neighbors in ELink
Retrieves UIDs linked to other UIDs in the same database (db = dbfrom).
Example: BASE/elink.fcgi?dbfrom=protein&db=protein&id=15718680
- term: Entrez query that ELink uses to limit the set of neighbors
Supported databases: pubmed, nucleotide, protein, domains, cdd, geo, gds
34
Link Names
- All possible link names for a database are given by EInfo.
- Link names for a given call are given in the ELink XML output.
Examples:
- gene_protein: links from gene to protein
- protein_gene: links from protein to gene
- gene_snp: links from gene to snp
- gene_snp_genegenotype: links from gene to snps that have genotype data
- genome_nucleotide_comp_mrna: links from a chromosome to all mRNAs transcribed by genes on that chromosome
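Because each ELink reply names the links it followed, the linked UIDs can be grouped by link name when reading the XML. This is a sketch only: the sample reply is hand-written with made-up UIDs, its element layout is an assumption about the general shape of ELink output, and `links_by_name` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed shape of an ELink reply (made-up UIDs).
SAMPLE_XML = """<eLinkResult>
  <LinkSet>
    <DbFrom>gene</DbFrom>
    <IdList><Id>1</Id></IdList>
    <LinkSetDb>
      <DbTo>protein</DbTo>
      <LinkName>gene_protein</LinkName>
      <Link><Id>10</Id></Link>
      <Link><Id>11</Id></Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>"""


def links_by_name(xml_text):
    """Map each link name in an ELink reply to the list of linked UIDs."""
    grouped = {}
    for link_set_db in ET.fromstring(xml_text).iter("LinkSetDb"):
        name = link_set_db.findtext("LinkName")
        uids = [link.findtext("Id") for link in link_set_db.findall("Link")]
        grouped.setdefault(name, []).extend(uids)
    return grouped


print(links_by_name(SAMPLE_XML))  # {'gene_protein': ['10', '11']}
```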
35
Passing One UID Set to ELink
Genes: G1, G2, G3. Proteins: P1, P2, P3, P4, P5, P6.
dbfrom=gene&db=protein&id=G1,G2,G3
36
Passing Multiple UID Sets to ELink
Genes: G1, G2, G3. Proteins: P1, P2, P3, P4, P5, P6.
dbfrom=gene&db=protein&id=G1&id=G2&id=G3
37
Passing Multiple UID Sets to ELink
Genes: G1, G2, G3. Proteins: P1, P2, P3, P4, P5, P6.
dbfrom=gene&db=protein&id=G1,G2&id=G3
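The difference between the three ELink calls is purely in how the `id` parameter is written: a comma-separated list forms one UID set, while each repeated `id=` starts a separate set. A small sketch that generates all three forms from a list of sets is shown below; `elink_url` is a hypothetical helper, and G1 to G3 are the placeholder UIDs from the slides, not real identifiers.

```python
from urllib.parse import urlencode

# Base URL as given in this course.
BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(dbfrom, db, id_sets):
    """Build an ELink URL with one id= parameter per UID set."""
    params = [("dbfrom", dbfrom), ("db", db)]
    # UIDs within one set are joined with commas; each set gets its own id=.
    params += [("id", ",".join(str(uid) for uid in uid_set)) for uid_set in id_sets]
    return BASE + "elink.fcgi?" + urlencode(params, safe=",")


one_set = elink_url("gene", "protein", [["G1", "G2", "G3"]])
three_sets = elink_url("gene", "protein", [["G1"], ["G2"], ["G3"]])
mixed = elink_url("gene", "protein", [["G1", "G2"], ["G3"]])

print(one_set.split("?")[1])     # dbfrom=gene&db=protein&id=G1,G2,G3
print(three_sets.split("?")[1])  # dbfrom=gene&db=protein&id=G1&id=G2&id=G3
print(mixed.split("?")[1])       # dbfrom=gene&db=protein&id=G1,G2&id=G3
```

Using a list of tuples rather than a dict is what allows the same parameter name to appear more than once.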
38
Finally, Now for Your UID!
Please use both of these parameters in your URLs in case there are problems:
- tool: a unique name for your software package
- email: your email address, so we can contact you…
Example: &tool=mr.gene&email=funwithgenes@big.genomics.com
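One way to make sure every request carries these two parameters is to merge them into the parameter set in a single place. The helper `with_identity` below is a hypothetical sketch of that idea, using the example values from the slide.

```python
from urllib.parse import urlencode


def with_identity(params, tool, email):
    """Return a copy of params with the tool and email parameters added."""
    merged = dict(params)  # copy so the caller's dict is left unchanged
    merged["tool"] = tool
    merged["email"] = email
    return merged


query = with_identity(
    {"db": "gene", "term": "mammalia[orgn]"},
    tool="mr.gene",
    email="funwithgenes@big.genomics.com",
)
print(urlencode(query))
# db=gene&term=mammalia%5Borgn%5D&tool=mr.gene&email=funwithgenes%40big.genomics.com
```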
39
Accessing Entrez links
- Hard links between databases
- Computational links within a database
- Filtering according to the existence of links
40
Entrez Links for GI 4680720
- Microarray datasets for M17755
- Gene annotation based on M17755
- DNA/RNA sequences similar to M17755
- Human phenotypes involving TPO
- Protein translation of M17755
- Literature abstracts about M17755
- Sequence polymorphisms in M17755
- Source organism of M17755
- STS markers in the TPO gene
- TPO links beyond NCBI
- Full text online articles about M17755
- All polymorphisms in the TPO gene
- Graphical view of TPO gene annotation