1
Directly Upload Data From An ELN Into PubChem
Ben Shoemaker
U.S. National Center for Biotechnology Information (NCBI / NLM / NIH)
2
Maximize the impact of your research … with little effort
PubChem is a global resource for open chemistry
Data sources in PubChem found by internet search
Open Access mandates satisfied with PubChem
Data formats and web interfaces can impede upload
Programmatic access to data uploads facilitates ELN integration
3
PubChem Mission
PubChem is an open archive and a public resource with the primary aim to provide information on the biological activity of chemical substances.
4
Unique chemical structure content of PubChem
PubChem is an archive
Substance: submitted; SID accession
Compound: derived; CID accession
BioAssay: submitted; AID accession
Substance without structure is not in Compound
Substance records keep provenance clear
Unique chemical structure content of PubChem Compound helps to group Substance records
5
Why does a user come to PubChem?
Search result from Google/Yahoo/Baidu
Purchase decision for molecule ‘X’
Publications about molecule/concept
Patents/Biological activities
What is known about the molecule? Physical properties, pharmacology, biological activity, safety information, spectroscopy, toxicity, pathways, etc.
Launching pad for associations to related databases
6
Chemical information is everywhere now
PubChem is helping to improve accessibility to chemical information
7
PubChem growth
Sustained growth over 12 years of:
Contributors
Chemical substance descriptions
Biological testing data
Usage
Top-10 chemistry website (#5?)
~1.5M monthly unique users at peak
Heavy programmatic usage: ~5% of unique IPs per month (~70K)
Serve millions of web hits per day: 2M-12M on average (0.5M interactive)
8
Benefit from PubChem by uploading data
Minimal startup time
Flexible interface
Spreadsheet data accepted via file or web interface
9
Upload chemicals
Draw or load structures
Enter annotations & synonyms
Link back to your site
10
Upload screening results
Spreadsheet load (file or web)
Include all test results, e.g. an article table
11
Upload screening results
Add annotations
Specify targets
Database links
12
Annotate with controlled vocabularies
Include ontology terms, such as those from BAO, GO, and MeSH
13
How can data loading be improved?
This works well, but…
Issues:
Web interfaces and file formats must be learned
Open access data requirements add yet another step to a lengthy and time-sensitive publishing process
FTP uploads can be automated, but require custom scripts that are hard to justify for a single use
14
How can data loading be improved?
Ideas:
Ideally, a single “Make Public!” button would be added to existing end-user software
This ‘publish’ button would require a standard implementation to make it simple to add
Electronic Lab Notebooks (ELNs) would be good candidates for such functionality
Great, so how do we do that?
15
Build on public data: Programmatic access
Outside websites create novel platforms for increased exposure
REST: easy, predictable access for research analysis
16
PubChem Upload REST
Extend programmatic access to “pushing” data
Open suite of operations for loading data
Create standard syntax to simplify interface
Use secure login and key to restrict access
17
PubChem Upload REST
The URL path:
ad/<domain specification>/<operation specification>/[?<operation_options>]
Domain: <domain specification> = substance | assay | account
Operation: <operation specification> = login | upload | set_record | get_record | pending | list_records | commit | export_file | get_sidlist | list_archived | get_viewcode | set_viewcode | delete_viewcode
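The path syntax on this slide can be sketched as a small URL builder. This is an illustration only: `BASE_URL` is a hypothetical placeholder (the transcript elides the real base address), and `build_upload_url` is a helper name invented here, not part of any PubChem client library. Only the domain and operation names are taken from the slide.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real base address is elided in the slides.
BASE_URL = "https://example.invalid/rest/upload"

# Domain and operation names as listed on the slide.
DOMAINS = {"substance", "assay", "account"}
OPERATIONS = {
    "login", "upload", "set_record", "get_record", "pending",
    "list_records", "commit", "export_file", "get_sidlist",
    "list_archived", "get_viewcode", "set_viewcode", "delete_viewcode",
}


def build_upload_url(domain, operation, options=None, base=BASE_URL):
    """Assemble <base>/<domain>/<operation>/[?<operation_options>]."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    url = f"{base.rstrip('/')}/{domain}/{operation}/"
    if options:
        # Options are appended as an ordinary URL query string.
        url += "?" + urlencode(options)
    return url
```

For example, `build_upload_url("substance", "pending")` yields the placeholder base followed by `/substance/pending/`. Whether every domain accepts every operation is not stated on the slide, so the sketch does not check combinations.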
18
PubChem Upload REST Example
Let’s say that you have structure and annotation information for three chemicals including:
Unique identifiers
CAS registry numbers
Common names
SMILES
Tag list found on help page
SDF, CSV and Excel accepted
PUBCHEM_EXT_DATASOURCE_REGID | PUBCHEM_SUBSTANCE_SYNONYM | PUBCHEM_EXT_DATASOURCE_SMILES
my_sub1 | D-Glucose, anhydrous | C(C1C(C(C(C(O1)O)O)O)O)O
my_sub2 | | CCOC1=CC=CC=C1NC(=O)C2=CC3=CC=CC=C3C=C2O
my_sub3 | | C1=CC=CC=C1
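A table like this one could be serialized to CSV with Python's standard `csv` module before upload. The sketch below is a minimal illustration: the column tags come from the slide, while `records_to_csv` is a hypothetical helper name, and the assumption that an empty synonym field is acceptable to the server is not confirmed by the slide.

```python
import csv
import io

# Column tags as shown on the slide.
HEADER = [
    "PUBCHEM_EXT_DATASOURCE_REGID",
    "PUBCHEM_SUBSTANCE_SYNONYM",
    "PUBCHEM_EXT_DATASOURCE_SMILES",
]


def records_to_csv(records):
    """Serialize (regid, synonym, smiles) tuples to CSV text.

    The csv module quotes fields that contain commas, so a name such as
    'D-Glucose, anhydrous' stays in a single column.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    for regid, synonym, smiles in records:
        writer.writerow([regid, synonym, smiles])
    return buf.getvalue()


if __name__ == "__main__":
    rows = [
        ("my_sub1", "D-Glucose, anhydrous", "C(C1C(C(C(C(O1)O)O)O)O)O"),
        ("my_sub3", "", "C1=CC=CC=C1"),
    ]
    print(records_to_csv(rows), end="")
```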
19
PubChem Upload REST Example
Authenticate: provide user credentials
Security key returned for subsequent operations
unix> curl -c cookie1.txt " ?login=MyLogin&password=test-password"
{ "Response": { "ResponseCode": "Pass", "UserId": "999" } }
Base: pubchem../rest/upload; Domain: account; Operation: login; Arguments: login, password
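For an ELN written in Python, the login step could be mirrored with the standard library: a cookie jar stands in for curl's `-c cookie1.txt`, and the JSON reply is checked for `"ResponseCode": "Pass"`. This is a sketch under assumptions: `make_session` and `parse_login_response` are hypothetical helper names, no request is actually sent, and the response layout is taken only from the example on the slide.

```python
import json
import urllib.request
from http.cookiejar import MozillaCookieJar


def make_session(cookie_file="cookie1.txt"):
    """Return (opener, jar); the jar plays the role of curl's cookie file."""
    jar = MozillaCookieJar(cookie_file)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def parse_login_response(body):
    """Return the UserId from a login reply, or raise if login did not pass."""
    response = json.loads(body)["Response"]
    if response.get("ResponseCode") != "Pass":
        raise RuntimeError(f"login failed: {response}")
    return response["UserId"]
```

After a successful `opener.open(...)` call against the login URL, `jar.save(ignore_discard=True)` would write the session cookie to disk so that later operations can reuse it, much like `curl -b cookie1.txt`.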
20
PubChem Upload REST Example
Upload from file
unix> curl -b cookie1.txt -F "
Base: pubchem../rest/upload; Domain: substance; Operation: upload; Arguments: process; Input: SDF
21
PubChem Upload REST Example
Upload from a URL-encoded string
unix> curl --cookie "deposit_ses_key=8F565CD7-46E CB5-B3449C5B70A5" -d "data=PUBCHEM_EXT_DATASOURCE_REGID%2CPUBCHEM_SUBSTANCE_SYNONYM%2CPUBCHEM_SUBSTANCE_SYNONYM%2CPUBCHEM_EXT_DATASOURCE_SMILES%0Amy_sub1%2C %2C%22D-Glucose%2C%20anhydrous%22%2CC%28C1C%28C%28C%28C%28O1%29O%29O%29O%29O%29O%0Amy_sub2%2C%2C%2CCCOC1%3DCC%3DCC%3DC1NC%28%3DO%29C2%3DCC3%3DCC%3DCC%3DC3C%3DC2O%0Amy_sub3%2C%2Cbenzene%2CC1%3DCC%3DCC%3DC1%0A" "
Base: pubchem../rest/upload; Domain: substance; Operation: upload; Arguments: process; Input: CSV
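The percent-encoded `data=` payload in this example does not need to be built by hand. A minimal sketch using `urllib.parse.quote` follows; `encode_csv_payload` is a hypothetical helper name, and the field name `data` is taken from the curl command on the slide.

```python
from urllib.parse import quote


def encode_csv_payload(csv_text, field="data"):
    """Percent-encode CSV text for use as a form field.

    safe='' forces commas, quotes, spaces, parentheses, '=' and
    newlines to be escaped, matching the style of the slide's example
    (%2C, %22, %20, %28, %29, %3D, %0A).
    """
    return f"{field}={quote(csv_text, safe='')}"


if __name__ == "__main__":
    # Last data row of the slide's example.
    print(encode_csv_payload("my_sub3,,benzene,C1=CC=CC=C1\n"))
```

The resulting string could be passed as the request body of a POST, in the same role as curl's `-d` argument.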
22
PubChem Upload REST Example
Check the status of your pending submissions
unix> curl -b cookie1.txt "
{"Response": {"ResponseCode": "Pass","PendingSubmissions": [{"UploadId": "40637","Date": "2016/02/08 16:25","Status": "V1","DataSet": "form-data.sdf","Records": "3"},{"UploadId": "40638","Date": "2016/02/08 17:06","Status": "V1","DataSet": "form-data.sdf","Records": "3"}]}}
Base: pubchem../rest/upload; Domain: substance; Operation: pending
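A client would typically parse this reply to find the upload IDs it can act on. The sketch below assumes the response has exactly the shape shown in the example; `pending_uploads` is a hypothetical helper name, and the meaning of status codes such as `V1` is not given on the slide, so they are passed through unchanged.

```python
import json


def pending_uploads(body):
    """Return a list of pending submissions from a 'pending' reply."""
    response = json.loads(body)["Response"]
    if response.get("ResponseCode") != "Pass":
        raise RuntimeError(f"pending query failed: {response}")
    return [
        {
            "upload_id": item["UploadId"],
            "status": item["Status"],          # e.g. "V1"; meaning not documented here
            "dataset": item["DataSet"],
            "records": int(item["Records"]),   # the example sends counts as strings
        }
        for item in response.get("PendingSubmissions", [])
    ]
```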
23
PubChem Upload REST Example
Commit your submission into the public PubChem database
unix> curl -b cookie1.txt "
{"Response": {"ResponseCode": "Pass","OperationStatus": [{"UploadId": "40637","CommitStatus": "Pass"}]}}
Base: pubchem../rest/upload; Domain: substance; Operation: commit; Arguments: upload_id
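Because a commit makes data public, an ELN would want to confirm the outcome for each upload ID before reporting success to the user. A minimal sketch, assuming the reply has the shape shown in the example (`commit_results` is a hypothetical helper name):

```python
import json


def commit_results(body):
    """Map each UploadId in a 'commit' reply to True if its commit passed."""
    response = json.loads(body)["Response"]
    if response.get("ResponseCode") != "Pass":
        raise RuntimeError(f"commit request failed: {response}")
    return {
        item["UploadId"]: item["CommitStatus"] == "Pass"
        for item in response.get("OperationStatus", [])
    }
```

A caller could then flag any upload whose value is `False` instead of assuming the whole batch was published.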
24
Maximize the impact of your research … with little effort
PubChem is a global resource for open chemistry
12 years of growth
Top 10 chemistry website
Data sources in PubChem found by internet search
Uploading is easy for small and large submissions
Programmatic access to data uploads facilitates ELN integration
Leverage PubChem’s impact on chemistry
25
Acknowledgments: The PubChem Team
Evan Bolton, Jie Chen, Tiejun Cheng, Gang Fu, Renata Geer, Asta Gindulyte, Lianyi Han, Jane He, Steve Bryant (PI), Siqian He, Sunghwan Kim, Paul Thiessen, Jiyao Wang, Yanli Wang, Bo Yu, Leonid Zaslavsky, Jian Zhang
All research supported by the Intramural Research Program of the NIH, National Library of Medicine.
Special thanks: NCBI Help Desk and past PubChem group members.