1
Validation of chemical data on Wikipedia
Martin A. Walker
Dept. of Chemistry, SUNY Potsdam
Member of the Wikipedia Chemistry Project
2
Overview
–Introduction
–Raising general quality in Wikipedia
–Validating chemical data in Wikipedia
–Recent developments in Wikipedia Chemistry
–The future?
–Questions?
3
INTRODUCTION What is Wikipedia – and what is it not?
4
Wikipedia is…
–An encyclopedia
–A useful resource for chemistry
–Written by volunteers
–Editable by anyone
–Free to be copied, re-used
–Free as in “no cost”
Wikipedia is not…
–A database
–A place to publish original research
–An authoritative resource for chemistry
–Written mainly by kids, or by paid professionals
–Free to re-use without attribution
–Run by a corporation
5
Types of chemistry article
WIKIPROJECT CHEMISTRY: Chemical concepts; Chemical reactions & processes; Chemists
WIKIPROJECT ELEMENTS: Chemical elements
WIKIPROJECT CHEMICALS: Chemical substances
WIKIPROJECT PHARMACOLOGY: Pharmaceuticals
WIKIPROJECT CELL & MOLECULAR BIOLOGY: Molecular biology
6
WikiProject Chemistry
7
General chemistry content Reactions & processes, concepts, chemists’ biographies, etc.
8
WikiProject Chemicals
~60 members (~20 active)
Collaborates on writing quality articles and standards for:
–developing data boxes for articles
–chemical naming, structure drawing
–article assessment
Data validation
Collaboration with CAS
Wim Van Dorst, a Dutch member of WP:Chem since March 2005.
9
Most articles have a Chembox Chembox is designed to be machine readable and “database friendly”
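To illustrate what “machine readable” can mean here, the following is a minimal sketch of pulling key/value fields out of template-style wikitext. The sample wikitext, its parameter names and the helper parse_infobox_fields are assumptions made for this example; real Chembox markup is nested and more varied than this handles.

```python
import re

# Illustrative wikitext only: the parameter names and values below are
# assumptions for this sketch, not copied from a live article.
SAMPLE_WIKITEXT = """{{Chembox
| Name = Water
| CASNo = 7732-18-5
| BoilingPtC = 100
}}"""


def parse_infobox_fields(wikitext):
    """Return {parameter: value} for simple one-line '| key = value' entries."""
    fields = {}
    for line in wikitext.splitlines():
        match = re.match(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


fields = parse_infobox_fields(SAMPLE_WIKITEXT)
print(fields["CASNo"])
```

Because each value sits behind a named parameter, a script can read or compare individual fields without interpreting the article prose.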
10
WikiProject Pharmacology
11
Most articles have a Drugbox
12
Traffic can be very high…
13
Even for specialized topics
14
RAISING GENERAL QUALITY IN WIKIPEDIA
15
WMF: Long term strategy
Expand the “virtuous circle”
Diagram by User:Randomran – Creative Commons license
16
Article assessment – by editors
17
Assessment guides article improvement priorities
18
Article ratings – by users
19
Pending changes (flagged revisions) “Articles under PC protection are open for editing, but changes will be visible to readers who are not logged in only after being checked for obvious vandalism and clear errors.”
20
WikiTrust Downloadable as an extension to Firefox, this adds a tab above the article:
22
VALIDATION OF WIKIPEDIA CHEMICAL DATA
23
How I use the key terms
Validation = “How can I be sure the data are correct?”
Curation = fixing errors
24
Content validation
–In 2008 a data validation drive was initiated for basic chemical identifiers
–Led to a collaboration with CAS, to ensure Wikipedia CAS registry nos. are correct
–Now around 3500 substances have been validated against CAS Common Chemistry, as having correct name, structure & CAS RN
–Other fields now being validated
–Validated content indicated with a check mark
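A CAS RN carries a built-in check digit, so a script can catch many typing errors before any comparison against CAS Common Chemistry. The sketch below shows that format check only. It is not the validation procedure used by the project, and the function name cas_check_digit_ok is made up for this example.

```python
def cas_check_digit_ok(cas_rn):
    """Check the format and check digit of a CAS RN such as '7732-18-5'.

    The last digit must equal the weighted sum of the other digits modulo 10,
    with weights 1, 2, 3, ... counted from the right.
    This does NOT confirm that the number belongs to a given substance.
    """
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    if not (2 <= len(parts[0]) <= 7 and len(parts[1]) == 2 and len(parts[2]) == 1):
        return False
    digits = parts[0] + parts[1]
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(parts[2])


print(cas_check_digit_ok("7732-18-5"))  # water
print(cas_check_digit_ok("7732-18-4"))  # same number with a wrong check digit
```

A number that passes this check can still be attached to the wrong compound, which is why comparison against an external source is still needed.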
25
CommonChemistry Launched in April 2009 Came about as a result of a collaboration between CAS & Wikipedia Offered as a free service for CAS RNs for members of the public.
26
Organized by WP:Chemicals Moderate participation from members of WP:Pharmacology
27
The approach to validation Every old version (called a RevID) of an article is preserved (for all) for posterity, and can potentially serve as a permanent record of a validated version.
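Since every revision has its own ID, a validated version can be cited by a link that always shows that exact revision. A minimal sketch, assuming the standard index.php?title=…&oldid=… link form; the revision number used below is a placeholder, not a real validated revision.

```python
from urllib.parse import urlencode


def permanent_link(title, rev_id, host="en.wikipedia.org"):
    """Build a link to one fixed revision of an article."""
    query = urlencode({"title": title, "oldid": rev_id})
    return f"https://{host}/w/index.php?{query}"


# 123456789 is a placeholder revision ID used only for illustration.
print(permanent_link("Water", 123456789))
```

Storing the revision ID alongside the validated values gives a fixed reference point that later edits cannot alter.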
28
Protecting validated fields
PROBLEM: This is “the encyclopedia anyone can edit” – so anyone can change the BP of water to 200 °C.
SOLUTION: A bot patrols the pages, and watches for edits to key fields. Any dubious edits are flagged with a red X (next to the data), and logged.
System developed by Dirk Beetstra (Eindhoven University of Technology). It is the only such tool on Wikipedia.
29
Validation protected by bot If anyone tries to vandalize a validated field, this will be flagged by a bot soon afterwards. –This example received a red X 11 minutes after it was vandalized.
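The core comparison behind such a patrol can be sketched in a few lines: keep the validated values, compare them with the current values of the watched fields, and mark any mismatch. This is an illustration only, not the actual bot; the field names, the watched-field list and the function flag_changed_fields are hypothetical.

```python
# Hypothetical list of watched fields; the real bot's configuration is not shown in the slides.
WATCHED_FIELDS = ("CASNo", "BoilingPtC")


def flag_changed_fields(validated, current, watched=WATCHED_FIELDS):
    """Return {field: 'check' or 'X'} by comparing current values to validated ones."""
    flags = {}
    for name in watched:
        if current.get(name) == validated.get(name):
            flags[name] = "check"  # still matches the validated version
        else:
            flags[name] = "X"      # changed since validation: needs human review
    return flags


validated = {"CASNo": "7732-18-5", "BoilingPtC": "100"}
vandalized = {"CASNo": "7732-18-5", "BoilingPtC": "200"}
print(flag_changed_fields(validated, vandalized))
```

A flag here only means the value differs from the validated one; a person still has to decide whether the edit is vandalism or a legitimate correction.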
30
Validated revisionIDs
31
Checking structures
In 2008–2010, around 3000 chemical structures were informally checked against CAS Common Chemistry
PROBLEM: Structures are loaded from an external file on Wikimedia Commons, which can be “invisibly” changed
32
Since fall 2010 Now the bot has been modified to watch changes to the RevID of the Wikimedia Commons structure image A few hundred images now validated
33
Drugboxes
Drugboxes are patrolled by the bot, but at present WP:PHARM is not active in formal validation.
Most work is done by Dirk Beetstra, using official lists from data sources (e.g., ChEBI).
34
THE FUTURE?
35
Validation of melting points Physical properties are much harder – require human validation Collaboration beginning with JC Bradley (Drexel) & A Lang (Oral Roberts) on MPs.
36
Supplementary data pages
37
Supplementary data pages can host MP validation sources These pages have room to list all sources with linked refs – providing a “paper trail” to original sources
38
Other future developments
–New formats for content – books, for cellphones (Kiwix, Wikipock, Okawix)
–Offline versions that use quality checks and vandalism checks – for use in schools, developing countries, etc.
–More validated data fields, with “paper trails” and real-time checks
–Mashups with other sites
–Integration with lab instrumentation, lab notebooks, etc?
39
Acknowledgements
–Antony Williams (RSC ChemSpider)
–Dirk Beetstra (Tech Univ Eindhoven)
–User:Physchim62 and many other Wikipedians
–JC Bradley and Andrew Lang
40
ANY QUESTIONS? Thank you for your attention