Validation of chemical data on Wikipedia Martin A. Walker Dept. of Chemistry, SUNY Potsdam Member of the Wikipedia Chemistry Project.

1 Validation of chemical data on Wikipedia Martin A. Walker Dept. of Chemistry, SUNY Potsdam Member of the Wikipedia Chemistry Project

2 Overview Introduction Raising general quality in Wikipedia Validating chemical data in Wikipedia Recent developments in Wikipedia Chemistry The future? Questions?

3 INTRODUCTION What is Wikipedia – and what is it not?

4 Wikipedia is… An encyclopedia A useful resource for chemistry Written by volunteers Editable by anyone Free to be copied, re-used Free as in “no cost” Wikipedia is not… A database A place to publish original research An authoritative resource for chemistry Written mainly by kids, or by paid professionals Free to re-use without attribution Run by a corporation

5 Types of chemistry article WIKIPROJECT CHEMISTRY Chemical concepts Chemical reactions & processes Chemists WIKIPROJECT ELEMENTS Chemical elements WIKIPROJECT CHEMICALS Chemical substances WIKIPROJECT PHARMACOLOGY Pharmaceuticals WIKIPROJECT CELL & MOLECULAR BIOLOGY Molecular biology

6 WikiProject Chemistry

7 General chemistry content Reactions & processes, concepts, chemists’ biographies, etc.

8 WikiProject Chemicals ~60 members (~20 active) Collaborates on writing quality articles and standards for: –developing data boxes for articles –chemical naming, structure drawing –article assessment Data validation Collaboration with CAS Wim Van Dorst, a Dutch member of WP:Chem since March 2005.

9 Most articles have a Chembox Chembox is designed to be machine readable and “database friendly”
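To illustrate what "machine readable" can mean in practice, here is a minimal Python sketch that reads a simplified, flat infobox into a dictionary. The sample wikitext and the helper name `parse_flat_template` are assumptions made for illustration; real Chembox templates nest sub-templates and would need a proper wikitext parser.

```python
import re

# Hypothetical, simplified infobox text. Real Chemboxes are nested and larger.
SAMPLE = """{{Chembox
| Name = Water
| CASNo = 7732-18-5
| MeltingPtC = 0
| BoilingPtC = 100
}}"""


def parse_flat_template(wikitext):
    """Collect '| key = value' lines of a flat template into a dict of strings."""
    fields = {}
    for line in wikitext.splitlines():
        match = re.match(r"\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


fields = parse_flat_template(SAMPLE)
print(fields["CASNo"])
```

Because every value sits behind a named field, a script can pick out an identifier or a property without interpreting the surrounding article text.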

10 WikiProject Pharmacology

11 Most articles have a Drugbox

12 Traffic can be very high….

13 Even for specialized topics

14 RAISING GENERAL QUALITY IN WIKIPEDIA

15 WMF: Long term strategy Expand the “virtuous circle” Diagram by User:Randomran – Creative Commons license

16 Article assessment – by editors

17 Assessment guides article improvement priorities

18 Article ratings – by users

19 Pending changes (flagged revisions) “Articles under PC protection are open for editing, but changes will be visible to readers who are not logged in only after being checked for obvious vandalism and clear errors.”

20 WikiTrust Downloadable as an extension to Firefox, this adds a tab above the article:

21

22 VALIDATION OF WIKIPEDIA CHEMICAL DATA

23 How I use the key terms Validation = “How can I be sure the data are correct?” Curation = fixing errors

24 Content validation In 2008 a data validation drive was initiated for basic chemical identifiers Led to a collaboration with CAS, to ensure Wikipedia CAS registry nos. are correct Now around 3500 substances have been validated against CAS Common Chemistry, as having correct name, structure & CAS RN Other fields now being validated Validated content indicated with a check mark
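One automatic check that can precede human validation of a CAS RN is its built-in check digit: the last digit equals the weighted sum of the preceding digits (weights 1, 2, 3, ... counted from the right) modulo 10. The Python sketch below shows that arithmetic; the function name is an assumption for illustration. It only catches malformed or mistyped numbers. It cannot confirm that a number belongs to the named substance, which is what the comparison against CAS Common Chemistry establishes.

```python
def cas_check_digit_ok(cas_rn):
    """Return True if a CAS RN string such as '7732-18-5' has a valid check digit."""
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    first, second, check = parts
    # Format: 2 to 7 digits, then 2 digits, then a single check digit.
    if not (2 <= len(first) <= 7) or len(second) != 2 or len(check) != 1:
        return False
    body = first + second
    # Weight the body digits 1, 2, 3, ... starting from the rightmost digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(check)


print(cas_check_digit_ok("7732-18-5"))  # water
print(cas_check_digit_ok("7732-18-4"))  # same number with a wrong check digit
```

For water, the sum is 8·1 + 1·2 + 2·3 + 3·4 + 7·5 + 7·6 = 105, and 105 mod 10 = 5, which matches the final digit.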

25 CommonChemistry Launched in April 2009 Came about as a result of a collaboration between CAS & Wikipedia Offered as a free service for CAS RNs for members of the public.

26 Organized by WP:Chemicals Moderate participation from members of WP:Pharmacology

27 The approach to validation Every old version (called a RevID) of an article is preserved (for all) for posterity, and can potentially serve as a permanent record of a validated version.
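As a small illustration of how a RevID can act as a permanent record, the Python sketch below builds a link to one specific revision using MediaWiki's `oldid` URL parameter. The helper name and the revision number used here are hypothetical.

```python
from urllib.parse import urlencode


def permanent_link(title, rev_id, host="en.wikipedia.org"):
    """Build a URL that always shows the given revision of an article."""
    query = urlencode({"title": title.replace(" ", "_"), "oldid": rev_id})
    return f"https://{host}/w/index.php?{query}"


# 123456 is a made-up revision number, used only to show the URL shape.
print(permanent_link("Acetic acid", 123456))
```

Storing such a link next to a validated field lets anyone return to exactly the text that was checked, regardless of later edits.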

28 Protecting validated fields PROBLEM: This is “the encyclopedia anyone can edit” – so anyone can change the BP of water to 200 °C. SOLUTION: A bot patrols the pages, and watches for edits to key fields. Any dubious edits are flagged with a red X (next to the data), and logged. System developed by Dirk Beetstra (Eindhoven University of Technology). It is the only such tool on Wikipedia.

29 Validation protected by bot If anyone tries to vandalize a validated field, this will be flagged by a bot soon afterwards. –This example received a red X 11 minutes after it was vandalized.
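The core idea of watching validated fields can be sketched in a few lines of Python: compare the current value of each watched field with the value stored at validation time, and flag any difference. This is not the actual bot's code; the field names, the `flag_changes` helper and the "ok"/"flagged" labels are assumptions chosen for illustration.

```python
# Hypothetical set of fields to watch; the real bot's list is not given here.
WATCHED_FIELDS = ("CASNo", "BoilingPtC", "MeltingPtC")


def flag_changes(validated, current, watched=WATCHED_FIELDS):
    """Mark each watched, validated field as 'ok' or 'flagged'."""
    flags = {}
    for name in watched:
        if name not in validated:
            continue  # never validated, so there is nothing to compare against
        same = current.get(name) == validated[name]
        flags[name] = "ok" if same else "flagged"
    return flags


validated = {"CASNo": "7732-18-5", "BoilingPtC": "100"}
edited = {"CASNo": "7732-18-5", "BoilingPtC": "200", "MeltingPtC": "0"}
print(flag_changes(validated, edited))
```

In this sketch a flag marks a difference from the validated value, not necessarily an error, so a flagged field still needs a human to decide whether the edit was vandalism or a correction.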

30 Validated revisionIDs

31 Checking structures In 2008-2010, around 3000 chemical structures were informally checked against CAS Common Chemistry PROBLEM: Structures are loaded from an external file on Wikimedia Commons, which can be “invisibly” changed

32 Since fall 2010 Now the bot has been modified to watch changes to the RevID of the Wikimedia Commons structure image A few hundred images now validated

33 Drugboxes Drugboxes are patrolled by the bot, but at present WP:PHARM is not active in formal validation. Most work is done by Dirk Beetstra, using official lists from data sources (e.g., ChEBI).

34 THE FUTURE?

35 Validation of melting points Physical properties are much harder – require human validation Collaboration beginning with JC Bradley (Drexel) & A Lang (Oral Roberts) on MPs.

36 Supplementary data pages

37 Supplementary data pages can host MP validation sources These pages have room to list all sources with linked refs – providing a “paper trail” to original sources

38 Other future developments New formats for content – books, for cellphones (Kiwix, Wikipock, Okawix) Offline versions that use quality checks and vandalism checks – for use in schools, developing countries, etc. More validated data fields, with “paper trails” and real-time checks Mashups with other sites Integration with lab instrumentation, lab notebooks, etc.?

39 Acknowledgements Antony Williams (RSC ChemSpider) Dirk Beetstra (Tech Univ Eindhoven) User:Physchim62 and many other Wikipedians JC Bradley and Andrew Lang

40 ANY QUESTIONS? Thank you for your attention

