
Author Generated JATS XML Markup


1 Author Generated JATS XML Markup
Andy Gajetzki, CIO, ispub.com; Olivier Wenker, MD, MBA, Founder and CEO, ispub.com

2 How We Started
Co-founded Worldwide Cars Online in 1990. Sent images of cars and car parts via CompuServe e-mails (modem speed 7 kb/sec); there was no official Internet yet. Closed the company in 1994. Created online content while at Baylor in 1994. Netscape went public in 1995. Officially launched our first online journal in 1995.

3 How We Continued
Started with The Internet Journal of Anesthesiology and added more journals over time. All were open access from the beginning (no registration required for readers). Some of the first articles were submitted in print by mail, and I retyped them in Word. Later, articles were submitted to me by e-mail (attached as Word documents).

4 How We Continued
We initially used a Mosaic browser tool and then a Netscape browser tool to create the HTML for our web pages, then used the first version of FrontPage to create a more complex web site. In 1997 we decided to convert Word documents into SGML data sets, and in 1998 we moved to XML.

5 What We Are Today
We currently publish 82 titles (online medical journals). We use our own home-grown article submission system, and we just implemented a new backend for article submissions and article flow. We decided to have authors generate much of the markup.

6 And Now Let's Get Technical
Author Generated JATS XML Markup by Andy Gajetzki

7 What is our JATS editor?
It represents a move to author-generated markup for our XML. It is based on a customizable, reusable PHP component built on Symfony2, a popular PHP framework. It is easy to use: form-based, WYSIWYG, with a linear workflow.

8 Our old workflow
How we used to do things: three separate workflows for each article (header generation, body markup, and, as the last step, conversion from our proprietary XML to JATS).

9 Word Macros

10

11 Problems with our current method
It is time-consuming and error-prone, and it delays publication. Data entry is performed by programmers. Authors dislike both the delay before publication and the delay in getting errors corrected.

12 Design Rationale
We can't support the whole spec. How did we determine what to support? Through a statistical analysis of the markup used most in our current article corpus. How can we offload as much markup to the author as possible while still producing a clean and intelligible end product?
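
A frequency count like the one behind this analysis can be sketched with Python's standard XML parser. This is a minimal illustration, not the authors' tooling: it assumes the existing articles are available as XML strings, and the function names `element_frequencies` and `coverage` are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def element_frequencies(documents):
    """Count how often each element name occurs across XML article strings."""
    counts = Counter()
    for text in documents:
        root = ET.fromstring(text)
        # iter() yields the root and every descendant element
        counts.update(elem.tag for elem in root.iter())
    return counts


def coverage(counts, supported):
    """Fraction of all element occurrences whose name is in `supported`."""
    total = sum(counts.values())
    covered = sum(n for tag, n in counts.items() if tag in supported)
    return covered / total if total else 0.0


corpus = [
    "<article><body><sec><title>A</title><p>x <italic>y</italic></p></sec></body></article>",
    "<article><body><p>z</p><p><bold>w</bold></p></body></article>",
]
freq = element_frequencies(corpus)
print(freq.most_common(3))
print(coverage(freq, {"article", "body", "sec", "title", "p"}))
```

`coverage` gives one way to check how much of the corpus a candidate subset of elements would handle before committing to support it.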

13 What is supported
NLM Blue 3.0, with two separate support levels: inline-level and block-level. Our JATS coverage is defined separately for each level.

14 Inline Level
Italics, bold, and all other presentation-layer markup are supported.
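
Inline support of this kind amounts to renaming HTML formatting tags to their JATS counterparts (`<bold>`, `<italic>`, `<underline>`, `<sub>`, `<sup>`, `<monospace>`). The sketch below assumes the editor emits well-formed, XHTML-style fragments; the `INLINE_MAP` table and `html_inline_to_jats` function are hypothetical, and any tag outside the table is rejected rather than guessed.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from HTML inline tags to JATS inline formatting elements.
INLINE_MAP = {
    "b": "bold", "strong": "bold",
    "i": "italic", "em": "italic",
    "u": "underline",
    "sub": "sub", "sup": "sup",
    "code": "monospace",
}


def html_inline_to_jats(fragment):
    """Convert a well-formed (XHTML-style) paragraph fragment to a JATS <p>."""
    root = ET.fromstring(fragment)
    if root.tag != "p":
        raise ValueError("expected a <p> fragment")
    for elem in root.iter():
        if elem is root:
            continue
        if elem.tag not in INLINE_MAP:
            raise ValueError(f"unsupported inline tag: <{elem.tag}>")
        elem.tag = INLINE_MAP[elem.tag]
        elem.attrib.clear()  # HTML presentation attributes (style, class) are dropped
    return ET.tostring(root, encoding="unicode")


print(html_inline_to_jats("<p>H<sub>2</sub>O is <strong>not</strong> <em>always</em> safe</p>"))
```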

15 Block level
Only single-level sections are supported, because the WYSIWYG editor is based on the HTML DOM. Other tools that take a more XML-oriented approach are expensive and more difficult for authors to use. The general structure is <sec> <title> <xyz> </sec>, where <xyz> is one of: boxed-text, fig, graphic, preformat, table-wrap, p, list.
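
This restricted block structure is easy to check mechanically. The following is a hypothetical validator (`check_section` is an illustrative name, not part of the authors' editor) that requires a leading `<title>`, rejects nested `<sec>`, and allows only the block elements listed on the slide.

```python
import xml.etree.ElementTree as ET

# Block-level children permitted inside a single-level <sec>, per the slide.
ALLOWED_BLOCKS = {"boxed-text", "fig", "graphic", "preformat", "table-wrap", "p", "list"}


def check_section(sec):
    """Return a list of problems found in a <sec> element (empty if OK)."""
    problems = []
    children = list(sec)
    if not children or children[0].tag != "title":
        problems.append("section must start with <title>")
        body = children
    else:
        body = children[1:]
    for child in body:
        if child.tag == "sec":
            problems.append("nested <sec> is not supported (single level only)")
        elif child.tag not in ALLOWED_BLOCKS:
            problems.append(f"unsupported block element <{child.tag}>")
    return problems


ok = ET.fromstring("<sec><title>Methods</title><p>Text</p>"
                   "<list><list-item><p>a</p></list-item></list></sec>")
bad = ET.fromstring("<sec><p>No title</p><sec><title>Sub</title></sec></sec>")
print(check_section(ok))
print(check_section(bad))
```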

16 Titles
Presentational elements are supported, for the most part with a non-mixed content type.

17 Contributors
Flexible: single or collaborative authors. Most JATS <contrib-group> markup is supported. Inline-level formatting is supported within block elements.
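
For the form-based workflow, contributor data entered by the author can be turned into `<contrib-group>` markup directly. This sketch assumes each author record is a simple dict (the field names and `build_contrib_group` are hypothetical) and covers only personal names and group authorship via `<collab>`; affiliations and the other `<contrib>` children are omitted.

```python
import xml.etree.ElementTree as ET


def build_contrib_group(authors):
    """Build a JATS <contrib-group> from form-style author records.

    Each record is a dict with either 'surname' and 'given' (an individual
    author) or 'collab' (a group or consortium name).
    """
    group = ET.Element("contrib-group")
    for author in authors:
        contrib = ET.SubElement(group, "contrib", {"contrib-type": "author"})
        if "collab" in author:
            ET.SubElement(contrib, "collab").text = author["collab"]
        else:
            name = ET.SubElement(contrib, "name")
            ET.SubElement(name, "surname").text = author["surname"]
            ET.SubElement(name, "given-names").text = author["given"]
    return group


out = ET.tostring(build_contrib_group([
    {"surname": "Doe", "given": "Jane"},
    {"collab": "Example Study Group"},
]), encoding="unicode")
print(out)
```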

18 Keywords
Keywords should be based on MeSH entries, so that validation constraints can be applied against the MeSH vocabulary.
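
A MeSH constraint can be as simple as a case-insensitive lookup against the vocabulary that normalizes accepted keywords to their canonical MeSH spelling and returns the rest to the author. The three-term `MESH_TERMS` set below is a stand-in for the full descriptor list the NLM publishes, and `build_kwd_group` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the MeSH vocabulary; a real system would load the full
# descriptor list (assumption, not shown here).
MESH_TERMS = {"Anesthesia", "Analgesia", "Pain, Postoperative"}


def build_kwd_group(keywords, vocabulary=MESH_TERMS):
    """Return (kwd-group element, rejected keywords) for author-entered keywords."""
    lookup = {term.casefold(): term for term in vocabulary}
    group = ET.Element("kwd-group", {"kwd-group-type": "MeSH"})
    rejected = []
    for kw in keywords:
        term = lookup.get(kw.strip().casefold())
        if term is None:
            rejected.append(kw)  # sent back to the author for correction
        else:
            ET.SubElement(group, "kwd").text = term  # canonical MeSH spelling
    return group, rejected


group, rejected = build_kwd_group(["anesthesia", "Pain, postoperative", "sleepiness"])
print(ET.tostring(group, encoding="unicode"))
print(rejected)
```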

19 Other article-meta
Article IDs, author notes, supplemental content, funding/grants, article history, permissions.

20 Abstract / Body / Appendices
Currently a moving target. MathML is not currently supported. Our current subset of JATS covers 99% of our cases, but we will keep trying to expand coverage.

21 WYSIWYG HTML Editor
We use a specific subset of HTML that can be mapped unambiguously to JATS via data transformations (XSLT, regular expressions). If no mapping is possible, another method must be devised.
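
Keeping the editor to a mappable subset implies a gate that reports any tag outside it before transformation starts. Below is a minimal sketch using the standard library's `html.parser`; the `ALLOWED_TAGS` whitelist and the `unmapped_tags` helper are assumptions for illustration, not the authors' actual subset.

```python
from html.parser import HTMLParser

# Hypothetical editor whitelist: HTML tags assumed to have an unambiguous JATS mapping.
ALLOWED_TAGS = {"p", "b", "strong", "i", "em", "u", "sub", "sup",
                "ul", "ol", "li", "table", "tr", "td", "th", "h2"}


class SubsetChecker(HTMLParser):
    """Collects every tag in an HTML fragment that falls outside ALLOWED_TAGS."""

    def __init__(self):
        super().__init__()
        self.unmapped = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase
        if tag not in ALLOWED_TAGS and tag not in self.unmapped:
            self.unmapped.append(tag)


def unmapped_tags(html):
    """Return unmapped tag names in order of first appearance."""
    checker = SubsetChecker()
    checker.feed(html)
    checker.close()
    return checker.unmapped


print(unmapped_tags("<p>Dose: <b>5 mg</b><font color=red>!</font></p><marquee>x</marquee>"))
```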

22 Images / Table Capture / Media
Images and figures are handled via out-of-band file upload on a separate page. Authors are asked to upload the highest-quality format they can. Tables can either be captured as an image or inserted via a Word-style table creation tool. Other media types have not been implemented yet.
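
Because images arrive through a separate upload, the article XML only needs to reference the stored file. A sketch of the resulting JATS `<fig>`, assuming the uploaded file name is known; `build_fig` and the example file name are hypothetical. `register_namespace` makes ElementTree serialize the link as `xlink:href`, the attribute JATS uses on `<graphic>`.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)  # serialize as xlink:href, not ns0:href


def build_fig(fig_id, label, caption, filename):
    """Build a JATS <fig> that points at an image uploaded out of band."""
    fig = ET.Element("fig", {"id": fig_id})
    ET.SubElement(fig, "label").text = label
    cap = ET.SubElement(fig, "caption")
    ET.SubElement(cap, "p").text = caption
    ET.SubElement(fig, "graphic", {f"{{{XLINK}}}href": filename})
    return fig


out = ET.tostring(build_fig("F1", "Figure 1", "Study flow chart.", "fig1.tif"),
                  encoding="unicode")
print(out)
```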

23 Endnote Handling: Document references
A JavaScript annotation tool: the endnote number or reference is highlighted in the text and resolved to a back-matter citation entry.
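
Once an endnote marker is resolved, it becomes a JATS cross-reference such as `<xref ref-type="bibr" rid="B3">` pointing at the matching `<ref>` in the back matter. The tool described here works interactively in JavaScript; the Python sketch below does the equivalent in batch, assuming numeric bracketed markers like `[3]` and reference IDs of the form `B3` (both assumptions, as are the function names). Unresolved markers stay in the text and are reported so the author can fix them.

```python
import re
import xml.etree.ElementTree as ET

MARKER = re.compile(r"\[(\d+)\]")  # assumed marker style: [1], [2], ...


def _append_text(parent, s):
    """Append plain text after the last child, or to parent.text if none."""
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + s
    else:
        parent.text = (parent.text or "") + s


def resolve_endnotes(text, ref_ids):
    """Convert [n] markers to <xref>; return the <p> and unresolved numbers."""
    p = ET.Element("p")
    unresolved = []
    pos = 0
    for m in MARKER.finditer(text):
        number = m.group(1)
        rid = f"B{number}"  # assumed back-matter ID scheme: <ref id="B1">
        if rid in ref_ids:
            _append_text(p, text[pos:m.start()])
            xref = ET.SubElement(p, "xref", {"ref-type": "bibr", "rid": rid})
            xref.text = number
        else:
            unresolved.append(number)
            # keep the raw marker so the author can see what still needs fixing
            _append_text(p, text[pos:m.end()])
        pos = m.end()
    _append_text(p, text[pos:])
    return p, unresolved


para, unresolved = resolve_endnotes(
    "Propofol [1] and ketamine [4] were used [2].", {"B1", "B2"})
out = ET.tostring(para, encoding="unicode")
print(out)
print(unresolved)
```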

24 Supported Back Matter
Acknowledgments, appendices, biography, glossaries, citations, and notes (the content-type attribute of the note element is supported).

25 Citation Handling: Back matter
One citation per line. A regular-expression search looks for metadata service identifiers (PMC and Crossref); if a match is found, the correct metadata is pulled from the service. A simple JavaScript annotation tool tokenizes the citation string. Before submission, the author must resolve all endnote problems.
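
The identifier search can be sketched with a few regular expressions applied line by line. The patterns are assumptions: the DOI pattern loosely follows the shape of modern Crossref DOIs (`10.` plus a 4 to 9 digit prefix), and PubMed and PubMed Central IDs are assumed to carry their usual `PMID` and `PMC` prefixes. Fetching metadata from the services themselves is not shown, and `find_identifiers` is a hypothetical name.

```python
import re

# Assumed identifier patterns; the DOI must end on a letter or digit so that
# trailing punctuation such as a final period is not captured.
PATTERNS = {
    "doi": re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Za-z0-9]+[A-Za-z0-9])"),
    "pmid": re.compile(r"\bPMID:?\s*(\d{1,8})\b"),
    "pmcid": re.compile(r"\b(PMC\d+)\b"),
}


def find_identifiers(citation_block):
    """Scan one-citation-per-line text; return one {type: id} dict per citation."""
    results = []
    for line in citation_block.splitlines():
        if not line.strip():
            continue  # skip blank lines between citations
        found = {}
        for kind, pattern in PATTERNS.items():
            m = pattern.search(line)
            if m:
                found[kind] = m.group(1)
        results.append(found)
    return results


citations = """Smith J. Airway management. J Anesth. 2010;1:1-5. doi:10.1000/xyz123.
Doe A. Regional blocks. 2009. PMID: 19000000; PMC2600000

Roe B. A citation with no identifiers."""
matches = find_identifiers(citations)
print(matches)
```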

26 Citation Tokenization Example

27 From browser to JATS XML
The block-level components operate on the HTML DOM. CSS classes are added to elements to distinguish content types. Through various transformations, we interpret the resulting DOM and produce the JATS XML: HTML → mapping → JATS XML.
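
The class-driven step can be illustrated by mapping each top-level block of the editor's DOM to a JATS element based on its tag and CSS class. The class names, the `BLOCK_MAP` table, and the function names are hypothetical; only top-level blocks are renamed here, and inline content would go through a separate pass. The same mapping could equally be written in XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical (tag, CSS class) -> JATS element mapping used by the editor.
BLOCK_MAP = {
    ("div", "boxed-text"): "boxed-text",
    ("pre", "preformat"): "preformat",
    ("p", None): "p",
    ("h2", "section-title"): "title",
}


def map_block(elem):
    """Rename one HTML block element to JATS using its tag and CSS class."""
    key = (elem.tag, elem.get("class"))
    if key not in BLOCK_MAP:
        raise ValueError(f"no JATS mapping for {key}")
    elem.tag = BLOCK_MAP[key]
    elem.attrib.pop("class", None)  # the class has done its job; drop it
    return elem


def html_to_sec(fragment):
    """Wrap the top-level blocks of an editor <div> into a single-level <sec>."""
    root = ET.fromstring(fragment)
    sec = ET.Element("sec")
    for child in list(root):
        sec.append(map_block(child))
    return sec


html = ('<div><h2 class="section-title">Results</h2><p>Main text.</p>'
        '<div class="boxed-text"><p>Key point.</p></div></div>')
out = ET.tostring(html_to_sec(html), encoding="unicode")
print(out)
```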

28 Validation: When things go wrong
1) XSD validation: intervention required by staff
2) Style/presentation problems: intervention required by author/staff
3) Copy editing
4) Peer review
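
Python's standard library cannot validate against an XSD (a third-party library such as lxml provides an `XMLSchema` class for that), so the sketch below stands in with a well-formedness check and routes each problem to the owner named on the slide. The routing table, `triage`, and the `empty_title` style check are hypothetical.

```python
import xml.etree.ElementTree as ET

# Who fixes what, following the slide: problems that stop schema validation go
# to staff; style/presentation problems go to the author, with staff support.
ROUTING = {"schema": "staff", "style": "author/staff"}


def empty_title(root):
    """Example style check: every <sec> needs a non-empty <title>."""
    for sec in root.iter("sec"):
        title = sec.find("title")
        if title is None or not "".join(title.itertext()).strip():
            return "section without a title"
    return None


def triage(xml_text, style_checks):
    """Return a list of (owner, message) problems for one article."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Not even well-formed, so schema validation cannot run at all.
        return [(ROUTING["schema"], f"not well-formed: {err}")]
    problems = []
    for check in style_checks:
        message = check(root)
        if message:
            problems.append((ROUTING["style"], message))
    return problems


print(triage("<article><sec>", [empty_title]))
print(triage("<article><body><sec><title></title></sec></body></article>", [empty_title]))
```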

29 Amazon Mechanical Turk
For predictable failures, Amazon Mechanical Turk, a platform for “human intelligence tasks”, can be used. For a small price, work units are created and human workers are paid to perform each task, with 24x7 availability.

30 Summary

31 Contact For Questions
Technical questions: Andy Gajetzki
General questions: Olivier Wenker, MD, MBA

