Copyright © 2010 Inera Incorporated. All Rights Reserved. NLM DTD Flexibility: How and Why Applications of the NLM DTD Vary. Presented by Bruce D. Rosenblum.


1 NLM DTD Flexibility: How and Why Applications of the NLM DTD Vary
Presented by Bruce D. Rosenblum, CEO, Inera Incorporated
Journal Article Tag Suite Conference, 1 November 2010

2 Remember When…

3 Scholarly DTDs, Circa 2001
ISO 12083, Elsevier 1.1.0, Elsevier 2.1.1, Elsevier 3.0.0, Elsevier 4.1, Blackwell 2.2, Blackwell 3.0, Blackwell 4.0, Keton, Cadmus, Capital City, Charlesworth, Alden, Highwire 4.2.8, PMC 1.0, AIP, UCP, Wiley, IEEE, Nature, BioOne, U Chicago Press, Cambridge University Press, American Geophysical, American Medical, New England Journal, American Chemical, National Research Canada, Academic Press, Oxford University Press, Academic Press, Springer, Kluwer Academic

4 Scholarly DTDs 2010
• NLM DTD
• Elsevier DTD
• Springer DTD
• Wiley-Blackwell DTD
• And a few others…
• No longer a grand mess, but…
  • NLM DTD Suite applications vary
  • Specific tagging practices meet publisher-specific requirements

5 Data and Methodology
• Data from 25 eXtyles and refXpress implementations since 2003
• Not a scientific survey
• However, useful for showing NLM DTD usage variations
• Supplier requirements differ from those of publishers
  • Suppliers serve multiple publishers who deliver to different platforms

6 NLM DTD Adoption By Year
Organization | DTD | Year | Version | Prior XML
Publisher 1 | Archive * | 2003 | 3.0 † | No
Publisher 2 | Archive | 2005 | 2.0 | No
Publisher 3 | Archive | 2005 | 2.3 | No
Publisher 4 | Archive | 2006 | 2.2 | No
Publisher 5 | Archive | 2006 | 2.3 | Yes
Publisher 6 | Publish | 2006 | 2.3 | Yes
Publisher 7 | Publish & book | 2006 | 2.3 | No
Publisher 8 | Book * | 2007 | 2.2 | No
Publisher 9 | Publish | 2007 | 2.3 | No
Publisher 10 | Archive | 2007 | 2.3 | No
Publisher 11 | Publish | 2007 | 2.3 | No
Publisher 12 | Publish | 2007 | 2.3 | No
Publisher 13 | Publish | 2008 | 2.3 | Yes
Publisher 14 | Publish & book | 2008 | 2.3 | No
Publisher 15 | Publish | 2008 | 2.3 | No
Publisher 16 | Book | 2009 | 2.3 | No
Publisher 17 | Publish | 2009 | 2.3 | No
Publisher 18 | Publish | 2010 | 2.3 | No
Publisher 19 | Publish * | 2010 | 3.0 | No
Publisher 20 | Archive | 2010 | 3.0 | No
Publisher 21 | Publish * | 2010 | 3.0 | Yes
JATS-con | Authoring | 2010 | 3.0 | Yes
Supplier 1 | Publish | 2008 | 2.3 | Yes
Supplier 2 | Publish | 2007 | 2.3 | No
Supplier 3 | Book | 2010 | 3.0 | Yes
* Customized version of DTD beyond OASIS-CALS addition
† Upgraded from 1.0 to 3.0 in 2010

7 Year of DTD Adoption
• Few implementations prior to 2006
  • Mostly related to PMC deposit
• Adoption rate grows in 2006 and later
  • Maturity of version 2.0 in August 2004
  • Greater public awareness by 2006
  • Freely available and modifiable
  • Flexible
  • Not just for life science content
  • More off-the-shelf tool support from NCBI and others
• 3.0 upgrade not automatic; not fully backwards compatible

8 Prior Markup Experience
• Most had not used full-text XML or SGML
• Driven to NLM DTD for:
  • More modern XML-based workflow
  • Desire for full-text to drive HTML and archive needs
  • PMC deposit
• Those with SGML experience
  • SGML to XML conversion choice
    • Convert existing DTD to XML
    • Adopt NLM DTD

9 DTD Selection
• Most adopters use Journal Publishing (blue) DTD
• Early adopters chose Archive and Interchange (green) DTD
  • Blue was too restrictive prior to 2.0
  • ISSN optional in green; hosts non-serial publications without modification
• Book DTD use growing in recent years
  • Not as mature as journals, but useful

10 Implementation Characteristics
Organization | Char Encoding | Math | Tables | List Labels / Ref PCDATA
Publisher 1 | ISO | MathML | HTML | DROP
Publisher 2 | ISO | Graphic | HTML | DROP
Publisher 3 | Unicode | Graphic | HTML | DROP / KEEP
Publisher 4 | ISO | MathML | CALS | DROP / KEEP
Publisher 5 | ISO | MathML | HTML | DROP / KEEP
Publisher 6 | ISO | TeX | CALS | DROP / KEEP
Publisher 7 | Unicode | Graphic | HTML | DROP
Publisher 8 | ISO | Graphic | CALS | KEEP
Publisher 9 | ISO | MathML | HTML | DROP
Publisher 10 | Unicode | Graphic | HTML | DROP
Publisher 11 | Unicode | MathML | HTML | DROP
Publisher 12 | Unicode | MathML | HTML | KEEP
Publisher 13 | Unicode | Graphic | CALS | KEEP
Publisher 14 | Unicode | Graphic | CALS | KEEP
Publisher 15 | Unicode | Graphic | HTML | DROP
Publisher 16 | Unicode | Graphic | HTML | KEEP
Publisher 17 | Unicode | Graphic | HTML | DROP / KEEP
Publisher 18 | Unicode | Graphic | HTML | DROP / KEEP
Publisher 19 | Unicode | Graphic | CALS | KEEP / NA
Publisher 20 | Unicode | NA | | KEEP
Publisher 21 | Unicode | MathML | CALS | KEEP
JATS-con | Unspecified | MathML | HTML | DROP / KEEP
Supplier 1 | Unicode | MathML | HTML | KEEP
Supplier 2 | ISO | TeX | CALS | KEEP
Supplier 3 | Unicode | MathML+graphic | CALS | KEEP

11 Character Encoding
• Most implementations use Unicode entities (e.g., &#x03B2;)
  • Quasi-human readable (unlike UTF-8)
• Some use ISO entities (e.g., &beta;)
  • Most human-readable
  • But transform required for HTML
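The transform needed to move from named entities to numeric references can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used by any of the surveyed publishers. It assumes Python's built-in HTML 4 entity table (`html.entities.name2codepoint`) as a stand-in for the ISO entity sets; names that exist only in the ISO sets are left untouched. The function name `named_to_numeric` is hypothetical.

```python
import re
from html.entities import name2codepoint

# XML's five predefined entities must be left as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text: str) -> str:
    """Replace named entities such as &beta; with hex character references."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        # Assumption: the HTML 4 table approximates the ISO entity sets.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names alone
        return f"&#x{name2codepoint[name]:04X};"

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


print(named_to_numeric("&beta;-carotene &amp; &nosuchname;"))
```

A production transform would more likely load the entity declarations from the DTD itself, so that every ISO name the publisher uses is covered.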

12 Generated and Boilerplate Text
• Generated Text: Inconsequential, formulaic, or stereotypical text, punctuation, and formatting omitted from an XML file, which is applied to content by a style sheet when an XML file is rendered
• Boilerplate Text: Inconsequential, formulaic, or stereotypical text, punctuation, and formatting that could have been omitted but which the publisher has chosen to keep in the XML file rather than to generate with a style sheet

13 NLM DTD Structure
• NLM DTD is flexible
  • Permits generated or boilerplate text
  • Degree varies by tag set
• Green DTD allows greatest degree of boilerplate text
  • Includes the <x> element
• Hypothesis: Flexibility of generated versus boilerplate text increased NLM DTD adoption

14 List Labels
• List-type attribute carries format information
• Most publishers don't keep list labels
  • Possibly because HTML excludes list labels
• Books are an exception
• List labels useful for discontinuous lists (e.g., items 1 to 4, intervening text, then items 5 to 8)
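The discontinuous-list case can be illustrated with a short Python sketch. It assumes NLM-style `<list>`, `<list-item>`, and `<label>` markup; the sample content and the function name `render_items` are made up for illustration. When labels are kept in the XML, the numbering survives the intervening paragraph; when labels are generated per list, the second list restarts at 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one logical list split in two by intervening text.
SAMPLE = """<sec>
<list list-type="order">
<list-item><label>1.</label><p>Collect samples</p></list-item>
<list-item><label>2.</label><p>Label samples</p></list-item>
</list>
<p>Intervening text.</p>
<list list-type="order">
<list-item><label>3.</label><p>Run assay</p></list-item>
</list>
</sec>"""


def render_items(xml_text: str, use_kept_labels: bool) -> list[str]:
    """Render list items with either kept labels or per-list generated labels."""
    root = ET.fromstring(xml_text)
    lines = []
    for lst in root.iter("list"):
        for n, item in enumerate(lst.findall("list-item"), start=1):
            label = item.findtext("label") if use_kept_labels else None
            if not label:
                label = f"{n}."  # generated label: restarts with every <list>
            lines.append(f"{label} {item.findtext('p')}")
    return lines


print(render_items(SAMPLE, use_kept_labels=True))
print(render_items(SAMPLE, use_kept_labels=False))
```

A style sheet could also continue the numbering across lists, but only if the XML carries some signal that the second list continues the first; keeping the label is the simplest such signal.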

15 Early Reference Models
• Versions 1.0 through 2.3 had the <citation> and <nlm-citation> elements
• <citation> allowed PCDATA and any element order
• <nlm-citation> allowed only elements in prescribed order
• No way to restrict PCDATA without enforcing element order
• Problematic when mixing parsed and unparsed references (e.g., gray literature)

16 Reference Tagging 3.0
• <mixed-citation> and <element-citation>
• Former allows PCDATA
• Latter allows only semantic elements
• Neither prescribes order

17 Reference Tagging
• Most publishers keep PCDATA
• All suppliers keep PCDATA
• Reasons
  • Less style sheet setup (PDF, HTML, etc.)
  • PCDATA can easily be dropped
  • Suppliers: multiple publisher styles require less setup
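The claim that PCDATA can easily be dropped can be sketched as follows. This is a simplified illustration, assuming a flat NLM 3.0-style mixed-citation whose punctuation sits only between top-level child elements; the sample reference and the function name `drop_pcdata` are hypothetical. Nested structures such as author groups would need the same treatment applied recursively.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical reference with punctuation kept as PCDATA between elements.
SAMPLE_CITATION = (
    '<mixed-citation publication-type="journal">'
    "<article-title>On tagging</article-title>. "
    "<source>J Examples</source> <year>2010</year>;"
    "<volume>1</volume>:<fpage>1</fpage>."
    "</mixed-citation>"
)


def drop_pcdata(mixed: ET.Element) -> ET.Element:
    """Return an element-only copy of a mixed-content citation."""
    elem = copy.deepcopy(mixed)      # leave the caller's tree untouched
    elem.tag = "element-citation"
    elem.text = None                 # text before the first child element
    for child in elem:
        child.tail = None            # punctuation and spaces after each child
    return elem


stripped = drop_pcdata(ET.fromstring(SAMPLE_CITATION))
print(ET.tostring(stripped, encoding="unicode"))
```

The reverse direction is the hard one: regenerating punctuation from element-only markup requires a style sheet per reference style, which is consistent with the reasons listed above for keeping PCDATA.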

18 PCDATA Correlations
• All element-citation users drop list labels
• Some mixed-citation users drop list labels
• Publishers decide on boilerplate text on a per-element basis, not as a global all-or-nothing choice

19 Math & Tables by Comp Application
Organization | Composition Application | Math | Tables
Publisher 8 | 3B2 | Graphic | CALS
Publisher 21 | 3B2 | MathML | CALS
Publisher 6 | 3B2 | TeX | CALS
Supplier 2 | 3B2 | TeX | CALS
Publisher 1 | 3B2 | MathML | HTML
Supplier 1 | 3B2 | MathML | HTML
Publisher 5 | 3B2 & InDesign | MathML | HTML
Publisher 11 | Antenna House | MathML | HTML
Publisher 4 | Frame | MathML | CALS
Publisher 19 | InDesign | Graphic | CALS
Publisher 2 | InDesign | Graphic | HTML
Publisher 3 | InDesign | Graphic | HTML
Publisher 15 | InDesign | Graphic | HTML
Publisher 16 | InDesign | Graphic | HTML
Publisher 18 | InDesign | Graphic | HTML
Publisher 13 | InDesign/Typefi | Graphic | CALS
Publisher 14 | InDesign/Typefi | Graphic | CALS
Supplier 3 | InDesign/Typefi | MathML+graphic | CALS
Publisher 7 | InDesign/Typefi | Graphic | HTML
JATS-con | NA | MathML | HTML
Publisher 20 | NA | |
Publisher 17 | PDF from Word | Graphic | HTML
Publisher 9 | PDF from Word | MathML | HTML
Publisher 12 | PDF from Word | MathML | HTML
Publisher 10 | Ventura | Graphic | HTML

20 Table Markup
• XHTML is default NLM DTD model
• CALS requires DTD modification
• CALS has cell borders and table groups
• InDesign & Frame support CALS, but not XHTML tables
• 3B2 users seem to prefer CALS tables
• Must be converted to XHTML for online delivery
• Theory: publishers adopt CALS when more appropriate for PDF/print composition systems
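The CALS to XHTML conversion needed for online delivery can be sketched roughly as below. This is a deliberately minimal illustration, not a complete converter: it assumes a single `<tgroup>`, maps only `thead`/`tbody`/`row`/`entry`, translates `morerows` into `rowspan`, flattens inline markup inside cells to plain text, and ignores `colspec`, column spans, and border attributes. The sample table and the function name `cals_to_xhtml` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical CALS table with one vertically spanning cell.
CALS_SAMPLE = (
    '<table frame="all"><tgroup cols="2">'
    "<thead><row><entry>Gene</entry><entry>Fold change</entry></row></thead>"
    '<tbody><row><entry morerows="1">abc1</entry><entry>2.5</entry></row>'
    "<row><entry>3.1</entry></row></tbody>"
    "</tgroup></table>"
)


def cals_to_xhtml(cals_table: ET.Element) -> ET.Element:
    """Convert the first tgroup of a CALS table to a bare XHTML-style table."""
    tgroup = cals_table.find("tgroup")
    if tgroup is None:
        raise ValueError("CALS table has no <tgroup>")
    html = ET.Element("table")
    for section in ("thead", "tbody"):
        src = tgroup.find(section)
        if src is None:
            continue
        dst = ET.SubElement(html, section)
        cell_tag = "th" if section == "thead" else "td"
        for row in src.findall("row"):
            tr = ET.SubElement(dst, "tr")
            for entry in row.findall("entry"):
                cell = ET.SubElement(tr, cell_tag)
                cell.text = "".join(entry.itertext())  # flatten inline markup
                morerows = entry.get("morerows")
                if morerows:
                    # CALS counts the extra rows; XHTML counts all rows spanned.
                    cell.set("rowspan", str(int(morerows) + 1))
    return html


print(ET.tostring(cals_to_xhtml(ET.fromstring(CALS_SAMPLE)), encoding="unicode"))
```

Most of the real effort lies in the parts omitted here: column spans defined through `colspec` names, per-cell borders, and multiple table groups have no one-to-one XHTML equivalent.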

21 Math Markup
• NLM DTD permits MathML, TeX, pointers to graphic files
• MathML is native XML markup, but…
• MathML has limited browser support
  • Firefox is good; Safari is OK; IE has no MathML support
  • Most publishers deliver online math as images
• MathML has limited composition support
  • InDesign does not have native MathML rendering
  • 3B2 native rendering is TeX
• Math model driven by PDF creation requirements

22 Composition and Hosting
Organization | Comp Application | Comp Location | Online | PMC
Publisher 1 | 3B2 | Outsource | Self-hosted | No
Publisher 2 | InDesign | In-House | Self-hosted | Yes
Publisher 3 | InDesign | In-House | Self-hosted | Yes
Publisher 4 | Frame | In-House | Self-hosted | No
Publisher 5 | 3B2 & InDesign | Outsource | Highwire | Yes
Publisher 6 | 3B2 | Outsource | Highwire | Yes
Publisher 7 | InDesign/Typefi | In-House | Self-hosted | Yes
Publisher 8 | 3B2 | Outsource | Self-hosted | No
Publisher 9 | PDF from Word | In-House | Self-hosted | Yes
Publisher 10 | Ventura | In-House | Self-hosted | No
Publisher 11 | Antenna House | In-House | Self-hosted | Yes
Publisher 12 | PDF from Word | In-House | Self-hosted | Yes
Publisher 13 | InDesign/Typefi | In-House | Highwire | Yes
Publisher 14 | InDesign/Typefi | In-House | Self-hosted | No
Publisher 15 | InDesign | In-House | Self-hosted | No
Publisher 16 | InDesign | In-House | Self-hosted | No
Publisher 17 | PDF from Word | In-House | Self-hosted | Yes
Publisher 18 | InDesign | In-House | Self-hosted | Yes
Publisher 19 | InDesign | In-House | Self-hosted | No
Publisher 20 | NA | | Self-hosted | No
Publisher 21 | 3B2 | In-House | Self-hosted | Some
JATS-con | NA | | Self-hosted | No
Supplier 1 | 3B2 | Supplier | Various | Some
Supplier 2 | 3B2 | Supplier | Various | No
Supplier 3 | InDesign/Typefi | Supplier | Various | No

23 Composition and Online Hosting
• Majority of users
  • Typeset in-house
  • Self-host online version
• PMC delivery requirement for half of users
• However… this correlation may be significant only among organizations that have chosen to create XML in-house

24 Conclusions
• NLM DTD flexibility led to broader adoption
  • Application of DTD can be adjusted to meet needs of specific publishing requirements or tools
• NLM DTD standard facilitates in-house XML implementation
  • Eliminates R&D requirement to create a DTD
  • Customizable off-the-shelf tools available
  • Cost-effective solution for small and medium-size publishers

25 Questions?
Bruce Rosenblum, Inera Incorporated
+1 (617) 932-1932
brosenblum@inera.com
www.inera.com

