Copyright © 2010 Inera Incorporated. All Rights Reserved.
NLM DTD Flexibility: How and Why Applications of the NLM DTD Vary
Presented by Bruce D. Rosenblum

Presentation transcript:

NLM DTD Flexibility: How and Why Applications of the NLM DTD Vary
Presented by Bruce D. Rosenblum, CEO, Inera Incorporated
Journal Article Tag Suite Conference, 1 November 2010

Remember When…

Scholarly DTDs, Circa 2001
ISO, Elsevier, Elsevier, Elsevier, Elsevier 4.1, Blackwell 2.2, Blackwell 3.0, Blackwell 4.0, Keton, Cadmus, Capital City, Charlesworth, Alden, Highwire, PMC 1.0, AIP, UCP, Wiley, IEEE, Nature, BioOne, U Chicago Press, Cambridge University Press, American GeoPhysical, American Medical, New England Journal, American Chemical, National Research Canada, Academic Press, Oxford University Press, Academic Press, Springer, Kluwer Academic

Scholarly DTDs 2010
- NLM DTD
- Elsevier DTD
- Springer DTD
- Wiley-Blackwell DTD
- And a few others…
- No longer a grand mess, but…
  - NLM DTD Suite applications vary
  - Specific tagging practices meet publisher-specific requirements

Data and Methodology
- Data from 25 eXtyles and refXpress implementations since 2003
- Not a scientific survey
  - However, useful to show NLM DTD usage variations
- Supplier requirements differ from those of publishers
  - Suppliers serve multiple publishers who deliver to different platforms

NLM DTD Adoption By Year

Organization | DTD            | Year | Version | Prior XML
Publisher 1  | Archive * †    |      |         | No
Publisher 2  | Archive        |      |         | No
Publisher 3  | Archive        |      |         | No
Publisher 4  | Archive        |      |         | No
Publisher 5  | Archive        |      |         | Yes
Publisher 6  | Publish        |      |         | Yes
Publisher 7  | Publish & book |      |         | No
Publisher 8  | Book *         |      |         | No
Publisher 9  | Publish        |      |         | No
Publisher 10 | Archive        |      |         | No
Publisher 11 | Publish        |      |         | No
Publisher 12 | Publish        |      |         | No
Publisher 13 | Publish        |      |         | Yes
Publisher 14 | Publish & book |      |         | No
Publisher 15 | Publish        |      |         | No
Publisher 16 | Book           |      |         | No
Publisher 17 | Publish        |      |         | No
Publisher 18 | Publish        |      |         | No
Publisher 19 | Publish *      |      |         | No
Publisher 20 | Archive        |      |         | No
Publisher 21 | Publish *      |      |         | Yes
JATS-con     | Authoring      |      |         | Yes
Supplier 1   | Publish        |      |         | Yes
Supplier 2   | Publish        |      |         | No
Supplier 3   | Book           |      |         | Yes

* Customized version of DTD beyond OASIS-CALS addition
† Upgraded from 1.0 to 3.0 in 2010

Year of DTD Adoption
- Few implementations prior to 2006
  - Mostly related to PMC deposit
- Adoption rate grows in 2006 and later
  - Maturity of version 2.0 in August 2004
  - Greater public awareness by 2006
  - Freely available and modifiable
  - Flexible
  - Not just for life science content
  - More off-the-shelf tool support from NCBI and others
- 3.0 upgrade not automatic; not fully backwards compatible

Prior Markup Experience
- Most had not used full-text XML or SGML
- Driven to NLM DTD for:
  - More modern XML-based workflow
  - Desire for full-text to drive HTML and archive needs
  - PMC deposit
- Those with SGML experience
  - SGML to XML conversion choice
    - Convert existing DTD to XML
    - Adopt NLM DTD

DTD Selection
- Most adopters use Journal Publishing (blue) DTD
- Early adopters chose Archive and Interchange (green) DTD
  - Blue was too restrictive prior to 2.0
  - ISSN optional in green; hosts non-serial publications without modification
- Book DTD use growing in recent years
  - Not as mature as journals, but useful

Implementation Characteristics

Organization | Char Encoding | Math           | Tables | List Labels / Ref PCDATA
Publisher 1  | ISO           | MathML         | HTML   | DROP
Publisher 2  | ISO           | Graphic        | HTML   | DROP
Publisher 3  | Unicode       | Graphic        | HTML   | DROP / KEEP
Publisher 4  | ISO           | MathML         | CALS   | DROP / KEEP
Publisher 5  | ISO           | MathML         | HTML   | DROP / KEEP
Publisher 6  | ISO           | TeX            | CALS   | DROP / KEEP
Publisher 7  | Unicode       | Graphic        | HTML   | DROP
Publisher 8  | ISO           | Graphic        | CALS   | KEEP
Publisher 9  | ISO           | MathML         | HTML   | DROP
Publisher 10 | Unicode       | Graphic        | HTML   | DROP
Publisher 11 | Unicode       | MathML         | HTML   | DROP
Publisher 12 | Unicode       | MathML         | HTML   | KEEP
Publisher 13 | Unicode       | Graphic        | CALS   | KEEP
Publisher 14 | Unicode       | Graphic        | CALS   | KEEP
Publisher 15 | Unicode       | Graphic        | HTML   | DROP
Publisher 16 | Unicode       | Graphic        | HTML   | KEEP
Publisher 17 | Unicode       | Graphic        | HTML   | DROP / KEEP
Publisher 18 | Unicode       | Graphic        | HTML   | DROP / KEEP
Publisher 19 | Unicode       | Graphic        | CALS   | KEEP / NA
Publisher 20 | Unicode       | NA             |        | KEEP
Publisher 21 | Unicode       | MathML         | CALS   | KEEP
JATS-con     | Unspecified   | MathML         | HTML   | DROP / KEEP
Supplier 1   | Unicode       | MathML         | HTML   | KEEP
Supplier 2   | ISO           | TeX            | CALS   | KEEP
Supplier 3   | Unicode       | MathML+graphic | CALS   | KEEP

Character Encoding
- Most implementations use Unicode entities (e.g., &#x03B2;)
  - Quasi-human readable (unlike UTF-8)
- Some use ISO entities (e.g., &beta;)
  - Most human-readable
  - But transform required for HTML
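The transform mentioned on this slide can be sketched in a few lines. The example below is only an illustration, not the tooling used by any of the surveyed publishers: it rewrites named entities as hexadecimal character references, and it assumes that Python's HTML 4 entity table (`html.entities.name2codepoint`) is an acceptable stand-in for the full ISO entity sets, which it covers only in part. The function name `named_to_numeric` is hypothetical.

```python
import re
from html.entities import name2codepoint

# XML's five predefined entities must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text: str) -> str:
    """Replace named entities such as &beta; with references such as &#x03B2;."""

    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names untouched
        return "&#x{:04X};".format(name2codepoint[name])

    return ENTITY_RE.sub(repl, text)


print(named_to_numeric("IL-1&beta; &amp; TNF-&alpha;"))
```

Names that are missing from the lookup table pass through unchanged, so a production transform would need the complete ISO entity declarations rather than this partial table.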

Generated and Boilerplate Text
- Generated text:
  - Inconsequential, formulaic, or stereotypical text, punctuation, and formatting omitted from an XML file, which is applied to content by a style sheet when an XML file is rendered
- Boilerplate text:
  - Inconsequential, formulaic, or stereotypical text, punctuation, and formatting that could have been omitted but which the publisher has chosen to keep in the XML file rather than to generate with a style sheet

NLM DTD Structure
- NLM DTD is flexible
  - Permits generated or boilerplate text
  - Degree varies by tag set
- Green DTD allows greatest degree of boilerplate text
  - Includes the <x> element
- Hypothesis: Flexibility of generated versus boilerplate text increased NLM DTD adoption
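The difference between generated and boilerplate text can be made concrete with a small sketch. Everything here is an assumption made for illustration: the sample reference is invented, the `SUFFIX` table stands in for one hypothetical house style, and the two rendering functions play the role that a real style sheet would play. The element names are the version 3.0 reference elements that the talk itself mentions.

```python
import xml.etree.ElementTree as ET

# Boilerplate form: the punctuation is kept in the XML file.
BOILERPLATE = (
    "<mixed-citation><article-title>A title</article-title>. "
    "<source>J Example</source>. <year>2010</year>;"
    "<volume>1</volume>:<fpage>5</fpage>.</mixed-citation>"
)

# Generated form: only the semantic elements are kept.
GENERATED = (
    "<element-citation><article-title>A title</article-title>"
    "<source>J Example</source><year>2010</year>"
    "<volume>1</volume><fpage>5</fpage></element-citation>"
)

# Hypothetical house style: text the style sheet adds after each element.
SUFFIX = {"article-title": ". ", "source": ". ", "year": ";", "volume": ":", "fpage": "."}


def render_boilerplate(xml: str) -> str:
    """Rendering needs no rules: the file already holds the punctuation."""
    return "".join(ET.fromstring(xml).itertext())


def render_generated(xml: str) -> str:
    """Rendering applies the style rules to supply the missing punctuation."""
    root = ET.fromstring(xml)
    return "".join((child.text or "") + SUFFIX.get(child.tag, "") for child in root)


print(render_boilerplate(BOILERPLATE))
print(render_generated(GENERATED))
```

Both calls produce the same display string; the trade-off is whether the style knowledge lives in the file or in the rendering code.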

List Labels
- List-type attribute carries format information
- Most publishers don't keep list label
  - Possibly because HTML excludes list label
- Books are an exception
- List label useful for discontinuous lists (e.g., items 1 to 4, intervening text, then items 5 to 8)
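The discontinuous-list case is easy to see in a short sketch. The function below is a hypothetical helper, not part of any NLM tooling, and it assumes that only `list-type="order"` needs handling: a kept `<label>` is used as is, and otherwise a label is generated from the position of the item within its own list.

```python
import xml.etree.ElementTree as ET


def item_labels(list_xml: str) -> list:
    """Return the label shown for each item of an NLM <list>."""
    root = ET.fromstring(list_xml)
    if root.get("list-type") != "order":
        raise ValueError("this sketch only handles list-type='order'")
    labels = []
    for position, item in enumerate(root.findall("list-item"), start=1):
        kept = item.findtext("label")  # None when the label was dropped
        labels.append(kept if kept is not None else f"{position}.")
    return labels


# Second half of a list that was interrupted by running text.
DROPPED = ('<list list-type="order">'
           '<list-item><p>fifth</p></list-item>'
           '<list-item><p>sixth</p></list-item></list>')
KEPT = ('<list list-type="order">'
        '<list-item><label>5.</label><p>fifth</p></list-item>'
        '<list-item><label>6.</label><p>sixth</p></list-item></list>')

print(item_labels(DROPPED))  # generated numbering restarts
print(item_labels(KEPT))     # kept labels continue the sequence
```

With labels dropped, the second list has no way to know that it continues an earlier one, which is why kept labels help in this situation.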

Early Reference Models
- Versions 1.0 through 2.3 had the <citation> and <nlm-citation> elements
  - <citation> allowed PCDATA and any element order
  - <nlm-citation> allowed only elements in prescribed order
- No way to restrict PCDATA without enforcing element order
- Problematic when mixing parsed and unparsed references (e.g., gray literature)

Reference Tagging 3.0
- <mixed-citation> and <element-citation>
  - Former allows PCDATA
  - Latter allows only semantic elements
  - Neither prescribes order

Reference Tagging
- Most publishers keep PCDATA
- All suppliers keep PCDATA
- Reasons
  - Less style sheet setup (PDF, HTML, etc.)
  - PCDATA can easily be dropped
  - Suppliers: multiple publisher styles require less setup
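The point that PCDATA can easily be dropped can be sketched as follows. This is a minimal illustration under stated assumptions: the sample reference is invented, `drop_pcdata` is a hypothetical helper, and only loose text between top-level children is removed (text nested inside child elements is left alone).

```python
import copy
import xml.etree.ElementTree as ET


def drop_pcdata(mixed_xml: str) -> ET.Element:
    """Rebuild a <mixed-citation> as an <element-citation> without loose text."""
    mixed = ET.fromstring(mixed_xml)
    element = ET.Element("element-citation", dict(mixed.attrib))
    for child in mixed:
        clone = copy.deepcopy(child)
        clone.tail = None  # the text after each child is the kept punctuation
        element.append(clone)
    return element


MIXED = ('<mixed-citation publication-type="journal">'
         '<article-title>A title</article-title>. '
         '<source>J Example</source>. <year>2010</year>;'
         '<volume>1</volume>:<fpage>5</fpage>.</mixed-citation>')

print(ET.tostring(drop_pcdata(MIXED), encoding="unicode"))
```

Going the other way is harder, because restoring punctuation requires knowing the target reference style, which is consistent with the slide's observation that keeping PCDATA means less style sheet setup.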

PCDATA Correlations
- All element-citation users drop list labels
- Some mixed-citation users drop list labels
- Publishers decide on boilerplate text on a per-element basis, not as a global all-or-nothing choice

Math & Tables by Comp Application

Organization | Composition Application | Math           | Tables
Publisher 8  | 3B2                     | Graphic        | CALS
Publisher 21 | 3B2                     | MathML         | CALS
Publisher 6  | 3B2                     | TeX            | CALS
Supplier 2   | 3B2                     | TeX            | CALS
Publisher 1  | 3B2                     | MathML         | HTML
Supplier 1   | 3B2                     | MathML         | HTML
Publisher 5  | 3B2 & InDesign          | MathML         | HTML
Publisher 11 | Antenna House           | MathML         | HTML
Publisher 4  | Frame                   | MathML         | CALS
Publisher 19 | InDesign                | Graphic        | CALS
Publisher 2  | InDesign                | Graphic        | HTML
Publisher 3  | InDesign                | Graphic        | HTML
Publisher 15 | InDesign                | Graphic        | HTML
Publisher 16 | InDesign                | Graphic        | HTML
Publisher 18 | InDesign                | Graphic        | HTML
Publisher 13 | InDesign/Typefi         | Graphic        | CALS
Publisher 14 | InDesign/Typefi         | Graphic        | CALS
Supplier 3   | InDesign/Typefi         | MathML+graphic | CALS
Publisher 7  | InDesign/Typefi         | Graphic        | HTML
JATS-con     | NA                      | MathML         | HTML
Publisher 20 | NA                      |                |
Publisher 17 | PDF from Word           | Graphic        | HTML
Publisher 9  | PDF from Word           | MathML         | HTML
Publisher 12 | PDF from Word           | MathML         | HTML
Publisher 10 | Ventura                 | Graphic        | HTML

Table Markup
- XHTML is default NLM DTD model
- CALS requires DTD modification
- CALS has cell borders and table groups
- InDesign & Frame support CALS, but not XHTML tables
- 3B2 users seem to prefer CALS tables
- Must be converted to XHTML for online delivery
- Theory: publishers adopt CALS when more appropriate for PDF/print composition systems
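The CALS-to-XHTML conversion needed for online delivery might look roughly like the sketch below. It is a deliberately reduced illustration, not a complete converter: the function name `cals_to_xhtml` and the sample table are hypothetical, border attributes such as `colsep` and `rowsep` are ignored, and multiple table groups are simply appended to one output table. Spans are translated from the CALS attributes `namest`/`nameend` (to `colspan`) and `morerows` (to `rowspan`).

```python
import xml.etree.ElementTree as ET


def cals_to_xhtml(cals_xml: str) -> ET.Element:
    """Convert a simple CALS table to an XHTML-style table element."""
    out = ET.Element("table")
    for tgroup in ET.fromstring(cals_xml).findall("tgroup"):
        # Column names turn namest/nameend pairs into colspan counts.
        colnames = [c.get("colname") for c in tgroup.findall("colspec")]
        for section in ("thead", "tbody"):
            src = tgroup.find(section)
            if src is None:
                continue
            dst = ET.SubElement(out, section)
            cell_tag = "th" if section == "thead" else "td"
            for row in src.findall("row"):
                tr = ET.SubElement(dst, "tr")
                for entry in row.findall("entry"):
                    cell = ET.SubElement(tr, cell_tag)
                    cell.text = "".join(entry.itertext())
                    if entry.get("morerows"):
                        cell.set("rowspan", str(int(entry.get("morerows")) + 1))
                    st, en = entry.get("namest"), entry.get("nameend")
                    if st and en and st in colnames and en in colnames:
                        span = colnames.index(en) - colnames.index(st) + 1
                        cell.set("colspan", str(span))
    return out


CALS = ('<table><tgroup cols="2">'
        '<colspec colname="c1"/><colspec colname="c2"/>'
        '<thead><row><entry namest="c1" nameend="c2">Both</entry></row></thead>'
        '<tbody><row><entry morerows="1">A</entry><entry>B</entry></row>'
        '<row><entry>C</entry></row></tbody>'
        '</tgroup></table>')

print(ET.tostring(cals_to_xhtml(CALS), encoding="unicode"))
```

The information that this sketch discards (cell borders, table groups) is the same information the slide lists as CALS features, which suggests why print-oriented workflows keep CALS as the source form.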

Math Markup
- NLM DTD permits MathML, TeX, pointers to graphic files
- MathML is native XML markup, but…
- MathML has limited browser support
  - Firefox is good; Safari is OK; IE has no MathML support
  - Most publishers deliver online math as images
- MathML has limited composition support
  - InDesign does not have native MathML rendering
  - 3B2 native rendering is TeX
- Math model driven by PDF creation requirements

Composition and Hosting

Organization | Comp Application | Comp Location | Online      | PMC
Publisher 1  | 3B2              | Outsource     | Self-hosted | No
Publisher 2  | InDesign         | In-House      | Self-hosted | Yes
Publisher 3  | InDesign         | In-House      | Self-hosted | Yes
Publisher 4  | Frame            | In-House      | Self-hosted | No
Publisher 5  | 3B2 & InDesign   | Outsource     | Highwire    | Yes
Publisher 6  | 3B2              | Outsource     | Highwire    | Yes
Publisher 7  | InDesign/Typefi  | In-House      | Self-hosted | Yes
Publisher 8  | 3B2              | Outsource     | Self-hosted | No
Publisher 9  | PDF from Word    | In-House      | Self-hosted | Yes
Publisher 10 | Ventura          | In-House      | Self-hosted | No
Publisher 11 | Antenna House    | In-House      | Self-hosted | Yes
Publisher 12 | PDF from Word    | In-House      | Self-hosted | Yes
Publisher 13 | InDesign/Typefi  | In-House      | Highwire    | Yes
Publisher 14 | InDesign/Typefi  | In-House      | Self-hosted | No
Publisher 15 | InDesign         | In-House      | Self-hosted | No
Publisher 16 | InDesign         | In-House      | Self-hosted | No
Publisher 17 | PDF from Word    | In-House      | Self-hosted | Yes
Publisher 18 | InDesign         | In-House      | Self-hosted | Yes
Publisher 19 | InDesign         | In-House      | Self-hosted | No
Publisher 20 | NA               |               | Self-hosted | No
Publisher 21 | 3B2              | In-House      | Self-hosted | Some
JATS-con     | NA               |               | Self-hosted | No
Supplier 1   | 3B2              | Supplier      | Various     | Some
Supplier 2   | 3B2              | Supplier      | Various     | No
Supplier 3   | InDesign/Typefi  | Supplier      | Various     | No

Composition and Online Hosting
- Majority of users
  - Typeset in-house
  - Self-host online version
- PMC delivery requirement for half of users
- However… this correlation may be significant only among organizations that have chosen to create XML in-house
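The proportions on this slide can be checked by tallying the Composition and Hosting table. The sketch below transcribes the Comp Location, Online, and PMC columns of that table into a pipe-separated string (an empty field where the table has no value) and counts each column; the variable names are hypothetical, and "Some" is counted separately from "Yes".

```python
from collections import Counter

# Organization|Comp Location|Online|PMC, transcribed from the table.
ROWS = """\
Publisher 1|Outsource|Self-hosted|No
Publisher 2|In-House|Self-hosted|Yes
Publisher 3|In-House|Self-hosted|Yes
Publisher 4|In-House|Self-hosted|No
Publisher 5|Outsource|Highwire|Yes
Publisher 6|Outsource|Highwire|Yes
Publisher 7|In-House|Self-hosted|Yes
Publisher 8|Outsource|Self-hosted|No
Publisher 9|In-House|Self-hosted|Yes
Publisher 10|In-House|Self-hosted|No
Publisher 11|In-House|Self-hosted|Yes
Publisher 12|In-House|Self-hosted|Yes
Publisher 13|In-House|Highwire|Yes
Publisher 14|In-House|Self-hosted|No
Publisher 15|In-House|Self-hosted|No
Publisher 16|In-House|Self-hosted|No
Publisher 17|In-House|Self-hosted|Yes
Publisher 18|In-House|Self-hosted|Yes
Publisher 19|In-House|Self-hosted|No
Publisher 20||Self-hosted|No
Publisher 21|In-House|Self-hosted|Some
JATS-con||Self-hosted|No
Supplier 1|Supplier|Various|Some
Supplier 2|Supplier|Various|No
Supplier 3|Supplier|Various|No
"""

records = [line.split("|") for line in ROWS.splitlines()]
location = Counter(r[1] for r in records)
online = Counter(r[2] for r in records)
pmc = Counter(r[3] for r in records)

total = len(records)
print(f"In-house composition: {location['In-House']} of {total}")
print(f"Self-hosted online:   {online['Self-hosted']} of {total}")
print(f"PMC delivery:         {pmc['Yes']} yes, {pmc['Some']} some, {pmc['No']} no")
```

Counting "Yes" and "Some" together gives roughly half of the 25 implementations, which matches the slide's wording.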

Conclusions
- NLM DTD flexibility led to broader adoption
  - Application of DTD can be adjusted to meet needs of specific publishing requirements or tools
- NLM DTD standard facilitates in-house XML implementation
  - Eliminates R&D requirement to create a DTD
  - Customizable off-the-shelf tools available
  - Cost-effective solution for small and medium-size publishers

Questions?
Bruce Rosenblum
Inera Incorporated
+1 (617)