Reducing Costs and Expanding XML Submissions with PDF to JATS Conversion
  by Keishi KATOH (加藤圭志)
  DIGITAL COMMUNICATIONS Co., Ltd.
JATS-Con 2012 Copyright ©2012 DIGITAL COMMUNICATIONS
Agenda
  About J-STAGE
    Service overview
    Positioning of the bibliographic XML creation tool
  Bibliographic XML creation tool
    Tool workflow
    Conversion from PDF to JATS XML
    Demonstration of the tool
  Conversion results analysis and future improvements
Brief introduction to J-STAGE and the bibliographic XML creation tool
About J-STAGE
  J-STAGE = “Japan Science and Technology Information Aggregator, Electronic”
  One of the major e-journal publishing platforms of Japan, provided by the Japan Science and Technology Agency (JST)
  1,684 titles, 2.4M articles (Oct 2012)
  J-STAGE3, the new platform, was launched in May 2012
    With JATS XML submission (full text / bibliographic info)
Service positioning of J-STAGE
  [Figure: service positioning of J-STAGE. Copyright ©2012 Japan Science and Technology Agency. The brand names and product names are registered trademarks of their respective companies.]
Bibliographic XML creation tool in J-STAGE
  [Diagram labels: Academic Society; Internet; Article PDF; JATS bib XML; Bibliographic XML creation tool (marked “Here”); J-STAGE registration system; J-STAGE public system; Users access from the internet]
Reasons for the tool
  Is XML easy?
    The XML spec is simple
    The JATS tag suite is easily understood
      Domain-specific light-weight tag set
      Easy structures and attributes
    Easily created from the author’s data!!
  Difficulties for authors in creating papers in XML format
    Many different tools are used for writing papers
    Printing / production process from writing to publishing
    Printing companies’ capabilities to work with XML
    Higher skills are required to use XML
Why from PDF?
  Various tools and formats are used in publication
    For writing: Word, TeX…
    For printing:
      DTP tools: InDesign, FrameMaker
      Automated publishing systems: 3B2/APP, AH Formatter
    For distributing: PDF, HTML, XML…
  Almost all academic societies have PDFs
Conversion workflow
Workflow with two phases
  Phase 1: Template pattern creation
    Sample article PDF → automatic analysis → template pattern
  Phase 2: Registration of PDFs and conversion to XML
    Article PDFs + template pattern → automatic analysis → XML conversion → JATS XML
  Details are shown in the demonstration
Sources & Outputs
  Source: PDF
    Version 1.3 to 1.5
    Fonts are embedded; rasterized or scanned PDFs are not supported
    No security permission flags set
  Output: valid JATS XML
    Compliant with J-STAGE’s XML submission guideline
    Bibliographic elements
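The version constraint on source PDFs can be checked before upload. The following is a minimal sketch, assuming only the `%PDF-x.y` file header is inspected; the function names are hypothetical and are not part of the tool. It does not test for embedded fonts, scanned pages, or security flags, and a PDF 1.4 or later file may also override its header version in the document catalog, which this sketch ignores.

```python
import re

# Versions listed on the slide as supported input (PDF 1.3 through 1.5).
SUPPORTED_VERSIONS = {(1, 3), (1, 4), (1, 5)}


def pdf_header_version(data: bytes):
    """Return (major, minor) from the '%PDF-x.y' header, or None if absent."""
    # The header is expected near the start of the file, so only the
    # first 1024 bytes are searched.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported_source(data: bytes) -> bool:
    """Hypothetical pre-check: only the header version is examined."""
    return pdf_header_version(data) in SUPPORTED_VERSIONS
```

In practice the bytes would come from something like `open(path, "rb").read(1024)`; a file that fails this check can be rejected early, before any template matching is attempted.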
Demonstration
Demo contents
  Create new template
    Select sample PDF for template
    Set page margin
  Setting of template pattern
    Select the ‘block’
    Assign ‘pseudo-JATS’ elements to blocks
    PDFs with Japanese and English content
  Conversion using template pattern
    Converting process
    XML editing (empty template)
Practice in 30 seconds
  山 mountain
  木 tree
  鳥 bird
  魚 fish
  亀 tortoise
Create a new template
  Go to the ‘Create new template’ function
  Select a sample PDF and submit it
  Set the page margins
Analyzing PDF
  Header / Footer region to next page
  Contents flow order
  Contents region
Template settings
  Select a ‘block’ from which to extract information
  Assign a pseudo-JATS item to the block
Selecting block
  Block type
    Paragraphs with heading
    Paragraphs only
  Selecting methods
    Font name, size, bold/italic
    Text pattern
    Page range, region on the page
  A block continues until a block matched by another selection setting begins
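The selection methods listed on this slide can be pictured as rules tested against each extracted text line. The sketch below is an illustration only: `Line`, `BlockRule`, and `split_blocks` are hypothetical names, the sample text is invented, and the tool's real engine is not described in the slides. It assumes each line already carries its font name, size, and page number, and it omits the "region on the page" condition.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    text: str
    font: str
    size: float
    bold: bool = False
    page: int = 1


@dataclass
class BlockRule:
    name: str
    font: Optional[str] = None
    size: Optional[float] = None
    bold: Optional[bool] = None
    pattern: Optional[str] = None
    first_page: int = 1
    last_page: int = 10**9

    def matches(self, line: Line) -> bool:
        """A line matches when every condition that is set holds."""
        if not (self.first_page <= line.page <= self.last_page):
            return False
        if self.font is not None and line.font != self.font:
            return False
        if self.size is not None and abs(line.size - self.size) > 0.1:
            return False
        if self.bold is not None and line.bold != self.bold:
            return False
        if self.pattern is not None and not re.search(self.pattern, line.text):
            return False
        return True


def split_blocks(lines, rules):
    """Group lines into blocks; a block runs until another rule's block starts."""
    blocks = []
    current = None
    for line in lines:
        rule = next((r for r in rules if r.matches(line)), None)
        # Open a new block only when a different rule starts matching.
        if rule is not None and (current is None or current[0] != rule.name):
            current = (rule.name, [])
            blocks.append(current)
        if current is not None:
            current[1].append(line.text)
    return blocks


# Invented sample data for illustration.
lines = [
    Line("A Sample Article Title", "Helvetica-Bold", 16.0, bold=True),
    Line("Spanning Two Lines", "Helvetica-Bold", 16.0, bold=True),
    Line("Author One, Author Two", "Times-Roman", 11.0),
    Line("Keywords: PDF, JATS, XML", "Times-Roman", 9.0),
    Line("conversion", "Times-Roman", 9.0),
]
rules = [
    BlockRule("title", size=16.0, bold=True),
    BlockRule("authors", size=11.0),
    BlockRule("keywords", pattern=r"^Keywords:"),
]
blocks = split_blocks(lines, rules)
```

Here the last line matches no rule, so it stays in the keywords block, which mirrors the "block continues" behaviour described on the slide.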
Assign a pseudo-JATS item
  A pseudo-JATS item is one that does not correspond to a single JATS XML element
    trans-title and title
    kwd-group and kwd
  Items for English and Japanese
Configure pseudo-JATS item
  Content region
    Whole block
    Select by condition
      With heading
      With inline heading
  Pseudo-JATS specific settings
    Dividing keywords
    contrib-author to institution
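The "dividing keywords" setting can be illustrated with a short sketch that turns one keyword block with an inline heading into a JATS `kwd-group` containing one `kwd` per keyword. The function name, the heading pattern, and the delimiter set are assumptions made for this example; the tool's actual settings are not given in the slides.

```python
import re
import xml.etree.ElementTree as ET


def keywords_to_kwd_group(block_text, lang="en",
                          heading=r"^\s*Key\s?words\s*[:：]\s*"):
    """Strip an inline heading, split on common delimiters, build <kwd-group>."""
    # Remove the inline heading such as "Keywords:" (pattern is an assumption).
    body = re.sub(heading, "", block_text, flags=re.IGNORECASE)
    # Split on Western and Japanese commas and semicolons.
    parts = [p.strip() for p in re.split(r"[,;、，；]", body)]
    group = ET.Element("kwd-group")
    group.set("xml:lang", lang)
    for part in parts:
        part = part.rstrip(".")
        if part:
            ET.SubElement(group, "kwd").text = part
    return group


english = keywords_to_kwd_group("Keywords: PDF, JATS; XML conversion.")
japanese = keywords_to_kwd_group("Keywords： 山、木、鳥", lang="ja")
print(ET.tostring(english, encoding="unicode"))
```

Because English and Japanese keyword lists use different delimiters, a single splitting pattern that covers both keeps the template setting simple, at the cost of mis-splitting keywords that legitimately contain a comma.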
Preview of conversion
  Preview in the design of the J-STAGE public system
  Some XML structure information
Workflow with two phases (again)
  Phase 1: Template pattern creation
  Phase 2: Registration of PDFs and conversion to XML
Convert and edit articles
  Upload PDFs and select the template
  Wait a few seconds
  Check and edit the extracted data
  Get XML!!
Conversion results
  Conversion accuracy for 10 journals, about 10 articles each
  Automatic recognition rate per journal:
  Journal  Language  Avg    Min    Max    Number of articles
  EL       J/E       91%    58%    100%   10
  JO       J/E       97%    89%    100%   10
  JE       J/E       98%    95%    99%    10
  CL       E         93%    86%    100%   10
  TR       E         90%    50%    100%   10
  JI       J/E       91%    83%    96%    8
  NI       J         91%    83%    100%   10
  BU       J/E       93%    75%    98%    8
  AD       E         100%   97%    100%   7
  PJ       E         98%    90%    100%   9
  Errata / essays are excluded from the evaluation.
  Recognition failures occur in references and keywords
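The per-journal figures can be combined into a rough overall rate by weighting each journal's average by its number of articles. This is only a sketch for reading the table: the slide reports no overall figure, and the result is approximate because the per-journal averages are already rounded. The variable and function names are illustrative.

```python
# (journal, language, avg %, min %, max %, number of articles), copied from the table.
RESULTS = [
    ("EL", "J/E", 91, 58, 100, 10),
    ("JO", "J/E", 97, 89, 100, 10),
    ("JE", "J/E", 98, 95, 99, 10),
    ("CL", "E", 93, 86, 100, 10),
    ("TR", "E", 90, 50, 100, 10),
    ("JI", "J/E", 91, 83, 96, 8),
    ("NI", "J", 91, 83, 100, 10),
    ("BU", "J/E", 93, 75, 98, 8),
    ("AD", "E", 100, 97, 100, 7),
    ("PJ", "E", 98, 90, 100, 9),
]


def weighted_average(rows):
    """Average of the per-journal rates, weighted by article count."""
    total_articles = sum(row[5] for row in rows)
    weighted_sum = sum(row[2] * row[5] for row in rows)
    return weighted_sum / total_articles, total_articles


overall, n_articles = weighted_average(RESULTS)
# Journal whose worst article had the lowest recognition rate.
worst = min(RESULTS, key=lambda row: row[3])

print(f"{n_articles} articles, weighted average about {overall:.1f}%")
print(f"lowest single-article rate: {worst[3]}% (journal {worst[0]})")
```

The wide gap between minimum and average in some journals suggests that failures are concentrated in a few articles rather than spread evenly, which is consistent with the slide's note about references and keywords.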
Future improvements
  Improvement of the PDF analyzer engine
    Recognition of text blocks
      Columns and sequence of text flow
    Reconstruction algorithms with text content
      Dehyphenation and space insertion
  JATS context recognition ability
    Template setting pattern
    Additional bibliographic elements
  For full text into JATS XML
    Extract images, vector graphics
    Equations
  * Details are undecided at this time.
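Dehyphenation and space insertion can be sketched as a rule for joining consecutive extracted lines. This is a simple heuristic written for illustration, not the algorithm planned for the tool: a trailing hyphen followed by a lowercase continuation is treated as a line-break hyphen and removed, unless the compound appears in a caller-supplied list of words that keep their hyphen. The function name and the `keep_hyphen` parameter are hypothetical.

```python
import re


def join_lines(lines, keep_hyphen=frozenset()):
    """Join extracted lines, merging words that were split at a line break."""
    text = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if not text:
            text = line
            continue
        head = re.search(r"([A-Za-z]+)-$", text)   # word fragment before the hyphen
        tail = re.match(r"[a-z]+", line)           # lowercase continuation
        if head and tail:
            compound = head.group(1) + "-" + tail.group(0)
            if compound.lower() in keep_hyphen:
                text += line               # genuine hyphen: keep it, add no space
            else:
                text = text[:-1] + line    # line-break hyphen: drop it
        else:
            text += " " + line             # ordinary line break: insert a space
    return text


joined = join_lines(["Conversion of biblio-", "graphic data from PDF", "to JATS XML"])
kept = join_lines(["a light-", "weight tag set"], keep_hyphen={"light-weight"})
```

The space inserted at ordinary line breaks suits English text; Japanese text is normally joined without a space, so a real implementation would need a language-aware rule.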
Conclusion
  The bibliographic XML creation tool is provided.
    Easy settings, easy editing
    But it needs more improvements
  Utilization trend of the bibliographic XML creation tool
    From access analysis, some societies are using the tool at their publication interval (monthly / bi-monthly)
    790 articles from 33 journals were registered in 4 months
Contacts
  J-STAGE services
  Japan Science and Technology Agency
  Technical questions
  DIGITAL COMMUNICATIONS Co., Ltd.
  Antenna House, Inc.
  International sales