Reducing Costs and Expanding XML Submissions with PDF to JATS Conversion by Keishi KATOH ( 加藤圭志 ) DIGITAL COMMUNICATIONS Co Ltd.



Agenda
JATS-Con 2012. Copyright ©2012 DIGITAL COMMUNICATIONS
- About J-STAGE
  - Service overview
  - Positioning of the bibliographic XML creation tool
- Bibliographic XML creation tool
  - Tool workflow
  - Conversion from PDF to JATS XML
  - Demonstration of the tool
  - Conversion results analysis and future improvements

Brief introduction to J-STAGE and the bibliographic XML creation tool

About J-STAGE
- J-STAGE = "Japan Science and Technology Information Aggregator, Electronic"
- A major e-journal publishing platform of Japan, provided by the Japan Science and Technology Agency (JST)
- 1,684 titles, 2.4M articles (Oct 2012)
- J-STAGE3, the new platform, was launched in May 2012
  - Supports JATS XML submission (full text / bibliographic information)

Service positioning of J-STAGE
Copyright ©2012 Japan Science and Technology Agency. The brand names and product names are registered trademarks of their respective companies.

Bibliographic XML creation tool in J-STAGE
(Diagram labels: Academic Society; Article PDF; Bibliographic XML creation tool, marked "Here"; JATS bib XML; J-STAGE registration system; J-STAGE public system; Internet; "Users access from the internet".)

Reasons for the tool
- Is XML easy?
  - The XML specification is simple
  - The JATS tag suite is easy to understand
    - Domain-specific, lightweight tag set
    - Simple structures and attributes
  - Easily created from the author's data!!
- It is difficult for authors to create papers in XML format
  - Many different tools are used for writing papers
  - Printing / production process from writing to publishing
  - Printing companies' capability to work with XML
  - Working with XML requires higher skills

Why from PDF?
- Various tools and formats are used in publication
  - For writing: Word, TeX…
  - For printing:
    - DTP tools: InDesign, FrameMaker
    - Automated publishing systems: 3B2/APP, AH Formatter
  - For distributing: PDF, HTML, XML…
- Almost all academic societies have PDFs

Conversion workflow

Workflow with two phases
- Phase 1: Template pattern creation
- Phase 2: Registration of PDF and conversion to XML
(Diagram labels. Phase 1, template pattern creation: Sample Article PDF; Automatic Analyze; Template Pattern. Phase 2, XML conversion: Article PDF; Automatic Analyze; Template Pattern; XML Conversion; JATS XML.)
Details are shown in the demonstration.

Sources & Outputs
- Source: PDF
  - Versions 1.3 to 1.5
  - Fonts must be embedded; rasterized or scanned PDFs are not supported
  - No security permission flags set
- Output: valid JATS XML
  - Compliant with J-STAGE's XML submission guidelines
  - Bibliographic elements
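The version constraint on source PDFs can be illustrated with a small pre-check. This is only a sketch, not the tool's actual validation: it relies on the standard `%PDF-x.y` file header, and the names `pdf_header_version`, `is_supported_source`, and `SUPPORTED_VERSIONS` are made up for the example. It does not check font embedding or security flags, and it ignores the fact that a PDF's document catalog may declare a later version than the header.

```python
import re

# Versions accepted as sources according to the slide (1.3 to 1.5).
SUPPORTED_VERSIONS = {(1, 3), (1, 4), (1, 5)}


def pdf_header_version(data: bytes):
    """Return (major, minor) parsed from the %PDF-x.y header, or None."""
    # The header is expected near the start of the file, so only
    # the first kilobyte is searched.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported_source(data: bytes) -> bool:
    """True if the header version falls in the supported range."""
    return pdf_header_version(data) in SUPPORTED_VERSIONS
```

In practice the first bytes of an uploaded file would be passed in, for example `is_supported_source(open(path, "rb").read(1024))`.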

Demonstration

Demo contents
- Create a new template
  - Select a sample PDF for the template
  - Set the page margin
- Setting of the template pattern
  - Select the 'block'
  - Assign 'pseudo-JATS' elements to blocks
  - About Japanese-English contents
- PDF conversion using the template pattern
  - Converting process
  - XML editing
- (Empty template)

Practice in 30 sec
- 山: mountain
- 木: tree
- 鳥: bird
- 魚: fish
- 亀: tortoise

Create a new template
- Go to the "Create new template" function
- Select a sample PDF and submit it
- Set the page margin

Analyzing PDF
(Screenshot labels: Header / Footer region; to next page; Contents flow order; Contents region.)

Template settings
- Select a 'Block' for extracting information
- Assign a pseudo-JATS item to the block

Selecting block
- Block type
  - Paragraphs with heading
  - Paragraphs only
- Selecting methods
  - Font name, size, bold/italic
  - Text pattern
  - Page range, region on the page
- A block continues until the start of a block matched by another selection setting
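The selection methods above can be sketched as a small rule structure. Everything here is a hypothetical illustration rather than the tool's implementation: the `Span` and `BlockSelector` classes, their field names, and `split_into_blocks` are assumptions, and region-on-page and italic matching are left out for brevity. The sketch follows the rule that a block continues until a span matches a different selector.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One run of text extracted from a PDF page (hypothetical model)."""
    text: str
    font_name: str
    font_size: float
    bold: bool
    page: int


@dataclass
class BlockSelector:
    """Criteria that mark the start of a block; None means 'do not care'."""
    font_name: Optional[str] = None
    font_size: Optional[float] = None
    bold: Optional[bool] = None
    text_pattern: Optional[str] = None
    first_page: int = 1
    last_page: Optional[int] = None

    def matches(self, span: Span) -> bool:
        if self.font_name is not None and span.font_name != self.font_name:
            return False
        if self.font_size is not None and abs(span.font_size - self.font_size) > 0.01:
            return False
        if self.bold is not None and span.bold != self.bold:
            return False
        if self.text_pattern is not None and not re.search(self.text_pattern, span.text):
            return False
        if span.page < self.first_page:
            return False
        if self.last_page is not None and span.page > self.last_page:
            return False
        return True


def split_into_blocks(spans, selectors):
    """Group spans into (selector_name, [texts]) blocks.

    A block starts when a span matches a selector and continues until a
    span matches a different selector. Spans before the first match are
    dropped.
    """
    blocks = []
    for span in spans:
        name = next((n for n, s in selectors.items() if s.matches(span)), None)
        if name is not None and (not blocks or blocks[-1][0] != name):
            blocks.append((name, [span.text]))
        elif blocks:
            blocks[-1][1].append(span.text)
    return blocks
```

A template would then amount to a dictionary of named selectors, for example one keyed `"abstract"` with `bold=True` and `text_pattern=r"^Abstract"`.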

Assign a pseudo-JATS item
- A pseudo-JATS item is an item that does not correspond to a single JATS XML element
  - trans-title and title
  - kwd-group and kwd
- Items are provided for both English and Japanese

Configure pseudo-JATS item
- Content region
  - Whole block
  - Select by condition
    - With heading
    - With inline heading
- Pseudo-JATS specific settings
  - Dividing keywords
  - Linking contrib authors to institutions
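The "dividing keywords" setting can be illustrated with a short sketch that turns one extracted keyword block into a JATS `kwd-group` containing one `kwd` per keyword. The function name `build_kwd_group`, the inline-heading pattern, and the separator set are assumptions made for the example; the tool's real splitting rules are not described in the slides.

```python
import re
import xml.etree.ElementTree as ET

# Clark-notation name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_kwd_group(block_text, lang,
                    heading_pattern=r"^\s*Key\s*words?\s*[:：]?\s*",
                    separators=r"[,;、，；]"):
    """Strip an inline heading, split on separators, and build <kwd-group>."""
    # Remove an inline heading such as "Keywords:" (assumed pattern).
    body = re.sub(heading_pattern, "", block_text, flags=re.IGNORECASE)
    group = ET.Element("kwd-group", {XML_LANG: lang})
    for part in re.split(separators, body):
        keyword = part.strip().rstrip(".")
        if keyword:  # skip empty pieces left by trailing separators
            ET.SubElement(group, "kwd").text = keyword
    return group


if __name__ == "__main__":
    group = build_kwd_group("Keywords: JATS, PDF conversion; J-STAGE.", "en")
    print(ET.tostring(group, encoding="unicode"))
```

Japanese keyword blocks would need their own heading pattern passed as `heading_pattern`; the default separators already include the full-width comma and the ideographic comma.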

Preview of conversion
- The preview uses the design of the J-STAGE public system
- Some XML structure information is also shown

Workflow with two phases (again)
- Phase 1: Template pattern creation
- Phase 2: Registration of PDF and conversion to XML

Convert and edit articles
- Upload PDFs and select the template
- Wait a few seconds
- Check and edit the extracted data
- Get XML!!

Conversion results
- Conversion accuracy for 10 journals, about 10 articles each

                     Automatic recognition rate
Journal   Language   Avg     Min     Max      Number of articles
EL        J/E        91%     58%     100%     10
JO        J/E        97%     89%     100%     10
JE        J/E        98%     95%     99%      10
CL        E          93%     86%     100%     10
TR        E          90%     50%     100%     10
JI        J/E        91%     83%     96%      8
NI        J          91%     83%     100%     10
BU        J/E        93%     75%     98%      8
AD        E          100%    97%     100%     7
PJ        E          98%     90%     100%     9

- Errata and essays are excluded from the evaluation.
- Recognition failures occurred in references and keywords.
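The per-journal figures can be combined into a single article-weighted average with a few lines of arithmetic. This is a derived illustration, not a number reported in the presentation: the journal codes and values are read from the results slide, the name `weighted_mean_rate` is made up for the example, and because the published averages are already rounded to whole percentages the result is only approximate.

```python
# (journal code, average recognition rate in %, number of articles),
# as read from the results slide.
RESULTS = [
    ("EL", 91, 10), ("JO", 97, 10), ("JE", 98, 10), ("CL", 93, 10),
    ("TR", 90, 10), ("JI", 91, 8), ("NI", 91, 10), ("BU", 93, 8),
    ("AD", 100, 7), ("PJ", 98, 9),
]


def weighted_mean_rate(rows):
    """Average recognition rate weighted by each journal's article count."""
    total_articles = sum(count for _, _, count in rows)
    weighted_sum = sum(avg * count for _, avg, count in rows)
    return weighted_sum / total_articles


if __name__ == "__main__":
    print(f"{sum(c for _, _, c in RESULTS)} articles, "
          f"weighted mean {weighted_mean_rate(RESULTS):.1f}%")
```

Weighting by article count keeps the smaller samples (7 to 9 articles) from counting as much as the journals evaluated on 10 articles.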

Future improvements
- Improvement of the PDF analyzer engine
  - Recognition of text blocks
    - Columns and sequence of text flow
  - Reconstruction algorithms with text content
    - Dehyphenation and space insertion
- JATS context recognition ability
  - Template setting pattern
  - Additional bibliographic elements
- Conversion of full text into JATS XML
  - Extracting images and vector graphics
  - Equations
* Details are undecided at this time.
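Dehyphenation and space insertion, listed above as planned improvements, can be illustrated with a naive sketch. It is not the analyzer engine's algorithm: the function name `dehyphenate` and the rule used (a trailing hyphen after a lowercase letter, followed by a line starting in lowercase, is treated as a line-break hyphen) are assumptions. The rule will wrongly merge genuine hyphenated compounds that happen to break at the hyphen, which is why a production engine would likely need a dictionary or language model.

```python
import re


def dehyphenate(lines):
    """Join extracted text lines into one string.

    Words split by an end-of-line hyphen are merged; all other line
    breaks become a single space.
    """
    text = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        if re.search(r"[a-z]-$", text) and line[:1].islower():
            # Treat the trailing hyphen as a line-break hyphen.
            text = text[:-1] + line
        elif text:
            text += " " + line  # space insertion between lines
        else:
            text = line
    return text
```

For example, the lines `"Conversion of biblio-"` and `"graphic data from PDF"` are joined so that the split word becomes `bibliographic`.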

Conclusion
- The bibliographic XML creation tool is provided.
  - Easy settings, easy editing
  - But it needs more improvements
- Utilization trend of the bibliographic XML creation tool
  - Access analysis shows that some societies use the tool at their publication interval (monthly / bi-monthly)
  - 790 articles from 33 journals were registered in 4 months

Contacts
- J-STAGE services: Japan Science and Technology Agency
- Technical questions: DIGITAL COMMUNICATIONS Co., Ltd.
- International sales: Antenna House, Inc.