Reducing Costs and Expanding XML Submissions with PDF to JATS Conversion by Keishi KATOH (加藤圭志), DIGITAL COMMUNICATIONS Co., Ltd.

1 Reducing Costs and Expanding XML Submissions with PDF to JATS Conversion
by Keishi KATOH (加藤圭志), DIGITAL COMMUNICATIONS Co., Ltd.

2 Agenda
JATS-Con 2012 Copyright ©2012 DIGITAL COMMUNICATIONS
- About J-STAGE
- Service overview
- Positioning of Bibliographic XML creation tool
- Bibliographic XML creation tool
- Tool workflow
- Conversion from PDF to JATS XML
- Demonstration of the tool
- Conversion results analysis and future improvements

3 Brief introduction to J-STAGE and the bibliographic XML creation tool

4 About J-STAGE
- J-STAGE = "Japan Science and Technology Information Aggregator, Electronic"
- A major e-journal publishing platform in Japan, provided by the Japan Science and Technology Agency (JST)
- 1,684 titles, 2.4M articles (Oct 2012)
- www.jstage.jst.go.jp
- J-STAGE3, the new platform, was launched in May 2012
  - With JATS XML submission (full text / bibliographic info)

5 Service positioning of J-STAGE
Copyright ©2012 Japan Science and Technology Agency. The brand names and product names are registered trademarks of their respective companies.

6 Bibliographic XML creation tool in J-STAGE
[Diagram labels: Academic Society; Internet; Article PDF; JATS bib XML; Bibliographic XML creation tool (marked "Here"); J-STAGE registration system; J-STAGE public system; Users access from the internet]

7 Reasons for the tool
- Is XML easy?
  - XML spec is simple
  - JATS tag suite is easily understood
  - Domain-specific, lightweight tag set
  - Easy structures and attributes
  - Easily created from the author's data!!
- Difficulty for authors to create papers in XML format
  - Many different tools are used for writing papers
  - Printing / production process from writing to publishing
  - Printing company's capabilities to work with XML
  - Using XML requires higher skills

8 Why from PDF?
- Various tools and formats in publication
  - For writing: Word, TeX…
  - For printing:
    - DTP tools: InDesign, FrameMaker
    - Automated publishing systems: 3B2/APP, AH Formatter
  - For distributing: PDF, HTML, XML…
- Almost all academic societies have PDFs

9 Conversion workflow

10 Workflow with two phases
- Phase 1: Template pattern creation
- Phase 2: Registration of PDF and conversion to XML
[Diagram: Phase 1 (template pattern creation): sample article PDF -> automatic analysis -> template pattern. Phase 2 (XML conversion): article PDFs -> automatic analysis -> XML conversion with the template pattern -> JATS XML files]
Details are shown in the demonstration.

11 Sources & Outputs
- Source: PDF
  - Version 1.3 to 1.5
  - Fonts are embedded; rasterized or scanned PDFs are not supported
  - No security permission flags set
- Output: valid JATS XML
  - Compliant with J-STAGE's XML submission guideline
  - Bibliographic elements
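
The source requirements above can be approximated with a quick pre-check before a PDF is uploaded. The following is a minimal sketch, not the tool's actual validation: the function names are hypothetical, the version is read only from the `%PDF-x.y` file header, and the presence of an `/Encrypt` entry is used as a crude stand-in for "security settings are applied". It does not check font embedding or detect scanned pages.

```python
import re
from typing import Optional

# Versions named on the slide (1.3 to 1.5).
SUPPORTED_VERSIONS = {"1.3", "1.4", "1.5"}


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version declared in the %PDF-x.y header, or None if absent."""
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None


def looks_convertible(data: bytes) -> bool:
    """Heuristic pre-check: supported header version and no /Encrypt entry."""
    if pdf_header_version(data) not in SUPPORTED_VERSIONS:
        return False
    # Assumption: any /Encrypt key means security settings are present.
    return b"/Encrypt" not in data
```

A caller would read the file in binary mode, for example `looks_convertible(open(path, "rb").read())`, and treat a `False` result as a prompt to inspect the PDF manually rather than as a definitive rejection.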

12 Demonstration

13 Demo contents
- Create new template
  - Select sample PDF for template
  - Set page margin
- Setting of template pattern
  - Select the 'block'
  - Assign 'pseudo-JATS' elements to blocks
  - About Japanese-English contents
- PDF conversion using template pattern
  - Converting process
  - XML editing
- (Empty template)

14 Practice in 30 sec
- 山: mountain
- 木: tree
- 鳥: bird
- 魚: fish
- 亀: tortoise

15 Create a new template
- Go to the "Create new template" function
- Select a sample PDF and submit
- Set the page margin

16 Analyzing PDF
[Figure labels: Header / Footer region; Contents region; Contents flow order; to next page]

17 Template settings
- Select a 'block' for extracting information
- Assign a pseudo-JATS item to the block

18 Selecting block
- Block type
  - Paragraphs with heading
  - Paragraphs only
- Selecting methods
  - Font name, size, bold/italic
  - Text pattern
  - Page range, region on the page
- A block continues until a block matching another selection setting begins
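
The selection rules on this slide can be illustrated with a small sketch. It is an illustration under assumptions, not the tool's implementation: `Line`, `Selector`, and `split_blocks` are hypothetical names, text is assumed to arrive as lines already annotated with font attributes, and region-on-page selection is left out for brevity. The last rule (a block runs until another selection setting matches) is what `split_blocks` models.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Line:
    """One extracted text line with the attributes the slide mentions."""
    text: str
    font: str
    size: float
    bold: bool
    page: int


@dataclass
class Selector:
    """A selection setting; criteria left as None are ignored."""
    name: str
    font: Optional[str] = None
    size: Optional[float] = None
    bold: Optional[bool] = None
    pattern: Optional[str] = None
    pages: Optional[range] = None

    def matches(self, line: Line) -> bool:
        if self.font is not None and line.font != self.font:
            return False
        if self.size is not None and line.size != self.size:
            return False
        if self.bold is not None and line.bold != self.bold:
            return False
        if self.pages is not None and line.page not in self.pages:
            return False
        if self.pattern is not None and not re.search(self.pattern, line.text):
            return False
        return True


def split_blocks(lines: List[Line], selectors: List[Selector]) -> List[Tuple[str, List[str]]]:
    """Group lines into named blocks; a block ends when a different selector matches."""
    blocks: List[Tuple[str, List[str]]] = []
    current: Optional[Tuple[str, List[str]]] = None
    for line in lines:
        hit = next((s for s in selectors if s.matches(line)), None)
        if hit is not None and (current is None or hit.name != current[0]):
            current = (hit.name, [])
            blocks.append(current)
        if current is not None:  # lines before the first match are skipped
            current[1].append(line.text)
    return blocks
```

In this sketch a line that matches no selector simply extends the current block, which is one plausible reading of "block continues until other selection settings' block".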

19 Assign a pseudo-JATS item
- A pseudo-JATS item is an item that does not correspond to a single JATS XML element
  - trans-title and title
  - kwd-group and kwd
- Items for English and Japanese

20 Configure pseudo-JATS item
- Content region
  - Whole block
  - Select by condition
    - With heading
    - With inline heading
- Pseudo-JATS specific setting
  - Dividing keywords
  - contrib-author to institution
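
As an illustration of why a keyword item is "pseudo-JATS", the sketch below turns one extracted keyword block into a JATS `kwd-group` element containing several `kwd` children. It is a minimal sketch under assumptions: the function name, the inline-heading pattern, and the separator characters are hypothetical defaults, not the tool's actual settings.

```python
import re
import xml.etree.ElementTree as ET

# Clark-notation name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def keywords_to_kwd_group(
    block_text: str,
    lang: str,
    heading: str = r"^\s*Key\s*words?\s*[::]\s*",  # assumed inline heading
    separators: str = r"[,;、,;]",                # assumed dividing characters
) -> ET.Element:
    """Strip the inline heading, divide the keywords, and build <kwd-group>."""
    body = re.sub(heading, "", block_text, flags=re.IGNORECASE)
    group = ET.Element("kwd-group", {XML_LANG: lang})
    for part in re.split(separators, body):
        word = part.strip().rstrip(".")
        if word:
            ET.SubElement(group, "kwd").text = word
    return group


group = keywords_to_kwd_group("Keywords: PDF, JATS; XML conversion.", "en")
print(ET.tostring(group, encoding="unicode"))
```

For a Japanese block the caller would pass `lang="ja"` and a matching `heading` pattern, which mirrors the slide's separate items for English and Japanese.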

21 Preview of conversion
- Preview with the design of the J-STAGE public system
- Some XML structure information

22 Workflow with two phases (again)
- Phase 1: Template pattern creation
- Phase 2: Registration of PDF and conversion to XML
[Diagram: Phase 1 (template pattern creation): sample article PDF -> automatic analysis -> template pattern. Phase 2 (XML conversion): article PDFs -> automatic analysis -> XML conversion with the template pattern -> JATS XML files]
Details are shown in the demonstration.

23 Convert and edit articles
- Upload PDFs and select the template
- Wait a few seconds
- Check and edit the extracted data
- Get XML!!

24 Conversion results
- Conversion accuracy across 10 journals, about 10 articles each

Automatic recognition rate:

Journal  Language  Avg   Min   Max   Number of articles
EL       J/E       91%   58%   100%  10
JO       J/E       97%   89%   100%  10
JE       J/E       98%   95%   99%   10
CL       E         93%   86%   100%  10
TR       E         90%   50%   100%  10
JI       J/E       91%   83%   96%   8
NI       J         91%   83%   100%  10
BU       J/E       93%   75%   98%   8
AD       E         100%  97%   100%  7
PJ       E         98%   90%   100%  9

Errata / essays are excluded from the evaluation.
Recognition failures occur in references and keywords.
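
The per-journal figures on this slide can be combined into a rough overall rate. The sketch below is only an arithmetic illustration: weighting each journal's average by its number of articles is an assumption, the slide's averages are already rounded, and the function names are hypothetical, so the result is approximate and is not a figure reported by the author.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (journal, language, avg %, min %, max %, number of articles), as listed on the slide.
RESULTS: List[Tuple[str, str, int, int, int, int]] = [
    ("EL", "J/E", 91, 58, 100, 10),
    ("JO", "J/E", 97, 89, 100, 10),
    ("JE", "J/E", 98, 95, 99, 10),
    ("CL", "E", 93, 86, 100, 10),
    ("TR", "E", 90, 50, 100, 10),
    ("JI", "J/E", 91, 83, 96, 8),
    ("NI", "J", 91, 83, 100, 10),
    ("BU", "J/E", 93, 75, 98, 8),
    ("AD", "E", 100, 97, 100, 7),
    ("PJ", "E", 98, 90, 100, 9),
]


def weighted_average(rows: List[Tuple[str, str, int, int, int, int]]) -> float:
    """Average recognition rate, weighted by each journal's article count."""
    total_articles = sum(row[5] for row in rows)
    return sum(row[2] * row[5] for row in rows) / total_articles


def by_language(rows: List[Tuple[str, str, int, int, int, int]]) -> Dict[str, float]:
    """Article-weighted average per language group (J/E, E, J)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return {lang: weighted_average(group) for lang, group in groups.items()}


print(f"overall: {weighted_average(RESULTS):.1f}%")
for lang, rate in by_language(RESULTS).items():
    print(f"{lang}: {rate:.1f}%")
```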

25 Future improvements
- Improvement of PDF analyzer engine
  - Recognition of text blocks
  - Columns and sequence of text flow
  - Reconstruction algorithms with text content
  - Dehyphenation and space insertion
- JATS context recognizing ability
  - Template setting pattern
  - Additional bibliographic elements
- For full text into JATS XML
  - Extract images, vector graphics
  - Equations
* Details are undecided at this time.
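
"Dehyphenation and space insertion" refers to rebuilding running text from the separate lines found in a PDF. The sketch below shows one naive heuristic for English text, offered as an assumption rather than the planned algorithm: a trailing hyphen followed by a lowercase continuation is treated as a line-break hyphen unless the compound appears in a caller-supplied list, and every other line break becomes a space. The function name and the `hyphenated_words` parameter are hypothetical.

```python
import re
from typing import FrozenSet, Iterable


def dehyphenate(lines: Iterable[str], hyphenated_words: FrozenSet[str] = frozenset()) -> str:
    """Join extracted lines into running text, undoing line-break hyphenation."""
    text = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if not text:
            text = line
            continue
        prev = re.search(r"([A-Za-z]+)-$", text)  # word fragment before a trailing hyphen
        nxt = re.match(r"[a-z]+", line)           # lowercase continuation on the next line
        if prev and nxt:
            compound = f"{prev.group(1)}-{nxt.group(0)}".lower()
            if compound in hyphenated_words:
                text += line              # genuine compound: keep the hyphen
            else:
                text = text[:-1] + line   # line-break hyphen: drop it
        else:
            text += " " + line            # ordinary line break: insert a space
    return text
```

Japanese text would need a different rule, since inserting a space at every line break is wrong for it; this sketch does not attempt that case.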

26 Conclusion
- The bibliographic XML creation tool is provided.
  - Easy settings, easy editing
  - But more improvements are needed
- Utilization trend of the bibliographic XML creation tool
  - Access analysis shows that some societies use the tool at their publication interval (monthly / bi-monthly)
  - 790 articles from 33 journals were registered in 4 months

27 Contacts
- J-STAGE services
  - Japan Science and Technology Agency
  - contact@jstage.jst.go.jp
  - www.jstage.jst.go.jp
- Technical questions
  - DIGITAL COMMUNICATIONS Co., Ltd.
  - dc-eigyou@sgml-xml.jp
  - www.sgml-xml.jp
- Antenna House, Inc., International sales
  - info@antennahouse.com
  - +1 302-427-2456

