An exercise in conversion Dirk eHumanities 2012-01-26.

Presentation transcript:

An exercise in conversion Dirk eHumanities

- the task
- the method
- the lessons
- the result (demo)

JapAM Descartes Correspondence
- ca. 700 letters
- 69,237 lines
- 600 formulas
- 4.2 MB (without the 311 pictures)

CKCC corpus Descartes
XML: Text Encoding Initiative (TEI)
~ 35,000 elements, of which
- 7,200 metadata
- 7,700 paragraphs
- 6,200 formulas
- 6,000 text-formattings
- 4,200 structure
- 2,900 page-breaks
- 538 images

- observation
- non-algorithmic changes
- consolidation
- proofs

use digital equipment:
- your text-editor
- your scripting language
- your regular expressions

replace =(.*?)$ by match1 ???
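The replacement on this slide can be tried out in a few lines. The sketch below is in Python rather than the Perl of convert.pl, and it assumes that "match1" means the first capture group. The function name and the sample lines are invented for illustration and are not taken from the Descartes corpus.

```python
import re

# Pattern from the slide: '=' followed by a lazy group up to the end of the line.
# MULTILINE makes '$' match at the end of every line, not only at the end of the text.
PATTERN = re.compile(r"=(.*?)$", re.MULTILINE)


def replace_by_match1(text: str) -> str:
    """Replace '=...' up to the end of each line by the captured group alone."""
    return PATTERN.sub(r"\1", text)


# Hypothetical sample input.
sample = "x=abc\nno marker here\ny=p=q"
result = replace_by_match1(sample)
print(result)
```

Because `$` forces the match to reach the end of the line, the lazy `.*?` still captures everything after the first `=` on that line. In effect the replacement only deletes that first `=`, which may be the point of the question marks on the slide.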

[Diagram: conversion process. Labels: ..., formulas, meta, closers, ...; canonical, initial, corrected, improved, checked; metadata; combining]

convert.pl
- 100 KB of program code text
- = 25 densely typed pages
- = 3427 lines, of which 2175 real code lines
- Code/Input = 1/32

1/3 of the tasks need 2/3 of the code

task group                   (tasks)   share of code   share of run time
formulas                     (2)       37 %            29 %
headers, openers, closers    (3)       16 %             6 %
meta and images              (3)       11 %            10 %
total run time               (25)                      40 sec

1. Unicode is your friend
2. Split into many subtasks
3. task = configuration + workflow
4. Count and check
5. Performance matters
6. Do not give up automation

(2a) subtasks that can be run separately
(2b) subtasks that can be reordered easily
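Lessons 2 to 4 can be sketched together: each subtask is only a small configuration, one shared workflow runs it, and every run counts its hits and checks them against an expectation. This is a minimal Python sketch under stated assumptions, not the structure of convert.pl. The names `Subtask`, `run_subtask` and `run_pipeline`, the source markup (`[p. 12]`, `_..._`) and the sample text are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subtask:
    """Configuration of one subtask (hypothetical structure)."""
    name: str
    pattern: str                    # what to look for
    replacement: str                # what to put in its place
    expected: Optional[int] = None  # expected number of hits, if known


def run_subtask(task: Subtask, text: str) -> tuple[str, int]:
    """Shared workflow: apply the configured replacement, then count and check."""
    new_text, hits = re.subn(task.pattern, task.replacement, text, flags=re.MULTILINE)
    if task.expected is not None and hits != task.expected:
        raise ValueError(f"{task.name}: expected {task.expected} hits, got {hits}")
    return new_text, hits


def run_pipeline(tasks: list[Subtask], text: str) -> tuple[str, dict[str, int]]:
    """Run the subtasks in the given order and collect the hit counts."""
    counts: dict[str, int] = {}
    for task in tasks:
        text, counts[task.name] = run_subtask(task, text)
    return text, counts


# Hypothetical source markup, converted to TEI-style <pb/> and <hi> elements.
tasks = [
    Subtask("page-breaks", r"^\[p\. (\d+)\]$", r'<pb n="\1"/>', expected=2),
    Subtask("italics", r"_(.+?)_", r'<hi rend="italic">\1</hi>', expected=1),
]
sample = "[p. 12]\nSome _Latin_ text\n[p. 13]"
converted, counts = run_pipeline(tasks, sample)
print(converted)
print(counts)
```

Calling `run_subtask(tasks[1], sample)` runs one subtask on its own (2a), and passing the list in a different order reorders the pipeline without touching the workflow code (2b).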

- was 30+ seconds, is now 2.07 seconds
- many new subtasks based on the same template (gain = 15 * 30 sec = 7.5 min per run)
- many, many runs before everything is OK (gain = 100 * 7.5 min = 12.5 hours CPU time)
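The gain figures can be checked with a short calculation. The sketch assumes that 15 is the number of subtasks built on the template and 100 the number of runs. Like the slide, it rounds the saving per subtask to 30 seconds and ignores the remaining 2.07 seconds.

```python
# Figures from the slide; reading 15 as subtasks and 100 as runs is an assumption.
seconds_saved_per_subtask = 30
subtasks = 15
runs = 100

gain_per_run_min = subtasks * seconds_saved_per_subtask / 60   # minutes saved per run
total_gain_hours = runs * gain_per_run_min / 60                # hours of CPU time saved overall
print(gain_per_run_min, total_gain_hours)
```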

we used a lot of expert knowledge, which has all been transferred to
- the source
- consolidated extra inputs
so the conversion is still repeatable and modifiable

[Diagram: CKCC conversion program. Labels: source, formulas, meta, closers, results; corrections, hints]