An exercise in conversion Dirk eHumanities 2012-01-26.

1 An exercise in conversion Dirk Roorda @ eHumanities 2012-01-26

2 Outline
- the task
- the method
- the lessons
- the result (demo)

3 JapAM Descartes Correspondence
- ca. 700 letters
- 69,237 lines
- 600 formulas
- 4.2 MB (without the 311 pictures)

5 CKCC corpus: Descartes in XML, Text Encoding Initiative (TEI)
~35,000 elements, of which:
- 7,200 metadata
- 7,700 paragraphs
- 6,200 formulas
- 6,000 text formatting
- 4,200 structure
- 2,900 page breaks
- 538 images
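
Element counts like these are the kind of figures the lesson "count and check" relies on. A minimal Python sketch of how such a census could be taken with the standard library's `xml.etree.ElementTree`, assuming a TEI file in the usual TEI namespace; the tag-to-category mapping `CATEGORY` is hypothetical and only illustrates the idea, it is not the CKCC classification.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical mapping from TEI element names to the categories on the slide;
# anything not listed ends up in "other".
CATEGORY = {
    "p": "paragraphs",
    "formula": "formulas",
    "hi": "text formatting",
    "pb": "page breaks",
    "figure": "images",
    "div": "structure",
    "head": "structure",
}


def count_elements(xml_text):
    """Count all elements by local name and by (hypothetical) category."""
    root = ET.fromstring(xml_text)
    by_tag = Counter()
    for elem in root.iter():
        # Strip the namespace: "{http://www.tei-c.org/ns/1.0}p" -> "p"
        tag = elem.tag.split("}", 1)[-1]
        by_tag[tag] += 1
    by_category = Counter()
    for tag, n in by_tag.items():
        by_category[CATEGORY.get(tag, "other")] += n
    return by_tag, by_category


sample = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div>'
          '<p>a<pb n="2"/>b</p><p><formula>x</formula></p>'
          '</div></body></text></TEI>')
by_tag, by_category = count_elements(sample)
print(sum(by_tag.values()), dict(by_category))
```

Comparing `sum(by_tag.values())` against the expected total (here ~35,000) after every run is a cheap way to notice that a subtask silently dropped or duplicated elements.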

9 Observation
- non-algorithmic changes
- consolidation
- proofs

10 Use digital equipment:
- your text editor
- your scripting language
- your regular expressions

12 replace `=(.*?)$` by match 1 ??? Aargh!#@\€]
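
Part of the frustration on this slide is that each tool writes "the first matched group" differently in the replacement: `$1` in Perl, `\1` in sed and Python. A minimal Python sketch of the same replacement, assuming the intent is simply to substitute each match with its first group; note that the lazy `.*?` still runs to the end of the line here, because it is anchored by `$`.

```python
import re

# Pattern from the slide: "=" followed by the rest of the line (group 1).
# re.MULTILINE makes "$" match at every line end, not only at the end of the text.
PATTERN = re.compile(r"=(.*?)$", re.MULTILINE)


def replace_by_group(text):
    # Python writes the first group as \1 (or \g<1>) in the replacement;
    # Perl would write $1 instead.
    return PATTERN.sub(r"\1", text)


print(replace_by_group("a=1\nb=two"))  # prints "a1" and "btwo" on separate lines
```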

17 [Diagram: conversion process. Labels: ... formulas, meta, closers ...; canonical, initial, corrected, improved, checked; metadata combining]

22 convert.pl
- 100 KB of program code text = 25 densely typed pages
- 3,427 lines, of which 2,175 are real code lines
- code/input = 1/32 (2,175 code lines against 69,237 input lines)
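
A minimal Python sketch of how "real" code lines and the code/input ratio could be computed, assuming that blank lines and lines consisting only of a `#` comment (Perl's comment syntax) do not count. The slide does not say how the 2,175 was obtained; POD blocks and `#` inside strings are ignored here for simplicity.

```python
def count_lines(source):
    """Return (total lines, real code lines) for Perl-like source text.

    Assumption: a 'real' code line is neither blank nor a pure '#' comment.
    """
    lines = source.splitlines()
    code = [ln for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]
    return len(lines), len(code)


def ratio(code_lines, input_lines):
    """Express code size relative to input size as 1/N, N rounded."""
    return f"1/{round(input_lines / code_lines)}"


perl = "#!/usr/bin/perl\nuse strict;\n\n# read input\nmy $x = 1;\n"
print(count_lines(perl))   # (5, 2)
print(ratio(2175, 69237))  # 1/32
```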

24 1/3 of the tasks need 2/3 of the code
(8 of the 25 subtasks account for 64 % of the code)

task group                  subtasks   code   run time
formulas                           2   37 %       29 %
headers, openers, closers          3   16 %        6 %
meta and images                    3   11 %       10 %
total                             25            40 sec
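
Run-time shares like those in the table can be obtained by timing each subtask separately. A minimal Python sketch, assuming each subtask is a plain function from text to text; the subtask names and bodies below are hypothetical stand-ins, not the steps of convert.pl.

```python
import time


def run_timed(subtasks, text):
    """Run subtasks in order; return the result and each one's % of run time."""
    timings = {}
    for name, func in subtasks:
        start = time.perf_counter()
        text = func(text)
        timings[name] = time.perf_counter() - start
    total = sum(timings.values())
    # Guard against a zero total on extremely fast runs
    shares = {name: (t / total * 100 if total else 0.0) for name, t in timings.items()}
    return text, shares


# Hypothetical subtasks standing in for "formulas", "closers", etc.
subtasks = [
    ("formulas", lambda s: s.replace("^2", "\u00b2")),
    ("closers", lambda s: s.rstrip()),
]
result, shares = run_timed(subtasks, "x^2 + y^2   ")
for name, pct in shares.items():
    print(f"{name:10s} {pct:5.1f} %")
```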

25 The lessons
1. Unicode is your friend
2. Split into many subtasks
3. Task = configuration + workflow
4. Count and check
5. Performance matters
6. Do not give up automation
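
One concrete reason Unicode helps: the same accented letter can be stored as one precomposed code point or as a base letter plus a combining mark, and comparisons and regular expressions only behave consistently once the text is normalized. A minimal Python sketch using the standard `unicodedata` module; choosing NFC is an assumption, the slides do not say which normalization form (if any) the conversion used.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that equal-looking strings compare equal."""
    return unicodedata.normalize("NFC", text)


precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT
print(precomposed == decomposed)            # False
print(nfc(precomposed) == nfc(decomposed))  # True
print(unicodedata.name(nfc(decomposed)))    # LATIN SMALL LETTER E WITH ACUTE
```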

27 Lesson 2: split into many subtasks
(2a) that can be run separately
(2b) that can be reordered easily
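
Slides 25 and 27 describe a conversion split into subtasks that can be run separately and reordered, with each task expressed as configuration plus workflow. A minimal Python sketch of one way to get that structure, assuming each subtask is a function from text to text registered under its name, and the workflow is just an ordered list of names; the subtasks `normalize`, `strip_trailing` and `mark_paragraphs` are hypothetical placeholders, not the steps of convert.pl.

```python
import unicodedata
from typing import Callable, Dict, List

# Registry of subtasks: each takes the current text and returns a new text.
SUBTASKS: Dict[str, Callable[[str], str]] = {}


def subtask(func):
    """Register a function as a subtask under its own name."""
    SUBTASKS[func.__name__] = func
    return func


@subtask
def normalize(text: str) -> str:
    return unicodedata.normalize("NFC", text)


@subtask
def strip_trailing(text: str) -> str:
    return "\n".join(line.rstrip() for line in text.split("\n"))


@subtask
def mark_paragraphs(text: str) -> str:
    blocks = [b for b in text.split("\n\n") if b.strip()]
    return "\n".join(f"<p>{b}</p>" for b in blocks)


# Configuration: the workflow is an ordered list of subtask names.
WORKFLOW: List[str] = ["normalize", "strip_trailing", "mark_paragraphs"]


def run(text: str, workflow: List[str] = WORKFLOW) -> str:
    """Run the named subtasks in the given order."""
    for name in workflow:
        text = SUBTASKS[name](text)
    return text


print(run("e\u0301 first  \n\nsecond"))
```

Running one subtask alone is `SUBTASKS["strip_trailing"](text)`; reordering is a matter of editing `WORKFLOW` (or passing another list to `run`), without touching any subtask code.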

30 Run time: was 30+ seconds, is now 2.07 seconds
- many new subtasks are based on the same template (gain ≈ 15 × 30 s = 7.5 min per run)
- many, many runs are needed before everything is OK (gain ≈ 100 × 7.5 min = 12.5 hours of CPU time)
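
The slide does not say what made the template faster. One common technique for regex-heavy conversions, offered here only as an illustrative assumption, is a subtask template that joins all its literal replacements into a single alternation, so the text is scanned once instead of once per rule. A minimal Python sketch; `make_subtask` and its table format are hypothetical.

```python
import re
from typing import Callable, Dict


def make_subtask(table: Dict[str, str]) -> Callable[[str], str]:
    """Template: build a replacement subtask from a table of literal strings.

    All keys are joined into one alternation, so the text is scanned once
    instead of once per table entry. Longer keys are tried first, so that
    "<<" wins over "<" at the same position.
    """
    if not table:
        return lambda text: text  # nothing to replace
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def subtask(text: str) -> str:
        # Look up the replacement for whichever key matched
        return pattern.sub(lambda m: table[m.group(0)], text)

    return subtask


# Two hypothetical subtasks built from the same template
fix_quotes = make_subtask({"``": "\u201c", "''": "\u201d"})
fix_guillemets = make_subtask({"<<": "\u00ab", ">>": "\u00bb", "<": "\u2039"})

print(fix_quotes("``Cogito''"))
print(fix_guillemets("<<ergo>> <sum"))
```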

31 We used a lot of expert knowledge, all of which has been transferred to
- the source
- consolidated extra inputs
so the conversion is still repeatable and modifiable.
[Diagram labels: source, formulas, meta, closers, results, corrections, hints; CKCC conversion program]
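
One way to keep expert corrections outside the program while keeping the conversion repeatable is a separate corrections file that is applied on every run and fails loudly when a correction no longer matches the source. A minimal Python sketch; the tab-separated format `line<TAB>old<TAB>new` is a hypothetical choice, not the CKCC format.

```python
from typing import List, Tuple

Correction = Tuple[int, str, str]


def parse_corrections(spec: str) -> List[Correction]:
    """Parse lines of the form 'line_no<TAB>old<TAB>new' (1-based line numbers)."""
    corrections = []
    for raw in spec.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue  # skip blank lines and comments
        line_no, old, new = raw.split("\t", 2)
        corrections.append((int(line_no), old, new))
    return corrections


def apply_corrections(lines: List[str], corrections: List[Correction]) -> List[str]:
    """Apply each correction once; raise if the old text is not found exactly once."""
    fixed = list(lines)
    for line_no, old, new in corrections:
        current = fixed[line_no - 1]
        if current.count(old) != 1:
            raise ValueError(f"line {line_no}: expected exactly one {old!r} in {current!r}")
        fixed[line_no - 1] = current.replace(old, new)
    return fixed


source = ["Je pense donc je suis", "Monsieur Mersene"]
spec = "# corrections\n2\tMersene\tMersenne\n"
print(apply_corrections(source, parse_corrections(spec)))
```

Because a stale correction raises an error instead of being silently skipped, a changed source is noticed on the next run rather than after publication.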

