Published by Neil Hicks. Modified over 9 years ago.
1
An exercise in conversion
Dirk Roorda @ eHumanities, 2012-01-26
2
- the task
- the method
- the lessons
- the result
  ◦ demo
3
JapAM Descartes Correspondence
- ca. 700 letters
- 69,237 lines
- 600 formulas
- 4.2 MB (without the 311 pictures)
5
CKCC corpus Descartes
XML: Text Encoding Initiative (TEI)
~ 35,000 elements, of which
- 7,200 metadata
- 7,700 paragraphs
- 6,200 formulas
- 6,000 text-formattings
- 4,200 structure
- 2,900 page-breaks
- 538 images
9
- observation
- non-algorithmic changes
- consolidation
- proofs
10
use digital equipment:
- your text-editor
- your scripting language
- your regular expressions
12
replace =(.*?)$
by match1 ???
Aargh!#@\€]
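The slide's frustration is about how to refer to the captured group in the replacement. A minimal sketch of such a search-and-replace follows. It is written in Python for illustration only (the presentation's own program, convert.pl, is Perl), and the sample text and the bracketed output format are invented, not taken from the corpus.

```python
import re

# Hypothetical sample input; the real letters are not reproduced here.
text = "formula=x^2+y^2\nnote=see page 3"

# '=(.*?)$' captures everything between '=' and the end of the line.
# re.MULTILINE makes '$' match at the end of every line,
# not only at the end of the whole text.
pattern = re.compile(r"=(.*?)$", re.MULTILINE)

# In the replacement string, \g<1> stands for the first captured group.
result = pattern.sub(r" [\g<1>]", text)
print(result)
```

The explicit `\g<1>` form avoids ambiguity when the group reference is followed by a digit.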
17
[Diagram: conversion process. Subtasks: ..., formulas, meta, closers, ... Stages: canonical, initial, corrected, improved, checked. Further labels: metadata, combining.]
22
convert.pl
- 100 KB of program code text
- = 25 densely typed pages
- = 3427 lines, of which 2175 real code lines
- Code/Input = 1/32
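The 1/32 ratio appears to compare real code lines with the 69,237 input lines quoted earlier in the deck; that reading is an assumption, checked in the short Python sketch below (Python is used for illustration, the program itself is Perl).

```python
# Figures quoted on the slides.
input_lines = 69_237        # lines in the source correspondence
total_program_lines = 3427  # all lines of convert.pl
real_code_lines = 2175      # lines that are real code
pages = 25                  # densely typed pages

# Assumption: "Code/Input" means real code lines divided by input lines.
input_per_code_line = input_lines / real_code_lines
lines_per_page = total_program_lines / pages

print(f"Code/Input = 1/{round(input_per_code_line)}")
print(f"about {round(lines_per_page)} program lines per page")
```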
24
1/3 of the tasks need 2/3 of the code

task group                   tasks   code   run time
formulas                       2     37 %     29 %
headers, openers, closers      3     16 %      6 %
meta and images                3     11 %     10 %

total run time (25 tasks): 40 sec
25
1. Unicode is your friend
2. Split into many subtasks
3. task = configuration + workflow
4. Count and check
5. Performance matters
6. Do not give up automation
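Lesson 4, "Count and check", could look roughly like the Python sketch below: count a construct in the input, count its counterpart in the output, and compare. The `{f:...}` source marker and the `<formula>` target element are hypothetical stand-ins, not the actual markup of the source or of the CKCC corpus.

```python
import re

def count_matches(pattern: str, text: str) -> int:
    """Return how often the pattern occurs in the text."""
    return len(re.findall(pattern, text))

def convert_formulas(source: str) -> str:
    """Hypothetical subtask: rewrite {f:...} markers as <formula> elements."""
    return re.sub(r"\{f:(.*?)\}", r"<formula>\g<1></formula>", source)

# Invented sample input.
source = "Let {f:a+b} equal {f:c}. See also {f:x^2}."
target = convert_formulas(source)

n_in = count_matches(r"\{f:", source)          # formulas going in
n_out = count_matches(r"<formula>", target)    # formulas coming out
n_left = count_matches(r"\{f:", target)        # markers left unconverted

print(f"in: {n_in}, out: {n_out}, left over: {n_left}")
if n_in != n_out or n_left != 0:
    raise ValueError("formula counts do not match; inspect the input")
```

A mismatch points to input that the pattern does not cover, which is where manual corrections or extra hints would come in.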
27
(2a) that can be run separately
(2b) that can be reordered easily
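One way to get subtasks that run separately and reorder easily is to keep each subtask as a small function and to describe the workflow as a plain list of names, which also matches "task = configuration + workflow". The Python sketch below shows the idea; the subtask names and the `[p]` page-break marker are hypothetical and do not come from convert.pl.

```python
from typing import Callable, Dict, List

Subtask = Callable[[str], str]

def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())

def mark_page_breaks(text: str) -> str:
    """Replace the hypothetical '[p]' marker with a page-break element."""
    return text.replace("[p]", "<pb/>")

def wrap_paragraph(text: str) -> str:
    """Wrap the text in a paragraph element."""
    return f"<p>{text}</p>"

# Registry of available subtasks.
SUBTASKS: Dict[str, Subtask] = {
    "normalize_whitespace": normalize_whitespace,
    "mark_page_breaks": mark_page_breaks,
    "wrap_paragraph": wrap_paragraph,
}

# Configuration: the workflow is just an ordered list of subtask names.
WORKFLOW: List[str] = ["normalize_whitespace", "mark_page_breaks", "wrap_paragraph"]

def run(text: str, workflow: List[str]) -> str:
    """Apply the named subtasks in the given order."""
    for name in workflow:
        text = SUBTASKS[name](text)
    return text

sample = "Dear  friend, [p]  I write to you."
full = run(sample, WORKFLOW)                    # whole workflow
only_breaks = run(sample, ["mark_page_breaks"])  # one subtask on its own
print(full)
print(only_breaks)
```

Reordering then means editing the list, and running a single subtask means passing a one-element list.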
30
was 30+ seconds, is now 2.07 seconds
many new subtasks based on the same template (gain = 15 × 30 s = 7.5 min per run)
many, many runs before everything is OK (gain = 100 × 7.5 min = 12.5 hours CPU time)
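The gain figures can be retraced with a few lines of Python. The reading assumed here is that 15 subtasks share the faster template, that each saves roughly 30 seconds per run, and that about 100 runs were needed; the slide gives the numbers but does not spell out the units.

```python
# Assumed reading of the slide's figures.
seconds_saved_per_subtask = 30  # rounded saving per subtask ("30+ s" down to 2.07 s)
subtasks_on_template = 15       # subtasks built on the same template
runs = 100                      # runs before everything was OK

gain_per_run_min = subtasks_on_template * seconds_saved_per_subtask / 60
total_gain_hours = runs * gain_per_run_min / 60

print(f"gain per run: {gain_per_run_min} min")
print(f"total gain:   {total_gain_hours} hours")
```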
31
we used a lot of expert knowledge,
which has all been transferred to
- the source
- consolidated extra inputs
so the conversion is still repeatable and modifiable

[Diagram: source, formulas, meta, closers, results; corrections, hints; CKCC conversion program.]