Ancient Greek OCR with Gamera and the Google/Perseus Greek and Latin Collection Bruce Robertson, Mount Allison University.


1 Ancient Greek OCR with Gamera and the Google/Perseus Greek and Latin Collection Bruce Robertson, Mount Allison University

2 ἀλήθεια ‘truth’; Ἀλήθεια
‘Breathing’ marks on vowels at the beginning of a word
Accents possible on all vowels
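The combination of breathings and accents is what makes polytonic Greek a large character set for OCR. As a rough illustration (standard-library Python only, not part of the Gamera work; the helper names are hypothetical), Unicode normalization can separate each vowel from its marks:

```python
import unicodedata


def decompose(word):
    """Return the word in NFD form: base letters followed by combining marks."""
    return unicodedata.normalize("NFD", word)


def strip_diacritics(word):
    """Drop breathings and accents, keeping only the base letters."""
    return "".join(ch for ch in decompose(word) if not unicodedata.combining(ch))


def diacritic_names(word):
    """List the Unicode names of the combining marks found in the word."""
    return [unicodedata.name(ch) for ch in decompose(word) if unicodedata.combining(ch)]


if __name__ == "__main__":
    word = "ἀλήθεια"
    print(len(word), len(decompose(word)))  # precomposed vs. decomposed length
    print(strip_diacritics(word))
    print(diacritic_names(word))
```

A classifier can treat each precomposed glyph as one class or recognize base letters and marks separately; the decomposition above only shows how the two views relate.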

3 Diversity of Greek Fonts in the 19th C.

4 Other Examples

5 Greek OCR With Gamera
Dalitz and Brandt provide an experimental framework
– I added splitting, grouping, SQL output, etc.
Teams of undergraduates making multiple classifiers
– Based on families of fonts
– Comparing strategies of composite characters, splitting, etc.
– Must also train for Latin scripts used
Not yet working on post-processing

6 Good Results

7

8 Systematic Approach to Automated Greek OCR
Remove the curator from the loop
– Especially important for journals, monographs, etc.
– Assign classifier by computational means
Using:
– Federico Boschetti’s ground-truth-less Greek text evaluator
– Atlantic Computational Excellence Network, Atlantic Canada’s parallel computing network

9 Process
160 Greek-heavy texts chosen
Of these, random samples of 10 pages were taken
Each was processed with each of the 20 classifiers made this summer
The results were evaluated and given a ‘Boschetti score’ from 0 to 1
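The selection loop on this slide might be sketched as follows. This is an illustration under stated assumptions, not the project's code: it assumes the 10 pages are sampled per text, that the classifier with the highest mean score is the one assigned, and that `ocr_and_score` is a hypothetical callable standing in for "run Gamera with this classifier, then apply Boschetti's evaluator".

```python
import random
from statistics import mean


def sample_pages(page_ids, k=10, seed=0):
    """Pick up to k random pages from one text (seeded for repeatability)."""
    pages = list(page_ids)
    return random.Random(seed).sample(pages, min(k, len(pages)))


def best_classifier(pages, classifiers, ocr_and_score):
    """Score every classifier on the sampled pages and return the best one.

    ocr_and_score(classifier_name, page_id) must return a score in [0, 1];
    it is a placeholder for the real OCR run plus evaluation.
    """
    averages = {
        name: mean(ocr_and_score(name, page) for page in pages)
        for name in classifiers
    }
    best = max(averages, key=averages.get)
    return best, averages
```

Because each (text, classifier, page) job is independent of the others, the work spreads naturally across a parallel computing network such as the one named on the previous slide.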

10

11

12

13

14 Google/ABBYY Line Splitting

15 Gamera’s Text Line Finding (bbox_merging)

16 Replaced with runlength_smearing
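Run-length smearing, in its usual form, fills short white gaps between black pixels so that the characters of a text line merge into one connected region. A minimal sketch of the horizontal pass only, in plain Python: it is not Gamera's implementation, and the names `smear_row`, `smear_horizontal` and `max_gap` are hypothetical.

```python
def smear_row(row, max_gap):
    """Fill white runs (0) of length <= max_gap that lie between black pixels (1)."""
    out = list(row)
    last_black = None
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None:
                gap = i - last_black - 1
                if 0 < gap <= max_gap:
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def smear_horizontal(image, max_gap):
    """Apply the horizontal smear to every row of a binary image (list of lists)."""
    return [smear_row(row, max_gap) for row in image]
```

With a threshold wider than the spacing between letters but narrower than the gap between columns, each line tends to become a single blob whose bounding box gives the line. The full algorithm usually combines this with a vertical pass.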

17 Two-step processing

18 Future Work
Combining and re-optimizing classifiers?
Assign classifier based on Latin text
– Is ‘Oxford’, ‘Clarendon’ or ‘Oxonii’ in the first pages of output?
Align with Google’s output, and provide Google with corrected Greek
Implement line-splitting from other OCR engines
Discover badly OCR’d Greek in others’ output
Implement OCR correction frameworks described here
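The idea of assigning a classifier from the Latin text could look roughly like this. It is a sketch only: the three marker words come from the slide, while the classifier labels `oxford_family` and `generic` and the function names are hypothetical.

```python
import re

# Marker words taken from the slide; they suggest an Oxford imprint.
OXFORD_MARKERS = ("Oxford", "Clarendon", "Oxonii")


def mentions_any(first_pages_text, markers=OXFORD_MARKERS):
    """Return True if any marker word occurs in the OCR output of the first pages."""
    pattern = re.compile("|".join(re.escape(m) for m in markers), re.IGNORECASE)
    return pattern.search(first_pages_text) is not None


def choose_classifier(first_pages_text, default="generic"):
    """Pick a (hypothetical) font-family classifier label from the title-page text."""
    if mentions_any(first_pages_text):
        return "oxford_family"
    return default
```

The match is case-insensitive and works on substrings, so an inflected form such as "Clarendoniano" still counts. A real system would need to tolerate OCR errors in the marker words themselves.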

19 Common Problems
Assessments of pre-processing strategies and tools
Schemas for page description

20 Thanks
Colleagues in Dynamic Variorum Editions:
– Greg Crane at Perseus / Tufts
– Brian Fuchs at Imperial College
Federico Boschetti
AceNet, especially tech. support of Sergiy Khan

