Digitizing Arabic Text: Where are we today?


1 Digitizing Arabic Text: Where are we today?
Elizabeth A. S. Beaudin, Yale University Library, Project AMEEL

3 اللغة العربية
the Arabic language
اللـُغَة العَرَبـِـيَّة

9
Victor 42n56, 78w44
Alexandria 31n12, 29e54
New Haven 41n18, 72w56
Ann Arbor 42n17, 83w45
Halle 51n29, 11e58
Leiden 52n10, 4e30
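The coordinates on slide 9 appear to use a compact degrees-and-minutes notation, so that 41n18 would read as 41 degrees 18 minutes north. That reading is an assumption, not something the slide states. Under that assumption, a minimal Python sketch for converting such tokens to signed decimal degrees might look like this (the name `to_decimal` is hypothetical):

```python
import re

# Assumed token format: <degrees><hemisphere letter><minutes>,
# e.g. "41n18" = 41 degrees 18 minutes north.
_COORD = re.compile(r"^(\d{1,3})([nsew])(\d{1,2})$", re.IGNORECASE)


def to_decimal(token: str) -> float:
    """Convert a token such as '72w56' to signed decimal degrees."""
    match = _COORD.match(token.strip().rstrip(","))
    if match is None:
        raise ValueError(f"unrecognized coordinate token: {token!r}")
    degrees, hemisphere, minutes = match.groups()
    value = int(degrees) + int(minutes) / 60.0
    # South and west are negative by the usual convention.
    return -value if hemisphere.lower() in "sw" else value


if __name__ == "__main__":
    # New Haven, as listed on the slide.
    print(to_decimal("41n18"), to_decimal("72w56"))
```

Signed decimal degrees are the form most mapping tools accept, which is why the sketch converts rather than keeping the compact notation.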

10 Outsourcing: Kirtas APT Bookscan 2400

11 In house … Indus 5002 Book Scanner

12 Processing (Image enhancement)
[Figure: page images labeled "after" and "before"]

13 Sakhr v. 9 – interface

14 Sakhr -- Increasing accuracy

15 VERUS v. 2 – interface

16 VERUS – increasing accuracy

17 Comparison of features and uses

Sakhr
Character by character
Font libraries
Output choices: Unicode text, proprietary, HTML
Learning approach to improving accuracy
Use of dongle for copyright
Handles batch processing well
Desktop and SDK versions

VERUS
Word by word
Custom dictionary
Output choices: searchable PDF or UTF-8 text
Good with degraded documents
Use of dongle to track quantity
Handles mix of languages better
API version for customization

18 Accuracy?
Averages
Conditions
Improvement
Decisions
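The slide does not say how accuracy was measured. One common convention in OCR evaluation is character accuracy: one minus the edit distance between the OCR output and a hand-corrected reference, divided by the reference length. The sketch below assumes that convention; it is not the project's documented method, and the function names are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Fraction of the reference recovered correctly, floored at 0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    truth = "اللغة العربية"
    ocr_output = truth.replace("غ", "ع", 1)  # simulate one misread letter
    print(character_accuracy(truth, ocr_output))
```

Averaging this figure over a sample of pages, grouped by scan condition, is one way to arrive at the kind of per-condition averages the slide alludes to.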

19 Software suite

Digitization Workflow
Scanning: proprietary to Indus and Kirtas
Processing: Photoshop, ScanFix, ACDSee, Unifier, SuperEdi
OCR: Sakhr OCR-Gold and font libraries, VERUS v2
Staging and Archiving: ACDSee, customized scripts
Workflow control: MS Access, customized scripts

Repository Development
Fedora Repository Framework, PHP, MySQL, REST, Java

22 Full Text repository
FEDORA framework
Indexed and searchable in Arabic
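The slides do not describe how the index treats vocalized text. Slide 3 shows the same phrase both bare and with diacritics and tatweel (the elongation stroke), and a search index usually needs those forms to match. A minimal Python sketch of one common approach, stripping combining marks and tatweel before indexing and querying, is shown below. It is an assumption about how such matching could work, not the repository's actual configuration, and `normalize_for_search` is a hypothetical name.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character, purely typographic


def normalize_for_search(text: str) -> str:
    """Remove tatweel and nonspacing marks (short vowels, shadda, sukun)."""
    # Compose first so letters such as alef-with-hamza stay single code
    # points and are not reduced to a bare alef by the mark filter below.
    composed = unicodedata.normalize("NFC", text)
    return "".join(
        ch for ch in composed
        if ch != TATWEEL and unicodedata.category(ch) != "Mn"
    )


if __name__ == "__main__":
    # alef, lam, lam, tatweel, damma, ghain, fatha, teh marbuta
    vocalized = "\u0627\u0644\u0644\u0640\u064F\u063A\u064E\u0629"
    bare = "\u0627\u0644\u0644\u063A\u0629"
    print(normalize_for_search(vocalized) == bare)
```

Applying the same function to both the stored text and the user's query keeps the two sides comparable; the original vocalized text can still be stored for display.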

