Digitizing Arabic Text: Where are we today?


1 Digitizing Arabic Text: Where are we today?
Elizabeth A. S. Beaudin
Yale University Library
Project AMEEL
MELCOM International Oxford

2 MELCOM International 2008 Oxford

3 اللغة العربية
اللـُغَة العَرَبـِـيَّة
the Arabic language

9 Outsourcing
Kirtas APT Bookscan 2400

10 In house …
Indus 5002 Book Scanner

11 Processing (Image enhancement)
Before and after

12 Sakhr v. 9 interface

13 Sakhr -- Increasing accuracy

14 VERUS v. 2 – interface

15 VERUS – increasing accuracy

16 Comparison of features and uses
Sakhr
  Character by character
  Font libraries
  Output choices: Unicode text, proprietary, HTML
  Learning approach to improving accuracy
  Use of dongle for copyright
  Handles batch processing well
  Desktop and SDK versions
VERUS
  Word by word
  Custom dictionary
  Output choices: searchable PDF or UTF-8 text
  Good with degraded documents
  Use of dongle to track quantity
  Handles mix of languages better
  API version for customization

17 Accuracy?
Averages
Conditions
Improvement
Decisions
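
The slide does not say how accuracy was measured. As an illustration only, a common approach is to compare OCR output with a hand-corrected reference using edit distance, at the character level and at the word level, and then average over pages. The function names below (`edit_distance`, `char_accuracy`, `word_accuracy`, `average`) are hypothetical and are not taken from Sakhr, VERUS, or Project AMEEL.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def accuracy(reference, ocr_output):
    """1 - (edit distance / reference length), floored at 0 (assumed metric)."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


def char_accuracy(reference, ocr_output):
    """Accuracy counted per character."""
    return accuracy(reference, ocr_output)


def word_accuracy(reference, ocr_output):
    """Accuracy counted per whitespace-separated word."""
    return accuracy(reference.split(), ocr_output.split())


def average(values):
    """Plain mean, e.g. over the per-page accuracies of one book."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    reference = "اللغة العربية"
    ocr_output = "اللغه العربية"  # final letter of the first word misread
    print(char_accuracy(reference, ocr_output))
    print(word_accuracy(reference, ocr_output))
```

In this example one of 13 characters is wrong, so character accuracy is about 92%, while one of two words is wrong, so word accuracy is 50%. Which figure matters depends on whether the text is meant for reading or for word-based search.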

18 Software suite
Digitization Workflow
  Scanning: proprietary to Indus and Kirtas
  Processing: Photoshop, ScanFix, ACDSee, Unifier, SuperEdi
  OCR: Sakhr OCR-Gold and font libraries, VERUS v2
  Staging and Archiving: ACDSee, customized scripts
  Workflow control: MS Access, customized scripts
Repository Development
  Fedora Repository Framework, PHP, MySQL, REST, Java

21 Full-text repository
FEDORA framework
Indexed and searchable in Arabic
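
The slides do not describe how the repository indexes Arabic text. As a minimal sketch of one common technique, the code below strips diacritics (harakat) and the tatweel elongation character before indexing, so that a vocalized spelling such as اللـُغَة and the plain spelling اللغة match the same entry. The names `normalize_arabic`, `build_index`, and `search` are hypothetical and are not part of Fedora or of the project's code.

```python
import unicodedata

TATWEEL = "\u0640"  # elongation stroke, carries no lexical meaning


def normalize_arabic(text):
    """Remove combining marks (vowel signs, shadda, sukun) and tatweel."""
    composed = unicodedata.normalize("NFC", text)
    return "".join(
        ch for ch in composed
        if ch != TATWEEL and unicodedata.category(ch) != "Mn"
    )


def build_index(docs):
    """Map each normalized word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in normalize_arabic(text).split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word of the query."""
    terms = normalize_arabic(query).split()
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


if __name__ == "__main__":
    docs = {
        "page-1": "اللـُغَة العَرَبـِـيَّة",  # vocalized, with tatweel
        "page-2": "اللغة",                    # plain
    }
    index = build_index(docs)
    print(search(index, "اللغة"))          # matches both pages
    print(search(index, "اللغة العربية"))  # matches only page-1
```

Normalization is applied to both the indexed text and the query, which is what lets a plain query find vocalized text. Hamza and alef variants are left untouched here; folding them is a separate design decision.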
