Digitizing Arabic Text: Where are we today? Elizabeth A. S. Beaudin Yale University Library Project AMEEL http://www.library.yale.edu/ameel/ MELCOM International 2008 Oxford
اللغة العربية اللـُغَة العَرَبـِـيَّة the Arabic language
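The slide above shows the same phrase written without and with vowel marks. OCR output and search indexes often have to treat the two spellings as equivalent. The following is a minimal Python sketch of one way to do that, assuming it is acceptable to drop all combining marks and the tatweel (kashida) elongation character. The function name `strip_vocalization` is hypothetical and is not taken from the presentation or from any OCR product.

```python
import unicodedata

TATWEEL = "\u0640"  # elongation character, carries no lexical meaning


def strip_vocalization(text: str) -> str:
    """Remove combining marks (harakat, shadda, ...) and tatweel.

    Assumption: every nonspacing mark (Unicode category "Mn") can be
    dropped for matching purposes. The text is not decomposed first,
    so precomposed letters such as alef with madda stay intact.
    """
    return "".join(
        ch
        for ch in text
        if ch != TATWEEL and unicodedata.category(ch) != "Mn"
    )


# Unvocalized form: alef lam lam ghain teh-marbuta, space,
# alef lam ain reh beh yeh teh-marbuta
plain = "\u0627\u0644\u0644\u063A\u0629 \u0627\u0644\u0639\u0631\u0628\u064A\u0629"

# Vocalized form with tatweel, damma, fatha, kasra and shadda inserted
vocalized = (
    "\u0627\u0644\u0644\u0640\u064F\u063A\u064E\u0629 "
    "\u0627\u0644\u0639\u064E\u0631\u064E\u0628\u0640\u0650\u0640\u064A\u064E\u0651\u0629"
)

print(strip_vocalization(vocalized) == plain)
```

Dropping marks this way loses information, so it suits matching and indexing rather than display; the original vocalized text would normally be kept alongside the normalized form.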
Outsourcing: Kirtas APT Bookscan 2400
In house … Indus 5002 Book Scanner
Processing (image enhancement): before and after
Sakhr version 9: interface
Sakhr: increasing accuracy
VERUS version 2: interface
VERUS: increasing accuracy
Comparison of features and uses

Sakhr
  Character by character
  Font libraries
  Output choices: Unicode text, proprietary, HTML
  Learning approach to improving accuracy
  Use of dongle for copyright
  Handles batch processing well
  Desktop and SDK versions

VERUS
  Word by word
  Custom dictionary
  Output choices: searchable PDF or UTF-8 text
  Good with degraded documents
  Use of dongle to track quantity
  Handles mix of languages better
  API version for customization
Accuracy? Averages, conditions, improvement, decisions
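The slide does not say how accuracy was measured. One common way to put a number on OCR output is character accuracy: one minus the edit distance between a hand-corrected reference and the OCR text, divided by the reference length. The Python sketch below illustrates that measure only; it is an assumption, not the project's method, and the names `levenshtein` and `character_accuracy` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute or match
                )
            )
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Return 1 - errors / len(reference), clamped at 0.

    Assumption: the reference is a trusted, hand-corrected
    transcription of the same page.
    """
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


# Toy example: one wrong character out of four
print(character_accuracy("abcd", "abxd"))
```

Averaging this value over a sample of pages gives a figure that can be compared across OCR engines and scanning conditions. A word-level variant would apply the same distance to lists of words instead of strings of characters.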
Software suite

Digitization Workflow
  Scanning: proprietary to Indus and Kirtas
  Processing: Photoshop, ScanFix, ACDSee, Unifier, SuperEdi
  OCR: Sakhr OCR-Gold and font libraries, VERUS v2
  Staging and Archiving: ACDSee, customized scripts
  Workflow control: MS Access, customized scripts

Repository Development
  Fedora Repository Framework, PHP, MySQL, REST, Java
Full Text repository
  FEDORA framework
  Indexed and searchable in Arabic
  http://oacistest.library.yale.edu:8080/fedoragsearch/restAmeel
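Searching in Arabic over HTTP means the query term has to travel as percent-encoded UTF-8. The Python sketch below shows only that encoding step. The parameter name `query` and the helper `build_search_url` are hypothetical, the presentation does not document the request parameters the search service accepts, and the code makes no network request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint as shown on the slide; the parameters it accepts are not documented there.
BASE_URL = "http://oacistest.library.yale.edu:8080/fedoragsearch/restAmeel"


def build_search_url(term: str, base_url: str = BASE_URL, param: str = "query") -> str:
    """Append a UTF-8, percent-encoded search term to base_url.

    Assumption: the service reads the term from a single query-string
    parameter; "query" is a placeholder name.
    """
    return base_url + "?" + urlencode({param: term}, encoding="utf-8")


term = "\u0627\u0644\u0639\u0631\u0628\u064A\u0629"  # the word "al-arabiyya" in Arabic script
url = build_search_url(term)
print(url)

# Decoding the query string returns the original Arabic term.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == term)
```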
http://www.library.yale.edu/ameel/