Digitizing Arabic Text: Where are we today?

Presentation transcript:

Digitizing Arabic Text: Where are we today? Elizabeth A. S. Beaudin Yale University Library Project AMEEL http://www.library.yale.edu/ameel/ MELCOM International 2008 Oxford

اللغة العربية / اللـُغَة العَرَبـِـيَّة (the Arabic language)

Outsourcing: Kirtas APT Bookscan 2400

In house … Indus 5002 Book Scanner

Processing (Image enhancement): before and after images

Sakhr v. 9: interface

Sakhr: increasing accuracy

VERUS v. 2: interface

VERUS: increasing accuracy

Comparison of features and uses

Sakhr
- Character by character
- Font libraries
- Output choices: Unicode text, proprietary, HTML
- Learning approach to improving accuracy
- Use of dongle for copyright
- Handles batch processing well
- Desktop and SDK versions

VERUS
- Word by word
- Custom dictionary
- Output choices: searchable PDF or UTF-8 text
- Good with degraded documents
- Use of dongle to track quantity
- Handles mix of languages better
- API version for customization

Accuracy? Averages, conditions, improvement, decisions
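The slide does not say how accuracy was measured. One common convention, shown here only as a minimal sketch and not as the project's method, is character accuracy: one minus the edit distance between the OCR output and a hand-corrected reference, divided by the reference length, then averaged over pages. The function names `edit_distance`, `char_accuracy`, and `mean_accuracy` are hypothetical.

```python
def edit_distance(reference: str, ocr_text: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(ocr_text) + 1))
    for i, ref_char in enumerate(reference, 1):
        cur = [i]
        for j, ocr_char in enumerate(ocr_text, 1):
            cur.append(min(
                prev[j] + 1,                           # character missing from OCR output
                cur[j - 1] + 1,                        # extra character in OCR output
                prev[j - 1] + (ref_char != ocr_char),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def char_accuracy(reference: str, ocr_text: str) -> float:
    """Character accuracy in [0, 1] relative to the reference length."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1.0 - edit_distance(reference, ocr_text) / len(reference))


def mean_accuracy(pages: list[tuple[str, str]]) -> float:
    """Unweighted average of per-page accuracy over (reference, ocr_text) pairs."""
    return sum(char_accuracy(ref, ocr) for ref, ocr in pages) / len(pages)
```

An unweighted page average treats a short page the same as a long one; weighting by reference length is an equally reasonable choice and would change the reported figure.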

Software suite

Digitization Workflow
- Scanning: proprietary to Indus and Kirtas
- Processing: PhotoShop, ScanFix, ACDSee, Unifier, SuperEdi
- OCR: Sakhr OCR-Gold and font libraries, VERUS v2
- Staging and Archiving: ACDSee, customized scripts
- Workflow control: MS Access, customized scripts

Repository Development
- Fedora Repository Framework, PHP, MySQL, REST, Java

Full Text repository: FEDORA framework, indexed and searchable in Arabic http://oacistest.library.yale.edu:8080/fedoragsearch/restAmeel
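Making Arabic full text searchable usually means deciding how vocalized and unvocalized spellings should match, as with the two spellings of "the Arabic language" on the earlier slide. The sketch below is an assumption about one common approach, not a description of the AMEEL repository: it strips the short-vowel marks (U+064B to U+0652), the superscript alef (U+0670), and the tatweel elongation character (U+0640) before text is indexed or queried. The name `normalize_for_search` is hypothetical.

```python
import re

# Assumed normalization set: harakat and shadda/sukun (U+064B-U+0652),
# superscript alef (U+0670), and tatweel (U+0640).
_MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")


def normalize_for_search(text: str) -> str:
    """Return text with Arabic diacritics and tatweel removed."""
    return _MARKS.sub("", text)


if __name__ == "__main__":
    vocalized = "اللـُغَة العَرَبـِـيَّة"
    print(normalize_for_search(vocalized))
```

Applying the same function to both the indexed text and the user's query lets either spelling find the other. It deliberately leaves letters such as alef with hamza untouched; whether to fold those as well is a separate design decision.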

http://www.library.yale.edu/ameel/