TELplus WP 1: “Making searchable digitised images via OCR”
TELplus Kick-off Meeting, Tallinn, 15/16 October 2007
Max Kaiser / Joachim Korb, Austrian National Library
WP1: Main Objectives
Provide a large amount of full text for The European Library (and the future European Digital Library) in a very short time
Lay the basis for future work with full text:
– overview of OCRable material
– best practices
– identification of priorities
WP1: Background
Main shortcoming of The European Library today: lack of access to the full text of its material
WP1 will be an important step towards tackling this problem
It will enable TEL to make a bigger contribution to the European Digital Library targets:
– Commission target: 2 million objects accessible by 2008, at least 8 million objects by
WP1: Main Goals
Full-text access to more than 20 million pages
Facing common challenges in OCR
Establishing best practices for OCR
Providing full-text access through The European Library
Partners' OCR contributions (examples)
Spain: 5 million pages of historical newspapers and magazines
Iceland: 1.0–1.6 million pages of books and newspapers
Poland: pages of journals, 1918–1939
Austria: pages of governmental publications
Sweden: pages of travel literature (books/manuscripts)
WP 1 Tasks
Organisation and research:
1.1 Survey of availability of digitised images
1.2 Survey of existing OCR approaches
1.3 Identification of concrete materials for OCR; OCR specifications, implementation plans and tenders
Production:
1.4 Carrying out OCR and making full texts available via partners’ digital library environments
1.5 Provision of access to newly OCRed material through The European Library
Organisation and research
Task 1.1: Survey of availability of digitised images for OCR – who and when
Leader: Austrian National Library
Duration: 1 Oct. – 31 Dec. 2007
Dec. 2007: survey to be delivered
Participants: all content partners in WP 1
D1.1 – Survey of availability of digitised images for OCR
Task 1.1: Survey of availability of digitised images for OCR
Which collections are available at the partner libraries?
What do partners plan to OCR?
What are the priorities in partners' OCR plans?
How will full-text collections be made accessible?
The results provide the basis for the implementation plans (Task 1.3)
Task 1.2: Survey of existing OCR approaches – who and when
Leader: Austrian National Library
Duration: 1 Jan. – 31 July 2008
July 2008: survey to be delivered
Participants: all content partners in WP 1
D1.2 – A survey of existing OCR practices and recommendations for more efficient work
Task 1.2: Survey of existing OCR approaches
What experiences and approaches do partners have?
Which digital library environments do they use?
How will access to full text be provided?
What are the challenges, and how can they be mastered?
The results will support the practical work (Task 1.4)
The results provide perspectives for future work to build on
Task 1.3: Identification of concrete materials for OCR; OCR specifications; implementation plans and tenders – who and when
Leader: Austrian National Library
Duration: 1 Dec. 2007 – 28 Feb. 2008
Feb. 2008: specifications, implementation plans and tenders delivered; concrete OCR materials identified
Participants: all content partners in WP 1
This task begins before the surveys have been evaluated
D1.3 – Package of specifications and implementation plans
M1.1 – Identification of concrete materials for OCR against an agreed budget
M1.2 – Specifications and implementation plans for full-text conversion available
Task 1.3: Identification of concrete materials for OCR; OCR specifications; implementation plans and tenders
Input from Tasks 1.1 and 1.2 (surveys)
Priority lists of material will be drawn up for each content provider
OCR specifications and implementation plans will be set up
Tenders will be prepared and published (for outsourced work)
The result will provide the framework for the practical work
Proper tendering procedures must be observed, as budget payment depends on them
Production
Task 1.4: Carrying out OCR and making full texts available via partners’ digital library environments – who and when
Leader: Austrian National Library
Duration: 1 Jan. 2008 – 31 Dec. 2009
Sept. 2008: 10 million OCRed pages; first progress report
30 Sept. 2009: 20 million OCRed pages; second progress report
Participants: all content partners in WP 1
This task begins before Tasks 1.2 and 1.3 end
D1.4 – First set of consolidated OCR progress reports
D1.5 – Second set of consolidated OCR progress reports
Task 1.4: Carrying out OCR and making full texts available via partners’ digital library environments
Tendering
Carrying out the OCR implementation plans
Full-text conversion
Quality control
Indexing
Providing access through partners’ digital library environments
Providing material for WP 3, Task 3.1 and subtasks
Providing feedback for Task 1.2
Task 1.5: Provision of access to newly OCRed material through The European Library – who and when
Leader: National Library of the Netherlands [TEL office]
Duration: 1 March 2008 – 31 Dec. 2009
Dec. 2009: access to all newly OCRed material provided
Participants: all content partners in WP 1
This task begins before Tasks 1.2 and 1.4 end
D1.6 – Provision of access to newly OCRed material through TEL
M1.3 – First set of full-text material available
M1.4 – Second set of full-text material available
M1.5 – Full texts accessible via The European Library
Task 1.5: Provision of access to newly OCRed material through The European Library
Provision of access to all material OCRed in this WP through TEL
All content partners will have to provide full text and indexes: first half by Sept. 2008, second half by Sept. 2009
TIMELINE (Gantt chart)
Task 1.1: Oct. – Dec. (D1.1)
Task 1.2: Jan. – July (D1.2)
Task 1.3: Dec. – Feb. (M1.1, M1.2, D1.3)
Task 1.4: Jan. – Dec. (D1.4, D1.5)
Task 1.5: Mar. – Dec. (M1.3, M1.4, D1.6)
Leading Partners in WP 1
Austrian National Library: Tasks 1.1 through 1.4
National Library of the Netherlands [TEL office]: Task 1.5
French National Library: workshop in January, Paris
Contributing Partners in WP 1
National Library of Estonia: 26 person months
National and University Library of Slovenia: 6 person months
National Library of the Czech Republic: 19 person months
Austrian National Library: 20 person months
National and University Library of Iceland: 20 person months
National Library of Latvia: 24 person months
Martynas Mažvydas National Library of Lithuania: 28 person months
National Széchényi Library of Hungary: 7 person months
French National Library: 20 person months
National Library of Norway: 39 person months
National Library of Spain: 10 person months
Slovak National Library: 20 person months
National Library of Sweden: 5 person months
National Library of Poland: 40 person months
TELplus Work Package 1
Thank you! Questions?
Max Kaiser / Joachim Korb, Austrian National Library