INEbase history: Statistical books 1858-1990 available on the web
Antonio ARGÜESO
IMAODBC, The Hague, 5-9 Sept. 2005
Topic 3
INEbase history: a virtual library
  The project: making INE's back catalogue of publications (approx. 1857-1997) available on the Internet
  Information is stored not as complete books but as hierarchically organised documents
  Search utilities

INEbase history: a new section of INEbase
  This virtual library is offered neither as a section of “products and services” nor as a standalone “virtual library”, but as part of the output database, INEbase
  A link to: History
Project phases
  Phase I. June 2004 - June 2005
    110,000 pages scanned (77 yearbooks, 221 books of population censuses)
    Software development (100,000 euros)
    First 25 books catalogued and published (4 people: 1 manager, 3 grant holders)
  Phase II. July 2005 - July 2006
    180,000 pages scanned (vital statistics, agricultural and industrial censuses…)
    Software improvements (10,000 euros)
    150 books catalogued and published (8 people: 1 manager, 7 cataloguers)
The technical process in 3 steps
1. Scanning and OCR
  Books are scanned on high-speed scanners. The output files are TIFF at 600 ppi and TIFF 4 at 300 ppi (the popular telefax format).
  These files are OCRed, producing a final enriched PDF with two layers:
    The image of the page (first layer)
    The words recognised by the OCR (second layer)
  => These PDFs are page images but also allow text searching
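The two-layer idea can be illustrated with a minimal sketch. It does not build a real PDF and is not the software used in the project; the class names (`Word`, `Page`), the file names and the sample words are all hypothetical. It only shows why keeping the recognised words next to the page image makes a scanned page searchable.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Word:
    """One word recognised by the OCR, with its position on the page image."""
    text: str
    x: int  # left edge, in pixels of the scanned image (hypothetical values)
    y: int  # top edge, in pixels of the scanned image (hypothetical values)


@dataclass
class Page:
    """A scanned page: the image (first layer) plus the OCR words (second layer)."""
    number: int
    image_file: str  # hypothetical name of the scanner output file
    words: List[Word] = field(default_factory=list)


def search(pages: List[Page], term: str) -> List[Tuple[int, Word]]:
    """Return (page number, word) for every OCR word equal to the term, ignoring case."""
    needle = term.casefold()
    return [(p.number, w) for p in pages for w in p.words
            if w.text.casefold() == needle]


# Two invented pages standing in for scanned book pages.
pages = [
    Page(1, "page_0001.tif",
         [Word("Censo", 120, 80), Word("de", 210, 80), Word("población", 250, 80)]),
    Page(2, "page_0002.tif",
         [Word("Provincia", 100, 60), Word("Población", 300, 60)]),
]

hits = search(pages, "población")
print([(number, word.text) for number, word in hits])
```

The reader still sees the page image; the word list is only consulted when searching, and the stored coordinates are what would let a viewer highlight the hit on the image.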
2. Cataloguing books into the system
  Cataloguers create the hierarchical trees (the books' indexes); each final node (a statistical table) is associated with a PDF.
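As a rough illustration of this structure, the sketch below models a book index as a tree whose leaves point to PDF files. The `Node` class, the `tables` helper, the titles and the file names are assumptions made for the example, not the data model of the actual cataloguing software.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """An entry in a book's index; final nodes (tables) carry a PDF file name."""
    title: str
    pdf: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child entry and return it, so trees can be built step by step."""
        self.children.append(child)
        return child


def tables(node: Node, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str]]:
    """Yield (full index path, pdf) for every final node that has a PDF attached."""
    here = path + (node.title,)
    if not node.children and node.pdf is not None:
        yield " > ".join(here), node.pdf
    for child in node.children:
        yield from tables(child, here)


# A hypothetical book with one volume and two statistical tables.
book = Node("Population census (example book)")
volume = book.add(Node("Volume I"))
volume.add(Node("Table 1. Population by province", pdf="census_t1.pdf"))
volume.add(Node("Table 2. Population by sex", pdf="census_t2.pdf"))

for path, pdf in tables(book):
    print(path, "->", pdf)
```

Storing the index as a tree rather than the book as a single file is what allows a user to browse straight to one table and open only its PDF.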
3. Publication
  Once a book has been catalogued and revised, a single click puts it on the web.
Hardware and software used: an easy system
  A cataloguing server holds the development DB and the PDF files.
  One PC per cataloguer, each connecting to that server through the client program.
  A dissemination server hosts the software and a copy of the DB.
  Coherence and synchronisation mechanisms between both systems (development and dissemination).
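The slides do not describe how the coherence and synchronisation mechanisms work, so the following is only a sketch under simple assumptions: each database is modelled as a dictionary keyed by a book identifier, publication copies a revised book from development to dissemination, and a coherence check lists published books whose copy no longer matches. The function names `publish` and `out_of_sync`, the record fields and the sample data are all hypothetical.

```python
import copy
from typing import Dict, List


def publish(book_id: str, development: Dict[str, dict], dissemination: Dict[str, dict]) -> None:
    """Copy one catalogued and revised book from development to dissemination."""
    record = development[book_id]
    if not record["revised"]:
        raise ValueError(f"{book_id} has not been revised yet")
    # Deep copy so later edits in development do not silently change the public copy.
    dissemination[book_id] = copy.deepcopy(record)


def out_of_sync(development: Dict[str, dict], dissemination: Dict[str, dict]) -> List[str]:
    """Return the published books whose public copy differs from development."""
    return sorted(book_id for book_id, record in dissemination.items()
                  if development.get(book_id) != record)


# Hypothetical content of the two databases.
development = {
    "book-1": {"title": "Yearbook (example)", "revised": True,
               "tables": {"Table 1": "t1.pdf"}},
    "book-2": {"title": "Census (example)", "revised": False, "tables": {}},
}
dissemination: Dict[str, dict] = {}

publish("book-1", development, dissemination)
print(out_of_sync(development, dissemination))   # nothing differs right after publishing

# A cataloguer keeps working on book-1 after publication.
development["book-1"]["tables"]["Table 2"] = "t2.pdf"
print(out_of_sync(development, dissemination))   # book-1 now needs republishing
```

Keeping the public copy separate means cataloguers can continue editing without affecting the web site until the book is published again.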
Thank you!