Digitisation of Newspapers The South African Experience Patricia Liebetrau IFLA Newspaper Conference, New Delhi, 26-28 February 2010.

Presentation transcript:


Introduction

Durban … a multicultural city

Digital Innovation South Africa (DISA)
• National collaborative initiative
• Creating online resources for education, research and training
• Make accessible online SA material of high socio-political value
• Collated serial literature scattered across collections
• Develop local expertise in use of advanced digital technologies
• Set standards for digitisation initiatives in SA

DISA
• Identify appropriate collections
• Distributed digital production
• Gateway to federated digital collections
• Develop policies, strategies and guidelines in support of SA initiatives
• Comply with international standards
• Bridge digital gap between northern and southern hemispheres

Campbell
• Digital microfilm scanner
• Obsolete technology
• Preservation of microfilms
• Newspapers and MSS on microfilm
• Data transfer
• Application to DISA

Digitising microfilm
Samples were tested using the following:
• 1-bit at 300 dpi
• 1-bit at 400 dpi
• 1-bit at 600 dpi
• 8-bit greyscale at 300 dpi with thresholding at 128
• 8-bit greyscale at 400 dpi with thresholding at 128
• 8-bit greyscale at 600 dpi with thresholding at 128
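
The "thresholding at 128" step in the test settings above can be illustrated with a minimal sketch. It assumes the scan is available as rows of 8-bit greyscale values (0 to 255) and that values at or above the threshold are treated as white; the function name and the sample data are hypothetical and are not taken from the DISA workflow.

```python
from typing import List


def threshold_to_bitonal(pixels: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Reduce rows of 8-bit greyscale values to 1-bit values.

    Assumption: values >= threshold become white (1), values below become black (0).
    """
    return [[1 if value >= threshold else 0 for value in row] for row in pixels]


# Hypothetical 2 x 3 greyscale sample, for illustration only.
sample = [
    [12, 127, 128],
    [200, 64, 255],
]
bitonal = threshold_to_bitonal(sample)
print(bitonal)  # [[0, 0, 1], [1, 0, 1]]
```

A fixed threshold is the simplest choice; faded or uneven microfilm may need a different value or an adaptive method, which this sketch does not attempt.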

Comparisons
• Sample 1: Scanned on a flatbed scanner at 300 dpi, 8-bit greyscale, from the unbound original
• Sample 2: Scanned using a Minolta MS7000 microfilm scanner at 300 dpi, 8-bit greyscale; the microfilm copy looks as though the original was bound when it was filmed
This example suggests that the microfilm itself was not captured correctly.

OCR recognition
The rate of word return from the previous two samples would clearly be far greater for the first image than for the second.
Conclusion: Some microfilms are better than others; the resulting scan is only as good as the original microfilm.
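
One way to put a number on the "rate of word return" is to compare OCR output against a hand-corrected transcription. The sketch below is an illustration only, not the measure used in the presentation: it counts reference words that also appear, in order, in the OCR output, using Python's standard `difflib`. The function name and the two sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def word_recognition_rate(reference: str, ocr_output: str) -> float:
    """Share of reference words that are matched, in order, in the OCR output."""
    ref_words = reference.lower().split()
    ocr_words = ocr_output.lower().split()
    if not ref_words:
        return 0.0
    matcher = SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


# Hypothetical strings: a corrected transcription and a noisy OCR reading of it.
reference = "thousands of people have rejected the new constitution"
ocr_output = "11,housands of peo-ple have rejected the new constitution"
rate = word_recognition_rate(reference, ocr_output)
print(rate)  # 0.75
```

Running the same function over a flatbed sample and a microfilm sample of the same page would give a rough, comparable figure for each.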

OCR’ed text
Big no to constitution as elections draw near 11,HOUSANDS of peo-ple have rejected the Government's new constitution under which elections for In-dian and coloured chambers of Parlia-nent are to take place n August. Reports from around the country talk of feverish activity as the biggest issue facing the country nears its climax."The elections, to be held on the 22nd and -28th of August, is seen as an issue which con-cerns all South Africans. The African com-munity in particular is " leading the call for a boycott of the elec` tions.Mr. Popo Molefe, the national secretary of the United Democratic Front (UDF), said the central issue was the 'denationalisation of the African people'. 'We call on our peo-ole in Eldorado Park, Reiger Park, Acton- ville and Lenasia, to boycott the August elections. 'We call on our peo- ple to refuse to be partners in the crime of Apartheid against the majority of South Africans.'

Indexing
• Manual indexing!
• Encoded using the international Text Encoding Initiative (TEI), later mapped to the Dublin Core (DC) metadata element set
• Metadata capture: publisher, place and date of publication at journal/newspaper level
• Indexing of title, author and keywords at article level
• XML based
• Articles over several pages
• English language

Capturing journal metadata
Speak: the voice of the community
Volume 2 No 3
DISA Digital Innovation of South Africa
Durban, South Africa
Jul1984
Speak: the voice of the community
Volume 2 No 3 July pages
Speak Community Newspaper Project
Johannesburg
July 1984
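
The mapping from TEI-encoded issue metadata to Dublin Core mentioned on the Indexing slide might look roughly like the sketch below. It is an assumption-laden illustration, not DISA's actual crosswalk: the dictionary keys stand in for values already extracted from a TEI header, the mapping table and function name are hypothetical, and place of publication is left unmapped because the 15-element Dublin Core set has no dedicated element for it.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical crosswalk: field extracted from the TEI header -> Dublin Core element.
TEI_FIELD_TO_DC = {
    "title": "title",
    "publisher": "publisher",
    "date": "date",
}


def tei_fields_to_dc(fields: dict) -> ET.Element:
    """Build a flat Dublin Core record from already-extracted TEI header values."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    for tei_field, dc_element in TEI_FIELD_TO_DC.items():
        value = fields.get(tei_field)
        if value:
            child = ET.SubElement(record, f"{{{DC_NS}}}{dc_element}")
            child.text = value
    return record


# Values taken from the slide; "place" is carried along but deliberately not mapped.
issue = {
    "title": "Speak: the voice of the community",
    "publisher": "Speak Community Newspaper Project",
    "place": "Johannesburg",
    "date": "July 1984",
}
record = tei_fields_to_dc(issue)
print(ET.tostring(record, encoding="unicode"))
```

In practice a crosswalk also has to decide where unmapped fields such as place go; folding them into another element is a local policy choice that this sketch leaves open.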

Search and browse
• Browsing facilities
  – browse the text images
• Searching facilities
  – full text searching
  – article title, author and keyword searching
  – thesaurus
  – acronyms
Readability and advanced searchability

Indexing results
• Advanced searchability on all the encoded elements
• By using terms from a thesaurus, language usage is standardised
• Higher relevance of returned hits
• Added intellectual input
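
The point about a thesaurus standardising language usage can be sketched as a lookup from variant keywords and acronyms to a preferred term. The thesaurus entries and the function name below are hypothetical examples, not DISA's controlled vocabulary.

```python
from typing import Dict, List

# Hypothetical thesaurus: lower-cased variant or acronym -> preferred term.
THESAURUS: Dict[str, str] = {
    "udf": "United Democratic Front",
    "united democratic front": "United Democratic Front",
    "election": "Elections",
    "elections": "Elections",
}


def normalise_keywords(keywords: List[str]) -> List[str]:
    """Replace each keyword with its preferred term and drop duplicates, keeping order."""
    preferred: List[str] = []
    for keyword in keywords:
        cleaned = keyword.strip()
        term = THESAURUS.get(cleaned.lower(), cleaned)
        if term not in preferred:
            preferred.append(term)
    return preferred


print(normalise_keywords(["UDF", "election", "Elections", "boycott"]))
# ['United Democratic Front', 'Elections', 'boycott']
```

Applying the same lookup to a search query as to the index terms means that a search for an acronym and a search for the full name return the same articles.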

However …
• Human indexing is time and labour intensive
• Training is required
• Quality control is needed
• Thesaurus management software is essential

Languages and translations
• African vernacular languages
• Translation challenges for a global context
• OCR challenges
• OCR training for African languages not yet developed
• Automated translation not yet possible
• Extraction of metadata useful

Language examples: Hindi, Zulu

South African newspaper digitisation
• Rich collections in the vernacular
• Poor quality microfilms
• Low OCR success rate on microfilm scans
• Level of metadata complexity
• Minimal manual indexing
• Cost of staff time
• Service on demand
• Lack of national guidelines
• Lack of national funding

Conclusions
• Volume of newspapers and information
• Value of digitisation
• Rich source of South African social history
• Vernacular
• Teaching, learning and research value
• Dedicated newspaper digitisation project
• Overcome challenges!

Recommendations
• National consultation
• National support
• Prioritisation
• Role of publishers
• DISA consultancy

Contact details
• Patricia Liebetrau, Director, DISA
• URL:
This presentation is made available under a Creative Commons Attribution 2.0 South Africa license.