DIGITIZATION OF GOVERNMENT INFORMATION RESOURCES : A CASE STUDY OF CENTRAL SECRETARIAT LIBRARY BY S. MAJUMDAR DIRECTOR CENTRAL SECRETARIAT LIBRARY
The course of human development has taken on a new dimension with the introduction of information and communication technology (ICT).
For India, the rise of information and communication technology is an opportunity to overcome historical disabilities and to become the master of its own national destiny.
The Government of India (GOI) has recognised the potential of ICT for rapid and all-round national development. The National Agenda for Governance, which is the Government's policy blueprint, has taken due note of the ICT revolution that is sweeping the globe.
GOVERNMENTAL PUBLICATIONS IN INDIA: The sphere of governmental activity in India has expanded considerably. It now covers many burning issues, such as population control, health management, the economic and social condition of the rural and urban masses, education and basic requirements. The government has intervened in every field.
Interventions are Visible Through: Administrative Reports; Governmental Notifications; Statistical Reports; Budget Documents; Committee and Commission Reports; Research Reports
Interventions (contd.): Bills, Acts, Laws, Codes, Rules and Regulations, Law Reports, Digests and Parliamentary Debates; Reports of various Parliamentary Committees
CENTRAL SECRETARIAT LIBRARY MISSION: “Take an Indian initiative to ICT through its libraries to promote and facilitate the development of Indian tangible heritage from printed form to machine readable collections, provide services for optimal utilisation of resources, and provide lifelong accessibility of information through vast library resources.”
Central Secretariat Library : Types of Machine Readable Database: Machine readable catalogue of bibliographical information for its document resources; Creating digital documents of the Annual Reports and Budget Documents of the parent Ministry (15 thousand pages); Creating digital documents of the Government of India Gazette (2 million pages) and Commission and Committee Reports (1 million pages); Developing a machine readable annotated bibliography of rare books; Developing digital documents of selected government publications
DIGITAL LIBRARY CONCEPTUALISATION PILOT PROJECT ANNUAL REPORTS; PERFORMANCE BUDGET; DEMANDS FOR GRANTS; EXPENDITURE BUDGET OF DEPARTMENT OF CULTURE TO
OBJECTIVES: TO DEVELOP A CD-BASED FULL-TEXT ENGLISH DATABASE; A HIGHLY INTUITIVE DATABASE WHICH IS 100% SEARCHABLE AND NAVIGABLE AND CAN BE BROWSED ACROSS ALL GRAPHICS, TABLES, FIGURES AND PHOTOGRAPHS
OBJECTIVES (CONTD.): TO DEVELOP WEB-ENABLING INTERFACES TO BE HOSTED ON THE CSL SERVER CO-LOCATED AT NIC
SECOND PHASE CONCEIVED ANNUAL REPORTS, ETC. OF DEPARTMENT OF CULTURE, TO
SECOND PHASE CONCEIVED: GOVERNMENT OF INDIA (CENTRAL GOVERNMENT) GAZETTE; COMPLETE SETS ARE AVAILABLE FROM 1950 ONWARDS; MORE THAN 70% OF THE QUERIES IN THE TOTAL USAGE OF INDIAN OFFICIAL PUBLICATIONS PERTAIN TO THE GAZETTE OF INDIA
UNIQUENESS OF GAZETTE: ORGANISED INTO DIFFERENT SECTIONS AND PARTS; SEARCH REQUIREMENTS ARE ON ANY ONE OR A COMBINATION OF SUBJECTS, PART NUMBERS, SECTIONS, SUBSECTIONS, DATE, GSR NUMBERS AND S.O. NUMBERS
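The combined search requirement listed on this slide (any one field, or any combination of fields) can be sketched in a few lines of Python. This is only an illustrative sketch, not the CSL system: the class `GazetteEntry`, the function `search`, the field names and the sample records are all assumptions introduced here, and the slides do not describe the actual index design.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class GazetteEntry:
    """One indexed notification; field names are illustrative assumptions."""
    subject: str
    part: str
    section: str
    subsection: Optional[str]
    published: date
    gsr_number: Optional[str] = None
    so_number: Optional[str] = None


def search(entries: Iterable[GazetteEntry], **criteria) -> List[GazetteEntry]:
    """Return the entries that satisfy every supplied criterion (logical AND).

    The subject is matched as a case-insensitive substring; every other
    field must equal the requested value exactly.
    """
    results = []
    for entry in entries:
        matched = True
        for name, wanted in criteria.items():
            actual = getattr(entry, name)
            if name == "subject":
                matched = wanted.lower() in actual.lower()
            else:
                matched = actual == wanted
            if not matched:
                break
        if matched:
            results.append(entry)
    return results


# Invented sample records, for illustration only.
SAMPLE = [
    GazetteEntry("Customs tariff amendment", "II", "3", "i",
                 date(1998, 4, 1), gsr_number="101"),
    GazetteEntry("Appointment of commission", "II", "3", "ii",
                 date(1998, 4, 1), so_number="202"),
    GazetteEntry("Customs valuation rules", "II", "3", "i",
                 date(1999, 6, 15), gsr_number="303"),
]
```

A call such as `search(SAMPLE, subject="customs", subsection="i")` combines two fields, while `search(SAMPLE, so_number="202")` uses a single one. Treating every criterion as an AND condition is a design choice of this sketch; a production interface would likely also need OR conditions and date ranges.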
OBJECTIVES: DEVELOP VALUE BASED PRODUCT; USE OF NEW TECHNOLOGY: SCANNING, DATABASE GENERATION, CONTENT ANALYSIS, CONTENT MANAGEMENT
OBJECTIVES (CONTD.): CONSERVATION AND PRESERVATION; WEB ENABLING TECHNOLOGY; PORTAL APPLICATION
ADMINISTRATIVE FORMALITIES: OPEN TENDER SYSTEM; EVALUATION PROCESS: TECHNICAL EVALUATION, FINANCIAL EVALUATION
TECHNICAL EVALUATION: CONSTITUTION OF TECHNICAL COMMITTEE; TWO PART EVALUATION: (1) CREDIBILITY, TURNOVER, WORK FORCE, SIMILAR WORK EXPERIENCE; (2) TECHNICAL EXPERIENCE
TECHNICAL EXPERIENCE: ELEVEN PARAMETERS: Digitizing technique: scanning, OCR, proofing or any other, with technical specifications; Format for metadata creation; File format: TIFF and PDF or any other; Organization of images and the corresponding metadata in a DBMS
TECHNICAL DETAILS: ELEVEN PARAMETERS (CONTD.): Whether a standard DBMS or proprietary software is used; Retrieval interface with standard search strategy; Provision for Web enabling of data; Resolution: scanning and output/display; Portability from one platform to another; Platform used; Work flow
PROTOTYPES: ANNUAL REPORTS; GAZETTE DOCUMENTS
TECHNICAL RESULTS: COULD SHORTLIST TWO AGENCIES; COULD ACHIEVE BEST PRACTICES TO BE FOLLOWED; COULD UNDERTAKE BENCHMARKING OF THE BEST PRACTICES
BENCHMARK: SCANNING; OCRing; PROOFING; FORMATTING; CONTENT ANALYSIS
BENCHMARK (CONTD.): METADATA CREATION USING UNIMARC AND DUBLIN CORE (Great Emphasis); CONTENT MANAGEMENT SYSTEM; PLATFORM PORTABILITY; WEB ENABLING; DELIVERABLES; PRESERVATION AND CONSERVATION
DC vs UNIMARC: MAPPING THE ELEMENTS OF DUBLIN CORE TO UNIMARC TAGS
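A mapping of this kind can be pictured as a simple lookup table. The sketch below is illustrative only: the slides do not reproduce the mapping CSL actually adopted, so the pairs in `DC_TO_UNIMARC` are an assumption drawn from commonly used UNIMARC fields (200 title, 700 personal name, 610 uncontrolled subject terms, 330 summary, 210 publication, 101 language, 856 electronic location), and the function name `dc_to_unimarc` is hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative crosswalk (an assumption, not CSL's documented mapping):
# Dublin Core element -> (UNIMARC field tag, subfield code).
DC_TO_UNIMARC: Dict[str, Tuple[str, str]] = {
    "title": ("200", "a"),        # title proper
    "creator": ("700", "a"),      # personal name, primary responsibility
    "subject": ("610", "a"),      # uncontrolled subject term
    "description": ("330", "a"),  # summary or abstract
    "publisher": ("210", "c"),    # name of publisher
    "date": ("210", "d"),         # date of publication
    "language": ("101", "a"),     # language of the text
    "identifier": ("856", "u"),   # electronic location (URI)
}


def dc_to_unimarc(
    record: Dict[str, str],
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Convert a flat Dublin Core record into (tag, subfield, value) rows.

    Returns the rows sorted by tag and subfield, plus the list of Dublin
    Core elements that have no entry in the crosswalk, so that nothing is
    dropped silently.
    """
    rows: List[Tuple[str, str, str]] = []
    unmapped: List[str] = []
    for element, value in record.items():
        target = DC_TO_UNIMARC.get(element.lower())
        if target is None:
            unmapped.append(element)
            continue
        tag, subfield = target
        rows.append((tag, subfield, value))
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows, unmapped
```

Reporting the unmapped elements separately is a deliberate choice in this sketch: it lets a metadata professional review the gaps rather than lose information during conversion.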
COSTING: COSTING TECHNIQUE USED BY CSL: THE BASE IS THE WORK FLOW, I.E. THE STAGES OF WORK
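Costing by work-flow stage can be illustrated as a per-page rate attached to each stage and summed over the page count. The sketch below is an assumption about how such a calculation might look: the slides give neither rates nor a formula, so the function names and every figure in `PLACEHOLDER_RATES` are invented placeholders, with stage names borrowed from the benchmark slides.

```python
from decimal import Decimal
from typing import Dict

# Placeholder per-page rates in an arbitrary currency unit.
# These figures are invented for illustration and do not come from CSL.
PLACEHOLDER_RATES: Dict[str, Decimal] = {
    "scanning": Decimal("0.50"),
    "ocr": Decimal("0.25"),
    "proofing": Decimal("0.75"),
    "formatting": Decimal("0.20"),
    "metadata_creation": Decimal("0.30"),
}


def cost_by_stage(pages: int, rate_per_page: Dict[str, Decimal]) -> Dict[str, Decimal]:
    """Return the cost of each work-flow stage for the given page count."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    return {stage: rate * pages for stage, rate in rate_per_page.items()}


def total_cost(pages: int, rate_per_page: Dict[str, Decimal]) -> Decimal:
    """Return the sum of all stage costs for the given page count."""
    return sum(cost_by_stage(pages, rate_per_page).values(), Decimal("0"))
```

Keeping the stages separate, rather than quoting one blended rate, makes it possible to compare bids stage by stage and to pay against deliverables received at each stage.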
DELIVERABLES: TIFF images to PDF with text; PDF on CD-ROM after OCRing in RTF (Rich Text Format) for full text search; TIFF on CD-ROM in raw form
DELIVERABLES (CONTD.): XML based ODBC compliant database archives on CD-ROM; Installation of Portal Application Infrastructure; Deliver the tools required for managing and updating the Portal
STATUS: 1.9 million pages of the Gazette have already been scanned, cleaned and OCRed; The Devanagari version was segregated from the English version before OCRing; Minimum metadata fields are being captured using DC and UNIMARC; Deliverables at each stage are being received on different storage media
COMMISSION AND COMMITTEE REPORTS: Best practices used for the Gazette documents were repeated; More emphasis on metadata creation using the mapping of UNIMARC with DC; Emphasis on metadata professionals, based on the tools and techniques used by traditional libraries
FIRST PHASE: BASED ON THE PAPERS SUBMITTED: The capabilities of the agencies; The resources available to, and to be deployed by, the agencies; The time frame, work flow and performance guarantee; and THE CORE COMPETENCY FACTOR WAS IDENTIFIED
SECOND PHASE: The prototype demonstration based on the scope of work and the deliverables envisaged: 1. Scanning techniques used and quality, details of equipment used; 2. Cleaning, OCRing, proofing method and compression techniques for output documents in PDF and RTF; 3. XML coding process followed and files created for the purpose
SECOND PHASE (CONTD.): Understanding of metadata in relation to: a. General features of Committee and Commission Reports; b. DC elements and the mapping of DC to those general features; c. Mapping of DC into UNIMARC tags; d. Standard used for subject headings
SECOND PHASE (CONTD.): XML coding process followed and files created for the purpose; Search mechanism designed and developed using a content management system; Work flow chart and its quality; Overall quality of the demo
OTHER ISSUES: DELIVERABLES: TIFF (CLEANED AND UNCLEANED); PDF (IMAGE AND DOCUMENT); XML CODED DATA; RTF FOR CERTAIN PORTIONS; DC/UNIMARC BASED METADATA; WEB ENABLED DATABASE WITH TOOLS AND TRAINING
“As we move into the electronic era of digital objects it is important to know that there are new barbarians at the gate and that we are moving into an era where much of what we know today, much of what is coded and written electronically, will be lost forever. We are, to my mind, living in the midst of digital Dark Ages; consequently, much as monks of times past, it falls to librarians and archivists to hold to the tradition which reveres history and the published heritage of our times.” Terry Kuny, Consultant, National Library of Canada, 1998
THANKS