1 Symposium: Open Access to Information
Panel 3: BDTD (Biblioteca Digital de Teses e Dissertações)
25 August 2006, Brasilia
NDLTD: From Local to National to Global
Edward A. Fox
Executive Director, NDLTD
Chair, IEEE-CS Tech. Committee on Digital Libraries
Professor, Department of Computer Science
Director, Digital Library Research Laboratory
Virginia Tech, Blacksburg, VA USA
2 Global Scope
One-stop shopping for access
Search engine companies want a single contact to reach a large number of sites
Spokesman needed for publicity, partnering, advocacy, monitoring, …
Annual international conference: next in Sweden, then United Kingdom
Research: cross-language, classification, preservation, plagiarism detection, …
3 Outline
Acknowledgements
Key Ideas – With Proofs
ETD 2005 Concepts
Institutional Repositories
UK Report
NDLTD
DL Futures
4 Acknowledgements
Students
Faculty, Staff
Collaborators
Support
Mentors
5 Key Ideas - Overview Theorem 1: Supporters of Open Access should support NDLTD. Theorem 2: 5S can guide us to better support of Open Access.
6 Theorem 1: Supporters of Open Access should support NDLTD - 1 DLs will lead to enormous benefit at all levels, from personal to global. An IR is a type of DL, in the middle of the levels (requiring support from below, and providing support for above levels). Having a DL at every university (i.e., IR) greatly encourages Open Access.
7 Theorem 1: Supporters of Open Access should support NDLTD - 2 The easiest way to launch an IR at a university is with ETDs. NDLTD is the lead world organization promoting ETD activities. NDLTD’s goals are all in support of Open Access and IRs.
8 Theorem 2: 5S can guide us to better support of Open Access - 1
5S helps us think formally about Open Access, hence clearly, hence to find focus.
5S helps us design and build DLs, hence IRs.
Societies
–Individuals: members of institution, discipline
–Social influence can promote DL (re)use.
–Economic, political and social issues lead us to a distributed architecture.
9 Theorem 2: 5S can guide us to better support of Open Access - 2
Distributed infrastructure + services lead us to harvesting (vs. federation, gathering).
5S helps make harvesting a success:
–Streams of content flow from individuals.
–Structures: ETD-ms, (browsing) classification
–Spaces: indexes, interfaces
–Scenarios: submission, workflow, harvesting
–Societies (see above)
More collaboration (social networks)
Prestige is more widely spread.
Access is more open.
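Harvesting, as opposed to federated search, means a service provider repeatedly pulls metadata from each data provider. With OAI-PMH (the protocol this talk names for data providers and the union catalog), that is a loop of ListRecords requests that follows resumptionToken values until a page arrives without one. The sketch below is a minimal illustration, not NDLTD's harvester; the `fetch` callable is a hypothetical stand-in for the HTTP layer so the paging logic can be exercised without a live repository.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_records(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return the record identifiers on one ListRecords page and its resumptionToken."""
    root = ET.fromstring(xml_text)
    ids = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
        if el.text
    ]
    token_el = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken marks the last page of the list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token or None


def harvest(fetch: Callable[[dict], str], prefix: str = "oai_dc") -> Iterator[str]:
    """Yield identifiers page by page until the repository reports the list is complete."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        ids, token = parse_list_records(fetch(params))
        yield from ids
        if token is None:
            break
        # A resumed OAI-PMH request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

In practice `fetch` could be `lambda p: urllib.request.urlopen(BASE + "?" + urllib.parse.urlencode(p)).read().decode("utf-8")`, where `BASE` is a repository's OAI-PMH endpoint (hypothetical here). A production harvester would also handle OAI error elements, retries and incremental `from` dates.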
10 Conference Summary Words - 1
accessibility, aggregation, alert, annotate, archive, arts, attitudes, authentication, authoring, authorization, automation, browse, catalog, collaboration, community, components, context, conversion, customer, decentralized, digitize, discourse, discovery, dissemination, DSpace, federated, Fedora, global, grid, economic, harvesting, ingest, innovation, institutional, integrity, interaction
11 Conference Summary Words - 2
interchange, interoperability, knowledge, LOCKSS, management, metadata, national, OCR, organization, partnership, PDF (/A), podcasting, portal, preservation, provider, regional, repository, retrieval, scalability, Scirus, search, server, service, sharing, standardization, strategic, student, summarization, sustainable, testimonial, toolkit, training, tutorial, Unicode, usable, VALET, XML, XSLT, workflow
12 Conference Summary Phrases - 1
alumni development, always on, business model, concept map, content management, copyright compliance, cost effective, Creative Commons, creative material, cross language, dark archive, developing country, digital library, digital rights management, digital signature, disruptive technology, document model, Dublin Core
13 Conference Summary Phrases - 2
e-knowledge, e-publishing, e-research, e-science, full text, Google Scholar, institutional repository, LDAP server, learning object, mandatory deposit, Million Book Project, national initiative, Net Gen, OAI-PMH, online digital studio, open access, Open Archives Initiative, open source
14 Conference Summary Phrases - 3
persistent identifiers, postgraduate research, public domain, restricted access, retrospective conversion, scholarly communication, server log, service oriented architecture, social network, stepping stone, subject gateway, survey data, union catalog, unlocking IP, user centered, value added, voluntary participation, walking the talk, web based, web services
15 Institutional Repositories - 1 “Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.” Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA
16 Institutional Repositories - 2 “A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.” Lynch, C.A. In ARL Bimonthly Report 226, pp. 1-7, Feb. 2003,
17 Prospero: Summary of features of the three software packages compared (DSpace, E-prints, Fedora)
What you get: A package with front-end web interface directly linked to a database; A repository database, with internal database.
Server requirements: DSpace: Unix environment, Java, Apache Ant, Apache Tomcat, PostgreSQL or Oracle; E-prints: Unix environment, Perl, Apache+mod-perl, MySQL; Fedora: Unix or Windows, Java (optional: MySQL or Oracle)
Subject classification: Yes
Community groups: DSpace: Yes; E-prints: No; Fedora: Possible but … (see below)
Where from?: DSpace: MIT and Hewlett-Packard; E-prints: Southampton University, outcome of a JISC project; Fedora: Cornell University and the University of Virginia Library
18 UK Report of Aug
Evaluation of Options for a UK Electronic Thesis Service
Study report edited by Alma Swan, Key Perspectives Ltd & UCL Library Services
EThOS project (Electronic Theses Online Service): commissioned to develop a model for a workable, sustainable and acceptable national service for the provision of open access to electronic doctoral theses.
19 EThOS: Stakeholders
Academic registrars
University administrators (graduate schools)
Librarians
Repository managers
Authors (or potential authors) of theses and dissertations
20 EThOS: Issues
Electronic thesis provision status in the UK and the reasons for its slow development
Drivers for change in the provision of e-theses
The administrative and academic contexts in which a national UK e-thesis service would need to operate
Constraints that might apply, or which have applied until the present
Architectures and service models for e-thesis provision
Technical standards
IPR and other rights issues
Business models
21 Elements of an ETD Initiative - 1
The hub: the central focus of the service. It may offer multiple resources and subservices or, at the other end of the scale, be a simple resource discovery service. The hub may point at theses located in host institutions or it may contain the full text of theses itself.
Submission procedure: the simplicity of and requirements for this can vary.
Metadata structure and format: the required metadata formats can vary from very simple Dublin Core with few elements to a deeply descriptive, specially developed metadata scheme with many elements.
Metadata dissemination: services vary in the extent to which they disseminate thesis metadata; some only expose it themselves, while others disseminate it via multiple discovery services and routes.
Accepted file formats: some services accept multiple file formats, some a few and some just one.
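At the simple end of the metadata range described above sits unqualified Dublin Core, exposed as the `oai_dc` format that every OAI-PMH repository must support. A minimal sketch that serialises one record, assuming a flat element-to-values dictionary as input; the `xsi:schemaLocation` attributes a real response carries are omitted, and richer schemes such as ETD-MS (which add degree information) are out of scope here.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def oai_dc_record(fields: dict[str, list[str]]) -> str:
    """Serialise element -> values pairs as an oai_dc block (schemaLocation omitted)."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a simple Dublin Core element")
        for value in values:
            # Repeatable elements: one dc:<name> child per value.
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

Rejecting unknown element names keeps the record within simple DC; a deeply descriptive scheme would need its own namespace and validation.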
22 Elements of an ETD Initiative - 2
Digitisation: a digitisation service may be part of the offering. If it is, it may be on demand or there may be a mass retrodigitisation programme on offer.
Thesis level: services may offer only doctoral theses or may extend their coverage to masters theses and even undergraduate dissertations.
Copyright and IPR: services may incorporate advice and practical help on rights issues.
Plagiarism: services may offer a plagiarism detection scheme.
Business model: under this heading fall issues such as whether theses are offered on a pure Open Access basis or whether access is paid for, whether royalties are paid to authors, how digitisation costs are covered, and so forth.
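One common way to build the plagiarism-detection option mentioned above is to compare overlapping word sequences ("shingles"). The sketch scores document pairs by the Jaccard similarity of their word 3-grams. It illustrates that general technique only, not the scheme of any service named here, and the 0.5 review threshold is an arbitrary assumption.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Word n-grams of a lower-cased text; texts shorter than n words yield none."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i : i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def flag_pairs(docs: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Document pairs whose similarity reaches the (assumed) review threshold."""
    names = sorted(docs)
    flagged = []
    for i, x in enumerate(names):
        for y in names[i + 1 :]:
            score = jaccard(docs[x], docs[y])
            if score >= threshold:
                flagged.append((x, y, score))
    return flagged
```

Comparing every pair is quadratic; at collection scale a service would index shingles (or hashes of them) rather than loop over all pairs.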
23 E-theses Services: Summary of characteristics - 1
Provider: ADT; DiVA; Theses Canada; NDLTD; DART-Europe; EThOS
Coverage: Originally 7 in CAUL, now open to all Australian and NZ universities; 16 universities in Sweden and Norway, open to all universities; 60 universities in Canada; Any voluntarily participating institutions; Pan-European service for voluntary national/European consortia; Any voluntarily participating UK HE institution
Hub: ADT; DiVA Portal portal.org; Theses Canada Portal org; DART-Europe repository and portal (DEEP) in 2007; EThOS (at British Library)
Submission procedure: At host institution; By author at host univ., template supplied; To ProQuest to digitize, or universities can provide metadata; At host institutions. NDLTD encourages; At host institution; By author at host univ., central service for retrodigitisation and submission
Metadata format: Details soon; 99 metadata elements; MARC 21; ETD-ms; Dublin Core; ETD-MS (ETD metadata standard), crosswalks; Developing a standards w 8 DC elements; EThOS set of 15 qualified DC elements
24 E-theses Services: Summary of characteristics – 2
Provider: ADT; DiVA; Theses Canada; NDLTD; DART-Europe; EThOS
Funded by: Australian Research Council (at least initially); Universities’ consortium (of the participating universities); LAC funds except for ProQuest’s fee (funded by Theses Canada and universities); Membership dues and membership in-kind contributions; Business model currently being discussed; Participating institutions, via choice of options
Digitisation provision by service: At deposit. Also on request of theses; No; By ProQuest; No; Yes, probably mass retrodigitisation if funding secured; Yes
Number of theses: N/A; 11,500 theses, 500 other pubs (reports, books); c50,000 ETDs; c250,000 TDs (incl. ETDs); Over 250,000; Estimated 25,000/yr digitisations
Masters theses: Yes? Yes Yes? No plans
Open Access: Yes, and to ETDs harvested; Yes
25 EThOS Survey: my institution’s policy position on PhD e-theses
55% no policies yet
34% currently planning policies
11% has a policy
26 EThOS Survey: very important drivers of a national e-thesis service
60% e-theses are more accessible
48% paper theses are not easily accessible
31% will contribute to institution’s visibility
21% storage space for print theses
19% want national support since it is slow to launch a local service
16% increasing interest in electronic preservation of research
27 EThOS Survey: very useful value-added services for PhD e-theses
59% long-term digital creation of harvested e-theses in a central archive
51% optimizing exposure to search engines
48% digital copy for local host institutions
35% IPR checks against deposited works
35% support for non-text elements
34% plagiarism checks
33% link to repositories of primary data used
28 EThoS: Benefits Hugely increased visibility of UK doctoral research output Resulting in increased usage and impact of UK doctoral research output The opportunities for resulting new research efforts and collaborations
29 EThoS: Opportunities Being able to provide a world-class electronic theses service to showcase the UK’s doctoral research Providing an example of good practice and the impetus for other nations to develop electronic theses services of their own Possible commercial opportunities for value-added service providers
30 A Digital Library Case Study
Domain: graduate education, research
Genre: ETDs = electronic theses & dissertations
Submission:
Collection:
Project: Networked Digital Library of Theses & Dissertations (NDLTD)
31 NDLTD Incorporation Incorporated May 20, 2003 in Virginia, USA Charitable and educational purposes (501 c 3) Officers –Executive Director (Ed Fox) –Secretary (Gail McMillan) –Treasurer (Scott Eldredge) Now: –250K metadata records in Union Catalog –~50 full members, ~200 associated members
32 Board of Directors (2006) Suzie Allard (ETD 2004, U. Kentucky) Denise A. D. Bedford (World Bank) Julia C. Blixrud (ARL, SPARC) José Luis Borbinha (Natl Lib Portugal) Alex Byrne (ETD 2005, ADT: Australia) Tony Cargnelutti (ETD 2005, Australia) Vinod Chachra (VTLS) William Clark (Ohio State U.) Susan Copeland (RGU, UK) Jude Edminster (Bowling Green St. U.) Scott Eldredge (Treasurer, ETD 2002, BYU) Edward A. Fox (Exec Director, Virginia Tech) John H. Hagen (West Virginia U.) Thomas B. Hickey (OCLC) Christine Jewell (U. Waterloo, Canada) Joan K. Lippincott (CNI) Mike Looney (Adobe) Austin McLean (ProQuest) Gail McMillan (Secretary, Virginia Tech) Joseph Moxley (ETD 2000, USF) Eva Müller (U. Uppsala, Sweden) Ana Pavani (PUC Rio, Brazil) Sharon Reeves (National Library Canada) Peter Schirmbacher (ETD 2003, Humboldt) Hussein Suleman (U. Cape Town, S. Africa) Shalini R. Urs (U. Mysore, India) Eric F. Van de Velde (ETD 2001, Caltech)
33 NDLTD Committees (Chairs) Awards (John Hagen) Conferences (Sharon Reeves) Development (Peter Schirmbacher) Executive (Edward Fox) Finance (Scott Eldredge) Implementation (Ana Pavani) Membership (Tony Cargnelutti) Nominating (Joan Lippincott) Standards (Thomas B. Hickey) Union Catalog (Vinod Chachra)
34 Selected Projects / Sponsors Australia (ADT) Brazil (BDT, IBICT) Canada Catalunya Chile (Cybertesis) China (CALIS) Germany India (Vidyanidhi) Korea OhioLINK: 79 colleges/univs Portugal (National Library) South Africa UK (British Library, JISC, Edinburgh, …) UNESCO (especially Latin America, Eastern Europe, Africa) …
35 UNESCO and ETDs (by Axel Plathe at ETD2003) Promoting the use of the Internet as a tool for disseminating scientific knowledge Facilitating the transfer of ETD expertise from developed to developing countries 1998: Member of the NDLTD Steering Committee 1999: First UNESCO ETD meeting on ETD internationalisation 2002: “UNESCO Guide to Electronic Theses and Dissertations” 2003: Model training programmes and training courses 2003: Sponsor pilot projects 2003: Pilot projects (Africa, Europe, Latin America)
36 Some Countries Australia Belgium Brazil Canada Chile China, Hong Kong Colombia Finland France Germany Greece India Italy Jamaica Korea Lithuania Malaysia Mexico Namibia Netherlands Norway Poland Russia Singapore S. Africa S. Korea Spain Sudan Sweden Switzerland Taiwan Thailand Turkey UK USA Venezuela Yugoslavia
37 NDLTD Members - 1
Ball State University; Brigham Young University; California Institute of Tech.; Consorci de Biblioteques Universitàries de Catalunya; Duke University; Georg August Universität Göttingen; George Washington University; Georgetown University; Georgia Institute of Technology; Georgia Southern University; Georgia State University; Government of Canada; Griffith University; Johns Hopkins University; Kauno Technologijos Universitetas; Louisiana State University; L'Université du Québec à Rimouski; McGill University; New Jersey Institute of Technology; North Carolina Central University; North Carolina State; Ohio University
38 NDLTD Members - 2
Oregon State U. Library; Penn State University; Pontifícia Universidade Católica do Rio de Janeiro; Portugal National Library; Rita Chu (individual); Simon Fraser University; State of Kansas; Texas Tech University; U. de las Américas, Puebla; Universität St. Gallen; U. Glasgow; U. Maine; U. Missouri; U. North Carolina Chapel Hill; U. Pittsburgh; U. Pretoria; U. South Florida; U. Tennessee; U. Waterloo; Virginia Tech; West Virginia U. Libraries; Worcester Polytechnic Institute; Yale University
39 Why ETD? Short Answer For Students: –Gain knowledge and skills for the Information Age –Richer communication (digital information, multimedia, …) For Universities: –Easy way to enter the digital library field and benefit thereby For the World: –Global digital library – large, useful, many services General: –Save time and money –Increased visibility for all associated with research results
41 NDLTD: How can a university get involved? Select planning/implementation team –Graduate School –Library –Computing / Information Technology –Institutional Research / Educ. Tech. Join online, give us contact names – Adapt Virginia Tech or other proven approach –Build interest and consensus –Start trial / allow optional submission
42 How? Steps
Attend ETD xx
Join NDLTD
Launch initiative, dialog, encourage
Pilot -> requirement
OAI data provider
DINI-Certificate
Log, survey, analyze, improve
Help other sites
Serve on NDLTD committees
Extend services: preservation, inst. rep., …
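The "OAI data provider" step amounts to answering the six OAI-PMH request verbs over HTTP. A sketch of just the routing layer, assuming hypothetical handler functions that return response XML. Real responses are wrapped in the OAI-PMH envelope (responseDate, request echo) and must also report other error codes such as badArgument or cannotDisseminateFormat.

```python
from typing import Callable

Handler = Callable[[dict[str, str]], str]

# The six request verbs defined by OAI-PMH 2.0; a compliant data provider answers all of them.
OAI_VERBS = frozenset({
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
})


def dispatch(params: dict[str, str], handlers: dict[str, Handler]) -> str:
    """Route a request to its handler; unknown or missing verbs get a badVerb error."""
    missing = OAI_VERBS.difference(handlers)
    if missing:
        raise ValueError(f"provider must implement every verb; missing: {sorted(missing)}")
    verb = params.get("verb")
    if verb not in OAI_VERBS:
        # Simplified: a full response wraps this element in the OAI-PMH envelope.
        return '<error code="badVerb">Illegal or missing OAI-PMH verb</error>'
    return handlers[verb](params)
```

Checking that all six handlers exist up front reflects the protocol's requirement that a data provider support every verb, rather than failing at request time.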
43 Union catalog: OCLC OCLC will expand OAI data provider on TDs. Is getting data from WorldCat (so, from many sites!). Will harvest from all others who contact them. Need DC and either ETD-MS or MARC. Has a set for ETDs.
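Whether a repository meets the stated requirement (DC plus either ETD-MS or MARC) can be read from its ListMetadataFormats response. A sketch, with one caveat: only the `oai_dc` prefix is fixed by OAI-PMH, so the ETD-MS and MARC prefix names below are common choices assumed for illustration, and each repository names its own.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Assumed prefix names; only "oai_dc" is fixed by the protocol itself.
ETDMS_PREFIXES = {"oai_etdms", "etdms"}
MARC_PREFIXES = {"marc21", "oai_marc", "marcxml"}


def offered_prefixes(xml_text: str) -> set[str]:
    """metadataPrefix values listed in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    return {
        el.text.strip()
        for el in root.findall(".//oai:metadataFormat/oai:metadataPrefix", OAI_NS)
        if el.text
    }


def meets_union_catalog_rule(prefixes: set[str]) -> bool:
    """True when Dublin Core plus either ETD-MS or MARC is offered."""
    return "oai_dc" in prefixes and bool(prefixes & (ETDMS_PREFIXES | MARC_PREFIXES))
```

A more robust check would compare each format's `metadataNamespace` rather than its prefix, since the namespace is what identifies the schema.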
44 OCLC SRU Interface
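SRU (Search/Retrieve via URL) expresses a search as URL parameters carrying a CQL query. The slide does not give OCLC's endpoint, SRU version or index names, so the base URL, `version=1.1` and the `dc.title` index below are all assumptions; the sketch only shows how a searchRetrieve request is assembled.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slide does not give the service's SRU base URL.
BASE_URL = "https://sru.example.org/etd"


def sru_search_url(cql_query: str, start: int = 1, max_records: int = 10) -> str:
    """Assemble an SRU searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed; check which SRU version the server supports
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": max_records,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Example: titles containing either word (the dc.title index is an assumption).
print(sru_search_url('dc.title any "digital library"'))
```

Paging is driven by `startRecord` and `maximumRecords`; a client would increase `startRecord` by the page size until the server's reported record count is reached.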
45 VTLS VTLS offers its free VALET system to manage ETDs at institutions, building upon Fedora as well as VTLS software. VTLS runs a service provider atop the Union Catalog. It supports multilingual access to the metadata through its interface.
46 LOCKSS Lots of copies keep stuff safe Stanford (Vicky Reich) Initial focus on lower levels Initial content: journals Emory (Martin Halbert) –Help deploy and adapt –Help apply in other contexts, e.g., ETDs Experiments, studies of Int’l ETD service –Humboldt, PUC Rio, U. Cape Town, VT, …
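The "lots of copies" principle can be illustrated by comparing content hashes across replicas and treating a copy that disagrees with the majority as damaged. This is a deliberately simplified sketch of the idea; it is not the LOCKSS polling protocol, which involves peer voting, sampling and repair over the network.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 fingerprint of one stored copy."""
    return hashlib.sha256(data).hexdigest()


def find_damaged(copies: dict[str, bytes]) -> list[str]:
    """Names of copies whose digest differs from the strict-majority digest."""
    if not copies:
        return []
    hashes = {name: digest(data) for name, data in copies.items()}
    majority, votes = Counter(hashes.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        # No strict majority: a real system would escalate instead of guessing.
        raise ValueError("no majority among copies")
    return sorted(name for name, h in hashes.items() if h != majority)
```

With three copies, one damaged replica is outvoted; with two disagreeing copies there is no majority, which is why more copies make content safer.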
47 Full-text Services Running since Sept 2005: Scirus In beta test: Google Scholar Challenges: –Broadening the coverage, since OAI use has not spread as widely as we would like –Understanding use, throughout the life cycle –Data and DL services quality problems –Inconsistency in the way to get from metadata to the full-text file(s) –Cross-language information retrieval
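The inconsistency in getting from metadata to full text often shows up as `dc:identifier` values that mix URNs, landing pages and direct file links. A hypothetical heuristic a service provider might apply, assumed here for illustration: keep only HTTP(S) identifiers and rank those whose path ends in a common document extension ahead of the rest.

```python
from urllib.parse import urlparse

# Hypothetical list of extensions treated as direct full-text links.
FULLTEXT_SUFFIXES = (".pdf", ".ps")


def guess_fulltext_links(identifiers: list[str]) -> list[str]:
    """Return HTTP(S) identifiers: direct file links first, landing pages after."""
    urls = [i.strip() for i in identifiers if urlparse(i.strip()).scheme in ("http", "https")]
    direct = [u for u in urls if urlparse(u).path.lower().endswith(FULLTEXT_SUFFIXES)]
    landing = [u for u in urls if u not in direct]
    return direct + landing
```

A heuristic like this only papers over the problem; a shared convention in the metadata for marking the full-text file would make it unnecessary.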
48 NDLTD cross-language problem
Language      Number
English      123,696
Portuguese    11,434
German         4,131
French         3,868
Spanish        1,561
Chinese        1,463
Catalan          804
Others        19,962 (most unclassified)
Total (summer ’05)
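The cross-language problem is easier to see as shares. A short sketch using the counts from the table above; the listed rows sum to 166,919 (the slide's own total is not shown, so this is only the sum of the listed rows), of which English is about 74%, leaving roughly a quarter of the records in other or unclassified languages.

```python
# Record counts by language, from the table above (summer 2005).
COUNTS = {
    "English": 123_696,
    "Portuguese": 11_434,
    "German": 4_131,
    "French": 3_868,
    "Spanish": 1_561,
    "Chinese": 1_463,
    "Catalan": 804,
    "Others (most unclassified)": 19_962,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the summed counts held by each language."""
    total = sum(counts.values())
    return {lang: 100.0 * n / total for lang, n in counts.items()}


if __name__ == "__main__":
    print(f"sum of listed rows: {sum(COUNTS.values()):,}")  # sum of listed rows: 166,919
    for lang, pct in shares(COUNTS).items():
        print(f"{lang:<30} {pct:5.1f}%")
```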
49 Ryan Richardson solution to NDLTD cross-language problem
50 Example concept map
51 Supply-Demand Comparison
1. Architecture and Design
2. Law
3. Medicine, Nursing and Veterinary Medicine
4. Arts and Science
5. Engineering and Applied Science
6. Business and Commerce
7. Education
8. Others (unclassifiable)
52 User Expertise Years
53 Date Stamp of ETD
54 Quality Dimensions
55 Quality and the Information Life Cycle
56 Metadata Specifications and Metadata Format: Completeness OCLC NDLTD Union catalog
57 Metadata Specifications and Metadata Format: Conformance Based on ETD-MS
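Completeness and conformance can be expressed as simple per-record checks. A sketch under stated assumptions: the required-element list is hypothetical (loosely modelled on ETD-MS element names, not the official list), and the only conformance rule checked is a W3CDTF-style date format (YYYY, YYYY-MM or YYYY-MM-DD); real ETD-MS conformance covers far more.

```python
import re

# Hypothetical required elements, loosely modelled on ETD-MS element names.
REQUIRED = ("title", "creator", "subject", "date", "type", "identifier", "language")
# Hypothetical conformance rule: W3CDTF-style date format (not calendar validity).
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def completeness(record: dict[str, list[str]]) -> float:
    """Fraction of REQUIRED elements with at least one non-blank value."""
    filled = sum(1 for f in REQUIRED if any(v.strip() for v in record.get(f, [])))
    return filled / len(REQUIRED)


def conformance_errors(record: dict[str, list[str]]) -> list[str]:
    """Date values that do not follow the assumed format."""
    return [f"bad date: {d!r}" for d in record.get("date", []) if not DATE_RE.match(d.strip())]
```

Averaging `completeness` over a harvested set gives one collection-level figure per source, which is the kind of comparison the completeness slide makes between OCLC and the NDLTD Union Catalog.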
58 DL Futures
History
People, Content, Tools
Sustainable Infrastructure
For More Information
61 People
Digital librarians
DL system developers
DL system administrators
DL managers
DL collection development staff
DL evaluators
DL users
63 As data, information, and knowledge play increasingly central roles … digital library research should focus on: Increasing the scope and scale of information resources and services; Employing context at the individual, community, and societal levels to improve performance; Developing algorithms and strategies for transforming data into actionable information; Demonstrating the integration of information spaces into everyday life; and Improving availability, accessibility, and, thereby, productivity.
64 An appropriate infrastructure program will provide sustainability of digital knowledge resources among five dimensions: Acquisition of new information resources; Effective access mechanisms that span media type, mode, and language; Facilities to leverage the utilization of humankind’s knowledge resources; Assured stewardship over humanity’s scholarly and cultural legacy; and Efficient and accountable management of systems, services, and resources.
65 DLs: For More Information Magazine: Books: (1994) –MIT Press: Arms, plus by Borgman, Licklider (1965) –Morgan Kaufmann: Witten... (several), Lesk (2nd edition) Conferences –ECDL: –ICADL: –JCDL: Associations –ASIS&T DL SIG –IEEE TCDL: (student awards, doctoral consortia) NSF: Labs: VT: