BnF update: production and development priorities in 2015
Bibliothèque nationale de France
Tallinn, 2015-01-29



Production
  Infrastructure
    - End of the old « Petabox » architecture
  Harvesting
    - Focused crawls, notably the electoral crawl in December 2015
    - Annual domain crawl, September-November
    - Budget of 120 TB in 2015, against 100 TB in 2014
  Preservation
    - Maintaining a sufficient ingest ratio (ingesting should run faster than harvesting!)
    - Accepting the WARC format in the current ingest channel
    - Starting new ingest channels (BnF crawls with HTTrack, [...], crawls performed by IA for the BnF, [...])
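The remark that ingesting should run faster than harvesting can be made concrete with a little arithmetic. The sketch below is illustrative only: the 120 TB annual budget comes from the slide, while the monthly ingest volume, the initial backlog, the even spread of the harvest over the year, and the function names are all assumptions.

```python
def ingest_ratio(ingested_tb: float, harvested_tb: float) -> float:
    """Volume ingested divided by volume harvested over the same period."""
    if harvested_tb <= 0:
        raise ValueError("harvested volume must be positive")
    return ingested_tb / harvested_tb


def backlog_after(initial_backlog_tb: float, harvested_tb: float, ingested_tb: float) -> float:
    """Volume still waiting for ingest at the end of the period (never negative)."""
    return max(0.0, initial_backlog_tb + harvested_tb - ingested_tb)


ANNUAL_BUDGET_TB = 120                   # from the slide
monthly_harvest = ANNUAL_BUDGET_TB / 12  # assumption: harvest spread evenly over the year
monthly_ingest = 12.0                    # assumption: hypothetical ingest capacity
initial_backlog = 30.0                   # assumption: hypothetical volume already waiting

ratio = ingest_ratio(monthly_ingest, monthly_harvest)
backlog = backlog_after(initial_backlog, monthly_harvest, monthly_ingest)
print(f"ingest ratio: {ratio:.2f}, backlog after one month: {backlog:.1f} TB")
```

A ratio above 1 means the backlog shrinks; at or below 1 it never catches up. In practice the harvest is concentrated in a few months (the domain crawl runs September-November), so the ratio would need to be tracked per period rather than as a yearly average.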

Legal deposit of ebooks
  Two working groups
    - Bringing together representatives of the BnF, the SNE (main publishers' union) and the Ministry of Culture
    - Legal side: adapting the decree on internet legal deposit
    - Technical side
  Working with e-distributors to get data and metadata
    - Agreeing on the formats for data (EPUB, PDF) and metadata (ONIX)
  Setting up an internal workflow
    - Entry via FTP deposit
    - Re-use of already existing applications: SPAR, Gallica…
    - Important issue: automation of cataloguing
  Real-scale tests to start in April-May

Development: access projects
  Data mining
    - Project supported by a specific research fund
    - Cooperation with a research library (specialized in 20th-century history) and a technology university (Telecom ParisTech)
    - Goal: studying how digitized documents on WWI are used on the web
  On the BnF side
    - Extracting metadata into "WAT" files
    - Providing a server where a researcher can "play" with the data…
    - … while respecting legal deposit regulations
  On the Telecom ParisTech side
    - Recruiting a researcher who will use the BnF services to perform the study
    - The researcher arrived in January 2015, for 6 to 9 months

Development: access projects
  Data extraction
    - A follow-up to the data mining project
    - Being able to extract specific data from legacy W/ARC files…
    - … according to filters: by domain names, MIME types, dates, etc.
    - Interested in the use of WARC tools and JWAT
  Full-text indexing
    - On specific corpora: news, government websites
    - Probably using SOLR and tools developed by the BL and other IIPC partners
    - Should start in June
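The kind of filtering described for data extraction (by domain name, MIME type and date) might look like the sketch below. It is not the API of WARC tools or JWAT: it assumes that some W/ARC reader has already summarized each record into a small `RecordInfo` object, and that class, its fields and the function names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional, Sequence
from urllib.parse import urlsplit


@dataclass
class RecordInfo:
    """Hypothetical summary of one archived record's headers."""
    url: str
    mime_type: str
    captured: datetime


def host_in_domains(url: str, domains: Sequence[str]) -> bool:
    """True if the URL's host equals one of the domains or is a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def filter_records(
    records: Iterable[RecordInfo],
    domains: Optional[Sequence[str]] = None,
    mime_prefixes: Optional[Sequence[str]] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> Iterator[RecordInfo]:
    """Yield the records passing every filter that was supplied."""
    for rec in records:
        if domains and not host_in_domains(rec.url, domains):
            continue
        if mime_prefixes and not rec.mime_type.lower().startswith(tuple(mime_prefixes)):
            continue
        if start and rec.captured < start:
            continue
        if end and rec.captured > end:
            continue
        yield rec


# Made-up sample records, for illustration only.
records = [
    RecordInfo("http://www.example.fr/a.html", "text/html", datetime(2014, 5, 1)),
    RecordInfo("http://www.example.fr/b.pdf", "application/pdf", datetime(2014, 5, 2)),
    RecordInfo("http://other.example.org/", "text/html", datetime(2014, 5, 3)),
    RecordInfo("http://notexample.fr/", "text/html", datetime(2014, 5, 3)),
]
selected = list(filter_records(
    records,
    domains=["example.fr"],
    mime_prefixes=["text/"],
    start=datetime(2014, 1, 1),
    end=datetime(2014, 12, 31),
))
```

Matching on the host suffix with a leading dot, rather than on a plain substring, keeps `notexample.fr` out of a filter on `example.fr`. Filters left as `None` are simply skipped, so the same function covers a domain-only or date-only extraction.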

Remote access
  Remote access: a legal possibility
    - Regulation of the Ministry of Culture: remote access to the web archive should be offered to the 26 regional libraries…
    - … corresponding to the 26 French regions, including overseas regions
  Technically: use of a "virtual browser"
    - Use of the VMware "View" solution
  A progressive deployment
    - Already available in 2 libraries, 2 new openings by March
    - Goal: 8 libraries by the end of 2015, 15 by the end of [...]
  In parallel, the BnF offers libraries the option of maintaining an "ongoing collect" to harvest their regional web

Development: harvesting projects
  Investigating the adoption of Heritrix 3
    - Identifying the benefits, the shortcomings, the opportunities and the risks
    - In close relationship with… you
    - More information to come!
  Crawl of FTP platforms with Heritrix
    - Issue: the BnF is not able to get the paper versions of the local editions of the main regional newspapers
    - So it tries to get the online PDF versions
    - Currently we crawl the websites… but we would like to investigate "FTP deposit by robot" with Heritrix
    - It's just a teaser for the end of the workshop…