
Similar presentations
PUMA & MetaPub Open Access to Italian CNR Repositories in the Perspective of the European Digital Repository Infrastructure GL9 - NINTH INTERNATIONAL CONFERENCE.

Recent developments in digital archiving and preservation Jan Fullerton Director General National Library of Australia.
DRIVER Long Term Preservation for Enhanced Publications in the DRIVER Infrastructure 1 WePreserve Workshop, October 2008 Dale Peters, Scientific Technical.
Opportunities for the cultural sector Claude POLIART DG Information Society (IFSO)
From web archiving to web collecting The development of the KB’s web archive Anna Rademakers, May 21st 2014.
Harvesting digital newspapers at the Bibliothèque nationale de France
Providing collections, tools and services for digital humanities A national library perspective Clément Oury Head of Digital Legal Deposit Bibliothèque.
1 What is the Internet Archive We are a Digital Library Mission Statement: Universal access to human knowledge Founded in 1996 by Brewster Kahle in San.
Bibliothèque nationale de France Tallinn, BnF update: production and development priorities in 2015.
Sandra McIntyre Program Director. OVERVIEW Analysis.
BnF projects and priorities On the collection side – Perform broad and focused crawls with a maximum of 100TB – Set up the legal deposit of ebooks.
BUILDING DIGITAL WEB ARCHIVES FOR FUTURE SCHOLARS Jani Stenvall
Mixing web and digitized archives The future of digital heritage of the World War I Valérie Beaudouin (Telecom ParisTech), Philippe Chevallier (BnF), Lionel.
Steve Yip Head of Reference and Research Services HKUST Library Research Support Provided by HKUST Library and other JULAC Libraries in HK 1 Date : March.
Chronopolis: Preserving Our Digital Heritage David Minor UC San Diego San Diego Supercomputer Center.
Jackie Knowles, Project Manager. Image from
Building Digital Museums, Libraries and Archives David Dawson Senior Policy Adviser (Digital Futures)
Rutgers University Libraries What is RUcore? o An institutional repository, to preserve, manage and make accessible the research and publications of the.
Depositing and Disseminating Digital Resources Alan Morrison Collections Manager AHDS Subject Centre for Literature, Linguistics and Languages.
Mike Smorul Saurabh Channan Digital Preservation and Archiving at the Institute for Advanced Computer Studies University of Maryland, College Park.
11 WARC standard revision workshop Clément Oury IIPC General Assembly open workshops Stanford, April 28th, 2015 IIPC General Assembly – Stanford – April.
1 Archiving and Preserving the Web Kristine Hanna Internet Archive April 2006.
Romain Wenz- BnF-DIBN – SWIB 2010 November The data.bnf.fr project describing resources of the French National Library.
Promoting Digital Preservation Partnerships at the U.S. Library of Congress April 2004.
Annick Le Follic Bibliothèque nationale de France Tallinn,
Digital Objects Management Arbicon Visit, June 7, 2007 Esa-Pekka Keskitalo, Senior Analyst esa-pekka.keskitalo [at] helsinki.fi.
Bibliography in the Digital Age - IFLA Satellite Meeting Warsaw, 9 August Online materials published in Austria collecting, archiving and metadata.
World Bank, Africa Region, Africa Household Survey Databank - The World Bank - Africa.
WebArchiv Czech Web Archive IIPC 2007, Paris.
Resource Sharing Development and Challenge in Academic Libraries: the Case Study of CALIS Yao XiaoXia CALIS Administrative Center, PUL, Shanghai.
Good practice in Research Data Management Module 6: Tools, training and support.
1 Archiving and Preserving the Web Dan Avery Kristine Hanna Merrilee Proffitt Internet Archive RLG April 2006.
Chinese-European Workshop on Digital Preservation, Beijing July 14 – Network of Expertise in Digital Preservation 1 Trusted Digital Repositories,
Managing journals: challenges and opportunities How to get started (with OJS) Jackie Proven.
Web Capture team Office of strategic initiatives February 27, 2006 Selecting Content from the Web: Challenges and Experiences of the Library of Congress.
Ymchwil Research Ymchwil Research RESAW Ioan Isaac-Richards Ingest Processes Manager Head of Web Archiving
The Western Waters Digital Library: Building a Resource Through Multi- State Collaboration and Technology Dawn Paschal Assistant Dean, Digital Library.
CUBARTE The doorway to the world of Cuban Culture.
Françoise Bourdon Deputy Head of the Digital and Bibliographic Information Department French National Library IFRRO International seminar Oslo, October.
Towards a European network for digital preservation Ideas for a proposal Mariella Guercio, University of Urbino.
Themes Architecture Content Metadata Interoperability Standards Knowledge Organisation Systems Use and Users Legal and Economic Issues The Future.
Netarkivet RESAW seminar, Dec 2-3, 2013 Day 1. Who are we today: Birgit N. Henriksen, head of digital preservation, KB; Bjarne Andersen, head of digital.
IFAP Special Event: Information and Knowledge for All, Emerging Trends and Challenges Information Preservation 4000 Years of Traditions Challenged by Digital.
Enhancing Digital Repository of Scholarly Publications at Indian Institute of Technology Bombay by Mr. Mahendra N. Jadhav Assistant Librarian Central Library.
Martin Halbert UNT Dean of Libraries MetaArchive President Monday, April 11, 2011 Newspaper Archive Summit University of Missouri Columbia, MO.
NetarchiveSuite Meeting, BnF, Austria Updates and Plans for 2012 Michaela Mayr, Andreas P. Austrian National Library
The KB e-Depot long-term preservation of scientific publications in practice Marcel Ras, National library of The Netherlands.
Co-funded by the European Union under FP7-ICT Co-ordinated by aparsen.eu #APARSEN Issues in preparedness for sustainable digital preservation: the.
Uganda Scholarly Digital Library (USDL) Makerere University’s Institutional Repository By Margaret Nakiganda URL:
Examples for Open Access Scholar Electronic Repository by New Bulgarian University IP LibCMASS Sofia 2011 Contract № 2011-ERA-IP-7 Sofia, September,
Digital Preservation across the technologies, strategies, open standards & interoperability aspects including the legal issues Pratik Shrivastava Scientist.
Knowledge Ontario Integration Collaboration Content Knowledge Virtual Communities Information Resources Libraries Archives Museums Education Social Space.
Millman—Nov 04—1 An Update on Digital Libraries David Millman Director of Research & Development Academic Information Systems Columbia University
Building Collections on the Web BCWeb. What’s BCWeb? BCWeb was developed entirely by the BnF for the content curators to replace its old selection tools.
To find journals by language of publication, click on the Languages bar in the horizontal frame. The Languages drop down menu appears and we will choose.
Aligning Digital Preservation Policies with Community Standards Nancy McGovern Digital Preservation Officer.
Institutional Repositories and Licensing of Research Output advanced information management laboratory university of cape town department of computer science.
Challenges in Web Archiving UNT Perspective NDIIPP – July 21, 2010.
Joint Information Systems Committee Repositories Support Project Summer School 2008 Amber Thomas, JISC.
Notes accompany this presentation. Please select Notes Page view. These materials can be reproduced only with official approval from Gartner. Such approvals.
Use cases for BnF broad crawls Annick Lorthios. 2 Step by step, the first in-house broad crawl The 2010 broad crawl has been performed in-house at the.
Digitization Workflows From the Digital Projects Unit University of North Texas Libraries Mark E. Phillips Jeremy D. Moore February 12, 2009.
GISELA & CHAIN Workshop Digital Cultural Heritage Network
Building A Repository for Digital Objects
BnF experiences with harvesting content beyond paywalls
Preserving Our Collective Digital History
Common Solutions to Common Problems
APENet and EUROPEANA: Digitization Issues in the European Context
Presentation transcript:

Aarhus

BnF main topics – 2013 – crawling side
Keep crawling
  – Broad and focused crawls
  – Limit of 100 TB
Crawl of password-protected content
  – “Press project”: PDFs of daily newspapers
  – Tests with other kinds of content
Work on direct deposit of e-books

BnF main topics – 2013 – access and preservation sides
Merging professional and public WB
  – Various optimizations
  – Clickable permalink…
Draw links between web archives and BnF indexing and promotion tools
  – General catalogue, data.bnf.fr…
Open access to web archives in regional libraries
  – Legal and technical aspects
Start ingesting our web archives into our digital repository

Direct deposit for e-books?
High-level discussions between the National Publishers Union and BnF
  – A better international framework: IFLA statement on legal deposit, FEP/CENL declaration…
Why not crawling?
  – Better indexing of each e-book as a single unit
  – No DRM issues
  – Direct discussion with publishers

Direct deposit for e-books? / technical side
A technical layer is available: the extranet for publishers
  – 2011: digital legal deposit forms
  – 2012/3: direct transfer of metadata (ONIX)
  – 2013/4: e-books?
What do we need to decide?
  – Who will be the main interlocutor?
  – How many formats, and of what kind? What validation? Is it possible to refuse?
  – What link between the paper and digital versions in the catalogue?
  – What access tool: Gallica or web archives?

RESAW project: some keywords
Networking (researchers and heritage institutions)
Standards and collection quality
Shared tools and services (storage infrastructure, analysis tools, portal)
Methods and training

RESAW project: interest for BnF
Promote the use of web archives among researchers
Help launch international and national research programs
Offer groundbreaking tools and services
Get feedback on our collection development policies
Promote the building and use of web archives to high-level decision makers

Current situation at BnF
No current research project
  – But the web legal deposit team is involved in research frameworks: “Labex” (“excellence laboratories”)
  – Participation in the “Hypertext corpus initiative framework” (lead: Medialab)
Relationships with researchers
  – Political science (Institutes of Political Science in Paris and Grenoble, universities of Nancy and Cergy)
  – Social sciences (University of Paris 1, Grenoble)
  – Net art (Avignon)
  – Web metrics (AFNIC)?
Relationships with associations (literature, sustainable development…)

International initiatives to follow up
Collaborative web harvesting
  – EU elections, “Olympics” project, Václav Havel collection
  – Use of the “nomination tool” provided by the University of North Texas
Portal and shared access
  – IIPC website, Memento
Research projects
  – BL/IA/JISC project on .uk analysis
  – 80 TB of data provided by IA
  – Common Crawl project (?)
Training
  – PhD sponsorship (UNT)

Questions and comments
The network we dream about!
Some objectives already (partially) covered by IIPC
  – Standards, interoperability, shared portal
Legal issues will be very difficult to solve
Be cautious with the term “quality” (prefer relevance for specific goals?)
What will you ask for?
  – Money, doctoral students, engineers…