Web archiving in the audiovisual domain (Webarchivering in het Audiovisuele Domein)
Julia Vytopil, Netherlands Institute for Sound and Vision (Nederlands Instituut voor Beeld en Geluid)

Presentation transcript:

Web archiving in the audiovisual domain (Webarchivering in het Audiovisuele Domein)
Julia Vytopil, Netherlands Institute for Sound and Vision (Nederlands Instituut voor Beeld en Geluid)

Our history of web archiving

Purposes of web archiving

What Web archiving is not

Web archiving as a context collection

Current project: selection of sites: broadcaster

Current project: selection of sites

Issues and challenges

Current status

Front end & back end

Web archiving in the audiovisual field
Studiedag webarchivering in Nederland (Study day on web archiving in the Netherlands), Hilversum, October 30, 2014
Chloé Martin

Web archiving

What? & Why?
What is a Web archive?
‣ A copy of a website
‣ Recorded by a crawler
‣ At a specific date and time
‣ Looks and feels like a real website
For whom?
‣ Any institution whose aim is to collect & preserve web/media material for historical, cultural, heritage or legal (compliance) purposes
Why? Web content is:
‣ Pervasive
‣ Dynamic
‣ Valuable
‣ Available in a variety of formats
‣ Ephemeral
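The idea of "a copy recorded at a specific date and time" can be pictured as a small record that ties a URL to a capture time and a payload. The following is only a minimal sketch: the `Capture` class, `record_capture` and `capture_key` are hypothetical names chosen for illustration, and the 14-digit timestamp format is an assumption, not something the slides specify.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Capture:
    """One archived copy of a page (hypothetical structure)."""
    url: str
    captured_at: datetime  # when the crawler fetched the page (UTC)
    body: bytes            # the response payload as recorded


def record_capture(url: str, body: bytes,
                   when: Optional[datetime] = None) -> Capture:
    """Wrap a fetched payload with its URL and capture time."""
    if when is None:
        when = datetime.now(timezone.utc)
    return Capture(url=url, captured_at=when, body=body)


def capture_key(capture: Capture) -> str:
    """Build a sortable 'URL + 14-digit timestamp' key (assumed format)."""
    stamp = capture.captured_at.strftime("%Y%m%d%H%M%S")
    return f"{capture.url} {stamp}"
```

Keeping the capture time next to the URL is what lets the same address be stored many times, once per crawl.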

How?
‣ Collection policy
‣ Management tools
‣ Quality control
‣ Access

Web Archiving Team
Put in place a cross-disciplinary team
‣ Curator / Librarian / Archivist
‣ Information system technician
Train a team
‣ Web archivist / Project Manager
‣ Engineer(s) to design & monitor the whole process (for an in-house solution)
Web archiving requires critical skills and experience, especially from engineers in the case of an in-house solution

Collection policy

Extensive Collection vs Intensive Collection

How to improve Selection Policy
IMR value propositions:
‣ [Topic crawls] Percolable, a tool to discover relevant sources
‣ [Crawl of active sources] Automated refreshment rate
‣ [Large crawls] Smart discovery crawl based on topic or language

How?
‣ Collection policy
‣ Management tools
‣ Quality control
‣ Access

Archivethe.net

User Interface

Challenges: Technical issues
‣ Deep & Hidden Web
‣ Web spam and traps
‣ Dynamic websites
‣ Social Web (Twitter, FB, YouTube, Flickr, ...)
‣ Video

Challenges: Video B&G Screenshot

Challenges: Social Media
OurTube / Our Tweet screenshot

Quality Assurance

Access

Access & Search
‣ Browsing in the archive
‣ URL
‣ Full text with Elasticsearch
‣ Full text + branding (search, web archive)
‣ Automatic redirection
‣ Automated categorization
‣ Semantic expansion
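URL-based access usually means resolving a requested URL and date to the nearest available capture. The sketch below shows one possible way to do that with a binary search over sorted capture times. It is an illustration under stated assumptions, not a description of how Archivethe.net works: `closest_capture` is a hypothetical helper, and the per-URL list of timestamps is assumed to be already sorted.

```python
from bisect import bisect_left
from datetime import datetime


def closest_capture(timestamps: list[datetime], wanted: datetime) -> datetime:
    """Return the capture time nearest to `wanted`.

    `timestamps` must be sorted in ascending order (assumption).
    """
    if not timestamps:
        raise ValueError("no captures for this URL")
    # Index of the first capture that is not earlier than `wanted`.
    i = bisect_left(timestamps, wanted)
    # Only the neighbours on either side of that index can be the nearest.
    candidates = timestamps[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda t: abs(t - wanted))
```

A request for a date before the first capture resolves to the first capture, and a request after the last one resolves to the last.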

Extract valuable information
From your large corpus, for users / researchers:
‣ Cleaned text
‣ Keywords, to add a cloud
‣ Outlinks, to analyze graphs
‣ Structure for unstructured data (forums, ...)
‣ Named entities
‣ More coming soon...
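As a rough illustration of two of the items above (cleaned text and outlinks), the sketch below pulls both out of one archived HTML page using only the Python standard library. `PageExtractor` and `extract` are hypothetical names, and the cleaning rule (drop `script` and `style` content, collapse whitespace) is an assumption. A production pipeline would likely be far more elaborate.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collect visible text and absolute outlinks from one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.outlinks: list[str] = []
        self.text_parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.outlinks.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html: str, base_url: str) -> tuple[str, list[str]]:
    """Return (cleaned text, outlinks) for a single page."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.outlinks
```

The list of outlinks per page is the raw material for a link graph, while the cleaned text is what a full-text index or keyword extractor would consume.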

About IMR
Internet Memory Research
✓ Spin-off of the Internet Memory Foundation, French start-up, founded in 2011
✓ 20+ engineers actively engaged in the Web Archiving and Information Mining field
✓ EU Projects: DOPA, Annomarket, TrendMiner, Rethink Big, ASAP
✓ Large-scale, high-performance crawler
✓ Scalable platform based on a distributed architecture and Big Data components (Hadoop, HBase, HDFS, …)
✓ Innovative infrastructure with low consumption

About IMR
Any questions?
Twitter, ArchiveTheNet