Web archiving in the audiovisual domain (Webarchivering in het Audiovisuele Domein)
Julia Vytopil, Nederlands Instituut voor Beeld en Geluid (Netherlands Institute for Sound and Vision)

Our history of web archiving: 2008-2010, 2011-2012

Purposes of web archiving

What Web archiving is not

Web archiving as a context collection

Current project: selection of sites (broadcasters)

Current project: selection of sites

Issues and challenges

Current status

Front end & back end

jvytopil@beeldengeluid.nl

Web Archiving in the audiovisual field
Studiedag webarchivering in Nederland (study day on web archiving in the Netherlands), Hilversum, October 30, 2014
Chloé Martin, chloe@internetmemory.net, http://archivethe.net

Web archiving

What? & Why?
What is a Web archive? A copy of a website, recorded by a crawler at a specific date and time, that looks and feels like the real website.
For whom? Any institution whose aim is to collect and preserve web/media material for historical, cultural, heritage or legal (compliance) purposes.
Why? Web content is pervasive, dynamic, valuable, varied in format, and ephemeral.
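
The definition above (a copy, recorded by a crawler, at a specific date and time) can be made concrete with a minimal sketch of a capture record. The `Capture` type, `make_capture` and `replay_key` are hypothetical names for illustration; real archives store captures in WARC files, which this sketch does not attempt to write. The `/<timestamp>/<url>` key only mimics the 14-digit timestamp convention used in Wayback-style replay URLs.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Capture:
    """One frozen copy of a resource (hypothetical record type)."""
    url: str
    timestamp: str  # 14-digit UTC timestamp, YYYYMMDDhhmmss
    sha1: str       # hex digest of the captured payload
    body: bytes


def make_capture(url: str, body: bytes, when: datetime) -> Capture:
    """Record `body` as the state of `url` at `when` (expects a timezone-aware datetime)."""
    ts = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return Capture(url=url, timestamp=ts, sha1=hashlib.sha1(body).hexdigest(), body=body)


def replay_key(c: Capture) -> str:
    """Wayback-style lookup key: /<timestamp>/<original URL>."""
    return f"/{c.timestamp}/{c.url}"
```

For example, a capture of `http://example.org/` taken at 09:00 UTC on 30 October 2014 gets the key `/20141030090000/http://example.org/`; two captures of the same URL at different times stay distinct, which is what lets an archive show a site "at a specific date and time".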

How? Collection policy, management tools, quality control, access

Web Archiving Team
Put in place a cross-disciplinary team:
‣ Curator / Librarian / Archivist
‣ Information system technician
Train a team:
‣ Web archivist / Project Manager
‣ Engineer(s) to design and monitor the whole process (for an in-house solution)
Web archiving requires critical skills and experience, especially from the engineers in the case of an in-house solution.

Collection policy

Extensive Collection vs Intensive Collection

How to improve the selection policy
IMR value propositions:
‣ Topic crawls: Percolable, a tool to discover relevant sources
‣ Crawl of active sources: automated refresh rate
‣ Large crawls: smart discovery crawl based on topic or language
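
An "automated refresh rate" for active sources can be pictured with a simple adaptive revisit policy. The sketch below is a hypothetical stand-in, not IMR's actual method: it halves the delay before the next crawl when a source changed since the last visit and doubles it when it did not, clamped between assumed bounds of 1 hour and 30 days.

```python
def next_interval(current_hours: float, changed: bool,
                  min_hours: float = 1.0, max_hours: float = 24.0 * 30) -> float:
    """Delay before the next crawl of a source (hypothetical policy).

    Revisit twice as often when the content changed since the last crawl,
    half as often when it did not, within [min_hours, max_hours].
    """
    proposed = current_hours / 2 if changed else current_hours * 2
    return max(min_hours, min(max_hours, proposed))


def simulate(start_hours: float, observations: list[bool]) -> list[float]:
    """Apply next_interval over a sequence of changed/unchanged observations."""
    interval = start_hours
    history = []
    for changed in observations:
        interval = next_interval(interval, changed)
        history.append(interval)
    return history
```

Starting from a daily crawl, `simulate(24.0, [True, True, False, False])` gives `[12.0, 6.0, 12.0, 24.0]`: a news site that keeps changing converges towards frequent crawls, while a dormant site drifts towards the 30-day ceiling. Whether a page "changed" could be decided, for instance, by comparing payload digests between crawls.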

How? Collection policy, management tools, quality control, access

Archivethe.net

User Interface

Challenges: technical issues
‣ Deep & hidden Web
‣ Web spam and crawler traps
‣ Dynamic websites
‣ Social Web (Twitter, FB, YouTube, Flickr, ...)
‣ Video
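
Crawler traps (for example endless calendars or relative links that keep appending path segments) are often screened with cheap URL heuristics before a URL is queued. The sketch below is one hypothetical such filter; the thresholds `MAX_DEPTH`, `MAX_REPEATS` and `MAX_URL_LENGTH` are assumed values, and it does not catch traps hidden in query parameters.

```python
from collections import Counter
from urllib.parse import urlsplit

MAX_DEPTH = 12         # hypothetical: maximum number of path segments
MAX_REPEATS = 3        # hypothetical: maximum occurrences of one segment
MAX_URL_LENGTH = 2048  # hypothetical: maximum URL length in characters


def looks_like_trap(url: str) -> bool:
    """Flag URLs whose shape suggests a crawler trap (heuristic only)."""
    if len(url) > MAX_URL_LENGTH:
        return True
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > MAX_DEPTH:
        return True
    # The same segment repeated many times, e.g. /cal/cal/cal/... from a relative-link loop.
    if segments and Counter(segments).most_common(1)[0][1] > MAX_REPEATS:
        return True
    return False
```

A normal article URL such as `http://example.org/news/2014/10/article.html` passes, whereas `http://example.org/cal/cal/cal/cal/cal/` is rejected. Heuristics like these trade a few false positives for protection against spending crawl budget on infinite URL spaces.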

Challenges: video (screenshot: B&G)

Challenges: social media (screenshots: OurTube / Our Tweet)

Quality Assurance

Access

Access & Search
‣ Browsing in the archive
‣ URL
‣ Full text with Elasticsearch
‣ Full text + branding (search, web archive)
‣ Automatic redirection
‣ Automated categorization
‣ Semantic expansion
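
Full-text search engines such as Elasticsearch rely on an inverted index that maps each term to the documents containing it. The toy index below only illustrates that idea for archived pages keyed by URL; `TinyIndex` and `tokenize` are hypothetical names, and the sketch has none of Elasticsearch's analysis, ranking or distribution features.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens (a deliberately naive analyzer)."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Toy inverted index: term -> set of archived URLs containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        for token in tokenize(text):
            self.postings[token].add(url)

    def search(self, query: str) -> set[str]:
        """Return URLs containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

After indexing `http://example.org/a` with "Archiving broadcast video" and `http://example.org/b` with "Archiving tweets", the query "archiving video" matches only the first page, while "archiving" matches both.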

Extract valuable information from your large corpus for users / researchers:
‣ Cleaned text
‣ Keywords, to build a word cloud
‣ Outlinks, to analyze link graphs
‣ Structure for unstructured data (forums, ...)
‣ Named entities
More are coming soon...
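
Outlinks are the raw material for link-graph analysis. The sketch below, built only on Python's standard `html.parser` and `urllib.parse`, shows one possible way to pull absolute HTTP(S) outlinks from archived HTML and turn a set of pages into an edge list; `OutlinkParser`, `outlinks` and `edge_list` are hypothetical names, not part of IMR's platform.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class OutlinkParser(HTMLParser):
    """Collect unique absolute http(s) links from <a href> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop #fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                    self.links.append(absolute)


def outlinks(base_url: str, html: str) -> list[str]:
    parser = OutlinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def edge_list(pages: dict[str, str]) -> list[tuple[str, str]]:
    """(source, target) pairs for every outlink of every archived page."""
    return [(src, dst) for src, html in pages.items() for dst in outlinks(src, html)]
```

The resulting edge list can be loaded into any graph tool to compute, for instance, which sites in a collection are most linked to; `mailto:` links and duplicate links are skipped.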

About IMR
Internet Memory Research
✓ Spin-off of the Internet Memory Foundation; French start-up founded in 2011
✓ 20+ engineers actively engaged in the web archiving and information mining field
✓ EU projects: DOPA, Annomarket, TrendMiner, Rethink Big, ASAP
✓ Large-scale, high-performance crawler
✓ Scalable platform based on a distributed architecture and Big Data components (Hadoop, HBase, HDFS, ...)
✓ Innovative infrastructure with low energy consumption

About IMR
Any questions?
http://archivethe.net
chloe@internetmemory.net
Twitter: ArchiveTheNet
