Web archiving in the audiovisual domain (Webarchivering in het Audiovisuele Domein). Julia Vytopil, Nederlands Instituut voor Beeld en Geluid (Netherlands Institute for Sound and Vision).

1 Webarchivering in het Audiovisuele Domein / Web archiving in the audiovisual domain. Julia Vytopil, Nederlands Instituut voor Beeld en Geluid (Netherlands Institute for Sound and Vision)

2 Our history of web archiving 2008-2010 2011-2012 2008-2010

3 Purposes of web archiving

4 What Web archiving is not

5 Web archiving as a context collection

6

7 Current project: selection of sites: broadcaster

8 Current project: selection of sites

9 Issues and challenges

10

11 Current status

12 Front end & back end

13

14 jvytopil@beeldengeluid.nl

15 Web Archiving in the audiovisual field. Studiedag webarchivering in Nederland (study day on web archiving in the Netherlands), Hilversum, October 30, 2014. Chloé Martin, chloe@internetmemory.net, http://archivethe.net

16 Web archiving

17 What? & Why?
What is a Web archive?
‣ A copy of a website
‣ Recorded by a crawler
‣ At a specific date and time
‣ Looks and feels like a real website
For whom? Any institution whose aim is to collect & preserve web/media material for historical, cultural, heritage or legal (compliance) purposes
Why? Web content is pervasive, dynamic, valuable and ephemeral, and comes in a variety of formats
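The definition above (a copy of a page, recorded at a specific date and time) can be sketched as a small data record. This is only an illustration, not the format used by either presenter: the `Capture` class, the `record_capture` helper and the example URL are hypothetical names, and the 14-digit UTC timestamp is assumed here because it is a form commonly used to label archived copies.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Capture:
    """One archived copy of a URL (hypothetical structure for illustration)."""
    url: str
    captured_at: datetime
    body: bytes

    @property
    def timestamp(self) -> str:
        # 14-digit UTC timestamp: YYYYMMDDhhmmss
        return self.captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")

    @property
    def digest(self) -> str:
        # Content hash, useful for spotting unchanged pages between crawls
        return hashlib.sha1(self.body).hexdigest()


def record_capture(url, body, when=None):
    """Wrap fetched bytes in a Capture; defaults to the current UTC time."""
    return Capture(url, when or datetime.now(timezone.utc), body)


capture = record_capture(
    "http://example.org/",
    b"<html></html>",
    datetime(2014, 10, 30, 9, 0, 0, tzinfo=timezone.utc),
)
print(capture.url, capture.timestamp, capture.digest)
```

Two captures of the same URL with the same digest indicate that the page did not change between crawls.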

18 How? Collection policy Management tools Quality control Access

19 Web Archiving Team
Put in place a cross-disciplinary team
‣ Curator / Librarian / Archivist
‣ Information system technician
Train a team
‣ Web archivist / Project Manager
‣ Engineer(s) to design & monitor the whole process (for an in-house solution)
Web archiving requires critical skills and experience, especially from engineers in the case of an in-house solution

20 Collection policy

21 Extensive Collection vs Intensive Collection

22 How to improve Selection Policy
IMR value propositions:
‣ [Topic crawls] Percolable, a tool to discover relevant sources
‣ [Crawl of active sources] Automated refreshment rate
‣ [Large crawls] Smart discovery crawl based on topic or language

23 How? Collection policy Management tools Quality control Access

24 Archivethe.net

25 User Interface

26 Challenges: Technical issues
‣ Deep & Hidden Web
‣ Web spam and traps
‣ Dynamic websites
‣ Social Web (Twitter, FB, YouTube, Flickr,...)
‣ Video
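As an illustration of the "traps" item, a crawler can apply simple URL heuristics before following a link. The sketch below is an assumption, not the approach described in the talk: `looks_like_trap` and its thresholds are hypothetical, and real crawlers combine many more signals.

```python
from collections import Counter
from urllib.parse import urlsplit


def looks_like_trap(url, max_depth=12, max_repeats=3):
    """Flag URLs whose path is very deep or repeats the same segment often.

    Both thresholds are arbitrary illustration values.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    counts = Counter(segments)
    return any(n >= max_repeats for n in counts.values())


print(looks_like_trap("http://example.org/a/b/a/b/a/b"))
print(looks_like_trap("http://example.org/news/2014/10/30/item"))
```

The first URL repeats the same path segments, a typical sign of an endlessly generated link loop; the second is an ordinary dated article path.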

27 Challenges: Video B&G Screenshot

28 Challenges: Social Media (OurTube / Our Tweet screenshot)

29 Quality Assurance

30 Access

31 Access & Search
‣ Browsing in the archive
‣ URL
‣ Full text with Elasticsearch
‣ Full text + branding (search, web archive)
‣ Automatic redirection
‣ Automated categorization
‣ Semantic expansion
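URL-based access usually means resolving a requested URL and date to one stored copy. A minimal sketch, assuming captures of a URL are indexed by sorted 14-digit timestamps (YYYYMMDDhhmmss); the `nearest_capture` function is hypothetical and not taken from the platform shown in the slides.

```python
from bisect import bisect_left
from datetime import datetime

FMT = "%Y%m%d%H%M%S"


def nearest_capture(timestamps, wanted):
    """Return the stored timestamp closest in time to `wanted`.

    `timestamps` must be a sorted list of 14-digit strings.
    """
    if not timestamps:
        return None
    i = bisect_left(timestamps, wanted)
    # Only the neighbours around the insertion point can be closest
    candidates = timestamps[max(i - 1, 0):i + 1]
    target = datetime.strptime(wanted, FMT)
    return min(candidates, key=lambda t: abs(datetime.strptime(t, FMT) - target))


stored = ["20140101000000", "20140601000000", "20141030090000"]
print(nearest_capture(stored, "20141001000000"))
```

Because the list is sorted, the lookup needs a binary search and at most two date comparisons, whatever the number of captures.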

32 Extract valuable information
From your large corpus for Users / Researchers
‣ Cleaned text
‣ Keywords, to add a cloud
‣ Outlinks, to analyze graphs
‣ Structure unstructured data (forums,...)
‣ Named entities
More coming soon...
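The "outlinks" item can be illustrated with the Python standard library alone. This is a sketch under assumptions, not IMR's pipeline: `OutlinkParser` and `outlinks` are hypothetical names, and an outlink is taken here to mean an `<a href>` that points to a different host than the archived page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class OutlinkParser(HTMLParser):
    """Collect absolute link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL
                self.links.append(urljoin(self.base_url, href))


def outlinks(base_url, html):
    """Return the sorted, de-duplicated links that leave the page's host."""
    parser = OutlinkParser(base_url)
    parser.feed(html)
    base_host = urlsplit(base_url).netloc
    return sorted(
        {link for link in parser.links
         if urlsplit(link).netloc not in ("", base_host)}
    )


page = '<a href="/about">About</a> <a href="http://other.example/x">Other</a>'
print(outlinks("http://example.org/", page))
```

Collecting these (page, outlink) pairs across a crawl gives the edge list of a link graph that researchers can then analyze.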

33 About IMR
Internet Memory Research
✓ Spin-off of the Internet Memory Foundation, French start-up, founded in 2011
✓ 20+ engineers actively engaged in the Web Archiving and Information Mining field
✓ EU Projects: DOPA, Annomarket, TrendMiner, Rethink Big, ASAP
✓ Large-scale crawler with high performance
✓ Scalable platform based on a distributed architecture and Big Data components (Hadoop, HBase, HDFS,…)
✓ Innovative infrastructure with low consumption

34 About IMR. Any questions? http://archivethe.net, chloe@internetmemory.net, Twitter: ArchiveTheNet

