From Web Archiving services to Web scale data processing platform Internet Memory Research GA IIPC, Paris, May 19th 2014.


1 From Web Archiving services to Web scale data processing platform Internet Memory Research GA IIPC, Paris, May 19th 2014

2 Overview
o Internet Memory Research: Company, Vision, Technologies
o Services: Archive the Net, Mignify, Newstretto
o Use-Cases: Improve your Selection Process, Search in your Web archive, Extract valuable information

3 Spin-off of the Internet Memory Foundation
o French start-up, founded in 2011
o 20+ engineers
o Actively engaged in the Web Information Mining field:
  o EU Projects: DOPA, Annomarket, TrendMiner, Rethink Big, ASAP
  o Clusters: Cap Digital & Systematic
  o Alliance Big Data
  o Conferences: Search, iexpo, Crawl the Web...

4 Vision
o The Web is full of valuable data: variety, quantity
o This data is not easy to collect, access and process at large scale
o Making Web data available will create many new business opportunities for the data ecosystem
23/04/2015 Internet Memory Research

5 Technologies
o High-performance large-scale crawler
o Scalable platform based on:
  o A distributed architecture
  o Big data components (Hadoop, HBase, HDFS,...)
o Set of proprietary and open source analytic agents providing:
  o Text mining & data mining
  o Semantic operations
  o Statistical operations
o Infrastructure:
  o 170+ servers
  o Innovative infrastructure with low energy consumption

6 References

7 From
✓ SaaS, automated software service with a friendly user interface
✓ Qualified team to provide quality
✓ Combining new technology and user needs
For whom? Any institution whose aim is to collect and preserve web material for historical, cultural or heritage purposes
o Archives / Research: selective crawls with a high level of Quality Assurance
o National Libraries: large-scale crawl for the German National Library
o A.V. Archives: advanced module for web video and social media content

8 To Web data processing platform
o Marketplace for technology bricks
o Crawl on demand
o Source packages
o Sets of extracted data (prices, posts, micro-formats)

9 Through
Innovative app that fights the information deluge and brings you tailored information.
You give keywords, and it brings back selected hot and relevant news from the Web and social media, without all the noise.
Today, 8+ million URLs are sent to the platform, and around a quarter of them match users' favorite topics.

10 Improve your Selection Process
o Manual selection vs. Newstretto
o Automated refresh rate for active sources (RSS, forums,...)
o Smart discovery crawl for large crawls (topic, language, TLD,...)
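One way to picture an automated refresh rate for an active source: derive the polling interval from how often the source has published in the past. The sketch below is only an illustration under stated assumptions, not the method used by the platform. The function name `refresh_interval`, the "half the median gap" rule, and the 5-minute and 1-day bounds are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def refresh_interval(published, floor=timedelta(minutes=5), ceiling=timedelta(days=1)):
    """Suggest a polling interval from past item publication times (hypothetical rule)."""
    times = sorted(published)
    if len(times) < 2:
        # Too little history to estimate a rate: poll at the slowest allowed pace.
        return ceiling
    # Gaps, in seconds, between consecutive items of the feed.
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    # Assumed heuristic: poll at half the typical gap, clamped to [floor, ceiling].
    suggested = timedelta(seconds=median(gaps) / 2)
    return max(floor, min(ceiling, suggested))


# A feed that published one item per hour would be polled every 30 minutes.
hourly_feed = [datetime(2014, 5, 19, hour) for hour in range(5)]
interval = refresh_interval(hourly_feed)
```

Using the median rather than the mean keeps one long silence (a weekend, for instance) from slowing down the polling of an otherwise busy source.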

11 Example of RSS Refresh Rate (sample)

12 Search in your Large Corpus
o Full-text index with Elasticsearch
o Automated categorization (news, forums, blogs,...)
o Semantic expansion
o Topic matching
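As a rough illustration of how semantic expansion could sit on top of a full-text index, the sketch below builds an Elasticsearch query body in which the user's keyword and a few related terms are all optional `match` clauses, with the related terms weighted lower. The related-terms table, the field name `text`, and the boost values are assumptions made for this example; the slides do not say how the expansion is actually computed.

```python
def expanded_query(keyword, related, field="text", related_boost=0.5):
    """Build an Elasticsearch bool query for a keyword plus its related terms."""
    # The original keyword keeps full weight.
    clauses = [{"match": {field: {"query": keyword, "boost": 1.0}}}]
    # Related terms (from a hypothetical lookup table) count for less.
    for term in related.get(keyword.lower(), []):
        clauses.append({"match": {field: {"query": term, "boost": related_boost}}})
    # A document must match at least one clause to be returned.
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


# Hypothetical expansion table; a real one might come from a thesaurus or a model.
RELATED = {"web archive": ["web archiving", "wayback"]}
body = expanded_query("Web archive", RELATED)
```

The resulting dictionary is plain query DSL, so it could be serialized to JSON and sent as the body of a search request.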

13 Example of Semantic Expansion

14 Extract valuable information from your large corpus for Users / Researchers
o Cleaned text
o Keywords, to add a keyword cloud
o Outlinks, to analyze link graphs
o Structure for unstructured data (forums,...)
o Named entities (partner's brick)
o Summarization (partner's brick)
o More are coming soon...
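To make the outlink item concrete, here is a minimal sketch of outlink extraction from a single archived page, using only the Python standard library. It is not the platform's extraction agent. The class `OutlinkParser` and the function `outlinks` are hypothetical names, and "outlink" is assumed here to mean a link whose host differs from the page's own host.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class OutlinkParser(HTMLParser):
    """Collect the absolute target of every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def outlinks(html, base_url):
    """Return links pointing to a different host than base_url (assumed definition)."""
    parser = OutlinkParser(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    # Links with no host (mailto:, javascript:) and same-host links are dropped.
    return [url for url in parser.links if urlparse(url).netloc not in ("", host)]


page = '<a href="/about">About</a> <a href="http://mignify.com/x">Mignify</a>'
edges = outlinks(page, "http://archivethe.net/")
```

Each (page URL, outlink) pair is one edge, so running this over a whole corpus yields the edge list that a graph analysis would start from.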

15 Example of Extracted Data: URL, Thread, Dates, User names, Content

16 What if you could integrate those tools on top of your current corpus?

17 Internet Memory Research
Chloé Martin, Co-founder & Sales Manager, chloe@internetmemory.net
http://archivethe.net contact@archivethe.net @archivethenet
http://newstretto.com contact@newstretto.com @newstretto
http://mignify.com contact@mignify.com @mignify
With the support of the European Commission

