From Web Archiving Services to Web-Scale Data Processing Platform
Internet Memory Research
GA IIPC, Paris, May 19th 2014


Overview
o Internet Memory Research: Company, Vision, Technologies
o Services: Archive the Net, Mignify, Newstretto
o Use-Cases: Improve your Selection Process, Search in your Web archive, Extract valuable information

o Spin-off of the Internet Memory Foundation
o French start-up, founded in 2011
o 20+ engineers
o Actively engaged in the Web Information Mining field:
  o EU Projects: DOPA, Annomarket, TrendMiner, Rethink Big, ASAP
  o Clusters: Cap Digital & Systematic
  o Alliance Big Data
  o Conferences: Search, iexpo, Crawl the Web...

Vision
The Web is full of valuable data:
o Variety
o Quantity
This data is not so easy to collect, access and process at large scale.
Making Web data available will create many new business opportunities for the data ecosystem.

Technologies
o Large-scale crawler with high performance
o Scalable platform based on:
  o A distributed architecture
  o Big data components (Hadoop, HBase, HDFS,...)
o Set of proprietary and open source analytic agents providing:
  o Text mining & data mining
  o Semantic operations
  o Statistical operations
o Infrastructure:
  o 170+ servers
  o Innovative infrastructure with low consumption

References

From Web archiving services
✓ SaaS, automated software service with a friendly user interface
✓ Qualified team to provide quality
✓ Combining new technology and user needs
For whom?
Any institution whose aim is to collect and preserve web material for historical, cultural or heritage purposes:
o Archives / Research: selective crawls with a high level of Quality Assurance
o National Libraries: large-scale crawl for the German National Library
o A.V. Archives: advanced module for web video and social media content

To Web data processing platform
o Marketplace for technological bricks
o Crawl on demand
o Sources
o Packages
o Set of extracted data (price, posts, micro-formats)

Through
Innovative app fighting the information deluge and bringing you tailored information.
You give keywords, and it brings back, from the Web and social media, selected hot and relevant news, without all the noise.
Today, more than 8 million URLs are sent to the platform, and around a quarter of them match users' favorite topics.

Improve your Selection Process
o Manual selection vs. Newstretto
o Automated refresh rate for active sources (RSS, forums,...)
o Smart discovery crawl for large crawls (topic, language, TLD,...)

Example of RSS Refresh Rate (sample)
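The slides do not say how the refresh rate is computed. As a rough illustration only, one simple way to adapt polling to how active a source is would be to look at the gaps between its recent items. The function name `refresh_interval`, the "poll twice per typical gap" rule and the 5-minute / 1-day bounds are all assumptions made for this sketch, not the platform's actual method.

```python
from datetime import datetime, timedelta
from statistics import median


def refresh_interval(published,
                     min_interval=timedelta(minutes=5),
                     max_interval=timedelta(days=1)):
    """Suggest how long to wait before re-fetching a feed.

    `published` is a list of datetimes of recently seen items.
    Hypothetical heuristic: poll about twice per typical gap
    between items, clamped to [min_interval, max_interval].
    """
    times = sorted(published)
    if len(times) < 2:
        # Not enough history to estimate activity: poll rarely.
        return max_interval
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    typical_gap = timedelta(seconds=median(gaps))
    interval = typical_gap / 2
    return max(min_interval, min(max_interval, interval))


# Usage: a feed that published every 30 minutes.
items = [datetime(2014, 5, 19, 10, 0) + timedelta(minutes=30 * i)
         for i in range(4)]
suggested = refresh_interval(items)
```

Using the median rather than the mean keeps one long overnight gap from slowing down an otherwise active source.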

Search in your Large Corpus
o Full-text index with Elasticsearch
o Automated categorization (news, forums, blogs,...)
o Semantic expansion
o Topic matching

Example of Semantic Expansion

Extract valuable information from your large corpus for Users / Researchers
o Cleaned text
o Keywords, to add a keyword cloud
o Outlinks, to analyze graphs
o Structure unstructured data (forums,...)
o Named entities (partner's brick)
o Summarization (partner's brick)
o More coming soon...
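To make the "outlinks to analyze graphs" item concrete, here is a minimal sketch of how outlinks might be pulled from a page and turned into host-to-host graph edges. It uses only the Python standard library; the names `OutlinkParser` and `outlink_edges` are hypothetical, and this is not a description of how the platform's own extraction brick works.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class OutlinkParser(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def outlink_edges(page_url, html):
    """Return sorted (source_host, target_host) edges to other hosts."""
    parser = OutlinkParser(page_url)
    parser.feed(html)
    source = urlparse(page_url).netloc
    edges = set()
    for link in parser.links:
        target = urlparse(link).netloc
        # Skip same-host links and links without a host (e.g. mailto:).
        if target and target != source:
            edges.add((source, target))
    return sorted(edges)


# Usage with a small hand-written page.
sample_html = (
    '<a href="/about">About</a>'
    '<a href="http://example.org/x">X</a>'
    '<a href="http://example.org/y">Y</a>'
    '<a href="mailto:someone@example.com">Mail</a>'
)
edges = outlink_edges("http://example.com/index.html", sample_html)
```

Aggregating edges at host level keeps the graph small enough to inspect; a page-level graph would keep the full URLs instead.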

Example of Extracted Data: URL, thread, dates, user names, content

What if you could integrate those tools on top of your current corpus?

Chloé Martin, Co-founder & Sales Manager
chloe@internetmemory.net
o http://archivethe.net, contact@archivethe.net, @archivethenet
o http://newstretto.com, contact@newstretto.com, @newstretto
o http://mignify.com, contact@mignify.com, @mignify
With the support of the European Commission
