Search Bootstrapping How / Where to get started

Crawling
Start with Nutch
– http://nutch.apache.org/
Index directly to SOLR
– http://www.lucidimagination.com/blog/2010/09/10/refresh-using-nutch-with-solr/
Create a seed list from DMOZ rdf
– http://www.dmoz.org/rdf.html
– http://wiki.apache.org/nutch/NutchTutorial
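
The seed list step can be sketched in a few lines of Python. This is a minimal illustration, not the parser described in the Nutch tutorial: it assumes the DMOZ content dump lists each page as an ExternalPage element with the URL in its about attribute, and that a seed list is a plain text file with one URL per line. The function names, file paths, and the sample_rate parameter are hypothetical.

```python
import random
import re
from html import unescape

# Assumed dump layout: <ExternalPage about="http://example.com/"> ... </ExternalPage>
EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]+)"')


def extract_seed_urls(lines, sample_rate=1.0, rng=None):
    """Yield page URLs from DMOZ-style RDF lines, keeping roughly sample_rate of them."""
    rng = rng or random.Random(0)  # fixed seed so repeated runs pick the same subset
    for line in lines:
        match = EXTERNAL_PAGE.search(line)
        if match and rng.random() < sample_rate:
            # Attribute values are XML-escaped, so turn &amp; back into &.
            yield unescape(match.group(1))


def write_seed_file(rdf_path, seed_path, sample_rate=0.001):
    """Stream the RDF dump and write one URL per line to seed_path."""
    with open(rdf_path, encoding="utf-8", errors="replace") as src, \
            open(seed_path, "w", encoding="utf-8") as dst:
        for url in extract_seed_urls(src, sample_rate):
            dst.write(url + "\n")
```

Streaming the file line by line avoids loading the whole dump into memory, and sampling keeps a first crawl small. A call such as write_seed_file("content.rdf.u8", "urls/seed.txt") would produce the seed file, where both paths are placeholders.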

Understanding Content
Entity Extraction
– LingPipe http://alias-i.com/lingpipe/
– OpenNLP http://incubator.apache.org/opennlp/
Entity Identification / Taxonomies
– Freebase http://www.freebase.com/
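
To make the idea of entity identification against a taxonomy concrete, here is a toy dictionary lookup in Python. It is a stand-in for illustration only: it does not use LingPipe, OpenNLP, or Freebase, and the GAZETTEER entries, identifiers, and function name are all hypothetical. It assumes entities can be found by matching the longest known phrase at each position.

```python
import re

# Hypothetical mini-taxonomy: lowercase surface form -> (canonical id, type).
GAZETTEER = {
    "apache nutch": ("nutch", "software"),
    "nutch": ("nutch", "software"),
    "solr": ("solr", "software"),
    "new york": ("new_york", "location"),
}

# Longest entry, in words, bounds how far ahead the matcher has to look.
MAX_WORDS = max(len(name.split()) for name in GAZETTEER)


def find_entities(text):
    """Return (surface text, canonical id, type) tuples, longest match first, left to right."""
    tokens = re.findall(r"\w+", text)
    found, i = [], 0
    while i < len(tokens):
        # Try the widest window first so "Apache Nutch" wins over "Nutch".
        for width in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + width])
            entry = GAZETTEER.get(phrase.lower())
            if entry:
                found.append((phrase, *entry))
                i += width
                break
        else:
            i += 1  # no entry starts at this token
    return found
```

Mapping several surface forms to one canonical id is the part a taxonomy contributes; a statistical extractor would instead find mentions that are not in any dictionary.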

Some Additional Links
Basic Web Page Parser
– https://github.com/pjaol/Webcrawler
Example of OpenNLP usage
– https://github.com/pjaol/entity_extractor
