Search Bootstrapping
How and Where to Get Started

Crawling
Start with Nutch
  – Index directly to Solr
  – /refresh-using-nutch-with-solr/
Create a seed list from the DMOZ RDF dump
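
One way to build that seed list is sketched below. It assumes the DMOZ dump is the usual `content.rdf.u8` file, where each listed site appears as an `<ExternalPage about="...">` element, and that Nutch will read a plain text file with one URL per line. The function names, the file paths, and the `every_nth` sampling parameter are illustrative choices, not part of Nutch or DMOZ. Nutch also ships its own `DmozParser` tool for this job, so treat the script as an alternative, not the standard route.

```python
import re
from html import unescape
from typing import Iterable, Iterator

# Matches the opening tag of an ExternalPage entry in the DMOZ content dump.
EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]+)"')


def extract_seed_urls(lines: Iterable[str], every_nth: int = 1) -> Iterator[str]:
    """Yield every `every_nth` http(s) URL found in ExternalPage entries."""
    seen = 0
    for line in lines:
        match = EXTERNAL_PAGE.search(line)
        if not match:
            continue
        # Attribute values are XML-escaped (e.g. &amp;), so decode them.
        url = unescape(match.group(1))
        if not url.startswith(("http://", "https://")):
            continue
        if seen % every_nth == 0:
            yield url
        seen += 1


def write_seed_file(rdf_path: str, seed_path: str, every_nth: int = 1000) -> int:
    """Stream the dump line by line and write one sampled URL per line."""
    count = 0
    with open(rdf_path, encoding="utf-8", errors="replace") as src, \
            open(seed_path, "w", encoding="utf-8") as dst:
        for url in extract_seed_urls(src, every_nth):
            dst.write(url + "\n")
            count += 1
    return count
```

Calling `write_seed_file("content.rdf.u8", "urls/seed.txt")` (both paths hypothetical) keeps roughly one URL in a thousand, which is usually enough for a first test crawl. The dump is read as a stream because it is far too large to parse into memory comfortably.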

Understanding Content
Entity Extraction
  – LingPipe
  – OpenNLP
Entity Identification / Taxonomies
  – Freebase
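
To make the entity identification step concrete, here is a minimal sketch of matching text against a taxonomy. It is a plain dictionary lookup, not the statistical models that LingPipe or OpenNLP provide, and the tiny `TAXONOMY` table is made up for illustration; in practice the names and types might come from a resource such as Freebase.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-taxonomy: surface form -> type label.
TAXONOMY: Dict[str, str] = {
    "apache nutch": "software",
    "solr": "software",
    "san francisco": "location",
}


def find_entities(text: str, taxonomy: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Return (matched text, type label, start offset) for each taxonomy hit."""
    if not taxonomy:
        return []
    # Normalise keys so lookups are case-insensitive.
    lookup = {name.lower(): label for name, label in taxonomy.items()}
    # Try longer names first so multi-word entries win over their prefixes.
    names = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.group(0), lookup[m.group(0).lower()], m.start())
        for m in pattern.finditer(text)
    ]
```

A lookup like this only finds names already in the table. The extraction libraries listed above are what you would reach for to discover entities the taxonomy does not yet contain.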

Some Additional Links
Basic Web Page Parser
Example of OpenNLP usage