1
New free text search engine for www.stat.fi
2
Background
The "project" began in 2000
The first version was based on Oval, a commercial Finnish search engine
Oval was deemed too cumbersome to develop further and lacked necessary features
The search for a new search service was started
Jussi Arpalahti
3
New Search Service: Lucene/Solr
Full text search
Match highlighting
Categories (tags, facets, ...)
More like this
Did you mean..?
Finnish language support (Lingsoft module)
Free Software
Fast, adaptable, scalable; popular in free and commercial settings from small to large scale
4
Structured Search
Structure means the distinguishable parts of a document (title, author, paragraph, creation date)
Searches based on structure give better results
A regular HTML page is poorly structured; PC Axis and CoSSI XML are much better
For example, statistical data can be found by searching directly in table titles and variable names
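To illustrate what "distinguishable parts" can mean for a plain HTML page, here is a minimal sketch that splits a page into a title field and paragraph fields using only the Python standard library. The class name `FieldExtractor`, the function `extract_fields`, and the field names `title` and `paragraphs` are hypothetical and chosen for illustration; the presentation does not describe how the actual extraction was implemented.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects the text of <title> and <p> elements into separate fields.

    Hypothetical helper for illustration; not the presenter's implementation.
    """

    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "paragraphs": []}
        self._current = None  # tag whose text is currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace so the field holds clean text.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.fields["title"] = text
            elif text:
                self.fields["paragraphs"].append(text)
            self._current = None


def extract_fields(html):
    """Return a dict with a 'title' string and a 'paragraphs' list."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Once the parts are separated like this, a search can be restricted to the title alone, which is the kind of structure-based search the slide refers to. Formats such as PC Axis and CoSSI XML carry far more structure than this, so the same idea yields more useful fields there.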
5
Implementation
Solr supports, in principle, an unlimited number of fields => structure is easy to index
The search syntax supports Boolean operators as well as field-based, fuzzy and wildcard searches
Solr accepts only text, so other solutions are needed to extract structured text from documents and feed it to the index
Formats supported as of now: HTML, CoSSI XML, PC Axis (PX Web)
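The query types listed above can be sketched as strings in the standard Lucene query syntax that Solr accepts. This is a hedged illustration only: the field name `title`, the helper function names, and the URL `http://localhost:8983/solr/select` are assumptions, not details taken from the www.stat.fi installation.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; the real service URL is not given.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def field_query(field, term):
    """Field-based search: match the term only in the named field."""
    return f"{field}:{term}"


def fuzzy(term):
    """Fuzzy search: also match terms that are spelled similarly."""
    return f"{term}~"


def wildcard(prefix):
    """Wildcard search: match any term starting with the prefix."""
    return f"{prefix}*"


def all_of(*clauses):
    """Boolean AND: every clause must match."""
    return " AND ".join(clauses)


def any_of(*clauses):
    """Boolean OR: at least one clause must match."""
    return "(" + " OR ".join(clauses) + ")"


def build_url(query, rows=10):
    """Encode the query string into a request URL for the select handler."""
    return SOLR_SELECT_URL + "?" + urlencode({"q": query, "rows": rows})


# Example: titles starting with "popul", combined with a fuzzy free-text term.
example = all_of(field_query("title", wildcard("popul")), fuzzy("statistics"))
```

Here `example` evaluates to `title:popul* AND statistics~`. Terms that contain characters with a special meaning in the query syntax would need escaping before being passed to these helpers, which this sketch leaves out.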
6
Still things to do
A search service is never "done"
Documents change and their structure evolves
Users' searches and search results tell more about the service's usability than the developer(s) can test
Most of Solr's features are yet to be utilized
Most of Statistics Finland's documents are still poorly structured: better structure -> better search service
This is just the beginning!