New free text search engine for www.stat.fi

Background
- The "project" began in 2000.
- The first version was based on Oval, a commercial Finnish search engine.
- Oval was deemed too cumbersome to develop and lacked necessary features.
- The search for a new search service began.
Jussi Arpalahti 21.09.07

New Search Service: Lucene/Solr
- Full-text search
- Match highlighting
- Categories (tags, facets, ...)
- "More like this"
- "Did you mean...?"
- Finnish language support (Lingsoft module)
- Free software
- Fast, adaptable and scalable; popular in both free and commercial settings, from small to large scale
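As an illustration, most of the features listed above correspond to ordinary request parameters of a Solr select request. The sketch below only builds the request URL. The endpoint, the core name `stat` and the field names `title`, `body` and `category` are assumptions, and spellchecking and MoreLikeThis only work if those components are configured in the Solr instance.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name "stat" are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/stat/select"


def build_search_url(user_query: str) -> str:
    """Build a Solr select URL that switches on the features listed above.

    Field names ("title", "body", "category") are hypothetical. Spellcheck
    and MoreLikeThis also require their components in solrconfig.xml.
    """
    params = [
        ("q", user_query),
        ("wt", "json"),
        # Match highlighting in the chosen fields
        ("hl", "true"),
        ("hl.fl", "title,body"),
        # Facet counts for categories (tags, facets, ...)
        ("facet", "true"),
        ("facet.field", "category"),
        # "Did you mean...?" suggestions
        ("spellcheck", "true"),
        # "More like this": similar documents based on these fields
        ("mlt", "true"),
        ("mlt.fl", "title,body"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


print(build_search_url("unemployment rate"))
```

The URL can be fetched with any HTTP client; keeping the parameters in one place makes it easy to switch individual features on or off.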

Structured Search
- Structure means the distinguishable parts of a document (title, author, paragraph, creation date).
- Searches based on structure give better results.
- A regular HTML page is poorly structured; PC Axis and CoSSI XML are much better.
- For example, statistical data can be found by searching directly in table titles and variable names.
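A simplified sketch of why structure helps: if a hit in a table title or variable name counts for more than a hit in body text, a PC Axis table about a topic outranks an HTML page that merely mentions it. The field names and weights are hypothetical, and this toy scoring is not how Solr ranks results; in Solr a comparable effect is usually obtained with per-field boosts, for example through the dismax `qf` parameter.

```python
from dataclasses import dataclass

# Hypothetical field weights: a hit in a table title counts more than a hit
# in free body text. Solr's real relevance scoring is considerably richer.
FIELD_WEIGHTS = {"title": 3.0, "variables": 2.0, "body": 1.0}


@dataclass
class Doc:
    doc_id: str
    fields: dict[str, str]


def score(doc: Doc, term: str) -> float:
    """Sum the weights of the fields whose tokens contain the term."""
    term = term.lower()
    total = 0.0
    for name, text in doc.fields.items():
        if term in text.lower().split():
            total += FIELD_WEIGHTS.get(name, 1.0)
    return total


def search(docs: list[Doc], term: str) -> list[str]:
    """Return ids of matching documents, best-scoring first."""
    scored = [(score(d, term), d.doc_id) for d in docs]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [doc_id for s, doc_id in ranked if s > 0]


# Usage: an unstructured page versus a structured statistical table
docs = [
    Doc("html-page", {"body": "the unemployment rate rose in May"}),
    Doc("px-table", {"title": "Unemployment by region",
                     "variables": "region month unemployment"}),
]
print(search(docs, "unemployment"))
```

Here the table scores 5.0 (title plus variables) against 1.0 for the page, so it is ranked first even though both mention the term.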

Implementation
- In principle, Solr supports an unlimited number of fields, so structure is easy to index.
- The search syntax supports Boolean operators as well as field-based, fuzzy and wildcard searches.
- Solr accepts only text, so other solutions are needed to extract structured text from documents and feed it into the index.
- Formats currently supported: HTML, CoSSI XML, PC Axis (PX Web).
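The extraction step could look roughly like this: parse a source document, pick out its structured parts and emit Solr's XML update format (`<add><doc><field name="...">`). The input layout here is an invented stand-in, not the actual CoSSI or PC Axis format, and the field names `id`, `title` and `variables` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified input; this is NOT the real CoSSI schema.
SAMPLE = """<table id="tab-001">
  <title>Unemployment rate by region</title>
  <variable>Region</variable>
  <variable>Month</variable>
</table>"""


def to_solr_add(source_xml: str) -> str:
    """Convert one source document into a Solr <add> update message."""
    src = ET.fromstring(source_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def add_field(name: str, value: str) -> None:
        # Each <field name="..."> becomes one indexed value in Solr
        ET.SubElement(doc, "field", name=name).text = value

    add_field("id", src.get("id", ""))
    add_field("title", src.findtext("title", default="").strip())
    # One entry per variable name, for a multi-valued "variables" field
    for var in src.findall("variable"):
        add_field("variables", (var.text or "").strip())
    return ET.tostring(add, encoding="unicode")


print(to_solr_add(SAMPLE))
```

The resulting XML would then be POSTed to Solr's `/update` handler and followed by a commit, after which table titles and variable names are searchable as separate fields.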

Still things to do
- A search service is never "done".
- Documents change and their structure evolves.
- Users' searches and the results they get reveal more about the service's usability than the developer(s) can test.
- Most of Solr's features are yet to be utilized.
- Most of Statistics Finland's documents are still poorly structured: better structure means a better search service.
- This is just the beginning!
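One way to let users' searches guide further development is to list the most frequent queries that returned no results. The sketch assumes a hypothetical tab-separated log with one query and its hit count per line; the slides do not describe how searches are actually logged.

```python
from collections import Counter

# Hypothetical log format: one "query<TAB>hit_count" line per search.
SAMPLE_LOG = """unemployment\t120
tyottomyys\t0
population\t340
tyottomyys\t0
inflaatio\t0
"""


def zero_hit_queries(log_text: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the most frequent queries that returned no results."""
    misses: Counter = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        query, hits = line.rsplit("\t", 1)
        if int(hits) == 0:
            misses[query.strip().lower()] += 1
    return misses.most_common(top)


print(zero_hit_queries(SAMPLE_LOG))
```

Frequent zero-hit queries point to missing content, missing synonyms or misspellings that a "Did you mean...?" feature could catch.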