

Practical Solr Guide for Developers

First…some questions. How many of you in the room know what Solr is? How many have worked with Solr? How many will be using Solr or text search technology in their upcoming projects?

Why am I here speaking to you about this?
Several projects in 2011/2012 involving search technology
One of the most visited recipe sites in the US, with 200,000 hits per hour during peak times
Resource portal for the world's leading vendor of large-format printers
First encounter was with Lucene.NET, which led to Solr
Second encounter was with Solr on Azure
Afterwards, Jetty and Tomcat configurations
Currently working on 99bugs.com

Solr and Lucene
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java.

Not Frictionless
Java
Complex configuration
Still-evolving documentation
Too many brief tutorials

What we will talk about today
Getting up and running
Setting up as a service
Importing data
Spelling
Stopwords, Synonyms, Elevate
Facets
Replication, ZooKeeper (Cloud setup)
Integration deep dives
Etc.

Solr and Lucene (diagram)
Web Clients, Web Server
CMS, PHP, Bash/PowerShell etc.
Solr web application (Solr.war)
Core1 (recipes): data-config.xml, solrconfig.xml, schema.xml
Core2 (food articles): data-config.xml, solrconfig.xml, schema.xml
Core3 (etc.): data-config.xml, solrconfig.xml, schema.xml
Document Repositories

Solr Terminology
Solr Core: Also referred to as just a "Core". This is a running instance of a Solr index along with all of its configuration (SolrConfigXml, SchemaXml, etc.). A single Solr application can contain 0 or more cores, which are run largely in isolation but can communicate with each other if necessary via the CoreContainer. From a historical perspective: Solr initially only supported one index, and the SolrCore class was a singleton for coordinating the low-level functionality at the "core" of Solr. When support was added for creating and managing multiple Cores on the fly, the class was refactored to no longer be a Singleton, but the name stuck.
Facet: A distinct feature or aspect of a set of objects; "a way in which a resource can be classified".
Request Handler: A Solr component that processes requests. For example, the DisMaxRequestHandler processes search queries by calling the DisMax Query Parser. Request Handlers can perform other functions as well.

Solr Terminology
Solr Core: Searchable grouping of documents (index). E.g. Core 1 = Recipes, Core 2 = Articles about Food
Facet: Categorisation
Request Handler: Functional grouping under a URL, a lot like a route in PHP frameworks, e.g.
/core1/search -> searches recipes
/core1/importxml -> triggers importing from XML files
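The "handler as a route" idea can be sketched as a small URL builder. This is only an illustration: the base URL `http://localhost:8983/solr` and the helper name `handler_url` are assumptions, and the handler names `search` and `importxml` are the ones from the slide, which exist only if they are defined in that core's solrconfig.xml.

```python
from urllib.parse import urlencode


def handler_url(base, core, handler, **params):
    """Build a URL of the form <base>/<core>/<handler>?<query string>."""
    url = f"{base.rstrip('/')}/{core}/{handler.lstrip('/')}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# Assumed base URL; adjust to wherever the Solr web application is deployed.
BASE = "http://localhost:8983/solr"

# Search the recipes core through its (hypothetical) "search" handler.
search = handler_url(BASE, "core1", "search", q="beef stew", rows=10)

# Trigger an import through the (hypothetical) "importxml" handler.
importer = handler_url(BASE, "core1", "importxml", command="full-import")

print(search)
print(importer)
```

Each core gets its own set of handlers, so the same handler name can behave differently under /core1 and /core2.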

Starting Solr in under 1 minute
Requirements: Solr downloaded and unpacked, JRE installed
1. Via the command line, navigate to /apache-solr-3.6.1/example
2. Run java -Dsolr.solr.home=multicore -jar start.jar
* Also see README.txt in /apache-solr-3.6.1/example

Solr With Tomcat C:\Program Files\Apache Software Foundation\Tomcat 6.0\conf\Catalina\localhost

Files and Directories
solr
    core0
        conf
            schema.xml
            solrconfig.xml
            data-config.xml
            dataimport.properties
            solrcore.properties
        data
    core1
    solr.xml

<!-- adminPath: RequestHandler path to manage cores. If 'null' (or absent), cores will not be manageable via request handler -->
Tip: use the sharedLib="global_libs" attribute
Other options:
Solr web application settings. Define your cores here along with a few global settings.
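To make the core definitions concrete, here is a minimal sketch that reads a legacy-style solr.xml (the format used by the 3.x multicore example) and lists its cores. The core names, the `instanceDir` values and the helper `list_cores` are assumptions for illustration; the `adminPath` and `sharedLib` attributes are the ones mentioned on the slide.

```python
import xml.etree.ElementTree as ET

# Assumed sample file content; real deployments will differ.
SOLR_XML = """\
<solr persistent="true" sharedLib="global_libs">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
"""


def list_cores(xml_text):
    """Return (adminPath, {core name: instanceDir}) from a legacy solr.xml."""
    root = ET.fromstring(xml_text)
    cores = root.find("cores")
    admin_path = cores.get("adminPath")  # None means cores are not manageable
    mapping = {c.get("name"): c.get("instanceDir") for c in cores.findall("core")}
    return admin_path, mapping


admin_path, cores = list_cores(SOLR_XML)
shared_lib = ET.fromstring(SOLR_XML).get("sharedLib")
print(admin_path, cores, shared_lib)
```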

schema.xml
Schema XML is where you describe your data.
Lucene field definitions with analysis chain
Column names and their respective Lucene types
Unique key
Default search field
Default operator (AND/OR), to be deprecated in the future
See: Field types included with Solr
Gotcha: Multivalued fields cannot be sorted

dataimport.properties and solrcore.properties
dataimport.properties
Status file, managed by Solr
Contains import information such as the time of the last import
solrcore.properties
Contains core-specific settings assigned by the developer
Settings can be passed to the data import definition file:
mycore.languagegroup=en
mycore.filenamefilter=.*(en|eew|enw|eez|eep)\.(xml)
In the data config, these options can be retrieved as:
${mycore.languagegroup}
${mycore.filenamefilter}
Etc.
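The placeholder mechanism can be illustrated with a few lines of Python. This is a simplified model of the idea, not Solr's implementation: the parser ignores Java .properties escape rules, and the template attributes (`fileName`, `lang`) and function names are assumptions.

```python
import re


def parse_properties(text):
    """Parse key=value lines; simplified, no Java .properties escape handling."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


PLACEHOLDER = re.compile(r"\$\{\s*([^}\s]+)\s*\}")


def substitute(template, props):
    """Replace ${name} with its property value; unknown names are left as-is."""
    return PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(0)), template)


props = parse_properties("\n".join([
    "mycore.languagegroup=en",
    r"mycore.filenamefilter=.*(en|eew|enw|eez|eep)\.(xml)",
]))

# Hypothetical fragment of a data config that uses the two settings.
template = 'fileName="${mycore.filenamefilter}" lang="${mycore.languagegroup}"'
result = substitute(template, props)
print(result)
```

Keeping per-core values in solrcore.properties means the same data config file can be shared across cores that differ only in these settings.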

Importing
From XML
XML can originate in a single file, multiple files (same schema) or HTTP
Solr will loop over common data nodes using its for-each mechanism
Gotcha: The XPathEntityProcessor implements a streaming parser which supports a subset of XPath syntax. Complete XPath syntax is not supported, but most of the common use cases are covered, for example xpath="/a/b/c"
From Database
You will need a JDBC driver for your database
Can run multiple queries with reference variables passed from one entity to another
Gotcha: SQL timeouts
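The for-each idea (loop over a repeating node, then pull fields out of each one with simple paths) can be sketched in Python. This is an analogue, not the data import handler itself: ElementTree happens to support only a limited path syntax too, and the XML document, element names and `extract_rows` helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document with one repeating <recipe> node.
XML = """\
<recipes>
  <recipe id="1"><title>Beef stew</title><tag>beef</tag><tag>stew</tag></recipe>
  <recipe id="2"><title>Pancakes</title><tag>breakfast</tag></recipe>
</recipes>
"""


def extract_rows(xml_text, record_path, field_paths):
    """For each record node, collect the text of every node matching each field path."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.findall(record_path):  # the "for each" step
        row = {
            column: [el.text for el in record.findall(path)]
            for column, path in field_paths.items()
        }
        rows.append(row)
    return rows


rows = extract_rows(XML, "recipe", {"title": "title", "tags": "tag"})
for row in rows:
    print(row)
```

A field that matches more than once per record, such as `tags` here, is the kind of data that would map to a multivalued field in schema.xml.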

JDBC Timeouts
<dataSource name="jdbc" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" type="JdbcDataSource" url="jdbc:sqlserver://dvtaoomb.database.windows.net:1433;database=DB;encrypt=true;hostNameInCertificate=data.ch1-1.database.windows.net" responseBuffering="full" />

Stop Words
a an and are as at be but by for if in into is it no not of on or s such t that the their then there these they this to was will with
Stop words list is in /apache-solr-3.6.1/example/example-DIH/solr/solr/conf
You can find more stopwords using the schema browser
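What a stop filter does to text can be shown with a short sketch that uses the word list from this slide. It is a plain-Python illustration with naive whitespace tokenizing, not Solr's analysis chain; the function name is an assumption.

```python
# Stop word list copied from the slide.
STOP_WORDS = set(
    "a an and are as at be but by for if in into is it no not of on or s such "
    "t that the their then there these they this to was will with".split()
)


def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


kept = remove_stop_words("The best stew for a cold night")
print(kept)
```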

Spellcheck
Solr will build a spell index from the existing index
The spell index is a separate set of index files, and building it needs to be triggered
Trigger spell index generation only once; do not call it with every query
rows=10&indent=on&spellcheck.build=true&spellcheck.q=stering&spellcheck=true
Note: spellcheck.build=true is needed only once, to build the spellcheck index from the main Solr index. It takes time and should not be specified with each request.
Note: Combine multiple fields into a single spell field using
Gotcha: solr.PorterStemFilterFactory
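One way to keep spellcheck.build out of ordinary requests is to make it an explicit opt-in when building the query string. A minimal sketch, where the function name and the choice to send the same term as both q and spellcheck.q are assumptions:

```python
from urllib.parse import urlencode


def spellcheck_query(term, build=False, rows=10):
    """Build a spellcheck query string; only add spellcheck.build when asked."""
    params = [
        ("q", term),
        ("rows", rows),
        ("indent", "on"),
        ("spellcheck", "true"),
        ("spellcheck.q", term),
    ]
    if build:
        # Rebuilds the spell index; meant for a one-off call, not every request.
        params.append(("spellcheck.build", "true"))
    return urlencode(params)


everyday = spellcheck_query("stering")
one_off = spellcheck_query("stering", build=True)
print(everyday)
print(one_off)
```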

Faceting
Just Facets:
start=0&rows=5&indent=on&facet=true&facet.field=ProductScale&facet.field=ProductLine
For predictive search:
start=0&rows=0&indent=on&facet=true&facet.field=Keywords&facet.prefix=a
More with Facets:
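The two facet queries above can be generated, and a JSON response unpacked, with a short sketch. The sample response is hand-written for illustration and assumes the default flat layout in which Solr's JSON writer returns facet values and counts as one alternating list; the function names and the `q=*:*` match-all query are assumptions.

```python
import json
from urllib.parse import urlencode


def facet_params(fields, prefix=None, rows=0):
    """Build a facet query string; facet.field is repeated once per field."""
    params = [("q", "*:*"), ("start", 0), ("rows", rows), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", field) for field in fields]
    if prefix is not None:
        params.append(("facet.prefix", prefix))  # used for predictive search
    return urlencode(params)


def facet_counts(response_text, field):
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    flat = json.loads(response_text)["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample response, not real data.
SAMPLE = '{"facet_counts": {"facet_fields": {"Keywords": ["apple", 12, "avocado", 7]}}}'

just_facets = facet_params(["ProductScale", "ProductLine"], rows=5)
predictive = facet_params(["Keywords"], prefix="a")
counts = facet_counts(SAMPLE, "Keywords")
print(just_facets)
print(predictive)
print(counts)
```

With rows=0 the response carries only the facet counts, which is what a type-ahead box needs.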

Transformers
RegexTransformer
ScriptTransformer
DateFormatTransformer
NumberFormatTransformer
TemplateTransformer
HTMLStripTransformer
ClobTransformer
LogTransformer

Synonyms
beefstew = Beef stew
Query Elevate
Bring certain documents to the top based on the query
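The synonym mapping can be sketched as a tiny parser for synonyms.txt-style rules. It assumes the two common rule shapes, an explicit mapping written with `=>` and a comma-separated list of equivalent terms, and it is an illustration rather than Solr's synonym filter; the sample rules and function name are made up.

```python
def parse_synonyms(text):
    """Map each source term to its replacement terms."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if "=>" in line:
            left, right = line.split("=>", 1)
            sources = [s.strip() for s in left.split(",")]
            targets = [t.strip() for t in right.split(",")]
        else:
            # Equivalent terms: every term maps to the whole group.
            sources = targets = [s.strip() for s in line.split(",")]
        for source in sources:
            known = mapping.setdefault(source, [])
            known.extend(t for t in targets if t not in known)
    return mapping


# Hypothetical rules, in the spirit of the slide's beefstew example.
RULES = """\
# explicit mapping
beefstew => beef stew
# equivalent terms
stew, casserole
"""

synonyms = parse_synonyms(RULES)
print(synonyms)
```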

Documentation
Types#SolrFieldTypes-FieldTypesIncludedwithSolr

Gotchas
Form content type: application/xml (not application/x-www-form-urlencoded)
Multivalued fields cannot be sorted
Dates (use date transformers)
JDBC timeouts
Slow indexing with multiple database entities
XPath limitations
Can you recreate your updates?
Are you storing enough data?

Thank You! Radek Zajkowski