 Apache Solr Apache Solr – Introduction David Shemer.

Slides:



Advertisements
Similar presentations
Lucene/Solr Architecture
Advertisements

Raptor Technical Details. Outline Workshop structured by Raptor workflow – Raptor Event model. – ICA log file parsing – ICA/MUA event storage – ICA event.
© Copyright 2012 STI INNSBRUCK Apache Lucene Ioan Toma based on slides from Aaron Bannert
EasySearch Technical Overview. Ever seen a website without a full text search? BUT – Search is expensive Financially Computationally – Search is complicated.
SOFTWARE PRESENTATION ODMS (OPEN SOURCE DOCUMENT MANAGEMENT SYSTEM)
Solr has a lot of extensive features Solr Integration and Enhancements Todd Hatcher.
Information Retrieval in Practice
Introduction to Open Source Search with Apache Lucene and Solr Grant Ingersoll.
Microsoft ® Official Course Interacting with the Search Service Microsoft SharePoint 2013 SharePoint Practice.
Copyright © Sequence Collective Ltd 2014 Content Search Using SOLR Copyright © Sequence Collective Ltd 2014 By: Dr. Ehab ElGindy Technical Team Lead Implementing.
Thank you SPSKC15 sponsors!. SharePoint 2013 Search Service Application (SSA) Ambar Nirgudkar Software Engineer
Overview of Search Engines
Simple Web SQLite Manager/Form/Report
Enterprise Search. Search Architecture Configuring Crawl Processes Advanced Crawl Administration Configuring Query Processes Implementing People Search.
Implementing search with free software An introduction to Solr By Mick England.
ECPRD seminar on the net IX”, Brussels, 2011 Faceted Search Some examples of applied faceted search on websites developed by the EP Jerry.
Xpantrac connection with IDEAL Sloane Neidig, Samantha Johnson, David Cabrera, Erika Hoffman CS /6/2014.
Search Search Drupal with Apache Solr with CERN Web Communications Group – Copyright 2013.
© NYC Apache Lucene/Solr Meetup. Lucid Imagination, Inc. Agenda Welcome "Faster. Better. Solr! What to look for in Solr 1.4“ Yonik Seeley,
“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”
Battle of the Giants Apache Solr 4.0 vs ElasticSearch 0.20 Rafał Kuć – sematext.com.
© 2014 Jenzabar, Inc. Presented by Jude Bowman Jenzabar, Inc. Oct. 17 th, 2014 Latest Enhancements to JICS: Search.
University of North Texas Libraries Building Search Systems for Digital Library Collections Mark E. Phillips Texas Conference on Digital Libraries May.
Revolutionizing enterprise web development Searching with Solr.
Introduction to Nutch CSCI 572: Information Retrieval and Search Engines Summer 2010.
Overview of IU Digital Collections Search Hui Zhang Jon Dunn Indiana University Digital Library Program IU Digital Library Brown Bag October 19, 2011.
SharePoint 2010 Search Architecture The Connector Framework Enhancing the Search User Interface Creating Custom Ranking Models.
Searching Business Data with MOSS 2007 Enterprise Search Presenter: Corey Roth Enterprise Consultant Stonebridge Blog:
Kuali Enterprise Workflow Kuali Days – November 2008 Scott Gibson, University of Maryland Bryan Hutchinson, Cornell University James Smith, University.
Iccha Sethi Serdar Aslan Team 1 Virginia Tech Information Storage and Retrieval CS 5604 Instructor: Dr. Edward Fox 10/11/2010.
Copyright © 2006 Pilothouse Consulting Inc. All rights reserved. Search Overview Search Features: WSS and Office Search Architecture Content Sources and.
Working with Feature Services Gary MacDougall Russell Brennan.
807 - TEXT ANALYTICS Massimo Poesio Lab 2: (Quick intro to) SOLR Document clustering with MAHOUT.
How to combine IRIS products Available APIs Examples of integrations Ole Andersen Senior Strategic Account Manager.
Quick search in documents stored in DBMS InterSystems Caché using IndexTank API VІI scientific and practical seminar with international participation "Economic.
Lucene Jianguo Lu.
Apache Solr Dima Ionut Daniel. Contents What is Apache Solr? Architecture Features Core Solr Concepts Configuration Conclusions Bibliography.
Interstage BPM v11.2 1Copyright © 2010 FUJITSU LIMITED INTERSTAGE BPM ARCHITECTURE BPMS.
Integrating and Extending Workflow 8 AA301 Carl Sykes Ed Heaney.
Representing Tabular Data in Alfresco Share “Smooth Like Butter” Gary Cox Blue Fish Development Group.
HW3 Overview There are 4 components to this homework; you will possibly not need all of them; 1. Installing Ubuntu 2. Installing Solr 3. Using Solr to.
Introduction to Enterprise Search Corey Roth Blog: Twitter: twitter.com/coreyrothtwitter.com/coreyroth.
The Business Source Databases Basic Searching
A presentation on ElasticSearch
Information Retrieval in Practice
ArcGIS for Server Security: Advanced
Introduction to YouSeer
Implementing Autocomplete with Solr and jQuery
Global Search: An Introduction and Administrator Perspective
Working with Feature Layers
DotNetNuke® Web Application Framework
LOCO Extract – Transform - Load
Searching and Indexing
Open Source distributed document DB for an enterprise
Usability vs. Purity: How UCSF and Duke Enabled Data Reuse By Going Beyond Linked Data Julia Trimmer, Anirvan Chatterjee, Eric Meeks, Richard Outten and.
Safe by default, optimized for efficiency
Building Search Systems for Digital Library Collections
Servicenow Admin Certification Training
CS 5604 Information Storage and Retrieval
CS6604 Digital Libraries IDEAL Webpages Presented by
Overview of big data tools
Lucene/Solr Architecture
Getting Started With Solr
Rafał Kuć – Sematext sematext.com
Battle of the Giants Apache Solr 4.0 vs ElasticSearch 0.20
Bryan Soltis – Kentico Technical Evangelist
Indexing with ElasticSearch
Intro to Azure Search Julie Smith 2019.
Intro to Azure Search Julie Smith 2019.
Presentation transcript:

 Apache Solr Apache Solr – Introduction David Shemer

Overview  What is Solr  standalone open-source enterprise search server with a “REST-like” API, Written in java  How it works  You provide him with the Data in various formats: Text, Database Queries, Xml, Json, Word, Pdf, and attachments…  Solr indexes (Slice and Chops) them by using a schema you define in advance  In the schema there are very specific definitions to how data is sliced using anlyzers, tokenizers and filters.  You query on your data with highly featured search syntax

Overview  What can you do with solr  BLAZING Fast queries  BLAZING Fast data retrieval  facet (Group by) queries  Geospatial Search with multiple points and geo polygons  Out of the box autocomplete support  Results Highlighting  Server statistics exposed over JMX  Auto failover and recovery  Extensible Plugin Architecture

illustration of need  Search has been moving from an expensive, complicated option to an affordable and more easy necessity.  How solr is being utilized in the industry  Large websites search engine  Big Data Analytics  Fine grained access for organization users to data  Performance booster - façade to your data layer  Writing more interesting and fast algorithms  No SQL (Not only SQL) engine  Geo Location Engine  Discovery and Recommendations

Demo  Download   tutorial -  Extract  unzip solr zip  Run (on jetty)  cd sdjug/solr-4.6.0/example/  java Xmx512m -jar start.jar  The administration server is up and running 

Demo  Index  cd to exampledocs  java -jar post.jar money.xml monitor.xml manufacures.xml  Solr index using  Query  *:*  +id:a*  Display The different options for the Admin UI  View the solr directory structure

The Components – Core Config  Core  a running instance of a Solr index with its own schema and configuration.  Can hold thousands of cores simultaneously.  Core Configuration – solrconfig.xml  Index dir  Caching  Query Listeners  DIH – Data Import Handler Configuration  UI Section

The Components - Schema  Schema – schema.xml  Indexing guidelines you add to your fields:  What is searchable (indexed=true|false)  What is retrievable (stored=true|false)  Is this the default field for search (default)  Data Types  Basic data types (string, int, boolen, float )  Advanced data types (text)  Custom data types (text_en_spliting_tight)  Data type = Analyzer  Analyzer = Tokenizers + Filters

Tokenizers  Tokenizer  Split your text based on pattern  Examples:  solr.LetterTokenizerFactory – ignore all none letters chars In: "I can't.” Out: "I", "can", "t”  solr.NGramTokenizerFactory minGramSize="4" maxGramSize="5” In: "bicycle” Out: "bicy", "icyc", "cycl", "ycle", "bicyc", "icycl", "cycle"  More on:

Filters  The filter looks at each token in the stream sequentially and decides whether to pass it a long, replace it or discard it.  Examples:  solr.TrimFilterFactory In: " Kittens! ", "Duck" Out: "Kittens!", "Duck”  "solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"  More on:

Queries  Query parsers  Lucene – Default  eDismax – generally makes the best first choice query parser for user facing Solr applications.  Simple Queries:  Search for word "foo" in the title field. title:foo  AND OR (title:"foo bar" AND body:"quick fox") OR title:fox  +/ /- +title:foo -title:bar

Queries  More Queries  startDate:[ TO ]  createdate:[ T23:59:59.999Z TO T00:00:00Z]  pubdate:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]  Distance  Search the words foo and bar with distance of 4 chars "foo bar"~4  All the Documents with a field* field1:[* TO *] -field1:[* TO *]  Define query parser and Uses the default field: q={!lucene q.op=AND df=text}myfield:foo +bar -baz

Queries  Score  Each document in the results list gets one  It’s the default sort of the results  You can and should influence the score  Boosting  field1:"foo bar"^50 OR field2:"foo bar"^  Slop - distance between the terms  ps=10 field1:"foo bar"~10^50 OR field2:"foo bar"~10^20  Function Query  Nested Query

SolrJ  SolrJ - java client to access solr.  offers a java interface to add, update, and query the solr index

Alternatives  Elastic search  Sphinx  Xapian  OpenSearchServer

More Reading     Revolution-2013-Presentation

Toda Thank you. David Shemer