Revolutionizing enterprise web development Searching with Solr.



What is Solr?
- Solr is the popular, blazing-fast open source enterprise search platform from the Apache Lucene project.
- What’s Lucene? Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform.

What is Solr?
- Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling.
- Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
- See for more info.

Why Solr?
Why Solr, or why Solr with Drupal?

| Core Drupal Search | Solr Search |
| Reasonable performance only for small sites | Quality performance for all installations, including large deployments |
| Poor scalability: relies on Drupal’s DB to handle all search results | Quality scalability: single-purpose servers independent of Drupal |
| Few configuration options (better in D7 than D6) | Significant configuration options out of the box, including configurable filters and indexed material |
| Few search options | Significant search options out of the box (based on filters above) |
| No multi-site capability | Multi-site (even non-Drupal sites) capabilities |

Where does it fit?
- Solr sits beside your application servers in the stack.
- PHP communicates with the Solr servers (the Apachesolr module handles this for you).
  - Retrieve: URL strings
  - Push: XML packets
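
The two directions on this slide can be illustrated outside Drupal. The following is a minimal Python sketch, not what the Apachesolr module does internally: it builds a query URL for Solr's select handler and an XML add packet for the update handler. The base URL `http://localhost:8983/solr` and the field names `id` and `title` are assumptions chosen for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed base URL of a local Solr instance; adjust to your install.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(keywords, rows=10):
    """Retrieve: a search request is a URL string with query parameters."""
    params = {"q": keywords, "rows": rows, "wt": "json"}
    return SOLR_BASE + "/select?" + urlencode(params)


def build_add_packet(fields):
    """Push: documents are sent as an XML <add><doc>...</doc></add> packet."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Field names and values here are illustrative only.
    print(build_select_url("drupal search"))
    print(build_add_packet({"id": "node/1", "title": "Hello Solr"}))
```

Nothing is sent over the network here; the sketch only shows the shape of the two kinds of message that travel between PHP and Solr.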

Solr Setup Options
- Self-hosted: look for “Download Solr here”.
- Service: Acquia Search.

Solr Setup
- Example directory
  - start.jar (start Solr in the background with: java -jar start.jar > /dev/null &)
  - Solr directory
    - Conf directory
      - schema.xml
      - solrconfig.xml

Solr Setup Solr admin accessible here:

Solr Setup
- schema.xml: primarily handles what is indexed.
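
To make "what is indexed" concrete, here is a small Python sketch that reads a schema.xml-style fragment and lists the fields marked as indexed. The fragment and its field names are invented for illustration and are not the schema shipped with the Apachesolr module.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the style of schema.xml; field names are made up.
SCHEMA_FRAGMENT = """
<schema name="example" version="1.2">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="path" type="string" indexed="false" stored="true"/>
  </fields>
</schema>
"""


def indexed_fields(schema_xml):
    """Return the names of <field> elements declared with indexed="true"."""
    root = ET.fromstring(schema_xml)
    return [
        field.get("name")
        for field in root.iter("field")
        if field.get("indexed") == "true"
    ]


if __name__ == "__main__":
    print(indexed_fields(SCHEMA_FRAGMENT))
```

In this made-up fragment, `path` is stored but not indexed, so it could be returned with a result but not searched on.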

Solr Setup
- solrconfig.xml: handles general configuration. You might need to edit it for replication, or if you plan to do file handling on the Solr server.

Drupal + Solr
Core module: Apachesolr
Optional modules:
- Apachesolr_multisitesearch: self-explanatory.
- Apachesolr_attachments: requires an additional Solr component (Tika). Allows full-text indexing of docs.
- Apachesolr_views: sort of, and maybe someday.

Drupal + Solr Basic Drupal Settings

Drupal + Solr Examples of filters that can be surfaced

Example: Drupal.org

Solr hooks
Add new data to the index
- By default, all data displayed on the node view is indexed.
- We can also set up additional information to be indexed and/or filtered even if the information is not on the node page.
- It’s worth taking a look at apachesolr_node_to_document (in apachesolr.index.inc).

Solr hooks
- hook_apachesolr_update_index(&$document, $node, $namespace)
  - Allows a module to change the contents of the $document object before it is sent to the Solr server.

Solr hooks
Altering the query (3 possible methods)
- hook_apachesolr_prepare_query(&$query, &$params, $caller)
  - Occurs before the query is cached.
  - Modifications you make can be used by others.

Solr hooks
Altering the query (3 possible methods)
- hook_apachesolr_modify_query(&$query, &$params, $caller)
  - Occurs after the query is cached.
  - Use it for modifications that you don’t want other modules to inherit.

Solr hooks
Altering the query (3 possible methods)
- _finalize_query(&$query, &$params)
  - Occurs after the query is cached.
  - Technically only for use by modules originating Solr queries (aka custom Solr search invocations, not the search page).
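
The practical difference between these methods is timing relative to the cached query. The following Python sketch is an analogy only and not the Apachesolr module's code: `build_query`, the two hook functions, the dictionary-based query, and the `sort` and `fq` values are all hypothetical. It shows why a change made before caching is visible to anything that later reads the cached query, while a change made after caching affects only the request that is actually sent.

```python
import copy


# Hypothetical stand-ins for module hook implementations.
def prepare_add_sort(query):
    # Runs before caching, so code reading the cached query sees this sort.
    query["sort"] = "created desc"


def modify_hide_unpublished(query):
    # Runs after caching, so this filter is private to the outgoing request.
    query["fq"].append("status:1")


def build_query(keywords, prepare_hooks, modify_hooks):
    """Return (cached_query, sent_query) to show what each stage can see."""
    query = {"q": keywords, "fq": []}
    for hook in prepare_hooks:
        hook(query)
    cached = copy.deepcopy(query)  # snapshot that other modules may reuse
    for hook in modify_hooks:
        hook(query)
    return cached, query


if __name__ == "__main__":
    cached, sent = build_query(
        "solr", [prepare_add_sort], [modify_hide_unpublished]
    )
    print("cached:", cached)
    print("sent:  ", sent)
```

In this sketch the sort appears in both the cached and the sent query, while the `status:1` filter appears only in the sent one.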

Solr hooks
- hook_apachesolr_search_result_alter(&$doc, &$extra)
  - Allows for modification of each search result independently.

Solr hooks
- hook_apachesolr_process_results(&$results)
  - Allows for modification of all search results.

Solr hooks
- Not technically a hook, but worth noting: search theming is identical to the core search module.
  - search-result.tpl.php
  - search-results.tpl.php
- If you pass the same values from Solr as you had via node_load, the theming templates become interchangeable.

Summary
- The Apachesolr module provides a replacement for core Drupal search with better performance, scalability, and configuration than the Drupal default.
- Solr requires a separate service running on Jetty or Tomcat.
- hook_apachesolr_update_index provides a way to change what goes into the index.
- hook_apachesolr_prepare_query, hook_apachesolr_modify_query and _finalize_query allow query modifications.
- hook_apachesolr_search_result_alter and hook_apachesolr_process_results allow for result modification.
- Theming is the same as core.

Thank You Bill O’Connor, CTO d.o: csevb10 t: csevb10 e: