Project Overview Bibliographic merging, Endeca, and Web application.


Three Processes
Merging of bibliographic records
–Pre-processing stage
–8M contractual record limit
Endeca Forge, Dgidx, and MDEX
Web application
–Presentation platform
–Can be used to present more than data from the MDEX Engine

Merging of bibliographic records
BIB and HOL data extracted from Aleph Oracle (z00) x 11
Merge routine
–'Endeca Field Mapping and Pipeline' shows the action taken during the merge routine for each MARC field
–Deduplication based on OCLC number
–HOL data written into the Union MARC
–The Aleph service p_print_03 is run for all merged records to apply UTF-8 encoding and Material Types for the BIB records. Material Types for all SUL BIB records are applied using a single instance of Aleph's tab_type_config. This file translates the information encoded in the LDR/008/007/006 of the BIB record into a two-letter material type code that is placed in the BIB record and is the source of the format facet in Endeca.
–The merge routine happens before the field mapping.
Other data sources in the future (digital libraries)
–Via DLU01
–Direct
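The deduplication step can be pictured with a minimal Python sketch. It is not the FCLA merge routine: the record layout (dicts with "id", "oclc" and "holdings" keys), the helper names, the normalization of the OCLC number to bare digits, and the choice to pass records without an OCLC number through unmerged are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def normalize_oclc(value):
    """Reduce an OCLC control number to bare digits; None if it has none.

    Hypothetical normalization: strips prefixes such as "(OCoLC)" or "ocm"
    and leading zeros so that variant forms compare equal.
    """
    digits = re.sub(r"\D", "", value or "")
    return digits.lstrip("0") or None


def merge_by_oclc(bib_records):
    """Group BIB records by OCLC number and fold their holdings together."""
    groups = defaultdict(list)
    passthrough = []  # assumption: records without an OCLC number stay unmerged
    for rec in bib_records:
        key = normalize_oclc(rec.get("oclc"))
        if key is None:
            passthrough.append(rec)
        else:
            groups[key].append(rec)

    union = []
    for key, recs in groups.items():
        base = dict(recs[0])  # first record supplies the bibliographic data
        base["oclc"] = key
        base["source_ids"] = [r["id"] for r in recs]
        # HOL data from every contributing record goes into the union record.
        base["holdings"] = [h for r in recs for h in r.get("holdings", [])]
        union.append(base)
    for rec in passthrough:
        single = dict(rec)
        single["source_ids"] = [rec["id"]]
        union.append(single)
    return union
```

Calling merge_by_oclc on records from two institutions that share an OCLC number yields one union record carrying both sets of holdings.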

Endeca Forge, Dgidx, and MDEX
Forge is a data processing program
Dgidx is an indexing program
MDEX is the search engine/API that serves data in response to a query; the response includes all of the information needed to build an entire page
Documents
–'Endeca Field Mapping and Pipeline' shows how MARC fields are mapped to Endeca record fields.
–'Endeca Dimensions (Facet) Mappings' shows how MARC fields are mapped to Endeca dimensions (facets).
–'Endeca Search Configuration' shows the search 'interfaces' and the Endeca record fields that are searched.

Forge is a data processing program
Endeca provided a custom MARCadapter to transform MARC records into records that are readable by Forge
–FCLA has modified the "MARCadapter" files to help define the "Online" format for bibliographic records and to assist with other features of the WebApp where a custom field is needed in the Endeca record.
–E.g. we apply the Online format code to the record based on the presence of http in $u of the 856, excluding links labelled "table of contents," "publisher," "sample text," or "contributor."
Transforms your source data into standardized, tagged Endeca records
Each record has a list of dimension (text) values tagged to it.
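The "Online" format rule above can be sketched in a few lines of Python. This is an illustration rather than the MARCadapter code: the field representation (one dict per 856 field, mapping subfield code to a list of values) is hypothetical, and the sketch assumes the excluded phrases are looked for in the other subfields of the same 856 field, which the slide does not specify.

```python
EXCLUDED_NOTES = ("table of contents", "publisher", "sample text", "contributor")


def is_online(fields_856):
    """Return True if any 856 field qualifies the record as "Online".

    fields_856: list of dicts mapping a subfield code to a list of values,
    e.g. {"u": ["http://..."], "3": ["Table of contents"]} (hypothetical layout).
    """
    for field in fields_856:
        urls = field.get("u", [])
        if not any("http" in url.lower() for url in urls):
            continue  # no http link in $u, so this field cannot qualify
        # Assumption: the exclusion phrases appear in the non-$u subfields.
        notes = " ".join(
            value
            for code, values in field.items()
            if code != "u"
            for value in values
        ).lower()
        if any(term in notes for term in EXCLUDED_NOTES):
            continue  # link to supplementary material, not the resource itself
        return True
    return False
```

A record whose only 856 link is labelled "Table of contents" would not receive the Online code under this sketch, while a record with one additional unlabelled http link would.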

Dgidx
Indexing program that reads the tagged Endeca records that were prepared by Forge
Creates the proprietary indices for the Endeca MDEX Engine
–Dgraph: an index for every N-value
–Entire Endeca database stored in memory
–Output stored in directories on the file system of the Endeca box
Indexing configuration in Endeca (pipeline) includes:
–Stop words
–Character normalization and 'internationalization'
–Thesaurus and stemming (automatic)
–Taxonomies ('hierarchies'), e.g. LCC/NLM
–Search configuration ('interfaces')
–Relevance ranking
–DYM ("Did you mean") and spell correction
–Truncation

MDEX Engine
Serves data in response to a query via the API; the response includes all of the information needed to build an entire page
Queries include Search and Navigation
An entire page (object) is returned in response to a query, constructed from a subset of the Dgraph.
Subsequent navigation is applied to this object, not the entire Dgraph (follow-up queries are faster).

“WebApp”
Apache Tomcat and JSP
–Maintains a connection state with the Endeca Nav Engine (Dgraph/MDEX)
–Similarly, maintains a connection with Oracle via JDBC
–Restarted every morning (with refreshed configuration files)

“WebApp”
Other than the Forge and Dgidx programs and some control scripts, this is what Endeca provides:
Endeca API includes:
–HTTP connections into the MDEX
–Method to query via a URL
–A result object that can be parsed and manipulated for display
–A method to get the dimensions, dimension values, and corresponding IDs for a particular navigation state
–Other classes and methods
–Boolean query mode and other query match modes
    Interaction with other features (no stop words, stemming, spelling, thesaurus, ranking)
    Proximity searching (NEAR/n, ONEAR/n)
Statistical Report (See and linked document that explains the reporting categories)
Everything else is a customization of the Web application by FCLA.
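As a rough illustration of "method to query via a URL", the Python sketch below assembles a query string of the kind the MDEX accepts. The parameter names (N for the navigation state, Ntk for the search interface, Ntt for the terms, Ntx for the match mode) follow Endeca's URL conventions as commonly documented, but they are given here as assumptions and should be checked against the Endeca documentation. The function name, the "Keyword" interface name and the dimension value IDs are hypothetical.

```python
from urllib.parse import urlencode


def build_query(terms, interface="Keyword", dimension_value_ids=(), match_mode="matchall"):
    """Build an Endeca-style query string (parameter names are assumptions).

    dimension_value_ids: IDs of the facet values selected so far; an empty
    selection is sent as N=0, taken here to mean the root navigation state.
    """
    nav_state = " ".join(str(i) for i in dimension_value_ids) or "0"
    params = {
        "N": nav_state,               # navigation state (selected facet values)
        "Ntk": interface,             # search interface to query
        "Ntt": terms,                 # search terms
        "Ntx": "mode " + match_mode,  # match mode, e.g. matchall or matchboolean
    }
    # urlencode turns spaces into "+", so multiple IDs appear as "101+202".
    return urlencode(params)
```

A follow-up navigation request would resend the same terms with the newly selected dimension value ID added to dimension_value_ids.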

“WebApp” Features
Hooks into Aleph
–Display of item information
    Sublibrary/Collection (Aleph tabs)
    According to item status (Aleph tabs)
    Availability (Circ) status (SQL)
    Detailed holdings (SQL)
–Patron Empowerment
    Loans list (SQL)
    Renewals (API)
    Requests and Holds (API)
–SFX
    Contextual links for Full Text (Query SFX server)
–List Functions and Session ID (SQL)

“WebApp” Features
Custom Features
–Advanced Search and limits
–RSS for Result Set (list of records)
–Hooks to RefWorks
–Permalink
–MARC views via SQL
–Debug view (raw Endeca record)
–Browse lists (in development)
–Book covers (in development)
–…What's next?

Development and System Environment
The Endeca MDEX Server
–125 GB memory
–Uses 317 GB of file space (out of 392 GB)
–Each dgraph uses about 16% of the memory (about 20 GB); 1-5 dgraphs are running at a given time
The WebApp Server
–7 GB memory
–File space is not an issue. No dgraphs.
–Handles 19meg http requests.
Forge box
–32 GB memory
–197 GB file space
–Runs dgidx, which uses 17 GB, and forge, which uses 2.4 GB
Subversion
–svnlog (
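The memory and disk figures quoted for the MDEX server and the Forge box are mutually consistent, as a short Python check shows. The numbers come from the slide; the variable names and the headroom calculations are additions for illustration.

```python
# Figures from the slide (GB).
MDEX_MEM_GB = 125
DGRAPH_MEM_GB = 20
MAX_DGRAPHS = 5
DISK_USED_GB, DISK_TOTAL_GB = 317, 392
FORGE_BOX_MEM_GB = 32
DGIDX_MEM_GB, FORGE_MEM_GB = 17, 2.4

# One dgraph as a share of MDEX server memory: 20 / 125 = 0.16, i.e. 16%.
dgraph_share = DGRAPH_MEM_GB / MDEX_MEM_GB

# Worst case with five dgraphs running at once: 100 GB, leaving 25 GB free.
peak_dgraph_mem = MAX_DGRAPHS * DGRAPH_MEM_GB
mdex_headroom = MDEX_MEM_GB - peak_dgraph_mem

# File space in use on the MDEX server, as a percentage of the 392 GB available.
disk_used_pct = round(100 * DISK_USED_GB / DISK_TOTAL_GB, 1)

# Memory needed on the Forge box if dgidx and forge run at the same time
# (an assumption; the slide does not say whether they overlap).
forge_box_peak = DGIDX_MEM_GB + FORGE_MEM_GB

print(f"dgraph share of MDEX memory: {dgraph_share:.0%}")
print(f"peak dgraph memory: {peak_dgraph_mem} GB, headroom: {mdex_headroom} GB")
print(f"MDEX disk used: {disk_used_pct}%")
print(f"Forge box peak: {forge_box_peak:.1f} of {FORGE_BOX_MEM_GB} GB")
```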