Bielefeld Academic Search Engine


1 Bielefeld Academic Search Engine
Open Scholarship 2006: New Challenges for Open Access Repositories
Univ. of Glasgow, October 2006
Bielefeld Academic Search Engine: a Scientific Search Service for Institutional Repositories
Friedrich Summann, Bielefeld University Library

2 Overview
BASE: concept and content
BASE user interface and further visions
BASE dataflow
OAI harvesting challenges
BASE interfaces
Demo

3 BASE: concept and content
BASE uses FAST Data Search
BASE runs on a Linux-based multi-node system
BASE contains intellectually selected resources, with a focus on OAI servers but also web-crawled content
BASE displays result lists as bibliographic data and full-text hits
The BASE frontend is written in PHP using the search API from FAST Data Search
BASE offers sorting, search refinement and search history

4 BASE: concept and content
[Architecture diagram. Labels: Connectors (Web Crawler, File Traverser); Document Processing Pipeline; Index Files; Search (Search API); Query & Result Processing Pipeline; Filter; Tuning, Administration and Debugging]

5 BASE: concept and content
At present 3.8 million documents in 274 collections, 15 of which are web-crawled

6 BASE: concept and content
Projekt Gutenberg-DE
Internet Library of Early Journals Oxford
Various Institutional Repositories
Springer Link Metadata
Cornell HistMath Fulltext Crawl
University of Michigan Historical Math
CiteSeer
Zentralblatt Mathematik
Bielefeld Univ: Math. Preprints
ArXiv
OPAC UL Bielefeld
Ifo Institute Munich
PubMed
Journals of Enlightenment (Digital Collection of Bielefeld UL)

7 Special view on IR server collections
Collections are listed in a configuration file:
    [ftubirmingham]
    url = "
    desc_de = "The Univ. of Birmingham: Eprints Archive"
    desc_en = "The Univ. of Birmingham: Eprints Archive"
    descdd_de = "Birmingham Univ."
    descdd_en = "Birmingham Univ."
Collections can be clustered for the user interface, e.g. "Institutional Repositories Europe" consists of [ftubarcelona], [ftubath], [ftubristol], [ftuhelsinki], …
Parametric search is possible
The frontend is ready for multiple views (independent views with their own configuration and layouts on the same backend)
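As an illustration of how such a collection file could be read and clustered, here is a minimal Python sketch. It is not the BASE frontend code (which the slides say is written in PHP). The URLs, the [ftubath] entry's descriptions, and the CLUSTERS table are hypothetical placeholders; only the section and key names come from the slide.

```python
import configparser

# Hypothetical sample in the slide's format; URLs and the Bath
# descriptions are placeholders, not values from the real file.
SAMPLE = '''
[ftubirmingham]
url = "http://example.org/birmingham/oai"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_en = "Birmingham Univ."

[ftubath]
url = "http://example.org/bath/oai"
desc_en = "Univ. of Bath (placeholder description)"
descdd_en = "Bath Univ."
'''

# Hypothetical cluster table: cluster label -> collection section names.
CLUSTERS = {
    "Institutional Repositories Europe": ["ftubath", "ftubirmingham"],
}


def load_collections(text):
    """Parse the INI-style text into {section: {key: value}}, dropping quotes."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        name: {key: value.strip('"') for key, value in parser[name].items()}
        for name in parser.sections()
    }


def cluster_labels(cluster, collections, lang="en"):
    """Return the short (drop-down) descriptions for one cluster."""
    key = f"descdd_{lang}"
    return [
        collections[name][key]
        for name in CLUSTERS[cluster]
        if name in collections
    ]


if __name__ == "__main__":
    cols = load_collections(SAMPLE)
    print(cluster_labels("Institutional Repositories Europe", cols))
```

Keeping the cluster membership separate from the per-collection sections is one possible design; the slides do not say where the real cluster definitions are stored.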

8 BASE: end-user interface (1)
Displays search results as bibliographic data and full text hits

9 BASE: end-user interface (2)
The result list (left-hand side)
If the document contains metadata (e.g. title, author, abstract), the displayed description is highlighted

10 BASE: end-user interface (3)
The result list (right-hand side)
Various options to sort the result set
Search refinement by author, keyword, document type, language etc.
Search history comprises up to 10 queries
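A search history capped at 10 queries can be modelled with a bounded queue. The following Python sketch is only an illustration; the class name and the choice to move a repeated query to the front are assumptions, not details from the slides.

```python
from collections import deque

MAX_HISTORY = 10  # limit stated on the slide


class SearchHistory:
    """Keeps the most recent queries, oldest dropped first (hypothetical helper)."""

    def __init__(self, limit=MAX_HISTORY):
        self._queries = deque(maxlen=limit)

    def add(self, query):
        query = query.strip()
        if not query:
            return
        # Assumption: a repeated query moves to the front instead of
        # appearing twice.
        if query in self._queries:
            self._queries.remove(query)
        self._queries.append(query)

    def recent(self):
        """Return queries, most recent first."""
        return list(reversed(self._queries))


if __name__ == "__main__":
    history = SearchHistory()
    for i in range(12):
        history.add(f"query {i}")
    print(history.recent())
```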

11 BASE: end-user interface (4)
Search refinement
Select an author ...
... only documents by this author are displayed
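The refinement step can be sketched as two small functions: one that counts authors across the current result set (to build the selection list) and one that narrows the results to a chosen author. In BASE this is presumably handled by the search engine itself; the Python below is an in-memory illustration, and the record layout (a dict with an "authors" list) is an assumption.

```python
from collections import Counter


def author_facets(results):
    """Count how many result documents each author appears in."""
    counts = Counter(
        author for doc in results for author in doc.get("authors", [])
    )
    return counts.most_common()


def refine_by_author(results, author):
    """Keep only documents that list the selected author."""
    return [doc for doc in results if author in doc.get("authors", [])]


if __name__ == "__main__":
    # Hypothetical result set; titles and names are placeholders.
    results = [
        {"title": "Doc 1", "authors": ["Author A", "Author B"]},
        {"title": "Doc 2", "authors": ["Author A"]},
        {"title": "Doc 3", "authors": ["Author C"]},
    ]
    print(author_facets(results))
    print(refine_by_author(results, "Author A"))
```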

12 Google Scholar integration
Check citations (citing articles) in Google Scholar ...

13 Vision: DDC Browsing

14 BASE dataflow
Sources: database records, web pages, OAI data
Harvesting -> Pre-processing -> Processing -> Internal index (FAST) -> User interface (PHP)

15 OAI-compliant university repositories in BASE
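[World map of repository counts by country and region. Extracted labels and numbers, in their original order: 4 3 18 39 USA 82 Canada 14 South America 2 Africa 3 India 5 Australia 11 New Zealand 1 3 17 2 6 55 1 12 7 1 3 12 16 2 1]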

16 OAI harvesting challenges
Repositories do not respond or deliver error messages
Links to the document are not included or do not work
XML file is not well-formed
Data contain only references without any full text
Access to full text is often restricted
Field content varies
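Two of these problems, malformed XML and OAI error messages, can be detected mechanically before a response enters the processing pipeline. The Python sketch below is not the BASE harvester; the function name and the returned status labels are assumptions. It relies only on the standard OAI-PMH 2.0 namespace and the `error` element with its `code` attribute.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def check_oai_response(xml_text):
    """Classify a harvested response (hypothetical helper).

    Returns a (status, detail) tuple:
      ("not-well-formed", parser message)
      ("oai-error", OAI error code)
      ("ok", number of records found)
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ("not-well-formed", str(exc))

    error = root.find(f"{OAI_NS}error")
    if error is not None:
        return ("oai-error", error.get("code", ""))

    records = root.findall(f".//{OAI_NS}record")
    return ("ok", len(records))


if __name__ == "__main__":
    good = (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
        "<ListRecords><record><header>"
        "<identifier>oai:example:1</identifier>"
        "</header></record></ListRecords></OAI-PMH>"
    )
    print(check_oai_response(good))
    print(check_oai_response("<OAI-PMH><ListRecords>"))
```

Missing or dead document links and restricted full text need a second pass that actually requests the linked resource, which this sketch does not attempt.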

17 Some Rules from the Harvesting Practice
Standard repository software is great, for OAI harvesting as well
Small collections, small problems
Getting the related full text is complicated
Libraries produce better metadata
Writing e-mails helps, sometimes
Data aggregation may produce problems

18 BASE interfaces
Search form
HTTP calls
Web Service

19 Local integration (via search form)
E-Repository Integration
<form action=" method="post" accept-charset="UTF-8">
  <input maxlength="512" name="q" type="text" size="50" />
  <input value="Search!" type="submit" />
  <input value="all" name="s" type="hidden" />
</form>
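The same request that the form submits can be built programmatically, which is roughly what the "HTTP calls" interface amounts to. The Python sketch below only constructs the request and does not send it. The form's action URL is missing from the transcript, so SEARCH_URL is a placeholder; the field names `q` and `s`, the value `all`, the 512-character limit and the UTF-8 charset come from the form.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder: the real action URL is not preserved in the transcript.
SEARCH_URL = "https://search.example.org/search"


def build_search_request(query, scope="all", url=SEARCH_URL):
    """Build the POST request the search form would submit."""
    query = query[:512]  # mirrors maxlength="512" on the text input
    body = urlencode({"q": query, "s": scope}, encoding="utf-8").encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"
        },
    )


if __name__ == "__main__":
    request = build_search_request("open access")
    print(request.get_method(), request.full_url)
    print(request.data)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out because the target URL here is not a real endpoint.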

20 Prototype: Search Based on SOAP interface
(EU project DRIVER)

21 Thank you!

