11/21/2000 Database Management -- Spring 1998 -- R. Larson: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library, University of California, Berkeley


Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library
University of California, Berkeley, School of Information Management and Systems
SIMS 257: Database Management

Today
– Object-Relational Database Applications
  – The Berkeley Digital Library Project (slides from RRL and Robert Wilensky, EECS)
  – Use of DBMS in the DL project

Final Presentations and Reports
– Specifications for the final report are on the Web site under Assignments
– Presentations: one on Nov. 28; others on Nov. 30, Dec. 5 and Dec. 7 (full)

Today
– Object-Relational Applications
– The UCB Digital Library

Overview
– What is a Digital Library?
– Overview of ongoing research on information access in digital libraries

Digital Libraries Are Like Traditional Libraries...
– Involve large repositories of information (storage, preservation, and access)
– Provide information organization and retrieval facilities (categorization, indexing)
– Provide access for communities of users (communities may be as large as the general public or as small as the employees of a particular organization)

Traditional Library System [diagram: Originators, Libraries, Users]

But Digital Libraries Are Different From Libraries...
– Not a physical location with local copies; objects are held closer to their originators
– Decoupling of storage, organization, and access
– Enhanced authoring (origination, annotation, support for work groups)
– Subscription and pay-per-view supported in addition to “free” browsing
– Integration into user tasks

A Digital Library Infrastructure Model [diagram: Originators, Repositories, Index Services, Network, Users]

UC Berkeley Digital Library Project
– Focus: work-centered digital information services
– Testbed: Digital Library for the California Environment
– Research: technical agenda supporting user-oriented access to large distributed collections of diverse data types
– Part of the NSF/NASA/DARPA Digital Library Initiative (Phases 1 and 2)

UCB Digital Library Project: Research Organizations
– UC Berkeley: EECS, SIMS, CED, IS&T
– UCOP
– Xerox PARC’s Document Image Decoding group and Work Practices group
– Hewlett-Packard
– NEC
– SUN Microsystems
– IBM Almaden
– Microsoft
– Ricoh California Research
– Philips Research

Testbed: An Environmental Digital Library
– Collection: diverse material relevant to California’s key habitats
– Users: a consortium of state agencies, development corporations, private corporations, regional government alliances, educational institutions, and libraries
– Potential: impact on the state-wide environmental system (CERES)

The Environmental Library - Users/Contributors
– California Resources Agency, California Environmental Resources Evaluation System (CERES)
– California Department of Water Resources
– California Department of Fish & Game
– SANDAG
– UC Water Resources Center Archives
– New partners: CDL and SDSC

The Environmental Library - Contents
– Environmental technical reports, bulletins, etc.
– County general plans
– Aerial and ground photography
– USGS topographic maps
– Land use and other special-purpose maps
– Sensor data
– “Derived” information
– Collection databases for the classification and distribution of the California biota (e.g., SMASCH)
– Supporting 3-D, economic, traffic, and other models
– Videos collected by the California Resources Agency

The Environmental Library - Contents
As of late 2000, the collection represents about one terabyte of data, including over 165,000 digital images, about 300,000 pages of environmental documents, and nearly 2 million records in geographical and botanical databases.

Botanical Data: The CalFlora Database contains taxonomic and distribution information for more than 8,000 native California plants. The Occurrence Database includes over 600,000 records of California plant sightings from many federal, state, and private sources. The botanical databases are linked to our CalPhotos collection of California plants, and also to external collections of data, maps, and photos.

Geographical Data: Much of the geographical data in our collection is being used to develop our web-based GIS Viewer. The Street Finder uses 500,000 TIGER records of S.F. Bay Area streets along with 70,000 records from the USGS GNIS database. California Dams is a database of information about the 1,395 dams under state jurisdiction. An additional 11 GB of geographical data consists of maps and imagery processed for inclusion as layers in our GIS Viewer, including Digital Ortho Quads and DRG maps for the S.F. Bay Area.

11/21/2000Database Management -- Spring R. Larson Documents:  Most of the 300,000 pages of digital documents are environmental reports and plans that were provided by California state agencies. This collection includes documents, maps, articles, and reports on the California environment including Environmental Impact Reports (EIRs), educational pamphlets, water usage bulletins, and county plans. Documents in this collection come from the California Department of Water Resources (DWR), California Department of Fish and Game (DFG), San Diego Association of Governments (SANDAG), and many other agencies. Among the most frequently accessed documents are County General Plans for every California county and a survey of 125 Sacramento Delta fish species.

Documents (cont.): The collection also includes about 20 MB of full-text (HTML) documents from the World Conservation Digital Library. In addition to providing online access to important environmental documents, the document collection is the testbed for our Multivalent Document research.

Testbed Success Stories
– LUPIN: CERES’ Land Use Planning Information Network
  – California County General Plans and other environmental documents
  – Entered via the Resources Agency server; documents are stored at and retrieved from the UCB DLIB server
– California flood relief efforts
  – High demand for some data sets available only on our server (created by document recognition)
– CalFlora: creation and interoperation of repositories pertaining to plant biology
– Cloning of services at the Cal State Library, FBI

Research Highlights
– Documents
  – Multivalent Document prototype: page images, structured documents, GIS data, photographs
– Intelligent Access to Content
  – Document recognition
  – Vision-based image retrieval: stuff, thing, and scene retrieval
  – Natural language processing: categorizing the web, Cheshire II, TileBar interfaces

Multivalent Documents
– MVD model
  – Radically distributed, open, extensible
  – “Behaviors” and “layers”: behaviors conform to a protocol suite; inter-operation via “IDEG”
– Applied to “enlivening legacy documents”
  – Various nice behaviors, e.g., lenses

Document Presentation
– Problem: digital libraries must deliver digital documents, but in what form?
– Different forms have advantages for particular purposes:
  – Retrieval
  – Reuse
  – Content analysis
  – Storage and archiving
– Combining forms (multivalent documents)

Spectrum of Digital Document Representations [figure]
Adapted from Fox, E.A., et al. “Users, User Interfaces and Objects: Envision, an Electronic Library”, JASIS 44(8), 1993

Document Representation: Multivalent Documents
– Primary user interface/document model for the UCB Digital Library (Wilensky & Phelps)
– Goal: an approach to new document representations and their authoring
– Supports active, distributed, composable transformations of multimedia documents
– Enables sophisticated annotations, intelligent result handling, a user-modifiable interface, and composite documents

Multivalent Documents [figure: a scanned page image (“History of The Classical World”) with stacked layers over it: OCR layer, OCR mapping layer, Table layer (“Table 1.”), GIS layer, and Cheshire layer, linked to network protocols and resources]
Valence: 2: The relative capacity to unite, react, or interact (as with antigens or a biological substrate). -- Webster’s 7th Collegiate Dictionary

MVD Third-Party Work
– Japanese support by NEC; application to office document management
– Printing and support for other OCR formats, by HP
– Chinese-character and multilingual lens, by UCB Instructional Support staff (Owen McGrath)
– Automatic enlivening of documents via the Transcend proxy

MVD Forthcoming
– Support for XML + style sheets
– More robust parsing
– Saving where you want
– Media adaptors for:
  – Continuous media
  – Near-image formats, word-processor formats
– Improved authoring tools
– Interoperation with paper
– Application versus applet?
– Release to community, get feedback, iterate

GIS in the MVD Framework
– Layers are georeferenced data sets
– Behaviors are:
  – display semi-transparently
  – pan
  – zoom
  – issue query
  – display context
  – “spatial hyperlinks”
  – annotations
– Written in Java (to be merged with the MVD-1 code line?)
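
The pan and zoom behaviors listed above boil down to keeping a mapping between a layer's georeferenced (world) coordinates and screen pixels. The Python sketch below is a hypothetical illustration, not the viewer's Java code: the `Viewport` class and its methods are invented for this example, and it assumes a linear, unprojected coordinate system (real layers would also need a map projection).

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    """Hypothetical viewport mapping world (x, y) to screen pixels.

    Assumes a linear, unprojected coordinate system.
    """
    center_x: float  # world x shown at the centre of the screen
    center_y: float  # world y shown at the centre of the screen
    scale: float     # pixels per world unit
    width: int       # screen width in pixels
    height: int      # screen height in pixels

    def to_screen(self, x: float, y: float) -> tuple[float, float]:
        # Screen y grows downward while world y (north) grows upward.
        sx = (x - self.center_x) * self.scale + self.width / 2
        sy = (self.center_y - y) * self.scale + self.height / 2
        return sx, sy

    def to_world(self, sx: float, sy: float) -> tuple[float, float]:
        # Inverse of to_screen, e.g. to turn a mouse click into a query point.
        x = (sx - self.width / 2) / self.scale + self.center_x
        y = self.center_y - (sy - self.height / 2) / self.scale
        return x, y

    def pan(self, dx_pixels: float, dy_pixels: float) -> None:
        # Dragging the map right/down shifts the centre left/up in world units.
        self.center_x -= dx_pixels / self.scale
        self.center_y += dy_pixels / self.scale

    def zoom(self, factor: float, sx: float, sy: float) -> None:
        # Zoom about a screen point so the spot under the cursor stays fixed.
        wx, wy = self.to_world(sx, sy)
        self.scale *= factor
        nx, ny = self.to_world(sx, sy)
        self.center_x += wx - nx
        self.center_y += wy - ny


v = Viewport(0.0, 0.0, 2.0, 100, 100)
v.pan(20, 0)
print(v.to_screen(0, 0))  # (70.0, 50.0)
```

Zooming about the cursor rather than the screen centre is one common design choice; it keeps an annotation or query point visually anchored while the scale changes.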

GIS Viewer: Recent Developments
– Annotation and saving
  – Points, rectangles (with labels and links), vectors
  – Saving of annotations as a separate layer
– Integration with address, street-finding, and gazetteer services
– Application to image viewing: tilePix
– Castanet client

GIS Viewer Example

Geographic Information: Plans and Ideas
– More annotations, flexible saving
– Support for large vector data sets
– Interoperability
  – On-the-fly conversion of formats; generation of “catalogs”
  – Via OGDI/GLTP
  – Experimenting with various CERES servers

Documents: Information from Scanned Documents
– Built document recognizers for some important documents, e.g., “Bulletin 17” and “TR-9”
– Recognized document structure, with OCR an order of magnitude better
– Automatically generated a 1,395-item relational database of dams
– Enabled access via forms and map interfaces
– Enable interoperation with the image DB
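
Once a recognizer has pulled structured fields out of a document such as Bulletin 17, loading them into a relational table is what makes form- and map-based access possible. The sketch below uses Python's built-in `sqlite3` module; the table name, columns, and the two sample rows are hypothetical placeholders, not the project's actual schema or data.

```python
import sqlite3

# Hypothetical records, as a document recognizer might emit them after
# parsing a dam listing; names and values are placeholders only.
recognized = [
    {"name": "Dam A", "county": "Alameda", "height_ft": 120, "lat": 37.6, "lon": -121.9},
    {"name": "Dam B", "county": "Shasta", "height_ft": 300, "lat": 40.7, "lon": -122.4},
]


def load_dams(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Create a (hypothetical) dams table and bulk-insert recognized rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dams (
               id        INTEGER PRIMARY KEY,
               name      TEXT NOT NULL,
               county    TEXT,
               height_ft REAL,
               lat       REAL,
               lon       REAL)"""
    )
    # Named placeholders bind dict keys to columns and avoid SQL injection.
    conn.executemany(
        "INSERT INTO dams (name, county, height_ft, lat, lon) "
        "VALUES (:name, :county, :height_ft, :lat, :lon)",
        rows,
    )
    conn.commit()


def dams_in_box(conn, min_lat, max_lat, min_lon, max_lon):
    """Bounding-box query, the kind a map interface would issue."""
    cur = conn.execute(
        "SELECT name FROM dams WHERE lat BETWEEN ? AND ? "
        "AND lon BETWEEN ? AND ? ORDER BY name",
        (min_lat, max_lat, min_lon, max_lon),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
load_dams(conn, recognized)
print(dams_in_box(conn, 37.0, 38.0, -123.0, -121.0))  # ['Dam A']
```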

Document Recognition: Future Plans
– Document recognizers for ~ a dozen document types
– Development and integration of mathematical OCR and recognition
– Eventually, a document recognizer generator, i.e., make it easier to write recognizers

Vision-Based Image Retrieval
– Stuff-based queries: “blobs”
  – Basic blobs: colors, sizes, variable number; demonstrated utility for interesting queries
  – “Blobworld”: the above plus texture, applied to retrieving similar images; successful learning scene classifier
– Thing-finding:
  – Successfully deployed detectors
  – Adding body plans (adding shape, geometry, and kinematic constraints)
  – Find objects by grouping coherent low-level properties

Image Retrieval Research
– Finding “Stuff” vs “Things”
– Blobworld
– Other Vision Research

(Old “stuff”-based image retrieval: Query)

(Old “stuff”-based image retrieval: Result)

Blobworld: Use Regions for Retrieval
– We want to find general objects
– Represent images based on coherent regions
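
The core idea here is to describe each image as a small set of coherent regions (“blobs”) and match a query against those regions rather than against whole-image statistics. The sketch below is a deliberate simplification under stated assumptions: each blob is reduced to a hypothetical fixed-length feature vector (a few made-up colour and texture values), similarity is plain Euclidean distance, and an image scores by its single best-matching blob. The original Blobworld system used richer colour, texture, and shape descriptors.

```python
import math

Blob = tuple[float, ...]  # hypothetical per-region feature vector


def blob_distance(a: Blob, b: Blob) -> float:
    """Euclidean distance between two region feature vectors."""
    return math.dist(a, b)


def image_score(query_blob: Blob, image_blobs: list[Blob]) -> float:
    """Score an image by its closest region to the query blob (lower is better)."""
    return min(blob_distance(query_blob, b) for b in image_blobs)


def rank_images(query_blob: Blob, collection: dict[str, list[Blob]]) -> list[str]:
    """Return image ids ordered from best to worst region match."""
    return sorted(collection, key=lambda img: image_score(query_blob, collection[img]))


# Toy features: (mean red, mean green, mean blue, texture contrast), all invented.
images = {
    "tiger": [(0.9, 0.5, 0.1, 0.7), (0.2, 0.6, 0.2, 0.3)],
    "beach": [(0.9, 0.8, 0.6, 0.1), (0.2, 0.4, 0.9, 0.1)],
}
orange_striped = (0.85, 0.5, 0.15, 0.65)
print(rank_images(orange_striped, images))  # ['tiger', 'beach']
```

Matching on the best region lets a query for one object succeed even when the rest of the image (background, sky) looks nothing like it, which is the motivation the slide gives for region-based retrieval.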

(“Thing”-based image retrieval using “body plans”: Result)

Natural Language Processing: Automatic Topic Assignment
– Developed an automatic categorization/disambiguation method to the point where topic assignment (but not disambiguation) appears feasible
– Ran a controlled experiment:
  – Took Yahoo as ground truth
  – Chose 9 overlapping categories; took 1000 web pages from Yahoo as input
  – Result: 84% precision, 48% recall (using the top 5 of 1073 categories)
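
The slide does not define exactly how precision and recall were computed for this run. One standard reading, sketched below with made-up toy data, compares each page's top-k predicted categories with its ground-truth (here, Yahoo) categories and pools the counts over all pages (micro-averaging). It illustrates the metrics only; it is not a reconstruction of the reported 84%/48% figures.

```python
def precision_recall_at_k(predictions, truth, k=5):
    """Micro-averaged precision and recall of the top-k predicted categories.

    predictions: page id -> ranked list of predicted categories
    truth:       page id -> set of ground-truth categories
    """
    hits = retrieved = relevant = 0
    for page, ranked in predictions.items():
        top_k = ranked[:k]
        gold = truth.get(page, set())
        hits += len(set(top_k) & gold)   # correct categories among the top k
        retrieved += len(top_k)          # everything the system proposed
        relevant += len(gold)            # everything it should have found
    precision = hits / retrieved if retrieved else 0.0
    recall = hits / relevant if relevant else 0.0
    return precision, recall


# Toy data (invented): two pages, five guesses each.
predictions = {
    "p1": ["music", "film", "art", "news", "sports"],
    "p2": ["environment", "science", "news", "art", "film"],
}
truth = {"p1": {"music", "film"}, "p2": {"environment", "travel"}}
print(precision_recall_at_k(predictions, truth))  # (0.3, 0.75)
```

Raising k trades precision for recall: more guesses per page can only add hits, but every extra wrong guess counts against precision.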

IAGO (Isaac’s Automatically Generated Ontology)
– IAGO (0.1)! = Yahoo - labor + NLP
– We categorized (part of) the Web:
  – 1073 categories; 8000 web pages
  – ~80% precision for good categories, e.g., “motion pictures”, “the environment”, “music”
– IAGO 1.0 in the works:
  – Eliminate pages with little text
  – Eliminate proper nouns
  – Retrained with MS Encarta: improved performance dramatically (perhaps enough to disambiguate the web)!
  – Need to compute word-sense priors using the web
  – [Recode implementation to keep up with web crawler.]

Cheshire II: Cross-Domain Resource Discovery: Integrated Discovery and Use of Textual, Numeric and Spatial Data
Ray R. Larson, PI; Kirby Zhang; Yonghui Zhang -- School of Information Management & Systems, University of California, Berkeley
Paul Watry, Co-PI; Robert Sanderson -- University of Liverpool Archives and Special Collections

Overview
– Goals are:
  – Practical application of existing DL technologies to some large-scale cross-domain collections
  – Theoretical examination and evaluation of next-generation designs for systems architecture and distributed cross-domain searching for DLs

Current Usage of Cheshire II
– Web clients for:
  – Berkeley NSF/NASA/ARPA Digital Library
  – World Conservation Digital Library
  – SunSite (UC Berkeley Science Libraries)
  – University of Liverpool
  – DeMontfort University (MASTER)
  – Higher Education Archives Hub: Glasgow, Edinburgh, Bath, Liverpool, King’s College London, University College London, Nottingham, Durham, School of Oriental and African Studies, Manchester, Southampton, Warwick, and others (to be expanded)
  – University of Essex, HDS (part of AHDS)
  – Oxford Text Archive (test only)
  – California Sheet Music Project
  – Cha-Cha (Berkeley intranet search engine)
  – Berkeley Metadata project cross-language demo
  – Univ. of Virginia (test implementations)
– Use in NESSTAR (NEtworked Social Science Tools and Resources)
– Cheshire ranking algorithm is the basis for Inktomi

The Participants
NSF/JISC International Digital Library Grant: Berkeley working with
– University of Liverpool/Manchester Computing
– DeMontfort University (MASTER)
– Arts and Humanities Data Service: OTA (Oxford), HDS (Essex), PADS (Glasgow), ADS (York), VADS (Surrey & Northumbria)
– Consortium of University Research Libraries (CURL)
– UC Berkeley Library: Making of America II; Online Archive of California

Approach
For the first goal, we are implementing a distributed search system based on international standards (Z39.50 and SGML/XML), built on existing Cheshire II technology, which will be used for cross-domain searching. Databases include:
– HE Archives Hub
– Arts and Humanities Data Service (AHDS)
– MASTER
– CURL (Consortium of University Research Libraries)
– Online Archive of California (OAC)
– Making of America II (MOA2)

Approach (cont.)
The second goal will be addressed through the design, development, and evaluation of the distributed information retrieval system architecture and of client-side systems that help the user exploit distributed resources, and through the design and evaluation of protocols for efficient and effective retrieval in an internationally distributed multi-database environment. (Cheshire III?)

Research Issues
– Appropriate system architecture for information retrieval in a distributed network environment (distributed object architecture)
– Management of vocabulary control in a cross-domain context
– Distributed access to existing metadata resources
– Navigating collections
– Support for cross-domain resource clumps to facilitate resource discovery

Architecture Overview

Architecture Overview
– Focus on high-performance N.O.W. (network of workstations) style operations: a scalable, extensible platform for IR
– Current design uses JavaSpaces, a high-level coordination mechanism for distributed systems using a lightweight publish/subscribe distributed programming model
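
JavaSpaces follows the Linda tuple-space model: processes coordinate by writing entries into a shared space and by reading or taking entries that match a template, rather than by calling each other directly. The Python class below is a single-process, thread-safe toy that mimics the shape of the `write`, `read`, and `take` operations; it is not the JavaSpaces API, and the entry fields (`op`, `collection`, `query`, `hits`) are hypothetical.

```python
import threading


class TupleSpace:
    """Toy Linda-style space; entries are dicts, None in a template is a wildcard."""

    def __init__(self):
        self._entries = []
        self._cond = threading.Condition()

    def _find(self, template):
        # Index of the first entry whose fields equal every non-None template field.
        for i, entry in enumerate(self._entries):
            if all(v is None or entry.get(k) == v for k, v in template.items()):
                return i
        return None

    def write(self, entry):
        with self._cond:
            self._entries.append(dict(entry))
            self._cond.notify_all()  # wake readers/takers blocked on a template

    def read(self, template, timeout=None):
        """Copy of a matching entry, left in the space; None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(template) is not None, timeout):
                return None
            return dict(self._entries[self._find(template)])

    def take(self, template, timeout=None):
        """Remove and return a matching entry; None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(template) is not None, timeout):
                return None
            return self._entries.pop(self._find(template))


space = TupleSpace()


def worker():
    # Block until any search task appears, then publish a (dummy) result entry.
    task = space.take({"op": "search", "collection": None, "query": None})
    space.write({"op": "result", "collection": task["collection"], "hits": 0})


t = threading.Thread(target=worker)
t.start()
space.write({"op": "search", "collection": "archives_hub", "query": "cat"})
print(space.take({"op": "result"}, timeout=1.0))
# {'op': 'result', 'collection': 'archives_hub', 'hits': 0}
t.join()
```

Because `take` removes the entry atomically, adding more workers gives load balancing for free: each task is claimed by exactly one of them, which is the property that makes this style attractive for a network of workstations.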

Current Design
A single operational model for Cheshire that encompasses single-node installations, uniformly administered clusters, and independently administered federations:
– Every operation is a distributed operation
– An operation is applied over a set of collections

Collections
– Single node or cluster: collections can be partitions of other collections
– Federation: collections can be partitions or subsets of other collections; in other words, collections in a loosely coupled federation may have overlapping records
– Virtual collections

Virtual Collections
The external interface to collections:
– A VC may present only part of the underlying real collection in its interface
– A VC may grow or shrink dynamically within the bounds of the real collection; a search needs to cover only the documents in the VC, not all documents in the collection
– A collection can be logically partitioned across a number of machines to increase performance, with built-in redundancy in case of node failures
– When a node fails, its VC is simply redistributed (logically) to the other nodes in the cluster
– Cheshire servers can be organized into server groups; a server group can be thought of as an administrative unit
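
One way to picture a virtual collection with logical partitioning and failover: the VC exposes a subset of a real collection's record ids, each node owns a slice of that subset, a search runs over the live nodes' slices, and a failed node's slice is reassigned to the survivors so every visible record stays searchable. The round-robin placement and the class and method names below are hypothetical; the slides do not specify Cheshire's actual placement or recovery scheme.

```python
class VirtualCollection:
    """Hypothetical VC: a subset of a real collection, logically split over nodes."""

    def __init__(self, real_collection, visible_ids, nodes):
        visible = set(visible_ids)
        # A VC can only expose records that exist in the underlying real collection.
        if not visible <= set(real_collection):
            raise ValueError("VC must be a subset of the real collection")
        self.nodes = list(nodes)
        self.assignment = {n: [] for n in self.nodes}
        # Round-robin placement of the visible records over the nodes.
        for i, rid in enumerate(sorted(visible)):
            self.assignment[self.nodes[i % len(self.nodes)]].append(rid)

    def fail(self, node):
        """Hand only the failed node's records to the surviving nodes."""
        orphaned = self.assignment.pop(node)
        self.nodes.remove(node)
        if not self.nodes:
            raise RuntimeError("no live nodes left to take over the records")
        for i, rid in enumerate(orphaned):
            self.assignment[self.nodes[i % len(self.nodes)]].append(rid)

    def search(self, predicate):
        """Evaluate the predicate on each live node's slice, then merge the hits."""
        hits = []
        for node in self.nodes:
            hits.extend(r for r in self.assignment[node] if predicate(r))
        return sorted(hits)


vc = VirtualCollection(range(10), [0, 2, 4, 6, 8], ["n1", "n2", "n3"])
vc.fail("n3")
print(vc.search(lambda r: r >= 4))  # [4, 6, 8]
```

Reassigning only the orphaned slice, instead of re-partitioning everything, keeps the other nodes' work undisturbed during recovery.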

Distributed Access to Existing Metadata Resources
– Use of current (Z39.50) and new (SDLIP) protocols for access to other metadata systems
  – Support for common semantics (e.g., Dublin Core mappings for disparate systems)
  – Cross-system use of EVMs
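
Common semantics through Dublin Core mappings usually come down to a crosswalk: a table mapping each system's local field names onto shared Dublin Core elements, so a record can be normalized and a single query translated for every target. The two source systems below are illustrative and the mappings deliberately simplified; the EAD element names and MARC tags are real, but a production crosswalk would be far richer.

```python
# Simplified, illustrative crosswalks: local field name -> Dublin Core element.
CROSSWALKS = {
    "archive_ead": {"unittitle": "title", "persname": "creator", "unitdate": "date"},
    "library_marc": {"245": "title", "100": "creator", "650": "subject"},
}


def to_dublin_core(system: str, record: dict) -> dict:
    """Re-key a local record by Dublin Core element, dropping unmapped fields."""
    mapping = CROSSWALKS[system]
    dc = {}
    for field, value in record.items():
        element = mapping.get(field)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


def translate_query(element: str, term: str) -> dict:
    """Turn one DC-level search into per-system (local field, term) searches."""
    out = {}
    for system, mapping in CROSSWALKS.items():
        local = [f for f, e in mapping.items() if e == element]
        if local:
            out[system] = (local[0], term)
    return out


print(translate_query("title", "dams"))
# {'archive_ead': ('unittitle', 'dams'), 'library_marc': ('245', 'dams')}
```

Systems with no field for an element (here, EAD has no `subject` mapping) are simply skipped, which is one reasonable policy; another would be to fall back to a keyword search.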

Navigating Collections
– Support for “drilling down” from broad collection-level descriptions, to sub-collection descriptions, to individual digital objects
  – Primary test bases will be EAD collection descriptions linked to digital objects, as in MOA2

Cross-Domain Resource Discovery
– Initially: use of the Z39.50 cross-domain element set for search (Dublin Core based)
– Support for new protocols and semantics (such as SDLIP)
– Research into a metaprotocol for communicating information about databases, search elements, and collections between systems
  – Initially based on Z39.50 Explain

Meta-Search for Cross-Domain Resource Discovery
– Hundreds or thousands of servers with databases ranging widely in content, topic, and format
  – Broadcast search is expensive in bandwidth, and in processing too many irrelevant results
  – How to select the “best” ones to search? What to search first; which to search next
  – Topical/domain constraints on the search selections (EVMs for databases?)

Cross-Domain Resource Discovery: Meta-Search
– New approach to building metasearch based on Z39.50
– Instead of using broadcast search, we will explore:
  – Extraction of GlOSS-like indexes using Z39.50 SCAN
  – GIPSY2 extraction of place coverages from index data
– We will also investigate:
  – How to choose databases using the index
  – How to merge search results from multiple sources
  – Hierarchies of servers (general/meta-topical/individual)
– Other methods:
  – Treating database contents as distributed objects
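
GlOSS (Gravano, García-Molina, and Tomasic) selects databases from per-database term statistics alone, which is the kind of data SCAN can supply. Its simplest Boolean estimator assumes query terms occur independently and predicts how many documents in a database match all query terms:

estimate(q, db) = |db| × Π over terms t in q of (df(t, db) / |db|)

A minimal sketch, assuming the SCAN counts are per-database document frequencies; the database names and numbers are invented, and the project's own selection method may differ.

```python
from math import prod


def gloss_estimate(query_terms, db_size, doc_freq):
    """bGlOSS-style estimate of documents matching ALL query terms.

    Assumes terms occur independently: |db| * prod(df(t) / |db|).
    """
    if db_size == 0:
        return 0.0
    return db_size * prod(doc_freq.get(t, 0) / db_size for t in query_terms)


def rank_databases(query_terms, stats):
    """stats: db name -> (size, {term: document frequency})."""
    scores = {name: gloss_estimate(query_terms, size, df)
              for name, (size, df) in stats.items()}
    # Search the databases expected to hold the most matches first; skip zeros.
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [name for name, score in ranked if score > 0]


stats = {
    "db_a": (1000, {"dam": 100, "salmon": 50}),  # estimate 1000 * 0.1 * 0.05 = 5
    "db_b": (200, {"dam": 80, "salmon": 40}),    # estimate 200 * 0.4 * 0.2 = 16
    "db_c": (5000, {"dam": 10}),                 # no "salmon" postings -> 0
}
print(rank_databases(["dam", "salmon"], stats))  # ['db_b', 'db_a']
```

The small database wins here because both terms are dense in it, even though the larger one holds more “dam” documents in absolute terms; that is the behavior a selection index should exhibit for conjunctive queries.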

Distributed Metadata Servers [diagram: Replicated servers, Meta-Topical Servers, General Servers, Database Servers]

Meta-Search Server Index Creation
For all servers, or a topical subset:
– Get Explain information (especially DC mappings)
– For each index (or each DC index):
  – Use SCAN to extract terms and frequencies
  – Add term + freq + source index + database to the meta-search index
– Post-process indexes (especially geographic names, etc.) for special types of data, e.g., create “geographical coverage” indexes
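
The index-creation steps above can be sketched as a loop over servers and indexes that folds every (term, frequency) pair into one table keyed by term, remembering which database and index it came from, followed by a post-processing pass that derives a geographic coverage index. The `Server` structure, its fields, the gazetteer, and the sample data are hypothetical stand-ins for what Explain and SCAN would return; no Z39.50 client library is used.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for one Z39.50 target after Explain + SCAN."""
    database: str
    dc_indexes: dict  # local index name -> Dublin Core element (from Explain)
    scans: dict = field(default_factory=dict)  # index name -> [(term, freq), ...]


def build_meta_index(servers):
    """term -> list of (freq, source index, database, DC element) postings."""
    meta = defaultdict(list)
    for server in servers:
        for index_name, dc_element in server.dc_indexes.items():
            for term, freq in server.scans.get(index_name, []):
                meta[term].append((freq, index_name, server.database, dc_element))
    return meta


def geographic_coverage(meta, gazetteer):
    """Post-process: database -> place names it mentions, via a (hypothetical) gazetteer."""
    coverage = defaultdict(set)
    for term, postings in meta.items():
        if term in gazetteer:
            for _freq, _index, database, _dc in postings:
                coverage[database].add(term)
    return coverage


servers = [
    Server("hub", {"title": "title", "subj": "subject"},
           {"title": [("cat", 27), ("california", 3)], "subj": [("cat", 5)]}),
    Server("oac", {"ti": "title"}, {"ti": [("california", 40)]}),
]
meta = build_meta_index(servers)
coverage = geographic_coverage(meta, {"california", "liverpool"})
print(meta["cat"])  # [(27, 'title', 'hub', 'title'), (5, 'subj', 'hub', 'subject')]
```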

Z39.50 SCAN Results
  % zscan title cat
  {SCAN {Status 0} {Terms 20} {StepSize 1} {Position 1}} {cat 27} {cat-fight 1} {catalan 19} {catalogu 37} {catalonia 8} {catalyt 2} {catania 1} {cataract 1} {catch 173} {catch-all 3} {catch-up 2} …
  % zscan topic cat
  {SCAN {Status 0} {Terms 20} {StepSize 1} {Position 1}} {cat 706} {cat-and-mouse 19} {cat-burglar 1} {cat-carrying 1} {cat-egory 1} {cat-fight 1} {cat-gut 1} {cat-litter 1} {cat-lovers 2} {cat-pee 1} {cat-run 1} {cat-scanners 1} …
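
The `zscan` output above is a Tcl-style list: a `{SCAN ...}` status header followed by one `{term frequency}` pair per entry. A small parser can turn one output line into a status dictionary plus (term, frequency) pairs, ready to feed a meta-search index. It assumes terms contain no spaces or braces, which holds for the sample shown but may not in general; `parse_scan` is a hypothetical helper, not part of the Cheshire distribution.

```python
import re

# One "{term count}" pair; the term may contain hyphens but no spaces or braces.
PAIR = re.compile(r"\{([^\s{}]+) (\d+)\}")
STATUS = re.compile(r"\{(\w+) (\d+)\}")


def parse_scan(line: str):
    """Split one zscan output line into (status dict, [(term, freq), ...])."""
    end = line.index("}}") + 2          # the status header ends with "}}"
    header, rest = line[:end], line[end:]
    status = {key: int(val) for key, val in STATUS.findall(header)}
    terms = [(term, int(freq)) for term, freq in PAIR.findall(rest)]
    return status, terms


sample = ("{SCAN {Status 0} {Terms 20} {StepSize 1} {Position 1}} "
          "{cat 27} {cat-fight 1} {catalan 19}")
status, terms = parse_scan(sample)
print(status)  # {'Status': 0, 'Terms': 20, 'StepSize': 1, 'Position': 1}
print(terms)   # [('cat', 27), ('cat-fight', 1), ('catalan', 19)]
```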

Conclusions
A lot of interesting work remains to be done:
– Redesign and development of the Cheshire II system
– Evaluating new meta-indexing methods
– Developing and evaluating methods for merging cross-domain results (or, perhaps, deciding when to keep them separate)
– Developing, testing, and evaluating GIPSY2
– User interface development and testing for distributed resource and object access

Further Information
– Berkeley DL web site
– Full Cheshire II client and server source is available at ftp://cheshire.berkeley.edu/pub/cheshire/ (includes HTML documentation)
– Project web site