Digital Libraries Prof. Marcos Andre Goncalves Universidade Federal de Minas Gerais.

Slides:



Advertisements
Similar presentations
GMD German National Research Center for Information Technology Darmstadt University of Technology Perspectives and Priorities for Digital Libraries Research.
Advertisements

Introduction to metadata for IDAH fellows Jenn Riley Metadata Librarian Digital Library Program.
Galia Angelova Institute for Parallel Processing, Bulgarian Academy of Sciences Visualisation and Semantic Structuring of Content (some.
ETANA-DL: Leveraging Digital Library Technologies to Support Archaeology Vanderbilt University Nashville, TN -- Sept. 8, 2006 Weiguo Fan, Edward A. Fox,
Web- and Multimedia-based Information Systems. Assessment Presentation Programming Assignment.
Information Retrieval in Practice
3. Technical and administrative metadata standards Metadata Standards and Applications.
Digital Libraries. Synchronous Scholarly Communication Same time, Same or different place.
Metadata: An Introduction By Wendy Duff October 13, 2001 ECURE.
Supervised by Prof. LYU, Rung Tsong Michael Department of Computer Science & Engineering The Chinese University of Hong Kong Prepared by: Chan Pik Wah,
© Anselm SpoerriInfo + Web Tech Course Information Technologies Info + Web Tech Course Anselm Spoerri PhD (MIT) Rutgers University
1 CS 502: Computing Methods for Digital Libraries Lecture 17 Descriptive Metadata: Dublin Core.
Introduction and Conceptual Modeling
Overview of Search Engines
Lecture-8/ T. Nouf Almujally
Database Design IST 7-10 Presented by Miss Egan and Miss Richards.
1 CS5604 October 13, 2010 “5S Overview for Modules” by Edward A. Fox and Lillian (Boots) Cassel (on Ensemble) Dept. of.
Metadata: Its Functions in Knowledge Representation for Digital Collections 1 Summary.
Software Development Unit 2 Databases What is a database? A collection of data organised in a manner that allows access, retrieval and use of that data.
Digital Library Architecture and Technology
Copyright © Allyn & Bacon 2008 POWER PRACTICE Chapter 6 Academic Software START This multimedia product and its contents are protected under copyright.
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
Web Archives, IDEAL, and PBL Overview Edward A. Fox Digital Library Research Laboratory Dept. of Computer Science Virginia Tech Blacksburg, VA, USA 21.
Teaching Metadata and Networked Information Organization & Retrieval The UNT SLIS Experience William E. Moen School of Library and Information Sciences.
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
Requirements Gathering and Modeling of Domain Specific Digital Libraries with the 5S Framework: An Archaeological Case Study with ETANA ECDL 2005, Vienna,
Managing the Record of Research At the Smithsonian Using SIdora SAA Research Forum August 12, 2014.
Objectives Overview Define the term, database, and explain how a database interacts with data and information Define the term, data integrity, and describe.
Multimedia Databases (MMDB)
The Metadata Object Description Schema (MODS) NISO Metadata Workshop May 20, 2004 Rebecca Guenther Network Development and MARC Standards Office Library.
Metadata, the CARARE Aggregation service and 3D ICONS Kate Fernie, MDR Partners, UK.
LIS 506 (Fall 2006) LIS 506 Information Technology Week 11: Digital Libraries & Institutional Repositories.
Administrative Software Chapter 7 Teaching and Learning with Technology.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
ETANA-DL NSF Digital Library Project Edward A. Fox, Virginia Tech ASOR Annual Meeting, 2004
ETANA-DL Managing complex information applications: An archaeology digital library This research is funded in part by NSF-ITR grant #IIS Edward.
19/10/20151 Semantic WEB Scientific Data Integration Vladimir Serebryakov Computing Centre of the Russian Academy of Science Proposal: SkTech.RC/IT/Madnick.
Information Systems & Databases 2.2) Organisation methods.
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
Shruthi(s) II M.Sc(CS) msccomputerscience.com. Introduction Digital Libraries have become the source of information sharing across the globe for education,
5 - 1 Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
Introduction to metadata
1 Slides for Steve Griffin, NSF “ETANA and Digital Library Integration” by Edward A. Fox Oct. 3, Dept. of Computer.
ITGS Databases.
Digital Libraries Lillian N. Cassel Spring A digital library An informal definition of a digital library is a managed collection of information,
Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.
Digital Library The networked collections of digital text, documents, images, sounds, scientific data, and software that are the core of today’s Internet.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
Information Retrieval
JISC/NSF PI Meeting, June Archon - A Digital Library that Federates Physics Collections with Varying Degrees of Metadata Richness Department of Computer.
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
DSpace - Digital Library Software
A Resource Discovery Service for the Library of Texas Requirements, Architecture, and Interoperability Testing William E. Moen, Ph.D. Principal Investigator.
Digital Video Library Network Supervisor: Prof. Michael Lyu Student: Ma Chak Kei, Jacky.
ECDL 2006, Alicante, September 18, 2006 Naga Srinivas Vemuri, Ricardo da S. Torres, Rao Shen, Marcos Andre Goncalves, Weiguo Fan, and Edward A. Fox A Content-Based.
DANIELA KOLAROVA INSTITUTE OF INFORMATION TECHNOLOGIES, BAS Multimedia Semantics and the Semantic Web.
Achieving Semantic Interoperability at the World Bank Designing the Information Architecture and Programmatically Processing Information Denise Bedford.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
A RCHIVAL COLLECTIONS IN A D IGITAL W ORLD Cheryl Walters Nov. 6, 2008.
A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.
Information Retrieval in Practice
Search Engine Architecture
9/22/2018.
NSDL Data Repository (NDR)
Metadata in Digital Preservation: Setting the Scene
Database Design Hacettepe University
Presentation transcript:

Digital Libraries Prof. Marcos Andre Goncalves Universidade Federal de Minas Gerais

Synchronous Scholarly Communication Same time, Same or different place

Asynchronous, Digital Library Mediated Scholarly Communication Different time and/or place

Digital Libraries Shorten the Chain from Editor Publisher A&I Library Reviewer

DLs Shorten the Chain to Author Reader Digital Library Editor Reviewer Teacher Learner Librarian

DL Overview Why of Global Interest? National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly Knowledge and information are essential to economic and technological growth, education DL - a domain for international collaboration –wherein all can contribute and benefit –which leverages investment in networking –which provides useful content on Internet & WWW –which will tie nations and peoples together more strongly and through deeper understanding

Digital Libraries --- Objectives World Lit.: 24hr / 7day / from desktop Integrated “super” information systems: 5S: Table of related areas and their coverage Ubiquitous, Higher Quality, Lower Cost Education, Knowledge Sharing, Discovery Disintermediation -> Collaboration Universities Reclaim Property Interactive Courseware, Student Works Scalable, Sustainable, Usable, Useful

How is a DL different from a database? A traditional SQL database has as its basic element data items in a relation: – select name – from employee, project – where employee.deptnumber = “25” AND – project.number = “100” databases exploit known structures and relations DBMS retrieval is not probabilistic (Frakes, Baeza-Yates, p. 3)

How is a DL different from the WWW? The keyword is managed –The WWW is not managed Some meta searchers (Yahoo, Lycos) attempt to add an organizational framework to their web holdings –However, most are focused on keyword searching (i.e., Google)

How is a DL different from the WWW? Another key difference is who controls the input into the system –most meta searchers hunt down their holdings Lycos is short for Lycosidae lycosa (the “wolf spider”), which pursues its prey and does not build a web (Mauldin, IEEE Expert, 1/97) –some (Yahoo) have humans in the loop for review and classification To date, DLs are generally more tightly controlled, and have a targeted customer set

DL = Content + Services “Why not just use the WWW” ? – WWW by itself has low archival & management characteristics “Why not use a RDBMS?” – In the same way that a card catalog is not a TL, a RDBMS is candidate technology for use in DLs DL is the union of the content and services defined on the content

How is a DL Different from a Traditional Library? TL has as its focus physical objects – even if the card catalog (metadata) is electronic, the purpose is to point you to a physical location – trafficking in physical objects has both obvious and subtle implications object can exist only in 1 place if you have it, I can’t have it (zero-sum distribution) I have to go to the object, or wait for it to come to me

TLs vs. DLs DLs clearly better than TLs at: –Dissemination, storing information variety However, TL objects are more survivable –Who will archive the research information? the publishers? the institutions? the authors? –Will the average DL object still be accessible in 10 years? take my digital preservation seminar in the spring! image from:

Digital Library – removing the physical restriction has obvious benefits multiple access, multiple listings, electronic transmission – also complicates many other issues... intellectual property, terms and conditions, etc. Note that a TL offers additional social and educational benefits – Most TLs also offer hybrid services too. How is a DL Different from a Traditional Library?

DL Definitions - 1 “A digital library is an organized and focused collection of digital objects, including text, images, video, and audio, along with methods of access and retrieval, and for selection, creation, organization, maintenance, and sharing of the collection.” Witten & Bainbridge – “How to Build a Digital Library” – Morgan Kaufmann 2003

DL Definitions - 2 “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities” Waters,D.J. CLIR Issues, July/August

Informal 5S & DL Definitions DLs are complex systems that help satisfy info needs of users (societies) provide info services (scenarios) organize info in usable ways (structures) present info in usable ways (spaces) communicate info with users (streams)

5Ss SsExamplesObjectives Streams Text; video; audio; image Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data Structures Collection; catalog; hypertext; document; metadata Specifies organizational aspects of the DL content Spaces Measure; measurable, topological, vector, probabilistic Defines logical and presentational views of several DL components Scenarios Searching, browsing, recommending Details the behavior of DL services Societies Service managers, learners, teachers, etc. Defines managers, responsible for running DL services; actors, that use those services; and relationships among them

5S and DL formal definitions and compositions (April 2004 TOIS)

ETANA-DL Archaeological DL Integrated DL –Heterogeneous data handling Applies and extends the OAI-PMH –Open Archives Initiative Protocol for Metadata Handling Design considerations –Componentized –Extensible –Portable

Map courtesy: Initial ETANA-DL Member Locations Virginia Tech Mississippi State University Vanderbilt University Canadian University College Walla Walla College Andrews University CWRU Willamette University

Lahav Website

Megiddo Opening Screen

Locus Screen: Pictures View all

Area Screen

ETANA-DL Approach Applying and extending Digital Library (DL) techniques to solve key problems: making primary data available, data preservation, and interoperability Modeling archaeological information systems using 5S to better understand the domain and design the system and the supporting services Rapidly prototyping DLs that handle heterogeneous archaeological data using componentized frameworks: –eliciting requirements –refining metamodel and union schema –modeling sites –mapping –harvesting –providing useful services

ETANA-DL Website

Marking – writing notes for a specific user Marking Items

Marked Items Display Sender, Date, Object OAI ID Sender Comments Options: View Record, Add record to Items Of Interest, Re-mark item (Redirect), Unmark item (Remove item from list)

Discussions Page Discussions about an object View/Post messages, create new threads

Recommendations Items recommended on the basis of similar interests

ETANA-DL Searching Service Search

ETANA-DL Multi-dimensional Browsing 3 new sites 2 new types of artifacts

ETANA-DL Visual Browsing Service Visual Browse By site

Visual Browsing Nimrin: Topographical Drawings Full siteNorth west quadrant Square: N40/W20

Visual Browsing Nimrin : Square information Square: N40/W20 Locus: 86 Loci layout

Visual Browsing Nimrin : locus sheet

Visual Browsing Bab edh-Dhra' Cemetery Pottery # 25

Visual Browsing Bab edh-Dhra' Cemetery Pottery # 25

ETANA Societies 1.Historic and pre-historic societies (being studied) 2.Archaeologists (in academic institutes, fieldwork settings, or local and national governmental bodies) 3.Project directors 4.Technical staff (consisting of photographers, technical illustrators, and their assistants) 5.Field staff (responsible for the actual work of excavation) 6.Camp staff (e.g., camp managers, registrars, tool stewards) 7.General public (e.g., educators, learners, citizens)

ETANA Societies Social issues 1.Who owns the finds? 2.Where should they be preserved? 3.What nationality and ethnicity do they represent? 4.Who has publication rights? 5.What interactions took place between those at the site studied, and others? What theories are proposed by whom about this?

ETANA Scenarios 1.Life in the site in former times 2.Digital recording: the planning stage and the excavation stage 3.Planning stage: remote sensing, fieldwalking, field surveys, building surveys, consulting historical and other documentary sources, and managing the sites and monuments 4.Excavation 1.Detailed information is recorded, including for each layer of soil, and for features such as pole holes, pits, and ditches. 2.Data about each artifact is recorded together with information about its exact find spot. 3.Numerous environmental and other samples are taken for laboratory analysis, and the location and purpose of each is carefully recorded. 4.Large numbers of photographs are taken, both general views of the progress of excavation and detailed shots showing the contexts of finds. 5.Organization and storage of material 6.Analysis and hypotheses generation and testing 7.Publications, museum displays 8.Information services for the general public

ETANA Spaces 1.Geographic distribution of found artifacts 2.Temporal dimension (as inferred by archaeologists) 3.Metric or vector spaces 1.used to support retrieval operations, and to calculate distance (and similarity) 2.used to browse / constrain searches spatially 4.3D models of the past, used to reconstruct and visualize archaeological ruins 5.2D interfaces for human-computer interaction

ETANA Structures 1.Site Organization 1.Region, site, partition, sub-partition, locus, … 2.Temporal orderings (ages, periods) 3.Taxonomies 1.for bones, seeds, building materials, … 4.Stratigraphic relationships 1.above, beneath, coexistent

ETANA Streams 1.successive photos and drawings of excavation sites, loci, unearthed artifacts 2.audio and video recordings of excavation activities and discussions 3.textual reports 4.3D models used to reconstruct and visualize archaeological ruins.

Streams Multiple media types and representation –See ch. 4 for IR (except some here for non-text) –Standards for each, and for some combinations Text –Character strings, encoding (Unicode) –Morphology -> Stemming –Syntax, semantics -> stop words –** POS tagging, phrases Images, Audio, Video, Graphics, Animation –Capture, digitization, representation –CBIR for each ** Compression, processing, analysis **Synchronization, rendering, presentation, interchange –RealVideo, SMIL, QoS

Content Based Information Retrieval

Problems Image similarity is subjective –Personal Interpretation Concept x Appearance

By Visual features –Retrieve images with 50 percent of white colour and 50 percent of black colour

Textual information retrieval Query on Google using Sunset and Rio de Janeiro Query result

Image Classification by shape

Work of Torres et al Search in collections of fish images using combination of image properties (CBIR) and textual descriptions

Motivation Query 1: –List all metadata related to fish which were observed in the Amazon River Query 2: –Retrieve images of fishes whose shape is similar to that in the example oQuery 3: List all metadata related to fishes that were observed in the Amazon River and whose shape is similar to that in the example

Motivation Retrieve fish descriptions whose shapes are similar to the one shown below, that belong to the “Notropis” genre, that have large yes” e and that have been observed in the “Tennessee River”

Problem There is no BIodiversity Information System which allow queries involving : –Geographic data –Species metadata –Image Descriptors Existing systems: –Metadada or –Metadada + spatial data –Images are stored as separate files With no possibilty of retrieval by content

WeBioS

Torres: Visualizations Spiral Pattern Concentric Rings Pattern

Structures Digital Objects –Documents, digitization, packaging (METS), interchange, standards, format conversion –Genre: plays, encyclopedia, dictionaries, educational resources: courses (e.g., syllabi) and lessons –Structural organizations (books, chapters, sections), excerpts/spans (mark, superimposed info) Metadata: standards, markup Knowledge Structures & Representations –Databases, Schema, Ontologies, Thesauri, Lexicons, Authority files, Concept maps, Semantic networks Indexes –Inverted files, signature files, R-trees, Quad trees, etc. Clusters & Classification Schemes

Degree of Structure Chaotic OrganizedStructured WebDLsDBs

Digital Objects (DOs) Born digital Digitized version of “real” object –Is the DO version the same, better, or worse? –Decision for ETDs: structured + rendered Surrogate for “real” object –Not covered explicitly in metamodel for a minimal DL –Crucial in metamodel for archaeology DL

Metadata Objects (MDOs) MARC Dublin Core RDF IMS OAI (Open Archives Initiative) Crosswalks, mappings Ontologies Topics maps, concept maps

Complex to Simple MARC ($50)Dublin Core (DC) + thesis

Spaces Retrieval models –Boolean, extended Boolean –Vector, LSI –Probabilistic: classical, belief network, inference network, language models User interfaces and visualization

2D interfaces 3D interfaces GIS Other paradigms

Scenarios Recall OO for streams – now have objects as well as scenarios – ex interface components Information Access –Searching: ad hoc, filtering/routing –Browsing: using an organization, using a visualization, using links (i.e., hypertext, hypermedia) –Workflow: sessions, feedback, etc. Scenario-based Design Usability: goals, tasks, claims NOTE: this is covered in the outline

Societies User communities –Authors, editors, teachers, students, readers –Personal(ization), group(ware), community, global –Accessibility, universal access Librarians: reference, acquisition, operations Research community –Associations, conferences, publications, labs, projects Economics –Copyright, intellectual property rights, digital rights management, authorization, authentication, security, privacy, self-archiving (eprints) –Publishers, catalogers, distributors, sustainability –Open source, commercial, hybrid