
Digital Libraries: Models and Content

Goals for tonight
- Finish up from last week
  - the 5S model, more formally
  - status of the systems available
- Obtaining, describing, and indexing content
  - XML
  - Dublin Core
  - introducing content exchanges (OAI)

Applying the 5S model, informally
Choose a subject area, then answer these questions:
- Streams: What types of data? GIF, JPEG, AVI, DOCX, PDF, HTML?
- Structures: How are the elements organized? Is there a hierarchy? Are there multiple structures?
- Spaces: How will we index the items? How will we divide them into related groups?
- Scenarios: What services will we provide? What information do we need to provide those services? What events might happen that we need to plan for?
- Societies: Who is the library intended to serve? Remember to include agents and other processes as well as users.
This is the first deliverable for your first project.

More formally: Definitions
Definition: A stream is a sequence whose codomain is a non-empty set.
Definition: A structure is a tuple $(G, L, F)$, where $G = (V, E)$ is a directed graph with vertex set $V$ and edge set $E$, $L$ is a set of label values, and $F$ is a labeling function $F : (V \cup E) \to L$.
See for a nice description of domain, range, codomain if you need it.
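To make the two definitions concrete, here is a minimal Python sketch. It assumes a stream can be modeled as any Python sequence checked against a non-empty codomain set, and a structure as a vertex set, a set of (source, target) edge pairs, a label set, and a dictionary playing the role of F. The helper names `is_stream` and `is_structure` are hypothetical, chosen for illustration only.

```python
from typing import Hashable, Iterable, Mapping, Set, Tuple

# Hypothetical representation: an edge is a (source, target) pair of vertices.
Edge = Tuple[Hashable, Hashable]


def is_stream(seq: Iterable[Hashable], codomain: Set[Hashable]) -> bool:
    """A stream: a sequence whose values all come from a non-empty codomain."""
    return len(codomain) > 0 and all(x in codomain for x in seq)


def is_structure(vertices: Set[Hashable], edges: Set[Edge], labels: Set[Hashable],
                 labeling: Mapping[Hashable, Hashable]) -> bool:
    """Check (G, L, F) with G = (V, E) and F : (V ∪ E) -> L."""
    # Every edge must connect two vertices of G.
    if any(u not in vertices or v not in vertices for u, v in edges):
        return False
    domain = set(vertices) | set(edges)  # V ∪ E
    # F must be defined on exactly V ∪ E, and every value must lie in L.
    return set(labeling) == domain and all(labeling[x] in labels for x in domain)
```

For example, `is_stream(b"GIF89a", set(range(256)))` treats the first bytes of a GIF file as a stream over the byte values 0 to 255, since iterating over a `bytes` object yields integers.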

Structure illustration
[Diagram: a Collection node linked by "includes" edges to Images, Audio files, and Books.]
A very simple structure. How might it be enhanced? How would an index be included? What substructures might be added? What are the G, L, F, V, and E parts of this example?
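One possible answer to the last question, as a Python sketch. It assumes the diagram shows a Collection vertex with "includes" edges to Images, Audio files, and Books; the node-type labels and the added Index vertex (one hypothetical enhancement) are illustrative choices, not part of the original slide.

```python
# One possible reading of the diagram; every name here is illustrative.
V = {"Collection", "Images", "Audio files", "Books"}
E = {("Collection", "Images"), ("Collection", "Audio files"), ("Collection", "Books")}

# F labels each vertex with a node type and each edge with a relationship.
F = {v: "item type" for v in V}
F["Collection"] = "collection"
F.update({e: "includes" for e in E})

# One enhancement: an Index vertex with an "indexes" edge to Books.
V.add("Index")
E.add(("Index", "Books"))
F["Index"] = "index"
F[("Index", "Books")] = "indexes"

L = set(F.values())  # the label set: every value F assigns


def targets(vertex, label):
    """Vertices reached from `vertex` by an edge carrying `label`."""
    return sorted(t for (s, t) in E if s == vertex and F[(s, t)] == label)
```

Here `targets("Collection", "includes")` lists the three item types, and the new index is just one more vertex and one more labeled edge, which is why the (G, L, F) form extends easily.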

Definitions, cont’d
Definition: A space is a measurable space, measure space, probability space, vector space, topological space, or metric space.
- In a digital library, a vector space can represent the set of elements in a collection. Each element is represented by a vector of its characteristics; those characteristics both connect the element to similar elements and distinguish it from different ones.
- We will do an exercise to illustrate.

Vector space illustration
Consider a car. What characteristics do you associate with a car?
- If you want to compare one car to another, which characteristics would you choose?
- If you want to distinguish a car from another type of vehicle, such as a snowmobile or a truck, which characteristics would you need?
Make a vector of those characteristics. Then fill in the vector for several specific cars.
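A sketch of how the exercise might look in code, assuming four hypothetical characteristics (wheels, seats, cargo volume in cubic metres, snow-only) with made-up values, and Euclidean distance as the measure of similarity. Other characteristics or measures (cosine similarity, for instance) are equally valid choices.

```python
import math

# Hypothetical characteristics and made-up values, for illustration only.
FEATURES = ("wheels", "seats", "cargo_m3", "snow_only")
VEHICLES = {
    "compact car": (4, 5, 0.4, 0),
    "sedan": (4, 5, 0.5, 0),
    "pickup truck": (4, 3, 1.5, 0),
    "snowmobile": (0, 2, 0.1, 1),
}


def distance(a: str, b: str) -> float:
    """Euclidean distance between two vehicles' characteristic vectors."""
    return math.dist(VEHICLES[a], VEHICLES[b])


def nearest(name: str) -> str:
    """The most similar other vehicle: the one at the smallest distance."""
    return min((v for v in VEHICLES if v != name), key=lambda v: distance(name, v))
```

`wheels` and `snow_only` separate the snowmobile from the cars, while `seats` and `cargo_m3` separate the truck. Because the characteristics use different scales, a real collection would usually normalize each one before comparing.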

Definitions - 3
Definition: A scenario is a sequence of related transition events $(e_1, e_2, \ldots, e_n)$ on a state set $S$ such that $e_k = (s_k, s_{k+1})$ for $1 \le k \le n$.
- More easily visualized, a scenario is a path in a directed graph $G = (S, \Sigma_e)$, where vertices correspond to states in the state set $S$ and directed edges are the events in the event set $\Sigma_e$, each corresponding to a transition between states.
- Scenarios must be implemented to make a working system.
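The path reading of the definition can be checked mechanically. In this Python sketch an event is a `(from_state, to_state)` pair, and the states and events describe a hypothetical search service invented for illustration.

```python
from typing import Sequence, Set, Tuple

Event = Tuple[str, str]  # e_k = (s_k, s_{k+1}): a transition between two states


def is_scenario(events: Sequence[Event], states: Set[str], event_set: Set[Event]) -> bool:
    """True if `events` is a path in the directed graph G = (S, Σ_e)."""
    if not events:
        return False
    for k, (src, dst) in enumerate(events):
        # Each event must be an edge of G between two known states ...
        if (src, dst) not in event_set or src not in states or dst not in states:
            return False
        # ... and must start where the previous event ended.
        if k > 0 and events[k - 1][1] != src:
            return False
    return True


# Hypothetical states and events for a simple search service.
S = {"start", "query entered", "results shown", "item viewed"}
SIGMA_E = {("start", "query entered"), ("query entered", "results shown"),
           ("results shown", "item viewed"), ("results shown", "query entered")}
```

A sequence such as start → query entered → results shown → item viewed is a scenario; skipping from start straight to a viewed item is not, because no such event exists in `SIGMA_E`.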

Definitions - 4
Definition: A society is a tuple $(C, R)$, where
- $C = \{c_1, c_2, \ldots, c_n\}$ is a set of conceptual communities, each community referring to a set of individuals of the same class or type (e.g. actors, activities, components, hardware, software, data);
- $R = \{r_1, r_2, \ldots, r_m\}$ is a set of relationships, each relationship being a tuple $r_j = (e_j, i_j)$, where $e_j$ is a Cartesian product $c_{k_1} \times c_{k_2} \times \cdots \times c_{k_{n_j}}$, $1 \le k_1 < k_2 < \cdots < k_{n_j} \le n$, which specifies the communities involved in the relationship, and $i_j$ is an activity.
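A small Python sketch of a society, assuming hypothetical communities (including a software harvesting agent, since societies include agents as well as people) and two invented relationships. Dictionary insertion order supplies the indices $1..n$, so the ordering condition on $k_1 < k_2 < \cdots$ can be checked directly.

```python
from itertools import product

# Hypothetical communities; dict order fixes the indices of c_1..c_n.
COMMUNITIES = {
    "patrons": {"alice", "bob"},
    "librarians": {"carol"},
    "harvesters": {"oai-harvester"},  # a software agent, not a person
}
ORDER = list(COMMUNITIES)

# Each relationship r_j = (e_j, i_j): the communities involved, and an activity.
RELATIONSHIPS = [
    (("patrons", "librarians"), "asks for help"),
    (("librarians", "harvesters"), "configures harvesting"),
]


def well_ordered(relationship) -> bool:
    """Check k_1 < k_2 < ... for the communities named in e_j."""
    indices = [ORDER.index(name) for name in relationship[0]]
    return all(a < b for a, b in zip(indices, indices[1:]))


def participants(relationship):
    """Enumerate e_j = c_k1 x c_k2 x ... as a set of member tuples."""
    names, _activity = relationship
    return set(product(*(COMMUNITIES[name] for name in names)))
```

For the first relationship, `participants` yields every (patron, librarian) pair, which is exactly the Cartesian product the definition describes.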

Projects in our DL laboratory
Mendel 289 is the center of activity for digital library projects and related work. Projects under way, which may present opportunities for class projects or independent study: NSDL, CITIDEL, CSTA, Ensemble, Distributed Expertise, Computing Ontology, Interdisciplinary Computing and its relationship to the libraries, …

Our systems
Now available:
- Fedora Linux machines, remotely accessible (use the gateway)
- Bare machines with just the basic system
- We can install Drupal either from the Drupal site (doing things for ourselves) or from the Bitnami site (which builds the stack for us). I just heard that Drupal may already be installed; feel free to uninstall and reinstall it if you wish.
If you have a computer of your own and want to use it, that is fine, but:
- You must be able to demonstrate it to the class at the end of the semester.
- I will need to be able to see what you are doing from time to time during the semester. That means you need a static IP address.

The Digital Library Content
Essential elements for a digital library:
- Users
- Content
- Services

Content requirements
- Obtain
- Store
  - Organize
  - Describe
- Find
- Deliver

Describing the content
How do we describe content?
- Metadata: a machine-readable description of anything
What description?
- Machine-readable description requires standard descriptive elements
- Dublin Core
  - International standard (ISO 15836)
  - "a standard for cross-domain information resource description"
  - 15 descriptive elements
- Other metadata schemes
  - IEEE LOM

Metadata
What does metadata look like?
- Metadata is data about data: information about a resource, encoded in the resource or associated with it.
- A common language for metadata: XML (eXtensible Markup Language)

XML
- XML is a markup language.
- XML describes data, not how to display it.
- XML has no predefined tags: you define your own.
- Use XML to create a resource type.
- Separately, develop software to interact with the data described by the XML markup.
Source: tutorial at w3schools.com

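XML rules
Easy rules, but very strict:
- The first line is the XML declaration, giving the XML version and the character set used.
- The rest consists of user-defined tags.
- Every element has an opening and a closing tag; an empty element may instead use a single self-closing tag such as <br/>.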
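How strict "very strict" is can be seen by handing a few small documents to a parser. This sketch uses Python's standard-library `xml.etree.ElementTree`; it checks well-formedness only, not validity against a DTD, and the sample documents are made up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the parser accepts `text` (well-formedness only, no DTD validation)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


GOOD = '<?xml version="1.0" encoding="UTF-8"?>\n<note><to>Class</to><br/></note>'
BAD_UNCLOSED = "<note><to>Class</note>"     # <to> is never closed
BAD_CASE = "<Note></note>"                  # tag names are case-sensitive
BAD_DECL = ' <?xml version="1.0"?><note/>'  # the declaration must come first
```

Unlike most HTML parsers, an XML parser rejects each of the three bad documents outright rather than guessing what was meant.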

Element naming
XML elements must follow these naming rules:
- Names can contain letters, digits, hyphens, underscores, and periods.
- Names must start with a letter or an underscore, not a digit or other punctuation character.
- Names must not start with the letters xml (or XML, Xml, ...).
- Names cannot contain spaces.
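The naming rules above can be checked with a regular expression. This is a deliberately simplified, ASCII-only sketch: the XML 1.0 specification also allows non-ASCII letters and the colon (reserved for namespaces), and it reserves rather than bans names starting with "xml". `follows_naming_rules` is a hypothetical helper.

```python
import re

# ASCII-only simplification of the slide's rules: start with a letter or
# underscore, then letters, digits, periods, underscores, or hyphens.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def follows_naming_rules(name: str) -> bool:
    """Check an element name against the simplified rules."""
    return NAME.fullmatch(name) is not None and not name.lower().startswith("xml")
```

For example, `ingredient`, `first-name`, and `_id` pass, while `2nd_course`, `my name`, and `XmlData` fail.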

Elements and attributes
- Use elements to describe data.
- Use attributes to present information that is not part of the data, for example the file type or other information that would be useful in processing the data, but is not part of the data.

Repeating elements (occurrence indicators in a DTD)
- Name: the element appears exactly once.
- Name+: the element appears one or more times.
- Name*: the element appears zero or more times.
- Name?: the element appears zero or one time.
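What each indicator allows can be written as a minimum and maximum count. This Python sketch is a simplification: a real DTD content model also constrains the order of child elements, which a count check ignores. `occurs_ok` is a hypothetical helper.

```python
# (minimum, maximum) occurrences per indicator; None means unbounded.
BOUNDS = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}


def occurs_ok(count: int, indicator: str = "") -> bool:
    """Check how many times a child element appears against its indicator."""
    low, high = BOUNDS[indicator]
    return count >= low and (high is None or count <= high)
```

For instance, a recipe with no ingredients satisfies `ingredient*` but not `ingredient+`.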

Parts of an XML document
- Elements: the components of an XML document. Some contain other parts; some are empty. Examples in HTML: "br" or "table"; in XML: "ingredient".
- Attributes: information about elements, not data. Example in HTML: "src="; in XML: "scale=".
- Entities: special characters or strings with pre-assigned meaning. Example in HTML: "&nbsp;" for a non-breaking space.
- PCDATA: parsed character data, text that will be parsed and interpreted by the parser. Tags and entities in it will be expanded and used in presentation.
- CDATA: character data, text that will not be parsed or interpreted. It is passed through exactly as provided.
The HTML examples are familiar; the XML examples are made up and depend on the specific XML scheme used.
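The following sketch shows these parts through Python's standard-library `xml.etree.ElementTree`. The element and attribute names are made up, like the slide's XML examples. It also shows one difference from HTML: XML predefines only five entities (`&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;`), so `&nbsp;` must either be declared in a DTD or written as the character reference `&#160;`.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names, as in the slide's made-up examples.
DOC = """<recipe>
  <ingredient scale="metric">Salt &amp; pepper</ingredient>
  <note><![CDATA[Use <b>fresh</b> eggs & sugar]]></note>
</recipe>"""

root = ET.fromstring(DOC)
ingredient = root.find("ingredient")
note = root.find("note")

print(ingredient.get("scale"))  # attribute value: metric
print(ingredient.text)          # entity expanded: Salt & pepper
print(note.text)                # CDATA kept as text: Use <b>fresh</b> eggs & sugar


def parses(text: str) -> bool:
    """True if the parser accepts `text`."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

`parses("<p>a&nbsp;b</p>")` returns False because the entity is undefined, while `parses("<p>a&#160;b</p>")` returns True.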

Using XML: an example
Define the fields of a recipe collection.
ISO 8859 is a family of 8-bit character sets; ISO-8859-1 (Latin-1) covers Western European languages. See
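As an illustration, the meringue cookie recipe that appears later in these slides could be encoded roughly as follows. The element names (`recipe`, `title`, `ingredient`, `amount`, `name`, `directions`) are hypothetical, not a standard scheme. The document is encoded to bytes in the character set named in its declaration before parsing, so the declaration and the actual bytes agree.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for one entry in a recipe collection.
RECIPE = """<?xml version="1.0" encoding="ISO-8859-1"?>
<recipe>
  <title>Meringue cookies</title>
  <ingredient><amount>3</amount><name>egg whites</name></ingredient>
  <ingredient><amount>1 cup</amount><name>sugar</name></ingredient>
  <ingredient><amount>1 teaspoon</amount><name>vanilla</name></ingredient>
  <ingredient><amount>2 cups</amount><name>mini chocolate chips</name></ingredient>
  <directions>Beat the egg whites until stiff. Stir in sugar, then vanilla.
Gently fold in chocolate chips.</directions>
</recipe>"""

# Encode in the declared character set, so the bytes match the declaration.
root = ET.fromstring(RECIPE.encode("iso-8859-1"))

# Each ingredient becomes an (amount, name) pair a program can work with.
ingredients = [(i.findtext("amount"), i.findtext("name")) for i in root.iter("ingredient")]
```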

Processing the XML data
How do we know what to do with the information in an XML file?
- Document Type Definition (DTD)
  - Put it in the same file as the data (immediate reference), or
  - Put in a reference to an external description
  - Provides the definition of the legitimate content for each element

Document Type Definition <!DOCTYPE recipe [ ]> Repeat 0 or more times
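Python's standard-library ElementTree parser reads an internal DTD such as `<!DOCTYPE recipe [ ... ]>` but does not validate against it; a validating library (for example lxml, through its `etree.DTD` class) would be needed for that. As a dependency-free sketch, the hypothetical content model `(title, ingredient+, directions)` below is re-expressed as a regular expression over the sequence of child tag names. All declarations are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# A hypothetical internal DTD: the declarations sit inside <!DOCTYPE recipe [ ... ]>.
DOC = """<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!ELEMENT recipe (title, ingredient+, directions)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT ingredient (amount, name)>
  <!ELEMENT amount (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT directions (#PCDATA)>
]>
<recipe>
  <title>Meringue cookies</title>
  <ingredient><amount>3</amount><name>egg whites</name></ingredient>
  <directions>Beat the egg whites until stiff.</directions>
</recipe>"""

# ElementTree parses the DOCTYPE but does not validate, so the recipe content
# model (title, ingredient+, directions) is rewritten as a regex over tag names.
RECIPE_MODEL = re.compile(r"title,(ingredient,)+directions,")


def matches_model(element: ET.Element, model: re.Pattern) -> bool:
    """Compare the sequence of child tag names with a content-model regex."""
    sequence = "".join(child.tag + "," for child in element)
    return model.fullmatch(sequence) is not None


root = ET.fromstring(DOC)
```

The regex mirrors the DTD notation directly: a comma-separated sequence becomes concatenation, and the `+` indicator becomes a regex `+`.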

Meringue cookies
3 egg whites
1 cup sugar
1 teaspoon vanilla
2 cups mini chocolate chips
Beat the egg whites until stiff. Stir in sugar, then vanilla. Gently fold in chocolate chips. Place in a warm oven at 200 degrees for an hour. Alternatively, place in an oven at 350 degrees, turn the oven off, and leave overnight.
Not the way that I want to see a recipe in a magazine! What could we do with a large collection of such entries? How would we get the information entered into a collection?
External reference to DTD
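One answer to "what could we do with a large collection?": once each entry is structured, a program can query it. This sketch uses a two-recipe collection in which the second recipe ("Plain meringues") and all element names are hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical collection; the second recipe is invented for illustration.
COLLECTION = """<recipes>
  <recipe><title>Meringue cookies</title>
    <ingredient><amount>3</amount><name>egg whites</name></ingredient>
    <ingredient><amount>2 cups</amount><name>mini chocolate chips</name></ingredient>
  </recipe>
  <recipe><title>Plain meringues</title>
    <ingredient><amount>4</amount><name>egg whites</name></ingredient>
    <ingredient><amount>1 cup</amount><name>sugar</name></ingredient>
  </recipe>
</recipes>"""

root = ET.fromstring(COLLECTION)


def titles_using(root: ET.Element, ingredient: str) -> list:
    """Titles of recipes whose ingredient names mention `ingredient`."""
    return [r.findtext("title") for r in root.iter("recipe")
            if any(ingredient in (n.text or "") for n in r.iter("name"))]
```

`titles_using(root, "chocolate")` finds only the cookies. The same search over free text would also match words in the directions, which is one reason to separate ingredients into their own elements.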

XML exercise
Design an XML schema for an application of your choice. Keep it simple. Examples: address book, TV program listing, DVD collection, …

Another example
A paper with content encoded in XML. First few lines:
Standards E-learning and their possible support for a rich pedagogic approach in a 'Integrated Learning' context
Rodolophe Borer
"ePBLpaper11.dtd" shown on next slide

%foreign-dtd; Source:

Vocabulary
Given the need for processing, do you want free text or restricted entries?
- Free text gives more flexibility to the person making the entry.
- A controlled vocabulary helps with
  - consistent processing
  - comparison between entries
- A controlled vocabulary limits
  - the options for what can be said

Vocabulary example
Recipe example:
- What text should be controlled?
- What should be free text?
Ingredients
- Ingredient amount
- Ingredient name
- Should we revise how we coded the ingredient amount?
Directions
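One possible revision of the ingredient amount: split the free-text amount into a numeric quantity and a unit drawn from a controlled vocabulary. The vocabulary below is a hypothetical, deliberately tiny example; spelling variants map to one canonical unit, and anything else is rejected.

```python
from fractions import Fraction
from typing import Optional, Tuple

# Hypothetical controlled vocabulary: accepted spellings -> canonical unit.
UNITS = {
    "cup": "cup", "cups": "cup",
    "teaspoon": "teaspoon", "teaspoons": "teaspoon", "tsp": "teaspoon",
}


def parse_amount(text: str) -> Tuple[Fraction, Optional[str]]:
    """Split a free-text amount such as '2 cups' into (quantity, unit)."""
    quantity_text, _, unit_text = text.strip().partition(" ")
    quantity = Fraction(quantity_text)  # accepts '3', '1/2', '0.5'
    if not unit_text:
        return quantity, None  # a bare count, as in '3' egg whites
    unit = UNITS.get(unit_text.strip().lower())
    if unit is None:
        raise ValueError(f"unit not in controlled vocabulary: {unit_text!r}")
    return quantity, unit
```

With amounts stored this way, a program can double a recipe by multiplying quantities, or compare amounts across recipes, which free text like "2 cups" versus "two cups" would not allow.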

Dublin Core Standard set of metadata fields for entries in digital libraries: – Title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights

Dublin Core elements, see:
- Title
- Creator: entity primarily responsible for making the content of the resource
- Subject (C)
- Description
- Publisher: entity making the resource available
- Contributor: contributor to the content of the resource
- Date: YYYY-MM-DD, ex.
- Type (C): ex. collection, dataset, event, image
- Format (C): what is needed to display or operate the resource
- Identifier: unambiguous ID
- Source
- Language: standards RFC 3066, ISO 639
- Relation: reference to a related resource
- Coverage (C): space, time, jurisdiction
- Rights: rights management information
C = controlled vocabulary recommended.
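A sketch of a simple Dublin Core record in XML, built with Python's standard-library ElementTree. It uses the Dublin Core element namespace `http://purl.org/dc/elements/1.1/` with the conventional `dc` prefix; the `metadata` wrapper element and all field values (including the date) are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")


def dc_record(fields: dict) -> ET.Element:
    """Build a <metadata> element with one dc:* child per value."""
    record = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:  # each DC element is optional and repeatable
            ET.SubElement(record, f"{{{DC}}}{name}").text = value
    return record


record = dc_record({
    "title": ["Meringue cookies"],
    "type": ["Text"],            # a term from the DCMI Type Vocabulary
    "date": ["2010-11-22"],      # hypothetical date, YYYY-MM-DD
    "language": ["en"],
})
xml_text = ET.tostring(record, encoding="unicode")
```

Rejecting names outside the 15 elements is a small example of the controlled structure that lets records from different collections be compared.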

Dublin Core Terms
An update to the original DC elements
- Adds the concepts of range and domain
Each term has this minimal set of attributes:
- Name: a token appended to the URI of a DCMI namespace to create the URI of the term.
- Label: the human-readable label assigned to the term.
- URI: the Uniform Resource Identifier used to uniquely identify the term.
- Definition: a statement that represents the concept and essential nature of the term.
- Type of Term: the type of term as described in the DCMI Abstract Model [DCAM].

DC Terms
Additional attributes possible:
- Comment: additional information about the term or its application.
- See: authoritative documentation related to the term.
- References: a resource referenced in the Definition or Comment.
- Refines: a Property of which the described term is a Sub-Property.
- Broader Than: a Class of which the described term is a Super-Class.
- Narrower Than: a Class of which the described term is a Sub-Class.
- Has Domain: a Class of which a resource described by the term is an Instance.
- Has Range: a Class of which a value described by the term is an Instance.
- Member Of: an enumerated set of resources (Vocabulary Encoding Scheme) of which the term is a Member.
- Instance Of: a Class of which the described term is an instance.
- Version: a specific historical description of a term.
- Equivalent Property: a Property to which the described term is equivalent.
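A sketch of a term record holding the minimal attributes plus "Refines", with the URI built by appending the name to the namespace URI. The `Term` class and its field names are hypothetical. The two sample terms use DCMI's labels and definitions for `date` and `created` as best recalled, so check them against the DCMI Metadata Terms document before relying on them.

```python
from dataclasses import dataclass
from typing import Optional

DCTERMS = "http://purl.org/dc/terms/"  # the DCMI Terms namespace URI


@dataclass(frozen=True)
class Term:
    name: str                      # token appended to the namespace URI
    label: str                     # human-readable label
    definition: str
    type_of_term: str              # e.g. "Property" or "Class"
    refines: Optional[str] = None  # name of the super-property, if any
    namespace: str = DCTERMS

    @property
    def uri(self) -> str:
        """The term's URI: namespace URI followed by the name."""
        return self.namespace + self.name


DATE = Term("date", "Date",
            "A point or period of time associated with an event in the lifecycle of the resource.",
            "Property")
CREATED = Term("created", "Date Created", "Date of creation of the resource.",
               "Property", refines="date")
```

Deriving the URI rather than storing it separately guarantees that name and URI can never disagree.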

The DC Terms – from 15 to … abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, available, bibliographicCitation, conformsTo, contributor, coverage, created, creator, date, dateAccepted, dateCopyrighted, dateSubmitted, description, educationLevel, extent, format, hasFormat, hasPart, hasVersion, identifier, instructionalMethod, isFormatOf, isPartOf, isReferencedBy, isReplacedBy, isRequiredBy, issued, isVersionOf, language, license, mediator, medium, modified, provenance, publisher, references, relation, replaces, requires, rights, rightsHolder, source, spatial, subject, tableOfContents, temporal, title, type, valid
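Many of the added terms refine one of the original 15 elements (for example, `created` is a sub-property of `date`, and `spatial` and `temporal` refine `coverage`). A consumer that only understands simple Dublin Core can therefore "dumb down" a richer record. This Python sketch uses a partial mapping; terms without a parent among the 15 (such as `audience`) are dropped.

```python
# A partial map from DC Terms refinements to their parent among the 15 elements.
PARENT = {
    "abstract": "description", "tableOfContents": "description",
    "alternative": "title",
    "created": "date", "modified": "date", "issued": "date", "available": "date",
    "valid": "date", "dateAccepted": "date", "dateCopyrighted": "date",
    "dateSubmitted": "date",
    "spatial": "coverage", "temporal": "coverage",
    "extent": "format", "medium": "format",
    "hasPart": "relation", "isPartOf": "relation", "references": "relation",
    "accessRights": "rights", "license": "rights",
    "bibliographicCitation": "identifier",
}
DC15 = {"title", "creator", "subject", "description", "publisher", "contributor",
        "date", "type", "format", "identifier", "source", "language",
        "relation", "coverage", "rights"}


def dumb_down(record: dict) -> dict:
    """Rewrite a DC Terms record using only the 15 original elements."""
    simple: dict = {}
    for term, values in record.items():
        element = term if term in DC15 else PARENT.get(term)
        if element is None:
            continue  # no simple-DC equivalent in this partial mapping
        simple.setdefault(element, []).extend(values)
    return simple
```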

DC terms See terms/ terms/ Review the list and see what has been added

A Drupal example Ensemble:

IEEE LOM
Example of a specialized metadata scheme: Learning Object Metadata
- Specifically for collections of educational materials
- Includes all of Dublin Core
See

Computing systems
- Linux machines
- Introduction to Unix:
- DSpace:
  - Documentation, including installation
Najib Nadi, our system administrator, is setting up the machines. He will send a message to the class by the middle of the week with details of machine location and login.
Remember: you have the option to use your own machine, but it must meet the criteria described last week.

This session
- Defined metadata and its role in digital libraries.
- Introduced XML as a language for describing a collection of content.
- Described the computing resources and how to get ready for the first DL setup.