LIS654 lecture 1 omeka installation, system overview Thomas Krichel 2012-01-29.

LIS654 lecture 1 omeka installation, system overview Thomas Krichel 2012-01-29

this lecture Early history of digital libraries. – Vannevar Bush – Joseph Carl Robnett Licklider, aka “Lick” – some things that happened since. Introduction to Omeka. Installation of a theme. Omeka tables and metadata.

background Vannevar Bush (1890—1974) directed the US office of Science Research and Development during WW2. As the war ended he saw two problems – how to make the war time scientific reports available – find a new challenge for the scientists He proposed a solution in “As we may think”.

As we may think It remains to date one of the most frequently cited papers in Library and Information Science. I think this fame is somewhat undeserved.

the scientific record As scientists do more work, the “record” extends. This is good. Recent advances in microfilm also made is possible to store more of the record in microfilm. But with much research and increased specialization, “significant attainments become lost in the mass of the inconsequential”.

the memex The memex was a proposed desktop machine that would store millions of books in microfilm. It would have a mechanism that would allow any known item from the collection rapidly. But the problem is what items to look at?

the brain, by Bush Bush thought that the brain works by association. “With one item in its grasp, [the brain] snaps instantly to the next that is suggested by association of thought”. This is done “in accordance with some intricate web of trails carried by the cells of the brain.”

memex as a brain Every time a document is added to the memex it is given an identifier. Every time an item is consulted the user can associate with it other items. These associations are recorded. Trails of associations can be annotated and copied. Selection by association replaces indexing.

sharing An annotated trail between items can form a new item. That item can be shared. Bush envisioned that there would be a way for each memex to learn from all other memexes. Memex users would improve their thinking ability by its use. This would greatly increase the speed of scientific discoveries.

implementation There is no evidence that anything like the memex was ever built. Microfilm was replaced by digitization. But the idea of associative trails or associative indexing has something to do with the hypermedia. The later goes back to Ted Nelson.

Licklider Joseph Carl Robnett Licklider (1915—1990) trained as a mathematician and psychologist and worked mainly at the MIT. The Council of Library Resources got funding from the Ford Foundation to examine how technology could help libraries. Work was undertaken by Bolt, Beranek and Newman (BBN) of later ARPA fame

the system The system was call “procognitive” meaning for the advancement of knowledge. It would not be based on documents, metadata and retrieval. It would process information into knowledge and questions into answers. Users transmit their knowledge to the system.

information to knowledge To see how information can be processed into knowledge, Lick, looked at the human brain. He had studied cat brains in his PhD work. If it is possible to the process the body of information into knowledge structures, then questions can be answered by knowledge rather than be documents.

human processing Lick understood that current and foreseeable technology would not allow processing of documents into knowledge. This would be the job of set of librarian called “procognitive system specialists”. The would encode contents of documents in a knowledge language. They would watch for ambiguity warnings. Users would also provide feedback.

encoding Surprisingly Lick still imagined the procognitive system be based on natural language. The hope was that artificial intelligence (AI) methods would be developed to extract information from documents. That hope seemed justified in the 60s when AI was in its infancy.

steps to implementation The first attempts, in the 60s, tried to find the citation string in a database of citations. Thus this was more information retrieval on a small set of metadata than actual digital library work. Librarians preparing bibliographies for researchers were the prime users.

into 80s In the 80s the personal computer “came back”. Searching could be done of the full-texts of document. Browsing became available.

90s In the 90s the Internet and the search engine came along. Initially search engines followed standard information retrieval principles. My first work, about 1993, was based on gopher access and WAIS indexing.

the semantic web The semantic web is the actual successor to Lick’s vision. It’s still not done. I speculate it will not be done for a long time. The reason is that while Lick thought Psychology and Computers, he did not think through the economics of operating such system as the ones that he proposed. He also had too optimistic a vision about AI.

omeka Omeka is an open-source software for image- based repositories. Each of you gets a complete installation of omeka installed in your home directory. Omeka uses mySQL [4 slides]. Omeka is delivered via the web [2 slides]. Omeka uses PHP [2 slides].

mySQL This is a relational database system. It is a client/server based system where the server keeps a set of tables. There may be relations between the tables. A client can make request to the server. These request follow a language called “structured query language” SQL.

SQL and us Some people think all librarians should know SQL and relational databases. We don’t need to learn SQL, as omeka will formulate the SQL for us. We still have a web-based tool that will show and that can change all the omeka tables.

mySQL and you MySQL keeps as set of databases and a set of users. To keep things simple, I create one database for you. Its name is your wotan user name. I create one mySQL user for you. Its name is the same as your wotan user name. It’s password is the one in your.my.cnf at creation time. The user created for you has full rights to the database created for you.

the PHPmyadmin site PHPmyadmin is a web interface, written for the administration of mySQL, written in PHP. Thus you don’t have to learn mySQL commands. At http://wotan.liu.edu/phpmyadmin we have the PHPmyadmin installation. You gain access to the interface by login with your mySQL user name and your mySQL password.

the web service On wotan, the Apache web server software delivers http responses. The web top directory is /var/www. http://wotan.liu.edu/ goes to the /var/www directory. There it shows the file index.html. If that file would not be there Apache would create a file list on the fly.

the omeka web service http://wotan.liu.edu/omekal goes to the /var/www/omeka directory. In that directory, there are links to each omeka user’s omeka directories. There in the user’s directory, Apache finds no index.html. Therefore it shows the file index.php, a second choice. Wotan also has a directory /etc/apache/sites- enabled/omeka. This contains configuration snippets for your omeka sites, written in Apache’s language.

PHP Omeka is written in PHP. PHP is the PHP Hypertext Processor, a high level scripting language. PHP code is interpreted by the web server. On wotan we run the Apache web server. Apache on wotan is configured in such a way as to interpret PHP.

Apache interprets PHP When Apache on wotan finds a file that has the ending.php, it does not send it to the web client directly. Instead it scans the page for special processing instructions, and sends these processing instructions to a PHP interpreter. Then it sends the processed code (usually in HTML) to the client. This verry slow and inefficient.

the Omeka site As part of the course, you are being provided with a complete copy of the latest version of on the server wotan.liu.edu, at the URL http://wotan.liu.edu/omeka/user where user is a user name that you have chosen. This URL will initially redirect to an initialization screen. Fill it out.

filling out the form Things to note – “Superuser Account Username” – “Email” for password reminders to the superuser. Note that wotan can not send out non-local email ;-( – “Administrator Email” will be noted on the site All other settings can be left at the default and/or can be changed in the “general settings” later.

omeka’s mySQL tables Each database has a set of tables inside. The tables that omeka creates, and then uses, all have a name starting with “omeka_”. You can create other tables in your mySQL database but choose names that don’t start with “omeka_”. You can see the tables created once you have filled out the installation form.

when we are done with this Then Thomas will delete some tables in omeka that are still shipped, but are no longer used is modern versions. This will simplify the discussion of the table structure that follows. This will now be done when all omeka sites are installed.

the admin interface The admin interface for your omeka site is at http://wotan.liu.edu/omeka/user/admin. You simply add ‘admin’ to your omeka address. For trivial security, there is no link to this place. Bookmark it in your browser. Don’t bookmark it in the lab.

omeka/db.ini This contains the instructions that allow to connect to your database. Since PHP is executed by the web server as www-data, and that is not a member of a group with you, the file has to be readable by all. Security of omeka is not Fort Knox.

omeka/themes There we have themes. Each theme is in a directory. You can install a new theme by downloading a theme from the omeka theme web site http://omeka.org/add-ons/themes/, upload it and then selected it from the admin interface. Let us try this out now.

foreword to Omeka Terminology is one of the difficult problems in digital librarianship. I will use the double quotes here to represent a term that is used as it is in omeka. Please open your winscp, omeka web, web admin and phpmyadmin.

items In omeka, you store “items”. Item are either digital resources – images – video or something non-digital of which your are storing a digital representation of – person – event

some item properties Items can be “public”. By default, as a basic security precaution, items are not public. Items can be “featured”. An items that is features is highlighted on the site in a particular way. This allows you to change the appearance of the site by providing different featured items over time. Items have item types. Each item is of one type. Each item may belong to a collection.

table: “items” It stores data about each “item” – “id” of the item, an autoincrement – “item_type_id”, number | +1 slide – “collection_id”, number | +2 slide – whether it is “featured”, a Boolean – whether it is “public”, a Boolean – when last “modified”, a time – when “added”, a time

table: “item_types” Each item is of one type. Types are described in the “item_types” table, with the columns – “id” an autoincrement – “name” the name of the item type, string – “description” a longer explanation what the item type means. Each record in the “item” table references an id in the type.

table: “collections” Each collection is described here – “id” auto_increment – “name”, a string – “description”, a string – users who are “collectors”, a string |not further discussed| – whether it is “public”, a Boolean – whether it is “featured”, a Boolean – when “added”, a time – when last “modified”, a time – the “owner_id” |not further discussed|

items to files An item has two aspects to it. – There is the metadata about the items. – There is the item itself. This is in fact a collection of “file”s. File records store information about files on the server that hold information related to an item. The item can be viewed as a conceptual container of (possibly zero) files. Each file is attached to an item.

table: “files” |1| The fields in that table are – “id” auto_increment – “item_id” of the item the file attaches to – “size” in bytes – “has_derivative_image”, a Boolean – the time last “modified”, a time – the time it was “added”, a time – if it was “stored”, a Boolean

table: “files” |2| More fields of this table – the “authentication” a checksum of the path to the file – the “mime_browser”, a mime type as sent to browser – the “mime_os”, the mime type as determined by the omeka installation, using an external application – the “type_os”, the file type as determined by the omeka installation, using an external application – the “archive_filename”, a random file name – the “original_filename”, filename or URL of origin

file storage The “archive” directory stores files. The original is in “files”. Derivative files are in – “thumbnails” – “fullsize” – “square_thumbnails” I don’t know why the original size is not the full size.

metadata Metadata is a descriptions that can be attached to a “record”. Records something that groups are items and files, and some aggregates of items – collections – exhibits (only used by ExhibitBuilder) Metadata is a set of attribute/value pairs. The attributes are called “elements”.

table: “elements” We start with the “elements” table. It contains all the properties one can attach to records. – an “id” auto_increment – a “record_type_id”, the id of a “record_type” |+1 – a “data_type_id”, the id of a “data_type” |+2 – an “element_set_id”, id of an “element_set” |+3 – an “order” that appears always to be null, unused – a “name” for the property – a “description” containing the fill-in instructions.

table: “record_types” This table contains two records id | name | description 1 | All | Elements, element sets, and element texts assigned to this record type relate to all possible records i.e. items and their aggregates. 2 | Item | Elements, element sets, and element texts assigned to this record type relate to item records.

table: “element_sets” Only contains these records id| record_type_id | name | description 1 |1 | Dublin Core | “The Dublin Core metadata element set. These elements are common to all Omeka resources, including items, files, collections, exhibits, and entities. See http://dublincore.org/documents/dces/.” 3 | 2 | Item Type Metadata | “The item type metadata element set, consisting of all item type elements bundled with Omeka and all item type elements created by an administrator.”

item-type specific metadata You can create data elements (aka metadata fields) for a specific item. You can not however, share these fields across item types. So if you want to express the “geekiness” of your item, and you have several types that can be geeky, you have to add “geekiness” as an element for each item type separately.

creating item types You can create your own item types. When you create an item type, it automatically has the Dublic Core metadata property fields attached to it. But if your item type, say is a room, you can create properties such as “size”, “height”, “cul- de-sac-ness”.

Omeka tags A tag is a way for Omeka to group individual items together. Each item can have multiple tags. Each tag can be attached to multiple items. We say that there is a many-to-many relationship between items and tags. For LIS650 veterans, it’s like grouping HTML elements in the into classes.

table: elements_texts This contains the values of properties. Fields are – id | an auto_increment – record_id | the id of the record it is attached to – record_type_id | the id of the record type of the record. I am not sure why this is required. – element_id | the id of the element (property) – html | a Boolean, whether HTML or not – text | the value of the property

table: tags Each tag is recorded in this table. It has only two columns – “id”, an autoincrement identifier – the “name” a string up to 256 characters long This table stores all the tags. A “tag” here is the value that as tagging takes. This is not what we would commonly call a tag.

table: taggings This table has the following columns – “id” an auto_increment – “relation_id” gives the id of the item that has been tagged. – “tag_id” gives the number of the tag being given – “entity_id” |who did it?, not further discussed – “type”, a type of action taken, not further discussed. – “time” a timestamp when the action happened.

Dublin Core data Dublin Core is a metadata set that is used in omeka. This is the common set for all types. We need to review the official meaning of these elements here. I quote from Hillman’s Dublin core usage guide. http://dublincore.org/documents/usageguide /elements.shtml

dublin core: title “The name given to the resource. Typically, a Title will be a name by which the resource is formally known.” “If in doubt about what constitutes the title, repeat the Title element.”

dublin core: subject “The topic of the content of the resource. Typically, a Subject will be expressed as keywords or key phrases or classification codes that describe the topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.”

dublin core: description “An account of the content of the resource. Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free- text account of the content.” “Use full sentences.”

dublin core: type “The nature or genre of the content of the resource. Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the DCMIType vocabulary ). To describe the physical or digital manifestation of the resource, use the FORMAT element.”

dublin core: source “A Reference to a resource from which the present resource is derived. The present resource may be derived from the Source resource in whole or part. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system”… “include in this area information about a resource that is related intellectually to the described resource but does not fit easily into a Relation element.”

dublin core: relation “A reference to a related resource. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system.”

dublin core: coverage “The extent or scope of the content of the resource. Coverage will typically include spatial location (a place name or geographic co-ordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary.

dublin core: creator “An entity primarily responsible for making the content of the resource. Examples of a Creator include a person, an organization, or a service. Typically the name of the Creator should be used to indicate the entity.” “Creators should be listed separately, preferably in the same order that they appear in the publication.”

dublin core: publisher “The entity responsible for making the resource available. Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.” “The intent of specifying this field is to identify the entity that provides access to the resource. “

dublin core: contributor An entity responsible for making contributions to the content of the resource. Examples of a Contributor include a person, an organization or a service. Typically, the name of Contributor should be used”. “The same general guidelines for using names of persons or organizations as Creators apply here.”

dublin core: rights “Information about rights held in and over the resource. Typically a Rights element will contain a rights management statement for the resource, or reference a service providing such information.” “Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource.”

dublin core: date “A date associated with an event in the life cycle of the resource. Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601” “and follows the YYYY-MM-DD format.”

dublin core: format “The physical or digital manifestation of the resource. Typically, Format may include the media-type or dimensions of the resource. Examples of dimensions include size and duration.” “Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [http://www.iana.org/ assignments/media- types/]”

dublin core: identifier “An unambiguous reference to the resource within a given context. Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Examples of formal identification systems include the Uniform Resource Identifier (URI)” …

dublin core: language “A language of the intellectual content of the resource. Recommended best practice for the values of the Language element is defined by RFC 3066 [RFC 3066, http://www.ietf.org/rfc/ rfc3066.txt] which, in conjunction with ISO 639 [ISO 639, http://www.oasis- open.org/cover/iso639a.html]), defines two- and three-letter primary language tags with optional subtags.”

item type specific metadata There are a bunch of different types that are built-in. Each type takes Dublin Core metadata as well as some extra metadata These item-specific metadata fields can be changed using the web interface.

Omeka built-in item types 1 Document A resource containing textual data. Moving Image A series of visual representations that, when shown in succession, impart an impression of motion. Oral History A resource containing historical information obtained in interviews with persons having firsthand knowledge.

Omeka built-in item types 2 Sound A resource whose content is primarily intended to be rendered as audio. Still Image A static visual representation. Examples of still images are: paintings, drawings, graphic designs, plans and maps. Website A resource comprising of a web page or web pages and all related assets ( such as images, sound and video files, etc. ).

Omeka built-in item types 3 Event A non-persistent, time-based occurrence. Metadata for an event provides descriptive information that is the basis for discovery of the purpose, location, duration, and responsible agents associated with an event. Examples include an exhibition, webcast, conference, workshop, open day, performance, battle, trial, wedding, tea party, conflagration.

Omeka built-in item types 4 Email A resource containing textual messages and binary attachments sent electronically from one person to another or one person to many people. Lesson Plan Instructional materials. Hyperlink Title, URL, Description or annotation.

Omeka built-in item types 5 Person An individual, biographical data, birth and death, etc. Interactive Resource A resource requiring interaction from the user to be understood, executed, or experienced. Examples include forms on Web pages, applets, multimedia learning objects, chat services, or virtual reality environments

Omeka user types Omeka has user types that are defined in the PHP code. $userRoles = array('admin', 'contributor', 'researcher'); To this we have to add ‘super’ as a super user. We can not change these types, unless we change the PHP code.

http://openlib.org/home/krichel Please shutdown the computers when you are done. Thank you for your attention!

LIS654 lecture 1 omeka installation, system overview Thomas Krichel 2012-01-29.

Similar presentations

Presentation on theme: "LIS654 lecture 1 omeka installation, system overview Thomas Krichel 2012-01-29."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

LIS654 lecture 1 omeka installation, system overview Thomas Krichel 2012-01-29.

Similar presentations

Presentation on theme: "LIS654 lecture 1 omeka installation, system overview Thomas Krichel 2012-01-29."— Presentation transcript:

Similar presentations

About project

Feedback