The SIMSSA DB Data Model

Slides:



Advertisements
Similar presentations
Managing Your Learners In this guide you will learn how to: Add classes to the Manage Your Learners page Add learners to the Manage Your Learners page.
Advertisements

Concepts of Version Control A Technology-Independent View.
William Y. Arms Corporation for National Research Initiatives March 22, 1999 Object models, overlay journals, and virtual collections.
Corporation For National Research Initiatives NSF SMETE Library Building the SMETE Library: Getting Started William Y. Arms.
OU Digital Library development project Liz Mallett – Project Manager James Alexander – Project Developer 25 January 2012.
Databases From A to Boyce Codd. What is a database? It depends on your point of view. For Manovich, a database is a means of structuring information in.
Organizing Information Digitally Norm Friesen. Overview General properties of digital information Relational: tabular & linked Object-Oriented: inheritance.
Using Metadata Skills for a Course Inventory Lee Richardson Health Sciences Library University of North Carolina at Chapel Hill ALA Annual Conference June.
5-7 November 2014 DR Workflow Practical Digital Content Management from Digital Libraries & Archives Perspective.
Moodle (Course Management Systems). Assignments 1 Assignments are a refreshingly simple method for collecting student work. They are a simple and flexible.
An Overview of MPEG-21 Cory McKay. Introduction Built on top of MPEG-4 and MPEG-7 standards Much more than just an audiovisual standard Meant to be a.
Databases From A to Boyce Codd. What is a database? It depends on your point of view. For Manovich, a database is a means of structuring information in.
Information Systems and Network Engineering Laboratory II DR. KEN COSH WEEK 1.
Access 2013 Microsoft Access 2013 is a database application that is ideal for gathering and understanding data that’s been collected on just about anything.
Information Systems & Databases 2.2) Organisation methods.
1 herbert van de sompel CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel
ITGS Databases.
Electronic labnotes Mari Wigham COMMIT/. Information WUR  Organising, sharing, finding and reusing data  Expertise in: ● Modelling data.
Developing a Dark Archive for OJS Journals Yu-Hung Lin, Metadata Librarian for Continuing Resources, Scholarship and Data Rutgers University 1 10/7/2015.
Discover ScholarSphere A repository service collaboration between the University Libraries and ITS.
1 Midterm Examination. 2 General Observations Examination was too long! Most people submitted by .
Core LIMS Training: Key Concepts & Definitions.
Databases: What they are and how they work
Lesson 10 Databases.
Information Systems and Network Engineering Laboratory II
Implementation Process
About me Civil engineer (not in IT) and self-taught developer
Digital Stewardship Curriculum
1 TOOL DESIGN A Review of Learning Design:
Web Routing Designing an Interface
Mukurtu CMS Review, Enriching DH Items
Data Workflow Data Management Workshop
Using computers to search electronic databases
CIS 155 Table Relationship
Publishing software and data
Software Documentation
Topic J: Gathering evidence 3. Strategic paper gathering
Linking persistent identifiers at the British Library
VI-SEEM Data Repository
Adding Assignments and Learning Units to Your TSS Course
Using the Kilgore College Library Online Resources
An Overview of MPEG-21 Cory McKay.
MVC Framework, in general.
What’s New in Colectica 5.3 Part 1
Using OCLC Work IDs for Discovery
Sophia Lafferty-hess | research data manager
Module 6: Preparing for RDA ...
Data Management: Documentation & Metadata
Using the Kilgore College Library Online Resources
Basic Concepts in Data Management
Increased Efficiency and Effectiveness
FRAD: Functional Requirements for Authority Data
Reporting Based on Data in Archivists’ Toolkit
GCSE Computing Databases.
Hydra: a case study Chris Awre
Metadata - Catalogues and Digitised works
2-1-1 Automated Verifications
Building a Custom Gadget in OU Campus
INFO/CSE 100, Spring 2006 Fluency in Information Technology
MUMT611: Music Information Acquisition, Preservation, and Retrieval
Spreadsheets, Modelling & Databases
Beyond Description: Metadata for Catalogers in the 21st Century
USER MANUAL - WORLDSCINET
Attributes and Values Describing Entities.
Chapter 3 Database Management
UML  UML stands for Unified Modeling Language. It is a standard which is mainly used for creating object- oriented, meaningful documentation models for.
APE EAD3 introduction - DARIAH - Brussels
USER MANUAL - WORLDSCINET
Presentation transcript:

The SIMSSA DB Data Model Presented by Emily Hopkins SIMSSA DB team: Cory McKay, Gustavo Polins Pedro, Yaolong Ju, Ichiro Fujinaga, and Julie Cumming SIMSSA Workshop XVII: Infrastructure for Music Discovery CIRMMT, McGill University, Montreal QC, 1 Dec. 2018 Hello again! In addition to serving as project manager, I’m also part of a team working on the SIMSSA Database. Cory McKay is actually the one who led the development of the first version of this data model, but he couldn’t be here this weekend so I’m presenting instead. The SIMSSA Database is the successor of an older database created as part of Julie’s Digging into Data grant, and it was designed to gather symbolic music files in one place to do computer-aided counterpoint analysis. Since then it’s evolved into part of the SIMSSA Project. I’m going to give you some background on what goes into the conceptual model for this database, and then Yaolong and Gustavo will take you through the implementation and user interface after.

How do we model music? How do we track provenance? First, two big questions: How do we model music and how do we track provenance? In modelling music, it’s important to consider who is using these materials and what they might expect. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Digital musicologists want to study abstract works. So – digital musicologists want to study abstract works! This is definitely a bit of a generalization. But for computational musicology looking at musical content, we mostly want to be able to study large numbers of pieces of music at the same time, organized by composer or time period or key or instrument. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Digital musicologists want to study abstract works. e. g Digital musicologists want to study abstract works. e.g. “Beethoven’s Third Symphony” So an example of this kind of “abstract work” would be a Beethoven symphony. This doesn’t refer to a particular score or edition, it’s just the abstract idea of a combination of musical elements that make up Beethoven’s third. To be clear -- Beethoven didn’t actually sit down to write “Beethoven’s Third Symphony-the-abstract-work”; he wrote a piece of music on particular pieces of paper. This idea of a work existing outside material reality is an abstraction we use to organize things. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Digital musicologists want to study abstract works. e. g Digital musicologists want to study abstract works. e.g. “Beethoven’s Third Symphony” Libraries and databases deal with describing and storing items. On the other hand libraries and databases, deal with describing and storing particular items. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Digital musicologists want to study abstract works. e. g Digital musicologists want to study abstract works. e.g. “Beethoven’s Third Symphony” Libraries and databases deal with describing and storing items. e.g. “IMSLP46066-PMLP02581-Op.55.pdf” We have to store Beethoven’s third as manifested in Breitkopf & Härtel’s 1862 edition as made available to us in a file from IMSLP. It’s tempting to imagine that we can just digitize the abstract notion of Beethoven’s third symphony – but we can’t. We have to digitize a particular edition, using a particular kind of software, making various decisions as we go about how to represent this work. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

IFLA-LRM Work: Beethoven Symphony No. 3 (Inernational Federation of Library Associations and Institutions -- Library Reference Model) Work: Beethoven Symphony No. 3 Expression: piano reduction, study score Manifestation: particular edition (e.g., Breitkopf & Härtel, 1862) Item: specific object (the file as it exists stored on IMSLP) In working out how to translate this to a database, we’ve made some attempt to refer to the IFLA-LRM and other library-based conceptual models. IFLA = International Federation of Library Associations and Institutions, LRM = Library Reference Model In the IFLA-LRM work is “the intellectual or artistic content of a distinct creation” – NOT a material object Expression: “a distinct combination of signs conveying intellectual or artistic content” (translation, piano reduction) Manifestation: a particular edition, by a particular publisher –for example Breitkopf and Härtel in 1862. Item: specific object -- not just a particular edition of a novel, but an actual copy of it; here the file on IMSLP. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

IFLA-LRM SIMSSA DB Data Model Work: Beethoven Symphony No. 3 Expression: piano reduction, study score Manifestation: particular edition (e.g., Breitkopf & Härtel, 1862) Item: specific object (the file as it exists stored on IMSLP) Work: Beethoven Symphony No. 3 Related work: piano reduction Source: particular edition (e.g., Breitkopf & Härtel, 1862) File: specific object (the file as it exists stored on IMSLP) Now here is our model on the right. Essentially we have collapsed work and expression, instead expressing that dynamic through relating works to other works. We have prioritized musical content equivalence over other aspects – so a piano reduction would be a related work, because the music is different, whereas a study score would be equivalent. Source maps quite closely on to manifestation. File then maps on to “item” – because in the end we are primarily storing and describing individual files. Now that we’ve established this basic framework, we can start building our database! Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Entity-Relationship Diagram Enter the SIMSSA DB entity-relationship diagram. This is a type of diagram often used for building databases. Everything in a box is an entity – such as a musical work, a source, or an audio file, and the arrows in between define the relationships between the entities. All the lines, circles, and little chicken feet indicate constraints on these relationships. A line is one, a circle is zero, and the chicken feet represents “many”. FOR EXAMPLE, a section relates to one and only one musical work, but a single musical work can have multiple sections. I won’t take you through every relationship in detail, but I will give you an overview of some of the key features and how they work. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Symbolic Music Files This here is the symbolic music file: the heart of the SIMSSA database. Here you can see we keep track of how it is encoded, what musical works it links to (via the source instantiation), and what research projects it belongs to. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Support for different file types Of course, we want other file types too. Text files allow us to include lyrics to songs, audio files may connect with automated audio transcription in the future, and image files are a key part of connecting this to the OMR process you’ve heard about today. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Modelling Musical Works A Musical work is a complete, abstract work. Musical works can be related to other musical works – for example, a symphony could be made available in a piano reduction. The musical content is different but still clearly related. A musical work can be divided into sections – for example, an opera aria, or a movement in a symphony. We made it possible to relate sections to other sections as well – for example if musical material is reused in different masses, or an opera aria is also available with a piano part as a separate publication. Parts are single voices or instruments – we relate them back to sections as they can vary between movements, for example, but it’s a tricky thing to model well on the user end and we’re still working on it. Most of the files we’ve encountered so far are full scores, not parts, but we do want it to be possible. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Movement 3: Allegro vivace Oboe I Beethoven Symphony No. 3 Movement 3: Allegro vivace Oboe I BeethovenSymphony3-III-allegrovivace.xml Here is how we might model the third movement of Beethoven 3 on the actual diagram so far. We have our abstract work, we have its section, the third movement, the corresponding symbolic music file, and a first oboe part. But wait -- which edition did we use to get our notes from? Where did we get our file? This brings us to the next big question. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

How do we track provenance? Provenance is really important for reusability. How does someone know if something is trustworthy if they don’t know where we got it from, or what we did to get it into the database? Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Tracking Provenance First we need to track sources. Source instantiation and source are basically the same thing; the separate abstraction allows us to be more specific about which files our sections come from. Together they represent the source -- the particular edition or manuscript you got your notes from. You can also relate sources to other sources – so your edition that you consulted could be based on an earlier manuscript, for example. Collection of sources could be something like a hymnal or a fake book – multiple different works gathered together. An archive on the other hand is a repository of documents – so not a published collection, but either a physical library or institution, or an online database where music is stored. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Tracking Provenance Another part of provenance is keeping track of who did what when. We have entities for tracking how different file types are encoded, as well as validation which can track verifying the quality of files and can include details about workflow or reliability. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Movement 3: Allegro vivace Oboe I Beethoven Symphony No. 3 Movement 3: Allegro vivace Oboe I BeethovenSymphony3-III-allegrovivace.xml Beethoven Symphony No. 5, pp. 24-56 of specific edition (movement 3) Encoded in Musescore Here is how we might model the third movement of Beethoven 3 on the actual diagram at this point. You can see we’ve added that the symbolic file was encoded using musescore, and we have added the pages we got our notes from, and the edition, and then maybe it’s actually part of a published collection of all of Beethoven’s symphonies, and we found the file on IMSLP. Collected symphonies Beethoven Symphony No. 5 score published by Breitkopf & Härtel in 1862 IMSLP Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Research corpora & content-based search In addition to being able to track where pieces come from and how we generated files, we can also group them within the database. Users can define a research corpus of files they want to work with as a group – sort of like a music playlist, a piece can belong to many collections simultaneously. In addition, every piece in the database automatically has extracted jSymbolic features. These features can be used for content-based music search, which Gustavo and Yaolong will highlight later. Users can search for music by metadata and by musical content! This research corpus dimension allows allows us to preserve which features were used for a particular study, such as the madrigal study Julie told you about earlier. We can keep datasets together with the features involved in a particular study. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Archiving and re-using datasets GitHub SIMSSA DB It takes a lot of time to build a research corpus and ensure that everything is high quality, so we want to be sure we get to keep it! For our own records, so that someone else could reproduce our experiments, or so that others could use our dataset for new research. Long-term digital preservation is actually quite a tricky problem, which for now we’ve planned to approach in three ways. While datasets are being worked on, they go in GitHub. This allows us to track changes and collaborate really easily, and prevents things from getting lost in Google Drive. SIMSSA DB is great for finished corpus, storing related features, making it discoverable, BUT For citation purposes we need something even more robust, so we have plans to try using Zenodo for “release quality” datasets to cite in papers. Zenodo is an open-access respository run by the folks responsible for CERN and allows us to generate a DOI for a stable, citeable dataset. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Towards the interface Now, on to the database itself! Creating a user interface requires taking a step back from the model I just presented and trying to come up with useful and friendly abstractions that will be intuitive to music researchers. Just because we need a particular field or entity in the backend doesn’t mean the user needs to know! Up next you’ll hear from Gustavo and Yaolong who have been the primary developers of the actual SIMSSA Database. Hopkins, Emily. SIMSSA Workshop XVII: Infrastructure for Music Discovery. CIRMMT, McGill University, Montreal, 1 Dec. 2018

Thank you! https://github.com/ELVIS-Project/simssadb @simssaproject emily.hopkins@mcgill.ca