Published by Alvin Darcy Parsons. Modified over 9 years ago.
CAVA: a human Communication Audio-Visual Archive

Matt Mahon [1], Suzanne Beeke [1], Merle Mahon [2] and Martin Moyle [3]
UCL Departments of Language and Communication [1], Developmental Science [2] and Library Services [3]

Why is CAVA needed?

The CAVA project aims to establish a repository for audio-visual data on real-life human communication for spoken and signed languages. In order to investigate human communication and interaction, researchers need hours of audio-visual data, sometimes recorded over periods of months or years. Collecting and cataloguing such valuable data is time-consuming and expensive. Once it is collected and ready to use, it makes sense to get the maximum value from it by reusing it and sharing it among the research community.

Data and formats

Clockwise from above: Dissemination-quality video (MPG) [a]; preservation video (AVI) [b]; preferred format standards.

                      Capture   Preservation  Download  Streaming  Audio-only
File type             AVI       AVI           MPG       FLV        WAV
Video
  Codec               [DVSD]    DV25          MPEG-1    On2 VP6    N/A
  Data rate (kbps)    28800     28800         3024      400        N/A
  Frames/sec          25        25            25        25         N/A
  Frame size          720x576   720x576       720x576   480x360    N/A
Audio
  Codec               PCM       PCM           MP2       MP3        PCM
  Data rate (kbps)    1024      1024          224       128        1024
  Sampling rate (Hz)  44100     44100         44100     44100      44100
  Channels            2         2             2         2          2
  Sample precision    16-bit    16-bit        16-bit    16-bit     16-bit

Metadata

It is not enough to simply collect and standardise the quality of the data; it must be readily searchable. Natural audio-visual data tends to defy easy classification, and may lead to idiosyncratic solutions to preservation, metadata and access issues. CAVA uses a modified metadata standard based on the ISLE MetaData Initiative (IMDI), a schema designed for language resources. Principally the UCL Deafness, Cognition and Language Research unit (DCAL) subset, the CAVA subset presents a pragmatic solution. All the information required for the metadata record is information normally collected in the course of research; fields which do not apply may be left blank.

Below: A complete metadata record.
This record includes an MPEG video file, a WAV audio file and a transcription in Word format.

Still images from video:
[a, b]: ‘1 AB 10-04 T’, Mahon, M. Department of Health and University College London, EAL Deaf Children study, 2009.
[c]: ‘D3RA5’, Beeke, S. University College London, The Evaluation of a Novel Conversation-focused Therapy for Agrammatism study, 2009.

Our website: www.ucl.ac.uk/ls/cava
The archive: http://digitool.ucl.ac.uk

Pilot

The CAVA pilot launched in September 2009, with four objects in the archive. The repository, which is still in development, now contains four datasets with over 170 hours of audio-visual data. The CAVA team will also be piloting limited access to datasets through UCL’s VLE, Moodle.

The CAVA team are currently accepting data for dissemination from researchers at a variety of institutions, and are considering requests to access data from the repository. If you are interested in including your data in the repository, or accessing the data we hold, please contact the Project Officer at lib-cava@ucl.ac.uk.

Above: Preservation-quality video (AVI) [c].

Access

Well-implemented access management is crucial to the success of the repository, given the wide range of ethical and copyright restrictions on the data. As the data is collected it is stored using the UCL Library Services Digital Collections service, which runs on the Ex Libris DigiTool platform. Access to Digital Collections requires a unique login and password which will be assigned by the CAVA team upon completion of the end user licence.

Video clips, transcripts (where available) and descriptive metadata can be uploaded to the repository in batches, maintaining the relationships between the one or more versions of each video recording. Technical metadata is generated automatically, and appropriate access restrictions and exceptions are applied. All data accepted by the archive will have appropriate permissions for the various types of dissemination.
Users will be able to download compressed video or uncompressed audio-only files.

Above left: CAVA on the UCL Digital Collections front page. Above right: The CAVA repository main page.

Natural data can often be used for more than the purpose its collector intended. Researchers may be able to save time and money, or improve the depth of their observations and conclusions, by reusing existing data instead of collecting their own.

What formats will CAVA manage?

The data which will be placed in the repository comes from a wide range of sources, in a wide range of formats. Consequently it has a wide range of software requirements, depending on the equipment used to make the recordings. Our aim is to introduce uniformity where practical, ideally archiving an audio-only and a compressed video copy of each recording.

As well as the data itself, a small sample video from each data set will be available by streaming at collection level, so that potential users can explore the repository and select the collections most appropriate to their work.

Above: A pilot browse structure.

Below: A workflow for uploading data and gaining access to the repository.

Depositing data:
1. START: Depositor completes metadata form and licences (Project officer is available to help with completion of the metadata)
2. CAVA team receives metadata form, licences and the data itself
3. Project officer prepares data for upload to the repository
4. Data is uploaded in batches

Gaining access:
1. Prospective user completes licence forms
2. CAVA team arranges user access to the repository

Both paths end at the same point: the data is made available through the repository, and appropriate users are given access.
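The two paths of the workflow above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class names `Deposit` and `Archive`, their fields and methods are hypothetical, and nothing here reflects the DigiTool platform's actual interface. It also simplifies access to a single licence check, whereas the poster describes restrictions and exceptions applied to the data itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Deposit:
    """A dataset on the depositor side of the workflow (hypothetical model)."""
    name: str
    metadata_form_done: bool = False
    licences_done: bool = False
    uploaded: bool = False


@dataclass
class Archive:
    """Toy stand-in for the repository; not the DigiTool API."""
    deposits: Dict[str, Deposit] = field(default_factory=dict)
    licensed_users: Set[str] = field(default_factory=set)

    def receive(self, deposit: Deposit) -> None:
        # The CAVA team receives the metadata form, licences and data together.
        if not (deposit.metadata_form_done and deposit.licences_done):
            raise ValueError(f"{deposit.name}: metadata form and licences are required")
        self.deposits[deposit.name] = deposit

    def upload_batch(self, names: Iterable[str]) -> None:
        # Prepared data is uploaded in batches.
        for name in names:
            self.deposits[name].uploaded = True

    def register_user(self, user: str) -> None:
        # Called once a prospective user has completed the licence forms.
        self.licensed_users.add(user)

    def can_access(self, user: str, name: str) -> bool:
        # Access needs both paths to be complete: data uploaded, user licensed.
        deposit = self.deposits.get(name)
        return deposit is not None and deposit.uploaded and user in self.licensed_users


if __name__ == "__main__":
    archive = Archive()
    archive.receive(Deposit("example-study", metadata_form_done=True, licences_done=True))
    archive.upload_batch(["example-study"])
    archive.register_user("researcher@example.org")
    print(archive.can_access("researcher@example.org", "example-study"))
```

The point of the sketch is the final check: a dataset becomes accessible only when the depositor path (forms, licences, upload) and the user path (licence forms, access arranged) have both been completed.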