Case Study: Report from the Front Lines of Digital Asset Management at CNN
Kathy Christensen, CNN News Archives
August 2001
2 CNN Background
Multiple products: CNN, Headline News, CNN International, CNN.com et al, CNN/SI, CNNfn, CNN en Espanol, Airport Network, Inflight
CNN Library as central resource
–Information research
–Archive
–Footage licensing
3 What's in the CNN archive?
Type of material
–10%: programs (Larry King, Crossfire, etc.)
–90%: raw footage & edited cut items (pkgs, sots, vos)
Volume
–150,000+ hours of footage in Atlanta, plus additional footage in bureaus
–1,000,000+ items in Atlanta central catalog, plus 600,000 across bureau catalogs
Growth
–2,000 items archived per week in Atlanta, culled from many times more incoming items
  1/3 of items per day are cut (3 hrs)
  2/3 of items per day are raw (90 hrs)
–30,000 hours archived in 2000
4 Who are the archive clients?
CNN
–daily news - TV and Interactive
–documentary - TV and Interactive
–other (Sales, Marketing, PR, Legal, etc.)
AOL-TW companies (TNT, TBS, Warner Bros)
External customers (Imagesource clients)
5 The Archive Project (aka core of CNN's digital future)
Purpose
–Preserve assets
–Extend usage of assets
–Create efficiencies
–Facilitate new business opportunities
–Create media management framework for the digital CNN
6 Pre-Digital Scenario
7 Digital Scenario
8 System goals and challenges
–Multiple resolutions captured simultaneously - to serve broadcast, edit and Internet
–Generate as much meaningful cataloging data automatically as possible - technology continuing to improve
–Support the necessary human cataloging with powerful tools
–Support retrieval needs of diverse user communities
9 Our Approach
–Assemble a diverse internal team with multidisciplinary expertise: R&D, Engineering, IT, Library Science, Users
–Co-developers with Sony and IBM
Key Principles
–Custom solution not desired
–Focus on interoperability and standards
–Phased development: get started and build on it
10 Users drive cataloging & search requirements
Production usually demands video "of" versus stories "about"
–Automatically captured narrative track is excellent for finding the "about" but often misses the "of" -- what do we see in the footage?
–Special challenge of raw video -- b-roll often has no track to capture
High-pressure, fast-turnaround, 24-hour environment requires highly precise results, extremely quickly
Long-term documentary production can tolerate more browsing but still requires reliably comprehensive retrieval
News domain requires reliance on accuracy of editorial metadata - bad data and inadequate search systems equal journalistic problems
11 Enablers of accuracy, precision, speed, thoroughness
Controlled vs free-form data entry - build data entry aids which support consistent entry
Adequate size for keyword and video description fields
Controlled classification terms with a mechanism for dynamically updating the classifications
Fielded tags for
–best of video
–about but not seen
–natural sound
Flexibility in search approaches - free-text, controlled vocabulary, field-specific, user control over precision vs fuzziness, user control over tracks to include, user control over weighting and display of results
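The fielded tags and field-specific search listed above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `CatalogItem` record, the `CONTROLLED_TERMS` vocabulary, the `search` helper and the sample data are hypothetical and do not describe the actual CNN catalog schema.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary; the slides do not list CNN's actual terms.
CONTROLLED_TERMS = {"politics", "weather", "sports"}

@dataclass
class CatalogItem:
    item_id: str
    description: str                                      # free-text video description
    classifications: set = field(default_factory=set)     # controlled terms only
    best_of_video: list = field(default_factory=list)     # fielded tag
    about_not_seen: list = field(default_factory=list)    # fielded tag
    natural_sound: list = field(default_factory=list)     # fielded tag

    def __post_init__(self):
        # Reject classification terms that are not in the controlled vocabulary.
        unknown = set(self.classifications) - CONTROLLED_TERMS
        if unknown:
            raise ValueError(f"uncontrolled classification terms: {sorted(unknown)}")

TEXT_FIELDS = ("description", "best_of_video", "about_not_seen", "natural_sound")

def search(items, term, fields=TEXT_FIELDS):
    """Return ids of items whose chosen fields contain the term (case-insensitive)."""
    needle = term.lower()
    hits = []
    for item in items:
        for name in fields:
            value = getattr(item, name)
            texts = [value] if isinstance(value, str) else value
            if any(needle in text.lower() for text in texts):
                hits.append(item.item_id)
                break  # one hit per item is enough
    return hits

items = [
    CatalogItem("A1", "Reporter package on storm damage", {"weather"},
                best_of_video=["aerials of flooded streets"],
                about_not_seen=["hurricane relief funding"]),
    CatalogItem("A2", "Raw b-roll, coastline", {"weather"},
                best_of_video=["hurricane surf hitting pier"],
                natural_sound=["wind", "waves"]),
]

everywhere = search(items, "hurricane")                        # all text fields
seen_only = search(items, "hurricane", fields=("best_of_video",))
```

Restricting the search to `best_of_video` separates footage that shows a hurricane from items that are merely about one, which is the "of" versus "about" distinction producers care about.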
12 Technology strengths supplement human weaknesses
Automatic capture of closed-caption text improves retrieval of small, specific portions of programming "about" something -- a viewer need which is not easily met now.
Voice-to-text transcription, even at 60% accuracy, fills a not-easily-met need to find specific soundbites in raw speeches, interviews, hearings, etc.
Video-to-video matching supports identification of permutations of the same video piece across the catalog
13 Technology strengths supplement human strengths
Making sense of images, putting them into editorial context, and attaching words so they may be retrieved
–Automatic scene change detection facilitates speedy review of item by human cataloger
–Face recognition software may not know who a particular face is, but can know that the video contains a face which a human can then identify
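As a rough illustration of how automatic scene change detection can speed up a cataloger's review, the Python sketch below flags a cut wherever consecutive frames differ sharply. It is a simplified stand-in and not the method used in the CNN system: frames are assumed to be flat lists of grayscale pixel values, and the `scene_changes` name and the threshold of 50 are hypothetical choices.

```python
def scene_changes(frames, threshold=50.0):
    """Return indices of frames that start a new scene.

    frames: list of equal-length lists of grayscale values (0-255).
    A cut is reported when the mean absolute pixel difference from the
    previous frame exceeds the threshold (an assumed, untuned value).
    """
    cuts = []
    for i in range(1, len(frames)):
        previous, current = frames[i - 1], frames[i]
        diff = sum(abs(a - b) for a, b in zip(previous, current)) / len(current)
        if diff > threshold:
            cuts.append(i)
    return cuts

# Four tiny 4-pixel "frames": two dark frames followed by two bright ones.
frames = [[10] * 4, [12] * 4, [200] * 4, [198] * 4]
cuts = scene_changes(frames)
```

A cataloger could then jump straight to one representative frame per detected scene instead of watching the whole item.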
14 Technology strengths also supplement technology weaknesses
Speech-to-text weakness - some of the data most likely to be searched on: names of people, companies, places
–Phonetic-based search strengths can cover speech-to-text search weakness
Phonetic track useful for searching but doesn't provide textual cataloging data
–Speech-to-text transcription useful as representation of the content of the asset
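The idea that phonetic matching can cover for misspelled names in a transcript can be sketched in Python. Note the substitution: a production phonetic search would index sounds taken from the audio itself, whereas this sketch applies the classic Soundex code to transcript words as a simple stand-in. The `phonetic_find` helper and the sample transcript are hypothetical.

```python
# Soundex digit for each coded consonant; vowels, h, w and y carry no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the 4-character American Soundex code of a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "hw":          # h and w do not separate repeated codes
            prev = code
    return (encoded + "000")[:4]

def phonetic_find(words, query):
    """Return positions of words that sound like the query."""
    target = soundex(query)
    return [i for i, w in enumerate(words) if soundex(w) == target]

# A transcript in which the recognizer misspelled a name.
transcript = "president milosevich spoke today".split()
exact_hit = "milosevic" in transcript
phonetic_hits = phonetic_find(transcript, "Milosevic")
```

The exact-text lookup misses the misspelled name while the phonetic lookup finds it, which is the complementary behavior the slide describes.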
15 Food for thought …
Responsibilities
–to the parent company
–to the user communities
–to the rightsholders
–to posterity???
This means thinking about
–Physical integrity of the content (quality, lossless conversions, standards, migration)
–Intellectual integrity of content … ethics