2004.11.23 - IS 202 – FALL 2004. Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS. Tuesday and Thursday 10:30 am - 12:00 pm, Fall 2004.


SLIDE 1: SIMS 202: Information Organization and Retrieval
Lecture 23: Media Streams and MPEG-7
Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm, Fall 2004

SLIDE 2: Today's Agenda
- Review of Last Time
- Metadata for Video
  - Representing Video
  - Media Streams
  - MPEG-7
- Discussion Questions
- Action Items for Next Time

SLIDE 3: Today's Agenda
- Review of Last Time
- Metadata for Video
  - Representing Video
  - Media Streams
  - MPEG-7
- Discussion Questions
- Action Items for Next Time

SLIDE 4: The Media Opportunity
- Vastly more media will be produced
- Without ways to manage it (metadata creation and use), we lose the advantages of digital media
- Most current approaches are insufficient and perhaps misguided
- Great opportunity for innovation and invention
- Need interdisciplinary approaches to the problem

SLIDE 5: What is the Problem?
- Today people cannot easily find, edit, share, and reuse media
- Computers don't understand media content
  - Media is opaque and data rich
  - We lack structured representations
- Without content representation (metadata), manipulating digital media will remain like word-processing with bitmaps

SLIDE 6: Signal-to-Symbol Problems
- Semantic Gap
  - Gap between low-level signal analysis and high-level semantic descriptions
  - "Vertical off-white rectangular blob on blue background" does not equal "Campanile at UC Berkeley"

SLIDE 7: Signal-to-Symbol Problems
- Sensory Gap
  - Gap between how an object appears and what it is
  - Different images of the same object can appear dissimilar
  - Images of different objects can appear similar

SLIDE 8: Traditional vs. Metadata-Centric Production Chain
[Diagram: pre-production, production, post-production, and distribution stages, with metadata running through every stage of the metadata-centric chain]

SLIDE 9: Evolution of Media Production
- Customized production
  - Skilled creation of one media product
- Mass production
  - Automatic replication of one media product
- Mass customization
  - Skilled creation of adaptive media templates
  - Automatic production of customized media

SLIDE 10: Today's Agenda
- Review of Last Time
- Metadata for Video
  - Representing Video
  - Media Streams
  - MPEG-7
- Discussion Questions
- Action Items for Next Time

SLIDE 11: Representing Video
- Streams vs. clips
- Video syntax and semantics
- Ontological issues in video representation

SLIDE 12: Video is Temporal

SLIDE 13: Streams vs. Clips

SLIDE 14: Stream-Based Representation
- Makes annotation pay off
  - The richer the annotation, the more numerous the possible segmentations of the video stream
- Clips
  - Change from being fixed segmentations of the video stream to being the results of retrieval queries based on annotations of the video stream
- Annotations
  - Create representations which make clips, not representations of clips
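The idea above can be sketched in a few lines of code. This is an illustrative sketch only, not Media Streams' actual implementation; the `Annotation` class and `query_clips` function are hypothetical names chosen for the example. The point is that "clips" are not stored objects but segments produced on demand by querying time-indexed annotations of one continuous stream.

```python
# Illustrative sketch (hypothetical names, not the Media Streams code):
# stream-based annotation, where clips are the *results* of queries over
# time-indexed annotations rather than fixed pre-cut segments.

from dataclasses import dataclass

@dataclass
class Annotation:
    start: float      # seconds into the stream
    end: float
    descriptor: str   # e.g., "dog", "biting", "Steve"

def query_clips(annotations, descriptor):
    """Return (start, end) segments whose annotation matches the query.

    Each query produces its own segmentation of the stream, so richer
    annotation yields more possible "clips" from the same footage.
    """
    return [(a.start, a.end) for a in annotations if a.descriptor == descriptor]

stream = [
    Annotation(0.0, 12.0, "dog"),
    Annotation(4.0, 8.0, "biting"),
    Annotation(4.0, 15.0, "Steve"),
]
print(query_clips(stream, "dog"))  # [(0.0, 12.0)]
```

Three overlapping annotations here already yield three distinct segmentations of the same footage, which is the sense in which annotations "make clips."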

SLIDE 15: Video Syntax and Semantics
- The Kuleshov Effect
- Video has a dual semantics
  - Sequence-independent invariant semantics of shots
  - Sequence-dependent variable semantics of shots

SLIDE 16: Ontological Issues for Video
- Video plays with rules for identity and continuity
  - Space
  - Time
  - Person
  - Action

SLIDE 17: Space and Time: Actual vs. Inferable
- Actual recorded space and time
  - GPS
  - Studio space and time
- Inferable space and time
  - Establishing shots
  - Cues and clues

SLIDE 18: Time: Temporal Durations
- Story (fabula) duration
  - Example: brushing teeth in the story world (5 minutes)
- Plot (syuzhet) duration
  - Example: brushing teeth in the plot world (1 minute: 6 steps of 10 seconds each)
- Screen duration
  - Example: brushing teeth on screen (10 seconds: 2 shots of 5 seconds each)

SLIDE 19: Character and Continuity
- Identity of character is constructed through
  - Continuity of actor
  - Continuity of role
- Alternative continuities
  - Continuity of actor only
  - Continuity of role only

SLIDE 20: Representing Action
- Physically-based description for sequence-independent action semantics
  - Abstract vs. conventionalized descriptions
  - Temporally and spatially decomposable actions and subactions
- Issues in describing sequence-dependent action semantics
  - Mental states (emotions vs. expressions)
  - Cultural differences (e.g., bowing vs. greeting)

SLIDE 21: "Cinematic" Actions
- Cinematic actions support the basic narrative structure of cinema
  - Reactions/proactions: nodding, screaming, laughing, etc.
  - Focus of attention: gazing, head-turning, pointing, etc.
  - Locomotion: walking, running, etc.
- Cinematic actions can occur
  - Within the frame/shot boundary
  - Across the frame boundary
  - Across shot boundaries

SLIDE 22: The Search for Solutions
- Current approaches to creating metadata don't work
  - Signal-based analysis
  - Keywords
  - Natural language
- Need a standardized metadata framework
  - Designed for video and rich media data
  - Human- and machine-readable and -writable
  - Standardized and scalable
  - Integrated into media capture, archiving, editing, distribution, and reuse

SLIDE 23: Signal-Based Parsing
- Practical problem
  - Parsing unstructured, unknown video is very, very hard
- Theoretical problem
  - Mismatch between percepts and concepts

SLIDE 24: Perceptual/Conceptual Issue
Clown nose vs. red sun: similar percepts, dissimilar concepts

SLIDE 25: Perceptual/Conceptual Issue
John Dillinger's car vs. Timothy McVeigh's car: dissimilar percepts, similar concepts

SLIDE 26: Signal-Based Parsing
- Effective and useful automatic parsing
  - Video: shot boundary detection; camera motion analysis; low-level visual similarity; feature tracking; face detection
  - Audio: pause detection; audio pattern matching; simple speech recognition; speech vs. music detection
- Approaches to automated parsing
  - At the point of capture, integrate the recording device, the environment, and agents in the environment into an interactive system
  - After capture, use "human-in-the-loop" algorithms to leverage human and machine intelligence
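Shot boundary detection, the first item in the list above, is a good example of what signal-based parsing can do reliably. A common textbook approach (assumed here as an illustration; the lecture does not specify an algorithm) compares intensity histograms of consecutive frames and declares a cut when the distance between them jumps:

```python
# Illustrative sketch of histogram-difference shot boundary detection
# (a standard technique, assumed for illustration; not code from the lecture).
# A frame is reduced to an intensity histogram; a large jump in the
# distance between consecutive histograms suggests a cut.

def histogram(frame, bins=8):
    """Bin a frame (flat list of 0-255 intensities) into a normalized histogram."""
    counts = [0] * bins
    for px in frame:
        counts[min(px * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]

def shot_boundaries(frames, threshold=0.5):
    """Return indices of frames that start a new shot."""
    cuts = []
    hists = [histogram(f) for f in frames]
    for i in range(1, len(hists)):
        # L1 distance between consecutive normalized histograms (0..2)
        d = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if d > threshold:
            cuts.append(i)
    return cuts

# Two synthetic "shots": dark frames, then bright frames
dark = [10] * 100
bright = [240] * 100
print(shot_boundaries([dark, dark, dark, bright, bright]))  # [3]
```

The same sketch also shows why this is the easy case: gradual transitions (fades, dissolves) spread the histogram change over many frames and slip under a single-step threshold, which is one reason parsing unstructured video stays hard.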

SLIDE 27: Keywords vs. Semantic Descriptors
dog, biting, Steve

SLIDE 28: Keywords vs. Semantic Descriptors
dog, biting, Steve

SLIDE 29: Why Keywords Don't Work
- Are not a semantic representation
- Do not describe relations between descriptors
- Do not describe temporal structure
- Do not converge
- Do not scale

SLIDE 30: Natural Language vs. Visual Language
"Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking."

SLIDE 31: Natural Language vs. Visual Language
"Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking."

SLIDE 32: Notation for Time-Based Media: Music

SLIDE 33: Visual Language Advantages
- A language designed as an accurate and readable representation of time-based media
  - For video, especially important for actions, expressions, and spatial relations
- Enables a Gestalt view and quick recognition of descriptors due to designed visual similarities
- Supports global use of annotations

SLIDE 34: Today's Agenda
- Review of Last Time
- Metadata for Video
  - Representing Video
  - Media Streams
  - MPEG-7
- Discussion Questions
- Action Items for Next Time

SLIDE 35: After Capture: Media Streams

SLIDE 36: Media Streams Features
- Key features
  - Stream-based representation (better segmentation)
  - Semantic indexing (what things are similar to)
  - Relational indexing (who is doing what to whom)
  - Temporal indexing (when things happen)
  - Iconic interface (designed visual language)
  - Universal annotation (standardized markup schema)
- Key benefits
  - More accurate annotation and retrieval
  - Global usability and standardization
  - Reuse of rich media according to content and structure

SLIDE 37: Media Streams GUI Components
- Media Time Line
- Icon Space
  - Icon Workshop
  - Icon Palette

SLIDE 38: Media Time Line
- Visualize video at multiple time scales
- Write and read multi-layered iconic annotations
- One interface for annotation, query, and composition

SLIDE 39: Media Time Line

SLIDE 40: Icon Space
- Icon Workshop
  - Utilize categories of video representation
  - Create iconic descriptors by compounding iconic primitives
  - Extend the set of iconic descriptors
- Icon Palette
  - Dynamically group related sets of iconic descriptors
  - Reuse the descriptive effort of others
  - View and use query results

SLIDE 41: Icon Space

SLIDE 42: Icon Space: Icon Workshop
- General to specific (horizontal)
  - Cascading hierarchy of icons with increasing specificity on subordinate levels
- Combinatorial (vertical)
  - Compounding of hierarchically organized icons across multiple axes of description

SLIDE 43: Icon Space: Icon Workshop Detail

SLIDE 44: Icon Space: Icon Palette
- Dynamically group related sets of iconic descriptors
- Collect icon sentences
- Reuse the descriptive effort of others

SLIDE 45: Icon Space: Icon Palette Detail

SLIDE 46: Video Retrieval in Media Streams
- Same interface for annotation and retrieval
- Assembles responses to queries as well as finds them
- Query responses use semantics to degrade gracefully

SLIDE 47: Media Streams Technologies
- Minimal video representation distinguishing syntax and semantics
- Iconic visual language for annotating and retrieving video content
- Retrieval-by-composition methods for repurposing video

SLIDE 48: Non-Technical Challenges
- Standardization of media metadata (MPEG-7)
- Broadband infrastructure and deployment
- Intellectual property and economic models for sharing and reuse of media assets

SLIDE 49: Today's Agenda
- Review of Last Time
- Metadata for Video
  - Representing Video
  - Media Streams
  - MPEG-7
- Discussion Questions
- Action Items for Next Time

SLIDE 50: The Search for Solutions
- Current approaches to creating metadata don't work
  - Signal-based analysis
  - Keywords
  - Natural language
- Need a standardized metadata framework
  - Designed for video and rich media data
  - Human- and machine-readable and -writable
  - Standardized and scalable
  - Integrated into media capture, archiving, editing, distribution, and reuse

SLIDE 51: Standards Overview
- Why do we need multimedia standards?
  - Reliability
  - Scalability
  - Interoperability
  - Layered architecture
- De facto standards
  - Not legislated, but widely adopted
- De jure standards
  - Legislated, but not necessarily widely adopted

SLIDE 52: Multimedia Standards Process
- Market dominance
  - Microsoft (e.g., Internet Explorer, Windows Media Player)
  - Sony (e.g., VHS, MiniDV)
  - Adobe (e.g., PDF)
- International standards organizations
  - ISO
  - MPEG
  - SMPTE

SLIDE 53: MPEG Standards
- MPEG-1
  - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s
- MPEG-2
  - Generic coding of moving pictures and associated audio information
- MPEG Audio Layer-3 (MP3)
  - Audio compression
- MPEG-4
  - Standardized technological elements enabling the integration of production, distribution, and content access paradigms

SLIDE 54: MPEG-4
- Represents units of aural, visual, or audiovisual content, called "media objects"
  - These media objects can be of natural or synthetic origin (recorded with a camera or microphone, or generated with a computer)
- Describes the composition of these objects to create compound media objects that form audiovisual scenes
- Synchronizes the data associated with media objects, so that they can be transported over network channels providing a QoS appropriate to the nature of the specific media objects
- Supports interaction with the audiovisual scene generated at the receiver's end

SLIDE 55: MPEG Standards
- MPEG-7
  - Describes multimedia content data in a way that supports some degree of interpretation of the information's meaning, which can be passed on to, or accessed by, a device or computer code
- MPEG-21
  - A normative open framework for multimedia delivery and consumption, for use by all players in the delivery and consumption chain

SLIDE 56: MPEG-7 Motivation
- Create a standardized multimedia description framework
- Enable content-based access to and processing of multimedia information on the basis of descriptions of multimedia content and structure (metadata)
- Support a range of abstraction levels for metadata, from low-level signal characteristics to high-level semantic information

SLIDE 57: MPEG-7 Query Examples
- Play a few notes on a keyboard and retrieve a list of musical pieces similar to the required tune, or images matching the notes in a certain way, e.g., in terms of emotions
- Draw a few lines on a screen and find a set of images containing similar graphics, logos, ideograms, ...
- Define objects, including color patches or textures, and retrieve examples among which you select the interesting objects to compose your design
- On a given set of multimedia objects, describe movements and relations between objects, and search for animations fulfilling the described temporal and spatial relations
- Describe actions and get a list of scenarios containing such actions
- Using an excerpt of Pavarotti's voice, obtain a list of Pavarotti's records, video clips where Pavarotti is singing, and photographic material portraying Pavarotti
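The first query example, humming a tune to retrieve similar pieces, can be made concrete with a small sketch. This is an illustration of the general query-by-melody idea, not anything defined by MPEG-7 itself; the names (`contour`, `rank`) and the tiny two-piece library are invented for the example. The trick is to compare pitch *intervals* rather than absolute notes, so a tune hummed in a different key still matches:

```python
# Illustrative query-by-melody sketch (hypothetical names, not part of
# MPEG-7): a tune is reduced to its pitch-interval contour, and stored
# pieces are ranked by edit distance to the query's contour, so
# transposed versions of the same melody still match.

def contour(notes):
    """Pitch intervals between successive MIDI notes (transposition-invariant)."""
    return [b - a for a, b in zip(notes, notes[1:])]

def edit_distance(a, b):
    """Standard Levenshtein distance between two sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def rank(query_notes, library):
    """Rank (title, notes) pairs by contour distance to the query."""
    q = contour(query_notes)
    return sorted(library, key=lambda item: edit_distance(q, contour(item[1])))

library = [
    ("ode to joy", [64, 64, 65, 67, 67, 65, 64, 62]),
    ("scale",      [60, 62, 64, 65, 67, 69, 71, 72]),
]
# The same melody hummed a whole tone higher still ranks first:
query = [66, 66, 67, 69, 69, 67, 66, 64]
print(rank(query, library)[0][0])  # ode to joy
```

Edit distance also tolerates a wrong or missing note in the hummed query, which matters because users rarely reproduce a melody exactly.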

SLIDE 58: MPEG-7 Sample Application Areas
- Architecture, real estate, and interior design (e.g., searching for ideas)
- Broadcast media selection (e.g., radio channel, TV channel)
- Cultural services (history museums, art galleries, etc.)
- Digital libraries (e.g., image catalogues, musical dictionaries, bio-medical imaging catalogues, film, video, and radio archives)
- E-commerce (e.g., personalized advertising, on-line catalogues, directories of e-shops)
- Education (e.g., repositories of multimedia courses, multimedia search for support material)
- Home entertainment (e.g., systems for the management of personal multimedia collections, including manipulation of content such as home video editing, searching a game, karaoke)
- Investigation services (e.g., human characteristics recognition, forensics)
- Journalism (e.g., searching speeches of a certain politician using his name, his voice, or his face)
- Multimedia directory services (e.g., yellow pages, tourist information, geographical information systems)
- Multimedia editing (e.g., personalized electronic news services, media authoring)
- Remote sensing (e.g., cartography, ecology, natural resources management)
- Shopping (e.g., searching for clothes that you like)
- Social (e.g., dating services)
- Surveillance (e.g., traffic control, surface transportation, non-destructive testing in hostile environments)

SLIDE 59: MPEG-7 Scope

SLIDE 60: MPEG-7 Metadata Framework
- Data
  - "multimedia information that will be described using MPEG-7, regardless of storage, coding, display, transmission, medium, or technology"
- Feature
  - "a distinctive characteristic of the data [that] signifies something to somebody"

SLIDE 61: MPEG-7 Metadata Framework
- Descriptor
  - "A representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation."
- Description Scheme
  - "The structure and semantics of the relationships between its components, which may be both Descriptors and Description Schemes."
- Description Definition Language (XML Schema)
  - "A language that allows the creation of new Description Schemes and, possibly, new Descriptors. It also allows the extension and modification of existing Description Schemes."
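A tiny example may make the Descriptor / Description Scheme distinction concrete: Descriptors carry individual feature values, and a Description Scheme arranges them into a structure, serialized as XML. The element names below (`VideoSegment`, `MediaTime`, `TextAnnotation`, etc.) are simplified stand-ins modeled loosely on MPEG-7's vocabulary, not a normative instance of the real schemas, which are defined via the DDL.

```python
# Minimal sketch of the idea behind an MPEG-7-style description:
# feature values (Descriptors) nested inside a structure (a Description
# Scheme), serialized as XML. Element names are simplified for
# illustration and are NOT the exact normative MPEG-7 schema.

import xml.etree.ElementTree as ET

def describe_segment(start, duration, annotations):
    """Build a toy video-segment description with time and text annotations."""
    seg = ET.Element("VideoSegment")
    time = ET.SubElement(seg, "MediaTime")          # structural grouping
    ET.SubElement(time, "MediaTimePoint").text = start
    ET.SubElement(time, "MediaDuration").text = duration
    for label in annotations:                       # one Descriptor per label
        ann = ET.SubElement(seg, "TextAnnotation")
        ET.SubElement(ann, "FreeTextAnnotation").text = label
    return seg

seg = describe_segment("T00:00:04", "PT8S", ["dog", "biting", "Steve"])
print(ET.tostring(seg, encoding="unicode"))
```

Because the structure is machine-readable XML, a tool that knows the schema can validate, query, or merge such descriptions without understanding the video itself, which is the interoperability the framework is after.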

SLIDE 62: MPEG-7 Framework

SLIDE 63: MPEG-7 Standard Parts
1. MPEG-7 Systems: the binary format for encoding MPEG-7 descriptions and the terminal architecture
2. MPEG-7 Description Definition Language: the language for defining the syntax of the MPEG-7 Description Tools and for defining new Description Schemes
3. MPEG-7 Visual: the Description Tools dealing (only) with visual descriptions
4. MPEG-7 Audio: the Description Tools dealing (only) with audio descriptions

SLIDE 64: MPEG-7 Standard Parts
5. MPEG-7 Multimedia Description Schemes: the Description Tools dealing with generic features and multimedia descriptions
6. MPEG-7 Reference Software: a software implementation of relevant parts of the MPEG-7 standard with normative status
7. MPEG-7 Conformance Testing: guidelines and procedures for testing conformance of MPEG-7 implementations
8. MPEG-7 Extraction and Use of Descriptions: informative material (in the form of a Technical Report) about the extraction and use of some of the Description Tools (under development)

SLIDE 65: MPEG-7 Description Tools

SLIDE 66: MPEG-7 Top-Level Hierarchy

SLIDE 67: MPEG-7 Still Image Description

SLIDE 68: Referencing Temporal Media

SLIDE 69: Spatio-Temporal Region

SLIDE 70: MPEG-7 Video Segments Example

SLIDE 71: MPEG-7 Segment Relationship Graph

SLIDE 72: MPEG-7 Conceptual Description

SLIDE 73: MPEG-7 Summaries

SLIDE 74: MPEG-7 Collections

SLIDE 75: MPEG-7 Application Framework

SLIDE 76: MPEG-7 Applications Today
- IBM MPEG-7 Annotation Tool
  - Assists in annotating video sequences with MPEG-7 metadata
- Ricoh MPEG-7 MovieTool
  - A tool for interactively creating video content descriptions conforming to MPEG-7 syntax
- Canon MPEG-7 Speech Recognition engine
  - Web site allows you to create an MPEG-7 Audio "SpokenContent" description file from an audio file in "wav" format

SLIDE 77: IBM MPEG-7 Annotation Tool

SLIDE 78: IBM MPEG-7 Annotation Tool
- Assists in annotating video sequences with MPEG-7 metadata
  - Each shot in the video sequence can be annotated with static scene descriptions, key object descriptions, event descriptions, and other lexicon sets
  - The annotated descriptions are associated with each video shot and are stored as MPEG-7 descriptions in an XML file
  - Can also open MPEG-7 files in order to display the annotations for the corresponding video sequence
  - Customized lexicons can be created, saved, downloaded, and updated

SLIDE 79: Ricoh MovieTool
- Creates an MPEG-7 description by loading video data
- Provides visual clues to aid the user in creating the structure of the video
- Automatically reflects the structure in the MPEG-7 descriptions
- Visually shows the relationship between the structure and the MPEG-7 descriptions
- Presents candidate tags to help choose appropriate MPEG-7 tags
- Validates the MPEG-7 descriptions against the MPEG-7 schema
- Can describe all metadata defined in MPEG-7
- Can reflect any future changes and extensions made to the MPEG-7 schema

SLIDE 80: Canon MPEG-7 ASR Tool

SLIDE 81: MPEG-7 Resources
/WileyTitle/productCd html

SLIDE 82: MPEG-7 Future
- New application-specific profiles
- Integration into the media production and reuse cycle
  - Automated metadata creation in devices
  - Use of MPEG-7 metadata in multimedia applications
- MPEG-21

SLIDE 83: Today's Agenda
- Review of Last Time
- Metadata for Video
  - Representing Video
  - Media Streams
  - MPEG-7
- Discussion Questions
- Action Items for Next Time

SLIDE 84: Discussion Questions
Jennifer Hastings on Media Streams:
- Is the "vocabulary problem" completely eliminated by using an iconic language to annotate video content? If not, would training annotators as well as "reusers" of the video reliably remove discrepancies in usage and interpretation?

SLIDE 85: Discussion Questions
Jennifer Hastings on Media Streams:
- How could Media Streams handle the potential sensitivities and contractual stipulations involved in re-using human subjects in a multitude of compositions? Could this be solved with syntactic rules of the iconic language or some other means?

SLIDE 86: Discussion Questions
Jennifer Hastings on Media Streams:
- "The challenge… is to provide a framework for… representing… those aspects of video content whose semantics are invariant and sequence-independent and those whose semantics are variable and sequence-dependent. [MS's] representational system is optimized to represent that which one sees and hears in a video shot (a physically-based description), rather than what one infers from the syntactic context."
- Could an annotator's knowledge of the context of what might be considered a semantically invariant, sequence-independent aspect of the video lead to an inaccurate representation of that content? That is, are shaking hands ever not shaking hands?

SLIDE 87: Discussion Questions
Jinghua Luo on MPEG-7:
- The MPEG-7 standard supports a range of abstraction levels, from low-level signal properties to high-level semantics. How should an MPEG-7-compliant search engine take these abstraction levels into account in determining how relevant a video is to a given query? Which features are more important: visual, audio, genre classification, semantic, or other features?

SLIDE 88: Discussion Questions
Jinghua Luo on MPEG-7:
- The Description Definition Language (DDL) gives users the freedom to extend the generic MPEG-7 standard by developing application-specific descriptors and description schemes. However, an extension defined by one application can be difficult or even impossible for another application to understand. Does the flexibility and extensibility afforded by the DDL interfere with MPEG-7's goal of achieving interoperability among different applications and application domains?

SLIDE 89: Discussion Questions
Sarita Yardi on MPEG-7:
- We did not have a problem with reusability in Bob's XML class because we have no emotional or symbolic attachment to documents. How would you feel if your personal wedding video were represented in markup (elements containing "John", "Jane", "Blah blah blah yadda yadda yadda"), so that others could reuse your ideas? Does this bother you? If so, is there any way of avoiding it?

SLIDE 90: Discussion Questions
Sarita Yardi on MPEG-7:
- On page 339 of the reader, Figure 6 shows a woman's image being superimposed in various ways. Considering that we haven't yet figured out a solution for legally sharing music, do you think there are or will be problems with not only sharing video, but also allowing editing and sharing of copyrighted material? For example, do you think our Governor Ahnold would have a problem with Marc showing all of Terminator 2 in class with his own image in place of Arnold's? How about if Marc showed the same movie at his house and charged us all $3 to see it? Or if Marc gave us an assignment to insert ourselves into the movie?

SLIDE 91: Today's Agenda
- Review of Last Time
- Metadata for Video
  - Representing Video
  - Media Streams
  - MPEG-7
- Discussion Questions
- Action Items for Next Time

SLIDE 92: Assignment 7
- Revised consolidated Excel spreadsheet
  - Mail to: is202-ta (at) sims.berkeley.edu and simonpk (at) sims.berkeley.edu
  - By: Sunday, November 28, at 2:00 pm
- RDF Protégé project file
  - Mail to: is202-ta (at) sims.berkeley.edu and simonpk (at) sims.berkeley.edu
  - By: Thursday, December 2, at 10:30 am

SLIDE 93: Readings for Next Time
Mobile and Context-Aware Multimedia Information Systems
- "Understanding and Using Context" (Dey) – Helen
- "Time as essence for photo browsing through personal digital libraries" (Graham, Garcia-Molina, Paepcke, Winograd) – David
- "Automatic Organization for Digital Photographs with Geographic Coordinates" (Naaman) – Kelly
- "From Context to Content: Leveraging Context to Infer Media Metadata" (Davis) – Bruce