Chinese Academy of Sciences, Beijing, China Speech and Language Processing Techniques Report Document Overview of MPEG-7 Dr Zhang Sen Speech Group, INRIA-LORIA.

1 Overview of MPEG-7
Chinese Academy of Sciences, Beijing, China: Speech and Language Processing Techniques Report Document
Dr Zhang Sen
Speech Group, INRIA-LORIA, Villers les Nancy, France
Chinese Academy of Sciences, Beijing, China
4/29/2015

2 Outline of contents: Introduction, Basic Components, Content Description, Audiovisual (AV) Descriptions, Multimedia Description Schemes, XM and Applications, More Information

3 Ozone WP2 architecture

4 From MPEG-1 to MPEG-7
[Timeline figure: axis marks 90, 92, 94, 98, 99, 01, ?; standards shown: MPEG-1, MPEG-2, MPEG-4 (v1, v2), MPEG-7, MPEG-21]
MPEG-3: once defined, but abandoned
MPEG-5 and -6: not defined

5 MPEG Family
MPEG-1: Coding of moving pictures and audio for digital storage media (CD-ROM, MP3), 11/92
MPEG-2: Generic coding of moving pictures and audio information (DVD, Digital TV), 11/94
MPEG-4: Coding of audiovisual objects for multimedia applications, Ver1 09/98, Ver2 11/99
MPEG-7: Multimedia content description for AV material, 08/01
MPEG-21: Digital AV framework: integration of multimedia technologies, 11/01

6 Why is MPEG-7 needed?
Digital audiovisual information is increasing
–more and more content is available
–all kinds of sources of information
Use of digital audiovisual information requires
–description of the contents
–fast search of the contents

7 Objective of MPEG-7
Standardize content-based description for various types of audiovisual information
–Enable fast and efficient content searching, filtering and identification
–Describe several aspects of the content (low-level features, structure, semantics, models, collections, creation, etc.)
–Address a large range of applications
Types of audiovisual information:
–Audio, speech
–Moving video, still pictures, graphics, 3D models
–Information on how objects are combined in scenes

8 Scope of MPEG-7
Description generation (feature extraction, indexing process, annotation and authoring tools, ...) and description consumption (search engine, filtering tool, retrieval process, browsing device, ...) are non-normative parts of MPEG-7.
The goal is to define the minimum that enables interoperability.
[Figure: Description generation, Scope of MPEG-7, Description consumption; generation and consumption are each labelled "Research and future competition"]

9 Scope of MPEG-7
Feature Extraction:
–Content analysis (D, DS)
–Feature extraction (D, DS)
–Annotation tools (DS)
–Authoring (DS)
MPEG-7 Scope:
–Description Schemes (DSs)
–Descriptors (Ds)
–Language (DDL)
Search Engine:
–Searching & filtering
–Classification
–Manipulation
–Summarization
–Indexing
Ref: MPEG-7 Concepts

10 Audio in MPEG-7
Audio content description (yes)
Sound retrieval and classifier (yes)
Speech synthesis (no)
Speech recognition (no)
Probability Models (yes)

11 Parts of the MPEG-7 Standard
ISO/IEC 15938-1: Systems
ISO/IEC 15938-2: Description Definition Language
ISO/IEC 15938-3: Visual
ISO/IEC 15938-4: Audio
ISO/IEC 15938-5: Multimedia Description Schemes
ISO/IEC 15938-6: Reference Software

12 Outline of contents: Introduction, Basic Components, Content Description, Audiovisual (AV) Descriptions, Multimedia Description Schemes, XM and Applications, More Information

13 Main elements of MPEG-7
Descriptors (D): representations of features; they define the syntax and the semantics of each feature representation (low-level).
Description Schemes (DS): specify the structure and semantics of the relationships between their components, which may be both Ds and DSs (high-level).
Description Definition Language (DDL): based on XML Schema; allows the creation of new DSs and Ds, and the extension and modification of existing DSs.
System tools: support multiplexing of descriptions, synchronization issues, transmission mechanisms, coded representations, and management and protection of intellectual property.

14 Relations of main elements
[Figure labels: DDL, DSs, Ds]

15 Description Definition Language
The Description Definition Language (DDL) is a language that defines which descriptions are valid and allows the creation of new Description Schemes and Descriptors. It also allows the extension and modification of existing Description Schemes.
The DDL is used to define a set of formal rules:
–ordering of the elements
–occurrences of elements
–……
XML + MPEG-7 extensions

16 XML: Base for DDL
Why choose XML as the base for the DDL?
–The popularity of XML
–The interoperability with other standards in the future
Why should XML be extended for MPEG-7?
–SGML > XML
–Structural extensions
–Datatype extensions

17 DDL parser
A DDL parser is software that checks whether a description is valid.
[Figure: a Description and a Schema are fed to the Parser, which answers Yes or No]
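The yes/no check performed by a DDL parser can be sketched in a few lines. This is only an illustration: the Python standard library has no XML Schema validator, so the "schema" below is a hypothetical dictionary that fixes the ordering and occurrence of child elements, and the tag names (`AudioDescription`, `Title`, `Duration`) are invented for the example rather than taken from MPEG-7.

```python
import xml.etree.ElementTree as ET

# Toy "schema" (hypothetical, not real MPEG-7 DDL): for each element,
# the exact child tags it must contain, in order.
TOY_SCHEMA = {
    "AudioDescription": ["Title", "Duration"],
    "Title": [],
    "Duration": [],
}


def _check(element, schema):
    """Recursively compare an element's children with the toy schema."""
    if element.tag not in schema:
        return False
    child_tags = [child.tag for child in element]
    if child_tags != schema[element.tag]:
        return False
    return all(_check(child, schema) for child in element)


def is_valid(description_xml, schema=TOY_SCHEMA):
    """Return True if the description is well-formed and obeys the toy schema."""
    try:
        root = ET.fromstring(description_xml)
    except ET.ParseError:
        return False  # not even well-formed XML
    return _check(root, schema)
```

A real DDL parser would validate against an XML Schema document with the MPEG-7 extensions; the sketch only mirrors the "Description + Schema in, Yes or No out" shape of the slide.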

18 Outline of contents: Introduction, Basic Components, Content Description, Audiovisual (AV) Descriptions, Multimedia Description Schemes, XM and Applications, More Information

19 Types of descriptions
Low-level description (features, etc.)
–Generic and flexible
–Intelligent / efficient search engine
High-level description (structures, concepts, etc.)
–Efficient and powerful
–Lack of flexibility

20 Low-level Description
Information on the creation and production processes
–director, title, short feature movie
Information related to the usage of the content
–copyright pointers, usage history, broadcast schedule
Information on the storage features of the content
–storage format, encoding
Information about low-level features in the content
–colors, textures, sound timbres, melody

21 High-level Description
Structural description
–video segments, frames, still and moving regions, audio segments
–Segment DS (representing the spatial, temporal or spatio-temporal structure)
Conceptual (semantic) description
–objects, events, and notions
–links between the two descriptions

22 Illustration of descriptions

23 Basic description
Elements
–Information containers
–containing data and other elements
–……
Attributes
–Attribute-value pairs used to characterize elements
–……

24 Structured descriptions
Structured descriptions are trees
Trees are suitable for retrieval and search
[Figure: a tree with a DS at the root and Ds below it]

25 Description trees
letter
  header
    name: Mr Sen
    address
      street: 16 rue Laplace
      city: Nancy
  text: Dear Mr White, …
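The letter example on this slide (a name, an address with street and city, and a text body) can be written as an XML tree and queried by path. The sketch below uses Python's standard `xml.etree.ElementTree`; the exact nesting of the nodes is an assumption reconstructed from the slide's labels, and `find_text` is a helper introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Build the tree top-down; nesting is assumed from the slide's node labels.
letter = ET.Element("letter")
header = ET.SubElement(letter, "header")
ET.SubElement(header, "name").text = "Mr Sen"
address = ET.SubElement(header, "address")
ET.SubElement(address, "street").text = "16 rue Laplace"
ET.SubElement(address, "city").text = "Nancy"
ET.SubElement(letter, "text").text = "Dear Mr White, …"


def find_text(root, path):
    """Return the text at a slash-separated path below root, or None."""
    node = root.find(path)
    return None if node is None else node.text


# Serialized form of the whole description.
xml_string = ET.tostring(letter, encoding="unicode")
```

For instance, `find_text(letter, "header/address/city")` walks three levels of the tree, which is the kind of structured lookup that makes trees convenient for search and retrieval.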

26 Example: Audio description
[XML example; markup lost in extraction. Surviving values: 1.0, The daily news]

27 Outline of contents: Introduction, Basic Components, Content Description, Audiovisual (AV) Descriptions, Multimedia Description Schemes, XM and Applications, More Information

28 Audio description
Low-level Description
–spectrum, parametric, and temporal features
High-level Description
–Audio Signature Description Scheme
–Instrument Timbre Description Schemes
–Melody Description Tools
–Sound Recognition and Indexing Description Tools
–Spoken Content Description Tools

29 Audio low-level descriptors
Waveform
Loudness
Spectral basis
Spectral envelope
Spectral centroid
Spectral spread
Fundamental frequency
Harmonicity
Attack time

30 Audio descriptor: Basic
Two basic audio Descriptors
–AudioWaveform Descriptor describes the audio waveform envelope (minimum and maximum)
–AudioPower Descriptor describes the temporally-smoothed instantaneous power
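As a rough illustration of the two basic descriptors, the sketch below computes a per-frame (minimum, maximum) envelope and a per-frame mean of squared samples. The non-overlapping framing and the function names are assumptions made for this example; the normative extraction details are not given on the slide.

```python
def _frames(samples, frame_size):
    """Split samples into consecutive non-overlapping frames (last may be short)."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    return [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]


def waveform_envelope(samples, frame_size):
    """Per-frame (min, max) pairs, in the spirit of AudioWaveform."""
    return [(min(frame), max(frame)) for frame in _frames(samples, frame_size)]


def audio_power(samples, frame_size):
    """Per-frame mean of squared samples, in the spirit of AudioPower."""
    return [sum(x * x for x in frame) / len(frame)
            for frame in _frames(samples, frame_size)]
```

Averaging the squared samples over a frame is one simple way to smooth instantaneous power over time; other window shapes would work equally well for a sketch.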

31 Audio descriptor: Basic Spectral
AudioSpectrumEnvelope Descriptor
–describes the short-term power spectrum
AudioSpectrumCentroid Descriptor
–describes the center of gravity of the log-frequency power spectrum
AudioSpectrumSpread Descriptor
–describes the second moment of the log-frequency power spectrum
AudioSpectrumFlatness Descriptor
–describes the flatness properties of the spectrum
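The centroid and spread of a log-frequency power spectrum can be sketched as a power-weighted mean and a power-weighted RMS deviation. The code below assumes a base-2 log scale relative to a 1 kHz reference (octaves above or below 1 kHz); that reference, the function name, and the omission of any special handling for very low frequencies are assumptions of this example.

```python
import math


def spectrum_centroid_and_spread(frequencies_hz, powers, reference_hz=1000.0):
    """Return (centroid, spread) of a power spectrum on a log2 frequency axis.

    centroid: power-weighted mean of log2(f / reference_hz)
    spread:   power-weighted RMS deviation from that centroid
    """
    total = sum(powers)
    if total <= 0:
        raise ValueError("power spectrum must have positive total power")
    log_freqs = [math.log2(f / reference_hz) for f in frequencies_hz]
    centroid = sum(l * p for l, p in zip(log_freqs, powers)) / total
    variance = sum((l - centroid) ** 2 * p
                   for l, p in zip(log_freqs, powers)) / total
    return centroid, math.sqrt(variance)
```

With equal power at 500 Hz and 2000 Hz, the two components sit one octave below and one octave above the reference, so the centroid is 0 octaves and the spread is 1 octave.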

32 Audio Signature Description
The AudioSignature Description Scheme provides a unique content identifier for the purpose of robust automatic identification of audio signals
Applications include
–audio fingerprinting
–identification of audio
–locating metadata for legacy audio content

33 Instrument Timbre Description
Timbre is defined as the perceptual features that make two sounds with the same pitch and loudness sound different.
The Timbre Description describes these perceptual features with a reduced set of Descriptors
–HarmonicInstrumentTimbre Descriptor
–LogAttackTime Descriptor
–PercussiveInstrumentTimbre Descriptor
–Combination with Basic Spectral Descriptors

34 Melody Description Tools
The Melody Description Tools facilitate efficient, robust, and expressive melodic similarity matching
MelodyContour Description Scheme
–5-step contour representation
–basic rhythmic information representation
MelodySequence Description Scheme
–supports an expanded descriptor set and high precision of interval encoding
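A 5-step contour maps each interval between consecutive notes to one of five values from -2 to +2. The sketch below assumes the thresholds commonly quoted for this kind of contour (under 50 cents counts as no change, 50 to under 250 cents as a small step, 250 cents or more as a large step); the thresholds, the MIDI note input, and the function names are assumptions of this example, not taken from the slide.

```python
def contour_step(cents):
    """Quantize a signed interval in cents to a value in {-2, -1, 0, 1, 2}."""
    magnitude = abs(cents)
    if magnitude < 50:
        step = 0      # effectively the same pitch
    elif magnitude < 250:
        step = 1      # semitone or whole tone
    else:
        step = 2      # minor third or larger
    return step if cents >= 0 else -step


def melody_contour(midi_notes):
    """Contour of a melody given as MIDI note numbers (100 cents per semitone)."""
    return [contour_step(100 * (b - a))
            for a, b in zip(midi_notes, midi_notes[1:])]
```

For the notes C4, D4, D4, G4, E4, E-flat 4 (MIDI 60, 62, 62, 67, 64, 63) the contour is `[1, 0, 2, -2, -1]`, which is coarse enough to tolerate an out-of-tune hummed query.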

35 General Sound Recognition and Indexing Description Tools
SoundModel (SM) DS
–statistical model, such as HMM or GMM
–SoundModelStatePath Descriptor consists of a state sequence generated by an SM
–SoundModelStateHistogram Descriptor consists of a normalized histogram of the state sequence generated by an SM given an audio segment
SoundClassificationModel DS
–a trainable multi-way classifier based on SMs
–speech vs music, male vs female, trumpet vs violin
–genre classification, voice recognition
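The relation between a state path and its normalized histogram is simple enough to show directly. In this sketch the state path is a plain list of integer state indices, as a sound model such as an HMM might produce for one audio segment; the function name and the list representation are assumptions of the example.

```python
from collections import Counter


def state_histogram(state_path, num_states):
    """Fraction of frames spent in each state; the values sum to 1."""
    if not state_path:
        raise ValueError("state_path must not be empty")
    counts = Counter(state_path)
    total = len(state_path)
    return [counts[state] / total for state in range(num_states)]
```

Because the histogram discards ordering and segment length, two segments of different duration can be compared by a simple distance between their histograms.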

36 Spoken content retrieval
Output of ASR
–phone lattice or word lattice
–the Spoken Content DS stores these lattices instead of plain text
–lattices are good for retrieval
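One way to see why lattices help retrieval is that a lattice keeps competing hypotheses, so a query can still match when the recognizer's single best transcript is wrong. The sketch below stores a tiny word lattice as a list of edges and scores a query by the best-probability chain of consecutive edges. The lattice contents, probabilities, and function name are all made up for illustration.

```python
# Each edge: (from_node, to_node, word, probability). All values are invented.
LATTICE = [
    (0, 1, "taj", 0.6),
    (0, 1, "touch", 0.4),
    (1, 2, "mahal", 0.7),
    (1, 2, "my", 0.3),
    (2, 3, "drawing", 0.9),
]


def match_score(lattice, query_words):
    """Best path probability of query_words as consecutive edges, else 0.0."""
    # A match may start at any node that has outgoing edges.
    frontier = {edge[0]: 1.0 for edge in lattice}
    for word in query_words:
        next_frontier = {}
        for src, dst, label, prob in lattice:
            if label == word and src in frontier:
                score = frontier[src] * prob
                if score > next_frontier.get(dst, 0.0):
                    next_frontier[dst] = score
        if not next_frontier:
            return 0.0  # the word cannot extend any partial match
        frontier = next_frontier
    return max(frontier.values())
```

Here both "taj mahal" and the competing hypothesis "touch my" are retrievable from the same stored lattice, whereas a plain-text transcript would keep only one of them.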

37 Spoken Content Description Tools
SpokenContentLattice
–represents the actual decoding produced by an ASR engine
SpokenContentHeader
–contains information about the speakers being recognized and the recognizer itself
–WordLexicon Descriptor
–PhoneLexicon Descriptor
–SpeakerInfo Descriptor
–ConfusionInfo Descriptor

38 Gaussian DS

39 State-transition model DS

40 ProbabilityModelClassifier DS

41 SpokenContentLattice DS
A lattice structure for a hypothetical (combined phone and word) decoding of the expression “Taj Mahal drawing …”.

42 Extraction of sound indexes using a sound-recognition classifier. The model reference and state path are stored.

43 Query-by-example application with a query in media source form. Features must be extracted and projected into the classification space for each model in order to match against the database.

44 An example search application utilizing a query in DDL format

45 Extraction of hidden Markov model and basis functions and storage in a DDL representation

46 Scenarios for the Spoken Content Description Tools
Recall of AV data by memorable spoken events
–A film or video recording where a character or person spoke a particular word or sequence of words. The source media would be known, and the query would return a position in the media.
Spoken Document Retrieval
–There is a database consisting of separate spoken documents. The result of the query is the relevant documents, and optionally the position in those documents of the matched speech.
Annotated Media Retrieval
–Similar to spoken document retrieval. The result of the query is the media that is annotated with speech, not the speech itself. An example is a photograph retrieved using a spoken annotation.

47 Outline of contents: Introduction, Basic Components, Content Description, Audiovisual (AV) Descriptions, Multimedia Description Schemes, XM and Applications, More Information

48 Multimedia DSs
Multimedia Description Schemes are metadata structures for describing and annotating audio-visual (AV) content
Basic Elements
Content Management
Content Description
Content Organization
Navigation and Access
User Interaction

49 Organization of Multimedia DSs

50 Content Management
Creation and production information
–Creation information: title, textual annotation, creators, and dates
–Classification information: genre, subject, purpose, language
Media coding, storage and file formats
–format, compression, and coding
Content usage
–usage rights, usage record

51 Navigation and Access
Summaries
–hierarchical summaries
–sequential summaries
Partitions and Decompositions
–decompositions in space, time and frequency
–used in multi-resolution access and progressive retrieval
Variations
–selection of the most suitable variation of an AV program
–adaptation to the different capabilities of terminal devices, network conditions or user preferences

52 Hierarchical summary

53 Illustration of variations

54 Content Organization
Collections
–group the contents into clusters
–describe statistics and models of the attribute values
–describe relationships among collection clusters
Models
–model the attributes and features of AV content
–Probability Model: specifies statistical functions and structures
–Analytic Model: specifies semantic labels, specifies the confidence, builds classifiers

55 Collection Structure

56 User Interaction
User Preference
–context dependency in terms of time and place
–relative importance of different preferences
–privacy characteristics of the preferences
–preferences updated by agent or user
Usage History
–history of actions
–used to determine the user's preferences

57 Outline of contents: Introduction, Basic Components, Content Description, Audiovisual (AV) Descriptions, Multimedia Description Schemes, XM and Applications, More Information

58 eXperimentation Model (XM)
Simulation platform for: Ds, DSs, CSs, DDL (CS: Coding Schemes)
XM applications:
–the server (extraction) applications
–the client (search, filtering and/or transcoding) applications

59 The XM applications
Extraction from Media
–all low-level Ds or DSs should have an application class of this type
Search & Retrieval Application
–a client application
Media Transcoding Application
–a client application
Description Filtering Application
–a client application

60 Extraction from Media

61 Search and retrieval application

62 Media transcoding application

63 Description Filtering Application

64 Interface model for XM app

65 Real world application
MDB = media database, DDB = description database.
First, two features are extracted from a media database. Then, based on the first feature, relevant media files are selected from the media database. The relevant media files are transcoded based on the second extracted feature.

66 MPEG-7 application areas
Storage and retrieval of audiovisual databases (image, film, radio archives)
Broadcast media selection (radio, TV programs)
Surveillance (traffic control, surface transportation, production chains)
E-commerce and Tele-shopping (searching for clothes / patterns)
Remote sensing (cartography, ecology, natural resources management)
Entertainment (searching for a game, for a karaoke)
Cultural services (museums, art galleries)
Journalism (searching for events, persons)
Personalized news service on Internet (push media filtering)
Intelligent multimedia presentations
Educational applications
Bio-medical applications

67 Illustration of applications
[Figure label: Users]

68 Information Flow
[Figure labels: Feature extraction (manual/automatic), AV Description, Encoding, Storage, Transmission, Decoding, Search/query, Browse, Filter, Pull, Push, Users]

69 Push and Pull applications
Pull applications
–Example: Search engines for internet and DBs
–Advantage: Many search engines work on standardized descriptions
Push applications
–Example: Broadcast of video, Interactive TV
–Advantage: Intelligent agents filter standardized descriptions

70 Example: Pull application
[Figure label: MPEG-7 Database]

71 Example: Push application

72 Example: queries
Text (keywords):
–Find AV material with subject corresponding to some keywords
Semantic description:
–Find AV material corresponding to a specified semantic
Image as an example:
–Find an image with similar characteristics (global or local)
A few notes of music:
–Find corresponding musical pieces or movies
Low level features (example: motion):
–Find video with specific object motion trajectories

73 Integration of MPEG-7 into XML
[XML example; markup lost in extraction. Surviving values: Fernado Morientes; Spain vs. Sweden soccer match]

74 Outline of contents: Introduction, Basic Components, Content Description, Audiovisual (AV) Descriptions, Multimedia Description Schemes, XM and Applications, More Information

75 MPEG-7 and other Standards
MPEG-1, -2, and -4 are designed to represent the information itself, while MPEG-7 is meant to represent information about the information.
MPEG-1, -2, and -4 make content available, while MPEG-7 allows you to find the content you need.

76 Ultimate ambition of MPEG-7
To make the web as searchable for multimedia content as it is for text today
To make the use of computer systems as easy as possible

77 MPEG-7 beyond
To mould computers around human requirements and not humans around computer requirements
To enable content disclosure based on facts, rather than on human annotations
To find information by rich spoken queries and hand-drawn images, and to address what most people expect computers to be able to do

78 More Information on WWW
Major MPEG-7 documents
–http://www.cselt.it/mpeg/, semi-official website
–http://www.mpeg-7.com, official website
Others
–http://www.elsevier.com/locate/image

79 Conclusion
[Figure labels: AV contents, Structures, Features, Ds, DSs, DDL, User]

80 Thanks

82 Low level AV descriptors
Video segments
–Color, Camera motion, Motion activity, Mosaic
Moving regions
–Color, Motion trajectory, Parametric motion, Spatio-temporal shape
Still regions
–Color, Shape, Position, Texture
Audio segments
–Spoken content, Spectral feature, Timbre

83 Face Recognition Descriptor
Projection of a face vector onto a set of basis vectors (face patterns)
Feature set is extracted from a normalized face image
Normalized face image
–56 lines with 46 intensity values in each line
–The centers of the two eyes are located on the 24th row, at the 16th and 31st columns for the right and left eye respectively
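Projecting a face vector onto basis vectors amounts to one dot product per basis vector. The sketch below flattens a 56 x 46 normalized image into a 2576-element vector and computes those dot products. The function names are introduced here for illustration, the basis vectors would come from training data that the slide does not describe, and any preprocessing such as mean subtraction is left out.

```python
ROWS, COLS = 56, 46  # normalized face image size given on the slide


def flatten(image):
    """Turn a ROWS x COLS image (list of rows) into one flat face vector."""
    if len(image) != ROWS or any(len(row) != COLS for row in image):
        raise ValueError("expected a 56 x 46 normalized face image")
    return [value for row in image for value in row]


def project(face_vector, basis_vectors):
    """Return one coefficient (dot product) per basis vector."""
    return [sum(x * b for x, b in zip(face_vector, basis))
            for basis in basis_vectors]
```

The resulting list of coefficients is far shorter than the 2576 pixel values, which is what makes it practical as a compact feature set for matching.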

84 Segment Decomposition

85 MPEG-7 Normative Interfaces

86 Example: Content description
[Figure labels: MPEG-7 Database, Indexing, Feature extraction, Search/retrieval, High level process, Low level process]

87 Segment DS
The Segment DS describes the result of a spatial, temporal, or spatio-temporal partitioning of the AV content. It has nine major subclasses:
–Multimedia Segment DS
–AudioVisual Region DS
–AudioVisual Segment DS
–Audio Segment DS
–Still Region DS
–Still Region 3D DS
–Moving Region DS
–Video Segment DS
–Ink Segment DS

88 Examples: T/S segments

89 Example: Segment trees

90 Illustration of conceptual description
[Figure labels: AV content, Semantic DS, Semantic container DS, Semantic base DS, Object DS, Event DS, Concept DS, Semantic state DS, Semantic place DS, Semantic time DS]

91 Visual description
Basic structures
–Grid layout, Time series, Multiple view, Spatial 2D coordinates, Temporal interpolation
Descriptors
–Color, Texture, Shape, Motion, Localization

92 Example: Color Descriptors
Color space
Color Quantization
Dominant Colors
Scalable Color
Color Layout
Color-Structure
GoF/GoP Color

93 Example: Color space
R,G,B
Y,Cr,Cb
H,S,V
HMMD
Linear transformation matrix with reference to R, G, B
Monochrome
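A color space defined by a linear transformation matrix with reference to R, G, B is just a 3 x 3 matrix applied to each pixel. The sketch below uses rounded coefficients of the widely used BT.601-style luma/chroma transform as the example matrix; the specific coefficients, the output row order (Y, Cb, Cr), and the function name are assumptions of this example rather than values from the slide.

```python
# Rows produce Y, Cb, Cr from (R, G, B); rounded BT.601-style coefficients,
# used here only as an example of a linear color transform.
RGB_TO_YCBCR = [
    [0.299, 0.587, 0.114],
    [-0.169, -0.331, 0.500],
    [0.500, -0.419, -0.081],
]


def transform(matrix, rgb):
    """Apply a 3 x 3 linear transformation matrix to one (R, G, B) triple."""
    return tuple(sum(coeff * value for coeff, value in zip(row, rgb))
                 for row in matrix)
```

With components in the range 0 to 1, pure white maps to a luma close to 1 and chroma close to 0, which is a quick sanity check for any such matrix.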

94 Audio Framework

95 Descriptor
Definition: A Descriptor (D) is a representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation.
Notes: A descriptor allows an evaluation of the corresponding feature via the descriptor value. It is possible to have several descriptors representing a single feature.
Examples: Possible descriptors are the color histogram (for the color feature), the average of the frequency components, the motion field, the text of the title, etc.

96 Descriptor Value
Definition: A Descriptor Value is an instantiation of a Descriptor for a given data set (or subset thereof).
Notes: Descriptor Values are combined via the mechanism of a Description Scheme to form a Description.

97 Description Scheme
Definition: A Description Scheme (DS) specifies the structure and semantics of the relationships between its components, which may be both Descriptors and Description Schemes.
Examples: A movie, structured as scenes and shots, including some textual descriptors at the scene level, and color, motion and some audio descriptors at the shot level.
Note: Ds contain only basic data types and do not refer to other Ds or DSs.
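The difference between a Descriptor (a leaf holding a basic value) and a Description Scheme (a node whose components are Ds or further DSs) can be modelled with two small classes. The sketch below builds the movie example from this slide; the class names, field names, and sample values are all invented for illustration and are not MPEG-7 type names.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Descriptor:
    """Leaf: holds a basic value and refers to no other D or DS."""
    name: str
    value: object


@dataclass
class DescriptionScheme:
    """Node: its components may be Descriptors or further Description Schemes."""
    name: str
    components: List[Union[Descriptor, "DescriptionScheme"]] = field(default_factory=list)

    def descriptors(self):
        """Yield every Descriptor below this DS, depth first."""
        for component in self.components:
            if isinstance(component, Descriptor):
                yield component
            else:
                yield from component.descriptors()


# Movie -> scene -> shot, with textual Ds at scene level and
# color/motion/audio Ds at shot level (sample values are made up).
shot = DescriptionScheme("Shot", [
    Descriptor("Color", "dominant: blue"),
    Descriptor("Motion", "pan left"),
    Descriptor("Audio", "speech"),
])
scene = DescriptionScheme("Scene", [Descriptor("Text", "opening scene"), shot])
movie = DescriptionScheme("Movie", [scene])
```

Walking `movie.descriptors()` visits the scene-level text first and then the three shot-level descriptors, mirroring the nesting described on the slide.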

98 DDL: XML Schema & Extensions
XML Schema
–Data types
–Simple and Complex types
–Elements
–Inheritance, Abstract types
MPEG-7 extensions
–Array and Matrix datatypes
–Enumerated datatypes for MimeType, CountryCode, RegionCode, CurrencyCode and CharacterSetCode
–Typed references

99 Basic elements of DS
Constructs for linking media files
Localizing pieces of content
Describing
–time, places, persons, individuals, groups, organizations, textual annotation, etc.
–Who? What object? What action? Where? When? Why? and How?

100 Content recognition tools
No speech, face or gesture recognition engines are included in MPEG-7
Building content recognition tools is a task for industry, not for the standard
–coding tools in MPEG-1, -2, -4 were for research purposes, not part of the standard
–no tools were part of the MPEG standard
