B. Prabhakaran, Slide 1: Multimedia Metadata
- Multimedia information needs to be "interpreted". Popular example: "A picture is worth a thousand words."
- Who will "write" those thousand words? (Metadata generation)
- Will you "accept" those thousand words, or will you write another thousand? (Personalization)
- Where will you store those words, and how will you use them? (Storage, organization, and retrieval)
- Hence the need to analyze media information (semi-)automatically.
- Some information is media-independent, e.g., the author of the media, date and time of creation, etc.

B. Prabhakaran, Slide 2: Text Metadata: Example
- XML (eXtensible Markup Language) DTD (Document Type Definition) providing metadata:

B. Prabhakaran, Slide 3: Speech Metadata

B. Prabhakaran, Slide 4: Speech Pattern Matching
- Comparison of a speech sample with a reference template is simple if compared directly, by summing the distances between respective speech frames. The summation provides an overall distance measure of similarity.
- Complicated by non-linear variations in timing from utterance to utterance, which misalign the frames of the spoken word with those in the reference template.

B. Prabhakaran, Slide 5: Dynamic Time Warping
- The template can be stretched or compressed at appropriate places to find an optimum match.
- Dynamic programming is used to find the best "warp" that minimizes the sum of distances in the template comparison.
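The warping described above can be sketched with the standard dynamic-programming recurrence. This is a minimal sketch using the absolute difference between scalar values as the per-frame distance; real recognizers compare multi-dimensional feature vectors per frame.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    D[i][j] holds the minimum cumulative distance aligning a[:i] with
    b[:j]; each cell extends the cheapest of the three neighbouring
    alignments (match, stretch, compress).
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # stretch the template
                                 D[i, j - 1],      # compress the template
                                 D[i - 1, j - 1])  # direct frame match
    return D[n, m]
```

Unlike a direct frame-by-frame sum, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0: the repeated frame is absorbed by stretching the template rather than penalized as a misalignment.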

B. Prabhakaran, Slide 6: HMM Example
- Hidden Markov Models (HMMs) have underlying stochastic finite state machines (FSMs).
- A stochastic state model is defined by: a set of states, an output alphabet, and a set of transition and output probabilities.

B. Prabhakaran, Slide 7: HMM (continued)
- The term "hidden": the actual state of the FSM cannot be observed directly, only through the symbols emitted.
- The probability distribution can be discrete or continuous.
- For isolated word recognition, each word in the vocabulary has a corresponding HMM.
- For continuous speech recognition, the HMM represents the domain grammar. This grammar HMM is constructed from word-model HMMs.
- HMMs have to be trained to recognize isolated words or continuous speech.
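Recognition with the word-model HMMs above amounts to scoring an observation sequence against each model. That evaluation step can be sketched with the forward algorithm; the discrete alphabet and the array shapes here are illustrative assumptions.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """P(observation sequence | HMM) via the forward algorithm.

    pi:  initial state probabilities, shape (N,)
    A:   state transition matrix, shape (N, N)
    B:   emission probabilities over a discrete alphabet, shape (N, M)
    obs: sequence of observation symbol indices
    """
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission
    return alpha.sum()
```

Each candidate word's HMM is scored this way, and the word whose model gives the highest likelihood is reported.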

B. Prabhakaran, Slide 8: Artificial Neural Networks

B. Prabhakaran, Slide 9: Image Metadata
- Concept of scene and objects. Scene: a collection of zero or more objects. Object: part of a scene, e.g., a car is an object in a traffic scene.
- Both scenes and objects have visual features: color, texture, shape, location, and relationships among edges (above, below, behind, ...).
- Question: how can the visual features be (semi-)automatically detected?

B. Prabhakaran, Slide 10: Color Histograms

B. Prabhakaran, Slide 11: Image Color
- Color distribution: a histogram of intensity values (color "bins" on the X-axis, pixel counts on the Y-axis).
- ("Good" property) Histograms are invariant to translation and rotation of scenes/objects.
- ("Bad" property) A histogram is only a "global" property; it does not describe specific objects, shapes, etc.
- Let Q and I be two histograms with N bins in the same order.
- Intersection of the two histograms: sum over i = 1..N of min(I_i, Q_i).
- Normalized intersection: [sum over i = 1..N of min(I_i, Q_i)] / [sum over i = 1..N of Q_i]. Varies from 0 to 1.
- Insensitive to changes in image resolution, histogram size, depth, and viewpoint.
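The intersection formula above translates directly into code; a minimal sketch assuming both histograms are non-negative arrays with the same bin order:

```python
import numpy as np

def histogram_intersection(I, Q):
    """Normalized intersection of image histogram I with query histogram Q.

    Both histograms must have N bins in the same order; the result lies
    in [0, 1], with 1 meaning Q is fully contained in I.
    """
    I = np.asarray(I, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return np.minimum(I, Q).sum() / Q.sum()
```

For example, `histogram_intersection([2, 2], [1, 3])` gives (1 + 2) / 4 = 0.75, while two identical histograms score 1.0.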

B. Prabhakaran, Slide 12: Color Histograms
- Computationally expensive: O(NM), where N is the number of histogram bins and M the number of images.
- Reducing search time: reduce N? "Cluster" the colors?
- Overcoming the "global" property: divide the image into sub-areas and calculate a histogram for each sub-area. This increases storage requirements and comparison time (or the number of comparisons).

B. Prabhakaran, Slide 13: Image Texture
- Contrast: statistical distribution of pixel intensities.
- Coarseness: a measure of the "granularity" of the texture. Based on "moving" averages over windows of different sizes, 2^k x 2^k, 0 < k < 5. A "moving" average computes the "gray level" at a pixel location, carrying out the computation in horizontal and vertical directions.
- Directionality: a "gradient" vector is calculated at each pixel; the magnitude and angle of this vector can be computed for a "small" array centered on the pixel. Computing this for all pixels yields a histogram. Flat histogram: non-directional image; sharp peaks in the histogram: highly directional image.
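The directionality measure above can be sketched as a magnitude-weighted histogram of gradient angles. This is a minimal version; the 16-bin quantization and the central-difference gradient are illustrative choices, not from the slide.

```python
import numpy as np

def directionality_histogram(image, bins=16):
    """Histogram of gradient directions, weighted by gradient magnitude.

    Sharp peaks indicate a highly directional texture; a flat histogram
    indicates a non-directional one.
    """
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)          # vertical and horizontal gradients
    mag = np.hypot(gx, gy)             # gradient magnitude at each pixel
    ang = np.arctan2(gy, gx)           # gradient angle at each pixel
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

A horizontal intensity ramp, for instance, puts all its mass into a single bin (a perfectly directional texture), while noise spreads mass across many bins.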

B. Prabhakaran, Slide 14: Image Segmentation
- Thresholding technique: all pixels with gray level at or above a threshold are assigned to the object; pixels below the threshold fall outside it. The threshold value influences the boundary position as well as the overall size of the object.
- Region growing technique: proceeds as though the interiors of the objects grow until their borders correspond with the edges of the objects. The image is divided into a set of tiny regions, each a single pixel or a set of pixels. Properties that distinguish the objects (e.g., gray level, color, or texture) are identified, and a value for these properties is assigned to each region.
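Both techniques above can be sketched in a few lines for 2-D grayscale arrays. The 4-connected neighbourhood and the seed-based tolerance test in the region grower are illustrative choices; practical region growing merges regions on richer property values.

```python
import numpy as np
from collections import deque

def threshold_segment(image, t):
    """Binary mask: pixels with gray level at or above t belong to the object."""
    return np.asarray(image) >= t

def region_grow(image, seed, tol):
    """Grow a region from `seed`, absorbing 4-connected neighbours whose
    gray level differs from the seed value by at most `tol`."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    ref = img[seed]
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] \
                    and abs(img[nr, nc] - ref) <= tol:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask
```

On a simple image the two agree, but region growing keeps only pixels connected to the seed, while thresholding labels every bright pixel regardless of connectivity.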

B. Prabhakaran, Slide 15: Face Recognition

B. Prabhakaran, Slide 16: Video Metadata Generation
- Identify logical information units in the video.
- Identify different types of video camera operations.
- Identify the low-level image properties of the video (such as lighting).
- Identify the semantic properties of the parsed logical unit.
- Identify objects and their properties (such as object motion) in the video frames.

B. Prabhakaran, Slide 17: Video Shots/Clips
- The logical unit of video information to be parsed automatically: a camera shot or clip.
- Shot: a sequence of frames representing a contiguous action in time and space.
- Basic idea: frames on either side of a camera break show a significant change in information content.
- The algorithm needs a quantitative metric to capture the information content of a frame.
- Shot identification: does the difference between the metrics of two consecutive video frames exceed a threshold?
- Gets complex when fancy video presentation techniques such as dissolve, wipe, fade-in, or fade-out are used.

B. Prabhakaran, Slide 18: Shot Identification Metrics
- Comparison of corresponding pixels or blocks in the frames.
- Comparison of histograms based on color or gray level.
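The first metric above, pixel-by-pixel comparison, can be sketched as the fraction of pixels that change significantly between two frames; the tolerance value is an illustrative assumption.

```python
import numpy as np

def pixel_change_fraction(f1, f2, pixel_tol=20):
    """Fraction of pixels whose gray level changes by more than pixel_tol
    between two frames; a large fraction suggests a camera break."""
    diff = np.abs(np.asarray(f1, dtype=float) - np.asarray(f2, dtype=float))
    return (diff > pixel_tol).mean()
```

This metric is cheap but sensitive to object and camera motion, which is why the histogram-based metrics that follow are often preferred.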

B. Prabhakaran, Slide 19: Shot Identification Metrics
- Comparison of histograms based on color or gray level.

B. Prabhakaran, Slide 20: Algorithms for Uncompressed Video
- Histogram-based algorithm: the extracted color features of a video frame are stored as color bins, with the histogram value indicating the percentage (or normalized population) of pixels most similar to a particular color.
- Each bin is typically a cube in the 3-dimensional color space (corresponding to the basic colors red, green, and blue); any two points in the same bin represent the same color.

B. Prabhakaran, Slide 21: Algorithms for Uncompressed Video (continued)
- Histogram-based algorithm (continued): shot boundaries are identified by comparing the following features between two video frames: gray-level sums, gray-level histograms, and color histograms.
- The video frames are partitioned into sixteen windows, and the corresponding windows in two frames are compared on the above features. Dividing the frames helps reduce errors due to object motion or camera movement.
- Does not consider gradual transitions between shots.
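The histogram comparison described above can be sketched for gray-level frames. This is a minimal single-histogram version; the slide's sixteen-window partitioning is omitted, and the bin count and threshold values are assumptions.

```python
import numpy as np

def detect_cuts(frames, bins=16, threshold=0.5):
    """Flag a shot boundary between consecutive frames whose gray-level
    histograms differ by more than `threshold` (L1 distance between
    normalized histograms)."""
    hists = [np.histogram(f, bins=bins, range=(0, 256))[0] / f.size
             for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        if np.abs(hists[i] - hists[i - 1]).sum() > threshold:
            cuts.append(i)  # boundary lies before frame i
    return cuts
```

A per-window variant would apply the same comparison to each of the sixteen windows and vote, making the decision robust to a single moving object.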

B. Prabhakaran, Slide 22: Algorithms for Compressed Video
- Motion JPEG video: each video frame is grouped into data units of 8 x 8 pixels, and a Discrete Cosine Transform (DCT) is applied to these data units.
- The DCT coefficients of each frame are mathematically related to the spatial domain and hence represent the contents of the frame.
- Video shots in Motion JPEG are identified based on the correlation between the DCT coefficients of video frames: apply a skip factor to select the video frames to be compared, select regions in the selected frames, and decompress only the selected regions for further comparison.

B. Prabhakaran, Slide 23: Selective Video Decoder

B. Prabhakaran, Slide 24: Algorithms for Compressed Video (continued)
- The frame selector uses a skip factor to determine the subsequent number of frames to be compared.
- The region selector employs a DCT-coefficient-based approach to identify the regions for decompression and subsequent image processing.
- The algorithm adopts a multi-pass approach, with the first pass isolating the regions of potential cut points.
- Frames that cannot be classified by DCT-coefficient comparison are decompressed for further examination by the color histogram approach.

B. Prabhakaran, Slide 25: MPEG Video Selection
- To achieve a high rate of compression, redundant information in subsequent frames is coded based on the information in previous frames. Such frames are termed P and B frames.
- To provide fast random access, some of the frames are compressed independently; these are called I frames.
- A typical MPEG video stream has the following sequence of frames: IBBPBBPBBIBBPBBP....

B. Prabhakaran, Slide 26: Parsing MPEG Video
- A difference metric for comparing DCT coefficients between video frames is needed. Such a metric can be applied only to the I frames of the MPEG video, since only those are coded with DCT coefficients.
- Motion information coded in the MPEG data can also be used for parsing.
- Basic idea: B and P frames are coded with motion vectors, and the residual error after motion compensation is transformed and coded with DCT coefficients. Residual error rates are likely to be very high at shot boundaries, so the number of motion vectors in a B or P frame there is likely to be very small.

B. Prabhakaran, Slide 27: Parsing MPEG Video (continued)
- Motion information coded in the MPEG data can be used for parsing.
- The algorithm detects a shot boundary if the number of motion vectors is lower than a threshold value.
- Can lead to detection of false boundaries, because a shot boundary can lie between two successive I frames.
- The advantage is reduced processing overhead, as I frames are relatively few.
- The algorithm also partitions the video frames based on motion vectors.

B. Prabhakaran, Slide 28: Shots With Transitions
- A hybrid approach employing both DCT-coefficient-based and motion-vector-based comparison.
- First pass: apply a DCT comparison to I frames with a large skip factor to detect regions of potential gradual transitions.
- Second pass: repeat the comparison with a smaller skip factor to identify shot boundaries that may lie in between.
- Then motion-vector-based comparison is applied as another pass on the B and P frames of sequences containing potential breaks and transitions. This helps refine and confirm the shot boundaries detected in the earlier passes.

B. Prabhakaran, Slide 29: Camera Operations, Object Motions
- Camera operations and object motions induce specific patterns in the field of motion vectors.
- Panning and tilting (horizontal or vertical rotation) of the camera produce strong motion vectors in the direction of the camera movement.
- Disadvantage (for detection of pan and tilt operations): movement of a large object, or a group of objects in the same direction, can produce a similar pattern in the motion vectors.
- The motion field of each frame can be divided into a number of macroblocks, with motion analysis applied to each block.

B. Prabhakaran, Slide 30: Camera Operations, Object Motions (continued)
- If the directions of all the macroblocks agree, the motion is considered to arise from a camera operation (pan/tilt); otherwise it is attributed to object motion.
- Zoom operation: a focus center for the motion vectors is created, so the top and bottom vertical components of the motion vectors have opposite signs; similarly, the leftmost and rightmost horizontal components have opposite signs. This information is used to identify a zooming operation.
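The macroblock agreement test above can be sketched by quantizing motion-vector directions into sectors and checking whether one sector dominates. The 8-sector quantization and the 90% agreement threshold are illustrative assumptions.

```python
import numpy as np

def classify_motion(vectors, agreement=0.9):
    """Classify a frame's motion-vector field as camera pan/tilt or object
    motion: if the dominant direction accounts for at least `agreement`
    of the macroblocks, treat it as a camera operation."""
    angles = np.arctan2([v[1] for v in vectors], [v[0] for v in vectors])
    # quantize each angle into one of 8 direction sectors
    sectors = np.round(angles / (np.pi / 4)).astype(int) % 8
    counts = np.bincount(sectors, minlength=8)
    return "camera" if counts.max() / len(vectors) >= agreement else "object"
```

A field where every block moves left reads as a pan, while a field split between two directions reads as object motion.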

B. Prabhakaran, Slide 31: 3D Models: Shape-based Retrieval
- Arizona State University: 3dk.asu.edu
- CMU: amp.ece.cmu.edu/projects/3DModelRetrieval/
- Drexel Univ:
- Heriot-Watt University:
- IBM:
- Informatics & Telematics Inst.: 3d-search.iti.gr/
- National Research Council, Canada:
- Princeton: shape.cs.princeton.edu
- Purdue: tools.ecn.purdue.edu/~cise/dess.html

B. Prabhakaran, Slide 32: Metadata & Multiple Media