ACM Multimedia Conference 2006


Tiling Slideshow
Jun-Cheng Chen, Wei-Ta Chu, Jin-Hau Kuo, Chung-Yi Weng, and Ja-Ling Wu
National Taiwan University (wtchu@cmlab.csie.ntu.edu.tw)
2006/10/24 ACM Multimedia Conference 2006

Good morning, ladies and gentlemen. It's a great honor to present our work here. I am Wei-Ta Chu, from Taiwan. The topic of this paper is the tiling slideshow, a new kind of audiovisual presentation for consumer photos. A tiling slideshow presents organized photos in a lively way, synchronized with music beats, and significantly enhances the viewing experience.

Motivation
Large amounts of unorganized photos burden information access.
Conventional methods: content-based image retrieval, digital photo albums, photo browsers.
Issues: organized presentation (presenting photos one by one bores users); photo quality.

We know that digital cameras make it easy to create large numbers of photos. People are used to taking photos at will, and there may be thousands of photos after a trip. Numerous unorganized photos often burden information access. Recently, many methods have been proposed to facilitate photo access and browsing, such as content-based image retrieval, digital photo albums, and photo browsers. However, many issues have not been discussed or addressed in an integrated system. For example, conventional photo albums do not consider how to present photos systematically. If you have many photos, a conventional slideshow just presents them one by one, which may bore users. Moreover, few systems are specifically designed for consumer photos, and photo quality is not considered.

Goal
Automatically generate a well-organized and lively photo presentation. [Figure: Generator → Tiling slideshow]

In our work, we develop a system that automatically generates a well-organized and lively photo presentation. Given a set of photos and a piece of music selected by the user, the system performs visual and music analysis and generates an audiovisual presentation.

Photographic Story
Paragraph (described by text): contains a topic sentence and several supportive sentences.
Tiling slideshow (described by photos): contains a topic photo and several supportive photos.
[Figure: a frame with one topic photo and several supportive photos]

The Proposed Slideshow
[Figure: photos 1 to 8 displayed one after another along a timeline, driven by music beats]

Here is an example of the proposed slideshow. Photos with similar characteristics are put in the same frame to reinforce their visual impact. Driven by music beat information, photos of different sizes are displayed one after another. Displaying photos with music further enhances the viewing experience. The presentation is like sticking tiles of various sizes on a wall, which is why we call it a tiling slideshow.

Outline
System Overview; Visual Processing; Music Analysis; Tiling Slideshow Composition; Evaluation; Conclusion

Based on this idea, I will first give you an overview of the system. Then I describe visual processing and music analysis. The most important component of this work is the composition process. After that, I present the evaluation and conclusions.

System Overview
[Figure: Photos → Preprocessing (orientation correction, blur detection, overexposure/underexposure detection) → Clustering (time-based and content-based) → Composition (Temp. selection, ROI detection, temporal and spatial composition) → Tiling slideshow; Music → Beat detection → Composition]

For photos, we first perform some preprocessing, such as orientation correction, blur detection, and overexposure and underexposure detection, to filter out low-quality photos. Time-based and content-based clustering are then applied to the remaining photos. Photos categorized into the same cluster are displayed in the same frame. For music, we perform beat detection and segment the music into smaller pieces accordingly. Finally, we address temporal and spatial composition issues and generate the final result.

Photo Processing
Orientation correction: EXIF (Exchangeable Image File Format) metadata
Photo filtering:
  Blur detection: check edge information at different resolutions
  Overexposure/underexposure detection: check the intensity information of each photo
[Figure: examples of a blurred photo and an underexposed photo]

In photo preprocessing, we first correct the orientation of photos. The orientation can be extracted from the EXIF metadata embedded in the file header, and each photo is corrected accordingly. To filter out low-quality photos, we check edge information at different resolutions to estimate whether a photo is blurred. In addition, we check the intensity information and filter out photos that are largely occupied by very bright or very dark pixels.
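The exposure check described above can be sketched as counting very dark and very bright pixels. This is a minimal sketch assuming an 8-bit grayscale photo given as nested lists; the function name `exposure_label` and its thresholds (`dark`, `bright`, `max_fraction`) are hypothetical, since the slides do not state the actual criteria.

```python
def exposure_label(gray, dark=30, bright=225, max_fraction=0.5):
    """Classify a grayscale photo as 'underexposed', 'overexposed', or None.

    gray: 2D list of 8-bit intensities (0-255).
    dark, bright, max_fraction: hypothetical thresholds chosen for illustration.
    """
    total = dark_count = bright_count = 0
    for row in gray:
        for value in row:
            total += 1
            if value <= dark:
                dark_count += 1
            elif value >= bright:
                bright_count += 1
    if total == 0:
        return None
    if dark_count / total > max_fraction:
        return "underexposed"
    if bright_count / total > max_fraction:
        return "overexposed"
    return None


# Keep only photos that are not dominated by very dark or very bright pixels.
photos = {"night.jpg": [[12] * 8] * 6, "snow.jpg": [[250] * 8] * 6, "park.jpg": [[128] * 8] * 6}
kept = [name for name, gray in photos.items() if exposure_label(gray) is None]
print(kept)  # ['park.jpg']
```

A fixed-fraction rule is the simplest reading of "largely occupied by very bright or dark pixels". A histogram-based test would be a natural refinement.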

Time-based Clustering
Check the time gap between adjacent photos. [Figure: example gaps of 15 sec, 47 sec, 30 sec, and 7 hr between consecutive photos]

After photo filtering, we organize photos in terms of time and content characteristics. First, the time gaps between adjacent photos are calculated. Photos taken close together in time are clustered together; if there is a large time gap between two consecutive photos, they are assigned to different clusters. We apply a sliding window to the photo sequence and check whether the local maximum exceeds an adaptive threshold.

Content-based Clustering (1/3)
Given a time-based photo cluster, finer clustering is performed based on content-based features (dominant color and color layout).
Within-cluster distance: [equation]
Between-cluster distance: [equation]
d(·) is the average of the normalized dominant color and color layout distances.
Goodness of a clustering case: between-cluster distance / within-cluster distance

After time-based clustering, a cluster may still contain twenty or thirty photos or more. We therefore perform finer organization based on content-based features. Given a time-based cluster, we try different clustering cases and find the best one. We prefer photos in the same cluster to be similar, and photos in different clusters to be as distinct as possible. Therefore, we define the within-cluster distance and the between-cluster distance. The goodness of a clustering case is measured by the ratio of the between-cluster distance to the within-cluster distance; a larger value means better clustering.
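The goodness measure on this slide can be sketched as follows. The exact within- and between-cluster formulas are not recoverable from the transcript, so this sketch assumes both are mean pairwise distances under d(·), with each feature distance normalized by its maximum over all photo pairs. Photos are represented by hypothetical dominant-color and color-layout vectors, and the function names are illustrative only.

```python
import itertools
import math


def distance_matrix(dominant, layout):
    """d(i, j): average of the dominant-color and color-layout distances,
    each normalized by its maximum over all photo pairs (assumed normalization)."""
    n = len(dominant)
    dc = [[math.dist(dominant[i], dominant[j]) for j in range(n)] for i in range(n)]
    cl = [[math.dist(layout[i], layout[j]) for j in range(n)] for i in range(n)]
    dc_max = max(max(row) for row in dc) or 1.0  # avoid dividing by zero
    cl_max = max(max(row) for row in cl) or 1.0
    return [[(dc[i][j] / dc_max + cl[i][j] / cl_max) / 2 for j in range(n)]
            for i in range(n)]


def goodness(d, labels):
    """Between-cluster distance over within-cluster distance (larger is better).
    Both are taken here as mean pairwise distances, which is an assumption."""
    within, between = [], []
    for i, j in itertools.combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(d[i][j])
    if not between:  # a single cluster has nothing to be separated from
        return 0.0
    w = sum(within) / len(within) if within else 0.0
    b = sum(between) / len(between)
    return math.inf if w == 0 else b / w


# Two tight groups of photos: grouping them correctly scores higher.
features = [(0, 0), (0, 1), (10, 10), (10, 11)]
d = distance_matrix(features, features)
print(goodness(d, [0, 0, 1, 1]) > goodness(d, [0, 1, 0, 1]))  # True
```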

Content-based Clustering (2/3)
[Figure: clustering case 1 and clustering case 2 for the same set of photos]

Let me show you an example. Given a set of photos, we can categorize them into two clusters, as in case 1. Alternatively, we can categorize them into three clusters, as in case 2. Using the goodness value defined above, we evaluate the different clustering cases and select the best one.

Content-based Clustering (3/3)
Clustering Results [Figure: goodness (y-axis) of each clustering case (x-axis)]

Here is a real example. The x-axis denotes the different clustering cases, and the y-axis denotes their goodness. We pick the case with the largest goodness. In this example, that case is clearly better than the others: photos with similar characteristics are clustered together.
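Selecting the best clustering case amounts to an argmax of goodness over candidate labelings. The talk does not say how the candidate cases are generated. Purely for illustration, this sketch assumes they come from average-linkage agglomerative merging, from n - 1 clusters down to 2. It scores each case with the between-over-within ratio, computed from mean pairwise distances (also an assumption). The input `d` is any precomputed pairwise distance matrix.

```python
import itertools
import math


def goodness(d, labels):
    """Between-cluster over within-cluster distance, both as mean pairwise distances (assumed)."""
    within, between = [], []
    for i, j in itertools.combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(d[i][j])
    if not between:
        return 0.0
    w = sum(within) / len(within) if within else 0.0
    b = sum(between) / len(between)
    return math.inf if w == 0 else b / w


def candidate_cases(d):
    """Yield labelings with n-1, n-2, ..., 2 clusters by average-linkage merging (hypothetical)."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > 2:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            avg = sum(d[i][j] for i in clusters[a] for j in clusters[b]) / (
                len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the closest pair (b > a, so index a stays valid)
        del clusters[b]
        labels = [0] * len(d)
        for k, members in enumerate(clusters):
            for i in members:
                labels[i] = k
        yield labels


def best_case(d):
    """Pick the clustering case with the largest goodness, or None if there is no case."""
    return max(candidate_cases(d), key=lambda labels: goodness(d, labels), default=None)


# Photos whose features lie at 0, 1, 2 and 10, 11, 12 on a single axis.
positions = [0, 1, 2, 10, 11, 12]
d = [[abs(a - b) for b in positions] for a in positions]
print(best_case(d))  # [0, 0, 0, 1, 1, 1]
```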

Music Analysis
Beat detection
Music segmentation: for frame switching and photo displaying
[Figure: music beats 1 to 5 at times t1 to t5 over the sound energy difference curve; frame 1 starts at t1 (beat 1) and frame 2 starts at t4 (beat 4); the search range for frame switching runs from r1 (4 seconds) to r2 (6 seconds) after t1]

For music signals, we first detect beats and compute the sound energy difference, which form the basis for switching frames and displaying photos. If the first frame starts at time t1, we search from t1 + r1 to t1 + r2 for an appropriate time to switch frames, and choose the beat with the largest energy difference. Between t1 and t4, we can either divide the interval evenly into smaller pieces for displaying photos, or pick the beats with large energy differences.
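The frame-switching rule can be written almost directly: among the beats between t1 + r1 and t1 + r2, take the one with the largest sound energy difference. This is a minimal sketch. It assumes beats arrive as (time in seconds, energy difference) pairs and uses the slide's example values r1 = 4 s and r2 = 6 s as defaults. The function names, and the fallback used when no beat falls in the range, are hypothetical.

```python
def pick_switch_beat(beats, t_start, r1=4.0, r2=6.0):
    """Return the time of the beat in [t_start + r1, t_start + r2] with the largest
    sound energy difference, or None if no beat falls in that range.

    beats: list of (time_sec, energy_difference) pairs.
    """
    in_range = [(t, e) for t, e in beats if t_start + r1 <= t <= t_start + r2]
    if not in_range:
        return None
    return max(in_range, key=lambda beat: beat[1])[0]


def segment_music(beats, duration, r1=4.0, r2=6.0):
    """Return the start times of successive frames over a clip of `duration` seconds."""
    starts = [0.0]
    while starts[-1] + r1 < duration:
        nxt = pick_switch_beat(beats, starts[-1], r1, r2)
        if nxt is None:
            nxt = starts[-1] + r2  # hypothetical fallback when no beat is in range
        if nxt >= duration:
            break
        starts.append(nxt)
    return starts


beats = [(0.5, 0.1), (4.2, 0.3), (5.0, 0.9), (5.8, 0.2), (9.5, 0.7), (10.1, 0.4)]
print(segment_music(beats, 12.0))  # [0.0, 5.0, 9.5]
```

Each frame then spans one music segment, and the beats inside it (or an even split of the interval) time the individual photos.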

Short Summary
Photos: filter out defective photos; organize photos in terms of time and content characteristics.
Music: segment into smaller pieces.
[Figure: system overview with preprocessing, clustering, and beat detection completed; composition remaining]

So far, we have filtered out defective photos and organized the rest in terms of time and content characteristics. The music has also been segmented into smaller pieces. Next, we put the photos of each content-based cluster into the same frame, which is displayed during one music segment.

Tiling Slideshow Composition
Problem 1: Given a time-limited music clip, only a subset of photo clusters can be displayed.
Problem 2: Within a cluster of photos to be displayed, more important photos should occupy larger space.
Problem 3: Photos should be smartly manipulated to fit the limited display space.

We have to solve three problems to achieve the final result. First, the user-selected music is time-limited, so we cannot present all of the hundreds or thousands of photos; only some of the photo clusters are selected for display. Second, within a cluster of photos selected for presentation, more important photos should be allocated larger space, so that photos in the same frame are differentiated. Third, because the display space is also limited, smart manipulation such as cropping or resizing should be performed.

Cluster Selection (for Problem 1)
Cluster-based importance: defined based on "photos per minute (PPM)" and "photo conformance (PC)" for each content-based cluster Cg in a time-based cluster
  PPM: shooting frequency
  PC: opposite to within-cluster distance
  Combined by a nonlinear fusion scheme

For problem 1, we define cluster-based importance as the metric for cluster selection. It is defined by photos per minute and photo conformance. PPM denotes the shooting frequency, and PC is the opposite of the within-cluster distance. Photo clusters with high shooting frequency and high photo similarity are preferred in cluster selection.
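One way to turn PPM and PC into a selection rule is sketched below. The slide says only that the two are combined by a nonlinear fusion scheme. Several parts of this sketch are assumptions, not the paper's method: the product PPM^alpha * PC^beta (with hypothetical alpha = 0.5, beta = 2), the one-minute floor on the shooting span, and PC = 1 - within-cluster distance. Clusters are ranked by importance, the top ones fill the available frames (one per music segment), and the survivors are returned in shooting order. `PhotoCluster` and the function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PhotoCluster:
    name: str
    timestamps: list          # capture times in seconds
    within_distance: float    # normalized to [0, 1]


def photos_per_minute(c):
    span_minutes = (max(c.timestamps) - min(c.timestamps)) / 60.0
    return len(c.timestamps) / max(span_minutes, 1.0)  # hypothetical one-minute floor


def importance(c, alpha=0.5, beta=2.0):
    ppm = photos_per_minute(c)
    pc = 1.0 - c.within_distance  # more similar photos -> higher conformance
    return ppm ** alpha * pc ** beta  # hypothetical nonlinear fusion


def select_clusters(clusters, n_frames, alpha=0.5, beta=2.0):
    """Keep the n_frames most important clusters, returned in shooting order."""
    ranked = sorted(clusters, key=lambda c: importance(c, alpha, beta), reverse=True)
    return sorted(ranked[:n_frames], key=lambda c: min(c.timestamps))


clusters = [
    PhotoCluster("A", [0, 30, 60], 0.2),
    PhotoCluster("B", [100, 400], 0.1),
    PhotoCluster("C", [1000, 1010, 1020, 1030], 0.6),
]
print([c.name for c in select_clusters(clusters, 2)])  # ['A', 'B']
```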

Template Determination (for Problem 2)
Template importance; template importance vector
[Figure: 3-cell and 4-cell templates, each with topic cells and supportive cells]

For problem 2, we design templates with cells of different sizes. Here are some templates for three and four photos. Intuitively, if a content-based cluster contains four photos, we select a template with four cells. To further differentiate photos in the same frame, we match photo importance with template importance and generate a more elaborate layout. We first define template importance as the ratio of a cell's size to the whole frame. After sorting, the importances of the individual cells form the template importance vector.
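Computing a template importance vector comes down to area ratios. This sketch assumes cells are axis-aligned rectangles (x, y, w, h) and that "ratio of a cell to the whole frame" means cell area over frame area. The 3-cell layout in the usage line is hypothetical.

```python
def template_importance(cells, frame_w, frame_h):
    """Return (importance vector, cell indices), both sorted by decreasing cell area.

    cells: list of (x, y, w, h) rectangles; importance = cell area / frame area.
    """
    frame_area = frame_w * frame_h
    ranked = sorted(((w * h / frame_area, idx) for idx, (_, _, w, h) in enumerate(cells)),
                    key=lambda item: item[0], reverse=True)
    return [r for r, _ in ranked], [idx for _, idx in ranked]


# Hypothetical 3-cell template on a 720 x 480 frame: one topic cell, two supportive cells.
vector, cell_order = template_importance(
    [(0, 0, 480, 480), (480, 0, 240, 240), (480, 240, 240, 240)], 720, 480)
print(cell_order)  # [0, 1, 2]
```

Keeping the cell indices alongside the sorted values lets the later matching step map the k-th most important photo back to a concrete cell.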

Template Determination (for Problem 2)
Photo-based importance: defined based on "face region (FR)" and "attention value (AV)"; photo importance vector

Correspondingly, we define photo-based importance based on the face region and the attention value. The importances of the individual photos then form the photo importance vector.

Template Determination (for Problem 2)
Find the best match between template importance and photo importance: find the minimum included angle between them.

To find the best match between the photos and the various templates, we calculate the included angle between the two importance vectors and choose the template with the minimum angle. The minimum included angle means that the two importance distributions are the most similar. Because the importance values are sorted, this process also determines which photo goes into which cell.
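The matching step can be sketched with the usual dot-product formula for the angle between two vectors. This assumes both vectors are compared in descending order and that only templates with as many cells as there are photos are candidates. The template names and values below are hypothetical.

```python
import math


def included_angle(u, v):
    """Angle (radians) between two vectors via the normalized dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp rounding error


def best_template(photo_importance, templates):
    """Return (template name, photo indices in cell order), or None if no template fits.

    templates: hypothetical dict mapping name -> template importance vector.
    """
    order = sorted(range(len(photo_importance)),
                   key=lambda i: photo_importance[i], reverse=True)
    p = [photo_importance[i] for i in order]
    same_size = {n: sorted(t, reverse=True) for n, t in templates.items() if len(t) == len(p)}
    if not same_size:
        return None
    name = min(same_size, key=lambda n: included_angle(p, same_size[n]))
    # The k-th most important photo goes into the k-th most important cell.
    return name, order


templates = {"equal3": [1 / 3] * 3, "topic3": [2 / 3, 1 / 6, 1 / 6], "topic4": [0.4, 0.2, 0.2, 0.2]}
print(best_template([0.2, 0.6, 0.2], templates))  # ('topic3', [1, 0, 2])
```

Because the angle ignores vector length, only the shape of the importance distribution matters, so raw photo scores do not need to be normalized to sum to one.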

Composition (for Problem 3)
Find the region that conveys the most "content value" and conforms to the aspect ratio of the target cell.
Top-down case: photo with a face. Bottom-up case: photo without a face.

Finally, we have to place the photos in their target cells. Instead of directly resizing the whole photo to fit the target cell, we find a region that conveys the most content value and conforms to the aspect ratio of the target cell. Depending on whether the photo contains a face, we find the region of interest from one of two perspectives. In the top-down case, the selected region starts from the centroid of the face region. In the bottom-up case, it starts from the centroid of the ROI.

Composition (for Problem 3)
1. Find ROI
2. Extend
3. Crop
4. Resize
[Figure: worked example annotated with dimensions of 720 pixels and 480 pixels]

Let's look at a real example of composition. After determining the target cell of each photo, we first find the ROI from the top-down or bottom-up perspective. Then the region is extended to the photo boundary according to the aspect ratio of its target cell. Finally, the selected region is cropped out and resized to fit the target cell.
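The extend, crop, and resize steps can be sketched geometrically. The sketch finds the largest window with the target cell's aspect ratio, places it as close to the ROI centroid as the photo boundary allows, and computes the scale factor for the final resize. This is an assumed reading of steps 2 to 4, and `crop_for_cell` is a hypothetical name. It does not evaluate content value; it only handles the aspect-ratio geometry.

```python
def crop_for_cell(img_w, img_h, cell_w, cell_h, cx, cy):
    """Return ((x0, y0, w, h), scale): the largest crop with the cell's aspect
    ratio, centered as close to the ROI centroid (cx, cy) as the photo allows,
    and the factor that resizes that crop to the cell."""
    target = cell_w / cell_h
    if img_w / img_h > target:  # photo is wider than the cell: use the full height
        crop_h, crop_w = img_h, round(img_h * target)
    else:  # photo is taller than (or as wide as) the cell: use the full width
        crop_w, crop_h = img_w, round(img_w / target)
    # Center on the ROI, then clamp so the window stays inside the photo.
    x0 = min(max(round(cx - crop_w / 2), 0), img_w - crop_w)
    y0 = min(max(round(cy - crop_h / 2), 0), img_h - crop_h)
    return (x0, y0, crop_w, crop_h), cell_w / crop_w


# A 720 x 480 photo with its ROI near the left edge, placed in a 240 x 240 cell.
print(crop_for_cell(720, 480, 240, 240, 100, 240))  # ((0, 0, 480, 480), 0.5)
```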

Demo

Evaluation: Data Sets
Data Set 1: 780 photos; music: 3m31s
Data Set 2: 522 photos; music: 4m38s
Data Set 3: 1257 photos; music: 4m06s
Locations: Osaka, Kyoto, Kobe, Nagoya, Tokyo (Japan); Melbourne, Brisbane (Australia); Amsterdam (Netherlands); Osaka, Kyoto, Kobe (Japan)

We evaluate performance on three different photo sets. All were taken by amateurs on different trips, and music clips of different lengths were selected to generate the tiling slideshows.

User Study
Compare satisfaction with ACDSee, PhotoStory, and the tiling slideshow.
Questionnaire:
Q1: How do you feel about the photo variety per unit of time?
Q2: Do you think it's a fun presentation?
Q3: Do you think the sequence helps you experience this trip?
Q4: Are you willing to use it to generate your own slideshow?
Q5: How do you feel about the audiovisual effects of this slideshow?

Because viewing experience is hard to quantify, we conducted a user study. We compare user satisfaction with ACDSee, Microsoft PhotoStory, and the tiling slideshow. Twenty-seven evaluators answered the five questions above, covering the photo variety per unit of time, how fun the presentation is, whether the sequence helps them experience the trip, whether they would use it for their own slideshows, and the audiovisual effects.

Subjective Scores
[Figure: scores for each question on Sequences 1, 2, and 3]

Because ACDSee provides only the most ordinary slideshow, we set its scores to five as the baseline. The tiling slideshow achieves better subjective acceptance than the others. The difference between PhotoStory and the tiling slideshow is slightly smaller for question 5. We suspect this is because beat detection is still not perfect, so the coordination between music beats and photo presentation is sometimes not good enough.

Objective Tests (1/2)
Clustering performance evaluation
Slideshow 1: 37 frames, 127 photos, 1 frame with clustering error, 3.43 photos per frame on average
Slideshow 2: 48 frames, 172 photos, 3.58 photos per frame on average
Slideshow 3: 43 frames, 184 photos, 2 frames with clustering errors, 4.28 photos per frame on average
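The average number of photos per frame is simply photos divided by frames. A quick check against the reported values:

```python
# (frames, photos) for each slideshow, taken from the clustering table above
counts = {"Slideshow 1": (37, 127), "Slideshow 2": (48, 172), "Slideshow 3": (43, 184)}
averages = {name: round(photos / frames, 2) for name, (frames, photos) in counts.items()}
print(averages)  # {'Slideshow 1': 3.43, 'Slideshow 2': 3.58, 'Slideshow 3': 4.28}
```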

Objective Tests (2/2)
Cropping performance evaluation
Slideshow 1: 127 photos, 5 ill-cropped photos, 1 of them in a topic cell
Slideshow 2: 172 photos
Slideshow 3: 184 photos, 6 ill-cropped photos, 3 of them in a topic cell

Summary
We propose a new type of audiovisual presentation for consumer photos.
We perform both visual and music analysis to produce an organized presentation.
We address content selection and smart manipulation to display good-quality content in limited time and limited space.
Semantic features or user intervention could be added to further improve performance.

Backup Slides

System Overview
[Figure: block diagram in three stages (Preprocess, Analysis, Composition). Blocks: orientation correction, quality estimation, photo filtering, time-based clustering, content-based clustering, ROI determination, beat detection, music segmentation, tiling slideshow composition. Inputs: photos and music. Output: tiling slideshow.]

An EXIF Example
File name: IMG_1770.JPG
File size: 2062120 bytes
File date: 2005:11:16 10:04:20
Camera make: Canon
Camera model: Canon PowerShot S60
Date/Time: 2005:11:16 10:04:21
Resolution: 2592 x 1944
Orientation: rotate 90
Flash used: No (auto)
Focal length: 5.8mm (35mm equivalent: 29mm)
CCD width: 7.19mm
Exposure time: 0.0100 s (1/100)
Aperture: f/2.8
White balance: Auto
Metering mode: matrix
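Orientation correction only needs the EXIF Orientation tag. In the EXIF convention, values 1, 3, 6, and 8 mean the stored image must be rotated by 0 degrees, 180 degrees, 90 degrees clockwise, and 90 degrees counterclockwise, respectively, to display upright. A dump that reads "rotate 90", as above, typically corresponds to value 6. The sketch below assumes the pixels are already decoded into nested lists and deliberately ignores the mirrored values 2, 4, 5, and 7. The function names are hypothetical.

```python
# Clockwise rotation (degrees) needed to display the image upright, for the
# non-mirrored EXIF Orientation values. Mirrored values 2, 4, 5, 7 are not handled here.
CLOCKWISE_ROTATION = {1: 0, 3: 180, 6: 90, 8: 270}


def rotate_cw(pixels):
    """Rotate a 2D list of pixels by 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def correct_orientation(pixels, orientation):
    """Apply the rotation implied by the EXIF Orientation tag value."""
    for _ in range(CLOCKWISE_ROTATION.get(orientation, 0) // 90):
        pixels = rotate_cw(pixels)
    return pixels


print(correct_orientation([[1, 2], [3, 4]], 6))  # [[3, 1], [4, 2]]
```

With Pillow available, `PIL.ImageOps.exif_transpose` performs the same correction (including the mirrored cases) directly on an image object.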

Blur Detection

Blur Detection
Reference: H. Tong, M. Li, H.-J. Zhang, and C. Zhang, "Blur detection for digital images using wavelet transform," Proc. of ICME, pp. 17-20, 2004.

Time-based Clustering
Adaptive threshold clustering algorithm
g_i: time gap between photo i and photo i+1
K: a suitable threshold (K = log(17))
d: sliding-window parameter (d = 5; the window spans 2d + 1 = 11 time gaps)
[Figure: sliding window over the time gaps g_1, g_2, ..., g_N]
Reference: J.C. Platt, M. Czerwinski, and B.A. Field, "PhotoTOC: automating clustering for browsing personal photographs," Proc. of PCM, pp. 6-10, 2003.
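These parameters fit the PhotoTOC-style rule of Platt et al. In that rule, gap N starts a new cluster when log g_N is at least K above the mean log gap in a window of 2d + 1 = 11 gaps centered on it. The sketch below assumes that form. Two edge-handling choices are hypothetical: it truncates the window at the ends of the sequence, and it clamps gaps to at least one second so the logarithm is defined. The example uses the gaps from the time-based clustering slide (15 s, 47 s, 30 s, 7 hr), followed by a few made-up ones.

```python
import math


def time_boundaries(times, K=math.log(17), d=5, min_gap=1.0):
    """Return indices n such that gap n (between photo n and n + 1) starts a new cluster.

    times: capture times in seconds, sorted ascending.
    Assumed PhotoTOC-style test: log(g_n) >= K + mean of the log gaps in [n - d, n + d].
    """
    gaps = [max(b - a, min_gap) for a, b in zip(times, times[1:])]  # clamp so log is defined
    logs = [math.log(g) for g in gaps]
    boundaries = []
    for n, log_gap in enumerate(logs):
        window = logs[max(0, n - d):n + d + 1]  # truncated at the sequence ends
        if log_gap >= K + sum(window) / len(window):
            boundaries.append(n)
    return boundaries


def split_by_time(times, **kwargs):
    """Group photo indices into time-based clusters."""
    clusters, start = [], 0
    for n in time_boundaries(times, **kwargs):
        clusters.append(list(range(start, n + 1)))
        start = n + 1
    clusters.append(list(range(start, len(times))))
    return clusters


times = [0, 15, 62, 92, 25292, 25312, 25352, 25362]  # 7 hr = 25200 s between photos 3 and 4
print(split_by_time(times))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Working in log time makes the threshold relative: a gap must be roughly 17 times the typical gap around it, whether photos are taken seconds or hours apart.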


Beat Detection
[Figure: music signal → frequency filterbank → (per band) envelope extractor → first-order differentiator → half-wave rectifier → comb filterbank → energy per comb filter → sum across bands → beat peak picking]
Reference: E.D. Scheirer, "Tempo and beat analysis of acoustic musical signals," Journal of the Acoustical Society of America, vol. 103, no. 1, pp. 588-601, 1998.
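A heavily simplified reading of this pipeline is sketched below. It assumes the band envelopes have already been extracted, so no filterbank or envelope extractor is implemented. It then applies the first-order difference, half-wave rectification, and a bank of comb-filter resonators, sums resonator energy across bands, and returns the strongest period in envelope samples. The fixed feedback gain alpha = 0.9 is an assumption. Beat phase and peak picking are omitted, and all function names are hypothetical.

```python
def onset_strength(envelope):
    """First-order difference followed by half-wave rectification."""
    diff = [0.0] + [b - a for a, b in zip(envelope, envelope[1:])]
    return [max(0.0, v) for v in diff]


def comb_energy(x, period, alpha=0.9):
    """Output energy of the resonator y[n] = alpha * y[n - period] + (1 - alpha) * x[n]."""
    y = [0.0] * len(x)
    for n, xn in enumerate(x):
        delayed = y[n - period] if n >= period else 0.0
        y[n] = alpha * delayed + (1 - alpha) * xn
    return sum(v * v for v in y)


def estimate_period(band_envelopes, periods, alpha=0.9):
    """Sum comb-filter energy across bands; return the period that resonates most."""
    onsets = [onset_strength(env) for env in band_envelopes]
    return max(periods, key=lambda T: sum(comb_energy(o, T, alpha) for o in onsets))


# One band whose envelope pulses every 10 samples.
envelope = [1.0 if n % 10 == 0 else 0.0 for n in range(200)]
print(estimate_period([envelope], range(5, 31)))  # 10
```

The returned period converts to tempo through the envelope sampling rate. For example, at 100 envelope samples per second, a 50-sample period corresponds to 120 beats per minute.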

Tiling Slideshow Composition
Cluster selection: cluster-based importance
Template determination: photo-based importance
Spatial composition: smart cropping
Temporal composition

ROI Determination
Top-down attention detection: face detection

ROI Determination
Bottom-up attention detection: saliency map generation; attentive center and region extraction

Composition (for Problem 3)
Region selection: C(R_i) = content value of the region R_i
Top-down case: [equation]
Bottom-up case: [equation]
IMP(x,y): a 2D Gaussian applied at the point (x,y), which is the centroid of the face region or of the saliency map.
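The region-selection criterion can be sketched by summing a 2D Gaussian IMP(x, y) over each candidate region and keeping the best window. This sketch assumes C(R) is a plain sum of IMP sampled on a coarse grid. The values of sigma, the sampling step, and the search stride are hypothetical. The same code covers both cases: pass the face-region centroid (top-down) or the saliency-map centroid (bottom-up) as (cx, cy).

```python
import math


def gaussian_importance(x, y, cx, cy, sigma):
    """IMP(x, y): an (unnormalized) 2D Gaussian centered on the centroid (cx, cy)."""
    return math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))


def content_value(region, cx, cy, sigma, step=4):
    """C(R): sum of IMP over a grid of sample points inside region (x0, y0, w, h)."""
    x0, y0, w, h = region
    return sum(gaussian_importance(x, y, cx, cy, sigma)
               for x in range(x0, x0 + w, step)
               for y in range(y0, y0 + h, step))


def best_region(img_w, img_h, crop_w, crop_h, cx, cy, sigma, stride=20):
    """Slide a crop_w x crop_h window over the photo and keep the one with the largest C(R)."""
    candidates = [(x, y, crop_w, crop_h)
                  for x in range(0, img_w - crop_w + 1, stride)
                  for y in range(0, img_h - crop_h + 1, stride)]
    return max(candidates, key=lambda r: content_value(r, cx, cy, sigma))


# 720 x 480 photo, square 480 x 480 window, face centroid near the left edge.
print(best_region(720, 480, 480, 480, 100, 240, sigma=100))  # (0, 0, 480, 480)
```

Because IMP peaks at the centroid, the winning window is the one that keeps the centroid as central as the photo boundary allows.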

User Study 2
Evaluate performance in terms of content-based clustering and template determination.
Questionnaire:
Q6: How do you feel about the visual coherence of photos in the same frame?
Q7: How do you feel about the layout of the display?

User Study 2
[Figure: results for Question 6 and Question 7]