1
ACM Multimedia Conference 2006
Tiling Slideshow
Jun-Cheng Chen, Wei-Ta Chu, Jin-Hau Kuo, Chung-Yi Weng, and Ja-Ling Wu, National Taiwan University
Good morning, ladies and gentlemen. It's my great honor to present our work here. I am Wei-Ta Chu, from Taiwan. The topic of this paper is Tiling Slideshow. We propose a new kind of audiovisual presentation for consumer photos: the tiling slideshow presents organized photos in a lively way, driven by music beats, and significantly enhances the viewing experience.
2
Motivation: Large amounts of unorganized photos burden information access. Conventional methods: content-based image retrieval, digital photo albums, photo browsers. Issues: organized presentation (presenting photos one by one bores viewers) and photo quality.
We know that large amounts of photos can easily be created with digital cameras. People are used to taking photos at will, and there may be thousands of photos after a trip. Numerous unorganized photos often burden information access. Recently, many methods have been proposed to facilitate photo access and browsing, such as content-based image retrieval, photo albums, and browsers. However, many issues have not been discussed and addressed in an integrated system. For example, a conventional photo album does not consider how to systematically present photos; if you have many photos, a conventional slideshow just shows them one by one, which quickly bores viewers. Moreover, few systems are specially designed for consumer photos, and the photo-quality issue is not considered.
3
Goal: Automatically generate a well-organized and lively photo presentation (generator: tiling slideshow).
In our work, we develop a system that automatically generates a well-organized and lively photo presentation. Given a set of photos and a piece of user-selected music, the system performs visual and music analysis and generates an audiovisual presentation.
4
Photographic Story. A paragraph, described by text, contains a topic sentence and several supportive sentences. A tiling slideshow, described by photos, contains a topic photo and several supportive photos. (Figure: a topic photo surrounded by supportive photos.)
5
The Proposed Slideshow
Music Beats. Here is an example of the proposed slideshow. Photos with similar characteristics are put in the same frame to strengthen the visual impression. Driven by music beat information, photos differentiated by size are displayed in sequence, and displaying them with music further enhances the viewing experience. The presentation is like sticking tiles of varying sizes on a wall, which is why we call it a tiling slideshow. (Figure: frames 1-8 laid out along the time axis.)
6
Outline: System Overview, Visual Processing, Music Analysis, Tiling Slideshow Composition, Evaluation, Conclusion.
Based on this idea, we first give a system overview, then describe visual processing and music analysis. The most important component of this work is the composition process. After that, evaluation results and conclusions are given.
7
System Overview. Photos: preprocessing (orientation correction, blur detection, overexposure/underexposure detection), then time-based and content-based clustering. Music: beat detection. Finally, temporal selection, ROI detection, and temporal and spatial composition.
For photos, we first perform preprocessing, such as orientation correction, blur detection, and overexposure/underexposure detection, to filter out low-quality photos. For the remaining photos, time-based and content-based clustering are applied; photos categorized into the same cluster are displayed in the same frame. For music, we perform beat detection and accordingly segment the music into smaller pieces. Finally, we address temporal and spatial composition issues and generate the final result: the tiling slideshow.
8
Photo Processing Orientation correction Photo Filtering
EXIF (Exchangeable Image File Format) metadata. Photo filtering: blur detection (check edge information at different resolutions) and overexposure/underexposure detection (check the intensity information of each photo).
In photo preprocessing, we first correct the orientation of photos. This information can be extracted from the EXIF metadata embedded in the file header, and the photo is corrected accordingly. To filter out low-quality photos, we check edge information at different resolutions to estimate whether a photo is blurred. We also check the intensity information and filter out photos largely occupied by very bright or very dark pixels. (Examples: a blurred photo and an underexposed photo.)
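To make the filtering step concrete, here is a minimal sketch of blur and exposure checks. It is not the wavelet-based method of Tong et al. used in the paper; it uses a simpler multi-resolution gradient proxy, and the thresholds (BLUR_THRESH-style values and the dark/bright limits) are illustrative assumptions.

```python
# A minimal sketch of the photo-filtering step, assuming grayscale images
# are already loaded as float NumPy arrays in [0, 255]. Thresholds are
# illustrative, not the paper's values.
import numpy as np

def edge_energy(gray: np.ndarray) -> float:
    """Mean gradient magnitude, a simple proxy for edge strength."""
    gy, gx = np.gradient(gray)
    return float(np.mean(np.hypot(gx, gy)))

def is_blurred(gray: np.ndarray, levels: int = 3, thresh: float = 2.0) -> bool:
    """Check edge information at several resolutions: a sharp photo keeps
    noticeable edge energy even after repeated 2x downsampling."""
    energies, img = [], gray
    for _ in range(levels):
        energies.append(edge_energy(img))
        img = img[::2, ::2]            # crude 2x downsample
    return min(energies) < thresh

def is_badly_exposed(gray: np.ndarray, frac: float = 0.7) -> bool:
    """Flag photos largely occupied by very dark or very bright pixels."""
    dark = np.mean(gray < 30)
    bright = np.mean(gray > 225)
    return dark > frac or bright > frac

def keep_photo(gray: np.ndarray) -> bool:
    return not is_blurred(gray) and not is_badly_exposed(gray)
```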
9
Time-based Clustering
Check the time gap between adjacent photos (example gaps: 15 sec, 47 sec, 30 sec, 7 hr).
After photo filtering, we organize photos in terms of time and content characteristics. The time gap between adjacent photos is calculated: photos taken close together in time are clustered together, while a large time gap between two consecutive photos places them in different clusters. We apply a sliding window over the photo sequence and check whether the local maximum gap exceeds an adaptive threshold.
10
Content-based Clustering (1/3)
Given a time-based photo cluster, finer clustering is performed based on content-based features (dominant color and color layout). The within-cluster and between-cluster distances are defined with d(.), the average of the normalized dominant-color and color-layout distances, and the goodness of a clustering case is the between-cluster distance divided by the within-cluster distance.
After time-based clustering, a cluster may consist of more than twenty or thirty photos, so we perform finer organization based on content-based features. Given a time-based cluster, we try different clustering cases and find the best one. Basically, we prefer photos in the same cluster to be as similar as possible and photos in different clusters to be as distinct as possible. Therefore, we define the within-cluster and between-cluster distances and measure the goodness of a clustering case as the between-cluster distance over the within-cluster distance; a larger value means a better clustering.
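A small sketch of this goodness criterion follows. The feature distance d(.) in the paper averages normalized dominant-color and color-layout distances; here a plain Euclidean distance on feature vectors stands in for it, and the helper names are illustrative.

```python
# Sketch of the "goodness" criterion for choosing among candidate
# content-based clusterings. Each cluster is a list of feature vectors.
import itertools
import numpy as np

def d(a: np.ndarray, b: np.ndarray) -> float:
    # stand-in for the normalized dominant-color / color-layout distance
    return float(np.linalg.norm(a - b))

def within_distance(clusters):
    dists = [d(a, b) for c in clusters for a, b in itertools.combinations(c, 2)]
    return float(np.mean(dists)) if dists else 0.0

def between_distance(clusters):
    dists = [d(a, b)
             for c1, c2 in itertools.combinations(clusters, 2)
             for a in c1 for b in c2]
    return float(np.mean(dists)) if dists else 0.0

def goodness(clusters):
    """Between-cluster distance over within-cluster distance; larger is better."""
    w = within_distance(clusters)
    return between_distance(clusters) / w if w > 0 else float("inf")

def best_clustering(candidates):
    """Pick the candidate clustering with the largest goodness."""
    return max(candidates, key=goodness)
```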
11
Content-based Clustering (2/3)
(Figure: clustering case 1 with two clusters and case 2 with three clusters.) Let me show you an example. Given a set of photos, we can categorize them into two clusters, as in case 1, or into three clusters, as in case 2. According to the goodness value defined above, we evaluate the different clustering cases and select the best one.
12
Content-based Clustering (3/3)
Clustering Results. Here is a real example. The x-axis denotes different clustering cases, and the y-axis denotes their goodness. We pick the case with the largest goodness; from this example we can see that it is better than the others, and photos with similar characteristics are clustered together.
13
Music Analysis Beat detection Music segmentation
For frame switching and photo displaying. (Figure: music beats 1-5 at times t1-t5; frame 1 starts at t1 and frame 2 at t4; the search range for frame switching spans t1 + r1 to t1 + r2, labelled 4 and 6 seconds; the sound energy difference is shown per beat.)
For music signals, we first detect beat information and sound energy differences, which become the basis for switching frames and displaying photos. If the first frame starts at time t1, we search from t1 + r1 to t1 + r2 for an appropriate frame-switching time; the beat corresponding to the largest energy difference is chosen. From t1 to t4, we can evenly dispatch smaller segments for displaying photos, or pick the beats that correspond to large energy differences.
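The frame-switching search can be sketched as follows. Beat times and per-beat energy differences are assumed to come from the beat-detection step; the default search range of 4 to 6 seconds mirrors the r1/r2 labels in the figure and is only illustrative.

```python
# Sketch of choosing the frame-switching time: among beats in the range
# [t1 + r1, t1 + r2], pick the one with the largest sound energy difference.
def pick_switch_time(t1, beats, energy_diff, r1=4.0, r2=6.0):
    """beats: list of beat times (seconds); energy_diff: dict beat -> energy jump."""
    candidates = [b for b in beats if t1 + r1 <= b <= t1 + r2]
    if not candidates:
        return t1 + r2                  # fall back to the end of the search range
    return max(candidates, key=lambda b: energy_diff[b])

def dispatch_photo_times(t_start, t_end, n_photos):
    """Evenly dispatch photo-display instants within one frame's music segment."""
    step = (t_end - t_start) / n_photos
    return [t_start + i * step for i in range(n_photos)]
```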
14
Short Summary. Photos: filter out defective photos and organize them in terms of time and content characteristics. Music: segment into smaller pieces.
So far, we have filtered out defective photos and organized the rest by time and content characteristics, and the music has been segmented into smaller pieces. In the following, we put the photos of the same content-based cluster into the same frame, which is displayed within the time duration of one music segment, to compose the tiling slideshow.
15
Tiling Slideshow Composition
Problem 1: Given a time-limited music clip, only a subset of photo clusters can be displayed. Problem 2: For a cluster of photos to be displayed, more important photos should occupy larger space. Problem 3: Photos should be smartly manipulated to fit the limited display space.
We have to solve three problems to achieve the final result. First, the user-selected music is time-limited, so we cannot present hundreds or thousands of photos; only some photo clusters are selected for display. Second, for a cluster of photos selected for presentation, more important photos should be allocated larger space, so photos in the same frame must be differentiated. Third, because the display space is also limited, smart manipulation such as cropping and resizing is needed.
16
Cluster Selection (for Problem 1)
Cluster-based importance: defined from "photo per minute (PPM)" and "photo conformance (PC)" for each content-based cluster Cg in a time-based cluster. PPM reflects shooting frequency, PC is opposite to the within-cluster distance, and the two are combined with a nonlinear fusion scheme.
For problem 1, we define cluster-based importance as the metric for cluster selection. It is defined from photo per minute and photo conformance: PPM denotes shooting frequency, and PC is opposite to the within-cluster distance. A photo cluster with a high shooting frequency and high photo similarity is preferred in cluster selection.
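A hedged sketch of the cluster-importance computation follows. The paper does not spell out its nonlinear fusion scheme on this slide, so the exponent-weighted product below is only a placeholder, as are the helper definitions of PPM and PC.

```python
# Sketch of cluster-based importance from photo-per-minute (PPM) and
# photo conformance (PC). The fusion form and constants are assumptions.
def ppm(timestamps):
    """Shooting frequency: photos per minute within the cluster."""
    span_min = (max(timestamps) - min(timestamps)) / 60.0
    return len(timestamps) / max(span_min, 1e-6)

def pc(within_cluster_distance):
    """Photo conformance, taken to be opposite to the within-cluster distance."""
    return 1.0 / (1.0 + within_cluster_distance)

def cluster_importance(timestamps, within_dist, alpha=0.5):
    """Placeholder nonlinear fusion of PPM and PC."""
    return (ppm(timestamps) ** alpha) * (pc(within_dist) ** (1 - alpha))
```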
17
Template Determination (for Problem 2)
Template importance: the template importance vector collects the area ratio of each cell to the whole frame. (Figure: a 3-cell template and 4-cell templates, each with a topic cell and supportive cells.)
For problem 2, templates with cells of different sizes are designed; here are some templates for three and four photos. Intuitively, if the content-based cluster contains four photos, we select a template with four cells. To further differentiate the photos in the same frame, we match photo importance with template importance to generate a more elaborate layout. We first define the template importance of a cell as its area ratio to the whole frame; after sorting, the template importance vector is packed from the individual cells.
18
Template Determination (for Problem 2)
Photo-based importance: defined from "face region (FR)" and "attention value (AV)".
Correspondingly, we define photo-based importance based on face region and attention value, and the photo importance vector is packed from the individual photos.
19
Template Determination (for Problem 2)
Find the best match between template importance and photo importance by finding the minimum included angle between them.
To find the best match between the photos and the various templates, we calculate the included angle between the two importance vectors and take the minimum; the minimal included angle means their importance distributions are the most similar. Because the importance values are sorted, which photo goes into which cell is determined once this matching is done.
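The matching step can be sketched directly from this description: sort both importance vectors and pick the template whose cell-area ratios form the smallest included angle with the photo importances. The template dictionary format is an assumption for illustration.

```python
# Sketch of template determination by minimum included angle between the
# sorted photo-importance vector and each template's cell-area-ratio vector.
import numpy as np

def included_angle(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def choose_template(photo_importance, templates):
    """templates: dict name -> list of cell/frame area ratios.
    Only templates with as many cells as there are photos are considered."""
    p = sorted(photo_importance, reverse=True)
    candidates = {name: sorted(t, reverse=True)
                  for name, t in templates.items() if len(t) == len(p)}
    best = min(candidates, key=lambda name: included_angle(p, candidates[name]))
    # because both vectors are sorted, photo i goes into cell i of `best`
    return best
```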
20
Composition (for Problem 3)
Find the region that conveys the most "content value" and conforms to the aspect ratio of the targeted cell. Top-down case: photo with a face. Bottom-up case: photo without a face.
Finally, we have to place the photos in their targeted cells. Instead of directly resizing the whole photo to fit the targeted cell, we find a region that conveys the most content value and conforms to the aspect ratio of the cell. Depending on whether the photo contains a face, we find the region of interest from one of two perspectives: in the top-down case, the selected region starts from the centroid of the face region; in the bottom-up case, it starts from the centroid of the ROI.
21
Composition (for Problem 3)
1. Find ROI; 2. Extend; 3. Crop; 4. Resize. (Figure: a worked example; dimensions labelled 720 and 480 pixels.)
Let's see a real example of composition. After determining the targeted cell of each photo, we first find the ROI from the top-down or bottom-up perspective. The region is then extended toward the photo boundary according to the aspect ratio of its targeted cell, and the selected region is cropped out and resized to fit the cell.
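A minimal sketch of steps 2-4 (extend, crop, resize) using Pillow. The geometry, growing the largest window of the cell's aspect ratio around the ROI centre and clamping it to the photo, is an illustrative simplification of the paper's extension step, not its exact procedure.

```python
# Sketch of the crop-and-resize step: grow a window around the ROI centre
# to the target cell's aspect ratio, clamp it to the photo, crop, and resize.
from PIL import Image

def compose_into_cell(photo: Image.Image, roi_center, cell_w: int, cell_h: int):
    W, H = photo.size
    cx, cy = roi_center
    aspect = cell_w / cell_h
    # largest window with the cell's aspect ratio that still fits in the photo
    if W / H > aspect:
        h, w = H, int(H * aspect)
    else:
        w, h = W, int(W / aspect)
    # centre the window on the ROI, then clamp it to the photo boundary
    left = min(max(cx - w // 2, 0), W - w)
    top = min(max(cy - h // 2, 0), H - h)
    region = photo.crop((left, top, left + w, top + h))
    return region.resize((cell_w, cell_h))
```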
22
Demo
23
Evaluation data sets. Data Set 1: 780 photos, music 3m31s. Data Set 2: 522 photos, music 4m38s. Data Set 3: 1257 photos, music 4m06s. Locations covered: Osaka, Kyoto, Kobe, Nagoya, Tokyo (Japan); Melbourne, Brisbane (Australia); Amsterdam (Netherlands); Osaka, Kyoto, Kobe (Japan).
We evaluate the performance on three different photo sets. They were all taken by amateurs on different trips, and music of different lengths is selected to generate the tiling slideshows.
24
User Study. Compare the satisfaction with ACDSee, PhotoStory, and the tiling slideshow. Questionnaire: Q1: How do you feel about the photo variety in a time unit? Q2: Do you think it is a fun presentation? Q3: Do you think the sequence helps you experience this trip? Q4: Are you willing to use it to generate your own slideshow? Q5: How do you feel about the audiovisual effects of this slideshow?
Because viewing experience is hard to quantify, a user study was conducted. We compare the satisfaction with ACDSee, Microsoft PhotoStory, and the tiling slideshow; the five questions above were asked of twenty-seven evaluators.
25
Subjective Scores. (Chart: average scores for questions Q1-Q5 on Sequence 1, Sequence 2, and Sequence 3.)
Because ACDSee only provides the most ordinary slideshow, we set its scores to five as the baseline. We can see that the tiling slideshow obtains better subjective acceptance than the others. The gap between PhotoStory and the tiling slideshow is slightly smaller for question 5; we believe this is because beat detection is still not perfect, so the coordination between music beats and photo presentation is sometimes not good enough.
26
Objective Tests (1/2): clustering performance evaluation.
Slideshow 1: 37 frames, 127 photos, 1 frame with clustering error, 3.43 photos per frame on average.
Slideshow 2: 48 frames, 172 photos, frames with clustering error not given in the source, 3.58 photos per frame on average.
Slideshow 3: 43 frames, 184 photos, 2 frames with clustering error, 4.28 photos per frame on average.
27
Objective Tests (2/2): cropping performance evaluation.
Slideshow 1: 127 photos, 5 ill-cropped photos, 1 ill-cropped photo in a topic cell.
Slideshow 2: 172 photos; ill-cropped counts not given in the source.
Slideshow 3: 184 photos, 6 ill-cropped photos, 3 ill-cropped photos in topic cells.
28
Summary. We propose a new type of audiovisual presentation for consumer photos, performing both visual and music analysis to obtain an organized presentation. We deal with issues of content selection and smart manipulation to display high-quality content within limited time and limited space. Semantic features or user intervention could be added to further enhance the performance.
29
Backup Slides
30
System Overview (detailed block diagram). Preprocess: orientation correction, quality estimation, photo filtering. Analysis: time-based clustering, content-based clustering, and ROI determination for photos; beat detection and music segmentation for music. Composition: tiling slideshow composition, producing the tiling slideshow.
31
An EXIF Example
File name: IMG_1770.JPG
File size: 2062120 bytes
File date: 2005:11:16 10:04:20
Camera make: Canon
Camera model: Canon PowerShot S60
Date/Time: 2005:11:16 10:04:21
Resolution: 2592 x 1944
Orientation: rotate 90
Flash used: No (auto)
Focal length: 5.8mm (35mm equivalent: 29mm)
CCD width: 7.19mm
Exposure time: 1/100 s
Aperture: f/2.8
Whitebalance: Auto
Metering Mode: matrix
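A minimal sketch of how such metadata can drive the orientation-correction step, assuming Pillow; ImageOps.exif_transpose and the DateTime tag (306) are standard Pillow/EXIF facilities, not code from the paper.

```python
# Sketch of orientation correction from EXIF metadata using Pillow.
# ImageOps.exif_transpose reads the Orientation tag (e.g. "rotate 90" above)
# and rotates/flips the pixels accordingly; the DateTime tag (306) is the
# kind of timestamp that time-based clustering later relies on.
from PIL import Image, ImageOps

def load_upright(path: str):
    img = Image.open(path)
    upright = ImageOps.exif_transpose(img)     # apply the Orientation tag
    shot_time = img.getexif().get(306)          # e.g. "2005:11:16 10:04:21", if present
    return upright, shot_time
```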
32
Blur Detection
33
Blur Detection. H. Tong, M. Li, H.-J. Zhang, and C. Zhang, "Blur detection for digital images using wavelet transform," Proc. of ICME, 2004.
34
Time-based Clustering
Adaptive-threshold clustering algorithm: gi is the time gap between photo i and photo i+1; K is a suitable threshold (K = log(17)); d is the size of the sliding window (d = 5). (Figure: a sequence of time gaps g1, g2, ..., gN.)
J.C. Platt, M. Czerwinski, and B.A. Field, "PhotoTOC: automatic clustering for browsing personal photographs," Proc. of PCM, pp. 6-10, 2003.
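A sketch of the adaptive-threshold check, assuming the commonly cited PhotoTOC-style criterion: a gap marks an event boundary when its logarithm exceeds K plus the windowed average of log-gaps. The exact formula in the paper may differ in detail; K = log(17) and d = 5 follow the slide.

```python
# Sketch of adaptive-threshold time-based clustering over photo time gaps.
import math

def event_boundaries(gaps, K=math.log(17), d=5):
    """gaps[i] is the time gap (seconds) between photo i and photo i+1.
    Returns the indices i after which a new time-based cluster starts."""
    n = len(gaps)
    logs = [math.log(max(g, 1e-6)) for g in gaps]
    boundaries = []
    for i in range(n):
        lo, hi = max(0, i - d), min(n, i + d + 1)
        local_avg = sum(logs[lo:hi]) / (hi - lo)   # windowed average of log-gaps
        if logs[i] >= K + local_avg:
            boundaries.append(i)
    return boundaries
```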
35
36
Beat Detection (block diagram, after Scheirer): the music signal passes through a frequency filterbank; each band goes through an envelope extractor, a first-order differentiator, a half-wave rectifier, and a comb filterbank; the per-band energies are summed, and peak picking yields the beats.
E.D. Scheirer, "Tempo and beat analysis of acoustic musical signals," Journal of the Acoustical Society of America, vol. 103, no. 1, 1998.
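As a stand-in for the Scheirer-style comb-filterbank detector above, the following sketch uses librosa's off-the-shelf beat tracker and treats the onset strength at each beat as the "sound energy difference" consumed by the frame-switching step; it is not the paper's implementation.

```python
# Sketch of beat detection with librosa, used only as a simplified substitute
# for the comb-filterbank pipeline described above.
import librosa

def beats_and_energy(path):
    y, sr = librosa.load(path)
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    # per-beat "energy difference": onset strength at the beat frame
    energy_diff = {float(t): float(onset_env[f]) for t, f in zip(beat_times, beat_frames)}
    return [float(t) for t in beat_times], energy_diff
```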
37
Tiling Slideshow Composition
Cluster Selection: cluster-based importance. Template Determination: photo-based importance. Spatial Composition: smart cropping. Temporal Composition.
38
ROI Determination: Top-Down Attention Detection (face detection).
39
ROI Determination: Bottom-Up Attention Detection (saliency map generation, then attentive center and region extraction).
40
Composition (for Problem 3)
Region selection: C(Ri) is the content value of the region Ri. IMP(x, y) is obtained by applying a 2D Gaussian centered at the point (x, y), the centroid of the face region in the top-down case or of the saliency map in the bottom-up case.
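A sketch of the content-value scoring, assuming IMP(x, y) is a 2D Gaussian centred on the face or saliency centroid and that a region's content value is the sum of IMP over its pixels; the window size, stride, and sigma are illustrative assumptions.

```python
# Sketch of region selection by content value: slide a window of the target
# size over an importance map and keep the window with the largest sum.
import numpy as np

def importance_map(h, w, center, sigma=80.0):
    """IMP(x, y): 2D Gaussian centred at (cy, cx) over an h x w photo."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def content_value(imp, rect):
    top, left, rh, rw = rect
    return float(imp[top:top + rh, left:left + rw].sum())

def best_region(imp, rh, rw, step=16):
    h, w = imp.shape
    rects = [(t, l, rh, rw)
             for t in range(0, h - rh + 1, step)
             for l in range(0, w - rw + 1, step)]
    return max(rects, key=lambda r: content_value(imp, r))
```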
41
User Study 2. Evaluate the performance in terms of content-based clustering and template determination. Questionnaire: Q6: How do you feel about the visual coherence of photos in the same frame? Q7: How do you feel about the layout of the display?
42
User Study 2. (Results charts for Question 6 and Question 7.)