Information Extraction from Multimedia Content on the Social Web Stefan Siersdorfer L3S Research Centre, Hannover, Germany
Meta Data and Visual Data on the Social Web
Meta data: tags, titles, descriptions, timestamps, geo-tags, comments, numerical ratings, users and social links
Visual data: photos, videos
How to exploit combined information from visual data and meta data?
Example 1: Photos in Flickr
Example 2: Videos in YouTube
Social Web Environments as Graph Structure
Entities (nodes): resources (videos, photos), users, tags, groups
Relationships (edges):
- User-User: contacts, friendship
- User-Resource: ownership, favorite assignment, rating
- User-Group: membership
- Resource-Resource: visual similarity, meta data similarity
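To make the entity/relationship structure above concrete, here is a minimal sketch of such a graph, assuming a NetworkX multigraph as the data model; the node kinds and edge relation names are illustrative, not taken from any particular system.

```python
# Minimal sketch, assuming NetworkX as the graph library; attribute names are illustrative.
import networkx as nx

g = nx.MultiGraph()

# Entities (nodes)
g.add_node("user1", kind="user")
g.add_node("user2", kind="user")
g.add_node("video1", kind="resource")
g.add_node("tag:eiffel", kind="tag")
g.add_node("group2", kind="group")

# Relationships (edges)
g.add_edge("user1", "user2", relation="contact")
g.add_edge("user1", "video1", relation="ownership")
g.add_edge("user2", "video1", relation="favorite")
g.add_edge("user1", "group2", relation="membership")
g.add_edge("video1", "tag:eiffel", relation="tag_assignment")

# Example query: all resources user2 has interacted with.
print([n for n in g.neighbors("user2") if g.nodes[n]["kind"] == "resource"])
```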
User Feedback on the Social Web
Numeric ratings, favorite assignments, comments, clicks/views, contacts and friendships, community tagging, blog entries, uploads of content
How can we exploit this community feedback?
Outline
Part 1: Photos on the Social Web
1.1) Photo Attractiveness
1.2) Generating Photo Maps
1.3) Sentiment in Photos
Part 2: Videos on the Social Web
Video Tagging
Part 1: Photos on the Social Web
1.1) Photo Attractiveness * * Stefan Siersdorfer, Jose San Pedro Ranking and Classifying Attractiveness of Photos in Folksonomies 18th International World Wide Web Conference, WWW 2009, Madrid, Spain
Attractiveness of Images
Example photos: a landscape, a portrait, a flower.
Which factors influence the human perception of attractiveness?
Attractiveness: Visual Features
Human visual perception is mainly influenced by color distribution and coarseness. These are complex concepts that convey multiple orthogonal aspects, so different low-level features need to be considered.
Attractiveness: Visual Features (Color)
Color features: brightness, contrast, luminance, RGB, colorfulness, naturalness, saturation (mean and variance)
Saturation captures the intensity of the colors and is 0 for greyscale images.
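As a rough illustration of such color features, the following sketch computes brightness, contrast, and saturation statistics for a photo, assuming Pillow and NumPy; the exact feature definitions here are simplified stand-ins for those used in the paper.

```python
# Illustrative sketch of simple color features; not the paper's implementation.
import numpy as np
from PIL import Image

def color_features(path):
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=np.float32) / 255.0

    brightness = rgb.mean()          # overall luminance proxy
    contrast = rgb.std()             # spread of pixel intensities
    saturation = hsv[..., 1]         # 0 everywhere for greyscale images
    return {
        "brightness": brightness,
        "contrast": contrast,
        "saturation_mean": saturation.mean(),
        "saturation_var": saturation.var(),
    }
```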
Visual Features (Coarseness)
Coarseness: resolution and acutance (sharpness), of critical importance for the final appearance of photos [Savakis 2000].
Textual Features
We consider user-generated meta data. Topics correlate with image appeal (ground truth: favorite assignments), and tags seem appropriate to capture this information.
Attractiveness of Photos
Community-based models for classifying and ranking images according to their appeal [WWW'09].
Inputs from the Flickr photo stream:
- content (visual features)
- metadata (textual features, e.g. tags such as cat, fence, house)
- community feedback (a photo's interestingness: #views, #comments, #favorites, ...)
These inputs feed a classification and regression model generator that produces the attractiveness models.
Classification & Regression Models
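A hedged sketch of how such models could be trained with scikit-learn, assuming pre-extracted visual feature vectors and tag strings per photo, and using a simple "favorites per view" score as a stand-in for the community-based ground truth; the actual labelling and learners in the paper may differ.

```python
# Sketch under assumptions: attractiveness classification and regression from
# visual features, tag features, and community feedback.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import Ridge

def build_models(photos):
    # photos: list of dicts with "visual" (np.array), "tags" (str), "views", "favorites"
    visual = csr_matrix(np.vstack([p["visual"] for p in photos]))
    tags = TfidfVectorizer().fit_transform(p["tags"] for p in photos)
    X = hstack([visual, tags])

    # Community feedback as ground truth: favorites normalised by views (assumed rule).
    score = np.array([p["favorites"] / max(p["views"], 1) for p in photos])
    y_class = (score > np.median(score)).astype(int)   # attractive vs. not

    classifier = LinearSVC().fit(X, y_class)           # binary classification
    regressor = Ridge().fit(X, score)                  # ranking by predicted score
    return classifier, regressor
```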
Experiments
1.2) Generating Photo Maps * *Work and illustrations from David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, Mapping the World's Photos, 18th International World Wide Web Conference, WWW 2009, Madrid, Spain
Outline: Photo Maps
Use geo-location, tags, and visual features of photos to:
- identify popular locations and landmarks
- find out the location of photos
- estimate representative images
Spatial Clustering
Each data point corresponds to the (longitude, latitude) of an image. Mean shift clustering is applied to obtain a hierarchical structure. The most distinctive popular tags are used as cluster labels (# photos with tag in cluster / # photos with tag in the overall set), e.g. london, paris, eiffel, louvre, trafalgarsquare, tatemodern.
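A minimal sketch of this step, assuming scikit-learn's MeanShift (single bandwidth, no hierarchy) and per-photo tag sets; the bandwidth value and data layout are illustrative.

```python
# Sketch: mean shift over photo coordinates, then label clusters by tag distinctiveness.
import numpy as np
from collections import Counter
from sklearn.cluster import MeanShift

def cluster_and_label(coords, photo_tags, bandwidth=0.5):
    # coords: (n, 2) array of (longitude, latitude); photo_tags: list of tag sets
    labels = MeanShift(bandwidth=bandwidth).fit_predict(np.asarray(coords))

    global_counts = Counter(t for tags in photo_tags for t in tags)
    cluster_labels = {}
    for c in set(labels):
        in_cluster = Counter(t for tags, l in zip(photo_tags, labels) if l == c for t in tags)
        # distinctiveness = # photos with tag in cluster / # photos with tag overall
        scored = {t: n / global_counts[t] for t, n in in_cluster.items()}
        cluster_labels[c] = max(scored, key=scored.get)
    return labels, cluster_labels
```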
Estimating the Location of Photos without Geo-Tags
Train SVMs on clusters:
- positive examples: photos in the cluster
- negative examples: photos outside the cluster
Feature representation: tags and visual features (SIFT). Best performance is achieved for the combination of tags and SIFT features.
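An illustrative sketch of these cluster classifiers, assuming one linear SVM per cluster and a feature matrix that already concatenates tag and SIFT bag-of-words counts; the helper names are hypothetical.

```python
# Sketch: positives = photos inside the cluster, negatives = photos outside it.
import numpy as np
from sklearn.svm import LinearSVC

def train_cluster_svms(X, cluster_ids):
    # X: (n_photos, n_features) tag + visual-word matrix; cluster_ids: photo -> cluster
    models = {}
    for c in set(cluster_ids):
        y = (np.asarray(cluster_ids) == c).astype(int)
        models[c] = LinearSVC().fit(X, y)
    return models

def predict_location(models, x):
    # Assign the photo to the cluster whose SVM gives the highest margin.
    return max(models, key=lambda c: models[c].decision_function(x.reshape(1, -1))[0])
```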
Finding Representative Images
Construct a weighted graph:
- weights based on the visual similarity of images (using SIFT features)
- use graph clustering (e.g. spectral clustering) to identify tightly connected components
- choose a representative image from such a connected component
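A possible sketch of this selection step, assuming a precomputed visual-similarity matrix and scikit-learn's SpectralClustering; picking the largest cluster and its most central member is one simple interpretation of "tightly connected component".

```python
# Sketch: representative image via spectral clustering on a visual-similarity matrix.
import numpy as np
from sklearn.cluster import SpectralClustering

def representative_image(similarity, n_clusters=3):
    # similarity: symmetric (n, n) matrix of pairwise visual similarities (e.g. SIFT matches)
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(similarity)
    largest = np.bincount(labels).argmax()              # biggest cluster as proxy
    members = np.flatnonzero(labels == largest)
    # Representative = member with highest total similarity to the other members.
    centrality = similarity[np.ix_(members, members)].sum(axis=1)
    return members[centrality.argmax()]
```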
Example 1: Europe
Example 2: New York
1.3) Sentiment in Photos * * Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng Analyzing and Predicting Sentiment of Images on the Social Web 18th ACM Multimedia Conference (MM 2010), Florence, Italy
Sentiment Analysis of Images
Data: more than 500,000 Flickr photos
Image features:
- global color histogram: a color is present in the image
- local color histogram: a color is present at a particular location
- SIFT visual terms: b/w patterns, rotated and scaled
Image sentiment:
- SentiWordNet provides sentiment values for terms, e.g. (pos, neg, obj) = (0.875, 0.0, 0.125) for the term "good"
- used for obtaining sentiment categories and as training set + ground truth for the experiments
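A small sketch of how SentiWordNet scores could be turned into photo-level sentiment categories, assuming NLTK's SentiWordNet corpus reader and tag-based aggregation; the paper's exact procedure and thresholds may differ.

```python
# Sketch: derive a pos/neg/obj label for a photo from the sentiment of its tags.
from nltk.corpus import sentiwordnet as swn   # requires the nltk sentiwordnet data

def tag_sentiment(tag):
    synsets = list(swn.senti_synsets(tag))
    if not synsets:
        return 0.0
    # Average (pos - neg) over all senses of the tag.
    return sum(s.pos_score() - s.neg_score() for s in synsets) / len(synsets)

def photo_sentiment(tags, threshold=0.1):     # threshold is an assumption
    score = sum(tag_sentiment(t) for t in tags)
    if score > threshold:
        return "pos"
    if score < -threshold:
        return "neg"
    return "obj"
```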
Which are the most discriminative visual terms?
Use the mutual information measure to determine these features. Probabilities (estimated by counting in the image corpus):
- P(t): probability that visual term t occurs in an image
- P(c): probability that an image has sentiment category c ("pos" or "neg")
- P(t,c): probability that an image is in category c and contains visual term t
Intuition: "Terms that have a high co-occurrence with a category are more characteristic for that category."
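A sketch of the mutual information computation over binary term/category indicators, with the probabilities estimated by counting as described above; the small smoothing constants are an assumption to avoid log(0).

```python
# Sketch: mutual information between a visual term and a sentiment category.
import math

def mutual_information(n_images, n_t, n_c, n_tc):
    # n_t: images containing term t, n_c: images in category c, n_tc: images with both.
    mi = 0.0
    for t_present in (0, 1):
        for in_c in (0, 1):
            n = {(1, 1): n_tc,
                 (1, 0): n_t - n_tc,
                 (0, 1): n_c - n_tc,
                 (0, 0): n_images - n_t - n_c + n_tc}[(t_present, in_c)]
            p_joint = (n + 1e-9) / n_images
            p_t = (n_t if t_present else n_images - n_t) / n_images
            p_c = (n_c if in_c else n_images - n_c) / n_images
            mi += p_joint * math.log(p_joint / (p_t * p_c + 1e-12))
    return mi
```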
Most Discriminative Features
The most discriminative visual features are extracted using the mutual information measure [ACM MM'10].
Part 2: Videos on the Social Web * * Stefan Siersdorfer, Jose San Pedro, Mark Sanderson Content Redundancy in YouTube and its Application to Video Tagging ACM Transactions on Information Systems (TOIS), 2011 Stefan Siersdorfer, Jose San Pedro, Mark Sanderson Automatic Video Tagging using Content Redundancy 32nd ACM SIGIR Conference, Boston, USA, 2009
Near-duplicate Video Content
YouTube is the most important video sharing environment [SIGCOMM'07]: 85 M videos, 65 k new videos per day, 100 M downloads per day; traffic to/from YouTube amounts to 10% / 20% of the Web total. Redundancy: 25% of the videos are near duplicates.
Can we use this redundancy to obtain richer video annotations? Automatic tagging.
Automatic Tagging
What is it good for?
- Additional information, better user experience
- Richer feature vectors for automatic data organization (classification and clustering), video search, and knowledge extraction (e.g. creating ontologies)
Overlap Graph
(Figure: Videos 1-5 linked by edges representing their content overlap.)
Neighbor-based Tagging (1): Idea
Example: Video 1 is tagged {A, B, C}, Video 2 {A, E}, Video 3 {B, E, F}; Video 4 contains the original tags A, B, and the tags E, F are automatically obtained from its neighbors.
Criteria for automatic tagging:
- prefer tags used by many neighbors
- prefer tags from neighbors with a strong link
Neighbor-based Tagging (2): Formal
The relevance of a tag t for a video v is a sum over all neighbors, rel(t, v) = Σ_{v' ∈ N(v)} w(v, v') · 1[t ∈ tags(v')], where the weights w(v, v') correspond to the overlap between the videos and the indicator function selects the neighbors that carry tag t.
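A hedged sketch of this scoring rule, using plain dictionaries for the tag sets and overlap weights; the data layout and the example weights are assumptions.

```python
# Sketch: score candidate tags by the summed overlap weights of the neighbours carrying them.
def neighbour_tag_scores(video, tags, neighbours):
    # tags: dict video -> set of tags; neighbours: dict video -> {neighbour: overlap weight}
    scores = {}
    for other, weight in neighbours.get(video, {}).items():
        for t in tags.get(other, set()):
            if t not in tags.get(video, set()):          # only genuinely new tags
                scores[t] = scores.get(t, 0.0) + weight  # sum over all neighbours
    return scores

# The example from the previous slide: Video 4 gains E and F from its neighbours.
tags = {1: {"A", "B", "C"}, 2: {"A", "E"}, 3: {"B", "E", "F"}, 4: {"A", "B"}}
neighbours = {4: {1: 0.8, 2: 0.5, 3: 0.6}}
print(neighbour_tag_scores(4, tags, neighbours))   # {'C': 0.8, 'E': 1.1, 'F': 0.6} (order may vary)
```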
Neighbor-based Tagging (3)
Apply additional smoothing for redundant regions: the score takes into account the number of neighbors with tag t, subsets of neighbors sharing an overlap region, and a smoothing factor.
TagRank
Also takes transitive relationships into account via PageRank-like weight propagation.
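A rough sketch of PageRank-like propagation of tag relevance over the overlap graph, as a stand-in for TagRank; the damping factor, normalization, and iteration count are assumptions, not the paper's definition.

```python
# Sketch: iteratively mix each video's own tags with the overlap-weighted tag
# scores of its neighbours, PageRank-style.
def tag_rank(tags, neighbours, damping=0.85, iterations=20):
    scores = {v: {t: 1.0 for t in ts} for v, ts in tags.items()}
    for _ in range(iterations):
        new_scores = {}
        for v in tags:
            propagated = {}
            total = sum(neighbours.get(v, {}).values()) or 1.0
            for u, w in neighbours.get(v, {}).items():
                for t, s in scores[u].items():
                    propagated[t] = propagated.get(t, 0.0) + (w / total) * s
            base = {t: 1.0 for t in tags[v]}
            keys = set(base) | set(propagated)
            new_scores[v] = {t: (1 - damping) * base.get(t, 0.0)
                                + damping * propagated.get(t, 0.0) for t in keys}
        scores = new_scores
    return scores
```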
Applications of the Extended Tag Representation
Use the relevance scores rel(t, v_i) to construct enriched feature vectors for videos: combine the original tags with new tags weighted by their relevance values.
Automatic annotation: use thresholding to select the most relevant tags for a given video. Manual assessment of the tags shows their relevance.
Data organization: clustering and classification experiments (ground truth: YouTube categories of the videos) show improved performance through the enriched feature representation.
Summary
- The Social Web contains visual information (photos, videos) and meta data (tags, time stamps, social links, spatial information, ...).
- A large variety of users provide explicit and implicit feedback in social web environments (ratings, views, favorite assignments, comments, content of uploaded material).
- Visual information and annotations can be combined to obtain enhanced feature representations.
- Visual information can help to establish links between resources such as videos (application: information propagation).
- Feature representations in combination with community feedback can be used for machine learning (applications: classification, mapping).
References
Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Content Redundancy in YouTube and its Application to Video Tagging. ACM Transactions on Information Systems (TOIS), 2011.
Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng: Analyzing and Predicting Sentiment of Images on the Social Web. 18th ACM Multimedia Conference (MM 2010), Florence, Italy.
Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Automatic Video Tagging using Content Redundancy. 32nd ACM SIGIR Conference, Boston, USA, 2009.
Stefan Siersdorfer, Jose San Pedro: Ranking and Classifying Attractiveness of Photos in Folksonomies. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain.
David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg: Mapping the World's Photos. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain.