April 2014 SEWM 2014. Event Detection from Social Media: User-centric Parallel Split-n-merge and Composite Kernel. Truc-Vien T. Nguyen, Lugano University

Presentation transcript:

Event Detection from Social Media: User-centric Parallel Split-n-merge and Composite Kernel
- Truc-Vien T. Nguyen, Lugano University, Switzerland
- Minh-Son Dao, University of Information Technology, Vietnam
- Riccardo Mattivi, Trento University, Italy
- Francesco G.B. De Natale, Trento University, Italy
SEWM, ICMR 2014, Glasgow, UK (April 2014)

Outline
- Social Event and Web-media
- User-centric Parallel Split-n-merge for Events Clustering
- Composite Kernel for Event Classification
- Ongoing work
- Conclusion

Tsunami - Miyagi, Japan - Mar 11, 2011

Observations
- Time-Location: a user cannot attend two events at the same time if their locations are far away from each other.
- Theme: users in the same community tend to tag the same event with similar words.
- Users tend to take a series of images within a short time interval of whatever holds their attention.
- Images related to an event of a given type share common visual features that are characteristic of that event type.
- Spatio-Temporal-Theme

User-centric Parallel Split-n-merge
1. Web media collection A crawled from social networks
2. Convert A to UT-image
3. Split each row of the UT-image into clusters {b_i}
4. Merge {b_i} using {location, time, theme}
5. Merge {b_i} using {location, time, theme} and common sense
6. Merge {b_i} using visual information

UT-Image
- Metadata fields: photo_url, username, dateTaken, title, description, tags, locations
- Axes: users (rows) and time (columns)
- Each row is sorted by time. Pixels in the same row that have no time information are grouped and placed at the beginning of the row.
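The row ordering described for the UT-image can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each photo is a dictionary keyed by the metadata field names on the slide, that `dateTaken` is either a `datetime` or `None`, and the function name `build_ut_image` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def build_ut_image(records):
    """Group photo records by user (one row per user).

    Within each row, records without a timestamp come first,
    followed by the timestamped records sorted by time.
    """
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec["username"]].append(rec)

    ut_image = {}
    for user, row in by_user.items():
        undated = [r for r in row if r.get("dateTaken") is None]
        dated = sorted(
            (r for r in row if r.get("dateTaken") is not None),
            key=lambda r: r["dateTaken"],
        )
        ut_image[user] = undated + dated
    return ut_image


# Toy data (assumed, for illustration only).
records = [
    {"photo_url": "a", "username": "alice", "dateTaken": datetime(2011, 3, 11, 12)},
    {"photo_url": "b", "username": "alice", "dateTaken": datetime(2011, 3, 11, 9)},
    {"photo_url": "c", "username": "alice", "dateTaken": None},
    {"photo_url": "d", "username": "bob", "dateTaken": datetime(2011, 3, 11, 10)},
]
ut = build_ut_image(records)
```

Because every row depends only on one user's photos, rows can be processed independently, which is what makes the later split step easy to parallelize.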

Split by TIME
- If there is no time information, each pixel is treated as one cluster
- If there is time information
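The split step might look like the sketch below. The transcript only preserves the rule for undated pixels; the rule for timestamped pixels is missing, so the gap threshold used here (`max_gap`, start a new cluster when two consecutive photos are further apart than this) is purely an assumption for illustration, as is the function name.

```python
from datetime import datetime, timedelta


def split_row_by_time(row, max_gap=timedelta(hours=8)):
    """Split one UT-image row into clusters.

    `row` is assumed to list undated records first, then dated records
    sorted by time. Undated records each form their own cluster (as on
    the slide). Dated records are cut wherever the gap between
    consecutive photos exceeds `max_gap` (ASSUMED rule, not from the slide).
    """
    clusters = []
    current = []
    for rec in row:
        t = rec.get("dateTaken")
        if t is None:
            clusters.append([rec])  # no time information: singleton cluster
            continue
        if current and t - current[-1]["dateTaken"] > max_gap:
            clusters.append(current)
            current = []
        current.append(rec)
    if current:
        clusters.append(current)
    return clusters


# Toy row (assumed data).
row = [
    {"id": "u", "dateTaken": None},
    {"id": "a", "dateTaken": datetime(2011, 3, 11, 9)},
    {"id": "b", "dateTaken": datetime(2011, 3, 11, 10)},
    {"id": "c", "dateTaken": datetime(2011, 3, 12, 20)},
]
clusters = split_row_by_time(row)
```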

Merge by spatio-time-theme
- For each selected cluster b_k, create:
  - time-taken boundary T_k
  - location union L_k
  - document (tags, title, description) D_k
- For any pair of clusters (b_k, b_l), merge if two of the following three conditions hold:
  - Tdistance(T_k, T_l) ≤ α
  - Ldistance(L_k, L_l) ≤ β
  - JaccardIndex(D_k, D_l) ≥ γ
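The two-out-of-three test above can be sketched as follows. The slide does not define Tdistance or Ldistance, so this sketch assumes the gap between two time intervals (zero if they overlap) and the smallest great-circle distance between any two points of the location sets; it also reads the rule as "at least two conditions". The default thresholds are the first-run values reported in the results (α = 24 hours, β = 5 km, γ = 0.2); the dictionary keys `T`, `L`, `D` and all function names are illustrative.

```python
import math
from datetime import datetime, timedelta


def jaccard(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def t_distance(t1, t2):
    """Gap between two (start, end) intervals; zero when they overlap (assumed definition)."""
    gap = max(t1[0], t2[0]) - min(t1[1], t2[1])
    return max(gap, timedelta(0))


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def l_distance(l1, l2):
    """Smallest distance between any pair of points; infinite if a set is empty (assumed definition)."""
    return min((haversine_km(p, q) for p in l1 for q in l2), default=math.inf)


def should_merge(ck, cl, alpha=timedelta(hours=24), beta_km=5.0, gamma=0.2):
    """Merge when at least two of the three conditions hold."""
    votes = [
        t_distance(ck["T"], cl["T"]) <= alpha,
        l_distance(ck["L"], cl["L"]) <= beta_km,
        jaccard(ck["D"], cl["D"]) >= gamma,
    ]
    return sum(votes) >= 2


# Toy clusters (assumed data).
a = {"T": (datetime(2011, 3, 11, 9), datetime(2011, 3, 11, 12)),
     "L": {(38.27, 140.87)}, "D": {"tsunami", "miyagi"}}
b = {"T": (datetime(2011, 3, 11, 15), datetime(2011, 3, 11, 18)),
     "L": {(38.28, 140.88)}, "D": {"flood"}}
c = {"T": (datetime(2012, 6, 1, 9), datetime(2012, 6, 1, 12)),
     "L": {(55.86, -4.25)}, "D": {"tsunami", "miyagi"}}
```

Here `a` and `b` agree on time and place but not on tags, so they merge; `a` and `c` share only their tags, so they do not.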

Merge by common-sense
- Compute tf-idf on D_k and select the most common keywords to create ND_k
- For any pair of clusters (b_k, b_l), merge if JaccardIndex(ND_k, ND_l) ≥ γ
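A rough sketch of this keyword-based merge is shown below. The slide does not say which tf-idf variant is used or how many keywords are kept, so the smoothed idf formula, the cut-off `n`, and the function names are assumptions; documents are assumed to be pre-tokenized lists of words.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def top_keywords(docs, n=3):
    """Return, for each cluster document, the set of its n highest tf-idf words.

    Uses tf = count / document length and a smoothed
    idf = ln((1 + N) / (1 + df)) + 1 (one common variant, assumed here).
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    total = len(docs)

    keyword_sets = []
    for tokens in docs:
        tf = Counter(tokens)
        scores = {
            w: (count / len(tokens)) * (math.log((1 + total) / (1 + df[w])) + 1.0)
            for w, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keyword_sets.append(set(ranked[:n]))
    return keyword_sets


# Toy cluster documents D_k (assumed data).
docs = [
    ["tsunami", "miyagi", "tsunami", "photo"],
    ["tsunami", "miyagi", "japan"],
    ["concert", "glasgow", "concert"],
]
kws = top_keywords(docs, n=3)
gamma = 0.2
merge_0_1 = jaccard(kws[0], kws[1]) >= gamma
merge_0_2 = jaccard(kws[0], kws[2]) >= gamma
```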

Merge with Visual features
- For any pair of clusters (b_k, b_l), merge if JaccardIndex(BoW_k, BoW_l) ≥ θ

Results: Events clustering
MediaEval 2013 dataset and participants

Results: Events clustering
- First run (split, merge by spatio-time-theme): α = 24 hours, β = 5 km, γ = 0.2
- Second run (as the first): α = 8 hours, β = 2 km, γ = 0.2
- Third run: as the first, plus common-sense merging
- Last run: as the third, plus visual features, θ = 0.3

Classification Problems
- Supervised learning: learn a function f: X → Y from examples
- Binary classification: Y = {-1, +1}
- Multi-class classification: Y = {1, 2, …, k}
- Event classification: each member of X has a set of features

SVM: Multiclass Classification
- Support Vector Machines (SVMs)
  - Binary classification
  - Compute a function (kernel) between each pair of samples
- Multi-class classification
  - One vs. Rest
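One-vs-rest turns a k-class problem into k binary ones: one binary classifier is trained per class against all the others, and a sample is assigned to the class whose classifier returns the largest decision value. The sketch below shows only that decision rule; the toy linear scorers stand in for trained SVM decision functions and are not taken from the slides.

```python
def one_vs_rest_predict(decision_functions, x):
    """Return the label whose binary decision function scores x highest.

    `decision_functions` maps each class label to a callable that returns
    a real-valued margin (positive means "this class", negative "the rest").
    """
    return max(decision_functions, key=lambda label: decision_functions[label](x))


def linear(weights, bias):
    """Build a toy linear scorer w.x + b (placeholder for a trained SVM)."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias


# Hypothetical scorers for three of the event categories.
scorers = {
    "Concert": linear([1.0, 0.0], -0.5),
    "Sports": linear([0.0, 1.0], -0.5),
    "Protest": linear([-1.0, -1.0], 0.5),
}
label_a = one_vs_rest_predict(scorers, [0.9, 0.1])
label_b = one_vs_rest_predict(scorers, [0.0, 0.0])
```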

Event Categories

Class  Event Type
0      Conference
1      Fashion
2      Concert
3      Non_event
4      Sports
5      Protest
6      Other
7      Exhibition
8      Theater_dance

Composite Kernel

CK(E1, E2) = α · K_T(E1, E2) + (1 - α) · K_V(E1, E2)

where K_T is the kernel over text features, K_V is the kernel over visual features, and α is the coefficient.
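Reading the slide's formula as a convex combination of a text kernel K_T and a visual kernel K_V weighted by the coefficient α, a composite kernel can be sketched as below. The slides do not state which text kernel is used, so the sparse dot product here is an assumption; the histogram intersection kernel for the visual part follows the visual-features slide. The event layout (`text` and `visual` keys) and all names are illustrative.

```python
def dot_kernel(u, v):
    """Dot product of two sparse feature dicts (ASSUMED text kernel)."""
    return sum(value * v.get(key, 0.0) for key, value in u.items())


def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def composite_kernel(k_text, k_visual, alpha):
    """Build CK(E1, E2) = alpha * K_T(E1, E2) + (1 - alpha) * K_V(E1, E2)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    def ck(e1, e2):
        return (alpha * k_text(e1["text"], e2["text"])
                + (1.0 - alpha) * k_visual(e1["visual"], e2["visual"]))

    return ck


# Toy events (assumed data).
e1 = {"text": {"a": 1.0, "b": 1.0}, "visual": [0.5, 0.5]}
e2 = {"text": {"a": 1.0}, "visual": [0.25, 0.75]}
ck = composite_kernel(dot_kernel, histogram_intersection, alpha=0.5)
value = ck(e1, e2)
```

A convex combination of two valid kernels is itself a valid kernel, so the result can be passed to any SVM implementation that accepts a precomputed Gram matrix; α is then tuned like any other hyperparameter.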

Text Features
- NLP basic features: the word, its lower-case form, four prefixes, four suffixes, orthographic feature, word form feature.
- Ontological features: obtained by matching w_i with a knowledge base, e.g. “Washington”->City
- Encyclopedic features: obtained by associating w_i with Wikipedia, e.g. “Washington”->

An excerpt from the ontology

Visual Features
- Dense RGB-SIFT
- SVM with histogram intersection kernel
- The SVMs were trained with the images given in the SED training set
- Codebook for the bag of words with 4096 visual words
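The bag-of-visual-words representation and the histogram intersection kernel listed above can be illustrated with a toy example. This sketch assumes descriptors are already extracted (the slides use dense RGB-SIFT, which is not reproduced here), uses a two-word codebook instead of 4096 words, and L1-normalizes the histograms, which is an assumption rather than something the slides state.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[j])),
    )


def bow_histogram(descriptors, codebook):
    """Count descriptors per nearest codeword and L1-normalize (normalization is ASSUMED)."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy 2-D "descriptors" and codebook (assumed data).
codebook = [(0.0, 0.0), (10.0, 10.0)]
image_a = [(1.0, 1.0), (9.0, 9.0), (0.0, 1.0), (10.0, 9.0)]
image_b = [(1.0, 0.0), (0.0, 0.0), (1.0, 1.0), (9.0, 10.0)]
hist_a = bow_histogram(image_a, codebook)
hist_b = bow_histogram(image_b, codebook)
similarity = histogram_intersection(hist_a, hist_b)
```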

Results: Events classification
Table columns: run; with test-set; cross-validation on the training set

Ongoing work
Diagram elements: web media; events clustering; clusters; topic modeling (applied to the set of documents D_k) to name clusters; training data; events classification; classifiers; events.
- Set of instances of events
- Ability to annotate events automatically
- Extend to automatic image annotation
- Improve events clustering quality

Conclusion
1. Event clustering
  - Simple and easy to develop
  - Can be developed to run in parallel
  - A way to adjust the parameters automatically still needs to be found
2. Event classification
  - The composite kernel combines text and visual features
  - The combination proved robust, with a significant improvement in performance (from 45.83% to 53.58% with basic features, and from 47.61% to 54.86% with our new features)
  - Encyclopedic knowledge such as Wikipedia could provide a valuable additional resource

Thanks for your attention
Q & A

Features
- w_i is the text of the title, description, or tag in each event
- l_i is the word w_i in lower case
- p1_i, p2_i, p3_i, p4_i are the four prefixes of w_i
- s1_i, s2_i, s3_i, s4_i are the four suffixes of w_i
- f_i is the part of speech of w_i
- g_i is the orthographic feature that tests whether a word is all upper-case, has an upper-case initial letter, or is all lower-case
- k_i is the word form feature that tests whether a token is a word, a number, a symbol, or a punctuation mark
- o_i is the ontological feature. We used an ontology and knowledge base that contains 355 classes, 99 properties, and more than 100,000 entities. Given the full ontology, w_i is matched to the deepest subsumed child class.
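The surface features in this list can be extracted with a few lines of code. The sketch below assumes the four prefixes and suffixes are those of length 1 to 4, and it picks its own category labels and punctuation set; the part-of-speech feature f_i and the ontological feature o_i are left out because they need a tagger and the knowledge base, neither of which is reproduced here.

```python
PUNCTUATION = set(".,;:!?\"'()-")  # ASSUMED set of punctuation marks


def orthographic(w):
    """g_i: all upper-case, initial upper-case, all lower-case, or other."""
    if w.isupper():
        return "all_upper"
    if w[:1].isupper():
        return "init_upper"
    if w.islower():
        return "all_lower"
    return "other"


def word_form(w):
    """k_i: word, number, punctuation mark, or symbol."""
    if w.isalpha():
        return "word"
    if w.replace(",", "").replace(".", "", 1).isdigit():
        return "number"
    if w and all(ch in PUNCTUATION for ch in w):
        return "punct"
    return "symbol"


def token_features(w):
    """Build the w, l, p1..p4, s1..s4, g, k features for one token."""
    feats = {"w": w, "l": w.lower()}
    for n in range(1, 5):
        feats[f"p{n}"] = w[:n]   # prefix of length n (assumed reading of "four prefixes")
        feats[f"s{n}"] = w[-n:]  # suffix of length n
    feats["g"] = orthographic(w)
    feats["k"] = word_form(w)
    return feats


f = token_features("Washington")
```

For the token "Washington" this yields, among others, the lower-case form "washington", the prefix "Wash", the suffix "gton", and the orthographic class for an upper-case initial letter.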