
Quality of Claim Metrics in Social Sensing Systems: A case study on IranDeal
Pooria Taghizadeh: pooria.tgh@gmail.com
Dr. Hadi Tabatabaee: h_tabatabaee@sbu.ac.ir
Dr. Mona Ghassemian: m_ghassemian@sbu.ac.ir
Dr. Hamed Haddadi: hamed.haddadi@qmul.ac.uk

Outline
- Introduction
- Sources of claim uncertainty and invalidity
- Quality of claim metrics
- Datasets
- Evaluation and analysis
- Conclusion

Introduction
What is a social sensing system?
- Social sensing refers to systems that use people as sensors: participants report (claim) the events happening in their surroundings.
The main components

Uncertainty and Invalidity
Problems:
- Spam
- Gossip
- User inaccuracy
- Sensor inaccuracy

Sources of Claim Uncertainty & Invalidity
Gossip
- Regular expressions: “is (that | this | it) true”, “wh[a]*t[?!][?1]*”
Spam
- In web-based systems: CAPTCHA
- In social networks: by analyzing inputs such as tags, links, tips and comments
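The two gossip patterns above can be tried out with a short sketch. It assumes Python's `re` module, case-insensitive matching, and that the spaces inside the alternation on the slide are typographic only. The function name `looks_like_gossip` is hypothetical and does not come from the presentation.

```python
import re

# Patterns copied from the slide. The spaces inside "(that | this | it)" are
# assumed to be typographic and are dropped here.
GOSSIP_PATTERNS = [
    re.compile(r"is (that|this|it) true", re.IGNORECASE),
    re.compile(r"wh[a]*t[?!][?1]*", re.IGNORECASE),
]


def looks_like_gossip(claim: str) -> bool:
    """Return True if any gossip pattern occurs anywhere in the claim text."""
    return any(p.search(claim) for p in GOSSIP_PATTERNS)
```

A claim that matches is only a candidate for gossip; how matches were weighted or filtered is not described in the slides.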

Sources of Claim Uncertainty & Invalidity (Cont.)
Inaccuracy of users
- People are the core element of the social sensing system
- Main weak point of the system: human errors
- Claims cannot be fully trusted

Sources of Claim Uncertainty & Invalidity (Cont.)
Claim validation assessment: how to identify valid claims?
- This problem was addressed earlier for the web: Sums, Average Log, Investment
- Some possible solutions: machine learning, natural language processing, data mining, clustering methods
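Of the three web fact-finders named above, Sums is the simplest to sketch. The version below follows the hubs-and-authorities style update as it is commonly described: a source's trust is the sum of the beliefs in its claims, a claim's belief is the sum of the trust of its sources, and both are rescaled by their maximum each round. The data layout, the function name `sums_fact_finder`, and the iteration count are assumptions for illustration, not details from the presentation.

```python
from collections import defaultdict


def sums_fact_finder(assertions, iterations=20):
    """Score sources and claims from an iterable of (source, claim) pairs.

    Returns (trust, belief) dicts with scores rescaled into [0, 1].
    """
    claims_of = defaultdict(set)
    sources_of = defaultdict(set)
    for source, claim in assertions:
        claims_of[source].add(claim)
        sources_of[claim].add(source)
    if not claims_of:
        return {}, {}

    belief = {c: 1.0 for c in sources_of}
    trust = {s: 1.0 for s in claims_of}
    for _ in range(iterations):
        # Trust of a source: sum of beliefs in the claims it makes.
        trust = {s: sum(belief[c] for c in cs) for s, cs in claims_of.items()}
        top = max(trust.values())
        trust = {s: t / top for s, t in trust.items()}
        # Belief in a claim: sum of trust of the sources asserting it.
        belief = {c: sum(trust[s] for s in ss) for c, ss in sources_of.items()}
        top = max(belief.values())
        belief = {c: b / top for c, b in belief.items()}
    return trust, belief
```

In this form a claim repeated by many sources always scores at least as high as a claim made by one of them alone, which is exactly the weakness that spam and gossip exploit in a social sensing setting.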

Quality of claim metrics
Content Measure
- The richness of the claim contents facilitates the back-end applications.
Feedback (Popularity) Measure
- Each claim published on a social network may provoke reactions: users' judgments and redistribution of the claim

Content Measure
Content diversity
- The diversity of the type of information: text, video, image
User tagging
- Users can mention and notify each other
- Tagging provides new information about the importance of the claim
- Mentions can be analyzed to find debates between users

Content Measure (Cont.)
Quantity of used keywords
- The set of keywords depends on the subject
- The set of keywords requires prior knowledge
- The set can be extracted by preprocessing the claims
- A higher number of used keywords increases the value of the claim
Geo-tagging
- It is used to pin the locations of the users
- The information is valuable in location-based analysis for clustering the reporting users
Quantity of used hashtags
- Analyzing hashtags is easier than analyzing keywords
- Hashtags are one of the main approaches to querying the claims posted over a specific period of time
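The content measures above (keywords, hashtags, user tags) reduce to simple counts per claim. The following is a minimal sketch, assuming Twitter-style `#hashtag` and `@mention` syntax and a keyword set supplied from prior knowledge of the subject. The function name `content_counts` and the returned field names are hypothetical.

```python
import re

HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
WORD = re.compile(r"\w+")


def content_counts(claim: str, keywords: set[str]) -> dict[str, int]:
    """Count distinct subject keywords, hashtags, and tagged users in a claim."""
    words = {w.lower() for w in WORD.findall(claim)}
    return {
        "keywords": len(words & {k.lower() for k in keywords}),
        "hashtags": len(HASHTAG.findall(claim)),
        "mentions": len(MENTION.findall(claim)),
    }
```

How these counts are combined into a single content score is not specified in the slides, so the sketch stops at the raw counts.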

Feedback Measure
Opinion reaction
- This parameter can help validate the information by unknown users.
- In some systems, users may rate by giving stars
Redistribution
- The number of reclaims shows the popularity of the claim

Social Network Support

Datasets
Two datasets, one hashtag-centric and one user-centric, were gathered by the crawler for the evaluation.
- The first dataset is extracted from Twitter based on the IranDeal hashtag: 260,000 tweets, 66,238 users
- The second dataset is extracted from the Foursquare social network: 7,402 users, 40,741 tips, 35,503 restaurants

Evaluation: Comments/User
- The users are grouped according to the number of reported claims
- About 14% of the users (36,663 users) post exactly 1 tweet
- Only 4% have two posts
- The percentage decreases as the number of tweets increases
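The grouping described above can be sketched as a two-level count: first tweets per user, then users per tweet count. The input format (one user id per tweet) and the function name `users_by_claim_count` are assumptions; the presentation does not say how the crawler output was stored.

```python
from collections import Counter


def users_by_claim_count(tweet_authors):
    """Map each tweet count k to the share of users who posted exactly k tweets."""
    tweets_per_user = Counter(tweet_authors)
    users_per_count = Counter(tweets_per_user.values())
    total_users = len(tweets_per_user)
    return {k: n / total_users for k, n in sorted(users_per_count.items())}
```

For example, `users_by_claim_count(["u1", "u2", "u2"])` reports that half of the users posted one tweet and half posted two.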

Popularity of comments
- The number of likes for each comment shows its popularity
- The comments are categorized based on their number of likes
- A large fraction of tweets (93%) do not get any favorites
- The portions of tweets that get 1 and 2 favorites are 3.4% and 1.1%, respectively

Re-Tweets
- Another popularity metric is the rate of sharing a comment
- It expresses the dependency between the QoC metrics and the way the dataset is crawled: people who follow the hashtag are eager to share the news headline
- The sparsity of the data for values higher than 500 affects the results

Tagged user / comment
- The tags provide extra information that boosts claim-processing applications
- The highest frequency belongs to the comments with a single tagged user (140,191 tweets)
- The highest number of tagged users in a single tweet is 12
- Around 15% of tweets tag exactly two users, and the values decrease for higher numbers

Evaluation and analysis
Power law distribution
- We used Zipf's law; s denotes the slope of the curve.
- Comparing the value of s across these datasets implies that the nature of the social network used affects the characteristics of the dataset.
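Zipf's law says the frequency of the item at rank k is proportional to k^(-s), so on log-log axes the data fall on a line with slope -s. The sketch below estimates s by ordinary least squares on the log-log points. This fitting method and the function name `zipf_exponent` are assumptions, since the slides do not state how s was obtained.

```python
import math


def zipf_exponent(frequencies):
    """Estimate s in f(k) ~ k**(-s) from positive frequencies via log-log least squares."""
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Least-squares slope of log(frequency) against log(rank).
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope
```

Fitting each metric (tweets per user, favorites, retweets, tagged users) separately and comparing the resulting s values across Twitter and Foursquare would mirror the comparison the slide describes. A least-squares fit on log-log data is sensitive to the sparse tail, which is consistent with the remark about values above 500.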

Conclusion
- We review the sources of claim uncertainty and invalidity
- We define a new set of quality of claim metrics
- The analysis shows that most of the metrics follow the power law, but this is not a general rule
- The degree of the power law depends on the nature of the dataset and the social network

Questions