MOTIVATION AND CHALLENGE Big data Volume Velocity Variety Veracity Contributor Content Context Value 5 Vs of Big Data 3 Cs of Veracity.

Presentation transcript:

MOTIVATION AND CHALLENGE
Big data
5 Vs of Big Data: Volume, Velocity, Variety, Veracity, Value
3 Cs of Veracity: Contributor, Content, Context

ALETHIOMETER FRAMEWORK: Contributor, Content, Context 3

C1 CONTRIBUTOR 4

5 Contributor modalities
Reputation
- Analyse comments over time to discover sentiments and opinions towards a source.
- Measured by the number of upvotes or likes.
History
- Information about activity on different social media platforms, combined with validity data.
- Measured by the update frequency of valid posts.
Popularity
- Information about how the source's activity is followed (readings, recommendations).
- Measured by the number of friends/followers and the number of responses.

6 Contributor modalities
Influence
- Information about activities triggered by this source (re-posts, discussions or comments).
- Measured by the number of retweets/shares and the Klout influence score.
Presence
- Information about the type of source (individual, organisation, officially verified account, fake identity, etc.) and its presence on multiple social media platforms.
- Measured by the number of accounts on different social media.

C2 CONTENT 7

8 Content modalities
Reputation of linked web content
- Measured in terms of domain reputation, page rank (Google PageRank or Alexa Rank), or properties of the contributors to the content.
Provenance
- Finding the original occurrence of the content and its whole path across sources, places and time, and measuring the reputation of these sources.
Popularity
- Information about how many people are following this content.
- Measured by the number of followers and the number of responses.

9 Content modalities
Influence
- Analyse whether this content is triggering discussions or other actions in the social sphere.
- Measured by the number of retweets/shares.
Originality
- Check whether the content or parts thereof have been used in the past (e.g., reused text or images that have appeared in the past).
Authenticity
- Check whether the content has been changed with respect to its original state (e.g., changed text or attached multimedia content).
Objectivity and Diversity
- Measured by the variation of opinions found for people, content, or general entities.

C3 CONTEXT 10

11 Context modalities
Cross-checking
- Measured by the number of different reports or mentions about the same thing coming from independent sources.
Coherence
- Measurement of text coherence (e.g., Coh-Metrix) and coherence between the content and tags, attached web-links, or attached multimedia.
Proximity
- Measurement of coherence between reference location/time and publication location/time.

12 How to combine all these parameters?

13 Approach for rating of modality parameters
Rate parameters on a 5-point discrete scale, from 0 to 4
- [0, a_0) → 0, [a_0, a_1) → 1, [a_1, a_2) → 2, [a_2, a_3) → 3, [a_3, ∞) → 4.
- a_0: 20th percentile, a_1: 40th percentile, a_2: 60th percentile, a_3: 80th percentile (adjust the scale so that the ratings follow a uniform distribution).
Weight the ratings of the parameters to derive a total score, either uniformly or based on their significance
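The rating scheme above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `percentile_thresholds`, `rate` and `total_score`, the sample values and the example weights are hypothetical, the percentiles are computed with linear interpolation, and the total score is assumed to be a weighted mean of the ratings (the slide does not say how the weighted ratings are aggregated).

```python
from bisect import bisect_right
from statistics import quantiles


def percentile_thresholds(values):
    """Return [a_0, a_1, a_2, a_3]: the 20th, 40th, 60th and 80th percentiles."""
    # quantiles(n=5) yields the four cut points that split the data into fifths.
    return quantiles(values, n=5, method="inclusive")


def rate(value, thresholds):
    """Map a raw value to 0..4 using [0, a_0), [a_0, a_1), ..., [a_3, inf)."""
    # bisect_right counts the thresholds <= value, so value == a_0 is rated 1.
    return bisect_right(thresholds, value)


def total_score(ratings, weights=None):
    """Weighted mean of the ratings (assumed aggregation); uniform by default."""
    if weights is None:
        weights = {name: 1.0 for name in ratings}
    weight_sum = sum(weights[name] for name in ratings)
    return sum(ratings[name] * weights[name] for name in ratings) / weight_sum


# Hypothetical follower counts used only to derive the thresholds.
followers = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
thresholds = percentile_thresholds(followers)

# Hypothetical ratings of two parameters for one contributor.
ratings = {"followers": rate(1000, thresholds), "tweets": 2}
uniform = total_score(ratings)
weighted = total_score(ratings, {"followers": 3.0, "tweets": 1.0})
```

Because the thresholds are quintiles of the observed values, each rating from 0 to 4 covers roughly one fifth of the sample, which is what makes the resulting scale approximately uniform.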

15 Preliminary statistical results
Parameters studied
- Number of followers
- Number of tweets
- User account age
Sample: ~10 M tweets, 5 K users
Collection period: July-September 2013

16 Empirical distributions
- Heavy-tailed distributions
- Multimodal heavy-tailed distributions with three different peaks (6.7 months, 23.3 months, 4.4 yrs)

17 Correlation coefficients
Friends - followers:
Friends - tweets: 0.08
Followers - tweets:
Conclusion:
- all parameters are relatively independent of one another
- they need to be studied independently
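A correlation coefficient between two parameters can be computed as sketched below. The slide does not state which coefficient was used, so the Pearson coefficient is an assumption here, and the function name `pearson` is hypothetical. For heavy-tailed data a rank-based coefficient may be a more robust choice.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

A value close to 0, such as the 0.08 reported above, indicates little linear relationship between the two parameters, whereas values near 1 or -1 indicate a strong one.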

18 Summary and future work
Summary
- Defined Alethiometer: a framework taking into account all aspects: Contributor, Content and Context
- Showed an approach for combining the ratings of all parameters
- Attested the relative independence of parameters and the need to consider a variety of measures (also previously emphasized in the literature)
Future work
- Investigate statistical properties of other modalities
- Extract the significance of modalities
- Study correlation between content, contributor and context modalities

Questions & Answers