Ubiquitous Human Computation KAIST KSE Uichin Lee May 11, 2011.

Outline
Review recent papers:
– Crowd-Sourced Sensing and Collaboration Using Twitter, WoWMOM 2010
– Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event Detection, WWW 2010
– Location-based Crowdsourcing: Extending Crowdsourcing to the Real World, NordiCHI 2010
– Social Sensors and Pervasive Services: Approaches and Perspectives, PerCol 2011
Understand the potential of ubiquitous human computation (+ social networking)

Crowd-Sourced Sensing and Collaboration Using Twitter Murat Demirbas, Murat Ali Bayir, Cuneyt Gurcan Akcora, Yavuz Selim Yilmaz SUNY Buffalo WoWMOM 2010 Slides are based on

Cellphones!
3-4B cellphone users worldwide
1.13 billion phones sold in 2009 (36 per sec) vs 0.3 billion PCs
174M were smartphones
– 15% (up from 12.8% in 2008)
– Expected to exceed the number of feature phones

Status quo in cellphones Each device connects to the Internet – to download/upload data and – to accomplish a task that does not require collaboration and coordination

What is missing? An infrastructure to assist mobile users to perform collaboration and coordination ubiquitously Any user should be able to search & aggregate the data published by other users in a region

Our goal To provide a crowdsourced sensing and collaboration service using Twitter To enable aggregation and sharing of data; dynamically assign sensing tasks to other cellphone users

Why Twitter? Open publish-subscribe system: 105 million users (over 30 million in the US), 55 million tweets and 600 million search queries every day. Each tweet has a 140-character limit. Twitter provides an open source search API and a REST API (that enables developers to access tweets, timelines, and user data). Different actors may integrate published data differently and can offer new services in unanticipated ways.

Crowdsourcing architecture

Sensweet Employs the smartphone’s ability to work in the background without distracting a mobile user – Sense the surrounding environment and send the resulting data to Twitter To search and process sensor values on Twitter, we need to agree on a standard for publishing these sensor readings – Bio-code: Uses Twitter bio sections & allows users to search for the sensors they are looking for on-the-fly – TweetML: Uses pre-defined hashtags to improve searchability

Askweet Accepts a question from Twitter – tries to answer the question using the data on Twitter, potentially data published by Sensweets – if that is not possible, Askweet finds experts on Twitter and forwards the question to these experts (not clear how this was done in the paper) Parallelizable, easy to “cloudify” for scalable service provisioning

Applications Crowdsourced weather Noise map application Location-based queries (with Foursquare)

1. Crowdsourced weather For the current weather, everybody on Twitter can be an expert. Question to Askweet: “?Weather Loc:Buffalo,NY” Forwarded question: “How is the weather there now? reply 0 for sunny, 1 for cloudy, 2 for rainy, and 3 for snowy”
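The question format and the numeric reply codes above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the majority vote is an assumed aggregation rule, since the slide does not say how Askweet combines replies.

```python
import re
from collections import Counter

# Reply codes as listed in the forwarded question.
WEATHER_CODES = {0: "sunny", 1: "cloudy", 2: "rainy", 3: "snowy"}

def parse_question(tweet):
    """Split a tweet such as '?Weather Loc:Buffalo,NY' into (topic, location)."""
    match = re.match(r"\?(\w+)\s+Loc:(.+)", tweet.strip())
    if match is None:
        return None
    return match.group(1), match.group(2).strip()

def aggregate_replies(replies):
    """Return the most frequent weather label among replies (assumed majority vote)."""
    votes = Counter()
    for reply in replies:
        # Accept the first standalone digit 0-3 found in the reply text.
        match = re.search(r"\b([0-3])\b", reply)
        if match:
            votes[WEATHER_CODES[int(match.group(1))]] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

For example, `aggregate_replies(["1", "@askweet 1", "2 here"])` counts two votes for cloudy and one for rainy.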

Experimental results for NYC in different time slices

2. Noise map application Implemented a Sensweet client for the Nokia N97 Smartphone series Sensweet client detects a noise level of the surrounding environment and forwards this data to Twitter in the TweetML format Sound sample is classified into: Low, Medium, High state – Each level is modeled using normal distribution – Input signal is compared with 3 distributions (Low, Medium, and High)
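A minimal sketch of the three-distribution comparison described above: each level is a normal distribution, and the sample is assigned to the level with the highest likelihood. The means and standard deviations below are made-up placeholder values, not the ones used in the paper.

```python
import math

# Hypothetical (mean, standard deviation) per level, in dB; illustrative only.
NOISE_MODELS = {
    "Low": (40.0, 5.0),
    "Medium": (60.0, 5.0),
    "High": (80.0, 5.0),
}

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def classify_noise(level_db, models=NOISE_MODELS):
    """Pick the level whose normal distribution gives the sample the highest density."""
    return max(models, key=lambda name: gaussian_pdf(level_db, *models[name]))
```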

Noise map application

Noise levels for a user

3. Location based queries Factual vs. non-factual queries – Factual: “hotels in Miami” – Non-factual: “Anyone knows any cheap, good hotel, price ranges between 100 to 200 dollars in Miami?” Traditional search engines perform poorly! A significant fraction of location-based queries (in Twitter) is non-factual – e.g., 63% of the queries were non-factual, while only 37% of them were factual (manual classification of 269 queries) Crowdsourcing Location-based Queries, Bulut et al., Pervasive Collaboration and Social Networking,

Location based queries Aardvark uses the asker's social network to find suitable answerers for a query, forwards the query to them, and returns any answer to the asker. How about Twitter + Foursquare? – Use Foursquare to determine users who frequent the queried locale and who have an interest in the queried category (e.g., food, nightlife) – Find the right set of people to ask!

[Figure: question-answering workflow among askers, a moderator, and users. Stages shown: questions detected (asker tweet starting with “?”, keyword checking for “anyone”, “suggestion”, “where”); valid questions (label the category and quality of questions); questions to be asked (forward validated questions to appropriate people, using Twitter bio or Foursquare info); answers detected (constantly polling the Twitter account to check answers); valid answers; answer to be forwarded.]
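The question-detection step in the workflow can be sketched as follows. The keyword list comes from the slide; treating either rule (a leading “?” or a keyword hit) as sufficient is an assumption, and the function name is hypothetical.

```python
import re

# Keywords named on the slide.
QUESTION_KEYWORDS = ("anyone", "suggestion", "where")

def is_candidate_question(tweet):
    """Flag a tweet as a candidate question for the moderator to review."""
    text = tweet.strip()
    # Rule 1: askers address the service with a leading '?'.
    if text.startswith("?"):
        return True
    # Rule 2 (assumed to be an alternative, not a conjunction): keyword checking.
    words = re.findall(r"[a-z]+", text.lower())
    return any(keyword in words for keyword in QUESTION_KEYWORDS)
```

Candidates found this way would still go through the labeling and validation stages before being forwarded.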

Experiment Setup The question dataset consists of 269 questions that the system collected over Twitter and that the moderators validated as acceptable. Questions were manually categorized as factual or non-factual: 63% non-factual, 37% factual. Some examples of questions for each type.

Foursquare Reply Rate vs. Random User Reply Rate

Response Time The median response time was 13 minutes, which is comparable to Aardvark. 50% of the answers were received within the first 20 minutes.

Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event Detection Takeshi Sakaki, Makoto Okazaki, @ymatsuo Tokyo University WWW 2010 Conference

What’s happening? Twitter – is one of the most popular microblogging services – has received much attention recently Microblogging – is a form of blogging that allows users to send brief text updates – is a form of micromedia that allows users to send photographs or audio clips In this research, we focus on one important characteristic: its real-time nature

Real-time Nature of Microblogging
– Twitter users write tweets several times in a single day.
– There is a large number of tweets, which results in many reports related to events.
– We can know how other users are doing in real time.
– We can know what happens around other users in real time.
Social events: parties, baseball games, presidential campaigns
Disastrous events: storms, fires, traffic jams, riots, heavy rainfalls, earthquakes

Our Goals
Propose an algorithm to detect a target event:
– do semantic analysis on tweets to obtain tweets about the target event precisely
– regard each Twitter user as a sensor to detect the target event and to estimate the location of the target
Produce a probabilistic spatio-temporal model for:
– event detection
– location estimation
Propose an Earthquake Reporting System using Japanese tweets

Twitter and Earthquakes in Japan A map of earthquake occurrences worldwide and a map of Twitter users worldwide. The intersection is the set of regions with many earthquakes and many Twitter users.

Twitter and Earthquakes in Japan Other regions: Indonesia, Turkey, Iran, Italy, and Pacific coastal US cities

Event detection algorithms Do semantic analysis on tweets – to obtain tweets about the target event precisely Regard each Twitter user as a sensor – to detect the target event – to estimate the location of the target

Semantic Analysis on Tweet Search tweets including keywords related to a target event – Example: In the case of earthquakes “shaking”, “earthquake” Classify tweets into a positive class or a negative class – Example: “Earthquake right now!!” --- positive “Someone is shaking hands with my boss” --- negative – Create a classifier

Semantic Analysis on Tweet Create classifier for tweets – use Support Vector Machine (SVM) Features (Example: I am in Japan, earthquake right now!) – A: Statistical features (7 words, the 5th word): the number of words in a tweet message and the position of the query within a tweet – B: Keyword features (I, am, in, Japan, earthquake, right, now): the words in a tweet – C: Word context features (Japan, right): the words before and after the query word
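The three feature groups can be reproduced on the slide's own example. This sketch assumes a simple regular-expression tokenizer and hypothetical names; it only extracts the features and does not train the SVM.

```python
import re

def extract_features(tweet, query):
    """Extract statistical (A), keyword (B), and word context (C) features."""
    # Assumed tokenization: runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", tweet)
    lowered = [w.lower() for w in words]
    idx = lowered.index(query.lower())  # raises ValueError if the query is absent
    before = words[idx - 1] if idx > 0 else None
    after = words[idx + 1] if idx + 1 < len(words) else None
    return {
        "num_words": len(words),      # A: number of words in the tweet
        "query_position": idx + 1,    # A: 1-based position of the query word
        "keywords": words,            # B: the words in the tweet
        "context": (before, after),   # C: words before and after the query
    }

features = extract_features("I am in Japan, earthquake right now!", "earthquake")
```

Here `features` holds 7 words, query position 5, and the context pair (Japan, right), matching the values on the slide.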

Tweet as Sensor Data
[Figure: the correspondence between tweet processing and sensor data processing for event detection. Event detection from Twitter: observation by Twitter users, tweets, classifier, probabilistic model, target event. Object detection in a ubiquitous environment: observation by sensors, values, probabilistic model, target object.]

Tweet as Sensor Data
Some users post “earthquake right now!!”, just as some earthquake sensors respond with a positive value. We can therefore apply methods for sensor data processing to tweet processing: search for tweets, classify them into the positive class, and detect an earthquake occurrence.
[Figure: the same correspondence diagram, annotated with the earthquake example.]

Tweet as Sensor Data We make two assumptions in order to apply methods for observation by sensors. Assumption 1: Each Twitter user is regarded as a sensor – a tweet → a sensor reading – a sensor detects a target event and makes a report probabilistically – Example: a user tweets about an earthquake occurrence, i.e., the “earthquake sensor” returns a positive value. Assumption 2: Each tweet is associated with time and location info – time: posting timestamp – location: GPS data or location information in the user’s profile. By processing time and location information, we can detect target events and find the events’ locations.

Probabilistic Model Why do we need probabilistic models? – Sensor readings are noisy and sometimes sensors work incorrectly – We cannot judge whether a target event occurred or not from a single tweet – We have to calculate the probability of an event occurrence from a series of data We propose probabilistic models for – event detection from time-series data – location estimation from a series of spatial information

Temporal Model We must calculate the probability of an event occurrence from a set of sensor readings We examine the actual time-series data to create a temporal model
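One simple way to turn a set of sensor readings into an event probability, assuming the sensors are independent (a condition the talk checks later) and each positive tweet is a false alarm with probability p_f: the chance that all n positive tweets are false alarms is p_f^n, so the event probability is 1 - p_f^n. This sketch ignores the time dimension of the temporal model, and the value 0.35 below is only an illustrative assumption.

```python
def detection_probability(num_positive_tweets, false_alarm_prob):
    """Probability that an event occurred, given n independent positive reports.

    Assumes every report is wrong independently with probability
    false_alarm_prob, so all n are wrong with probability false_alarm_prob ** n.
    """
    if not 0.0 <= false_alarm_prob <= 1.0:
        raise ValueError("false_alarm_prob must be in [0, 1]")
    return 1.0 - false_alarm_prob ** num_positive_tweets

# Illustrative only: one tweet is weak evidence, ten tweets are strong evidence.
single = detection_probability(1, 0.35)
many = detection_probability(10, 0.35)
```

This makes concrete why a single tweet is not enough to judge an event, while a burst of positive tweets quickly pushes the probability toward 1.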

Temporal Model with Exponential Dist. Example: Earthquake and Typhoon

Spatial Model We must calculate the probability distribution of the location of a target. We apply Bayes filters, which are often used in location estimation by sensors, to this problem – Kalman Filters – Particle Filters

Bayesian Filters for Location Estimation Kalman Filters – are the most widely used variant of Bayes filters – approximate the probability distribution which is virtually identical to a uni-modal Gaussian representation – advantages: computational efficiency – disadvantages: limited to accurate sensors or sensors with high update rates

Bayesian Filters for Location Estimation Particle Filters
– represent the probability distribution by sets of samples, or particles
– advantages: able to represent arbitrary probability densities; particle filters can converge to the true posterior even in non-Gaussian, nonlinear dynamic systems
– disadvantages: difficult to apply to high-dimensional estimation problems
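A minimal particle filter sketch for estimating a static event location from noisy (latitude, longitude) reports. It is a generic textbook-style filter, not the paper's exact algorithm: the Gaussian observation model, the jitter step, and all parameter values are assumptions chosen for illustration.

```python
import math
import random

def estimate_location(observations, bounds, n_particles=500,
                      sigma=0.5, jitter=0.05, seed=0):
    """Estimate a static (lat, lon) from noisy reports with a particle filter."""
    rng = random.Random(seed)
    (lat_min, lat_max), (lon_min, lon_max) = bounds
    # Initialize particles uniformly over the region of interest.
    particles = [(rng.uniform(lat_min, lat_max), rng.uniform(lon_min, lon_max))
                 for _ in range(n_particles)]
    for obs_lat, obs_lon in observations:
        # Weight each particle by an assumed Gaussian likelihood of the report.
        weights = [
            math.exp(-((lat - obs_lat) ** 2 + (lon - obs_lon) ** 2)
                     / (2.0 * sigma ** 2))
            for lat, lon in particles
        ]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # fall back to uniform weights
        # Resample in proportion to the weights, then jitter to keep diversity.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [(lat + rng.gauss(0.0, jitter), lon + rng.gauss(0.0, jitter))
                     for lat, lon in particles]
    # The estimate is the mean of the surviving particles.
    est_lat = sum(lat for lat, _ in particles) / n_particles
    est_lon = sum(lon for _, lon in particles) / n_particles
    return est_lat, est_lon
```

A caller would pass the locations of positively classified tweets as `observations` and a bounding box such as `((30.0, 40.0), (130.0, 140.0))` as `bounds`.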

Information Diffusion Related to Real-time Events The proposed spatiotemporal models need to meet one condition: – sensors are assumed to be independent We check whether information diffusion about target events happens, because – if information diffuses among users, Twitter user sensors are not independent; they affect each other (correlation!)

Information Diffusion Related to Real-time Events
[Figure: Information Flow Networks on Twitter for three cases: Nintendo DS Game, an earthquake, a typhoon.]
In the case of an earthquake and a typhoon, very little information diffusion takes place on Twitter, compared to the Nintendo DS Game → We assume that Twitter user sensors are independent for earthquakes and typhoons

Experiments and Evaluation We demonstrate performances of – tweet classification – event detection from time-series data → show this result in “application” – location estimation from a series of spatial information

Evaluation of Semantic Analysis Queries – Earthquake query: “shaking” and “earthquake” – Typhoon query: “typhoon” Examples to create classifier – 597 positive examples

Evaluation of Semantic Analysis
We obtain the highest F-value when we use Statistical features alone or all features together. Keyword features and Word Context features don’t contribute much to the classification performance. A user becomes surprised and might produce a very short tweet. It’s apparent that the precision is not as high as the recall.
Features     Recall   Precision  F-Value
Statistical  87.50%   63.64%     73.69%
Keywords     87.50%   38.89%     53.85%
Context      50.00%   66.67%     57.14%
All          87.50%   63.64%     73.69%
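The F-values in the table are consistent with the standard F1 definition, the harmonic mean of precision and recall. A short check, with a hypothetical function name:

```python
def f_value(recall, precision):
    """Harmonic mean of precision and recall (the F1 score)."""
    if recall + precision == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Recall and precision pairs copied from the table, as fractions.
rows = {
    "Statistical": (0.8750, 0.6364),
    "Keywords": (0.8750, 0.3889),
    "Context": (0.5000, 0.6667),
}
f_values = {name: 100.0 * f_value(r, p) for name, (r, p) in rows.items()}
```

Each entry of `f_values` agrees with the table's F-Value column to within rounding.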

Evaluation of Spatial Estimation Target events – earthquakes: 25 earthquakes from August 2009 to October 2009 – typhoons: name: Melor Baseline methods – weighted average: simply takes the average of latitudes and longitudes – median: simply takes the median of latitudes and longitudes Metric: distance from the epicenter – The smaller the better!
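The two baselines and the distance metric can be sketched as below. The slide does not say how distance was computed; the great-circle (haversine) formula with a mean Earth radius of 6371 km is an assumption, as are the function names.

```python
import math
from statistics import mean, median

def average_estimate(points):
    """Baseline: average of latitudes and longitudes."""
    return mean(lat for lat, _ in points), mean(lon for _, lon in points)

def median_estimate(points):
    """Baseline: median of latitudes and longitudes."""
    return median(lat for lat, _ in points), median(lon for _, lon in points)

def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * radius_km * math.asin(math.sqrt(h))
```

The error of an estimate is then `haversine_km(estimate, epicenter)`, where smaller is better.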

Evaluation of Spatial Estimation
[Figure: map of Tokyo, Osaka, and Kyoto showing the actual earthquake center, the estimation by median, and the estimation by particle filter. Balloons: individual tweets; color: post time.]

Evaluation of Spatial Estimation Typhoon

Discussions of Experiments The particle filter performs better than the other methods. If the center of a target event is in an oceanic area, it’s more difficult to locate it precisely from tweets. It becomes more difficult to make a good estimate in less populated areas.

Results of Earthquake Detection
JMA intensity scale   2 or more    3 or more    4 or more
Num of earthquakes    78           25           3
Detected              70 (89.7%)   24 (96.0%)   3 (100.0%)
Promptly detected*    53 (67.9%)   20 (80.0%)   3 (100.0%)
* Promptly detected: detected within a minute
JMA intensity scale: the original earthquake scale of the Japan Meteorological Agency
Period: Aug. 2009 – Sep
Tweets analyzed: 49,314 tweets
Positive tweets: 6,291 tweets by 4,218 users
We detected 96% of earthquakes of JMA intensity scale 3 or more during the period.

Conclusions We investigated the real-time nature of Twitter for event detection. Semantic analyses were applied to tweet classification. We consider each Twitter user as a sensor and frame event detection as a problem based on sensory observations. Location estimation methods such as Kalman filters and particle filters are used to estimate the locations of events. We developed an earthquake reporting system, which is a novel approach to notifying people promptly of an earthquake event. We plan to expand our system to detect events of various kinds, such as rainbows and traffic jams.

Location-based Crowdsourcing: Extending Crowdsourcing to the Real World Alt et al. NordiCHI 2010

Motivation Crowdsourcing beyond the digital? – Seeker and solvers – Important aspects: right time and location for matchmaking. Several scenarios: – Recommendations on demand (e.g., buying something?) – Recording on demand (e.g., missing lectures?) – Remotely looking around? (e.g., apartment?) – Real-time weather information – Translations on demand

System Architecture

The mobile client screenshots: (a) Main menu where users can search tasks. (b) A sample task retrieved from the database.

Lessons learned Users prefer address-based task selection (GPS is too hard to parse) Picture tasks are most popular (easy to handle) Tasks were mainly solved at or close to home Tasks are solved after work Response times vary

Lessons learned Informative tasks are as popular as picture tasks Time-critical tasks are out of interest Solution should be achievable in 10 minutes Tasks are still solved after work Mid-day breaks are good times to search for tasks Solving a task can take up to one day Home and surrounding areas are the most favorite places for solving tasks Voluntary tasks have lower chance (monetary rewards: 77%) Users search for tasks in their current location

Social Sensors and Pervasive Services: Approaches and Perspectives Rosi et al., PerCol 2011

Social Sensors? Device intelligence with various on-board sensors such as GPS Human intelligence with “social sensors” – Twitter posts, Facebook status updates, pictures posted on Flickr – Personal information: shopping patterns, place visit patterns, etc. (with some potential social interactions)

Approaches to integrate social sensing and pervasive services

A: Extracting data from social networks – Detecting crowded sites (Fujisaka et al., 2010) – Mining landmarks from blogs (Ji et al., 2009) – Event detection using Flickr (Zhao et al., 2006) B: Exploiting social networks as a socio-pervasive middleware – Twitter with sensors (Demirbas et al., 2010) – S-Sensors with micro-blogging (Baqer et al., 2009) – Status update feeds to social networks (CenceMe)

Approaches to integrate social sensing and pervasive services C: Pervasive overlays on social networks – Interconnecting and sharing data sensed from personal devices with the rest of the world – SenseFace: Capture and process (local), and disseminate data (social nets) Dynamically mash-up sensor data and social networks D: App-specific socio-pervasive networks – Fusing mobile, sensor, and social data to fully enable context-aware computing

Some Issues Key issues – Rich data, yet comes at the cost of understanding the data – Sheer size (raw facts and data produced by sensors) Un-structured, noisy data – Unified data representation and interpretation – Overcoming uncertainty of data No guarantee on the delivery of specific info about facts and at specific times by social sensors – Systems require “critical mass”; heterogeneous popularity based on location (e.g., rural area vs. urban area) Completely out-of-loop of system managers and app developers