Mobility analysis from Twitter data
NTTS 2015 - Satellite Workshop on Big Data

Twitter as data source
Diagram labels: Twitter; real-time tweets; filter by: geo-referenced, only México; NoSQL database; INEGI

Why Twitter?
Availability
1% of tweets available without cost
Around 12 M accounts in Mexico
700,000 accounts are geo-referenced
Collection of 150 M tweets since January 2014

Devices generating tweets in Mexico
Chart categories: Android, iPhone

Tweet collection infrastructure
Unix “Red Hat”
NoSQL database “Elasticsearch”
Cluster (Hydra)
Big Data layers
Proof of concept

General process
Daily collection
Store geo-referenced tweets (15M)
Set an objective
Filter and process
Generate outputs
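
The "filter" step above can be sketched as follows. This is a minimal illustration, not INEGI's pipeline: it assumes tweets arrive as dictionaries whose `coordinates` field is a GeoJSON point in (longitude, latitude) order, and the bounding box around Mexico is a rough, hypothetical approximation.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Rough bounding box around Mexico (an assumption for illustration only):
# (min_lon, min_lat, max_lon, max_lat)
MEXICO_BBOX = (-118.5, 14.5, -86.5, 32.8)


def tweet_point(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) if the tweet carries an exact point, else None."""
    geo = tweet.get("coordinates")
    if not geo or geo.get("type") != "Point":
        return None
    lon, lat = geo["coordinates"]
    return float(lon), float(lat)


def in_bbox(point: Tuple[float, float], bbox=MEXICO_BBOX) -> bool:
    """Check whether a (lon, lat) point falls inside the bounding box."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def filter_georeferenced(tweets: Iterable[dict]) -> Iterator[dict]:
    """Keep only tweets with a point location inside the bounding box."""
    for tweet in tweets:
        point = tweet_point(tweet)
        if point is not None and in_bbox(point):
            yield tweet
```

A bounding box also admits points in neighbouring countries and at sea, so a real filter would likely test against actual boundary polygons.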

Topics
Mobility
– Internal flows
– Tourism
– Border commuting
– National roads network: use of roads (planned)
– Urban influence zones (planned)
Subjective wellness
– Based on text
– Based on emoticons

Geo-referenced Tweets 2014

Internal mobility (from-to): DF and México State
– To Mexico City
– From Mexico City
Where do we go when tweeting?
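
One way to derive from-to flows like these is to count how often a user's consecutive tweets fall in different regions. The sketch below assumes each tweet has already been reduced to a hypothetical `(user_id, timestamp, region)` record; the slides do not describe the method actually used.

```python
from collections import Counter, defaultdict


def od_flows(records):
    """Count origin-destination moves between consecutive tweets per user.

    records: iterable of (user_id, timestamp, region) tuples.
    Returns a Counter keyed by (origin_region, destination_region).
    """
    by_user = defaultdict(list)
    for user_id, timestamp, region in records:
        by_user[user_id].append((timestamp, region))

    flows = Counter()
    for visits in by_user.values():
        visits.sort()  # chronological order within each user
        for (_, origin), (_, destination) in zip(visits, visits[1:]):
            if origin != destination:
                flows[(origin, destination)] += 1
    return flows
```

For example, a user who tweets from "MEX" and then from "DF" contributes one move to the `("MEX", "DF")` cell of the matrix.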

Internal tourism: origin of tourists visiting Guanajuato (1-3 February 2014)

Internal tourism: origin of tourists visiting Puebla (1-3 February 2014)

Use of Twitter on long weekends
Trips to Puebla and Guanajuato before, during, and after the 1-3 February 2014 period
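
The slides do not say how a tourist's origin is determined. A simple heuristic, shown here purely as an assumption, takes each user's most frequent state outside the holiday window as "home" and counts the homes of users who tweeted from the destination during the window. The `(user_id, day, state)` record layout is hypothetical.

```python
from collections import Counter
from datetime import date


def tourist_origins(records, destination, start, end):
    """Count inferred home states of users seen in `destination` during
    the window [start, end].

    records: iterable of (user_id, day, state), where day is a datetime.date.
    """
    outside = {}      # user_id -> Counter of states tweeted from outside the window
    visitors = set()  # users who tweeted from the destination inside the window
    for user_id, day, state in records:
        if start <= day <= end:
            if state == destination:
                visitors.add(user_id)
        else:
            outside.setdefault(user_id, Counter())[state] += 1

    origins = Counter()
    for user_id in visitors:
        history = outside.get(user_id)
        if not history:
            continue  # no tweets outside the window: home unknown
        home, _ = history.most_common(1)[0]
        if home != destination:  # residents are not tourists
            origins[home] += 1
    return origins
```

Users with no activity outside the window are dropped rather than guessed, which trades coverage for fewer misattributed origins.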

Border commuting: México-USA

National Roads Network

Urban Influence zones

Subjective wellness
Complement of existing survey
– Subjective perceived wellness (monthly)
Two approaches
– Based on emoticons (possible international comparability)
   Netherlands experiments
– Based on text (diversity of analysis, regionalisms)
   Text analysis infrastructure development
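
The emoticon-based approach can be illustrated with a very small scorer. The emoticon sets below are illustrative assumptions, not the lists used by INEGI or in the Netherlands experiments.

```python
# Hypothetical emoticon sets, chosen only for illustration.
POSITIVE = {":)", ":-)", ":D", ";)", "<3"}
NEGATIVE = {":(", ":-(", ":'(", "D:"}


def emoticon_score(text):
    """Return (# positive emoticons) - (# negative emoticons) in the text."""
    tokens = text.split()
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return positive - negative


def positive_share(texts):
    """Share of positive tweets among tweets with a non-zero score.

    Returns None when no tweet carries a usable emoticon signal.
    """
    scores = [emoticon_score(text) for text in texts]
    signed = [score for score in scores if score != 0]
    if not signed:
        return None
    return sum(score > 0 for score in signed) / len(signed)
```

Because the scorer splits on whitespace, an emoticon glued to a word (for example "día:)") is missed; a real tokenizer would need to handle that.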

Methods and tools
Pioanalisis: tool for collecting the training set (crowdsourcing)
Machine learning (supervised and unsupervised): Support Vector Machines, incremental learning, random forest, Latent Dirichlet Allocation (LDA)
SOM neural networks (SOM: Self-Organizing Map)
Classification methods: Naive Bayes, Support Vector Machines (SVM), KNN, word count
Dictionaries: Spanish Emotion Lexicon (SEL), AFINN, WordNet, ANEW
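
Of the classification methods listed, Naive Bayes is the simplest to show end to end. The following is a from-scratch multinomial Naive Bayes with add-one smoothing, written only as an illustration; the class name, the whitespace tokenization, and the toy training texts are assumptions and do not come from the presentation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # class -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage with a made-up training set:

```python
model = NaiveBayes().fit(
    ["me encanta este día", "qué feliz estoy", "odio el tráfico", "estoy muy triste"],
    ["pos", "pos", "neg", "neg"],
)
label = model.predict("feliz día")
```

In practice the training set would come from a labelled collection such as the crowdsourced one mentioned above, and a library implementation would normally replace this hand-written class.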

Partnerships
International
– UNECE
– ICHEC
– UNSD
– LAMBDoop
– University of Pennsylvania
National
– KioNetworks
– Dattlas
– TecMilenio
– INFOTEC
– Centro Geo
– CIDE
– CIMAT
– Sectur
Internal
– INEGI General Directorates

Conclusions
We are in a discovery stage:
– Findings range from ‘interesting’ to ‘valuable’
A lot of research is needed:
– … but we are gaining a lot of knowledge and experience
Partnerships are a must
Combining other big data sources is an imminent next step
New challenges and threats will appear
– Cost increases?
– Legal issues?
– Re-engineering of methodologies and quality frameworks?
– Evolution of traditional statistics?
– And many others?

New statistics production landscape?

Conociendo México INEGI Informa