Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data.

1 Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data

2 Twitter as data source NoSQL Database Filter by: Geo-referenced Only México Real-time Tweets INEGI Twitter
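
The filtering step on this slide (geo-referenced tweets, Mexico only) could look roughly like the sketch below. It assumes tweets arrive as dictionaries with a GeoJSON-style `coordinates` field, as in the Twitter REST API v1.1 tweet object. The bounding box values and the function names are illustrative assumptions, not INEGI's actual configuration.

```python
# Rough bounding box for Mexico (assumed, approximate). A box this coarse also
# covers parts of neighbouring countries, so a production filter would more
# likely test against a proper boundary polygon.
MEXICO_BBOX = (-118.5, 14.5, -86.5, 32.8)  # (min_lon, min_lat, max_lon, max_lat)


def extract_point(tweet):
    """Return (lon, lat) for a geo-referenced tweet, or None if it has no point."""
    coords = tweet.get("coordinates")
    if not coords or coords.get("type") != "Point":
        return None
    lon, lat = coords["coordinates"]  # GeoJSON order is longitude, latitude
    return lon, lat


def in_bbox(point, bbox=MEXICO_BBOX):
    """True if the (lon, lat) point falls inside the bounding box."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def filter_geo_tweets(tweets, bbox=MEXICO_BBOX):
    """Keep only tweets that carry a point location inside the bounding box."""
    kept = []
    for tweet in tweets:
        point = extract_point(tweet)
        if point is not None and in_bbox(point, bbox):
            kept.append(tweet)
    return kept
```

Tweets without coordinates, or with coordinates outside the box, are simply dropped before storage.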

3 Why Twitter? Availability 1% of tweets available without cost Around 12 M accounts in Mexico 700,000 accounts are geo-referenced Collection of 150 M tweets since January 2014

4 Devices generating tweets in Mexico Android iPhone

5 Tweet collection infrastructure Unix “Red Hat” NoSQL Database “Elasticsearch” Cluster (Hydra) Big Data Layers Proof of Concept

6 General Process Every Day Collection Store Geo-Referenced Tweets 15M Set an Objective Filter and Process Generate outputs

7 Topics Mobility –Internal flows –Tourism –Border commuting –National Roads Network: Use of roads (planned) –Urban influence zones (planned) Subjective wellness –Based on text –Based on emoticons

8

9 Geo-referenced Tweets 2014

10

11 DF Internal mobility (from-to) México State To Mexico City From Mexico City Where do we go when tweeting?
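
One plausible way to obtain from-to flows like those on this slide is to count, per user, the moves between consecutive tweets posted in different regions. The sketch below assumes each tweet has already been assigned to a region. The record layout and the function name are hypothetical, and the slides do not say how the flows were actually computed.

```python
from collections import Counter, defaultdict


def origin_destination_counts(records):
    """Count region-to-region moves.

    records: iterable of (user_id, timestamp, region) tuples (assumed layout).
    Returns a Counter keyed by (origin_region, destination_region).
    """
    by_user = defaultdict(list)
    for user, timestamp, region in records:
        by_user[user].append((timestamp, region))

    flows = Counter()
    for visits in by_user.values():
        visits.sort(key=lambda visit: visit[0])  # chronological order per user
        # Compare each tweet with the next one by the same user.
        for (_, origin), (_, destination) in zip(visits, visits[1:]):
            if origin != destination:
                flows[(origin, destination)] += 1
    return flows
```

For example, a user who tweets from México State and later from Mexico City adds one to the `("México State", "Mexico City")` flow.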

12 Internal Tourism Origin of Tourists visiting Guanajuato (1-3 February 2014)

13 Internal Tourism Origin of Tourists visiting Puebla (1-3 February 2014)
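
The tourist-origin maps on slides 12 and 13 need some notion of where each visitor comes from. A common heuristic, assumed here because the slides do not state the method used, is to take the region a user tweets from most often as their home. The sketch below uses that assumption with hypothetical `(user_id, day, region)` records, where `day` is an ISO date string.

```python
from collections import Counter, defaultdict


def infer_home_regions(records):
    """Assumed heuristic: a user's home is the region they tweet from most often."""
    counts = defaultdict(Counter)
    for user, _day, region in records:
        counts[user][region] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


def visitor_origins(records, destination, start_day, end_day):
    """Count home regions of users seen in `destination` during the period.

    Days are ISO strings ("YYYY-MM-DD"), so plain string comparison orders them.
    Users whose inferred home is the destination itself are not counted as visitors.
    """
    homes = infer_home_regions(records)
    visitors = {
        user
        for user, day, region in records
        if region == destination and start_day <= day <= end_day
    }
    return Counter(homes[u] for u in visitors if homes[u] != destination)
```

Calling `visitor_origins(records, "Guanajuato", "2014-02-01", "2014-02-03")` would give the number of visiting users per inferred home region for the long weekend.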

14 Use of Twitter on long weekends Displacements to Puebla and Guanajuato before, during and after the 1-3 February period

15 Border commuting México USA

16 National Roads Network

17 Urban Influence zones

18 Subjective Wellness Complement of existing survey –Subjective perceived wellness (monthly) Two approaches –Based on emoticons (possible international comparability) Netherlands experiments –Based on text (diversity of analysis, regionalisms) Text analysis infrastructure development
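
The emoticon-based approach can be illustrated with a very small sketch: score each tweet by its positive and negative emoticons, then report the monthly share of positive tweets. The emoticon sets and function names below are assumptions for illustration only, not the lists used by INEGI or in the Netherlands experiments.

```python
from collections import defaultdict

# Tiny illustrative emoticon sets (assumed; a real study would use a larger list).
POSITIVE = {":)", ":-)", ":D", ";)", "<3"}
NEGATIVE = {":(", ":-(", ":'(", "D:"}


def emoticon_score(text):
    """Number of positive minus number of negative emoticons in the text.

    Splits on whitespace only, so emoticons glued to a word are missed.
    """
    tokens = text.split()
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return positive - negative


def monthly_positive_share(tweets):
    """tweets: iterable of (month, text). Returns {month: share of positive tweets}.

    Tweets with a score of zero (no emoticons, or a balanced mix) are ignored.
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    for month, text in tweets:
        score = emoticon_score(text)
        if score == 0:
            continue
        total[month] += 1
        if score > 0:
            positive[month] += 1
    return {month: positive[month] / total[month] for month in total}
```

Because emoticons are largely language-independent, an indicator of this kind is what makes the international comparability mentioned on the slide plausible.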

19 Methods and Tools Pioanalisis: Tool for collection of the training set (crowdsourcing) Machine learning (supervised and unsupervised), Support Vector Machines, Incremental Learning, Random Forest, Latent Dirichlet Allocation (LDA) SOM Neural Networks (SOM: Self-Organizing Map) Classification Methods: Naive Bayes, Support Vector Machines (SVM), KNN, Word Count Dictionaries: Spanish Emotion Lexicon (SEL), KNN, AFINN, WordNet, ANEW
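
Of the classification methods listed, Naive Bayes is the simplest to show end to end. The following is a minimal multinomial Naive Bayes sketch with add-one smoothing, trained on a labelled set such as the one a crowdsourcing tool would collect. The class name, tokenisation and labels are assumptions, and a real pipeline would more likely rely on an established machine learning library.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            logp = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                logp += math.log((count + 1) / (total_words + vocab_size))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

A short usage example: `NaiveBayesText().fit(texts, labels).predict("feliz")` returns the label whose training tweets make the word "feliz" most likely.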

20 Partnerships International –UNECE ICHEC –UNSD –LAMBDoop –University of Pennsylvania National –KioNetworks Dattlas –TecMilenio INFOTEC –Centro Geo –CIDE –CIMAT –Sectur Internal –INEGI General Directions

21 Conclusions We are in a discovery stage: –Findings going from ‘interesting’ to ‘valuable’ A lot of research needed: –… but we are getting a lot of knowledge and experience Partnerships are a must Combining other big data sources is an imminent next step New challenges and threats will appear –Costs increase? –Legal issues? –Methodologies and quality frameworks re-engineering? –Evolution of traditional statistics? A lot of etcetera?

22 New statistics production landscape?

23 Conociendo México 01 800 111 46 34 www.inegi.org.mx atencion.usuarios@inegi.org.mx @inegi_informa INEGI Informa

