Published by Sharleen Quinn. Modified over 9 years ago.
1
Geosocial big data analysis using Python and FOSS4G, with a case study of Korean data
Ilyoung Hong, Namseoul University, Department of GIS Engineering
2
Geosocial data
Social media (Twitter, Facebook) is the killer app for smartphones
Smartphones with GPS generate large amounts of geotagged social data
Geotagged social data is called geosocial data
Examples: GeoTweets (geotagged tweets), Foursquare (4sq) venues
3
Geosocial Data Research
Fujita, Hideyuki. "Geo-tagged Twitter collection and visualization system." Cartography and Geographic Information Science 40.3 (2013). => Computational method, data collection
Jung, Jin‐Kyu. "Code clouds: Qualitative geovisualization of geotweets." The Canadian Geographer/Le Géographe canadien 59.1 (2015). => Qualitative approach, with content analysis
Li, Linna, Michael F. Goodchild, and Bo Xu. "Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr." Cartography and Geographic Information Science 40.2 (2013). => Spatial statistical analysis with geodemographic data
Mitchell, Lewis, et al. "The geography of happiness: Connecting twitter sentiment and expression, demographics, and objective characteristics of place." (2013): e64417. => Sentiment analysis, computational linguistics approach
Data sizes noted on the slide: 3,476,059 geotweets; 14,858 geotweets; geotweets were 1%; 19,758,954 tweets; 10 million geotagged tweets
4
Multidisciplinary aspects of geosocial data analysis
[Diagram labels: GeoSocial Data; Data Collection; Data Management; Data Visualization; Data Analyzing; Qualitative Analysis; Quantitative Analysis; Statistics; Linguistics; Text Mining; Sociology; Journalism; Media; Web programming; Database Management; Geography, Cartography, GIS]
5
Challenges of geosocial research
Different data sources and formats: Twitter, Foursquare, Facebook
Different analysis environments and software: Java, PHP, Python, C, R, ArcGIS; web programming, database programming, statistics, geovisualization
Different domain knowledge and multidisciplinary research methods: computation, geography, sociology, psychology, statistics, linguistics, media, journalism
Interdisciplinary cooperation is needed. Is there a way to integrate these methods?
6
Why Python/FOSS4G for Geosocial Big Data?
Integrated analysis environment of software and libraries
Python is free and open
Object-oriented programming (OOP) in Python
Distributions: WinPython, Anaconda (SciPy, IPython), Enthought Canopy for Python 2.7
Large number of libraries, supporting different domain knowledge
PyPI, the Python Package Index, currently has 66086 packages
Simple coding environment: quick to learn and to code
Readability: the syntax of Python is readable and clear
7
Research Purpose
Introduce an integrated platform for analyzing geosocial data using Python & FOSS4G
Data collection and management
Data analysis with qualitative and quantitative methods
Sentiment analysis
Geovisualization
Present a case study with Korean geosocial data
GeoTweet distribution
Spatial patterns of Foursquare venues
Sentiment analysis of Korean GeoTweets
8
Architecture, at the beginning
[Diagram labels: Social Media; Twitter/Foursquare API; JSON; Excel; CSV; Shape; ArcGIS]
9
Data Collection
Python with the Twitter Streaming API, using tweepy
Rate limits apply per user: the Twitter API allows 350 method calls per hour for one authorized developer account
Switch to another user ID when the limit is reached
Filter out unnecessary data: geotweets are just 1% of total tweets
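Since only about 1% of tweets are geotagged, most of the stream can be discarded at collection time. The following is a minimal sketch of that filtering step, not the code used in the study. It assumes each raw record is one JSON object per line and that a geotagged tweet carries a GeoJSON-style `coordinates` object of type `Point`, as in the Twitter v1.1 tweet format. The function names and the sample records are hypothetical.

```python
import json


def is_geotagged(tweet):
    """Return True if the tweet carries an exact point coordinate."""
    coords = tweet.get("coordinates")
    return bool(coords) and coords.get("type") == "Point"


def filter_geotweets(lines):
    """Yield parsed tweets that have point coordinates, skipping bad lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # streaming connections send blank keep-alive lines
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore truncated or malformed records
        if isinstance(tweet, dict) and is_geotagged(tweet):
            yield tweet


# Hypothetical sample: two raw records, one of them geotagged.
sample_lines = [
    '{"id_str": "1", "text": "hello", "coordinates": null}',
    '{"id_str": "2", "text": "in Seoul", "coordinates": '
    '{"type": "Point", "coordinates": [126.978, 37.566]}}',
    "",
]
geotweets = list(filter_geotweets(sample_lines))
```

Filtering before storage keeps the collected files small; the same generator can wrap a file handle or a live stream of lines.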
10
Columns from Tweet
● Date and Time => temporal analysis
● Tweet text => qualitative approach, text mining, keyword filter, sentiment analysis
● Tweet ID; User ID; Destination user ID (only for tweets with ID); User profile (including location name input by user) => behavioral features, heavy user feature, social network
● Location coordinates (only for tweets tagged with the location coordinates) => geovisualization, spatial analysis using GIS
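The columns listed above can be pulled out of a raw tweet record in a few lines. This is a sketch under assumptions: the field names (`created_at`, `text`, `id_str`, `user`, `in_reply_to_user_id_str`, `coordinates`) follow the Twitter v1.1 tweet JSON, the output column names are made up for illustration, and the sample tweet is hypothetical.

```python
from datetime import datetime

# Timestamp layout used by Twitter v1.1, e.g. "Sat Nov 01 05:00:00 +0000 2014".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_to_row(tweet):
    """Flatten one tweet dict into the columns used for later analysis."""
    user = tweet.get("user") or {}
    coords = tweet.get("coordinates") or {}
    # GeoJSON order is [longitude, latitude]; missing for non-geotagged tweets.
    lon, lat = (coords.get("coordinates") or [None, None])[:2]
    return {
        "created_at": datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT),
        "text": tweet.get("text", ""),
        "tweet_id": tweet.get("id_str"),
        "user_id": user.get("id_str"),
        "reply_to_user_id": tweet.get("in_reply_to_user_id_str"),
        "profile_location": user.get("location"),
        "lon": lon,
        "lat": lat,
    }


# Hypothetical sample record.
sample_tweet = {
    "created_at": "Sat Nov 01 05:00:00 +0000 2014",
    "id_str": "2",
    "text": "in Seoul",
    "user": {"id_str": "42", "location": "Seoul"},
    "in_reply_to_user_id_str": None,
    "coordinates": {"type": "Point", "coordinates": [126.978, 37.566]},
}
row = tweet_to_row(sample_tweet)
```

Keeping longitude and latitude as separate numeric columns makes the rows easy to load into a GIS layer or a database table later.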
11
Two studies completed so far
Spatial Analysis of Location-Based Social Networks in Seoul, Korea. Journal of Geographic Information System, 2015, 7
Spatial Distribution of Korean Geotweets*. Journal of the Korean Cartographic Association, 2015, 15(2)
12
Spatial Analysis of Location-Based Social Networks in Seoul, Korea
The purpose of this study is to analyze the spatial patterns of location-based social network (LBSN) data in Seoul using the spatial analysis techniques of geographic information system (GIS). The study explores the applications of LBSN data by analyzing the association between Seoul’s Foursquare venues data created based on user participation and the city’s characteristics. The data regarding Foursquare venues were compiled with a program we created based on Foursquare’s Python API. The compiled information was converted into GIS data, which in turn was depicted as a heat map. Cluster analysis was then performed based on hotspots and the correlation with census variables was analyzed for each administrative unit using geographically weighted regression (GWR). Based on analytical results, we were able to identify venue clusters around city centers, as well as differences in hotspots for various venue categories and correlations with census variables.
13
About 230,000 venue records were collected for analysis between March 15 and 21, 2015
15
Spatial Distribution of Korean Geotweets*
In this study, we analyzed the distribution of Korean geotweets collected in November 2014 through the Twitter Streaming API. Python programs were used to analyze the collected data and to convert it into GIS data. Twitter use is concentrated in Seoul and the metropolitan areas, and a few heavy users created a large number of tweets. Time-series analysis showed that tweet volume peaks on the weekend and, within a day, at 14:00. Text analysis also identified a high percentage of retweets and regional differences in content. Key Words: Twitter API, Geotweet, Spatial distribution
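The weekday and hour-of-day patterns mentioned in the abstract come down to counting timestamps. The sketch below shows one way to do that with the standard library. It assumes timestamps arrive in UTC and are read in Korea Standard Time (UTC+9); the abstract does not state which time zone was used, and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9))  # Korea Standard Time, assumed here
WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def temporal_counts(timestamps_utc):
    """Count tweets per local weekday and per local hour of day."""
    by_weekday = Counter()
    by_hour = Counter()
    for ts in timestamps_utc:
        local = ts.astimezone(KST)
        by_weekday[WEEKDAY_NAMES[local.weekday()]] += 1
        by_hour[local.hour] += 1
    return by_weekday, by_hour


# Hypothetical timestamps in UTC.
sample_times = [
    datetime(2014, 11, 1, 5, 0, tzinfo=timezone.utc),   # Sat 14:00 KST
    datetime(2014, 11, 1, 5, 30, tzinfo=timezone.utc),  # Sat 14:30 KST
    datetime(2014, 11, 3, 16, 0, tzinfo=timezone.utc),  # Tue 01:00 KST
]
by_weekday, by_hour = temporal_counts(sample_times)
peak_hour = by_hour.most_common(1)[0][0]
```

Converting to local time before binning matters: the third sample falls on Monday in UTC but on Tuesday in Seoul.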
16
In November 2014, over 2 million tweets were collected
[Figures: Distribution of geotweets, Nov 2014; Spatial distribution of geotweets; Daily distribution of geotweets, Nov 2014]
17
Text analysis
High percentage of retweets
Some keywords represent regional features
Tools: PyTag, Word_cloud
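Both observations on this slide, the retweet share and the regional keywords, rest on simple counts over the tweet text. The sketch below is an illustration only: it treats a tweet as a retweet when its text starts with `RT @`, splits on word characters (which is rough for Korean, where a morphological analyzer would normally be used), and uses hypothetical sample texts and stopwords. The resulting frequencies are the kind of input a word-cloud tool takes.

```python
import re
from collections import Counter

URL_PATTERN = re.compile(r"https?://\S+")
TOKEN_PATTERN = re.compile(r"[#@]?\w+")


def is_retweet(text):
    """Heuristic: classic retweets start with 'RT @username'."""
    return text.startswith("RT @")


def keyword_counts(texts, stopwords=frozenset()):
    """Count word and hashtag frequencies, dropping URLs and @mentions."""
    counts = Counter()
    for text in texts:
        cleaned = URL_PATTERN.sub("", text.lower())
        for token in TOKEN_PATTERN.findall(cleaned):
            if token.startswith("@") or token in stopwords or len(token) < 2:
                continue
            counts[token] += 1
    return counts


# Hypothetical tweets from one region.
sample_texts = [
    "RT @user: Hongdae tonight #seoul",
    "Coffee in Hongdae http://t.co/abc",
    "Busan beach #busan",
]
retweet_share = sum(is_retweet(t) for t in sample_texts) / len(sample_texts)
counts = keyword_counts(sample_texts, stopwords={"rt", "in"})
```

Running `keyword_counts` once per region and comparing the top terms is one simple way to surface region-specific keywords.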
18
Problems
Exploratory statistical analysis is used
The work is repeated but the process is not automated: it takes time and causes data errors
As time goes by, the data becomes too big to handle
Data needs to be managed in a database, not as text files
Data and software should be compatible within the same environment for automated analysis
19
Python & FOSS4G integrated analysis environment
Large number of libraries, supporting different domain knowledge
Create automated scripts for analysis
20
[Architecture diagram. Labels: Social Media Server; Twitter API, Tweepy; Data Collection; Data Parsing; Data Conversion (pyspatialite); GIS Data Server (Spatialite); Visualize Client: Geovisualization (Quantum GIS, Shape/Text), Word Cloud (pytagcloud); Analysis Client: Sentiment Analysis (Python NLTK), pandas for data analysis, Statistical Analysis (PySAL)]
21
Analysis Process
[Diagram labels: Social Media Data; Data Type?; GeoTagged?; GIS Database; Analysis method?; Quantitative; Spatial Analysis; Statistical Analysis; HeatMap; Thematic Mapping; Hotspot; GWR; Text Mining; Sentiment Analysis; Visualizing Method?; Word Clouds]
22
Spatialite Database, Why
- Standalone, file-based database: easy to handle
- Compatibility and interoperability: Python, QGIS, ArcGIS; export/import to other formats
- Easy to use, with a GUI
- pyspatialite
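Spatialite is built on SQLite, so the file-based workflow can be illustrated with Python's built-in `sqlite3` module alone. The sketch below is not the pyspatialite code from the presentation: it stores longitude and latitude as plain numeric columns and runs a bounding-box query, whereas Spatialite would add real geometry columns and spatial functions on top. The table layout, function names, and sample rows are assumptions.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open (or create) a single-file database with a geotweet table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS geotweet (
               tweet_id   TEXT PRIMARY KEY,
               created_at TEXT,
               text       TEXT,
               lon        REAL,
               lat        REAL)"""
    )
    return conn


def insert_geotweets(conn, rows):
    """Insert rows in one transaction; duplicate tweet IDs are ignored."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO geotweet VALUES (?, ?, ?, ?, ?)", rows
        )


def within_bbox(conn, min_lon, min_lat, max_lon, max_lat):
    """Return tweet IDs whose coordinates fall inside a bounding box."""
    cur = conn.execute(
        "SELECT tweet_id FROM geotweet "
        "WHERE lon BETWEEN ? AND ? AND lat BETWEEN ? AND ? "
        "ORDER BY tweet_id",
        (min_lon, max_lon, min_lat, max_lat),
    )
    return [r[0] for r in cur]


# Hypothetical rows; the first one is repeated on purpose.
conn = create_store()
insert_geotweets(conn, [
    ("1", "2015-07-04T05:00:00+00:00", "in Seoul", 126.978, 37.566),
    ("2", "2015-07-04T06:00:00+00:00", "in Busan", 129.075, 35.180),
    ("1", "2015-07-04T05:00:00+00:00", "in Seoul", 126.978, 37.566),
])
total = conn.execute("SELECT COUNT(*) FROM geotweet").fetchone()[0]
seoul_ids = within_bbox(conn, 126.7, 37.4, 127.2, 37.7)
```

Passing a file path instead of `:memory:` gives one portable database file, which is the property the slide highlights.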
23
Sentiment Analysis with Python NLTK Text Classification
Sentiment analysis using NLTK: tweet text => POS, NEU, NEG values
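The slide does not say which NLTK classifier or training data produced the POS, NEU, and NEG values, so the sketch below does not reproduce it. Instead it swaps in a much simpler technique, lexicon-based word counting, only to show what a mapping from tweet text to three scores looks like. The word lists are tiny, hypothetical, and English-only.

```python
import re

# Hypothetical toy lexicons; a real analysis would use a trained classifier
# or a full sentiment lexicon.
POSITIVE_WORDS = {"happy", "love", "best", "good", "beautiful"}
NEGATIVE_WORDS = {"sad", "bad", "hate", "worst", "angry"}


def sentiment_scores(text):
    """Return the share of positive, neutral, and negative words in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"pos": 0.0, "neu": 1.0, "neg": 0.0}
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    total = len(tokens)
    return {
        "pos": pos / total,
        "neu": (total - pos - neg) / total,
        "neg": neg / total,
    }


def sentiment_label(scores):
    """Pick POS or NEG when one side dominates, otherwise NEU."""
    if scores["pos"] > scores["neg"]:
        return "POS"
    if scores["neg"] > scores["pos"]:
        return "NEG"
    return "NEU"


example = sentiment_scores("Korean food is the best food")
example_label = sentiment_label(example)
```

The three scores always sum to 1, so they can be compared across tweets of different lengths.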
24
Heatmap using Quantum GIS
Geotweets, July 2015
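The heat map itself was produced in Quantum GIS. To show the underlying idea of turning points into a density surface, the sketch below uses a simpler stand-in, counting points per grid cell, rather than the kernel density estimation that a GIS heat map tool typically applies. The grid origin, cell size, and sample coordinates are hypothetical.

```python
import math
from collections import Counter


def grid_counts(points, min_lon, min_lat, cell_size):
    """Count points per square grid cell, keyed by (row, col)."""
    counts = Counter()
    for lon, lat in points:
        col = math.floor((lon - min_lon) / cell_size)
        row = math.floor((lat - min_lat) / cell_size)
        counts[(row, col)] += 1
    return counts


# Hypothetical geotweet coordinates as (longitude, latitude) in degrees.
sample_points = [
    (126.978, 37.566),
    (126.979, 37.567),
    (126.924, 37.556),
]
cells = grid_counts(sample_points, min_lon=126.9, min_lat=37.5, cell_size=0.05)
hottest_cell, hottest_count = cells.most_common(1)[0]
```

Cells with the highest counts correspond to the brightest areas of a heat map; a kernel-based method smooths these counts across neighboring cells.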
25
Hot, Best Positive Places
Jongro, HongDae, Youngsan
26
Word Cloud
Jongro, HongDae, Youngsan
27
Best Positive Tweets
Happy Pride from Kat! #seoul #gaypride #kqcf2015 #korea Seoul City Hall Korea #seoulgayprideparade
HAPPY PRIDE DAY KOREA!!!! #rainbow #lgbt #love #happy #seoul Seoul Plaza
Good times and more Korean BBQ with the Samsung team #MobLabs Gangnam, Seoul, Korea
Happy Sunday Myeongdong Cathedral
We go by the zoo via the "Elephant Train" to the Seoul Grand Park Zoo
Korean food is the best food #korea #food Seoul ,Korea
Have a beautiful and fruitful week IG fam! #MondayLook Hongdae Seoul
Happy the 4th of July to all my American friends! Thursday Party in Seoul) And with Elizaveta from Russia Trickeye Museum
Quick tour of a Korean Hongdae Seoul South Korea ..
28
Conclusion and Future Work
Analysis of geosocial data is a complex, multidisciplinary process
This research presents an integrated architecture using Python & FOSS4G
Future work
Automated processing with Python scripts
More work is needed on QGIS and PySAL for more advanced analysis and visualization