Ilyoung Hong Namseoul Univ Dep of GIS engineering

Presentation transcript:

Geosocial big data analysis using Python and FOSS4G with the case study of Korean data
Ilyoung Hong, Namseoul Univ., Dept. of GIS Engineering

Geosocial data
Social media (Twitter, Facebook) is the killer app for smartphones.
Smartphones with GPS generate large amounts of geotagged social data.
Social data with a geotag is called geosocial data, for example GeoTweets (geotagged tweets) and Foursquare (4sq) venues.

Geosocial data research
Fujita, Hideyuki. "Geo-tagged Twitter collection and visualization system." Cartography and Geographic Information Science 40.3 (2013): 183-191. => computational method, data collection
Jung, Jin-Kyu. "Code clouds: Qualitative geovisualization of geotweets." The Canadian Geographer/Le Géographe canadien 59.1 (2015): 52-68. => qualitative approach, with content analysis
Li, Linna, Michael F. Goodchild, and Bo Xu. "Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr." Cartography and Geographic Information Science 40.2 (2013): 61-77. => spatial statistical analysis with geodemographic data
Mitchell, Lewis, et al. "The geography of happiness: Connecting twitter sentiment and expression, demographics, and objective characteristics of place." PLoS ONE 8.5 (2013): e64417. => sentiment analysis, computational linguistics approach
Data sizes noted on the slide: 3,476,059 geotweets; 14,858 geotweets; geotweets were 1% of 19,758,954 tweets; 10 million geotagged tweets.

Multidisciplinary aspects of geosocial data analysis
(Diagram labels) Statistics; Linguistics; Text Mining; Sociology; Journalism; Media; Data Management; Qualitative Analysis; GeoSocial Data; Data Collection; Data Visualization; Data Analysis; Quantitative Analysis; Web Programming; Database Management; Geography, Cartography, GIS

Challenges of geosocial research
Different data sources and formats: Twitter, Foursquare, Facebook.
Different analysis environments and software: Java, PHP, Python, C, R, ArcGIS; web programming, database programming, statistics, geovisualization.
Different domain knowledge and multidisciplinary research methods: computation, geography, sociology, psychology, statistics, linguistics, media, journalism.
Interdisciplinary cooperation is needed. Is there a way to integrate these methods?

Why Python/FOSS4G for geosocial big data?
Integrated analysis environment of software and libraries.
Python is free and open, and supports object-oriented programming (OOP).
Distributions: WinPython, Anaconda (SciPy, IPython), Enthought Canopy for Python 2.7.
Large number of libraries supporting different domain knowledge: PyPI, the Python Package Index, currently lists 66086 packages.
Simple coding environment: quick to learn and to code.
Readability: the syntax of Python is readable and clear.

Research purpose
Introduce an integrated platform for analyzing geosocial data using Python & FOSS4G: data collection and management; data analysis with qualitative and quantitative methods; sentiment analysis; geovisualization.
Present a case study with Korean geosocial data: GeoTweet distribution; spatial patterns of Foursquare venues; sentiment analysis of Korean GeoTweets.

Architecture, at the beginning
(Diagram labels) Social media; Twitter/Foursquare API; JSON; Excel, CSV; Shapefile; ArcGIS

Data collection
Python, Twitter Streaming API, tweepy.
Rate limits apply per user: calls to the Twitter API are limited to 350 per hour for one authorized developer account, so collection switches to another user ID when the limit is reached.
Unnecessary data must be filtered out: geotweets are just 1% of all tweets.
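The filtering step above can be sketched as follows. This is a minimal sketch, not the author's collection script: it assumes each tweet arrives as one JSON string per line and that geotagged tweets carry a GeoJSON-style `coordinates` object ordered as [longitude, latitude]. The function names and the bounding-box values are hypothetical and chosen only for illustration.

```python
import json

# Rough bounding box around South Korea (assumed values, for illustration only).
KOREA_BBOX = (124.5, 33.0, 131.0, 38.7)  # (min_lon, min_lat, max_lon, max_lat)


def is_geotweet(tweet, bbox=KOREA_BBOX):
    """Return True if the tweet has point coordinates that fall inside bbox."""
    coords = tweet.get("coordinates")
    if not coords or coords.get("type") != "Point":
        return False
    lon, lat = coords["coordinates"]
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def filter_geotweets(lines):
    """Yield geotagged tweets from an iterable of JSON strings (one tweet per line)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the input
        tweet = json.loads(line)
        if is_geotweet(tweet):
            yield tweet
```

Dropping non-geotagged tweets as early as possible keeps the stored volume close to the 1% that the later spatial analysis actually uses.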

Columns from a tweet
● Date and time => temporal analysis
● Tweet text => qualitative approach, text mining, keyword filter, sentiment analysis
● Tweet ID; user ID; destination user ID (only for tweets with “@user ID”); user profile (including location name input by user) => behavioral features, heavy-user features, social network
● Location coordinates (only for tweets tagged with the location coordinates) => geovisualization, spatial analysis using GIS
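The columns listed above could be pulled out of a decoded tweet roughly like this. It is a sketch under assumptions: the field names follow the Twitter REST/Streaming v1.1 JSON layout (`created_at`, `text`, `id_str`, `user`, `in_reply_to_user_id_str`, `coordinates`), and the helper name `tweet_to_row` and the output keys are hypothetical.

```python
from datetime import datetime

# Timestamp layout used in v1.1 tweet JSON, e.g. "Wed Nov 05 14:03:21 +0000 2014".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_to_row(tweet):
    """Flatten one decoded tweet (a dict) into the columns used for analysis."""
    coords = tweet.get("coordinates") or {}
    lon, lat = coords.get("coordinates", (None, None))
    user = tweet.get("user") or {}
    return {
        "created_at": datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT),
        "text": tweet.get("text", ""),
        "tweet_id": tweet.get("id_str"),
        "user_id": user.get("id_str"),
        "reply_to_user_id": tweet.get("in_reply_to_user_id_str"),
        "user_location": user.get("location"),
        "lon": lon,
        "lat": lat,
    }
```

Keeping longitude and latitude as separate numeric columns makes the later conversion to GIS point data straightforward.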

Two studies completed so far
Spatial Analysis of Location-Based Social Networks in Seoul, Korea. Journal of Geographic Information System, 2015, 7, 259-265.
Spatial Distribution of Korean Geotweets. Journal of the Korean Cartographic Association, 2015, 15(2), 93-101.

Spatial Analysis of Location-Based Social Networks in Seoul, Korea
The purpose of this study is to analyze the spatial patterns of location-based social network (LBSN) data in Seoul using the spatial analysis techniques of geographic information systems (GIS). The study explores the applications of LBSN data by analyzing the association between Seoul’s Foursquare venue data, created through user participation, and the city’s characteristics. The Foursquare venue data were compiled with a program we created based on Foursquare’s Python API. The compiled information was converted into GIS data, which in turn was depicted as a heat map. Cluster analysis was then performed based on hotspots, and the correlation with census variables was analyzed for each administrative unit using geographically weighted regression (GWR). Based on the analytical results, we were able to identify venue clusters around city centers, as well as differences in hotspots for various venue categories and correlations with census variables.

About 230,000 venue records were collected for analysis between March 15 and 21, 2015.

Spatial Distribution of Korean Geotweets
In this study, we analyzed the distribution of Korean geotweets collected in November 2014 through the Twitter Streaming API. Python programs were used to analyze the collected data and convert them to GIS data. Twitter use is concentrated in Seoul and the metropolitan areas, and a few heavy users created a large number of tweets. Time series analysis showed that tweet volume peaks on the weekend and, within the day, at 14:00. In addition, text analysis identified a high percentage of retweets and differences in content between regions.
Key words: Twitter API, geotweet, spatial distribution
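The time series step described in this abstract amounts to counting tweets per weekday and per hour of day. A minimal sketch, assuming timestamps are timezone-aware `datetime` objects and that counts should be reported in Korea Standard Time (UTC+9); the function name `temporal_profile` is hypothetical and this is not the study's actual script.

```python
from collections import Counter
from datetime import timedelta, timezone

# Korea Standard Time is UTC+9 with no daylight saving time.
KST = timezone(timedelta(hours=9))


def temporal_profile(timestamps):
    """Count tweets per weekday name and per local hour (KST)."""
    by_weekday = Counter()
    by_hour = Counter()
    for ts in timestamps:
        local = ts.astimezone(KST)
        by_weekday[local.strftime("%A")] += 1
        by_hour[local.hour] += 1
    return by_weekday, by_hour
```

Calling `by_hour.most_common(1)` on the result gives the peak hour, which is the kind of figure the abstract reports (a peak at 14:00).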

In November 2014, over 2 million tweets were collected.
Figures: distribution of geotweets, Nov 2014; spatial distribution of geotweets; daily distribution of geotweets, Nov 2014.

Text analysis
High percentage of retweets.
Some keywords represent regional features.
Tools: PyTag, Word_cloud
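Both findings rest on simple counts over the tweet text, which can be sketched with the standard library. The assumptions here are that retweets are recognizable by a leading "RT @" in the text and that keyword frequencies (the input a word cloud tool needs) come from a plain regular-expression tokenizer; the function names and the stopword handling are hypothetical, and proper Korean text would need a morphological tokenizer instead.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")


def retweet_share(texts):
    """Fraction of tweets whose text starts with the 'RT @' retweet prefix."""
    if not texts:
        return 0.0
    return sum(1 for t in texts if t.startswith("RT @")) / len(texts)


def keyword_counts(texts, stopwords=frozenset()):
    """Count lower-cased tokens, ignoring URLs, @mentions and stopwords."""
    counts = Counter()
    for text in texts:
        cleaned = URL_RE.sub("", text).lower()
        for token in TOKEN_RE.findall(cleaned):
            if token.startswith("@") or token in stopwords or len(token) < 2:
                continue
            counts[token] += 1
    return counts
```

Running `keyword_counts` separately on the tweets of each region and comparing the top entries is one simple way to surface region-specific keywords.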

Problems
Exploratory statistical analysis involves repeated work, but the process is not automated: it takes time and introduces data errors.
As time goes by, the data become too big to handle and need to be managed in a database, not as text files.
Data and software should be compatible within the same environment for automated analysis.

Python & FOSS4G
Integrated analysis environment.
Large number of libraries supporting different domain knowledge.
Automated scripts can be created for analysis.

(Architecture diagram labels)
Social Media Server: Twitter API, Tweepy
Data Collection; Data Parsing; Data Conversion (pyspatialite)
GIS Data Server: Spatialite
Visualize Client: Geovisualization with Quantum GIS (Shape/Text)
Analysis Client: Sentiment Analysis with Python NLTK; PANDAS for Data Analysis; WordCloud with pytagcloud; Statistical Analysis with PySAL

Analysis process (flowchart labels)
Data: social media data, GIS database.
Decision points: Data type? Geotagged? Analysis method? Visualizing method?
Methods: text mining, word clouds, sentiment analysis, quantitative spatial analysis (hotspot, GWR), statistical analysis.
Visualization: heat map, thematic mapping.

Why a Spatialite database?
- Standalone, file-based database: easy to handle
- Compatible and interoperable: Python, QGIS, ArcGIS; export/import to any format
- Easy to use, with a GUI
pyspatialite
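The file-based workflow can be illustrated with Python's built-in `sqlite3` module. This is only a sketch of the idea: it uses plain SQLite with longitude and latitude stored as numeric columns, whereas the architecture in these slides uses Spatialite through pyspatialite, which adds real geometry columns and spatial functions. The table name, column names and function names are hypothetical.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open (or create) a single-file database with a geotweet table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS geotweet (
               tweet_id   TEXT PRIMARY KEY,
               user_id    TEXT,
               created_at TEXT,
               text       TEXT,
               lon        REAL,
               lat        REAL
           )"""
    )
    return conn


def insert_rows(conn, rows):
    """Insert (tweet_id, user_id, created_at, text, lon, lat) tuples, skipping duplicates."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO geotweet VALUES (?, ?, ?, ?, ?, ?)", rows
        )


def tweets_in_bbox(conn, min_lon, min_lat, max_lon, max_lat):
    """Return (tweet_id, lon, lat) for tweets inside a bounding box."""
    cur = conn.execute(
        "SELECT tweet_id, lon, lat FROM geotweet "
        "WHERE lon BETWEEN ? AND ? AND lat BETWEEN ? AND ? "
        "ORDER BY tweet_id",
        (min_lon, max_lon, min_lat, max_lat),
    )
    return cur.fetchall()
```

Because the whole database is one file, it can be copied between the collection machine and the analysis client without a server.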

Sentiment analysis with Python NLTK
Text classification: sentiment analysis using NLTK.
Tweet text => POS, NEU, NEG values
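To make the "tweet text => POS, NEU, NEG values" mapping concrete, here is a deliberately simple stand-in. It is not NLTK and not the classifier used in this work: it is a toy word-list scorer, and both the word lists and the function name `score_text` are invented for illustration. A trained NLTK classifier would replace the body of this function while keeping the same kind of output.

```python
# Tiny illustrative word lists (assumptions, not a real sentiment lexicon).
POSITIVE = {"happy", "good", "best", "beautiful", "love"}
NEGATIVE = {"sad", "bad", "worst", "hate", "angry"}

# Punctuation stripped from both ends of each whitespace-separated token.
STRIP_CHARS = "#!.,:;()\"'"


def score_text(text):
    """Return the share of positive, neutral and negative tokens in a text."""
    tokens = [w.strip(STRIP_CHARS).lower() for w in text.split()]
    tokens = [w for w in tokens if w]
    if not tokens:
        return {"pos": 0.0, "neu": 1.0, "neg": 0.0}
    pos = sum(w in POSITIVE for w in tokens)
    neg = sum(w in NEGATIVE for w in tokens)
    n = len(tokens)
    return {"pos": pos / n, "neu": (n - pos - neg) / n, "neg": neg / n}
```

The three values sum to 1 for every tweet, so they can be stored as columns next to the coordinates and mapped directly.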

Heat map using Quantum GIS: geotweets, July 2015
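The heat map in this work is produced in Quantum GIS. As a rough illustration of the underlying idea, the sketch below counts points per grid cell, which is simple binning rather than the kernel density surface a GIS heat map tool typically draws. The cell size in degrees and the function name `grid_counts` are assumptions.

```python
import math
from collections import Counter


def grid_counts(points, cell_size=0.01):
    """Count (lon, lat) points per square grid cell of cell_size degrees.

    Returns a Counter keyed by (column, row) cell indices.
    """
    counts = Counter()
    for lon, lat in points:
        col = math.floor(lon / cell_size)
        row = math.floor(lat / cell_size)
        counts[(col, row)] += 1
    return counts
```

Cells with the highest counts correspond to the hot areas on the map; `counts.most_common(10)` lists them.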

Hottest and most positive places: Jongro, HongDae, Youngsan

Word clouds: Jongro, HongDae, Youngsan

Most positive tweets
Happy Pride from Kat! #seoul #gaypride #kqcf2015 #korea #hugagaytoday @ Seoul City Hall Korea https://t.co/81TiNdqCMH
#seoulgayprideparade HAPPY PRIDE DAY KOREA!!!! #rainbow #lgbt #love #happy #seoul #korea @ Seoul Plaza https://t.co/FUCkHxmIsc
Good times and more Korean BBQ with the Samsung team #MobLabs #GangnamStyle @ Gangnam, Seoul, Korea https://t.co/NyIa440NZ3
Happy Sunday :) @ Myeongdong Cathedral https://t.co/TezVZTVtDH
We go by the zoo via the "Elephant Train" to the museum @ Seoul Grand Park Zoo https://t.co/imXCgPrcBG
Korean food is the best food #korea #food #nofiilter @ Seoul ,Korea https://t.co/MqVDHqqoEy
Have a beautiful and fruitful week IG fam! #MondayLook #mamichoux @ Hongdae Seoul https://t.co/lVM5NdLJyp
Happy the 4th of July to all my American friends! (@ Thursday Party in Seoul) https://t.co/CG27beaCQl
And with Elizaveta from Russia :) @ Trickeye Museum https://t.co/7NCrGUYOF1
Quick tour of a Korean apartment @ Hongdae Seoul South Korea https://t.co/yTy8mAVCZk
..

Conclusion and future work
Analysis of geosocial data is a complex, multidisciplinary process.
This research presents an integrated architecture using Python & FOSS4G.
Future work: automated processing with Python scripts.
More work is needed on QGIS and PySAL for more advanced analysis and visualization.