African Centre for Statistics

Presentation transcript:

African Centre for Statistics
New data sources, including big data, in official statistics – Trend and work being done at global level
Sub-regional workshop on integration of administrative data, big data and geospatial information for the compilation of SDG indicators
23 - 25 April 2018
Molla Hunegnaw, African Centre for Statistics

Contents
  Data Revolution
  Big Data vs big data
  Official data sources
  New data sources
  Potential data sources
  Overview of Big Data projects
  Issues to consider as a way forward

A Definition of Data Revolution
An explosion in the volume of data, the speed with which data are produced, the number of producers of data, the dissemination of data, and the range of things on which there is data, coming from new technologies such as mobile phones and the “internet of things”, and from other sources, such as qualitative data, citizen-generated data and perceptions data... [SG’s Data Revolution Group in “A World that Counts”]

[Figure: drawing representing the Internet of Things (IoT)]

Notes: The Internet of Things (IoT) is the network of physical devices, vehicles, home appliances and other items embedded with electronics, software, sensors, actuators, and connectivity which enables these objects to connect and exchange data. Each thing is uniquely identifiable through its embedded computing system but is able to inter-operate within the existing Internet infrastructure.

Concern for Africa
  With this definition, we are going to be left behind by the data revolution.
  These things are not necessarily true in Africa.

What constitutes a Data Revolution
  Data deluge
  Open data and data access
  Data privacy / data protection
  Democratization of data
  Data analytics
  Data literacy
  Improving data production process
  Use of other data sources such as Big data
  Innovation
  …..

Notes:
Small data is data that is 'small' enough for human comprehension. It is data in a volume and format that makes it accessible, informative and actionable. The term "big data" is about machines and "small data" is about people.
The information explosion is the rapid increase in the amount of published information or data and the effects of this abundance.
Information privacy, or data privacy (or data protection), is the relationship between the collection and dissemination of data, technology, the public expectation of privacy, and the legal and political issues surrounding them.
Data democratization is the ability for information in a digital format to be accessible to the average end user. The goal of data democratization is to allow non-specialists to gather and analyze data without requiring outside help.
Data literacy is the ability to derive meaningful information from data, just as literacy in general is the ability to derive information from the written word. The complexity of data analysis, especially in the context of big data, means that data literacy requires some knowledge of mathematics and statistics.

Existing sources for official Statistics
There are two sources of data for statistics. Primary, or "statistical", sources are data that are collected primarily for creating official statistics, and include statistical surveys and censuses. Secondary, or "non-statistical", sources are data that have been primarily collected for some other purpose (administrative data, private sector data, etc.).

“Statistical” sources
  Sample survey
    Systematic use of statistical methodology
    Direct control over data collection
    High cost, quality issues (non-response, survey errors)
    Response burden
  Census
    Allows results for small geographic areas, population sub-groups
    High cost
“Non-statistical” sources
  Administrative data
    Data for specific purposes, containing information on a complete group of units, updated continuously
    Tax data; credit card data; social insurance data; births, deaths, etc.
    Quality issues

Big data sources
  Social Networks (human-sourced information)
    Loosely structured
    Previously recorded in books and other forms
    Almost entirely digitized
    Takes different formats
  Traditional Business systems (process-mediated data)
    Usually structured
    Form of administrative registers
    Becoming digital
  Internet of Things (machine-generated data)
    Growth in the number of sensors and machines to measure and record events
    Continually generated and real-time
Source: UNECE

Notes:
1. Social Networks (human-sourced information): this information is the record of human experiences, previously recorded in books and works of art, and later in photographs, audio and video. Human-sourced information is now almost entirely digitized and stored everywhere from personal computers to social networks. Data are loosely structured and often ungoverned.
   1100. Social Networks: Facebook, Twitter, Tumblr etc.
   1200. Blogs and comments
   1300. Personal documents
   1400. Pictures: Instagram, Flickr, Picasa etc.
   1500. Videos: Youtube etc.
   1600. Internet searches
   1700. Mobile data content: text messages
   1800. User-generated maps
   1900. E-Mail

2. Traditional Business systems (process-mediated data): these processes record and monitor business events of interest, such as registering a customer, manufacturing a product, taking an order, etc. The process-mediated data thus collected is highly structured and includes transactions, reference tables and relationships, as well as the metadata that sets its context. Traditional business data is the vast majority of what IT managed and processed, in both operational and BI systems. It is usually structured and stored in relational database systems. (Some sources belonging to this class may fall into the category of "Administrative data".)
   21. Data produced by Public Agencies
       2110. Medical records
   22. Data produced by businesses
       2210. Commercial transactions
       2220. Banking/stock records
       2230. E-commerce
       2240. Credit cards

3. Internet of Things (machine-generated data): derived from the phenomenal growth in the number of sensors and machines used to measure and record the events and situations in the physical world. The output of these sensors is machine-generated data, and from simple sensor records to complex computer logs, it is well structured. As sensors proliferate and data volumes grow, it is becoming an increasingly important component of the information stored and processed by many businesses. Its well-structured nature is suitable for computer processing, but its size and speed are beyond traditional approaches.
   31. Data from sensors
       311. Fixed sensors
          3111. Home automation
          3112. Weather/pollution sensors
          3113. Traffic sensors/webcam
          3114. Scientific sensors
          3115. Security/surveillance videos/images
       312. Mobile sensors (tracking)
          3121. Mobile phone location
          3122. Cars
          3123. Satellite images
   32. Data from computer systems
       3210. Logs
       3220. Web logs
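The numbered classification in the notes above can be treated as a small lookup structure. The sketch below is illustrative only: it copies a subset of the codes and labels from the notes, and it assumes (from the way the codes are laid out, not from any statement by UNECE) that a code's parents can be found by taking its leading digits. The names CLASSIFICATION and ancestry are hypothetical.

```python
from typing import List

# Subset of the codes quoted in the notes; labels are copied from the text.
CLASSIFICATION = {
    "1": "Social Networks (human-sourced information)",
    "1100": "Social Networks: Facebook, Twitter, Tumblr etc.",
    "1600": "Internet searches",
    "2": "Traditional Business systems (process-mediated data)",
    "21": "Data produced by Public Agencies",
    "2110": "Medical records",
    "22": "Data produced by businesses",
    "2240": "Credit cards",
    "3": "Internet of Things (machine-generated data)",
    "31": "Data from sensors",
    "311": "Fixed sensors",
    "3113": "Traffic sensors/webcam",
    "312": "Mobile sensors (tracking)",
    "3121": "Mobile phone location",
    "32": "Data from computer systems",
    "3220": "Web logs",
}


def ancestry(code: str) -> List[str]:
    """Return the labels from the top-level class down to `code`.

    Assumption: parents are identified by code prefixes ("3" -> "31" -> "312").
    Prefixes that are not themselves listed (e.g. "11" for "1100") are skipped.
    """
    if code not in CLASSIFICATION:
        raise KeyError(f"unknown classification code: {code}")
    prefixes = (code[:i] for i in range(1, len(code) + 1))
    return [CLASSIFICATION[p] for p in prefixes if p in CLASSIFICATION]


if __name__ == "__main__":
    # Print the path for mobile phone location data, one level per line.
    for depth, label in enumerate(ancestry("3121")):
        print("  " * depth + label)
```

A lookup like this makes it easy to tag a candidate data source with its top-level class (human-sourced, process-mediated or machine-generated) when building an inventory.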

What is Big Data?
“Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.” Gartner, https://www.gartner.com/it-glossary/big-data
“Big data is data sets that are so voluminous and complex that traditional data-processing application software are inadequate to deal with them.” Wikipedia
Amazon, SAS, Oracle, IBM, Microsoft

Big Data characteristics
  Volume: The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.
  Variety: The type and nature of the data. This helps people who analyze it to effectively use the resulting insight. Big data draws from text, images, audio and video.
  Velocity: In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available in real time.
  Variability: Inconsistency of the data set can hamper processes to handle and manage it.
  Veracity: The data quality of captured data can vary greatly, affecting the accuracy of analysis.

Potential data sources for official Statistics
  Credit card data
  Health records
  Mobile phone data
  Open street maps
  Public transport
  Road sensors
  Satellite images
  Ship identification data
  Smart meter electricity
  Social media data
  Web scraping

Notes:
Please note that the project should either use Big Data sources and/or utilize Big Data techniques, and ideally have some relevance or implications for official statistics, SDG indicators or other statistics needed for decision-making on public policies. The Global Working Group will review submissions and include those projects that meet these criteria, or possibly contact you for further information. Please note that the information submitted below, once approved, will be made public on the GWG Big Data Project Inventory website.

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.

Scraping a web page involves fetching it and extracting from it. Fetching is the downloading of a page (which a browser does when you view the page). Therefore, web crawling is a main component of web scraping, to fetch pages for later processing. Once a page is fetched, extraction can take place. The content of a page may be parsed, searched, reformatted, its data copied into a spreadsheet, and so on. Web scrapers typically take something out of a page, to make use of it for another purpose somewhere else. An example would be to find and copy names and phone numbers, or companies and their URLs, to a list (contact scraping).
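The fetch-then-extract pattern described in the notes can be sketched with the Python standard library. This is a minimal illustration, not a production scraper: the HTML page, the class names "name" and "phone", and the contact details are all made up for the example, and the extraction runs on an in-memory string so that no network access is needed. In practice the fetching step could be done with urllib.request.urlopen, subject to the site's terms of use.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Hypothetical page content standing in for a fetched page. A real fetch
# could look like: urllib.request.urlopen(url).read().decode("utf-8")
SAMPLE_PAGE = """
<html><body>
  <ul>
    <li><span class="name">Ada Example</span> <span class="phone">+000 555 0101</span></li>
    <li><span class="name">Ben Sample</span> <span class="phone">+000 555 0102</span></li>
  </ul>
</body></html>
"""


class ContactExtractor(HTMLParser):
    """Collect (name, phone) pairs from <span class="name"> / <span class="phone">."""

    def __init__(self) -> None:
        super().__init__()
        self.contacts: List[Tuple[str, str]] = []
        self._field = None   # which field we are currently inside, if any
        self._current = {}   # fields gathered for the contact in progress

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "phone"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span" and self._field:
            self._field = None
            # Emit the contact once both fields have been seen.
            if "name" in self._current and "phone" in self._current:
                self.contacts.append(
                    (self._current["name"], self._current["phone"])
                )
                self._current = {}


def extract_contacts(html: str) -> List[Tuple[str, str]]:
    """Extraction step: parse the page and return the contact list."""
    parser = ContactExtractor()
    parser.feed(html)
    parser.close()
    return parser.contacts


if __name__ == "__main__":
    for name, phone in extract_contacts(SAMPLE_PAGE):
        print(name, phone)
```

Keeping fetching and extraction separate means the extraction logic can be tested on saved pages, which is useful when a statistical office needs to rerun a collection after a website changes its layout.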

Overview of Big Data projects for official statistics
  United Nations Global Pulse
  World Bank Group
  Universities
  National Statistics Offices
  Government departments and ministries
Source: UN Big Data for Official Statistics Global Working Group

Overview of Big Data projects
Statistical domains
  Agriculture
  Business
  Crime
  Culture
  Demographic
  Economic and financial
  Energy
  Environment
  Labour
  Price
  Tourism
  Transport
  ……

Overview of Big Data projects

Domain | Big Data source | Project
Agriculture | Satellite image | Estimate crop type, planting area and output (satellite)
Business | Web scraping | ICT usage survey
Demographic | Mobile phones and satellite imagery, credit card | Human mobility dataset, household budget survey
Economic and financial | Smart meters, scanner, web scraping, online cash register | Final consumption expenditure (replacing surveys of individual households)
Energy | Smart meters | Non-occupancy rates, household consumption
Environment | Sensors | Real-time environment information system
Labour | Web scraping, online records | Google Trends for forecasting unemployment, labour market statistics
Price | Scanner, web scraping, smart meter | Consumer price statistics
Tourism | Mobile data, online sources | Inbound and outbound flows
Transport | Satellite imagery, sensor, mobile phone | Population mobility, urban statistics, accessibility, remoteness

Notes: Agriculture: update the sampling frame by using data from the land use survey and agriculture census.

Overview of Big Data projects
Sweden 3; Brazil 2; Cameroon; Czech Republic; Ecuador; Finland; Guatemala; Netherlands; Singapore; Spain; Switzerland; Uganda; Albania 1; Australia; Bangladesh; Belarus; Global 40; United States 21; Colombia 7; Europe; China 6; Belgium 5; Italy; Poland; Hungary 4; Mexico; Netherlands; Uganda; Canada 3; India; Indonesia; Ireland; Chile 1; Congo - Democratic Republic of, Cote d'Ivoire, Ghana, Uganda, Zambia; Denmark; Germany; Ghana; Kenya; Mongolia; Mozambique; Norway; Pakistan; Philippines; Romania; South Africa; Sri Lanka; Tunisia; Turkey; United Republic of Tanzania; Vietnam, Indonesia
Total: 184 projects

Overview of Big Data projects
Cameroon 2; Congo - Democratic Republic of, Cote d'Ivoire, Ghana, Uganda, Zambia 1; Ghana; Kenya; Mozambique; South Africa; Tunisia; Uganda 5; United Republic of Tanzania

The way forward
  Assist in the statistical process
  Deriving statistics
  New insights
  Exploration / feasibility stage
  Data quality
  Big Data capturing
  One size does not fit all
  Use your intuition

Internet penetration: Africa 35.2%; world average 54.4% (https://www.internetworldstats.com/stats1.htm)
Facebook: close to 14% Facebook users