1
African Centre for Statistics
New data sources, including big data, in official statistics – Trend and work being done at global level
Sub-regional workshop on integration of administrative data, big data and geospatial information for the compilation of SDG indicators
April 2018
Molla Hunegnaw
2
Contents
Data Revolution
Big Data vs big data
Official data sources
New data sources
Potential data sources
Overview of Big Data projects
Issues to consider as a way forward
3
A Definition of Data Revolution
An explosion in the volume of data, the speed with which data are produced, the number of producers of data, the dissemination of data, and the range of things on which there is data, coming from new technologies such as mobile phones and the “internet of things”, and from other sources, such as qualitative data, citizen-generated data and perceptions data...
[SG’s Data Revolution Group in “A World that Counts”]

The Internet of Things (IoT) is the network of physical devices, vehicles, home appliances and other items embedded with electronics, software, sensors, actuators, and connectivity, which enables these objects to connect and exchange data. Each thing is uniquely identifiable through its embedded computing system but is able to inter-operate within the existing Internet infrastructure.
4
Concern for Africa
With this definition, we are going to be left behind by the data revolution.
These things are not necessarily true in Africa.
5
What constitutes a Data Revolution
Data deluge
Open data and data access
Data privacy / data protection
Democratization of data
Data analytics
Data literacy
Improving data production process
Use of other data sources such as Big data
Innovation
…..

Small data is data that is 'small' enough for human comprehension. It is data in a volume and format that makes it accessible, informative and actionable. The term "big data" is about machines and "small data" is about people.

The information explosion is the rapid increase in the amount of published information or data and the effects of this abundance.

Information privacy, or data privacy (or data protection), is the relationship between the collection and dissemination of data, technology, the public expectation of privacy, and the legal and political issues surrounding them.

Data democratization is the ability for information in a digital format to be accessible to the average end user. The goal of data democratization is to allow non-specialists to be able to gather and analyze data without requiring outside help.

Data literacy is the ability to derive meaningful information from data, just as literacy in general is the ability to derive information from the written word. The complexity of data analysis, especially in the context of big data, means that data literacy requires some knowledge of mathematics and statistics.
6
Existing sources for official Statistics
There are two sources of data for statistics. Primary, or "statistical", sources are data that are collected primarily for creating official statistics, and include statistical surveys and censuses. Secondary, or "non-statistical", sources are data that have been primarily collected for some other purpose (administrative data, private sector data, etc.).

“Statistical” sources
Sample survey: systematic use of statistical methodology; direct control over data collection; high cost; quality issues (non-response, survey errors); response burden
Census: allows results for small geographic areas and population sub-groups; high cost

“Non-statistical” sources
Administrative data: data for specific purposes, containing information on a complete group of units, updated continuously; for example tax data, credit card data, social insurance data, births, deaths, etc.; quality issues
7
Big data sources
Social Networks (human-sourced information): loosely structured; previously recorded in books and other forms; almost entirely digitized; takes different formats
Traditional Business systems (process-mediated data): usually structured; form of administrative registers; becoming digital
Internet of Things (machine-generated data): growth in the number of sensors and machines to measure and record events; continually generated and real-time
Source: UNECE

1. Social Networks (human-sourced information): this information is the record of human experiences, previously recorded in books and works of art, and later in photographs, audio and video. Human-sourced information is now almost entirely digitized and stored everywhere from personal computers to social networks. Data are loosely structured and often ungoverned.
Social networks: Facebook, Twitter, Tumblr etc.
Blogs and comments
Personal documents
Pictures: Instagram, Flickr, Picasa etc.
Videos: YouTube etc.
Internet searches
Mobile data content: text messages
User-generated maps

2. Traditional Business systems (process-mediated data): these processes record and monitor business events of interest, such as registering a customer, manufacturing a product, taking an order, etc. The process-mediated data thus collected is highly structured and includes transactions, reference tables and relationships, as well as the metadata that sets its context. Traditional business data is the vast majority of what IT managed and processed, in both operational and BI systems. It is usually structured and stored in relational database systems. (Some sources belonging to this class may fall into the category of "Administrative data".)
21. Data produced by Public Agencies: medical records
22. Data produced by businesses: commercial transactions; banking/stock records; e-commerce; credit cards

3. Internet of Things (machine-generated data): derived from the phenomenal growth in the number of sensors and machines used to measure and record the events and situations in the physical world. The output of these sensors is machine-generated data, and from simple sensor records to complex computer logs, it is well structured. As sensors proliferate and data volumes grow, it is becoming an increasingly important component of the information stored and processed by many businesses. Its well-structured nature is suitable for computer processing, but its size and speed are beyond traditional approaches.
31. Data from sensors
311. Fixed sensors: home automation; weather/pollution sensors; traffic sensors/webcam; scientific sensors; security/surveillance videos/images
312. Mobile sensors (tracking): mobile phone location; cars; satellite images
32. Data from computer systems: logs; web logs
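The classification above is hierarchical: each code extends the code of its parent class (3, then 31, then 312). As a minimal sketch, assuming one simply wants to tag an incoming data source with its class, the codes and labels from this slide can be held in a small lookup table. The names `CATEGORIES`, `EXAMPLES`, `path` and `find` are illustrative and not part of any UNECE tool.

```python
# Codes and labels as listed on the slide; the dictionary layout is an assumption.
CATEGORIES = {
    "1": "Social Networks (human-sourced information)",
    "2": "Traditional Business systems (process-mediated data)",
    "21": "Data produced by Public Agencies",
    "22": "Data produced by businesses",
    "3": "Internet of Things (machine-generated data)",
    "31": "Data from sensors",
    "311": "Fixed sensors",
    "312": "Mobile sensors (tracking)",
    "32": "Data from computer systems",
}

# Example sources listed under each class on the slide (abbreviated for class 1).
EXAMPLES = {
    "1": ["Blogs and comments", "Internet searches", "User-generated maps"],
    "21": ["Medical records"],
    "22": ["Commercial transactions", "Banking/stock records",
           "E-commerce", "Credit cards"],
    "311": ["Home automation", "Weather/pollution sensors",
            "Traffic sensors/webcam", "Scientific sensors"],
    "312": ["Mobile phone location", "Cars", "Satellite images"],
    "32": ["Logs", "Web logs"],
}


def path(code):
    """Return the labels from the top-level class down to `code`."""
    return [CATEGORIES[code[:i]] for i in range(1, len(code) + 1)]


def find(example):
    """Return the code of the class listing `example`, or None if absent."""
    wanted = example.strip().lower()
    for code, names in EXAMPLES.items():
        if wanted in (name.lower() for name in names):
            return code
    return None


code = find("Satellite images")
print(code, " > ".join(path(code)))
```

Because the hierarchy is encoded in the code prefixes, `path` needs no separate parent table; a class with a different numbering scheme would need an explicit parent mapping instead.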
8
What is Big Data?
“Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”
“Big data is data sets that are so voluminous and complex that traditional data-processing application software are inadequate to deal with them.”
Wikipedia, Amazon, SAS, Oracle, IBM, Microsoft
9
Big Data characteristics
Volume: The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.
Variety: The type and nature of the data. This helps people who analyze it to effectively use the resulting insight. Big data draws from text, images, audio and video.
Velocity: The speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available in real time.
Variability: Inconsistency of the data set can hamper processes to handle and manage it.
Veracity: The data quality of captured data can vary greatly, affecting the accuracy of analysis.
10
Potential data sources for official Statistics
Credit card data
Health records
Mobile phone data
Open street maps
Public transport
Road sensors
Satellite images
Ship identification data
Smart meter electricity
Social media data
Web scraping

Please note that the project should either use Big Data sources and/or utilize Big Data techniques, and ideally have some relevance or implications for official statistics, SDG indicators or other statistics needed for decision-making on public policies. The Global Working Group will review submissions and include those projects that meet these criteria, or possibly contact you for further information. Please note that the information submitted below, once approved, will be made public on the GWG Big Data Project Inventory website.

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.

Scraping a web page involves fetching it and extracting data from it. Fetching is the downloading of a page (which a browser does when you view the page). Web crawling is therefore a main component of web scraping, used to fetch pages for later processing. Once a page is fetched, extraction can take place. The content of a page may be parsed, searched, reformatted, its data copied into a spreadsheet, and so on. Web scrapers typically take something out of a page to make use of it for another purpose somewhere else. An example would be to find and copy names and phone numbers, or companies and their URLs, to a list (contact scraping).
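The two steps described above, fetching and extraction, can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not a production scraper: the page layout (one table row per product, name in the first cell, price in the second), the sample HTML and the names `fetch`, `PriceExtractor` and `extract_prices` are all hypothetical. The demo parses an embedded sample page so that it runs without network access.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url):
    """Fetching step: download a page as text (not called in the demo)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class PriceExtractor(HTMLParser):
    """Extraction step: collect (product, price) pairs from table rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None      # cells of the row being read; None outside a row
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._in_cell = True
            self._cells.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            # Header rows use <th>, so they yield no <td> cells and are skipped.
            if len(self._cells) == 2:
                name, price = (cell.strip() for cell in self._cells)
                self.rows.append((name, float(price)))
            self._cells = None


def extract_prices(html):
    parser = PriceExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page content standing in for the result of fetch(url).
SAMPLE_PAGE = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Bread 500 g</td><td>1.20</td></tr>
  <tr><td>Milk 1 l</td><td>0.95</td></tr>
</table>
"""

prices = extract_prices(SAMPLE_PAGE)
print(prices)
```

In practice a scraper for price statistics would also need to respect each site's terms of use and robots.txt, and to cope with layout changes, which this sketch does not attempt.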
11
Overview of Big Data projects for official statistics
United Nations Global Pulse
World Bank Group
Universities
National Statistics Offices
Government departments and ministries
Source: UN Big Data for Official Statistics Global Working Group
12
Overview of Big Data projects
Statistical domains
Agriculture
Business
Crime
Culture
Demographic
Economic and financial
Energy
Environment
Labour
Price
Tourism
Transport
……
13
Overview of Big Data projects
Domain | Big Data source | Project
Agriculture | Satellite image | Estimate crop type, planting area and output
Business | Web scraping | ICT usage survey
Demographic | Mobile phones and satellite imagery, credit card | Human mobility dataset, household budget survey
Economic and financial | Smart meters, scanner, web scraping, online cash register | Final consumption expenditure, replacing surveys of individual households
Energy | Smart meters | Non-occupancy rates, household consumption
Environment | Sensors | Real-time environment information system
Labour | Web scraping, online records | Google Trends for forecasting unemployment, labour market statistics
Price | Scanner, web scraping, smart meter | Consumer price statistics
Tourism | Mobile data, online sources | Inbound and outbound flows
Transport | Satellite imagery, sensor, mobile phone | Population mobility, urban statistics, accessibility, remoteness

Agriculture: update the sampling frame by using data from the land use survey and agriculture census.
14
Overview of Big Data projects
184 projects
Global 40
United States 21
Colombia 7; Europe
China 6
Belgium 5; Italy; Poland
Hungary 4; Mexico; Netherlands; Uganda
Canada 3; India; Indonesia; Ireland
Sweden 3
Brazil 2; Cameroon; Czech Republic; Ecuador; Finland; Guatemala; Netherlands; Singapore; Spain; Switzerland; Uganda
Albania 1; Australia; Bangladesh; Belarus
Chile 1; Congo - Democratic Republic of, Cote d'Ivoire, Ghana, Uganda, Zambia; Denmark; Germany; Ghana; Kenya; Mongolia; Mozambique; Norway; Pakistan; Philippines; Romania; South Africa; Sri Lanka; Tunisia; Turkey; United Republic of Tanzania; Vietnam, Indonesia
15
Overview of Big Data projects
Cameroon 2
Congo - Democratic Republic of, Cote d'Ivoire, Ghana, Uganda, Zambia 1; Ghana; Kenya; Mozambique; South Africa; Tunisia
Uganda 5; United Republic of Tanzania
16
The way forward
Assist in the statistical process
Deriving statistics
New insights
Exploration / feasibility stage
Data quality
Big Data capturing
One size does not fit all
Use your intuition
Internet penetration: Africa 35.2%, world average 54.4%
Facebook users: close to 14%