Frank Yu Australian Bureau of Statistics Unstructured Data 1.

Presentation transcript:

Unstructured Data
Frank Yu, Australian Bureau of Statistics

What I will cover in this talk
- What unstructured data is and its value for official statistics
- The challenges that it presents for statistical production
- What the ABS is doing to progress capability in this area

Features of "unstructured" data
- Does not reside in traditional databases and data warehouses
- May have an internal structure, but does not fit a relational data model
- Generated by both humans and machines
  - Textual and multimedia content
  - Machine-to-machine communication
- Examples include
  - Personal messaging: email, instant messages, tweets, chat
  - Business documents: business reports, presentations, survey responses
  - Web content: web pages, blogs, wikis, audio files, photos, videos
  - Sensor output: satellite imagery, geolocation data, scanner transactions

The value of unstructured data sources
- Provide a rich source of information about people, households and economies
- May enable the more accurate and timely measurement of a range of demographic, social, economic and environmental phenomena
  - Combined with traditional data sources
  - As a replacement for traditional data sources
- So presents unprecedented opportunities for official statistics to
  - Improve delivery of current statistical outputs
  - Create new information products not possible with traditional data sources
- ABS believes that the benefit should be demonstrated on a case-by-case basis: the improvement of end-to-end statistical outcomes in terms of objective criteria such as accuracy, relevance, consistency, interpretability, timeliness, and cost

Content analysis
- For unstructured data to be useful it must be analysed to extract and expose the information it contains
- Different types of analysis are possible, such as:
  - Entity analysis: people, organisations, objects and events, and the relationships between them
  - Topic analysis: topics or themes, and their relative importance
  - Sentiment analysis: subjective view of a person on a particular topic
  - Feature analysis: inherent characteristics that are significant for a particular analytical perspective (e.g. land coverage in satellite imagery)
  - Many others
- Techniques and tools already exist or are being developed …
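To make two of the analysis types above concrete, here is a minimal sketch of topic and sentiment analysis on a single free-text survey response. It is an illustration only and not an ABS method: the stop-word list, the sentiment lexicon, the function names and the sample response are all hypothetical, and real tools use far richer language models.

```python
import re
from collections import Counter

# Hypothetical stop-word list and sentiment lexicon, for illustration only.
STOP_WORDS = {"the", "a", "is", "and", "of", "to", "in", "was", "but", "very"}
POSITIVE = {"good", "great", "helpful"}
NEGATIVE = {"bad", "slow", "confusing"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def topic_terms(text, top_n=3):
    """Crude topic analysis: the most frequent words that are not stop words."""
    counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def sentiment_score(text):
    """Crude sentiment analysis: positive word count minus negative word count."""
    words = tokenize(text)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Hypothetical survey response.
response = "The survey form was confusing and slow, but the survey helpline was great."
print(topic_terms(response, top_n=1))  # ['survey']
print(sentiment_score(response))       # -1
```

Word counting of this kind only shows the shape of the task: unstructured text goes in and a small structured record (topic terms, a score) comes out.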

But the scale is mind-boggling
- 1 ZB = 2^70 bytes = 1024 exabytes
- About 85% is unstructured data
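The unit arithmetic on this slide can be checked with a short sketch. It assumes the binary convention implied by "1 ZB = 1024 Exabytes" (so one exabyte is taken as 2^60 bytes and one zettabyte as 2^70 bytes) and uses the slide's figure of about 85% unstructured data. The constant and function names are hypothetical.

```python
# Binary convention, assumed from "1 ZB = 1024 Exabytes" on the slide.
BYTES_PER_EXABYTE = 2 ** 60
BYTES_PER_ZETTABYTE = 2 ** 70

# Share of data that is unstructured, as quoted on the slide.
UNSTRUCTURED_SHARE = 0.85


def unstructured_exabytes(total_zettabytes, share=UNSTRUCTURED_SHARE):
    """Exabytes of unstructured data within a total given in zettabytes."""
    exabytes_per_zettabyte = BYTES_PER_ZETTABYTE // BYTES_PER_EXABYTE
    return total_zettabytes * exabytes_per_zettabyte * share


print(BYTES_PER_ZETTABYTE)                       # 1180591620717411303424
print(BYTES_PER_ZETTABYTE // BYTES_PER_EXABYTE)  # 1024
print(unstructured_exabytes(1))                  # roughly 870 exabytes per zettabyte
```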

Big Data
- Data sets of such size, complexity and volatility that their business value cannot be fully realised with existing data capture, storage, processing, analysis and management capabilities
- The systematic use of unstructured data is a Big Data challenge!

Some other significant challenges
- Validity of statistical inference
  - Sample biases
  - Model biases
- Privacy and public trust
  - Disclosure threat due to mosaic effect
- Data integrity
  - Missing, inconsistent and inaccurate data
  - Volatile sources
- Data ownership and access
  - Public good versus commercial advantage
  - Value of private sector data
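The sample bias challenge can be illustrated with a small worked example. All numbers below are invented for illustration and do not come from the presentation: a data source over-represents one group, so a naive estimate taken from the source differs from the true population value.

```python
# Hypothetical shares of two groups in the population and in a data source
# (for example, users of a social media platform). Invented numbers.
population_share = {"young": 0.3, "old": 0.7}
source_share = {"young": 0.8, "old": 0.2}

# Hypothetical true rate of some characteristic within each group.
group_rate = {"young": 0.6, "old": 0.2}


def overall_rate(shares, rates):
    """Overall rate as the share-weighted average of the group rates."""
    return sum(shares[group] * rates[group] for group in shares)


true_rate = overall_rate(population_share, group_rate)  # 0.3*0.6 + 0.7*0.2
naive_rate = overall_rate(source_share, group_rate)     # 0.8*0.6 + 0.2*0.2

print(round(true_rate, 2))               # 0.32
print(round(naive_rate, 2))              # 0.52
print(round(naive_rate - true_rate, 2))  # 0.2
```

Reweighting the source to the known population shares would recover the true rate in this toy case, but only because the group shares are known and the within-group rates are assumed to be representative. Neither assumption can be taken for granted with real unstructured sources.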

ABS work in this area
- Established a research program led by Methodology Division to build a sound foundation for the mainstream use of Big Data, particularly unstructured data, in statistical production and analysis
- Investigating techniques and technology solutions for future enterprise systems, such as open-source, NSI-source, and commercial software products
- Particular areas of interest
  - Machine learning
  - Multidimensional data visualisation
  - Semantic Web methods
  - Distributed computing

Some key initiatives for
- Satellite imagery for agricultural statistics: use of satellite sensor data for the production of agricultural statistics such as land use, crop type and crop yield
- Mobile device location data for population mobility: use of mobile device location-based services and/or global positioning for measuring population mobility
- Visualisation for exploratory data analysis: advanced visualisation techniques for the exploratory analysis of structured and unstructured data sets
- Automated entity analysis of unstructured data: techniques for the extraction and resolution of concepts, entities and facts from text data

Questions?