1 Unstructured Data (UD) What is unstructured data? How is it statistically valuable? Challenges of turning UD into information.

Slides:



Advertisements
Similar presentations
Role of NSOs in Analysis John Cornish. Analysis underpins effective NSO operations Analysis is broad in extent, and it supports all phases of the production.
Advertisements

C6 Databases.
Frank Yu Australian Bureau of Statistics Unstructured Data 1.
United Nations Economic Commission for Europe Statistical Division The Data Deluge: What Does It Mean for Official Statistics? Steven Vale UNECE
Open Government Vlora Ademi, Business Development Manager-Edu, Microsoft Macedonia &Kosovo
Big Data Workflows N AME : A SHOK P ADMARAJU C OURSE : T OPICS ON S OFTWARE E NGINEERING I NSTRUCTOR : D R. S ERGIU D ASCALU.
Managing Data Resources
Databases Chapter Distinguish between the physical and logical view of data Describe how data is organized: characters, fields, records, tables,
Data and Knowledge Management
ONS Big Data Project. Plan for today Introduce the ONS Big Data Project Provide a overview of our work to date Provide information about our future plans.
Managing Data Resources. File Organization Terms and Concepts Bit: Smallest unit of data; binary digit (0,1) Byte: Group of bits that represents a single.
Chapter 4: Managing Information Resources with Databases Copyright © 2013 Pearson Education, Inc. publishing as Prentice Hall Chapter
INTRODUCTION In this presentation definition of ICT and its impacts-economic and social are discussed briefly. The topic of “ICT in education” is dealt.
Chapter 3 Foundations of Business Intelligence: Databases and Information Management.
What is Big Data? “… a blanket term for any collection of data sets so large and complex that it becomes difficult to process using on-hand database management.
CHAPTER 5 Data and Knowledge Management. CHAPTER OUTLINE 5.1 Managing Data 5.2 The Database Approach 5.3 Database Management Systems 5.4 Data Warehouses.
INTRANETS DEFINITION (from Cambridge International Dictionary of English) intra- Combining form used to form adjectives meaning 'within' (the stated place.
Software Development Unit 2 Databases What is a database? A collection of data organised in a manner that allows access, retrieval and use of that data.
This presentation was scheduled to be delivered by Brian Mitchell, Lead Architect, Microsoft Big Data COE Follow him Contact him.
Page 1 © Hortonworks Inc – All Rights Reserved Hortonworks Naser Ali UK Building Energy Management Group Hadoop: A Data platform for businesses.
Session 1: Understanding the Value of Official statistics: Introduction Eurostat CES seminar, 9 th of April, 2014 Mariana Kotzeva, Adviser Hors Classe.
Processing and Analyzing Large log from Search Engine Meng Dou 13/9/2012.
Big Data. What is Big Data? Big Data Analytics: 11 Case Histories and Success Stories
Eszter Horvath United Nations Statistics Division Qatar National Statistics Day Doha, Qatar, 10 December 2013 Modernization of Official Statistics (Session.
CHAPTER 5 Data and Knowledge Management. CHAPTER OUTLINE 5.1 Managing Data 5.2 Big Data 5.3 The Database Approach 5.4 Database Management Systems 5.5.
Chapter 6: Foundations of Business Intelligence - Databases and Information Management Dr. Andrew P. Ciganek, Ph.D.
OBSERVATIONAL METHODS © 2012 The McGraw-Hill Companies, Inc.
Chapter 11: Qualitative and Mixed-Method Research Design
Case 2: Emerson and Sanofi Data stewards seek data conformity
Data and Knowledge Management
Lecturer: Gareth Jones. How does a relational database organise data? What are the principles of a database management system? What are the principal.
Triangulation between commercial and administrative sources Workshop of the DUG Conference on 8 th October 2009.
Secure Sensor Data/Information Management and Mining Bhavani Thuraisingham The University of Texas at Dallas October 2005.
C6 Databases. 2 Traditional file environment Data Redundancy and Inconsistency: –Data redundancy: The presence of duplicate data in multiple data files.
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
By N.Gopinath AP/CSE. There are 5 categories of Decision support tools, They are; 1. Reporting 2. Managed Query 3. Executive Information Systems 4. OLAP.
+ Big Data IST210 Class Lecture. + Big Data Summary by EMC Corporation ( More videos that.
Data and Knowledge Management CHAPTER Managing Data 5.2 The Database Approach 5.3 Database Management Systems 5.4 Data Warehouses and Data Marts.
1 Melanie Alexander. Agenda Define Big Data Trends Business Value Challenges What to consider Supplier Negotiation Contract Negotiation Summary 2.
OBSERVATIONAL METHODS © 2009 The McGraw-Hill Companies, Inc.
Modernization of official statistics Eric Hermouet Statistics Division, ESCAP
CHAPTER 5 Data and Knowledge Management. CHAPTER OUTLINE 5.1 Managing Data 5.2 The Database Approach 5.3 Database Management Systems 5.4 Data Warehouses.
IT Enablement Approaches Large Business may have hundreds of processes to be enabled by IT. Several Types of Application may be deployed –Departmental.
CISC 849 : Applications in Fintech Namami Shukla Dept of Computer & Information Sciences University of Delaware iCARE : A Framework for Big Data Based.
Copyright © 2002 Pearson Education, Inc. Slide 3-1 Internet II A consortium of more than 180 universities, government agencies, and private businesses.
© 2012-Robert G Parker May 24, 2012 Page: 1 © 2012-Robert G Parker May 24, 2012 Page: 1 © 2012-Robert G Parker May 24, 2012 Page: 1 © 2012-Robert G Parker.
IoT Meets Big Data Standardization Considerations
6-1 Copyright © 2013 Pearson Canada Inc. Databases and Information Management CHAPTER SIX.
Census quality evaluation: Considerations from an international perspective Bernard Baffour and Paolo Valente UNECE Statistical Division Joint UNECE/Eurostat.
Axis AI Solves Challenges of Complex Data Extraction and Document Classification through Advanced Natural Language Processing and Machine Learning MICROSOFT.
PREPARED BY: PN. SITI HADIJAH BINTI NORSANI. LEARNING OUTCOMES: Upon completion of this course, students should be able to: 1. Understand the structure.
4° ESSnet workshop on the EuroGroups Register Development of an enhanced EGR Vision EGR version 2.0.
Using administrative data to produce official social statistics New Zealand’s experience.
Foundations of Business Intelligence: Databases and Information Management MGMT172: Lecture 04.
What is Big Data? Dr Alasdair Rutherford University of Stirling.
Group members: Phạm Hoàng Long Nguyễn Huy Hùng Lê Minh Hiếu Phan Thị Thanh Thảo Nguyễn Đức Trí 1 BIG DATA & NoSQL Topic 1:
Foundations of Business Intelligence: Databases and Information Management Chapter 6 VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors.
IS6146 Databases for Management Information Systems Lecture 12: Exam Revision Rob Gleasure robgleasure.com.
New data sources (such as Big Data) and Traditional Sources Work Package 2.
Managing Data Resources File Organization and databases for business information systems.
New NSW Geography syllabus 7-10
Big-Data Fundamentals
MANAGING DATA RESOURCES
One Language. One Enterprise.™
Business Intelligence
The ultimate in data organization
Dark Data Are we at risk?.
Big DATA.
Data Analysis and R : Technology & Opportunity
Presentation transcript:

1 Unstructured Data (UD) What is unstructured data? How is it statistically valuable? Challenges of turning UD into information

A way to describe data that is not contained in a database or some other type of data structure. Unstructured data can be textual or non-textual. These databases are sometimes called “NoSQL” 2 Unstructured data is:

Features of “unstructured” data  Does not reside in traditional databases  Does not fit a relational data model  Generated by both humans and machines  Facebook, Linked-in etc...  Machine-to-machine communication (IP address routing)  Examples include  Personal messaging – , instant messages, tweets, chat  Business documents – business reports, presentations, survey responses  Web content – web pages, blogs, wikis, audio files, photos, videos  Sensor output – satellite imagery, geolocation data, scanner transactions (transportation arrival and departures) 3

The value of unstructured data sources  Provide a rich source of information about people, households and economies  May enable the more accurate and timely measurement of a range of demographic, social, economic and environmental phenomena  Combined with traditional data sources  As a replacement for traditional data sources  So presents unprecedented opportunities for official statistics to  Improve delivery of current statistical outputs  Create new information products not possible with traditional data sources  ABS believes that the benefit should be demonstrated on a case- by-case basis – the improvement of end-to-end statistical outcomes in terms of objective criteria such as accuracy, relevance, consistency, interpretability, timeliness, and cost 4

Content analysis  Unstructured data must be analysed to extract and expose the information it contains  Different types of analysis are possible, such as:  Entity analysis – people, organisations, objects and events, and the relationships between them  Topic analysis – topics or themes, and their relative importance  Sentiment analysis – subjective view of a person to a particular topic  Feature analysis – inherent characteristics that are significant for a particular analytical perspective (e.g. land coverage in satellite imagery)  Many others 5

Scale: 40 Zettabyte [ZB] = Gigabyte [GB] = 6 1 ZB = bytes = 1024 Exabytes About 85% is unstructured data

Big Data  Data sets of such size, complexity and volatility that their business value cannot be fully realised with existing data capture, storage, processing, analysis and management capabilities 7 The systematic use of unstructured data is the ‘Big Data’ challenge!

Some other significant challenges  Validity of statistical inference  Sample biases  Model biases  Privacy and public trust  Disclosure threat due to mosaic effect  Data integrity  Missing, inconsistent and inaccurate data  Volatile sources  Data ownership and access  Public good versus commercial advantage  Value of private sector data  8

What are some ways to manage unstructured data? 9

“NoSQL” databases NoSQL databases are storage databases which do not use the SQL language. However they have there own ways of structuring this data. Some of them are: 10

MONGO DB Hadoop Cassandra CouchDB Hypertable 11