Published by Juliana Carter. Modified over 9 years ago.
1
International Recommendations for Water Statistics (IRWS) – Chapter VII: Metadata and data quality
Expert Group Meeting on the IRWS
United Nations, New York, 4-6 November 2008
2
Location in IRWS
PART I
Chapter 1: Introduction
Chapter 2: Main concepts and the SEEAW
Chapter 3: Economic units
Chapter 4: Data items
PART II
Chapter 5: Data collection strategy
Chapter 6: Data sources and compilation methods
Chapter 7: Metadata and data quality
Chapter 8: Dissemination
Chapter 9: Indicators
ANNEXES
Annex 1: Supplementary data items
Annex 2: Link between data items and the SEEAW
Annex 3: Link between data items and indicators of WWDR and MDG
Annex 4: Link between data items and indicators of FAO
GLOSSARY
3
Outline of Chapter
Section A – Introduction
Section B – Metadata for water statistics
Section C – Data quality dimensions
Section D – Data quality assessment framework
4
Section B – Metadata
Metadata is information used to describe data. A very short definition of metadata, then, is “data about data”. Metadata descriptions go beyond the pure form and contents of data. They are used to describe:
- Administrative facts about data (who created them, and when);
- How data were collected and processed before being disseminated or stored in a database.
5
Metadata frameworks
There are many metadata frameworks. These include, for example:
- Statistical Data and Metadata Exchange (SDMX);
- Dublin Core Metadata Initiative (DCMI), ISO 19115;
- FGDC (Federal Geographic Data Committee);
- Data Documentation Initiative (DDI);
- Resource Description Framework (RDF).
6
SDMX
Element: Description
Contact: Describes contact points for the data or metadata, including how to reach the contact points.
Metadata update: Date on which the metadata element was inserted or modified.
Statistical presentation: Description of the table contents, with their data breakdowns.
Release calendar policy: Describes the policy regarding the release of statistics according to a preannounced schedule (if available).
Institutional framework: Refers to a law or other formal provision that assigns responsibilities and authority to agencies for the collection, processing, and dissemination of the statistics, and includes any data sharing arrangements.
Transparency: Describes the policy on the availability of the terms and conditions under which statistics are collected, compiled, and disseminated; on providing advance notice of major changes in methodology, source data, and statistical techniques; on internal governmental access to statistics prior to their release; and on the identification of statistical products.
Related quality reports: Reference to available quality reports for the data.
7
SDMX (cont.)
Element: Description
Comparability and coherence: The extent to which differences between statistics from different geographical areas, non-geographical domains, or over time, can be attributed to differences between the true values of the statistics.
Accuracy and reliability: The accuracy of statistical information is the degree to which the information correctly describes the phenomena it was designed to measure.
Statistical concepts: The statistical concept being measured and the organisation of data, i.e. the type of variables included in the domain of study, and the source of these concepts (e.g. SNA, SEEAW, IRWS or other).
Scope and coverage: Describes the coverage of the statistics and how consistent this is with internationally accepted standards, guidelines, or good practices.
Source data: Description of the data collection programs and their adequacy for the production of statistics, including meeting the requirements for methodological frameworks, scope, classification systems, and basis for recording.
Data validation: Describes methods and processes for routinely assessing microdata and macrodata.
Relevance: Refers to the processes for monitoring the relevance and practical utility of existing statistics in meeting users’ needs and how these processes inform the development of statistical programs.
Quality assurance: Refers to processes in place to focus on quality, to monitor the quality of the statistical programs, and to deal with quality considerations in planning the statistical programs.
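As a rough illustration of how a few of the elements above might be held together for one dataset, the following sketch uses a plain Python dataclass. It is not an SDMX API or message format; the class name `WaterMetadataRecord`, the field names, the helper `missing_elements`, and all sample values are assumptions made for this example only.

```python
from dataclasses import dataclass, fields


@dataclass
class WaterMetadataRecord:
    """Hypothetical record for a subset of the metadata elements listed above."""
    contact: str = ""
    metadata_update: str = ""           # date the metadata was inserted or modified
    statistical_presentation: str = ""  # table contents and breakdowns
    source_data: str = ""               # description of the data collection programs
    data_validation: str = ""           # methods for assessing micro- and macrodata
    related_quality_reports: str = ""   # reference to available quality reports


def missing_elements(record):
    """Return the names of elements that have not been filled in yet."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Illustrative values only; none of them come from the presentation.
record = WaterMetadataRecord(
    contact="Illustrative Statistics Office, water@example.org",
    metadata_update="2008-11-04",
    statistical_presentation="Water abstraction by economic unit, annual",
)
print(missing_elements(record))
```

A check like this only shows which descriptions are absent; whether the supplied descriptions are adequate still needs human review.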
8
Section C – Dimensions of data quality
Source: Laliberte, Grunewald and Probst (2003): Data Quality: A Comparison of IMF’s Data Quality Assessment Framework (DQAF) and Eurostat’s Quality Definition. Available from http://www.oecd.org/dataoecd/26/3/17831984.pdf
9
Section C – Dimensions of data quality
- Prerequisites of quality
- Accessibility
- Accuracy
- Coherence
- Credibility
- Relevance
- Timeliness
10
Prerequisites of quality
These are all institutional and organisational conditions that have an impact on the quality of water statistics. These include:
- The legal basis for compilation of data; adequacy of data sharing and coordination among data producing agencies;
- Assurance of confidentiality of data provided by data producers and respondents;
- Adequacy of human, financial, and technical resources for implementation of water statistics programmes and implementation of measures to ensure their efficient use;
- Quality awareness by staff and data producers.
11
Accessibility
This includes:
- The ease with which the existence of information can be ascertained;
- The suitability of the form (e.g. standard tables or water indicators);
- The media (e.g. web or paper publications) of dissemination through which the information can be accessed;
- Availability of metadata;
- Existence of user support services and an advance release calendar.
12
Accuracy
The degree to which the data correctly estimate the data items. Accuracy has many attributes and in practice there is not a single aggregate or overall measure of accuracy. In general, it is characterized in terms of errors in statistical estimates and is traditionally decomposed into bias (systematic error) and variance (random error). Accuracy depends upon the quality of data collected and the processes undertaken by statistical offices to reduce errors at all stages of the data collection process.
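The decomposition into bias and variance can be illustrated with the standard identity that mean squared error equals squared bias plus variance. The sketch below assumes a known true value and a set of repeated estimates, which is rarely the case in practice; the function name `error_decomposition` and the numbers are hypothetical.

```python
from statistics import mean, pvariance


def error_decomposition(estimates, true_value):
    """Split the mean squared error of repeated estimates into bias and variance."""
    bias = mean(estimates) - true_value   # systematic error
    variance = pvariance(estimates)       # random error around the mean estimate
    mse = mean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse


# Hypothetical repeated estimates of a data item whose true value is 100.0.
estimates = [102.0, 98.0, 104.0, 100.0]
bias, variance, mse = error_decomposition(estimates, true_value=100.0)
print(bias, variance, mse)  # 1.0 5.0 6.0, so mse == bias**2 + variance
```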
13
Coherence
This reflects the degree to which the data are logically connected and mutually consistent, i.e. they can be successfully brought together with other statistical information. The use of standard concepts, classifications and statistical populations promotes coherence, as does the use of common methodology across water data collections. Coherence does not necessarily imply full numerical consistency. Coherence has four important sub-dimensions: coherence within a dataset, coherence across datasets, coherence over time, and coherence across countries.
14
Credibility
This refers to the confidence that users have in the producers of the data. Users’ confidence is built over time. One important aspect is trust in the objectivity of the data. That is, the data are perceived to be produced professionally in accordance with appropriate statistical standards, such as the SEEAW and IRWS, and policies and practices are perceived to be transparent. For example, data should not be released in response to political pressure.
15
Relevance
This reflects the degree to which the data meet the needs of users. Measuring relevance requires identification of user groups and their needs. Some indicators of relevance are:
- The use of data by key users
- The number of requests for data by all users
- The results of user satisfaction surveys
16
Timeliness
- This refers to the amount of time between the end of the reference period and the date on which the data are released.
- The timeliness of information influences its relevance.
- Often timeliness is a trade-off against accuracy.
- Timeliness is related to the existence of a publication schedule. A publication schedule comprises a set of release dates or may involve a commitment to release water data within a prescribed time period from their receipt.
- Punctuality is the amount of time between the announced release date and the actual release date.
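Both timeliness and punctuality, as defined above, are differences between dates, so they can be computed directly once the relevant dates are recorded. The following is a minimal sketch; the function names and all three dates are hypothetical.

```python
from datetime import date


def timeliness_days(reference_period_end, actual_release):
    """Days between the end of the reference period and the actual release."""
    return (actual_release - reference_period_end).days


def punctuality_days(announced_release, actual_release):
    """Days between the announced and actual release (positive means late)."""
    return (actual_release - announced_release).days


# Hypothetical dates for an annual water dataset with reference year 2007.
reference_period_end = date(2007, 12, 31)
announced_release = date(2008, 9, 30)
actual_release = date(2008, 10, 14)

print(timeliness_days(reference_period_end, actual_release))  # 288
print(punctuality_days(announced_release, actual_release))    # 14
```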
17
Section D – Data quality assessment framework
Data collection process
1. Identify
2. Review
3. Collect
4. Compile
5. Disseminate
18
Section D – Data quality assessment framework
Indicator [Quality dimension]
0. Prerequisites for data quality
0.1. Institutional arrangements support the development of water statistics. [Prerequisite]
0.2. Legal arrangements support the development of water statistics. [Prerequisite]
0.3. The production and dissemination of water statistics are guided by professional principles, policies and practices. [Prerequisite]
0.4. Staff, facilities, computing resources, and financing are commensurate with statistical programs. [Prerequisite]
0.5. Data quality is considered at all stages of statistical development. [Prerequisite]
1. Identifying what information to produce
1.1. Mechanisms are in place to identify new and emerging water information needs. [Relevance]
1.2. Data items are identified and selected based on information needs. [Relevance]
19
Section D – Data quality assessment framework (cont.)
Indicator [Quality dimension]
2. Reviewing existing water data
2.1. Data quality is assessed against relevant data quality indicators and frameworks. [Accuracy]
2.2. Gaps in existing data and information have recently been identified and recorded (i.e. within the last 3 years). [Coherence]
2.3. Deficiencies with existing data and information (such as data quality issues) have recently been identified and recorded (i.e. within the last 3 years). [Coherence]
3. Selecting and collecting data
3.1. The choice of data sources and statistical techniques is informed solely by statistical considerations. [Credibility]
3.2. Frames are regularly updated. [Accuracy]
3.3. Data collections are designed and tested to ensure they collect relevant and accurate data. [Accuracy]
3.4. Data collections are conducted in a professional manner. [Accuracy]
20
Section D – Data quality assessment framework (cont.)
Indicator [Quality dimension]
4. Compiling information
4.1. Data are compiled using international statistical standards, guidelines and best practices. [Coherence]
4.2. Data are compiled using standard classifications. [Coherence]
4.3. Data are compiled using reliable statistical methods and procedures. [Accuracy]
4.4. Revisions are made when required. [Accuracy]
5. Disseminating information to users
5.1. Decisions about dissemination are informed solely by statistical considerations. [Credibility]
5.2. Water statistics are disseminated to a range of audiences. [Accessibility]
5.3. Data dissemination includes information regarding water statistics publications and publication schedules. [Accessibility]
5.4. Data dissemination includes support services. [Accessibility]
5.5. The relevance and practical utility of water statistics are monitored. [Relevance]
5.6. Publications are published on time and on schedule. [Timeliness]
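One way to work with the Section D framework is to record which quality dimension each indicator addresses and then group the indicators an assessment finds unmet. The sketch below takes the indicator-to-dimension pairs from the Section D slides; the function `unmet_by_dimension` and the sample list of unmet indicators are hypothetical and do not represent a scoring method proposed in the presentation.

```python
from collections import Counter, defaultdict

# Indicator number -> quality dimension, as listed in Section D.
INDICATOR_DIMENSION = {
    "0.1": "Prerequisite", "0.2": "Prerequisite", "0.3": "Prerequisite",
    "0.4": "Prerequisite", "0.5": "Prerequisite",
    "1.1": "Relevance", "1.2": "Relevance",
    "2.1": "Accuracy", "2.2": "Coherence", "2.3": "Coherence",
    "3.1": "Credibility", "3.2": "Accuracy", "3.3": "Accuracy", "3.4": "Accuracy",
    "4.1": "Coherence", "4.2": "Coherence", "4.3": "Accuracy", "4.4": "Accuracy",
    "5.1": "Credibility", "5.2": "Accessibility", "5.3": "Accessibility",
    "5.4": "Accessibility", "5.5": "Relevance", "5.6": "Timeliness",
}


def unmet_by_dimension(unmet_ids):
    """Group indicators judged unmet by the quality dimension they affect."""
    grouped = defaultdict(list)
    for indicator_id in unmet_ids:
        grouped[INDICATOR_DIMENSION[indicator_id]].append(indicator_id)
    return dict(grouped)


# How many indicators address each dimension.
dimension_counts = Counter(INDICATOR_DIMENSION.values())
print(dimension_counts["Accuracy"])  # 6

# Hypothetical assessment outcome: three indicators judged unmet.
print(unmet_by_dimension(["2.2", "3.2", "5.6"]))
```

Grouping by dimension shows where weaknesses cluster, but it does not weight indicators; any weighting would be a separate design decision.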
21
Questions to the EGM:
1. Should there be standard metadata for water statistics?
2. If so, what should it be, and which of the metadata frameworks is the most appropriate starting point for water statistics?
3. Which data quality framework is the most appropriate starting point for water statistics?
4. Should we develop a data quality assessment framework as part of the IRWS, or should this be part of the compilation guidelines?