Data Quality Assurance in Cooperative Information Systems: a Multi-dimension Quality Certificate Cinzia Cappiello1, Chiara Francalanci1, Barbara Pernici1,

Presentation transcript:

Data Quality Assurance in Cooperative Information Systems: a Multi-dimension Quality Certificate
Cinzia Cappiello (1), Chiara Francalanci (1), Barbara Pernici (1), Pierluigi Plebani (1), Monica Scannapieco (2)
(1) Politecnico di Milano, Milano, Italy {cappiell|francala|pernici|plebani}@elet.polimi.it
(2) Università di Roma “La Sapienza”, Rome, Italy; IASI-CNR, Rome, Italy monscan@dis.uniroma1.it

Outline
  Definitions of data quality dimensions
  Relevant data quality dimensions in CISs
  A Quality Management Architecture
  Data Quality Certificate
  Future work
Cinzia Cappiello

Definitions of data quality dimensions
  The data quality literature provides a thorough classification of data quality dimensions, but there is no general agreement on the definition of most of them.
  The selected definitions are founded on a survey of the quality dimensions proposed in the literature over the past 10 years [Catarci, Scannapieco 2002].
  On the basis of this classification, a basic set of data quality dimensions is defined: accuracy, completeness, consistency, timeliness, interpretability and accessibility. These are the dimensions considered by the majority of authors.
  Timeliness is considered together with the other time-related dimensions: currency and volatility.

Relevant data quality dimensions in CISs
  Category                  Major dimensions          Sub dimensions
  Object Dimensions         Accuracy
                            Completeness
                            Consistency
  Subject Dimensions        Interpretability
  Architectural Dimensions  Reliability
                            Accessibility
  Process Dimensions        Timeliness                Volatility, Currency, Currency Level
                            Security/Access Security
                            Relevance
                            History
                            Cost

Object dimensions
  Accuracy: “a measure of the proximity of a data value v to some other value v’ that is considered correct” [Redman 1996]
  Completeness: “degree to which specific values are included in a data collection” [Wang & Strong 1996]
  Consistency: it is defined at three levels [Redman 1996]
    View consistency
    Value consistency
    Representation consistency
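The slide gives definitions only, not metrics. As a minimal sketch of how two of these definitions could be turned into numbers, the Python below assumes that completeness is measured as the fraction of expected field values that are present, and that accuracy of a string value is measured as its similarity to a reference value. Both choices, and the names `completeness` and `accuracy`, are illustrative assumptions and not the authors' measures.

```python
from difflib import SequenceMatcher


def completeness(records, fields):
    """Fraction of expected field values that are present (not None or empty).

    Assumed metric: the slides do not prescribe a formula.
    """
    expected = len(records) * len(fields)
    if expected == 0:
        return 1.0  # nothing was expected, so nothing is missing
    present = sum(
        1 for record in records for field in fields
        if record.get(field) not in (None, "")
    )
    return present / expected


def accuracy(value, reference):
    """Proximity of a stored string v to a reference string v', in [0, 1].

    Assumed metric: string similarity stands in for "proximity".
    """
    return SequenceMatcher(None, value, reference).ratio()


records = [
    {"name": "Anna Rossi", "city": "Milano"},
    {"name": "Luca Bianchi", "city": None},
]
print(completeness(records, ["name", "city"]))  # 0.75
print(accuracy("Milano", "Milano"))             # 1.0
```

Consistency is left out of the sketch because its three levels depend on schema and representation rules that the slide does not detail.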

Subject and architectural dimensions
  Interpretability: it is related to the format in which data are specified and to the clarity of data definitions [Wang & Strong 1996]
  Reliability: it can be defined at two levels: data reliability and source reliability. Data are considered reliable if they can be trusted to convey the right information. Source reliability is calculated considering the reputation of the source. [Wand & Wang 1996]
  Accessibility: “the degree in which data are available or quickly or easily retrievable”. [Wang & Strong 1996]

Process dimensions (1)
  Timeliness: “the extent to which the age of data is appropriate for the task at hand”. A possible measure is proposed in [Ballou 1998].
  Currency: “the time interval between the latest update of a data value and the time it is used”
  Currency level: specifies the degree to which a data set is up-to-date [Cappiello, Francalanci, Pernici 2002]
  Volatility: it is defined as the temporal dynamics of Expiration, which is the time until which data remain valid. Volatility is a function that measures the probability that the expiration time will change within the interval between publication and expiration time [Pernici, Scannapieco 2002]
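The formula for the timeliness measure does not survive in this transcript. The measure commonly attributed to Ballou et al. is max(0, 1 - currency / volatility)^s, where volatility means the length of time the data remain valid and s is a task-dependent sensitivity exponent. The sketch below assumes that form. Note that this sense of volatility differs from the probabilistic definition given on the slide, so the parameter is named `validity` here; the function names and the example timestamps are illustrative assumptions.

```python
from datetime import datetime


def currency_hours(last_update, used_at):
    """Currency: time between the latest update of a value and its use, in hours."""
    return (used_at - last_update).total_seconds() / 3600.0


def timeliness(currency, validity, s=1.0):
    """Timeliness in [0, 1], assuming the form max(0, 1 - currency/validity) ** s.

    `validity` is how long the data remain valid (same unit as `currency`);
    `s` controls how sharply timeliness decays (assumed task-dependent).
    """
    if validity <= 0:
        raise ValueError("validity must be positive")
    return max(0.0, 1.0 - currency / validity) ** s


age = currency_hours(datetime(2002, 1, 1, 0, 0), datetime(2002, 1, 1, 5, 0))
print(timeliness(age, validity=10.0))         # 0.5
print(timeliness(age, validity=10.0, s=2.0))  # 0.25
print(timeliness(age, validity=4.0))          # 0.0 (data already expired)
```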

Process dimensions (2)
  Security/Access Security: it is defined as “the extent to which access to data can be restricted and hence kept secure” [Wang & Strong 1996]. We have listed the security requirements that should be satisfied to assure data security. The percentage of requirements satisfied in the IS can serve as a measure of the value of this dimension.
  Relevance: it is a measure of the appropriateness of the data extracted for the requested task.
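The percentage-based security measure can be illustrated with a few lines of Python. The requirement names in the checklist are hypothetical, since the slides do not reproduce the authors' list.

```python
def security_score(requirements):
    """Percentage of security requirements that the IS satisfies.

    `requirements` maps a requirement name to True (satisfied) or False.
    """
    if not requirements:
        raise ValueError("at least one requirement is needed")
    satisfied = sum(1 for ok in requirements.values() if ok)
    return 100.0 * satisfied / len(requirements)


# Hypothetical checklist; not the authors' actual requirement list.
checklist = {
    "authentication": True,
    "access control": True,
    "encryption in transit": False,
    "audit logging": True,
}
print(security_score(checklist))  # 75.0
```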

Process dimensions (3)
  History: recording which quality improvement operations have been performed on the data makes it possible to build a certificate that lists all the operations that have modified the data. For each operation, the following must be stored:
    Type of operation
    Execution date
    Percentage of improvement
  Cost: a dimension that evaluates the cost impact of errors due to poor data quality
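The three attributes stored per operation map naturally onto a small record type. The sketch below is one possible representation; the class name, field names and sample entries are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ImprovementOperation:
    """One entry of the History dimension (field names are assumed)."""
    operation_type: str      # type of operation
    execution_date: date     # execution date
    improvement_pct: float   # percentage of improvement


def operations_between(history, start, end):
    """Operations executed in the closed interval [start, end], oldest first."""
    selected = [op for op in history if start <= op.execution_date <= end]
    return sorted(selected, key=lambda op: op.execution_date)


# Hypothetical history for one data set.
history = [
    ImprovementOperation("deduplication", date(2002, 3, 10), 4.0),
    ImprovementOperation("address normalization", date(2002, 1, 15), 2.5),
    ImprovementOperation("record linkage", date(2002, 6, 1), 6.0),
]
for op in operations_between(history, date(2002, 1, 1), date(2002, 3, 31)):
    print(op.execution_date, op.operation_type, op.improvement_pct)
```

Making the record immutable (`frozen=True`) reflects the idea that a history entry documents a past event and should not be edited afterwards.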

A Quality Management Architecture
  [Figure: architecture diagram. Labels shown: Information System, DBMS, Quality Factory, Software Application, Organization Infrastructure, Data Quality Broker, Common Data Quality Service Applications, Request/Response, CIS, Data Quality Repository, Common Data Quality Databases]

The Quality Factory
  Translates the request into a format comprehensible to the IS
  Identifies the required data and extracts them from the Data Repository
  Identifies which data and data quality dimensions have been evaluated
  Using internal measurement tools, performs a static analysis of the values of the data quality dimensions
  If data values do not satisfy quality requirements, quality assessment sends an alert message to the Monitoring module
  Associates a quality certificate with data that satisfy quality requirements
  Stores the events in which data quality requirements are not satisfied
  Executes periodical monitoring operations on the data contained in the Data Repository
  Translates the response into a format comprehensible to the user
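The assessment step of the Quality Factory (compare measured dimension values with requirements, then either certify the data or raise an alert and log the event) can be sketched as follows. This is a simplified illustration, assuming that every dimension is scored in [0, 1] and that a requirement is a minimum threshold; the function name `assess` and the list-based stand-ins for the Monitoring module and the event store are hypothetical.

```python
def assess(data_id, measured, required, alerts, event_log):
    """Return a certificate dict if all requirements hold, otherwise None.

    measured:  dimension name -> measured value in [0, 1]
    required:  dimension name -> minimum acceptable value
    alerts:    list standing in for the Monitoring module's inbox
    event_log: list standing in for the store of unsatisfied-requirement events
    """
    # A dimension that was required but never measured counts as a failure.
    failed = sorted(
        dim for dim, minimum in required.items()
        if measured.get(dim, 0.0) < minimum
    )
    if failed:
        alerts.append(f"quality alert for {data_id}: {', '.join(failed)}")
        event_log.append({"data_id": data_id, "failed": failed})
        return None
    return {"data_id": data_id, "dimensions": dict(measured)}


alerts, event_log = [], []
required = {"accuracy": 0.9, "completeness": 0.8}

good = assess("customers", {"accuracy": 0.95, "completeness": 0.85},
              required, alerts, event_log)
bad = assess("orders", {"accuracy": 0.70, "completeness": 0.85},
             required, alerts, event_log)
print(good)    # certificate with the measured values
print(bad)     # None
print(alerts)  # one alert naming the failed dimension
```

The request and response translation steps and the periodic monitoring are omitted, since the slide does not describe their formats or schedule.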

Data Quality Certificate
  Stores the value associated with each quality dimension adopted
  Contains sensitivity information
    Denotes the level of confidentiality of data being transferred
  It is owned by the source organization, which provides the authentication of the data source
  Provides the integrity of the transmitted data
  Exchange unit format
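The slide lists what the certificate provides but not how. One way to illustrate the properties together is the sketch below, which assumes a SHA-256 digest of the data for integrity and an HMAC computed with a key shared between the cooperating organizations for source authentication. Both mechanisms, the JSON serialization and all field names are assumptions; an actual deployment might rely on public-key signatures instead.

```python
import hashlib
import hmac
import json


def issue_certificate(data, dimensions, confidentiality, source, key):
    """Build a certificate for `data` (bytes), signed with the shared `key`."""
    body = {
        "source": source,                    # owning organization
        "confidentiality": confidentiality,  # sensitivity information
        "dimensions": dimensions,            # value per quality dimension
        "data_digest": hashlib.sha256(data).hexdigest(),  # integrity of the data
    }
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    body["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return body


def verify_certificate(cert, data, key):
    """Check that the certificate is authentic and that `data` is unchanged."""
    body = {k: v for k, v in cert.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    authentic = hmac.compare_digest(expected, cert["signature"])
    intact = hashlib.sha256(data).hexdigest() == body["data_digest"]
    return authentic and intact


key = b"shared-secret"  # hypothetical key agreed between organizations
data = b"Anna Rossi;Milano"
cert = issue_certificate(
    data, {"accuracy": 0.95, "completeness": 0.85}, "restricted", "org-a", key
)
print(verify_certificate(cert, data, key))               # True
print(verify_certificate(cert, b"Anna Rossi;Roma", key))  # False
```

Signing the whole certificate body, not just the data digest, means the quality values and the confidentiality level cannot be altered in transit without detection.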

Future work
  Software implementation of the quality factory architecture and the data quality certificate
  Application of the data quality certificate to evaluate the quality of Web services in a cooperative environment
  Evaluation of the impact of data replication and distribution on data quality dimensions