The Data Attribution. Abdul Saboor, PhD Research Student, Model Base Development and Software Quality Assurance Research Group, Freie Universität Berlin.


The Data Attribution. Abdul Saboor, PhD Research Student, Model Base Development and Software Quality Assurance Research Group, Freie Universität Berlin, Takustr. 9, Berlin, Germany. 2nd International Open Data Dialog, Berlin, Germany, November 19th, 2013.

Main Contents for the Presentation
• The Data Attribution
• The Applications/Use of Attribution
• The Attribution Model Scope
• The Attribution Model for Open Data
• Techniques for Data Attribution
• The Mechanism for Data Attribution
• Conclusion

Why Data Attribution?
Problem Statement: Develop an efficient infrastructure that makes it possible to verify the source of datasets. The platform should manage the life-cycle of a dataset (its creation, modification, and expiration in the system) and should allow the origin of each dataset to be traced and visualized graphically.
Proposed Solution: A model, TAMOD (The Attribution Model for Open Data), was developed, together with a model theory to support the solution. Techniques are used to define the model's process and efficiency, and a mechanism is defined in order to implement and test the model approach.

What is Data Attribution?
• Attribution means giving credit to, or acknowledging, the author/creator of some data when using that data for some purpose.
• The act of attribution establishes a relationship/link that operates from one artifact to another (the creator and the user of that data), keeping a track record of the data from beginning to end.

The Applications/Use of Data Attribution
Data attribution is mainly used for acknowledging data creator(s) and indicating the availability of data.
• Crediting Author(s): It has to be ensured that credit is given to the authors who actually created the data.
• Granularity: Refers to the attributes or features that the datasets contain.
• Versions of Dataset: Datasets that are regularly updated need to be attributed against each newer version.
• Location of the Dataset: Refers to a persistent link that resolves to the dataset(s), such as a Digital Object Identifier (DOI) or a deep link.
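The four elements above (creator credit, versioning, and a persistent location) can be combined into a human-readable attribution line. The sketch below is an illustrative assumption: the field names, the citation format, and the DOI are invented for demonstration and are not part of the presented model.

```python
def format_attribution(meta: dict) -> str:
    """Build a simple attribution line from creator, year, version and a
    persistent identifier. Field names are illustrative assumptions."""
    authors = ", ".join(meta["creators"])
    return (f"{authors} ({meta['year']}): {meta['title']}, "
            f"version {meta['version']}. {meta['doi']}")

meta = {
    "creators": ["A. Saboor"],
    "year": 2013,
    "title": "Example Open Dataset",
    "version": "2.0",
    "doi": "doi:10.0000/example",  # hypothetical identifier
}
print(format_attribution(meta))
# A. Saboor (2013): Example Open Dataset, version 2.0. doi:10.0000/example
```

Because the version and the persistent identifier are part of the string, each newer version of a regularly updated dataset yields its own distinct attribution.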

The Applications/Use of Data Attribution (continued)
• Data Quality: All mandatory elements of data attribution must be filled out for each dataset in order to ensure the completeness and correctness of every single dataset.
• Data Relevance: Verify that the metadata information exactly matches the submitted dataset(s), e.g. the various attributes and descriptions of data fields.
• Data Trustworthiness: Evaluate quality criteria such as data quality (check the ratio of blank nodes) and perform tests to measure the correctness and efficiency of datasets.
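The mandatory-element check described under Data Quality can be sketched as a small validator. The particular field names below are an illustrative assumption, not a list defined by the model:

```python
# Minimal sketch of a completeness check over attribution metadata.
# The mandatory field list is an illustrative assumption.
MANDATORY_FIELDS = {"creator", "title", "publication_date", "version", "location"}

def missing_attribution_fields(record: dict) -> set:
    """Return the mandatory attribution elements that are absent or empty."""
    return {f for f in MANDATORY_FIELDS if not record.get(f)}

record = {
    "creator": "Open Data Office",
    "title": "Air Quality Measurements",
    "publication_date": "2013-11-19",
    "version": "1.2",
    "location": "",  # e.g. a DOI or deep link; empty here, so incomplete
}
print(missing_attribution_fields(record))  # {'location'}
```

A dataset would only pass the quality gate when this function returns an empty set.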

The Applications/Use of Data Attribution (continued)
• Intellectual Property: Establish the rights and ownership of the dataset(s). Three questions: What rights are associated with them? Who owns the datasets? How open are the dataset(s)?
• Informational: Discover and reuse datasets, understand or analyze what work has been done before, and see how a subject changes over time (history and transformation stages).
• Data Provenance: Track the history of datasets so that their origin/source can be traced, along with the various transformation stages of the datasets.

The Attribution Model Scope
• To manage the attribution data in the platform, keeping a complete track record of the shared datasets through the attribution model.
• To define the model in a precise and efficient way, so that it can manage the dataset transformation process during the dataset's life-cycle.
• To support all kinds of representations of attribution during the process from one artifact to another.
• To allow developers to operate, build, and share tools that can be integrated with the model.
• To define the core set of rules that express the valid interfaces for attribution data and the workflow process with graphs.

Model Theory – TAMOD Classifies Nodes into Five Parts
• Artifact: A fixed-value dataset that represents an entity in a given state.
• Metadata: A description of the dataset, its elements, and its representation format.
• Actions: The types of operation that are performed on the dataset(s).
• Workflow/Process: A process performed on one artifact in order to create another artifact.
• Agents: The entities that control the processes, such as the Data Steward, the Platform Administrator, and the Data Accessor.
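The five node types can be sketched as plain data structures. The Python below is an illustrative assumption of how TAMOD's nodes might be represented; the field names and example values are invented, not the author's implementation:

```python
from dataclasses import dataclass

# Sketch of TAMOD's five node types as dataclasses (field names assumed).

@dataclass(frozen=True)
class Artifact:
    """A fixed-value dataset representing an entity in a given state."""
    dataset_id: str
    state: str

@dataclass
class Metadata:
    """Describes a dataset, its elements, and its representation format."""
    artifact: Artifact
    description: str
    fmt: str

@dataclass
class Action:
    """A type of operation performed on a dataset."""
    name: str

@dataclass
class Agent:
    """An entity controlling a process (Data Steward, Administrator, ...)."""
    name: str
    role: str

@dataclass
class Workflow:
    """An action applied by an agent to one artifact to create another."""
    action: Action
    agent: Agent
    source: Artifact
    result: Artifact

raw = Artifact("ds-001", "raw")
clean = Artifact("ds-001", "cleaned")
step = Workflow(Action("normalize"), Agent("Alice", "Data Steward"), raw, clean)
print(step.result.state)  # cleaned
```

A chain of such `Workflow` nodes is what lets the origin of each dataset be traced back artifact by artifact.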

The Attribution Model for Open Data (TAMOD) – 1

Techniques for Data Attribution
1 – Descriptive Attribution (Indirect Data Attribution): Used to record all kinds of modifications and the operations performed on the dataset(s) by a Data Accessor/User.
2 – Direct Attribution (Micro-Data Attribution): Used to make the data attribution process more efficient.
• Credit/acknowledge the authors/creators of dataset(s) in a more compact way, making the process more manageable, efficient, and consistent for establishing relationships.
• Exclude the organization or department personnel whose role does not fit the role of data creator or compiler.
• Use unique identifiers for both contributors and contributions to compress irrelevant or less important entities; a supplementary table can hold the remaining data.

Levels of Data Modification for Micro-Data Attribution
Different levels of modification need to be considered in order to implement the Micro-Data Attribution technique:
• Low level: Basic modifications to the dataset or its fields that do not impact the health of the dataset.
• Intermediate level: Modify some data or data fields and enter some new data or descriptions; consider the level of impact on data health.
• Higher level: Perform major changes to the dataset or its attributes, including modifying the definition of the data.
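Classifying an operation into one of the three levels above can be sketched as a lookup. The mapping from operation names to levels is an illustrative assumption; the deck does not define a concrete operation catalogue:

```python
# Sketch of classifying a dataset modification into the three levels.
# The operation-to-level mapping is an illustrative assumption.
LEVELS = {
    "fix_typo": "low",            # cosmetic edit, no impact on data health
    "rename_field": "low",
    "update_values": "intermediate",
    "add_records": "intermediate",
    "change_schema": "high",      # modifies the definition of the data
    "redefine_field": "high",
}

def modification_level(operation: str) -> str:
    """Return the modification level for an operation, defaulting to
    'high' so that unknown operations are treated conservatively."""
    return LEVELS.get(operation, "high")

print(modification_level("update_values"))  # intermediate
```

Defaulting unknown operations to the highest level is a design choice here: it forces a review rather than silently under-attributing a major change.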

The Attribution Model for Open Data (TAMOD) – 2

The Mechanism for Source Data Attribution
Main features that define the mechanism for source data attribution:
• Apply a checksum (e.g. SHA-1, SHA-2) to every dataset and to each version of a dataset.
• Apply digital signatures to RDF datasets, with signature verification for the RDF dataset(s).
• Integrity assurance and result verification.
• The attribution elements for the public key describe its creation date and its modification/expiry date.
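The per-version checksum step can be sketched with Python's standard `hashlib`. SHA-256 (a SHA-2 variant) is used here; the sample dataset bytes are invented for illustration:

```python
import hashlib

# Sketch of the per-version checksum: SHA-256 over a dataset's
# serialized bytes. The sample data is illustrative.
def dataset_checksum(data: bytes) -> str:
    """Return the hex SHA-256 digest identifying one dataset version."""
    return hashlib.sha256(data).hexdigest()

v1 = b"city,pm10\nBerlin,21\n"
v2 = b"city,pm10\nBerlin,22\n"   # a modified version

print(dataset_checksum(v1) == dataset_checksum(v1))  # True: same version
print(dataset_checksum(v1) == dataset_checksum(v2))  # False: content changed
```

Storing the digest with each version's attribution record gives the integrity assurance the slide mentions: any later change to the bytes produces a different digest.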

Attribution-Aware Data Querying
• Data feeds are associated with attribution metadata such as the data source, data creator/publisher, creation/publication date, modification date, and data version.
• Attribution-aware data querying applies this concept to the data attribution implemented by government institutes and organizations, and examines how they handle their data attribution systems.
• LIVE DEMO – Concrete results obtained by executing SPARQL queries over some government datasets, analyzing the management of the data attribution process.
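The idea of an attribution-aware query (filtering datasets by their attribution metadata, here publisher and modification date) can be sketched in plain Python. The records, publisher names, and the rough SPARQL shown in the comment are invented for illustration and are not the datasets or queries from the talk's demo:

```python
from datetime import date

# Pure-Python sketch of an attribution-aware query. Roughly analogous to:
#   SELECT ?title WHERE { ?d dct:publisher "Ministry A" ; dct:modified ?m }
#   with a FILTER on ?m  (SPARQL shape assumed for illustration)
feeds = [
    {"title": "budget", "publisher": "Ministry A", "modified": date(2013, 5, 1)},
    {"title": "transit", "publisher": "City B", "modified": date(2013, 10, 7)},
    {"title": "air", "publisher": "Ministry A", "modified": date(2012, 1, 9)},
]

def query(feeds, publisher, since):
    """Return titles of feeds from `publisher` modified on or after `since`."""
    return [f["title"] for f in feeds
            if f["publisher"] == publisher and f["modified"] >= since]

print(query(feeds, "Ministry A", date(2013, 1, 1)))  # ['budget']
```

The point of the technique is that the filter runs over attribution metadata only, without touching the dataset contents themselves.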

Conclusion
• The data attribution model describes the life-cycle of datasets: creating linked data, including metadata information, data access, and the data attribution process.
• To make attribution more efficient, the level of modification performed on the datasets has to be analyzed.
• The Micro-Data Attribution technique is used to make data attribution more efficient.
• The data attribution mechanism maintains the quality and integrity of datasets, so that previous work can be verified and reused.

Thanks for your attention! Any questions? The End