DW Loading as Workflow. Presentation on the paper: Modeling Data Warehouse Refreshment Process as a Workflow Application, M. Bouzeghoub, F. Fabret,

1 DW Loading as Workflow
Presentation on the paper: Modeling Data Warehouse Refreshment Process as a Workflow Application, M. Bouzeghoub, F. Fabret, Maja Matulovic-Broqué
Raul Ruggia
April-May 2001

2 DW Refreshment as Workflow
• Architecture.
[Architecture diagram. Data stores: Data Sources, ODS, Corp DW, Data Marts. Tasks: Extraction / Cleaning, Reconciliation / Integration, Aggregation, Customisation. Annotations: "Source data in uniform and clean representation"; "Highly aggregated data in an MD model".]

3 DW Refreshment as... - Architecture
• Data Stores:
  – Not necessarily materialized.
• Tasks:
  – Extraction & Cleaning:
    » Same or distinct wrappers or tools.
  – Data Reconciliation (multi-source cleaning).
  – Data Integration (multi-source operations).
    » Same or separated.
  – High Level Aggregation:
    » From simple functions to data mining.
  – Customisation:
    » Adapting information to DM users.
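The chain of tasks on this slide can be pictured as a small pipeline. The following is only a minimal sketch under assumptions: the row layout (`product`, `amount`) and all function names are hypothetical and do not come from the paper, and each data store is a plain in-memory Python structure.

```python
from collections import defaultdict


def extract_and_clean(source_rows):
    """Extraction & cleaning: drop incomplete rows, normalise names and types."""
    return [
        {"product": r["product"].strip().lower(), "amount": float(r["amount"])}
        for r in source_rows
        if r.get("product") and r.get("amount") is not None
    ]


def integrate(*cleaned_sources):
    """Reconciliation / integration: merge cleaned sources into one ODS-like list."""
    ods = []
    for rows in cleaned_sources:
        ods.extend(rows)
    return ods


def aggregate(ods_rows):
    """High level aggregation: here a simple sum per product."""
    totals = defaultdict(float)
    for row in ods_rows:
        totals[row["product"]] += row["amount"]
    return dict(totals)


def customise(dw_totals, products_of_interest):
    """Customisation: keep only what one data mart's users need."""
    return {p: v for p, v in dw_totals.items() if p in products_of_interest}


# Hypothetical sample data for two sources.
source_a = [{"product": " Tea ", "amount": "2.5"}, {"product": "", "amount": "1"}]
source_b = [{"product": "tea", "amount": 1.5}, {"product": "coffee", "amount": 4}]

ods = integrate(extract_and_clean(source_a), extract_and_clean(source_b))
corp_dw = aggregate(ods)
tea_mart = customise(corp_dw, {"tea"})
```

Because each function returns a new structure, any intermediate store (`ods`, `corp_dw`) could be skipped or recomputed on the fly, which matches the remark that data stores are not necessarily materialized.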

4 DW Refreshment as Workflow
• We define the Refreshment Process as:
  – A workflow whose activities depend on the available products for data extraction, cleaning and integration, and whose triggering events depend on the application domain and on the required quality in terms of data freshness.
• The quality of the data in the DW depends on:
  – The capability of the DW system to convey, in a reasonable time, the changes made at the data sources from the sources to the data marts.

5 Refreshment vs. Loading
• Loading process:
  – Four steps:
    » Preparation: extraction, cleaning, history management.
    » Integration: reconciliation, history.
    » High level aggregation: update propagation.
    » Customisation: customised data for the Data Marts.
  – Initial instantiation of the DW.
  – The largest stage of a DW project.
    » No constraints on the response time.
  – Requires more availability of the Data Sources.
  – Mostly static.

6 Refreshment vs. Loading
• Refreshment process:
  – The workflow is dynamic and evolves.
  – More performance constraints than loading.
  – May have complex asynchronism between its activities (preparation, integration, aggregation & customisation).
    » High level of parallelism in the preparation activity.
    » Why does loading not have it? Because it does not need it.
  – Must require less source availability than loading.

7 View maintenance vs. Data Refreshment
• The main results:
  – Self-maintainability:
  – Coherent and efficient update propagation:

8 Refreshment process: Summary
• The refreshment process:
  – A complex system which may be composed of asynchronous and parallel activities that require some monitoring.
  – It is an event-driven system, which evolves according to:
    » Evolution of data sources & user requirements.
  – Users, the DW Admin and the Source Admin impose constraints on:
    » Freshness of data, space limits, access frequency.
  – Robustness & availability of a periodic process.
    » Transactional coherence of the DW update.

9 Views and Refreshment
• The view definition is not sufficient:
  – The query does not specify:
    » History management strategies:
      • Whether the view operates on a history or not, and how this history is sampled.
    » Integration synchronization:
      • Whether the changes of a source should be integrated each hour or each week.
      • Which data timestamp should be taken when integrating changes of different sources.
    » Specific filters defined in the cleaning process (????????):
      • Choosing the same measure for certain attributes.
      • Rounding the values of some attributes.
      • Eliminating confidential data.
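The three cleaning filters named on this slide are exactly the kind of logic a view query does not carry. A minimal sketch, assuming a hypothetical record layout (`weight`, `weight_unit`, `price`) and a hypothetical set of confidential field names, none of which come from the paper:

```python
# Assumed for illustration only: field names and the confidential set.
CONFIDENTIAL_FIELDS = {"salary", "ssn"}
DIVISOR_TO_KG = {"kg": 1, "g": 1000}


def clean_record(record):
    """Apply the three filters: same measure, rounding, confidentiality."""
    cleaned = {}
    for field, value in record.items():
        if field in CONFIDENTIAL_FIELDS:
            continue  # eliminate confidential data
        if field in ("weight", "weight_unit"):
            continue  # handled together below
        cleaned[field] = value
    if "weight" in record:
        # choose the same measure (kilograms) for the weight attribute
        cleaned["weight_kg"] = record["weight"] / DIVISOR_TO_KG[record["weight_unit"]]
    if "price" in cleaned:
        cleaned["price"] = round(cleaned["price"], 2)  # round attribute values
    return cleaned


raw = {"id": 1, "weight": 1500, "weight_unit": "g", "price": 12.3456, "salary": 50000}
result = clean_record(raw)
```

Two refreshment processes built on the same view but with different filter settings (for example a different rounding precision) would load different data, which is the point the slide makes.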

10 Views and Refreshment
• The view definition is not sufficient:
    » Based on the same view, the refreshment may produce different results depending on extra parameters.
  – 1. The change extraction capabilities of each source (AVAILABILITY).
    » Characterizes the moment at which the integration can be performed.
  – 2. The time needed to compute the change to the view from the changes in the sources.
  – 3. The ACTUALIZATION of data in each source.
    » Determines the difference between the data in the DW and its state in the sources and in the real world.

11 Workflow model for DW
• Activities / Tasks:
  – Extraction, Cleaning, History Management, Integration, Aggregation/Computing.
  – Object Transformation (e.g.: formatting, code values).
• Coordination:
  – Events:
    » Temporal, task termination, user defined.
  – Conditions:
    » Data values, input data state.
• Summary of task dependencies:
  – Data flow.
  – Task coordination.
  Are both kinds of dependencies needed?
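The coordination model on this slide (events trigger tasks, conditions guard them, data flows between them) can be illustrated with a tiny event loop. This is a sketch under assumptions, not the paper's workflow engine: the `Task` class, the event names (`timer:nightly`, `done:<task>`) and the sample tasks are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    on_event: str                      # event that triggers the task
    condition: Callable[[dict], bool]  # guard evaluated on the shared data state
    action: Callable[[dict], None]     # reads and writes shared data (data flow)


def run(tasks, initial_event, data):
    """Process events in FIFO order; a finished task emits 'done:<name>'."""
    log = []
    queue = deque([initial_event])
    while queue:
        event = queue.popleft()
        for task in tasks:
            if task.on_event == event and task.condition(data):
                task.action(data)
                log.append(task.name)
                queue.append(f"done:{task.name}")  # task termination event
    return log


tasks = [
    Task("extract", "timer:nightly", lambda d: True,
         lambda d: d.update(changes=[3, None, 5])),
    Task("clean", "done:extract", lambda d: bool(d.get("changes")),
         lambda d: d.update(clean=[c for c in d["changes"] if c is not None])),
    Task("integrate", "done:clean", lambda d: "clean" in d,
         lambda d: d.update(ods=d.get("ods", 0) + sum(d["clean"]))),
]

data = {}
log = run(tasks, "timer:nightly", data)
```

In this sketch the task coordination dependency is carried by the `done:` events and the data flow dependency by the shared `data` dictionary, so both kinds of dependency appear separately, as the slide's closing question suggests.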

12 Refreshment scheduling
• Client-driven refreshment.
  – On demand by users.
  – Mainly update propagation from the ODS to the DW.
• Source-driven refreshment.
  – Triggered by changes in sources.
  – Concerns the Preparation phase.
• ODS-driven refreshment.
  – Corresponds to the part of the process which is automatically monitored by the DW system.
  – Concerns the Integration phase.
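The mapping between the three kinds of trigger and the phase each one concerns can be written down as a small lookup table. A minimal sketch; the enum, the phase labels and the function name are hypothetical names chosen for illustration.

```python
from enum import Enum


class Trigger(Enum):
    CLIENT = "client"  # on demand by users
    SOURCE = "source"  # changes detected in a source
    ODS = "ods"        # automatically monitored by the DW system


# Phase concerned by each kind of trigger, as listed on the slide.
PHASE_FOR_TRIGGER = {
    Trigger.CLIENT: "update propagation (ODS to DW)",
    Trigger.SOURCE: "preparation",
    Trigger.ODS: "integration",
}


def phases_to_run(triggers):
    """Return the phases to start, in arrival order, without duplicates."""
    phases = []
    for trigger in triggers:
        phase = PHASE_FOR_TRIGGER[trigger]
        if phase not in phases:
            phases.append(phase)
    return phases


pending = phases_to_run([Trigger.SOURCE, Trigger.ODS, Trigger.SOURCE])
```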

13 Semantics of Refreshment Process
• Extra-View Parameters:
  – Change extraction/detection capability of the source.
    » Determines the availability of the changes from the sources.
    » Impacts:
      • Data freshness.
      • Data coherence, because time discrepancies may occur in a view.
  – Time needed to compute the change in the view.
  – Frequency of source data actualization.
    » Determines the difference that may exist between the state of the source and the DW.
    » It is out of the control of the DW.

14 Semantics of Refreshment Process
• Summary:
  – Concerning the refreshment strategy:
    » A refreshment strategy is built with respect to:
      • Data freshness, computation time of queries and views, data accuracy, etc.
    » And depends on:
      • Source constraints:
        – Availability windows, frequency of changes.
      • DW system limits:
        – Storage limits, functional limits.

15 Semantics of Refreshment Process
• Design decisions.
  – Design decisions and refresh semantics:
    » The moment when each refreshment task takes place.
    » The way the different refreshment tasks are synchronized.
    » The way the shared data is made visible to the tasks.
  – Design decisions are specified by defining:
    » The decomposition of the refreshment process into elementary tasks.
    » The ordering of these tasks.
    » The events initiating the tasks.
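Decomposing the process into elementary tasks and ordering them amounts to declaring a dependency graph and deriving a valid execution order from it. A minimal sketch using Python's standard `graphlib` module (Python 3.9+); the task names and the two-source layout are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical decomposition: each task maps to the tasks it must wait for.
dependencies = {
    "clean_a": {"extract_a"},
    "clean_b": {"extract_b"},
    "integrate": {"clean_a", "clean_b"},
    "aggregate": {"integrate"},
    "customise": {"aggregate"},
}

# One valid ordering of the elementary tasks.
order = list(TopologicalSorter(dependencies).static_order())
```

The two source branches have no dependency on each other, so their relative order is free; that freedom is where the preparation activity can run in parallel.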

16 DW Data Quality
• Depends on:
  – The capability of the DW system to convey, in a reasonable time, the changes made at the sources from the sources to the DMs.
• Quality requirements may impose synchronization strategies.
  – If users desire high freshness for data, each update in a source should be mirrored as soon as possible in the DW.
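A freshness requirement can be turned into a simple refresh test. The sketch below assumes one possible definition, which is not taken from the paper: staleness is the age of the oldest source change that has not yet reached the DW. Function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def staleness(oldest_pending_change, now):
    """Age of the oldest source change not yet mirrored in the DW."""
    if oldest_pending_change is None:
        return timedelta(0)  # nothing pending: the DW is fresh
    return now - oldest_pending_change


def needs_refresh(oldest_pending_change, now, max_staleness):
    """True when the pending change is older than the users' tolerance."""
    return staleness(oldest_pending_change, now) > max_staleness


now = datetime(2001, 5, 1, 12, 0)
change_at = datetime(2001, 5, 1, 10, 30)

strict = needs_refresh(change_at, now, timedelta(hours=1))     # high freshness
relaxed = needs_refresh(change_at, now, timedelta(hours=24))   # daily is enough
```

A smaller `max_staleness` pushes the design toward propagating each source update as soon as possible, while a larger one leaves room for periodic batch refreshment.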