Data Warehousing: Tools & Technologies
by Er. Manu Bansal, Assistant Professor, Department of IT



Data Warehouse Database
- Always implemented on relational database management system (RDBMS) technology.
- Data warehouse attributes such as very large database size, ad hoc query processing, and the need for flexible user view creation (including aggregates, multiple joins, and drill-downs) have become drivers for different approaches to the data warehouse database. These are:
  - Parallel relational database designs (SMPs, MPPs, and clusters of uni- or multiprocessors).
  - An innovative approach that speeds up a traditional RDBMS by using new index structures to bypass relational table scans.
  - Multidimensional databases.
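The slide above does not name the "new index structures" it has in mind. A bitmap index is one widely used structure of that kind, so the sketch below uses it purely as an illustration. The table, column names, and helper functions are hypothetical. The point is that a query combining two conditions can be answered by AND-ing two bitmaps instead of scanning every row.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an integer whose set bits mark row positions."""
    index = defaultdict(int)
    for position, row in enumerate(rows):
        index[row[column]] |= 1 << position
    return dict(index)


def rows_matching(bitmap):
    """Decode a bitmap back into an ascending list of row positions."""
    positions = []
    position = 0
    while bitmap:
        if bitmap & 1:
            positions.append(position)
        bitmap >>= 1
        position += 1
    return positions


# Hypothetical fact table; a real warehouse table would be far larger.
sales = [
    {"region": "north", "product": "tea"},
    {"region": "south", "product": "tea"},
    {"region": "north", "product": "coffee"},
    {"region": "north", "product": "tea"},
]

region_index = build_bitmap_index(sales, "region")
product_index = build_bitmap_index(sales, "product")

# "region = north AND product = tea" becomes a single bitwise AND.
north_tea = rows_matching(region_index["north"] & product_index["tea"])
```

Only the rows whose positions come back from `rows_matching` need to be fetched, which is where the saving over a full table scan comes from.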

Sourcing, Extraction, Cleanup & Transformation Tools
- Tool requirements
- Vendor approaches
- Access to legacy data
- Vendor solutions
- Transformation engines

Tool Requirements
Tasks to be performed:
- Removing unwanted data from operational databases
- Converting to common data names and definitions
- Calculating summaries and derived data
- Establishing defaults for missing data
- Accommodating source data definition changes
- Consolidation/integration
- Metadata synchronization and management
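Several of the tasks listed above can be sketched in a few lines. The example below is a minimal illustration, not the behavior of any particular tool. The field names, the name map, and the default values are all assumptions made up for the example. It drops unwanted fields, converts source-specific names to common ones, fills a default for missing data, and calculates a summary.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific field names to common names.
NAME_MAP = {"cust_no": "customer_id", "CustID": "customer_id", "amt": "amount"}
# Hypothetical defaults for fields that may be missing or null.
DEFAULTS = {"region": "UNKNOWN"}
# Only these common fields are kept; everything else is unwanted data.
WANTED = {"customer_id", "amount", "region"}


def clean_record(raw):
    """Rename fields, drop unwanted ones, and fill defaults for missing values."""
    renamed = {NAME_MAP.get(field, field): value for field, value in raw.items()}
    kept = {field: value for field, value in renamed.items() if field in WANTED}
    for field, default in DEFAULTS.items():
        if kept.get(field) is None:
            kept[field] = default
    return kept


def total_by_customer(records):
    """Calculate a simple summary: total amount per customer."""
    totals = defaultdict(float)
    for record in records:
        totals[record["customer_id"]] += record["amount"]
    return dict(totals)


operational_rows = [
    {"cust_no": 1, "amt": 10.0, "region": "north", "teller_id": 7},
    {"CustID": 1, "amt": 5.5, "region": None},
    {"cust_no": 2, "amt": 3.0},
]

cleaned = [clean_record(row) for row in operational_rows]
totals = total_by_customer(cleaned)
```

Keeping the name map and defaults as data rather than code is one way to accommodate source definition changes: a renamed source field needs a new map entry, not a new program.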

Significant issues for tools
- Database heterogeneity: differences in data models, access languages, data navigation, operations, concurrency, integrity, recovery, etc.
- Data heterogeneity: differences in the way data is defined and used in the models, such as homonyms, synonyms, unit incompatibility, different attributes for the same entity, and different ways of modeling the same effect.
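To make the data heterogeneity issues concrete, the sketch below resolves a synonym, a homonym, and a unit incompatibility with one rule table. The source system names, field names, and rules are hypothetical. Keying each rule on the pair (source, field) is what allows the same field name to mean different things in different systems.

```python
# Hypothetical rules: (source system, source field) -> (common field, unit factor or None).
FIELD_RULES = {
    # Synonyms: "weight" and "wt" describe the same attribute, in different units.
    ("billing", "weight"): ("weight_kg", None),
    ("shipping", "wt"): ("weight_kg", 0.45359237),  # pounds to kilograms
    # Homonym: "balance" means money in billing but stock count in shipping.
    ("billing", "balance"): ("account_balance", None),
    ("shipping", "balance"): ("stock_on_hand", None),
}


def to_canonical(source, record):
    """Translate one source record into common field names and units."""
    canonical = {}
    for field, value in record.items():
        rule = FIELD_RULES.get((source, field))
        if rule is None:
            canonical[field] = value  # no rule: pass the field through unchanged
            continue
        name, factor = rule
        canonical[name] = value if factor is None else value * factor
    return canonical


converted = to_canonical("shipping", {"wt": 10, "balance": 4})
```

Here `converted` holds `weight_kg` in kilograms and `stock_on_hand` rather than a misleading `balance`, so records from both systems can be consolidated safely.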

Tool requirements
- Source data identification
- Support for various file structures
- Handling data changes
- Extraction specification interface
- Ease of accessing the dictionary/repository
- Maintainable generated code
- Selective data extraction
- Field-level access to data
- Data type and character set translation (when moving data between incompatible systems)
- Summarization/aggregation of records and fields
- Warehouse loading from the tool should be easy
- Vendor stability and support

Vendor Approaches
- Code generators create tailored 3GL/4GL transformation programs based on source and target data definitions, and on data transformation and enhancement rules.
- Data replication tools use triggers or a recovery log to capture changes to a single data source and apply the changes to another copy of the data on a different system.
- Rule-driven dynamic transformation engines (data mart builders) capture data from the source system at user-defined intervals, transform it, and load it into the target system.
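The replication approach described above can be sketched as replaying a change log against a copy of the data. This is a simplified illustration under stated assumptions: the log format (sequence number, operation, key, row) and the function name are invented for the example, and real replication tools read the DBMS recovery log or trigger-maintained tables instead.

```python
def apply_changes(replica, change_log, last_applied):
    """Replay log entries newer than `last_applied` onto `replica`.

    Returns the sequence number of the last entry applied, so the next
    run can resume from that point.
    """
    for sequence, operation, key, row in change_log:
        if sequence <= last_applied:
            continue  # already replicated in an earlier run
        if operation == "delete":
            replica.pop(key, None)
        else:  # "insert" and "update" both store the latest row image
            replica[key] = row
        last_applied = sequence
    return last_applied


# Hypothetical change log captured at the source system.
change_log = [
    (1, "insert", "a", {"qty": 1}),
    (2, "update", "a", {"qty": 2}),
    (3, "insert", "b", {"qty": 9}),
    (4, "delete", "b", None),
]

replica = {}
checkpoint = apply_changes(replica, change_log, last_applied=0)
```

Tracking the last applied sequence number makes the replay safe to repeat: running it again with the returned checkpoint changes nothing.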

Access to Legacy Data
- Apertus’ Enterprise/Access & Enterprise/Integrator are representative of this approach
- Three-layer architecture
  - The data layer provides data access and transaction services for management of corporate data assets, and enforces business rules for data integrity
  - The process layer provides services to manage automation and support for current business processes
  - The user layer manages user interaction with process and data layer services
- Data warehouse models supported
  - Virtual warehouse against which enterprise applications execute; no data migration required
  - E/A + Open Gateway use SQL Server as a virtual database

Vendor Solutions
- Prism Solutions: Warehouse Manager
  - Generates code to extract/integrate data, create/manage metadata, and build a historical, subject-oriented database
  - Key changes, structural changes, etc. are handled
  - Can extract data from DB2, IDMS, IMS, VSAM, RMS, and UNIX and MVS files; target databases include Oracle, Sybase, and Informix
- SAS Institute: SAS System
  - Data repository function builds the informational database
  - SAS data access engine: extracts data, combines common variables, transforms data, and consolidates redundant data
  - SAS views: networking
  - SAS reporting, graphing, etc.: front end
  - SAS engines can also work with hierarchical and relational databases and flat files

Vendor Solutions (contd.)
- Carleton Corporation
  - Passport
    - Data access
    - Data analysis & auditing
    - Passport Data Language (PDL)
    - Run-time environment
    - Report writing
  - Metacenter
    - Data extraction
    - Data transformation
    - Metadata capture & browsing
    - Data mart subscription
    - Warehouse control center functionality
    - Event control & notification
  - Overall, a wide range of sophisticated capabilities
  - Apertus & Carleton merged in 1997/98

Vendor Solutions (contd.)
- Vality Corporation: Integrity data reengineering tool, focused largely on data quality improvement
  - Attains and maintains the highest quality data; error removal, completion
  - Builds accurate, consolidated views of subject areas
  - Makes explicit all data relevant to a business function, even though it may be hidden in various legacy systems
  - Requires very little manual intervention, i.e. low deployment cost
- Evolutionary Technologies Inc. (ETI): ETI-Extract
  - Usage
    - Populate and maintain data warehouses
    - Move to new architectures while preserving investments in legacy systems
    - Integrate disparate systems
    - Migrate data to new platforms, databases, and applications

Vendor Solutions (contd.)
- ETI (contd.)
  - Master Toolset: a set of interactive editors
    - Environment Editor allows specification of the different platforms and system operating environments to be accessed
    - Schema Editor provides schema browsing and updating capability
    - Grammar Editor provides the means for defining customized conditional retrieval, transformation, conversion, and populate logic
    - Template Editor enables specification of rules to shape the way data retrieval, conversion, and populate programs are generated
  - Metadata database
  - Metadata exchange library (MDX)
  - Interactive browsing & reporting

Vendor Solutions (contd.)
- Conversion Editor: graphical interface for defining data mappings between source(s) and target
- ETI-Extract Executive: process control and automatic job execution
- ETI-Extract Workset Browser: user-customizable desktop
- Metadata Facility

Information Builders: EDA/SQL
- SQL access and a uniform relational view of relational and non-relational data residing in over 60 different databases
- Supports copy management, data quality management, and replication capabilities
- Standards support for ODBC and X/Open
- Gateways to Amdahl, IBM, DEC, HP, Bull, etc.

Transformation Engines
- Metadata exchange architecture (MX)
  - A multicompany initiative to integrate metadata
  - A ‘back-end’ architecture with a published API supporting technical and business data
- Informatica: PowerMart Suite
  - PowerMart Designer: source analyzer, warehouse designer, transformation designer
  - PowerMart Server: extractor, transformation engine, loader
  - Informatica Server Manager: Informatica server configuration
  - Informatica Repository: metadata integration hub
  - Informatica PowerCapture: incremental refresh of data mart

Transformation Engines (contd.)
- Constellar: Constellar Hub
  - Handles the movement and transformation of data, both for data migration and distribution in an operational system and for capturing operational data for loading into a warehouse
  - Hub-and-spoke architecture
  - Spokes are used for connecting to sources and sinks

Transformation Engines (contd.)
- The hub does all kinds of transformation activity
  - Record reformatting and restructuring
  - Field-level data transformation, validation, and table lookup
  - File and multi-file set-level data transformation and table lookup
  - Creation of intermediate results for downstream transformation by the hub
- Can store data temporarily in a staging table
- Collate and Transform steps
- Data Junction: similar to Constellar Hub
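The hub-and-spoke flow described on the last two slides can be sketched as two steps over an in-memory staging table. This is an illustration only and does not reflect Constellar Hub's actual interfaces. The spoke names, record fields, and lookup table are hypothetical.

```python
def collate(source_spokes):
    """Collate step: pull records from every source spoke into one staging table."""
    staging = []
    for name, read in source_spokes.items():
        for record in read():
            staging.append({"source": name, **record})
    return staging


def transform(staging, country_lookup):
    """Transform step: validate, look up, and reformat each staged record."""
    output = []
    for record in staging:
        code = record["country"].strip().upper()
        if code not in country_lookup:
            continue  # validation: skip records with an unknown country code
        output.append({
            "source": record["source"],
            "country": country_lookup[code],    # table lookup
            "amount": float(record["amount"]),  # field-level type conversion
        })
    return output


# Hypothetical source spokes: each is a callable that returns raw records.
source_spokes = {
    "crm": lambda: [{"country": " in ", "amount": "10.5"}],
    "erp": lambda: [{"country": "US", "amount": 3}, {"country": "??", "amount": 1}],
}
country_lookup = {"IN": "India", "US": "United States"}

warehouse_sink = []  # hypothetical sink spoke: a plain list standing in for the target
staging_table = collate(source_spokes)
warehouse_sink.extend(transform(staging_table, country_lookup))
```

Because every source lands in the same staging table before transformation, adding a new source means adding one spoke rather than a new point-to-point program for each target.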

Thank You