Published by Victoria Wright. Modified over 9 years ago.
Data Warehousing: Tools & Technologies
by: Er. Manu Bansal, Assistant Professor, Department of IT
mrmanubvansal@gmail.com
Data Warehouse Database
- Almost always implemented on relational database management system (RDBMS) technology
- Data warehouse attributes such as very large database size, ad hoc query processing, and the need for flexible user view creation (including aggregates, multiple joins, and drill-downs) have become drivers for different approaches to the data warehouse database. These are:
  - Parallel relational database designs (SMPs, MPPs, and clusters of uni- or multiprocessors)
  - An innovative approach that speeds up a traditional RDBMS by using new index structures to bypass relational table scans
  - Multidimensional databases
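The slide does not name the index structures it has in mind. One structure commonly used for this purpose in warehouses is the bitmap index, so the following is only a minimal sketch under that assumption. The `sales` rows and both function names are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bitmap:
    bit i is set when rows[i][column] equals that value."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return index

def matching_rows(bitmap):
    """Yield the row positions whose bit is set in `bitmap`."""
    i = 0
    while bitmap:
        if bitmap & 1:
            yield i
        bitmap >>= 1
        i += 1

# Hypothetical sales rows, for illustration only.
sales = [
    {"region": "north", "product": "tv"},
    {"region": "south", "product": "radio"},
    {"region": "north", "product": "radio"},
    {"region": "east", "product": "tv"},
]

by_region = build_bitmap_index(sales, "region")
by_product = build_bitmap_index(sales, "product")

# WHERE region = 'north' AND product = 'radio' becomes one bitwise AND,
# so the base rows are never scanned to evaluate the predicate.
hits = by_region["north"] & by_product["radio"]
print(list(matching_rows(hits)))
```

Only the rows whose positions come back from `matching_rows` need to be fetched, which is the sense in which the index bypasses a table scan.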
Sourcing, Extraction, Cleanup & Transformation Tools
- Tool requirements
- Vendor approaches
- Access to legacy data
- Vendor solutions
- Transformation engines
Tool Requirements
- Tasks to be performed
  - Removing unwanted data from operational databases
  - Converting to common data names and definitions
  - Calculating summaries and derived data
  - Establishing defaults for missing data
  - Accommodating source data definition changes
  - Consolidation/integration
  - Metadata synchronization and management
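Several of the tasks above can be sketched in a few lines. This is an illustration only, not how any particular tool works. The field names, the `COMMON_NAMES` mapping, and the default values are all hypothetical.

```python
# Hypothetical mapping from operational column names to warehouse names.
COMMON_NAMES = {"cust_no": "customer_id", "amt": "amount", "qty": "quantity"}
# Operational-only fields the warehouse does not need.
UNWANTED = {"last_screen_id"}
# Defaults applied when a source value is missing.
DEFAULTS = {"quantity": 1}

def clean_record(source):
    """Apply the cleanup tasks to one operational record (a dict)."""
    # 1. Remove unwanted data.
    record = {k: v for k, v in source.items() if k not in UNWANTED}
    # 2. Convert to common data names.
    record = {COMMON_NAMES.get(k, k): v for k, v in record.items()}
    # 3. Establish defaults for missing data.
    for field, default in DEFAULTS.items():
        if record.get(field) is None:
            record[field] = default
    # 4. Calculate derived data.
    record["total"] = record["amount"] * record["quantity"]
    return record

def summarize(records):
    """Calculate a summary: total sales per customer."""
    totals = {}
    for r in records:
        totals[r["customer_id"]] = totals.get(r["customer_id"], 0) + r["total"]
    return totals

operational = [
    {"cust_no": 7, "amt": 20, "qty": 3, "last_screen_id": "S1"},
    {"cust_no": 7, "amt": 5, "qty": None, "last_screen_id": "S2"},
    {"cust_no": 9, "amt": 10, "last_screen_id": "S1"},
]
cleaned = [clean_record(r) for r in operational]
print(summarize(cleaned))
```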
Significant issues for tools
- Database heterogeneity: differences in data models, access languages, data navigation, operations, concurrency, integrity, recovery, etc.
- Data heterogeneity: differences in the way data is defined and used in different models, such as homonyms, synonyms, unit incompatibility, different attributes for the same entity, and different ways of modeling the same fact
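Two of the data heterogeneity problems, synonyms and unit incompatibility, can be illustrated with a small sketch. The source names `billing` and `shipping`, their field names, and the rule table are hypothetical. The only fixed fact is the conversion factor (1 lb = 0.45359237 kg).

```python
# Hypothetical per-source rules: field synonyms and the weight unit each source uses.
SOURCE_RULES = {
    "billing": {
        "synonyms": {"client": "customer", "wt": "weight_kg"},
        "weight_factor": 1.0,          # already in kilograms
    },
    "shipping": {
        "synonyms": {"cust": "customer", "weight": "weight_kg"},
        "weight_factor": 0.45359237,   # source stores pounds
    },
}

def to_common_model(source_name, record):
    """Rename synonymous fields and convert weights to kilograms."""
    rules = SOURCE_RULES[source_name]
    out = {rules["synonyms"].get(k, k): v for k, v in record.items()}
    out["weight_kg"] = round(out["weight_kg"] * rules["weight_factor"], 3)
    return out

print(to_common_model("billing", {"client": "Acme", "wt": 10.0}))
print(to_common_model("shipping", {"cust": "Acme", "weight": 100}))
```

After this step both sources describe the same entity with the same attribute names and units, so their records can be consolidated.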
Tool requirements
- source data identification
- support for various file structures
- handling data changes
- extraction specification interface
- ease of accessing dictionary/repository
- maintainable generated code
- selective data extraction
- field-level access to data
- data type & character set translation (when moving data between incompatible systems)
- summarization/aggregation of records & fields
- warehouse loading from the tool should be easy
- vendor stability & support
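As a rough illustration of selective extraction, field-level access, and data type and character set translation, the sketch below reads one fixed-width record encoded in EBCDIC. The record layout and field names are hypothetical. `cp037` is the Python standard-library codec for one common EBCDIC code page, and a real legacy file may use a different one.

```python
# Hypothetical record layout: (field name, start offset, length, converter).
LAYOUT = [
    ("customer_id", 0, 5, int),
    ("name", 5, 10, str.strip),
    ("balance", 15, 7, lambda s: int(s) / 100),  # stored as cents
]

def extract_fields(raw, wanted, encoding="cp037"):
    """Decode one fixed-width EBCDIC record and return only the `wanted` fields."""
    text = raw.decode(encoding)  # character set translation
    return {
        name: convert(text[start:start + length])  # data type translation
        for name, start, length, convert in LAYOUT
        if name in wanted  # selective, field-level extraction
    }

# Build a sample record by encoding text to EBCDIC, standing in for a legacy file.
raw_record = "00042Jane Doe  0012550".encode("cp037")
print(extract_fields(raw_record, {"customer_id", "balance"}))
```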
Vendor Approaches
- Code generators create tailored 3GL/4GL transformation programs based on source & target data definitions and on data transformation & enhancement rules
- Data replication tools use triggers or a recovery log to capture changes to a single data source and apply the changes to another copy of the data on a different system
- Rule-driven dynamic transformation engines (data mart builders) capture data from the source system at user-defined intervals, transform it, and load it into the target system
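The replication approach can be sketched as replaying a change log against a copy of the data. This is a simplified illustration that assumes the log is already ordered by sequence number. The log format and the function name are hypothetical and do not reflect any vendor's product.

```python
def apply_changes(replica, change_log, last_applied=0):
    """Apply log entries with a sequence number above `last_applied` to `replica`.

    Each entry is (sequence, operation, key, row).
    Returns the new high-water mark.
    """
    for seq, op, key, row in change_log:
        if seq <= last_applied:
            continue  # already replicated in an earlier run
        if op in ("insert", "update"):
            replica[key] = row
        elif op == "delete":
            replica.pop(key, None)
        last_applied = seq
    return last_applied

# Hypothetical change log captured from the source system.
log = [
    (1, "insert", 101, {"name": "Ann", "city": "Pune"}),
    (2, "insert", 102, {"name": "Raj", "city": "Delhi"}),
    (3, "update", 101, {"name": "Ann", "city": "Mumbai"}),
    (4, "delete", 102, None),
]

replica = {}
mark = apply_changes(replica, log[:2])    # first run sees two entries
mark = apply_changes(replica, log, mark)  # later run applies only entries 3 and 4
print(replica, mark)
```

Tracking the high-water mark is what lets each run move only the changes since the previous run instead of recopying the whole source.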
Access to Legacy Data
- Apertus’ Enterprise/Access & Enterprise/Integrator are representative of this approach
- Three-layer architecture
  - data layer: provides data access & transaction services for management of corporate data assets; enforces business rules for data integrity
  - process layer: provides services to manage automation & support for current business processes
  - user layer: manages user interaction with process and data layer services
- Data warehouse models supported
  - virtual warehouse against which enterprise applications execute; no data migration required
  - E/A + Open Gateway use SQL Server as a virtual database
Vendor Solutions
- Prism Solutions: Warehouse Manager
  - generates code to extract/integrate data, create/manage metadata, & build a historical, subject-oriented database
  - handles key changes, structural changes, etc.
  - can extract data from DB2, IDMS, IMS, VSAM, RMS, and UNIX and MVS files; target databases include Oracle, Sybase & Informix
- SAS Institute: SAS System
  - data repository function: builds the informational database
  - SAS data access engine: extracts data, combines common variables, transforms data, consolidates redundant data
  - SAS views: networking
  - SAS reporting, graphing, etc.: front end
  - SAS engines can also work with hierarchical & relational databases and flat files
Vendor Solutions (contd.)
- Carleton Corporation
  - Passport
    - Data access
    - Data analysis & auditing
    - Passport Data Language (PDL)
    - Run-time environment
    - Report writing
  - Metacenter
    - Data extraction
    - Data transformation
    - Metadata capture & browsing
    - Data mart subscription
    - Warehouse control center functionality
    - Event control & notification
  - Overall, a wide range of sophisticated capabilities
- Apertus & Carleton merged in 1997/98
Vendor Solutions (contd.)
- Vality Corporation: Integrity data reengineering tool, focused largely on data quality improvement
  - attains & maintains the highest-quality data: error removal, completion
  - builds accurate, consolidated views of subject areas
  - makes explicit all data relevant to a business function, even though it may be hidden in various legacy systems
  - requires very little manual intervention, i.e. low deployment cost
- Evolutionary Technologies Inc. (ETI): ETI-Extract
  - usage
    - populate and maintain data warehouses
    - move to new architectures while preserving investments in legacy systems
    - integrate disparate systems
    - migrate data to new platforms, databases, and applications
Vendor Solutions (contd.)
- ETI (contd.)
  - Master Toolset: a set of interactive editors
    - Environment Editor: allows specification of the different platforms & system operating environments to be accessed
    - Schema Editor: provides schema browsing & updating capability
    - Grammar Editor: provides a means for defining customized conditional retrieval, transformation, conversion, and populate logic
    - Template Editor: enables specification of rules to shape the way data retrieval, conversion, and populate programs are generated
  - metadata database
  - metadata exchange library (MDX)
  - interactive browsing & reporting
Vendor Solutions (contd.)
- Conversion Editor: graphical interface for defining data mappings between source(s) and target
- ETI-Extract Executive: process control & automatic job execution
- ETI-Extract Workset Browser: user-customizable desktop
- Metadata Facility
Information Builders: EDA/SQL
- SQL access & a uniform relational view of relational and non-relational data residing in over 60 different databases
- supports copy management, data quality management, and replication capabilities
- standards support for ODBC and X/Open
- gateways to Amdahl, IBM, DEC, HP, Bull, etc.
Transformation Engines
- Metadata exchange architecture (MX)
  - a multicompany initiative to integrate metadata
  - a ‘back-end’ architecture with a published API supporting technical & business data
- Informatica: PowerMart Suite
  - PowerMart Designer: source analyzer, warehouse designer, transformation designer
  - PowerMart Server: extractor, transformation engine, loader
  - Informatica Server Manager: Informatica server configuration
  - Informatica Repository: metadata integration hub
  - Informatica PowerCapture: incremental refresh of data mart
Transformation Engines (contd.)
- Constellar: Constellar Hub
  - handles the movement & transformation of data, both for data migration & distribution in an operational system and for capturing operational data for loading into a warehouse
  - hub & spoke architecture
  - spokes are used for connecting to sources/sinks
Transformation Engines (contd.)
- The hub performs all kinds of transformation activity
  - record reformatting & restructuring
  - field-level data transformation, validation, and table lookup
  - file & multi-file set-level data transformation and table lookup
  - creation of intermediate results for downstream transformation by the hub
  - can store data temporarily in a staging table
  - Collate and Transform steps
- Data Junction: similar to Constellar Hub
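The hub & spoke idea can be sketched generically: spokes supply and receive records, while the hub does field-level transformation, validation, table lookup, and staging. This is not Constellar Hub's actual interface. Every name below, including the `COUNTRY_NAMES` lookup table, is hypothetical.

```python
# Hypothetical lookup table held by the hub.
COUNTRY_NAMES = {"IN": "India", "US": "United States"}

def transform(record):
    """Field-level transformation, validation, and table lookup for one record."""
    country = record["country"].strip().upper()
    if country not in COUNTRY_NAMES:
        raise ValueError(f"unknown country code: {country!r}")
    return {"id": int(record["id"]), "country": COUNTRY_NAMES[country]}

def run_hub(source_spoke, sink_spoke):
    """Read from a source spoke, stage valid rows, then push them to a sink spoke."""
    staging, rejects = [], []
    for record in source_spoke:
        try:
            staging.append(transform(record))
        except (ValueError, KeyError) as err:
            rejects.append((record, str(err)))
    sink_spoke.extend(staging)  # load only after the whole batch is staged
    return rejects

source = [
    {"id": "1", "country": " in "},
    {"id": "2", "country": "zz"},   # fails the table lookup
    {"id": "x", "country": "US"},   # fails the type conversion
]
warehouse = []
rejected = run_hub(source, warehouse)
print(warehouse, len(rejected))
```

Keeping the staged rows separate from the sink until the batch is complete mirrors the role of the staging table mentioned above.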
Thank You