MIS2502: Data Analytics Extract, Transform, Load

Slides:



Advertisements
Similar presentations
CHAPTER OBJECTIVE: NORMALIZATION THE SNOWFLAKE SCHEMA.
Advertisements

Business Intelligence (BI) PerformancePoint in SharePoint 2010 Sayed Ali – SharePoint Administrator.
C6 Databases.
By: Mr Hashem Alaidaros MIS 211 Lecture 4 Title: Data Base Management System.
McGraw-Hill/Irwin © 2006 The McGraw-Hill Companies, Inc. All rights reserved. 8-1 BUSINESS DRIVEN TECHNOLOGY Chapter Eight: Viewing and Protecting Organizational.
© Tally Solutions Pvt. Ltd. All Rights Reserved 1 Stock Audit in Shoper 9 February 2010.
Enterprise Business Processes and Reporting (IS 6214) MBS MIMAS `17 th Feb 2010 Fergal Carton Business Information Systems.
IS Consulting Process (IS 6005) Masters in Business Information Systems 26 th Feb 2010 Fergal Carton Business Information Systems.
Today’s Goals Concepts  I want you to understand the difference between  Data  Information  Knowledge  Intelligence.
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Data Staging Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential Chair of.
Data Mining.
Data Resource Management Data Concepts Database Management Types of Databases Chapter 5 McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies,
1 © Prentice Hall, 2002 Chapter 11: Data Warehousing.
MIS2502: Data Analytics Relational Data Modeling
Agenda 02/20/2014 Complete data warehouse design exercise Finish reconciled data warehouse, bus matrix and data mart Display each group’s work Discuss.
Agenda 02/21/2013 Discuss exercise Answer questions in task #1 Put up your sample databases for tasks #2 and #3 Define ETL in more depth by the activities.
By N.Gopinath AP/CSE. Why a Data Warehouse Application – Business Perspectives  There are several reasons why organizations consider Data Warehousing.
Chapter 1 Database Systems. Good decisions require good information derived from raw facts Data is managed most efficiently when stored in a database.
5.1 © 2007 by Prentice Hall 5 Chapter Foundations of Business Intelligence: Databases and Information Management.
Intro to MIS – MGS351 Databases and Data Warehouses Chapter 3.
Database Design - Lecture 1
Ahsan Abdullah 1 Data Warehousing Lecture-17 Issues of ETL Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
IBM Start Now Business Intelligence Solutions. Agenda Overview of BI Who will buy and why Start Now BI solution Benefit to customer.
McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 3 Databases and Data Warehouses: Supporting the Analytics-Driven.
Agenda 03/27/2014 Review first test. Discuss internal data project. Review characteristics of data quality. Types of data. Data quality. Data governance.
Business Intelligence Zamaneh Jahed. What is Business Intelligence? Business Intelligence (BI) is a broad category of applications and technologies for.
© 2007 by Prentice Hall 1 Introduction to databases.
GETTING THE DATA INTO THE WAREHOUSE: EXTRACT, TRANSFORM, LOAD MIS2502 Data Analytics.
BUS1MIS Management Information Systems Semester 1, 2012 Week 6 Lecture 1.
University of Nevada, Reno Organizational Data Design Architecture 1 Organizational Data Architecture (2/19 – 2/21)  Recap current status.  Discuss the.
1 Reviewing Data Warehouse Basics. Lessons 1.Reviewing Data Warehouse Basics 2.Defining the Business and Logical Models 3.Creating the Dimensional Model.
C6 Databases. 2 Traditional file environment Data Redundancy and Inconsistency: –Data redundancy: The presence of duplicate data in multiple data files.
MIS2502: Data Analytics The Information Architecture of an Organization.
DATABASE MANAGEMENT SYSTEMS CMAM301. Introduction to database management systems  What is Database?  What is Database Systems?  Types of Database.
DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What.
MIS2502: Data Analytics Dimensional Data Modeling
PPTTEST 11/18/ :15 1 IT Ron Williams Business Innovation Through Information Technology Information Management.
1 Technology in Action Chapter 11 Behind the Scenes: Databases and Information Systems Copyright © 2010 Pearson Education, Inc. Publishing as Prentice.
Chapter 5 DATA WAREHOUSING Study Sections 5.2, 5.3, 5.5, Pages: & Snowflake schema.
McGraw-Hill/Irwin ©2009 The McGraw-Hill Companies, All Rights Reserved CHAPTER 6 DATABASES AND DATA WAREHOUSES CHAPTER 6 DATABASES AND DATA WAREHOUSES.
7 Strategies for Extracting, Transforming, and Loading.
Chapter 6.  Problems of managing Data Resources in a Traditional File Environment  Effective IS provides user with Accurate, timely and relevant information.
University of Nevada, Reno Organizational Data Design Architecture 1 Agenda for Class: 02/06/2014  Recap current status. Explain structure of assignments.
MIS 451 Building Business Intelligence Systems Data Staging.
1 Copyright © Oracle Corporation, All rights reserved. Business Intelligence and Data Warehousing.
The Need for Data Analysis 2 Managers track daily transactions to evaluate how the business is performing Strategies should be developed to meet organizational.
The Concepts of Business Intelligence Microsoft® Business Intelligence Solutions.
BUSINESS INTELLIGENCE. The new technology for understanding the past & predicting the future … BI is broad category of technologies that allows for gathering,
11 Database The ultimate in data organization. 2 Database Management Systems (DBMS)  Application software designed to capture and analyze data  Four.
CHAPTER SIX DATA Business Intelligence
Data Staging Data Staging Legacy System Data Warehouse SQL Server
Data warehouse and OLAP
MIS2502: Data Analytics Dimensional Data Modeling
MIS5101: Extract, Transform, Load (ETL)
Databases and Data Warehouses Chapter 3
MIS2502: Data Analytics Dimensional Data Modeling
MIS2502: Data Analytics Dimensional Data Modeling
MIS5101: Business Intelligence Outcomes Measurement and Data Quality
MIS2502: Data Analytics Dimensional Data Modeling
MIS5101: Extract, Transform, Load (ETL)
MIS2502: Data Analytics Extract, Transform, Load
MIS5101: Extract, Transform, Load (ETL)
MIS2502: Data Analytics The Information Architecture of an Organization Acknowledgement: David Schuff.
MIS2502: Data Analytics The Information Architecture of an Organization Aaron Zhi Cheng Acknowledgement:
MIS2502: Data Analytics Extract, Transform, Load
MIS2502: Data Analytics Extract, Transform, Load
MIS2502: Data Analytics Dimensional Data Modeling
Data Warehousing Concepts
MIS2502: Data Analytics Data Extract, Transform, Load(ETL)
Instructor Materials Chapter 5: Ensuring Integrity
Presentation transcript:

MIS2502: Data Analytics Extract, Transform, Load

Where we are… Now we’re here… Data entry Transactional Database Data extraction Analytical Data Store Data analysis Stores real-time transactional data Stores historical transactional and summary data

Getting the information into the data mart The data in the operational database… …is put into a data warehouse… …which feeds the data mart… …and is analyzed as a cube. Now let’s address this part…

Extract, Transform, Load (ETL) Extract data from the operational data store Transform data into an analysis-ready format Load it into the analytical data store

The Actual Process Query Transactional Database 1 Data conversion Extract Transform Load Transactional Database 1 Query Data conversion Query Data Mart Transactional Database 2 Data conversion Query Query Relational database Dimensional database

ETL’s Not That Easy! Data Consistency Data Quality What if the data is in different formats? Data Consistency How do we know it’s correct? What if there is missing data? What if the data we need isn’t there? Data Quality

Data Consistency: The Problem with Legacy Systems An IT infrastructure evolves over time Systems are created and acquired by different people using different specifications This can happen through: Changes in management Mergers & Acquisitions Externally mandated standards Generally poor planning

This leads to many issues Redundant data across the organization Customer record maintained by accounts receivable and marketing The same data element stored in different formats Social Security number (123-45-6789 versus 123456789) Different naming conventions “Doritos” versus “Frito-Lay’s Doritos” verus “Regular Doritos” Different unique identifiers used Account_Number versus Customer_ID What are the problems with each of these ?

What’s the big deal? This is a fundamental problem for creating data cubes We often need to combine information from several transactional databases How do we know if we’re talking about the same customer or product?

Now think about this scenario Hotel Reservation Database Café Database What are the differences between a “guest” and a “customer”? Is there any way to know if a customer of the café is staying at the hotel?

Solution: “Single view” of data The entire organization understands a unit of data in the same way It’s both a business goal and a technology goal but it’s really more this… ...than this

Closer look at the Guest/Customer Guests Guest_number Guest_firstname Guest_lastname Guest_address Guest_city Guest_zipcode Guest_email Customer Customer_number Customer_name Customer_address Customer_city Customer_zipcode vs. Getting to a “single view” of data: How would you represent “name?” What would you use to uniquely identify a guest/customer? Would you include email address? How do you figure out if you’re talking about the same person?

Organizational issues Why might there be resistance to data standardization? Is it an option to just “fix” the transactional databases? If two data elements conflict, who’s standard “wins?”

Data Quality The degree to which the data reflects the actual environment Do we have the right data? Is the collection process reliable? Is the data accurate?

Finding the right data Choose data consistent with the goals of analysis Verify that the data really measures what it claims to measure Include the analysts in the design process Adapted from http://www2.ed.gov/about/offices/list/os/technology/plan/2004/ site/docs_and_pdf/Data_Quality_Audits_from_ESP_Solutions_Group.pdf

Ensuring accuracy Know where the data comes from Manual verification through sampling Use of knowledge experts Verify calculations for derived measures Adapted from http://www2.ed.gov/about/offices/list/os/technology/plan/2004/ site/docs_and_pdf/Data_Quality_Audits_from_ESP_Solutions_Group.pdf

Reliability of the collection process Build fault tolerance into the process Periodically run reports, check logs, and verify results Keep up with (and communicate) changes Adapted from http://www2.ed.gov/about/offices/list/os/technology/plan/2004/ site/docs_and_pdf/Data_Quality_Audits_from_ESP_Solutions_Group.pdf