Enterprise Business Processes and Reporting (IS 6214)
MBS MIMAS
24th Feb 2010
Fergal Carton
Business Information Systems
Last week
Data storage
– Not all data is stored in centralised systems
– Process control / Manufacturing Execution Systems (MES)
Cost of information collection
– E.g. Cucina starts selling to customers outside the euro zone
Deciding what information to collect
– E.g. Cucina management want to know what is going on in the bread market in Ireland
Reporting, scrutinising, discovering: the degree to which search parameters are known in advance
Latency in the technical architecture and in managers' view
– "Not my problem, just get it sorted"
Exploiting data warehouses: significant work on cleaning
Cucina and reporting types
Cucina working prototype
This week
Cucina and real-time information
Integration: definition
Extract, transform, load (ETL)
Real-time data
Refresh rates and response times
Briefing on EMC business pre-visit
Reporting assignment
Cucina and real-time
What information is required in real time? Why?
What decisions are made?
How can this data be provided?
What constraints does the delivery of this data put on IT?
What is the trade-off between the cost of providing information and the benefit to managers?
What about evolving requirements?
Integration is defined as the …
… incorporation of information technologies into ways of working
… sharing of information between people, processes or applications
… implementation of virtual control (visibility) of physical processes and resources
… connecting of applications and technologies such that data is shared
… making visible to participants in a business process the information they require in a coherent fashion
Databases ensure this coherency, but at the cost of rigidity.
Any other way to ensure coherency?
ETL Tools
Extraction, transformation and loading
Specification-based
Eliminate custom coding
Third-party and DBMS-based tools
Data extraction and transformation
Getting data out of legacy applications
Cleaning up the data
Enriching it with new data
Converting it to a form suitable for upload
Staging areas
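The extraction and transformation steps above can be sketched end to end in a few lines. This is a minimal illustration, not a description of any particular ETL tool: the legacy CSV export, the city-to-region lookup used for enrichment and the table names are all hypothetical, and an in-memory SQLite table stands in for the staging area.

```python
import csv
import io
import sqlite3

# Hypothetical legacy export with messy spacing and inconsistent casing.
LEGACY_CSV = """cust_id,name,city
 101 ,  acme bakery ,cork
102,Bread Co,  DUBLIN
"""

# Hypothetical enrichment lookup: data the legacy system does not hold.
CITY_REGION = {"Cork": "Munster", "Dublin": "Leinster"}


def extract(text):
    """Extract: get the raw rows out of the legacy export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean, enrich and convert rows into tuples ready for upload."""
    records = []
    for row in rows:
        city = row["city"].strip().title()          # clean: trim, normalise case
        records.append((
            int(row["cust_id"].strip()),             # convert: text to integer key
            row["name"].strip().title(),             # clean
            city,
            CITY_REGION.get(city, "Unknown"),        # enrich: add region
        ))
    return records


def load(conn, records):
    """Load into a staging table first, then copy into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS staging_customer "
                 "(cust_id INTEGER, name TEXT, city TEXT, region TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS dw_customer "
                 "(cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT, region TEXT)")
    conn.execute("DELETE FROM staging_customer")     # empty the staging area
    conn.executemany("INSERT INTO staging_customer VALUES (?, ?, ?, ?)", records)
    conn.execute("INSERT OR REPLACE INTO dw_customer SELECT * FROM staging_customer")
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(LEGACY_CSV)))
print(conn.execute("SELECT * FROM dw_customer ORDER BY cust_id").fetchall())
```

Keeping a separate staging table means a failed or suspicious batch can be inspected before it touches the warehouse table.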
Data Quality Problems
Multiple identifiers
Multiple field names
Different units
Missing values
Orphaned transactions
Multipurpose fields
Conflicting data
Different update times
Data Quality Problems
Multiple identifiers:
– some data sources may use different primary keys for the same entity, such as different customer numbers.
Multiple field names:
– the same field may be represented using different field names.
Different units:
– measures and dimensions may have different units and granularities.
Missing values:
– data may not exist in some databases. To compensate for missing values, different default values may be used across data sources.
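The four problems above are usually handled during transformation. A minimal sketch, assuming two hypothetical source systems A and B: the field names, the key cross-reference table and the missing-value markers are invented for illustration.

```python
# Hypothetical records for the same customer, held in two source systems.
SOURCE_A = {"cust_no": "A-17", "qty_kg": 2.5, "phone": "N/A"}
SOURCE_B = {"customer_id": "9001", "quantity_g": 2500, "phone": "-1"}

# Multiple field names: map each source field onto one common name.
FIELD_MAP = {
    "A": {"cust_no": "cust_key", "qty_kg": "qty_kg", "phone": "phone"},
    "B": {"customer_id": "cust_key", "quantity_g": "qty_kg", "phone": "phone"},
}
# Different units: divisor that converts a source field into kilograms.
TO_KG_DIVISOR = {"A": {}, "B": {"quantity_g": 1000}}
# Multiple identifiers: cross-reference from (source, source key) to a master key.
KEY_XREF = {("A", "A-17"): "C001", ("B", "9001"): "C001"}
# Missing values: the default each source uses to mean "unknown".
MISSING_MARKERS = {"A": {"N/A"}, "B": {"-1"}}


def harmonise(source: str, record: dict) -> dict:
    """Return the record with common field names, units, keys and missing values."""
    out = {}
    for field, value in record.items():
        if value in MISSING_MARKERS[source]:
            value = None  # one representation of "missing" across all sources
        divisor = TO_KG_DIVISOR[source].get(field)
        if divisor is not None and value is not None:
            value = value / divisor  # convert to the common unit
        out[FIELD_MAP[source][field]] = value  # rename to the common field name
    out["cust_key"] = KEY_XREF[(source, out["cust_key"])]  # map to the master key
    return out


print(harmonise("A", SOURCE_A))
print(harmonise("B", SOURCE_B))
```

Both calls return the same harmonised record, which is the point: the warehouse sees one customer, one unit and one notion of "missing".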
Data Quality Problems
Orphaned transactions:
– some transactions may be missing important parts, such as an order without a customer.
Multipurpose fields:
– some databases may combine data into one field, such as the different components of an address.
Conflicting data:
– some data sources may have conflicting data, such as different customer addresses.
Different update times:
– some data sources may perform updates at different intervals.
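The remaining four problems can be detected or resolved with simple rules. A minimal sketch on hypothetical data: it assumes addresses are stored as "street, town, county", and it resolves conflicts with a "most recently updated wins" rule, which is only one possible policy (trusting a designated master source is another).

```python
from datetime import date

# Hypothetical data illustrating orphaned transactions.
customers = {"C001", "C002"}
orders = [("O1", "C001"), ("O2", "C999"), ("O3", None)]


def orphaned(orders, customers):
    """Orphaned transactions: orders whose customer is missing or unknown."""
    return [order_id for order_id, cust in orders if cust not in customers]


def split_address(field):
    """Multipurpose field: split 'street, town, county' (assumes that layout)."""
    street, town, county = (part.strip() for part in field.split(",", 2))
    return {"street": street, "town": town, "county": county}


def resolve_conflict(versions):
    """Conflicting data with different update times: keep the latest value.

    `versions` is a list of (value, last_updated) pairs from different sources.
    """
    value, _ = max(versions, key=lambda v: v[1])
    return value


print(orphaned(orders, customers))
print(split_address("1 Main St, Mallow, Cork"))
print(resolve_conflict([
    ("1 Main St, Mallow, Cork", date(2009, 5, 1)),  # e.g. from the finance system
    ("4 Quay Rd, Cobh, Cork", date(2010, 1, 15)),   # e.g. from the shipping system
]))
```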
Example 1 – the supplier file
OLD: Sup code (4 digits) | Sup name | Sup address | City | Phone
NEW: Sup code (3 letters + 4 digits) | Sup name | Sup address | … | Phone | Cat (1, 2 or 3, depending on total purchases last year)
New supplier code to include the city where the firm is based
Assignment of category based on amounts purchased
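The OLD-to-NEW conversion can be sketched as a small transformation function. The slide does not say how the three letters are chosen or where the category boundaries lie, so this sketch assumes the first three letters of the city name, hypothetical euro thresholds, and that category 1 means the largest purchases.

```python
# Hypothetical thresholds (total purchases last year, in euro); the slide gives none.
CATEGORY_THRESHOLDS = [(100_000, 1), (10_000, 2)]  # anything lower is category 3


def new_supplier_code(old_code: str, city: str) -> str:
    """Prefix the existing 4-digit code with 3 letters for the supplier's city.

    Taking the first three letters of the city name is an assumption; a
    lookup table of agreed city codes would avoid clashes.
    """
    if not (len(old_code) == 4 and old_code.isdigit()):
        raise ValueError(f"expected a 4-digit supplier code, got {old_code!r}")
    return city.strip()[:3].upper() + old_code


def supplier_category(purchases_last_year: float) -> int:
    """Return 1, 2 or 3 depending on the amount purchased last year."""
    for threshold, category in CATEGORY_THRESHOLDS:
        if purchases_last_year >= threshold:
            return category
    return 3


def convert(old_row: dict, purchases: dict) -> dict:
    """Map one OLD supplier record to the NEW layout; other fields pass through."""
    new_row = dict(old_row)
    new_row["sup_code"] = new_supplier_code(old_row["sup_code"], old_row["city"])
    new_row["cat"] = supplier_category(purchases.get(old_row["sup_code"], 0))
    return new_row


old = {"sup_code": "0042", "sup_name": "Flour Mills Ltd", "city": "Cork"}
print(convert(old, {"0042": 25_000}))
```

Note that the category has to be recalculated every year, so it is a derived field that the refresh process must maintain, not a one-off conversion.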
Example 2: merging files
Complete customer file based on Accounts, Sales and Shipping
OLD (finance): CustID | name | address | city | account number | credit limit | balance
OLD (sales): CustID* | name | address | city | discount rates | sales_to_date | rep_name
OLD (shipping): CustID** | name | address | city | Preferred haulier
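One way to build the complete customer file is sketched below. It assumes the asterisks mark keys that differ between files, so a hypothetical cross-reference table maps CustID* and CustID** onto the finance CustID; it also assumes finance is the master for name, address and city. All records are invented.

```python
# Hypothetical extracts from the three OLD customer files.
finance = {"F100": {"name": "Acme Bakery", "address": "1 Main St", "city": "Cork",
                    "account_number": "ACC-1", "credit_limit": 5000, "balance": 1200}}
sales = {"S7": {"name": "ACME Bakery Ltd", "address": "1 Main Street", "city": "Cork",
                "discount_rates": 0.05, "sales_to_date": 8400, "rep_name": "Rep A"},
         "S8": {"name": "Unknown Deli", "address": "2 Bridge St", "city": "Cork",
                "discount_rates": 0.0, "sales_to_date": 300, "rep_name": "Rep A"}}
shipping = {"SH3": {"name": "Acme", "address": "1 Main St", "city": "Cork",
                    "preferred_haulier": "Haulier X"}}

# Hypothetical cross-reference from CustID* (sales) and CustID** (shipping) to CustID.
XREF = {"S7": "F100", "SH3": "F100"}
SHARED = {"name", "address", "city"}  # attributes held in all three files


def merge_customers(finance, sales, shipping, xref):
    """Build one complete customer file, keyed on the finance CustID."""
    merged = {cust_id: dict(row) for cust_id, row in finance.items()}
    unmatched = []
    for source_name, source in (("sales", sales), ("shipping", shipping)):
        for src_id, row in source.items():
            cust_id = xref.get(src_id)
            if cust_id not in merged:
                unmatched.append((source_name, src_id))  # needs manual review
                continue
            for field, value in row.items():
                if field not in SHARED:  # finance wins on name/address/city
                    merged[cust_id][field] = value
    return merged, unmatched


merged, unmatched = merge_customers(finance, sales, shipping, XREF)
print(merged["F100"])
print(unmatched)
```

The unmatched list is where most of the real cleaning effort goes: records such as "S8" have no counterpart in finance and need a human decision.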
Example 3: customer files
Hi,
You know that SAP plans to end support for its R/3 4.6c software at the end of 2010 and R/3 4.7 by the end of
Now you need to reach companies that are using SAP R/3 systems and convince them to upgrade their software to meet their emerging regulatory or business requirements. How do you reach them?
You no doubt use your in-house database of customers who procured the software from you to reach out. There are many other companies apart from those in your in-house list who are yet running on older versions. Do you want to reach them?
We can help you with a valid database of old version users. Our database has complete details including complete contact name, title, Job title, Address, Company, Postal Address, City, State/Province, ZIP/Postal Code, ZIP4, Country, Phone, Fax, Employees, and Sales; SIC Code, NAICS and Web Address.
If you are interested in this database, please call let me know.
Matt Brown, Ph: | Relationship Marketing (Data Services)| NewYork Business Center Inc.| 430 Park Avenue| 18th Floor. New York| NY 10022
Refreshing databases
Timing
Criticality of information
Volume of data
Response time
Real-time requirement
Level of aggregation / granularity
Life cycle of the DW
Operational databases → warehouse database: first-time load, then refresh, refresh, refresh …, then purge or archive
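The life cycle above can be sketched with a toy in-memory warehouse. This assumes one common refresh technique, an incremental load driven by a "last modified" watermark; real warehouses may instead use full reloads or change logs. Row layout and dates are hypothetical.

```python
from datetime import date


class Warehouse:
    """Toy warehouse table keyed by id, with an archive and a refresh watermark."""

    def __init__(self):
        self.rows = {}
        self.archive = {}
        self.watermark = None  # newest 'modified' date loaded so far

    def first_time_load(self, source):
        """Copy every operational row into the warehouse."""
        for row in source:
            self.rows[row["id"]] = dict(row)
        self.watermark = max((r["modified"] for r in source), default=None)

    def refresh(self, source):
        """Incremental refresh: load only rows changed since the watermark."""
        changed = [r for r in source
                   if self.watermark is None or r["modified"] > self.watermark]
        for row in changed:
            self.rows[row["id"]] = dict(row)
        if changed:
            self.watermark = max(r["modified"] for r in changed)
        return len(changed)

    def purge_or_archive(self, cutoff):
        """Move rows last modified before `cutoff` into the archive."""
        old_ids = [k for k, r in self.rows.items() if r["modified"] < cutoff]
        for k in old_ids:
            self.archive[k] = self.rows.pop(k)
        return len(old_ids)


ops = [{"id": 1, "modified": date(2010, 1, 5), "qty": 10},
       {"id": 2, "modified": date(2010, 2, 1), "qty": 4}]
wh = Warehouse()
wh.first_time_load(ops)
ops[0] = {"id": 1, "modified": date(2010, 2, 20), "qty": 12}  # row updated
ops.append({"id": 3, "modified": date(2010, 2, 22), "qty": 7})  # row added
print(wh.refresh(ops))                         # rows 1 and 3 reloaded
print(wh.purge_or_archive(date(2010, 2, 10)))  # row 2 archived
```

With a date-level watermark, a change made later on the same day as the last load would be missed; finer timestamps or a change log avoid this.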
Real-time information
Up to date
On-line
Actual data
Live feed
Decisions made on what basis?
Real-time requirement?
Historical sales or accounting data: not real-time
Sales as quarter end approaches
Inventory levels for MRP
Exchange rates: when is the Visa rate calculated?
Real-time processing: card transactions down
Response times
Response times are a function of:
– response time
– Infrastructure elements
– Database sizing
– Transaction processing
– Interfaces
– Reporting
– Other processing demands
– Peak times
– …
Refresh Optimization
Determining the Refresh Frequency
Maximize net refresh benefit
– Value of data timeliness
– Cost of refresh
Satisfy data warehouse and source system constraints
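The trade-off on this slide can be sketched as a small search over candidate refresh intervals: net refresh benefit = value of data timeliness minus cost of refresh, subject to a source-system limit on refreshes per day. Every figure below is hypothetical, and the model assumes the average age of the data is half the refresh interval.

```python
# Hypothetical figures; none come from the slides.
MAX_VALUE = 1000.0         # value per day of perfectly current data
STALENESS_COST = 40.0      # value lost per hour of average data age
REFRESH_COST = 60.0        # cost of one refresh run
MAX_REFRESHES_PER_DAY = 8  # source-system constraint, e.g. load it can tolerate
CANDIDATE_INTERVALS = [1, 2, 3, 4, 6, 8, 12, 24]  # hours; each divides a day


def net_benefit(interval_h: int) -> float:
    """Value of timeliness minus cost of refresh, per day.

    Assumes the average age of the data is half the refresh interval.
    """
    refreshes = 24 // interval_h
    value = MAX_VALUE - STALENESS_COST * interval_h / 2
    return value - REFRESH_COST * refreshes


def best_interval() -> int:
    """Pick the feasible interval with the highest net refresh benefit."""
    feasible = [i for i in CANDIDATE_INTERVALS
                if 24 // i <= MAX_REFRESHES_PER_DAY]
    return max(feasible, key=net_benefit)


for i in CANDIDATE_INTERVALS:
    print(i, net_benefit(i))
print("best:", best_interval())
```

With these numbers, refreshing every 8 hours gives the highest net benefit; hourly and two-hourly refreshes are excluded by the source-system constraint before value is even considered.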