Data Enhancement
18th Meeting
Course Name: Business Intelligence
Year: 2009

Bina Nusantara University

Source of this Material
(2) Loshin, David (2003). Business Intelligence: The Savvy Manager’s Guide. Chapter 13.

The Business Case
There are two aspects to the business value of data enhancement. The first is that as organizational data environments mature and data managers want to exploit the corporate data asset, there is an increased need to share data among different groups. The second aspect emerges from the actionable knowledge that can be discovered only by analyzing the result of composing multiple data sets. Data enhancement is a critical component of the BI program, especially as a value-adding process in the following areas: competition in knowledge industries, customer relationship management, micromarketing and personalization, cooperative marketing, and industry deregulation.

Types of Data Enhancement
There are two approaches to data enhancement. One focuses on incrementally improving or adding information as data is viewed or processed. Incremental enhancements are useful as a component of a later analysis stage, such as sequence pattern analysis and behavior modeling. The other approach is batch enhancement, where data collections are aggregated and methods are applied to the collection to create value-added information. Here are some examples.
Auditing Enhancement: In business processes that require some degree of tracing capability, a frequent data enhancement is the addition of auditing data. Creating a tracking system associated with a sequence of related events provides a framework for evaluating efficiency within a business process.
Temporal Enhancement: Historical data provides critical insight for a BI program. In some cases the history is embedded in the collected data; in others, activity must be enhanced by incrementally adding timestamps that record the time at which each event occurred.
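As a minimal sketch of incremental temporal and auditing enhancement, the Python below attaches a timestamp to each event record as it is processed and keeps the records in sequence, so the elapsed time across a chain of related events can be measured. The class names (`EnhancedRecord`, `AuditLog`) and record fields (`customer_id`, `event`) are hypothetical; the clock is passed in as a parameter only so the behavior can be checked deterministically.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class EnhancedRecord:
    data: Dict[str, str]   # original record content (hypothetical fields)
    timestamp: datetime    # temporal enhancement: when the event occurred


@dataclass
class AuditLog:
    entries: List[EnhancedRecord] = field(default_factory=list)

    def record(self, data: Dict[str, str],
               clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc)) -> EnhancedRecord:
        """Attach a timestamp to a copy of the record and keep it in sequence order."""
        enhanced = EnhancedRecord(data=dict(data), timestamp=clock())
        self.entries.append(enhanced)
        return enhanced

    def elapsed_seconds(self) -> float:
        """Time between the first and last tracked events: a simple efficiency measure."""
        if len(self.entries) < 2:
            return 0.0
        return (self.entries[-1].timestamp - self.entries[0].timestamp).total_seconds()


if __name__ == "__main__":
    ticks = iter([datetime(2009, 1, 1, 9, 0, tzinfo=timezone.utc),
                  datetime(2009, 1, 1, 9, 30, tzinfo=timezone.utc)])
    log = AuditLog()
    log.record({"customer_id": "C1", "event": "order placed"}, clock=lambda: next(ticks))
    log.record({"customer_id": "C1", "event": "order shipped"}, clock=lambda: next(ticks))
    print(log.elapsed_seconds())  # 1800.0
```

Copying the input dictionary keeps the enhancement additive: the caller's original record is never modified.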

Types of Data Enhancement (cont…)
Contextual Enhancement: The place, or context, of data manipulation is an enhancement as well. A physical location, a path of access, and the login account through which a series of transactions was performed are examples of context that can augment data. Contextual enhancement also includes tagging data records so that they can be correlated with other pieces of data.
Geographic Enhancement: Data enhanced with geographic information allows for analysis based on regional clustering and for data inference based on predefined geodemographics. The first kind of geographic enhancement is address standardization, where addresses are cleansed and then modified to fit a predefined postal standard.
Demographic Enhancement: Demographics describe the similarities that exist within an entity cluster, such as customer age, marital status, gender, income, and ethnic coding. Demographic enhancements … or through direct information merging.
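Contextual enhancement can be sketched as a function that tags a transaction record with the context in which it was performed, followed by a query that correlates records through a shared tag. The three context fields mirror the examples in the text (location, access path, login account); the function names and record layout are illustrative assumptions.

```python
from typing import Any, Dict, List


def add_context(record: Dict[str, Any], location: str, access_path: str,
                login_account: str) -> Dict[str, Any]:
    """Return a copy of the record augmented with where, how, and by whom it was touched."""
    enhanced = dict(record)  # do not mutate the caller's record
    enhanced["context"] = {
        "location": location,
        "access_path": access_path,
        "login_account": login_account,
    }
    return enhanced


def records_by_account(records: List[Dict[str, Any]], account: str) -> List[Dict[str, Any]]:
    """Correlate records through a shared context tag (here, the login account)."""
    return [r for r in records if r.get("context", {}).get("login_account") == account]


if __name__ == "__main__":
    txns = [
        add_context({"txn_id": 1, "amount": 50}, "branch-12", "web", "alice"),
        add_context({"txn_id": 2, "amount": 75}, "branch-07", "atm", "bob"),
        add_context({"txn_id": 3, "amount": 20}, "branch-12", "web", "alice"),
    ]
    print([r["txn_id"] for r in records_by_account(txns, "alice")])  # [1, 3]
```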

Types of Data Enhancement (cont…)
Psychographic Enhancement: Psychographics describe what distinguishes individual entities within a cluster. Psychographic information is frequently collected via surveys, contest forms, customer service activity, and registration cards, as well as specialized lists. The trick to using psychographic data lies in linking each entity in the organization's database to the corresponding entry in the supplied psychographic data set.
Inference Enhancement: Information inference is a BI technique that allows the user to draw conclusions about the examined entity based on supporting evidence and business rules. Inferred knowledge can be used to augment data to reflect what has been learned, which in turn provides greater insight into solving the business problem at hand.

Incremental Enhancement
Incremental enhancements are those that can be attached to data while it is being processed.
Provenance: The provenance of an item is its source. This idea generalizes the temporal and auditing enhancements described earlier. A provenance can be as simple as a single string field describing the source, or as complex as a separate table, related to the record through a foreign key, that holds a timestamp and a location code for each update.
Audit Trails: Combining location, time, and activity information for a series of manipulations of a data record lets us trace every occasion on which that information was touched. This audit data shows how activities cause data to flow through a system.
Context: This kind of enhanced data provides significant marketing benefit, because context information can be fed into a statistical framework for reporting on user behavior by location or time of activity.
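The more complex form of provenance described above (a separate table holding a timestamp and a location code for each update, related to the record through a foreign key) can be sketched with Python's built-in `sqlite3` module. The table and column names (`customer`, `provenance`, `updated_at`, `location_code`) are hypothetical. Reading the provenance rows for one record in time order yields the audit trail the text describes.

```python
import sqlite3
from typing import List, Tuple


def build_schema(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        """CREATE TABLE provenance (
               id INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customer(id),
               updated_at TEXT NOT NULL,      -- ISO-8601 timestamp of the update
               location_code TEXT NOT NULL    -- where the update was made
           )"""
    )


def update_name(conn: sqlite3.Connection, customer_id: int, name: str,
                updated_at: str, location_code: str) -> None:
    """Apply an update and record its provenance in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("UPDATE customer SET name = ? WHERE id = ?", (name, customer_id))
        conn.execute(
            "INSERT INTO provenance (customer_id, updated_at, location_code) VALUES (?, ?, ?)",
            (customer_id, updated_at, location_code),
        )


def audit_trail(conn: sqlite3.Connection, customer_id: int) -> List[Tuple[str, str]]:
    """Return (timestamp, location) pairs for every recorded update of the record."""
    return conn.execute(
        "SELECT updated_at, location_code FROM provenance "
        "WHERE customer_id = ? ORDER BY updated_at",
        (customer_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    with conn:
        conn.execute("INSERT INTO customer (id, name) VALUES (1, 'J. Smith')")
    update_name(conn, 1, "John Smith", "2009-03-01T10:00:00", "LOC-A")
    update_name(conn, 1, "John A. Smith", "2009-03-05T14:30:00", "LOC-B")
    print(audit_trail(conn, 1))
    # [('2009-03-01T10:00:00', 'LOC-A'), ('2009-03-05T14:30:00', 'LOC-B')]
```

Writing the update and its provenance row in a single transaction keeps the trail complete: a failed provenance insert also undoes the data change.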

Batch Enhancements
Batch enhancements are applied to a large set of data instances as an offline process. They typically involve merging data from multiple instances within a single data set, or multiple data instances drawn from multiple data sets.
Householding: Householding is a process that attempts to reduce a set of individuals to a single grouped housing unit based on the database record attribution. A household consists of all people living as an entity within the same residence.
Organizational Merging: When organizations merge, they will eventually want to merge their vendor, customer, and employee databases as well as their base reference data.
Other Batch Enhancements: Other batch enhancements include data scrubbing, data cleansing, and health care diagnosis assistance, as well as building affinity programs and constructing relational associations, among others.
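Householding can be sketched as a batch pass that groups individual records by a normalized residence key. This minimal version assumes that a cleaned street line plus the five-digit ZIP code identifies a residence; real householding rules are usually richer, and the field names (`name`, `street`, `zip`) are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List


def residence_key(street: str, zip_code: str) -> str:
    """Normalize an address into a comparison key (case, punctuation, spacing)."""
    cleaned = re.sub(r"[^\w\s]", "", street.upper())  # drop punctuation such as '.'
    cleaned = " ".join(cleaned.split())               # collapse repeated whitespace
    return f"{cleaned}|{zip_code[:5]}"                # ignore any ZIP+4 suffix


def household(people: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Reduce a set of individuals to households keyed by residence."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for person in people:
        groups[residence_key(person["street"], person["zip"])].append(person["name"])
    return dict(groups)


if __name__ == "__main__":
    people = [
        {"name": "Ann Lee", "street": "12 Oak St.", "zip": "60601"},
        {"name": "Bo Lee", "street": "12  oak st", "zip": "60601-1234"},
        {"name": "Cy Park", "street": "9 Elm Ave", "zip": "60602"},
    ]
    for key, members in household(people).items():
        print(key, members)
```

Because the key depends on cleaned addresses, householding quality rests on the address standardization step that usually precedes it.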

Standardization
Standardization refers to ensuring that a data instance conforms to a predefined expected format. A data standard is a format representation for data values that can be described using a series of rules. Because a standard is a distinct model to which all items in a set must conform, we can try to automate two components of any standardization process: (1) determining whether a data instance conforms to the standard, and (2) bringing a nonstandard data instance into conformance with the standard. There is usually a well-defined rule set describing both how to determine whether an item conforms to the standard and what actions must be taken to bring an offending item into conformance.
Data Standard and Standardization: The value of data standardization lies in the notion that, given the right base of reference information and a well-defined rule set, additional data can be added to a record in a purely automated way. Probably the most important benefit of standardization is that, through the process of defining standards, organizations create a streamlined means for transferring and sharing information.
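The two automatable components named above can be sketched as a tiny rule set: one pattern that defines conformance, and a list of rules that each recognize a nonstandard format and rewrite it into the standard. The example standard (dates written as `YYYY-MM-DD`) and the single rewrite rule (from `MM/DD/YYYY`) are assumptions chosen for illustration, not rules taken from the source.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical standard: dates must have the shape YYYY-MM-DD.
STANDARD = re.compile(r"^\d{4}-\d{2}-\d{2}$")

# Each rule pairs a recognizer for a nonstandard format with a rewrite into the standard.
Rule = Tuple[re.Pattern, Callable[[re.Match], str]]
RULES: List[Rule] = [
    # MM/DD/YYYY -> YYYY-MM-DD (assumes month-first order)
    (re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4})$"),
     lambda m: f"{m.group(3)}-{int(m.group(1)):02d}-{int(m.group(2)):02d}"),
]


def conforms(value: str) -> bool:
    """Component 1: determine whether the value conforms to the standard."""
    return STANDARD.match(value) is not None


def standardize(value: str) -> Optional[str]:
    """Component 2: bring a value into conformance, or return None if no rule applies."""
    if conforms(value):
        return value
    for pattern, rewrite in RULES:
        match = pattern.match(value)
        if match:
            return rewrite(match)
    return None  # no rule applies: flag the value for manual review


if __name__ == "__main__":
    for raw in ["3/7/2009", "2009-03-07", "March 7"]:
        print(raw, "->", standardize(raw))
```

The rewrite assumes month-first order; a source that writes dates day-first would need its own rule, which is exactly the kind of convention a documented rule set has to pin down.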

Standardization
Kinds of Standards: Most standards are dictated by some authority (such as the government), developed through cooperation (such as an industry-defined standard), or derived from common use (such as regional conventions for writing dates).

Example: Address Standardization
In this section, we look at the different components of an address.
The Address Standard:
Recipient line: The recipient line indicates the person or entity to which the mail is to be delivered.
Delivery address line: The delivery address line contains the specific location associated with the recipient.
Last line: The last line of the address includes the city name, state, and ZIP code.
Standard Abbreviations: The postal service provides a set of standard abbreviations, including U.S. state and possession abbreviations, street abbreviations, and common business word abbreviations.
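Applying standard abbreviations can be sketched as a token-by-token table lookup on the delivery address line and a name-to-code lookup for the state on the last line. The few entries below are real USPS street-suffix, directional, and state abbreviations, but the tables are a deliberately tiny excerpt; a working tool would load the complete published lists.

```python
import re
from typing import Dict

# Small excerpt of USPS-style abbreviations (illustrative, not complete).
STREET_ABBREVIATIONS: Dict[str, str] = {
    "STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD",
    "ROAD": "RD", "DRIVE": "DR", "LANE": "LN",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
}
STATE_ABBREVIATIONS: Dict[str, str] = {
    "WISCONSIN": "WI", "ILLINOIS": "IL", "NEW YORK": "NY",
}


def standardize_delivery_line(line: str) -> str:
    """Uppercase, drop periods and commas, and abbreviate known words."""
    tokens = re.sub(r"[.,]", "", line.upper()).split()
    return " ".join(STREET_ABBREVIATIONS.get(t, t) for t in tokens)


def standardize_state(state: str) -> str:
    """Map a full state name to its two-letter code; return unknown values uppercased."""
    key = " ".join(state.upper().split())
    return STATE_ABBREVIATIONS.get(key, key)


if __name__ == "__main__":
    print(standardize_delivery_line("1200 North Main Street."))  # 1200 N MAIN ST
    print(standardize_state("wisconsin"))                         # WI
```

A blind word-by-word lookup would also abbreviate a word like "East" when it is part of a street name, so real standardizers parse the line into components (number, directional, name, suffix) before abbreviating.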

Example: Address Standardization (cont…)
ZIP + 4: ZIP codes are postal codes assigned to delivery areas to improve the precision of sorting and delivering mail. ZIP + 4 codes are a further refinement, narrowing a delivery location down to a subsection of a building or a street.
Address Standardization Software: Because the USPS addressing standard is so well documented, it is relatively straightforward to build automated address standardization software, which makes this enhancement easier to perform.
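One small piece of such software is a format check for ZIP and ZIP + 4 codes. The sketch below only validates and splits the format (five digits, optionally followed by four more, with or without a hyphen) and renders it in the hyphenated form; it does not verify that a code is actually assigned, which would require the postal service's reference data. The names `ZipCode`, `parse_zip`, and `format_zip` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

ZIP_PATTERN = re.compile(r"^(\d{5})(?:-?(\d{4}))?$")


class ZipCode(NamedTuple):
    zip5: str
    plus4: Optional[str]  # None when only the five-digit ZIP is present


def parse_zip(raw: str) -> Optional[ZipCode]:
    """Split a ZIP or ZIP + 4 string into its parts; return None if malformed."""
    match = ZIP_PATTERN.match(raw.strip())
    if not match:
        return None
    return ZipCode(match.group(1), match.group(2))


def format_zip(z: ZipCode) -> str:
    """Render in the standard 'NNNNN' or 'NNNNN-NNNN' form."""
    return z.zip5 if z.plus4 is None else f"{z.zip5}-{z.plus4}"


if __name__ == "__main__":
    print(format_zip(parse_zip("606011234")))  # 60601-1234
    print(parse_zip("6060"))                   # None
```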

Enhancement Methodologies
There are many issues involved in data enhancement, but because a large number of them revolve around record linkage, it is worthwhile to explore that topic in greater detail.
Record Linkage: Any two records that can be connected based on a set of chosen attributes are candidates to be linked together. Usually record linkage is performed only when the chosen attributes match exactly, but such simple record linkage is limited for the following reasons: information is missing, information sources are in different formats, record linkage is imprecise, information is out of synchronization, and information is lost.
Semistructured Data: Semistructured data refers to information that is partially formatted, such as data elements on a web page or the comments field in a customer service database.
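To make the limits of exact-match linkage concrete, the sketch below links records from two sources only when every chosen attribute matches exactly, optionally after a normalization step. Records with a missing chosen attribute are skipped, since missing information cannot be matched. The attribute names and the normalization (case-folding plus whitespace collapsing) are illustrative assumptions; normalization helps only with the "different formats" problem, not with imprecise or out-of-sync information.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Record = Dict[str, str]


def _link_key(record: Record, keys: Sequence[str],
              normalize: Callable[[str], str]) -> Optional[Tuple[str, ...]]:
    """Build the comparison key; None if any chosen attribute is missing or empty."""
    values = tuple(normalize(record.get(k, "")) for k in keys)
    return None if any(v == "" for v in values) else values


def link_exact(left: List[Record], right: List[Record], keys: Sequence[str],
               normalize: Callable[[str], str] = lambda v: v) -> List[Tuple[Record, Record]]:
    """Pair records whose chosen attributes match exactly (after optional normalization)."""
    index: Dict[Tuple[str, ...], List[Record]] = {}
    for rec in right:
        key = _link_key(rec, keys, normalize)
        if key is not None:
            index.setdefault(key, []).append(rec)
    pairs: List[Tuple[Record, Record]] = []
    for rec in left:
        key = _link_key(rec, keys, normalize)
        if key is None:
            continue  # missing information: this record cannot be linked
        for match in index.get(key, []):
            pairs.append((rec, match))
    return pairs


def simple_normalize(value: str) -> str:
    """Case-fold and collapse whitespace so trivial formatting differences do not block a match."""
    return " ".join(value.casefold().split())


if __name__ == "__main__":
    crm = [{"name": "Mary  Jones", "zip": "53703"}]
    billing = [{"name": "MARY JONES", "zip": "53703"}]
    print(len(link_exact(crm, billing, ["name", "zip"])))                    # 0
    print(len(link_exact(crm, billing, ["name", "zip"], simple_normalize)))  # 1
```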

Enhancement Methodologies
Semistructured data may be a good source of both association and relation information, but extracting information from such data is particularly difficult.
Inference: An inference is the application of a heuristic rule that essentially creates a piece of information where it did not exist before. Even though inferencing represents the application of intuition, it is applied in a way that can be automated. Inference rules usually reflect some understood business analysis that can be boiled down to a set of business rules.
Types of Inference: Enhancements based on inferencing are usually very focused bits of information relevant within a particular analytical context. Inferences are likely to center on demographic or psychographic details that can be derived as a direct result of data merging and analysis.
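An inference enhancement can be sketched as a list of business rules, each of which inspects a record and, when its condition holds, adds a derived attribute that did not exist before. Both rules below, and the attribute names they read and write, are hypothetical examples rather than rules from the source.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class InferenceRule:
    name: str
    condition: Callable[[Record], bool]  # supporting evidence the rule requires
    attribute: str                       # new attribute the rule creates
    value: Any                           # inferred value


def apply_rules(record: Record, rules: List[InferenceRule]) -> Record:
    """Return a copy of the record augmented with every inference whose condition holds."""
    enhanced = dict(record)
    for rule in rules:
        # Never overwrite existing data; later rules can see earlier inferences.
        if rule.attribute not in enhanced and rule.condition(enhanced):
            enhanced[rule.attribute] = rule.value
    return enhanced


# Hypothetical rules for illustration only.
INFERENCE_RULES = [
    InferenceRule("young family",
                  lambda r: bool(r.get("bought_infant_products")) and r.get("household_size", 0) >= 3,
                  "life_stage", "young family"),
    InferenceRule("frequent traveler",
                  lambda r: r.get("flights_last_year", 0) >= 10,
                  "segment", "frequent traveler"),
]

if __name__ == "__main__":
    customer = {"customer_id": "C7", "bought_infant_products": True,
                "household_size": 4, "flights_last_year": 2}
    print(apply_rules(customer, INFERENCE_RULES))
```

Keeping the rules as data rather than hard-coded branches makes it easier for analysts to review and change the business logic behind each inference.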

Management Issues
Buy versus Build: In the software and services market, the term data enhancement is overloaded; it can refer to anything from data cleansing and address standardization all the way to services-based record linkage used to add data fields, such as credit ratings, to submitted data.
Performance Issues: Some data enhancement applications are likely to have high computational complexity, so team members should be familiar with high-performance computing as well as database manipulation, ETL, and pattern matching.

End of Slide