UNIT – I Data Warehouse and data mining


1 UNIT – I Data Warehouse and data mining
DEPARTMENT OF MANAGEMENT STUDIES
Dr. G. MAHESWARAN/AP 6/4/2018

2 DATA WAREHOUSE
A data warehouse essentially combines information from several sources into one comprehensive database. For example, in the business world, a data warehouse might incorporate customer information from a company's point-of-sale systems (the cash registers), its website, its mailing lists and its comment cards. Alternatively, it might incorporate all the information about employees, including time cards, demographic data, salary information, etc. By combining all of this information in one place, a company can analyze its customers in a more holistic way, ensuring that it has considered all the information available. Data warehousing also makes data mining possible, which is the task of looking for patterns in the data that could lead to higher sales and profits.
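
As a minimal sketch of the integration idea on this slide, the Python below merges customer records from three sources into one combined view keyed by a shared customer identifier. The source names, field names and sample values are all hypothetical, and a real warehouse would also have to reconcile conflicting or missing keys.

```python
from collections import defaultdict

# Hypothetical extracts from three separate source systems.
point_of_sale = [
    {"customer_id": 1, "last_purchase": "printer"},
    {"customer_id": 2, "last_purchase": "laptop"},
]
website = [
    {"customer_id": 1, "pages_viewed": 14},
]
mailing_list = [
    {"customer_id": 2, "subscribed": True},
]


def integrate(*sources):
    """Combine records that share a customer_id into one record each."""
    combined = defaultdict(dict)
    for source in sources:
        for record in source:
            combined[record["customer_id"]].update(record)
    return dict(combined)


warehouse = integrate(point_of_sale, website, mailing_list)
for customer_id, record in sorted(warehouse.items()):
    print(customer_id, record)
```

Each customer ends up with a single record that carries fields from every source in which that customer appears, which is the "one place" view the slide describes.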

3 CONNECT THE WORDS
Collections of databases that work together are called data warehouses. This makes it possible to integrate data from multiple databases.
Data mining is a broad set of activities used to uncover patterns in, and give meaning to, data. It is used to help individuals and organizations make better decisions.
The data warehouse, on the other hand, is a repository for information that may be used, among other things, to support data mining. ... Often this sort of analysis is referred to as data mining. Examples: Google, Bing.
The basic difference between the two is that data mining is the process of extracting meaningful data from a large database or data warehouse, whereas a data warehouse provides an environment in which data is stored in an integrated form, which makes it easier for data mining to extract data efficiently.

4 A producer wants to know….
Which are our lowest/highest margin customers?
Who are my customers and what products are they buying?
What is the most effective distribution channel?
What product promotions have the biggest impact on revenue?
Which customers are most likely to go to the competition?
What impact will new products/services have on revenue and margins?

5 Data, Data everywhere yet ...
I can’t find the data I need
  data is scattered over the network
  many versions, subtle differences
I can’t get the data I need
  need an expert to get the data
I can’t understand the data I found
  available data poorly documented
I can’t use the data I found
  results are unexpected
  data needs to be transformed from one form to another

6 Evolution of Database Technology
1960s: Data collection, database creation, IMS and network DBMS
1970s: Relational data model, relational DBMS implementation
1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s-2000s: Data mining and data warehousing, multimedia databases, and Web databases

7 DATA WAREHOUSE
A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.
The data warehouse is that portion of an overall Architected Data Environment that serves as the single integrated source of data for processing information.

8 CHARACTERISTICS OF A DATA WAREHOUSE
Subject-Oriented, Integrated, Non-Volatile, Time-Variant, Accessible, Process-Oriented
Subject-Oriented: Information is presented according to specific subjects or areas of interest, not simply as computer files. Data is manipulated to provide information about a particular subject. For example, the SRDB is not simply made accessible to end-users, but is given structure and organized according to specific needs.
Integrated: A single source of information for understanding multiple areas of interest. The data warehouse provides one-stop shopping and contains information about a variety of subjects. Thus the OIRAP data warehouse has information on students, faculty and staff, instructional workload, and student outcomes.
Non-Volatile: Stable information that doesn’t change each time an operational process is executed. Information is consistent regardless of when the warehouse is accessed.
Time-Variant: Containing a history of the subject, as well as current information. Historical information is an important component of a data warehouse.
Accessible: The primary purpose of a data warehouse is to provide readily accessible information to end-users.
Process-Oriented: It is important to view data warehousing as a process for delivery of information. The maintenance of a data warehouse is ongoing and iterative in nature.

9 TERMS RELATED TO DATA WAREHOUSE
DATA MART, STAGING AREA, OLAP, OLAP TOOLS
Data Mart: A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single analytic application used by a distinct set of workers.
Staging Area: Any data store that is designed primarily to receive data into a warehousing environment.
OLAP (On-Line Analytical Processing): A method for performing multidimensional analysis, that is, manipulating information by a variety of relevant categories or “dimensions” to facilitate analysis and understanding of the underlying data. It is also sometimes referred to as “drilling down”, “drilling across” and “slicing and dicing”.
OLAP Tools: A set of software products that attempt to facilitate multidimensional analysis. They can incorporate data acquisition, data access, data manipulation, or any combination thereof.
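
To make the OLAP terms above concrete, here is a small sketch of multidimensional analysis in plain Python. The fact rows, the three dimensions (region, product, quarter) and the helper names `aggregate` and `slice_rows` are assumptions made for illustration; OLAP tools perform the same kind of grouping over far larger data.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure (sales).
facts = [
    {"region": "North", "product": "Tea", "quarter": "Q1", "sales": 100},
    {"region": "North", "product": "Coffee", "quarter": "Q1", "sales": 150},
    {"region": "South", "product": "Tea", "quarter": "Q1", "sales": 80},
    {"region": "South", "product": "Tea", "quarter": "Q2", "sales": 120},
]


def aggregate(rows, dimensions):
    """Total the sales measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["sales"]
    return dict(totals)


def slice_rows(rows, dimension, value):
    """Slice: keep only the rows where one dimension has a fixed value."""
    return [row for row in rows if row[dimension] == value]


# Summary level: one total per region.
by_region = aggregate(facts, ["region"])
# Drill down: break each region's total out by product.
by_region_product = aggregate(facts, ["region", "product"])
# Slice on product = Tea, then summarize the slice by quarter.
tea_by_quarter = aggregate(slice_rows(facts, "product", "Tea"), ["quarter"])

print(by_region)
print(by_region_product)
print(tea_by_quarter)
```

Changing the list of dimensions passed to `aggregate` is what moves the view between summary and detail; fixing one dimension with `slice_rows` narrows the view before summarizing.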

10 COMPONENTS OF DATAWAREHOUSE SYSTEM
[Diagram: components of a data warehouse system. Recoverable labels:]
Sources: Operational, External & other Databases
Data Acquisition: Capture, Clean, Transform, Transport
Data stores: Analytical Data Store, Enterprise Warehouse, Data Marts
Data Analysis: Query, Report, Analyze, Mine, Deliver
Metadata: Metadata Directory, Metadata Repository
Other labels: Data Management, Metadata Management, Web Information Systems, Warehouse Design
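
The acquisition steps named on this slide (capture, clean, transform, transport) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the raw rows, the field names, the DD/MM/YYYY input date format and the list standing in for a warehouse table are all hypothetical.

```python
# Hypothetical raw rows as captured from an operational system.
raw_rows = [
    {"name": "  alice ", "amount": "120.50", "date": "04/06/2018"},
    {"name": "BOB", "amount": "", "date": "05/06/2018"},  # missing amount
    {"name": "carol", "amount": "75", "date": "06/06/2018"},
]


def clean(rows):
    """Drop rows that lack a required amount."""
    return [row for row in rows if row["amount"].strip()]


def transform(rows):
    """Normalize names, parse amounts, and convert DD/MM/YYYY dates to ISO form."""
    prepared = []
    for row in rows:
        day, month, year = row["date"].split("/")
        prepared.append({
            "name": row["name"].strip().title(),
            "amount": float(row["amount"]),
            "date": f"{year}-{month}-{day}",
        })
    return prepared


def transport(rows, target):
    """Append prepared rows to the target store (a list standing in for a warehouse table)."""
    target.extend(rows)
    return target


warehouse_table = transport(transform(clean(raw_rows)), [])
for row in warehouse_table:
    print(row)
```

Keeping each step as its own function mirrors the diagram: a row either passes through every stage in order or is rejected before it reaches the warehouse.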

11 HOW IS THE WAREHOUSE DIFFERENT?
The data warehouse is distinctly different from the operational data used and maintained by day-to-day operational systems. Data warehousing is not simply an “access wrapper” for operational data, where data is simply “dumped” into tables for direct access.

12 OPERATIONAL DATA vs. DATA WAREHOUSE
OPERATIONAL DATA | DATA WAREHOUSE
Application oriented | Subject oriented
Detailed | Summarized
Accurate, as of the moment of access | Represents values over time
Serves the clerical community | Serves the managerial community
Performance sensitive (immediate response required when entering a transaction) | Performance relaxed (immediacy not required)
Flexible structure; variable contents | Static structure
Small amount of data used in a process | Large amount of data used in a process

14 Motivation: “Necessity is the Mother of Invention”
Data explosion problem
  Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
  Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

15 What Is Data Mining?
Data mining (knowledge discovery in databases):
  Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases
Alternative names:
  Data mining: a misnomer?
  Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
What is not data mining?
  (Deductive) query processing
  Expert systems or small ML/statistical programs

16 Why Data Mining? Other Potential Applications
Database analysis and decision support
  Market analysis and management
    target marketing, customer relationship management, market basket analysis, cross selling, market segmentation
  Risk analysis and management
    forecasting, customer retention, improved underwriting, quality control, competitive analysis
  Fraud detection and management
Other Applications
  Text mining (news group, documents)
  Stream data mining
  Web mining
  DNA data analysis

17 Market Analysis and Management
Where are the data sources for analysis?
  Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
Target marketing
  Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.
Determine customer purchasing patterns over time
  Conversion of a single to a joint bank account: marriage, etc.
Cross-market analysis
  Associations/co-relations between product sales
  Prediction based on the association information
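
The associations between product sales mentioned under cross-market analysis are commonly measured with support and confidence. The sketch below computes both over a handful of hypothetical transactions; the product names and the helper functions `support` and `confidence` are illustrative assumptions, not part of the original slides.

```python
# Hypothetical transactions, each a set of products bought together.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)


print(support({"bread", "butter"}))
print(confidence({"bread"}, {"butter"}))
```

A rule such as "customers who buy bread also buy butter" is usually considered for prediction only when both its support and its confidence exceed thresholds chosen by the analyst.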

18 Market Analysis and Management
Customer profiling
  data mining can tell you what types of customers buy what products (clustering or classification)
Identifying customer requirements
  identifying the best products for different customers
  use prediction to find what factors will attract new customers
Provides summary information
  various multidimensional summary reports
  statistical summary information (data central tendency and variation)
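
As a small illustration of the statistical summary information mentioned above, the following uses Python's standard `statistics` module to report central tendency (mean, median) and variation (standard deviation, range). The spending figures are made up for the example.

```python
import statistics

# Hypothetical monthly spend per customer.
spend = [120, 95, 130, 110, 400, 105]

summary = {
    "mean": statistics.mean(spend),
    "median": statistics.median(spend),
    "stdev": statistics.stdev(spend),  # sample standard deviation
    "range": max(spend) - min(spend),
}

for name, value in summary.items():
    print(name, value)
```

The gap between the mean and the median in this sample comes from one unusually large value, which is why summaries normally report a measure of variation alongside the central value.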

19 Corporate Analysis and Risk Management
Finance planning and asset evaluation
  cash flow analysis and prediction
  contingent claim analysis to evaluate assets
  cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
Resource planning
  summarize and compare the resources and spending
Competition
  monitor competitors and market directions
  group customers into classes and a class-based pricing procedure
  set pricing strategy in a highly competitive market

20 Fraud Detection and Management
Applications
  widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.
Approach
  use historical data to build models of fraudulent behavior and use data mining to help identify similar instances
Examples
  auto insurance: detect a group of people who stage accidents to collect on insurance
  money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)
  medical insurance: detect professional patients, rings of doctors and rings of references

21 Fraud Detection and Management
Detecting inappropriate medical treatment
  Australian Health Insurance Commission identifies that in many cases blanket screening tests were requested (save Australian $1m/yr.).
Detecting telephone fraud
  Telephone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm.
  British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion dollar fraud.
Retail
  Analysts estimate that 38% of retail shrink is due to dishonest employees.
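
One simple way to "analyze patterns that deviate from an expected norm" is to compare new observations against the mean and spread of past behavior. The sketch below does this for call durations; the data, the three-standard-deviation threshold and the function name `flag_deviations` are assumptions chosen for illustration, and production fraud systems use much richer call models.

```python
import statistics

# Hypothetical past call durations (minutes) for one account.
history = [3, 5, 4, 6, 5, 4, 5, 3, 4, 5]
# New calls to check; the last one is unusually long.
new_calls = [4, 6, 45]


def flag_deviations(baseline, observations, threshold=3.0):
    """Return observations more than `threshold` standard deviations from the baseline mean."""
    mean = statistics.mean(baseline)
    spread = statistics.pstdev(baseline)
    return [x for x in observations if abs(x - mean) > threshold * spread]


suspicious = flag_deviations(history, new_calls)
print(suspicious)
```

The same comparison could be applied to the other attributes in the call model, such as destination or time of day, once each is given a numeric or categorical baseline.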

22 Some more Applications
Sports
  IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for New York Knicks and Miami Heat
Astronomy
  JPL and the Palomar Observatory discovered 22 quasars with the help of data mining
Internet Web Surf-Aid
  IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preference and behavior pages, analyzing effectiveness of Web marketing, improving Web site organization, etc.
