Data Mining Lecture 2

Course Syllabus

Course topics:
Introduction (Week 1-Week 2)
– What is Data Mining?
– Data Collection and Data Management Fundamentals
– The Essentials of Learning
– The Emerging Needs for Different Data Analysis Perspectives
Data Management and Data Collection Techniques for Data Mining Applications (Week 3-Week 4)
– Data Warehouses: Gathering Raw Data from Relational Databases and Transforming It into Information
– Information Extraction and Data Processing Techniques
– Data Marts: The Need for Building Highly Specialized Data Stores for Data Mining Applications

Week 2 - Data vs. Knowledge

Data:
– raw
– atomic
– (mostly!) operational
Information:
– processed
– re-organized
– grouped
Knowledge:
– patterns, models, and findings ‘behind’ the information
Wisdom:
– perfect orchestration of knowledge

Data (operational) → Information (analytic) → Knowledge → Wisdom

“Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?” (T. S. Eliot)

Week 2 - Evolution of Database and Information Systems
– 1960s (focus on efficient data collection): data collection, database creation, IMS and network DBMS
– 1970s (focus on structured data collection): relational data model, relational DBMS implementation
– 1980s (focus on information extraction): RDBMS, advanced data models (extended-relational, OO, deductive, etc.), and application-oriented DBMS (spatial, scientific, engineering, etc.)
– 1990s-2000s (focus on knowledge extraction and modeling): data mining, data warehousing, multidimensional databases

Week 2 - Data Collection and Data Management Fundamentals – What Is a Data Warehouse?

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (William H. Inmon)

Subject-oriented: A data warehouse is organized around major subjects, such as customer, supplier, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on modeling and analyzing data for decision makers.

Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and online transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
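The integration step can be sketched in Python, assuming two hypothetical sources: a relational table and a flat-file export that name, encode, and measure the same attributes differently. All column names and the "1 = male" coding are invented for illustration. Each source gets its own mapping function onto one shared warehouse schema.

```python
import csv
import io

# Hypothetical source 1: rows from a relational sales table.
# Gender is coded "M"/"F"; amounts are in dollars.
RELATIONAL_ROWS = [
    {"cust_id": 101, "gender": "M", "amount_usd": 25.50},
    {"cust_id": 102, "gender": "F", "amount_usd": 10.00},
]

# Hypothetical source 2: a flat-file export that uses different column
# names, codes gender as 1/0 (assumed here: 1 = male), and records cents.
FLAT_FILE = """customer,sex,amount_cents
103,0,1999
101,1,500
"""


def from_relational(row):
    # Naming convention: map source column names onto the warehouse schema.
    return {"customer_id": row["cust_id"],
            "gender": row["gender"],
            "amount": round(row["amount_usd"], 2)}


def from_flat_file(row):
    # Encoding structure: 1/0 -> "M"/"F"; attribute measure: cents -> dollars.
    return {"customer_id": int(row["customer"]),
            "gender": "M" if row["sex"] == "1" else "F",
            "amount": round(int(row["amount_cents"]) / 100, 2)}


def integrate():
    """Return one list of records that all follow the same schema."""
    unified = [from_relational(r) for r in RELATIONAL_ROWS]
    reader = csv.DictReader(io.StringIO(FLAT_FILE))
    unified.extend(from_flat_file(r) for r in reader)
    return unified


for record in integrate():
    print(record)
```

Customer 101 now appears in both sources with identical field names, codes, and units, so a later consolidation step could merge the records by `customer_id`.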

Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5–10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.

Nonvolatile: A data warehouse is always a physically separate store of data, transformed from the application data found in the operational environment. Because of this separation, a data warehouse does not require transaction processing, recovery, or concurrency control mechanisms. It usually requires only two data-access operations: the initial loading of data and access to data.
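A minimal sketch of the time-variant and nonvolatile properties, using a hypothetical in-memory `WarehouseTable` class (not any real product's API). It exposes only the two operations named above, loading and reading, and stamps every row with a snapshot date, so history accumulates instead of being overwritten.

```python
from datetime import date


class WarehouseTable:
    """Hypothetical append-only table: data can be loaded and read, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot_date, records):
        # Time-variant: every stored row carries an explicit time element.
        for rec in records:
            self._rows.append({"snapshot": snapshot_date, **rec})

    def query(self, predicate=lambda row: True):
        # Nonvolatile: read-only access that returns copies, so callers
        # cannot change stored history through the result.
        return [dict(row) for row in self._rows if predicate(row)]


wh = WarehouseTable()
# An operational system would overwrite the balance; the warehouse keeps both snapshots.
wh.load(date(2023, 1, 31), [{"customer_id": 101, "balance": 500}])
wh.load(date(2023, 2, 28), [{"customer_id": 101, "balance": 650}])
history = wh.query(lambda row: row["customer_id"] == 101)
print([(row["snapshot"].isoformat(), row["balance"]) for row in history])
```

Leaving out update and delete methods is the design choice that mirrors the slide: with no in-place changes, the sketch needs none of the locking or recovery machinery an OLTP system would.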

Figure: data cleaning, data integration, and data consolidation (steps in building a data warehouse).

Week 2 - Data Collection and Data Management Fundamentals – What Is OLAP?

Following an object-oriented methodology, entities are modeled as cubes and their attributes as dimensions.

taken from the Text Book

Multidimensional Database Modeling
– star schema
– snowflake schema
– fact constellation schema
Fact vs. dimension: facts are the numeric measures being analyzed (e.g., dollars sold), while dimensions are the perspectives along which they are analyzed (e.g., time, item, location).
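A star schema can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_time`, `dim_item`, `dim_location`) and all figures are hypothetical; the point is the shape: one central fact table holding foreign keys and numeric measures, surrounded by one table per dimension.

```python
import sqlite3


def build_star_schema():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables: descriptive attributes.
        CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
        CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, item_type TEXT);
        CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
        -- Central fact table: a foreign key per dimension plus numeric measures.
        CREATE TABLE fact_sales (
            time_key INTEGER REFERENCES dim_time(time_key),
            item_key INTEGER REFERENCES dim_item(item_key),
            location_key INTEGER REFERENCES dim_location(location_key),
            dollars_sold REAL,
            units_sold INTEGER
        );
    """)
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                     [(1, "Q1", 2023), (2, "Q2", 2023)])
    conn.executemany("INSERT INTO dim_item VALUES (?, ?, ?)",
                     [(1, "laptop", "computer"), (2, "phone", "mobile")])
    conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                     [(1, "Chicago", "USA"), (2, "Toronto", "Canada")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1200.0, 1), (1, 2, 2, 800.0, 2),
                      (2, 1, 2, 2400.0, 2), (2, 2, 1, 400.0, 1)])
    return conn


def sales_by_quarter_and_type(conn):
    # A typical star join: the fact table joined to each needed dimension, then aggregated.
    return conn.execute("""
        SELECT t.quarter, i.item_type, SUM(f.dollars_sold)
        FROM fact_sales f
        JOIN dim_time t ON f.time_key = t.time_key
        JOIN dim_item i ON f.item_key = i.item_key
        GROUP BY t.quarter, i.item_type
        ORDER BY t.quarter, i.item_type
    """).fetchall()


print(sales_by_quarter_and_type(build_star_schema()))
```

A snowflake schema would further normalize a dimension (for example, moving `country` out of `dim_location` into its own table), and a fact constellation would add a second fact table, such as shipping, that shares these dimension tables.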


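Week 2 - Data Collection and Data Management Fundamentals – OLAP Operations
– roll-up: summarize data by climbing up a concept hierarchy (e.g., city → country) or by removing a dimension
– drill-down: the reverse of roll-up; move from less detailed to more detailed data (e.g., quarter → month)
– slice: select on one dimension, producing a subcube
– dice: select on two or more dimensions, producing a subcube
– pivot (rotation): rotate the data axes to present the data from a different angle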
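The five operations can be sketched on a tiny cube stored as a Python dict whose keys are (time, location, item) tuples. This is a minimal illustration under stated assumptions: the sales figures and the two concept hierarchies (month → quarter, city → country) are hypothetical, and a real OLAP engine would precompute and index aggregates rather than scan every cell.

```python
from collections import defaultdict

# Hypothetical base cuboid: (month, city, item) -> dollars sold.
DIMS = ("time", "location", "item")
BASE = {
    ("Jan", "Chicago", "laptop"): 100,
    ("Mar", "Chicago", "laptop"): 70,
    ("Feb", "Chicago", "phone"): 50,
    ("Jan", "Toronto", "phone"): 30,
    ("Apr", "Toronto", "laptop"): 200,
    ("May", "Vancouver", "phone"): 80,
}
# Hypothetical concept hierarchies: month -> quarter, city -> country.
QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2", "May": "Q2", "Jun": "Q2"}
COUNTRY = {"Chicago": "USA", "Toronto": "Canada", "Vancouver": "Canada"}


def roll_up(cells, dim, hierarchy):
    """Climb one level up the hierarchy on `dim` and re-aggregate (sum)."""
    i = DIMS.index(dim)
    out = defaultdict(int)
    for key, value in cells.items():
        out[key[:i] + (hierarchy[key[i]],) + key[i + 1:]] += value
    return dict(out)


def drill_down(base_cells, dim, hierarchy, member):
    """Expand one coarse member (e.g. "Q1") back into its detailed base cells."""
    i = DIMS.index(dim)
    return {k: v for k, v in base_cells.items() if hierarchy[k[i]] == member}


def slice_cube(cells, dim, member):
    """Fix one dimension to a single value; that dimension disappears from the keys."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == member}


def dice(cells, criteria):
    """Select on two or more dimensions: criteria maps dim -> set of allowed values."""
    allowed = {DIMS.index(d): values for d, values in criteria.items()}
    return {k: v for k, v in cells.items()
            if all(k[i] in values for i, values in allowed.items())}


def pivot(cells, order):
    """Rotate the axes: re-key each cell with its dimensions in a new order."""
    idx = [DIMS.index(d) for d in order]
    return {tuple(k[i] for i in idx): v for k, v in cells.items()}


print(roll_up(BASE, "time", QUARTER))
print(slice_cube(BASE, "time", "Jan"))
```

For simplicity every function assumes its input is keyed in the base (time, location, item) order, so the 2-D result of a slice is not meant to be fed into the other functions.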


Week 2 - Data Collection and Data Management Fundamentals – What Is a Data Mart?
Data warehouse: collects information about subjects that span the entire organization; its scope is enterprise-wide.
– Which modeling schema? The fact constellation schema is commonly used, since it can model multiple, interrelated subjects.
Data mart: a departmental subset of the data warehouse that focuses on selected subjects; its scope is department-wide.
– Which modeling schema? The star or snowflake schema is commonly used, since both are geared toward modeling single subjects.

Week 2 - OLAP vs. Data Mining
On-Line Analytical Processing (OLAP) provides the ability to pose statistical and summary queries interactively (traditional On-Line Transaction Processing (OLTP) databases may take minutes or even hours to answer such queries).
Advantages relative to data mining:
– can obtain a wider variety of results
– generally faster to obtain results
Disadvantages relative to data mining:
– the user must “ask the right question”
– generally used to determine high-level statistical summaries, rather than specific relationships among instances

Week 2 - Reporting vs. Data Mining
Reporting:
– last month's sales for each service type
– sales per service, grouped by customer sex or age bracket
– list of customers who lapsed their policy
Data mining:
– What characteristics do customers who lapse their policy have in common, and how do they differ from customers who renew their policy?
– Which motor insurance policyholders would be potential customers for my House Content Insurance policy?
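The contrast can be sketched in Python on invented policy data. The report simply lists the customers who lapsed; the second function is a very simple class-comparison sketch (one basic form of mining, chosen only for illustration) that asks how lapsed and renewing customers differ, attribute by attribute. Field names such as `payment` are hypothetical.

```python
from collections import Counter

# Hypothetical policyholders; each record notes whether the policy lapsed.
CUSTOMERS = [
    {"id": 1, "age_bracket": "18-25", "payment": "monthly", "lapsed": True},
    {"id": 2, "age_bracket": "18-25", "payment": "monthly", "lapsed": True},
    {"id": 3, "age_bracket": "26-40", "payment": "annual", "lapsed": False},
    {"id": 4, "age_bracket": "26-40", "payment": "monthly", "lapsed": True},
    {"id": 5, "age_bracket": "41-65", "payment": "annual", "lapsed": False},
    {"id": 6, "age_bracket": "18-25", "payment": "annual", "lapsed": False},
]


def report_lapsed(customers):
    """Reporting: answer a question fixed in advance (which customers lapsed?)."""
    return [c["id"] for c in customers if c["lapsed"]]


def compare_groups(customers, attributes):
    """Class comparison: for each attribute value, the fraction of lapsed
    customers and the fraction of renewing customers that have it.
    Assumes both groups are non-empty."""
    lapsed = [c for c in customers if c["lapsed"]]
    renewed = [c for c in customers if not c["lapsed"]]
    result = {}
    for attr in attributes:
        in_lapsed = Counter(c[attr] for c in lapsed)
        in_renewed = Counter(c[attr] for c in renewed)
        for value in sorted(set(in_lapsed) | set(in_renewed)):
            result[(attr, value)] = (in_lapsed[value] / len(lapsed),
                                     in_renewed[value] / len(renewed))
    return result


print(report_lapsed(CUSTOMERS))
for (attr, value), (lapsed_share, renewed_share) in compare_groups(
        CUSTOMERS, ["age_bracket", "payment"]).items():
    print(f"{attr}={value}: lapsed {lapsed_share:.2f}, renewed {renewed_share:.2f}")
```

On this toy data, monthly payment appears for every lapsed customer and no renewing one, a difference that no predefined report asked about. Real data mining would apply classification or association analysis to far larger data and check whether such a difference is more than chance.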

Week 2 - Data to Knowledge Pyramid
Layers, from top to bottom (the potential to support business decisions increases toward the top):
– Making Decisions
– Data Presentation: Visualization Techniques
– Data Mining: Information Discovery
– Data Exploration: OLAP, MDA; Statistical Analysis, Querying and Reporting
– Data Warehouses / Data Marts
– Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Typical users, from top to bottom: End User, Business Analyst, Data Analyst, DBA.

Week 2 - Data Mining Perspective on Knowledge Discovery
Data → (Selection) → Target Data → (Preprocessing) → Preprocessed Data → (Data Mining) → Patterns → (Interpretation/Evaluation) → Knowledge
Adapted from: U. Fayyad et al. (1996), “From Data Mining to Knowledge Discovery: An Overview,” in Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press.
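The steps of the figure can be mirrored by four small Python functions. This is a minimal sketch under stated assumptions: the basket data are invented, and counting frequent item pairs stands in for the data mining step (one simple technique among many, chosen only for illustration).

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data: shopping baskets recorded at several stores.
RAW = [
    {"store": "A", "items": ["Milk", "bread ", "butter"]},
    {"store": "A", "items": ["milk", "Bread"]},
    {"store": "B", "items": ["milk", "beer"]},
    {"store": "A", "items": []},
    {"store": "A", "items": ["milk", "bread", "jam"]},
]


def select(data, store):
    # Selection: keep only the target data relevant to the analysis goal.
    return [r for r in data if r["store"] == store]


def preprocess(target):
    # Preprocessing: drop empty baskets, normalize item spelling, remove duplicates.
    return [sorted({item.strip().lower() for item in r["items"]})
            for r in target if r["items"]]


def mine(baskets):
    # Data mining: count how many baskets contain each pair of items.
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(basket, 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support):
    # Interpretation/evaluation: keep pairs whose support (fraction of baskets)
    # meets the threshold.
    return {pair: count / n_baskets for pair, count in pair_counts.items()
            if count / n_baskets >= min_support}


def kdd(data, store, min_support):
    baskets = preprocess(select(data, store))
    return evaluate(mine(baskets), len(baskets), min_support)


print(kdd(RAW, "A", min_support=0.5))
```

Without the preprocessing step, "Milk", "milk", and "bread " would be counted as different items, so the bread/milk pattern would be split and weakened; this is why the figure places preprocessing before mining.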

Week 2 - Data Mining Process Flow
Figure components: Background Knowledge; Goals for Learning; Knowledge Base; Database(s); Plan for Learning; Discover Knowledge; Determine Knowledge Relevancy; Evolve Knowledge/Data; Generate and Test Hypotheses; Visualization and Human-Computer Interaction; Discovery Algorithms.
“In order to discover anything, you must be looking for something.” (Laws of Serendipity)

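Week 2 - Simplified View of the Data Mining Process Flow
Components, from the user (top) down to the data (bottom):
– graphical user interface
– pattern evaluation
– data mining engine
– database or data warehouse server
– data cleaning, data integration, and filtering
– databases and data warehouse
– knowledge base (used alongside the data mining engine and pattern evaluation)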

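Week 2 - Extended Perspective on the Data Mining Process Flow
Layer 4 (User Interface): the user works through a GUI API, sending mining queries and receiving mining results.
Layer 3 (OLAP/OLAM): the OLAM engine and the OLAP engine, which access the data through a data cube API.
Layer 2 (MDDB): a multidimensional database plus metadata, reached through a database API.
Layer 1 (Data Repository): databases and the data warehouse, connected through data cleaning, data integration, and filtering.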

Week 2 - Essentials of Learning
Learning? Can we formalize it?
– Is it just chemical activation?
– Is it memorization?
– Is it the continuous connecting and disconnecting of nodes in a dynamically changing brain network topology?

The Artificial Intelligence View: learning is central to human knowledge and intelligence and essential for building intelligent machines. Years of effort in AI have shown that trying to build intelligent computers by programming all the rules cannot be done; automatic learning is crucial. For example, we humans are not born with the ability to understand language (we learn it), so it makes sense to have computers learn language instead of trying to program it all in.

The Software Engineering View: machine learning allows us to program computers by example, which can be easier than writing code the traditional way.

The Statistics View: machine learning is the marriage of computer science and statistics: computational techniques are applied to statistical problems. Machine learning has been applied to a vast number of problems in many contexts beyond the typical statistics problems, and it is often designed with different considerations than statistics (e.g., speed is often more important than accuracy).
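To make "programming by example" concrete, here is a minimal sketch of a one-feature decision stump, a deliberately simple learner chosen for illustration rather than a method the lecture prescribes. Instead of hand-coding a cutoff, the program picks the threshold that best fits hypothetical labeled examples (number of late premium payments vs. whether the policy lapsed).

```python
# Hypothetical training examples: (number of late premium payments, policy lapsed?)
EXAMPLES = [(0, False), (1, False), (1, False), (2, True), (3, True), (5, True)]


def learn_threshold(examples):
    """Learn the rule 'predict True when x >= t' from labeled examples.

    Every observed x value is tried as a candidate threshold t; the one
    with the fewest training errors is kept (ties go to the smallest t).
    """
    best_t, best_errors = None, None
    for t in sorted({x for x, _ in examples}):
        errors = sum((x >= t) != label for x, label in examples)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def predict(t, x):
    # Apply the learned rule to a new case.
    return x >= t


t = learn_threshold(EXAMPLES)
print(t, predict(t, 4), predict(t, 0))
```

If the examples change, the rule changes with them and nobody rewrites the cutoff by hand, which is the software engineering point above.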

Week 2 - End
Please check the web site for Learning Theory and Its Essentials, and read the course textbook, Chapter 3.