Principles of Data Mining. Introduction: Topics 1. Introduction to Data Mining 2. Nature of Data Sets 3. Types of Structure Models and Patterns 4. Data.

Slides:



Advertisements
Similar presentations
Web Mining.
Advertisements

Web Mining Research: A Survey Authors: Raymond Kosala & Hendrik Blockeel Presenter: Ryan Patterson April 23rd 2014 CS332 Data Mining pg 01.
Data Mining Sangeeta Devadiga CS 157B, Spring 2007.
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Introduction to Data Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.
Week 9 Data Mining System (Knowledge Data Discovery)
© Prentice Hall1 DATA MINING TECHNIQUES Introductory and Advanced Topics Eamonn Keogh (some slides adapted from) Margaret Dunham Dr. M.H.Dunham, Data Mining,
Data Mining By Archana Ketkar.
Data Mining – Intro.
CS157A Spring 05 Data Mining Professor Sin-Min Lee.
Data mining By Aung Oo.
Advanced Database Applications Database Indexing and Data Mining CS591-G1 -- Fall 2001 George Kollios Boston University.
Presented To: Madam Nadia Gul Presented By: Bi Bi Mariam.
Chapter 5 Data mining : A Closer Look.
Introduction to Data Mining Data mining is a rapidly growing field of business analytics focused on better understanding of characteristics and.
Special Topics in Data Mining. Direct Objectives To learn data mining techniques To see their use in real-world/research applications To get an understanding.
TURKISH STATISTICAL INSTITUTE INFORMATION TECHNOLOGIES DEPARTMENT (Muscat, Oman) DATA MINING.
GUHA method in Data Mining Esko Turunen Tampere University of Technology Tampere, Finland.
Enterprise systems infrastructure and architecture DT211 4
Data Mining: Concepts & Techniques. Motivation: Necessity is the Mother of Invention Data explosion problem –Automated data collection tools and mature.
OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead.
Data Mining : Introduction Chapter 1. 2 Index 1. What is Data Mining? 2. Data Mining Functionalities 1. Characterization and Discrimination 2. MIning.
Data Mining Techniques
MAKING THE BUSINESS BETTER Presented By Mohammed Dwikat DATA MINING Presented to Faculty of IT MIS Department An Najah National University.
Shilpa Seth.  What is Data Mining What is Data Mining  Applications of Data Mining Applications of Data Mining  KDD Process KDD Process  Architecture.
Data Mining. 2 Models Created by Data Mining Linear Equations Rules Clusters Graphs Tree Structures Recurrent Patterns.
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Data Mining Techniques As Tools for Analysis of Customer Behavior
Tang: Introduction to Data Mining (with modification by Ch. Eick) I: Introduction to Data Mining A.Short Preview 1.Initial Definition of Data Mining 2.Motivation.
Spatial Statistics and Spatial Knowledge Discovery First law of geography [Tobler]: Everything is related to everything, but nearby things are more related.
Chapter 1 Introduction to Data Mining
Introduction to Data Mining Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu.
Knowledge Discovery and Data Mining Evgueni Smirnov.
Introduction to Web Mining Spring What is data mining? Data mining is extraction of useful patterns from data sources, e.g., databases, texts, web,
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
Knowledge Discovery and Data Mining Evgueni Smirnov.
Data Mining By Dave Maung.
Copyright © 2004 Pearson Education, Inc.. Chapter 27 Data Mining Concepts.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
CS157B Fall 04 Introduction to Data Mining Chapter 22.3 Professor Lee Yu, Jianji (Joseph)
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
3-1 Data Mining Kelby Lee. 3-2 Overview ¨ Transaction Database ¨ What is Data Mining ¨ Data Mining Primitives ¨ Data Mining Objectives ¨ Predictive Modeling.
Chapter 5: Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization DECISION SUPPORT SYSTEMS AND BUSINESS.
Outline Knowledge discovery in databases. Data warehousing. Data mining. Different types of data mining. The Apriori algorithm for generating association.
Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.
DATA MINING By Cecilia Parng CS 157B.
1 Introduction to Data Mining C hapter 1. 2 Chapter 1 Outline Chapter 1 Outline – Background –Information is Power –Knowledge is Power –Data Mining.
Introduction to Data Mining by Yen-Hsien Lee Department of Information Management College of Management National Sun Yat-Sen University March 4, 2003.
An Introduction Student Name: Riaz Ahmad Program: MSIT( ) Subject: Data warehouse & Data Mining.
1-1 Copyright © 2014, 2011, and 2008 Pearson Education, Inc.
Data Mining and Decision Support
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
3/13/2016Data Mining 1 Lecture 1-2 Data and Data Preparation Phayung Meesad, Ph.D. King Mongkut’s University of Technology North Bangkok (KMUTNB) Bangkok.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.
CS570: Data Mining Spring 2010, TT 1 – 2:15pm Li Xiong.
Data Mining - Introduction Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
Principles of Data Mining Pham Tho Hoan
Data Mining Functionalities
Data Mining – Intro.
MIS2502: Data Analytics Advanced Analytics - Introduction
DATA MINING © Prentice Hall.
Introduction to Data Mining
MIS 451 Building Business Intelligence Systems
Data Mining 101 with Scikit-Learn
Sangeeta Devadiga CS 157B, Spring 2007
CSE591: Data Mining by H. Liu
Data Warehousing and Data Mining
CHAPTER 7: Information Visualization
Data Pre-processing Lecture Notes for Chapter 2
CSE591: Data Mining by H. Liu
Presentation transcript:

Principles of Data Mining

Introduction: Topics 1. Introduction to Data Mining 2. Nature of Data Sets 3. Types of Structure Models and Patterns 4. Data Mining Tasks 5. Components of Data Mining Algorithms 6. Statistics vs Data Mining

3. Difficulty lies in accessing it Large Data Sets are Ubiquitous 1. Due to advances in digital data acquisition and storage technology Business Supermarket transactions Credit card usage records Telephone call details Government statistics Scientific Images of astronomical bodies Molecular databases Medical records 2. Large databases mean vast amounts of information

Data Mining as Discovery Data Mining is Science of extracting useful information from large data sets or databases Also known as KDD Knowledge Discovery and Data Mining Knowledge Discovery in Databases

Data Mining Definition Analysis of (often large) Observational Data to find unsuspected relationships and Summarize data in novel ways that are understandable and useful to data owner Unsuspected Relationships non-trivial, implicit, previously unknown Ex of Trivial: Those who are pregnant are female Relationships and Summary are in the form of Patterns and Models Linear Equations, Rules, Clusters, Graphs, Tree Structures, Recurrent Patterns in Time Series Usefulness: meaningful:lead to some advantage, usually economic Analysis: Process of discovery (Extraction of knowledge)

Observational Data Objective of data mining exercise plays no role in data collection strategy E.g., Data collected for Transactions in a Bank Experimental Data Collected in Response to Questionnaire Efficient strategies to Answer Specific Questions In this way it differs from much of statistics For this reason, data mining is referred to as secondary data analysis

KDD Process Stages: Selecting Target Data Preprocessing Transforming them Data Mining to Extract Patterns and Relationships Interpreting Assesses Structures

Seeking Relationships Finding accurate, convenient and useful representations of data involves these steps: Determining nature and structure of representation E.g., linear regression Deciding how to quantify and compare two different representation E.g., sum of squared errors Choosing an algorithmic process to optimize score function E.g., gradient descent optimization Efficient Implementation using data management

2. Nature of Data Sets Structured Data set of measurements from an environment or process Simple case n objects with d measurements each: n x d matrix d columns are called variables, features, attributes or fields

21 Structured Data and Data Types US Census Bureau Data Public Use Microdata Sample data sets (PUMS) IDMarital Status 24854MaleMarriedHigh School grad ?? 29 Female Male Married HS grad Some College MaleNotChild0 Married PUMS Data has identifying information removed. Available in 5% and 1% sample sizes. 1% sample has 2.7 million records Missing data Noisy data A guess? Age Quantitative Continuous Education Income Categorical Ordinal Sex Categorical Nominal

Unstructured Data 1. Structured Data Well-defined tables, attributes (columns), tuples (rows) 2. Unstructured Data World wide web Documents and hyperlinks – HTML docs represent tree structure with text and attributes embedded at nodes – XML pages use metadata descriptions Text Documents Document viewed as sequence of words and punctuations – Mining Tasks »»»»»» Text categorization Clustering Similar Documents Finding documents that match a query

3.Types of Structures: Models and Patterns Representations sought in data mining Global Model Local Pattern Global Model Make a statement about any point in d-space Simple model: Y = aX + c Local Patterns Make a statement about restricted regions of space spanned by variables E.g.1: if X > thresh1 then Prob (Y > thresh2) =p

4. Data Mining Tasks Not so much a single technique Idea that there is more knowledge hidden in the data than shows itself on the surface Any technique that helps to extract more out of data is useful Five major task types: 1. Exploratory Data Analysis 2. Descriptive Modeling 3. Predictive Modeling 4. Discovering Patterns and Rules 5. Retrieval by Content)

Exploratory Data Analysis Interactive and Visual Pie Charts (angles represent size) Cox Comb Charts (radii represent size)

Descriptive Modeling Describe all the data or a process for generating the data Probability Distribution using Density Estimation Clustering and Segmentation Partitioning p-dimensional space into groups Similar people are put in same group

Predictive Modeling Classification and Regression Market value of a stock, disease Machine Learning Approaches

Discovering Patterns and Rules Detecting fraudulent behavior by determining data that differs significantly from rest Finding combinations of transactions that occur frequently in transactional data bases Grocery items purchased together

Retrieval by Content User has pattern of interest and wishes to find that pattern in database, Ex: Text Search Estimate the relative importance of web pages using a feature vector whose elements are derived from the Query-URL pair Image Search Search a large database of images by using content descriptors such as color, texture, relative position

Components of Data Mining Algorithms Four basic components in each algorithm 1. Model or Pattern Structure Determining underlying structure or functional form we seek from data 2. Score Function Judging the quality of the fitted model 3. Optimization and Search Method Searching over different model and pattern structures 4. Data Management Strategy Handling data access efficiently