An Introduction to Data Mining


An Introduction to Data Mining By Rand Ali Computer Engineering & Information Technology Department

What is Data Mining? Extraction of interesting patterns or knowledge from a huge amount of data.

Why Data Mining? Progress in computer hardware technology has led to large supplies of powerful and affordable computers, data collection equipment and storage media. The last decade has experienced a revolution in information availability and exchange via the Internet.

Why Data Mining? The fast-growing, tremendous amount of data, collected and stored in many large data repositories, has far exceeded our human ability to understand it without powerful tools. As a result, data collected in large data repositories become “data tombs”: data archives that are seldom visited.

We are data rich but information poor

Data Mining objective Data mining tools perform data analysis and may uncover important data patterns, contributing greatly to business strategies and scientific and medical research. Data mining turns data tombs into “golden nuggets” of knowledge.

Data mining: searching for knowledge (interesting patterns) in your data.

Data mining is one step of the knowledge discovery process

Knowledge discovery as a process is an iterative sequence of the following steps:
1. Data cleaning (to remove noise and inconsistent data).
2. Data integration (where multiple data sources may be combined).
3. Data selection (where data relevant to the analysis task are retrieved from the database).
4. Data transformation (where data are transformed or consolidated into forms appropriate for mining, for instance by performing summary or aggregation operations).
5. Data mining (an essential process where intelligent methods are applied in order to extract data patterns).
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge, based on some interestingness measures).
7. Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user).
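
The steps above can be sketched as a small chain of functions. This is only an illustrative toy under stated assumptions: the purchase records, the function names and the 60% support threshold are all hypothetical, and the "mining" step is a simple frequency count rather than any particular algorithm from the slides.

```python
# Hypothetical toy data: two "sources" of (customer, item) purchase records.
store_a = [("c1", "Bread"), ("c1", "milk "), ("c2", "bread"), ("c2", None)]
store_b = [("c3", "milk"), ("c3", "bread"), ("c4", "pen")]

def clean(records):
    """Data cleaning: drop records with a missing item, normalise item names."""
    return [(c, i.strip().lower()) for c, i in records if i]

def integrate(*sources):
    """Data integration: combine several cleaned sources into one list."""
    return [r for s in sources for r in s]

def select(records, relevant_items):
    """Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r[1] in relevant_items]

def transform(records):
    """Data transformation: aggregate records into one basket per customer."""
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets

def mine(baskets):
    """Data mining (toy version): count how many baskets contain each item."""
    counts = {}
    for items in baskets.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    return counts

def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep items whose support meets the threshold."""
    return {i: c / n_baskets for i, c in counts.items()
            if c / n_baskets >= min_support}

def present(patterns):
    """Knowledge presentation: format the patterns for a human reader."""
    return [f"{item}: {support:.0%}" for item, support in sorted(patterns.items())]

records = integrate(clean(store_a), clean(store_b))
baskets = transform(select(records, {"bread", "milk"}))
patterns = evaluate(mine(baskets), len(baskets), min_support=0.6)
report = present(patterns)
print(report)
```

In practice the process is iterative: a disappointing report usually sends the analyst back to an earlier step, such as selecting different data or choosing a different threshold.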

What makes a pattern interesting? A pattern is interesting if it is (1) easily understood by humans, (2) valid on new or test data with some degree of certainty, (3) potentially useful, and (4) novel.

Origins of Data Mining

Primary Data Mining Tasks In general, data mining tasks can be classified into two categories: descriptive and predictive. Predictive methods use some variables to predict unknown or future values of other variables. Examples: classification, regression, deviation detection. Descriptive methods characterize the general properties of the data in the database. Examples: association rule discovery, clustering, sequential pattern discovery.

1-Association Rule Discovery Given a set of records, each of which contains some number of items from a given collection, association rule discovery produces dependency rules that predict the occurrence of an item based on the occurrences of other items.

2-Sequential Pattern Discovery Sequential pattern mining is the discovery of frequently occurring ordered events or subsequences as patterns. An example of a sequential pattern is “Customers who buy a Canon digital camera are likely to buy an HP color printer within a month.”
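
As a rough illustration of the camera-then-printer example, the sketch below counts how many customers bought a printer within 30 days after buying a camera. The purchase histories, the day numbers and the helper name `follows_within` are hypothetical, and real sequential pattern miners search over many candidate sequences rather than checking a single one.

```python
# Hypothetical purchase histories: customer -> list of (day, item).
purchases = {
    "c1": [(1, "camera"), (20, "printer")],
    "c2": [(3, "camera"), (50, "printer")],
    "c3": [(5, "printer"), (9, "camera")],
    "c4": [(2, "camera"), (10, "printer")],
}

def follows_within(history, first, second, max_gap):
    """True if `second` was bought after `first`, at most `max_gap` days later."""
    return any(
        a == first and b == second and 0 < d2 - d1 <= max_gap
        for d1, a in history
        for d2, b in history
    )

matching = sorted(
    c for c, h in purchases.items() if follows_within(h, "camera", "printer", 30)
)
# Fraction of customers whose history contains the ordered pattern.
pattern_support = len(matching) / len(purchases)
print(matching, pattern_support)
```

Customer c3 is not counted because the printer was bought before the camera, which is what distinguishes an ordered pattern from a plain association.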

3-Classification Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (i.e., data objects whose class label is known).
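
A minimal sketch of the idea, assuming a 1-nearest-neighbour rule as the "model" (one of many possible classifiers, chosen here only for brevity and not named in the slides). The training objects, their features and the labels are hypothetical.

```python
import math

# Hypothetical training data: (age, income in thousands) -> known class label.
training = [
    ((25, 30), "no"),
    ((30, 40), "no"),
    ((45, 90), "yes"),
    ((50, 110), "yes"),
]

def predict(features, training_data):
    """Return the label of the closest training object (1-nearest neighbour)."""
    nearest = min(training_data, key=lambda pair: math.dist(pair[0], features))
    return nearest[1]

# Objects whose class label is unknown.
label_young = predict((28, 35), training)
label_older = predict((48, 100), training)
print(label_young, label_older)
```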

Classification Example

4-Regression Whereas classification predicts categorical (discrete, unordered) labels, regression analysis is used to predict missing or unavailable numerical data values rather than class labels.
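
One common way to do this is to fit a straight line by ordinary least squares. The sketch below assumes a single input variable, and the data (years of experience against salary in thousands) is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: years of experience and salary (in thousands).
years = [1, 2, 3, 4]
salary = [30.0, 35.0, 40.0, 45.0]

slope, intercept = fit_line(years, salary)
# Predict the unavailable numerical value for 5 years of experience.
predicted = slope * 5 + intercept
print(slope, intercept, predicted)
```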

5-Clustering Clustering analyzes data objects without consulting a known class label. In general, the class labels are not present in the training data simply because they are not known to begin with. Clustering can be used to generate such labels. Clusters of objects are formed so that objects within a cluster have high similarity to one another, but are very dissimilar to objects in other clusters. Each cluster that is formed can be viewed as a class of objects, from which rules can be derived.
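
As an illustration, the sketch below uses k-means on one-dimensional data. k-means is just one clustering method and is an assumption here, as are the customer ages and the starting centroids.

```python
def k_means_1d(values, centroids, iterations=10):
    """Simple k-means for one-dimensional data with given starting centroids."""
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[k] for k, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer ages with no class labels attached.
ages = [21, 23, 25, 60, 62, 65]
centroids, clusters = k_means_1d(ages, centroids=[20.0, 70.0])
print(centroids, clusters)
```

The two resulting groups could then be treated as generated class labels, for example "younger" and "older" customers.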

6-Outlier Analysis A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers.
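
A simple way to flag such objects, sketched below, is to mark values that lie more than a chosen number of standard deviations from the mean. The threshold of 2.0 and the transaction amounts are hypothetical, and this rule is only one of many outlier detection approaches.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical transaction amounts; one is far from the general behavior.
amounts = [20, 22, 19, 21, 23, 20, 500]
outliers = find_outliers(amounts)
print(outliers)
```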

Application 1: Market basket analysis Market basket analysis studies customer buying habits by finding associations between the different items that customers place in their “shopping baskets”. The discovery of such associations can help to develop marketing strategies by giving insight into which items are frequently purchased together by customers.

Possible Marketing Strategies In one strategy, items that are frequently purchased together can be placed in proximity in order to further encourage the sale of such items together. Market basket analysis can also help retailers plan which items to put on sale at reduced prices. If customers tend to purchase computers and printers together, then having a sale on printers may encourage the sale of printers as well as computers.

If we think of the universe as the set of items available at the store, then each item has a Boolean variable representing the presence or absence of that item. Each basket can then be represented by a Boolean vector of values assigned to these variables. The Boolean vectors can be analyzed for buying patterns that reflect items that are frequently associated or purchased together. These patterns can be represented in the form of association rules.
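
The Boolean-vector representation can be sketched as follows. The item names and the three baskets are hypothetical.

```python
# The "universe": every item available at the (hypothetical) store.
items = ["computer", "printer", "antivirus_software"]

# Hypothetical shopping baskets.
baskets = [
    {"computer", "antivirus_software"},
    {"printer"},
    {"computer", "printer", "antivirus_software"},
]

def to_boolean_vector(basket, universe):
    """One Boolean per item: True if the item is present in the basket."""
    return [item in basket for item in universe]

vectors = [to_boolean_vector(b, items) for b in baskets]

# Count baskets in which computer and antivirus_software are both present.
i, j = items.index("computer"), items.index("antivirus_software")
bought_together = sum(1 for v in vectors if v[i] and v[j])
print(vectors, bought_together)
```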

For example, the information that customers who purchase computers also tend to buy antivirus software at the same time is represented in Association Rule (1) below:

computer => antivirus_software [support = 2%, confidence = 60%]    (1)

Rule support and confidence are two measures of rule interestingness. They respectively reflect the usefulness and certainty of discovered rules. A support of 2% for Association Rule (1) means that 2% of all the transactions under analysis show that a computer and antivirus software are purchased together. A confidence of 60% means that if a customer buys a computer, there is a 60% chance that they will buy antivirus software as well.
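
Written as formulas, for a rule A => B over a set of transactions, the two measures are usually defined as below. The symbols A, B and P are notation introduced here for illustration; A ∪ B denotes the combined itemset, so a transaction "contains A ∪ B" when it contains every item of A and of B.

```latex
\begin{align*}
\operatorname{support}(A \Rightarrow B) &= P(A \cup B)
  = \frac{\text{number of transactions containing both } A \text{ and } B}
         {\text{total number of transactions}} \\[4pt]
\operatorname{confidence}(A \Rightarrow B) &= P(B \mid A)
  = \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)}
\end{align*}
```

For Association Rule (1), a support of 2% and a confidence of 60% together imply that about 0.02 / 0.60 ≈ 3.3% of all transactions contain a computer.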

Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold. Such thresholds can be set by users or domain experts.
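
A minimal sketch of this filtering, restricted to rules with a single item on each side, might look as follows. The transactions and both thresholds are hypothetical, and the brute-force enumeration is for illustration only; it does not scale the way dedicated algorithms such as Apriori do.

```python
from itertools import permutations

# Hypothetical transactions.
transactions = [
    {"computer", "antivirus_software"},
    {"computer", "antivirus_software", "printer"},
    {"computer", "printer"},
    {"printer"},
    {"computer", "antivirus_software"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def interesting_rules(transactions, min_support, min_confidence):
    """Return (a, b, support, confidence) for rules a => b meeting both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        rule_support = support({a, b}, transactions)
        antecedent_support = support({a}, transactions)
        if antecedent_support == 0:
            continue
        confidence = rule_support / antecedent_support
        if rule_support >= min_support and confidence >= min_confidence:
            rules.append((a, b, rule_support, confidence))
    return rules

rules = interesting_rules(transactions, min_support=0.5, min_confidence=0.7)
for a, b, s, c in rules:
    print(f"{a} => {b} [support = {s:.0%}, confidence = {c:.0%}]")
```

Raising either threshold shrinks the rule set, which is why the choice is normally left to users or domain experts.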

Application 2: Data mining and DNA data analysis A great deal of biomedical research has focused on DNA data analysis. Recent research in DNA analysis has led to the discovery of genetic causes for many diseases and disabilities, as well as the discovery of new medicines and approaches for disease diagnosis, prevention, and treatment.

An important focus in genome research is the study of DNA sequences since such sequences form the foundation of the genetic codes of all living organisms. All DNA sequences comprise four basic building blocks (called nucleotides): adenine(A), cytosine(C), guanine(G), and thymine(T). These four nucleotides are combined to form long sequences or chains that resemble a twisted ladder.

DNA structure

Human beings have roughly 20,000 protein-coding genes (earlier estimates were as high as 100,000). Most diseases are not triggered by a single gene but by a combination of genes acting together. Association analysis methods can be used to help determine the kinds of genes that are likely to co-occur in target samples. Such analysis would facilitate the discovery of groups of genes and the study of interactions and relationships between them.

Thank you