Data Mining Motivation: “Necessity is the Mother of Invention”


Data Mining Motivation: “Necessity is the Mother of Invention” Automated data collection tools and mature database technology have led to tremendous amounts of stored data. We are drowning in data, but starving for knowledge! Solution: data mining. Extract interesting knowledge (rules, patterns, constraints) from the data, reducing its volume while raising the level of information and knowledge.

What Is Data Mining? Data mining: extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases. Alternative names: knowledge discovery in databases (KDD), knowledge extraction, data/pattern analysis, data prospecting, data archeology, data dredging, information harvesting, business intelligence, etc. What is not data mining? (Deductive) query processing.

Applications
Database analysis and decision support:
  Market analysis and management: target marketing, customer relationship management, market basket analysis, market segmentation
  Risk analysis and management: forecasting, customer retention, improved underwriting, quality control, competitive analysis
  Fraud detection and management
Other applications:
  Text mining (newsgroups, email, documents) and Web analysis
  Intelligent query answering

More Applications
Sports: IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain a competitive advantage for the New York Knicks and Miami Heat.
Astronomy: 22 quasars were discovered with the help of data mining.
Internet Web Surf-Aid: IBM Surf-Aid applies data mining algorithms to Web access logs to discover customer preferences and behaviors, analyze the effectiveness of Web marketing, improve Web site organization, etc.

Data Mining: A KDD Process
Data mining is the core of the knowledge discovery process.
[Diagram: the KDD pipeline] Databases -> Data Cleaning/Integration (missing data, outliers, noise, errors) -> Data Warehouse -> Selection (feature extraction, attribute selection) -> Task-relevant Data -> Data Mining (classification, clustering, association rule mining (ARM)) -> Pattern Evaluation -> Knowledge

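Association Rule Mining: The “Walmart” Example
Rule: {Diaper, Milk} => Beer
Here |X| is the number of transactions that contain every item of X, and |D| is the total number of transactions.
Support = |{Diaper, Milk, Beer}| / |D| = 2/5 = 0.4
Confidence = |{Diaper, Milk, Beer}| / |{Diaper, Milk}| = 2/3 ≈ 0.67

TID  Items
1    Bread, Milk
2    Beer, Diaper, Bread, Eggs
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Bread, Diaper, Milk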
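
The support and confidence of the rule {Diaper, Milk} => Beer can be checked directly against the five transactions on the slide. The following is a minimal sketch; the function names (`count`, `support`, `confidence`) are illustrative and not part of the original presentation.

```python
# Transactions copied from the slide's table (TID -> set of items).
transactions = {
    1: {"Bread", "Milk"},
    2: {"Beer", "Diaper", "Bread", "Eggs"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Bread", "Diaper", "Milk"},
}


def count(itemset, db):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for items in db.values() if itemset <= items)


def support(antecedent, consequent, db):
    """Fraction of all transactions that contain antecedent and consequent."""
    return count(antecedent | consequent, db) / len(db)


def confidence(antecedent, consequent, db):
    """Fraction of antecedent-containing transactions that also hold the consequent."""
    return count(antecedent | consequent, db) / count(antecedent, db)


antecedent = {"Diaper", "Milk"}
consequent = {"Beer"}
rule_support = support(antecedent, consequent, transactions)
rule_confidence = confidence(antecedent, consequent, transactions)
print(rule_support, round(rule_confidence, 2))
```

Transactions 3 and 4 contain all three items, and transactions 3, 4, and 5 contain both Diaper and Milk, which gives the 2/5 and 2/3 ratios.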

Precision Ag example: Find image antecedents that imply high yield
[Figure: TIFF image and yield map]
High Green reflectance => High Yield (obvious)
High (NearInfraRed - Red) => High Yield (higher confidence)

Grasshopper Infestation Prediction Grasshoppers caused significant economic loss last year, and these insects are likely to return this year. Early prediction of an infestation is a key step in reducing the damage. Association rule mining on remotely sensed imagery holds significant potential for early detection. How do we identify the signature of an initial infestation from the RGB bands?

Gene Regulation Pathway Discovery example The results of clustering may indicate that nine genes are involved in a pathway. High-confidence rule mining on that cluster will then discover the relationships among the genes, in which the expression of one gene (e.g., Gene2) is regulated by others. Other genes (e.g., Gene4 and Gene7) may not be directly involved in regulating Gene2 and can therefore be excluded.
[Figure: clustering groups Gene1 through Gene9; ARM on the cluster then derives the regulation relationships among them.]

Data Mining: Confluence of Multiple Disciplines
[Diagram: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines]

Spatial Data Formats (Cont.)
Example: a 2 x 2 image with two 8-bit bands.

BAND-1
254 (1111 1110)   127 (0111 1111)
 14 (0000 1110)   193 (1100 0001)

BAND-2
 37 (0010 0101)   240 (1111 0000)
200 (1100 1000)    19 (0001 0011)

BSQ format (2 files)
Band 1: 254 127 14 193
Band 2: 37 240 200 19

BIL format (1 file)
254 127 37 240 14 193 200 19

BIP format (1 file)
254 37 127 240 14 200 193 19

bSQ format (16 files): one file per band and bit position. Each column below is one file; each row is one pixel.
B11 B12 B13 B14 B15 B16 B17 B18 B21 B22 B23 B24 B25 B26 B27 B28
 1   1   1   1   1   1   1   0   0   0   1   0   0   1   0   1
 0   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0
 0   0   0   0   1   1   1   0   1   1   0   0   1   0   0   0
 1   1   0   0   0   0   0   1   0   0   0   1   0   0   1   1
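
The four layouts differ only in the order in which the same pixel values (or bits) are written out. The sketch below reproduces them for the slide's 2 x 2, two-band example. The function names are illustrative, and the sketch assumes that B11 denotes the most significant bit of band 1, which is consistent with the bit columns shown on the slide.

```python
# Two 2x2 bands holding the 8-bit pixel values from the slide.
band1 = [[254, 127], [14, 193]]
band2 = [[37, 240], [200, 19]]
bands = [band1, band2]


def to_bsq(bands):
    """Band sequential: one file per band, pixels in raster order."""
    return [[v for row in band for v in row] for band in bands]


def to_bil(bands):
    """Band interleaved by line: row r of every band, then row r + 1."""
    rows = len(bands[0])
    return [v for r in range(rows) for band in bands for v in band[r]]


def to_bip(bands):
    """Band interleaved by pixel: all band values of one pixel together."""
    rows, cols = len(bands[0]), len(bands[0][0])
    return [band[r][c] for r in range(rows) for c in range(cols) for band in bands]


def to_bit_planes(bands, bits=8):
    """bSQ: one bit file per (band, bit position).

    Assumption: "B<band><j>" with j = 1 is the most significant bit.
    """
    planes = {}
    for b, pixels in enumerate(to_bsq(bands), start=1):
        for j in range(1, bits + 1):
            shift = bits - j
            planes[f"B{b}{j}"] = [(v >> shift) & 1 for v in pixels]
    return planes


print(to_bsq(bands))
print(to_bil(bands))
print(to_bip(bands))
print(to_bit_planes(bands)["B11"])
```

With two 8-bit bands, `to_bit_planes` yields 16 bit files, each holding one bit per pixel.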

Peano Count Tree (P-tree) P-trees are a lossless, compressed representation of data, organized recursively by quadrant. NDSU holds patents on P-tree technology.

An example of a P-tree
[Figure: an 8 x 8 bit image beside its P-tree. Quadrants are visited in Peano (Z) order, and each node stores the count of 1-bits in its quadrant.]
Root count: 55
Counts of the four top-level quadrants: 16, 8, 15, 16
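
A count tree of this kind can be built by recursive quadrant subdivision. The sketch below is an illustration only, not the patented NDSU implementation. The 8 x 8 bit image is an assumption: it was chosen so that its counts agree with the ones quoted on the slide (root count 55; quadrant counts 16, 8, 15, 16), since the bit grid itself is only partly recoverable from the transcript. The name `build_ptree` is hypothetical.

```python
# ASSUMPTION: an 8x8 bit image chosen to be consistent with the counts on the
# slide (root 55; quadrants 16, 8, 15, 16). It is not copied from the source.
IMAGE = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1],
]


def build_ptree(image, top=0, left=0, size=None):
    """Return (count, children) for the square quadrant at (top, left).

    children is None for a pure quadrant (all 0s or all 1s); otherwise it is
    a list of four subtrees in Peano (Z) order: upper-left, upper-right,
    lower-left, lower-right.
    """
    if size is None:
        size = len(image)
    count = sum(
        image[r][c]
        for r in range(top, top + size)
        for c in range(left, left + size)
    )
    if count == 0 or count == size * size:
        return (count, None)  # pure quadrant: no need to subdivide
    half = size // 2
    children = [
        build_ptree(image, top + dr, left + dc, half)
        for dr, dc in ((0, 0), (0, half), (half, 0), (half, half))
    ]
    return (count, children)


root_count, quadrants = build_ptree(IMAGE)
print(root_count, [count for count, _ in quadrants])
```

Pure quadrants are stored as a single count, which is where the compression comes from; only mixed quadrants are subdivided. Recounting bits at every level keeps the sketch short at the cost of some redundant work.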

An example of a P-tree: terminology
[Figure: the same P-tree, annotated with the terms below]
Root Count: 55
Level
Fan-out
Pure (Pure-1/Pure-0) quadrant: a quadrant made up entirely of 1-bits or entirely of 0-bits
QID (Quadrant ID): the pixel ( 7, 1 ), which is ( 111, 001 ) in binary, has QID 10.10.11, that is, 2.2.3
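
The QID in the slide's example can be obtained by interleaving the bits of the two pixel coordinates, one pair of bits per tree level. The sketch below assumes that the first coordinate supplies the high bit of each quadrant digit, which reproduces the slide's 10.10.11 for the pixel (7, 1); the function name `quadrant_id` is hypothetical.

```python
def quadrant_id(first, second, levels):
    """Quadrant path of a pixel from the root down, one digit (0-3) per level.

    Assumption: the bit of the first coordinate is the high bit of each digit.
    """
    path = []
    for level in range(levels - 1, -1, -1):
        first_bit = (first >> level) & 1
        second_bit = (second >> level) & 1
        path.append(2 * first_bit + second_bit)
    return path


qid = quadrant_id(7, 1, levels=3)
print(".".join(str(d) for d in qid))            # 2.2.3
print(".".join(format(d, "02b") for d in qid))  # 10.10.11
```

An 8 x 8 image needs three levels below the root, so each coordinate contributes three bits.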

Tuple Count Cube (T-cube) The (v1,v2,v3)th cell of the T-cube contains the Root Count of P(v1,v2,v3) = P_{1,v1} AND P_{2,v2} AND P_{3,v3}.
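
The cell definition can be sketched with plain bit masks standing in for P-trees: each mask marks the pixels where a band takes a given value, and the cell count is the number of 1-bits left after ANDing one mask per band. The three-band, four-pixel data below is hypothetical, as are the function names.

```python
from itertools import product

# ASSUMPTION: hypothetical data, 3 bands x 4 pixels, with 1-bit values.
bands = [
    [0, 0, 1, 1],  # band 1
    [0, 1, 1, 1],  # band 2
    [0, 0, 0, 1],  # band 3
]


def value_mask(band, value):
    """Bit mask with a 1 wherever the band holds `value` (stand-in for P_{i,v})."""
    return [1 if v == value else 0 for v in band]


def and_masks(masks):
    """Pixel-wise AND of several bit masks."""
    return [int(all(bits)) for bits in zip(*masks)]


def root_count(mask):
    """Number of 1-bits in a mask."""
    return sum(mask)


def tuple_count_cube(bands, values=(0, 1)):
    """Map each value combination (v1, v2, ...) to its tuple count."""
    cube = {}
    for cell in product(values, repeat=len(bands)):
        masks = [value_mask(band, v) for band, v in zip(bands, cell)]
        cube[cell] = root_count(and_masks(masks))
    return cube


cube = tuple_count_cube(bands)
print(cube)
```

Every pixel falls into exactly one cell, so the cell counts always add up to the number of pixels.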

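High confidence Association Rules
Assume a minimum confidence threshold of 80% and a minimum support threshold of 10%. Start with 1-bit values and 2 bands, B1 and B2.

T-cube of tuple counts (64 tuples in total):
             B2={0}   B2={1}   sum    threshold (80% of sum)
B1={0}         25        5      30     24
B1={1}         15       19      34     27.2
sum            40       24
threshold      32       19.2

A cell whose count reaches the threshold of its row or column gives a high-confidence rule. Only the count 25 does so (25 >= 24 for the row B1={0}):
C: B1={0} => B2={0}, c = 25/30 = 83.3%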
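
The threshold check can be sketched as follows. The assignment of the four counts to cells is an assumed reading of the slide's figure: it is the arrangement under which the counts add up to the slide's sums of 30 and 34 and give the stated 83.3% confidence for B1={0} => B2={0}. The function name is hypothetical.

```python
# ASSUMPTION: cell layout inferred from the slide's sums; keys are (B1 value, B2 value).
tcube = {(0, 0): 25, (0, 1): 5, (1, 0): 15, (1, 1): 19}
MIN_CONF = 0.80
MIN_SUP = 0.10


def high_confidence_rules(tcube, min_conf, min_sup):
    """Return (rule, confidence) pairs that meet both thresholds."""
    total = sum(tcube.values())
    rules = []
    for (v1, v2), n in tcube.items():
        if n / total < min_sup:
            continue  # the cell itself is too rare to support any rule
        b1_sum = sum(c for (a, _), c in tcube.items() if a == v1)
        b2_sum = sum(c for (_, b), c in tcube.items() if b == v2)
        if n / b1_sum >= min_conf:
            rules.append((f"B1={{{v1}}} => B2={{{v2}}}", n / b1_sum))
        if n / b2_sum >= min_conf:
            rules.append((f"B2={{{v2}}} => B1={{{v1}}}", n / b2_sum))
    return rules


rules = high_confidence_rules(tcube, MIN_CONF, MIN_SUP)
for rule, conf in rules:
    print(f"{rule}  c = {conf:.1%}")
```

Comparing a count against 80% of its row or column sum is the same test as requiring a confidence of at least 80%, which is why the slide lists the thresholds next to the sums.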

The End                  Thank you |:~)