Data and Applications Security Introduction to Data Mining


Data and Applications Security Introduction to Data Mining Dr. Bhavani Thuraisingham Guest Lecture February 25, 2008

Objective of the Unit This unit provides an introduction to data mining

Outline of Data Mining
- What is Data Mining?
- Data warehousing vs data mining
- Steps to Data Mining
- Need for Data Mining
- Example Applications
- Technologies for Data Mining
- Why Data Mining Now?
- Preparation for Data Mining
- Data Mining Tasks, Methodology, Techniques
- Commercial Developments
- Status, Challenges, and Directions

What is Data Mining?
- Related terms: Information Harvesting, Knowledge Mining, Knowledge Discovery in Databases, Data Archaeology, Data Dredging, Database Mining, Knowledge Extraction, Data Pattern Processing, Siftware
- Definition: The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data, often previously unknown, using pattern recognition technologies and statistical and mathematical techniques (Thuraisingham 1998)

Data Warehouses vs Data Mining
- Goal: Improved business efficiency
  - Improve marketing (advertise to the most likely buyers)
  - Inventory reduction (stock only needed quantities)
- Information source: Historical business data
  - Example: Supermarket sales records
  - Size ranges from 50k records (research studies) to terabytes (years of data from chains)
  - Data is already being warehoused
- Sample question: what products are generally purchased together?
- The answers are in the data; need to MINE the data
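The sample question on this slide (which products are generally purchased together?) can be illustrated with a small co-occurrence count. This is only a minimal sketch: the transactions are invented for illustration, and `pair_support` is a hypothetical helper name, not something from the lecture.

```python
from collections import Counter
from itertools import combinations

# Hypothetical receipts; each set holds the products on one supermarket receipt.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each unordered product pair."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes ("bread", "milk") and ("milk", "bread") the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}

support = pair_support(transactions)
# The pair that appears together in the largest share of baskets.
top_pair = max(sorted(support), key=lambda p: support[p])
print(top_pair, support[top_pair])
```

On real warehouse data the same idea would have to scale to far more products and receipts, which is where dedicated market-basket algorithms come in.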

What Does Warehousing do for Data Mining?
- Difficult to mine disparate data sources
- Data warehouse integrates the disparate data sources into a single logical entity
- Maintains integrity of the data
- Scrubbing and cleaning
- Formats the data for querying and mining
- Multidimensional data
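As a rough illustration of what "integrating disparate sources" and "scrubbing and cleaning" can mean in practice, the sketch below merges two made-up customer sources with different field names and formats into one consistent set of records. All field names, records, and the helpers `scrub` and `integrate` are assumptions for illustration only.

```python
# Two hypothetical sources describing customers with different field names and formats.
store_records = [
    {"cust_id": "17", "name": " Alice Smith ", "city": "dallas"},
    {"cust_id": "42", "name": "Bob Jones", "city": "Austin"},
]
web_records = [
    {"customer": 42, "full_name": "bob jones", "town": "AUSTIN"},
    {"customer": 99, "full_name": "Carol Lee", "town": None},
]

def scrub(customer_id, name, city):
    """Normalize one record: integer id, trimmed title-case name, title-case city or None."""
    return {
        "id": int(customer_id),
        "name": " ".join(name.split()).title(),
        "city": city.strip().title() if city else None,
    }

def integrate(store, web):
    """Merge both sources into one dict keyed by customer id, dropping duplicate ids."""
    unified = {}
    for r in store:
        rec = scrub(r["cust_id"], r["name"], r["city"])
        unified[rec["id"]] = rec
    for r in web:
        rec = scrub(r["customer"], r["full_name"], r["town"])
        unified.setdefault(rec["id"], rec)  # keep the store copy when ids collide
    return unified

warehouse = integrate(store_records, web_records)
for rec in warehouse.values():
    print(rec)
```

Preferring the store copy on a collision is an arbitrary choice made for the sketch; a real warehouse would need an explicit rule for reconciling conflicting values.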

Is it Necessary to Have a Data Warehouse for Data Mining?
- Key to successful data mining is having good data
- Data warehousing integrates heterogeneous data sources, formats the data, and facilitates interactive query processing
- Having a data warehouse is good for data mining, but perhaps not essential
- Data mining tools could be used directly on good/clean databases

What’s going on in data mining?
- What are the technologies for data mining?
  - Database management, data warehousing, machine learning, statistics, pattern recognition, visualization, parallel processing
- What can data mining do for you?
  - Data mining outcomes: Classification, Clustering, Association, Anomaly detection, Prediction, Estimation, . . .
- How do you carry out data mining?
  - Data mining techniques: Decision trees, Neural networks, Market-basket analysis, Link analysis, Genetic algorithms, . . .
- What is the current status?
  - Many commercial products mine relational databases
- What are some of the challenges?
  - Mining unstructured data, extracting useful patterns, web mining, data mining, national security and privacy

Steps to Data Mining
[Figure: flow diagram. Data Sources -> Integrate data sources -> Clean/modify data sources -> Mine the data -> Examine results/Prune results -> Report final results -> Take actions]
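The steps named on this slide can be sketched as a chain of small functions. This is a toy under stated assumptions: the "mining" step here is only a frequency count, the input data is invented, and every function name is hypothetical rather than taken from the lecture.

```python
from collections import Counter

def integrate_sources(sources):
    """Integrate data sources: concatenate the records from every source."""
    return [record for source in sources for record in source]

def clean(records):
    """Clean/modify: drop missing or blank values and normalize case and whitespace."""
    return [r.strip().lower() for r in records if r and r.strip()]

def mine(records):
    """Mine the data: in this toy, count how often each value occurs."""
    return Counter(records)

def prune(patterns, min_count):
    """Examine/prune results: keep only patterns seen at least min_count times."""
    return {value: n for value, n in patterns.items() if n >= min_count}

def report(patterns):
    """Report final results as lines sorted by descending count, then by name."""
    ordered = sorted(patterns.items(), key=lambda item: (-item[1], item[0]))
    return [f"{value}: {n}" for value, n in ordered]

# Hypothetical raw sources with inconsistent formatting and missing values.
sources = [["Milk", "bread ", None], ["milk", "", "Beer", "BREAD", "milk"]]
final_report = report(prune(mine(clean(integrate_sources(sources))), min_count=2))
print(final_report)
```

The final "take actions" step is left to the people reading the report, so it has no counterpart in the code.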

Knowledge Directed to Data Mining
[Figure: the same flow diagram (Data Sources -> Integrate data sources -> Clean/modify data sources -> Mine the data -> Examine results/Prune results -> Report final results -> Take actions), with an added Expert System box]

Need for Data Mining
- Large amounts of current and historical data are being stored
- As databases grow larger, decision-making directly from the data is not possible; knowledge must be derived from the stored data
- Data from multiple data sources and multiple domains
  - Medical, Financial, Military, etc.
- Need to analyze the data
  - Support for planning (historical supply and demand trends)
  - Yield management (scanning airline seat reservation data to maximize yield per seat)
  - System performance (detect abnormal behavior in a system)
  - Mature database analysis (clean up the data sources)

Example Applications
- A medical supplies company increases sales by targeting, in its advertising, physicians who are likely to buy its products
- A credit bureau limits losses by selecting candidates who are not likely to default on their payments
- An intelligence agency determines abnormal behavior of its employees
- An investigation agency finds fraudulent behavior of some people

Integration of Multiple Technologies
[Figure: Data Warehousing, Machine Learning, Database Management, Statistics, Parallel Processing, and Visualization all feeding into Data Mining]

Why Data Mining Now?
- Large amounts of data are being produced
- Data is being organized
- Technologies are developing for database management, data warehousing, parallel processing, machine intelligence, etc.
- It is now possible to mine the data and get patterns and trends
- Interesting applications exist

Preparation for Data Mining
- Getting the data into the right format
  - Data warehousing
  - Scrubbing and cleaning the data
- Some idea of application domain
- Determining the types of outcomes
  - e.g., Clustering, classification
- Evaluation of tools
- Getting the staff trained in data mining

Some Types of Data Mining (Data Mining Tasks/Outcomes)
- Classification: grouping records into meaningful subclasses
  - e.g., a marketing organization has a list of people living in Manhattan, all owning cars costing over 20K
- Sequence detection
  - e.g., John always buys groceries after going to the bank
- Data dependency analysis: identifying potentially interesting dependencies or relationships among data items
  - e.g., if John, James, and Jane meet, Bill is also present
- Deviation detection: discovery of significant differences between an observation and some reference
  - Anomalous instances
  - Discrepancies between observed and expected values
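Deviation detection, the last task on this slide, can be illustrated with a simple z-score check, where the "reference" is the mean of the observations. This is one possible approach among many, not the method prescribed by the lecture; the login counts, the threshold of 2.0, and the function name `deviations` are assumptions for illustration.

```python
from statistics import mean, pstdev

def deviations(observed, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    mu = mean(observed)
    sigma = pstdev(observed)  # population standard deviation of the observations
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [(i, x) for i, x in enumerate(observed) if abs(x - mu) / sigma > threshold]

# Hypothetical daily login counts for one employee; the last day is unusual.
daily_logins = [10, 12, 11, 9, 10, 11, 10, 50]
flagged = deviations(daily_logins)
print(flagged)
```

Because the outlier itself inflates the mean and standard deviation, this check can miss anomalies in very small samples; more robust references such as the median are a common alternative.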

Data Mining Methodology (or Approach)
- Top-down
  - Hypothesis testing
  - Validate beliefs
- Bottom-up
  - Discover patterns
- Directed
  - Some idea what you want to get
- Undirected
  - Start from fresh

Some Data Mining Techniques
- Market basket analysis
- Decision trees
- Neural networks
- Rough sets and fuzzy logic
- Inductive logic programming
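To give a feel for how a decision tree is built, the sketch below finds the single best split (a one-level tree, often called a decision stump) by minimizing weighted Gini impurity. Gini impurity is one common splitting criterion and is an assumption here, as are the customer data and the helper names `gini` and `best_split`; the lecture does not specify a particular algorithm.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all labels agree)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (score, feature index, threshold) for the 'value <= threshold' split
    with the lowest weighted Gini impurity, or None if no valid split exists."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send records to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best

# Hypothetical customers: (age, car price in thousands) and whether they responded to an offer.
rows = [(25, 15), (32, 22), (47, 30), (51, 18), (38, 25), (29, 12)]
labels = ["no", "yes", "yes", "no", "yes", "no"]
score, feature, threshold = best_split(rows, labels)
print(f"split on feature {feature} at <= {threshold} (weighted Gini {score})")
```

A full decision tree would apply the same search recursively to each side of the split until the groups are pure or too small to divide further.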

Commercial Developments in Data Mining: Some Early Products
- Information Discovery - IDIS
- WizSoft - WizWhy
- Hugin - Hugin
- IBM - Intelligent Miner
- Red Brick - DataMind (became part of Informix and now part of IBM)
- Neo Vista - Decision Series
- Reduct Systems - Datalogic/R
- Lockheed Martin - Recon
- Nicesoft - Nicel
- SAS - Enterprise Miner
Recent products will be discussed in Unit #9

Current Status, Challenges and Directions
- Status
  - Data mining is now a technology
  - Several prototypes and tools exist; many or almost all of them work on relational databases
- Challenges
  - Mining large quantities of data
  - Dealing with noise and uncertainty
- Directions
  - Mining multimedia and text databases
  - Web mining (structure, usage and content)
  - Data mining, national security and privacy