Prof. Carolina Ruiz Department of Computer Science Worcester Polytechnic Institute INTRODUCTION TO KNOWLEDGE DISCOVERY IN DATABASES AND DATA MINING.



WHAT IS DATA MINING? OR MORE GENERALLY, KNOWLEDGE DISCOVERY IN DATABASES (KDD)
“Non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” [Fayyad et al. 1996]
Raw data -> data mining -> patterns
Kinds of patterns: analytical patterns (rules, decision trees), statistical patterns (data distribution), visual patterns
Reference: Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. "From Data Mining to Knowledge Discovery in Databases." AI Magazine, Fall 1996.

NEED FOR DATA MINING
Data are being gathered and stored extremely fast.
Computational tools and techniques are needed to help humans summarize, understand, and take advantage of the accumulated data.

DATA ANALYSIS (KDD) PROCESS
Data sources -> data management (databases, data warehouses)
Data "pre"-processing (noisy/missing data, dimensionality reduction) -> clean data
Data analysis / data mining (analytical, statistical, visual models) -> model/patterns
Model/pattern evaluation (quantitative, qualitative) -> "good" model
Deployment on new data (prediction, decision support)
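
The stages named on this slide (pre-processing, data mining, evaluation, deployment) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not part of the slides: the toy records, the function names, the one-threshold "model", and the simple holdout split.

```python
from statistics import mean

# Hypothetical raw records (hours_studied, passed); None marks a missing value.
RAW = [(1.0, 0), (2.0, 0), (None, 1), (3.0, 0),
       (6.0, 1), (7.0, 1), (8.0, 1), (2.5, None)]


def preprocess(records):
    """Pre-processing: drop records with missing values to obtain clean data."""
    return [r for r in records if None not in r]


def mine(train):
    """Data mining: learn a one-threshold rule 'x >= t -> class 1'.

    The threshold is the midpoint between the two class means
    (an illustrative choice, not a method taken from the slides).
    """
    neg = mean(x for x, y in train if y == 0)
    pos = mean(x for x, y in train if y == 1)
    return (neg + pos) / 2


def predict(threshold, x):
    """Deployment: apply the learned rule to a new observation."""
    return 1 if x >= threshold else 0


def evaluate(threshold, test):
    """Quantitative evaluation: fraction of test records classified correctly."""
    return sum(predict(threshold, x) == y for x, y in test) / len(test)


clean = preprocess(RAW)
train, test = clean[::2], clean[1::2]   # simple holdout split
threshold = mine(train)
accuracy = evaluate(threshold, test)

print("threshold:", threshold)
print("holdout accuracy:", accuracy)
print("prediction for new data x=5.0:", predict(threshold, 5.0))
```

A real project would replace each function with far more careful work; the point is only the order of the stages and that evaluation uses data the model was not fitted on.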

KDD IS INTERDISCIPLINARY: TECHNIQUES COME FROM MULTIPLE FIELDS
Machine Learning (AI): contributes (semi-)automatic induction of empirical laws from observations and experimentation.
Statistics: contributes language, framework, and techniques.
Pattern Recognition: contributes pattern extraction and pattern matching techniques.
Databases: contributes efficient data storage, data cleansing, and data access techniques.
Data Visualization: contributes visual data displays and data exploration.
High Performance Computing: contributes techniques for efficiently handling complexity.
Application Domain: contributes domain knowledge.

DATA MINING MODES
Confirmatory (verification): given a hypothesis, verify its validity against the data.
Exploratory (discovery):
  Prescriptive patterns: patterns for predicting the behavior of newly encountered entities.
  Descriptive patterns: patterns for presenting the behavior of observed entities in a human-understandable format.

WHAT DO YOU WANT TO LEARN FROM YOUR DATA? KDD APPROACHES
Approaches: classification, regression, clustering, summarization, dependency/association analysis, change/deviation detection.
Example rules: IF a & b & c THEN d & k; IF k & a THEN e.
Example association rules over items A, B, C, D: A, B -> C (80%); C, D -> A (22%).
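
The slide does not say what the percentage attached to a rule such as A, B -> C means. A common reading, assumed here, is rule confidence: among the records that contain A and B, the fraction that also contain C. The Python sketch below computes support and confidence on made-up transactions, chosen so that A, B -> C comes out at 80%; the data and function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent. Returns 0.0 if nothing matches."""
    antecedent, consequent = set(antecedent), set(consequent)
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(consequent <= t for t in covered) / len(covered)


# Hypothetical transactions: five contain both A and B, four of those contain C.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"A", "B"},
    {"C", "D"},
    {"C", "D"},
    {"D"},
]

rule_support = support(TRANSACTIONS, {"A", "B", "C"})
rule_confidence = confidence(TRANSACTIONS, {"A", "B"}, {"C"})

print(f"A, B -> C  support={rule_support:.0%}  confidence={rule_confidence:.0%}")
```

Association-rule miners typically report both numbers, because a rule can have high confidence while applying to very few records.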

COMMERCIAL DATA MINING SYSTEMS
Matlab
Oracle Data Mining
and lots more ...

ACADEMIC DATA MINING SYSTEMS
WEKA: Frank et al., University of Waikato, New Zealand
RapidMiner: Klinkenberg et al., Univ. of Dortmund, Germany
R Programming Language: Ross Ihaka and Robert Gentleman, Univ. of Auckland, New Zealand
and many more ...

DATA MINING RESOURCES – JOURNALS
Data Mining and Knowledge Discovery Journal
Newsletters:
  ACM SIGKDD Explorations Newsletter
Related Journals:
  TKDE: IEEE Transactions on Knowledge and Data Engineering
  TODS: ACM Transactions on Database Systems
  JACM: Journal of the ACM
  Data and Knowledge Engineering
  JIIS: Journal of Intelligent Information Systems

DATA MINING RESOURCES – CONFERENCES
KDD: ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining
ICDM: IEEE International Conference on Data Mining
SDM: SIAM International Conference on Data Mining
PKDD: European Conference on Principles and Practice of Knowledge Discovery in Databases
PAKDD: Pacific-Asia Conference on Knowledge Discovery and Data Mining
DaWaK: Intl. Conference on Data Warehousing and Knowledge Discovery
Related Conferences:
  ICML: Intl. Conf. on Machine Learning
  IDEAL: Intl. Conf. on Intelligent Data Engineering and Automated Learning
  IJCAI: International Joint Conference on Artificial Intelligence
  AAAI: American Association for Artificial Intelligence Conference
  SIGMOD/PODS: ACM Intl. Conference on Data Management
  ICDE: International Conference on Data Engineering
  VLDB: International Conference on Very Large Data Bases

DATA MINING RESOURCES – BOOKS, DATASETS, ...
See resources webpage at:

SUMMARY
KDD is the “non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.”
The KDD process includes data collection and pre-processing, data mining, and evaluation and validation of the discovered patterns.
Data mining is the discovery and extraction of patterns from data, not the extraction of data.
Important challenges in data mining: privacy, security, scalability, real-time operation, and handling non-conventional data.

KDDRG: KNOWLEDGE DISCOVERY AND DATA MINING RESEARCH GROUP
KDDRG Meetings
WHEN? Fridays at 1 pm
WHERE? Beckett Conference Room in Fuller Labs
To receive announcements of the talks, please subscribe to the KDDRG mailing list. I’ll send you an email with instructions on how to do so.