Data Mining: Knowledge Discovery in Databases Peter van der Putten ALP Group, LIACS Pre-University College Bio Informatics January 31 2005.


Topics
Lecture
Demo: data mining tool
Exercises: data mining tool
Breaks: TBD

Genomic Microarrays – Case Study
Problem:
– Leukemia (different types of leukemia cells look very similar)
– Given data for a number of samples (patients), can we accurately diagnose the disease? Predict the outcome for a given treatment? Recommend the best treatment?
Solution:
– Data mining on microarray data

Example: ALL/AML data
38 training patients, 34 test patients, ~7,000 patient attributes (microarray gene data)
2 classes: Acute Lymphoblastic Leukemia (ALL) vs Acute Myeloid Leukemia (AML)
Use the training data to build a diagnostic model
Results on test data: 33/34 correct; the 1 error may be a mislabeled sample

Sources of (artificial) intelligence
Reasoning versus learning
Learning from data
– Patient data
– Customer records
– Stock prices
– Piano music
– Criminal mugshots
– Websites
– Robot perceptions
– Etc.

Biomedical applications & data
General population survey data
Clinical studies
Patient characteristics
Imaging
Lab tests
Proteomics / genomics
– Relating protein / gene structure to biological function
Medical research literature
…

Some working definitions…
‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably
Data mining =
– The process of discovering interesting, meaningful and actionable patterns hidden in large amounts of data
Multidisciplinary field originating from artificial intelligence, pattern recognition, statistics, machine learning, bioinformatics, econometrics, …

Some working definitions…
Concepts: kinds of things that can be learned
– Aim: intelligible and operational concept description
– Example: the relation between patient characteristics and the probability of being diabetic
Instances: the individual, independent examples of a concept
– Example: a patient, a candidate drug etc.
Attributes: measured aspects of an instance
– Example: age, weight, lab tests, microarray data etc.
Pattern or attribute space

Data mining tasks
Predictive data mining
– Classification: classify an instance into a category
– Regression: estimate some continuous value
Descriptive data mining
– Matching & search: finding instances similar to x
– Clustering: discovering groups of similar instances
– Association rule extraction: if a & b then c
– Summarization: summarizing group descriptions
– Link detection: finding relationships
– …

Data Mining Tasks: Search
[Figure: pattern space with axes e.g. age and e.g. weight]
Finding the best matching instances
Every instance is a point in pattern space
Attributes are the dimensions of an instance, e.g. age, weight, gender etc.
Pattern spaces may be high dimensional (10 to thousands of dimensions)
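The search task on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the patient names and values are made up, and Euclidean distance is chosen as the similarity measure.

```python
import math

# Hypothetical toy data: each instance is a point (age, weight_kg) in pattern space.
instances = {
    "patient_a": (25.0, 60.0),
    "patient_b": (52.0, 88.0),
    "patient_c": (47.0, 91.0),
    "patient_d": (70.0, 65.0),
}

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def best_matches(query, data, n=2):
    """Return the names of the n instances closest to the query point."""
    return sorted(data, key=lambda name: euclidean(query, data[name]))[:n]

# Find the two stored instances most similar to a 50-year-old weighing 90 kg.
closest = best_matches((50.0, 90.0), instances)
print(closest)
```

In practice, attributes measured in different units are usually rescaled first, so that no single attribute dominates the distance.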

Data Mining Tasks: Classification
[Figure: pattern space with axes age and weight]
The goal of a classifier is to separate classes on the basis of known attributes
The classifier can be applied to an instance with unknown class
For instance, classes are healthy (circle) and sick (square); attributes are age and weight

Data Mining Tasks: Clustering
[Figure: pattern space with axes e.g. age and e.g. weight]
Clustering is the discovery of groups in a set of instances
Groups are different, instances in a group are similar
In a 2 to 3 dimensional pattern space you could just visualise the data and leave the recognition to a human end user
In >3 dimensions this is not possible
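When the pattern space has too many dimensions to inspect by eye, an algorithm has to find the groups. The slides do not name a clustering algorithm; the sketch below uses k-means as one common choice, with made-up (age, weight) points and the simplifying assumption that the first k points seed the clusters.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move the centroids."""
    centroids = list(points[:k])  # assumption: the first k points seed the clusters
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Squared Euclidean distance from p to every centroid.
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, clusters

# Hypothetical instances as (age, weight_kg) points.
points = [(25, 60), (60, 90), (27, 62), (62, 88), (24, 58), (61, 92)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The same function works unchanged for points with more than two attributes, which is the case where visual inspection stops being an option.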

Examples of Classification Techniques
Majority class vote
Machine learning & AI
– Decision trees
– Nearest neighbor
– Neural networks
– Genetic algorithms / evolutionary computing
– Artificial Immune Systems
Good old statistics
…

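Example Classification Algorithm 1: Decision Trees
[Figure: decision tree with yes/no branches. The top node splits the patients on age > …; a following node asks gender = male?; a node with 1200 patients splits on weight > 85 kg, leading to one leaf with 800 patients (10% diabetic) and one leaf with 400 patients (50% diabetic), etc.]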

Decision Trees in Pattern Space
[Figure: pattern space with axes age and weight]
The goal of the classifier is to separate classes (circle, square) on the basis of the attributes age and weight
Each line corresponds to a split in the tree
Decision areas are ‘tiles’ in pattern space

Special Cases of Decision Trees
Depth = 0
– Majority class classifier (ZeroR)
Depth = 1
– One question only
– Also known as a decision stump
Depth = n
– Any number of branches
Various algorithms exist to learn the tree from data
– The major difference is the criterion used to determine on which attribute value to split
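The two smallest special cases can be written out directly. The sketch below is an illustration, not the algorithm used in the lecture or in WEKA: the training data is made up, and the split criterion is simply the number of misclassified training instances, whereas real tree learners typically use other criteria.

```python
from collections import Counter

# Hypothetical training data: (age, weight_kg) -> class label.
train = [
    ((30, 70), "healthy"), ((35, 72), "healthy"), ((40, 80), "healthy"),
    ((45, 95), "sick"), ((50, 98), "sick"), ((28, 65), "healthy"),
]

def zero_r(data):
    """Depth 0: ignore all attributes and always predict the majority class."""
    majority = Counter(label for _, label in data).most_common(1)[0][0]
    return lambda x: majority

def decision_stump(data):
    """Depth 1: try every (attribute, threshold) split, keep the one with fewest errors."""
    best = None
    for attr in range(len(data[0][0])):
        for threshold in sorted({x[attr] for x, _ in data}):
            left = [label for x, label in data if x[attr] <= threshold]
            right = [label for x, label in data if x[attr] > threshold]
            if not left or not right:
                continue  # a split must send instances to both sides
            left_cls = Counter(left).most_common(1)[0][0]
            right_cls = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != left_cls for lab in left)
                      + sum(lab != right_cls for lab in right))
            if best is None or errors < best[0]:
                best = (errors, attr, threshold, left_cls, right_cls)
    _, attr, threshold, left_cls, right_cls = best
    return lambda x: left_cls if x[attr] <= threshold else right_cls

majority_model = zero_r(train)
stump_model = decision_stump(train)
print(majority_model((60, 85)), stump_model((60, 85)))
```

A deeper tree applies the same split search again inside each of the two halves.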

Example classification algorithm 2: Nearest Neighbour
The data itself is the classification model, so there is no abstraction like a tree etc.
For a given instance x, find the k instances that are most similar to x
Classify x as the most frequent class among these k most similar instances
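The procedure on this slide fits in one short function. The sketch assumes Euclidean distance as the similarity measure and uses made-up training instances; neither is specified in the slides.

```python
from collections import Counter
import math

# Hypothetical labelled instances: (age, weight_kg) -> class.
train = [
    ((30, 70), "healthy"), ((35, 72), "healthy"), ((28, 65), "healthy"),
    ((45, 95), "sick"), ((50, 98), "sick"), ((48, 90), "sick"),
]

def knn_classify(x, data, k=3):
    """Classify x as the most frequent class among its k nearest training instances."""
    by_distance = sorted(data, key=lambda item: math.dist(x, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((47, 93), train))
print(knn_classify((32, 68), train))
```

There is no training step: all the work happens at classification time, which is why the method needs enough stored data to cover the pattern space.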

Nearest Neighbor in Pattern Space
[Figure: classification and prediction in pattern space with axes e.g. age and e.g. weight; a marker indicates the new instance]
Any decision area possible
Condition: enough data available

Example classification algorithm 3: Neural Networks
Inspired by neuronal computation in the brain (McCulloch & Pitts 1943 (!))
Input (attributes) is coded as activation on the input layer neurons; the activation feeds forward through a network of weighted links between neurons and causes activations on the output neurons (for instance diabetic yes/no)
The algorithm learns to find optimal weights using the training instances and a general learning rule

Neural Networks: example simple network (2 layers)
$P(\text{diabetic}) = f(\text{age} \cdot w_{\text{age}} + \text{body\_mass\_index} \cdot w_{\text{body\_mass\_index}})$
[Figure: the inputs age and body_mass_index are connected through the weights w_age and w_body_mass_index to the output, the probability of being diabetic]
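The formula for the simple network can be evaluated directly. The slide does not say which function f is; the sketch below assumes the logistic (sigmoid) function, and it adds a bias term that does not appear on the slide. The weights are picked by hand for illustration, not learned from data.

```python
import math

def sigmoid(z):
    """Logistic function, squashing any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def diabetic_probability(age, body_mass_index, weight_age, weight_bmi, bias=0.0):
    """Two-layer network: weighted sum of the inputs passed through f (here: sigmoid)."""
    return sigmoid(age * weight_age + body_mass_index * weight_bmi + bias)

# Hypothetical weights and bias (assumptions, not values from the lecture).
p = diabetic_probability(age=50, body_mass_index=30,
                         weight_age=0.02, weight_bmi=0.1, bias=-4.0)
print(p)
```

Learning a network means adjusting `weight_age`, `weight_bmi` and `bias` until the outputs match the known classes of the training instances as closely as possible.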

Neural Networks in Pattern Space
[Figure: classification in pattern space with axes e.g. age and e.g. weight]
Simple network: only a line is available (why?) to separate classes
Multilayer network: any classification boundary possible
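One way to approach the "why?" on this slide: the output of a simple network depends only on a weighted sum of the inputs, and the points where that sum equals a fixed value form a straight line in a two-attribute pattern space. The sketch below illustrates the consequence with a threshold unit trained by the classic perceptron rule (an assumption; the slides do not name the learning rule). It learns AND, which one line can separate, but cannot fully learn XOR, which no single line separates.

```python
def predict(w, b, x1, x2):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

def train_perceptron(samples, epochs=50):
    """Perceptron rule: nudge weights and bias toward each misclassified sample."""
    w, b = [0, 0], 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - predict(w, b, x1, x2)
            w[0] += error * x1
            w[1] += error * x2
            b += error
    return w, b

def accuracy(samples, w, b):
    """Fraction of samples the trained unit classifies correctly."""
    hits = sum(predict(w, b, x1, x2) == t for (x1, x2), t in samples)
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_accuracy = accuracy(AND, *train_perceptron(AND))
xor_accuracy = accuracy(XOR, *train_perceptron(XOR))
print(and_accuracy, xor_accuracy)
```

Adding a hidden layer lets the network combine several such lines, which is what makes more complex boundaries possible.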

Decision tree demo in WEKA, an open source data mining tool

Descriptive data mining: association rules
Discovery of interesting patterns
Rule format: if A (and B and C etc.) then Z
Example:
– If customer buys potatoes (A) and sauerkraut (B) then customer buys sausage (Z)
Important measures
– Support of the condition: how often do potatoes and sauerkraut occur together (A, B)
– Confidence of the rule: how often do sausages then occur as well, divided by the support of the condition (is A, B → Z always true?)
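Both measures are simple counts over the transactions. The sketch below computes them for the potatoes and sauerkraut rule on a made-up set of five shopping baskets; the data and function names are illustrative assumptions.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"potatoes", "sauerkraut", "sausage"},
    {"potatoes", "sauerkraut", "sausage"},
    {"potatoes", "sauerkraut"},
    {"potatoes", "bread"},
    {"sausage", "bread"},
]

def support(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(condition, conclusion, transactions):
    """Of the transactions matching the condition, the fraction that also hold the conclusion."""
    return support(condition | conclusion, transactions) / support(condition, transactions)

rule_support = support({"potatoes", "sauerkraut"}, baskets)
rule_confidence = confidence({"potatoes", "sauerkraut"}, {"sausage"}, baskets)
print(rule_support, rule_confidence)
```

Here the condition holds in 3 of the 5 baskets, and 2 of those 3 also contain sausage, so the rule is supported but not always true.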

Association rule demo in WEKA

Some examples of my research
Using data mining for bio-medical applications
– Predicting Survival Rate for Throat Cancer Patients
– …
Using bio-medical concepts for data mining
– Artificial immune systems: learning computers based on the metaphor of the natural immune system

What have we learned so far?
Learning versus reasoning
Data mining definitions
Data mining tasks
Example data mining techniques for classification
Example data mining techniques for association rules
WEKA demos
And now: lab sessions