Data Mining in Microarray Analysis

1 Data Mining in Microarray Analysis
Classification (Supervised Learning)
- Finding models (functions) that describe and distinguish classes or concepts for future prediction
- E.g., predict disease based on gene expression profiles
- Similar to prediction, but predicts an unknown or missing categorical value rather than a numerical value
- Model representations: decision tree, classification rules, neural network
Cluster analysis (Unsupervised Learning)
- Class label is unknown: group data to form new classes, e.g., cluster genes to find distribution patterns
- Clustering is based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity
- E.g., group genes based on their gene expression profiles
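The clustering principle on this slide (similar profiles end up in the same group, dissimilar ones in different groups) can be illustrated with a small sketch. The slides do not name a clustering algorithm; plain k-means is used here as one common choice, and the gene names and expression values are hypothetical.

```python
from math import dist

def kmeans(profiles, k, iterations=20):
    """Cluster expression profiles with plain k-means (Lloyd's algorithm)."""
    # Deterministic start for this sketch: the first k profiles are the centroids.
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in profiles]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(profiles, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical expression profiles (one row per gene, one column per condition).
genes = {
    "geneA": [1.0, 1.2, 0.9],
    "geneB": [5.0, 5.1, 4.8],
    "geneC": [0.8, 1.1, 1.0],
    "geneD": [4.9, 5.3, 5.0],
}
labels, _ = kmeans(list(genes.values()), k=2)
clusters = dict(zip(genes, labels))
```

Here geneA and geneC land in one cluster and geneB and geneD in the other. In practice the number of clusters and the initial centroids would need more care than this fixed start.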

2 Supervised vs Unsupervised Learning
Supervised (Classification)
- known number of classes
- based on a training set
- used to classify future observations
Unsupervised (Clustering)
- unknown number of classes
- no prior knowledge
- used to understand (explore) data
Speaker notes (translated from German): As a third method, I will say something about pattern recognition. This method deals with decision-making processes. The aim is first to understand these processes and then to automate them with computers. Pattern recognition (PR) can be divided into two classes, supervised and unsupervised. Supervised pattern recognition assumes a known number of classes. In unsupervised PR the number of classes is unknown. Supervised PR is based on a so-called training set, a series of observations whose class assignment is already known. Based on this previously known assignment, the actual observations, whose classes are unknown, are assigned to the classes. The unsupervised variant assumes no a priori knowledge. Supervised PR is used to classify future observations into predefined classes. Cluster analysis, as just presented, is a form of unsupervised PR, so I will not go into unsupervised PR any further. In what follows, supervised PR is presented.

3 Supervised vs. Unsupervised Learning
[Figure: scatter plots of debt versus income. Supervised Learning panel: points carry class markers (* and o). Unsupervised Learning panel: all points carry the same marker (+), with no class labels.]

4 Classification
[Diagram: a training set (data with known classes) is passed to a classification technique, which yields a classifier. The classifier takes data with unknown classes and produces a class assignment.]

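5 Types of Classifiers
Linear classifier: the classes are separated by a straight line in the income/debt plane, e.g. a*income + b*debt < t => No loan!
Non-linear classifier: the boundary between the classes (* and o) is not a straight line.
[Figure: two scatter plots of debt versus income with points marked * and o, one for each classifier type.]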
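The linear rule from this slide, a*income + b*debt < t => No loan, translates almost directly into code. The sketch below is only an illustration: the coefficient values and the "Loan" label for the other side of the line are assumptions, since the slide gives neither.

```python
def linear_loan_classifier(income, debt, a, b, t):
    """Return 'No loan' when a*income + b*debt < t, otherwise 'Loan'.

    'Loan' as the label for the other side of the line is an assumption;
    the slide only names the 'No loan' outcome.
    """
    score = a * income + b * debt
    return "No loan" if score < t else "Loan"

# Hypothetical coefficients: income counts in favour, debt counts against.
A, B, T = 1, -2, 10

decision_high_income = linear_loan_classifier(income=50, debt=10, a=A, b=B, t=T)
decision_low_income = linear_loan_classifier(income=20, debt=10, a=A, b=B, t=T)
```

With these made-up numbers the first applicant scores 30 and is on the "Loan" side, while the second scores 0 and falls below the threshold. A learning algorithm would estimate a, b and t from the training set instead of fixing them by hand.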

6 Predictive Modelling:
Day  Outlook   Temperature  Humidity  Wind    Play Tennis
1    Sunny     Hot          High      Weak    No
2    Sunny     Hot          High      Strong  No
3    Overcast  Hot          High      Weak    Yes
4    Rain      Mild         High      Weak    Yes
5    Rain      Cool         Normal    Weak    Yes
6    Rain      Cool         Normal    Strong  No
7    Overcast  Cool         Normal    Strong  Yes
8    Sunny     Mild         High      Weak    No
9    Sunny     Cool         Normal    Weak    Yes
10   Rain      Mild         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes
14   Rain      Mild         High      Strong  No
- Predict categorical class labels
- Classify data (construct a model) based on the training set and the values (class labels) of a classifying attribute
- Use the model to classify new data

7 Classification
- Task: determine which of a fixed set of classes an example belongs to
- Input: a training set of examples annotated with class values
- Output: induced hypotheses (model / concept description / classifiers)
Learning: induce classifiers from training data
  Training data -> Inductive learning system -> Classifiers (derived hypotheses)
Prediction: use the hypothesis to classify any example described in the same manner
  Data to be classified -> Classifier -> Decision on class assignment
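The two phases on this slide, learning and prediction, can be sketched with a very small inductive learner. The slides do not say which learner is meant; the example assumes a one-rule ("OneR") learner, which picks the single attribute whose value-to-majority-class rule makes the fewest errors on the training set. The data is the Play Tennis table from slide 6; the function names are illustrative.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
# Play Tennis training set from slide 6 (days 1 to 14, class label last).
ROWS = """Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No""".splitlines()
TRAINING = [(dict(zip(ATTRIBUTES, r.split()[:4])), r.split()[4]) for r in ROWS]

def induce_one_rule(training):
    """Learning: return (attribute, rule) with the fewest training errors."""
    best = None
    for attr in ATTRIBUTES:
        # Count class labels for every value of this attribute.
        counts = defaultdict(Counter)
        for example, label in training:
            counts[example[attr]][label] += 1
        # The rule maps each attribute value to its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(label != rule[example[attr]] for example, label in training)
        # Strict '<' keeps the earlier attribute when error counts tie.
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    _, attr, rule = best
    return attr, rule

def predict(classifier, example):
    """Prediction: apply the induced rule to an example of the same shape."""
    attr, rule = classifier
    return rule[example[attr]]

classifier = induce_one_rule(TRAINING)
new_day = {"Outlook": "Overcast", "Temperature": "Mild",
           "Humidity": "High", "Wind": "Weak"}
decision = predict(classifier, new_day)
```

On this data the learner settles on Outlook (Outlook and Humidity both make four training errors, and the tie goes to the attribute listed first), so the overcast day is classified as "Yes".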

8 Decision Tree: Example
Training data: the 14-day Play Tennis table from slide 6 (Day, Outlook, Temperature, Humidity, Wind, Play Tennis).
Decision tree (leaf labels follow from the table):
Outlook
  Sunny    -> Humidity
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes
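A tree of the kind shown on this slide can be induced automatically from the Play Tennis data. The slides do not name the induction algorithm; the sketch below assumes an ID3-style procedure that splits on the attribute with the highest information gain. All function names are illustrative.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
# Play Tennis training set from slide 6 (days 1 to 14, class label last).
ROWS = """Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No""".splitlines()
TRAINING = [(dict(zip(ATTRIBUTES, r.split()[:4])), r.split()[4]) for r in ROWS]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(examples, attr):
    """Entropy reduction obtained by splitting the examples on attr."""
    labels = [label for _, label in examples]
    remainder = 0.0
    for value in {ex[attr] for ex, _ in examples}:
        subset = [label for ex, label in examples if ex[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(examples, attributes):
    """Return a class label (leaf) or (attribute, {value: subtree})."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:          # pure node: stop splitting
        return labels[0]
    if not attributes:                 # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({ex[best] for ex, _ in examples}):
        subset = [(ex, label) for ex, label in examples if ex[best] == value]
        branches[value] = build_tree(subset, remaining)
    return (best, branches)

def classify(tree, example):
    """Follow the branches matching the example until a leaf is reached."""
    while not isinstance(tree, str):
        attr, branches = tree
        tree = branches[example[attr]]
    return tree

tree = build_tree(TRAINING, ATTRIBUTES)
```

On this data the procedure puts Outlook at the root, Humidity under Sunny and Wind under Rain. Note that classify raises a KeyError for an attribute value that never occurred in the training data; a fuller implementation would fall back to a majority class.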

9 Classification: Relevant Gene Identification
Goal: identify a subset of genes that distinguish between treatments, tissues, etc.
Method:
- Collect several samples grouped by treatment (e.g. diseased vs. healthy)
- Use genes as "features"
- Build a classifier to distinguish the treatments

10 Gene Expression Example
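[Table: one row per sample with columns ID, G1, G2, G3, G4 and Cancer (Yes/No). The expression values did not survive extraction, and the rows end with an elided entry (15 ….. …).]
[Decision tree: the root splits on G1 at 22 (<=22 / >22); further splits on G3 and G4 use the thresholds 12 (<=12 / >12) and 52 (<=52 / >52); the leaves are labelled Yes or No.]
Problem: with a large number of genes (~10000), feature selection/reduction techniques are needed.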
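The slide calls for feature selection but does not say which technique to use. As one possible illustration, the sketch below ranks genes with a simple filter score (difference of the class means divided by the sum of the class standard deviations, often called a signal-to-noise score) and keeps the top n. The expression values are hypothetical, since the table values on the slide are not recoverable.

```python
from statistics import mean, pstdev

def signal_to_noise(values, labels):
    """|mean(Yes) - mean(No)| / (sd(Yes) + sd(No)) for one gene."""
    pos = [v for v, l in zip(values, labels) if l == "Yes"]
    neg = [v for v, l in zip(values, labels) if l == "No"]
    gap = abs(mean(pos) - mean(neg))
    spread = pstdev(pos) + pstdev(neg)
    if spread == 0:
        # No variation within either class: perfect separation or none at all.
        return float("inf") if gap else 0.0
    return gap / spread

def top_genes(expression, labels, n):
    """Return the n gene names with the highest score, best first."""
    scores = {gene: signal_to_noise(vals, labels)
              for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical expression values, one list entry per sample.
expression = {
    "G1": [10, 12, 30, 32],
    "G2": [20, 21, 19, 22],
    "G3": [5, 15, 8, 14],
    "G4": [50, 54, 60, 66],
}
cancer = ["No", "No", "Yes", "Yes"]

selected = top_genes(expression, cancer, n=2)
```

With these made-up values G1 and G4 separate the two classes best and are kept, while G2 and G3 are dropped. A classifier would then be trained on the selected genes only; with around 10000 genes, the score should be computed inside cross-validation to avoid selecting features on the test samples.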

