1
Decomposing Complex Clinical Phenotypes by Biologically Structured Microarray Analysis
Claudio Lottaz and Rainer Spang
Berlin Center for Genome Based Bioinformatics, Berlin (Germany)
Computational Diagnostics, Max Planck Institute for Molecular Genetics, Berlin (Germany)
2
Overview
  Introduction
  Using functional annotation for semi-supervised classification
  Heterogeneity vs. performance
  Evaluation on cancer related data
  Conclusions
22-Oct-19
3
Introduction: Tumor Classification
[Figure: gene expression matrix of patients by genes, with sample groups D and C]
Setting:
  Data: gene expression profiles
  Goal: prediction/classification of outcome/sub-type
More formally:
  Many expression levels measured
  Samples labelled as disease and control
  Train classifier
4
Introduction: State-of-the-Art
Various powerful methods:
  Support vector machines
  Shrunken centroids
  ...
Regularization to fight overfitting:
  Feature selection
  Large margins
  ...
Common hypothesis: generate a single molecular signature
5
Introduction: Complex Phenotypes
A single clinical phenotype may be caused by different molecular mechanisms
Our approach: discover several sub-classes in the disease group
Each sub-class has a homogeneous molecular signature
6
Introduction: Molecular Symptoms
Classical signatures are globally optimal
  They have no biological focus
  Genes are co-regulated and thus correlated
  In a global signature, genes can be replaced with little loss
Molecular symptom: a functionally focused signature to identify a disease sub-class
  High specificity, sub-optimal sensitivity
7
Introduction: Molecular Patient Stratification
Patterns of molecular symptoms define a molecular patient stratification
[Figure labels: Diagnostic signature; Molecular Symptom; Another Molecular Symptom; Control; Subclass]
8
Using Functional Annotations: A Priori vs. A Posteriori
Common procedure (a posteriori): [diagram relating Data, Functional Annotations, and Statistical Analysis]
Our suggestion (a priori): [diagram relating Data, Functional Annotations, and Statistical Analysis]
9
Using Functional Annotations: Gene Ontology
Biological terms in a directed graph
Genes annotated to terms
Levels represent specificity of terms
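The graph structure described on this slide can be sketched as a small data structure. This is only an illustration: the term names, gene names, and the helper functions `leaf_terms`, `term_levels`, and `genes_below` are hypothetical and do not come from the presentation or from the real Gene Ontology.

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its more specific child terms.
children = {
    "biological_process": ["cell_cycle", "signaling"],
    "cell_cycle": ["mitosis"],
    "signaling": [],
    "mitosis": [],
}

# Hypothetical gene annotations: each term maps to the genes annotated to it.
annotations = {
    "signaling": {"geneA", "geneB"},
    "mitosis": {"geneB", "geneC"},
}

def leaf_terms(children):
    """Terms without children, i.e. the most specific terms."""
    return sorted(term for term, kids in children.items() if not kids)

def term_levels(children, root):
    """Level of each term: length of the shortest path from the root."""
    levels = {root: 0}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for kid in children[term]:
            if kid not in levels:
                levels[kid] = levels[term] + 1
                queue.append(kid)
    return levels

def genes_below(children, annotations, term):
    """Genes annotated to a term or to any of its descendants."""
    genes = set(annotations.get(term, ()))
    for kid in children[term]:
        genes |= genes_below(children, annotations, kid)
    return genes

leaves = leaf_terms(children)
levels = term_levels(children, "biological_process")
root_genes = genes_below(children, annotations, "biological_process")
```

In this toy graph a higher level number means a more specific term, and a gene such as `geneB` can be annotated to more than one term.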
10
Using Functional Annotations: Structured Analysis of Microarrays
Classification in leaf nodes
  Regularized multivariate classifier
  Local signatures
Diagnosis propagation
  Combine child diagnoses in inner nodes
  Generate more general diagnoses
Regularization
  Shrink the classifier graph
  Remove uninformative branches
11
Using Functional Annotations: Leaf Node Classification
Shrunken centroid classification (Tibshirani et al. 2002)
  Classification according to distance to centroids
  Regularization via gene shrinkage
Determine probability-like values as classification results
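The three ingredients named on this slide (class centroids, gene shrinkage, probability-like outputs) can be sketched as follows. This is a deliberately simplified illustration, not the published method: it omits the per-gene standardization and class priors used by Tibshirani et al., and the function names, the toy data, and the softmax used to obtain probability-like values are assumptions made here.

```python
import math

def soft_threshold(x, delta):
    """Shrink x toward zero by delta; values within delta of zero become zero."""
    return math.copysign(max(abs(x) - delta, 0.0), x)

def fit_shrunken_centroids(samples, labels, delta):
    """Per-class centroids whose offsets from the overall mean are shrunk by delta."""
    n_genes = len(samples[0])
    overall = [sum(s[g] for s in samples) / len(samples) for g in range(n_genes)]
    centroids = {}
    for cls in sorted(set(labels)):
        members = [s for s, lab in zip(samples, labels) if lab == cls]
        mean = [sum(s[g] for s in members) / len(members) for g in range(n_genes)]
        centroids[cls] = [
            overall[g] + soft_threshold(mean[g] - overall[g], delta)
            for g in range(n_genes)
        ]
    return centroids

def predict_scores(centroids, sample):
    """Probability-like values: softmax of negative half squared distances (assumed form)."""
    neg = {
        cls: -0.5 * sum((x - m) ** 2 for x, m in zip(sample, cen))
        for cls, cen in centroids.items()
    }
    top = max(neg.values())
    expo = {cls: math.exp(v - top) for cls, v in neg.items()}
    total = sum(expo.values())
    return {cls: e / total for cls, e in expo.items()}

# Invented toy data: two genes, two samples per class.
samples = [[2.0, 0.3], [2.2, 0.1], [0.0, -0.1], [-0.2, -0.3]]
labels = ["disease", "disease", "control", "control"]
centroids = fit_shrunken_centroids(samples, labels, delta=0.5)
scores = predict_scores(centroids, [2.0, 0.0])
```

With this toy data the second gene differs too little between the classes to survive the shrinkage, so both class centroids share the same value for it and it no longer influences the classification.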
12
Using Functional Annotations: Propagation of Classification
Weighted averages
Weight according to child performance
Weights are normalized per inner node
[Figure: parent node Pa connected to child nodes C1, C2, C3 with weights w1, w2, w3]
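The weighted average at an inner node can be sketched in a few lines. The slide does not say how child performance is measured, so the non-negative performance scores below, the function names, and the example values are assumptions for illustration only.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one within an inner node."""
    total = sum(weights.values())
    if total == 0:
        return {node: 0.0 for node in weights}
    return {node: w / total for node, w in weights.items()}

def propagate(child_scores, child_performance):
    """Parent result: average of the child results, weighted by normalized performance."""
    weights = normalize(child_performance)
    return sum(weights[node] * child_scores[node] for node in child_scores)

# Invented example: probability-like results and performance scores of three children.
child_scores = {"C1": 0.9, "C2": 0.6, "C3": 0.2}
child_performance = {"C1": 5.0, "C2": 3.0, "C3": 2.0}

weights = normalize(child_performance)
parent_score = propagate(child_scores, child_performance)
```

Because the weights are normalized per inner node, the parent result stays within the range spanned by its children's results.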
13
Using Functional Annotations: Graph Shrinkage
Weights of nodes are shrunken by a constant
Negative weights are set to zero, so uninformative branches vanish
Best shrinkage level chosen in cross-validation
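A minimal sketch of the shrinkage step, under stated assumptions: the weights are taken as already normalized, the cross-validation itself is replaced by a lookup table of invented error values, and all function names are hypothetical.

```python
def shrink_weights(weights, delta):
    """Subtract the constant delta from every weight; negative results become zero."""
    return {node: max(w - delta, 0.0) for node, w in weights.items()}

def surviving_nodes(weights):
    """Nodes whose weight is still positive after shrinkage."""
    return sorted(node for node, w in weights.items() if w > 0.0)

def select_level(levels, cv_error):
    """Pick the shrinkage level with the smallest cross-validated error."""
    return min(levels, key=cv_error)

# Invented weights of three child nodes.
weights = {"C1": 0.5, "C2": 0.3, "C3": 0.2}
shrunk = shrink_weights(weights, 0.25)
kept = surviving_nodes(shrunk)

# Invented error values standing in for a real cross-validation run.
toy_cv_error = {0.0: 0.20, 0.25: 0.12, 0.5: 0.30}
best_level = select_level(toy_cv_error.keys(), toy_cv_error.get)
```

Here the weakest child drops to weight zero and would vanish from the graph, while a larger constant would remove more branches.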
14
Heterogeneity vs. Performance: Biased Classifier Evaluation
Calibration of Sensitivity and Specificity
Shrinkage Parameter
Worst Performance in Leaf Node
Cj = DCi ( j Dj )-1
15
Heterogeneity vs. Performance: Classifier Heterogeneity
Difference between two classifiers: measures inconsistency of classifications
Node's redundancy:
Graph's redundancy (K nodes of the shrunken graph):
16
Heterogeneity vs. Performance: Calibration
Sensitivity vs. Specificity:
  Best classifiers: set to control prevalence
  More molecular symptoms: set higher than control prevalence
Heterogeneity vs. Performance:
  Molecular symptoms are heterogeneous
  Thus high eliminates them
17
Evaluation on Cancer Related Data: Leukemia Data Set
Data set by Yeoh et al. 2002
  Acute lymphocytic leukemia
  327 patients of 7 clinical sub-types
  Expression profiles by HG-U95Av2
Task for illustration
  Detect MLL sub-type
  20 MLL samples
  109 test set / 218 training set
18
Evaluation on Cancer Related Data: Functional Annotations
Focus on GO's Biological Process branch (8,173 terms)
12,625 probesets on the chip
8,679 genes (68.7% of probesets)
In 1,359 leaf nodes
845 inner nodes (total 2,204 nodes)
19
Evaluation on Cancer Related Data: MLL Classifier
2,796 genes accessible through 32 nodes
20
Evaluation on Cancer Related Data: MLL Stratification
21
Conclusions
Semi-supervised classification
  Detect sub-classes
  In labelled disease groups
Functional annotation
  Use in an a priori fashion
  To find biologically focused signatures: molecular symptoms
Resolve complex clinical phenotypes (stratification through molecular symptoms)