1
Artificial Intelligence Techniques in Bioinformatics
  Università degli Studi di Ferrara, ENDIF – Dipartimento di Ingegneria
  Giacomo Gamberoni
2
Data Mining in Bioinformatics
  Genetic data from comparative experiments (normal vs. cancer)
  Data provided by Dipartimento di morfologia ed embriologia – Università di Ferrara (Dott. Stefano Volinia)
  Software used: Weka, Matlab, MySQL
3
Microarray Experiments
  The slide is prepared by fixing base sequences (ESTs) at specific points (spots) on the glass.
  Two mRNA samples from two cell populations, labelled with different fluorescent dyes, are hybridized to the slide.
  Scanning the slide measures the fluorescence intensities of the two channels in each spot.
4
Dataset normalization
  Keep only spots with good intensity in at least 75% of the samples
  Compute the log ratio of the two channels
  Subtract the median of the ratios in each spot
  Divide by the SD of each spot
  Keep only spots with at least one significantly expressed sample (Log Ratio > 1.5)

          s1     s2     s3     s4
  Class   C      C      N      C
  EST1    0.2    1.2   -2.3   -0.7
  EST2   -1.1    2.5    0.7    0.3
  EST3    0.9    1.2    2.3   -0.6
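The normalization steps listed on this slide could be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the slide does not give the logarithm base, the definition of "good intensity", or whether the 1.5 cut-off applies to the absolute value, so base 2, a hypothetical `min_intensity` of 100 on both channels, and an absolute-value test are assumed here. The function name `normalize` and its parameters are likewise hypothetical.

```python
import math
import statistics


def normalize(spots, min_intensity=100.0, min_good_fraction=0.75, threshold=1.5):
    """spots: dict EST id -> list of (sample, reference) intensities, one pair per sample.

    Returns dict EST id -> list of normalized log ratios (None where the
    measurement was below the assumed intensity cut-off).
    """
    result = {}
    for est, pairs in spots.items():
        # Step 1: keep spots with good intensity in at least 75% of the samples.
        # ASSUMPTION: "good" means both channels reach min_intensity (> 0).
        good = [min(a, b) >= min_intensity for a, b in pairs]
        if sum(good) < min_good_fraction * len(pairs):
            continue

        # Step 2: log ratio of the two channels (ASSUMPTION: base 2).
        ratios = [math.log2(a / b) if ok else None for (a, b), ok in zip(pairs, good)]
        valid = [r for r in ratios if r is not None]
        if len(valid) < 2:
            continue  # the SD is undefined for fewer than two values

        # Step 3: subtract the per-spot median.
        median = statistics.median(valid)
        # Step 4: divide by the per-spot SD (ASSUMPTION: sample SD).
        sd = statistics.stdev(valid)
        if sd == 0:
            continue  # a constant spot carries no information
        scaled = [None if r is None else (r - median) / sd for r in ratios]

        # Step 5: keep spots with at least one value beyond the threshold.
        # ASSUMPTION: the cut-off is applied to the absolute value.
        if any(v is not None and abs(v) > threshold for v in scaled):
            result[est] = scaled
    return result


spots = {
    "EST1": [(200, 100)] * 4 + [(3200, 100)],  # one strongly deviating sample
    "EST2": [(200, 200)] * 5,                  # constant, dropped
    "EST3": [(10, 10)] * 5,                    # too dim, dropped
}
print(normalize(spots))
```

Measurements below the cut-off are kept as `None` so that every output row still has one entry per sample, which matches the tabular layout shown on the slide.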
5
Datasets analyzed
  Hepatocellular Carcinoma
    Reference: artificial mRNA pool
    7449 ESTs for 161 samples
    95 Cancer: 82 HBV+, 3 HCV+, 10 no Hepatitis antibodies
    66 Normal: 47 HBV+, 5 HCV+, 14 no Hepatitis antibodies
  Larynx squamous cell carcinoma
    Reference: normal larynx
    7626 ESTs for 22 samples
    11 lymph node negative (N0)
    11 lymph node positive (N+)
6
Supervised/unsupervised learning
  Supervised learning
    Decision tree
    Support vector machines
  Unsupervised learning
    Hierarchical clustering
7
Results
  Decision tree

358885 <= 0.719385542
|   740476 <= 0.856739394
|   |   626619 <= 0.552788235
|   |   |   451711 <= -0.84774
|   |   |   |   786690 <= -0.116917241: HBV+ (5.0)
|   |   |   |   786690 > -0.116917241: HBV- (4.0)
|   |   |   451711 > -0.84774: HBV+ (107.0/1.0)
|   |   626619 > 0.552788235
|   |   |   310406 <= -0.162467: HBV- (6.0)
|   |   |   310406 > -0.162467: HBV+ (12.0/1.0)
|   740476 > 0.856739394
|   |   344648 <= 0.051885057: HBV- (10.0)
|   |   344648 > 0.051885057: HBV+ (7.0/1.0)
358885 > 0.719385542: HBV- (10.0/1.0)

  Clustering dendrogram [figure]
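One way to read the decision tree on this slide is as a chain of nested threshold tests on individual expression values. The sketch below transcribes the tree literally; the function name `classify_hbv` is hypothetical, and the numeric node labels are assumed to be identifiers of the spotted sequences (the slide does not say what they are).

```python
def classify_hbv(expr):
    """expr: dict mapping a node identifier from the tree (int) to the
    normalized expression value of one sample.

    Returns "HBV+" or "HBV-" by following the branches of the tree.
    """
    if expr[358885] > 0.719385542:
        return "HBV-"
    if expr[740476] > 0.856739394:
        return "HBV-" if expr[344648] <= 0.051885057 else "HBV+"
    if expr[626619] > 0.552788235:
        return "HBV-" if expr[310406] <= -0.162467 else "HBV+"
    if expr[451711] > -0.84774:
        return "HBV+"  # the largest leaf of the tree (107 samples)
    return "HBV+" if expr[786690] <= -0.116917241 else "HBV-"


# Hypothetical sample with every value at zero: it ends in the largest leaf.
sample = {358885: 0.0, 740476: 0.0, 626619: 0.0, 451711: 0.0,
          786690: 0.0, 310406: 0.0, 344648: 0.0}
print(classify_hbv(sample))
```

Only seven expression values are ever inspected, which is part of what makes a tree of this kind easy for a biologist to interpret.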
8
Gene correlation
  Analysis of correlation between the expression of different genes
    Study of the expression of every possible pair of genes
    Computational complexity
  Integration with extra knowledge
    Genetic annotation (Gene Ontology)
    Chromosome location
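The complexity concern can be made concrete with a small sketch. The slide does not name the correlation measure, so Pearson correlation is assumed here, and the function names are hypothetical. With n sequences there are n(n-1)/2 pairs, which for the 7449 ESTs of the hepatocellular carcinoma dataset is 27,740,076 pairs.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # ASSUMPTION: a constant profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def pair_count(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def all_pair_correlations(expression):
    """expression: dict id -> list of expression values, one per sample."""
    return {
        (a, b): pearson(expression[a], expression[b])
        for a, b in combinations(sorted(expression), 2)
    }


# Toy data: the three rows of the example table on the normalization slide.
expression = {
    "EST1": [0.2, 1.2, -2.3, -0.7],
    "EST2": [-1.1, 2.5, 0.7, 0.3],
    "EST3": [0.9, 1.2, 2.3, -0.6],
}
for pair, r in all_pair_correlations(expression).items():
    print(pair, round(r, 3))
print(pair_count(7449))
```

Because the number of pairs grows quadratically, restricting the comparison to pairs that share an annotation or a chromosome location, as the slide suggests, reduces the work considerably.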
9
Intra-gene relations
  Purpose: studying intra-gene relations can give useful results for:
    Quality control
      Different ESTs from the same UniGene Cluster (UGC) should be equally expressed
      A bad correlation between these ESTs may be due to experimental error
    Chromosomal aberration
      We can highlight parts of genes that lose correlation

  UGC         ESTs   Possible pairs   Total Rel.   LIV Rel.   HCC Rel.
  Hs.315379   6      15               12           3          13
  Hs.306864   5      10               7            4          8
  Hs.381184   4      6                6            2          6
  Hs.8207     4      6                2            0          2
  Hs.386834   4      6                3            1          3
  Hs.386784   4      6                0            0          0
  Hs.355608   4      6                0            1          0
  Hs.236456   4      6                1            0          2
  Hs.168913   4      6                2            0          2
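The bookkeeping behind a table of this kind could look like the sketch below: group the ESTs by UniGene cluster, enumerate the possible pairs inside each cluster, and count how many of them are considered related. How a pair is judged "related" is not specified on the slide, so the set of related pairs is taken as an input here; the function name `intra_cluster_summary` and the data layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def intra_cluster_summary(est_to_cluster, related_pairs, min_ests=2):
    """est_to_cluster: dict EST id -> UniGene cluster id.
    related_pairs: set of frozenset({est_a, est_b}) judged related
                   by whatever criterion is in use (an input, not computed here).

    Returns dict cluster id -> counts of ESTs, possible pairs, related pairs.
    """
    clusters = defaultdict(list)
    for est, cluster in est_to_cluster.items():
        clusters[cluster].append(est)

    summary = {}
    for cluster, ests in clusters.items():
        if len(ests) < min_ests:
            continue  # a single EST has no intra-cluster pair
        pairs = [frozenset(p) for p in combinations(sorted(ests), 2)]
        summary[cluster] = {
            "ests": len(ests),
            "possible_pairs": len(pairs),
            "related": sum(p in related_pairs for p in pairs),
        }
    return summary


# Hypothetical toy input: four ESTs in one cluster, one EST in another.
est_to_cluster = {"a": "Hs.1", "b": "Hs.1", "c": "Hs.1", "d": "Hs.1", "e": "Hs.2"}
related = {frozenset(("a", "b")), frozenset(("c", "d")), frozenset(("a", "e"))}
print(intra_cluster_summary(est_to_cluster, related))
```

Calling the function once per sample group (for example with the pairs found related in normal liver samples and again with those found in carcinoma samples) would give per-condition counts that can be compared cluster by cluster.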
10
Relations in Processes
  Study relations between the genes involved in the same biological processes
    Biological processes as defined by the Gene Ontology
    Highlight differences in gene correlations between normal and cancer

  Columns: Biological Process, Total Rel., Only HCC, Only LIV
  immune response 1098316
  regulation of transcription, DNA-dependent 61256
  antigen presentation, exogenous antigen 46400
11
Present Activities
  Development of a web-based interface to make several algorithms available to biologists (PHP, Java)
  Implementation of some algorithms as plug-ins for an open source analysis suite (Java)
  Extension of our algorithms to analyze other data sources:
    SAGE data
    Affymetrix data
12
Publications
  Giacomo Gamberoni, Evelina Lamma, Sergio Storari, Diego Arcelli, Francesca Francioso and Stefano Volinia. Exploiting supervised and unsupervised learning techniques for profiling cancer data. Presented at Workshop: Data Mining in Functional Genomics and Proteomics in ECAI 2004.
  Giacomo Gamberoni and Sergio Storari. Supervised and unsupervised learning techniques for profiling SAGE results. Presented at Discovery Challenge in ECML/PKDD 2004.
13
Publications Giacomo Gamberoni, Evelina Lamma, Sergio Storari, Diego Arcelli, Francesca Francioso and Stefano Volinia. Correlation of expression between different IMAGE clones from the same UniGene Cluster. Presented in ISBMDA 2004; published in Biological and Medical Data Analysis, Lecture Notes in Computer Science 3337, Springer.