1 Targeted Projection Pursuit, Joe Faith, Northumbria University, v1.1, Targeted Projection Pursuit for Microarray Data Analysis Joe Faith Northumbria University

2 Outline
1. Analysing High-Dimensional Array Data
2. Dimension-Reduction Techniques
3. Targeted Projection Pursuit
4. Experimental Results

3 Array Data
New array technologies producing floods of quantitative data
– cDNA and oligonucleotide
– Protein arrays
– Combinatorial chemistry arrays
– Tissue arrays
Typically dozens of samples × thousands of genes (or other attributes)

4 Array Analysis Tasks
In the case of classified data (samples of known diagnostic classes, e.g. cancer tumours)
– spot clusters in data
– spot outliers
– classify new cases into existing classes
– genetic profiles, feature selection, finding markers for particular conditions
Similar problems with time series / sequential data
– Genome-wide study of transcription and regulation

5 Array Analysis Techniques
Lots of techniques borrowed from statistics, machine learning, data mining.
Tend to be complicated and ‘opaque’.
Want to find ways to allow the experimenter to:
– Visualise / communicate
– Explore
– Form hypotheses

6 Statistical Problems
Nature of data presents many statistical problems:
– Normalisation
– Control of variance
– Determining significance
– Determining reliability
‘High p, low n’ (many attributes, few samples)
Will ignore all these!

7 Suppose we had just 2 genes…
[Scatter plot: Gene A vs Gene B]
Clusters, classifications, outliers, correlations etc. are then immediately obvious

8 3D Scatter Plots
[3D scatter plot: axes Gene A, Gene B, Gene C]

9 ‘Virtual Reality’ 3D Scatter Plots
(Angelova et al, 2005)

10 Dimension Reduction Techniques
But what about p = 4, 5, … 1000?
Need some way of visualising and exploring higher-dimensional ‘space’ in 2D

11 Hierarchical Clustering
Produce a dendrogram based on sample/gene distances, and optimise order for display
But single dimension obscures many relationships
(Eisen et al, 1998)

12 Multi-Dimensional Scaling
Finds best possible 2D representation of data points (i.e. the one that best preserves distances between points)
E.g. Sammon’s Mapping (Ewing et al, 2001)
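
The idea of "preserving distances" can be made concrete with Sammon's stress, the quantity that Sammon's mapping minimises. The sketch below only evaluates the stress of a given 2D layout; it does not perform the optimisation, and the function names are illustrative rather than taken from the talk.

```python
import math

def pairwise_distances(points):
    """Euclidean distance between every pair of points (list of coordinate lists)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def sammon_stress(high_dim, low_dim):
    """Sammon's stress of a layout: 0 means all pairwise distances are preserved."""
    d_high = pairwise_distances(high_dim)
    d_low = pairwise_distances(low_dim)
    n = len(high_dim)
    total = 0.0  # sum of original distances, used as the normaliser
    error = 0.0  # distance errors, each weighted by 1 / original distance
    for i in range(n):
        for j in range(i + 1, n):
            if d_high[i][j] == 0.0:
                continue  # coincident points: skip to avoid dividing by zero
            total += d_high[i][j]
            error += (d_high[i][j] - d_low[i][j]) ** 2 / d_high[i][j]
    return error / total if total > 0.0 else 0.0

# Three points that already lie in a plane: dropping the third coordinate loses nothing.
flat = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
exact_layout = [p[:2] for p in flat]
squashed_layout = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]]
```

Because each error is divided by the original distance, small distances count for more than large ones, which is what distinguishes Sammon's mapping from plain metric scaling.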

13 But…
‘Curse of dimensionality’ spreads points
Not projection based, so cannot visualise position of new unclassified samples
No indication of particular stresses

14 Linear Projection Based Methods
Find a 2D ‘window’ through which we view the multidimensional data
The position of the window then contains useful information about, e.g., the relative significance of particular genes
Principal Components Analysis (Yeung, 2001)
– Find view (window position) that best spreads the data
Projection Pursuit
– Find projections best suited for particular purposes, such as separating classifications (Lee, 2005)
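
As a rough illustration of the ‘window’ idea, the sketch below computes a PCA view with NumPy: the projection matrix P (genes × 2) is the window, and its rows show how much each gene contributes to the view. The data are synthetic and the function name is an assumption made for this example.

```python
import numpy as np

def pca_view(X, n_components=2):
    """Return (projection matrix, projected data) for the top principal components."""
    Xc = X - X.mean(axis=0)  # centre each gene (column)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T  # genes x 2 projection matrix: the 'window'
    return P, Xc @ P

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))  # synthetic data: 20 samples x 100 genes
X[:10, 0] += 5.0                # hypothetical gene 0 differs between two groups
P, view = pca_view(X)           # view holds the 2D coordinates of each sample
```

A new, unclassified sample can be placed in the same view by subtracting the same column means and multiplying by P, which is the property the distance-based methods lack.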

15 Grand Tours
But each of these only shows a single view out of many
So ‘Grand Tours’ show a video of all possible views (Asimov, 1985)
Grand Tours in high dimensions are mostly uninformative, and make it hard to interpret data

16 Manual Controls
Try using manual controls to alter projections?
Controls are ‘opaque’: user has no intuition about the effect their actions will have
E.g. XGobi (Cook, 97)

17 Targeted Projection Pursuit
The intuition:
– Allow user to manipulate view of data directly
– Computer then tries to find view that best matches ‘target’
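
The slides do not say how the computer searches for the matching view. One simple way to picture it, used here purely as an assumption and not as the author's method, is ordinary least squares: given the data X and a target 2D layout T chosen by the user, find the projection P that minimises ||XP − T||. The target layout and all names below are hypothetical.

```python
import numpy as np

def targeted_projection(X, target):
    """Least-squares projection P minimising ||Xc @ P - Tc|| (an assumed search method)."""
    Xc = X - X.mean(axis=0)            # centre the data
    Tc = target - target.mean(axis=0)  # centre the target layout the same way
    P, *_ = np.linalg.lstsq(Xc, Tc, rcond=None)
    return P, Xc @ P

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 50))  # synthetic data: 12 samples x 50 genes
labels = np.array([0] * 4 + [1] * 4 + [2] * 4)

# Hypothetical target: the user drags each class towards its own corner of the view.
corners = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target = corners[labels]

P, view = targeted_projection(X, target)
residual = np.linalg.norm(view - (target - target.mean(axis=0)))
```

With far more genes than samples, a linear projection can usually hit the target almost exactly, so a near-zero residual on the training samples says little by itself about how well the view generalises.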

18 Quantitative Evaluation
Task: find a view of a data set that best shows classifications
Data: publicly available gene expression data sets of diagnosed cancer tissues
Method: compare resulting views with standard techniques for degree of class separation

19 Data
LEUK50: Gene expression in two types of acute leukemia: acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) [Gol99]. 38 cases of B-cell ALL, 9 cases of T-cell ALL, and 25 cases of AML. Expression levels of 7219 genes.
SRBCT50: cDNA microarray analysis of small, round blue cell childhood tumors (SRBCT), including neuroblastoma (NB), rhabdomyosarcoma (RMS), Burkitt lymphoma (BL; a subset of non-Hodgkin lymphoma) and members of the Ewing family of tumors (EWS). 6567 genes for 83 samples [Kha01].
NCI50: 60 cell lines from the National Cancer Institute's anticancer drug screen [Sch00]: 9 breast, 5 central nervous system (CNS), 7 colon, 6 leukemia, 8 melanoma, 9 non-small-cell lung carcinoma (NSCLC), 6 ovarian, 2 prostate, 8 renal. 9703 cDNA sequences.

20 DR Techniques
TPP: Targeted Projection Pursuit
PP: Projection Pursuit (computer search for optimal view)
SAM: Sammon Mapping
VS: VizStruct, a non-linear projection based on radial coordinates

21 Metrics
I_LDA: Linear Discriminant Analysis Index (Lee 05)
5NN: Generalisation performance of K-Nearest Neighbours classifier (K = 5)
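
Both metrics can be computed directly on a 2D view. The sketch below assumes the LDA index takes the form 1 − |W| / |W + B|, with W the within-class and B the between-class scatter of the projected points, and assumes that generalisation is estimated by leave-one-out; neither detail is stated on the slide. Function names and test data are illustrative.

```python
import numpy as np

def lda_index(view, labels):
    """1 - |W| / |W + B| for a projected view; values near 1 mean well-separated classes."""
    grand_mean = view.mean(axis=0)
    d = view.shape[1]
    W = np.zeros((d, d))  # within-class scatter
    B = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        pts = view[labels == c]
        mean_c = pts.mean(axis=0)
        centred = pts - mean_c
        W += centred.T @ centred
        diff = (mean_c - grand_mean)[:, None]
        B += len(pts) * (diff @ diff.T)
    return 1.0 - np.linalg.det(W) / np.linalg.det(W + B)

def knn_loo_accuracy(view, labels, k=5):
    """Leave-one-out accuracy (%) of a k-nearest-neighbour classifier on the view."""
    dists = np.linalg.norm(view[:, None, :] - view[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample may not vote for itself
    correct = 0
    for i in range(len(view)):
        neighbours = np.argsort(dists[i])[:k]
        votes = np.bincount(labels[neighbours])  # labels must be non-negative integers
        correct += int(votes.argmax() == labels[i])
    return 100.0 * correct / len(view)

# Synthetic view: two tight, well-separated classes of six samples each.
rng = np.random.default_rng(2)
view = np.vstack([rng.normal(0.0, 0.5, size=(6, 2)),
                  rng.normal(0.0, 0.5, size=(6, 2)) + 10.0])
labels = np.array([0] * 6 + [1] * 6)
index = lda_index(view, labels)
accuracy = knn_loo_accuracy(view, labels, k=5)
```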

22 Results

Data:       LEUK            SRBCT           NCI
Metric:     I_LDA   5NN     I_LDA   5NN     I_LDA   5NN
DR:
TPP         .997    100     .999    100     .994    96.7
PP          .972    98.6    .988    100     .981    62.3
SAM         .959    97.2    .911    95.2    .927    67.2
VS          .952    95.8    .637    56.6    .838    32.8

Joe Faith, Robert Mintram, Maia Angelova (2006), "Targeted Projection Pursuit for Visualising Gene Expression Data Classifications", Bioinformatics (forthcoming).
Joe Faith, Michael Brockway (2006), "Targeted Projection Pursuit Tool for Gene Expression Visualisation", Journal of Integrative Biology (forthcoming).

23 LEUK Data Views

24 SRBCT Data Views

25 NCI Data Views

26 Classifier Construction
Find a view in which all classes are clearly separated:
– Components of the projection then define the combination of genes that determines the classification
– Can order by significance to find a subset of relevant genes
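
One plausible reading of "order by significance" is to rank genes by the size of their weights in the projection matrix. The sketch below does this with made-up weights and gene names; it assumes expression values have been scaled comparably across genes, since otherwise weight magnitudes are not directly comparable.

```python
import math

def rank_genes(projection, gene_names, top=3):
    """Order genes by the length of their row in the projection matrix."""
    scored = [(math.hypot(*row), name) for row, name in zip(projection, gene_names)]
    scored.sort(reverse=True)  # largest weight first
    return [name for _, name in scored[:top]]

# Hypothetical 4-gene projection onto 2 view axes (one row per gene).
projection = [[0.02, -0.01], [0.70, 0.10], [-0.05, 0.65], [0.00, 0.03]]
gene_names = ["geneA", "geneB", "geneC", "geneD"]
top_genes = rank_genes(projection, gene_names, top=2)
```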

27 Outlier Detection
See LEUK data, outliers between ALL/T and ALL/B
See which potential outliers move with the rest of the samples of that class

28 Gene Identification
Separate each class in turn from the remainder of the data. The most significant genes in this separation can then be found.
NCI data:
– Human melanoma antigen recognized by T-cells (MART-1) mRNA Chr.9 selects for Melanoma samples [Coulie 94]
– Desmoplakin gene selects ovarian cancer cases [Adams 06]
SRBCT data:
– CD83 selects Burkitt's Lymphoma samples [Dudziak 03]

29 Future Work
Quantitatively evaluate TPP on other tasks
Develop tool to:
– Handle wider range of data formats
– Display time series / sequential data
– Integrate with biological workflows:
  Standard gene lists
  Click-through to gene ontologies and DBs
Work with biologists to trial tool and get feedback

30 References
Angelova,M., Ivanov,D. and Yasrebi,H. Classification and visualisation of E.coli genes from microarray experiments, poster presentation, MASAMB05, March, Rothamsted Research, Harpenden, UK. www.rothamsted.bbsrc.ac.uk/bab/masamb/posters/MAngelova.pdf
Eisen,M.B., Spellman,P.T., Brown,P.O. and Botstein,D. (1998) Cluster analysis and display of genome-wide expression patterns. PNAS, 95(25), 14863-14868.
Ewing,R.M. and Cherry,J.M. (2001) Visualisation of expression clusters using Sammon's non-linear mapping. Bioinformatics, 17, 658-659.
Yeung,K.Y. and Ruzzo,W.L. (2001) Principal Components Analysis for clustering gene expression data. Bioinformatics, 17(9), 763-774.
Lee,E.K., Cook,D., Klinke,S. and Lumley,T. (2005) Projection Pursuit for Exploratory Supervised Classification. Journal of Computational and Graphical Statistics, 14(4), 831-846.
Asimov,D. (1985) The Grand Tour: A Tool for Viewing Multidimensional Data. SIAM Journal on Scientific and Statistical Computing, 6(1), 128-143.

31 References (continued)
Cook,D. and Buja,A. (1997) Manual Controls for High-Dimensional Data Projections. Journal of Computational and Graphical Statistics, 6(4), 464-480.
Golub,T.R., Slonim,D.K., Tamayo,P., Huard,C., Gaasenbeek,M., Mesirov,J.P., Coller,H., Loh,M.L., Downing,J.R., Caligiuri,M.A., Bloomfield,C.D. and Lander,E.S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286(5439), 531-537.
Scherf,U., Ross,D.T., Waltham,M., Smith,L.H., Lee,J.K., Tanabe,L., Kohn,K.W., Reinhold,W.C., Myers,T.G., Andrews,D.T., Scudiero,D.A., Eisen,M.B., Sausville,E.A., Pommier,Y., Botstein,D., Brown,P.O. and Weinstein,J.N. (2000) A Gene Expression Database for the Molecular Pharmacology of Cancer. Nature Genetics, 24(3), 236-244.
Khan,J., Wei,J.S., Ringnér,M., Saal,L.H., Ladanyi,M., Westermann,F., Berthold,F., Schwab,M., Antonescu,C.R., Peterson,C. and Meltzer,P.S. (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7(6), 673-679.

32 References (continued)
Coulie,P.G. et al (1994) A new gene coding for a differentiation antigen recognized by autologous cytolytic T lymphocytes on HLA-A2 melanomas. J Exp Med, Jul 1; 180(1), 35-42.
Dudziak et al (2003) Latent Membrane Protein 1 of Epstein-Barr Virus Induces CD83 by the NF-κB Signaling Pathway. J Virol, 77(15), 8290-8298.
Adams et al (2006) Meningothelial meningioma in a mature cystic teratoma of the ovary. Pathologe, Mar 23.

