CCLE Cancer Cell Line Encyclopedia Alexey Erohskin.


1 CCLE Cancer Cell Line Encyclopedia http://www.broadinstitute.org/ccle Alexey Erohskin

2 Cancer Cell Line Encyclopedia
A compilation of gene expression, chromosomal copy number, and massively parallel sequencing data from 1034 human cancer cell lines:
– Genome-wide human Affymetrix SNP Array 6.0
– mRNA expression data (Affymetrix Human Genome U133 Plus 2.0)
– Mass spectrometric mutation detection (33 genes)
– Solution-phase hybrid capture and massively parallel sequencing (1,600 genes)
A collection of tools to analyze the data

3 Use CCLE and publish in Nature
29 March 2012 | Vol 483 | Nature | 603
When coupled with pharmacological profiles for 24 anticancer drugs... this collection allowed identification of genetic, lineage, and gene-expression-based predictors of drug sensitivity.... large, annotated cell-line collections may help to enable preclinical stratification schemata for anticancer agents.

4 CCLE is huge - today we’ll touch just the tip of the iceberg

5 Distribution of cancer types in the CCLE by lineage

6 More cell lines than at Sanger and GSK

7 Four major analysis tools

8 Actually, five

9 1. Find differentially expressed genes
– Define two sample sets
– Minimum of 10 samples in each set
– Multiple criteria are available to define your sets
– Store sets on a shelf for future use
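The slides do not say which statistic CCLE uses to compare the two sets, so the following is only a minimal sketch of the general idea, assuming a Welch t statistic per gene. The function names, the `expr` layout (`{gene: {sample: value}}`), and the choice of test are illustrative assumptions; only the 10-sample minimum comes from the slide.

```python
import math
from statistics import mean, variance

MIN_SAMPLES = 10  # minimum set size stated on the slide


def welch_t(a, b):
    """Welch's t statistic for two lists of expression values (assumed test)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes(expr, set_a, set_b):
    """Rank genes by |t| between two sample sets, largest difference first.

    expr  : {gene: {sample: value}} (hypothetical layout)
    set_a : list of sample names in the first set
    set_b : list of sample names in the second set
    """
    if len(set_a) < MIN_SAMPLES or len(set_b) < MIN_SAMPLES:
        raise ValueError("each sample set needs at least 10 samples")
    scores = {}
    for gene, values in expr.items():
        a = [values[s] for s in set_a]
        b = [values[s] for s in set_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Genes at the top of the returned list are the ones a heatmap of the two sets would show as most different; a real analysis would also correct for multiple testing.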

10 Interface to create two sample sets

11 Compare sets: Differential Expression (heatmap)

12 Compare sets: Differential Expression (one more example)

13 2. Gene neighbors (correlated expression)
Steps:
1. Select a gene
2. Select one probe for the gene (there may be more than one)
3. Start the analysis
Main premise: guilt by association (genes with correlated expression are likely in the same pathway, network, or process)
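As a rough illustration of "gene neighbors", the sketch below ranks probes by Pearson correlation with a chosen query probe. The slides do not state which correlation measure CCLE uses, so Pearson is an assumption, as are the function names and the `profiles` layout. The default of 20 neighbors echoes the "CCLE: 20 genes" remark later in the deck.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def neighbors(profiles, query, top_n=20):
    """Return the top_n probes most positively correlated with `query`.

    profiles : {probe: [expression values across the same samples]}
               (hypothetical layout)
    """
    target = profiles[query]
    scored = [(probe, pearson(target, values))
              for probe, values in profiles.items() if probe != query]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_n]
```

Because a gene can have several probes, the query is a probe identifier rather than a gene symbol, which mirrors step 2 on the slide.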

14 Gene neighbors (correlated expression). The picture is interactive, allowing more analysis

15 See expression as profile plot

16 Profile plots for several selected samples and probes

17 Scatter plots for two samples

18 3. GSEA: Gene Set Enrichment Analysis
Determines whether an a priori defined set of genes shows statistically significant differences between two biological states.
Steps:
1. Select a gene set from the shelf or create a new one
2. Start the analysis (compares with the gene set database)
3. Do not rush after submission; wait for the result
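To give a feel for what the enrichment score measures, here is a simplified sketch of the running-sum statistic in its unweighted, Kolmogorov-Smirnov-style form. It is not the full method: the published GSEA procedure additionally weights hits by the ranking metric and estimates significance by permutation, both of which are omitted here. The function name and inputs are illustrative.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (simplified sketch).

    ranked_genes : genes ordered from most up-regulated in state A
                   to most up-regulated in state B
    gene_set     : the a priori defined genes of interest
    Returns the running-sum value with the largest absolute deviation
    from zero: positive if the set clusters near the top of the list,
    negative if it clusters near the bottom.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a set member, step down on any other gene.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A score near +1 or -1 means the set's genes sit together at one end of the ranking; a score near 0 means they are scattered throughout it.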

19 GSEA: Gene Set Enrichment Analysis for malignant_melanoma (61 samples) and glioma (62 samples)

20 Detailed enrichment results (part)

21 4. GENE-E
– Java desktop application
– Can load and analyze multiple datasets
– Allows rapid visual exploration of datasets derived from gene expression, RNAi, and chemical screens
– Includes heat map clustering, filtering, charting, marker selection, and many other tools
– Includes tools designed specifically for RNAi and gene expression data

22 GENE-E
1. Select the tool
2. Java Web Start will download and start the application
3. Open the data for analysis
4. Loading data is slow (large file transfers)

23 GENE-E analysis: expression data
– Column and row profile plots
– RIGER ranks shRNAs according to their differential effects between two classes of samples, then identifies the genes targeted by the shRNAs
– Marker selection identifies entities that are differentially expressed between two classes
– Hierarchical clustering recursively merges objects based on their pairwise distance
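The "recursively merges objects based on their pairwise distance" description can be sketched as a small agglomerative procedure. This is not GENE-E's implementation: average linkage and Euclidean distance are assumptions chosen for brevity, and the names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hierarchical_clustering(points, distance=euclidean):
    """Agglomerative clustering with average linkage (assumed linkage).

    points : {name: vector}
    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of member names.
    """
    clusters = [(name,) for name in points]
    merges = []

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(distance(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

The ordered list of merges is the information a dendrogram draws: early merges join the most similar samples or genes, and the final merge joins the two most distinct groups.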

24 GENE-E: Clustering of samples and genes (part)

25 Marker selection in GENE-E (differentially expressed genes)

26 5. Mutations: shows most frequently mutated genes

27 Mutations details

28 6. Start IGV (Integrative Genomics Viewer)
Data will be presented together

29 IGV, copy number variation

30 IGV, mutation details

31 IGV, mutation assessment

32 Things to remember
– On a PC, it is better to use Internet Explorer than Firefox
– At least 10 samples are required for the comparative analysis to run
– There are multiple ways to accomplish your goals: do not give up, just continue
– Analysis results are saved on your shelf and can easily be revisited later
– When you click “Analyze”, just wait (do not rush to hit more buttons); analysis takes time (3-5 min)
– The system will send you a link to the analysis result when it is done
– Delete cookies if you have problems with the browser

33 My Shelf stores my datasets and analysis results

34 BSR/IDM can add some value
– Hierarchical clustering and heat maps of gene expression on any subset of cancer cell lines
– Finding any number of co-expressed genes in a subset of cancer cell lines (CCLE: 20 genes)
– Applying different distance measures for clustering (Euclidean distance, Pearson correlation, absolute difference)
– Integration of CCLE data with other data of interest to you
– NMF (Nonnegative Matrix Factorization) clustering of a sample/gene subset to determine the optimal number of sample/gene groups and the group memberships
– Customized bioinformatics analysis of the data
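The three distance measures named above could be written as follows. This is a sketch under assumptions: "Pearson correlation" is turned into a distance as 1 - r, and "absolute difference" is read as the sum of absolute differences (Manhattan distance); the slide does not define either, and the function names are illustrative.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson r, so perfectly correlated profiles have distance 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # correlation undefined for a flat profile; treated as uncorrelated
    return 1.0 - cov / (sx * sy)


def absolute_difference(x, y):
    """Sum of absolute differences (assumed meaning of 'absolute difference')."""
    return sum(abs(a - b) for a, b in zip(x, y))
```

The choice matters: two genes whose expression rises and falls together but at different magnitudes are close under the Pearson-based distance and far apart under the other two.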

