CCLE Cancer Cell Line Encyclopedia Alexey Erohskin.


Cancer Cell Line Encyclopedia
A compilation of gene expression, chromosomal copy number, and massively parallel sequencing data from 1034 human cancer cell lines:
- Genome-wide human Affymetrix SNP Array 6.0
- mRNA expression data (Affymetrix Human Genome U133 Plus 2.0)
- Mass spectrometric mutation detection (33 genes)
- Solution-phase hybrid capture and massively parallel sequencing (1,600 genes)
Plus a collection of tools to analyze the data

Use CCLE and publish in Nature (29 March issue)
"When coupled with pharmacological profiles for 24 anticancer drugs ... this collection allowed identification of genetic, lineage, and gene-expression-based predictors of drug sensitivity. ... large, annotated cell-line collections may help to enable preclinical stratification schemata for anticancer agents."

CCLE is huge - today we’ll touch just the tip of the iceberg

Distribution of cancer types in the CCLE by lineage

More cell lines than at Sanger and GSK

Four major analysis tools

Actually, five

1. Find differentially expressed genes
- Define two sample sets
- Minimum of 10 samples in each set
- Multiple criteria to define your sets
- Store sets on a shelf for future use
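To make the two-set comparison concrete, here is a minimal sketch of ranking genes by differential expression between two sample sets. It assumes Welch's t-statistic as the score; the slides do not say which statistic the CCLE portal uses, and the names `welch_t`, `rank_genes`, and the `expression` layout are hypothetical.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance

MIN_SAMPLES = 10  # per-set minimum stated on the slide


def welch_t(values_a, values_b):
    """Welch's t-statistic for one gene measured in two sample sets.

    Assumption: the portal's actual test is not documented in the slides.
    """
    if len(values_a) < MIN_SAMPLES or len(values_b) < MIN_SAMPLES:
        raise ValueError(f"each set needs at least {MIN_SAMPLES} samples")
    diff = mean(values_a) - mean(values_b)
    se = sqrt(variance(values_a) / len(values_a)
              + variance(values_b) / len(values_b))
    if se == 0:
        # No spread in either set: identical means score 0, otherwise +/- inf.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def rank_genes(expression, set_a, set_b):
    """Rank genes by |t|, largest first.

    expression: {gene: {sample_name: value}} (hypothetical layout)
    set_a, set_b: lists of sample names defining the two sets
    """
    scores = {}
    for gene, by_sample in expression.items():
        a = [by_sample[s] for s in set_a]
        b = [by_sample[s] for s in set_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Genes at the top of the returned list are the strongest candidates for the heatmap view; a positive score means higher expression in the first set.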

Interface to create two sample sets

Compare sets: Differential Expression (heatmap)

Compare sets: Differential Expression (one more example)

2. Gene neighbors (correlated expression)
Steps:
1. Select a gene
2. Select one probe for the gene (there may be more than one)
3. Start the analysis
Main premise: guilt by association (correlated genes are likely in the same pathway, network, or process)
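The gene-neighbors idea can be sketched as a correlation search: take the expression profile of the selected probe and rank every other probe by Pearson correlation with it. This is an illustration only; the slides do not name the correlation measure the portal uses, and `gene_neighbors`, the `profiles` layout, and the default of 20 results are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile is treated as uncorrelated
    return cov / (sx * sy)


def gene_neighbors(profiles, query_probe, top_n=20):
    """Return the top_n probes most correlated with query_probe.

    profiles: {probe_id: [expression value per sample]} (hypothetical layout)
    """
    query = profiles[query_probe]
    scored = [(probe, pearson(query, values))
              for probe, values in profiles.items()
              if probe != query_probe]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:top_n]
```

Sorting by the signed correlation keeps only positively co-expressed probes at the top; sort by the absolute value instead if anti-correlated genes are also of interest.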

Gene neighbors (correlated expression). The picture is interactive, allowing more analysis

See expression as profile plot

Profile plots for several selected samples and probes

Scatter plots for two samples

3. GSEA: Gene Set Enrichment Analysis
Determines whether an a priori defined set of genes shows statistically significant differences between two biological states
Steps:
1. Select a gene set from the shelf or create a new one
2. Start the analysis (compares with the gene set database)
3. Do not rush after submission: wait
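The core of the enrichment calculation can be illustrated with a simplified, unweighted running-sum score: walk down a list of genes ranked by their difference between the two states, step up on genes in the set and down on genes outside it, and report the largest deviation from zero. This sketch deliberately omits the correlation weighting and the permutation-based significance test of full GSEA, and `enrichment_score` is a hypothetical name, not the portal's API.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted (Kolmogorov-Smirnov-style) enrichment score.

    ranked_genes: genes ordered from most up-regulated to most
                  down-regulated between the two states
    gene_set:     the a priori defined set of genes
    Returns the running-sum value with the largest absolute deviation
    from zero; values near +1 or -1 mean the set clusters at the top
    or bottom of the ranking.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partly")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Steps are scaled so the running sum returns to zero at the end.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A real analysis would compare this score against scores from permuted sample labels to judge significance, which is one reason the server-side run takes several minutes.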

GSEA: Gene Set Enrichment Analysis for malignant_melanoma (61 samples) and glioma (62 samples)

Detailed enrichment results (part)

4. GENE-E
- Java desktop application
- Can load and analyze multiple datasets
- Allows rapid visual exploration of data sets derived from gene expression, RNAi, and chemical screens
- Heat map, clustering, filtering, charting, marker selection, and many other tools
- Tools designed specifically for RNAi and gene expression data

GENE-E
1. Select the tool
2. Java Web Start will download and start the application
3. Open the data for analysis
4. Loading data is slow (large file transfers)

GENE-E analysis: expression data
- Column and row profile plots
- RIGER ranks shRNAs according to their differential effects between two classes of samples, then identifies the genes targeted by the shRNAs
- Marker selection identifies entities that are differentially expressed between two classes
- Hierarchical clustering recursively merges objects based on their pairwise distance
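The "recursively merges objects based on their pairwise distance" step can be sketched as a naive agglomerative loop. The example below assumes Euclidean distance and average linkage; GENE-E offers its own choices of metric and linkage, and `hierarchical_clustering` is a hypothetical helper, not GENE-E code.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def average_linkage(cluster_a, cluster_b, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(points[i], points[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_clustering(points):
    """Agglomerative clustering of expression profiles.

    points: list of equal-length numeric tuples (one per sample or gene)
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of indices into points.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters and merge them.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], points))
        merges.append((a, b, average_linkage(a, b, points)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges
```

The merge history is what a dendrogram draws: each tuple is one join, and the distance is the height of that join. This brute-force version is only suitable for small inputs.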

GENE-E: Clustering of samples and genes (part)

Marker selection in GENE-E (differentially expressed genes)

5. Mutations: shows most frequently mutated genes

Mutations details

6. Start IGV (Integrative Genomics Viewer)
Data will be presented together

IGV, copy number variation

IGV, mutation details

IGV, mutation assessment

Things to remember
- On a PC, it is better to use Internet Explorer, not Firefox
- At least 10 samples are required for the comparative analysis to run
- There are multiple ways to accomplish your goals: do not give up, just continue
- Analysis results are saved on your shelf and can easily be revisited later
- When you click “Analyze”, just wait (do not rush to hit more buttons); analysis takes time (3-5 min)
- The system will send you a link to the analysis result when it is done
- Delete cookies if you have problems with the browser

My Shelf stores my datasets and analysis results

BSR/IDM can add some value
- Hierarchical clustering and heat maps of gene expression on any subset of cancer cell lines
- Finding any number of co-expressed genes in a subset of cancer cell lines (CCLE: 20 genes)
- Applying different distance measures for clustering (Euclidean distance, Pearson correlation, absolute difference)
- Integration of CCLE data with other data of interest to you
- NMF (Nonnegative Matrix Factorization) clustering of a sample/gene subset to determine the optimal number of sample/gene groups and the group memberships
- Customized bioinformatics analysis of the data
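As a rough illustration of how the choice of distance measure changes what "similar" means, the sketch below implements the three measures named above. Two points are assumptions: "absolute difference" is read here as the sum of absolute differences (Manhattan distance), and Pearson correlation is turned into a distance as 1 - r. The function names are hypothetical.

```python
from math import sqrt


def euclidean(x, y):
    """Straight-line distance; sensitive to absolute expression level."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def absolute_difference(x, y):
    """Sum of absolute differences (assumed meaning of the slide's term)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_distance(x, y):
    """1 - Pearson r: 0 for profiles with the same shape, 2 for opposite."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - cov / (sx * sy)


# Two profiles with the same shape but different magnitude.
low = [1.0, 2.0, 3.0]
high = [2.0, 4.0, 6.0]
print(euclidean(low, high))            # large: magnitudes differ
print(absolute_difference(low, high))  # large: magnitudes differ
print(pearson_distance(low, high))     # near zero: shapes match
```

Correlation-based distance groups genes that rise and fall together regardless of expression level, while Euclidean and absolute-difference distances also separate genes by magnitude.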