1
An Overview of The Cancer Genome Atlas (TCGA)
Maxwell Lee
National Cancer Institute, Center for Cancer Research
Laboratory of Cancer Biology and Genetics, High-dimension Data Analysis Group
January 7, 2016
2
Outline Of The Talk
- A brief history of TCGA
- Overview of TCGA data
- TCGA data access policy and download
- Some examples of data analyses
- Discussion of relevant TCGA publications
3
History And Timeline Of Human Genome Science
Human Genome Project
- 1990: initiation
- 2000: draft sequence
- 2003: complete sequence
Other genome projects
- 2003: HapMap Project
- 2003: ENCODE Project
- Genomes Project
TCGA
- 2005: pilot project announced
- 2009: transition to phase II
- 2014: end
4
History And Timeline Of TCGA
- Dec 13, 2005: TCGA pilot project announced
- 2008: TCGA published the glioblastoma paper
- 2009: TCGA transitioned to phase II
- 2011: TCGA published the ovarian cancer paper
- 2014: TCGA ended
9
Major TCGA Research Components
- Biospecimen Core Resource (BCR): collects and processes tissue samples.
- Genome Sequencing Centers (GSCs): use high-throughput genome sequencing to identify changes in DNA sequence in cancer.
- Genome Characterization Centers (GCCs): analyze genomic and epigenomic changes involved in cancer.
- Proteome Characterization Centers (PCCs): analyze the proteomic content of a subset of TCGA samples.
- Data Coordinating Center (DCC): centrally manages the TCGA data.
- Cancer Genomics Hub (CGHub): a database that stores cancer genome sequences and alignments.
- Genome Data Analysis Centers (GDACs): provide informatics tools to facilitate broader use of TCGA data.
10
https://wiki.nci.nih.gov/display/TCGA/Introduction+to+TCGA
11
TCGA Barcode
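The barcode diagram from this slide did not survive extraction. As a sketch, a full-length TCGA aliquot barcode such as TCGA-02-0001-01C-01D-0182-01 is commonly read as: project, tissue source site, participant, sample type plus vial, portion plus analyte, plate, and center. The parser below assumes exactly that layout and accepts truncated barcodes (participant, sample, and so on); the names `BARCODE_RE` and `parse_barcode` are hypothetical, not part of any TCGA tool.

```python
import re

# Assumed layout: TCGA-<TSS>-<participant>-<sample><vial>-<portion><analyte>-<plate>-<center>
BARCODE_RE = re.compile(
    r"^(?P<project>TCGA)-(?P<tss>[A-Z0-9]{2})-(?P<participant>[A-Z0-9]{4})"
    r"(?:-(?P<sample>\d{2})(?P<vial>[A-Z])?"
    r"(?:-(?P<portion>\d{2})(?P<analyte>[A-Z])?"
    r"(?:-(?P<plate>[A-Z0-9]{4})(?:-(?P<center>\d{2}))?)?)?)?$"
)


def parse_barcode(barcode):
    """Split a (possibly truncated) TCGA barcode into its named fields."""
    match = BARCODE_RE.match(barcode.upper())
    if match is None:
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    # keep only the fields actually present in this barcode
    return {field: value for field, value in match.groupdict().items() if value is not None}


print(parse_barcode("TCGA-02-0001-01C-01D-0182-01"))
```

Because the trailing groups are optional, a participant-level barcode like TCGA-02-0001 parses to just the project, tissue source site, and participant fields.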
12
TCGA Sample Codes (codes discussed: 01, 10, 11, 03, 06)
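The two-digit sample type code is the first part of the fourth barcode field (the 01 in ...-01C-...). The sketch below maps the codes listed on this slide to their commonly documented TCGA descriptions and applies the usual convention that codes 01 to 09 denote tumor, 10 to 19 normal, and 20 to 29 control samples. The helper names are hypothetical.

```python
# Descriptions for the sample type codes mentioned on the slide
SAMPLE_TYPES = {
    "01": "Primary Solid Tumor",
    "03": "Primary Blood Derived Cancer - Peripheral Blood",
    "06": "Metastatic",
    "10": "Blood Derived Normal",
    "11": "Solid Tissue Normal",
}


def sample_code_from_barcode(barcode):
    """Return the two-digit sample type code from a barcode with at least four fields."""
    return barcode.split("-")[3][:2]


def sample_category(code):
    """Coarse class of a sample type code: tumor, normal, or control."""
    value = int(code)
    if 1 <= value <= 9:
        return "tumor"
    if 10 <= value <= 19:
        return "normal"
    if 20 <= value <= 29:
        return "control"
    raise ValueError(f"unrecognized sample type code: {code!r}")


code = sample_code_from_barcode("TCGA-02-0001-01C-01D-0182-01")
print(code, SAMPLE_TYPES[code], sample_category(code))
```

A common use is pairing tumor (01) with matched normal (10 or 11) samples from the same participant.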
13
TCGA Data Access Policy
An access control policy is in place for TCGA data to ensure that personally identifiable information is kept from unauthorized users. The data are divided into two tiers:
- Open access: data that cannot be aggregated to generate a data set unique to an individual. This tier does not require user certification.
- Controlled access: individually unique information that could potentially be used to identify an individual. This tier requires user certification.
14
TCGA Data Levels
17
TCGA Controlled Access Data
Access to controlled data is available to researchers who:
- agree to restrict their use of the information to biomedical research purposes only;
- agree with the statements in the TCGA Data Use Certification (DUC);
- have their institutions certifiably agree to the statements in the TCGA DUC;
- complete the Data Access Request (DAR) form and submit it to the Data Access Committee to become a TCGA Approved User. The form is available electronically through dbGaP.
18
TCGA Controlled Access Data
19
An approved user can request that additional downloaders be added.
21
Where to download the data?
- TCGA Data Portal
- GDAC at the Broad Institute
- cBioPortal
- The Cancer Genomics Hub (CGHub)
22
https://confluence.broadinstitute.org/display/GDAC/Dashboard-Stddata
23
https://confluence.broadinstitute.org/display/GDAC/Dashboard-Analyses
24
Download TCGA Data Using Broad GDAC Firehose
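# Fetch the firehose_get client archive (the download URL is not shown on the slide), then unpack it
wget <URL of firehose_get_latest.zip>
unzip firehose_get_latest.zip
# Run with no arguments to see the usage information
./firehose_get
# Download the latest standard data (stddata) or analyses runs for all tumor types
./firehose_get stddata latest
./firehose_get analyses latest
# Restrict the download to specific tumor types by appending cohort abbreviations
./firehose_get stddata latest LUAD LUSC
# Downloaded: 250 files, 6.3G in 29m 54s (3.62 MB/s)
./firehose_get analyses latest BRCA OV
# Downloaded: 312 files, 788M in 1h 18m 40s (171 KB/s)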
26
IGV views of structural changes of recurrent SVs in MACROD2
GISTIC2 analysis of 441 TCGA gastric adenocarcinoma (STAD) tumor samples showed that FHIT, MACROD2, and PARK2 lie in the 6th, 7th, and 12th most significantly deleted regions, respectively (Hu et al., Cancer Res, accepted).
27
An Algorithm For Methylation And Expression Index (MEI)
Platforms: Illumina Infinium HumanMethylation27 BeadChip (methylation) and Illumina HumanRef-8 v2 Expression BeadChip (expression)
- Differential methylation based on IHC status (positive vs. negative for ER, PR, HER2, EGFR, or CK5): 2227 methylation markers in 1162 genes
- Top 3% most variable genes by expression: 541 genes
- Result: 128 methylation markers in 65 genes
MEI: the weighted sum of gene expression, where the weights are the negatives of the Spearman correlations.
Figueroa JD, Yang H et al. Breast Cancer Res Treat. 2015
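To make the MEI definition concrete, here is a minimal sketch assuming that each retained (marker, gene) pair is weighted by minus the Spearman correlation between the marker's methylation and the gene's expression across samples, so that for sample s, MEI_s = sum over pairs (m, g) of (-rho_mg) * x_gs. The slide does not spell out which quantities are correlated or whether expression is normalized first, so treat both as assumptions; `rank`, `spearman`, and `mei_scores` are hypothetical names, and the code is not the authors' implementation.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks (undefined for constant input)."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def mei_scores(methylation, expression, pairs):
    """Hypothetical MEI sketch.

    methylation: marker id -> per-sample methylation values
    expression:  gene id -> per-sample expression values (same sample order)
    pairs:       (marker, gene) pairs retained by the selection steps
    """
    # weight of each pair = negative Spearman correlation of methylation vs. expression
    weights = {(m, g): -spearman(methylation[m], expression[g]) for m, g in pairs}
    n_samples = len(next(iter(expression.values())))
    # MEI of a sample = weighted sum of expression over the retained pairs
    return [sum(w * expression[g][s] for (_, g), w in weights.items())
            for s in range(n_samples)]


print(mei_scores({"cg1": [0.9, 0.5, 0.1]}, {"G1": [1.0, 2.0, 3.0]}, [("cg1", "G1")]))
```

With the negative sign, a gene whose expression falls as its marker's methylation rises (the usual silencing pattern) contributes positively to MEI.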
28
Polish dataset: K-M survival using MEI for ER+ and ER- samples
[Figure: Kaplan-Meier survival curves stratified by MEI. Panels: ER+ cases (p = 0.009) and ER- cases (p = 0.360). Axes: Survival Probability vs. Year.]
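The survival comparisons here are Kaplan-Meier curves. As a sketch of how such a curve is computed (not the authors' code), the product-limit estimator multiplies, at each distinct event time, the fraction of at-risk subjects who survive. How samples were split into MEI groups is not stated on the slide; the `median_split` helper is a hypothetical grouping used only for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each subject (e.g., in years)
    events: 1 if the event (e.g., death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # group all subjects whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # subjects with an event or censoring at t leave the risk set afterwards
        at_risk -= removed
    return curve


def median_split(scores):
    """Hypothetical grouping: indices of samples above vs. at or below the median score."""
    ordered = sorted(scores)
    n = len(ordered)
    median = (ordered[(n - 1) // 2] + ordered[n // 2]) / 2
    high = [i for i, s in enumerate(scores) if s > median]
    low = [i for i, s in enumerate(scores) if s <= median]
    return high, low


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

One curve per MEI group is obtained by calling `kaplan_meier` separately on each group's times and events.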
29
Validation: K-M survival using MEI for ER+ samples
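[Figure: Kaplan-Meier curves stratified by MEI in ER+ validation cohorts. Panels: TCGA ER+ (p = 0.001), GSE6532 ER+ (p = 0.001), NKI ER+ (p = 0.004), METABRIC ER+ (p-value not recoverable). Endpoints shown: OS and DMFS. Axes: Survival Probability vs. Year.]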
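The slides report p-values without naming the test; a two-group log-rank test is a common choice for comparing Kaplan-Meier curves, so the sketch below assumes that. At each distinct event time it accumulates observed minus expected events in the first group and the hypergeometric variance, then converts the 1-degree-of-freedom chi-square statistic to a p-value via erfc. The function name `log_rank` is hypothetical.

```python
import math


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df.

    events: 1 = event observed, 0 = censored. Raises ZeroDivisionError if the
    variance is zero (e.g., no informative event times).
    """
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # risk sets and events at time t for each group
        n_a = sum(1 for tt, _, g in pooled if g == 0 and tt >= t)
        n_b = sum(1 for tt, _, g in pooled if g == 1 and tt >= t)
        d_a = sum(1 for tt, e, g in pooled if g == 0 and tt == t and e)
        d_b = sum(1 for tt, e, g in pooled if g == 1 and tt == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # survival function of chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(log_rank([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5))
```

Identical groups give a statistic of 0 and p = 1, while a group whose events all occur earlier gives a small p-value.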
30
TP53 Missense Mutations Associate With High TP53 Protein Levels
31
Correlation Between Gene Expression And DNA Methylation
34
Figure 4
37
CpG Island Methylator Phenotype Of Glioblastoma
Noushmehr et al. Cancer Cell 2010; 17(5): 510–522.
38
Figure 2
39
IDH1/2 And TET2 Mutations Are Mutually Exclusive
AML samples from the Eastern Cooperative Oncology Group (ECOG) E1900 clinical trial. Figueroa et al. Cancer Cell 18, 553–567, 2010
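Mutual exclusivity means the two mutations co-occur in fewer patients than chance would predict. One common way to test this, assumed here because the slide does not name the test used, is a one-sided Fisher's exact test on the 2x2 table of mutation status: the p-value is the hypergeometric probability of seeing as few or fewer co-mutated samples than observed. The function name is hypothetical, and no counts from the study are used.

```python
from math import comb


def mutual_exclusivity_p(both, a_only, b_only, neither):
    """One-sided Fisher's exact p-value for depletion of co-occurrence.

    Arguments are the counts of samples mutated in both genes, only gene A,
    only gene B, and neither gene.
    """
    n_total = both + a_only + b_only + neither
    mutated_a = both + a_only
    mutated_b = both + b_only
    denominator = comb(n_total, mutated_b)
    # P(X <= both) where X ~ Hypergeometric(n_total, mutated_a, mutated_b);
    # math.comb returns 0 when asked for more items than are available
    tail = sum(comb(mutated_a, k) * comb(n_total - mutated_a, mutated_b - k)
               for k in range(both + 1))
    return tail / denominator


print(mutual_exclusivity_p(0, 5, 5, 0))
```

For real cohorts, small p-values indicate fewer double mutants than expected under independence.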
40
IDH Mutations Increase DNA Methylation (Pathway)
41
DNA Methylation Pathway
Shih et al. Nat Rev Cancer Sep;12(9):
47
Cluster-of-cluster Assignments (COCA) Of The Pan-cancer-12 Tumors
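Cluster-of-cluster assignment (COCA) combines the cluster memberships obtained separately on each data platform: each (platform, cluster) becomes a binary indicator column, and samples are then clustered on that combined matrix, typically with consensus clustering. The sketch below builds only the indicator matrix; the platform names and the function name are hypothetical, and the clustering step is not shown.

```python
def coca_indicator_matrix(assignments):
    """Build the binary sample-by-cluster matrix used as COCA input.

    assignments: platform name -> {sample id: cluster label} (labels as strings)
    Returns (samples, columns, matrix) where columns are (platform, label) pairs
    and matrix[i][j] is 1 if sample i is in that cluster on that platform.
    """
    samples = sorted({s for clusters in assignments.values() for s in clusters})
    columns = sorted({(platform, label)
                      for platform, clusters in assignments.items()
                      for label in clusters.values()})
    # a sample missing from a platform gets 0 in every column of that platform
    matrix = [[1 if assignments[p].get(s) == label else 0 for p, label in columns]
              for s in samples]
    return samples, columns, matrix


example = {
    "mRNA": {"s1": "A", "s2": "B"},
    "methylation": {"s1": "M1", "s2": "M1"},
}
print(coca_indicator_matrix(example))
```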