Published by Edward Gibson. Modified over 8 years ago.
1
mRNASeq analysis using TCGA HNSC data
Vinay Kartha
Monti lab rotation project
11/25/2013
2
Expression data
mRNASeqv2 (Illumina HiSeq 2000)
Samples with data available: 340
Each sample has 6 associated files:
- junction_quantification.txt
- rsem.genes.results
- rsem.genes.normalized_results
- rsem.isoforms.results
- rsem.isoforms.normalized_results
- bt.exon_quantification.txt
Dataset reduction:
- Raw expression matrix: 20,531 genes
- Non-zero expression matrix: 20,200 genes
- Filtered expression matrix (CV >= 1.25): 7,091 genes
3
QC
Scatter plot of mean vs. SD of expression for the non-zero expression data
Cutoff: CV = std dev / mean = 1.25
CV-filtered data: N = 340 samples; n = 7,091 genes
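The non-zero and CV filtering steps can be sketched as follows. This is a minimal illustration, not the code used for the slides: the function name `cv_filter`, the toy gene names and values, and the use of the sample (n - 1) standard deviation are all assumptions.

```python
import statistics

def cv_filter(expr, threshold=1.25):
    """Keep genes whose coefficient of variation (SD / mean) meets the threshold.

    `expr` maps gene name -> list of non-negative expression values, one per sample.
    """
    kept = {}
    for gene, values in expr.items():
        mean = statistics.fmean(values)
        if mean == 0:
            # All-zero gene: CV is undefined, so it is dropped (the non-zero step).
            continue
        cv = statistics.stdev(values) / mean  # sample SD; an assumption
        if cv >= threshold:
            kept[gene] = values
    return kept

# Hypothetical toy matrix: gene -> expression across four samples.
toy = {
    "GENE_A": [0.0, 0.0, 0.0, 0.0],         # all zero, removed
    "GENE_B": [100.0, 110.0, 90.0, 105.0],  # low CV, removed
    "GENE_C": [0.0, 0.0, 5.0, 400.0],       # high CV, kept
}
filtered = cv_filter(toy)
print(list(filtered))
```

Only the third toy gene survives, because its SD is well above 1.25 times its mean.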
4
QC
Box plot of CV-filtered expression data across all samples
Panels: Log-transformed*; Asinh-transformed
* Pseudocount of 0.01 added
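The two transforms compared in the box plots can be sketched as below. The slide does not state the log base, so base 2 is an assumption here, and the helper names and toy values are hypothetical.

```python
import math

def log2_pseudo(x, pseudocount=0.01):
    """Log-transform with a pseudocount so that zero counts stay finite."""
    return math.log2(x + pseudocount)

def asinh_transform(x):
    """Inverse hyperbolic sine: defined at zero, close to ln(2x) for large x."""
    return math.asinh(x)

# Hypothetical expression values, including a zero.
values = [0.0, 1.0, 10.0, 1000.0]
logged = [log2_pseudo(v) for v in values]
arcsinh = [asinh_transform(v) for v in values]

for v, l, a in zip(values, logged, arcsinh):
    print(f"{v:8.1f}  log2(x + 0.01) = {l:7.3f}  asinh(x) = {a:6.3f}")
```

With a small pseudocount, zeros map to a strongly negative log value, whereas asinh maps zero to zero; that difference is what shows up in the lower tails of the two box plots.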
5
QC
6
QC
Plot annotations: CV = std dev / mean = 1.25; x = y
7
QC
Box plot of MAD-filtered expression data across all samples (log-transformed*)
* Pseudocount of 1 added
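A MAD-based filter could look like the sketch below. The slide gives neither the MAD cutoff nor whether MAD was computed before or after the log transform, so `min_mad`, the log base, the order of operations, and the toy data are all assumptions made for illustration.

```python
import math
import statistics

def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def mad_filter(expr, min_mad, pseudocount=1.0):
    """Log2-transform each gene with a pseudocount, keep genes with MAD >= min_mad."""
    kept = {}
    for gene, values in expr.items():
        logged = [math.log2(v + pseudocount) for v in values]
        if mad(logged) >= min_mad:
            kept[gene] = logged
    return kept

# Hypothetical toy matrix: gene -> expression across five samples.
toy = {
    "GENE_FLAT": [7.0, 7.0, 7.0, 7.0, 7.0],
    "GENE_VAR": [0.0, 1.0, 3.0, 15.0, 63.0],
}
kept = mad_filter(toy, min_mad=1.0)  # cutoff chosen only for the example
print(kept)
```

Unlike the CV, the MAD does not divide by the mean, so it remains stable for genes with low average expression.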
8
Clustered gene expression profile
9
Sample clustering based on grade/stage?
See if expression is associated with clinical/phenotypic variables of interest

Grade:
- GX: Grade cannot be assessed (undetermined grade)
- G1: Well differentiated (low grade)
- G2: Moderately differentiated (intermediate grade)
- G3: Poorly differentiated (high grade)
- G4: Undifferentiated (high grade)

Stage:
- SI, SII, and SIII: Higher numbers indicate more extensive disease: larger tumor size and/or spread of the cancer beyond the organ in which it first developed to nearby lymph nodes and/or tissues or organs adjacent to the location of the primary tumor
- SIV: Cancer has spread to distant organs and tissues

For more information, see:
http://www.cancer.gov/cancertopics/factsheet/detection/tumor-grade
http://www.cancer.gov/cancertopics/factsheet/detection/staging
10
Sample clustering based on grade/stage?
Fisher’s exact test (k = 2)

Histological Grade
Cluster   G1    G2    G3    G4    GX    NA    Total
1         22    129   49    0     3     0     203
2         8     74    38    6     10    1     137
Total     30    203   87    6     13    1     340
p = 2.98e-04 (< 0.05)

Pathological Stage
Cluster   S1    S2    S3    S4A   S4B   NA    Total
1         14    36    29    102   4     18    203
2         4     26    17    60    2     28    137
Total     18    62    46    162   6     46    340
p = 0.040 (< 0.05)
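For tables larger than 2 x 2, the exact test is often approximated by simulation. The sketch below is a Monte Carlo version, not the exact enumeration and not necessarily what produced the p-values on the slide: it shuffles column labels to keep both margins fixed and counts how often a permuted table is no more probable than the observed one. The function names are hypothetical, and using all six grade columns (including GX and NA) is an assumption, since the slide does not say which columns entered the test.

```python
import math
import random

def log_table_prob(table):
    """Log-probability of a table given its margins, up to an additive constant."""
    return -sum(math.lgamma(n + 1) for row in table for n in row)

def fisher_mc(table, n_perm=2000, seed=0):
    """Monte Carlo approximation of Fisher's exact test for an r x c table."""
    rng = random.Random(seed)
    n_cols = len(table[0])
    row_labels, col_labels = [], []
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            row_labels.extend([i] * n)
            col_labels.extend([j] * n)
    observed = log_table_prob(table)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(col_labels)  # keeps row and column totals fixed
        perm = [[0] * n_cols for _ in table]
        for i, j in zip(row_labels, col_labels):
            perm[i][j] += 1
        if log_table_prob(perm) <= observed + 1e-9:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Cluster x grade counts from the slide (columns G1, G2, G3, G4, GX, NA).
grade_table = [
    [22, 129, 49, 0, 3, 0],
    [8, 74, 38, 6, 10, 1],
]
p_grade = fisher_mc(grade_table)
print(f"Monte Carlo p-value: {p_grade:.4g}")
```

The resolution of the estimate is limited by `n_perm`: with 2000 permutations the smallest reportable value is about 5e-04, so more permutations are needed to resolve very small p-values.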
11
Differential Expression with respect to Grade/Stage
340 samples (Total)

TCGA sample vial codes:
  01A   301
  01B   2
  11A   37

Histological Grade distribution among samples:
  As annotated:                  G1 30, G2 203, G3 87, G4 6, GX 13, NA 1
  With the 37 normals as G0:     G0 37, G1 25, G2 185, G3 77, G4 6, GX 9, NA 1

Pathological Stage distribution among samples:
  As annotated:                  SI 18, SII 62, SIII 46, SIVA 162, SIVB 6, NA 46
  With the 37 normals as S0:     S0 37, SI 16, SII 47, SIII 41, SIVA 147, SIVB 6, NA 46

https://wiki.nci.nih.gov/display/TCGA/TCGA+Barcode
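The vial codes above come from the fourth hyphen-separated field of a TCGA barcode, where the two digits give the sample type (01 = primary solid tumor, 11 = solid tissue normal) and the letter gives the vial. A minimal sketch of tallying them, using made-up barcodes and hypothetical helper names:

```python
from collections import Counter

# Only the two sample-type codes that occur in this dataset.
SAMPLE_TYPES = {"01": "primary solid tumor", "11": "solid tissue normal"}

def sample_vial(barcode):
    """Return the sample + vial field, e.g. '01A', from a TCGA barcode."""
    return barcode.split("-")[3]

def sample_type(barcode):
    """Map the two-digit sample code to a description."""
    return SAMPLE_TYPES.get(sample_vial(barcode)[:2], "other")

# Hypothetical barcodes, for illustration only.
barcodes = [
    "TCGA-XX-0001-01A-11R-0000-07",
    "TCGA-XX-0002-01B-11R-0000-07",
    "TCGA-XX-0003-11A-11R-0000-07",
]
vial_counts = Counter(sample_vial(b) for b in barcodes)
print(vial_counts)
print([sample_type(b) for b in barcodes])
```

Samples whose code starts with 11 are the normals that appear as G0/S0 in the distributions above.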
12
Differential Expression with respect to Grade/Stage
Cannot adjust expression for certain factors (Race/Ethnicity) due to missing phenotypic information
Remove non-white patients and samples with missing information with respect to Grade/Stage

Grade:
  Before:   G0 37, G1 25, G2 185, G3 77, G4 6, GX 9, NA 1
  After:    G0 32, G1 24, G2 156, G3 68, G4 6 (Total = 286)

Stage:
  Before:   S0 37, SI 16, SII 47, SIII 41, SIVA 147, SIVB 6, NA 46
  After:    S0 32, SI 14, SII 42, SIII 34, SIVA 130, SIVB 3 (Total = 255)
13
Adjust for gender?
Panels: DE wrt Grade; DE wrt Stage
Don’t want to adjust for gender when it is associated with very few genes
14
Percentile-based gene filtering prior to DE testing
Further reduce the gene space prior to DE testing by filtering on each gene's 90th percentile of expression
Choose a threshold on log2(90th percentile) that roughly halves the number of genes
Grade (N = 286): 90th percentile >= 10.5; n = 5046
Stage (N = 255): 90th percentile >= 10.5; n = 5019
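The percentile filter can be sketched as follows. The percentile definition (linear interpolation between order statistics), the function names, and the toy values are assumptions; only the 90th percentile and the 10.5 cutoff on the log2 scale come from the slide.

```python
import math

def percentile(values, q):
    """q-th percentile using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def percentile_filter(expr, q=90, min_log2=10.5):
    """Keep genes whose log2(q-th percentile of expression) is at least min_log2."""
    kept = []
    for gene, values in expr.items():
        p = percentile(values, q)
        if p > 0 and math.log2(p) >= min_log2:
            kept.append(gene)
    return kept

# Hypothetical toy matrix on the raw (unlogged) scale.
toy = {
    "GENE_LOW": [10.0, 20.0, 30.0, 40.0, 50.0],
    "GENE_HIGH": [500.0, 1000.0, 2000.0, 4000.0, 8000.0],
}
kept = percentile_filter(toy)
print(kept)
```

A cutoff of 10.5 on the log2 scale corresponds to roughly 1,448 on the raw scale, so a gene passes when at least about 10% of samples express it above that level.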
15
Differential Expression testing
Perform DE wrt Grade (N = 286; n = 5046) and Stage (N = 255; n = 5019)
- Tumor vs Normal (G0 vs G1+; S0 vs S1+)
- Within Grade/Stage comparison (G1 vs G2+; S2- vs S3+; excluding controls)
Permutation-based t-test with sliding ‘time-points’ and sample pooling
- S3- vs S4A+ => (S1 + S2 + S3) vs (S4A + S4B)
‘diffanal’ function from diffanal.R (CBM repository)
Number of permutations: 1000
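The sliding split with pooling can be illustrated for a single gene as below. This is a sketch of the general idea only and does not reproduce `diffanal`: the Welch t statistic, the add-one p-value correction, the function names, and the toy expression values are all assumptions.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for two groups with at least two values each."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    denom = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if denom == 0.0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / denom

def permutation_pvalue(a, b, n_perm=1000, seed=0):
    """Two-sided p-value from shuffling group membership n_perm times."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def sliding_splits(values_by_level, levels):
    """Yield (label, lower pool, upper pool) for each cut of the ordered levels."""
    for k in range(1, len(levels)):
        lower = [v for lv in levels[:k] for v in values_by_level[lv]]
        upper = [v for lv in levels[k:] for v in values_by_level[lv]]
        yield f"{levels[k - 1]}- vs {levels[k]}+", lower, upper

# Hypothetical log-expression values for one gene, grouped by stage.
levels = ["S1", "S2", "S3", "S4A"]
toy_gene = {
    "S1": [5.1, 4.9, 5.3],
    "S2": [5.0, 5.2, 4.8],
    "S3": [7.9, 8.1, 8.3],
    "S4A": [8.0, 8.2, 7.8],
}
results = {
    label: permutation_pvalue(lower, upper)
    for label, lower, upper in sliding_splits(toy_gene, levels)
}
for label, p in results.items():
    print(label, round(p, 4))
```

Each cut pools every level below it against every level at or above it, so the order of `levels` determines which samples are pooled together.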
16
DE testing by grade
Comparison   No. DE genes
G1+          3282
G2+          617
G3+          943
G4           456
17
DE testing by grade
18
DE testing by stage
Comparison   No. DE genes
S1+          327
S2+          0
S3+          0
S4A+         0
19
DE testing by stage
20
DE genes: G0 vs G1+
21
DE genes: G1 vs G2+
22
DE genes: G2- vs G3+
23
DE genes: G3- vs G4
24
AhR targets
26
Variation of expression across grade DPAGT1
27
Variation of expression across grade TAZ
28
Variation of expression across grade YAP1
29
Variation of expression across grade PDGFRB
30
Sliding windows
Tool takes Time factors in the order in which they appear in the ‘Time’ column
Does not pull corresponding factors in order of Time point levels
Results in incorrect ordering of groups prior to sliding window DE testing
For example:
  Time: G3 G2 … …
  Levels: G3 G2 …
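The ordering problem can be shown with a small sketch that contrasts order of first appearance with an explicitly supplied level order. The helper names, the toy column, and the level list are hypothetical; this illustrates the behaviour described above rather than the tool's actual code.

```python
def groups_in_appearance_order(time_column):
    """Unique groups in the order they first appear (the problematic behaviour)."""
    return list(dict.fromkeys(time_column))

def groups_in_level_order(time_column, levels):
    """Unique groups ordered by an explicit list of levels."""
    present = set(time_column)
    return [lv for lv in levels if lv in present]

# Hypothetical 'Time' column in which G3 samples happen to come first.
time_column = ["G3", "G2", "G1", "G3", "G2"]
levels = ["G1", "G2", "G3", "G4"]

print(groups_in_appearance_order(time_column))
print(groups_in_level_order(time_column, levels))
```

Passing the groups to a sliding-window test in appearance order would pool the wrong grades together, so sorting samples by level first (or supplying the level order explicitly) avoids the issue.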
31
Future work
Perform GSEA/hyper-enrichment and pathway analyses
Perform oral cancer-specific analyses
Restrict anatomic sub-types to include only:
- Alveolar Ridge
- Base of tongue
- Buccal Mucosa
- Floor of mouth
- Hard Palate
- Hypopharynx
- Larynx
- Lip
- Oral cavity
- Oral tongue
- Oropharynx
- Tonsil