mRNASeq analysis using TCGA HNSC data. Vinay Kartha, Monti lab rotation project, 11/25/2013.


1 mRNASeq analysis using TCGA HNSC data. Vinay Kartha, Monti lab rotation project, 11/25/2013

2 Expression data
- mRNASeqv2 (Illumina HiSeq 2000)
- Samples with data available: 340
- Each sample has 6 associated files:
  - junction_quantification.txt
  - rsem.genes.results
  - rsem.genes.normalized_results
  - rsem.isoforms.results
  - rsem.isoforms.normalized_results
  - bt.exon_quantification.txt
- Dataset reduction:
  - Raw expression matrix: 20,531 genes
  - Non-zero expression matrix: 20,200 genes
  - Filtered expression matrix (CV >= 1.25): 7,091 genes
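The dataset-reduction steps on this slide can be sketched as follows. This is a minimal illustration, not the code used in the project: it assumes that "non-zero" means dropping genes whose expression is zero in every sample, that the CV uses the sample standard deviation, and the gene names and values are made up.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = standard deviation / mean (0.0 when the mean is zero)."""
    m = mean(values)
    if m == 0:
        return 0.0
    return stdev(values) / m

def filter_genes(expr, cv_threshold=1.25):
    """expr maps gene -> list of expression values across samples.

    Step 1 (assumed meaning of 'non-zero'): drop genes that are zero in all samples.
    Step 2: keep genes whose CV is at least cv_threshold.
    """
    nonzero = {g: v for g, v in expr.items() if any(x != 0 for x in v)}
    return {g: v for g, v in nonzero.items()
            if coefficient_of_variation(v) >= cv_threshold}

# Hypothetical toy matrix: 3 genes x 4 samples
expr = {
    "GENE_A": [0, 0, 0, 0],       # all zero: removed in step 1
    "GENE_B": [10, 11, 9, 10],    # low variability: removed in step 2
    "GENE_C": [0, 0, 0, 100],     # CV = 50 / 25 = 2.0: kept
}
kept = filter_genes(expr)
```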

3 QC
Scatter plot of mean vs SD of expression for the non-zero expression data, with the cutoff CV = std dev / mean = 1.25.
CV-filtered data: N = 340; n = 7091

4 QC
Box plots of CV-filtered expression data across all samples. Panels: Log-transformed*, Asinh-transformed.
* Pseudocount of 0.01 added
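The two transforms compared on this slide behave very differently near zero, which a short sketch makes concrete. The base of the logarithm is not stated on the slide; base 2 is assumed here.

```python
import math

def log2_pseudo(x, pseudocount=0.01):
    """Log transform with a pseudocount so that x = 0 is defined (base 2 assumed)."""
    return math.log2(x + pseudocount)

def asinh_transform(x):
    """Inverse hyperbolic sine: defined at 0 and roughly ln(2x) for large x."""
    return math.asinh(x)

for x in (0, 1, 100, 10000):
    print(x, round(log2_pseudo(x), 3), round(asinh_transform(x), 3))
```

With a pseudocount of 0.01, zeros map to about -6.64 on the log2 scale and form a long lower tail in a box plot, whereas asinh maps zero to zero. A pseudocount of 1 (`log2_pseudo(x, 1)`) also maps zero to zero.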

5 QC

6 QC
Plot annotations: CV = std dev / mean = 1.25; x = y

7 QC
Box plot of MAD-filtered expression data across all samples. Panel: Log-transformed*.
* Pseudocount of 1 added

8 Clustered gene expression profile

9 Sample clustering based on grade/stage?
- See if expression is associated with clinical/phenotypic variables of interest
- Grade:
  - GX: Grade cannot be assessed (undetermined grade)
  - G1: Well differentiated (low grade)
  - G2: Moderately differentiated (intermediate grade)
  - G3: Poorly differentiated (high grade)
  - G4: Undifferentiated (high grade)
- Stage:
  - SI, SII, and SIII: Higher numbers indicate more extensive disease: larger tumor size and/or spread of the cancer beyond the organ in which it first developed to nearby lymph nodes and/or tissues or organs adjacent to the location of the primary tumor
  - SIV: Cancer has spread to distant organs and tissues
- For more information, see:
  - http://www.cancer.gov/cancertopics/factsheet/detection/tumor-grade
  - http://www.cancer.gov/cancertopics/factsheet/detection/staging

10 Sample clustering based on grade/stage?
- Fisher’s exact test (k = 2)

Histological Grade: p = 2.98e-04 (< 0.05)
Cluster   G1   G2   G3   G4   GX   NA   Total
1         22  129   49    0    3    0     203
2          8   74   38    6   10    1     137
Total     30  203   87    6   13    1     340

Pathological Stage: p = 0.040 (< 0.05)
Cluster   S1   S2   S3  S4A  S4B   NA   Total
1         14   36   29  102    4   18     203
2          4   26   17   60    2   28     137
Total     18   62   46  162    6   46     340
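For tables larger than 2 x 2, one way to approximate Fisher's exact test is Monte Carlo sampling of tables with the observed margins. The sketch below is a swapped-in approximation, not the procedure used for the slide, so its estimates need not reproduce the p-values above. It uses the cluster-by-grade and cluster-by-stage counts from this slide, including the NA column; the function names are hypothetical.

```python
import math
import random

def table_log_prob(table):
    """Log-probability of a table given fixed margins, up to an additive constant."""
    return -sum(math.lgamma(n + 1) for row in table for n in row)

def fisher_mc(table, n_perm=2000, seed=0):
    """Monte Carlo estimate of the Fisher exact p-value for an r x c table.

    Column labels are shuffled against row labels, which preserves both margins;
    the p-value is the share of shuffled tables that are no more probable
    than the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    rows, cols = [], []
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            rows.extend([i] * n)
            cols.extend([j] * n)
    observed = table_log_prob(table)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(cols)
        perm = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(rows, cols):
            perm[i][j] += 1
        if table_log_prob(perm) <= observed + 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Counts from the slide (rows: cluster 1, cluster 2)
grade = [[22, 129, 49, 0, 3, 0], [8, 74, 38, 6, 10, 1]]     # G1 G2 G3 G4 GX NA
stage = [[14, 36, 29, 102, 4, 18], [4, 26, 17, 60, 2, 28]]  # S1 S2 S3 S4A S4B NA
p_grade = fisher_mc(grade)
p_stage = fisher_mc(stage)
print(p_grade, p_stage)
```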

11 Differential Expression with respect to Grade/Stage
- 340 samples (Total)
- TCGA sample vial codes (https://wiki.nci.nih.gov/display/TCGA/TCGA+Barcode):
  01A 301; 01B 2; 11A 37
- Histological Grade distribution among samples:
  Without G0:  G1 30; G2 203; G3 87; G4 6; GX 13; NA 1
  With G0:     G0 37; G1 25; G2 185; G3 77; G4 6; GX 9; NA 1
- Pathological Stage distribution among samples:
  Without S0:  SI 18; SII 62; SIII 46; SIVA 162; SIVB 6; NA 46
  With S0:     S0 37; SI 16; SII 47; SIII 41; SIVA 147; SIVB 6; NA 46

12 Differential Expression with respect to Grade/Stage
- Cannot adjust expression for certain factors (Race/Ethnicity) due to missing phenotypic information
- Remove samples with missing information with respect to Grade/Stage and non-white patients
- Grade:
  Before:  G0 37; G1 25; G2 185; G3 77; G4 6; GX 9; NA 1
  After:   G0 32; G1 24; G2 156; G3 68; G4 6 (Total = 286)
- Stage:
  Before:  S0 37; SI 16; SII 47; SIII 41; SIVA 147; SIVB 6; NA 46
  After:   S0 32; SI 14; SII 42; SIII 34; SIVA 130; SIVB 3 (Total = 255)

13 Adjust for gender?
Panels: DE wrt Grade; DE wrt Stage
- Don’t want to adjust for gender when it is associated with very few genes

14 Percentile-based gene filtering prior to DE testing
- Further reduce gene space prior to DE testing using 90th percentiles to filter on
- Roughly divide # genes in half by choosing threshold log2(90th percentile) value
- Grade (N = 286): 90th percentile >= 10.5; n = 5046
- Stage (N = 255): 90th percentile >= 10.5; n = 5019
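The percentile filter described on this slide can be sketched as below. The sketch assumes the input values are already on the log2 scale and that the 10.5 threshold is applied to each gene's 90th percentile on that scale. The interpolation rule and the toy values are illustrative choices, not taken from the project.

```python
import math

def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def filter_by_p90(log2_expr, threshold=10.5):
    """Keep genes whose 90th percentile of log2 expression is at least threshold."""
    return {g: v for g, v in log2_expr.items() if percentile(v, 90) >= threshold}

# Hypothetical log2 expression values for two genes across five samples
log2_expr = {
    "HIGH": [9, 10, 11, 12, 13],  # 90th percentile = 12.6: kept
    "LOW": [1, 2, 3, 4, 5],       # 90th percentile = 4.6: dropped
}
kept = filter_by_p90(log2_expr)
```

Filtering on a high percentile rather than the mean retains genes that are strongly expressed in only a subset of samples.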

15 Differential Expression testing
- Perform DE wrt Grade (N=286; n=5046) and Stage (N=255; n=5019)
  - Tumor vs Normal (G0 vs G1+; S0 vs S1+)
  - Within Grade/Stage comparison (G1 vs G2+; S2- vs S3+; excluding controls)
- Permutation-based t-test with sliding ‘time-points’ and sample pooling
  - S3- vs S4A+ => (S1+S2+S3) vs (S4A + S4B)
- ‘diffanal’ function from diffanal.R (CBM repository)
- Number of permutations: 1000
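The sliding, pooled comparison described on this slide can be illustrated for a single gene as follows. This is not the `diffanal` implementation, whose internals are not shown in the deck. It is a sketch that assumes a Welch t-statistic and a two-sided permutation p-value, with made-up expression values.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic (assumed variant; the slides only say 't-test')."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 0.0 if se == 0 else (mean(a) - mean(b)) / se

def permutation_pvalue(a, b, n_perm=1000, seed=0):
    """Two-sided p-value from shuffling sample labels n_perm times."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def sliding_splits(levels):
    """Yield (lower, upper) level groups for every cut point of an ordered list."""
    for k in range(1, len(levels)):
        yield levels[:k], levels[k:]

# Hypothetical expression of one gene, grouped by stage (controls excluded)
by_stage = {
    "S1": [5.10, 4.80, 5.30], "S2": [5.00, 5.40, 4.90], "S3": [5.20, 5.50, 5.15],
    "S4A": [7.90, 8.30, 8.10], "S4B": [8.40, 8.00, 8.60],
}
levels = ["S1", "S2", "S3", "S4A", "S4B"]  # explicit order, low to high

results = {}
for lower, upper in sliding_splits(levels):
    a = [x for lvl in lower for x in by_stage[lvl]]  # pooled lower stages
    b = [x for lvl in upper for x in by_stage[lvl]]  # pooled upper stages
    results["+".join(lower) + " vs " + "+".join(upper)] = permutation_pvalue(a, b)
```

Each key in `results` corresponds to one comparison of the kind written on the slide, for example `S1+S2+S3 vs S4A+S4B` for S3- vs S4A+.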

16 DE testing by grade
Comparison   No. DE genes
G1+          3282
G2+          617
G3+          943
G4           456

17 DE testing by grade

18 DE testing by stage
Comparison   No. DE genes
S1+          327
S2+          0
S3+          0
S4A+         0

19 DE testing by stage

20 DE genes: G0 vs G1+

21 DE genes: G1 vs G2+

22 DE genes: G2- vs G3+

23 DE genes: G3- vs G4

24 AhR targets

25

26 Variation of expression across grade  DPAGT1

27 Variation of expression across grade  TAZ

28 Variation of expression across grade  YAP1

29 Variation of expression across grade  PDGFRB

30 Sliding windows
- Tool takes Time factors in the order in which they appear in the ‘Time’ column
- Does not pull corresponding factors in order of Time point levels
- For example:
    Time
    G3 G2 … …
    Levels: G3 G2 …
- Results in incorrect ordering of groups prior to sliding window DE testing
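The ordering problem noted on this slide can be reproduced in a few lines. The column contents below are hypothetical; the point is only the difference between taking levels in order of first appearance and imposing the intended order explicitly.

```python
# Hypothetical 'Time' column in which G3 happens to appear first
time_column = ["G3", "G2", "G1", "G3", "G2", "G4"]

# Order of first appearance (what the slide describes the tool as doing)
appearance_order = list(dict.fromkeys(time_column))

# Intended order, imposed explicitly before any sliding-window split
intended_order = ["G1", "G2", "G3", "G4"]
present = set(time_column)
ordered_levels = [lvl for lvl in intended_order if lvl in present]

print(appearance_order)  # ['G3', 'G2', 'G1', 'G4']
print(ordered_levels)    # ['G1', 'G2', 'G3', 'G4']
```

Any split built from `appearance_order` would pool G3 and G2 as the "lowest" groups, so fixing the level order first is a prerequisite for the sliding comparisons.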

31 Future work
- Perform GSEA/hyper-enrichment and pathway analyses
- Perform oral cancer-specific analyses
- Restrict anatomic sub-types to include only:
  - Alveolar Ridge
  - Base of tongue
  - Buccal Mucosa
  - Floor of mouth
  - Hard Palate
  - Hypopharynx
  - Larynx
  - Lip
  - Oral cavity
  - Oral tongue
  - Oropharynx
  - Tonsil

