Smoking Habits: Over 1 billion people in the world smoke tobacco. Of these, 5-6 million die each year. This habit increases the likelihood of developing lung cancer to 20 times that of a non-smoker.
Gail Butler, Chris Scodeller, Julie Ward, & Lori Foster
Outline: Sequencing of an SCLC cell line; Somatic mutation; Mutation signatures in NCI-H209; DNA repair pathways; Genomic rearrangement, specifically CHD7
Sequencing of an SCLC cell line. Why use SCLC? Not surgically resected. Cell line: NCI-H209, an immortal cell line from a 55-year-old male with SCLC. Smoking history not recorded. Showed histologically typical small cells; >97% of such tumors are associated with tobacco smoking. Taken before chemotherapy.
Sequencing: The SOLiD Platform. Massively parallel next-generation sequencing. Greater than 99.94% accuracy. Relatively inexpensive. Allows for: whole genome sequencing; targeted resequencing; gene expression data.
Sample preparation: Fragment library or mate-pair libraries. DNA is sheared and adaptor molecules are ligated to each unique molecule.
Each molecule is attached to a bead. Amplified using emulsion PCR. 3' end modification. Beads are covalently attached to a glass slide.
A universal sequencing primer, ligase, and a set of fluorescently labeled di-base probes are introduced.
Multiple cycles of ligation, detection, and cleavage are performed. After the template has been read, the synthesized strand is removed. A new primer then attaches to the template, offset by 1 nucleotide.
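The di-base probes mean each fluorescent signal reports on a pair of adjacent bases rather than a single base. A minimal sketch of that two-base ("colour-space") idea follows, assuming the commonly described four-colour scheme in which a colour is the XOR of 2-bit base codes. The function names and the example read are illustrative and do not come from the slides.

```python
# Illustrative two-base ("colour-space") encoding. Assumption: the commonly
# described scheme where colour = XOR of 2-bit base codes (A=0, C=1, G=2, T=3).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def encode_colour_space(seq: str) -> list:
    """Return one colour (0-3) for each pair of adjacent bases."""
    seq = seq.upper()
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colour_space(first_base: str, colours: list) -> str:
    """Rebuild the base sequence from a known first base and the colours."""
    bases = [first_base.upper()]
    for colour in colours:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ colour])
    return "".join(bases)


if __name__ == "__main__":
    read = "ATGGC"  # made-up example read
    colours = encode_colour_space(read)
    print(colours)
    print(decode_colour_space(read[0], colours))
```

Because every base contributes to two neighbouring colours, a true single-base change alters two adjacent colours, while an isolated measurement error alters only one. This is one reason the scheme is described as helping to separate real variants from sequencing errors.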
Coverage. Figure 1A: minimum 30x coverage. Figure 1B: 39x coverage for tumour; 31x coverage for normal cell line.
Bioinformatics: Identify somatically acquired mutations from sequence data. 77 coding substitutions. 333 random variants. Indels difficult to detect. Supplementary Fig. 1.
Somatically acquired genomic variants: 22,910 somatically acquired (not inherited) mutations. 70% intergenic; 28% intronic; 0.8% non-coding translated; 0.6% coding.
Figure 1C: Somatic mutations of the NCI-H209 genome. Deletions, insertions, heterozygous and homozygous substitutions, missense, nonsense, and rearrangements.
Point mutations in coding regions. RB1 C706F point mutation: nonconservative amino acid substitution; inhibits phosphorylation and abolishes protein function. TP53 splice site disruption: TP53 encodes p53, a tumor suppressor. The combination of RB1 and TP53 mutations is characteristic of SCLC.
Non-synonymous vs. Synonymous. Non-synonymous: codes for a different amino acid. Synonymous: the amino acid produced is not changed. An accumulation of mutations that increase fitness would show up as an excess of non-synonymous mutations. The observed ratio was not different from that expected by chance, which suggests that the majority of coding variants do not confer a selective advantage.
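The distinction above can be sketched by translating the reference and mutated codons with the standard genetic code. This is only an illustration: the function name is hypothetical, and the cysteine codon TGT is a stand-in, since the slides do not give the actual codon behind RB1 C706F.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # new stop codon
    if ref_aa == "*":
        return "stop-lost"  # original stop codon removed
    return "missense"       # non-synonymous, different amino acid


if __name__ == "__main__":
    # Cys (TGT) -> Phe (TTT): a G>T change giving a C-to-F substitution.
    print(classify_substitution("TGT", "TTT"))
    # Leu (CTT) -> Leu (CTC): the amino acid is unchanged.
    print(classify_substitution("CTT", "CTC"))
```

Counting the two classes across all coding mutations, and comparing the ratio with the one expected by chance, is the kind of test the slide refers to.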
Mutations in regulatory regions. Little is known about mutations occurring on either side of transcription start sites. Supplementary Fig. 2A: find somatic substitutions within 2 kb of known transcription start sites.
Apply hidden Markov models: statistical models that can be trained to recognize sequence patterns. Predict which substitutions might affect transcription factor binding sites. Supplementary Fig. 2B: the observed distribution was no different from that seen in random "simulated sets" of mutations.
There may still be mutations that alter transcription factor binding and affect gene regulation. Example (Supplementary Fig. 2C): a T>G substitution in the RAS oncogene family gene RAB42 disrupts a potential binding motif.
Big picture of somatic mutations. The data indicate that most of the mutations in the coding and promoter regions are passenger events: events that don't contribute to the development of cancer but occurred during cancer growth. These mutations confer no selective advantage to the cells.
Tobacco smoke contains more than 60 carcinogens which bind and chemically modify DNA.
The carcinogen binds to the DNA, forming bulky adducts at purine bases (guanine and adenine). These adducts: distort the double helix; allow non-Watson–Crick pairing; get in the way of polymerases.
Most Common Substitutions: G>T/C>A transversions (34%); G>A/C>T transitions (21%); A>G/T>C transitions (19%). The top 3 changes all occur at purines…
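Under the standard definitions, a transition swaps a purine for a purine (or a pyrimidine for a pyrimidine), and a transversion swaps a purine for a pyrimidine or vice versa. A small sketch of classifying the three changes listed above follows; the function names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def substitution_type(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def strand_pair(ref: str, alt: str) -> str:
    """Write a change together with its equivalent on the opposite strand."""
    ref, alt = ref.upper(), alt.upper()
    return f"{ref}>{alt}/{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"


if __name__ == "__main__":
    for ref, alt in [("G", "T"), ("G", "A"), ("A", "G")]:
        print(strand_pair(ref, alt), substitution_type(ref, alt))
```

Writing each change as a strand pair (for example G>T/C>A) reflects that a mutation seen as G>T on one strand is C>A on the other.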
This distribution of substitutions is consistent with the literature. Shows there is consistency in mutational patterns. Control for in vivo mutation.
G>T transversions occur more frequently at methylated CpG dinucleotides. G>T transversions make up 34% of total mutations.
CpG sites: cytosine-phosphate-guanine
G>T transversions occur more frequently at methylated CpG dinucleotides. In mammals, 70% to 80% of CpGs are methylated. G>T transversions make up 34% of total mutations.
CpG island: a region with a high frequency of cytosine followed by guanine (high CpG content). CpG islands are in and near approximately 40% of promoters of mammalian genes.
It's getting complicated, so let's recap: Most transversion mutations (34% of total) are G>T. The G>T mutations often happen at CpG sites. The CpG sites where G>T mutations happen are often methylated.
When looking at guanines in the genome, how often is the preceding nucleotide a cytosine? [Figure: how often a C is expected to precede a G in the genome.]
When looking at guanines in the genome, how often is the preceding nucleotide a cytosine? [Figure: how often a C precedes the G in G>T mutations.]
Wait, what? Schematic: 5'-N-N-N-N-?-G-N-N-N-N-N-N-N-C-G-N-N-N-N-?-G>T-N-N-N-N-N-?-G-N-N-N-3'. Compare two quantities: the expected fraction of CpGs per guanine in genomic DNA, and the fraction of G>T mutations at CpGs per guanine in CpG islands. If everything were random, we would expect the G>T mutations to have the same CpG/G make-up as genomic DNA… but that is not so!
When looking at guanines in the genome, how often is the preceding nucleotide a cytosine? [Figure: how often a C precedes the G in G>A mutations.] These often occur outside CpG islands. The unusually high fraction is likely due to spontaneous deamination of methylated cytosine to thymine.
When looking at guanines in the genome, how often is the preceding nucleotide a cytosine? [Figure: how often a C precedes the G in G>C mutations.] Similar to G>T, but these were significantly more likely to occur within CpG islands.
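The comparison in these slides can be sketched as two fractions: how often any guanine in a sequence is preceded by a cytosine, and how often a mutated guanine is. The sequence and mutation positions below are toy values chosen for illustration, not data from NCI-H209, and the function names are hypothetical.

```python
def cpg_fraction_of_guanines(seq: str) -> float:
    """Fraction of all G's in seq that are preceded by a C (i.e. sit in a CpG)."""
    seq = seq.upper()
    g_positions = [i for i, base in enumerate(seq) if base == "G"]
    if not g_positions:
        raise ValueError("sequence contains no guanines")
    in_cpg = [i for i in g_positions if i > 0 and seq[i - 1] == "C"]
    return len(in_cpg) / len(g_positions)


def cpg_fraction_at_mutations(seq: str, mutated_positions: list) -> float:
    """Fraction of mutated G's (0-based positions) that are preceded by a C."""
    seq = seq.upper()
    if not mutated_positions:
        raise ValueError("no mutated positions given")
    for pos in mutated_positions:
        if seq[pos] != "G":
            raise ValueError(f"position {pos} is not a guanine")
    in_cpg = [p for p in mutated_positions if p > 0 and seq[p - 1] == "C"]
    return len(in_cpg) / len(mutated_positions)


if __name__ == "__main__":
    toy_seq = "ACGTTGGACGATGCCGTAGG"   # toy sequence, not real data
    toy_mutated = [2, 5, 9, 15]        # toy positions of mutated G's
    expected = cpg_fraction_of_guanines(toy_seq)
    observed = cpg_fraction_at_mutations(toy_seq, toy_mutated)
    print(f"expected CpG/G: {expected:.3f}, observed at mutations: {observed:.3f}")
```

If mutations struck guanines at random, the two fractions would be similar. An observed fraction well above the expected one is the kind of excess at CpGs that the slides describe.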
WHAT DOES THIS ALL MEAN? “Thus, the sequence context of the 23,000 mutations in the NCI-H209 genome provides tremendous power to identify multiple distinctive mutation signatures, not evident from targeted re-sequencing studies of limited genomic regions.”
It's getting complicated (still), so let's recap: Most transversion mutations (34% of total) are G>T. The G>T mutations often happen at CpG sites. The CpG sites where G>T mutations happen are often methylated.
So how does methylation play into all this? Only 10–20% of CpG dinucleotides in CpG islands are methylated, while 60–70% of CpG sites outside the islands are methylated. This provides a model to see how methylation of CpG sites affects C>T mutations.
In other words, let's compare the frequency of G>C mutations inside CpG islands and outside them to see how methylation affects mutation. [Figure: percent methylated, CpG island vs. non-CpG island.]
[Figure: non-CpG islands vs. CpG islands.] There are fewer CpG mutations in CpG islands than at CpGs outside CpG islands.
[Figure: CpG island, percent methylated, less C>T mutation; non-CpG island, percent methylated, more C>T mutation.] There are fewer G>C mutations in the islands, and there is less methylation in the islands, suggesting that C>T mutations preferentially occur at methylated CpGs.
Can't we fix this? Bulky adducts on purines are the most common source of DNA damage from tobacco carcinogens. These bulky adducts get in the way of RNA polymerase. When RNA polymerase stalls, it recruits nucleotide excision repair machinery, which excises the altered nucleotide and prevents mutation.
The more expression, the more repair. Repair occurred less frequently in non-transcribed regions than in transcribed regions (good!).
G>A mutations: occurred about equally on transcribed and non-transcribed strands; mutations on both strands were significantly reduced in more highly expressed genes. A>G mutations: transcribed-strand mutations decreased with higher gene expression; non-transcribed-strand mutations were relatively level.
This suggests at least two separate DNA repair pathways, which in turn suggests "distinct physicochemical effects on DNA structure, with variable recognition and excision by the genome surveillance machinery."
Genomic Rearrangements & Copy Number. The NCI-H209 genome has 58 somatic genomic rearrangements: 18 deletions (31%); 9 tandem duplications (16%); 15 inverted intrachromosomal rearrangements (26%); 9 non-inverted intrachromosomal rearrangements (16%); 7 interchromosomal rearrangements (12%).
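As a quick arithmetic check, the percentages can be recomputed from the counts on this slide. The dictionary keys are shorthand labels chosen for this sketch.

```python
# Counts of somatic rearrangements as listed on the slide.
REARRANGEMENTS = {
    "deletions": 18,
    "tandem duplications": 9,
    "inverted intrachromosomal": 15,
    "non-inverted intrachromosomal": 9,
    "interchromosomal": 7,
}


def percentages(counts: dict) -> dict:
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


if __name__ == "__main__":
    print("total:", sum(REARRANGEMENTS.values()))
    for name, pct in percentages(REARRANGEMENTS).items():
        print(f"{name}: {pct}%")
```

The five categories add up to the 58 rearrangements reported. The rounded shares add up to 101% only because of rounding.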
Figure 3. Rearrangements between chromosomes 1 & 4: intrachromosomal inversions, non-inverted intrachromosomal rearrangements, interchromosomal rearrangements. Not classical inversions: clear boundaries separating changes in copy number in genes on both chromosomes; breakpoints between chromosomes aren't reciprocal; unbalanced rearrangements.
Oncogenic Fusion Genes. Oncogenic fusion gene: a hybrid gene formed from two previously separate genes. Chromosomal rearrangements can result in an oncogenic fusion gene if: the 2 genes are side by side; the ORF is intact; the genes are in the same orientation. NCI-H209 fusion gene: a 240 bp deletion on chromosome 16 joins the first 2 exons of CREBBP to the 3' portion of BTBD12. RT-PCR showed expression of the fusion transcript. It was not expressed in 55 other SCLCs. Direct further studies here?
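The three conditions listed above can be written as a simple check. This is a sketch under stated assumptions: the `FusionCandidate` fields, the phase convention, and the example values are all hypothetical and are not taken from the slides or from the CREBBP-BTBD12 data.

```python
from dataclasses import dataclass


@dataclass
class FusionCandidate:
    """Hypothetical description of two genes joined by a rearrangement."""
    five_prime_gene: str
    three_prime_gene: str
    five_prime_strand: str       # "+" or "-"
    three_prime_strand: str      # "+" or "-"
    adjacent: bool               # rearrangement places the genes side by side
    five_prime_coding_bases: int # coding bases retained from the 5' partner
    three_prime_start_phase: int # codon bases (0-2) the 3' partner expects
                                 # to have been supplied upstream of its exon


def could_encode_fusion(candidate: FusionCandidate) -> bool:
    """Check adjacency, shared orientation, and a preserved reading frame."""
    same_orientation = (
        candidate.five_prime_strand == candidate.three_prime_strand
    )
    frame_preserved = (
        candidate.five_prime_coding_bases % 3
        == candidate.three_prime_start_phase
    )
    return candidate.adjacent and same_orientation and frame_preserved


if __name__ == "__main__":
    # Made-up example values for illustration only.
    example = FusionCandidate("GENE_A", "GENE_B", "+", "+", True, 301, 1)
    print(could_encode_fusion(example))
```

A check like this only flags candidates. As on the slide, expression of the fusion transcript still has to be confirmed experimentally, for example by RT-PCR.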
CHD7 significance (Figure 4). CHD7 codes for a chromatin helicase DNA-binding protein. NCI-H209: 39.5 kb tandem duplication of exons 3-8 of CHD7 (Figure 4a & 4c). NCI-H2171: fusion gene of exons 1-3 of PVT1 (non-coding RNA gene immediately downstream of MYC) & exons 4-38 of CHD7 (Figure 4c), with MYC amplification. LU-135: fusion gene of exon 1 of PVT1 (non-coding RNA gene immediately downstream of MYC) & exons of CHD7 (Figure 4c), with MYC amplification. This suggests that CHD7 rearrangements are a recurrent phenomenon in SCLC.
Figure 4. LU-135, studied by mate-pair sequencing, showed: a fusion gene of exon 1 of PVT1 (non-coding RNA gene immediately downstream of MYC) & exons of CHD7; a CHD7 amplicon linked to MYC amplification. MYC codes for a transcription factor that regulates expression of multiple genes. The rearrangements resulted in increased expression of MYC & the 3' end of CHD7.
Figure 4. NCI-H2171 & LU-135 show elevated levels of CHD7 expression. SCLCs in general have greater normalized expression of CHD7 than non-SCLC & other tumor types.
CHD7 Summary. CHD7 rearrangements were found in 3 SCLC cell lines. LU-135 & NCI-H2171 have PVT1-CHD7 fusion genes plus MYC amplification. PVT1 is downstream of MYC & may be a transcriptional target of the MYC protein. Insertion of CHD7 with subsequent amplification results in increased copy number of the gene & its regulatory elements: OVEREXPRESSION. NCI-H209: duplication of parts of the CHD7 gene. CHD7 is a chromatin remodeller that promotes enhancer-mediated transcription through histone methylation. Histone modifiers have been implicated as cancer genes previously. Rearrangements of CHD7 would make for an interesting extension of this paper.
Summary. Each carcinogen-induced mutation is the consequence of three processes: chemical modification of a purine; failure to repair via surveillance pathways; incorrect nucleotide incorporation due to base distortion during DNA replication.
Summary. Transcription-coupled repair: stalled RNA polymerase; observed in NCI-H209 for A>G mutations. Expression-linked repair: more effective in highly transcribed regions; G>A mutations. Combined: G>T and A>T mutations.
After Thought. Lung cancer develops after 50 pack years of smoking. A pack a day is 7,300 cigarettes a year. On average you acquire one mutation for every 15 cigarettes smoked.
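The arithmetic behind this estimate can be reproduced roughly as follows. Two assumptions are made: a pack holds 20 cigarettes, which is what gives 7,300 a year at a pack a day, and the mutation count is the 22,910 somatic mutations reported earlier in the deck.

```python
# Assumption: 20 cigarettes per pack, so one pack a day is 7,300 a year.
CIGARETTES_PER_PACK = 20
DAYS_PER_YEAR = 365
PACK_YEARS = 50               # from the slide
SOMATIC_MUTATIONS = 22_910    # somatic mutation count reported earlier


def cigarettes_per_mutation(pack_years: int, mutations: int) -> float:
    """Total cigarettes smoked divided by the number of somatic mutations."""
    cigarettes_per_year = CIGARETTES_PER_PACK * DAYS_PER_YEAR
    total_cigarettes = cigarettes_per_year * pack_years
    return total_cigarettes / mutations


if __name__ == "__main__":
    per_year = CIGARETTES_PER_PACK * DAYS_PER_YEAR
    ratio = cigarettes_per_mutation(PACK_YEARS, SOMATIC_MUTATIONS)
    print(f"{per_year} cigarettes per year")
    print(f"about {ratio:.1f} cigarettes per mutation")
```

With these inputs the ratio comes out between 15 and 16 cigarettes per mutation, in line with the slide's figure of roughly one mutation for every 15 cigarettes.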
Questions?