Artefacts and Biases in Gene Set Analysis


Artefacts and Biases in Gene Set Analysis
  Simon Andrews, Laura Biggins, Christel Krueger
  simon.andrews@babraham.ac.uk
  laura.biggins@babraham.ac.uk
  christel.krueger@babraham.ac.uk
  v2017-06

What does gene set enrichment test?
  Is a functional gene set enriched for genes in my hit list compared to a background set?
  Are some genes more likely to turn up in the hits for technical reasons?
  Are some genes never likely to turn up in the hit list for technical reasons?
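
The first question above is usually answered with a one-sided Fisher's exact (hypergeometric) test. The sketch below is a minimal standard-library version, not the code behind any particular enrichment tool; the function name and all the numbers are hypothetical and chosen only for illustration.

```python
from math import comb

def enrichment_p_value(hits_in_set, hit_list_size, set_size, background_size):
    """One-sided hypergeometric p-value: the chance of seeing at least
    hits_in_set gene-set members when hit_list_size genes are drawn at
    random, without replacement, from the background."""
    total = comb(background_size, hit_list_size)
    upper = min(hit_list_size, set_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, hit_list_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / total

# Hypothetical experiment: 20,000 background genes, a 400-gene functional set,
# 200 hits. About 4 set members are expected in the hit list by chance.
p_enriched = enrichment_p_value(15, 200, 400, 20_000)
p_expected = enrichment_p_value(4, 200, 400, 20_000)
print(f"15 set members in hits: p = {p_enriched:.2e}")
print(f" 4 set members in hits: p = {p_expected:.2f}")
```

The test assumes every background gene was equally likely to become a hit. The other two questions on this slide ask whether that assumption holds.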

Biases
  All datasets contain biases
    Technical
    Biological
    Statistical
  Biases can lead to incorrect conclusions
  We should be trying to spot these
    Some are more obvious than others!

Technical Biases
  Simple GC bias from different polymerases in PCR

Statistical Biases
  The power to detect a significant effect is based on:
    How big the change is
    How well observed the data is (sample size)
  Lists of hits are often biased based on statistical power

RNA-Seq Statistical Biases
  What determines whether a gene is identified as significantly differentially regulated?
    The amount of change (fold change)
    The variability
    How well observed was it?
      How much sequencing was done overall?
      How highly expressed was the gene?
      How long was the gene?
      How mappable was the gene?
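
To illustrate why observation level matters, the sketch below applies an exact binomial test to the same 2-fold change at two read depths. This is a deliberate simplification: it assumes equal library sizes and no biological variability, whereas real differential expression tools use more elaborate models. The read counts are invented.

```python
from math import comb

def binomial_two_sided_p(count_a, count_b):
    """Exact two-sided test of whether reads split 50:50 between two
    conditions (simplifying assumption: equal sequencing depth in both)."""
    n = count_a + count_b
    observed = comb(n, count_a)
    # Add up every outcome that is no more likely than the observed split
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n

# Same 2-fold change, different numbers of reads (hypothetical counts).
# A long or highly expressed gene collects more reads than a short or
# weakly expressed one, so it is better observed.
p_low_depth = binomial_two_sided_p(10, 20)
p_high_depth = binomial_two_sided_p(100, 200)
print(f"10 vs 20 reads:   p = {p_low_depth:.3f}")
print(f"100 vs 200 reads: p = {p_high_depth:.2e}")
```

Only the better-observed gene passes a typical significance cutoff, although the fold change is identical. A hit list built this way will lean towards long, highly expressed genes.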

RNA-Seq Statistical Biases

Biological Biases

Not Significant = Not Changing
  GO: Developmental Protein p=7.8e-8

Biases Look Like Real Biology

Correct selection of a background list can make a huge difference
  What genes were you likely to see?
  Some are technically impossible
    Membrane proteins in LC-MS
    Small RNA in RNA-Seq
  Some are much less likely
    Unexpressed or low expressed in RNA-Seq
    Unmappable in ChIP-Seq
    Low CpG content in BS-Seq
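
A small worked example of how much the background can matter. The sketch tests the same hit list against two backgrounds: every gene in the genome, and only the genes expressed in the experiment. All counts are hypothetical, and the helper is a plain hypergeometric tail, not the method of any specific tool.

```python
from math import comb

def enrichment_p_value(hits_in_set, hit_list_size, set_size, background_size):
    """One-sided hypergeometric p-value for seeing at least hits_in_set
    gene-set members in a hit list drawn from the background."""
    total = comb(background_size, hit_list_size)
    upper = min(hit_list_size, set_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, hit_list_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / total

# Hypothetical RNA-Seq experiment (numbers invented for illustration):
# 20,000 genes in the genome, 8,000 of them expressed. The gene set has
# 500 members, 400 of which are expressed. All 300 hits are expressed genes.
hit_list_size, hits_in_set = 300, 18

p_genome = enrichment_p_value(hits_in_set, hit_list_size, 500, 20_000)
p_expressed = enrichment_p_value(hits_in_set, hit_list_size, 400, 8_000)
print(f"whole-genome background:    p = {p_genome:.2e}")
print(f"expressed-genes background: p = {p_expressed:.2f}")
```

Against the whole genome the set looks strongly enriched. Against the genes that could actually have been detected, the overlap is close to what chance predicts.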

Statistical biases affect gene sets too
  Fisher's test is powered by
    Magnitude of change
    Observation level
  Big lists have more power to detect change
  Enrichment in small lists is very difficult to detect
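
The effect of list size can be seen by holding the enrichment at 2-fold and varying only the number of hits. This is a sketch with invented numbers, using a standard-library hypergeometric tail in place of a library Fisher's test.

```python
from math import comb

def enrichment_p_value(hits_in_set, hit_list_size, set_size, background_size):
    """One-sided hypergeometric p-value for seeing at least hits_in_set
    gene-set members in a hit list drawn from the background."""
    total = comb(background_size, hit_list_size)
    upper = min(hit_list_size, set_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, hit_list_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / total

# Hypothetical gene set covering 5% of a 20,000-gene background
background_size, set_size = 20_000, 1_000

p_values = {}
for hit_list_size in (20, 200, 2_000):
    hits_in_set = hit_list_size // 10  # always 10% of the hits: 2-fold enriched
    p_values[hit_list_size] = enrichment_p_value(
        hits_in_set, hit_list_size, set_size, background_size
    )
    print(f"{hit_list_size:>5} hits, {hits_in_set:>3} in set: "
          f"p = {p_values[hit_list_size]:.2e}")
```

The magnitude of change is the same in all three cases; only the observation level differs.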

Relating Hits to Genes
  Most functional analysis is done at the gene level
    Gene Ontology
    Pathways
    Interactions
  Many hits are not gene based

Random Genomic Positions
  Find closest gene
    Synapse, Cell Junction, postsynaptic membrane (p=8.9e-12)
    Membrane (p=4.3e-13)
    Glycoprotein (p=1.3e-12)
  Find overlapping genes
    Pleckstrin homology domain (p=1.8e-7)
    Ion transport (p=7.1e-7)
    ATP-binding (p=3.8e-8)
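
One plausible reason random positions produce apparent enrichment is that genes do not have equal chances of being the closest gene. The sketch below computes, for a toy chromosome, the share of positions that would be assigned to each gene. The gene names and coordinates are hypothetical, and the function assumes sorted, non-overlapping genes.

```python
def closest_gene_shares(genes, chrom_length):
    """Fraction of chromosome positions whose closest gene is each gene.

    genes: list of (name, start, end), sorted by start and non-overlapping.
    A position is assigned to the gene with the nearest edge, so each
    gene's territory runs between the midpoints of its flanking gaps.
    """
    shares = {}
    for i, (name, start, end) in enumerate(genes):
        left = 0 if i == 0 else (genes[i - 1][2] + start) / 2
        if i == len(genes) - 1:
            right = chrom_length
        else:
            right = (end + genes[i + 1][1]) / 2
        shares[name] = (right - left) / chrom_length
    return shares

# Hypothetical 1 Mb chromosome: three short genes packed together and one
# long gene sitting in a gene-poor region.
genes = [
    ("clustered_1", 100_000, 110_000),
    ("clustered_2", 112_000, 122_000),
    ("clustered_3", 124_000, 134_000),
    ("isolated_long", 500_000, 700_000),
]
shares = closest_gene_shares(genes, 1_000_000)
for name, share in shares.items():
    print(f"{name:<14} {share:.1%} of random positions")
```

Any functional category dominated by long or isolated genes will collect far more random positions than a category of short, tightly clustered genes. With overlap-based assignment the share is simply proportional to gene length.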

Random Transcripts
  Tends to favour genes with more splice variants
    Metal Binding, Zinc Finger (p=4.4e-12)
    Nucleus, Transcription Regulation (p=2.4e-14)
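
The bias towards genes with more splice variants follows directly from sampling at the transcript level. As a sketch, the probability that a gene appears in the hit list when transcripts are picked uniformly at random can be computed exactly. The transcript counts below are hypothetical.

```python
from math import comb

def prob_gene_hit(transcripts_for_gene, total_transcripts, transcripts_sampled):
    """Probability that at least one of a gene's transcripts is included
    when transcripts are sampled uniformly without replacement."""
    miss = comb(total_transcripts - transcripts_for_gene, transcripts_sampled)
    return 1 - miss / comb(total_transcripts, transcripts_sampled)

# Hypothetical annotation: 100,000 transcripts, 1,000 picked at random
p1 = prob_gene_hit(1, 100_000, 1_000)
p10 = prob_gene_hit(10, 100_000, 1_000)
print(f"gene with  1 transcript:  {p1:.4f}")
print(f"gene with 10 transcripts: {p10:.4f}")
```

A gene with ten annotated transcripts is almost ten times more likely to turn up than a single-transcript gene, before any biology is involved.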

Hit Validation
  Do my hits look different from non-hits in factors which should be unrelated?
  How easy would it be for the effect I see to be generated through a technical artefact?
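
One way to ask the first question is a permutation test on a property that should not matter, such as gene length. The sketch below is a generic illustration, not the method used by any of the tools mentioned in these slides. The function name, the choice of the median as the statistic, and the example lengths are all assumptions.

```python
import random
from statistics import median

def permutation_p_value(hit_values, non_hit_values, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in medians between
    hits and non-hits for some supposedly unrelated property."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(median(hit_values) - median(non_hit_values))
    pooled = list(hit_values) + list(non_hit_values)
    n_hits = len(hit_values)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the hit / non-hit labels
        diff = abs(median(pooled[:n_hits]) - median(pooled[n_hits:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical gene lengths in kb
hit_lengths = [8.1, 9.4, 10.2, 11.0, 12.5, 13.3, 14.8, 16.0]
non_hit_lengths = [1.2, 1.9, 2.4, 2.8, 3.1, 3.5, 4.0, 4.4, 5.2, 5.9, 6.3, 7.0]
p_length = permutation_p_value(hit_lengths, non_hit_lengths)
print(f"hits vs non-hits gene length: p = {p_length:.4f}")
```

A small p-value here is a warning sign: if hits differ from non-hits in length, GC content, or expression level, any enrichment result may reflect that property instead of the biology of interest.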

What should we do if we think there are artefacts?
  Look for confounders
    http://www.bioinformatics.babraham.ac.uk/shiny/gene_screen/
  Make sure background is appropriate
  Check for compositional bias
  Be suspicious of some ontology categories
    ribosomal, cytoskeleton, extracellular, secreted, translation

Look for confounders
  http://www.bioinformatics.babraham.ac.uk/shiny/gene_screen/

Look for confounders
  Compter
    Sequence kmer analysis
    Does composition explain my hits?
    www.bioinformatics.babraham.ac.uk/projects/compter
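
To give a feel for what a kmer composition check involves, the sketch below compares kmer frequencies in hit sequences against background sequences. It is not Compter's algorithm or interface, only a generic illustration; the function names, the pseudocount, and the toy sequences are assumptions.

```python
from collections import Counter
from math import log2

def kmer_frequencies(sequences, k=2):
    """Relative frequency of each overlapping kmer across the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}

def kmer_log2_ratios(hit_seqs, background_seqs, k=2, pseudocount=1e-6):
    """log2(hit frequency / background frequency) for every kmer seen.
    The pseudocount avoids division by zero for kmers absent from one side."""
    hit = kmer_frequencies(hit_seqs, k)
    background = kmer_frequencies(background_seqs, k)
    return {
        kmer: log2((hit.get(kmer, 0) + pseudocount)
                   / (background.get(kmer, 0) + pseudocount))
        for kmer in set(hit) | set(background)
    }

# Toy sequences, invented for illustration
hit_seqs = ["GCGCGC", "CGCGGC"]
background_seqs = ["ATATGC", "GCATAT", "TTAAGC", "CGATAT"]
ratios = kmer_log2_ratios(hit_seqs, background_seqs)
for kmer, ratio in sorted(ratios.items(), key=lambda item: -item[1])[:3]:
    print(f"{kmer}: log2 ratio {ratio:+.2f}")
```

If a handful of kmers separate hits from background this cleanly, sequence composition alone may explain the hit list.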