Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews V2019-02 Motif Searching Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews V2019-02.

Slides:

Advertisements

Similar presentations

Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome ECS289A.

Advertisements

Visualising and Exploring BS-Seq Data

Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Using ChIPMunk for motif discovery - quick-start guide - slide 3:10 - preparing data slide 11:12 - running ChIPMunk slide 13:14 - do I need ChIPHorde?

Transcription factor binding motifs (part I) 10/17/07.

Project Proposals Due Monday Feb. 12 Two Parts: Background—describe the question Why is it important and interesting? What is already known about it? Proposed.

CisGreedy Motif Finder for Cistematic Sarah Aerni Mentors: Ali Mortazavi Barbara Wold.

Cis-regultory module 10/24/07. TFs often work synergistically (Harbison 2004)

Sequencing a genome and Basic Sequence Alignment

Searching for TFBSs with TRANSFAC - Hot topics in Bioinformatics.

© Wiley Publishing All Rights Reserved.

Viewing & Getting GO COST Functional Modeling Workshop April, Helsinki.

1 Welcome to the GrameneMart Tutorial A tool for batch data sequence retrieval 1.Select a Gramene dataset to search against. 2.Add filters to the dataset.

Sequence Alignment Techniques. In this presentation…… Part 1 – Searching for Sequence Similarity Part 2 – Multiple Sequence Alignment.

Galaxy for Bioinformatics Analysis An Introduction TCD Bioinformatics Support Team Fiona Roche, PhD Date: 31/08/15.

Regulatory Genomics Lab Saurabh Sinha Regulatory Genomics Lab v1 | Saurabh Sinha1 Powerpoint by Casey Hanson.

NGS data analysis CCM Seminar series Michael Liang:

Copyright OpenHelix. No use or reproduction without express written consent1.

NGS Bioinformatics Workshop 1.5 Tutorial – Genome Annotation April 5th, 2012 IRMACS Facilitator: Richard Bruskiewich Adjunct Professor, MBB.

Sequencing a genome and Basic Sequence Alignment

1 of 38 Data Mining in Ensembl with BioMart. 2 of 38 Simple Text-based Search Engine.

Copyright OpenHelix. No use or reproduction without express written consent1.

Comparative Genomics Gene Regulatory Networks (GRNs) Anil Jegga Biomedical Informatics Contact Information: Anil Jegga Biomedical Informatics Room # 232,

数据库使用杨建华 2010/9/28. Outline of the Topics UCSC and Ensembl Genome Browser (Blat vs Blast vs Blastz vs Multiz) 挖掘数据用 Table Browser 或 BioMart 用户友好化你的数据.

Regulatory Genomics Lab Saurabh Sinha Regulatory Genomics | Saurabh Sinha | PowerPoint by Casey Hanson.

Finding Patterns Gopalan Vivek Lee Teck Kwong Bernett.

Data Mining in Ensembl with BioMart Giulietta Spudich.

How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G.

PROTEIN PATTERN DATABASES. PROTEIN SEQUENCES SUPERFAMILY FAMILY DOMAIN MOTIF SITE RESIDUE.

I.U. School of Informatics Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana by Irfan.

Finding genes in the genome

The Genome Genome Browser Training Materials developed by: Warren C. Lathe, Ph.D. and Mary Mangan, Ph.D. Part 2.

Welcome to the GrameneMart Tutorial A tool for batch data sequence retrieval 1.Select a Gramene dataset to search against. 2.Add filters to the dataset.

HOMER – a one stop shop for ChIP-Seq analysis

Designing, Executing and Sharing Workflows with Taverna 2.4 Different Service Types Katy Wolstencroft Helen Hulme myGrid University of Manchester.

Differential Methylation Analysis

Simon v RNA-Seq Analysis Simon v

SeqMonk tools for methylation analysis

Lesson: Sequence processing

Simon v1.0 Motif Searching Simon v1.0.

Biases and their Effect on Biological Interpretation

Regulatory Genomics Lab

Figure 1. Annotation and characterization of genomic target of p63 in mouse keratinocytes (MK) based on ChIP-Seq. (A) Scatterplot representing high degree.

Detection of genome regulation sequences

ANIMAL TARGET PREDICTION - TIPS

A Very Basic Gibbs Sampler for Motif Detection

From: DynaMIT: the dynamic motif integration toolkit

Logan-Hocking Schools

Sequence comparison: Local alignment

Learning Sequence Motif Models Using Expectation Maximization (EM)

EPConDB: Endocrine Pancreas Consortium Database

De novo Motif Finding using ChIP-Seq

De novo Motif Finding using ChIP-Seq

TSS Annotation Workflow

Analysing ChIP-Seq Data

Artefacts and Biases in Gene Set Analysis

ChIP-Seq Data Processing and QC

Summary and Recommendations

Exploring and Understanding ChIP-Seq data

Misleading Bioinformatics Mistakes, Biases, Mis-Interpretations and how to avoid them Festival of Genomics 2017.

Visualising and Exploring BS-Seq Data

Studbook Animal Lists.

Yating Liu July 2018 G-OnRamp workshop

Artefacts and Biases in Gene Set Analysis

Regulatory Genomics Lab

Welcome to the GrameneMart Tutorial

Basic Local Alignment Search Tool

Summary and Recommendations

Welcome - webinar instructions

Regulatory Genomics Lab

Presentation transcript:

Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews V2019-02 Motif Searching Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews V2019-02

Rationale Hit A Hit B Hit C GGATCC Gene A Gene B Gene C Prom A Prom B Prom C

Basic Questions Does the sequence around my hits look unusual? Do specific sequences turn up more often than expected in my hits? If so, do the sequences look like any known functional sequence? Are there sequences which can distinguish between two or more groups of hits?

Basic Workflow Hit regions Extract Sequences Check for artefacts Genes, CDS, Positions, Whatever Extract Sequences Check for artefacts Try to identify enriched sequences Check for enriched sequences Check for composition

Deciding what to extract Hit Hit Hit plus context Fixed width, centred on hit Gene A Promoter Gene Body / CDS 3’ UTR 5’ UTR

Extracting Sequence From positions From features BEDTools Genome Browsers* Custom scripts From features BioMart *not easily automatable for multiple sequences

BioMart – Selecting Assembly http://ensembl.org/biomart

BioMart – Specifying features

BioMart – selecting seq region

BioMart – header info

BioMart - exporting

Deciding on a comparison Genomic Dataset Single Dataset Dataset 1 Dataset 2 Enrichment Enrichment Choosing the appropriate comparison is the hardest part!

Filtering list of hits High specificity Quick run times Small list Large list High specificity Quick run times Potentially lower power Highest hit artefacts More power Long run times More noise Don’t need all hits to generate motif Often better to have a clean sequence set Remove sequences which look unusual

Artefacts Exclude common repeats Check composition Hit LINE LINE LINE LINE CGI CGI CGI Exclude common repeats Simple repeats (poly-A, SerThr repeats etc) Complex repeats (retroviral etc) Exclude hits with repeats Repeatmasked sequence Check composition Analyse compositionally biased regions explicitly

Software meme-suite.org xxmotif.genzentrum.lmu.de/ lgsun.grc.nia.nih.gov/CisFinder/ cb.utdallas.edu/cread/ HOMER homer.salk.edu/homer/motif/

MEME Suite

MEME Motif Discovery MEME DREME GLAM2 Original motif enrichment program PWM based motifs Long ungapped motifs, sensitive search, slow! DREME Short ungapped discriminatory motifs Degeneracy based motifs Quick! GLAM2 Gapped motifs

Main Parameters: Sequences (multi-fasta) Expected sites How many motifs to find Advanced Custom background Negative set Motif size restriction NB: Query size limited to 60kb Local installations don’t have this limit

Good Result

Good Result - Motif

Good Result - Positioning For ‘peak’ data, expect motifs to be roughly centred For promoter data there may be no pattern.

Artefactual Result - Composition MEME tends to favour long compositionally biased motifs Real motifs can be further down the list

Artefactual Result - Duplication Multiple transcripts with the same promoter Overlapping regions

AME – Known motif search Quicker / easier than de-novo discovery Limited to characterised binding sites Can choose from common motif sources Good place to start

AME Result No additional detail Could check for positional Bias with CentriMo Beware similar motifs from different factors

Discriminatory Motifs Group 1 Group 2 MEME can run in discriminatory mode DREME is designed for this specifically