Using ChIPMunk for motif discovery: quick-start guide (preparing data; running ChIPMunk; do I need ChIPHorde?)

"A short guide to breeding and taming highly intelligent ChIPMunks"

Some basic questions
Can I use ChIPMunk for the WHOLE PEAK SEGMENTS from a ChIP-Seq experiment?
– YES! But you will need to supply the "base coverage profile" (also called the "peak shape").
Should I cut short segments around ChIP-Seq peak summits for ChIPMunk?
– NO! Use the whole peaks with the base coverage data whenever possible.
Want more details? Move on to the next slides!

Prerequisites
To use the ChIPMunk motif discovery tool you need:
– a Java runtime environment (JRE, also called the Java Virtual Machine), version 1.5 or later.
Maybe you already have Java; test it by typing:
    java -version
Linux users: check your distro-specific package manager. ChIPMunk runs under Oracle Java as well as under OpenJDK.
Windows users: go directly to java.com!
![NOTE] You do not need the JDK (Java Development Kit), only the JRE/JVM.
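The version check can also be scripted. This is a minimal Python sketch with our own helper names (nothing here is part of ChIPMunk). It runs `java -version`, which prints to stderr, and extracts the major version from both the old "1.x" numbering (1.5, 1.8) and the newer numbering (11, 17), so you can test for "1.5 or later".

```python
import re
import shutil
import subprocess


def parse_java_major(version_text):
    """Return the Java major version found in `java -version` output, or None.

    Legacy strings such as "1.5.0_22" or "1.8.0_292" map to 5 or 8;
    modern strings such as "11.0.2" or "17" map to 11 or 17.
    """
    match = re.search(r'version "([^"]+)"', version_text)
    if match is None:
        return None
    parts = re.split(r"[._\-+]", match.group(1))
    major = int(parts[0])
    if major == 1 and len(parts) > 1:
        major = int(parts[1])  # "1.x" scheme: the real major is the second field
    return major


def installed_java_major():
    """Run `java -version` (output goes to stderr) and parse it; None if no java."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)
```

A result of 5 or higher from `installed_java_major()` means the installed Java is recent enough. A result of None means `java` is not on your PATH.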

Extracting ChIPMunk
Let's assume you have successfully downloaded your chipmunk_v?_binary.zip from the official ChIPMunk website (see the downloads section):
– Unpack it into any suitable folder. You should now see the autosome directory. This is the ChIPMunk Java package autosome.ru.
– Now you can run ChIPMunk from the folder that contains the autosome package. For simplicity, you may wish to store your sequence files in that same folder, one level above ChIPMunk's autosome folder.
![NOTE] Do not try to move anything out of the autosome folder. Your ChIPMunk should live there.
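Unpacking and checking the layout can be done with Python's standard `zipfile` module. This is a sketch that assumes, as the slide describes, that the archive puts the `autosome` directory at its top level. The Java package autosome.ru then corresponds to the directory `autosome/ru`. The function name is our own.

```python
import zipfile
from pathlib import Path


def unpack_chipmunk(zip_path, target_dir):
    """Unpack a ChIPMunk binary zip into target_dir and verify the layout.

    Returns the folder to run ChIPMunk from: the one that contains the
    `autosome` package directory (keep your .mfa files here as well).
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Java package autosome.ru lives in the directory autosome/ru.
    if not (target / "autosome" / "ru").is_dir():
        raise FileNotFoundError(f"no autosome/ru package directory under {target}")
    return target
```

Pass the path of the archive you downloaded as `zip_path`. Run the `java ... autosome.ru.ChIPMunk` commands from the returned folder, leaving the contents of `autosome` where they are.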

Preparing your data: overview
– No prior information: a simple multi-fasta file (Simple data set).
– Some arbitrary weights or quality values assigned to each sequence: multi-fasta with weights in the headers (Weighted data set).
– A prior positional profile along each sequence: multi-fasta with profiles in the headers (Peak data set).
Peak and Weighted data sets are useful not only for ChIP-Seq data but for any kind of data set where you have some quality rating or known positional preferences.

Preparing your data: Simple data set
The simplest case: you already have a number of sequences to be used for motif discovery with ChIPMunk, and no additional information is available.
– Arrange a simple multi-fasta file like:
    > header1
    ACTGTGTGAAA
    > header2
    AGTGTGTGTGTG
![NOTE] You can omit the fasta header names, since ChIPMunk simply skips them. Remember: this is a Simple data set.
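If your sequences come from a script, a small formatter avoids hand-editing. This is a sketch with a hypothetical function name. Its restriction to the letters A, C, G, T and N is our own sanity check, not a documented ChIPMunk rule.

```python
VALID_BASES = set("ACGTN")  # our own check; not taken from the ChIPMunk manual


def format_simple_fasta(sequences):
    """Format (header, sequence) pairs as a Simple data set (plain multi-fasta).

    ChIPMunk skips the header text, so headers are only for your bookkeeping.
    """
    lines = []
    for header, seq in sequences:
        seq = seq.upper()
        if not seq or set(seq) - VALID_BASES:
            raise ValueError(f"unexpected characters in sequence {header!r}")
        lines.append(f"> {header}")
        lines.append(seq)
    return "\n".join(lines) + "\n"
```

Write the returned string to a file such as sequence.mfa and pass it to ChIPMunk with the s: prefix.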

Preparing your data: Weighted data set
Let's assume you have some prior information, such as a quality rating or a prior measure of the presence or strength of binding sites.
– Arrange a simple multi-fasta file that specifies your arbitrary quality value for each sequence in its fasta header:
    > 10.0
    ACGGTGTAAAAA
    > 2.0
    GGTAGTGTCGTAGTG
![NOTE] Your weights (quality values) must always be positive. Never use negative or zero quality. Remember: this is a Weighted data set.
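The "always positive" rule is easy to enforce when generating the file. This is a sketch with a hypothetical function name. It writes each weight into the header as on the slide and rejects zero, negative and non-finite values. The non-finite check is our own extra caution.

```python
import math


def format_weighted_fasta(records):
    """Format (weight, sequence) pairs as a Weighted data set.

    Each weight goes into the fasta header; weights must be strictly
    positive, so zero, negative, NaN and infinite values are rejected.
    """
    lines = []
    for weight, seq in records:
        weight = float(weight)
        if not math.isfinite(weight) or weight <= 0.0:
            raise ValueError(f"weight must be positive, got {weight}")
        lines.append(f"> {weight}")
        lines.append(seq.upper())
    return "\n".join(lines) + "\n"
```

For example, `format_weighted_fasta([(10, "ACGGTGTAAAAA"), (2.0, "GGTAGTGTCGTAGTG")])` reproduces the two records shown on the slide.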

Preparing your data: Peak data set
If you have prior profile information, such as the shape of ChIP-Seq peaks, then you can provide a profile in the fasta header like:
    > AGTAAC > CAGTA
![NOTE] The length of each profile (its number of values) must equal the length of the corresponding sequence. Remember: this is a Peak (or Profiled) data set.
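This is a sketch of a Peak data set writer that enforces the "one profile value per base" rule. The header layout, with whitespace-separated numbers after ">", is our assumption about the format; check it against the ChIPMunk manual. The profile values in the usage line are made up purely for illustration.

```python
def format_peak_fasta(records):
    """Format (profile, sequence) pairs as a Peak (profiled) data set.

    ASSUMPTION: the profile is written into the header as whitespace-separated
    numbers, one value per nucleotide, so it must match the sequence length.
    """
    lines = []
    for profile, seq in records:
        if len(profile) != len(seq):
            raise ValueError(
                f"profile has {len(profile)} values but sequence has {len(seq)} bases"
            )
        lines.append("> " + " ".join(str(float(v)) for v in profile))
        lines.append(seq.upper())
    return "\n".join(lines) + "\n"
```

For instance, `format_peak_fasta([([1, 2, 3, 3, 2, 1], "AGTAAC")])` yields a header with six values for the six-base sequence. Pass the saved file with the p: prefix.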

ChIP-Seq data: what to do
1. The best case: ChIP-Seq data with base coverage (often provided in wiggle files, .wig). Extract the coverage (peak height) at each position of each sequence and generate the Peak multi-fasta.
2. Only the peak height h and the peak summit position are known. Generate triangle profiles manually: height 0.0 at both ends of the sequence and height h at the peak summit position.
3. Only the peak height h is known. Then use the Weighted data set, specifying h as the weight/quality.
![NOTE] Whenever possible, use base coverage information or generate triangle profiles. This is extremely important for ChIPMunk performance.
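Case 2 above, the triangle profile, is simple linear interpolation. This is a minimal sketch with a hypothetical function name and a 0-based summit index. The profile rises linearly from 0.0 at the first base to h at the summit, then falls back to 0.0 at the last base. If the summit is itself an end position, that end gets h.

```python
def triangle_profile(length, summit, height):
    """Triangle coverage profile for a sequence of `length` bases.

    0.0 at both ends (unless the summit is itself an end), `height` at the
    0-based `summit` position, linear in between.
    """
    if length < 2:
        raise ValueError("need at least 2 positions")
    if not 0 <= summit < length:
        raise ValueError("summit must lie inside the sequence")
    last = length - 1
    profile = []
    for i in range(length):
        if i <= summit:
            # Rising edge; a summit at position 0 means the peak starts there.
            frac = i / summit if summit > 0 else 1.0
        else:
            # Falling edge; only reached when summit < last, so no division by zero.
            frac = (last - i) / (last - summit)
        profile.append(height * frac)
    return profile
```

`triangle_profile(5, 2, 4.0)` gives `[0.0, 2.0, 4.0, 2.0, 0.0]`. Write such values into the fasta header of each sequence to build the Peak data set described above.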

Running ChIPMunk: specifying the data set
So, now you know the type of your sequence.mfa data set. It is either Simple (s:sequence.mfa), Weighted (w:sequence.mfa) or Peak (p:sequence.mfa). Supply it to ChIPMunk as, for example, p:sequence.mfa if the file is in your current directory. If the file is located somewhere else on your drive, specify its path after the prefix (e.g. p:).
![NOTE] We strongly advise using the Peak data set if possible.
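If you are unsure which prefix an existing file needs, a quick header check can suggest one. This is a heuristic of our own, not part of ChIPMunk. A file is treated as a Peak set if every header holds one number per base, as a Weighted set if every header holds exactly one number, and as a Simple set otherwise.

```python
def guess_dataset_prefix(fasta_text):
    """Suggest the ChIPMunk data set prefix (s:, w: or p:) from fasta headers."""
    records = []
    header, seq_parts = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_parts)))
            header, seq_parts = line[1:].split(), []
        elif line:
            seq_parts.append(line)
    if header is not None:
        records.append((header, "".join(seq_parts)))

    def is_numeric(fields):
        if not fields:
            return False
        try:
            for field in fields:
                float(field)
        except ValueError:
            return False
        return True

    if records and all(is_numeric(h) and len(h) == len(s) for h, s in records):
        return "p:"
    if records and all(is_numeric(h) and len(h) == 1 for h, s in records):
        return "w:"
    return "s:"
```

Prepend the result to the file name, e.g. `guess_dataset_prefix(text) + "sequence.mfa"`, and treat it as a hint to double-check rather than a guarantee.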

Running ChIPMunk: default mode
    java -Xms512M -Xmx1G autosome.ru.ChIPMunk p:your_sequences_with_profiles.mfa > output.log
This writes all informative output to output.log and lets the JVM use from 512 MB (initial heap, -Xms) up to 1 GB (maximum heap, -Xmx) of RAM.
![NOTE] This is the best way to search for an unknown motif, and it lets ChIPMunk automatically use its default parameter settings.
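If you drive ChIPMunk from a Python script, the default-mode command can be built and run with the standard `subprocess` module. This is hypothetical glue code. `run_dir` must be the folder that contains `autosome`. As with the shell's `>`, only standard output goes to the log file.

```python
import subprocess


def default_chipmunk_command(dataset):
    """Command line for ChIPMunk's default mode, e.g. dataset='p:peaks.mfa'."""
    if dataset[:2] not in ("s:", "w:", "p:"):
        raise ValueError("dataset must start with s:, w: or p:")
    return ["java", "-Xms512M", "-Xmx1G", "autosome.ru.ChIPMunk", dataset]


def run_chipmunk(dataset, log_path="output.log", run_dir="."):
    """Run ChIPMunk from run_dir, sending standard output to log_path.

    log_path is relative to this Python process; the path after the
    s:/w:/p: prefix is resolved by Java relative to run_dir.
    """
    with open(log_path, "w") as log:
        subprocess.run(default_chipmunk_command(dataset), stdout=log,
                       cwd=run_dir, check=True)
```

`run_chipmunk("p:your_sequences_with_profiles.mfa")` mirrors the command line above when called from the ChIPMunk folder. `check=True` raises an error if Java exits with a non-zero status.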

Running ChIPMunk: tweaking parameters
The most obvious things you can tweak are:
– the motif length range (for example, from 7 to 22 bp):
    java -Xms512M -Xmx1G autosome.ru.ChIPMunk 7 22 yes 1.0 w:your_weighted_set.mfa
– the number of starting seeds; increasing it from the default of 100 can improve precision:
    java -Xms512M -Xmx1G autosome.ru.ChIPMunk 7 22 yes 1.0 w:your_weighted_set.mfa 200
– the background model; let ChIPMunk estimate it automatically instead of using the predefined GC content of 0.5:
    java autosome.ru.ChIPMunk 7 22 yes 1.0 p:peak_data.fasta random local
![NOTE] Don't hesitate to consult the ChIPMunk manual or to contact ivan-dot-kulakovskiy-at-gmail-dot-com. ChIPMunk has many useful advanced options.
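When trying several parameter settings, building the argument list in code keeps the positional order straight. This sketch (hypothetical helper) mirrors only the slide's first two examples: `min max yes zoops dataset [seeds]`. It passes the literal `yes` through unchanged, because the slide does not explain it. It also leaves out the `random local` background option, since the slide does not show how that combines with a seed count.

```python
def tweaked_chipmunk_command(min_len, max_len, dataset, zoops=1.0,
                             seeds=None, memory=("512M", "1G")):
    """ChIPMunk command with an explicit motif length range and optional seeds.

    Argument order follows the slide: min max yes zoops dataset [seeds].
    """
    if not 0 < min_len <= max_len:
        raise ValueError("need 0 < min_len <= max_len")
    initial_heap, max_heap = memory
    cmd = ["java", f"-Xms{initial_heap}", f"-Xmx{max_heap}", "autosome.ru.ChIPMunk",
           str(min_len), str(max_len), "yes", str(zoops), dataset]
    if seeds is not None:
        cmd.append(str(seeds))  # number of starting seeds (default 100 per the slide)
    return cmd
```

`tweaked_chipmunk_command(7, 22, "w:your_weighted_set.mfa", seeds=200)` reproduces the second command above. The list can be passed straight to `subprocess.run`.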

ChIPHorde extension: do I need it?
You want to find the most significant motif in the set (for example, a common motif for a given transcription factor, TF).
– ChIPHorde? NO, ChIPMunk is enough.
You want to check different motif lengths (like 10, 12 and 15 bp) and manually select the best motif.
– ChIPHorde? NO, run ChIPMunk several times with the length ranges 10 to 10, 12 to 12 and 15 to 15.
– OR YES, you can run ChIPHorde in its 'dummy' mode:
    java autosome.ru.ChIPHorde 10:10,12:12,15:15 dummy yes 1.0 w:your_weighted_sequence_set.mfa
![NOTE] So, if you want to find the MOST SIGNIFICANT motif for a data set, you DO NOT NEED the ChIPHorde extension. But you can use it in dummy mode to check different lengths and then manually select the motifs you need.
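Both options for checking several fixed lengths are easy to script. This sketch uses hypothetical helper names and hypothetical log file names (`chipmunk_len<n>.log`). It builds one ChIPMunk run per length with the range n to n, and it also builds the single equivalent ChIPHorde dummy-mode call, using the argument order shown on the slide.

```python
import subprocess


def fixed_length_commands(lengths, dataset):
    """One ChIPMunk command per motif length, each using the range n to n."""
    return [
        ["java", "-Xms512M", "-Xmx1G", "autosome.ru.ChIPMunk",
         str(n), str(n), "yes", "1.0", dataset]
        for n in lengths
    ]


def run_fixed_lengths(lengths, dataset, run_dir="."):
    """Run each command in turn, one hypothetical log file per length."""
    for n, cmd in zip(lengths, fixed_length_commands(lengths, dataset)):
        with open(f"chipmunk_len{n}.log", "w") as log:
            subprocess.run(cmd, stdout=log, cwd=run_dir, check=True)


def chiphorde_dummy_command(lengths, dataset):
    """The single equivalent ChIPHorde call in 'dummy' mode."""
    ranges = ",".join(f"{n}:{n}" for n in lengths)
    return ["java", "autosome.ru.ChIPHorde", ranges, "dummy", "yes", "1.0", dataset]
```

With `lengths=[10, 12, 15]`, the dummy-mode helper produces the `10:10,12:12,15:15` range list from the slide. Either way, you still compare the resulting motifs by hand.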

You need ChIPHorde if:
– You suspect several distinct motifs for your TF. Use 'filter' mode (dropping sequences with motif hits from the previous step):
    java autosome.ru.ChIPHorde 7:21,7:21,7:21 filter yes 0.0 w:your_weighted_sequence_set.mfa
– You want to find potential cofactor TFs. Use 'mask' mode (masking good motif hits from the previous step):
    java autosome.ru.ChIPHorde 7:21,7:21,7:21 mask yes 0.0 w:your_weighted_sequence_set.mfa
In both examples, the length range from 7 to 21 bp is used to search for three different motifs.
![NOTE] The ZOOPS factor (0.0 in these examples) may heavily affect the results. Please consult the manual! The length ranges are also important, especially in 'mask' mode.
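The comma-separated range list has one `min:max` entry per motif to search for. This is a sketch of a hypothetical builder for the filter and mask commands above. It follows the slide's argument order, validates each range, and keeps the ZOOPS factor as an explicit parameter because the note stresses how much it matters.

```python
def chiphorde_command(mode, ranges, dataset, zoops=0.0):
    """ChIPHorde command for 'filter' or 'mask' mode.

    `ranges` holds one (min, max) motif length range per motif to find;
    [(7, 21)] * 3 reproduces the slide's 7:21,7:21,7:21.
    """
    if mode not in ("filter", "mask"):
        raise ValueError("mode must be 'filter' or 'mask'")
    for lo, hi in ranges:
        if not 0 < lo <= hi:
            raise ValueError(f"bad length range {lo}:{hi}")
    spec = ",".join(f"{lo}:{hi}" for lo, hi in ranges)
    return ["java", "autosome.ru.ChIPHorde", spec, mode, "yes", str(zoops), dataset]
```

Varying `ranges` per motif, for example a narrower range for the second and third motifs in mask mode, is then a one-line change.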