1
Probe design for microarrays using OligoWiz
Rasmus Wernersson, Assistant Professor
Center for Biological Sequence Analysis, Technical University of Denmark
2
Probe design for microarrays
- What is a Probe
- OligoWiz
- Probe Design
- Cross Hybridization and Complexity
- Affinity
- Position
3
The DNA Array Analysis Pipeline
[Flowchart boxes: Question; Experimental Design; Array design; Probe design; Buy Chip/Array; Sample Preparation; Hybridization; Image analysis; Normalization; Expression Index Calculation; Comparable Gene Expression Data; Statistical Analysis; Fit to Model (time series); Advanced Data Analysis: Clustering, PCA, Classification, Promoter Analysis, Meta analysis, Survival analysis, Regulatory Network]
4
An Ideal Probe must
- Discriminate well between its intended target and all other targets in the target pool
- Detect concentration differences under the hybridization conditions
5
Probe Type comparisons
PCR products
  Advantages: Inexpensive; Linkers can be applied
  Disadvantages: Handling problems; Hard to design to avoid cross-hybridization; Unequal amplification
Oligos
  Advantages: Can be designed for many criteria; Easy to handle; Normalized concentrations; Linkers can be applied
  Disadvantages: Expensive (DKK 100-150 per oligo)
Affymetrix GeneChip
  Advantages: High quality data; Standardized arrays; Fast to set up; Multiple probes per gene
  Disadvantages: Expensive; Arrays available for limited number of species
6
OligoWiz: a tool for flexible probe design
7
About OligoWiz: How and Who
OligoWiz 2.0 is a client-server application for designing oligonucleotides for microarrays.
The OligoWiz client (the graphical interface) is written in Java 1.4 and runs on virtually all platforms.
The OligoWiz server performs the heavy-duty computation and is hosted on a multi-CPU Altix server at CBS.
OligoWiz was created by Henrik Bjørn Nielsen and Rasmus Wernersson, both at the Center for Biological Sequence Analysis at the Technical University of Denmark.
8
About the OligoWiz scores
All scores are normalized to a value between 0.0 (worst) and 1.0 (best).
All scores are independent, and each is assigned a user-adjustable weight.
A total score is calculated as the sum of all weighted scores and is normalized to a value between 0.0 and 1.0.
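The slide does not say how the total is normalized. A minimal sketch, assuming the weighted sum is simply divided by the sum of the weights (the function name and the example score names are hypothetical, not taken from OligoWiz):

```python
def total_score(scores, weights):
    """Combine per-criterion scores (each in 0.0..1.0) into one total score.

    Assumption: normalization divides the weighted sum by the sum of weights,
    which keeps the result in 0.0..1.0 whenever every score is in 0.0..1.0.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight
```

With this choice, setting a weight to 0 removes that criterion from the total without changing the 0.0 to 1.0 range.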
9
How to Avoid cross-hybridization
From Kane et al. (2000) we learn that a 50’mer probe can detect significant false signal from a target that has >75-80% homology to the 50’mer oligo, or a continuous stretch of >15 complementary bases.
If we have substantial sequence information on the given organism, we can try to avoid this by choosing oligos that are not similar to any other expressed sequences.
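The two criteria can be sketched as a simple check on an ungapped comparison. This is an illustration only: it assumes the non-target segment is given in the same orientation as the probe and has the same length (so "complementary" becomes "matching"), and it uses 75% as the threshold from the slide's 75-80% range. The function names are hypothetical.

```python
def longest_match_run(a, b):
    """Length of the longest run of consecutive identical positions."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def may_cross_hybridize(probe, segment, min_identity=0.75, max_run=15):
    """Flag a non-target segment that meets either criterion from the slide.

    Assumptions: ungapped, equal-length comparison; thresholds are parameters.
    """
    if len(probe) != len(segment) or not probe:
        raise ValueError("probe and segment must be non-empty and equal length")
    identity = sum(x == y for x, y in zip(probe, segment)) / len(probe)
    return identity > min_identity or longest_match_run(probe, segment) > max_run
```

A real pipeline would take identity and hit length from BLAST alignments rather than from a position-by-position comparison.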
10
Probe Specificity (Hughes et al. 2001)
11
Mapping Regions without similarity to other transcripts
[Figure: the sequence we want to design a probe for, drawn 5’ to 3’ with a 50 bp scale bar. BLAST hits >75% & longer than 15 bp are marked along it; the stretches between them are the regions suitable for probes.]
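The mapping in the figure amounts to finding stretches of the sequence that no BLAST hit covers and that are long enough to hold a probe. A minimal sketch, assuming hits are given as half-open (start, end) intervals on the query; the function name and interval convention are illustrative choices:

```python
def hit_free_regions(seq_len, hits, probe_len=50):
    """Return (start, end) intervals with no BLAST hit that fit a probe.

    hits: iterable of half-open (start, end) intervals on the query sequence.
    """
    regions = []
    cursor = 0  # first position not yet known to be covered by a hit
    for start, end in sorted(hits):
        if start - cursor >= probe_len:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if seq_len - cursor >= probe_len:
        regions.append((cursor, seq_len))
    return regions
```

Overlapping hits are merged implicitly because the cursor only ever moves forward.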
12
Filtering out Self Detecting BLAST hits
[Figure: the sequence we want to design an oligo for, drawn 5’ to 3’ with a 50 bp scale bar and BLAST hits >75% & longer than 15 bp; one hit is a sequence identical or very similar to the query sequence.]
Therefore, BLAST hits with homology > 97% and a ‘hit length vs. query length’ ratio > 0.8 are not considered.
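The filtering rule can be sketched as follows. The `BlastHit` record and function names are hypothetical; the thresholds are the ones stated on the slides (75%, 15 bp, 97%, 0.8).

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    identity_pct: float  # percent identity reported for the hit
    hit_len: int         # aligned length in bp


def is_self_hit(hit, query_len, max_identity=97.0, max_len_ratio=0.8):
    """A hit that is near-identical AND spans most of the query is the query itself."""
    return hit.identity_pct > max_identity and hit.hit_len / query_len > max_len_ratio


def hits_to_consider(hits, query_len, min_identity=75.0, min_len=15):
    """Keep hits strong enough to matter, minus the self-detecting ones."""
    return [
        h for h in hits
        if h.identity_pct > min_identity
        and h.hit_len > min_len
        and not is_self_hit(h, query_len)
    ]
```

Note that a short 99% hit is kept: only hits that pass both the identity and the length-ratio test are treated as self-detection.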
13
Cross-hybridization expressed as a ‘homology score’
Only BLAST hits that passed filtering are considered.
Let m be the number of BLAST hits considered in position i, and let h = (h_1^i, ..., h_m^i) be the BLAST hits in position i in the oligo, where n is the length of the oligo.
[Figure: an oligo with BLAST hits stacked beneath it; the max hit in pos. i is read on a scale from 0 to 100%.]
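The score formula itself did not survive in this transcript. One reading that is consistent with the definitions above and with the 0.0 (worst) to 1.0 (best) convention is to average the maximum hit percentage over the n positions and subtract it from 1. The sketch below implements that reading; treat both the formula and the function name as assumptions rather than the published definition.

```python
def homology_score(hits_per_position):
    """Cross-hybridization score under the assumed form

        score = 1 - (sum over i of max(h^i)) / (100 * n)

    hits_per_position: one list per oligo position, holding the percent
    identity (0..100) of every considered BLAST hit covering that position.
    Positions with no hit contribute 0.
    """
    n = len(hits_per_position)
    if n == 0:
        raise ValueError("oligo must have at least one position")
    total = sum(max(hits, default=0.0) for hits in hits_per_position)
    return 1.0 - total / (100.0 * n)
```

An oligo with no hits at any position scores 1.0, and one fully covered by a 100% hit scores 0.0.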
14
Similar Affinity for all oligos
Another way of ensuring an optimal discrimination between target and non-target under hybridization is to design all the oligos on an array with similar affinity for their targets.
This will allow the experimentalist to optimize the hybridization conditions for all oligos by choosing the right hybridization temperature and salt concentration.
Commonly, melting temperature (Tm) is used as a measure of DNA:DNA or RNA:DNA hybrid affinity.
15
Melting Temperature difference
In the melting temperature formula, H (kcal/mol) is the sum of the nearest neighbor enthalpy changes, A is a constant for helix initiation corrections, S is the sum of the nearest neighbor entropy changes, R is the gas constant (1.987 cal deg⁻¹ mol⁻¹) and Ct is the total molar concentration of strands.
In the melting temperature difference, N is all oligos in all sequences.
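The formulas on this slide were images and are missing from the transcript. A minimal sketch, assuming the widely used nearest-neighbor form Tm = 1000·H / (A + S + R·ln(Ct/4)) − 273.15 (in °C), with no salt correction. The default A = −10.8 cal deg⁻¹ mol⁻¹ and the Ct/4 term (non-self-complementary strands) are assumptions, and so is taking the Tm difference as the absolute deviation from the mean Tm over all N oligos.

```python
import math
from statistics import mean

R = 1.987  # gas constant, cal deg^-1 mol^-1 (value given on the slide)


def melting_temperature(dH_kcal, dS_cal, ct=1e-6, A=-10.8):
    """Tm in degrees Celsius from summed nearest-neighbor parameters.

    dH_kcal: summed enthalpy (kcal/mol); dS_cal: summed entropy (cal/(deg*mol)).
    Assumed form; A and the Ct/4 term are conventional choices, not from the slide.
    """
    return 1000.0 * dH_kcal / (A + dS_cal + R * math.log(ct / 4.0)) - 273.15


def delta_tm(tms):
    """Assumed Tm difference: distance of each oligo's Tm from the mean of all N."""
    avg = mean(tms)
    return [abs(t - avg) for t in tms]
```

Summing H and S over the dinucleotide steps of an oligo requires a published nearest-neighbor parameter table, which is deliberately left out of this sketch.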
16
Tm distributions for 30’mers and 50’mers
17
Tm Distribution for oligo length intervals
18
Avoid self annealing oligos: Sensitivity may be influenced
Probes that form strong hybrids with themselves, i.e. probes that fold, should be avoided.
But accurate folding algorithms, like those employed by mFOLD or RNAfold, are too time-consuming for large-scale folding of oligos.
Time consumption: mFOLD ~2 sec per 30’mer; per gene (500 bp) ~16 min.
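The per-gene figure follows from folding every possible 30’mer along the gene. A small worked calculation, assuming one candidate oligo per start position (the function name is illustrative):

```python
def folding_time_minutes(gene_len=500, oligo_len=30, sec_per_fold=2.0):
    """Time to fold every candidate oligo of a gene, in minutes."""
    n_oligos = gene_len - oligo_len + 1  # one oligo per start position: 471
    return n_oligos * sec_per_fold / 60.0  # 471 * 2 s = 942 s
```

For the slide's numbers this gives 942 seconds, i.e. roughly 16 minutes per gene.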
19
Folding an oligonucleotide: an approximation
- Dynamic programming: alignment to inverted self
- The alignment is based on dinucleotides
- Substitution matrix is based on binding energies
[Figure: alignment matrix with the oligo's dinucleotides (AT TG CT ... CG GT TT) along both axes and a minimal loop size border.]
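The idea of aligning an oligo to its inverted self can be sketched with a small local-alignment recursion. This is a simplified stand-in, not the OligoWiz implementation: it scores single base pairs instead of dinucleotides, and the pair scores, mismatch and gap penalties are made-up illustrative values rather than binding energies. The minimal loop size border is kept by only filling cells where at least `min_loop` bases separate the paired positions.

```python
# Illustrative pair scores (assumption): G:C pairs count more than A:T pairs.
PAIR_SCORE = {("A", "T"): 2, ("T", "A"): 2, ("G", "C"): 3, ("C", "G"): 3}


def self_annealing_score(oligo, min_loop=3, mismatch=-3, gap=-4):
    """Best local score for pairing the 5' part of an oligo with its 3' part.

    Cell (i, j) pairs base i with base j (i < j); a stem grows inwards from
    (i-1, j+1). Results for (i, j) are stored at S[i+1][j+1] so that the
    outermost neighbours fall on a zero-filled border.
    """
    n = len(oligo)
    S = [[0] * (n + 2) for _ in range(n + 2)]
    best = 0
    for i in range(n):
        # j runs from the 3' end inwards and stops at the loop-size border.
        for j in range(n - 1, i + min_loop, -1):
            pair = PAIR_SCORE.get((oligo[i], oligo[j]), mismatch)
            score = max(
                0,
                S[i][j + 2] + pair,     # extend stem from pair (i-1, j+1)
                S[i][j + 1] + gap,      # bulge: skip a base on the 3' side
                S[i + 1][j + 2] + gap,  # bulge: skip a base on the 5' side
            )
            S[i + 1][j + 1] = score
            best = max(best, score)
    return best
```

A higher value indicates a stronger potential hairpin; for example, `GGGGAAAACCCC` can form a four-pair G:C stem around an AAAA loop, while a poly-A oligo cannot pair with itself at all.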
20
Folding a lot of oligos: a fast heuristic implementation
- Full dynamic programming calculation for first probe
- Dynamic programming calculation for second etc. probe
[Figure: super-alignment matrix with the sequence's dinucleotides (AT TG CT ... CG GT TT) along both axes, a minimal loop size border, and the windows for the first through the last probe.]
21
Reasonable folding prediction compared to mFOLD
22
Probes With Very Common sub sequences may result in unspecific signal
If the sub-fractions of an oligo are very common we define it as ‘low-complex’.
Oligo with low-complexity: AAAAAAAGGAGTTTTTTTTCAAAAAACTTTTTAAAAAAGCTTTAGGTTTTTA (Human)
Oligo without low-complexity: CGTGACTGACAGCTGACTGCTAGCCATGCAACGTCATAGTACGATGACT (Human)
23
Low-complexity expressed as a score
For a given transcriptome, a list of information content for all ‘words’ of length wl (8 bp) is calculated, where f(w) is the number of occurrences of a pattern and tf(w) is the total number of patterns of length wl.
A low-complexity score for a given oligo is defined as: Low-complexity = 1 - norm, where norm is a function that normalizes to between 1 and 0, L is the length of the oligo and w_i is the pattern in position i.
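The exact information-content formula is not recoverable from this transcript. The sketch below illustrates the overall shape of the score only: it uses the relative frequency f(w)/tf(w) as a stand-in for the per-word information content, sums it over all word positions of an oligo, and uses min-max scaling over the candidate oligos as the norm function. All three choices, and the function names, are assumptions.

```python
from collections import Counter


def word_counts(transcripts, wl=8):
    """f(w): occurrences of every word of length wl in the transcriptome."""
    counts = Counter()
    for seq in transcripts:
        for i in range(len(seq) - wl + 1):
            counts[seq[i:i + wl]] += 1
    return counts


def commonness(oligo, counts, wl=8):
    """Sum of f(w_i)/tf over the L - wl + 1 words w_i of the oligo (stand-in measure)."""
    tf = sum(counts.values())
    if tf == 0:
        return 0.0
    return sum(counts[oligo[i:i + wl]] / tf for i in range(len(oligo) - wl + 1))


def low_complexity_scores(oligos, counts, wl=8):
    """1 - norm(commonness), with min-max scaling over the candidates as norm."""
    raw = [commonness(o, counts, wl) for o in oligos]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [1.0] * len(oligos)
    return [1.0 - (r - lo) / (hi - lo) for r in raw]
```

Oligos built from words that are frequent in the transcriptome end up near 0.0, and oligos built from rare words end up near 1.0.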
24
Location of Oligo within transcript
Labeling includes reverse transcription of the mRNA and is sensitive to:
- RNA degradation
- Premature termination of cDNA synthesis
- Premature termination of cRNA transcription (IVT)
A ‘Position Score’ reflecting this (eukaryotes):
Position score = (1 - drp)^(distance to 3’ end)
where drp is the chance of labeling termination per base and the exponent is the oligo's distance, in bases, from the 3’ end of the transcript.
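The position score can be read as the probability that labeling, starting at the 3’ end, reaches the oligo without terminating. A minimal sketch; the default drp value and the function name are illustrative assumptions:

```python
def position_score(distance_to_3prime, drp=0.001):
    """(1 - drp) ** distance: chance that labeling survives every base
    between the 3' end of the transcript and the oligo.

    drp: per-base chance of labeling termination (default is an assumption).
    """
    if distance_to_3prime < 0:
        raise ValueError("distance must be non-negative")
    if not 0.0 <= drp <= 1.0:
        raise ValueError("drp must be a probability")
    return (1.0 - drp) ** distance_to_3prime
```

The score is 1.0 at the 3’ end and decays geometrically with distance, so oligos far upstream are penalized more strongly as drp increases.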
25
Species databases
215 species currently available.
The species databases are built from complete genomic sequences or, in the case of vertebrates, UniGene collections.
The databases are used for:
- Cross hybridization
- Low-complexity
26
Sequence Features: Intron/Exon structure, UTR regions etc.
- Special purpose arrays
- Example: Detecting differential splicing
[Figure: Exon, Intron, Exon]
27
Annotation String - single letter code
Sequence:   ATGTCTACATATGAAGGTATGTAA
Annotation: (EEEEEEEEEEEEEE)DIIIIIII
E: Exon
I: Intron
(: Start of exon
): End of exon
D: Donor site
A: Acceptor site
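Because the annotation string has one character per base, exon coordinates can be read off directly. A minimal sketch, assuming every exon is delimited by "(" and ")" as in the example; the function names are hypothetical:

```python
import re


def exon_spans(annotation):
    """Half-open (start, end) spans of exons, each written as '(' E* ')'."""
    return [(m.start(), m.end()) for m in re.finditer(r"\(E*\)", annotation)]


def exon_sequences(sequence, annotation):
    """Cut the exon subsequences out of a sequence using its annotation string."""
    if len(sequence) != len(annotation):
        raise ValueError("sequence and annotation must have the same length")
    return [sequence[start:end] for start, end in exon_spans(annotation)]
```

For the sequence and annotation shown on the slide, the single exon covers the first 16 bases and the donor site "D" marks the G that follows it.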
28
Extracting annotation from GenBank files
- FeatureExtract server
- www.cbs.dtu.dk/services/FeatureExtract
29
Exercise
- Running OligoWiz 2.0 (Java 1.4.1 or better required)
- Input data: sequence only (FASTA), or sequence and annotation
- Rule-based placement of multiple probes: distance criteria, annotation criteria
Please go to the exercise web-page linked from the course programme.