Published by Nelly Müller. Modified over 5 years ago.
1
Combined ab initio and comparative analysis of putative regulatory regions
Nora Pierstorff, Dept. of Genetics, University of Cologne, 30.8.2005
2
Outline
Introduction
Ab Initio Approach
Datasets
Comparative Analysis of Enhancers and Results
Combination of Both Approaches and Results
Discussion
3
Eukaryotic regulation model
4
Three approaches
1. Search for binding sites of known transcription factors using position weight matrices.
2. Search for conserved motifs in the upstream regions of homologous or co-regulated genes.
3. Search for statistically overrepresented motifs.
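The first approach, scanning with a position weight matrix, can be illustrated with a minimal sketch. The count matrix, the uniform background, the pseudocount and the threshold below are made-up values for illustration only; they do not describe a real transcription factor and are not taken from the talk.

```python
import math

# Hypothetical count matrix for a 4 bp motif (one dict per motif position).
# The numbers are invented for illustration.
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(counts, background, pseudocount=1.0):
    """Convert per-position base counts into log2 odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background[base])
            for base in "ACGT"
        })
    return matrix


def scan(sequence, pwm, threshold):
    """Return (start, site, score) for every window scoring at least threshold."""
    hits = []
    width = len(pwm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


pwm = log_odds_matrix(COUNTS, BACKGROUND)
hits = scan("TTACGTAA", pwm, threshold=4.0)
```

With these toy values only the window `ACGT` at position 2 passes the threshold. In practice the threshold controls the trade-off between missed sites and false positives.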
6
Ab Initio Approach (overrepresented patterns)
Overrepresented patterns are frequent in DNA, which leads to many false positive predictions.
The amount of available data is not large enough to derive additional reliable, universally valid rules.
8
Dataset (collected by Nazina et al. 2003)
target species: Drosophila melanogaster
reference species: D. yakuba, D. ananassae, D. pseudoobscura, D. virilis
# sequences: 39
# bp:
# regulatory regions: 87
# bp in enh:
enhancer/sequence: 2.462
amount of bp in enhancers:
Dorsal motif, Dorsal matches
10
Are enhancers alignable?
Emberly et al. (2003): the overlap of binding sites and conserved sequence blocks is not much greater than expected by chance, but it is still statistically significant.
Compared organisms: D. melanogaster and D. pseudoobscura
Alignment methods: LAGAN, SMASH (constructs chains of local alignments)
11
Assumptions about enhancer conservation
Binding sites contain core sequences that are essential for binding the transcription factor.
Core sequences are conserved among the binding sites within one species and between species.
Binding sites are indicated by short, exactly conserved, overrepresented patterns.
12
Alignment of short exact matches
Input: chain of high-scoring fragments from a blastn alignment of each sequence pair
Output: regions containing a high number of short conserved stretches
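One way to read this step is to take each aligned fragment pair and keep only the runs of identical, ungapped columns. The sketch below does that for a single fragment. The function name, the uppercase input, and the minimum run length of 6 are assumptions for illustration; the slides do not state the length cutoff that was used.

```python
def exact_stretches(aligned_target, aligned_reference, target_start=0, min_len=6):
    """Return (start, end, word) for runs of identical, ungapped columns.

    Both arguments are the gapped rows of one pairwise alignment fragment
    ('-' marks a gap). Coordinates are 0-based, end-exclusive positions in
    the ungapped target sequence. min_len is a hypothetical cutoff.
    """
    if len(aligned_target) != len(aligned_reference):
        raise ValueError("aligned rows must have equal length")

    stretches = []
    pos = target_start   # current position in the ungapped target
    run_start = None     # target position where the current run began
    run = []             # bases of the current run

    for t, r in zip(aligned_target, aligned_reference):
        if t == r and t != "-":
            if run_start is None:
                run_start = pos
            run.append(t)
        else:
            if run_start is not None and len(run) >= min_len:
                stretches.append((run_start, run_start + len(run), "".join(run)))
            run_start, run = None, []
        if t != "-":
            pos += 1     # gaps in the target do not consume a target position

    # Close a run that extends to the end of the fragment.
    if run_start is not None and len(run) >= min_len:
        stretches.append((run_start, run_start + len(run), "".join(run)))
    return stretches


example = exact_stretches("ACGTTGCA-TTGACCGGA", "ACGTTGCAGTTCACCGGA")
```

In the example the two-base run `TT` is discarded because it is shorter than the cutoff, while the runs on either side are reported with their target coordinates.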
14
Result using only the comparative approach with 5 species
Figure (m8 region): score = number of short conserved stretches in a 200 bp window
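The window score can be sketched as a simple count over a sliding window. Two details here are assumptions rather than something the slides specify: a stretch is counted when its start position falls inside the window, and the window advances in fixed steps (50 bp by default).

```python
from bisect import bisect_left


def window_scores(stretch_starts, sequence_length, window=200, step=50):
    """Count conserved stretches per window.

    stretch_starts: start positions (0-based) of short conserved stretches.
    Returns a list of (window_start, score) pairs, where score is the number
    of stretches starting in [window_start, window_start + window).
    """
    starts = sorted(stretch_starts)
    last_window_start = max(sequence_length - window, 0)
    scores = []
    for w in range(0, last_window_start + 1, step):
        lo = bisect_left(starts, w)           # first stretch at or after w
        hi = bisect_left(starts, w + window)  # first stretch beyond the window
        scores.append((w, hi - lo))
    return scores


scores = window_scores([10, 40, 120, 199, 200, 450], sequence_length=600, step=100)
```

Windows with a high score are the candidate regulatory regions; plotting the score against the window start gives a profile of the kind shown for the m8 region.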
17
Searching for overrepresented motifs in conserved regions
Input: all short conserved words
Step 1: count the occurrences of every 5 bp substring of the word in the 1000 surrounding base pairs
Step 2: calculate one observed/expected ratio for every species
Output: conserved stretches containing at least one 5-mer that is overrepresented in each species
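The two steps above can be sketched as follows. The slides do not say how the expected count is obtained, so this sketch assumes an independent-base model using the base frequencies of the surrounding region. The ratio cutoff of 2.0 and all function names are likewise hypothetical, and the caller is expected to supply the surrounding region for each species.

```python
def count_overlapping(text, word):
    """Count occurrences of word in text, including overlapping ones."""
    count = 0
    start = text.find(word)
    while start != -1:
        count += 1
        start = text.find(word, start + 1)
    return count


def observed_expected(kmer, region):
    """Observed/expected ratio of kmer in region.

    Expected count assumes independent bases with the region's own base
    frequencies (an assumption of this sketch).
    """
    n, k = len(region), len(kmer)
    positions = n - k + 1
    if positions <= 0:
        return 0.0
    expected = float(positions)
    for base in kmer:
        expected *= region.count(base) / n
    if expected == 0:
        return 0.0
    return count_overlapping(region, kmer) / expected


def has_overrepresented_kmer(word, regions_by_species, k=5, min_ratio=2.0):
    """True if at least one k-mer of word is overrepresented in every species."""
    kmers = {word[i:i + k] for i in range(len(word) - k + 1)}
    return any(
        all(observed_expected(kmer, region) >= min_ratio
            for region in regions_by_species.values())
        for kmer in kmers
    )


ratio = observed_expected("ACGTA", "ACGTACGTACGT")
keep = has_overrepresented_kmer(
    "ACGTAC", {"D. melanogaster": "ACGTACGTACGT", "D. pseudoobscura": "ACGTACGTACGT"}
)
```

A conserved word is kept only when the same 5-mer passes the cutoff in all species, which is stricter than requiring some overrepresented 5-mer in each species separately. The slide wording allows either reading.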
19
Improvement by combination
Figure (m8 region), comparative approach only: score = number of short conserved stretches in a 200 bp window
Figure (m8 region), combined approach: score = number of short conserved, overrepresented stretches in a 200 bp window
20
Improvement by combination
22
Discussion
Using a combination of methods improves predictions.
In the near future, regulatory regions can be found without knowing which transcription factors bind them, provided that enough related species are available.
More features need to be found that distinguish conserved regulatory regions from other conserved functional regions.
23
References
E. Emberly, N. Rajewsky, E. Siggia (2003) Conservation of regulatory elements between two species of Drosophila. BMC Bioinformatics 4:57.
A. Nazina, D. Papatsenko (2003) Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency. BMC Bioinformatics 4:65.