
A platform for pattern discovery in sets of biological sequences C. Alland, J. Nicolas.


1 A platform for pattern discovery in sets of biological sequences C. Alland, J. Nicolas

2 Framework: bioinformatics platform of Genopole Ouest. Platforms listed: Functional exploration; Proteomics; Sequencing; Genotyping; Biochips; Bioinformatics. Bioinformatics activities: Coordination; Databases; Bioinformatics Software; High Performance Computing; Teaching. PCIO: SunFire 6800, 56 UltraSparc III, 56 GB RAM. http://www.sb-roscoff.fr/BioInfo-GPO/ O. Collin, H. Leroy

3 Welcome Page of the bioinformatics platform service: http://idefix.univ-rennes1.fr:8080/Serveur-GPO/

4 Software Page of the bioinformatics platform service: http://idefix.univ-rennes1.fr:8080/Serveur-GPO/services.php

5 Aims of the project. Annotation of genomes: discovery of new genes/proteins; characterization of functional families. Experimental comparison of methods: choice of complexities and representations of patterns; copy/implementation of several algorithms. Practical tool: parameter tuning, filtering… From a set of biological sequences to a common characteristic or discriminant pattern.

6 Architecture of the platform (diagram labels): Pattern Discovery Algorithms; Visualization of results; Alignment of sequences; Search in banks; Pattern filtering; Tool box; Supervisor; Search of patterns; Statistical Analysis of inter-motif regions; Practical Use; Refinement; Interface

7 Welcome page of the pattern discovery service (page labels): Jonassen; Marsan; Pevzner; Regular language inference methods

8 Brazma hierarchy for (generalized) regular patterns +J full regular languages (finite automata)

9 Example of the discovery of candidates in the defensin family. Defensins are a major family of antimicrobial peptides found in mammals: cationic peptides, 28 to 42 amino acids in length, containing 3 intramolecular disulfide bonds. Starting point: a set of 30 sequences (including all organisms), 4 of them human. Aim: discovery of new candidates. Collaboration with GERM (C. Pineau, F. Bourgeon), directed by B. Jégou, staffed with 40 people and specialized in research on male reproduction in mammals.

10 Pratt: principle of the algorithm. 1. One starts from a pattern graph containing all the most specific allowed patterns covering at least k of the n sequences in the training set. 2. A pattern search tree is explored, starting from the most general pattern (the empty pattern) and specializing it by adding allowed components (belonging to the pattern graph, plus generalization operators) as long as the patterns obtain a better score. Several scores and search strategies are available. 3. The most significant patterns are filtered, and a refinement phase may be applied to specialize flexible wildcards with ambiguous letters.
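The search in step 2 can be illustrated with a much-reduced sketch. It is not Pratt itself: it assumes single-letter components, fixed-length wildcards only, and a plain support threshold instead of Pratt's scores, and it omits the pattern graph and the refinement phase. The function name `discover` and the parameters `max_gap` and `max_letters` are hypothetical.

```python
from collections import defaultdict


def discover(seqs, k, max_gap=2, max_letters=4):
    """Return {pattern: support} for patterns found in at least k sequences.

    Simplified illustration: patterns are letters separated by fixed-length
    wildcards x(g), grown left to right from single letters.
    """
    results = {}

    def support(occ):
        # Number of distinct sequences that contain the pattern.
        return len({s for s, _ in occ})

    def extend(pattern, occ, n_letters):
        # occ holds (sequence index, position of the last matched letter).
        results[pattern] = support(occ)
        if n_letters == max_letters:
            return
        for gap in range(max_gap + 1):
            nxt = defaultdict(set)
            for s, end in occ:
                pos = end + gap + 1
                if pos < len(seqs[s]):
                    nxt[seqs[s][pos]].add((s, pos))
            for letter, new_occ in nxt.items():
                # Specialize only while the pattern still covers k sequences.
                if support(new_occ) >= k:
                    sep = f"-x({gap})-" if gap else "-"
                    extend(pattern + sep + letter, new_occ, n_letters + 1)

    start = defaultdict(set)
    for s, seq in enumerate(seqs):
        for pos, letter in enumerate(seq):
            start[letter].add((s, pos))
    for letter, occ in start.items():
        if support(occ) >= k:
            extend(letter, occ, 1)
    return results


if __name__ == "__main__":
    toy = ["AACKGC", "TTCRGC", "CWGCAA", "MMMMMM"]  # made-up sequences
    found = discover(toy, k=3)
    for pattern in sorted(found, key=lambda p: -len(p)):
        print(pattern, found[pattern])
```

Pruning on support is safe here because adding a component can never increase the number of covered sequences, so a branch that drops below k cannot recover.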

11 Pratt: three levels of use. 1. Simple: most parameters are fixed or simplified. 2. Expert: all parameters available. 3. Meta: Pratt is applied to sequences of patterns.

12 Simple Pratt parameters

13 Simple Pratt results

14 Advanced Pratt parameters

15 Advanced Pratt results

16 Visualization of selected results

17 Meta Pratt

18 Search pattern in a databank

19 Results of the search in a databank

20 View of the search in a databank

21 Statistical Analysis of inter-motif regions

22 Results for refinement of patterns

23 Reverse Search in a Genome

24 Reverse Search in a Genome: principle. From the patterns and knowledge of exon/intron splicing, a formal grammar may be inferred. Genomes are translated in the six reading frames and compiled into a suffix tree data structure. Syntactic analysis is performed with operations on suffix trees and yields potential new candidates. To: jnicolas@irisa.fr. Pattern: C-x(2,4)-G-x(1,3)-C-x(3,4)-C-x(7)-[AG]-[HKNRST]-C-x(5,6)-C-C. Result columns: Organism, Chromosome, Phase, Position, LengthOcc, Length Ch, preOcc, Occ, postOcc. No match
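A rough idea of the six-frame search can be given with a naive sketch. It deliberately swaps in a plain regular-expression scan for the platform's grammar and suffix-tree machinery, ignores exon/intron splicing, and supports only the subset of PROSITE syntax used in the pattern above. All function names are hypothetical, and the standard genetic code is assumed.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# One PROSITE element: x, a letter, or [..], with an optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern (subset only) to a Python regex."""
    out = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, lo, hi = m.groups()
        # x means any residue; [A-Z] excludes the stop symbol '*'.
        core = "[A-Z]" if core == "x" else core
        if lo is None:
            out.append(core)
        elif hi is None:
            out.append(f"{core}{{{lo}}}")
        else:
            out.append(f"{core}{{{lo},{hi}}}")
    return "".join(out)


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_hits(dna, pattern):
    """Return (strand, frame, protein position, match) for every hit."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            for m in regex.finditer(protein):
                hits.append((strand, frame, m.start(), m.group()))
    return hits
```

A linear scan like this re-reads the whole genome for every pattern; indexing the translated frames once, as the slide describes with suffix trees, avoids that cost when many patterns are searched.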

25 Conclusion / Perspectives. 10 new potential defensins discovered. Importance of a complete environment: coupling highly expressive patterns with syntactic search in banks. Current research: a "meta level" using grammatical inference, to infer any regular language from a set of positive AND negative instances. Open questions: better filtering of patterns, introduction of probabilities, long-distance interactions.


