ORDered ALignment Information Explorer
- Alignment editor
- Conservation computation
- “Barcode” = schematic alignment
- Phylogenetic tree
- 3D viewer
- Sequence clustering
- Features editor
=> sequence / structure / function / evolution cross-talk
Exploring alignment information up to the residue level
Taxa :
- Global level
- Clustering level
- Single taxon level
Positions :
- Full length
- Domains
- Motifs, secondary structures, …
- Residues
Contexts :
- 3D structure
- Conservation
- Phylogeny
Alignments :
Reads ALN, MSF, TFA, RSF, Macsims/XML and ORD file formats
What is an alignment ?
- description of the alignment (NorMD score, date, etc.)
- set of sequences
  - generic information (length, EC, phylogeny, …)
  - features (PFAM-A, PROSITE, BLOCK, etc.)
- clustering = groups of sequences
- conservation scores based on clustering
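The four parts of an alignment listed above can be pictured as a small data model. The sketch below is only an illustration: the class and field names (`Feature`, `Sequence`, `Alignment`, `info`, `clustering`, ...) are assumptions made for this example and are not Ordalie's internal structures.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One feature on a sequence; positions are 1-based and inclusive (assumption)."""
    source: str  # e.g. "PFAM-A", "PROSITE", "BLOCK"
    name: str
    start: int
    end: int


@dataclass
class Sequence:
    name: str
    residues: str                                  # aligned sequence, '-' or '.' for gaps
    info: dict = field(default_factory=dict)       # length, EC, phylogeny, ...
    features: list = field(default_factory=list)   # list of Feature

    def ungapped_length(self) -> int:
        return sum(1 for c in self.residues if c not in "-.")


@dataclass
class Alignment:
    description: dict                                  # NorMD score, date, ...
    sequences: list                                    # list of Sequence
    clustering: dict = field(default_factory=dict)     # group name -> sequence names
    conservation: dict = field(default_factory=dict)   # method name -> per-column scores

    def width(self) -> int:
        return len(self.sequences[0].residues) if self.sequences else 0


# Toy data, invented for the example
aln = Alignment(
    description={"name": "toy"},
    sequences=[
        Sequence("seq1", "MK-LV", features=[Feature("PFAM-A", "toy_domain", 1, 4)]),
        Sequence("seq2", "MKALV"),
    ],
    clustering={"group1": ["seq1", "seq2"]},
)
print(aln.width(), aln.sequences[0].ungapped_length())
```

Keeping conservation scores keyed by method name mirrors the point that scores depend on the clustering they were computed from, so they can be recomputed whenever the clustering changes.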
Current alignment :
- Sequence editing
- Clustering editing
Result :
- Overwrite current
- Create new
MACSIM
Inside :
- Ordalie parameters (colors, fonts, thresholds, …)
- Description of the alignment (name, NorMD score, creation date, ...)
- Original set of aligned sequences
  - general information (length, pI, mol. weight, …)
  - features (Pfam domain, secondary structures, …)
  - AA sequence
- Coordinates of 3D structures corresponding to PDB entries
- Description of 3D objects (representation type, colors, etc.)
- M 1 – original alignment : original sequences set
- M 2 – macsims clustering : Macsims clustering, original sequences set -> original conservation
- M 3 – new clustering : clustering 1, sequences set 1 -> conservation
- M 4 – edit sequences : clustering 1, edited sequences -> conservation
- M 5 – clust. + edit : clustering 2, edited sequences -> conservation
ORD : file format
- SQLite database
  - accessible through SQL statements
  - ODBC compatible
  - platform independent
  - lightweight
- Contains all Ordalie data, preferences, performances
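Since an ORD file is an SQLite database, it should be possible to inspect one with any SQLite client. The sketch below uses Python's standard `sqlite3` module to list tables and columns. The actual ORD schema is not described in these slides, so the example runs on a throwaway stand-in file with a made-up table (`demo_sequences`); `list_tables` is a hypothetical helper name.

```python
import os
import sqlite3
import tempfile


def list_tables(path):
    """Return {table name: [column names]} for an SQLite file."""
    con = sqlite3.connect(path)
    try:
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        tables = {}
        for name in names:
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
            cols = con.execute(f'PRAGMA table_info("{name}")').fetchall()
            tables[name] = [col[1] for col in cols]
        return tables
    finally:
        con.close()


# Stand-in database: the table below is invented and is NOT the real ORD schema.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.ord")
con = sqlite3.connect(demo_path)
con.execute("CREATE TABLE demo_sequences (name TEXT, residues TEXT)")
con.commit()
con.close()

demo_tables = list_tables(demo_path)
print(demo_tables)
```

On a real file, calling `list_tables` with its path would be the first step before writing any SQL query against it. Note that `sqlite3.connect` creates an empty file if the path does not exist.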
Modes :
- features
- search
- pairwise identity
- sequences editor
- features editor
- clustering
- trees
- conservation
- superposition
Clustering :
Zone selection :
- Whole alignment
- By feature
- User defined
Criteria :
- % identity
- pI
- Length
- Composition (amino acid, physico-chemical groups)
Clustering methods :
- Manual clustering by inserting/removing separators
- Hierarchical classification + Secator
- Kmeans + DPC
- Mixture model + AIC
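As an illustration of the % identity criterion, here is a minimal sketch of a pairwise identity computation on aligned sequences. The exact definition Ordalie uses is not given in the slides. This example assumes that columns where either sequence has a gap are skipped; `percent_identity` and `identity_matrix` are hypothetical names.

```python
def percent_identity(a, b, gaps="-."):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = same = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gaps or y in gaps:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            same += 1
    return 100.0 * same / compared if compared else 0.0


def identity_matrix(seqs):
    """seqs: {name: aligned sequence}. Returns {(name1, name2): percent identity}."""
    names = list(seqs)
    return {(p, q): percent_identity(seqs[p], seqs[q]) for p in names for q in names}


# Toy alignment, invented for the example
toy = {"s1": "MKALV", "s2": "MK-LV", "s3": "MRALI"}
matrix = identity_matrix(toy)
print(matrix[("s1", "s2")], matrix[("s1", "s3")], matrix[("s2", "s3")])
```

A matrix like this (or 100 minus it, as a distance) is the kind of input a hierarchical classification could work from.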
Conservation methods :
- Threshold
  - Global Identity -> 100% identity
  - Global Conserved -> >80% identity
  - Group Identity -> 100% identity in group
- Mean Distance : as in ClustalX
- Vector Norm : based on a vectorial (polarity, volume) representation of amino acids
- Liu2 : based on Blosum62
- Entropy : takes gaps and physico-chemical properties of AA into account
Validity of score clustering ?
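The Threshold method can be sketched per alignment column as follows. This is an illustration under stated assumptions, not Ordalie's code: identity is taken as the share of rows carrying the most frequent residue, gaps count in the denominator, and "Group Identity" is read as every group being 100% identical internally. `top_identity` and `classify_column` are hypothetical names.

```python
from collections import Counter


def top_identity(column):
    """Fraction of rows sharing the most common residue (gaps count as rows)."""
    residues = [c for c in column if c not in "-."]
    if not residues:
        return 0.0
    return Counter(residues).most_common(1)[0][1] / len(column)


def classify_column(column, groups, conserved_threshold=0.8):
    """column: one character per sequence; groups: lists of row indices."""
    score = top_identity(column)
    if score == 1.0:
        return "global identity"
    if score > conserved_threshold:
        return "global conserved"
    # assumption: every group must be fully identical within itself
    if groups and all(
        top_identity("".join(column[i] for i in g)) == 1.0 for g in groups
    ):
        return "group identity"
    return "none"


# Six sequences split in two groups of three (toy data)
groups = [[0, 1, 2], [3, 4, 5]]
labels = [classify_column(col, groups)
          for col in ("AAAAAA", "AAAAAG", "KKKRRR", "KRKRKR")]
print(labels)
```

The group-level test only makes sense relative to a given clustering, which is why the conservation has to be recomputed whenever the clustering is edited.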
Key usage points :
- Always leave a mode before entering a new one
- Sequence selection : Windows-style
  - selects a sequence
  - adds the current sequence to the selection
- Zone selection :
  - All (button)
  - selecting a feature
  - manually :
    - for starting point
    - for ending point
    - to delete a selected zone
TODO list :
Short term :
- Bugs, if any …. ;-)
- group naming
- project handling
- MacOS X version
- documentation and tutorials
- publication
Long term :
- Bugs, if any …. ;-)
- on-line web services
- on-line Macsims calculation
- on-line sequence, information, feature updating
- 3D surface mapping of features
- ….
Running Ordalie :
On surf/lameX :
- setordalie
- ordalie
- ordalie option value option value
File formats : MSF, TFA, ALN, RSF, XML/Macsims and ORD
Conversion : ordalie toto.msf –convert ALN - toto.aln
1985 Teaching