Copyright © 2004 Synamatix sdn bhd ( U) For audio portion of webcast please dial: +44 (0) (please omit zero if calling from outside the UK) PIN =
Copyright © 2006 Synamatix sdn bhd ( U) For audio of webcast please dial: +44 (0) (omit zero if calling from outside the UK) PIN =

Personal Introductions
Robert Hercus - MD and Inventor, Synamatix
  Over 30 years IT experience
  Pioneered many large-scale IT projects
  “Language of Biology” basis of Synamatix
  Interests: Linguistics, Genomics, Artificial Intelligence
Ali Zamli - Bioinformatician
  Research Scientist
  Synamatix applications development
Dr. Arif Anwar - VP, Synamatix
  10 yrs+ post-Ph.D.
  US and EU genomics background
  Ex-Agilent, CLONTECH and Axon Instruments

Questions to answer today
1. What is a SynaBASE?
2. What are the advantages of using SynaBASE?
3. In which situations has SynaBASE been applied?
4. Does the use of SynaBASE offer any advantages for phylogenetics?

Core IP - SynaBASE™ - PLATFORM
Main partners and users in US and EU
50+ staff split across group
Open approach to development - engine, not software
Focused on efficient HPC for Genomics and Life Sciences

[Architecture diagram. Labels: API calls; Graphical Interface; Command line interface; Applications; SXSequenceRefs; SXLRESearch; SXFuzzyPatternSearch; SXAlign; Sxpet; SXParse; CORE; Database platform; Data analysis; Develop Tools; SynaRex Bulk; SynaProbe Bulk; SynaSearch Bulk]

Software policy
More than 40 existing applications
All open source to licensees of SynaBASE
Users can also develop, modify and share all applications

What do we know about data?
Similarity & association
Common PATTERNS and functionality

Pattern Trie
[Diagram: trie over the example patterns ACT, AAACCTTC, AACACTCTC, AACTACTC, AACTC]
Going to a leaf node finds all sources and positions
More memory efficient than variable-length data structures

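The lookup described on this slide can be illustrated with a minimal Python sketch. It is not SynaBASE's actual data layout or API: the names `TrieNode`, `build_pattern_trie` and `find`, and the `max_len` cut-off, are assumptions made for this example only. Each node records every (source, position) pair whose substring leads to it, so walking a pattern down the trie returns all of its occurrences at once.

```python
class TrieNode:
    """One trie node; hypothetical structure for illustration only."""

    def __init__(self):
        self.children = {}      # next character -> TrieNode
        self.occurrences = []   # (source id, start position) pairs


def build_pattern_trie(sequences, max_len):
    """Index every substring of length <= max_len from each named sequence."""
    root = TrieNode()
    for source, seq in sequences.items():
        for start in range(len(seq)):
            node = root
            for ch in seq[start:start + max_len]:
                node = node.children.setdefault(ch, TrieNode())
                node.occurrences.append((source, start))
    return root


def find(root, pattern):
    """Walk the pattern down the trie and return all of its occurrences."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return node.occurrences


# Two of the example patterns from the slide, used here as source sequences.
sequences = {"s1": "AACTACTC", "s2": "AACTC"}
trie = build_pattern_trie(sequences, max_len=4)
hits = find(trie, "ACT")
print(hits)
```

Storing occurrences on every node trades memory for lookup speed; a production index would presumably compress this, which the sketch does not attempt.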
Pattern Trie
[Diagram: trie over the example patterns ACT, AAACCTTC, AACACTCTC, AACTACTC, AACTC, with node labels f=20, f=100 and AAA]
Low complexity repeats - filtered
High frequency patterns removed from alignment seeding

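A rough sketch of frequency-based seed filtering, assuming fixed-length patterns (k-mers) and a simple frequency cut-off. The function names and the `max_freq` parameter are hypothetical; the slide does not specify how SynaBASE sets or applies its threshold.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every overlapping k-mer across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def seed_patterns(counts, max_freq):
    """Keep only patterns at or below the frequency cut-off (assumed rule)."""
    return {pattern for pattern, freq in counts.items() if freq <= max_freq}


# The first sequence contains a low-complexity run of A's.
seqs = ["AAAAAAAACT", "AACTACTC"]
counts = kmer_counts(seqs, k=3)
seeds = seed_patterns(counts, max_freq=3)
print(sorted(seeds))
```

Here the repeat pattern AAA occurs six times and is dropped from the seed set, while ACT (three occurrences) is kept.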
Building a SynaBASE - easy and fast

Takes 8 minutes for Swiss-Prot
The fields in the build form are equivalent to the command-line XML configuration
Field data is converted into XML format and added to the existing entry in the SynaBASE XML configuration file

Pattern Trie
[Diagram: trie over the example patterns ACT, AAACCTTC, AACACTCTC, AACTACTC, AACTC, with the trie boundary marked]
Trie boundary: frequency is greater than build limit

Flexibility to use the command line

Single-server IT architecture
SynaBASE & SynaSuite Server
  HP Integrity rx4640 server
  Dual Intel Itanium 2 1.5 GHz CPU
  64 GB DDR memory
  2 x 146 GB Ultra320 SCSI hard disks
  Red Hat Enterprise Linux AS 3 for IA64

SynaBASE scales efficiently

SynaBASE enables very fast access
Number of levels is small
For a query:
  Match the first longest pattern
  Follow an Eulerian path through the network, picking up the longest matching pattern for each position in the query
Processing time is proportional to query size to obtain all unique subpatterns
[Diagram labels: ACT, AAACCTTC, AACACTCTC, AACTACTC, AACTC, ACTCG, CTCG, CTCGA, TCGA]

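The idea of picking up the longest matching pattern at each query position can be sketched as follows. This is a naive illustration over a plain nested-dict trie, not the network traversal SynaBASE uses: it restarts from the root at every position, so it does not reproduce the linear-time behaviour claimed on the slide. The function names and the `"$"` end marker are assumptions for this example.

```python
def build_trie(patterns):
    """Build a nested-dict trie; "$" marks the end of a stored pattern."""
    root = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def longest_matches(root, query):
    """For each query position, return the longest stored pattern starting there."""
    result = []
    for start in range(len(query)):
        node, best = root, 0
        for offset, ch in enumerate(query[start:], 1):
            node = node.get(ch)
            if node is None:
                break
            if "$" in node:
                best = offset  # a stored pattern ends here; remember its length
        result.append(query[start:start + best])
    return result


# Patterns taken from the slide's diagram labels; the query is made up.
patterns = ["ACT", "ACTCG", "CTCG", "CTCGA", "TCGA"]
matches = longest_matches(build_trie(patterns), "ACTCGA")
print(matches)
```

Positions with no stored pattern yield an empty string, which keeps the result aligned with the query positions.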
Efficiency leads to high performance
Only 15 million nodes are needed to represent 56 million residues
The storage of the shorter nodes has little effect

SynaBASE is very fast - $Q \cdot \log_A N$
[Chart: axes are size of database (mega bp) and speed (milliseconds); series are Conventional and SynaBASE]

BLASTN vs. SynaSearch-Bulk
SynaBASE and BLAST DB of bacterial ORFs queried with 100 1 kb sequences
Cumulative number of hits shows SynaSearch Bulk found extra hits at low-mid identities
[Chart annotation: Novel hits]

4. Novel annotation using SynaBASE
Example sentence: The elephant and the giraffe walked up the mountain
[Graph: frequency of “string (word)” patterns in a sentence] Frequency does not reflect meaning
[Graph: probabilities of predicting predecessor and successor characters/events (string significance)] Significance reflects meaning

SIGNIFICANCE = actual frequency / expected frequency

Expected frequency:
$$Ef(a_1 a_2 a_3) = \frac{F(a_1 a_2) \cdot F(a_2 a_3)}{F(a_2)}$$

Significance:
$$Sig(a_1 a_2 a_3) = \frac{F(a_1 a_2 a_3)}{Ef(a_1 a_2 a_3)} = \frac{F(a_1 a_2 a_3) \cdot F(a_2)}{F(a_1 a_2) \cdot F(a_2 a_3)}$$

[Diagram: pattern nodes a1, a2, a3, a1a2, a2a3, a1a2a3]

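As a worked illustration of the ratio of actual to expected frequency, the sketch below counts overlapping occurrences in a short string and applies the slide's formula. The helper names `count` and `significance`, the toy sequences, and the choice to return 0.0 when a sub-pattern never occurs are assumptions made here, not part of SynaBASE.

```python
def count(text, pattern):
    """Count overlapping occurrences of pattern in text (F in the slide)."""
    return sum(text.startswith(pattern, i)
               for i in range(len(text) - len(pattern) + 1))


def significance(text, a1, a2, a3):
    """Sig(a1 a2 a3) = F(a1 a2 a3) / Ef, with Ef = F(a1 a2) * F(a2 a3) / F(a2)."""
    f_mid = count(text, a2)
    f_left = count(text, a1 + a2)
    f_right = count(text, a2 + a3)
    if f_mid == 0 or f_left == 0 or f_right == 0:
        return 0.0  # assumed convention when the expected frequency is undefined or zero
    expected = f_left * f_right / f_mid
    return count(text, a1 + a2 + a3) / expected


# ACT occurs exactly as often as its parts predict, so significance is 1.
neutral = significance("ACTACTACG", "A", "C", "T")
# Here C is common but AC and CT always co-occur, so ACT is over-represented.
enriched = significance("ACTGCGACTGCC", "A", "C", "T")
print(neutral, enriched)
```

A value above 1 means the triplet appears more often than its overlapping pairs would suggest.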
Gene models correlate with “SIGNIFICANCE”
[Figure: PIM1 oncogene; labels F2, F3, Ensembl Gene]

Example assembly result
400,000 reads assembled into 11 contigs in 11 minutes, 2 minutes for error correction
Genome coverage 99.89%

FragBASE - using the SynaBASE structure
Select patterns of high coverage
Use corrected FragBASE
Use FragBASE network* to extend patterns
Increase pattern size to overcome shorter repeat sections

Example 2 - Microarrays
Probe design - mer probes, 8 per gene in 8 h, compared to previous 3 month+ process
Probe evaluation and mapping
  Mapping of 600,000 Affymetrix 25-mer probes to Human genome in 17 s
  Compares to over 2 weeks with BLAST

Example 3 - Comparative Genomics
[Chart labels: 3 yrs, SynaBASE, BLAST, 6 h, PatternHunter, 22 days]

Example 4 - Genome mapping
Aims:
  Mapping of whole genome shotgun reads from a mammalian genome to the Human Genome, to facilitate genome assembly using Synamatix and public tools.
  Compare sensitivity, specificity and performance advantages of Synamatix technologies.
Results: In comparison to BLASTz, SynaSearch:
  Is 219 fold faster
  Finds 11% more true positives
  Finds 17% more unique hits to queries
  Has a higher specificity:
    113% fewer false positives
    fewer multiple placements per read - 2.7 v 5.3
Benefits:
  Enables significant enhancements in workflow throughput.
    219 fold compute time improvement
    SynaSearch requires only 1 search process, whereas BLASTz requires the genome to be separated into 5 MB chunks and apportioned across multiple processors.
  Results in better assemblies of new genomes.
  Reduces current reliance on outsourcing of BLASTz analysis.

Further example of use of SynaBASE engine: applying SynaBASE to Phylogenetics
“Inference of a phylogenetic network of whole prokaryotic genomes using SynaBASE”

Outline of study
Primary data set 1: 101 Bacterial and Archaeal Genomes
Used “SynaTree” - exhaustive comparison between “Sequences” in SynaBASE structure
  Generates phylogenetic tree
Used prototype Synamatix application: “SXComparePattern” - exhaustive pattern-based similarity matching
Evaluation of methods using:
  C-score method*
  Group visualisation and clustering analysis
Tested “SXComparePattern” method with a larger 488 Bacterial Genome data set
*Henz S.R., Huson D.H., Auch A.F., Nieselt-Struwe K. and Schuster S.C. (2005) Whole-genome prokaryotic phylogeny. Bioinformatics. 21(10):

Phylogenetics using SynaTree
For each query genome, SynaBASE can be searched for all alignments with all other genome sequences {srefs, posn, length}
The alignment scores can then be used to calculate a distance matrix:
[Distance formula shown on slide]
Where:
  A = alignment score
  L = length of respective genomes
The distance matrix is used to generate a phylogenetic tree

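To make the last step concrete, here is a small sketch that turns a distance matrix into a tree. The slide does not say which tree-building method SynaTree uses; UPGMA (average-linkage clustering) is swapped in here purely as a simple, well-known stand-in, and the function name `upgma` and the toy matrix are hypothetical.

```python
def upgma(names, matrix):
    """Average-linkage clustering; returns the tree as nested tuples of names."""
    trees = {i: name for i, name in enumerate(names)}   # cluster id -> subtree
    sizes = {i: 1 for i in trees}                       # cluster id -> leaf count
    dist = {(i, j): matrix[i][j] for i in trees for j in trees if i < j}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)                  # closest pair of clusters
        merged = {}
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # size-weighted average distance to the newly merged cluster
            merged[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(merged)
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


# Toy symmetric distance matrix for three genomes (values are made up).
names = ["A", "B", "C"]
matrix = [
    [0.0, 0.1, 0.6],
    [0.1, 0.0, 0.5],
    [0.6, 0.5, 0.0],
]
tree = upgma(names, matrix)
print(tree)
```

A and B are closest, so they are joined first and C attaches to that pair. UPGMA assumes a constant rate of change across lineages; other methods such as neighbour joining relax that assumption.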
SynaTree Interface

SynaTree uses the SXAlign API for comparing alignments
It can be seen from the chart that the resulting triplets in a sliding window include significant alignments and also spurious short matches that are not significant.
The SynaBASE align function, SXAlign, includes a filter to remove the random short alignments, or 'noise', from the alignment data.
The alignment scores are then used to calculate a distance matrix.

Example of filtering
Chart shows the effect of using the diagonal alignment filter on the alignment of 2 serine kinase amino acid sequences

SynaTree for 101 bacterial & archaeal genomes
minutes! Compared to 7 days with BLAST

SynaTree for 101 bacterial & archaeal genomes
[Tree figure; labelled groups: Cyanobacteria, Firmicute, Chlamydiae]

2nd method: SXComparePattern
Frequency of each pattern
Raw score for patterns
Calculation of distance matrix from raw score by distance formula

SXComparePattern Approach
The distance matrix is calculated as before, with one difference: here, the calculation is based on shared patterns between each pair of genomic sequences
Where:
  A = shared patterns between genomes i and j
  L = number of patterns for respective genomes

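A pattern-based distance of this kind can be sketched as below. The distance formula itself is not reproduced in the transcript, so the normalisation used here, d = 1 - A / min(L_i, L_j), is an assumption chosen only for illustration, as are the fixed pattern length `k` and the function names. A is the number of distinct shared k-mers and L the number of distinct k-mers in each genome.

```python
def patterns(seq, k):
    """Set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pattern_distance(seq_i, seq_j, k):
    """Assumed distance: 1 - A / min(L_i, L_j), from shared-pattern counts."""
    p_i, p_j = patterns(seq_i, k), patterns(seq_j, k)
    smaller = min(len(p_i), len(p_j))
    if smaller == 0:
        return 1.0  # no patterns to compare; treated as maximally distant
    shared = len(p_i & p_j)  # A in the slide's notation
    return 1.0 - shared / smaller


def distance_matrix(genomes, k):
    """Pairwise distances for a dict of name -> sequence, in insertion order."""
    names = list(genomes)
    return [[0.0 if a == b else pattern_distance(genomes[a], genomes[b], k)
             for b in names] for a in names]


# Toy sequences standing in for genomes.
genomes = {"g1": "AACTACTC", "g2": "AACTC", "g3": "GGGTTTGG"}
dm = distance_matrix(genomes, k=3)
for name, row in zip(genomes, dm):
    print(name, row)
```

Because only set membership is compared, no alignment is needed, which is consistent with the speed advantage the talk attributes to the pattern-based method.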
SXComparePattern tree for 101 bacterial and archaeal genomes
seconds! Compared to 7 days with BLAST

SXComparePattern tree for 101 bacterial and archaeal genomes
[Tree figure; labelled groups: Chlamydiae, Cyanobacteria, Firmicute]

Performance based on grouping

Evaluation of phylogenetic networks
Evaluation is based on the C-score proposed by Henz et al. (2005), which is essentially the sum of compatible non-trivial splits (Tc) divided by the sum of all non-trivial splits in the test tree
The assumption is that the compatibility of non-trivial splits is assessed against a reference tree which is deemed 'correct'.

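The ratio described above can be sketched as follows, under two stated assumptions: splits are unweighted (each counts as 1, whereas the published score may weight them), and a test split counts as compatible when it is pairwise compatible with every split of the reference tree. Function names and the toy taxa are hypothetical. Two splits A|B and C|D are compatible when at least one of the four intersections A∩C, A∩D, B∩C, B∩D is empty.

```python
def is_compatible(split_a, split_b, taxa):
    """Two splits are compatible if some pair of their sides is disjoint."""
    sides_a = (split_a, taxa - split_a)
    sides_b = (split_b, taxa - split_b)
    return any(not (x & y) for x in sides_a for y in sides_b)


def c_score(test_splits, reference_splits, taxa):
    """Fraction of non-trivial test splits compatible with the reference tree."""
    nontrivial = [s for s in test_splits if 1 < len(s) < len(taxa) - 1]
    if not nontrivial:
        return 0.0
    compatible = [s for s in nontrivial
                  if all(is_compatible(s, r, taxa) for r in reference_splits)]
    return len(compatible) / len(nontrivial)


# Each split is given by one of its two sides.
taxa = frozenset("ABCDE")
reference = [frozenset("AB"), frozenset("DE")]   # reference tree ((A,B),C,(D,E))
test = [frozenset("AB"), frozenset("CD")]        # CD conflicts with DE
score = c_score(test, reference, taxa)
print(score)
```

In this toy case one of the two non-trivial test splits agrees with the reference, so the score is one half.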
NCBI Reference Tree

Zoomed tree of 488 Bacterial Genomes

Performance comparison
Rapid method for inferring phylogenetic networks.
The SXComparePattern entry that is highlighted and marked with * is for 488 bacterial sequences

Summary
SynaBASE platform extensible to phylogenetics
Pattern-based approach provides a very rapid and scalable means of clustering genomes into phylogenetic networks
Enables multi-supercomputer performance from a single server
This same approach can be used to cluster and analyse previously improbable data sets, e.g.
  All primate genomes
  All genes
  Iterative analysis of evolutionary phylogenetics

END OF WEBCAST
Thank you for your participation!
Next webcast will be on April 30 - “Use of SynaBASE for assembly of reads from 454 Life Sciences sequencing platform”
A full paper of the work presented will be sent to you on Monday next week
Please if you have any questions or would like a free trial