

1 Copyright © 2004 Synamatix sdn bhd (538481-U) For audio portion of webcast please dial: +44 (0)870 22 333 65 (please omit zero if calling from outside the UK) PIN = 444888

2 Personal Introductions
Robert Hercus - MD and Inventor, Synamatix
  Over 30 years IT experience
  Pioneered many large-scale IT projects
  “Language of Biology” basis of Synamatix
  Interests: Linguistics, Genomics, Artificial Intelligence
Ali Zamli - Bioinformatician
  Research Scientist
  Synamatix applications development
Dr. Arif Anwar - VP, Synamatix
  10 yrs+ post-Ph.D.
  US and EU genomics background
  Ex-Agilent, CLONTECH and Axon Instruments

3 Questions to answer today
1. What is a SynaBASE?
2. What are the advantages of using SynaBASE?
3. In which situations has SynaBASE been applied?
4. Does the use of SynaBASE offer any advantages for phylogenetics?

4 Core IP - SynaBASE™ platform
Main partners and users in US and EU
50+ staff split across group
Open approach to development – engine not software
Focused on efficient HPC for Genomics and Life Sciences

5 [Diagram labels]
Interfaces: API calls, Graphical Interface, Command line interface
Layers: Applications; CORE; Database platform; Data analysis; Develop Tools
SX components: SXSequenceRefs, SXLRESearch, SXFuzzyPatternSearch, SXAlign, Sxpet, SXParse
Bulk tools: SynaRex Bulk, SynaProbe Bulk, SynaSearch Bulk

6 Software policy
More than 40 existing applications
All open source to licensees of SynaBASE
Users can also develop, modify and share all applications

7 What do we know about data?
Similarity & association
Common PATTERNS and functionality

8 Pattern Trie
[Figure: pattern trie holding the example patterns ACT, AAACCTTC, AACACTCTC, AACTACTC, AACTC]
Going to a leaf node finds all sources and positions
More memory efficient than variable-length data structures
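
The idea that reaching a node yields every source and position can be sketched with an ordinary trie. This is a minimal illustration only: the class names, the fixed pattern length `k`, and the two toy sequences are assumptions, and nothing here reflects how SynaBASE is actually implemented.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # next character -> TrieNode
        self.occurrences = []   # (source_id, position) pairs ending here


class PatternTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, pattern, source_id, position):
        """Store one occurrence of `pattern` found in `source_id` at `position`."""
        node = self.root
        for ch in pattern:
            node = node.children.setdefault(ch, TrieNode())
        node.occurrences.append((source_id, position))

    def lookup(self, pattern):
        """Walk down to the node for `pattern` and return all its occurrences."""
        node = self.root
        for ch in pattern:
            node = node.children.get(ch)
            if node is None:
                return []
        return list(node.occurrences)


def index_sequences(sequences, k):
    """Index every length-k substring of every sequence (hypothetical scheme)."""
    trie = PatternTrie()
    for source_id, seq in sequences.items():
        for pos in range(len(seq) - k + 1):
            trie.insert(seq[pos:pos + k], source_id, pos)
    return trie


# Toy data taken from the patterns shown on the slide.
sequences = {"seq1": "AACTACTC", "seq2": "AACACTCTC"}
trie = index_sequences(sequences, 3)
hits = trie.lookup("ACT")
```

A single walk of three steps returns every place where `ACT` occurs across both sources, which is the property the slide highlights.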

9 Pattern Trie
[Figure: the same pattern trie with frequency labels f=20 and f=100 and the pattern AAA]
Low complexity repeats - filtered
High frequency patterns removed from alignment seeding
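
Removing high-frequency patterns before seeding can be illustrated with a simple frequency threshold. The sketch below is an assumption about the general idea only: the function names, the fixed pattern length, and the threshold value are hypothetical and are not taken from SynaBASE.

```python
from collections import Counter


def count_patterns(sequences, k):
    """Count how often each length-k pattern occurs across all sequences."""
    counts = Counter()
    for seq in sequences:
        for pos in range(len(seq) - k + 1):
            counts[seq[pos:pos + k]] += 1
    return counts


def seeding_patterns(counts, max_frequency):
    """Keep only patterns whose frequency does not exceed the threshold."""
    return {p for p, f in counts.items() if f <= max_frequency}


# The first sequence contains a low-complexity run of A's.
sequences = ["AAAAAAAACT", "AACTACTC"]
counts = count_patterns(sequences, 3)
seeds = seeding_patterns(counts, max_frequency=3)
```

Here the repeat pattern `AAA` is counted six times and is dropped, while rarer patterns such as `ACT` remain available as alignment seeds.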

10 Building a SynaBASE – easy and fast

11 Takes 8 minutes for Swissprot
The fields in the build form are equivalent to the command-line XML configuration
Field data is converted into XML format and added to the existing entry in the SynaBASE XML configuration file

12 Pattern Trie
[Figure: pattern trie holding the example patterns ACT, AAACCTTC, AACACTCTC, AACTACTC, AACTC]
Trie Boundary
Frequency is greater than build limit

13 Flexibility to use the command line

14 Single-server IT architecture
SynaBASE & SynaSuite Server
HP Integrity rx4640 server
Dual Intel Itanium2 1.5 GHz CPU
64 GB DDR memory
2 x 146 GB Ultra320 SCSI hard disk
Red Hat Enterprise Linux AS 3 for IA64

15 1. SynaBASE scales efficiently

16 2. SynaBASE enables very fast access
Number of levels is small
For a query:
  Match the first longest pattern
  Follow an Eulerian path through the network, picking up the longest matching pattern for each position in the query
Processing time is proportional to query size to obtain all unique subpatterns
[Figure: pattern network with ACT, AAACCTTC, AACACTCTC, AACTACTC, AACTC, ACTCG, CTCG, CTCGA, TCGA]
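
Picking up the longest stored pattern at each query position can be sketched with a plain trie walk. This is an illustration under assumptions: it restarts from the root at every position and does not reproduce the Eulerian path traversal of the SynaBASE network; the function names and the query string are hypothetical, and the pattern set is copied from the slide.

```python
def build_trie(patterns):
    """Build a nested-dict trie; the key "$" marks the end of a stored pattern."""
    root = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def longest_matches(trie, query):
    """For each start position, return the longest stored pattern found there."""
    result = []
    for start in range(len(query)):
        node = trie
        best = None
        for end in range(start, len(query)):
            node = node.get(query[end])
            if node is None:
                break
            if "$" in node:
                best = query[start:end + 1]
        result.append((start, best))
    return result


patterns = ["ACT", "AAACCTTC", "AACACTCTC", "AACTACTC", "AACTC",
            "ACTCG", "CTCG", "CTCGA", "TCGA"]
trie = build_trie(patterns)
matches = longest_matches(trie, "AACTCGA")
```

Because each walk is bounded by the depth of the trie, the total work in this sketch grows with the query length times a small number of levels, which is in the spirit of the slide's claim about processing time.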

17 Efficiency leads to high performance
Only 15 million nodes are needed to represent 56 million residues
The storage of the shorter nodes has little effect

18 3. SynaBASE is very fast - Q * log N (base A)
[Chart: speed (milliseconds, 100 to 900) against size of database (mega bp, 1 to 1000) for Conventional and SynaBASE]

19 BLASTN vs. SynaSearch-Bulk
SynaBASE and BLAST DB of 700,000 Bacterial ORFs queried with 100 1 kb sequences
Cumulative number of hits shows SynaSearch Bulk found extra hits at low-mid identities
[Chart annotation: Novel hits]

20 4. Novel annotation using SynaBASE
Example sentence: "The elephant and the giraffe walked up the mountain"
[Graph: frequency of "string (word)" patterns in a sentence; frequency does not reflect meaning]
[Graph: probabilities of predicting predecessor and successor characters/events (string significance), which reflect meaning]

21 SIGNIFICANCE = Actual Freq / Expected Freq
[Figure: patterns a1, a2, a3 combine into a1a2 and a2a3, which combine into a1a2a3]
Expected frequency:
$$Ef(a_1 a_2 a_3) = \frac{F(a_1 a_2) \, F(a_2 a_3)}{F(a_2)}$$
Significance:
$$Sig(a_1 a_2 a_3) = \frac{F(a_1 a_2 a_3)}{Ef(a_1 a_2 a_3)} = \frac{F(a_1 a_2 a_3) \, F(a_2)}{F(a_1 a_2) \, F(a_2 a_3)}$$
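
The significance ratio on this slide can be computed directly from substring counts. The sketch below is a minimal illustration on characters of a toy string; the function names and the input text are assumptions, and only the ratio of actual to expected frequency comes from the slide.

```python
from collections import Counter


def ngram_counts(text, max_n=3):
    """Count all substrings of length 1..max_n; these play the role of F(.)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def significance(counts, triplet):
    """Sig(a1a2a3) = F(a1a2a3) / Ef(a1a2a3), with Ef = F(a1a2) * F(a2a3) / F(a2)."""
    a1a2, a2a3, a2 = triplet[:2], triplet[1:], triplet[1]
    if counts[a2] == 0:
        return 0.0
    expected = counts[a1a2] * counts[a2a3] / counts[a2]
    if expected == 0:
        return 0.0
    return counts[triplet] / expected


counts = ngram_counts("ABCABCABD")
sig_abc = significance(counts, "ABC")
```

In this toy string F(ABC) = 2, F(AB) = 3, F(BC) = 2 and F(B) = 3, so the expected frequency is 2 and the significance is 1: the triplet occurs exactly as often as its overlapping pairs predict. Values above 1 would indicate a triplet that is over-represented relative to that expectation.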

22 Gene models correlate with “SIGNIFICANCE”
[Figure: PIM1 Oncogene; tracks labelled F2, F3 and Ensembl Gene]

23 Example 1 - 454 assembly result
400,000 reads assembled into 11 contigs in 11 minutes, 2 minutes for error correction
Genome coverage 99.89%

24 FragBASE – using the SynaBASE structure
Select patterns of high coverage
Use corrected FragBASE
Use FragBASE network* to extend patterns
Increase pattern size to overcome shorter repeat sections

25 Example 2 - Microarrays
Probe design – 30,000 75mer probes, 8 per gene, in 8 h compared to the previous 3 month+ process
Probe evaluation and mapping
Mapping of 600,000 Affymetrix 25mer probes to the Human genome in 17 s
Compares to over 2 weeks with BLAST

26 Example 3 – Comparative Genomics
[Chart labels: 3 yrs; SynaBASE; BLAST; 6 h; PatternHunter; 22 days]

27 Example 4 – Genome mapping
Aims:
  Mapping of whole genome shotgun reads from a mammalian genome to the Human Genome, to facilitate genome assembly using Synamatix and public tools.
  Compare sensitivity, specificity and performance advantages of Synamatix technologies.
Results: In comparison to BLASTz, SynaSearch:
  Is 219-fold faster
  Finds 11% more true positives
  Finds 17% more unique hits to queries
  Has a higher specificity:
    113% fewer false positives
    fewer multiple placements per read – 2.7 v 5.3
Benefits:
  Enables significant enhancements in workflow throughput.
  219-fold compute time improvement
  SynaSearch requires only 1 search process, whereas BLASTz requires the genome to be separated into 5 MB chunks and apportioned across multiple processors.
  Results in better assemblies of new genomes.
  Reduces current reliance on outsourcing of BLASTz analysis.

28 “Inference of a phylogenetic network of whole prokaryotic genomes using SynaBASE”
Further example of use of the SynaBASE engine: applying SynaBASE to phylogenetics

29 Outline of study
Primary data set 1: 101 Bacterial and Archaeal Genomes
Used “SynaTree” – exhaustive comparison between “Sequences” in SynaBASE structure
  Generates phylogenetic tree
Used prototype Synamatix application: “SXComparePattern” – exhaustive pattern based similarity matching
Evaluation of methods using:
  C-score method*
  Group visualisation and clustering analysis
Tested “SXComparePattern” method with a larger 488 Bacterial Genome data set
*Henz S.R., Huson D.H., Auch A.F., Nieselt-Struwe K. and Schuster S.C. (2005) Whole-genome prokaryotic phylogeny. Bioinformatics. 21(10): 2329-2335

30 Phylogenetics using SynaTree
For each query genome, SynaBASE can be searched for all alignments with all other genome sequences {srefs, posn, length}
The alignment scores can then be used to calculate a distance matrix:
[Formula shown as an image on the slide]
Where:
  A = alignment score
  L = length of respective genomes
The distance matrix is used to generate a phylogenetic tree
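
The last step, turning a distance matrix into a tree, can be sketched with UPGMA (average-linkage clustering). This is an assumption chosen for illustration: the transcript does not say which tree-building method SynaTree uses, and the genome names and distance values below are made up.

```python
def upgma(names, dist):
    """Build a nested-parenthesis tree from pairwise distances using UPGMA.

    `dist` maps frozenset({a, b}) to the distance between labels a and b.
    UPGMA is a stand-in here, not necessarily the method SynaTree applies.
    """
    sizes = {name: 1 for name in names}   # cluster label -> number of members
    d = dict(dist)
    while len(sizes) > 1:
        # Find the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in sizes for y in sizes if x < y),
            key=lambda pair: d[frozenset(pair)],
        )
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical distances between three genomes.
distances = {
    frozenset(("A", "B")): 0.1,
    frozenset(("A", "C")): 0.5,
    frozenset(("B", "C")): 0.6,
}
tree = upgma(["A", "B", "C"], distances)
```

The two closest genomes, A and B, are joined first, and C attaches to that pair, giving `((A,B),C)`.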

31 SynaTree Interface

32 SynaTree uses the SXAlign API for comparing alignments
It can be seen from the chart that the resulting triplets in a sliding window include significant alignments as well as spurious short matches that are not significant.
The SynaBASE align function, SXAlign, includes a filter to remove the random short alignments, or 'noise', from the alignment data.
The alignment scores are then used to calculate a distance matrix.

33 Example of filtering
Chart shows the effect of using the diagonal alignment filter on the alignment of 2 Serine Kinase aa sequences
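
One common way to separate real alignments from scattered short matches is to group matches by diagonal and keep only well-populated diagonals. The sketch below assumes that "diagonal" refers to matches sharing the same offset between query and subject positions; the function name, threshold, and match coordinates are hypothetical, and the actual SXAlign filter is not described in the transcript.

```python
from collections import defaultdict


def diagonal_filter(matches, min_matches):
    """Keep only matches lying on diagonals that hold at least `min_matches` hits.

    Each match is a (query_pos, subject_pos) pair; its diagonal is the offset
    query_pos - subject_pos. Isolated hits on sparse diagonals count as noise.
    """
    by_diagonal = defaultdict(list)
    for q, s in matches:
        by_diagonal[q - s].append((q, s))
    kept = []
    for diagonal_matches in by_diagonal.values():
        if len(diagonal_matches) >= min_matches:
            kept.extend(diagonal_matches)
    return sorted(kept)


# Four hits share one diagonal; two stray hits sit on their own diagonals.
matches = [(0, 5), (1, 6), (2, 7), (3, 8), (10, 2), (4, 20)]
filtered = diagonal_filter(matches, min_matches=3)
```

The four collinear hits survive and the two stray hits are discarded, which mirrors the before-and-after effect the chart on this slide describes.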

34 SynaTree for 101 bacterial & archaeal genomes
95 minutes! Compared to 7 days with BLAST

35 SynaTree for 101 bacterial & archaeal genomes
[Tree figure; labelled groups: Cyanobacteria, Firmicute, Chlamydiae]

36 2nd method: SXComparePattern
Frequency of each pattern
Raw score for patterns
Calculation of distance matrix from raw score by distance formula

37 SXComparePattern Approach
The distance matrix is calculated as before, with some exceptions: here, the calculation is based on the patterns shared between genomic sequences
Where:
  A = shared patterns between genomes i and j
  L = number of patterns for respective genomes
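
A distance based on shared patterns can be sketched as follows. The slide's distance formula is not present in the transcript, so the normalisation used here (one minus the shared count divided by the smaller pattern count) is purely an assumption for illustration, as are the function names and the fixed pattern length `k`.

```python
def patterns(seq, k):
    """Return the set of distinct length-k patterns in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_pattern_distance(seq_i, seq_j, k):
    """Hypothetical distance from shared patterns.

    A is the number of patterns shared by the two sequences and L is the
    number of patterns in each; the formula below is an assumed stand-in,
    not the one shown on the slide.
    """
    p_i, p_j = patterns(seq_i, k), patterns(seq_j, k)
    shared = len(p_i & p_j)               # A
    smaller = min(len(p_i), len(p_j))     # derived from L_i and L_j
    if smaller == 0:
        return 1.0
    return 1.0 - shared / smaller


d_same = shared_pattern_distance("AACTACTC", "AACTACTC", 3)
d_disjoint = shared_pattern_distance("AAAA", "CCCC", 2)
```

Set intersection avoids any alignment step, which gives some intuition for why a pattern-sharing comparison can be much faster than an alignment-based one.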

38 SXComparePattern tree for 101 bacterial and archaeal genomes
23 seconds! Compared to 7 days with BLAST

39 SXComparePattern tree for 101 bacterial and archaeal genomes
[Tree figure; labelled groups: Chlamydiae, Cyanobacteria, Firmicute]

40 Performance based on grouping

41 Evaluation of phylogenetic networks
Evaluation is based on the C-score proposed by Henz et al. (2005)
The C-score is essentially the sum of compatible non-trivial splits (Tc) divided by the sum of all non-trivial splits in the test tree
The assumption is that the compatibility of non-trivial splits is assessed against a reference tree which is deemed 'correct'
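
The ratio described on this slide can be sketched with splits represented as sets of taxa. This is a simplified, unweighted reading: each split counts as 1, whereas the published C-score may weight splits, and the function names and the five-taxon example are assumptions.

```python
def is_compatible(split_a, split_b, taxa):
    """Two splits are compatible if one of the four side intersections is empty."""
    a, a_rest = split_a, taxa - split_a
    b, b_rest = split_b, taxa - split_b
    return (not (a & b) or not (a & b_rest)
            or not (a_rest & b) or not (a_rest & b_rest))


def c_score(test_splits, reference_splits, taxa):
    """Fraction of non-trivial test splits compatible with every reference split."""
    nontrivial = [s for s in test_splits if 1 < len(s) < len(taxa) - 1]
    if not nontrivial:
        return 0.0
    compatible = [
        s for s in nontrivial
        if all(is_compatible(s, r, taxa) for r in reference_splits)
    ]
    return len(compatible) / len(nontrivial)


taxa = frozenset("ABCDE")
# Reference tree ((A,B),C,(D,E)) has the non-trivial splits AB|CDE and ABC|DE.
reference = [frozenset("AB"), frozenset("ABC")]
# The test tree agrees on AB|CDE but also proposes the conflicting split AC|BDE.
test = [frozenset("AB"), frozenset("AC")]
score = c_score(test, reference, taxa)
```

One of the two non-trivial test splits is compatible with the reference, so the score is 0.5; a test tree that matched the reference on every split would score 1.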

42 NCBI Reference Tree


44 Zoomed tree of 488 Bacterial Genomes

45 Performance comparison
Rapid method for inferring phylogenetic networks.
SXComparePattern, highlighted above and marked with *, is with 488 bacterial sequences

46 Summary
SynaBASE platform extensible to phylogenetics
Pattern based approach provides a very rapid and scalable means of clustering genomes into phylogenetic networks
Enables multi-supercomputer performance from a single server
This same approach can be used to cluster and analyse previously improbable data sets, e.g.
  All primate genomes
  All genes
  Iterative analysis of evolutionary phylogenetics

47 END OF WEBCAST
Thank you for your participation!
Next Webcast will be on April 30 – “Use of SynaBASE for assembly of reads from 454 Life Sciences sequencing platform”
A full paper of the work presented will be sent to you on Monday next week
Please email enquiries@synamatix.com if you have any questions or would like a free trial

