Less is more: Approaches to biologist-driven analysis and next-generation sequencing data. Paul Gordon, Genome Canada Bioinformatics Platform, University of Calgary
What am I doing here? Next Generation Sequencing Next Generation Web Future challenges
Better tech: less DNA, more sequence 44μm 70nm
PhytoMetaSyn
Sprockets: Hierarchical Gene Models from ESTs Developed in collaboration with BASF Plant Sciences
Genozymes
Hydrocarbon Metagenomics
Exploring gene expression patterns CAVEman Java 3D-based, world-first complete 3D human body atlas (adult male) – 2,335 organs, hierarchical organization following Terminologia Anatomica Numerous applications involving mapping of genetic and disease data More information: Patient MRI stack mapped onto atlas and registered by landmarks Pharmacokinetics visualization (absorption-distribution-metabolism-excretion of Aspirin)
Basic Research Archaeal UV-light response Large-scale human genome organization ING-protein interactions (cancer and ageing-related proteins)
Research Applications Kidney transplants: improved rejection diagnostics in Edmonton Mad cow disease/chronic wasting disease: live diagnostics Desulf.: mechanisms of oil pipeline corrosion and its prevention
DNA Diagnostics Discovery for Mad Cow Preclinical Clinical Preinoculation Controls Control animal #6 Ball toy Photo: S. Czub, CFIA Lethbridge
Next-gen Motif finding (elk dataset) 61 blood samples 107 million base pairs 432 billion pairwise alignments ( ) mers or smaller Uninfected Infected 3 universal Infected Thousands of animal coverage/timepoint combos (CPU intensive) Decypher hardware accelerator
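The comparison on this slide (sequence shared by infected samples, absent from uninfected ones) can be illustrated with a toy sketch. It swaps in exact k-mer set comparison in place of the pairwise alignments run on the Decypher accelerator, so it is not the method used for the elk dataset; the function names, the sample strings, and the choice of k are all assumptions made for illustration.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def infected_specific_kmers(
    infected: Iterable[str], uninfected: Iterable[str], k: int
) -> Set[str]:
    """k-mers present in every infected sample and in no uninfected sample.

    Exact matching only: a stand-in for alignment-based motif finding,
    which would also tolerate mismatches and gaps.
    """
    infected_sets = [kmers(s, k) for s in infected]
    if not infected_sets:
        return set()
    # Candidates must be universal across the infected samples.
    shared = set.intersection(*infected_sets)
    # Anything also seen in an uninfected sample is discarded.
    background = set().union(*(kmers(s, k) for s in uninfected))
    return shared - background


if __name__ == "__main__":
    # Made-up sequences, not data from the study.
    infected = ["AAACGTTT", "GGACGTCC"]
    uninfected = ["AAATTTGG", "CCGGAATT"]
    print(infected_specific_kmers(infected, uninfected, k=4))
```

Holding every k-mer in memory does not scale to 107 million base pairs across thousands of coverage/timepoint combinations, which is one reason a hardware accelerator is attractive for the real analysis.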
Motif Results
↑EVI1 ↑PLZF Retrovirus PrPSc (+?) ↓PLZF-controlled genes Infectious agent Circulating Nucleic Acids Endogenous Retrovirus? Consistent with protein-only evidence… Neurovirulent? (e.g. M.L. Labat 1999) Possible mode of action? Virus particles? ~25 nm PrP Amyloid fibres Vacuole Manuelidis et al., PNAS 2007 Protected promoters (Motifs A & B) Feedback PrP Integration Nucleoprotein complexes Cell death CNA Export Carp et al., EMBO J., 2006 Leblanc et al., EMBO J Stengel et al., Biochem. Biophys. Res. Commun Lee et al., Biochem. Biophys. Res. Commun Etc. Activation
Better tech: less input, more results Better tech: less DNA, more sequence Generate Manuscript Now
Where are we at? Bioinformatics Web Emerging Technologies Life Sciences Semantic Web Source: Gartner Inc.
How software works… Functions/Rules Parameters/Input Results/Output (article, allele,…) (Gene name, DNA sequence, QTL…)
The problem with the Web Once you label me, you negate me. Søren Kierkegaard 1998 Now
Bluejay Comparative genomics BioMoby linking Waypoints Gene expression integration
The task at hand (biologist) Sequencer Data File (Binary) ACCGT… Known Proteins BLAST Report (related proteins) (computer scientist)
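From the computer scientist's side, the steps on this slide might look roughly like the sketch below. It assumes the binary sequencer file has already been decoded to a plain DNA string (that step is instrument-specific and left out), and it assumes the search is a translated one, using `blastx` from NCBI BLAST+ against a protein database. The slide does not name the BLAST program, so that choice, the helper names, the file names, and the database name are all hypothetical.

```python
from pathlib import Path
from typing import List


def write_fasta(record_id: str, sequence: str, path: Path, width: int = 60) -> Path:
    """Write one sequence to a FASTA file, wrapping lines at `width` characters."""
    lines = [f">{record_id}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    path.write_text("\n".join(lines) + "\n")
    return path


def blastx_command(
    query: Path, database: str, report: Path, evalue: float = 1e-5
) -> List[str]:
    """Build a BLAST+ blastx command line (translated DNA query vs. proteins).

    The command is only built here, not run, so BLAST+ need not be installed.
    """
    return [
        "blastx",
        "-query", str(query),
        "-db", database,        # hypothetical protein database name
        "-out", str(report),
        "-outfmt", "6",         # tabular report, one hit per line
        "-evalue", str(evalue),
    ]


if __name__ == "__main__":
    fasta = write_fasta("read1", "ACCGT" * 20, Path("reads.fasta"))
    cmd = blastx_command(fasta, "known_proteins", Path("report.tsv"))
    print(" ".join(cmd))
    # With BLAST+ installed and the database built, the search could be run with:
    # import subprocess; subprocess.run(cmd, check=True)
```

Even this short version asks the user to know about file formats, command-line flags, and database names, which is the gap between the two parenthesised roles on the slide.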
DNASequence NCBI_gi Sequence_Alignment
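One way to read these labels is as semantic types that let software find applicable services from the data alone. The sketch below illustrates that idea with a toy in-memory registry; it is not the BioMoby API. Only the names `DNASequence` and `Sequence_Alignment` come from the slide. The parent types, the service names, and the registry itself are hypothetical, and `NCBI_gi` is left out because its exact role is not stated here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical ISA hierarchy (child -> parent) for illustration only.
PARENT: Dict[str, str] = {
    "DNASequence": "GenericSequence",
    "GenericSequence": "Object",
    "Sequence_Alignment": "Object",
}


def is_a(data_type: str, ancestor: str) -> bool:
    """True if data_type equals ancestor or inherits from it."""
    current: Optional[str] = data_type
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


@dataclass(frozen=True)
class Service:
    name: str
    input_type: str
    output_type: str


# Hypothetical services; a real registry would be queried over the network.
REGISTRY: List[Service] = [
    Service("runTranslatedSearch", "DNASequence", "Sequence_Alignment"),
    Service("reverseComplement", "DNASequence", "DNASequence"),
    Service("describeObject", "Object", "Object"),
]


def services_accepting(data_type: str) -> List[str]:
    """Names of services whose declared input type covers data_type."""
    return [s.name for s in REGISTRY if is_a(data_type, s.input_type)]


if __name__ == "__main__":
    print(services_accepting("DNASequence"))
    print(services_accepting("Sequence_Alignment"))
```

Because lookup is driven by the type of the data in hand, the biologist never has to name a tool: holding a `DNASequence` is enough to be offered everything that can consume one.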
Audience God Amoeba Taverna self-starters Willing to take training Capable but fearful Self-perception of computer skills
The need for shoehorns The current vision of the Semantic Web intends to create a new structure starting up with no reference to its vast, functioning, but more primitive predecessor … things just don’t happen like that
All the Web as Workflows Seahawk Proxied Web page Drag ‘n’ drop Seahawk prompting
What’s Ahead? The more a man learns, the more he realizes how little he knows
Semantic Web
Take-home messages: As tech improves, we can ask better questions. We will need shoehorns to access existing resources for the foreseeable future.